-
File Structures. What is required to process variable length lines of data in
an arbitrary (random) order? How is the application program to
know where each line begins?
-
Device Structures. Some disk devices are organized into cylinders and tracks
instead of blocks. A disk may have a number of parallel platters;
a cylinder is the stack of tracks across the platters available
without moving the read-write head. A track is the data on one
circular section of a single disk platter. What advantages does
this have? What (if any) complexity could this lead to? How does
an application program specify the tracks and sectors to be
used?
Some disk devices are described as a simple sequence of
blocks, in no particular order. Each block has a unique numeric
identifier. What advantages could this have?
Some disk devices can be partitioned into a number of
"logical" devices. Each partition appears to be a separate device.
What (if any) relevance does this have to file processing?
-
Portfolio Position. We can create a simple CSV file that contains a description
of a block of stock. We'll call this the portfolio file. If we
have access to a spreadsheet, we can create a simple file with
four columns: stock, shares, purchase date and purchase price. We
can save this as a CSV file.
If we don't have access to a spreadsheet, we can create this
file in IDLE. Here's an example with a heading line and two lines of data.
stock,shares,"Purchase Date","Purchase Price"
"AAPL", 100, "10/1/95", 14.50
"GE", 100, "3/5/02", 38.56
We can read this file, multiply shares by purchase price, and
write a simple report showing our initial position in each
stock.
Note that each line will be a simple string. When we split
this string on the commas (using the string split method), we get a
list of strings. We'll still need to convert the number of shares
and the purchase price from strings to numbers in order to do the
multiplication.
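One way to approach this exercise can be sketched as follows. This is only a minimal sketch: the names PORTFOLIO_LINES, parse_line and initial_positions are hypothetical, and the data is held in a list of strings (copied from the example lines above) instead of being read from a file.

```python
# Hypothetical sample data, copied from the example lines above.
PORTFOLIO_LINES = [
    'stock,shares,"Purchase Date","Purchase Price"',
    '"AAPL", 100, "10/1/95", 14.50',
    '"GE", 100, "3/5/02", 38.56',
]

def parse_line(line):
    """Split one line on commas; strip spaces and quotes from each field."""
    return [field.strip().strip('"') for field in line.split(",")]

def initial_positions(lines):
    """Return (stock, purchase value) pairs, skipping the heading line."""
    rows = iter(lines)
    next(rows)  # the first line holds the column headings
    positions = []
    for line in rows:
        stock, shares, purchase_date, price = parse_line(line)
        # The fields are strings; convert them before multiplying.
        positions.append((stock, int(shares) * float(price)))
    return positions

for stock, value in initial_positions(PORTFOLIO_LINES):
    print(f"{stock:<6} {value:10.2f}")
```

An open file object can be passed in place of the list, since a file is also a sequence of lines. Note that splitting on commas assumes that no quoted field contains a comma of its own.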
-
Aggregated Portfolio Position. In Portfolio Position we
read a file and did a simple computation on each row to get the
purchase value. If we have multiple blocks of a given stock, these
will be reported as separate lines of detail. We'd like to combine
(or aggregate) any blocks of stock into an overall
position.
Programmers familiar with COBOL (or RPG) or similar languages
often use a Control-Break reporting design, which
sorts the data into order by the keys, then reads the lines of data
looking for breaks in the keys. This design uses very little memory,
but is rather slow and complex.
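For comparison, the Control-Break idea can be sketched in Python roughly as follows. The function name control_break_totals is hypothetical, and the input is assumed to be (key, value) pairs. Note that this sketch sorts in memory with sorted(); a traditional Control-Break program would rely on a separate sort step so that only one row needs to be held at a time.

```python
def control_break_totals(rows):
    """Sum the values for each key by watching for breaks in sorted keys.

    rows is assumed to be an iterable of (key, value) pairs.
    """
    totals = []
    current_key = None
    running = 0
    first = True
    for key, value in sorted(rows):
        if not first and key != current_key:
            # The key changed: this is the "break". Emit the finished total.
            totals.append((current_key, running))
            running = 0
        current_key = key
        running += value
        first = False
    if not first:
        # Emit the total for the final group of rows.
        totals.append((current_key, running))
    return totals

print(control_break_totals([("GE", 3856.0), ("AAPL", 1450.0), ("GE", 1000.0)]))
```

The extra bookkeeping for the first and last groups is the kind of complexity that the dictionary-based design avoids.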
It's far simpler to use a Python dictionary than it is to use
the Control-Break algorithm. Unless the number of distinct key
values is vast (on the order of hundreds of thousands of values)
most small computers will fit the entire summary in a simple
dictionary.
A program which produces summaries, then, would have the
following design pattern.
-
Create an empty dictionary.
-
Read the portfolio file. For each line in the file, do the
following.
-
Create a tuple from the key fields. If there's only
one key field, then this value can be the dictionary's
key.
-
If this key does not exist in the dictionary, insert
the necessary element, and provide a suitable initial value.
If you're computing one sum, a simple zero will do. If
you're computing multiple sums, a tuple of zeroes is
appropriate.
-
Locate the selected value in the dictionary and
accumulate new values into it. For the simplest case (one
key, one value being accumulated) this looks like
totals[key] += value.
-
Write the dictionary keys and values as the final
report.
Some people like to see the aggregates sorted into order. This
is a matter of getting the dictionary keys into a list, sorting the
list, then iterating through this sorted list to write the final
report.
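The design pattern above can be sketched as follows. This is a minimal sketch under some assumptions: the names BLOCK_LINES and aggregate_positions are hypothetical, the data is a list of strings instead of a file, and the second AAPL line is invented sample data added only to show two blocks being combined.

```python
# Hypothetical sample data; the second AAPL block is invented for illustration.
BLOCK_LINES = [
    'stock,shares,"Purchase Date","Purchase Price"',
    '"AAPL", 100, "10/1/95", 14.50',
    '"GE", 100, "3/5/02", 38.56',
    '"AAPL", 50, "6/1/98", 20.00',
]

def aggregate_positions(lines):
    """Return a dictionary mapping each stock to its total purchase value."""
    totals = {}  # Step 1: create an empty dictionary.
    rows = iter(lines)
    next(rows)  # skip the heading line
    for line in rows:  # Step 2: process each line.
        fields = [field.strip().strip('"') for field in line.split(",")]
        stock, shares, purchase_date, price = fields
        key = stock  # only one key field, so it is the key itself
        if key not in totals:
            totals[key] = 0.0  # a suitable initial value for one sum
        totals[key] += int(shares) * float(price)
    return totals

totals = aggregate_positions(BLOCK_LINES)
# Step 3: write the report, with the keys sorted into order.
for key in sorted(totals):
    print(f"{key:<6} {totals[key]:10.2f}")
```

Iterating over sorted(totals) gives the keys in order without building and sorting a separate list by hand.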
-
Portfolio Value. In the section called “Reading a File as a Sequence of Strings”, we looked at a
simple CSV-format file with stock symbols and prices. This file
has the stock symbol and last price, which serves as a daily quote
for this stock's price. We'll call this the stock-price
file.
We can now compute the aggregate value for our portfolio by
extracting prices from the stock price file and number of shares
from the portfolio file.
If you're familiar with SQL, this is called a join
operation, and most databases provide a number of
algorithms to match rows between two tables. If you're familiar with
COBOL, this is often done by creating a lookup
table, which is an in-memory array of values.
We'll create a dictionary from the stock-price file. We can
then read our portfolio, locate the price in our dictionary, and
write our final report of current value of the portfolio. This leads
to a program with the following design pattern.
-
Load the price mapping from the stock-price file.
-
Create an empty stock price dictionary.
-
Read the stock price file. For each line in the file,
populate the dictionary, using the stock name as the key
and the most recent sale price as the value.
-
Process the position information from the portfolio file.
See Aggregated Portfolio Position and Portfolio Position for the skeleton of this
process.
In the case of a stock with no price, the program should
produce a "no price quote" line in the output report. It should
not produce a KeyError exception.
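The lookup design above can be sketched as follows. This is a minimal sketch with several assumptions: the stock-price data is assumed to have just two columns (symbol and last price), which may not match the actual stock-price file; all of the sample values, the XYZ stock, and the names load_prices and portfolio_value_report are hypothetical.

```python
# Hypothetical stock-price data: symbol and last price only (an assumption).
PRICE_LINES = [
    '"AAPL", 25.00',
    '"GE", 40.00',
]

# Hypothetical portfolio data; XYZ is invented to show a missing quote.
HOLDING_LINES = [
    'stock,shares,"Purchase Date","Purchase Price"',
    '"AAPL", 100, "10/1/95", 14.50',
    '"GE", 100, "3/5/02", 38.56',
    '"XYZ", 10, "1/2/03", 5.00',
]

def clean_fields(line):
    """Split one line on commas; strip spaces and quotes from each field."""
    return [field.strip().strip('"') for field in line.split(",")]

def load_prices(lines):
    """Build the stock price dictionary: symbol is the key, price the value."""
    prices = {}
    for line in lines:
        symbol, price = clean_fields(line)
        prices[symbol] = float(price)
    return prices

def portfolio_value_report(portfolio_lines, prices):
    """Return one report line per block of stock in the portfolio."""
    report = []
    rows = iter(portfolio_lines)
    next(rows)  # skip the heading line
    for line in rows:
        stock, shares, purchase_date, purchase_price = clean_fields(line)
        price = prices.get(stock)  # returns None instead of raising KeyError
        if price is None:
            report.append(f"{stock}: no price quote")
        else:
            report.append(f"{stock}: {int(shares) * price:.2f}")
    return report

for report_line in portfolio_value_report(HOLDING_LINES, load_prices(PRICE_LINES)):
    print(report_line)
```

Using the dictionary's get method is one way to handle a missing price; testing with the in operator, or catching KeyError in a try statement, would work as well.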