Python - File Exercises

File Exercises

File Structures. What is required to process variable-length lines of data in an arbitrary (random) order? How can the application program know where each line begins?
Device Structures. Some disk devices are organized into cylinders and tracks instead of blocks. A disk may have a number of parallel platters; a cylinder is the stack of tracks across the platters available without moving the read-write head. A track is the data on one circular section of a single disk platter. What advantages does this have? What (if any) complexity could this lead to? How does an application program specify the tracks and sectors to be used?

Some disk devices are described as a simple sequence of blocks, in no particular order. Each block has a unique numeric identifier. What advantages could this have?

Some disk devices can be partitioned into a number of "logical" devices. Each partition appears to be a separate device. What (if any) relevance does this have to file processing?
Portfolio Position. We can create a simple CSV file that contains a description of a block of stock. We'll call this the portfolio file. If we have access to a spreadsheet, we can create a simple file with four columns: stock, shares, purchase date and purchase price. We can save this as a CSV file.

If we don't have access to a spreadsheet, we can create this file in IDLE. Here's an example with a heading line and two lines of data.
```
stock,shares,"Purchase Date","Purchase Price"
"AAPL", 100, "10/1/95", 14.50
"GE", 100, "3/5/02", 38.56
```
We can read this file, multiply shares by purchase price, and write a simple report showing our initial position in each stock.

Note that each line will be a simple string. When we split this string on the commas (using the string split method) we get a list of strings. In the example above, those strings still carry the surrounding spaces and quotation marks, which we'll need to strip. We'll also need to convert the number of shares and the purchase price from strings to numbers in order to do the multiplication.
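One way to sketch this exercise in Python 3 is shown here. It is only an illustration under some assumptions: the sample data is held in a string and read through io.StringIO so the example runs without a file on disk, and the helper names clean and positions are made up for this sketch. A real program would pass an open file instead.

```python
import io

# Sample data copied from the example above. A real program would
# use something like open("portfolio.csv") instead (hypothetical name).
SAMPLE = '''stock,shares,"Purchase Date","Purchase Price"
"AAPL", 100, "10/1/95", 14.50
"GE", 100, "3/5/02", 38.56
'''

def clean(field):
    """Strip surrounding whitespace and double quotes from one field."""
    return field.strip().strip('"')

def positions(source):
    """Yield (stock, shares, price, value) for each line of data."""
    lines = iter(source)
    next(lines, None)  # skip the heading line
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        stock, shares, date, price = [clean(f) for f in line.split(",")]
        shares = int(shares)    # string to number
        price = float(price)    # string to number
        yield stock, shares, price, shares * price

report = list(positions(io.StringIO(SAMPLE)))
for stock, shares, price, value in report:
    print(f"{stock:6s} {shares:6d} {price:8.2f} {value:10.2f}")
```

Splitting on commas is enough for this data because no field contains a comma of its own. If a field could contain one, the standard csv module would be a safer choice.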
Aggregated Portfolio Position. In Portfolio Position we read a file and did a simple computation on each row to get the purchase value (shares times purchase price). If we have multiple blocks of a given stock, these will be reported as separate lines of detail. We'd like to combine (or aggregate) any blocks of stock into an overall position.

Programmers familiar with COBOL, RPG or similar languages often use a Control-Break reporting design, which sorts the data into order by the keys, then reads the lines of data looking for a break in the keys. This design uses very little memory, but is rather slow and complex.

It's far simpler to use a Python dictionary than it is to use the Control-Break algorithm. Unless the number of distinct key values is vast (on the order of hundreds of thousands of values) most small computers will fit the entire summary in a simple dictionary.

A program which produces summaries, then, would have the following design pattern.
1. Create an empty dictionary.
2. Read the portfolio file. For each line in the file, do the following.
  1. Create a tuple from the key fields. If there's only one key field, then this value can be the dictionary's key.
  2. If this key does not exist in the dictionary, insert the necessary element, and provide a suitable initial value. If you're computing one sum, a simple zero will do. If you're computing multiple sums, a tuple of zeroes is appropriate.
  3. Locate the selected value in the dictionary and accumulate the new values into it. For the simplest case (one key, one value being accumulated) this looks like sum[key] += value.
3. Write the dictionary keys and values as the final report.
Some people like to see the aggregates sorted into order. This is a matter of getting the dictionary keys into a list, sorting the list, then iterating through this sorted list to write the final report.
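The design pattern above can be sketched as follows. This is a minimal illustration, not a complete solution: the rows are assumed to be already parsed into (stock, shares, price) tuples, the second AAPL block is made-up data added only to show aggregation, and the dictionary is named totals rather than sum to avoid hiding Python's built-in sum function.

```python
# Already-parsed rows of (stock, shares, purchase price).
# The third row is hypothetical, added to show two blocks of one stock.
rows = [
    ("AAPL", 100, 14.50),
    ("GE", 100, 38.56),
    ("AAPL", 50, 20.00),
]

# Step 1: create an empty dictionary.
totals = {}

# Step 2: process each row.
for stock, shares, price in rows:
    key = stock                    # only one key field, so no tuple is needed
    if key not in totals:
        totals[key] = 0.0          # suitable initial value for one sum
    totals[key] += shares * price  # accumulate into the selected value

# Step 3: write the report, with the keys sorted into order.
for key in sorted(totals):
    print(f"{key:6s} {totals[key]:10.2f}")
```

Iterating over sorted(totals) gives the keys in order without changing the dictionary itself.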
Portfolio Value. In the section called “Reading a File as a Sequence of Strings”, we looked at a simple CSV-format file with stock symbols and prices. This file has the stock symbol and last price, which serves as a daily quote for this stock's price. We'll call this the stock-price file.

We can now compute the aggregate value for our portfolio by extracting prices from the stock price file and number of shares from the portfolio file.

If you're familiar with SQL, this is called a join operation, and most databases provide a number of algorithms to match rows between two tables. If you're familiar with COBOL, this is often done by creating a lookup table, which is an in-memory array of values.

We'll create a dictionary from the stock-price file. We can then read our portfolio, locate the price in our dictionary, and write our final report of current value of the portfolio. This leads to a program with the following design pattern.
1. Load the price mapping from the stock-price file.
  1. Create an empty stock price dictionary.
  2. Read the stock-price file. For each line in the file, populate the dictionary, using the stock name as the key and the most recent sale price as the value.
2. Process the position information from the portfolio file. See Aggregated Portfolio Position and Portfolio Position for the skeleton of this process.
  
  In the case of a stock with no price, the program should produce a "no price quote" line in the output report. It should not produce a KeyError exception.
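A minimal sketch of this lookup pattern follows. Several things here are assumptions for illustration only: the stock-price file is taken to have two columns (symbol, last price) with no heading line, the price shown is made up, and the portfolio is reduced to already-parsed (stock, shares) pairs. The dictionary's get method returns None for a missing key, which avoids the KeyError exception.

```python
import io

# Hypothetical stock-price data: symbol and last price, no heading line.
# There is deliberately no quote for GE.
PRICE_DATA = '''"AAPL", 25.00
'''

# Already-parsed portfolio rows of (stock, shares).
portfolio = [("AAPL", 100), ("GE", 100)]

# Step 1: load the price mapping from the stock-price data.
prices = {}
for line in io.StringIO(PRICE_DATA):
    if not line.strip():
        continue  # ignore blank lines
    symbol, last = [f.strip().strip('"') for f in line.split(",")]
    prices[symbol] = float(last)

# Step 2: process the position information.
report = []
for stock, shares in portfolio:
    price = prices.get(stock)  # None when there is no quote
    if price is None:
        report.append(f"{stock:6s} no price quote")
    else:
        report.append(f"{stock:6s} {shares * price:10.2f}")

print("\n".join(report))
```

Testing with the in operator before indexing, or catching KeyError with a try statement, would work equally well; get simply keeps the lookup and the test in one step.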

