Python - Several Examples

Reading a Text File

The following program examines a standard Unix password file. We'll use a for statement to read the file one line at a time, which shows the processing in detail. We'll use the split method of the input string as an example of parsing a line of input.

Example 19.1. readpswd.py

pswd = file( "/etc/passwd", "r" )
for aLine in pswd:
    fields= aLine.split( ":" )
    print fields[0], fields[1]
pswd.close()

	This program creates a `file` object, `pswd`, that represents the `/etc/passwd` file, opened for reading.
	A `file` is a sequence of lines. We can use a `file` in the for statement, and the `file` object will return each individual line in response to the `next` method.
	The input `string` is split into individual fields using `":"` boundaries. Two particular fields are printed. Field 0 is the username and field 1 is the password.
	Closing the file releases any resources used by the file processing.

For non-Unix users, a password file looks like the following:

root:q.mJzTnu8icF.:0:10:God:/:/bin/csh
fred:6k/7KCFRPNVXg:508:10:% Fredericks:/usr2/fred:/bin/csh
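The same line-by-line processing can be sketched in Python 3 syntax, which is an assumption on our part since the listing above uses the older `print` statement and `file` function. The `read_users` helper and the `SAMPLE` string are hypothetical names introduced for illustration; the sample text stands in for a real `/etc/passwd` so the sketch runs on any platform.

```python
import io

# Sample data copied from the two lines shown above.
# A real program would use open("/etc/passwd", "r") instead.
SAMPLE = (
    "root:q.mJzTnu8icF.:0:10:God:/:/bin/csh\n"
    "fred:6k/7KCFRPNVXg:508:10:% Fredericks:/usr2/fred:/bin/csh\n"
)

def read_users(source):
    """Return (username, password) pairs from a passwd-format file object."""
    users = []
    for line in source:                      # a file yields one line at a time
        fields = line.rstrip("\n").split(":")
        users.append((fields[0], fields[1]))
    return users

# The with statement closes the file object when the block ends.
with io.StringIO(SAMPLE) as pswd:
    users = read_users(pswd)

for name, password in users:
    print(name, password)
```

Using `with` in place of an explicit `close` is a design choice: the file is released even if the loop raises an exception.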

Reading a File as a Sequence of Strings

The next program shows us that a file is a sequence of individual lines. Because a file is an iterable object, the for statement will provide the individual lines.

This file will have a CSV (Comma-Separated Values) file format that we will parse. The csv module does a far better job than this little program. We'll look at that module in the section called “Comma-Separated Values: The csv Module”.

A popular stock quoting service on the Internet will provide CSV files with current stock quotes. The files have comma-separated values in the following format:

stock, lastPrice, date, time, change, openPrice, daysHi, daysLo, volume

The stock, date and time are typically quoted strings. The other fields are numbers, typically in dollars or percents with two digits of precision. We can use the Python eval function on a column to evaluate its value, which will eliminate the quotes, and transform a string of digits into a floating-point price value. Because eval executes whatever expression it is given, it should only be applied to data from a trusted source. We'll look at dates in Chapter 32, Dates and Times: the time and datetime Modules.

This is an example of the file:

"^DJI",10623.64,"6/15/2001","4:09PM",-66.49,10680.81,10716.30,10566.55,N/A
"AAPL",20.44,"6/15/2001","4:01PM",+0.56,20.10,20.75,19.35,8122800
"CAPBX",10.81,"6/15/2001","5:57PM",+0.01,N/A,N/A,N/A,N/A

The first line shows a quote for an index: the Dow-Jones Industrial average. The trading volume doesn't apply to an index, so it is "N/A". The second line shows a regular stock (Apple Computer) that traded 8,122,800 shares on June 15, 2001. The third line shows a mutual fund. The detailed opening price, day's high, day's low and volume are not reported for mutual funds.
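As a point of comparison, here is a minimal sketch that parses the three sample lines with the csv module mentioned earlier, written in Python 3 syntax (an assumption; the chapter's own listings are Python 2). The `to_number` helper and the choice to map "N/A" to `None` are ours, not part of the quoting service's format.

```python
import csv
import io

# The three sample lines shown above.
SAMPLE = (
    '"^DJI",10623.64,"6/15/2001","4:09PM",-66.49,10680.81,10716.30,10566.55,N/A\n'
    '"AAPL",20.44,"6/15/2001","4:01PM",+0.56,20.10,20.75,19.35,8122800\n'
    '"CAPBX",10.81,"6/15/2001","5:57PM",+0.01,N/A,N/A,N/A,N/A\n'
)

def to_number(text):
    """Convert a field to float; return None for the "N/A" marker."""
    return None if text == "N/A" else float(text)

rows = []
# csv.reader removes the surrounding quotes, so eval is not needed.
for stock, price, date, time, change, op_prc, d_hi, d_lo, vol in csv.reader(io.StringIO(SAMPLE)):
    rows.append((stock, to_number(price), date, time, to_number(change), to_number(vol)))

for row in rows:
    print(row)
```

This avoids eval entirely, which matters when the file comes from a source you do not control.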

After looking at the results online, we clicked on the link to save the results as a CSV file. We called it quotes.csv. The following program will open and read the quotes.csv file after we download it from this service.

Example 19.2. readquotes.py

qFile= file( "quotes.csv", "r" )
for q in qFile:
    try:
        stock, price, date, time, change, opPrc, dHi, dLo, vol\
        = q.strip().split( "," )
        print eval(stock), float(price), date, time, change, vol
    except ValueError:
        pass
qFile.close()

	We open our quotes file, `quotes.csv`, for reading, creating an object named `qFile`.
	We use a for statement to iterate through the sequence of lines in the file.
	The quotes file typically has an empty line at the end, which splits into a single empty field instead of the expected nine, so we surround the assignment with a try statement. Unpacking the empty line will raise a `ValueError` exception, which is caught in the except clause and ignored.
	Each stock quote, `q`, is a `string`. By using the `strip` operation of the `string`, we create a new string with excess whitespace characters removed. The `string` which is created then performs the `split`( `','` ) operation to separate the fields into a `list`. We use multiple assignment to assign each field to a relevant variable. Note that we split this line into nine fields, leading to a long statement. We put a `\` to break the statement into two lines.
	The name of the stock is a string which includes quotes. In order to gracefully remove the quotes, we use the `eval` function. The price is a string. We use the `float` function to convert this string to a proper numeric value for further processing.
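The blank-line behaviour described in these notes can be checked with a short sketch in Python 3 syntax (an assumption; the listing above is Python 2). The variable names `blank` and `skipped` are introduced here only for illustration.

```python
# An empty line still yields one (empty) field after strip and split.
blank = "\n".strip().split(",")
print(blank)

skipped = False
try:
    # Nine names on the left, one value on the right: unpacking fails.
    stock, price, date, time, change, op_prc, d_hi, d_lo, vol = blank
except ValueError as error:
    skipped = True
    print("skipped line:", error)
```

Because the mismatch surfaces as a `ValueError`, the same except clause also silently skips any other line with the wrong number of fields, which may or may not be what you want.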

Read, Sort and Write

For COBOL expatriates, here's an example that shows a short way to read a file into an in-memory sequence, sort that sequence and print the results. This is a very common COBOL design pattern, and it tends to be rather long and complex in COBOL.

This example looks forward to some slightly more advanced techniques like list sorting. We'll delve into sorting in Chapter 20, Advanced Sequences.

Example 19.3. sortquotes.py

data= []
qFile= file( "quotes.csv", "r" )
for q in qFile:
    fields= tuple( q.strip().split( "," ) )
    if len(fields) == 9: data.append( fields )
qFile.close()
def priceVolume(a,b):
    return cmp(a[1],b[1]) or cmp(a[8],b[8])
data.sort( priceVolume )
for stock, price, date, time, change, opPrc, dHi,  dLo, vol in data:
    print stock, price, date, time, change, vol

	We create an empty sequence, `data`, to which we will append `tuple`s created from splitting each line into fields.
	We create a file object that will read all the lines of our CSV-format file.
	This for loop will set `q` to each line in the file.
	The variable `fields` is created by stripping whitespace from the line, `q`, breaking it up on the `","` boundaries into separate fields, and making the resulting sequence of field values into a `tuple`. If the line has the expected nine fields, the `tuple` of fields is appended to the `data` sequence. Lines with the wrong number of fields are typically the blank lines at the beginning or end of the file.
	To prepare for the sort, we define a comparison function. This will compare fields 1 and 8, price and volume. This relies on the behavior of the or operator: if the comparison of field 1 is equal, the value of `cmp` will be 0, which is equivalent to `False`; so field 8 must be compared. Note that the fields are still strings at this point, so they are compared character by character, not numerically; convert them with `float` first if numeric order is required.
	We can then sort the `data` sequence. The `sort` method will use our `priceVolume` function to compare records. This kind of sort is covered in depth in the section called “Advanced List Sorting”.
	Once the sequence of data elements is sorted, we can then print a report showing our stocks ranked by price, and for stocks of the same price, ranked by volume. We could expand on this by using the `%` operator to provide a nicer-looking report format.
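The read, sort and print steps above can be sketched in Python 3 syntax, where `cmp` and comparison-function sorting are no longer available and a key function is used instead. This is a swapped-in technique, not the listing's own method. The `to_number` and `price_volume` helpers, the in-memory `SAMPLE_LINES`, and the decision to sort "N/A" before every real number are assumptions made for illustration.

```python
# Sample lines from the quotes file, plus a trailing blank line.
SAMPLE_LINES = [
    '"^DJI",10623.64,"6/15/2001","4:09PM",-66.49,10680.81,10716.30,10566.55,N/A',
    '"AAPL",20.44,"6/15/2001","4:01PM",+0.56,20.10,20.75,19.35,8122800',
    '"CAPBX",10.81,"6/15/2001","5:57PM",+0.01,N/A,N/A,N/A,N/A',
    '',
]

def to_number(text):
    """Convert to float; "N/A" becomes -infinity so it sorts first."""
    return float("-inf") if text == "N/A" else float(text)

def price_volume(fields):
    """Sort key: price, then volume, both compared numerically."""
    return (to_number(fields[1]), to_number(fields[8]))

data = []
for q in SAMPLE_LINES:
    fields = tuple(q.strip().split(","))
    if len(fields) == 9:          # skip blank or malformed lines
        data.append(fields)

data.sort(key=price_volume)

for stock, price, date, time, change, op_prc, d_hi, d_lo, vol in data:
    # The % operator lines the columns up for a simple report.
    print("%-8s %10.2f %10s" % (stock, float(price), vol))
```

Tuples compare element by element, so returning `(price, volume)` from the key function gives the same "price first, then volume" ordering that the `or` idiom provides with `cmp`.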

Reading "Records"

In languages like C or COBOL, a "record" or "struct" describes the contents of a file. The advantage of a record is that the fields have names instead of numeric positions. In Python, we can achieve the same level of clarity using a dict for each line in the file.

For this, we'll download files from a web-based portfolio manager. This portfolio manager gives us stock information in a file called display.csv. Here is an example.

+/-,Ticker,Price,Price Change,Current Value,Links,# Shares,P/E,Purchase Price,
-0.0400,CAT,54.15,-0.04,2707.50,CAT,50,19,43.50,
-0.4700,DD,45.76,-0.47,2288.00,DD,50,23,42.80,
0.3000,EK,46.74,0.30,2337.00,EK,50,11,42.10,
-0.8600,GM,59.35,-0.86,2967.50,GM,50,16,53.90,

This file contains a header line that names the data columns, making processing considerably more reliable. We can use the column titles to create a dict for each line of data. By using each data line along with the column titles, we can make our program quite a bit more flexible. This shows a way of handling this kind of well-structured information.
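As a small illustration of pairing column titles with one data line, here is a sketch in Python 3 syntax (an assumption; the chapter's listings are Python 2) that uses the header and the first data line from the sample file above.

```python
# Header line and first data line from display.csv.
titles = "+/-,Ticker,Price,Price Change,Current Value,Links,# Shares,P/E,Purchase Price,".split(",")
values = "-0.0400,CAT,54.15,-0.04,2707.50,CAT,50,19,43.50,".split(",")

# zip pairs each title with the value in the same position;
# dict turns those pairs into a name-to-value mapping.
data = dict(zip(titles, values))

print(data["Ticker"], data["# Shares"], data["Purchase Price"])
```

Note that the trailing comma on each line produces one extra, empty field; it ends up under the empty-string key and does no harm.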

Example 19.4. readportfolio.py

quotes=open( "display.csv", "rU" )
titles= quotes.next().strip().split( ',' )
invest= 0
current= 0
for q in quotes:
    values= q.strip().split( ',' )
    data= dict( zip(titles,values) )
    print data
    invest += float(data["Purchase Price"])*float(data["# Shares"])
    current += float(data["Price"])*float(data["# Shares"])
print invest, current, (current-invest)/invest

	We open our portfolio file, `display.csv`, for reading, creating a file object named `quotes`.
	The first line of input, `quotes.next()`, is the set of column titles. We strip any extraneous whitespace characters from this line, creating a new `string`. We perform a `split`( `','` ) to create a `list` of individual column title `string`s. This `list` is saved in the variable `titles`.
	We also initialize two counters, `invest` and `current` to zero. These will accumulate our initial investment and the current value of this portfolio.
	We use a for statement to iterate through the remaining lines in the `quotes` file. Each line is assigned to `q`.
	Each stock quote, `q`, is a `string`. We use the `strip` operation to remove excess whitespace characters; the `string` which is created then performs the `split`( `','` ) operation to separate the fields into a `list`. We assign this `list` to the variable `values`.
	We create a `dict`, `data`; the column titles in the `titles` `list` are the keys. The data fields from the current record, in `values`, are used to fill this `dict`. The built-in `zip` function is designed for precisely this situation. This function interleaves values from each `list` to create a new `list` of `tuple`s. In this case, we will get a sequence of `tuple`s; each `tuple` will be a value from `titles` and the corresponding value from `values`. This `list` of 2-`tuple`s creates the `dict`. Now, we have access to each piece of data using its proper column title. The number of shares is in the column titled `"# Shares"`. We can find this information in `data["# Shares"]`.
	We perform some simple calculations on each `dict`. In this case, we convert the purchase price to a number, convert the number of shares to a number and multiply to determine how much we spent on this stock. We accumulate the sum of these products into `invest`. We also convert the current price to a number and multiply this by the number of shares to get the current value of this stock. We accumulate the sum of these products into `current`.
	When the loop has terminated, we can write out the two numbers, and compute the change as a fraction of the initial investment; multiply by 100 to express it as a percent.
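The portfolio calculation above can also be sketched with `csv.DictReader`, which builds the title-to-value mapping for each line automatically. This is a Python 3 variant offered as an alternative, not the listing's own approach, and the in-memory `SAMPLE` text stands in for a downloaded `display.csv`.

```python
import csv
import io

# Contents of the sample display.csv shown earlier.
SAMPLE = (
    "+/-,Ticker,Price,Price Change,Current Value,Links,# Shares,P/E,Purchase Price,\n"
    "-0.0400,CAT,54.15,-0.04,2707.50,CAT,50,19,43.50,\n"
    "-0.4700,DD,45.76,-0.47,2288.00,DD,50,23,42.80,\n"
    "0.3000,EK,46.74,0.30,2337.00,EK,50,11,42.10,\n"
    "-0.8600,GM,59.35,-0.86,2967.50,GM,50,16,53.90,\n"
)

invest = 0.0    # total spent on purchases
current = 0.0   # total current value

# DictReader uses the first line as the column titles.
for row in csv.DictReader(io.StringIO(SAMPLE)):
    shares = float(row["# Shares"])
    invest += float(row["Purchase Price"]) * shares
    current += float(row["Price"]) * shares

change = (current - invest) / invest   # fractional change
print("%.2f %.2f %.1f%%" % (invest, current, 100 * change))
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by a file object opened on `display.csv`.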