Consider a list
of
tuple
s. We could get such a
list
when processing information that was
extracted from a spreadsheet program. For example, if we had a
spreadsheet with raw census data, we can easily transform it into a
sequence of tuple
s that look like the
following.
jobData= [
(001,'Albany','NY',162692),
(003,'Allegany','NY',11986),
...
(121,'Wyoming','NY',8722),
(123,'Yates','NY',5094)
]
Each tuple
can be built from a row of the
spreadsheet. In this case, we wrote a simple forumla in our spreadsheet
to make each row into a tuple
. We could have used
the csv
module to read a version of spreadsheet
saved as a .csv
file, but cutting and pasting is
pretty quick, also.
Once we have each row as a tuple
, we can
put some []
's around the tuple
s to
make a list
. We can then slap an assignment
statement around this list
of rows and turn our
spreadsheet into a Python statement.
Sorting this list
can be done trivially
with the list
sort
method.
jobData.sort()
Note that this updates the list
in place.
The sort
method specifically does not return a
result. A common mistake is to say something like: a=
b.sort()
. This always sets the variable a
to
None
.
This kind of sort will simply compare each
tuple
with each other
tuple
. While easy to use in this form, it doesn't
permit more sophisticated sorting. Let's say we wanted to sort by state
name, the third element in the tuple
. We have two
strategies for sorting when we don't want the simplistic comparison of
elements in order.
-
We can provide a compare function to the
sort
method of a list. This compare
function must have the same signature as the built-in
cmp
function, but does the comparison we
want.
-
We can provide a "key extraction" function to the
sort
method. This will locate the key value
(or a tuple
of key values) within the given
objects.
-
We can decorate each element in the list, making it into a new
kind of 2-tuple with the fields on which we want to sort as the
first element of this tuple
and the original
data as the second element of the
tuple
.
Sorting With a Compare Function. The sort
method of a list can accept a
comparison function. While this is a very general solution, it is also
relatively low performance because of the overheads involved.
We must define a function that behaves like the built-in
cmp
function. In our example, we'll define a
comparison which works with the third element of our
jobData
tuple
.
def sort3( a, b ):
return cmp( a[2], b[2] )
jobData.sort( sort3 )
Note that we pass the function object to the
sort
method. A common mistake is to say
jobData.sort( sort3() )
. If we include the ()'s, we call
the function sort3
once, invoking the eval-apply
process. We don't want to call the function once: we want to provide the
function to sort
, so that
sort
can call the function as many times as
needed to sort the list
.
Another common process is to sort information by several key
fields. Continuing this example, lets sort the
list
by state name and then number of jobs. This
is sometimes called a multiple-key sort. We want our data in order by
state. When the states are equal, we want to use the area code to sort
the data.
This can be done in two ways. The most common technique is to
create a tuple of the various key fields. The other way is to make use
of the
or
operator.
Since tuples are compared element-by-element, we can create a
tuple
of the various key fields and turn the
built-in cmp
function loose on this
tuple
. This makes good use of Python's built-in
features.
The formal definition for
or
is given in the section called “Truth and Logic”. If the first argument is not true, the
second argument must be evaluated. The cmp
function
returns 0 when elements are equal; we can use this to do a series of
comparisons. Then the first fields are equal, we can then compare the
next field.
def cmpStJob1( a, b ):
aKey= ( a[2], a[3] )
bKey= ( b[2], b[3] )
return cmp( aKey, bKey )
def cmpStJob2( a, b ):
return cmp( a[2], b[2] ) or cmp( a[3], b[3] )
Sorting With Key Extraction. The sort
method of a list can accept a
keyword parameter,
key
, that provides a key
extraction function. This function returns a value which can be used
for comparison purposes. To sort our jobData by the third field, we
can use a function like the following.
def byState( a ):
return a[2]
jobData.sort( key=byState )
This byState function returns the selected key value, which is
then used by sort to order the tuples in the original list. If we want
to sort by a multi-part key, we cna do something like the
following.
def byStateJobs( a ):
return ( a[2], a[3] )
This function will create a two-value tuple
and use these two values for ordering the items in the
list
.
Sorting With List Decoration. Superficially, this method appears more complex. However it is
remarkably flexible, allowing you to combine
sort
, map
and
filter
operations into a single statement. The
idea is to transform the initial list
of values
into a new list
of 2-tuples, with the first
item being the key and the second item being the original
tuple
. The first item, used for sorting, is a
decoration placed in front of the original value.
In this example, we decorate our values with a 2-tuple of state
names and number of jobs. We can sort this temporary list of 2-tuples.
Then we can strip off the decoration and recover the original
values.
deco= [ ((a[2],a[3]),a) for a in jobData ]
deco.sort()
sorted= [ v for k,v in deco ]
When constructing the keys we can do map
- or
filter
-like operations. SImilarly, we can also to
additional calculations when we strip off the decorations.