Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com
Answertopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

Advanced List Sorting

Consider a list of tuples. We could get such a list when processing information that was extracted from a spreadsheet program. For example, if we had a spreadsheet with raw census data, we can easily transform it into a sequence of tuples that look like the following.

jobData= [ 
(001,'Albany','NY',162692),
(003,'Allegany','NY',11986),
...
(121,'Wyoming','NY',8722),
(123,'Yates','NY',5094)
]

Each tuple can be built from a row of the spreadsheet. In this case, we wrote a simple forumla in our spreadsheet to make each row into a tuple. We could have used the csv module to read a version of spreadsheet saved as a .csv file, but cutting and pasting is pretty quick, also.

Once we have each row as a tuple, we can put some []'s around the tuples to make a list. We can then slap an assignment statement around this list of rows and turn our spreadsheet into a Python statement.

Sorting this list can be done trivially with the list sort method.

jobData.sort()

Note that this updates the list in place. The sort method specifically does not return a result. A common mistake is to say something like: a= b.sort(). This always sets the variable a to None.

This kind of sort will simply compare each tuple with each other tuple. While easy to use in this form, it doesn't permit more sophisticated sorting. Let's say we wanted to sort by state name, the third element in the tuple. We have two strategies for sorting when we don't want the simplistic comparison of elements in order.

  1. We can provide a compare function to the sort method of a list. This compare function must have the same signature as the built-in cmp function, but does the comparison we want.

  2. We can provide a "key extraction" function to the sort method. This will locate the key value (or a tuple of key values) within the given objects.

  3. We can decorate each element in the list, making it into a new kind of 2-tuple with the fields on which we want to sort as the first element of this tuple and the original data as the second element of the tuple.

Sorting With a Compare Function. The sort method of a list can accept a comparison function. While this is a very general solution, it is also relatively low performance because of the overheads involved.

We must define a function that behaves like the built-in cmp function. In our example, we'll define a comparison which works with the third element of our jobData tuple.

def sort3( a, b ):
    return cmp( a[2], b[2] )
jobData.sort( sort3 )

Note that we pass the function object to the sort method. A common mistake is to say jobData.sort( sort3() ). If we include the ()'s, we call the function sort3 once, invoking the eval-apply process. We don't want to call the function once: we want to provide the function to sort, so that sort can call the function as many times as needed to sort the list.

Another common process is to sort information by several key fields. Continuing this example, lets sort the list by state name and then number of jobs. This is sometimes called a multiple-key sort. We want our data in order by state. When the states are equal, we want to use the area code to sort the data.

This can be done in two ways. The most common technique is to create a tuple of the various key fields. The other way is to make use of the or operator.

Since tuples are compared element-by-element, we can create a tuple of the various key fields and turn the built-in cmp function loose on this tuple. This makes good use of Python's built-in features.

The formal definition for or is given in the section called “Truth and Logic”. If the first argument is not true, the second argument must be evaluated. The cmp function returns 0 when elements are equal; we can use this to do a series of comparisons. Then the first fields are equal, we can then compare the next field.

def cmpStJob1( a, b ):
    aKey= ( a[2], a[3] )
    bKey= ( b[2], b[3] )
    return cmp( aKey, bKey )

def cmpStJob2( a, b ):
    return cmp( a[2], b[2] ) or cmp( a[3], b[3] )

Sorting With Key Extraction. The sort method of a list can accept a keyword parameter, key , that provides a key extraction function. This function returns a value which can be used for comparison purposes. To sort our jobData by the third field, we can use a function like the following.

def byState( a ):
    return a[2]

jobData.sort( key=byState )

This byState function returns the selected key value, which is then used by sort to order the tuples in the original list. If we want to sort by a multi-part key, we cna do something like the following.

def byStateJobs( a ):
    return ( a[2], a[3] )

This function will create a two-value tuple and use these two values for ordering the items in the list.

Sorting With List Decoration. Superficially, this method appears more complex. However it is remarkably flexible, allowing you to combine sort, map and filter operations into a single statement. The idea is to transform the initial list of values into a new list of 2-tuples, with the first item being the key and the second item being the original tuple. The first item, used for sorting, is a decoration placed in front of the original value.

In this example, we decorate our values with a 2-tuple of state names and number of jobs. We can sort this temporary list of 2-tuples. Then we can strip off the decoration and recover the original values.

deco= [ ((a[2],a[3]),a) for a in jobData ]
deco.sort()
sorted= [ v for k,v in deco ]

When constructing the keys we can do map- or filter-like operations. SImilarly, we can also to additional calculations when we strip off the decorations.


 
 
  Published under the terms of the Open Publication License Design by Interspire