We'll look at subclassing the built-in
list class, by creating a class,
Samples, which extends
list. You'll need to implement the following
methods in your new class.
__init__(*args)
The __init__ method saves a sequence
of samples. It could, at this time, also precompute a number of
useful values, like the sum, count, min and max of this set of
data. When no data is provided, these values would be set to
None.
__str__
The __str__ method should return
string with a summary of the data. An example is a string like
"%d values, min %g, max %g, mean %g" with the
number of data elements, the minimum, the maximum and the mean.
The superclass, list,
__repr__ function will return the raw
data.
mean
The mean method returns the sum
divided by the count.
mode
The mode returns the most popular of
the sample values. Below we've provided an algorithm that can be
used to locate the mode of a sequence of samples.
median
The median is the value in the middle of the sequence.
First, sort the sequence. If there is an odd number of elements,
pick the middle-most element. If there is an even number of
elements, average the two elements that are mid-most.
variance
The variance is a more complex
calculation than the simple mean. For each sample, compute the
difference between the sample and the mean, square this value,
and sum these squares. The number of samples minus 1 is the
degrees of freedom. The sum, divided by the degrees of freedom
is the variance. Note that you need two samples to meaningfully
compute a variance.
stdev
The stdev is the square root of the
variance.
Note that the list superclass already
works correctly with the built-in min and
max functions. In this case, this consequence of
using inheritance instead of wrapping turns out to be an
advantage.
Procedure 22.1. Computing the Mode
Initialize. Create an empty dictionary, freq, for
frequency distribution.
For Each Value. For each value, v, in the sequence of
sample values.
Unknown? If v is not a key in
freq, then add v to
the dictionary with a value of 0.
Increment Frequency. Increment the frequency count associated with
v in the dictionary.
Extract. Extract the dictionary items as a sequence of tuples (
value, frequency
).
Sort. Sort the sequence of tuples in ascending order by
frequency.
Result. The last value in the sorted sequence is a tuple with the
most common sample value and the frequency count.
Shuffling Method for the Deck
class
Shuffling is a matter of taking existing cards and putting them
into other positions. There are a many ways of doing this. We'll need
to try both to see which is faster. In essence, we need to create a
polymorphic family of classes that we can use to measure
performance.
Procedure 22.2. Shuffling Variation 1 - Exchange
For i in range 0 to the number of
cards
Generate a random number r in the
range 0 to the number of cards.
Use Multiple Assignement to swap cards at position
i and r.
Procedure 22.3. Shuffling Variation 2 - Build
Create an empty result sequence,
s.
While there are cards in the source
self.cards sequence.
Generate a random number r in the
range 0 to the number of cards.
Append card r to the result
sequence; delete object r from the source
self.cards sequence. The
pop method of a sequence can return a
selected element and delete it from a sequence
nicely.
Replace self.cards with the result
sequence, s.
Procedure 22.4. Shuffling Variation 3 - Sort
Create a function which returns a random value selected from
the sequence ( -1, 0, 1 ). This is similar to the
built-in cmp function.
Use the sort method of a
list with this random comparison-like
function.
The random module has a
shuffle method which can be used as
follows.
random.shuffle( self.cards )
Of these four algorithms, which is fastest? The best way to test
these is to create four separate subclasses of
Deck, each of which provides a different
implementation of the shuffle method. A main
program can then create an instance of each variation on
Deck and do several hundred shuffles.
We can create a timer using the time
module. The time.clock function will provide an
accurate time stamp. The difference between two calls to
time.clock is the elapsed time.
Because all of our variations on Deck are
polymorphic, our main program should look something like the
following.
d1= DeckExch()
d2= DeckBuild()
d3= DeckSortRandom()
d4= DeckShuffle()
for deck in ( d1, d2, d3, d4 ):
start= time.clock()
for i in range(100):
d.shuffle()
finish= time.clock()
Encapsulation
The previous exercise built several alternate solutions to a
problem. We are free to implement an algorithm with no change to the
interface of Deck. This is a important effect
of the principal of encapsulation: a class and the clients that use
that class are only coupled together by an interface defined by method
functions.
There are a variety of possible dependencies between a class and
its clients.
Interface Method Functions. A client can depend on method functions specifically
designated as an interface to a class. In Python, we can define
internal methods by prefixing their names with _.
Other names (without the leading _) define the
public interface.
All Method Functions. A client can depend on all method functions of a class.
This removes the complexity of hidden, internal methods.
Instance Variables. A client can depend on instance variables in addition to
method functions. This can remove the need to write method
functions that simply return the value of an instance
variable.
Global Variables. Both classes share global variables. The Python
global statement is one way to accomplish
this.
Implementation. A client can depend on the specific algorithm being
executed by a class. A client method can have expectations of
how a class is implemented.
What are the advantages and disadvantages of each kind of
dependency?
Class Responsibilities
Assigning responsibility to class can be challenging. A number
of reasons can be used to justify the functions and instance variables
that are combined in a single class.
Convenience. A class is defined to do things because — well — it's
convenient to write the program that way.
Similar Operations. A class is defined because it does all input, all output,
or all calculations.
Similar Time. A class is defined to handle all initialization, all
processing or all final cleanup.
Sequence. We identify some operations which are performed in a
simple sequence and bundle these into a single class.
Common Data. A class is defined because it has the operations which
isolate a data structure or algorithm.
What are the possible differences between theses? What are the
advantages and disadvantages of each?
Published under the terms of the Open Publication License