The map, filter, and reduce built-in functions are handy for processing sequences. These owe much to the world of functional programming languages. The idea is to take a small function you write and apply it to all the elements of a sequence. This saves you writing an explicit loop. The implicit loop within each of these functions may be faster than an explicit for or while loop.
Additionally, each of these is a pure function, returning a result value. This allows the results of the functions to be combined into complex expressions relatively easily.
It is common to have multiple-step processes on lists of values. For instance, we might filter a large set of data to locate useful samples, transform those samples, and then compute a sum or average. Rather than write three explicit loops, we can compute the result in a single expression.
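As a rough sketch of that three-step process, the expression below filters, transforms and totals in one line. The sample data and the helper names is_useful and to_celsius are invented for illustration. The block is written so that it also runs under Python 3, where reduce must be imported from functools and filter returns an iterator rather than a list.

```python
from functools import reduce  # a built-in under Python 2; this import is needed under Python 3

# Hypothetical readings in degrees Fahrenheit; None marks a missing sample.
samples = [68.0, None, 72.5, 71.0, None, 69.5]

def is_useful(x):
    """Keep only samples that were actually recorded."""
    return x is not None

def to_celsius(f):
    """Transform one Fahrenheit reading to Celsius."""
    return (f - 32.0) * 5.0 / 9.0

def add(a, b):
    return a + b

# Filter, transform and total in a single expression.
total = reduce(add, map(to_celsius, filter(is_useful, samples)), 0.0)

# The count is a second pass over the data; list() makes len() work under Python 3.
count = len(list(filter(is_useful, samples)))
average = total / count
print(round(average, 2))
```

Seeding reduce with 0.0 keeps the expression safe if every sample is filtered out, although the division would then still need a guard against a zero count.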
Let's say we have two sequences and we have a multi-step process like this:
Instead of these lengthy explicit loops, we can take a functional approach that looks like this:
    f = reduce( f3, zip( map(f1,seq1), map(f2,seq2) ) )
Where f1, f2, and f3 are functions that define the body of each of the above loops.
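The listing of the explicit loops is not reproduced in this text. The sketch below is one guess at what such a multi-step process might look like, set next to the functional form. The functions f1, f2 and f3 and the two sequences are placeholders invented for this illustration. Note that f3 receives the running result and one tuple produced by zip. The initial value of 0 is an added assumption, so that the reduction starts from a number rather than from the first tuple.

```python
from functools import reduce  # needed under Python 3; also available in Python 2.6+

# Placeholder inputs and step functions, invented for this sketch.
seq1 = [1, 2, 3, 4]
seq2 = [10, 20, 30, 40]

def f1(x):
    return x * x          # body of the first loop

def f2(y):
    return y + 1          # body of the second loop

def f3(total, pair):
    a, b = pair           # one tuple produced by zip
    return total + a * b  # body of the third loop

# Explicit version: three separate loops.
t1 = []
for x in seq1:
    t1.append(f1(x))
t2 = []
for y in seq2:
    t2.append(f2(y))
explicit = 0
for pair in zip(t1, t2):
    explicit = f3(explicit, pair)

# Functional version: one expression, seeded with 0.
functional = reduce(f3, zip(map(f1, seq1), map(f2, seq2)), 0)
print(explicit == functional)
```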
Definitions. These functions transform lists. The map and filter functions each apply some function to a sequence to create a new sequence. The reduce function applies a function which will reduce the sequence to a single value. The zip function interleaves values from lists to create a list of tuples.
The map, zip and filter functions have no internal state; they simply apply the function to each individual value of the sequence. The reduce function, in contrast, maintains an internal state which is seeded from an initial value, passed to the function along with each value of the sequence, and returned as the final result.
Here are the formal definitions.
map(function, sequence, [sequence...]) → list
Create a new list from the results of applying the given function to the items of the given sequence. If more than one sequence is given, the function is called with multiple arguments, consisting of the corresponding item of each sequence. If any sequence is too short, None is used for the missing values. If the function is None, map will create tuples from corresponding items in each list, much like the zip function.
filter(function, sequence) → list
Return a list containing those items of sequence for which function(item) is true. If function is None, return a list of items that are equivalent to True.
reduce(function, sequence, [initial]) → value
Apply a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). If initial is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty.
zip(seq1, [seq2,...]) → [(seq1[0], seq2[0],...), ...]
Return a list of tuples, where each tuple contains the matching element from each of the argument sequences. The returned list is truncated in length to the length of the shortest argument sequence.
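A few quick checks of these definitions, offered as a sketch. The calls are wrapped in list() so that they also run under Python 3, where map, filter and zip return iterators rather than lists and reduce lives in functools. The multiple-sequence map example uses sequences of equal length, because the None padding described above is Python 2 behaviour.

```python
from functools import reduce
import operator

# map with two sequences: the function receives one item from each.
powers = list(map(pow, [2, 3, 4], [3, 2, 1]))

# filter with None keeps only the items that are true in a Boolean context.
truthy = list(filter(None, [0, 1, "", "a", None, [], [0]]))

# reduce with an initial value: it is placed first, and is the result for an empty sequence.
total = reduce(operator.add, [1, 2, 3, 4, 5], 100)
empty = reduce(operator.add, [], 100)

# zip truncates to the shortest argument.
pairs = list(zip("abc", [1, 2]))

print(powers, truthy, total, empty, pairs)
```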
Costs and Benefits. What are the advantages? First, the functional version can be clearer. It's a single line of code that summarizes the processing. Second, and more important, Python can often execute the sequence processing functions faster than the equivalent explicit loop.
You can see that map and filter are equivalent to simple list comprehensions. This gives you two ways to specify these operations, both of which have approximately equivalent performance. This means that map and filter aren't essential to Python, but they are widely used.
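To make the equivalence concrete, this sketch writes the same two operations both ways and checks that the results agree. The helper names double and is_even are invented here; the list() calls are only needed under Python 3.

```python
def double(x):
    return 2 * x

def is_even(x):
    return x % 2 == 0

data = range(10)

# map versus the equivalent list comprehension.
mapped = list(map(double, data))
mapped_lc = [double(v) for v in data]

# filter versus the equivalent list comprehension.
filtered = list(filter(is_even, data))
filtered_lc = [v for v in data if is_even(v)]

# A comprehension can also combine both steps in one pass.
both = [double(v) for v in data if is_even(v)]

print(mapped == mapped_lc, filtered == filtered_lc, both)
```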
The reduce function is a bit of a problem. It can have remarkably bad performance if it is misused. Consequently, there is some debate about the value of having this function. We'll present the function along with the caveat that it can lead to remarkable slowness.
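One commonly cited way to misuse reduce, offered here as an illustration rather than as the specific problem the text has in mind, is to build a large list by repeated concatenation. Each step copies the whole accumulated list, so the work grows roughly with the square of the total length. The sketch compares that with itertools.chain.from_iterable, which visits each item once. The data is made up.

```python
from functools import reduce
from itertools import chain

# 200 small lists, invented for this sketch.
chunks = [[i, i + 1, i + 2] for i in range(0, 600, 3)]

# Each a + b builds a brand-new list, copying everything accumulated so far.
slow = reduce(lambda a, b: a + b, chunks, [])

# chain.from_iterable walks the chunks once without intermediate copies.
fast = list(chain.from_iterable(chunks))

print(slow == fast, len(fast))
```

Both forms give the same answer; the difference only shows up in running time as the number of chunks grows.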
The map Function. The map function transforms a sequence into another sequence by applying a function to each item in the sequence. The idea is to apply a mapping transformation to a sequence. This is a common design pattern within numerous kinds of programs. Generally, the transformation is the interesting part of the programming, and the loop is just another boring old loop.
The function call map(aFunction, aSequence) behaves as if it had the following definition.
    def map( aFunction, aSequence ):
        return [ aFunction(v) for v in aSequence ]
For example:
>>> map( int, [ "10", "12", "14", 3.1415926, 5L ] )
[10, 12, 14, 3, 5]
This applies the int function to each element of the input sequence (a list that contains some strings, a floating point value and a long integer value) to create the output sequence (a list of integers).
The function used in map can be a built-in function, or a user-defined function created with the def statement (see Chapter 9, Functions).
>>> def oddn(x):
...     return x*2+1
...
>>> map( oddn, range(6) )
[1, 3, 5, 7, 9, 11]
This example defines a function oddn, which creates an odd number from the input. The map function applies our oddn function to each value of the sequence created by range(6). We get the first 6 odd numbers.
The filter Function. The filter function chooses the elements of the input sequence for which a supplied function returns True. Elements for which the supplied function returns False are discarded.
The function call filter(aFunction, aSequence) behaves as if it had the following definition.
    def filter( aFunction, aSequence ):
        return [ v for v in aSequence if aFunction(v) ]
For example:
>>> def gt2( a ):
...     return a > 2
...
>>> filter( gt2, range(8) )
[3, 4, 5, 6, 7]
This example uses a function, gt2, which returns True for inputs greater than 2. We create a range from 0 to 7, but only keep the values for which the filter function returns True.
Here's another example that keeps all numbers that are evenly divisible by 3.
>>> def div3( a ):
...     return a % 3 == 0
...
>>> filter( div3, range(10) )
[0, 3, 6, 9]
Our function, div3, returns True when the remainder of dividing a number by 3 is exactly 0. This will keep numbers that are evenly divisible by 3. We create a range, apply the function to each value in the range, and keep only the values where the filter is True.
The reduce Function. The reduce function can be used to implement the common spreadsheet functions that compute sums and products. This function works by seeding a result with an initial value. It then calls the user-supplied function with this result and each value of the sequence. This is remarkably common, but it can't be done as a list comprehension.
The function call reduce(aFunction, aSequence [, init]) behaves as if it had the following definition.
    def reduce( aFunction, aSequence, init= 0 ):
        r= init
        for s in aSequence:
            r= aFunction( r, s )
        return r
The important thing to note is that the function is applied to the internal value, r, and each element of the list to compute a new internal value.
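The definition above is a simplification: with a default seed of 0, a product would always come out as 0. The built-in reduce instead starts from the first item when no initial value is supplied, and raises TypeError for an empty sequence with no initial value. A sketch closer to that behaviour follows. The name my_reduce and the _MISSING sentinel are invented for this illustration.

```python
_MISSING = object()  # sentinel meaning "no initial value supplied"

def my_reduce(aFunction, aSequence, init=_MISSING):
    it = iter(aSequence)
    if init is _MISSING:
        try:
            r = next(it)  # seed the result with the first item
        except StopIteration:
            raise TypeError("my_reduce() of empty sequence with no initial value")
    else:
        r = init
    # Fold the remaining items into the running result.
    for s in it:
        r = aFunction(r, s)
    return r

def mul(a, b):
    return a * b

print(my_reduce(mul, [1, 2, 3, 4]))      # product of the items
print(my_reduce(mul, [1, 2, 3, 4], 10))  # same product, scaled by the seed
```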
For example:
>>> def add(a,b):
...     return a+b
...
>>> reduce( add, range(10) )
45
This expression computes the sum of the 10 numbers from zero through nine. The function we defined, add, adds the previous result and the next sequence value together.
Here's an interesting example that combines reduce and map. This uses two functions defined in earlier examples, add and oddn.
    for i in range(10):
        sq=reduce( add, map(oddn, range(i)), 0 )
        print i, sq
Let's look at the evaluation from innermost to outermost. The range(i) call generates a list of numbers from 0 to i-1. The map applies the oddn function to create a sequence of i odd numbers from 1 to 2i-1. The reduce then adds this sequence of odd numbers. Interestingly, each of these sums is equal to the square of i.
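The claim is easy to check mechanically. This sketch repeats the loop in a form that also runs under Python 3 and asserts that each sum equals i squared. The add and oddn functions are redefined so that the block is self-contained.

```python
from functools import reduce  # needed under Python 3

def add(a, b):
    return a + b

def oddn(x):
    return x * 2 + 1

for i in range(10):
    sq = reduce(add, map(oddn, range(i)), 0)
    assert sq == i * i  # the sum of the first i odd numbers
    print(i, sq)
```

The initial value of 0 matters for i equal to 0: range(0) is empty, and without a seed reduce would raise an error.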
The zip Function. The zip function interleaves values from two sequences to create a new sequence. The new sequence is a sequence of tuples. Each item of a tuple is the corresponding value from each sequence.
>>> zip( range(5), range(1,20,2) )
[(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]
In this example, we zipped two sequences together. The first sequence was range(5), which has five values. The second sequence was range(1,20,2), which has 10 odd numbers from 1 to 19. Since zip truncates to the shorter list, we get five tuples, each of which has the matching values from both lists.
The map function behaves a little like zip when there is no function provided, just sequences. However, map does not truncate; it fills the shorter list with None values.
>>> map( None, range(5), range(1,20,2) )
[(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (None, 11), (None, 13), (None, 15), (None, 17), (None, 19)]
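The padding behaviour of map(None, ...) is specific to Python 2. Under Python 3, map stops at the shortest sequence and does not accept None as the function. The closest equivalent there is itertools.zip_longest (named izip_longest in Python 2's itertools). A sketch, assuming Python 3:

```python
from itertools import zip_longest

# Pads the shorter sequence with None, much like Python 2's map(None, ...).
padded = list(zip_longest(range(5), range(1, 20, 2)))

# Show the first two and last two tuples.
print(padded[:2], padded[-2:])
```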
Published under the terms of the Open Publication License