In this chapter, we'll look at the semantics of generators, their
close relationship to iterable containers, and the for statement.
We'll look at some additional functions that we can use to
create and access data structures that support elegant iteration.
The easiest way to define an iterator (and the closely related
concept of generator) is to look at the for statement.
Let's look at the following snippet of code.
for i in ( 1, 2, 3, 4, 5 ):
    print i
Under the hood, the for statement engages in the following sequence
of interactions with an iterable object like the sequence in the
code snippet above.
1. The for statement requests an iterator from the object; in this
   case the object is a tuple. The for statement does this by
   evaluating the iter function on the given expression. The working
   definition of iterable is that the object responds to the iter
   function.
2. The for statement evaluates the iterator's next method and assigns
   the value to the target variable; in this case, i.
3. The for statement evaluates the suite of statements; in this case,
   the suite is just the print statement.
4. The for statement repeats steps 2 and 3 until an exception is
   raised. If the exception is a StopIteration, this is handled to
   indicate that the loop has finished normally.
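The four steps above can be sketched by hand with a while loop. This is
only an illustration of the protocol, not how the interpreter is
actually implemented. It uses the built-in next function (available in
Python 2.6 and later, and in Python 3), which evaluates the iterator's
next method for us. The names values, iterator, and seen are made up
for this sketch, and the suite records each value in a list rather than
printing it.

```python
# A hand-written approximation of:  for i in (1, 2, 3, 4, 5): <suite>
values = (1, 2, 3, 4, 5)
seen = []                       # hypothetical list standing in for the print suite

iterator = iter(values)         # step 1: request an iterator from the iterable
while True:
    try:
        i = next(iterator)      # step 2: get the next value, bind it to the target
    except StopIteration:       # step 4: StopIteration means a normal finish
        break
    seen.append(i)              # step 3: evaluate the suite
```

After this runs, seen holds the same values, in the same order, that
the for statement would have assigned to i.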
The other side of this relationship is the iterator, which must
define a next method; this method either returns the next item from a
sequence (or other container) or raises the StopIteration exception.
Also, an iterator must maintain some kind of internal state to know
which item in the sequence will be delivered next.
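To make these requirements concrete, here is a minimal sketch of a
hand-written iterator. The class name Countdown and its attribute
current are hypothetical, invented for this illustration. The method is
written as __next__, which is the Python 3 spelling, and aliased to
next so that the same class also works where the method is called next.

```python
class Countdown(object):
    """Hypothetical iterator that delivers start, start-1, ..., 1."""

    def __init__(self, start):
        self.current = start        # internal state: the next value to deliver

    def __iter__(self):
        return self                 # an iterator responds to iter() with itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration     # nothing left to deliver
        value = self.current
        self.current -= 1           # advance the internal state
        return value

    next = __next__                 # assumption: alias for the older method name


collected = []
for n in Countdown(3):
    collected.append(n)
```

Even for a sequence this simple, the class needs three methods and
explicit bookkeeping for its state.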
When we describe a container as iterable, we mean that it responds
to the iter function by returning an iterator object that can be used
by the for statement. All of the sequence containers return
iterators; sets and files also return iterators. In the case of a
dict, the iterator returns the dict's keys in no particular order.
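As a small sketch of this, we can apply the iter function to several
built-in containers and ask each resulting iterator for one item. The
names containers, first_items, and dict_keys are made up for this
example. Because the text makes no promise about the order of a dict's
keys or a set's members, the sketch sorts the keys before relying on
their order.

```python
containers = {
    "list": [10, 20],
    "tuple": (10, 20),
    "set": set([10, 20]),
    "dict": {10: "ten", 20: "twenty"},
}

first_items = {}
for name in containers:
    iterator = iter(containers[name])     # every container responds to iter()
    first_items[name] = next(iterator)    # ask the iterator for one item

# Iterating a dict delivers its keys, not its values.
dict_keys = sorted(iter(containers["dict"]))
```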
Defining An Iterator. Generally, we don't create iterators directly,
because this can be complex. Most often, we define a generator. A
generator is a function that can be used by the for statement as if it
were an iterator. A generator looks like a conventional function, with
one important difference: a generator includes the yield statement.
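A minimal sketch of such a generator follows. The name first_squares
and its parameter n are hypothetical. Apart from the yield statement,
the definition looks like any other function.

```python
def first_squares(n):
    """Hypothetical generator: yields 1*1, 2*2, ..., n*n."""
    for k in range(1, n + 1):
        yield k * k             # the yield statement makes this a generator


squares = []
for s in first_squares(4):      # used by the for statement like an iterator
    squares.append(s)
```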
The essential relationship between a generator and the for statement
is the same as between an iterator and the for statement.
1. The for statement calls the generator and requests the first
   value. The generator begins execution and executes statements up
   to the first yield statement.
2. The for statement assigns the value that was yielded by the yield
   statement to the target variable.
3. The for statement evaluates the suite of statements.
4. The for statement repeats steps 2 and 3, resuming the generator
   each time, until the generator executes a return statement or
   reaches the end of its suite.
In a generator, the return statement implicitly raises the
StopIteration exception. When a StopIteration is raised, it is handled
by the for statement.
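We can observe this behavior by driving a generator by hand with the
built-in next function instead of a for statement. In this sketch the
generator up_to_limit and its parameters are invented for
illustration; it stops with a return statement as soon as it sees a
value above the limit.

```python
def up_to_limit(values, limit):
    """Hypothetical generator: yields values until one exceeds limit."""
    for v in values:
        if v > limit:
            return              # ends the generator; the caller sees StopIteration
        yield v


gen = up_to_limit([1, 2, 99, 3], 10)
first = next(gen)               # runs up to the first yield
second = next(gen)              # resumes, runs up to the second yield
try:
    next(gen)                   # resumes, meets 99, and executes return
    stopped = False
except StopIteration:
    stopped = True
```

A for statement handles the same exception silently, so a loop over
up_to_limit simply ends after the second value.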
What we Provide. Generator definition is similar to function
definition (see Chapter 9, Functions); we provide three pieces of
information:
the name of the generator, a list of zero or more parameters, and a
suite of statements that yields the output values.
We use a generator in a for statement by following the function's
name with parentheses, (). The Python interpreter evaluates the
argument values in the parentheses, then applies the generator. When
the for statement requests the first value, the generator's suite
executes up to the first yield statement, which yields the first value
from the generator. When the for statement requests the next value,
Python will resume execution at the statement after the yield
statement; the generator will work until it yields another value to
the for statement.
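The following sketch records the order of events in a list so that the
back-and-forth is visible. The generator countdown and the list trace
are hypothetical names used only for this illustration.

```python
trace = []                      # hypothetical log of what happened, in order


def countdown(start):
    trace.append("generator started")
    while start > 0:
        trace.append("yielding %d" % start)
        yield start
        # Execution resumes here when the for statement asks for more.
        trace.append("resumed after %d" % start)
        start -= 1
    trace.append("generator finished")


for n in countdown(2):
    trace.append("loop body got %d" % n)
```

Reading trace afterward shows the entries from the generator and the
entries from the loop body strictly alternating, one yield at a time.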
This back-and-forth between the for statement and the generator means
that the generator's local variables are all preserved by the yield
statement. A generator has a peer relationship with the for statement;
its local variables are kept when it yields, and disposed of when it
returns. This is distinct from ordinary functions, which have a
context that is nested within the context that evaluated the function;
an ordinary function's local variables are disposed of when it
returns.
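A short sketch shows why preserved local variables matter. In the
hypothetical generator running_total below, the local variable total
survives each yield, so every new value builds on the previous one. An
ordinary function would lose total each time it returned.

```python
def running_total(values):
    """Hypothetical generator: yields the cumulative sum after each value."""
    total = 0                   # local variable, kept alive across yields
    for v in values:
        total += v
        yield total


totals = []
for t in running_total([3, 4, 5]):
    totals.append(t)
```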
Example: Using a Generator to Consolidate Information. Lexical scanning and parsing are both tasks that compilers do to
discover the higher-level constructs that are present in streams of
lower-level elements. A lexical scanner discovers punctuation, literal
values, variables, keywords, comments, and the like in a file of
characters. A parser discovers expressions and statements in a
sequence of lexical elements.
Lexical scanning and parsing algorithms consolidate a number of
characters into tokens or a number of tokens into a statement. A
characteristic of these algorithms is that some state change is
required to consolidate the inputs prior to creating each output. A
generator meets this requirement by preserving its state each time an
output is yielded.
In both lexical scanning and parsing, the generator function will
be looping through a sequence of input values, discovering a
high-level element, and then yielding that element. The yield
statement returns each result from the generator function in turn; it
also saves all the local variables, and even the location of the yield
statement, so that the next request to the generator will resume
processing right after the yield.
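As a rough sketch of this idea, the generator below consolidates a
stream of characters into tokens. It is a deliberately simplified
scanner, not a real lexical analyzer: the name tokens, the variable
current, and the rules (runs of letters and digits form one token,
whitespace separates tokens, any other character is a token by itself)
are all assumptions made for this illustration.

```python
def tokens(characters):
    """Hypothetical scanner: consolidates characters into simple tokens."""
    current = ""                    # state: the token being accumulated
    for ch in characters:
        if ch.isalnum():
            current += ch           # keep consolidating this token
        else:
            if current:
                yield current       # a complete word or number
                current = ""
            if not ch.isspace():
                yield ch            # punctuation is a token by itself
    if current:
        yield current               # the final token, if any


scanned = list(tokens("x = 42+y"))
```

The variable current carries the partially built token from one input
character to the next, and it is still intact each time the generator
resumes after a yield.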