The fileinput module interacts with sys.argv. The fileinput.input function opens files based on all the values of sys.argv[1:]. It carefully skips sys.argv[0], which is the name of the Python script file. For each file, it reads all of the lines as text, allowing a program to read and process multiple files, like many standard Unix utilities.
The typical use case is:
    import fileinput
    for line in fileinput.input():
        process(line)  # process stands in for your own function
This iterates over the lines of all files listed in sys.argv[1:], with a default of sys.stdin if the list is empty. If a filename is -, it is also replaced by sys.stdin at that position in the list of files. To specify an alternative list of filenames, pass it as the argument to input. A single file name is also allowed in addition to a list of file names.
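The two ways of passing names explicitly can be sketched as follows. This is a minimal illustration, assuming Python 3; the helper read_lines and the temporary file names are hypothetical and exist only to make the example self-contained.

```python
import fileinput
import os
import tempfile


def read_lines(files):
    """Return every line from files (a list of names or a single name)."""
    # Passing files explicitly means sys.argv is not consulted.
    with fileinput.input(files=files) as source:
        return [line.rstrip("\n") for line in source]


# Build two small throwaway files so the sketch can run anywhere.
workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "first.txt")
second = os.path.join(workdir, "second.txt")
with open(first, "w") as f:
    f.write("alpha\nbeta\n")
with open(second, "w") as f:
    f.write("gamma\n")

from_list = read_lines([first, second])  # a list of file names
from_single = read_lines(first)          # a single file name
print(from_list)
print(from_single)
```

In both calls the lines arrive as one continuous sequence, so the processing loop does not need to know how many files were named.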
While processing input, several functions are available in the fileinput module:
fileinput.filename() → string
    The name of the file containing the line that has just been read.

fileinput.lineno() → int
    The cumulative line number of the line that has just been read.

fileinput.filelineno() → int
    The line number in the current file.

fileinput.isfirstline() → boolean
    True if the line just read is the first line of its file.

fileinput.isstdin() → boolean
    True if the line was read from sys.stdin.

fileinput.nextfile()
    Close the current file so that the next iteration will read the first line from the next file (if any). Lines not read from the file will not count towards the cumulative line count. The filename is not changed until after the first line of the next file has been read.

fileinput.close()
    Close the sequence.
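A short sketch may make these functions more concrete. It assumes Python 3 and two hypothetical temporary files, a.txt (two lines) and b.txt (one line). The first loop records what each status function reports for every line, and the second uses nextfile to read only the first line of each file.

```python
import fileinput
import os
import tempfile

# Create two small sample files (names and contents are illustrative).
workdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
    path = os.path.join(workdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

# Record the status functions after each line is read.
report = []
for line in fileinput.input(files=paths):
    report.append((
        os.path.basename(fileinput.filename()),
        fileinput.lineno(),       # cumulative across all files
        fileinput.filelineno(),   # restarts at 1 for each file
        fileinput.isfirstline(),
        fileinput.isstdin(),
    ))
fileinput.close()

# Read only the first line of each file by skipping ahead with nextfile.
first_lines = []
for line in fileinput.input(files=paths):
    first_lines.append((fileinput.lineno(), line.rstrip("\n")))
    fileinput.nextfile()
fileinput.close()

for row in report:
    print(row)
print(first_lines)
```

In the second loop the skipped line of a.txt is not counted, so the first line of b.txt is reported as cumulative line 2 rather than 3.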
All files are opened in text mode. If an I/O error occurs while opening or reading a file, the IOError exception (an alias of OSError in Python 3) is raised.
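One way to handle such an error is sketched below, assuming Python 3, where IOError is an alias of OSError. The helper count_lines and the missing file name are hypothetical; note that the error surfaces when the first line is requested, not when fileinput.input is called.

```python
import fileinput
import os
import tempfile

# A path that is known not to exist (the name is illustrative).
missing = os.path.join(tempfile.mkdtemp(), "no-such-file.txt")


def count_lines(files):
    """Return the number of lines read, or None if a file could not be read."""
    try:
        with fileinput.input(files=files) as source:
            return sum(1 for _ in source)
    except OSError as error:  # IOError is an alias of OSError in Python 3
        print("cannot read:", error)
        return None


result = count_lines([missing])
```

Whether to skip the bad file, report it, or stop entirely is a design choice for the calling program; this sketch simply reports and gives up.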
This makes it easy to write a Python version of the common Unix utility, grep. The grep utility searches a list of files for a given pattern.
Example 33.1. greppy.py
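    #!/usr/bin/env python3
    import sys, re, fileinput

    # The first argument is the pattern; the rest are the files to search.
    pattern = re.compile(sys.argv[1])
    for line in fileinput.input(sys.argv[2:]):
        if pattern.match(line):
            # line keeps its trailing newline, so suppress the one print adds
            print(fileinput.filename(), fileinput.filelineno(), line, end="")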
This contains the essential features of grep. For non-Unix users, the grep utility looks for the given regular expression in any number of files. The name grep is an acronym of Global Regular Expression Print. The re module provides the pattern matching, and the fileinput module makes searching an arbitrary list of files simple. We cover the re module in more depth in Chapter 31, Complex Strings: the re Module.
The first command-line argument (sys.argv[0]) is the name of the script, which this program ignores. This program uses the second command-line argument (sys.argv[1]) as the pattern that defines the target of the search. The remaining command-line arguments (sys.argv[2:]) are given to fileinput.input so that all files will be examined.
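The regular expression pattern is matched against each individual input line. If match returns None, the line did not match. If match returns a match object, the program prints the current file name, the current line number within the file, and the actual input line that matched. Note that match only succeeds when the pattern is found at the beginning of the line, whereas grep finds the pattern anywhere in the line; the pattern's search method gives that behavior.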
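The difference between the two methods of a compiled pattern can be seen in a small sketch. The sample lines here are made up for illustration; only re.compile, match, and search come from the re module.

```python
import re

pattern = re.compile("import.*random")

# Three illustrative input lines: the pattern at the start, after
# indentation, and in the middle of a comment.
lines = [
    "import random\n",
    "    import random\n",
    "# we import the random module\n",
]

# match only succeeds when the pattern begins at the first character.
matched = [line for line in lines if pattern.match(line)]

# search succeeds when the pattern occurs anywhere in the line.
searched = [line for line in lines if pattern.search(line)]

print(len(matched), len(searched))
```

Only the first line satisfies match, while all three satisfy search, which is closer to what grep reports.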
After we do a chmod +x greppy.py, we can use this program as follows. Note that we have to provide quotes to prevent the shell from doing globbing on our pattern string.
    $ greppy.py 'import.*random' *.py
    demorandom.py 2 import random
    dice.py 1 import random
    functions.py 2 import random