Command-Line Programs: Servers and Batch Processing
Many programs have minimal or no user interaction at all. They are
run from a command-line prompt, perform their function, and exit
gracefully. They may produce a log; they may return a status code to the
operating system to indicate success or failure.
Almost all of the core Linux utilities (cp, rm, mv, ln, ls, df, du, etc.)
are programs that decode command-line parameters, perform their
processing function and return a status code. Except for a few
explicitly interactive programs like editors (ex, vi, emacs, etc.),
almost all of the core elements of Linux are filter-like programs.
There are two critical features that make a command-line program
well-behaved. First, the program should accept the arguments in a
standard manner. Second, the program should generally limit output to the
standard output and standard error files created by the environment.
Any other file should be written only at the user's request, possibly
after interactive confirmation.
Command Line Options and Operands. The standard handling of command-line arguments is given as 13
rules for UNIX commands, as shown in the intro
section of UNIX man pages. These rules describe the program names
(rules 1-2), simple options (rules 3-5), options that take argument
values (rules 6-8) and operands (rules 9 and 10) for the
program.
-
The program name should be between two and nine characters.
This is consistent with most file systems where the program name is
a file name. In the Python environment, the program file usually has
the extension .py.
-
The program name should include only lower-case letters and
digits. The objective is to keep names relatively simple and easy to
type correctly. Mixed-case names and names with punctuation marks
can introduce difficulties in typing the program name correctly. To
be used as a module or package in Python, the program file name
must contain only letters, digits and underscores (_), and must not
begin with a digit.
-
Option names should be one character long. This is difficult
to achieve in complex programs. Often, options have two forms: a
single-character short form and a multi-character long form.
-
Single-character options are preceded by -. Multiple-character
options are preceded by --. All options have a flag that indicates
that this is an option, not an operand. Single-character options,
again, are easier to type, but may be hard to remember for new users
of a program.
-
Options with no arguments may be grouped after a single -.
This allows a series of one-character options to be given in a
simple cluster; for example, ls -ldai bin clusters the -l, -d, -a
and -i options.
-
Options that accept an argument value use a space separator.
The option arguments are not run together with the option. Without
this rule, it might be difficult to tell an option cluster from an
option with arguments: cut -ds could be an argument value of s for
the -d option, or it could be the clustered single-character options
-d and -s.
-
Option-arguments cannot be optional. If an option requires an
argument value, presence of the option means that an argument value
will follow. If the presence of an option is somehow different from
supplying a value for the option, two separate options must be used
to specify these various conditions.
-
Groups of option-arguments following an option must be a
single word, either separated by commas or quoted. For example:
-d "9,10,56". A space would mean another option or the beginning of
the operands.
-
All options must precede any operands on the command line.
This basic principle assures a simple, easy to understand uniformity
to command processing.
-
The string -- may be used to indicate the end of the options.
This is particularly important when any of the operands begin with -
and might be mistaken for an option.
-
The order of the options relative to one another should not
matter. Generally, a program should absorb all of the options to set
up the processing.
-
The relative order of the operands may be significant. This
depends on what the operands mean and what the program does.
-
The operand - preceded and followed by a space character should
only be used to mean standard input. This may be passed as an
operand, to indicate that the standard input file is processed at
this time. For example, cat file1 - file2 will process file1,
standard input and file2.
These rules are handled by the getopt (or optparse) module and the
sys.argv variable in the sys module.
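As a minimal sketch of how these rules look in practice, the following function uses getopt to separate options from operands. The options -v and -o (with the long forms --verbose and --output) and the function name parse_command_line are hypothetical, chosen only for illustration; a real program would define its own.

```python
import getopt


def parse_command_line(args):
    """Split an argument list (without the program name) into settings and operands.

    The options -v and -o FILE are hypothetical examples, not part of any real tool.
    """
    # "vo:" declares -v (takes no argument) and -o (requires an argument value).
    # The list declares the equivalent long forms, --verbose and --output=FILE.
    opts, operands = getopt.getopt(args, "vo:", ["verbose", "output="])
    settings = {"verbose": False, "output": None}
    for name, value in opts:
        if name in ("-v", "--verbose"):
            settings["verbose"] = True
        elif name in ("-o", "--output"):
            settings["output"] = value
    return settings, operands


# Options come first, then operands; a lone "-" is left among the operands.
settings, operands = parse_command_line(["-v", "-o", "out.txt", "a.dat", "-"])
print(settings, operands)
```

In a script, the argument list would normally be sys.argv[1:]. An unrecognized option, or an option missing its required argument value, raises getopt.GetoptError, which the program can report on standard error.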
Output Control. A well-behaved program does not overwrite data without an
explicit demand from a user. Programs with an assumed, default or
implicit output file are a problem waiting to happen. A well-behaved
program should work as follows.
-
A well-designed program has an obvious responsibility that is
usually tied to creating one specific output. This can be a report,
or a file of some kind. In a few cases we may find it necessary to
optimize processing so that a number of unrelated outputs are
produced by a single program.
-
The best policy for this output is to write the resulting file
to standard output (sys.stdout, which is the default destination for
print). Any logging, status or error reporting is sent to sys.stderr.
If this is done, then simple shell redirection operators can be used
to collect this output in an obvious way.
python someProgram.py >this_file_gets_written
-
In some cases, there are actually two outputs: details and a
useful summary. In this case, the summary should go to standard
output, and an option specifies the destination of the
details.
python aProgram.py -o details.dat >summary.txt
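The policy above can be sketched as a single function that keeps the three destinations separate. The function name report and its parameters are assumptions made for this illustration; the point is only that the summary defaults to standard output, logging goes to standard error, and details are written only when the caller supplies a destination.

```python
import sys


def report(values, summary_file=None, details_file=None, log_file=None):
    """Write a summary, optional details, and a log line to separate files.

    All names here are hypothetical; this is a sketch, not a fixed design.
    """
    # Resolve the defaults at call time so that redirection is respected.
    summary_file = sys.stdout if summary_file is None else summary_file
    log_file = sys.stderr if log_file is None else log_file

    # Status reporting never mixes with the real output.
    print("processing", len(values), "values", file=log_file)

    # Details are written only when a destination was explicitly requested.
    if details_file is not None:
        for value in values:
            print(value, file=details_file)

    # The single, obvious result goes to the summary destination.
    print("count:", len(values), "total:", sum(values), file=summary_file)
```

A main program would open the file named by the -o option, pass it as details_file, and leave the other two destinations at their defaults so the shell's redirection decides where the summary goes.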
Program Startup and the Operating System Interface. The essential operating system interface to our programs is
relatively simple. The operating system will start the Python program,
providing it with the three standard files (stdin, stdout, stderr; see
the section called “File Semantics” for more information), and the
command line arguments. In response, Python provides a status code
back to the operating system. Generally a status code of 0 means
things worked perfectly. Status codes which are non-zero indicate some
kind of problem or failure.
When we run something like
python casinosim.py -g craps
the operating system command processor (the Linux shell or Windows
cmd.exe) breaks this line into a command (python) and a sequence of
argument values. The shell finds the relevant executable file by
searching its PATH, and then starts the program, providing the rest of
the command line as argument values to that program.
A Python program will see that the command line arguments are
assigned to sys.argv as ["casinosim.py", "-g", "craps"]. sys.argv[0]
is the name of the main module, the script Python is currently running.
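To make the splitting concrete, the following sketch imitates what the shell does, using the standard shlex module. This is only an approximation of POSIX shell word splitting (real shells also expand variables and wildcards), and the function name argv_seen_by_script is hypothetical.

```python
import shlex


def argv_seen_by_script(command_line):
    """Approximate the shell: return the command and the remaining words.

    The remaining words are what the started program receives as arguments.
    """
    words = shlex.split(command_line)  # honors quoting, like a POSIX shell
    command, rest = words[0], words[1:]
    return command, rest


command, rest = argv_seen_by_script("python casinosim.py -g craps")
print(command)  # the executable the shell looks up on its PATH
print(rest)     # what Python places in sys.argv
```

Quoting matters here: a quoted value such as "9,10,56" arrives as a single word, which is why a group of option-arguments must be quoted or comma-separated.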
When the script in casinosim.py finishes running, the Python
interpreter also finishes, and returns a status code of 0 to the
operating system.
To return a non-zero status code, use the sys.exit function.
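A minimal sketch of this pattern follows. The function names run and finish, and the choice of 2 as the code for a usage error, are assumptions made for illustration; the only fixed convention is that 0 means success and non-zero means failure.

```python
import sys


def run(argv):
    """Do the work and return a status code: 0 for success, non-zero for failure.

    Using 2 for a usage error is an assumption of this sketch.
    """
    if len(argv) < 2:
        # Error reporting goes to standard error, not standard output.
        print("usage: %s operand..." % argv[0], file=sys.stderr)
        return 2
    return 0


def finish(argv):
    """Run the program and hand its status code to the operating system."""
    # sys.exit raises SystemExit; the interpreter exits with the given code.
    sys.exit(run(argv))
```

A script would call finish(sys.argv) as its last step. Because sys.exit works by raising the SystemExit exception, any cleanup in finally clauses still runs before the interpreter stops.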
Reuse and The Main-Import Switch. In the section called “Module Use: The
import Statement” we talked about the Main-Import switch. The global
__name__ variable is essential for determining the context in which a
module is used.
A well-written application module often includes numerous useful
class and function definitions. When combining modules to create
application programs, it may be desirable to take a module that had been
originally designed as a stand-alone program and combine it with others
to make a larger and more sophisticated program. In some cases, a module
may be both a main program for some use cases and a library module for
other use cases.
The __name__ variable defines the context in which a module is being
used. During evaluation of a file, when __name__ == "__main__", this
module is the main module, started by the Python interpreter.
Otherwise, __name__ will be the name of the module being imported. If
__name__ is not the string "__main__", this module is being imported,
and should only provide its definitions, taking no other action of
any kind.
This test is done as follows:
if __name__ == "__main__":
    main()
This kind of reuse assures that programming is not duplicated. It
is notoriously difficult to maintain two separate files that are
supposed to contain the same program text. This kind of "cut and paste
reuse" is a terrible burden on programmers. Python encourages reuse
through both classes and modules. All modules can be configured as
importable and reusable programming.
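A module that serves both roles might be laid out as in the following sketch. The names roll_dice and main are hypothetical, loosely in the spirit of a casino simulation; the pattern is what matters: definitions first, and the only action guarded by the test on __name__.

```python
import random


def roll_dice(rng=random):
    """Return the sum of two six-sided dice; reusable by any importing module."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def main():
    """Stand-alone behavior: print a single roll."""
    print("rolled", roll_dice())


# Runs only when this file is started by the interpreter, not when imported.
if __name__ == "__main__":
    main()
```

When another program imports this module, nothing is printed; it can simply call roll_dice, so there is one copy of the program text to maintain.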