Python - Command-Line Programs: Servers and Batch Processing

	Chapter 35. Programs: Standing Alone

Command-Line Programs: Servers and Batch Processing

Many programs have minimal or no user interaction at all. They are run from a command-line prompt, perform their function, and exit gracefully. They may produce a log; they may return a status code to the operating system to indicate success or failure.

Almost all of the core Linux utilities (cp, rm, mv, ln, ls, df, du, etc.) are programs that decode command-line parameters, perform their processing function and return a status code. Except for a few explicitly interactive programs like editors (ex, vi, emacs, etc.), almost all of the core elements of Linux are filter-like programs.

There are two critical features that make a command-line program well-behaved. First, the program should accept the arguments in a standard manner. Second, the program should generally limit output to the standard output and standard error files created by the environment. When any other files are written, it must be by user request, and may require interactive confirmation.

Command Line Options and Operands. The standard handling of command-line arguments is given as 13 rules for UNIX commands, as shown in the intro section of UNIX man pages. These rules describe the program names (rules 1-2), simple options (rules 3-5), options that take argument values (rules 6-8), and the ordering of options and operands (rules 9-13) for the program.

1. The program name should be between two and nine characters. This is consistent with most file systems where the program name is a file name. In the Python environment, the program file normally has the extension .py.
2. The program name should include only lower-case letters and digits. The objective is to keep names relatively simple and easy to type correctly. Mixed-case names and names with punctuation marks can introduce difficulties in typing the program name correctly. To be used as a module or package in Python, the program file name must be just letters, digits and underscores, and must not begin with a digit.
3. Option names should be one character long. This is difficult to achieve in complex programs. Often, options have two forms: a single-character short form and a multi-character long form.
4. Single-character options are preceded by -. Multiple-character options are preceded by --. All options have a flag that indicates that this is an option, not an operand. Single-character options, again, are easier to type, but may be hard to remember for new users of a program.
5. Options with no arguments may be grouped after a single -. This allows a series of one-character options to be given in a simple cluster, for example ls -ldai bin clusters the -l, -d, -a and -i options.
6. Options that accept an argument value use a space separator. The option arguments are not run together with the option. Without this rule, it might be difficult to tell an option cluster from an option with arguments. For example, cut -ds could be an argument value of s for the -d option, or it could be clustered single-character options -d and -s.
7. Option-arguments cannot be optional. If an option requires an argument value, presence of the option means that an argument value will follow. If the presence of an option is somehow different from supplying a value for the option, two separate options must be used to specify these various conditions.
8. Groups of option-arguments following an option must be a single word; either separated by commas or quoted. For example: -d "9,10,56". A space would mean another option or the beginning of the operands.
9. All options must precede any operands on the command line. This basic principle gives command processing a simple, easy-to-understand uniformity.
10. The string -- may be used to indicate the end of the options. This is particularly important when any of the operands begin with - and might be mistaken for an option.
11. The order of the options relative to one another should not matter. Generally, a program should absorb all of the options to set up the processing.
12. The relative order of the operands may be significant. This depends on what the operands mean and what the program does.
13. The operand - preceded and followed by a space character should only be used to mean standard input. This may be passed as an operand, to indicate that the standard input file is processed at this time. For example, cat file1 - file2 will process file1, standard input and file2.

These rules are handled by the getopt (or optparse) module and the sys.argv variable in the sys module.
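As a rough illustration of how these rules look in practice, the following sketch uses getopt.getopt to separate options from operands. The option names (-v, -q, -d and their long forms) and the function name parse_command_line are hypothetical, chosen only for this example; they are not taken from any real utility.

```python
import getopt


def parse_command_line(argv):
    """Split argv (without the program name) into settings and operands.

    The options -v, -q and -d are hypothetical, used only to illustrate
    the rules for clusters, option arguments and operands.
    """
    # In "vqd:", -v and -q take no argument; the colon says -d requires one.
    opts, operands = getopt.getopt(
        argv, "vqd:", ["verbose", "quiet", "delimiter="]
    )
    settings = {"verbose": False, "quiet": False, "delimiter": ","}
    for name, value in opts:
        if name in ("-v", "--verbose"):
            settings["verbose"] = True
        elif name in ("-q", "--quiet"):
            settings["quiet"] = True
        elif name in ("-d", "--delimiter"):
            settings["delimiter"] = value
    return settings, operands


# A cluster (-vq), an option with an argument (-d ";"), then the operands.
settings, operands = parse_command_line(["-vq", "-d", ";", "file1", "-", "file2"])
print(settings)
print(operands)

# The string -- ends the options, so "-name" is treated as an operand.
print(parse_command_line(["-v", "--", "-name"]))
```

Option processing stops at the first operand, so the lone - survives as an operand that the program can interpret as standard input. An unrecognized option raises getopt.GetoptError, which a program would normally report on sys.stderr.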

Output Control. A well-behaved program does not overwrite data without an explicit demand from a user. Programs with an assumed, default or implicit output file are a problem waiting to happen. A well-behaved program should work as follows.

A well-designed program has an obvious responsibility that is usually tied to creating one specific output. This can be a report, or a file of some kind. In a few cases we may find it necessary to optimize processing so that a number of unrelated outputs are produced by a single program.
The best policy for this output is to write the resulting file to standard output (sys.stdout, which is where print writes by default). Any logging, status or error reporting is sent to sys.stderr. If this is done, then simple shell redirection operators can be used to collect this output in an obvious way.
```
python someProgram.py >this_file_gets_written
```
In some cases, there are actually two outputs: details and a useful summary. In this case, the summary should go to standard output, and an option specifies the destination of the details.
```
python aProgram.py -o details.dat >summary.txt
```
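The following is a minimal sketch of this output policy, not a definitive implementation. It assumes a hypothetical program whose operands are numbers: the summary goes to standard output, the log goes to standard error, and the details are written only to a file the user names with -o. The names report and main, and the output formats, are assumptions made for this example.

```python
import getopt
import sys


def report(values, summary=None, details=None, log=None):
    """Write optional per-value details, a one-line summary, and a log line."""
    summary = sys.stdout if summary is None else summary
    log = sys.stderr if log is None else log
    if details is not None:
        for value in values:
            details.write("%g\n" % value)
    summary.write("count=%d total=%g\n" % (len(values), sum(values)))
    log.write("processed %d values\n" % len(values))


def main(argv):
    """Handle a hypothetical '-o details_file' option; operands are numbers."""
    opts, operands = getopt.getopt(argv, "o:")
    detail_path = dict(opts).get("-o")
    values = [float(text) for text in operands]
    if detail_path is None:
        report(values)
    else:
        # The details file is written only because the user named it with -o.
        with open(detail_path, "w") as details:
            report(values, details=details)
    return 0
```

Because report accepts file objects as parameters, the same function works with redirected output, with a named file, or with an in-memory file during testing.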

Program Startup and the Operating System Interface. The essential operating system interface to our programs is relatively simple. The operating system will start the Python program, providing it with the three standard files (stdin, stdout, stderr; see the section called “File Semantics” for more information), and the command line arguments. In response, Python provides a status code back to the operating system. Generally a status code of 0 means things worked perfectly. Status codes which are non-zero indicate some kind of problem or failure.

Suppose we run a command like the following.

```
python casinosim.py -g craps
```

The operating system command processor (the Linux shell or Windows cmd.exe) breaks this line into a command (python) and a sequence of argument values. The shell finds the relevant executable file by searching its PATH, and then starts the program, providing the rest of the command line as argument values to that program.

A Python program will see that the command line arguments are assigned to sys.argv as ["casinosim.py", "-g", "craps"]. sys.argv[0] is the name of the main module, the script Python is currently running.
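The splitting of a command line into words can be approximated inside Python with shlex.split, which follows POSIX shell quoting rules. This is only a sketch of the idea: a real shell also expands wildcards and variables, which shlex does not do.

```python
import shlex

command_line = "python casinosim.py -g craps"

# Break the line into words, roughly as a POSIX shell would.
words = shlex.split(command_line)

# The first word names the program to run; the rest are its arguments.
command, arguments = words[0], words[1:]

print(command)
print(arguments)
```

The arguments list here has the same content that the Python program would find in sys.argv.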

When the script in casinosim.py finishes running, the Python interpreter also finishes, and returns a status code of 0 to the operating system.

To return a non-zero status code, use the sys.exit function.
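One way to observe these status codes is to start a child Python interpreter and look at what it returns. The sketch below uses subprocess.run for this; the helper name run_child and the status value 3 are arbitrary choices for the example.

```python
import subprocess
import sys


def run_child(source):
    """Run Python source text in a child interpreter; return its exit status."""
    completed = subprocess.run([sys.executable, "-c", source])
    return completed.returncode


# A script that simply finishes returns a status code of 0.
normal_status = run_child("pass")

# A script that calls sys.exit(3) returns a status code of 3.
failure_status = run_child("import sys; sys.exit(3)")

print(normal_status, failure_status)
```

A shell script or another program that starts our Python program sees the status code in the same way, and can use it to decide what to do next.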

Reuse and The Main-Import Switch. In the section called “Module Use: The import Statement” we talked about the Main-Import switch. The global __name__ variable is essential for determining the context in which a module is used.

A well-written application module often includes numerous useful class and function definitions. When combining modules to create application programs, it may be desirable to take a module that had been originally designed as a stand-alone program and combine it with others to make a larger and more sophisticated program. In some cases, a module may be both a main program for some use cases and a library module for other use cases.

The __name__ variable defines the context in which a module is being used. While a file is being evaluated, __name__ == "__main__" means that this module is the main module, started by the Python interpreter. Otherwise, the module is being imported, and __name__ is the module's name (the file name without the .py extension). An imported module should only define its classes and functions; it should take no other action.

This test is done as follows:

```
if __name__ == "__main__":
    main()
```

This kind of reuse ensures that programming is not duplicated. It is notoriously difficult to maintain two separate files that are supposed to contain the same program text. This kind of "cut and paste reuse" is a terrible burden on programmers. Python encourages reuse through both classes and modules. All modules can be configured as importable and reusable programming.
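A module that serves as both a main program and a library might look like the following sketch. The file name mean.py and the functions mean and main are hypothetical, invented for this illustration.

```python
import sys


def mean(values):
    """Reusable function: the arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def main(argv=None):
    """Command-line entry point: print the mean of the numeric operands."""
    argv = sys.argv[1:] if argv is None else argv
    try:
        numbers = [float(text) for text in argv]
    except ValueError:
        numbers = []
    if not numbers:
        sys.stderr.write("usage: mean.py number...\n")
        return 2
    print(mean(numbers))
    return 0


# Run main() only when this file is the main module, not when imported.
if __name__ == "__main__":
    main()
```

Run from the command line, this file acts as a program. When another module does import mean, only the definitions are evaluated, so that module can call mean.mean directly without triggering any command-line processing.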

