Chapter 36. Programs: Clients, Servers, the Internet and the World Wide Web
Web Servers and the HTTP protocol
The most widely used protocol is probably HTTP. It is the backbone
of the World Wide Web. The HTTP protocol defines two parties: the client
(or browser) and the server. The browser is generally a piece of
software like Firefox, Opera or Safari. These products handle one half of
the HTTP conversation.
A web server handles the other half of the HTTP conversation. We
have several choices for how to handle this.
We can write our own server. Python provides three
seed modules from which we can build a working server. For
low-volume applications, this is entirely appropriate.
See the BaseHTTPServer,
SimpleHTTPServer and
CGIHTTPServer modules.
We can plug into the Apache server. Apache supports a wide
variety of plug-in technologies, including CGI, SCGI and FastCGI (FCGI).
We can plug into Apache using ModPython. The
ModPython project on SourceForge
contains a module which embeds a Python interpreter directly in
Apache. This embedded interpreter then runs your Python programs as
part of Apache's response to HTTP requests. Because the interpreter
stays loaded inside Apache, no new process has to be started for each
request, which makes this very fast.
We can use a web framework. In this case, the web framework
plugs into Apache, and the framework handles many of the details of
a web request. Using a web framework means that we do much
less programming. Python has dozens of popular, successful web
frameworks. Zope,
Django and
TurboGears are just three examples of
the many ways that the Python community has simplified the
construction of web applications.
We can't easily cover ModPython or any of the web frameworks in
this book. But we can take a quick look at
SimpleHTTPServer, just to show what's involved in
HTTP. We'll build on this example to handle some additional requests in
subsequent sections.
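For a first look, here's a minimal sketch of SimpleHTTPServer at work. In Python 3, the BaseHTTPServer, SimpleHTTPServer and CGIHTTPServer modules were merged into the single http.server module, so this sketch uses the Python 3 names. It assumes Python 3.7 or later (for the directory= keyword), serves a temporary directory holding one made-up file on a free port chosen by the operating system, and uses http.client to play the part of the browser.

```python
import functools
import http.client
import os
import tempfile
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

# A throwaway directory with one file stands in for a real document root.
docroot = tempfile.mkdtemp()
with open(os.path.join(docroot, "hello.txt"), "w") as f:
    f.write("Hello from SimpleHTTPRequestHandler\n")

# directory= picks the folder to serve; the default is the current directory.
handler = functools.partial(SimpleHTTPRequestHandler, directory=docroot)

# Port 0 asks the operating system for any free port.
srvr = HTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=srvr.serve_forever, daemon=True).start()

# Act as the browser: request the file and read the reply.
conn = http.client.HTTPConnection(*srvr.server_address)
conn.request("GET", "/hello.txt")
reply = conn.getresponse()
text = reply.read().decode("utf-8")
conn.close()
print(reply.status, text)

srvr.shutdown()
srvr.server_close()
```

From a command line, `python -m http.server 8008` does the same thing for the current directory.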
About HTTP
There are two versions of HTTP, both widely used. Version 1.0
lacks some features, such as persistent connections and the Host
header, that modern, interactive web applications rely on. Version 1.1,
however, requires more care in creating the response to the web browser:
because the connection can stay open, the browser generally needs to be
told exactly how long each response is.
An HTTP request includes a number of pieces of information. A
few of these pieces of information are of particular interest to a web
application.
command
The command is generally GET or POST. There are other
commands specified in the protocol (like HEAD, PUT or DELETE), but
browsers rarely send them.
path
This is the path (after the host name and port). This can
include a query string, separated from the main part of the URI
by a "?".
headers
There are a number of headers which are included in the
request; these describe the browser and what the browser is
capable of. The headers summarize some of the browser's
preferences, like the preferred language. They also
describe any additional data that is attached to the request.
The "content-length" header, in particular, tells you that form
input or a file upload is attached, and how many bytes it contains.
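To see how the path and its query string come apart, here is a small sketch using Python 3's urllib.parse module (in Python 2, urlsplit lives in the urlparse module and parse_qs in the cgi module). The path shown is a made-up example of what a GET request might carry.

```python
from urllib.parse import urlsplit, parse_qs

# A hypothetical path, as it would appear after the host name and port.
raw_path = "/search?q=http+server&lang=en&lang=fr"

parts = urlsplit(raw_path)
print(parts.path)    # -> /search
print(parts.query)   # -> q=http+server&lang=en&lang=fr

# parse_qs decodes "+" and %xx escapes and always returns lists,
# because a key may appear more than once in a query string.
params = parse_qs(parts.query)
print(params)        # -> {'q': ['http server'], 'lang': ['en', 'fr']}
```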
An HTTP reply includes a number of pieces of information. It
always begins with a status line that carries the status code. This
is followed by a number of headers. The "Content-Type" header holds a
MIME-type string that tells the browser what kind of document will
follow; this string is often text/html or text/plain. Other headers
carry version information (such as the "Server" header) that the
browser can reveal via the Page
Info menu item in the browser. Finally, the reply
includes the actual document, either plain text, HTML or an
image.
There are a number of HTTP status codes. Generally, a simple
page has a status code of 200, indicating that the request succeeded
and the page is being sent. The 3xx status codes indicate
that the page was moved; the "Location" header
provides the URL to which the browser will redirect. The 4xx status
codes indicate problems with the request. The 5xx status codes
indicate problems with the server, problems that might clear up in the
future.
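The layout of a reply can be sketched by assembling one by hand. The build_response function here is hypothetical and for illustration only (a real handler lets BaseHTTPRequestHandler write the status line for it); it uses Python 3's http.HTTPStatus to look up the short reason phrase, and the URL and page text are made up.

```python
from http import HTTPStatus

def build_response(code, body=b"", headers=()):
    """Assemble a complete HTTP/1.1 reply as bytes (illustration only)."""
    # Status line: protocol version, numeric code, short reason phrase.
    lines = ["HTTP/1.1 %d %s" % (code, HTTPStatus(code).phrase)]
    lines.extend("%s: %s" % (name, value) for name, value in headers)
    # Content-Length counts bytes, so the browser knows where the page ends.
    lines.append("Content-Length: %d" % len(body))
    # A blank line separates the headers from the document itself.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("latin-1") + body

# A 200 reply carrying a plain-text page.
page = build_response(200, b"Hello, world\n", [("Content-Type", "text/plain")])
# A 302 redirect: no body, just a Location header.
moved = build_response(302, headers=[("Location", "http://example.com/new")])

print(page.decode("latin-1"))
print(moved.decode("latin-1"))
```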
Building an HTTP Server
Your HTTP server has two parts: the server itself and a request
handler class. The server as a whole is handled by
BaseHTTPServer.HTTPServer. This class has two
methods that are commonly used. In the following examples,
srvr is an instance of
BaseHTTPServer.HTTPServer.
HTTPServer(address, handlerClass)
The address is a two-tuple, with server
name and port number, usually something like
('', 8008). The handlerClass is a
subclass of
BaseHTTPServer.BaseHTTPRequestHandler; pass the class
itself, not its name. For each request, this server will create an
instance of this class and invoke the appropriate methods of that
instance to serve the request.
srvr.handle_request()
This method of a server will handle just one request. It's
handy for debugging.
srvr.serve_forever()
This method of a server will handle requests until the
server is stopped.
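Here's a sketch of the server side on its own, assuming Python 3, where BaseHTTPServer.HTTPServer is http.server.HTTPServer. It binds to port 0 so the operating system picks a free port, serves exactly one request with handle_request() in a background thread, and uses http.client as a stand-in browser. Because it passes the bare base class as the handler, the request fails with 501 (Not Implemented): the server machinery works, but there is no do_GET method yet.

```python
import http.client
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

# Port 0 asks the OS for any free port; a real server would use a fixed
# port such as 8008.
srvr = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)
host, port = srvr.server_address

# handle_request() serves exactly one request and then returns, so run it
# in a background thread while this script plays the part of the browser.
worker = threading.Thread(target=srvr.handle_request)
worker.start()

conn = http.client.HTTPConnection(host, port)
conn.request("GET", "/")
reply = conn.getresponse()
print(reply.status, reply.reason)   # the bare base class has no do_GET
reply.read()
conn.close()

worker.join()
srvr.server_close()
```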
The server requires a subclass of
BaseHTTPServer.BaseHTTPRequestHandler. The base
class does a number of standard operations related to handling web
service requests. Generally, you'll need to override just a few
methods. Since most browsers will only send GET or
POST requests, you usually only need to provide
do_GET and do_POST
methods. The base class dispatches each request to a method named
do_ followed by the command; if there is no matching do_X method,
it sends a 501 (Not Implemented) error.
do_GET
Handle a GET request from a browser.
do_POST
Handle a POST request from a browser.
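A minimal sketch of a handler with do_GET and do_POST, under the same Python 3 assumptions. EchoHandler, _reply and fetch are hypothetical names; each reply simply reports which method the base class dispatched to. This time serve_forever() runs in a background daemon thread, and shutdown() (available since Python 2.6) stops it.

```python
import http.client
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: reports which do_X method was called."""

    def _reply(self, text):
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._reply("do_GET saw path %s\n" % self.path)

    def do_POST(self):
        # Read exactly Content-Length bytes so we never block on the socket.
        length = int(self.headers.get("Content-Length", 0))
        data = self.rfile.read(length)
        self._reply("do_POST received %d bytes\n" % len(data))

srvr = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=srvr.serve_forever, daemon=True).start()

def fetch(method, path, body=None):
    """Stand-in browser: send one request, return (status, text)."""
    conn = http.client.HTTPConnection(*srvr.server_address)
    conn.request(method, path, body=body)
    reply = conn.getresponse()
    result = (reply.status, reply.read().decode("utf-8"))
    conn.close()
    return result

get_result = fetch("GET", "/hello")
post_result = fetch("POST", "/form", body=b"name=value")
print(get_result)
print(post_result)

srvr.shutdown()
srvr.server_close()
```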
This class has several class variables. Generally, you'll want
to override server_version with some
identification for your server. There are some additional class
variables that you might want to use. When using these, the instance
qualifier, self., is required.
self.server_version
A string to identify your server and version. This string
can have multiple clauses, each separated by whitespace. Each
clause is of the form product/version. The default is
'BaseHTTP/0.3'.
self.sys_version
This is a version string to identify the overall system. It
has the form product/version. The default is built from the
running interpreter's version, for example 'Python/2.5.1'.
self.error_message_format
This is the template for the web page sent back by the
send_error method. The send_error method uses the error code to create a
dictionary with three keys: "code",
"message" and "explain".
The "code" item in the dictionary has the
numeric error code. The "message" item is the
short message from the self.responses
dictionary. The "explain" item is the long message from the
self.responses dictionary. Since a dictionary
is provided, the formatting string for this error message can
include conversion strings: %(code)d,
%(message)s and
%(explain)s.
self.protocol_version
This is the HTTP version being used. This defaults to
'HTTP/1.0'. If you set this to
'HTTP/1.1', then you should also use the
"Content-Length" header to provide the
browser with the precise size of the page being sent.
self.responses
A dictionary, keyed by status code. Each entry is a
two-tuple with a short message and a long explanation. The
message for status code 200, for example, is
'OK'. The explanation is somewhat
longer.
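The class variables can be exercised with a small sketch, again assuming Python 3's http.server. BrandedHandler and its server_version string are hypothetical; the error_message_format template uses exactly the three conversion strings described above, and send_error(404) fills them from self.responses.

```python
import http.client
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class BrandedHandler(BaseHTTPRequestHandler):
    # Hypothetical product/version clause; sys_version is appended after it.
    server_version = "DemoServer/0.1"
    # send_error fills %(code)d, %(message)s and %(explain)s.
    error_message_format = (
        "<html><body><h1>%(code)d %(message)s</h1>"
        "<p>%(explain)s</p></body></html>"
    )

    def do_GET(self):
        # Every path is treated as missing in this sketch.
        self.send_error(404)

# Each entry is (short message, long explanation).
print(BrandedHandler.responses[404])

srvr = HTTPServer(("127.0.0.1", 0), BrandedHandler)
threading.Thread(target=srvr.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection(*srvr.server_address)
conn.request("GET", "/nowhere")
reply = conn.getresponse()
server_header = reply.getheader("Server")
page = reply.read().decode("utf-8")
conn.close()
srvr.shutdown()
srvr.server_close()

print(reply.status, server_header)
print(page)
```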
This class has a number of instance variables which characterize
the specific request that is currently being handled. These are proper
instance variables, so the instance qualifier,
self., is required.
self.client_address
An internet address as used by Python. This is a 2-tuple:
(host address, port number).
self.command
The command in the request. This will usually be GET or
POST.
self.path
The requested path.
self.request_version
The protocol version string sent by the browser. Generally
it will be 'HTTP/1.0' or
'HTTP/1.1'.
self.headers
This is a collection of headers, usually an instance of
mimetools.Message. This is a mapping-like
class that gives you access to the individual headers in the
request. The header "cookie", for instance,
will have the cookies being sent back by the browser. You will
need to decode the value of the cookie, usually using the
Cookie module.
self.rfile
This is a file-like object that reads the request's input
stream, if there is one. Do not read this without providing a
specific size to read. Generally, you want to get
headers['Content-Length'], convert it with int(), and read that
number of bytes. If you do not specify the number of bytes to read,
and there is no supplemental data, your program will wait for data
on the underlying socket, data which will never appear.
self.wfile
This is a file-like object for the response socket, which the
browser is reading. The response protocol requires that it be used as
follows:
Use self.send_response( number ) or
self.send_response( number, text ). Usually you
simply send 200.
Use self.send_header( header, value ) to
send specific headers, like
"Content-type" or
"Content-length". The
"Set-cookie" header provides cookie
values to the browser. The "Location"
header is used for a 30x redirect response.
Use self.end_headers() to finish sending
headers and start sending the resulting page.
Then (and only then) you can use
self.wfile.write to send the page
content.
Use self.wfile.close() if this is an
HTTP/1.0 connection.
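Putting the instance variables together, here's a sketch of a do_POST method, assuming Python 3, where the Cookie module is http.cookies and parse_qs comes from urllib.parse. FormHandler, the visits cookie and the name form field are hypothetical. It reads exactly Content-Length bytes from self.rfile, decodes the cookie header, and writes the reply to self.wfile in the required order.

```python
import http.client
import threading
from http.cookies import SimpleCookie
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import parse_qs

class FormHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: reads a posted form, counts visits in a cookie."""

    def do_POST(self):
        # Content-Length is a string; read exactly that many bytes from rfile
        # so the read never waits on the socket for data that won't arrive.
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))

        # The Cookie header holds what the browser sent back; decode it.
        cookies = SimpleCookie(self.headers.get("Cookie", ""))
        visits = int(cookies["visits"].value) + 1 if "visits" in cookies else 1

        name = form.get("name", ["stranger"])[0]
        body = ("Hello %s, visit %d\n" % (name, visits)).encode("utf-8")

        # The required order: status, headers, end_headers, then content.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Set-Cookie", "visits=%d" % visits)
        self.end_headers()
        self.wfile.write(body)

srvr = HTTPServer(("127.0.0.1", 0), FormHandler)
threading.Thread(target=srvr.serve_forever, daemon=True).start()

# Stand-in browser posting a form and returning a previously set cookie.
conn = http.client.HTTPConnection(*srvr.server_address)
conn.request("POST", "/greet", body=b"name=Ada",
             headers={"Content-Type": "application/x-www-form-urlencoded",
                      "Cookie": "visits=2"})
reply = conn.getresponse()
set_cookie = reply.getheader("Set-Cookie")
text = reply.read().decode("utf-8")
conn.close()
srvr.shutdown()
srvr.server_close()

print(text, set_cookie)
```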
This class has a number of methods which you'll want to use from
within your do_GET and
do_POST methods. Since these are used from
within your methods, we'll use self. as the
instance qualifier.
self.send_error(number, 〈message〉)
Send an error response. By default, this is a complete,
small page that shows the code, message and explanation. If you
do not provide a message, the short message
from the
self.responses[number]
mapping will be used.
self.send_response(number, 〈message〉)
Sends the response status line, followed by the standard
"Server" and "Date" headers. If you do not provide a
message, the short message from the
self.responses[number]
mapping will be used. This method is the first step in sending a
response in pieces. It must be followed by
self.send_header if there are additional headers to send. It must
then be followed by self.end_headers. Then the page content
can be sent.
self.send_header(name, value)
Send one HTTP header and its value. Use this to send
specific headers, like "Content-type" or
"Content-length". If you are doing a
redirect, you'll need to include the
"Location" header.
self.end_headers()
Finish sending the headers; get ready to send the page
content. Generally, this is followed by writing to
self.wfile.
self.log_request(status, 〈size〉)
Uses self.log_message to write an
entry into the log file for a normal response. This is done
automatically by send_response.
self.log_error(format, args...)
Uses self.log_message to write an
entry into the log file for an error response. This is done
automatically by send_error.
self.log_message(format, args...)
Writes an entry into the log file. You might want to
override this if you want a different format for the error log,
or you want it to go to a different destination than
sys.stderr.
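Here's a sketch of overriding log_message so that log entries go to an in-memory list instead of sys.stderr, along with a 302 redirect and a send_error call. QuietHandler and log_lines are hypothetical names, and the code assumes Python 3's http.server. Note that both log_request (called by send_response) and log_error (called by send_error) end up in the overridden method.

```python
import http.client
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class QuietHandler(BaseHTTPRequestHandler):
    # Hypothetical in-memory log, shared by every request, instead of sys.stderr.
    log_lines = []

    def log_message(self, format, *args):
        # Same (format, args) convention as the base class; new destination.
        self.log_lines.append("%s %s" % (self.address_string(), format % args))

    def do_GET(self):
        if self.path == "/old":
            # A redirect: status, Location header, end_headers, no body.
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # send_error logs via log_error and builds a small error page.
            self.send_error(404, "No page at %s" % self.path)

srvr = HTTPServer(("127.0.0.1", 0), QuietHandler)
threading.Thread(target=srvr.serve_forever, daemon=True).start()

# http.client does not follow redirects, so we see the 302 itself.
results = {}
for path in ("/old", "/missing"):
    conn = http.client.HTTPConnection(*srvr.server_address)
    conn.request("GET", path)
    reply = conn.getresponse()
    reply.read()
    results[path] = (reply.status, reply.reason, reply.getheader("Location"))
    conn.close()

srvr.shutdown()
srvr.server_close()
print(results)
print("\n".join(QuietHandler.log_lines))
```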
Example HTTP Server
The following example shows the skeleton for a simple HTTP
server. This server merely displays the GET or POST request that it
receives. A Python-based web server like this one is unlikely to be
fast enough to replace Apache. However, for some applications, you
might find it convenient to develop a small, simple application which
handles HTTP directly.
You must create a subclass of
BaseHTTPServer.BaseHTTPRequestHandler.
Since most browsers will only send GET or
POST requests, we only provide
do_GET and
do_POST methods. Additionally, we
provide a value of server_version which
will be sent back to the browser.
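A minimal sketch of such a handler, assuming Python 3 (http.server in place of BaseHTTPServer, and bytes written to self.wfile). MyHandler and srvr follow the names used in this section; dump_request and main are hypothetical names, and the server_version string is made up.

```python
from http.server import HTTPServer, BaseHTTPRequestHandler

class MyHandler(BaseHTTPRequestHandler):
    server_version = "MyHandler/1.0"   # hypothetical identification string

    def do_GET(self):
        # GET form input, if any, is in the query string of self.path.
        self.dump_request(None)

    def do_POST(self):
        # POST form input follows the headers; Content-Length gives its size.
        length = int(self.headers.get("Content-Length", 0))
        self.dump_request(self.rfile.read(length))

    def dump_request(self, form_input):
        # Debugging routine: echo the complete request back as plain text.
        lines = ["%s %s %s" % (self.command, self.path, self.request_version),
                 "client: %s:%s" % self.client_address]
        lines.extend("%s: %s" % (name, value)
                     for name, value in self.headers.items())
        if form_input is not None:
            lines.append("form input: %r" % form_input)
        page = ("\n".join(lines) + "\n").encode("utf-8")

        # The proper sequence: response, headers, end_headers, then content.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(page)))
        self.end_headers()
        self.wfile.write(page)

def main(port=8008):
    srvr = HTTPServer(("", port), MyHandler)
    try:
        srvr.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        srvr.server_close()
```

Calling main() starts the server on port 8008; point a browser at http://localhost:8008/ and press Ctrl-C to stop it.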
The HTTP protocol allows the browser to send the
input to a form either in the URL (as a query string) or in a separate
data stream. Generally, a form will use a POST request; the data
is available on self.rfile, and the Content-Length header gives its size.
This is the start of a debugging routine that dumps the
complete request. This is handy for learning how HTTP
works.
This shows the proper sequence for sending a simple page
back to a browser. This technique will work for files of all
types, including images. This method doesn't handle complex
headers, particularly cookies, very well.
This creates the server, srvr, as an
instance of BaseHTTPServer.HTTPServer
which uses MyHandler to process each
request.
Published under the terms of the Open Publication License