Web Servers and the HTTP protocol

The most widely used protocol is probably HTTP, the backbone of the World Wide Web. The HTTP protocol defines two parties: the client (or browser) and the server. The browser is generally a piece of software like Firefox, Opera or Safari, and it handles half of the HTTP conversation.

A web server handles the other half of the HTTP conversation. We have a number of choices for how to provide it.

  • We can write our own from scratch. Python provides three seed modules from which we can build a working server: BaseHTTPServer, SimpleHTTPServer and CGIHTTPServer. For some low-volume applications, this is entirely appropriate.

  • We can plug into the Apache server. Apache supports a wide variety of plug-in technologies, including CGI, SCGI and FastCGI.

  • We can plug into Apache using mod_python. The mod_python project on SourceForge provides a module that embeds a Python interpreter directly in Apache. This embedded interpreter then runs your Python programs as part of Apache's response to HTTP requests. This is secure, and it is very fast, since no separate Python process has to be started for each request.

  • We can use a web framework. In this case, the web framework plugs into Apache and handles many of the details of a web request, so we do much, much less programming. Python has dozens of popular, successful web frameworks. Zope, Django and TurboGears are just three examples of the ways the Python community has simplified the construction of web applications.
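To give a feel for the from-scratch option, here is a sketch of a static file server. It assumes Python 3, where the BaseHTTPServer, SimpleHTTPServer and CGIHTTPServer modules were merged into a single module, `http.server`. The helper name `make_file_server` and the port number are our own illustrative choices.

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_file_server(directory=".", port=8008, host=""):
    """Build, but do not start, a server for the files in directory.

    make_file_server is a hypothetical helper; port 8008 is arbitrary and
    host '' means "listen on every local interface".
    """
    # SimpleHTTPRequestHandler accepts a directory keyword (Python 3.7+).
    # functools.partial binds it, so HTTPServer can still create a handler
    # with the usual (request, client_address, server) arguments.
    handler_class = functools.partial(SimpleHTTPRequestHandler,
                                      directory=directory)
    return HTTPServer((host, port), handler_class)
```

Calling `make_file_server("/tmp").serve_forever()` would then serve that directory at http://localhost:8008/ until the program is interrupted.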

We can't easily cover mod_python or any of the web frameworks in this book. But we can take a quick look at a simple server built on BaseHTTPServer, just to show what's involved in HTTP. We'll build on this example to handle some additional requests in subsequent sections.

About HTTP

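BaseHTTPServer handles the two versions of HTTP that are widely used: 1.0 and 1.1. Version 1.0 lacks some features that modern, interactive web applications rely on, such as persistent connections and a required Host header. Version 1.1, however, requires more care in creating the response to the web browser: because the connection can stay open for further requests, each response must state its exact length.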

An HTTP request includes a number of pieces of information. A few of these pieces of information are of particular interest to a web application.

command

The command is generally GET or POST. The protocol specifies other commands (which it calls methods), such as HEAD, PUT and DELETE, but browsers rarely send them.

path

This is the part of the URL after the host name and port. It can include a query string, separated from the rest of the path by a "?".

headers

A number of headers are included in the request. They describe the browser and what it is capable of, and they summarize some of the browser's preferences, such as the preferred language. They also describe any additional data attached to the request. The "Content-Length" header, in particular, tells you that form input or a file upload is attached, and how many bytes it occupies.
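BaseHTTPServer does this parsing for us, but a small sketch makes the structure of a request concrete. A request consists of a request line (command, path, version), then headers, then a blank line, then any attached data whose size the Content-Length header gives. The names `parse_request`, `split_path` and `SAMPLE` are hypothetical. The sketch assumes Python 3 (`urllib.parse`) and ignores details such as folded header lines and chunked bodies.

```python
from urllib.parse import urlsplit, parse_qs

# A hypothetical form submission, exactly as it would arrive on the socket
SAMPLE = (b"POST /form?lang=en HTTP/1.1\r\n"
          b"Host: localhost:8008\r\n"
          b"Accept-Language: en\r\n"
          b"Content-Type: application/x-www-form-urlencoded\r\n"
          b"Content-Length: 10\r\n"
          b"\r\n"
          b"name=Alice")


def parse_request(raw):
    """Split raw request bytes into (command, path, version, headers, body)."""
    head, _, rest = raw.partition(b"\r\n\r\n")       # a blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    command, path, version = lines[0].split(" ", 2)  # the request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names ignore case
    # Content-Length says how many of the remaining bytes are the body
    length = int(headers.get("content-length", "0"))
    return command, path, version, headers, rest[:length]


def split_path(path):
    """Separate the path proper from its query string (after the "?")."""
    parts = urlsplit(path)
    # parse_qs maps each name to a list, since a name may repeat
    return parts.path, parse_qs(parts.query)
```

For `SAMPLE`, `parse_request` returns the command `'POST'`, the path `'/form?lang=en'` and the body `b'name=Alice'`. `split_path('/form?lang=en')` returns `('/form', {'lang': ['en']})`.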

An HTTP reply includes a number of pieces of information. It begins with a status line giving the protocol version and a status code. Headers follow. The Content-Type header carries a MIME type that tells the browser what kind of document will follow, often text/html or text/plain. Other headers, such as Server, carry version information that the browser can reveal via its Page Info menu item. Finally, after a blank line, the reply includes the actual document: plain text, HTML or an image.
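A reply is just as simple on the wire. The following sketch assembles one by hand, in the order just described: status line, headers including Content-Type, a blank line, then the document. It assumes Python 3, and `build_response` is a hypothetical helper. In practice, BaseHTTPRequestHandler writes these pieces for us.

```python
def build_response(code, reason, content_type, body, extra_headers=()):
    """Assemble a complete HTTP/1.0 reply as bytes (body must be bytes).

    build_response is a hypothetical helper; extra_headers is a sequence
    of (name, value) pairs such as [("Server", "MyHandler/1.1")].
    """
    lines = ["HTTP/1.0 %d %s" % (code, reason),   # status line comes first
             "Content-Type: %s" % content_type,   # MIME type of the document
             "Content-Length: %d" % len(body)]
    lines.extend("%s: %s" % pair for pair in extra_headers)
    head = "\r\n".join(lines) + "\r\n\r\n"        # blank line ends the headers
    return head.encode("iso-8859-1") + body
```

For example, `build_response(200, "OK", "text/plain", b"hello")` produces `b'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello'`.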

There are a number of HTTP status codes. Generally, a simple page has a status code of 200, indicating that the request is complete and the page is being sent. The 3xx status codes indicate that the page has moved, and the "Location" header provides the URL to which the browser will redirect. The 4xx status codes indicate problems with the request. The 5xx status codes indicate problems with the server, problems that might clear up in the future.
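Python 3's `http.HTTPStatus` enumeration knows the standard reason phrase for each code. A hypothetical helper such as `describe_status` can report both that phrase and the family selected by the code's first digit. The family descriptions are our own wording.

```python
from http import HTTPStatus

# First digit of a status code -> the family it belongs to
FAMILIES = {1: "informational", 2: "success", 3: "redirection",
            4: "client error (a problem with the request)",
            5: "server error (may clear up in the future)"}


def describe_status(code):
    """Return (reason phrase, family) for a standard HTTP status code."""
    phrase = HTTPStatus(code).phrase    # raises ValueError for unknown codes
    return phrase, FAMILIES[code // 100]
```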

Building an HTTP Server

Your HTTP server has two parts: the server itself and a request handler class. The server as a whole is controlled by BaseHTTPServer.HTTPServer, which has two commonly used methods. In the following descriptions, srvr is an instance of BaseHTTPServer.HTTPServer.

HTTPServer( address, handlerClass )

The address is a two-tuple with a server name and a port number, usually something like ('',8008). An empty name means every network interface on this machine. The handlerClass is a subclass of BaseHTTPServer.BaseHTTPRequestHandler (the class itself, not its name). The server creates an instance of this class for each incoming connection and invokes the appropriate methods of that instance to serve the requests.

srvr.handle_request()

This method of a server will handle just one request. It's handy for debugging.

srvr.serve_forever()

This method of a server will handle requests until the server is stopped.
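Here is a sketch of both ways of running a server. It assumes Python 3's `http.server` module, the successor to BaseHTTPServer. `build_server` and `serve_some` are hypothetical helpers. `serve_some` calls `handle_request` a fixed number of times, which is convenient when debugging.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_server(port=8008, handler_class=BaseHTTPRequestHandler, host=""):
    """Create (but do not start) a server; the address is a (host, port) tuple."""
    # host '' means every local interface; 8008 is an arbitrary port
    return HTTPServer((host, port), handler_class)


def serve_some(srvr, count):
    """Handle exactly count requests, then return: handy for debugging."""
    for _ in range(count):
        srvr.handle_request()   # blocks until one request has been served
```

`build_server().serve_forever()` would run until interrupted. With the bare base class as the handler, as in the defaults here, every request gets a 501 (Not Implemented) reply, because the base class defines no do_GET method.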

The server requires a subclass of BaseHTTPServer.BaseHTTPRequestHandler. The base class does a number of standard operations related to handling web service requests, so generally you'll need to override just a few methods. The base class dispatches each request to a method named do_ followed by the command, so a GET request is handled by do_GET. For each command you want to accept, you need a matching do_X method. Since most browsers send only GET or POST requests, you usually need to provide only do_GET and do_POST.

do_GET

Handle a GET request from a browser.

do_POST

Handle a POST request from a browser.
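A minimal handler shows the division of labor. The base class parses the request and calls do_GET or do_POST, and each method only has to build a reply. This sketch assumes Python 3's `http.server`, and `EchoHandler` and its `reply` helper are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that reports which do_X method answered."""

    def do_GET(self):
        self.reply("GET %s" % self.path)

    def do_POST(self):
        # Read exactly Content-Length bytes of form data (0 if absent)
        length = int(self.headers.get("Content-Length", 0))
        data = self.rfile.read(length)
        self.reply("POST %s with %d bytes" % (self.path, len(data)))

    def reply(self, text):
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```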

This class has several class variables. Generally, you'll want to override server_version, with some identification for your server. There are some additional class variables that you might want to use. When using these, the instance qualifier, self., is required.

self.server_version

A string to identify your server and version. This string can have multiple clauses, each separated by whitespace. Each clause is of the form product/version. The default is 'BaseHTTP/0.3'.

self.sys_version

This is a version string that identifies the overall system. It has the form product/version. The default is built from the running interpreter's version, for example 'Python/2.5.1'.

self.error_message_format

This is the web page sent back by the send_error method. The send_error method uses the error code to create a dictionary with three keys: "code", "message" and "explain". The "code" item has the numeric error code. The "message" item is the short message from the self.responses dictionary, unless a message was passed to send_error. The "explain" item is the long message from the self.responses dictionary. Since a dictionary is provided, the formatting string for this error page can include the conversion strings %(code)d, %(message)s and %(explain)s.

self.protocol_version

This is the HTTP version being used. It defaults to 'HTTP/1.0'. If you set it to 'HTTP/1.1', you must also send a "Content-Length" header giving the precise size of each page. The connection stays open for further requests, so the browser relies on this size to find the end of the page.

self.responses

A dictionary, keyed by status code. Each entry is a two-tuple with a short message and a long explanation. The message for status code 200, for example, is 'OK'. The explanation is somewhat longer.
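The following sketch overrides three of these class variables. Because it speaks HTTP/1.1, its normal reply carries an exact Content-Length. Its error pages come from send_error, which fills error_message_format from the responses table. It assumes Python 3's `http.server` (the successor to BaseHTTPServer), and the class name `VersionedHandler` is hypothetical.

```python
from http.server import BaseHTTPRequestHandler


class VersionedHandler(BaseHTTPRequestHandler):
    """Hypothetical handler overriding three class variables."""
    server_version = "MyHandler/1.1"      # sent in the Server header
    protocol_version = "HTTP/1.1"         # keep connections open
    # Used by send_error; only these three conversions are available
    error_message_format = ("<html><body>"
                            "<h1>Error %(code)d: %(message)s</h1>"
                            "<p>%(explain)s</p></body></html>")

    def do_GET(self):
        if self.path != "/hello":
            # message and explanation come from self.responses[404]
            self.send_error(404)
            return
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Required with HTTP/1.1: the connection stays open, so the
        # browser needs the exact size to know where this page ends.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

A single connection can carry several requests to this handler. A browser that asks for /hello and then /missing over the same connection gets "hello" and then the custom 404 page.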

This class has a number of instance variables which characterize the specific request that is currently being handled. These are proper instance variables, so the instance qualifier, self., is required.

self.client_address

The client's internet address, in the form Python uses: a 2-tuple of (host address, port number).

self.command

The command in the request. This will usually be GET or POST.

self.path

The requested path, including any query string.

self.request_version

The protocol version string sent by the browser. Generally it will be 'HTTP/1.0' or 'HTTP/1.1'.

self.headers

This is a collection of headers, usually an instance of mimetools.Message. This is a mapping-like class that gives you access to the individual headers in the request. The header "cookie", for instance, will have the cookies being sent back by the browser. You will need to decode the value of the cookie, usually using the Cookie module.

self.rfile

If there is an input stream, this is a file-like object that can read it. Never read from it without specifying how many bytes to read. Generally, you get headers['Content-Length'] and read exactly that number of bytes. If you read without a size, your program will wait for more data on the underlying socket, data which will never arrive, because the browser is waiting for your reply.

self.wfile

This is a file-like object for the response socket, which the browser is reading. The response protocol requires that it be used as follows:

  1. Use self.send_response( number ) or self.send_response( number, text ). Usually you simply send 200.

  2. Use self.send_header( header, value ) to send specific headers, like "Content-type" or "Content-length". The "Set-cookie" header provides cookie values to the browser. The "Location" header is used for a 30x redirect response.

  3. Use self.end_headers() to finish sending headers and start sending the resulting page.

  4. Then (and only then) you can use self.wfile.write to send the page content.

  5. There is no need to call self.wfile.close() yourself. For an HTTP/1.0 connection, the server closes the connection after your do_X method returns.
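The following sketch combines these request attributes and response steps in one handler. do_POST reads exactly Content-Length bytes of form data from self.rfile, stores the submitted name in a cookie, and redirects with a Location header. do_GET decodes that cookie from self.headers. It assumes Python 3, where the Cookie module became `http.cookies`, and `CookieHandler` is a hypothetical name.

```python
from http.cookies import SimpleCookie
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs


class CookieHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: POST a name, get a cookie and a redirect."""

    def do_POST(self):
        # self.rfile: read exactly Content-Length bytes, never read() alone
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        name = form.get("name", ["stranger"])[0]
        cookie = SimpleCookie()
        cookie["name"] = name
        cookie["name"]["path"] = "/"
        self.send_response(302)                          # 1. status line
        self.send_header("Set-Cookie", cookie["name"].OutputString())
        self.send_header("Location", "/")                # 2. where to go next
        self.send_header("Content-Length", "0")
        self.end_headers()                               # 3. no content follows

    def do_GET(self):
        # self.headers: decode the Cookie header with http.cookies
        cookie = SimpleCookie(self.headers.get("Cookie", ""))
        name = cookie["name"].value if "name" in cookie else "stranger"
        body = ("Hello, %s" % name).encode("utf-8")
        self.send_response(200)                          # 1. status line
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))  # 2. headers
        self.end_headers()                               # 3. end of headers
        self.wfile.write(body)                           # 4. page content
```

Posting `name=Alice` gets a 302 reply with the header `Set-Cookie: name=Alice; Path=/`. A browser that then sends that cookie back with a GET sees "Hello, Alice".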

This class has a number of methods which you'll want to use from within your do_GET and do_POST methods. Since these are used from within your methods, we'll use self. as the instance qualifier.

self.send_error( number, [message] )

Send an error response. By default, this is a complete, small page that shows the code, message and explanation. If you do not provide a message, the short message from the self.responses[ number ] mapping will be used.

self.send_response( number, [message] )

Starts a response that is sent in pieces. If you do not provide a message, the short message from the self.responses[ number ] mapping will be used. This method is the first step in sending a response, and it also sends the Server and Date headers for you. It must be followed by self.send_header for each additional header, and then by self.end_headers. Only then can the page content be sent.

self.send_header( name, value )

Send one HTTP header and its value. Use this to send specific headers, like "Content-type" or "Content-length". If you are doing a redirect, you'll need to include the "Location" header.

self.end_headers()

Finish sending the headers; get ready to send the page content. Generally, this is followed by writing to self.wfile.

self.log_request( status, [size] )

Uses self.log_message to write an entry into the log file for a normal response. This is done automatically by send_response.

self.log_error( format, args... )

Uses self.log_message to write an entry into the log file for an error response. This is done automatically by send_error.

self.log_message( format, args... )

Writes an entry, formatted as format % args, into the log file. You might want to override this if you want a different format for log entries, or want them to go to a destination other than sys.stderr.
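Because log_request and log_error both go through log_message, overriding that one method redirects every log entry. This sketch sends them to Python's `logging` module instead of sys.stderr. It assumes Python 3's `http.server`, and the logger name `"webserver"` and the class name `LoggingHandler` are hypothetical.

```python
import logging
from http.server import BaseHTTPRequestHandler

log = logging.getLogger("webserver")   # hypothetical logger name


class LoggingHandler(BaseHTTPRequestHandler):
    """Route the handler's log entries to the logging module."""

    def log_message(self, format, *args):
        # Same signature as the base method; log_request and log_error
        # both end up here, so one override covers every entry.
        log.info("%s - %s", self.client_address[0], format % args)

    def do_GET(self):
        body = b"logged"
        self.send_response(200)      # this call also logs the request
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

With this override, a call such as `logging.basicConfig(filename="access.log", level=logging.INFO)` at startup would send the entries to a file.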

Example HTTP Server

The following example shows the skeleton of a simple HTTP server. This server merely displays the GET or POST request that it receives. A Python-based web server like this one is no replacement for Apache on a busy site. However, for some applications, you might find it convenient to develop a small, simple application that handles HTTP itself.

Example 36.1. webserver.py

import BaseHTTPServer

class MyHandler( BaseHTTPServer.BaseHTTPRequestHandler ): # (1)
    server_version= "MyHandler/1.1"
    def do_GET( self ):
        # Pass the values as arguments rather than pre-formatting them:
        # a "%" in the path or headers would otherwise break log_message.
        self.log_message( "Command: %s Path: %s Headers: %r",
                          self.command, self.path, self.headers.items() )
        self.dumpReq( None )
    def do_POST( self ): # (2)
        self.log_message( "Command: %s Path: %s Headers: %r",
                          self.command, self.path, self.headers.items() )
        if 'content-length' in self.headers:
            length= int( self.headers['content-length'] )
            self.dumpReq( self.rfile.read( length ) )
        else:
            self.dumpReq( None )
    def dumpReq( self, formInput=None ): # (3)
        response= "<html><head></head><body>"
        response+= "<p>HTTP Request</p>"
        response+= "<p>self.command= <tt>%s</tt></p>" % ( self.command )
        response+= "<p>self.path= <tt>%s</tt></p>" % ( self.path )
        response+= "</body></html>"
        self.sendPage( "text/html", response )
    def sendPage( self, type, body ): # (4)
        self.send_response( 200 )
        self.send_header( "Content-type", type )
        self.send_header( "Content-length", str(len(body)) )
        self.end_headers()
        self.wfile.write( body )

def httpd(handler_class=MyHandler, server_address=('', 8008)): # (5)
    srvr = BaseHTTPServer.HTTPServer(server_address, handler_class)
    srvr.handle_request() # one request; use serve_forever() to keep running

if __name__ == "__main__":
    httpd( )
(1) You must create a subclass of BaseHTTPServer.BaseHTTPRequestHandler. Since most browsers will only send GET or POST requests, we only provide do_GET and do_POST methods. Additionally, we provide a value of server_version, which will be sent back to the browser.

(2) The HTTP protocol allows our application to receive the input to a form either in the URL (as a query string) or in a separate data stream. Generally, a form will use a POST request. The data is then available on self.rfile, and the Content-Length header gives its size.

(3) This is the start of a debugging routine that dumps the complete request. This is handy for learning how HTTP works.

(4) This shows the proper sequence for sending a simple page back to a browser. This technique will work for files of all types, including images. This method doesn't handle complex headers, particularly cookies, very well.

(5) This creates the server, srvr, as an instance of BaseHTTPServer.HTTPServer which uses MyHandler to process each request.
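For readers on Python 3, where BaseHTTPServer became `http.server`, here is a sketch of the same example ported. Besides the renamed module, the page is encoded to bytes before writing, the displayed values are HTML-escaped with `html.escape`, and any form input is shown, which the original dumpReq leaves as a next step. The log_message calls pass their values as separate arguments, so a "%" in a path cannot break them. Treat this as one possible port, not the book's own code.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer


class MyHandler(BaseHTTPRequestHandler):
    server_version = "MyHandler/1.1"

    def do_GET(self):
        self.log_message("Command: %s Path: %s Headers: %r",
                         self.command, self.path, self.headers.items())
        self.dumpReq(None)

    def do_POST(self):
        self.log_message("Command: %s Path: %s Headers: %r",
                         self.command, self.path, self.headers.items())
        if "Content-Length" in self.headers:         # case-insensitive lookup
            length = int(self.headers["Content-Length"])
            self.dumpReq(self.rfile.read(length))     # read exactly that much
        else:
            self.dumpReq(None)

    def dumpReq(self, formInput=None):
        response = "<html><head></head><body>"
        response += "<p>HTTP Request</p>"
        response += "<p>self.command= <tt>%s</tt></p>" % html.escape(self.command)
        response += "<p>self.path= <tt>%s</tt></p>" % html.escape(self.path)
        if formInput is not None:                      # POST data, as bytes
            text = formInput.decode("utf-8", "replace")
            response += "<p>form input= <tt>%s</tt></p>" % html.escape(text)
        response += "</body></html>"
        self.sendPage("text/html; charset=utf-8", response.encode("utf-8"))

    def sendPage(self, content_type, body):
        # body must be bytes in Python 3
        self.send_response(200)
        self.send_header("Content-type", content_type)
        self.send_header("Content-length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def httpd(handler_class=MyHandler, server_address=("", 8008)):
    srvr = HTTPServer(server_address, handler_class)
    srvr.handle_request()  # one request; use serve_forever() to keep going
```

Calling `httpd()` serves a single request on port 8008, just as the original does.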


 
 
Published under the terms of the Open Publication License