Python - Socket Programming

Socket Programming

Socket-level programming isn't our first choice for solving client-server problems. Sockets are nicely supported by Python, however, giving us a way to create a new protocol when the vast collection of existing internetworking protocols is inadequate.

Client-server applications include a client-side program, a server, a connection and a protocol for communication between the two processes. One of the most popular and enduring suites of client-server protocols is based on the Internetworking protocol: TCP/IP. For more information on TCP/IP, see Internetworking with TCP/IP [Comer95].

All of the TCP/IP protocols are based on the basic socket. A socket is a handy metaphor for the way that the Transmission Control Protocol (TCP) reliably moves a stream of bytes between two processes.

The socket module includes a number of functions to create and connect sockets. Once connected, a socket behaves essentially like a file: it can be read from and written to. When we are finished with a socket, we can close it, releasing the network resources that were tied up by our processing.

Client Programs

When a client application communicates with a server, the client does three things: it establishes the connection, it sends the request and it reads the reply from the server. For some client-server relationships, like a database server, there may be multiple requests and replies. For other client-server requests, for example, the HTTP protocol, a single request may involve a number of replies.

To establish a connection, the client needs two basic facts about the server: the IP address and a port number. The IP address identifies the specific computer (or host) that will handle the request. The port number identifies the application program that will process the request on that host. A typical host will respond to requests on numerous ports. The port numbers prevent requests from being sent to the wrong application program. Port numbers are defined by several standards. Examples include FTP (port 21) and HTTP (port 80).

A client program makes requests to a server by using the following outline of processing.

Develop the server's address. Fundamentally, an IP address is a 32-bit host number and a 16-bit port number. Since these are difficult to manage, a variety of coding schemes are used. In Python, an address is a 2-tuple with a string and a number. The string represents the IP address in dotted notation ("194.109.137.226") or as a domain name ("www.python.org"); the number is the port number from 0 to 65535.
Create a socket and connect it to this address. This is a series of function calls to the socket module. When this is complete, the socket is connected to the remote IP address and port and the server has accepted the connection.
Send the request. Many of the standard TCP/IP protocols expect the commands to be sent as strings of text, terminated with the \n character. Often a Python file object is created from the socket so that the complete set of file method functions for reading and writing are available.
Read the reply. Many of the standard protocols will respond with a 3-digit numeric code indicating the status of the request. We'll review some common variations on these codes, below.
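
The four steps above can be sketched as follows. This is a minimal illustration written for Python 3 (where sockets carry bytes rather than text strings), not a finished client. The one_shot_server function, the "200 got" reply text and the use of port 0 to pick a free port are assumptions made only so the sketch can run by itself on one machine.

```python
import socket
import threading

def one_shot_server(listener):
    """Stand-in server: accept one client, answer one line, then hang up."""
    conn, _ = listener.accept()
    with conn, conn.makefile("rwb") as stream:
        request = stream.readline()
        stream.write(b"200 got " + request)
        stream.flush()

# A throwaway listener on a free local port, so the sketch is self-contained.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=one_shot_server, args=(listener,), daemon=True).start()

# Step 1: develop the address as a (host, port) 2-tuple.
address = ("127.0.0.1", listener.getsockname()[1])

# Step 2: create a socket and connect it to that address.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(address)

# Steps 3 and 4: send a newline-terminated request, then read the reply
# through a file object built from the socket.
with client, client.makefile("rwb") as stream:
    stream.write(b"HELLO\n")
    stream.flush()
    reply = stream.readline()

listener.close()
print(reply)
```

A real client would replace the throwaway listener with the address of an actual server.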

Developing an Address. An IP address is numeric. However, the Internet provides domain names, via the Domain Name System (DNS). This permits useful text names to be associated with numeric IP addresses. We're more used to "www.python.org". DNS resolves this to an IP address. The socket module provides functions for DNS name resolution.

The most common operation in developing an address is decoding a host name to create the numeric IP address. The socket module provides several functions for working with host names and IP addresses.

gethostname → string: Returns the current host name.
gethostbyname( host ) → address: Returns the IP address (a string of the form '255.255.255.255') for a host.
gethostbyaddr( address ) → (name, aliaslist, addresslist): Return the true host name, a list of aliases, and a list of IP addresses, for a host. The address argument is a string giving a host name or IP number.
getservbyname ( servicename , protocolname ) → integer: Return a port number from a service name and protocol name. The protocol name should be 'tcp' or 'udp'.

Typically, the socket.gethostbyname function is used to develop the IP address of a specific server name. It does this by making a DNS inquiry to transform the host name into an IP address.
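
A short sketch of these functions, written for Python 3. The resolve helper is a hypothetical wrapper, not part of the socket module; it exists only to turn a failed lookup into None. The answers for "localhost", the service lookup and the host name depend on how the local machine is configured, so no particular output is promised.

```python
import socket

def resolve(host):
    """Return the dotted IP address for host, or None if the lookup fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None

# A dotted-notation string is already an address, so it comes back unchanged.
print(resolve("127.0.0.1"))

# A name is looked up through the local resolver and, if needed, DNS.
print(resolve("localhost"))

# Service names map to port numbers when the system has a services database.
try:
    print(socket.getservbyname("http", "tcp"))
except OSError as error:
    print("service lookup failed:", error)

print(socket.gethostname())
```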

Port Numbers. The port number is usually defined by your application. For instance, the FTP application uses port number 21. Port numbers from 0 to 1023 are assigned by the RFC 1700 standard and are called the well known ports. Port numbers from 1024 to 49151 are available to be registered for use by specific applications. The Internet Assigned Numbers Authority (IANA) tracks these assigned port numbers. See https://www.iana.org/assignments/port-numbers. You can use the private port numbers, from 49152 to 65535, without fear of running into any conflicts. Port numbers from 1024 up may conflict with installed software on your host, but are generally safe.

Port numbers below 1024 are restricted so that only privileged programs can use them. This means that you must have root or administrator access to run a program which provides services on one of these ports. Consequently, many application programs which are not run by root, but run by ordinary users, will use port numbers starting with 1024.

It is very common to use ports from 8000 and above for services that don't require root or administrator privileges to run. Technically, port 8000 has a defined use, and that use has nothing to do with HTTP. Port 8008 and 8080 are the official alternatives to port 80, used for developing web applications. However, port 8000 is often used for web applications.

The usual approach is to have a standard port number for your application, but allow users to override this in the event of conflicts. This can be a command-line parameter or it can be in a configuration file.

Generally, a client program must accept an IP address as a command-line parameter. A network is a dynamic thing: computers are brought online and offline constantly. A "hard-wired" IP address is an inexcusable mistake.
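
One way to follow this advice is sketched below, using the standard argparse module. The parse_address function, the --port option name and the DEFAULT_PORT value of 7000 are illustrative assumptions, not a required interface.

```python
import argparse

DEFAULT_PORT = 7000  # hypothetical application default; users may override it

def parse_address(argv=None):
    """Build the (host, port) address tuple from command-line arguments."""
    parser = argparse.ArgumentParser(description="Example socket client")
    parser.add_argument("host", help="server name or IP address")
    parser.add_argument("-p", "--port", type=int, default=DEFAULT_PORT,
                        help="server port (default: %(default)s)")
    args = parser.parse_args(argv)
    if not 0 <= args.port <= 65535:
        parser.error("port must be between 0 and 65535")
    return (args.host, args.port)

# Passing a list stands in for a real command line in this sketch.
print(parse_address(["www.python.org"]))
print(parse_address(["localhost", "--port", "8080"]))
```

In a real program, calling parse_address() with no argument reads the actual command line. The same default could instead come from a configuration file.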

Create and Connect a Socket. A socket is one end of a network connection. Data passes bidirectionally through a socket between client and server. The socket module defines the SocketType, which is the class for all sockets. The socket function creates a socket object.

socket ( family , type , [ protocol ]) → SocketType: Open a socket of the given type. The family argument specifies the address family; it is normally socket.AF_INET. The type argument specifies whether this is a TCP/IP stream (socket.SOCK_STREAM) or UDP/IP datagram (socket.SOCK_DGRAM) socket. The protocol argument is not used for standard TCP/IP or UDP/IP.

A SocketType object has a number of method functions. Some of these are relevant for server-side processing and some for client-side processing. The client side method functions for establishing a connection include the following. In each definition, the variable s is a socket object.

s. connect( address ): Connect the socket to a remote address; the address is usually a (host address, port #) tuple. In the event of a problem, this will raise an exception.
s. connect_ex( address ) → integer: Connect the socket to a remote address; the address is usually a (host address, port #) tuple. This will return an error code instead of raising an exception. A value of 0 means success.
s. fileno → integer: Return underlying file descriptor, usable by the select module or the os.read and os.write functions.
s. getpeername → address: Return the remote address bound to this socket; not supported on all platforms.
s. getsockname → address: Return the local address bound to this socket.
s. getsockopt( level , opt , [ buflen ] ) → string: Get socket options. See the UNIX man pages for more information. The level is usually SOL_SOCKET. The option names all begin with SO_ and are defined in the module. You will have to use the struct module to decode results.
s. setblocking( flag ): Set or clear the blocking I/O flag.
s. setsockopt( level , opt , value ): Set socket options. See the UNIX man pages for more information. The level is usually SOL_SOCKET. The option names all begin with SO_ and are defined in the module. You will have to use the struct module to encode parameters.
s. shutdown( how ): Shutdown traffic on this socket. If how is 0, receives are disallowed; if how is 1, sends are disallowed. Usually this is 2 to disallow both reads and writes. Generally, this should be done before the close.
s. close: Close the socket. It's usually best to use the shutdown method before closing the socket.
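
The sketch below exercises several of these methods in Python 3. The local listener bound to port 0 is an assumption that gives the client something to connect to without a separate server program. SO_KEEPALIVE is chosen only as an example of a simple integer option.

```python
import socket

# A local listener, present only so the client has a peer to connect to.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server_address = listener.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Simple integer options can be set and read directly; when buflen is
# omitted, getsockopt returns an integer rather than a packed string.
client.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
keepalive = client.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)

# connect_ex reports problems as an error code; 0 means success.
status = client.connect_ex(server_address)
peer = client.getpeername()    # the remote (server) end
local = client.getsockname()   # the local end, on a port chosen by the OS

# Shut down traffic in both directions, then release the socket.
client.shutdown(socket.SHUT_RDWR)
client.close()
listener.close()

print(status, peer, local, keepalive != 0)
```

The named constants SHUT_RD, SHUT_WR and SHUT_RDWR correspond to the values 0, 1 and 2 for the how argument.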

Sending the Request and Receiving the Reply. Sending requests and processing replies is done by writing to the socket and reading data from the socket. Often, the response processing is done by reading the file object that is created by a socket's makefile method. Since the value returned by makefile is a conventional file, the readlines and writelines methods can be used on this file object.

A SocketType object has a number of method functions. Some of these are relevant for server-side processing and some for client-side processing. The client side method functions for sending (and receiving) data include the following. In each definition, the variable s is a socket object.

s. recv( bufsize , [ flags ] ) → string: Receive data, limited by bufsize . flags are MSG_OOB (read out-of-band data) or MSG_PEEK (examine the data without consuming it; a subsequent recv will read the data again).
s. recvfrom( bufsize , [ flags ] ) → ( string, address ): Receive data and sender's address, arguments are the same as recv.
s. send( string , [ flags ] ) → integer: Send data to a connected socket. The MSG_OOB flag is supported for sending out-of-band data. Return value is the number of bytes actually sent.
s. sendto( string , [ flags , ] address ) → integer: Send data to a given address, using an unconnected socket. The flags option is the same as send. Return value is the number of bytes actually sent.
s. makefile( mode , [ bufsize ] ) → file: Return a file object corresponding to this socket. The mode and bufsize options are the same as used in the built in file function.
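
The following Python 3 sketch shows send, recv with MSG_PEEK, and makefile working on the same connection. It uses socket.socketpair to obtain two already-connected sockets in one process; that shortcut, and the variable names, are assumptions made to keep the example short.

```python
import socket

# Two connected sockets in one process; no server program is needed.
left, right = socket.socketpair()

# send returns the number of bytes accepted for transmission.
sent = left.send(b"first line\nsecond line\n")

# MSG_PEEK examines the data without consuming it ...
peeked = right.recv(5, socket.MSG_PEEK)
# ... so an ordinary recv sees the same bytes again.
chunk = right.recv(5)

# A file object made from the socket offers the usual line-oriented reads.
with right.makefile("rb") as reader:
    rest_of_first = reader.readline()
    second = reader.readline()

left.close()
right.close()
print(sent, peeked, chunk, rest_of_first, second)
```

Because send may accept fewer bytes than it was given, code that must deliver a whole message typically checks the returned count or uses the socket's sendall method.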

Example. The following examples show a simple client application using the socket module.

This is the Client class definition.

#!/usr/bin/env python
import socket

class Client( object ):
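    rbufsize= -1   # default buffering for reads
    wbufsize= 0    # unbuffered writes: each write goes straight to the socket
    def __init__( self, address=('localhost',7000) ):
        self.server=socket.socket( socket.AF_INET, socket.SOCK_STREAM )
        self.server.connect( address )
        self.rfile = self.server.makefile('rb', self.rbufsize)
        self.wfile = self.server.makefile('wb', self.wbufsize)
    def makeRequest( self, text ):
        """send a message and get a 1-line reply"""
        self.wfile.write( text + '\n' )
        # readline returns as soon as one full line arrives;
        # read would wait until the server closes the connection
        data= self.rfile.readline()
        self.rfile.close()
        self.wfile.close()
        self.server.close()
        return data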

print "Connecting to Echo Server"
c= Client()
response= c.makeRequest( "Greetings" )
print repr(response)
print "Finished"

A Client object is initialized with a specific server address. The host ("localhost") and port number (7000) are default values in the class __init__ function. The address of "localhost" is handy for testing a client and a server on your PC. First the socket is created, then it is connected to the address. If no exceptions are raised, then an input and output file are created to use this socket.

The makeRequest function sends a message and then reads the reply.

Server Programs

When a server program starts, it creates a socket on which it listens for requests. The server has a three-step response to a client. First, it accepts the connection, then it reads and processes the client's request. Finally, it sends a reply to the client. For some client-server relationships, like a database server, there may be multiple requests and replies. Since database requests may take a long time to process, the server must be multi-threaded in order to handle concurrent requests. In the case of HTTP, a single request will lead to multiple replies.

A server program handles requests from a client by using the following outline of processing.

Create a Listener Socket. A listener socket is waiting for client connection requests.
Accept a Client Connection. When a client attempts a connection, the socket's accept method will return a "daughter" socket connected to the client. This daughter socket is used for all subsequent processing.
Read the request. Many of the standard TCP/IP protocols expect the commands to be sent as strings of text, terminated with the \n character. Often a Python file object is created from the socket so that the complete set of file method functions for reading and writing are available.
Send the reply. Many of the standard protocols will respond with a 3-digit numeric code indicating the status of the request. We'll review some common variations on these codes, below.

Create and Listen on a Socket. The following methods are relevant when creating server-side sockets. These server side method functions are used for establishing the public socket that is waiting for client connections. In each definition, the variable s is a socket object.

s. bind( address ): Bind the socket to a local address tuple of ( IP Address and port number ). This tuple is the address and port that will be used by clients to connect with this server. Generally, the first part of the tuple is simply "" to indicate that this server uses the address of the computer on which it is running.
s. listen( queueSize ): Start listening for incoming connections, queueSize specifies the number of queued connections.
s. accept → ( socket, address ): Accept a client connection, returning a socket connected to the client and client address.

Once the socket connection has been accepted, processing is a simple matter of reading and writing on the daughter socket.
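
Without building a full server program, the three calls above can be seen in isolation in this Python 3 sketch. The client created inside the same process, the "ping" request and the "200" reply text are assumptions that exist only so accept has a connection to return. The address "127.0.0.1" is used instead of "" to keep the experiment on the local machine.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0 asks the OS for any free port
listener.listen(5)                # queue up to 5 pending connections
port = listener.getsockname()[1]

# A client in the same process, present only so accept has work to do.
client = socket.create_connection(("127.0.0.1", port))
client_side = client.getsockname()

# accept returns the daughter socket plus the client's address.
daughter, client_address = listener.accept()

# All further traffic with this client uses the daughter socket.
client.sendall(b"ping\n")
request = daughter.recv(1024)
daughter.sendall(b"200 " + request)
reply = client.recv(1024)

for sock in (client, daughter, listener):
    sock.close()
print(client_address, request, reply)
```

The listener socket itself never carries request data; it only produces daughter sockets.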

We won't show an example of writing a server program using simple sockets. The best way to make use of server-side sockets is to use the SocketServer module.

Practical Server Programs with SocketServer

Generally, we use the SocketServer module for simple socket processing. Usually, we create a TCPServer using this module. This can simplify the processing of requests and replies. The SocketServer module, for example, is the basis for the SimpleHTTPServer (see the section called “Web Servers and the HTTP protocol”) and SimpleXMLRPCServer (see the section called “Web Services: The xmlrpclib Module”) modules.

Much of server-side processing is encapsulated in two classes of the SocketServer module. You will subclass the StreamRequestHandler class to process TCP/IP requests. This subclass will include the methods that do the essential work of the program.

You will then create an instance of the TCPServer class and give it your RequestHandler subclass. The instance of TCPServer will manage the public socket, and all of the basic processing. For each connection, it will create an instance of your subclass of StreamRequestHandler to handle the connection.

Define a RequestHandler. Defining a handler is done by creating a subclass of StreamRequestHandler or BaseRequestHandler and adding a handle method function. The BaseRequestHandler defines a simple framework that TCPServer can use when data is received on a socket.

Generally, we use a subclass of StreamRequestHandler. This class has methods that create files from the socket. This allows the handle method function to simply read and write files. Specifically, the superclass will assure that the variables self.rfile and self.wfile are available.

For example, the echo service runs on port 7. The echo service simply reads the data provided in the socket, and echoes it back to the sender. Many Linux boxes have this service enabled by default. We can build the basic echo handler by creating a subclass of StreamRequestHandler.

#!/usr/bin/env python
"""My Echo"""
import SocketServer

class EchoHandler( SocketServer.StreamRequestHandler ):
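    def handle(self):
        # self.request is the daughter socket connected to this client
        data= self.request.recv(1024)
        print "Input: %r" % ( data, )
        self.request.send("Heard: %r\n" % ( data, ) )

# "" listens on this host's own addresses; port 7000 needs no root privileges
server= SocketServer.TCPServer( ("",7000), EchoHandler )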
print "Starting Server"
server.serve_forever()

This class can be used by a TCPServer instance to handle requests. In this, the TCPServer instance named server creates an instance of EchoHandler each time a connection is made on port 7000. The derived socket is given to the handler instance, as the instance variable self.request.
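
A handler can also work through self.rfile and self.wfile instead of calling recv and send on self.request. The sketch below is written for Python 3, where the module is named socketserver. The LineEchoHandler name, the background thread and the built-in test client are assumptions that let the example start, exercise and stop the server in one run.

```python
import socket
import socketserver
import threading

class LineEchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # rfile and wfile are file objects the superclass builds from the socket.
        line = self.rfile.readline()
        self.wfile.write(b"Heard: " + line)

# Port 0 lets the OS pick a free port; server_address then holds the real one.
server = socketserver.TCPServer(("127.0.0.1", 0), LineEchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A small client, only to show the handler responding.
with socket.create_connection(server.server_address) as client:
    client.sendall(b"Greetings\n")
    with client.makefile("rb") as reader:
        reply = reader.readline()

server.shutdown()       # stop the serve_forever loop
server.server_close()   # release the listening socket
print(reply)
```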

A more sophisticated handler might decode input commands and perform unique processing for each command. For example, if we were building an on-line Roulette server, there might be three basic commands: a place bet command, a show status command and a spin the wheel command. There might be additional commands to join a table, chat with other players, perform credit checks, etc.
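
The command-decoding part of such a handler might be sketched as a dispatch table. Everything named here is hypothetical: the command words BET, STAT and SPIN, the table dictionary, and the reply texts are invented for illustration. A handle method could call dispatch once for each line it reads from the client.

```python
import random

def place_bet(table, args):
    """BET amount number: record a wager on one number."""
    if len(args) != 2 or not all(word.isdigit() for word in args):
        return "501 usage: BET amount number"
    table["bets"].append((int(args[0]), int(args[1])))
    return "200 bet accepted"

def show_status(table, args):
    """STAT: report how many bets are on the table."""
    return "211 %d bets placed" % len(table["bets"])

def spin_wheel(table, args):
    """SPIN: pick a number from 0 to 36 and clear the bets."""
    table["last_spin"] = table["rng"].randint(0, 36)
    table["bets"] = []
    return "200 wheel shows %d" % table["last_spin"]

# Each command word maps to the function that performs it.
COMMANDS = {"BET": place_bet, "STAT": show_status, "SPIN": spin_wheel}

def dispatch(table, line):
    """Split one request line into words and run the matching command."""
    words = line.split()
    if not words:
        return "500 empty command"
    command = COMMANDS.get(words[0].upper())
    if command is None:
        return "500 unknown command"
    return command(table, words[1:])

table = {"bets": [], "last_spin": None, "rng": random.Random()}
for request in ("BET 10 17", "STAT", "SPIN", "FOLD"):
    print(request, "->", dispatch(table, request))
```

Adding a command means writing one function and adding one entry to COMMANDS; the dispatch logic does not change.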

Methods of TCPServer. In order to process requests, there are two methods of a TCPServer that are of interest. In the following examples the TCPServer instance is the variable s.

s. handle_request: Handle a single request: wait for input, create the handler object to process the request.
s. serve_forever: Handle requests in an infinite loop. Runs until the loop is broken with an exception.
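
The difference between the two methods shows up in this Python 3 sketch, which uses handle_request to serve exactly one connection and then stop. The OneLineHandler class, the worker thread and the STAT request are assumptions chosen to keep the example self-contained.

```python
import socket
import socketserver
import threading

class OneLineHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Reply with a status number followed by the request line itself.
        self.wfile.write(b"200 " + self.rfile.readline())

server = socketserver.TCPServer(("127.0.0.1", 0), OneLineHandler)

# handle_request blocks until one connection has been served, then returns,
# so it runs in a worker thread while the client below makes its request.
worker = threading.Thread(target=server.handle_request)
worker.start()

with socket.create_connection(server.server_address) as client:
    client.sendall(b"STAT\n")
    with client.makefile("rb") as reader:
        reply = reader.readline()

worker.join()           # returns because only one request is handled
server.server_close()
print(reply)
```

A server that should stop after a fixed number of requests can call handle_request in a counted loop instead of using serve_forever.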

Protocol Design Notes

Generally, basic web services do almost everything we need; and they do this kind of thing in a simple and standard way. Using sockets is done either to invent something new or to cope with something very old. Generally, using web services is a better choice than inventing your own protocol.

If you can't, for some reason, make suitable use of web services, here are some lessons gleaned from reading the Internetworking Requests for Comments (RFCs).

Many protocols involve a request-reply conversational style. The client connects to the server and makes requests. The server replies to each request. Some protocols (for example, FTP) may involve a long conversation. Other protocols (for example, HTTP) involve a single request and (sometimes) a single reply. Many web sites leverage HTTP's ability to send multiple replies, but some web sites send a single, tidy response.

Many of the Internet standard requests are short 1- to 4-character commands. The syntax is kept intentionally very simple, using spaces for delimiters. Complex syntax with optional clauses and sophisticated punctuation is often an aid for people. In most web protocols, a sequence of simple commands is used instead of a single, complex statement.

The responses are often 3-digit numbers plus explanatory comments. The application depends on the 3-digit number. The explanatory comments can be written to a log or displayed for a human user. The status numbers are often coded as follows:

1yz: Preliminary reply, more replies will follow.
2yz: Completed.
3yz: More information required. This is typically the start of a dialog.
4yz: Request not completed; trying again makes sense. This is a transient problem like a deadlock, timeout, or file system problem.
5yz: Request not completed because it's in error; trying again doesn't make sense. This is a syntax problem or other error with the request.

The middle digit within the response provides some additional information.

x0z: The response message is syntax-related.
x1z: The response message is informational.
x2z: The response message is about the connection.
x3z: The response message is about accounting or authentication.
x5z: The response message is file-system related.

These codes allow a program to specify multi-part replies using 1yz codes. The status of a client-server dialog is managed with 3yz codes that request additional information. 4yz codes are problems that might get fixed. 5yz codes are problems that can never be fixed (the request doesn't make sense, has illegal options, etc.).
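
A client can act on these digits without understanding the explanatory comment. The sketch below is one possible decoding; the classify and should_retry function names and the short category labels are hypothetical, and the sample reply lines are invented.

```python
# First digit: overall outcome of the request.
SEVERITY = {
    "1": "preliminary",
    "2": "completed",
    "3": "more information required",
    "4": "transient failure",
    "5": "permanent failure",
}

# Middle digit: what the message is about.
CATEGORY = {
    "0": "syntax",
    "1": "information",
    "2": "connection",
    "3": "authentication",
    "5": "file system",
}

def classify(reply_line):
    """Return (severity, category) for a reply line that starts with a 3-digit code."""
    code = reply_line[:3]
    is_number = len(code) == 3 and all(ch in "0123456789" for ch in code)
    if not is_number or code[0] not in SEVERITY:
        raise ValueError("not a status reply: %r" % reply_line)
    return SEVERITY[code[0]], CATEGORY.get(code[1], "unspecified")

def should_retry(reply_line):
    """Only 4yz replies describe problems that might go away."""
    return classify(reply_line)[0] == "transient failure"

print(classify("226 Transfer complete"))
print(classify("550 No such file"))
print(should_retry("421 Service not available"))
```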

Note that protocols like FTP (RFC 959) provide a useful convention for handling multi-line replies: the first line has a - after the status number to indicate that additional lines follow; each subsequent line is indented. The final line repeats the status number. This rule allows us to detect the first of many lines, and absorb all lines until the matching status number is read.
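
This rule can be sketched as a small reader that works on any file-like object, such as the one returned by a socket's makefile method. The read_reply function name is hypothetical, and the sample reply text held in an io.StringIO buffer is invented for the demonstration.

```python
import io

def read_reply(stream):
    """Read one reply from a text stream; return (code, list_of_text_lines)."""
    first = stream.readline().rstrip("\r\n")
    if not first:
        raise EOFError("connection closed before a reply arrived")
    code, text = first[:3], [first[4:]]
    if first[3:4] == "-":
        # Multi-line reply: absorb lines until the status number is repeated.
        while True:
            line = stream.readline()
            if not line:
                raise EOFError("connection closed inside a multi-line reply")
            line = line.rstrip("\r\n")
            if line[:3] == code and line[3:4] in ("", " "):
                text.append(line[4:])
                break
            text.append(line.strip())
    return code, text

# Invented sample data: one multi-line reply followed by a one-line reply.
sample = io.StringIO("211-Features:\r\n MDTM\r\n SIZE\r\n211 End\r\n226 Done\r\n")
print(read_reply(sample))
print(read_reply(sample))
```

Because each call consumes exactly one reply, the same stream can be read again for the next reply in the conversation.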

