So how does the
whole interface work? Most servers expect CGI programs and scripts
to reside in a special directory, usually called cgi-bin,
and/or to have a certain file extension. (These configuration parameters
are discussed in the Configuring the Server
section in this chapter.) When a user opens a URL associated with
a CGI program, the client sends a request to the server asking for
the file.
For the most part, the request for a CGI
program looks the same as it does for all Web documents. The difference
is that when a server recognizes that the address being requested
is a CGI program, the server does not return the file contents verbatim.
Instead, the server tries to execute the program. Here is what a
sample client request might look like:
GET /cgi-bin/welcome.pl HTTP/1.0
Accept: www/source
Accept: text/html
Accept: image/gif
User-Agent: Lynx/2.4 libwww/2.14
From: [email protected]
This GET
request identifies the file to retrieve as /cgi-bin/welcome.pl.
Since the server is configured to recognize all files in the cgi-bin directory tree as CGI programs, it understands that it should execute
the program instead of relaying it directly to the browser. The
string HTTP/1.0 identifies the communication
protocol to use.
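The recognition step can be sketched in a few lines of Python. This is only an illustration, not how any particular server implements it: the names CGI_DIRECTORIES, CGI_EXTENSIONS, is_cgi_request, and parse_request_line are hypothetical, and the extension list is an assumed configuration.

```python
from pathlib import PurePosixPath

# Hypothetical configuration values. A real server reads these from its
# configuration files; the extensions listed here are only an assumption.
CGI_DIRECTORIES = ("/cgi-bin/",)
CGI_EXTENSIONS = (".cgi", ".pl")


def parse_request_line(line):
    """Split a request line into its method, path, and protocol parts."""
    method, path, protocol = line.split()
    return method, path, protocol


def is_cgi_request(url_path):
    """Return True if the target should be executed rather than returned verbatim."""
    in_cgi_dir = any(url_path.startswith(d) for d in CGI_DIRECTORIES)
    has_cgi_ext = PurePosixPath(url_path).suffix.lower() in CGI_EXTENSIONS
    return in_cgi_dir or has_cgi_ext


method, path, protocol = parse_request_line("GET /cgi-bin/welcome.pl HTTP/1.0")
print(method, path, protocol, is_cgi_request(path))
```

Checking both the directory and the extension mirrors the "and/or" in the description above; a server configured with only one of the two rules would simply leave the other tuple empty.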
The client request also passes the
data formats it can accept (www/source, text/html,
and image/gif), identifies itself as a Lynx client,
and sends user information. All this information is made available
to the CGI program, along with additional information from the server.
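As a rough sketch of how this client information might be handed to the program, the Python below converts request header lines into environment-style variables. It assumes the usual CGI naming convention (an HTTP_ prefix, uppercase letters, hyphens turned into underscores, repeated headers joined with commas); the function name headers_to_cgi_env is hypothetical.

```python
def headers_to_cgi_env(header_lines):
    """Map request header lines to CGI-style variable names and values."""
    env = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        key = "HTTP_" + name.strip().upper().replace("-", "_")
        value = value.strip()
        # Repeated headers (such as several Accept lines) are joined with commas.
        env[key] = f"{env[key]}, {value}" if key in env else value
    return env


request_headers = [
    "Accept: www/source",
    "Accept: text/html",
    "Accept: image/gif",
    "User-Agent: Lynx/2.4 libwww/2.14",
]
cgi_env = headers_to_cgi_env(request_headers)
print(cgi_env["HTTP_ACCEPT"])      # www/source, text/html, image/gif
print(cgi_env["HTTP_USER_AGENT"])  # Lynx/2.4 libwww/2.14
```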
The way that
CGI programs get their input depends on the server and on the native
operating system. On a UNIX system, CGI programs
get their input from standard input (STDIN) and
from UNIX
environment variables. These
variables store such information as the input search string (in
the case of a form), the format of the input, the length of the
input (in bytes), the remote host and user passing the input, and
other client information. They also store the server name, the communication
protocol, and the name of the software running the server.
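To make this concrete, here is a minimal Python sketch of a program collecting its input. The variable names (REQUEST_METHOD, QUERY_STRING, CONTENT_TYPE, CONTENT_LENGTH, REMOTE_HOST, SERVER_NAME, SERVER_PROTOCOL, SERVER_SOFTWARE) are the standard CGI ones rather than names taken from this section, and the helper read_cgi_input and the simulated environment are hypothetical.

```python
import io
import os
import sys


def read_cgi_input(environ=os.environ, stdin=None):
    """Collect the input data and a few descriptive variables for a CGI program."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # The input arrives on standard input; read exactly CONTENT_LENGTH bytes.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        stream = stdin if stdin is not None else sys.stdin.buffer
        data = stream.read(length).decode("latin-1")
    else:
        # Otherwise the search string is passed in an environment variable.
        data = environ.get("QUERY_STRING", "")
    return {
        "method": method,
        "data": data,
        "content_type": environ.get("CONTENT_TYPE", ""),
        "remote_host": environ.get("REMOTE_HOST", ""),
        "server_name": environ.get("SERVER_NAME", ""),
        "server_protocol": environ.get("SERVER_PROTOCOL", ""),
        "server_software": environ.get("SERVER_SOFTWARE", ""),
    }


# Simulated environment and standard input, so the sketch runs outside a server.
fake_env = {
    "REQUEST_METHOD": "POST",
    "CONTENT_LENGTH": "9",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "SERVER_SOFTWARE": "NCSA/1.4.2",
}
result = read_cgi_input(fake_env, io.BytesIO(b"name=test&ignored"))
print(result["data"])  # name=test
```

Passing the environment and input stream as parameters is a design choice for testability; under a real server the defaults (os.environ and standard input) would be used.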
Once the CGI program starts running, it can either create
and output a new document, or provide the URL to an existing one.
On UNIX, programs send their output to
standard output (STDOUT)
as a data stream. The data stream consists of two parts. The first
part is either a full or partial HTTP
header that (at minimum) describes what format the returned data
is in (e.g., HTML, plain text, or GIF). A
blank line signifies the end of the header section. The second part
is the body, which contains the data conforming to the format type
reflected in the header. The body is not modified or interpreted
by the server in any way.
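The two-part structure can be illustrated by splitting a data stream at the first blank line. The following Python is a sketch only; split_cgi_output is a hypothetical helper, and it accepts either bare newlines or carriage-return/newline pairs as line endings, which is an assumption about what a server would tolerate.

```python
def split_cgi_output(raw):
    """Split raw program output into a dict of header fields and the body bytes."""
    # Locate the earliest blank line, whichever line-ending style is used.
    candidates = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    candidates = [(pos, sep) for pos, sep in candidates if pos != -1]
    if not candidates:
        raise ValueError("no blank line found: the header section never ends")
    pos, sep = min(candidates)
    head, body = raw[:pos], raw[pos + len(sep):]

    headers = {}
    for line in head.decode("latin-1").splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # The body is returned untouched, just as the server passes it along.
    return headers, body


headers, body = split_cgi_output(b"Content-type: text/html\n\n<HTML>\n</HTML>\n")
print(headers)
print(body)
```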
A CGI program can choose to
send the newly created data directly to the client or to send it
indirectly through the server. If the output consists of a complete
HTTP header, the data is sent directly to the
client without server modification. (It's actually a little more
complicated than this, as we will discuss in Chapter 3, Output from the Common Gateway Interface.) Or,
as is usually the case, the output is sent to the server as a data
stream. The server is then responsible for adding the complete header
information and using the HTTP protocol to transfer
the data to the client.
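A simplified picture of the indirect case, written in Python: the server takes the program's partial header and wraps it in a complete response. This sketch assumes a fixed 200 OK status, omits the Date line, and expects newline-terminated output; complete_response and its default server name are hypothetical, and a real server does more than this.

```python
def complete_response(cgi_output, server_software="NCSA/1.4.2"):
    """Wrap a partial-header program output in a full HTTP/1.0 response."""
    head, sep, body = cgi_output.partition(b"\n\n")
    if not sep:
        raise ValueError("program output has no blank line after its header")
    lines = [
        b"HTTP/1.0 200 OK",                            # status line added by the server
        b"Server: " + server_software.encode("ascii"),
    ]
    lines += head.split(b"\n")                         # headers supplied by the program
    lines.append(b"Content-length: " + str(len(body)).encode("ascii"))
    return b"\r\n".join(lines) + b"\r\n\r\n" + body


partial = b"Content-type: text/html\n\n<H1>Welcome!</H1>\n"
response = complete_response(partial)
print(response.decode("ascii"))
```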
Here is the sample output of
a program generating an HTML virtual document,
with the complete HTTP header:
HTTP/1.0 200 OK
Date: Thursday, 22-February-96 08:28:00 GMT
Server: NCSA/1.4.2
MIME-version: 1.0
Content-type: text/html
Content-length: 2000

<HTML>
<HEAD><TITLE>Welcome to Shishir's WWW Server!</TITLE></HEAD>
<BODY>
<H1>Welcome!</H1>
.
.
</BODY>
</HTML>
The header contains the communication protocol, the date and
time of the response, the server name and version, and the revision
of the
MIME protocol.[1] Most importantly, it also contains
the MIME content type and the number of characters
(equivalent to the number of bytes) of the enclosed data. The data
itself follows the header. Now, the output with the partial HTTP
header:
Content-type: text/html

<HTML>
<HEAD><TITLE>Welcome to Shishir's WWW Server!</TITLE></HEAD>
<BODY>
<H1>Welcome!</H1>
.
.
</BODY>
</HTML>
In this instance, the only header line that is output is the
Content-type
header, which describes the MIME format of the
output. Since the output is in HTML format, text/html
is the content type that is declared.
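A program producing this kind of output needs very little code. The Python sketch below writes the Content-type line, the blank line that ends the header, and then the document; build_welcome_page is a hypothetical name, and the elided middle of the sample page is simply left out.

```python
import sys


def build_welcome_page():
    """Return a partial header, a blank line, and a short HTML body."""
    header = "Content-type: text/html\n"
    body = (
        "<HTML>\n"
        "<HEAD><TITLE>Welcome to Shishir's WWW Server!</TITLE></HEAD>\n"
        "<BODY>\n"
        "<H1>Welcome!</H1>\n"
        "</BODY>\n"
        "</HTML>\n"
    )
    # The empty line between header and body is what ends the header section.
    return header + "\n" + body


# A CGI program sends its result to standard output.
sys.stdout.write(build_welcome_page())
```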
Most CGI programmers
prefer to supply only a partial header. It is much simpler to output
the format and the data than to formulate the complete header information,
which can be left to the server. However, there are times when you
need to send the information directly to the client (by outputting
a complete HTTP header), as you will see in Chapter 3, Output from the Common Gateway Interface.