Chapter 2
Input to the Common Gateway Interface
Much
of the most crucial information needed by CGI applications is made
available via UNIX environment variables. Programs
can access this information as they would any environment variable
(e.g., via the %ENV associative array in Perl).
This section concentrates on showing examples of some of the more
typical uses of environment variables in CGI programs. First, however,
Table 2.1 shows a full list of environment
variables available for CGI.
Table 2.1: List of CGI Environment Variables
Environment Variable | Description
GATEWAY_INTERFACE | The revision of the Common Gateway Interface that the server uses.
SERVER_NAME | The server's hostname or IP address.
SERVER_SOFTWARE | The name and version of the server software that is answering the client request.
SERVER_PROTOCOL | The name and revision of the information protocol the request came in with.
SERVER_PORT | The port number of the host on which the server is running.
REQUEST_METHOD | The method with which the information request was issued.
PATH_INFO | Extra path information passed to a CGI program.
PATH_TRANSLATED | The translated version of the path given by the variable PATH_INFO.
SCRIPT_NAME | The virtual path (e.g., /cgi-bin/program.pl) of the script being executed.
DOCUMENT_ROOT | The directory from which Web documents are served.
QUERY_STRING | The query information passed to the program. It is appended to the URL with a "?".
REMOTE_HOST | The remote hostname of the user making the request.
REMOTE_ADDR | The remote IP address of the user making the request.
AUTH_TYPE | The authentication method used to validate a user.
REMOTE_USER | The authenticated name of the user.
REMOTE_IDENT | The user making the request. This variable will only be set if NCSA IdentityCheck flag is enabled, and the client machine supports the RFC 931 identification scheme (ident daemon).
CONTENT_TYPE | The MIME type of the query data, such as "text/html".
CONTENT_LENGTH | The length of the data (in bytes or the number of characters) passed to the CGI program through standard input.
HTTP_FROM | The email address of the user making the request. Most browsers do not support this variable.
HTTP_ACCEPT | A list of the MIME types that the client can accept.
HTTP_USER_AGENT | The browser the client is using to issue the request.
HTTP_REFERER | The URL of the document that the client points to before accessing the CGI program.
We'll use examples to demonstrate how these variables are
typically used within a CGI program.
Let's
start with a simple program that displays various information about
the server, such as the CGI and HTTP revisions
used and the name of the server software.
#!/usr/local/bin/perl
print "Content-type: text/html", "\n\n";
print "<HTML>", "\n";
print "<HEAD><TITLE>About this Server</TITLE></HEAD>", "\n";
print "<BODY><H1>About this Server</H1>", "\n";
print "<HR><PRE>";
print "Server Name: ", $ENV{'SERVER_NAME'}, "<BR>", "\n";
print "Running on Port: ", $ENV{'SERVER_PORT'}, "<BR>", "\n";
print "Server Software: ", $ENV{'SERVER_SOFTWARE'}, "<BR>", "\n";
print "Server Protocol: ", $ENV{'SERVER_PROTOCOL'}, "<BR>", "\n";
print "CGI Revision: ", $ENV{'GATEWAY_INTERFACE'}, "<BR>", "\n";
print "<HR></PRE>", "\n";
print "</BODY></HTML>", "\n";
exit (0);
Let's go through this program step by step. The first line
is very important. It instructs the server to use the Perl interpreter
located in the /usr/local/bin directory to
execute the CGI program. Without this line, the server won't know
how to run the program, and will display an error stating that it
cannot execute the program.
Once the CGI script is running, the first thing it needs to
generate is a valid HTTP header, ending with
a blank line. The header generally contains a content type, also
known as a MIME type. In this case, the content
type of the data that follows is text/html.
After
the MIME content type is output, we can go ahead
and display output in HTML. We send the information
directly to standard output, which is read and processed by the
server, and then sent to the client for display. Five environment
variables are output, consisting of the server name (the IP name
or address of the machine where the server is running), the port
the server is running on, the server software, and the HTTP
and CGI revisions. In Perl, you can access the environment variables
through the %ENV associative array, keyed by
name.
After the header and its blank line, typical output of this program
might look like this:
<HTML>
<HEAD><TITLE>About this Server</TITLE></HEAD>
<BODY><H1>About this Server</H1>
<HR><PRE>Server Name: bu.edu<BR>
Running on Port: 80<BR>
Server Software: NCSA/1.4.2<BR>
Server Protocol: HTTP/1.0<BR>
CGI Revision: CGI/1.1<BR>
<HR></PRE>
</BODY></HTML>
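The same report can also be produced by looping over a list of variable names rather than writing one print statement per variable. The following is only a sketch of that idea, not part of the program above: the helper name server_report and the "(not set)" placeholder are assumptions made for illustration, and the output is sent as plain text to keep the example short.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Labels and the CGI environment variables they report, in display order.
my @fields = (
    [ 'Server Name',     'SERVER_NAME' ],
    [ 'Running on Port', 'SERVER_PORT' ],
    [ 'Server Software', 'SERVER_SOFTWARE' ],
    [ 'Server Protocol', 'SERVER_PROTOCOL' ],
    [ 'CGI Revision',    'GATEWAY_INTERFACE' ],
);

# server_report is a hypothetical helper: it takes a reference to a hash of
# environment variables and returns one "Label: value" line per field.
# Variables that are missing from the hash are shown as "(not set)".
sub server_report {
    my ($env) = @_;
    my @lines;
    foreach my $field (@fields) {
        my ($label, $name) = @$field;
        my $value = defined $env->{$name} ? $env->{$name} : '(not set)';
        push @lines, "$label: $value";
    }
    return @lines;
}

print "Content-type: text/plain", "\n\n";
print "$_\n" foreach server_report(\%ENV);
```

Passing the hash as an argument, instead of reading %ENV inside the subroutine, makes it possible to try the helper from the command line with made-up values.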
Now,
let's look at a slightly more complicated example. One of the more
useful items that the server passes to the CGI program is the client
(or browser) name. We can put this information to good use by checking
the browser type, and then displaying either a text or graphic document.
Different Web browsers support different HTML tags and
different types of information. If your CGI program generates an
inline image, keep in mind that some browsers support
<IMG> extensions
that others don't, some browsers support JPEG
images as well as GIF images, and some browsers (notably, Lynx and
the old www client) don't support images at
all. Using the
HTTP_USER_AGENT
environment variable, you can determine which browser is being used,
and with that information you can fine-tune your CGI program to
generate output that is optimized for that browser.
Let's build a short program that delivers a different document
depending on whether the browser supports graphics. First, identify
the browsers that you know don't support graphics. Then get the
name of the browser from the HTTP_USER_AGENT
variable:
#!/usr/local/bin/perl
$nongraphic_browsers = 'Lynx|CERN-LineMode';
$client_browser = $ENV{'HTTP_USER_AGENT'};
The variable $nongraphic_browsers contains
a list of the browsers that don't support graphics. Each browser
is separated by the "|" character, which represents alternation
in the regular expression we use later in the program. In this instance,
there are only two browsers listed, Lynx and www.
("CERN-LineMode" is the string the www browser
uses to identify itself.)
The HTTP_USER_AGENT
environment variable contains the name of the browser. All environment
variables that start with HTTP_ represent information
that the client sends in its request headers. The server adds the prefix
and passes this data, along with the other information, to the CGI program.
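The naming rule can be sketched in a few lines of Perl. The helper name header_to_env_name is hypothetical, and the sketch only illustrates the convention (upper-case the header name, replace each "-" with "_", and add the HTTP_ prefix); it does not show how any particular server implements it.

```perl
use strict;
use warnings;

# header_to_env_name is a hypothetical helper that mirrors the naming
# convention for request headers: upper-case the header name, turn "-"
# into "_", and add the "HTTP_" prefix.
sub header_to_env_name {
    my ($header) = @_;
    my $name = uc $header;
    $name =~ tr/-/_/;
    return "HTTP_$name";
}

foreach my $header ('User-Agent', 'Accept', 'Referer', 'From') {
    print "$header -> ", header_to_env_name($header), "\n";
}
```

Note that the content type and content length do not follow this rule: as Table 2.1 shows, they arrive as CONTENT_TYPE and CONTENT_LENGTH, without the prefix.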
Now identify the files that you intend to return depending
on whether the browser supports graphics:
$graphic_document = "full_graphics.html";
$text_document = "text_only.html";
The variables $graphic_document and $text_document
contain the names of the two documents that we will use.
The next thing to do is simply to check if the browser name
is included in the list of non-graphic browsers.
if ($client_browser =~ /$nongraphic_browsers/) {
    $html_document = $text_document;
} else {
    $html_document = $graphic_document;
}
The conditional checks whether the client browser is one that
we know does not support graphics. If it is, the variable $html_document
will contain the name of the text-only version of the HTML
file. Otherwise, it will contain the name of the version of the
HTML document that contains graphics.
Finally, print the partial header and open the file. (We need
to get the document root from the
DOCUMENT_ROOT
variable and prepend it to the filename, so the Perl program can
locate the document in the file system.)
print "Content-type: text/html", "\n\n";
$document_root = $ENV{'DOCUMENT_ROOT'};
$html_document = join ("/", $document_root, $html_document);
if (open (HTML, "<" . $html_document)) {
    while (<HTML>) {
        print;
    }
    close (HTML);
} else {
    print "Oops! There is a problem with the configuration on this system!", "\n";
    print "Please inform the Webmaster of the problem. Thanks!", "\n";
}
exit (0);
If the filename stored in $html_document
can be opened for reading (as specified by the "<" character),
the while loop iterates through the file and
displays it. The open command creates a handle,
HTML, which is then used to access the file.
During the while loop, as Perl reads a line
from the HTML file handle, it places that line
in its default variable $_. The print
statement without any arguments displays the value stored in $_.
After the entire file is displayed, it is closed. If the file cannot
be opened, an error message is output.
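The browser check is easier to try out away from the server when it lives in its own subroutine. Here is a minimal sketch along those lines. The name choose_document is hypothetical, the two browser strings are samples chosen for illustration, and sending the graphic version to a client that reports no browser name at all is an assumption of this sketch.

```perl
use strict;
use warnings;

my $nongraphic_browsers = 'Lynx|CERN-LineMode';

# choose_document is a hypothetical helper: given the value of
# HTTP_USER_AGENT, it returns the name of the document to send.
sub choose_document {
    my ($user_agent) = @_;
    # Assumption: a client that sends no browser name gets the graphic version.
    $user_agent = '' unless defined $user_agent;
    return $user_agent =~ /$nongraphic_browsers/
        ? 'text_only.html'
        : 'full_graphics.html';
}

# Sample browser strings, used only to exercise both branches.
print choose_document('Lynx/2.3 BETA libwww/2.14'), "\n";
print choose_document('Mozilla/1.1N (X11; I; SunOS 5.3 sun4m)'), "\n";
```

In the CGI program itself, the call would be choose_document($ENV{'HTTP_USER_AGENT'}).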
Suppose you have a set of HTML
documents: one for users in your IP domain (e.g., bu.edu), and another
one for users outside of your domain. Why would anyone want to do
this, you may ask? Say you have a document containing internal company
phone numbers, meeting schedules, and other company information.
You certainly don't want everyone on the Internet to see this document.
So you need to set up some type of security to keep your documents
away from prying eyes.
You can configure most servers to restrict access to your
documents according to what domain the user connects from. For example,
under the NCSA server, you can list the domains which you want to
allow or deny access to certain directories by editing the
access.conf configuration file. However,
you can also control domain-based access in a CGI script. The advantage
of using a CGI script is that you don't have to turn away other
domains, just send them different documents. Let's look at a CGI
program that performs pseudo-authentication:
#!/usr/local/bin/perl
$host_address = 'bu\.edu';
$ip_address = '128\.197';
These two variables hold the IP domain name and address that
are considered local. In other words, users in this domain can access
the internal information. The period is "escaped" in both of these
variables (by placing a "\" before the character), because the variables
will be interpolated in a regular expression later in this program.
The "." character has a special significance in a regular expression;
it is used to match any character other than a newline.
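To see why the escape matters, compare an escaped and an unescaped pattern against two hostnames. This is only an illustration: the helper name host_matches is hypothetical, and "www.buxedu" is a made-up hostname that is not in the bu.edu domain.

```perl
use strict;
use warnings;

# host_matches is a hypothetical helper: it reports (1 or 0) whether the
# hostname ends with the given pattern, which is interpolated into the
# regular expression just as $host_address is later in this program.
sub host_matches {
    my ($host, $pattern) = @_;
    return $host =~ /$pattern$/ ? 1 : 0;
}

my $escaped   = 'bu\.edu';   # the period matches only a literal "."
my $unescaped = 'bu.edu';    # the period matches any character but newline

foreach my $host ('www.bu.edu', 'www.buxedu') {
    printf "%-12s escaped: %d  unescaped: %d\n",
        $host, host_matches($host, $escaped), host_matches($host, $unescaped);
}
```

With the unescaped pattern, the made-up host is accepted as if it were local; with the escaped pattern it is rejected.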
$remote_address = $ENV{'REMOTE_ADDR'};
$remote_host = $ENV{'REMOTE_HOST'};
The environment variable
REMOTE_ADDR
contains the numeric IP address of the remote user, while REMOTE_HOST
contains the remote user's hostname. There are
times when REMOTE_HOST will contain only the address rather than the name
(if the DNS server does not have an entry for
the domain). In such a case, you can use the following snippet of
code to convert an IP address to its corresponding name:
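# Split the dotted address into its four numeric fields
@subnet_numbers = split (/\./, $remote_address);
# Pack the fields into the four-byte binary form gethostbyaddr expects
$packed_address = pack ("C4", @subnet_numbers);
# The second argument, 2, is the usual numeric value of AF_INET
($remote_host) = gethostbyaddr ($packed_address, 2);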
Don't worry about this code yet. We will discuss functions
like these in Chapter 9, Gateways, Databases, and Search/Index Utilities. Now, let's continue with the rest of this
program.
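As an aside, the same lookup can be written with Perl's standard Socket module, which supplies inet_aton to do the packing and the AF_INET constant in place of the literal 2. The sketch below assumes that module is installed. The helper name reverse_lookup is hypothetical, 128.197.1.2 is a sample address, and falling back to the address when no name is found is a choice made for this example.

```perl
use strict;
use warnings;
use Socket;    # exports inet_aton and AF_INET by default

# reverse_lookup is a hypothetical helper: it returns the hostname
# registered for a dotted-quad address, or the address itself when
# the address cannot be packed or the lookup finds no name.
sub reverse_lookup {
    my ($address) = @_;
    my $packed = inet_aton($address);
    return $address unless defined $packed;
    my ($name) = gethostbyaddr($packed, AF_INET);
    return defined $name ? $name : $address;
}

# inet_aton produces the same four bytes as the split-and-pack approach.
my $sample = '128.197.1.2';    # sample address, for illustration only
my $by_hand = pack("C4", split(/\./, $sample));
print "Packed forms agree\n" if inet_aton($sample) eq $by_hand;

# In a CGI program:
#   my $remote_host = reverse_lookup($ENV{'REMOTE_ADDR'});
```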
$local_users = "internal_info.html";
$outside_users = "general.html";
if (($remote_host =~ /\.$host_address$/) && ($remote_address =~ /^$ip_address/)) {
    $html_document = $local_users;
} else {
    $html_document = $outside_users;
}
The remote host is examined to see if it ends with the domain
name, as specified by the $host_address variable,
and the remote address is checked to make sure it starts with the
domain address stored in $ip_address. Depending
on the outcome of the conditional, the $html_document
variable is set accordingly.
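The test can be wrapped in a subroutine so that it can be checked against sample values before the program is installed. This is a sketch under stated assumptions: the name is_local is hypothetical, the hostnames and addresses in the usage lines are samples, and the pattern for the address requires a period after the prefix, which is slightly stricter than the conditional above.

```perl
use strict;
use warnings;

my $host_address = 'bu\.edu';
my $ip_address   = '128\.197';

# is_local is a hypothetical helper: it returns 1 when the hostname ends
# with the local domain and the address starts with the local prefix,
# and 0 otherwise (including when either value is missing).
sub is_local {
    my ($remote_host, $remote_address) = @_;
    return 0 unless defined $remote_host && defined $remote_address;
    # Assumption: the trailing \. keeps a prefix such as 128.19 from
    # also matching 128.197; the original test omits it.
    return ($remote_host =~ /\.$host_address$/
         && $remote_address =~ /^$ip_address\./) ? 1 : 0;
}

# Sample values, for illustration only.
print is_local('cs.bu.edu', '128.197.10.5'), "\n";
print is_local('www.example.com', '128.197.10.5'), "\n";
```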
print "Content-type: text/html", "\n\n";
$document_root = $ENV{'DOCUMENT_ROOT'};
$html_document = join ("/", $document_root, $html_document);
if (open (HTML, "<" . $html_document)) {
    while (<HTML>) {
        print;
    }
    close (HTML);
} else {
    print "Oops! There is a problem with the configuration on this system!", "\n";
    print "Please inform the Webmaster of the problem. Thanks!", "\n";
}
exit (0);
The specified document is opened and the information stored
within it is
displayed.
In addition to domain-based
security, most HTTP servers also support a more
complicated method of security, known as user authentication. When
configured for user authentication, specified files or directories
are set up to allow access only by certain users. A user attempting
to open the URLs associated with these files is prompted for a name
and password.
The user name and
password
(which, incidentally, need have no relation to the user's real user
name and password on any system) are checked by the server, and if
they are legitimate, the user is allowed access. In addition to allowing
the user access to the protected file, the server also maintains
the user's name and passes it to any subsequent CGI programs that
are called. The server passes the user name in the
REMOTE_USER environment variable.
A CGI script can therefore use server authentication information
to identify users.[1]
This isn't what user authentication was meant for, but if the information
is available, it can come in mighty handy. Here is a snippet of
code that illustrates what you can do with the REMOTE_USER
environment variable:
$remote_user = $ENV{'REMOTE_USER'};
if ($remote_user eq "jack") {
    print "Welcome Jack, how is Jack Manufacturing doing these days?", "\n";
} elsif ($remote_user eq "bob") {
    print "Hey Bob, how's the wife doing? I heard she was sick.", "\n";
}
.
.
.
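When there are more than a handful of users, a chain of elsif tests gets long. One alternative, sketched here, keeps the messages in an associative array keyed by user name. The names %greetings and greeting_for are hypothetical, and the fallback messages for unknown and unauthenticated users are assumptions added for this example.

```perl
use strict;
use warnings;

# Hypothetical table mapping authenticated user names to greetings.
my %greetings = (
    'jack' => 'Welcome Jack, how is Jack Manufacturing doing these days?',
    'bob'  => "Hey Bob, how's the wife doing? I heard she was sick.",
);

# greeting_for is a hypothetical helper: it returns the message for the
# given user name, with a generic fallback for anyone not in the table.
sub greeting_for {
    my ($remote_user) = @_;
    # REMOTE_USER is not set when the request was not authenticated.
    return 'Welcome, guest!' unless defined $remote_user;
    return exists $greetings{$remote_user}
        ? $greetings{$remote_user}
        : "Welcome, $remote_user!";
}

print greeting_for($ENV{'REMOTE_USER'}), "\n";
```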
Server authentication does not provide complete security:
Since the user name and password are sent unencrypted over the network,
it's possible for a "snoop" to look at this data. For that reason,
it's a bad idea to use your real login name and password for server
authentication.
Companies that provide services on the
Web often want to know from what server (or document) the remote
users came. For example, say you visit the server located at https://www.cgi.edu,
and then from there you go to https://www.flowers.com. A CGI program
on www.flowers.com can actually determine that you were previously
at www.cgi.edu.
How is this useful? For advertising, of course. If a company
determines that 90% of all users that visit them come from a certain
server, then they can perhaps work something out financially with
the webmaster at that server to provide advertising. Also, if your
site moves or the content at your site changes dramatically, you
can help avoid frustration among your visitors by informing the
webmasters at the sites referring to yours to change their links.
Here is a simple program that displays this "referral" information:
#!/usr/local/bin/perl
print "Content-type: text/plain", "\n\n";
$remote_address = $ENV{'REMOTE_ADDR'};
$referral_address = $ENV{'HTTP_REFERER'};
print "Hello user from $remote_address!", "\n";
print "The last site you visited was: $referral_address. Am I a genius or what?", "\n";
exit (0);
The environment variable
HTTP_REFERER,
which is passed to the server by the client, contains the URL of the
document the user was viewing before requesting the current one.
Now for the caveats. There are three important things you
need to remember before using the HTTP_REFERER
variable:
- First, not all browsers set this variable.
- Second, if a user accesses your server first, right
at startup, this variable will not be set.
- Third, if someone accesses your site via a bookmark
or just by typing in the URL, the referring document is meaningless.
So if you are keeping some sort of count to determine where users
are coming from, it won't be totally accurate.
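Given these caveats, a program that counts referrals should check that the variable is set before using it. The following sketch shows one way to do that and to reduce the URL to a hostname for counting. The helper name referring_host is hypothetical, and accepting only URLs that begin with http:// or https:// is an assumption of this example.

```perl
use strict;
use warnings;

# referring_host is a hypothetical helper: it returns the lower-cased
# host part of a referring URL, or undef when the value is missing or
# does not look like an http or https URL.
sub referring_host {
    my ($referer) = @_;
    return undef unless defined $referer;
    return $referer =~ m{^https?://([^/:?#]+)}i ? lc $1 : undef;
}

my $host = referring_host($ENV{'HTTP_REFERER'});

print "Content-type: text/plain", "\n\n";
if (defined $host) {
    print "You followed a link from $host.", "\n";
} else {
    print "No referring document was reported.", "\n";
}
```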