Earlier in this chapter we mentioned
the application/x-www-form-urlencoded MIME
type. The browser uses this MIME type to encode
the form data.
First, the name of each form element (specified by the NAME
attribute) is paired with the value entered by the user to create
a key-value pair. For example, if the user entered "30" when asked
for the age, the key-value pair would be "age=30". Each key-value
pair is separated from the next by the "&" character.
Second, since the form element names and the form data are
arbitrary text, this text could contain characters that have a special
meaning in a URL (such as "&", "=", or a space) and would confuse the
browser or the server. To prevent such errors, the encoding scheme
replaces each "special" character with a "%" followed by the character's
two-digit hexadecimal code. These "special" characters include
control characters and certain non-alphanumeric symbols. For example,
the string "Thanks for the help!" would be converted to "Thanks%20for%20the%20help%21".
(In practice, browsers encode a space in form data as "+" rather than "%20",
which is why decoding must also handle the "+" character.)
This process is repeated for each key-value pair to create a query
string.[1]
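As a rough illustration, the two steps can be sketched in Perl. This is a minimal sketch, not the browser's actual code: the subroutine names url_encode and build_query_string are hypothetical, and the sketch follows this section's convention of writing a space as "%20" (a real browser sends "+").

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character that is not a letter,
# digit, or underscore with "%" and its two-digit hexadecimal code.
# Spaces become "%20" here, matching the example in the text.
sub url_encode {
    my ($text) = @_;
    $text =~ s/(\W)/sprintf("%%%02X", ord($1))/eg;
    return $text;
}

# Hypothetical helper: takes name/value pairs as a flat list (so the
# field order is kept), encodes each name and value, joins each pair
# with "=" and the pairs with "&".
sub build_query_string {
    my @fields = @_;
    my @encoded;
    while (my ($name, $value) = splice(@fields, 0, 2)) {
        push @encoded, url_encode($name) . "=" . url_encode($value);
    }
    return join("&", @encoded);
}

# prints user=Larry%20Bird&age=35&pass=testing
print build_query_string(user => "Larry Bird", age => 35, pass => "testing"), "\n";
```

Passing the fields as a list rather than a hash keeps them in the order the form defines them, which is how they appear in the query string.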
For text and password fields, the user's input is
the value. If no information was entered, the key-value pair is still
sent, with the value left blank (e.g., "name=").
For radio buttons and checkboxes, the VALUE
attribute supplies the value that is sent when the button is checked.
If no VALUE is specified, the value defaults
to "on". An unchecked checkbox or radio button is not sent as a
key-value pair at all; it is simply ignored.
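The rules for text fields, password fields, checkboxes, and radio buttons can be summarized in a short Perl sketch. The hash layout used to describe each element (type, name, value, checked) and the subroutine name form_pairs are assumptions made for illustration only; a browser does not expose form elements to you this way.

```perl
use strict;
use warnings;

# Hypothetical representation: each form element is a hash reference
# with type, name, value (user input or VALUE attribute) and, for
# buttons, checked. Returns a flat list of name/value pairs.
sub form_pairs {
    my @elements = @_;
    my @pairs;
    foreach my $el (@elements) {
        my $type = $el->{type};
        if ($type eq "text" || $type eq "password") {
            # Always sent, even when empty (e.g. "name=").
            push @pairs, $el->{name}, defined $el->{value} ? $el->{value} : "";
        } elsif ($type eq "checkbox" || $type eq "radio") {
            # Unchecked buttons are skipped entirely.
            next unless $el->{checked};
            # A checked button without a VALUE attribute sends "on".
            push @pairs, $el->{name}, defined $el->{value} ? $el->{value} : "on";
        }
    }
    return @pairs;
}

my @example = form_pairs(
    { type => "text",     name => "user",   value => "" },
    { type => "checkbox", name => "member", checked => 1 },
    { type => "checkbox", name => "news",   value => "yes", checked => 0 },
);
# prints "user=" and "member=on"; the unchecked "news" box is omitted
for (my $i = 0; $i < @example; $i += 2) {
    print "$example[$i]=$example[$i + 1]\n";
}
```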
The CGI program then has to "decode" this information in order
to access the form data. The encoding scheme is the same for both
GET and POST.
There
are two methods for sending form data: GET and
POST. The main difference between these methods
is the way in which the form data is passed to the CGI program.
If the GET method is used, the query string is
simply appended to the URL of the program when the client issues
the request to the server. This query string can then be accessed
by using the environment variable
QUERY_STRING.
Here is a sample GET request by the client, which
corresponds to the first form example:
GET /cgi-bin/program.pl?user=Larry%20Bird&age=35&pass=testing HTTP/1.0
Accept: www/source
Accept: text/html
Accept: text/plain
User-Agent: Lynx/2.4 libwww/2.14
As we discussed in Chapter 2, the query string is appended
to the URL after the "?" character.[2]
The server then takes this string and assigns it to the environment
variable QUERY_STRING.
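On the server side, a CGI program written in Perl sees its environment variables through the %ENV hash. Here is a minimal sketch, using a hypothetical get_query_string subroutine, that fetches the still-encoded query string of a GET request:

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw (still URL-encoded) query string
# that the server stored in QUERY_STRING, or "" if none was sent.
sub get_query_string {
    return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : "";
}

print "Content-type: text/plain\n\n";
print "Raw query string: ", get_query_string(), "\n";
```

Nothing is decoded yet; that happens in the decoding steps described later in this section.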
The GET method has both advantages and
disadvantages. The main advantage is that you can access the CGI
program with a query without using a form. In other words, you can
create "canned queries." Basically,
you are passing parameters to the program. For example, if you want
to send the previous query to the program directly, you can do this:
<A HREF="/cgi-bin/program.pl?user=Larry%20Bird&age=35&pass=testing">CGI
Program</A>
Here is a simple program that will aid you in encoding data:
#!/usr/local/bin/perl
print "Please enter a string to encode: ";
my $string = <STDIN>;
chomp ($string);
# Replace each non-word character with "%" and its two-digit hex code
$string =~ s/(\W)/sprintf("%%%02X", ord($1))/eg;
print "The encoded string is: ", "\n";
print $string, "\n";
exit(0);
This is not a CGI program; it is meant to be run from the
shell. When you run it, the program prompts you for
a string to encode. The
<STDIN>
operator reads one line from standard input; it is the same
<FILEHANDLE> construct we have been using, applied to the STDIN
filehandle. The chomp command removes the trailing newline character
("\n") from the input string, if one is present. Finally, every character
that is not a letter, digit, or underscore (\W) is replaced by "%" and its
two-digit hexadecimal code, produced by sprintf with the "%02X" format,
and the result is printed to standard output.
A query is one method of passing information to a CGI program
via the URL. The other method involves sending extra path information
to the program. Here is an example:
<A HREF="/cgi-bin/program.pl/user=Larry%20Bird/age=35/pass=testing">CGI Program</A>
The string "/user=Larry%20Bird/age=35/pass=testing" will be
placed in the environment variable
PATH_INFO
when the request gets to the CGI program. This method of passing
information to the CGI program is generally used to provide file
information, rather than form data. The NCSA imagemap program works
in this manner: the name of the map file for the image is passed as
extra path information.
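A program can read the extra path information from %ENV in the same way as a query string. The sketch below assumes a hypothetical path_segments subroutine that splits PATH_INFO on "/" and discards the empty field before the leading slash; as with QUERY_STRING, the segments are still encoded.

```perl
use strict;
use warnings;

# Hypothetical helper: split PATH_INFO (e.g. "/user=Larry%20Bird/age=35")
# into its non-empty segments, which remain URL-encoded.
sub path_segments {
    my $path = defined $ENV{PATH_INFO} ? $ENV{PATH_INFO} : "";
    return grep { length } split(m{/}, $path);
}

print "Content-type: text/plain\n\n";
print "$_\n" foreach path_segments();
```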
If you use the "question-mark" method or the pathname method
to pass data to the program, you have to be careful: the browser
or the server may truncate data that exceeds a certain length, and
that limit varies from one implementation to another.
Now, here is a sample POST request:
POST /cgi-bin/program.pl HTTP/1.0
Accept: www/source
Accept: text/html
Accept: text/plain
User-Agent: Lynx/2.4 libwww/2.14
Content-type: application/x-www-form-urlencoded
Content-length: 37

user=Larry%20Bird&age=35&pass=testing
The main advantage of the POST method is
that the length of the data is not subject to URL length limits, so you
don't have to worry about the client or server truncating it. To get data
sent by the POST method, the CGI program reads from standard input.
The disadvantage is that you cannot create "canned queries."
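Reading POST data comes down to reading CONTENT_LENGTH bytes from standard input. The sketch below uses a hypothetical read_post_body subroutine; it loops because Perl's read function is not guaranteed to return all the requested bytes in one call, and it takes a filehandle argument so the same code can read from STDIN or from any other handle.

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $length bytes from $fh, appending each
# chunk at the end of $body until the full length has arrived or input ends.
sub read_post_body {
    my ($fh, $length) = @_;
    my $body = "";
    while (length($body) < $length) {
        my $got = read($fh, $body, $length - length($body), length($body));
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # end of input before CONTENT_LENGTH bytes
    }
    return $body;
}

# In a CGI program the body arrives on STDIN; only read it for POST requests.
if (($ENV{REQUEST_METHOD} || "") eq "POST") {
    my $body = read_post_body(\*STDIN, $ENV{CONTENT_LENGTH} || 0);
    print "Content-type: text/plain\n\n";
    print "Raw POST data: $body\n";
}
```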
In order
to access the information contained within the form, a decoding
protocol must be applied to the data. First, the program must determine
how the data was passed by the client. This can be done by examining
the value in the environment variable REQUEST_METHOD.
If the value indicates a GET request, either
the query string or the extra path information must be obtained
from the environment variables. On the other hand, if it is a POST
request, the number of bytes specified by the
CONTENT_LENGTH environment
variable must be read from standard input. The algorithm for decoding
form data follows:
- Determine the request method (either
GET or POST) by checking the
REQUEST_METHOD environment variable.
- If the method is GET, read
the query string from QUERY_STRING and/or the
extra path information from PATH_INFO.
- If the method is POST, determine
the size of the request using CONTENT_LENGTH
and read that amount of data from the standard input.
- Split the query string on the "&" character,
which separates key-value pairs (the format is key=value&key=value...).
- Split each pair on the first "=" and decode the key and the value:
convert each "+" to a space, then convert each "%" hexadecimal code
back to the character it represents.
- Create a key-value table with the key as the index.
(If this sounds complicated, don't worry; just use a high-level
language like Perl, which makes it pretty easy.)
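Here is a minimal Perl sketch of the decoding algorithm. The subroutine names parse_form_data and url_decode are hypothetical; the sketch reads QUERY_STRING for GET and CONTENT_LENGTH bytes of standard input for POST, ignores PATH_INFO, and keeps only the last value when a key appears more than once (for example, several checkboxes sharing one NAME). The "+" characters are converted before the "%" codes, so an encoded plus sign ("%2B") survives as a literal "+".

```perl
use strict;
use warnings;

# Hypothetical helper: undo the URL encoding of one key or value.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;                               # "+" stands for a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # "%XX" -> character
    return $text;
}

# Hypothetical helper: return the form data as a hash (key-value table).
sub parse_form_data {
    # Assume GET when run outside a web server, e.g. from the shell.
    my $method = $ENV{REQUEST_METHOD} || "GET";
    my $data = "";
    if ($method eq "GET") {
        $data = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : "";
    } elsif ($method eq "POST") {
        my $length = $ENV{CONTENT_LENGTH} || 0;
        while (length($data) < $length) {
            my $got = read(STDIN, $data, $length - length($data), length($data));
            last unless $got;    # stop at end of input or on a read error
        }
    } else {
        die "Unsupported request method: $method\n";
    }

    my %form;
    foreach my $pair (split(/&/, $data)) {
        next unless length $pair;                 # skip empty pairs ("&&")
        my ($key, $value) = split(/=/, $pair, 2); # split on the first "="
        $value = "" unless defined $value;
        $form{url_decode($key)} = url_decode($value);
    }
    return %form;
}

my %fields = parse_form_data();
print "Content-type: text/plain\n\n";
print "$_ = $fields{$_}\n" foreach sort keys %fields;
```

Because the method is checked at run time, the same subroutine works for forms that use either GET or POST.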
You might wonder why a program needs to check the request
method when you know exactly what type of request the form is
sending. The reason is that by designing the program in this manner,
you can use one module that takes care of both types of requests.
It can also be beneficial in another way.
Say you have a form that sends a POST request,
and a program that decodes both GET and POST
requests. Suppose you know that there are three fields: user, age,
and pass. You can fill out the form, and the client will send the
information as a POST request. However, you can
also send the information as a query string because the program
can handle both types of requests; this means that you can save
the step of filling out the form. You can even save the complete
request as a hotlist item, or as a link on another page.