It is actually a fairly simple process. Your CGI
script must be able to perform two tasks:
- Decode the form data. Remember, all data in the
form will be URL encoded (let's ignore Netscape 2.0 multipart MIME
messages).
- Open a pipe to mail (or sendmail),
and write the form data to the pipe.
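The decoding step is what libraries such as cgi-lib.pl do for you. As a rough sketch of the idea (it handles only URL-encoded data, and the subroutine names url_decode and parse_form_data are hypothetical), it looks something like this:

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" becomes a space and
# "%XX" becomes the character whose hex code is XX.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split a string such as "name=Joe+Smith&to=joe%40example.com" into
# a hash of decoded field names and values.  A field that occurs more
# than once has its values joined with "\0", as cgi-lib.pl does.
sub parse_form_data {
    my ($query) = @_;
    my %in;
    foreach my $pair (split /[&;]/, $query) {
        next unless length $pair;          # skip empty "&&" pairs
        my ($name, $value) = split /=/, $pair, 2;
        $name  = url_decode($name);
        $value = url_decode(defined $value ? $value : '');
        $in{$name} = exists $in{$name} ? "$in{$name}\0$value" : $value;
    }
    return %in;
}

# Example: decode the data of a GET request.
my %in = parse_form_data(defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '');
```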
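Let's assume you have an associative array called %in
(for those of you using Steven Brenner's cgi-lib.pl
library, this should be familiar) that contains the form data. Here
is how you would deal with sendmail:
# Strip newlines so a visitor cannot sneak extra headers into the mail
foreach $field ('from', 'name', 'to', 'subject') { $in{$field} =~ s/[\r\n]//g; }
# Keep user input off the shell command line; -t reads recipients from the headers
open (SENDMAIL, "| /usr/bin/sendmail -t -n -oi") || die "Cannot run sendmail: $!\n";
print SENDMAIL <<End_of_Mail;
From: $in{'name'} <$in{'from'}>
To: $in{'to'}
Reply-To: $in{'from'}
Subject: $in{'subject'}

$in{'message'}
End_of_Mail
close (SENDMAIL);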
One thing you should note is the "Reply-To:"
header. Since the server is running as user "nobody," the sender
information in the mail headers may be wrong, and replies could go to
"nobody" instead of the person who filled out the form. The "Reply-To:"
field makes sure that replies go to the address the user entered.
There are a lot of mail gateways in operation that use mail
in the following format:
open (MAIL, "| mail -s 'Subject' $in{'to'}");    # <-- Possible security hole!!!!
If you don't check the $in{'to'} variable
for shell metacharacters, you're in for a major headache! For example,
if some malicious user enters the following:
you'll have a major problem on your hands.
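One way to check the variable is to accept only a conservative set of characters before the address goes anywhere near the shell. Here is a minimal sketch; the helper name is_safe_address is hypothetical, and the pattern is deliberately stricter than real mail address syntax, so it will reject some valid addresses:

```perl
use strict;
use warnings;

# Hypothetical helper: accept "user@host.domain" built only from
# letters, digits, "_", ".", "+" and "-".  Shell metacharacters such
# as ";", "|", "`", "$", spaces and newlines are all rejected.
sub is_safe_address {
    my ($address) = @_;
    return $address =~ /\A[\w.+-]+@[\w-]+(?:\.[\w-]+)+\z/ ? 1 : 0;
}

# Usage (hypothetical): refuse to continue if the check fails.
#   die "Invalid address\n" unless is_safe_address($in{'to'});
```

On Perl 5.8 or later, the list form open (MAIL, "|-", "mail", "-s", "Subject", $in{'to'}) runs mail without a shell at all, which removes this class of problem; checking the input is still a good idea.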
Unfortunately, mailto: URLs are
not supported by all browsers. If you rely on mailto: links in your
document, people whose browsers do not support them will not be able
to send you mail.
Perl has been ported to all the platforms that are
mentioned above. As a result, your Perl CGI program should be reasonably
portable. If you're interfacing with various external programs
on the UNIX side, then it probably will not be
portable, but if you're just manipulating data, opening and reading
files, etc., you should have no problem.
In a CGI environment, STDERR
points to the server error log file. You can use this to your advantage
by outputting debug messages, and then checking the log file later
on.
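For example, a small helper along these lines (the name debug is hypothetical) tags each message with the script name and the time, so your lines are easy to find in a busy log:

```perl
use strict;
use warnings;

# Write a tagged debug line to STDERR, which the server appends to
# its error log.  The line is also returned so callers can reuse it.
sub debug {
    my ($message) = @_;
    my $line = "[$0] " . scalar(localtime) . ": $message\n";
    print STDERR $line;
    return $line;
}

# Example: record which request method the script received.
debug("REQUEST_METHOD is " . ($ENV{'REQUEST_METHOD'} || 'unset'));
```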
Both STDIN and STDOUT
effectively point to the browser. Strictly speaking, they point to the
server: it receives the client's (browser's) request, passes the request
data to the script on STDIN, and relays whatever the script writes to
STDOUT back to the browser.
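To make this concrete, here is a sketch of how a script might fetch its raw form data: from STDIN for POST requests, and from the QUERY_STRING environment variable for GET requests. The subroutine name read_form_input is hypothetical; cgi-lib.pl and similar libraries do the equivalent internally.

```perl
use strict;
use warnings;

# Return the raw, still URL-encoded form data.  For POST the server
# sends the request body on STDIN and its size in CONTENT_LENGTH;
# for GET the data arrives in QUERY_STRING.
sub read_form_input {
    my $method = $ENV{'REQUEST_METHOD'} || 'GET';
    if ($method eq 'POST') {
        my $length = $ENV{'CONTENT_LENGTH'} || 0;
        my $buffer = '';
        # Read exactly CONTENT_LENGTH bytes rather than waiting
        # for an end-of-file marker.
        read(STDIN, $buffer, $length) if $length > 0;
        return $buffer;
    }
    return defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
}
```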
To see error messages in the browser, you can "dup" STDERR
to STDOUT early on in your script (after outputting
the valid HTTP headers):
open (STDERR, ">&STDOUT");
This redirects all of the error messages to STDOUT
(or the browser).
Counter scripts tend to be very popular. The idea
behind a counter is very simple:
- Use a file to store the data
- Whenever someone visits the site, increment the
number in the file
Here is a simple counter script:
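#!/usr/local/bin/perl
use Fcntl ':flock';    # LOCK_EX, LOCK_UN
$counter = "/home/shishir/counter.dat";
print "Content-type: text/plain", "\n\n";
# Open for reading and writing so one lock covers the read,
# the increment, and the write
open (FILE, "+<" . $counter) || die "Cannot open the counter file.\n";
flock (FILE, LOCK_EX);
$visitors = <FILE>;
chomp ($visitors);
$visitors++;
# Rewind, discard the old value, and store the new one
seek (FILE, 0, 0);
truncate (FILE, 0);
print FILE $visitors, "\n";
close (FILE);    # closing the file also releases the lock
# Send the count to the browser (or to the SSI page)
print $visitors, "\n";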
You can now use SSI (Server Side Includes) to display a counter
in your HTML document:
You are visitor number:
<!--#exec cgi="/cgi-bin/counter.pl" -->
Here is a simple regular expression that will strip
HTML tags:
$line =~ s/<[^>]*>//g;
Or you can "escape" certain characters in an HTML
tag so that it can be displayed:
$line =~ s/<([^>]*)>/&lt;$1&gt;/g;
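The substitution above only touches complete tags. A slightly more general sketch (the name escape_html is hypothetical) escapes every "&", "<", and ">" in a line, doing "&" first so the ampersands it adds are not escaped again:

```perl
use strict;
use warnings;

# Escape the three characters that are special in HTML text.
# "&" must come first; otherwise the "&" in "&lt;" would be
# turned into "&amp;lt;".
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print escape_html("<B>Fish & Chips</B>"), "\n";   # prints &lt;B&gt;Fish &amp; Chips&lt;/B&gt;
```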
You can use the environment
variable HTTP_USER_AGENT
to determine the user's browser.
[ From WWW FAQ ]
Five important environment variables are available to your
CGI script to help in identifying the end user.
- HTTP_FROM
This environment variable is, theoretically, set to the email
address of the user. However, many browsers do not set it at all,
and most browsers that do support it allow the user to set any value
for this variable. As such, it is recommended that it be used only
as a default for the reply email address in an email form.
- REMOTE_USER
This variable is only set if secure authentication was used
to access the script. The AUTH_TYPE variable
can be checked to determine what form of secure authentication was
used. REMOTE_USER will then contain the name
the user authenticated under. Note that REMOTE_USER
is only set if authentication was actually used, and is not supported
by all web servers. Authentication may unexpectedly fail to happen
under the NCSA server if the method used for
the transaction is not listed in the access.conf
file (i.e., <Limit GET POST>
should be set rather than the default, <Limit GET>).
- REMOTE_IDENT
This variable is set if the server has contacted an IDENTD
server on the client machine. This is a slow operation, usually
turned off in most servers, and there is no way to ensure that the
client machine will respond honestly to the query, if it responds
at all.
- REMOTE_HOST
This variable will not identify the user specifically, but
does provide information about the site the user has connected from,
if the hostname was retrieved by the server. In the absence of any
certainty regarding the user's precise identity, making decisions
based on a list of trusted addresses is sometimes an adequate workaround.
This variable is not set if the server failed to look up the hostname
or skipped the lookup in the interest of speed; see REMOTE_ADDR
below. Also keep in mind that you may see all users of a particular
proxy server listed under one hostname.
- REMOTE_ADDR
This variable will not identify the user specifically, but
does provide information about the site the user has connected from.
REMOTE_ADDR will contain the dotted-decimal IP
address of the client. In the absence of any certainty regarding
the user's precise identity, making decisions based on a list of
trusted addresses is sometimes an adequate workaround. This variable
is always set, unlike REMOTE_HOST, above. Also
keep in mind that you may see all users of a particular proxy server
listed under one address.
[ End of info from WWW FAQ ]
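As an illustration of the trusted-address workaround described above, here is a minimal sketch. The address list and the subroutine names are hypothetical, and the check is only as reliable as REMOTE_ADDR itself:

```perl
use strict;
use warnings;

# Hypothetical trusted list: entries ending in "." are subnet
# prefixes; any other entry must match the whole address.
my @trusted = ('192.168.1.', '10.0.0.5');

# REMOTE_ADDR is always set, so base the decision on it.  All users
# behind one proxy server will share the proxy's address.
sub is_trusted_client {
    my $addr = $ENV{'REMOTE_ADDR'} || '';
    foreach my $entry (@trusted) {
        if ($entry =~ /\.\z/) {
            return 1 if index($addr, $entry) == 0;    # subnet prefix
        } else {
            return 1 if $addr eq $entry;              # single host
        }
    }
    return 0;
}

# Most specific identification available, for logging only: none of
# these variables proves who the user really is.
sub describe_client {
    return $ENV{'REMOTE_USER'} || $ENV{'REMOTE_HOST'}
        || $ENV{'REMOTE_ADDR'} || 'unknown';
}
```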
If you configure your server so that it recognizes
that all files in a specific directory (e.g., /cgi-bin),
or files with certain extensions (e.g., .pl,
.tcl, .sh) are CGI
programs, then it will execute the programs and send only their
output to the browser. Users cannot retrieve the script itself through
the server.
On the other hand, if you allow people to look at your script
(by placing it, for example, in the document root directory), it
is not a security problem, in most cases.
No, your CGI scripts can access files outside the server and
document root directories, unless the server is running in a chroot-ed
environment.
No! The forms interface allows
you to have a "password" field, but it should not be used for anything
highly confidential. The main reason for this is that form data
gets sent from the browser to the Web server as plain text, and
not as encrypted data.
If you want to solicit secure information, you need to purchase
a secure server, such as Netscape's Commerce Server
(https://home.netscape.com/comprod/netscape_commerce.html).
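You can have your CGI script determine whether it is being accessed by Netscape.
Keep in mind that other browsers, such as Microsoft Internet Explorer, also put
"Mozilla" in their HTTP_USER_AGENT string, but add the word "compatible":
$browser = $ENV{'HTTP_USER_AGENT'};
if ($browser =~ /Mozilla/ && $browser !~ /compatible/) {
#
# Netscape
#
} else {
#
# Non Netscape
#
}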
This has to do with the way the standard output
is buffered. In order for the output to display in the correct order,
you need to turn buffering off by setting the $| variable to a true
value ($| = 1;) near the top of your script.
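A minimal sketch, assuming the out-of-order output comes from an external program started with system (here the Unix date command):

```perl
use strict;
use warnings;

# A true value in $| makes Perl flush the currently selected output
# handle (STDOUT here) after every print, so the lines below reach
# the server before the external command writes its own output.
$| = 1;

print "Content-type: text/plain\n\n";
print "The current date is:\n";
system("date");    # assumes a Unix "date" command on the PATH
```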
You can access the environment variables through
the %ENV associative array. Here is a simple
script that dumps out all of the environment variables (sorted):
#!/usr/local/bin/perl
print "Content-type: text/plain", "\n\n";
foreach $key (sort keys %ENV) {
print $key, " = ", $ENV{$key}, "\n";
}
exit (0);
If you send a MIME content type
of text/html, you will have to "escape" certain characters,
such as "<," "&," and ">", or else the browser will interpret
them as HTML markup.
You have to escape the characters by using the construct &#nnn;, where
nnn is the character's ASCII code (for example, &#60; for "<").
Here is a simple script that you can run on the command line
that will give you the ASCII code for non-alphanumeric
characters:
#!/usr/local/bin/perl
print "Please enter a string: ";
chop ($string = <STDIN>);
$string =~ s/([^\w\s])/sprintf ("&#%d;", ord ($1))/ge;
print "The escaped string is: $string\n";
exit (0);
This most likely is due to permission problems.
Remember, your server is probably running as "nobody," "www," or
a process with very minimal privileges. As a result, it will not
be able to execute your script unless it has permission to do so.
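One way to grant that permission is to make the script readable and executable by everyone, mode 0755 (the same as chmod 755 from the shell). Here is a sketch in Perl with a hypothetical helper name; keep in mind that every directory leading to the script must also be searchable by the server's user:

```perl
use strict;
use warnings;

# Give the owner read/write/execute and everyone else read/execute
# (mode 0755), then report whether the file is now executable.
sub make_executable {
    my ($script) = @_;
    chmod(0755, $script) or die "Cannot chmod $script: $!\n";
    return -x $script ? 1 : 0;
}

# Usage (hypothetical path):
#   make_executable("/home/shishir/cgi-bin/counter.pl");
```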
Again, this has to do with permissions! The server
cannot write to a file in a certain directory if it does not have
permission to do so.
You should make it a point to check for error status from
the open command:
print "Content-type: text/plain\n\n";
.
.
.
open (FILE, ">" . "/some/dir/some.file") ||
print "Cannot write to the data file: $!\n";
.
.
.
You can use the CGI::MiniSvr module
(https://www-genome.wi.mit.edu/ftp/pub/software/WWW/CGIperl/docs/MiniSvr.pm.html)
to keep state between multiple entry points.
Or you can create a series of dynamic documents that pass
a unique session identification (either as a query, an extra path
name, or as a hidden field) to each other.
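Here is a minimal sketch of the hidden-field approach. The subroutine names, the field name session, and the next_step.pl URL are all hypothetical, and the ID is built from the time, the process ID, and a random number, which is easy to guess, so it should not protect anything sensitive:

```perl
use strict;
use warnings;

# Build a session ID from the time, the process ID and a random
# number.  Guessable: fine for tracking, not for security.
sub new_session_id {
    return sprintf("%d-%d-%d", time, $$, int(rand(1_000_000)));
}

# A hidden field carries the ID back with the next form submission.
sub session_field {
    my ($id) = @_;
    return qq(<INPUT TYPE="hidden" NAME="session" VALUE="$id">\n);
}

my $id = new_session_id();
print "Content-type: text/html\n\n";
print qq(<FORM ACTION="/cgi-bin/next_step.pl" METHOD="POST">\n);
print session_field($id);
print qq(<INPUT TYPE="submit" VALUE="Continue">\n</FORM>\n);
```

The next script would read $in{'session'} and use it, for example, as the name of a file in which the earlier steps saved their data.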
It's difficult to debug a CGI script. You can emulate
a server by setting environment variables manually:
setenv HTTP_USER_AGENT "Mozilla/2.0b6" (csh)
or
export HTTP_USER_AGENT="Mozilla/2.0b6" (ksh, bash)
You can emulate a POST request by placing
the data in a file and piping it to your program. Also set
REQUEST_METHOD to POST and CONTENT_LENGTH to the number of bytes
in the file, since most scripts use them to decide how much to read:
cat data.file | some_program.pl
Or, you can use CGI Lint, which will automate some of this.
It will also check for potential security problems, errors in open()
calls, and invalid HTTP headers.
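If you emulate requests often, a small driver script saves typing. This is a sketch under the assumption that the CGI program reads POST data the usual way (REQUEST_METHOD, CONTENT_LENGTH, and STDIN); the subroutine name run_cgi_post is hypothetical:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Run a CGI program from the command line as if the server had
# received a POST request carrying $data, and return its output.
sub run_cgi_post {
    my ($program, $data) = @_;

    # The variables a POST-handling script typically looks at.
    local $ENV{'REQUEST_METHOD'} = 'POST';
    local $ENV{'CONTENT_TYPE'}   = 'application/x-www-form-urlencoded';
    local $ENV{'CONTENT_LENGTH'} = length($data);

    # Put the form data in a temporary file and feed it to STDIN.
    my ($fh, $file) = tempfile(UNLINK => 1);
    print $fh $data;
    close($fh);

    open(my $out, '-|', "$program < $file")
        or die "Cannot run $program: $!\n";
    local $/;                      # read all of the output at once
    my $output = <$out>;
    close($out);
    return defined $output ? $output : '';
}

# Usage (program path assumed):
#   print run_cgi_post("./some_program.pl", "name=Joe&to=joe%40example.com");
```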
You can call a CGI program by simply opening the URL to it:
https://some.machine/cgi-bin/your_program.pl
You can also have a link in a document, such as:
<A HREF="https://some.machine/cgi-bin/your_program.pl">
Click here to access my CGI program</A>
Why people do this, I don't know. But, you can check
the information from all the fields and return a "No Response" if
any of them are empty. Here is an example (assume the associative
array %in contains your form information):
$error = 0;
foreach $value (values %in) {
# A field is empty if it has no non-whitespace characters; testing
# with a match leaves the form data itself unchanged
$error = 1 unless ($value =~ /\S/);
}
if ($error) {
print "Content-type: text/plain\n";
print "Status: 204 No Response\n\n";
print "You should only see this message if your browser does ";
print "not support the status code 204\n";
} else {
#
# Process Data Here
#
}
A CGI program can send specific response codes to
the server, which in turn will send them to the browser. For example,
if you want a "No Response" (meaning that the browser will not load
a new page), you need to send a response code of 204 (see the answer
to the last question).
A CGI program can only send one Location
header. You also cannot send a MIME content type
if you want the server to perform redirection. For example, this
is not valid, though it may work with some servers:
#!/usr/local/bin/perl
.
.
.
print "Content-type: text/plain\n";
print "Location: https://some.machine/some.doc\n\n";
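By contrast, a redirect the server will honor consists of just the Location header and the blank line that ends the headers. A minimal sketch (the helper name redirect_header is hypothetical):

```perl
use strict;
use warnings;

# Build the header block for a server redirect: one Location header
# followed by the blank line that ends the headers, with no
# Content-type header.
sub redirect_header {
    my ($url) = @_;
    return "Location: $url\n\n";
}

print redirect_header("https://some.machine/some.doc");
```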
How can I automatically include a "Last updated:"
line at the bottom of all my HTML pages? Or can
I only do that for SSI pages? How do I get the date of the CGI script?
If you are dynamically creating documents
using CGI, you can insert a time stamp pretty easily. Here is an
example in Perl 5:
$last_updated = localtime (time);
print "Last updated: $last_updated\n";
or in Perl 4:
require "ctime.pl";
$last_updated = &ctime (time);
print "Last updated: $last_updated\n";
or even:
chop ($last_updated = `/usr/local/bin/date`);
print "Last updated: $last_updated\n";
You can accomplish this with SSI like this:
<!--#echo var="LAST_MODIFIED" -->
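If you want the modification date of the CGI script itself rather than the current time, Perl's stat function returns it. A sketch, with a hypothetical helper name, assuming $0 holds a path the script can stat (that depends on the server's working directory):

```perl
use strict;
use warnings;

# Return a file's last-modification time as a readable string.
# Element 9 of the list returned by stat is the modification time
# in seconds since the epoch; stat returns an empty list on failure.
sub file_last_modified {
    my ($path) = @_;
    my $mtime = (stat($path))[9];
    return defined $mtime ? scalar(localtime($mtime)) : 'unknown';
}

# $0 holds the path the script was started with.
print "Script last modified: ", file_last_modified($0), "\n";
```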
Each language has its own advantages and disadvantages.
I'm sure you've heard this many times: It depends on what you're
trying to do. If you are writing a CGI program that's going to be
accessed thousands of times in an hour, then you should write it
in C or C++. If you are looking for a quick solution (in terms of
development time), then Perl is the way to go!
You should generally avoid the shell for any type of CGI programming
because of the potential for security problems.