The following
example plots the number of Web server accesses for every hour as
a histogram. The program parses through the server log file, keeping
track of the accesses for each hour of the day in an array. The
information stored in this array is written to a file in a format
that gnuplot can understand. We then call gnuplot
to graph the data in the file and output the resulting graphic to
a file.
#!/usr/local/bin/perl
$webmaster = "shishir\@bu\.edu";
$gnuplot = "/usr/local/bin/gnuplot";
$ppmtogif = "/usr/local/bin/pbmplus/ppmtogif";
$access_log = "/usr/local/bin/httpd_1.4.2/logs/access_log";
The gnuplot utility, as of version 3.5,
cannot produce GIF images, but it can output PBM (portable bitmap)
format files. We'll use the ppmtogif utility
to convert the output image from PBM to GIF. The $access_log
variable points to the NCSA server log file,
which we'll parse.
$process_id = $$;
$output_ppm = join ("", "/tmp/", $process_id, ".ppm");
$datafile = join ("", "/tmp/", $process_id, ".txt");
These variables hold the names of the temporary files. The
$$ variable refers to the ID of the process
running this program, as it does in a shell script. I don't care
which process is running my program, but I can use the number to
create a filename that I know will be unique, even if multiple instances
of my program run at once. (Use of the process ID for this purpose is
a trick that shell programmers have used for decades.) The process
ID forms the base of each filename.
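The naming scheme can be tried on its own. The following is only a minimal sketch: the helper name temp_name is hypothetical and does not appear in the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original program): build a
# temporary filename of the form /tmp/<process ID>.<extension>.
sub temp_name {
    my ($extension) = @_;
    return join ("", "/tmp/", $$, ".", $extension);
}

my $output_ppm = temp_name ("ppm");
my $datafile   = temp_name ("txt");

# Both names share the process ID and differ only in the extension.
print "Image file: $output_ppm\n";
print "Data file:  $datafile\n";
```

Two copies of the script running at the same time have different process IDs, so their filenames differ.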
$x = 0.6;
$y = 0.6;
$color = 1;
The plot is scaled to 60% of gnuplot's default size
in both the x and y directions. All lines in the graph will
be red (indicated by a value of 1).
if ( open (FILE, "<" . $access_log) ) {
    for ($loop=0; $loop < 24; $loop++) {
        $time[$loop] = 0;
    }
We open the NCSA server access log for
input. The format of each entry in the log is:
host rfc931 authuser [DD/Mon/YY:hh:mm:ss] "request" status_code bytes
where:
- host is either the DNS name or the IP address of the remote client
- rfc931 is the remote user (only if rfc931 authentication is enabled)
- authuser is the remote user (only if NCSA server authentication is enabled)
- DD/Mon/YY is the day, month, and year
- hh:mm:ss is 24-hour-based time
- "request" is the first line of the HTTP request
- status_code is the status identification returned by the server
- bytes is the total number of bytes sent (not including the HTTP header)
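To make the field layout concrete, here is a minimal sketch that splits one entry into the fields listed above. The sample entry and the regular expression are illustrative assumptions, not taken from the program, which only needs the hour.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample entry in the format described above.
my $entry = 'www.example.com - - [12/Jun/95:14:03:27] '
          . '"GET /index.html HTTP/1.0" 200 1504';

my ($host, $rfc931, $authuser, $date, $time, $request, $status, $bytes);

# One capture group per field, in the order the fields appear.
if ($entry =~ m|^(\S+) (\S+) (\S+) \[([^:]+):([^\]]+)\] "([^"]*)" (\d+) (\S+)$|) {
    ($host, $rfc931, $authuser, $date, $time, $request, $status, $bytes) =
        ($1, $2, $3, $4, $5, $6, $7, $8);

    print "Host:    $host\n";
    print "Date:    $date\n";
    print "Time:    $time\n";
    print "Request: $request\n";
    print "Status:  $status\n";
    print "Bytes:   $bytes\n";
}
```

A hyphen in the rfc931 or authuser position means that the corresponding information was not available.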
A 24-element array called @time is initialized.
This array will contain the number of accesses for each hour.
    while (<FILE>) {
        if (m|\[\d+/\w+/\d+:([^:]+)|) {
            $time[$1]++;
        }
    }
    close (FILE);
    &create_output_file();
In case you didn't believe me when I said in Chapter 1 that Perl offered
superb facilities for CGI programming,
this tiny loop contains some proof of what I'm talking about. The
regular expression (containing some enhancements that only Perl
offers) neatly picks the hour out of the date/time string in the
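access log by searching for the pattern "[DD/Mon/YY:hh:". The parenthesized
[^:]+ captures everything between the colon that follows the date and the
next colon, which is the hour, and Perl stores it in $1.
If a line matches the pattern, the array
element corresponding to that hour is incremented.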
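The counting loop can be exercised without a real log file. This sketch runs the same regular expression over a few made-up entries; the entries are assumptions for illustration, and the array is initialized with the repetition operator rather than a for loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical log entries; only the timestamp matters here.
my @entries = (
    'a.example.com - - [12/Jun/95:09:15:02] "GET / HTTP/1.0" 200 1504',
    'b.example.com - - [12/Jun/95:14:03:27] "GET /a.html HTTP/1.0" 200 310',
    'c.example.com - - [12/Jun/95:14:59:59] "GET /b.html HTTP/1.0" 404 0',
    'not a log entry at all',
);

# One counter per hour of the day, all starting at zero.
my @time = (0) x 24;

foreach my $entry (@entries) {
    # \[            a literal opening bracket
    # \d+/\w+/\d+   day/month/year
    # :             the colon that separates date and time
    # ([^:]+)       everything up to the next colon: the hour
    if ($entry =~ m|\[\d+/\w+/\d+:([^:]+)|) {
        $time[$1]++;
    }
}

print "Accesses in hour 9:  $time[9]\n";
print "Accesses in hour 14: $time[14]\n";
```

The captured hour is a string such as "09". Perl converts it to the number 9 when it is used as an array index. The last entry does not match and is skipped.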
The subroutine create_output_file is
called to create and display the plot.
} else {
    &return_error (500, "Server Log File Error", "Cannot open NCSA server access log!");
}
exit(0);
If the log file can't be opened, the return_error
subroutine is called to output an error.
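The return_error subroutine itself is not shown in this section. As a rough idea of what such a routine might produce, here is a hypothetical stand-in named format_error. It assumes the three arguments are a status code, a short keyword, and a message, and it returns the response as a string instead of printing it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for return_error (the real subroutine is not
# shown here). It builds a CGI response with a Status header and a
# short HTML body, and leaves the printing to the caller.
sub format_error {
    my ($status, $keyword, $message) = @_;
    return join ("",
        "Content-type: text/html\n",
        "Status: $status $keyword\n\n",
        "<title>CGI Program - Unexpected Error</title>\n",
        "<h1>$keyword</h1>\n",
        "<hr>$message<hr>\n");
}

print format_error (500, "Server Log File Error",
                    "Cannot open NCSA server access log!");
```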
The create_output_file subroutine is
now defined. It creates a data file consisting of the information
in the @time array.
sub create_output_file
{
    local ($loop);
    if ( (open (FILE, ">" . $datafile)) ) {
        for ($loop=0; $loop < 24; $loop++) {
            print FILE $loop, " ", $time[$loop], "\n";
        }
        close (FILE);
        &send_data_to_gnuplot();
    } else {
        &return_error (500, "Server Log File Error", "Cannot write to data file!");
    }
}
The file specified by the variable $datafile
is opened for output. The hour and the number of accesses for that
hour are written to the file. The hour represents the x coordinate,
while the number of accesses represents the y coordinate. The subroutine
send_data_to_gnuplot is called to execute gnuplot.
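To see what the data file looks like, the following sketch writes a file in the same two-column format and reads it back. The access counts are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical counts: zero everywhere except three busy hours.
my @time = (0) x 24;
@time[9, 14, 20] = (12, 30, 7);

my $datafile = join ("", "/tmp/", $$, ".txt");

# Write one "hour count" pair per line, as the program does.
open (my $out, ">", $datafile) or die "Cannot write to $datafile: $!";
for (my $loop = 0; $loop < 24; $loop++) {
    print {$out} $loop, " ", $time[$loop], "\n";
}
close ($out);

# Read the file back to inspect what gnuplot would be given.
open (my $in, "<", $datafile) or die "Cannot read $datafile: $!";
my @lines = <$in>;
close ($in);
unlink $datafile;

print "Lines written: ", scalar (@lines), "\n";
print "Line for hour 14: $lines[14]";
```

Each line holds an x value and a y value separated by a space, which is the format that the plot command reads from the file.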
sub send_data_to_gnuplot
{
    open (GNUPLOT, "|$gnuplot");
    print GNUPLOT <<gnuplot_Commands_Done;
We're going to use the same technique we've used throughout
the chapter to embed a "language" within a Perl script: We'll open
a pipe to a program and write out commands in the language recognized
by the program. The open command starts gnuplot,
and the print command sends the data to gnuplot
through the pipe.
set term pbm color small
set output "$output_ppm"
set size $x, $y
set title "WWW Server Usage"
set xlabel "Time (Hours)"
set ylabel "No. of Requests"
set xrange [-1:24]
set xtics 0, 2, 23
set noxzeroaxis
set noyzeroaxis
set border
set nogrid
set nokey
plot "$datafile" w boxes $color
gnuplot_Commands_Done
    close (GNUPLOT);
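The open-a-pipe-and-print technique is not specific to gnuplot. The sketch below uses the standard Unix sort utility as a stand-in, so that it runs on a system without gnuplot installed. The output filename and the input numbers are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sorted = join ("", "/tmp/", $$, ".sorted");

# Start sort with its standard input connected to our filehandle.
open (my $pipe, "| sort -n > $sorted") or die "Cannot start sort: $!";

# Everything up to the terminator is sent down the pipe.
print {$pipe} <<Sort_Input_Done;
30
7
12
Sort_Input_Done

# Closing the pipe waits for sort to finish, so the file is complete.
close ($pipe) or die "sort did not exit cleanly: $?";

open (my $in, "<", $sorted) or die "Cannot read $sorted: $!";
my @lines = <$in>;
close ($in);
unlink $sorted;

print @lines;
```

Because close waits for the child process to exit, it is safe to read the output file immediately afterward. The same holds for the image file that gnuplot writes.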
Let's take a closer look at the commands that we send to gnuplot
through the pipe. The set term command sets
the format for the output file. In this case, the format is a color
PBM file with a small font for titles. You can even instruct gnuplot
to produce text graphs by setting the term
to "dumb."
The output file is set to the filename stored in the variable
$output_ppm. The size of the image is set using
the size command. The title of the graph and
the labels for the x and y axes are specified with the title,
xlabel, and ylabel commands,
respectively. The range on the x axis is -1 to 24. Even though the
data covers only hours 0 through 23, the range is widened by one on
each side because gnuplot does not graph data that falls right on the
axes cleanly.
The tick marks on the x axis range from 0 to 23 in increments of
two. The noxzeroaxis and noyzeroaxis commands remove the lines that
gnuplot would otherwise draw for the x and y axes, which makes the
graph appear neater.
The graph is drawn with a border, but without a grid or a
legend. Finally, the plot command graphs the
data in the file specified by the $datafile
variable with red boxes. Several different types of graphs are possible;
instead of boxes, you can try "lines" or "points."
    &print_gif_file_and_cleanup();
}
The print_gif_file_and_cleanup subroutine
converts the image to GIF, sends it to the browser, and removes the temporary files.
sub print_gif_file_and_cleanup
{
    $| = 1;
    print "Content-type: image/gif", "\n\n";
    system ("$ppmtogif $output_ppm 2> /dev/null");
    unlink $output_ppm, $datafile;
}
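Setting $| to 1 turns off output buffering, so the Content-type header
reaches the browser before any image data. The system command then
executes the ppmtogif utility to convert the PBM image to GIF. This
utility writes its output directly to standard output.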
You might wonder what the 2> signifies.
Like most utilities, ppmtogif
prints some diagnostic information to standard error when transforming
the image. The 2> redirects standard error
to the null device (/dev/null), basically throwing
it away.
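The effect of the redirection can be observed with a small experiment. In this sketch a shell command stands in for ppmtogif: it writes one line to standard output and one line to standard error. Backticks are used instead of system only so that the surviving output can be captured and inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A stand-in for ppmtogif: one line goes to standard output and one
# line of "diagnostics" goes to standard error.
my $command = q{sh -c 'echo image data; echo diagnostics >&2'};

# With 2> /dev/null appended, standard error is discarded and only
# standard output comes back through the backticks.
my $captured = `$command 2> /dev/null`;

print "Captured: $captured";
```

In the CGI program itself, system is the better fit, because the GIF data should flow straight through to the browser rather than into a Perl variable.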
Finally, we use the unlink command to
remove the temporary files that we've created.
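The unlink function returns the number of files it removed, which gives a simple way to confirm that the cleanup worked. A minimal sketch, using empty files created only for this example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two throwaway files named after the process ID, as in the program.
my @temp_files = map { join ("", "/tmp/", $$, ".", $_) } ("ppm", "txt");

foreach my $file (@temp_files) {
    open (my $fh, ">", $file) or die "Cannot create $file: $!";
    close ($fh);
}

# unlink takes a list of filenames and returns how many it deleted.
my $removed = unlink @temp_files;

print "Removed $removed of ", scalar (@temp_files), " temporary files\n";
```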
The image produced by this program is shown in
Figure 6.5.