mod_perl does away with mod_cgi's forking by
embedding the Perl interpreter into Apache's child
processes, so no new process has to be spawned to run Perl
programs. In this new model, the child process
doesn't exit when it has processed a request. The
Perl interpreter is loaded only once, when the process is started.
Since the interpreter is persistent throughout the
process's lifetime, all code is loaded and compiled
only once, the first time it is needed. All subsequent requests run
much faster, because everything is already loaded and compiled.
Response processing is reduced to simply running the code, which
improves response times by a factor of 10-100, depending on the code
being executed.
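The effect of a persistent interpreter can be imitated in plain Perl. The following is only a sketch of the idea, not mod_perl's actual mechanism: handle_request( ) and the %compiled cache are hypothetical names, and the "script" is simply a string that is compiled on first use and reused afterward.

```perl
use strict;
use warnings;

my %compiled;           # hypothetical cache: script name => compiled code ref
my $compile_count = 0;  # how many times something was actually compiled

# Hypothetical request handler: compile the source on first use only,
# then run the cached code ref on every later request.
sub handle_request {
    my ($name, $source, @args) = @_;
    unless ($compiled{$name}) {
        $compile_count++;
        $compiled{$name} = eval "sub { $source }"
            or die "Cannot compile $name: $@";
    }
    return $compiled{$name}->(@args);
}

my $source  = 'my ($who) = @_; return "Hello, $who";';
my @replies = map { handle_request('hello', $source, $_) } qw(Alice Bob Carol);

print "$_\n" for @replies;
print "compiled $compile_count time(s)\n";
```

Three requests are served, but the source is compiled only once; the second and third requests pay only for execution.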
But Doug's real accomplishment was adding a mod_perl
API to the Apache core. This made it possible to write complete
Apache modules in Perl, a feat that used to require coding in C. From
then on, mod_perl enabled the programmer to handle all phases of
request processing in Perl.
To provide backward compatibility for plain CGI scripts that used to
be run under mod_cgi, while still benefiting from a preloaded Perl
interpreter and modules, a few special handlers were written, each
allowing a different level of proximity to pure mod_perl
functionality. Some take full advantage of mod_perl, while others do
not.
mod_perl embeds a copy of the Perl interpreter into the Apache
httpd executable, providing complete access to
Perl functionality within Apache. This enables a set of
mod_perl-specific configuration directives, all of which start with
the string Perl. Most, but not all, of these
directives are used to specify handlers for various phases of the
request.
1.3.1. Running CGI Scripts with mod_perl
Since many web application developers are interested in the content
delivery phase and come from a CGI background, mod_perl includes
packages designed to make the transition from CGI simple and
painless. Apache::PerlRun and
Apache::Registry run unmodified CGI scripts,
and run them much faster than mod_cgi does.[10]
[10] Apache::RegistryNG and
Apache::RegistryBB are two new experimental
modules that you may want to try as well.
The difference between Apache::Registry and
Apache::PerlRun is that Apache::Registry caches
all scripts, and Apache::PerlRun doesn't. To
understand why this matters, remember that if one of
mod_perl's benefits is added speed, another is
persistence. Just as the Perl interpreter is loaded only once, at
child process startup, your scripts are loaded and compiled only
once, when they are first used. This can be a double-edged sword:
persistence means global variables aren't reset to
initial values, and file and database handles aren't
closed when the script ends. This can wreak havoc in badly written
CGI scripts.
Whether you should use Apache::Registry or
Apache::PerlRun for your CGI scripts depends on
how well written your existing Perl scripts are. Some scripts
initialize all variables, close all file handles, use taint mode, and
give only polite error messages. Others don't.
Apache::Registry compiles scripts on first use and
keeps the compiled scripts in memory. On subsequent requests, all the
needed code (the script and the modules it uses) is already compiled
and loaded in memory. This gives you enormous performance benefits,
but it requires that scripts be well behaved.
Apache::PerlRun, on the other hand, compiles
scripts at each request. The script's namespace is
flushed and is fresh at the start of every request. This allows
scripts to enjoy the basic benefit of mod_perl (i.e., not having to
load the Perl interpreter) without requiring poorly written scripts
to be rewritten.
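To make the idea of flushing a namespace concrete, here is a minimal sketch in plain Perl. It is an assumption-laden illustration, not Apache::PerlRun's actual code: the Sketch::Script package and this flush_namespace( ) function are hypothetical, and only scalars, arrays, and hashes are reset.

```perl
use strict;
use warnings;

# Hypothetical package standing in for a script's private namespace.
{
    package Sketch::Script;
    our ($topsecret, @results);
}

# Hypothetical helper: walk the package's symbol table and reset every
# scalar, array, and hash found there.
sub flush_namespace {
    my ($package) = @_;
    no strict 'refs';
    for my $name (keys %{"${package}::"}) {
        next if $name =~ /::\z/;    # skip nested packages
        my $full = "${package}::$name";
        undef ${$full};
        undef @{$full};
        undef %{$full};
    }
    return;
}

# State left behind by a previous "request".
$Sketch::Script::topsecret = 1;
@Sketch::Script::results   = (1, 2, 3);

flush_namespace('Sketch::Script');

print defined $Sketch::Script::topsecret ? "still set\n" : "reset\n";
print scalar(@Sketch::Script::results), " leftover results\n";
```

After the flush, the next run of the script starts with undefined globals, which is the behavior CGI scripts written for mod_cgi tend to rely on.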
A typical problem some developers encounter
when porting from mod_cgi to Apache::Registry is
the use of uninitialized global variables. Consider the following
script:
use CGI;
$q = CGI->new( );
$topsecret = 1 if $q->param("secret") eq 'Muahaha';
# ...
if ($topsecret) {
    display_topsecret_data( );
}
else {
    security_alert( );
}
This script will always do the right thing under mod_cgi: if
secret=Muahaha is supplied, the top-secret data
will be displayed via display_topsecret_data( ),
and if the authentication fails, the security_alert( )
function will be called. This works only because under
mod_cgi, all globals are undefined at the beginning of each request.
Under Apache::Registry, however, global variables
preserve their values between requests. Now imagine a situation where
someone has successfully authenticated, setting the global variable
$topsecret to a true value. From now on, anyone
can access the top-secret data without knowing the secret phrase,
because $topsecret will stay true until the
process dies or the variable is modified elsewhere in the code.
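The leak can be reproduced outside the server. In the sketch below, which is a simulation under stated assumptions and not real Apache::Registry code, handle_request( ) is a hypothetical stand-in for the cached script body, and the %param hash plays the role of the CGI parameters. Calling the function three times mimics three requests served by the same child process.

```perl
use strict;
use warnings;

# A package global: it survives between calls, just as a global in a
# cached script survives between requests.
our $topsecret;

# Hypothetical stand-in for the script body.
sub handle_request {
    my (%param) = @_;
    $topsecret = 1 if ($param{secret} // '') eq 'Muahaha';
    return $topsecret ? 'top-secret data' : 'security alert';
}

my @responses = (
    handle_request(secret => 'wrong'),      # rejected, as expected
    handle_request(secret => 'Muahaha'),    # accepted, as expected
    handle_request(secret => 'wrong'),      # accepted, because the flag leaked
);

print "$_\n" for @responses;
```

The third request supplies the wrong phrase and still receives the top-secret data, because nothing reset $topsecret after the second request.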
This is an example of sloppy code. It will do the right thing under
Apache::PerlRun, since all global variables are
reset before each run of the script. However, under
Apache::Registry and mod_perl handlers, all global
variables must be initialized before they can be used.
The example can be fixed in a few ways. It's a good
idea to always enable the strict pragma, which requires
global variables to be declared before they are used:
use strict;
use CGI;
use vars qw($topsecret $q);
# init globals
$topsecret = 0;
$q = undef;
# code
$q = CGI->new( );
$topsecret = 1 if $q->param("secret") eq 'Muahaha';
# ...
But of course, the simplest solution is to avoid using globals where
possible. Let's look at the example rewritten
without globals:
use strict;
use CGI;
my $q = CGI->new( );
my $topsecret = $q->param("secret") eq 'Muahaha' ? 1 : 0;
# ...
The last two versions of the example will run perfectly under
Apache::Registry.
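To see why the lexical version is safe, we can simulate several requests in a single process. As before, this is only a sketch: handle_request( ) is a hypothetical stand-in for the script body, and %param replaces the CGI parameters.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the script body, using a lexical variable
# that is created afresh on every call.
sub handle_request {
    my (%param) = @_;
    my $topsecret = ($param{secret} // '') eq 'Muahaha' ? 1 : 0;
    return $topsecret ? 'top-secret data' : 'security alert';
}

my @responses = (
    handle_request(secret => 'wrong'),
    handle_request(secret => 'Muahaha'),
    handle_request(secret => 'wrong'),
);

print "$_\n" for @responses;
```

Because $topsecret is assigned unconditionally on every request, a successful authentication cannot carry over to the next visitor.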
Here is another example that won't work correctly
under Apache::Registry. This example presents a
simple search engine script:
use CGI;
my $q = CGI->new( );
print $q->header('text/plain');
my @data = read_data( );
my $pat = $q->param("keyword");
foreach (@data) {
    print if /$pat/o;
}
The example retrieves some data using read_data( )
(e.g., the lines of a text file), tries to match the keyword submitted
by a user against this data, and prints the matching lines. The
/o regular expression modifier is used to compile
the regular expression only once, to speed up the matches. Without
it, the regular expression may be recompiled on every pass through the
loop, that is, once for each element of the @data array.
Now consider that someone is using this script to search for
something inappropriate. Under Apache::Registry,
the pattern will be cached and won't be recompiled
in subsequent requests, meaning that the next person using this
script (running in the same process) may receive something quite
unexpected as a result. Oops.
The proper solution to this problem is discussed in Chapter 6, but Apache::PerlRun
provides an immediate workaround, since it resets the regular
expression cache before each request.
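The stale-pattern effect does not require a web server; any code that runs the same /o match twice in one process shows it. In the sketch below, search_once( ) and search_qr( ) are hypothetical names, and the qr// version is just one common way to compile a pattern once per call. It is not necessarily the solution presented in Chapter 6.

```perl
use strict;
use warnings;

# Buggy under persistence: /o freezes the pattern the first time this
# match is executed, so later calls ignore the new value of $pat.
sub search_once {
    my ($pat, @data) = @_;
    return grep { /$pat/o } @data;
}

# One possible alternative: compile the pattern once per call with qr//.
sub search_qr {
    my ($pat, @data) = @_;
    my $re = qr/$pat/;
    return grep { $_ =~ $re } @data;
}

my @data = ('apple pie', 'banana split', 'cherry tart');

my @first  = search_once('apple',  @data);
my @second = search_once('banana', @data);   # still searches for "apple"
my @fixed  = search_qr('banana', @data);

print "first:  @first\n";
print "second: @second\n";
print "fixed:  @fixed\n";
```

The second search asks for "banana" but returns the "apple" line, which is exactly what the next user of a cached script would experience.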
So why bother to keep your code clean? Why not use
Apache::PerlRun all the time? As we mentioned
earlier, the convenience provided by
Apache::PerlRun comes at the price of reduced
performance.
In Chapter 9, we show in detail how to
benchmark the code and server
configuration. Based on the results of the benchmark, you can tune
the service for the best performance. For now, let's
just show the benchmark of the short script in Example 1-6.
Example 1-6. readdir.pl
use strict;
use CGI ( );
use IO::Dir ( );
my $q = CGI->new;
print $q->header("text/plain");
my $dir = IO::Dir->new(".");
print join "\n", $dir->read;
The script loads two modules (CGI and
IO::Dir), prints the HTTP header, and prints the
contents of the current directory. If we compare the performance of
this script under mod_cgi, Apache::Registry, and
Apache::PerlRun, we get the following results:
Mode              Requests/sec
-------------------------------
Apache::Registry           473
Apache::PerlRun            289
mod_cgi                     10
Because the script does very little, the performance differences
between the three modes are very significant.
Apache::Registry thoroughly outperforms mod_cgi,
and you can see that Apache::PerlRun is much
faster than mod_cgi, although it still serves only about 60% as many
requests per second as Apache::Registry. The performance gap usually
shrinks a bit as more code is added, as the overhead of
fork( ) and code compilation becomes less
significant compared to execution times. But the benchmark results
won't change significantly.
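The relative speeds are easy to work out from the table. The short script below uses only the three figures quoted above; the %rps hash is our own naming, introduced purely for illustration.

```perl
use strict;
use warnings;

# Requests per second, copied from the benchmark table.
my %rps = (
    'Apache::Registry' => 473,
    'Apache::PerlRun'  => 289,
    'mod_cgi'          => 10,
);

# Speedup of each mode relative to mod_cgi, fastest first.
for my $mode (sort { $rps{$b} <=> $rps{$a} } keys %rps) {
    printf "%-17s %4d req/s  %5.1fx mod_cgi\n",
        $mode, $rps{$mode}, $rps{$mode} / $rps{mod_cgi};
}

my $registry_vs_perlrun = $rps{'Apache::Registry'} / $rps{'Apache::PerlRun'};
printf "Apache::Registry vs. Apache::PerlRun: %.2fx\n", $registry_vs_perlrun;
```

In this benchmark Apache::Registry handles roughly 47 times as many requests per second as mod_cgi, Apache::PerlRun roughly 29 times as many, and Apache::Registry about 1.6 times as many as Apache::PerlRun.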
Jumping ahead, if we convert the script in Example 1-6 into a mod_perl handler, we can reach 517
requests per second under the same conditions, which is a bit faster
than Apache::Registry. In Chapter 13, we discuss why running the code under the
Apache::Registry handler is a bit slower than
using a pure mod_perl content handler.
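To give a feel for what such a handler looks like, here is a sketch that can be run outside Apache. Several things in it are assumptions made for illustration: Sketch::ReadDir is a hypothetical module name, Sketch::MockRequest is a mock that imitates only the two request-object methods the handler calls, and the OK constant is defined locally instead of being imported from Apache::Constants. It is not the handler that was benchmarked.

```perl
use strict;
use warnings;

# Stand-in for the OK constant that Apache::Constants exports.
use constant OK => 0;

# Hypothetical mock of the request object that mod_perl passes to a handler.
{
    package Sketch::MockRequest;
    sub new { return bless { type => undef, body => '' }, shift }
    sub send_http_header { my ($self, $type) = @_; $self->{type} = $type; return }
    sub print { my $self = shift; $self->{body} .= join '', @_; return 1 }
}

# Hypothetical content handler doing the same work as readdir.pl.
{
    package Sketch::ReadDir;
    use IO::Dir ();

    sub handler {
        my ($r) = @_;
        $r->send_http_header('text/plain');
        my $dir = IO::Dir->new('.') or return 500;   # 500 = server error
        $r->print(join "\n", $dir->read);
        return main::OK();
    }
}

my $r      = Sketch::MockRequest->new;
my $status = Sketch::ReadDir::handler($r);

print "status: $status, type: $r->{type}\n";
print $r->{body}, "\n";
```

The handler receives the request object as its first argument and returns a status code, instead of printing to STDOUT as a CGI script does.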
It can easily be seen from this benchmark that
Apache::Registry is what you should use for your
scripts to get the most out of mod_perl. But
Apache::PerlRun is still quite useful for making
an easy transition to mod_perl. With
Apache::PerlRun, you can get a significant
performance improvement over mod_cgi with minimal effort.
Later, we will see that
Apache::Registry's caching
mechanism is implemented by compiling each script in its own
namespace. Apache::Registry builds a unique
package name using the script's name, the current
URI, and the current virtual host (if any).
Apache::Registry prepends a
package statement to your script, then compiles it
using Perl's eval function. In
Chapter 6, we will show exactly how this is done.
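Until then, the following simplified sketch conveys the general idea. It is not Apache::Registry's code: package_for_uri( ), compile_script( ), and the Sketch::ROOT prefix are hypothetical, only the URI is used to build the package name, and the escaping rules are our own simplification.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a legal package name from a URI.
sub package_for_uri {
    my ($uri) = @_;
    (my $name = $uri) =~ s{^/+}{};                            # drop leading slashes
    $name =~ s{([^A-Za-z0-9_/])}{sprintf '_%02x', ord $1}ge;  # escape odd characters
    $name =~ s{/+}{::}g;                                      # path separators become ::
    return "Sketch::ROOT::$name";
}

my %handler_for;    # URI => compiled code ref

# Wrap the script source in its own package and compile it with eval.
sub compile_script {
    my ($uri, $source) = @_;
    my $package = package_for_uri($uri);
    eval "package $package; sub handler {\n$source\n}; 1"
        or die "Cannot compile $uri: $@";
    return $handler_for{$uri} = $package->can('handler');
}

# Both scripts use a global named $count, but each gets its own copy.
my $body   = 'our $count; return ++$count;';
my $first  = compile_script('/perl/hello.pl', $body);
my $second = compile_script('/perl/other.pl', $body);

my $package_name = package_for_uri('/perl/hello.pl');
my @counts       = ($first->(), $first->(), $second->());

print "$package_name\n";
print "@counts\n";
```

Because each script is compiled into a package of its own, two scripts that happen to use the same global variable or subroutine names do not interfere with each other.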
What happens if you modify the script's file after
it has been compiled and cached? Apache::Registry
checks the file's last-modification time, and if the
file has changed since the last compile, it is reloaded and
recompiled.
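The reload check can be sketched in a few lines of plain Perl. This is an illustration under our own assumptions, not the module's implementation: load_script( ), write_script( ), and the %cache layout are hypothetical, and the modification time is set explicitly with utime so that the demonstration does not depend on the clock.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %cache;              # filename => { mtime => ..., code => ... }
my $compile_count = 0;

# Hypothetical loader: recompile only if the file's mtime has changed.
sub load_script {
    my ($file) = @_;
    my $mtime = (stat $file)[9];
    die "Cannot stat $file: $!" unless defined $mtime;

    my $entry = $cache{$file};
    if (!$entry || $entry->{mtime} != $mtime) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        my $source = do { local $/; <$fh> };
        close $fh;
        my $code = eval "sub { $source }" or die "Cannot compile $file: $@";
        $entry = $cache{$file} = { mtime => $mtime, code => $code };
        $compile_count++;
    }
    return $entry->{code};
}

# Hypothetical helper: write the script and force its modification time.
sub write_script {
    my ($file, $source, $mtime) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} $source;
    close $fh or die "Cannot close $file: $!";
    utime $mtime, $mtime, $file or die "Cannot set mtime on $file: $!";
    return;
}

my (undef, $file) = tempfile(UNLINK => 1);
my $now = time;

write_script($file, 'return "version 1";', $now);
my $v1       = load_script($file)->();
my $v1_again = load_script($file)->();    # served from the cache

write_script($file, 'return "version 2";', $now + 10);
my $v2 = load_script($file)->();          # file changed, so it is recompiled

print "$v1, $v1_again, $v2 (compiled $compile_count times)\n";
```

The unchanged file is compiled once and reused; only the edited file triggers a second compilation.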
In case of a compilation or execution error, the error is logged to
the server's error log, and a server error is
returned to the client.
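The same pattern of trapping errors, logging them, and returning a status can be sketched with eval. Everything here is a stand-in chosen for illustration: run_script( ) is a hypothetical function, @error_log replaces the server's error log, and the numeric statuses 200 and 500 are used directly.

```perl
use strict;
use warnings;

my @error_log;    # stand-in for the server's error log

# Hypothetical runner: trap both compilation and execution errors.
sub run_script {
    my ($name, $source) = @_;

    my $code = eval "sub { $source }";
    unless ($code) {
        push @error_log, "$name: compilation failed: $@";
        return 500;
    }

    unless (eval { $code->(); 1 }) {
        push @error_log, "$name: execution failed: $@";
        return 500;
    }

    return 200;
}

my @statuses = (
    run_script('ok.pl',      'return 1;'),
    run_script('syntax.pl',  'my $x = ;'),
    run_script('runtime.pl', 'die "database is down\n";'),
);

print "@statuses\n";
print $_ for @error_log;
```

The client sees only the status code, while the details of each failure end up in the log.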