CGI Programming Guide - [Chapter 9] 9.3 Relational Databases

9.3 Relational Databases

So far in this chapter, we have created pretty simple gateways by piping input to other programs. As long as we create the proper stream of data, it takes very little coding on our part to make these work. But the most interesting uses for gateways involve large, complex sets of data organized into structured databases. Piping a stream does not work for these; we need to use a language that the database understands, such as SQL. This is where we turn in this section.

By combining the power of relational database management systems (RDBMS) with the World Wide Web, one can produce impressive results. Put yourself in the shoes of a doctor who wants to establish an "interactive, virtual infirmary" on the Web, where users (patients) can simply enter their symptoms, and the CGI would return with a diagnosis. The doctor has a large database that contains extensive data, including three-dimensional graphics and multimedia, on the various diseases and ailments that affect humans. All that needs to be done is to write a CGI program that decodes the input, assembles a query of some sort, sends it to the database, processes the return data from the database, and creates a hypertext document (with embedded multimedia) for the user. These types of applications are possible by combining databases with the Web.

Before we go any further, let's look at SQL, the query language used to communicate with various RDBMS systems.

Introduction to SQL

SQL--pronounced "S Q L" and not "Sequel"--is a standardized sub-language to access and manipulate data within a relational database system. The original SQL prototype defined a "structured" language, thus the term Structured Query Language, but this is no longer true of the current SQL-92 standard. SQL was designed specifically to be used in conjunction with a primary high-level programming language. In fact, most of the basic constructs that you would find in a high-level language, such as loops and conditionals, do not exist in SQL.

Most of the commercial relational database systems in use today, such as Oracle and Sybase, support SQL. As a result, the code to access and manipulate a database can be ported easily and quickly to any platform. Now, let's look at SQL.

Creating a database

We will start out by discussing how a database is created. Suppose you have the following information:

Player

Years

Points

Rebounds

Assists

Championships

Larry Bird

12

28

10

7

3

Michael Jordan

10

33

6

5

3

Magic Johnson

12

22

7

12

5

John Stockton

10

16

3

13

0

Karl Malone

10

25

11

3

0

Shaquille O'Neal

2

29

12

3

0

The SQL code to create this database is:

create table Player_Info
(
    Player            character varying (30) not null,
    Years             integer,
    Points            integer,
    Rebounds          integer,
    Assists           integer,
    Championships     integer
);

The create table command creates a database, or a table. The Player field is stored as a non-null varying character string. In other words, if the data in the field is less than 30 characters, the database will not pad it with spaces, as it would for a regular character data type. Also, the database forces the user to enter a value for the Player field; it cannot be empty.

The rest of the fields are defined to be integers. Some of the other valid data types include date, time, smallint, numeric, and decimal. The numeric and decimal data types allow you to specify floating-point values. For example, if you want a five-digit floating-point number with a precision to the hundredth place, you can specify decimal (5, 2).

Accessing data

Let's say you want a list of the entire database. You can use the following code:

select * 
    from Player_Info;

The select command retrieves specific information from the database. In this case, all columns are selected from the Player_Info database. The "*" should be used with great caution, especially on large databases, as you might inadvertently extract a lot of information. Notice that we are dealing only with columns, and not rows. For example, if you wanted to list all the players in the database, you could do this:

select Player
    from Player_Info;

Now, what if you want to list all the players who scored more than 25 points? Here is the code needed to accomplish the task:

select *
    from Player_Info
    where (Points > 25);

This would list all the columns for the players who scored more than 25 points:

Player

Years

Points

Rebounds

Assists

Championships

Larry Bird

12

28

10

7

3

Michael Jordan

10

33

6

5

3

Shaquille O'Neal

2

29

12

3

0

But, say you wanted to list just the Player and Points columns:

select Player, Points
    from Player_Info
    where (Points > 25);

Here is an example that returns all the players who scored more than 25 points and won a championship:

select Player, Points, Championships
    from Player_Info
    where (Points > 25) and
          (Championships > 0);

The output of this SQL statement would be:

Player

Points

Championships

Larry Bird

28

3

Michael Jordan

33

3

You could also use wildcards in a select command. For example, the following will return all the players that have a last name of "Johnson":

select *
    from Player_Info
    where Player LIKE '% Johnson';

This will match a string ending with "Johnson".

Updating a database

Let's suppose that Shaquille O'Neal won a championship. We need to update our database to reflect this. This is how it can be done:

update Player_Info
    set Championships = 1
    where Player = 'Shaquille O'Neal';

SQL also has methods to modify entire columns. After every basketball season, we need to increment the Years column by one:

update Player_Info
    set Years = (Years + 1);

Now, let's discuss insertion into a table. Say we need to add another player to the database. We could do it this way:

insert into Player_Info
    values
    ('Hakeem Olajuwon', 10, 27, 11, 4, 2);

As you can see, it is very simple to insert an element into the table. However, if you have a database with a large number of columns, and you want to insert a row into the table, you can manually specify the columns:

insert into Player_Info
    (Player, Years, Points, Rebounds, Assists, Championships)
    values
    ('Hakeem Olajuwon', 10, 27, 11, 4, 2);

When used in this context, the order of the fields does not necessarily have to match the order in the database, as long as the fields and the values specified match each other.

Deleting information

If you wanted to delete "John Stockton" from the database, you could do this:

delete from Player_Info
    where Player = 'John Stockton';

SQL also allows you remove entire columns. You should be very careful when attempting such a move. Instead, it is much safer to create another database, and copy only the columns you want to the new database. Here is how you would remove a column:

alter table Player_Info
    drop column Championships;

If you want to delete all the records in the table, the following statement is used:

delete from Player_Info;

And finally, the drop table command deletes the entire database:

drop table Player_Info;

(For more information on SQL, see the reference guide on SQL-92 at https://sunsite.doc.ic.ac.uk/packages/perl/db/refinfo/sql2/sql1992.txt).

Sprite

Never heard of Sprite? That is because I developed it for this book. It is a Perl 5 module that allows you to manipulate text-delimited databases (all data and delimiters are text) using a small but important subset of SQL-92. I offer Sprite so you can create your own databases and access them in CGI scripts, even if you do not have a database product like Sybase or Oracle. See Appendix E, Applications, Modules, Utilities, and Documentation for information on where you can get Sprite.

If you do have a commercial product, you can use techniques like those shown here to issue SQL commands. We will use some Perl interfaces to Oracle and Sybase later in the chapter. Let's look at an example.

Employee database

Let's assume that you have a text file that contains a list of your company's employees, as well as some information about them:

Last,First,Job_Title,Department,EMail,Phone
Supra,John,System Operator,Systems,jsupra,(617) 555-1578
Painton,Todd,Network Engineer,Systems,tpainton,(617) 555-6530
Martin,Robert,Sales Representative,Sales,martinr,(617) 555-7406
Levine,Julia,Administrative Assistant,Administration,julia,(617) 555-3056
Keenan,Jeff,Manager,Software,jeffk,(617) 555-7769
Nets,Laurie,Group Leader,Development,lnets,(617) 555-9962

The first line of the file contains the field names (delimited by commas). This is all you need to use the database. Unlike other databases that store the data in a unique (and strange) format, Sprite operates on plain text.

Here is the form that will act as the front end to the database:

<HTML>
<HEAD><TITLE>CGI Corporation</TITLE></HEAD>
<BODY>
<H1>Employee Database</H1>
Welcome to the CGI Corporations's Employee Search Form. You can use
this to find information about one of our employee. 
Enter as much information as possible to narrow down the search. 
<HR>
<FORM ACTION="/cgi-bin/db_phone.pl" METHOD="POST">
<PRE>
Last Name:      <INPUT TYPE="text" NAME="Last" SIZE=40>
First Name:     <INPUT TYPE="text" NAME="First" SIZE=40>
Job Title:      <INPUT TYPE="text" NAME="Job_Title" SIZE=40>
Department:     <INPUT TYPE="text" NAME="Department" SIZE=40>
EMail Address:  <INPUT TYPE="text" NAME="EMail" SIZE=40>
Phone Number:   <INPUT TYPE="text" NAME="Phone" SIZE=40>
</PRE>
<INPUT TYPE="submit" VALUE="Submit the search">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY></HTML>

The form is shown in Figure 9.5.

Figure 9.5: Phone form

[Graphic: Figure 9-5]

Now, let's build the CGI application that will decode the form information, process the user's query, and create a document displaying the results, as seen in Figure 9.6.

Figure 9.6: CGI gateway to database

[Graphic: Figure 9-6]

The program begins:

#!/usr/local/bin/perl5
use Sprite;
$webmaster = "shishir\@bu\.edu";
$query = undef;

The use command instructs Perl to load the module (or extension). You can load more than one module at a time. For example, if we wanted to create dynamic GIF images from the data contained in a database, we would have to load both the GD and the Sprite modules:

use GD;
use Sprite;

To continue with the program:

&parse_form_data(*FORM);
$fields = '(Last|First|Job_Title|Department|EMail|Phone)';

The form data is decoded. The parse_form_data subroutine used in this program is the one we've been using throughout this book. The fields variable contains a list of all the fields in the form. You might wonder why we would need to have such a list when then the parse_form_data subroutine decodes all the fields in the form. The reason for this is to make sure that only valid fields are processed, as the search query is dynamically created from the user-specified information. Remember, forms are very insecure; a cracker can download a form, edit it, add an extra field, and submit the form to the program. If the program is not carefully designed, we could have a major problem!

foreach $key (keys %FORM) {
    if ( ($key !~ /\b$fields\b/o) || ($FORM{$key} =~ /[^\w\-\(\) ]/) ) {
        &return_error (500, "CGI Corporation Employee Database Error",
                            "Invalid Information in Form.");

The foreach construct iterates through all of the fields stored in the FORM associative array, and checks for two things, represented by the two expressions separated by the || operator. First, the field is checked against the list stored in the fields variable for validity. Second, it makes sure the information entered by the user is constrained to the following characters: A-Z, a-z, 0-9, (, ), and the space character. This ensures that no shell metacharacters are passed.

    } else {
        $FORM{$key} =~ s/(\W)/\\$1/g;
        if ($FORM{$key}) {
            $query = join (" and ", $query, "($key =~ /$FORM{$key}/i)");
        }
    }
}

The conditional is executed if the field is valid. It checks to see if any information was entered in the field. If there is information, a query is built by joining each field and value with "and". You would normally have to escape the "/" character if you are using the regular expression search in Sprite. In this case, you don't need to because the user cannot enter "/" in any search field.

Once the loop terminates, a query might look something like the following:

and (Last =~ /Martin/i) and (First =~ /Robert/i) and (Department =~ /Sales/i)

The reason the query has an "and" at the beginning has to do with the way in which the query was created. If you look back at the join command, you can see that the information stored in the query variable is concatenated to a combination of a key and a value with "and", and is finally stored in query. Remember, $query will be undefined the first time through the loop, and thus will end up with an "and" at the beginning. Let's remove the unwanted initial string.

if ($query) {
    $query =~ s/^ and //;
} else {
    &return_error (500, "CGI Corporation Employee Database Error",
                        "No query was entered.");
}

If the user failed to enter any information, an error message is displayed. Otherwise, the "and" at the beginning of the query is removed to create a normal query:

(Last =~ /Martin/i) and (First =~ /Robert/i) and (Department =~ /Sales/i)

Note that Sprite allows you to use regular expression operators to search for data. If the user entered "M" in the last name field, this program instructs the database to return all records that contain the letter "M" (or "m", as the "i" flag indicates case insensitivity). There are cases when this is not desirable. In such cases, you would need to modify the way the query is joined:

$FORM{$key} = s/(['"])/\\$1/g;
$query = join (" and ", $query, "($key = '$FORM{$key}')");

This will return only exact matches. Since the value in the field is a string, you need to enclose $FORM{$key} in single quotes and escape all other quotes (or Sprite will return an error).

$rdb = new Sprite ();
$rdb->set_delimiter ("Read", ",");

This is some object-oriented Perl syntax that you saw in Chapter 6, Hypermedia Documents. A new database object is created, and the reference to it is stored in the variable rdb. The set_delimiter function sets the delimiter for the data stored in the database. The set_delimiter function takes two arguments. In the first, we specify that we are reading from the database. In the second, we specify the comma as the field delimiter (so we have to know what the data file looks like).

@data = $rdb->sql (<<End_of_Query);
    select * from phone.db
    where $query
End_of_Query

The query is passed to Sprite with the sql function. In this case, a here document is used to pass the query (so it looks readable to humans). You could just as easily do this:

@data = $rdb->sql ("select * from phone.db where $query");

Sprite returns the matched records as an array, with all the fields in each record joined by the null character "\0". However, the first element of the array is not a record, but a flag indicating success or failure. For instance, if you passed the following query:

select * from phone.db where (Department =~ /Systems/i)

the array would look like this:

$data[0] = 1
$data[1] = Supra\0John\0System Operator\0Systems\0jsupra\0(617) 555-1578
$data[2] = Painton\0Todd\0Network Engineer\0Systems\0tpainton\0(617) 555-6530

A value of 1 indicates success, while a 0 indicates failure.

$status = shift (@data);
$no_elements = scalar (@data);

The shift statement removes the first element of the array and stores it in the variable status. Then scalar is used to determine the number of elements in the array. You can also evaluate the array in a scalar context, without using the scalar command:

$no_elements = @data;

This is the same as using the scalar command, but different from:

$last_element = $#data;

This returns the index of the last element of the array (so in most cases, it would have a value one less than the number of elements, as arrays are zero-based).

if (!$status) {
    &return_error (500, "CGI Corporation Employee Database Error",
                        "Sprite Database Error!");
} elsif (!$no_elements) {
    &return_error (500, "CGI Corporation Employee Database Error",
                        "The record you specified does not exist.");

Two things are checked: the error status and the number of records returned by Sprite. If either the status is 0 or no records were returned, an error is displayed.

} else {
    print <<End_of_HTML;
Content-type: text/html
<HTML>
<HEAD><TITLE>CGI Corporation Employee Directory</TITLE></HEAD>
<BODY>
<H1>CGI Corporation Employee Directory</H1>
<HR><PRE>
End_of_HTML

This code is executed if valid records were returned by Sprite. We are now formatting the output for display. One of Perl's original attractions was the report-generating features it offered; Larry Wall even said that the "rl" in Perl stood for "Reporting Language." We will use some of those powerful features here. What we have to do is create a format and assign it to the $~ variable. Then, whenever we issue a write statement, Perl will print the data according to the format.

    $~ = "HEADING";
    write;

The "HEADING" format is selected to display header information.

    $~ = "EACH_ENTRY";
    foreach (@data) {
         s/([^\w\s\0])/sprintf ("&#%d;", ord ($1))/ge;
        ($last, $first, $job, $department, $email, $phone) =
            split (/\0/, $_, 6);
        write;
    }
    print "</PRE>", "\n";
    print "<HR>";
        print "</BODY></HTML>", "\n";
}

The "EACH_ENTRY" format is selected to display each record from the phone database. The foreach loop iterates through each record, splits it into the different fields, and issues a write to display the data. Note that no variable was supplied as part of the foreach loop. Normally, we would have something like this:

foreach $record (@data) {
    .
    .
    .
}

Since we did not supply a variable, Perl automatically places it in its default variable: $_.

$rdb->close ();
exit (0);

Finally, the database is closed, and the script terminates. Now, let's look at the two format statements:

format HEADING = 
Last       First      Job Title      Department   EMail       Phone
----       -----      ---------      ----------   -----       -----
.

This is a simple one! It is used as a header to display all of the fields. The period on a line by itself terminates the format.

format EACH_ENTRY = 
@<<<<<<<<  @<<<<<<<<  @<<<<<<<<<<<<  @<<<<<<<<<<  @<<<<<<<<<  @<<<<<<<<<<<<<
$last,     $first,    $job,          $department, $email,     $phone
.

This one is a little more complex. The "@<<<<<<<<" indicates an eight-character, left-justified field holder. The value stored in the variable, which is listed below a field holder, is displayed each time a write is called. This will allow for a neat and clean display, as shown in Figure 9.7.

Figure 9.7: Phone gateway results

[Graphic: Figure 9-7]

Student database

A CGI program is not limited to just reading information from a database; it can also manipulate the information. Here is a CGI program that can read, modify, and delete a database consisting of student information. Before we go any further, let's look at the supporting HTML documents:

<HTML>
<HEAD><TITLE>Welcome to CGI Educational Center</TITLE></HEAD>
<BODY>
<H1>Student Database</H1>
You can use a combination of forms and CGI to access and modify information in
the student database. Please choose one of the following options:
<HR>
<A HREF="/add.html">Add New Student</A><BR>
<A HREF="/modify.html">Modify Student Information</A><BR>
<A HREF="/view.html">View Student Information</A><BR>
<A HREF="/delete.html">Delete Student</A><BR>
<HR>
</BODY>
</HTML>

This is the initial document containing links to the various forms that allow the user to view, add, modify, and delete information from the student database.

<HTML>
<HEAD><TITLE>Welcome to CGI Educational Center</TITLE></HEAD>
<BODY>
<H1>Add New Student</H1>
<HR>
<FORM ACTION="/cgi-bin/student.pl?add" METHOD="POST">
<PRE>
Student Name:       <INPUT TYPE="text" NAME="Student" SIZE=40>
Year of Graduation: <INPUT TYPE="text" NAME="YOG" SIZE=4 MAXLENGTH=4>
Address (Mailing Information):
<TEXTAREA NAME="Address" ROWS=4 COLS=40></TEXTAREA>
</PRE>
<INPUT TYPE="submit" VALUE="Add New Student">
<INPUT TYPE="reset"  VALUE="Clear the Information">
</FORM>
<HR>
</BODY></HTML>

This is the form used to add information into the database. When the user submits this form, a query of "add" is sent to the CGI program.

<HTML>
<HEAD><TITLE>Welcome to CGI Educational Center</TITLE></HEAD>
<BODY>
<H1>Modify Student Information</H1>
<HR>
<FORM ACTION="/cgi-bin/student.pl?modify_form" METHOD="POST">
Student Name: <INPUT TYPE="text" NAME="Student" SIZE=40>
<P>
<INPUT TYPE="submit" VALUE="Modify Student Information">
<INPUT TYPE="reset"  VALUE="Clear the Information">
</FORM>
<HR>
</BODY>
</HTML>

This form allows the user to modify information for a particular student. When this form is submitted, the program builds and displays another form dynamically. Here is the form used to view the results of a specified query.

<HTML>
<HEAD><TITLE>Welcome to CGI Educational Center</TITLE></HEAD>
<BODY>
<H1>View Student Information</H1>
<HR>
<FORM ACTION="/cgi-bin/student.pl?view" METHOD="POST">
Student Name: <INPUT TYPE="text" NAME="Student" SIZE=40>
<P>
Year of Graduation:
<INPUT TYPE="radio" NAME="Sign" VALUE="greater"> Greater Than
<INPUT TYPE="radio" NAME="Sign" VALUE="equal" CHECKED> Equal To
<INPUT TYPE="radio" NAME="Sign" VALUE="less"> Less Than
<INPUT TYPE="text" NAME="YOG" SIZE=4 MAXLENGTH=4>
<P>
Address Information: <INPUT TYPE="text" NAME="Address" SIZE=40>
<P>
<INPUT TYPE="submit" VALUE="View Student Information">
<INPUT TYPE="reset"  VALUE="Clear the Information">
</FORM>
<HR>
</BODY>
</HTML>

This form is used to view records that match certain criteria. The user can select records based on a conditional year of graduation (either greater than, less than, or equal to a certain year). We could have just as easily allowed mathematical operators (>, <, and =) to be entered, but this can be a potential security hole, as some of them have a special meaning to the shell (i.e., shell metacharacters). It is far better and safer to use strings like "equal", "greater", and "less", and let the CGI program convert them to the appropriate operators when creating a query.

<HTML>
<HEAD><TITLE>Welcome to CGI Educational Center</TITLE></HEAD>
<BODY>
<H1>Delete Student</H1>
<HR>
<FORM ACTION="/cgi-bin/student.pl?delete" METHOD="POST">
<PRE>
Student Name:       <INPUT TYPE="text" NAME="Student" SIZE=40>
Year of Graduation: <INPUT TYPE="text" NAME="YOG" SIZE=4 MAXLENGTH=4>
</PRE>
<INPUT TYPE="submit" VALUE="Delete Student">
<INPUT TYPE="reset"  VALUE="Clear the Information">
</FORM>
<HR>
</BODY>
</HTML>

A user can use this form to delete information from the database. In this case, only the student name and year of graduation fields are presented. Records for an entire class can be deleted by specifying the year of graduation, and leaving the name field empty. You should not normally allow such a dangerous option! However, it is shown here to illustrate the power of databases and the Web.

Now, let's look at the CGI program that works with these forms.

#!/usr/local/bin/perl5
use Sprite;
$query_string = $ENV{'QUERY_STRING'};
$script = $ENV{'SCRIPT_NAME'};
$request_method = $ENV{'REQUEST_METHOD'};
$webmaster = "shishir\@bu\.edu";
$database = "/home/shishir/student.db";
$main_form = "/student.html";
$commands = '(add|modify_form|modify|view|delete)';

The Sprite module is loaded. The full path to the student database and the relative path to the main HTML document (the one that contains links to the other forms) are stored in the database, and main_form variables, respectively. Finally, commands contains a list of the valid queries that forms can pass to this program. If you look carefully at the list, you will notice that none of the forms listed above passes the "modify" query. The form that passes this is dynamically created by this program, as you will later see.

$delimiter = "::";
$error = "CGI Student Database Error";

Fields in the student database are delimited by the "::" characters.

if ($query_string =~ /^\b$commands\b$/) {
    &parse_form_data (*DB);

If the query is valid, the POST form data is decoded and placed in the DB associative array. (As always, the parse_form_data subroutine used in this program is the one we've been using throughout all our examples.)

    &check_all_fields ();
    &check_database ();

The check_all_fields subroutine iterates through the DB associative array to ensure that there are no shell metacharacters. The check_database subroutine checks to see if the student database exists. If not, a new one is created.

    $rdb = new Sprite ();
    $rdb->set_delimiter ("Read",  $delimiter);
    $rdb->set_delimiter ("Write", $delimiter);

A new database object is created. The set_delimiter function sets the delimiter to be used when reading from and writing to a database.

    $command_status = &$query_string ();

This is a construct that you may not have seen before. The subroutine corresponding to the value stored in query_string is called. It is equivalent to saying:

if ($query_string eq "add") {
    $command_status = &add ();
} elsif ($query_string eq "modify_form") {
    $command_status = &modify_form ();
} elsif ($query_string eq "modify") {
    $command_status = &modify ();
} elsif ($query_string eq "view") {
    $command_status = &view ();
} elsif ($query_string eq "delete") {
    $command_status = &delete ();
}

How convenient! Now, let's continue on with the program.

    if ($command_status) {
        $rdb->close ($database);
        print "Location: ", $main_form, "\n\n";
    } else {
        $rdb->close ();
    }

Depending on the status returned from one of the subroutines above, a server redirect is done with the Location: header. There is a subtle difference between the two $rdb->close subroutines. If you specify a database as part of the close subroutine, the modifications performed on that database are saved. Otherwise, the changes are discarded.

} else {
    &return_error (500, $error,
                "Invalid command passed through QUERY_STRING.");
}
exit (0);

If an invalid query was passed to this program, an error is returned.

The following subroutine checks to see if the database exists. If it does not, a new database is created, and a header line containing the field names, delimited by "::", is output.

sub check_database
{
    local ($exclusive_lock, $unlock, $header);
    $exclusive_lock = 2;
    $unlock = 8;
    if (! (-e $database) ) {
        if ( open (DATABASE, ">" . $database) ) {
            flock (DATABASE, $exclusive_lock);
            $header = join ($delimiter, "Student", "YOG", "Address");
            print DATABASE $header, "\n";
            flock (DATABASE, $unlock);
            close (DATABASE);
        } else {
            &return_error (500, $error, "Cannot create new student database.");
        }
    }
}

The check_all_fields subroutine makes sure the form elements do not contain shell meta-characters:

sub check_all_fields
{
    local ($key);
    foreach $key (keys %DB) {
        if ($DB{$key} =~ /[`\!;\\\*\\$[amp   ][lt    ][gt    ]]/) {\n              [amp   ]\|return_error (500, $error,
                                  "Invalid characters in the [$key] field.");
        }
    }
}

The subroutine iterates through the DB associative array checking to make sure that none of the elements contains any dangerous shell metacharacters. If any are found, an error message is displayed.

sub build_check_condition
{
    local ($columns) = @_;
    local ($all_fields, $loop, $key, $sign, $sql_condition);

This is a very useful subroutine that dynamically builds a query. It expects a string in the following format:

"Student,=,Address,=~"

From this, the following query is constructed (assuming that the user entered "Ed Surge" in the student field, and "Elm Street" in the address field):

(Student = 'Ed Surge') and (Address =~ 'Elm Street')

(You might have noticed that the regular expression is not the usual format ($string =~ /abc/). You are correct! However, Perl accepts this format as well.)

    @all_fields = split (/,/, $columns);

The all_fields array consists of successive elements of the field name, followed by the operator that should be used to search that field. In this example, the array would look like this:

$all_fields[0] = "Student";
$all_fields[1] = "=";
$all_fields[2] = "Address";
$all_fields[3] = "=~";

Now, let's look at the loop that iterates through this array to build the query.

    for ($loop=0; $loop <= $#all_fields; $loop = $loop + 2) {
        $key  = $all_fields[$loop];
        $sign = $all_fields[$loop + 1];

The key and the sign variables consist of the field name and the operator, respectively.

        if ($DB{$key}) {
            $DB{$key} =~ s/([\W])/\\$1/g;
            $sql_condition = join (" and ", $sql_condition,
                    "( $key $sign '$DB{$key}' )", );
        }
    }

The query is built in nearly the same manner as in the preceding example, except that the operator can be different for each field.

    if ($sql_condition) {
        $sql_condition =~ s/^ and //;
        return ($sql_condition);
    } else {
        &return_error (500, $error, "No query was entered.");
    }
}

If the user did not enter any information into the fields, an error message is displayed. Otherwise, the dynamically created query is returned (to the subroutine that called).

This is a very simple subroutine (if you can call it that) that returns an error.

sub database_error
{
    &return_error (500, $error,
        "Sprite database error. Please check the log file.");
}

The only reason this statement was placed in a subroutine is for convenience. For example, it is much shorter and quicker to say:

$rdb->update (<<Update_Command) || &database_error ();

than to say:

$rdb->update (<<Update_Command) || &return_error (500, $error,
        "Sprite database error. Please check the log file.");

This is especially true if the same error needs to be returned for various problems.

The check_select_command subroutine is generally used after an SQL "select" statement. It checks the first element of the returned data, as well as the number of records returned, and displays an error if either of these values equal 0. Otherwise, a status of 1 is returned.

sub check_select_command
{
    local ($value, $no_elements) = @_;
    if (!$value) {
        &database_error ();
    } elsif (!$no_elements) {
        &return_error (500, $error, 
            "The record you specified does not exist.");
    } else {
        return (1);
    }
}

The add subroutine inserts a record into the database.

sub add
{
    $DB{'Address'} =~ s/\n/<BR>/g;
    $DB{'Address'} =~ s/(['"])/\\$1/g;
    $DB{'Student'} =~ s/(['"])/\\$1/g;
    $rdb->sql (<<End_of_Insert) || &database_error ();
        
insert into $database
    (Student, YOG, Address)
values
    ('$DB{'Student'}', '$DB{'YOG'}', '$DB{'Address'}')
End_of_Insert
    return (1);
}

All newline characters are converted to "<BR>" and all single and double quotes are escaped. Remember, all records in a text-delimited database are delimited by newline characters! This ensures that the data will be correctly displayed by the browser when the user decides to view it.

The format for the "insert" SQL statement is the same as described in the SQL primer earlier. If the record could not be inserted into the database, an error is returned. Otherwise, a status of 1 is returned. This instructs the script to save the database and perform a server redirect to display the main HTML document.

Now for the most complicated action--modifying a row.

sub modify_form
{
    local (@info, $modify_status, $no_elements, $status);
    $DB{'Student'} =~ s/(['"])/\\$1/g;
    @info = $rdb->sql (<<End_of_Select);
select * from $database
where (Student = '$DB{'Student'}')
End_of_Select
    $status = shift (@info);
    $no_elements = scalar (@info);
    $modify_status = &check_select_command ($status, $no_elements);
    if ($modify_status) {
        &display_modify_form ($info[0]);
    }
    return (0);
}

This subroutine performs two actions. First, it uses the student's name, as specified in the modify form (shown with the other forms at the beginning of this section), to retrieve the record for that student. The check_select_command subroutine ensures that data was returned by the database. Second, display_modify_form is called (with the first record in the array as an argument) to display a new form that contains all of the information about the student. The user can then modify the data in the form and submit it.

A status of 0 is returned by this subroutine. As a result, the database is not saved (which is what we want, since it was not modified), and no server redirection is performed.

The display_modify_form subroutine returns a form for changing a student's record.

sub display_modify_form
{
    local ($fields) = @_;
    local ($student, $yog, $address);
    ($student, $yog, $address) = split (/\0/, $fields);
     $address =~ s/<BR>/\n/g;
     $student = &escape_html ($student);
     $yog = &escape_html ($yog);

The record that is passed to this subroutine by modify_form is split on the "\0" delimiter, and the "<BR>" characters are converted back to newlines. In addition, we call the escape_html subroutine to "escape" characters that have a special significance to the browser, such as the double quote, "<", ">", and "&". We perform these steps so that the information is displayed properly.

    print <<End_of_Modify_Form;
Content-type: text/html
<HTML>
<HEAD><TITLE>CGI Educational Center</TITLE></HEAD>
<BODY>
<H1>Modify Student Information</H1>
<HR>
<B>Student Name: $student</B>
<P>
<FORM ACTION="$script?modify" METHOD="POST">
<INPUT TYPE="hidden" NAME="Student" VALUE="$student">
Year of Graduation:
<INPUT TYPE="text" NAME="YOG" SIZE=4 MAXLENGTH=4 VALUE="$yog">
<P>
Address (Mailing Information):
<TEXTAREA NAME="Address" ROWS=4 COLS=40>
$address
</TEXTAREA>
<P>
<INPUT TYPE="submit" VALUE="Modify Record For: $student">
<INPUT TYPE="reset"  VALUE="Clear the Information">
</FORM>
<HR>
</BODY>
</HTML>
End_of_Modify_Form
}

The form containing the information for the specified student is output. The user can now modify this form. We use $student twice: once to remind the user which student was chosen, and once to pass the name back to this CGI program so it modifies the right row. The form is shown in Figure 9.8.

Figure 9.8: Modify form

[Graphic: Figure 9-8]

The escape_html subroutine escapes certain characters so that they are displayed correctly by the browser.

sub escape_html
{
    local ($string) = @_;
    local (%html_chars, $html_string);
    
    %html_chars = ('&', '&amp;',
                   '>', '&gt;',
                   '<', '&lt;',
                   '"', '&quot;');
    $html_string = join ("", keys %html_chars);
    $string =~ s/([$html_string])/$html_chars{$1}/go;
    return ($string);
}

The view subroutine shows a student's current record.

sub view
{
    local ( $fields, $query, @students, $view_status, $status, $no_elements);
    $fields = 'Student,=,Address,=~';
    if ($DB{'YOG'}) {
        if ($DB{'Sign'} eq 'greater') {
            $DB{'Sign'} = '>';
        } elsif ($DB{'Sign'} eq 'less') {
            $DB{'Sign'} = '<';
        } else {
            $DB{'Sign'} = '=';
        }
        $fields = join (",", $fields, 'YOG', $DB{'Sign'});
    }
    $query = &build_check_condition ($fields);

If the user entered information into the year of graduation field, the search operator is determined. This is then appended to the value stored in the fields variable. The build_check_condition subroutine is called to dynamically construct the search query.

    @students = $rdb->sql (<<End_of_Display);
select * from $database
    where $query
End_of_Display
    $status = shift (@students);
    $no_elements = scalar (@students);
    $view_status = &check_select_command ($status, $no_elements);

The query is passed to the select command. The information returned by the database is checked for possible errors. If there are no errors, view_status contains the value of 1.

    if ($view_status) {
        &display_results ("View Students", *students);
    }
    return (0);
}

If the data returned by the database is valid, the display_results subroutine is called to display the search results. The two arguments passed to the subroutine are the header for the HTML document and the reference to the array that contains the results.

sub display_results
{
    local ($title, *data) = @_;
    local ($student, $yog, $address);
    print "Content-type: text/html", "\n";
    print "Pragma: no-cache", "\n\n";
        print "<HTML>", "\n";
    print "<HEAD><TITLE>CGI Educational Center</TITLE></HEAD>";
        print "<BODY>", "\n";
    print "<H1>", $title, "</H1>";
    print "<HR>";

The Content-type: and Pragma: MIME headers are output. We do not want the browser to cache the page containing the results. As a result, the displayed data reflects the true status of the database.

    foreach (@data) {
        s/([^\w\s\0])/sprintf ("&#%d;", ord ($1))/ge;
        ($student, $yog, $address) = split ("\0", $_, 3);
        $student = "NULL"                      if (!$student);
        $yog     = "Unknown graduation date"   if (!$yog);
        $address = "No address specified"      if (!$address);

If any of the fields for a record are null, certain default values are used, so as not to display empty fields.

        $address =~ s/&#60;BR&#62;/<BR>/g;
        print "<BR>", "\n";
        print "<B>", $student, "</B> ", "($yog)", "<BR>", "\n";
        print $address, "<BR>", "\n";
    }
    print "<HR>", "\n";
        print "</BODY></HTML>", "\n";
}

The foreach loop iterates through the matched records, and displays them.

The delete subroutine removes records from the database.

sub delete
{
    local ($fields, $query);
    $fields = 'Student,=,YOG,=';
    $query = &build_check_condition ($fields);
    $rdb->sql (<<End_of_Delete) || &database_error ();
delete from $database
    where $query
End_of_Delete
    return (1);
}

Multiple records can be deleted by leaving the student field empty, but entering a valid year for the YOG field. If the specified records cannot be deleted, an error message is displayed.

Existing records are modified with the modify subroutine.

sub modify
{
    local (@fields, $key);
    @fields = ('YOG', 'Address');
    $DB{'Address'} =~ s/\n/<BR>/g;
    $DB{'YOG'} =~ s/(['"])/\\$1/g;
    $DB{'Student'} =~ s/(['"])/\\$1/g;
    $DB{'Address'} =~ s/(['"])/\\$1/g;
    foreach $key (@fields) {
        $rdb->sql (<<Update_Database) || &database_error ();
update $database
set $key = ('$DB{$key}') 
where (Student = '$DB{'Student'}');
Update_Database
    }
    return (1);
}

The current version of Sprite does not support multiple fields in a update statement. As a result, a loop is used to update the record multiple times. If the user entered "1991" in the year of graduation field, and "Elm Street, 02215" in the address field, the two update statements are generated:

update /home/shishir/student.db
set YOG = ('1991')
where (Student = 'Ed Surge')
update /home/shishir/student.db
set Address = ('Elm Street, 02215')
where (Student = 'Ed Surge')

That concludes the section on Sprite.

A Gateway to an Oracle Database

Now, let's look at CGI gateways to the two most popular commercial databases: Oracle and Sybase. Each of these is supported by Perl and Tcl extensions that make our job much easier by letting us submit SQL queries that the database recognizes. I will use Oracle and Sybase to illustrate two different ways to display the results of a query. In this section, I will query an Oracle database and plot the data using gnuplot (available from ftp://prep.ai.mit.edu/pub/gnu/gnuplot-3.5.tar.gz). In the next section, I will use Sybase and display the results in a table using HTML.

Suppose you have a database consisting of stock trading history for various companies over an eleven-year span (from 1980 through 1990). A sample table is shown below:

ID

Company

1980

1981

1982

1983

1984

1985

1986

1987

1988

1989

1990

Doe

Doe, Inc.

12.1

12.5

13.0

12.7

13.2

14.1

15.7

13.9

14.6

19.3

19.0

FaH

Federal Ham.

37.3

40.4

38.2

41.1

42.3

44.4

45.9

45.3

47.9

48.1

50.0

Max

Max Corp.

73.2

73.9

74.1

74.0

74.7

74.7

76.6

80.3

71.1

59.6

70.3

You would like to present this valuable source of information as crisp graphs or plots to the general public. How would you go about doing it? The first step is to create a form where the user can enter a company's identification:

<HTML>
<HEAD><TITLE>Welcome to CGI Stock Service</TITLE></HEAD>
<BODY>
<H1>Stock Quotes</H1>
<HR>
<FORM ACTION="/cgi-bin/stocks.pl" METHOD="GET">
<EM>Please enter the name of the stock that you would like to 
get a quote for:</EM>
<P>
<INPUT TYPE="text" NAME="Company_ID" SIZE=10 MAXLENGTH=10>
<P>
<INPUT TYPE="submit" VALUE="Look Up This Stock">
<INPUT TYPE="reset"  VALUE="Clear the Information">
</FORM>
<HR>
</BODY>
</HTML>

The second step is to write a CGI program that sends the query to the database, retrieves the results, and utilizes gnuplot to graph the information. Here is the CGI program that interacts with the Oracle database using oraperl:

#!/usr/local/bin/oraperl
require "oraperl.ph";

oraperl is a derivative of Perl that contains functionality to access and interact with Oracle databases. As of Perl 5, the DBperl extensions (a.k.a. DBI) supersede most of the Perl 4.0 database derivatives (such as oraperl and sybperl). For information on where to get oraperl, syperl, and DBperl, see Appendix E, Applications, Modules, Utilities, and Documentation.

$| = 1;
$webmaster = "shishir\@bu\.edu";
$gnuplot = "/usr/local/bin/gnuplot";
$ppmtogif = "/usr/local/bin/pbmplus/ppmtogif";

Buffering is turned off, and the full path to the gnuplot and ppmtogif commands is defined. (See Chapter 6, Hypermedia Documents for other examples of how these commands are used.)

&parse_form_data (*DB);
($company_id = $DB{'Company_ID'}) =~ s/^\s*(.*)\b\s*$/$1/;

The form information is decoded. In this case, we are dealing with only one field (Company_ID). The information stored in this field is equated to the company_id variable, and the leading and trailing spaces are removed.

if ($company_id =~ /^\w+$/) {

If the field value is an alphanumeric character (A-Z, a-z, 0-9, _), the program continues. Otherwise, an error message is returned. We want to make sure that only the characters that we need are allowed! In this case, shell metacharacters are not allowed to pass through.

    $process_id = $$;
    $output_ppm = join ("", "/tmp/", $process_id, ".ppm");
    $data_file =  join ("", "/tmp/", $process_id, ".txt");

We need two temporary files in this program. To make sure that each running instance of the program uses unique temporary files, we borrow a trick from UNIX shell scripting and put our process identification number (PID) into the names. Each time the program runs, it has to have a unique PID, so we know we will not clobber our own temporary file. The output_ppm and data_file variables contain the full file specification for the temporary files that will be created by this program. The current process id number ensures unique filenames.

    $color_number = 1;

The color number of 1 indicates Red. This is the color of the plot line.

    $system_id = "Miscellaneous";
    $username = "shishir";
    $password = "fnjop673e2nB";

The Oracle system identification (SID), the username, and the password are set. You might wonder if it is safe to hard-code the database password into this program. The answer to that depends on how the database is set up. In cases like this, you should create a generic user, such as "guest," with minimal access rights (read-only), so that there is no danger to the database.

    $lda = &ora_login ($system_id, $username, $password);

The ora_login subroutine is used to log in to the database. The value returned is the login identifier, also referred to as the Oracle Login Data Area. This identifier will be used to execute an SQL command.

    $csr = &ora_open ($lda, " select * from Stocks where ID = '$company_id' ");

The ora_open subroutine executes a specified SQL command. It requires a login identifier, and returns a statement identifier or an Oracle Cursor. This statement identifier is needed to retrieve the actual data (resulting from the SQL command).

You are not limited to specifying the SQL command on one line; you can use the block notation:

$csr = &ora_open ($lda, <<End_of_Select);
select * from Stocks
where ID = '$company_id'
End_of_Select

Let's continue with the rest of the program.

    if ( open (DATA, ">" . $data_file) ) {
        ($company_id, $company, @stock_prices) = &ora_fetch ($csr);

The ora_fetch subroutine retrieves the information returned by the SQL select command. The first two fields (or columns) are stored in company_id and company, respectively. The rest of the columns, however, are stored in the stock_prices array. This consists of the 11 columns representing 11 years, as shown in the previous table.

        &ora_close ($csr);
        &ora_logoff ($lda);

The statement identifier is released with the ora_close subroutine, and the database is closed.

        if ($company_id) {

This block of code is executed only if a database record matched the user's selection. Otherwise, an error message is returned.

            $stocks_start = 1980;
            $stocks_end = 1990;
            $stocks_duration = $stocks_end - $stocks_start;
            for ($loop=0; $loop <= $stocks_duration; $loop++) {
                $price = $stock_prices[$loop];
                $year  = $stocks_start + $loop;
                print DATA $year, " ", $price, "\n";
            }
            close (DATA);

The loop iterates 11 times to create a data file with all of the year/stock price pairs. For example, here is how the data file would look like if the user selected "Fah":

When we build our plot, the first column provides data for the x axis, while the second column provides data for the y axis.

            &graph_data ("Stock History for $company", $data_file,
                         "Year", "Price", $color_number, $output_ppm);

The graph_data subroutine is called to create a PBM file (which is later converted to GIF). The arguments to this subroutine are the title of the graph, the data file to use, the label for the X axis, the label for the Y axis, the line color, and the output file.

            &create_gif ($output_ppm);

The final GIF image is created by the create_gif subroutine, which expects one argument: the name of the PBM file created by gnuplot.

        } else {
            &return_error (500, "Oracle Gateway CGI Error",
                "The specified company could not be found.");
        }

An error message is displayed if the user selected a non-existent company name.

    } else {
        &return_error (500, "Oracle Gateway CGI Error",
                    "Could not create output file.");
    }

If the data file could not be created, an error is returned.

} else {
    &return_error (500, "Oracle Gateway CGI Error",
                "Invalid characters in company field.");
}
exit (0);

Finally, if the information in the form field contains any non-alphanumeric characters, an error message is sent.

The graph_data subroutine opens a pipe to the gnuplot numerical analysis program, and sends a group of format commands through it. The end result of this is a pbm graphics file, which is later converted to GIF.

sub graph_data
{
    local ($title, $file, $x_label, $y_label, $color, $output) = @_;
    open (GNUPLOT, "| $gnuplot");
    print GNUPLOT <<gnuplot_Commands_Done;
    
        set term pbm color small
        set output "$output"
        set title "$title"
        set xlabel "$x_label"
        set ylabel "$y_label"
        set noxzeroaxis
        set noyzeroaxis
        set border
        set nokey
        plot "$file" w lines $color
gnuplot_Commands_Done
    close (GNUPLOT);
}

The create_gif subroutine uses the ppmtogif utility to convert the pbm file to GIF, for display on the Web (see Figure 9.9).

sub create_gif
{
    local ($output) = @_;
    print "Content-type: image/gif", "\n\n";
    system ("$ppmtogif $output 2> /dev/null");
    unlink $output_ppm, $data_file;
}

Finally, the temporary files are "unlinked," or deleted.

Figure 9.9: Stocks graph

[Graphic: Figure 9-9]

Accessing a Sybase Database

In this example, the form input (from the user) is used to access a Sybase database to look up information on books. Our interface to Sybase is the sybperl library, which provides Perl subroutines for giving Sybase queries in the form it can recognize. The data returned by Sybase is converted to an HTML 3.0 table format. In other words, the output, when displayed on a browser that recognizes HTML 3.0, resembles a nice table with solid three-dimensional lines separating the different fields.

<HTML>
<HEAD><TITLE>Welcome to CGI Publishing Company</TITLE></HEAD>
<BODY>
<H1>Book Search</H1>
<HR>
<FORM ACTION="/cgi-bin/books.pl" METHOD="GET">
<EM>Please enter the name of the book that you would like to look up:</EM>
<P>
<INPUT TYPE="text" NAME="Book" SIZE=40>
<P>
<INPUT TYPE="submit" VALUE="Look Up This Book">
<INPUT TYPE="reset"  VALUE="Clear the Information">
</FORM>
<HR>
</BODY>
</HTML>

Above is the form that is used to retrieve the input from the user.

Let's look at the program:

#!/usr/local/bin/sybperl
require "sybperl.pl";
$user = "shishir";
$password = "mkhBhd9v2sK";
$server = $ENV{'DSQUERY'} || "Books";

The user, password, and server name are set. If the environment variable DSQUERY is defined, the server is set to the value of that variable. If not, the server is set to "Books". The following statement:

$server = $ENV{'DSQUERY'} || "Books";

is a simpler of way of doing the following:

if ($ENV{'DSQUERY'}) {
    $server = $ENV{'DSQUERY'};
} else {
    $server = "Books";
}

Next, the dblogin subroutine is used to log in to the Sybase server.

$dbproc = &dblogin ($user, $password, $server);

dblogin returns the identification for the newly created database process into the dbproc variable.

@fields = ('Author', 'Book', 'Publisher', 'Year', 'Pages');
$title = "CGI Publishing Company Book Database";

The fields array holds a list of all the fields in a record. The title variable contains the title of the HTML 3.0 table.

&parse_form_data (*DB);
($book_name = $DB{'Book'}) =~ s/^\s*(.*)\b\s*$/$1/;

Leading and trailing spaces are removed from the Book field.

if ($book_name =~ /^[\w\s]+$/) {

Since we are dealing with book names, the user is allowed to enter only the following characters: A-Z, a-z, 0-9, _, and whitespace. If any other characters are entered, an error message is returned.

To retrieve data from a Sybase database, you attach to the database, execute a query, and then loop through the returned data one row at a time. These standard steps are performed in this CGI application.

    &dbcmd ($dbproc, " select * from Catalog where Book = '$book_name' ");
    &dbsqlexec ($dbproc);
    $status = &dbresults ($dbproc);

The dbcmd subroutine associates the SQL command with the current database process (dbproc). The dbsqlexec subroutine executes the SQL command, while the dbresults make the data available to the program. The dbresults subroutine returns either "$SUCCEED" or "$FAIL" (these are variables that are special to sybperl).

    if ($status == $SUCCEED) {
        while ( (@books = &dbnextrow ($dbproc)) ) {
            $book_string = join ("\0", @books);
            push (@all_books, $book_string);
        }

If the user-specified records are found, the dbresults subroutine returns "$SUCCEED". The while loop iterates through all of the data by calling the dbnextrow subroutine each time through the loop, in case there is more than one book that matches the criteria. The books array consists of information in the following format (for a sample book);

$books[0] = "Andy Oram and Steve Talbott"
$books[1] = "Managing Projects with make"
$books[2] = "O'Reilly & Associates, Inc."
$books[3] = 1991
$books[4] = 152

We need to create this intermediate array because that is the structure of the data returned by dbnextrow. But what we really want is a single string, because then we could store all the information on a single book in one element of the @all_books array. So we use the join statement to form the following string:

$book_string = "Andy Oram and Steve Talbott\0Managing Projects with make\0O'Reilly & Associates, Inc.\01991\0152"

This string is then pushed into the all_books array. This process is repeated for all matches.

        &dbexit ($dbproc);
        &display_table ($title, *fields, *all_books, "\0");

The database is closed by calling the dbexit subroutine. Finally, the table is displayed by calling a generic subroutine, display_table. The subroutine expects the following arguments: the title of the table, the array consisting of the header (or field) names, the array consisting of the strings, and the delimiter by which these strings are concatenated.

    } else {
        &return_error (500, "Sybase Database CGI Error",
            "The book title(s) you specified does not exist.");
    }
} else {
    &return_error (500, "Sybase Database CGI Error",
                "Invalid characters in book name.");
}
exit(0);

Error messages are returned if either the specified book name does not exist, or the input contains invalid characters.

The display_table subroutine prints out the table.

sub display_table
{
    local ($title, *columns, *selected_entries, $delimiter) = @_;
    local ($name, $entry);
    print "Content-type: text/html", "\n\n";
        print "<HTML>", "\n";
    print "<HEAD><TITLE>", $title, "</TITLE></HEAD>", "\n";
        print "<BODY>", "\n";
    print "<TABLE BORDER=2>", "\n";
    print "<CAPTION>", $title, "</CAPTION>", "\n";
    print "<TR>", "\n";

A MIME type of text/html is output, along with some HTML 3.0 tags to create a table.

    foreach $name (@columns) {
        print "<TH>", $name, "\n";
    }

This loop iterates through and displays all of the field headers.

    foreach $entry (@selected_entries) {
        $entry =~ s/$delimiter/<TD>/go;
        print "<TR>", "<TD>", $entry, "\n";
    }
    print "</TABLE>", "\n";
        print "</BODY></HTML>", "\n";
}

The foreach loop iterates through the matching records, substitutes the delimiter with the <TD> tag, and prints out the HTML needed to create a new row. There is no delimiter before the first item in $entry, so the print statement supplies the first <TD> tag. Finally, the table is closed. Figure 9.10 shows what the table looks like.

Figure 9.10: Results of search gateway

[Graphic: Figure 9-10]