If you look back at the guestbook example in Chapter 7, Advanced Form Applications,
you will notice that one of the fields asked for the address of the user's HTTP
server. At that time, we did not discuss any way to check whether
the address given by the user is valid. With our new knowledge
of sockets and network communication, however, we can determine
the validity of the address. After all, web servers have to use
the same Internet protocols as everyone else; they possess no magic.
If we open a TCP/IP socket connection to a web server, we can pass
it commands it recognizes, just as we passed a command to the finger
daemon (server). Before we go any further, here is a small snippet
of code from the guestbook that outputs the user-specified URL:
if ($FORM{'www'}) {
    print GUESTBOOK <<End_of_Web_Address;
<P>
$FORM{'name'} can also be reached at:
<A HREF="$FORM{'www'}">$FORM{'www'}</A>
End_of_Web_Address
}
Here is a subroutine that uses the socket library to check
whether a URL is valid. It takes one argument, the URL to check.
sub check_url
{
    local ($url) = @_;
    local ($current_host, $host, $service, $file, $first_line);
    if (($host, $service, $file) =
        ($url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|)) {
This regular expression parses the specified URL and retrieves
the hostname, the port number (if included), and the file.
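To see what the pattern captures, here is a minimal standalone sketch that applies the same regular expression to a few sample URLs. The helper name split_url and the example.com addresses are assumptions made for illustration; they are not part of the guestbook program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the guestbook program): split a URL into
# host, port, and file using the same pattern as check_url.
sub split_url {
    my ($url) = @_;
    my ($host, $port, $file) =
        ($url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|);
    return unless defined $host;      # no match: return an empty list
    return ($host, $port, $file);     # port and file may be empty strings
}

# Illustrative URLs only.
for my $url ("http://www.example.com:8080/docs/index.html",
             "http://www.example.com",
             "ftp://www.example.com/pub") {
    my ($host, $port, $file) = split_url ($url);
    if (defined $host) {
        print "$url -> host=[$host] port=[$port] file=[$file]\n";
    } else {
        print "$url -> not an HTTP URL\n";
    }
}
```

When the URL carries no port or path, the second and third captures come back as empty strings rather than undefined values, which is why the program can test them with a simple unless.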
Let's continue with the program:
        chop ($current_host = `/bin/hostname`);
        $host = $current_host if ($host eq "localhost");
        $service = "http" unless ($service);
        $file = "/" unless ($file);
If the hostname is given as "localhost", the current hostname
is used. In addition, the service name and the file default to "http"
and "/", respectively, if the URL did not specify them.
        &open_connection (HTTP, $host, $service) || return (0);
        print HTTP "HEAD $file HTTP/1.0", "\n\n";
A socket is created, and a connection is attempted to the
remote host. If it fails, an error status of zero is returned. If
it succeeds, the HEAD command is issued to the
HTTP server. If the specified document exists,
the server returns something like this:
HTTP/1.0 200 OK
Date: Fri Nov 3 06:09:17 1995 GMT
Server: NCSA/1.4.2
MIME-version: 1.0
Content-type: text/html
Last-modified: Sat Feb 4 17:56:33 1995 GMT
Content-length: 486
All we are concerned about is the first line, which contains a status
code. If the status code is 200, the subroutine returns a success status of one.
If the document is protected or does not exist, the server responds with a
status code of 401 or 404, respectively (see Chapter 3, Output from the Common
Gateway Interface), and the subroutine returns zero. Here
is the code to check the status:
        chop ($first_line = <HTTP>);
        close (HTTP);    # close the socket before returning
        if ($first_line =~ /200/) {
            return (1);
        } else {
            return (0);
        }
    } else {
        return (0);
    }
}
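The status test in check_url matches the digits 200 anywhere in the first line. A stricter variant could anchor the match to the status-code field of the response line. The following is a minimal sketch of that idea; the helper name status_code is hypothetical, and the sample lines are illustrative rather than captured from a real server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of check_url): pull the three-digit
# status code out of the first line of an HTTP response.
sub status_code {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;             # strip a trailing LF or CRLF
    return $1 if $line =~ m|^HTTP/\d+\.\d+\s+(\d{3})\b|;
    return;                           # not a recognizable status line
}

# Illustrative status lines only.
for my $line ("HTTP/1.0 200 OK\r\n",
              "HTTP/1.0 401 Unauthorized\r\n",
              "HTTP/1.0 404 Not Found\n") {
    my $code = status_code ($line);
    print "status: $code\n";
}
```

With a helper like this, the check in the subroutine could compare the returned code against 200 directly, and a line that is not an HTTP status line would be treated as a failure.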
This is how you would use this subroutine in the guestbook:
if ($FORM{'www'}) {
    &check_url ($FORM{'www'}) ||
        &return_error (500, "Guestbook File Error",
            "The specified URL does not exist. Please enter a valid URL.");
    print GUESTBOOK <<End_of_Web_Address;
<P>
$FORM{'name'} can also be reached at:
<A HREF="$FORM{'www'}">$FORM{'www'}</A>
End_of_Web_Address
}
Now, let's look at an example that creates a gateway to the
Archie server using pre-existing client software.