Chapter 10: Gateways to Internet Information Servers

10.5 Checking Hypertext (HTTP) Links

If you look back at the guestbook example in Chapter 7, Advanced Form Applications, you will notice that one of the fields asked for the user's HTTP server. At that time, we did not discuss any methods to check if the address given by the user is valid. However, with our new knowledge of sockets and network communication, we can, indeed, determine the validity of the address. After all, web servers have to use the same Internet protocols as everyone else; they possess no magic. If we open a TCP/IP socket connection to a web server, we can pass it commands it recognizes, just as we passed a command to the finger daemon (server). Before we go any further, here is a small snippet of code from the guestbook that outputs the user-specified URL:

        if ($FORM{'www'}) {
            print GUESTBOOK <<End_of_Web_Address;
<P>
$FORM{'name'} can also be reached at:
<A HREF="$FORM{'www'}">$FORM{'www'}</A>
End_of_Web_Address
        }

Here is a subroutine that utilizes the socket library to check for valid URL addresses. It takes one argument, the URL to check.

sub check_url 
{
    local ($url) = @_;
    local ($current_host, $host, $service, $file, $first_line);
    if (($host, $service, $file) = 
        ($url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|)) {

This regular expression parses the specified URL and retrieves the hostname, the port number (if included), and the file.
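
To see what each capture group holds, here is a minimal standalone sketch that applies a pattern of the same shape to a few sample URLs. The helper name parse_url and the example.com addresses are assumptions made for illustration; they are not part of the guestbook program.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): split an http://
# URL into host, port, and file. Returns an empty list on no match.
sub parse_url {
    my ($url) = @_;
    my ($host, $port, $file) =
        ($url =~ m|^http://([^/:]+):{0,1}(\d*)(\S*)$|)
        or return;
    return ($host, $port, $file);
}

# Placeholder URLs chosen for illustration only.
for my $url ('http://www.example.com:8080/docs/index.html',
             'http://www.example.com',
             'ftp://ftp.example.com/') {
    my @parts = parse_url($url);
    if (@parts) {
        # Port and file come back as empty strings when omitted.
        printf "%s => host=%s port=%s file=%s\n",
            $url, map { $_ eq '' ? '(none)' : $_ } @parts;
    } else {
        print "$url => not an http URL\n";
    }
}
```

The port and file groups match the empty string when they are left out of the URL, which is why the caller has to supply defaults for them.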

Let's continue with the program:

        chop ($current_host = `/bin/hostname`);
        $host = $current_host  if ($host eq "localhost");
        $service = "http"      unless ($service);
        $file = "/"            unless ($file);

If the hostname is given as "localhost", the current hostname is used. In addition, the service name and the file are set to "http" and "/", respectively, if no information was specified for these fields.

        &open_connection (HTTP, $host, $service) || return (0);   
        print HTTP "HEAD $file HTTP/1.0", "\n\n";

A socket is created, and a connection is attempted to the remote host. If it fails, an error status of zero is returned. If it succeeds, the HEAD command is issued to the HTTP server. If the specified document exists, the server returns something like this:

HTTP/1.0 200 OK
Date: Fri Nov  3 06:09:17 1995 GMT
Server: NCSA/1.4.2
MIME-version: 1.0
Content-type: text/html
Last-modified: Sat Feb  4 17:56:33 1995 GMT
Content-length: 486

All we are concerned about is the first line, which contains a status code. If the status code is 200, the subroutine returns a success status of one. If the document is protected or does not exist, the server responds with status codes of 401 and 404, respectively (see Chapter 3, Output from the Common Gateway Interface), and the subroutine returns zero. Here is the code to check the status:

        chop ($first_line = <HTTP>);
        close (HTTP);        # close the socket before returning
        if ($first_line =~ /200/) {
            return (1);
        } else {
            return (0);
        }
    } else {
        return (0);
    }
}
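
Matching the digits 200 anywhere in the line is a loose test, since the same digits could in principle appear in the reason text. As a sketch of a stricter check, the hypothetical helper below (status_code is our name, not the book's) extracts the three-digit code from its fixed position after the HTTP version and lets the caller decide what each code means.

```perl
use strict;
use warnings;

# Hypothetical helper: return the status code from a line such as
# "HTTP/1.0 200 OK", or undef if the line is not a status line.
sub status_code {
    my ($line) = @_;
    return undef unless defined $line;
    $line =~ s/\r?\n\z//;    # strip a trailing CRLF or LF
    return $line =~ m{^HTTP/\d+\.\d+\s+(\d{3})\b} ? $1 : undef;
}

# The three codes discussed in the text.
my %meaning = (200 => 'exists', 401 => 'protected', 404 => 'not found');

for my $line ("HTTP/1.0 200 OK\r\n",
              "HTTP/1.0 401 Unauthorized\r\n",
              "HTTP/1.0 404 Not Found\r\n") {
    my $code = status_code($line);
    my $text = exists $meaning{$code} ? $meaning{$code} : 'other';
    print "$code: $text\n";
}
```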

This is how you would use this subroutine in the guestbook:

        if ($FORM{'www'}) {
            &check_url ($FORM{'www'}) ||
                &return_error (500, "Guestbook File Error",
                "The specified URL does not exist. Please enter a valid URL.");
            print GUESTBOOK <<End_of_Web_Address;
<P>
$FORM{'name'} can also be reached at:
<A HREF="$FORM{'www'}">$FORM{'www'}</A>
End_of_Web_Address
        }
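
For readers on a current Perl, the same check can be sketched as one self-contained routine using the core IO::Socket::INET module in place of the socket library's open_connection. This is an illustration under stated assumptions, not the book's implementation: the 10-second timeout, the added Host header, and the CRLF line endings are our choices.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Sketch only: returns 1 if a HEAD request for $url gets a 200 status,
# and 0 for a malformed URL, a failed connection, or any other status.
sub check_url {
    my ($url) = @_;
    my ($host, $port, $file) =
        ($url =~ m|^http://([^/:]+):{0,1}(\d*)(\S*)$|)
        or return 0;
    $port = 80  if $port eq '';    # default HTTP port
    $file = '/' if $file eq '';    # default to the server root

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 10,            # assumed value, in seconds
    ) or return 0;

    # HTTP lines end in CRLF; an empty line ends the request.
    # The Host header is an addition for name-based virtual hosts.
    print $sock "HEAD $file HTTP/1.0\r\nHost: $host\r\n\r\n";
    my $first_line = <$sock>;
    close $sock;

    return (defined $first_line
            && $first_line =~ m{^HTTP/\d+\.\d+\s+200\b}) ? 1 : 0;
}

# Optional command-line use: perl check_url.pl http://host/path
if (@ARGV) {
    print check_url($ARGV[0]) ? "valid\n" : "invalid\n";
}
```

Because the routine returns zero for anything other than a 200 response, it can stand in for the version above in the guestbook test without changing the calling code.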

Now, let's look at an example that creates a gateway to the Archie server using pre-existing client software.

