Unix Programming - Ad-hoc Code Generation - Case Study: Generating HTML Code for a Tabular List

Case Study: Generating HTML Code for a Tabular List

Let's suppose that we want to put a page of tabular data on a Web page. We want the first few lines to look like Example 9.6.

Example 9.6. Desired output format for the star table.

Aalat         David Weber             The Armageddon Inheritance
Aelmos        Alan Dean Foster        The Man who Used the Universe 
Aedryr        Steve Miller/Sharon Lee Scout's Progress 
Aergistal     Gerard Klein            The Overlords of War 
Afdiar        L. Neil Smith           Tom Paine Maru 
Agandar       Donald Kingsbury        Psychohistorical Crisis 
Aghirnamirr   Jo Clayton              Shadowkill

The thick-as-a-plank way to handle this would be to hand-write HTML table code for the desired appearance. Then, each time we want to add a name, we'd have to hand-write another set of <tr> and <td> tags for the entry. This would get very tedious very quickly. But what's worse, changing the format of the list would require hand-hacking every entry.

The superficially clever way to handle this would be to make this data a three-column relation in a database, then use some fancy CGI^[99] technique or a database-capable templating engine like PHP to generate the page on the fly. But suppose we know that the list will not change very often, don't want to run a database server just to be able to display this list, and don't want to load the server with unnecessary CGI traffic?

There's a better solution. We put the data in a tabular flat-file format like Example 9.7.

Example 9.7. Master form of the star table.

Aalat         :David Weber                 :The Armageddon Inheritance
Aelmos        :Alan Dean Foster            :The Man who Used the Universe 
Aedryr        :Steve Miller/Sharon Lee     :Scout's Progress 
Aergistal     :Gerard Klein                :The Overlords of War 
Afdiar        :L. Neil Smith               :Tom Paine Maru 
Agandar       :Donald Kingsbury            :Psychohistorical Crisis 
Aghirnamirr   :Jo Clayton                  :Shadowkill

We could in a pinch have done without the explicit colon field delimiters, using the pattern consisting of two or more spaces as a delimiter, but the explicit delimiter protects us in case we press spacebar twice while editing a field value and fail to notice it.
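The risk described above can be seen in a minimal Python sketch. The helper names `split_on_spaces` and `split_on_colons` are invented for illustration and do not come from the original text; the sample lines are the first table entry with a stray double space typed into the title.

```python
import re

def split_on_spaces(line):
    """Split on runs of two or more spaces (the delimiter-free alternative)."""
    return re.split(r" {2,}", line.strip())

def split_on_colons(line):
    """Split on the explicit colon delimiter, trimming padding in each field."""
    return [field.strip() for field in line.split(":")]

# The same entry in both formats, with an accidental double space in the title.
spaced = "Aalat         David Weber             The Armageddon  Inheritance"
coloned = "Aalat         :David Weber                 :The Armageddon  Inheritance"

print(len(split_on_spaces(spaced)))   # 4: the typo split the title in two
print(len(split_on_colons(coloned)))  # 3: the typo stays inside the title field
```

With whitespace as the delimiter, the slip silently adds a fourth column; with colons, the field count is unaffected.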

We then write a script in shell, Perl, Python, or Tcl that massages this file into an HTML table, and run that each time we add an entry. The old-school Unix way would revolve around the following nigh-unreadable sed(1) invocation:


sed -e 's,^,<tr><td>,' -e 's,$,</td></tr>,' -e 's,:,</td><td>,g'

or this perhaps slightly more scrutable awk(1) program:


awk -F: '{printf("<tr><td>%s</td><td>%s</td><td>%s</td></tr>\n", \
                 $1, $2, $3)}'

(If either of these examples interests but mystifies, read the documentation for sed(1) or awk(1). We explained in Chapter 8 that the latter has largely fallen out of use. The former is still an important Unix tool that we haven't examined in detail because (a) Unix programmers already know it, and (b) it's easy for non-Unix programmers to pick up from the manual page once they grasp the basic ideas about pipelines and redirection.)

A new-school solution might center on this Python code, or on equivalent Perl:


import sys
for row in (line.rstrip().split(':') for line in sys.stdin):
    print("<tr><td>" + "</td><td>".join(row) + "</td></tr>")

These scripts took about five minutes each to write and debug, certainly less time than would have been required to either hand-hack the initial HTML or create and verify the database. The combination of the table and this code will be much simpler to maintain than either the under-engineered hand-hacked HTML or the over-engineered database.
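For readers who want something closer to a complete generator, here is one possible sketch in Python 3. It goes slightly beyond the one-liners above: the function name `rows_to_html` is hypothetical, the enclosing `<table>` element is an assumption about the surrounding page, and the HTML escaping (via the standard library's `html.escape`) is an addition the original snippets do not perform.

```python
import html

def rows_to_html(lines):
    """Turn colon-delimited lines into a complete HTML table string."""
    out = ["<table>"]
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines rather than emit empty rows
        # Trim the column padding and escape &, < and > in each field.
        cells = [html.escape(field.strip(), quote=False)
                 for field in line.split(":")]
        out.append("<tr><td>" + "</td><td>".join(cells) + "</td></tr>")
    out.append("</table>")
    return "\n".join(out)

# Two entries from the master file, as they would be read from disk.
sample = [
    "Aalat         :David Weber                 :The Armageddon Inheritance\n",
    "Aghirnamirr   :Jo Clayton                  :Shadowkill\n",
]
print(rows_to_html(sample))
```

Because the function accepts any iterable of lines, passing an open file or `sys.stdin` in place of `sample` would let it sit in a pipeline the same way the sed and awk versions do.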

A further advantage of this way of solving the problem is that the master file stays easy to search and modify with an ordinary text editor. Another is that we can experiment with different table-to-HTML transformations by tweaking the generator script, or easily make a subset of the report by putting a grep(1) filter before it.
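The subset idea can be sketched in Python as well. This is only an illustration under stated assumptions: the `grep` and `to_row` helpers are hypothetical stand-ins for a grep(1) filter and the generator script, and `master` holds three lines copied from the master file above.

```python
import re

def grep(pattern, lines):
    """Yield only the lines that match the regular expression, like grep(1)."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))

def to_row(line):
    """Format one colon-delimited line as an HTML table row."""
    fields = [field.strip() for field in line.split(":")]
    return "<tr><td>" + "</td><td>".join(fields) + "</td></tr>"

master = [
    "Aalat         :David Weber                 :The Armageddon Inheritance",
    "Aergistal     :Gerard Klein                :The Overlords of War",
    "Agandar       :Donald Kingsbury            :Psychohistorical Crisis",
]

# Report only the entries whose line mentions "Klein".
subset = [to_row(line) for line in grep("Klein", master)]
print("\n".join(subset))
```

The filter never touches the formatting code, so any selection that can be expressed as a pattern on the master file yields a report for free.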

I actually use this technique to maintain the Web page that lists fetchmail test sites; the example above is science-fictional only because publishing the real data would reveal account usernames and passwords.

This was a somewhat less trivial example than the previous one. What we've actually designed here is a separation between content and formatting, with the generator script acting as a stylesheet. (This is yet another mechanism-vs.-policy separation.)
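One way to picture the stylesheet analogy is a sketch in which the parsing step is fixed and the rendering step is swappable. The function names below are hypothetical, and the list-item layout is an arbitrary second format chosen only to show that the master data does not change when the presentation does.

```python
def parse(line):
    """Content: split one master-file line into its three fields."""
    return [field.strip() for field in line.split(":")]

def as_table_row(fields):
    """Formatting choice 1: an HTML table row."""
    return "<tr><td>" + "</td><td>".join(fields) + "</td></tr>"

def as_list_item(fields):
    """Formatting choice 2: an HTML list item."""
    star, author, title = fields
    return "<li>{} ({}, {})</li>".format(star, author, title)

line = "Aalat         :David Weber                 :The Armageddon Inheritance"

# The same parsed content passes through two different "stylesheets".
for render in (as_table_row, as_list_item):
    print(render(parse(line)))
```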

The lesson in all these cases is the same. Do as little work as possible. Let the data shape the code. Lean on your tools. Separate mechanism from policy. Expert Unix programmers learn to see possibilities like these quickly and automatically. Constructive laziness is one of the cardinal virtues of the master programmer.

^[98] Scripting languages tend to solve this problem more elegantly than C does. Investigate the shell's here documents and Python's triple-quote construct to find out how.

^[99] Here, CGI refers not to Computer Graphic Imagery but to the Common Gateway Interface used for live Web content.

Published under free license.
