Unix Programming - Ad-hoc Code Generation - Case Study: Generating HTML Code for a Tabular List
Let's suppose that we want to publish some tabular data on a
Web page. We want the first few lines to look like Example 9.6.
The thick-as-a-plank way to handle this would be to hand-write
HTML table code for the desired appearance. Then, each time we want
to add a name, we'd have to hand-write another set of
<tr> and <td> tags for
the entry. This would get very tedious very quickly. But what's
worse, changing the format of the list would require hand-hacking
every entry.
The superficially clever way to handle this would be to make
this data a three-column relation in a database, then use some fancy
CGI[99] technique or a database-capable templating
engine like PHP to generate the page on the fly. But suppose we know
that the list will not change very often, don't want to run a database
server just to be able to display this list, and don't want to load
the server with unnecessary CGI traffic?
There's a better solution. We put the data in a tabular flat-file
format like Example 9.7.
We could, in a pinch, have done without the explicit colon field
delimiters and treated any run of two or more spaces as the
delimiter instead. But the explicit delimiter protects us in case we
press the spacebar twice while editing a field value and fail to
notice it.
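The delimiter trade-off can be seen in a minimal Python sketch. The sample records and the helper names (`split_on_colons`, `split_on_space_runs`) are hypothetical, invented here for illustration. The same editing slip, a doubled space inside the first field, is harmless with colon delimiters but silently produces an extra field when runs of spaces are the delimiter.

```python
import re

# Hypothetical records; the field values are made up for illustration.
# Both contain the same slip: two spaces typed inside the first field.
colon_line = "Jane  Doe:Springfield:tester"
space_line = "Jane  Doe  Springfield  tester"

def split_on_colons(line):
    """Split a record on the explicit colon delimiter."""
    return line.rstrip("\n").split(":")

def split_on_space_runs(line):
    """Split a record on runs of two or more spaces."""
    return re.split(r" {2,}", line.rstrip("\n"))

colon_fields = split_on_colons(colon_line)      # still three fields
space_fields = split_on_space_runs(space_line)  # four fields: the name was cut in two

print(len(colon_fields), colon_fields)
print(len(space_fields), space_fields)
```

With colons, the stray space stays inside the field where it does no structural damage; with space runs, the record no longer has the three columns the generator expects.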
We then write a script in shell, Perl, Python, or Tcl that massages
this file into an HTML table, and run that each time we add an entry.
The old-school Unix way would revolve around the following
nigh-unreadable sed(1) invocation:
sed -e 's,^,<tr><td>,' -e 's,$,</td></tr>,' -e 's,:,</td><td>,g'
or this perhaps slightly more scrutable awk(1) program:
awk -F: '{printf("<tr><td>%s</td><td>%s</td><td>%s</td></tr>\n", \
$1, $2, $3)}'
(If either of these examples interests but mystifies, read the
documentation for sed(1) or awk(1).
We explained in Chapter 8 that the latter has largely fallen
out of use. The former is still an important Unix tool that we
haven't examined in detail because (a) Unix programmers already know
it, and (b) it's easy for non-Unix programmers to pick up from the
manual page once they grasp the basic ideas about pipelines and
redirection.)
A new-school solution might center on this Python code, or
on equivalent Perl:
import sys
for line in sys.stdin:  # one colon-delimited record per input line
    row = line.rstrip().split(':')
    print("<tr><td>" + "</td><td>".join(row) + "</td></tr>")
These scripts took about five minutes each to write and debug,
certainly less time than would have been required to either hand-hack
the initial HTML or create and verify the database. The combination
of the table and this code will be much simpler to maintain than
either the under-engineered hand-hacked HTML or the over-engineered
database.
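As a slightly fuller sketch of such a generator, the Python below wraps the rows in a `<table>` element, skips blank lines, and escapes HTML metacharacters in field values. The function name `rows_to_table`, the sample records, and the escaping step are all assumptions added for illustration; none of them appear in the scripts above.

```python
import html

def rows_to_table(lines):
    """Turn colon-delimited records into an HTML table string."""
    out = ["<table>"]
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # tolerate blank lines in the master file
        # Escaping is an addition here: it keeps '&' or '<' in a field
        # from being interpreted as markup.
        cells = [html.escape(field) for field in line.split(":")]
        out.append("<tr><td>" + "</td><td>".join(cells) + "</td></tr>")
    out.append("</table>")
    return "\n".join(out)

# Hypothetical sample records, made up for illustration.
sample = ["alpha:one:first\n", "\n", "beta & co:two:second\n"]
page = rows_to_table(sample)
print(page)
```

In real use the records would come from the master file or from standard input rather than from an in-line list.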
A further advantage of this way of solving the problem is that
the master file stays easy to search and modify with an ordinary text
editor. Another is that we can experiment with different
table-to-HTML transformations by tweaking the generator script, or
easily make a subset of the report by putting a grep(1) filter
before it.
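The filter-then-generate arrangement might look like this in Python, as a rough stand-in for piping the master file through grep(1) before the generator. The helper names `grep` and `to_rows` and the sample records are hypothetical.

```python
import re

def grep(pattern, lines):
    """Pass through only the lines that match the regular expression."""
    rx = re.compile(pattern)
    return (line for line in lines if rx.search(line))

def to_rows(lines):
    """Convert colon-delimited records into HTML table rows."""
    for line in lines:
        fields = line.rstrip().split(":")
        yield "<tr><td>" + "</td><td>".join(fields) + "</td></tr>"

# Hypothetical master-file contents, made up for illustration.
master = [
    "host-a:IMAP:ok\n",
    "host-b:POP3:ok\n",
    "host-c:IMAP:slow\n",
]

# The filter sits in front of the generator, which is left unchanged.
subset = list(to_rows(grep(r":IMAP:", master)))
for row in subset:
    print(row)
```

Because the filter only decides which records pass, the generator needs no changes to produce a partial report.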
I actually use this technique to maintain the Web page that
lists fetchmail test sites; the example above is science-fictional
only because
publishing the real data would reveal account usernames and
passwords.
This was a somewhat less trivial example than the previous
one. What we've actually designed here is a separation between content
and formatting, with the generator script acting as a stylesheet.
(This is yet another mechanism-vs.-policy separation.)
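One way to picture the generator-as-stylesheet idea is a sketch in which the same parsed records feed two interchangeable formatters. The function names (`parse`, `as_table_rows`, `as_list_items`) and the sample data are hypothetical, chosen only to illustrate the separation.

```python
def parse(lines):
    """Content side: read colon-delimited records, ignoring blank lines."""
    return [line.rstrip().split(":") for line in lines if line.strip()]

def as_table_rows(records):
    """Formatting choice 1: one HTML table row per record."""
    return ["<tr><td>" + "</td><td>".join(r) + "</td></tr>" for r in records]

def as_list_items(records):
    """Formatting choice 2: one HTML list item per record."""
    return ["<li>" + ", ".join(r) + "</li>" for r in records]

# Hypothetical records, made up for illustration.
records = parse(["alpha:one:first\n", "beta:two:second\n"])

table_rows = as_table_rows(records)
list_items = as_list_items(records)
print("\n".join(table_rows))
print("\n".join(list_items))
```

Switching the page's appearance means swapping the formatter; the data file is never touched.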
The lesson in all these cases is the same. Do as little work
as possible. Let the data shape the code. Lean on your tools.
Separate mechanism from policy. Expert Unix programmers learn to see
possibilities like these quickly and automatically. Constructive
laziness is one of the cardinal virtues of the master programmer.