Unix Programming - Studying Cases - Case Study: The Terminfo Database
The terminfo database is a collection of descriptions of
video-display terminals. Each entry describes the escape sequences
that perform various manipulations on the terminal screen, such as
inserting or deleting lines, erasing from the cursor position to end
of line or screen, or beginning and ending screen highlights such
as reverse video, underline, or blink.
The terminfo database is primarily used by the
curses(3)
libraries. These underlie the “roguelike” interface
style we discuss in Chapter11, and some very widely used programs such
as
mutt(1),
lynx(1),
and
slrn(1).
Though the terminal emulators such as
xterm(1)
that run on today's bitmapped displays all have capabilities that are
minor variations on those of the ANSI X3.64 standard and the venerable
VT100 terminal, there is still enough variation that hardwiring ANSI
capabilities into applications would be a bad idea. Terminfo is also
worth studying because problems that are logically similar to the one
it addressed arise constantly in managing other kinds of peripheral
hardware that doesn't have a standard way to report their own
capabilities.
The design of terminfo benefits from experience with an earlier
capability format called termcap. The database of termcap
descriptions lived in a textual format in one big file,
/etc/termcap; though this format is
now obsolete, your Unix system almost certainly includes a
copy.
Normally, the key used to look up your terminal type entry is
the environment variable TERM, which for purposes of
this case study is set by magic.[61] Applications that
use terminfo (or termcap) pay a small penalty in startup lag; when the
curses(3)
library initializes itself, it has to look up the entry corresponding
to TERM and load the entry into memory.
Experience with termcap showed that the startup penalty was
dominated by the time required to parse the textual representation of
capabilities. Accordingly, terminfo entries are binary structure dumps
that can be marshaled and unmarshaled more quickly. There is a master
textual format for the entire database, the terminfo capability file.
That file (or individual entries) can be compiled to binary form with
the terminfo compiler
tic(1);
binary entries can be decompiled to the editable text format by
infocmp(1).
The design superficially contradicts the advice we gave in Chapter5 against binary
caches, but this is actually the extreme case in which that's a good
tactic. Edits to the text masters are very rare — in fact,
Unixes normally ship with the terminfo database precompiled and the
text master serving primarily as documentation. Thus, the
synchronization and inconsistency problems that would normally
militate against this approach almost never arise.
The designers of terminfo could have optimized for speed
in a second way. The entire database of binary entries could have
been put in some kind of big opaque database file. What they actually did
instead was more clever and more in the Unix spirit. Terminfo entries
live in a directory hierarchy, usually on modern Unixes under
/usr/share/terminfo. Consult the
terminfo(5)
man page to find the location on your system.
If you look in the terminfo directory, you'll see subdirectories
named by single printable characters. Under each of these are the
entries for each terminal type that has a name beginning with that
letter. The goal of this organization was to avoid having to do a
linear search of a very large directory; under more modern Unix
file systems, which represent directories with B-trees or other
structures optimized for fast lookup, the subdirectories won't be
necessary.
|
I found that even on a fairly modern Unix, splitting a big directory up
into subdirectories can improve performance substantially. It was tens
of thousands of files, an authorized-user database for a big educational
institution, on a late-model DEC Alpha running DEC's Unix. (Subdirectories
named by first and last letter of name — e.g., "johnson" would be in
directory "j_n" — worked best of the schemes we tested. Using the first
two letters wasn't nearly as good, because there were a lot of
systematically-generated names which differed only toward the end.)
This may just say that sophisticated directory indexing is still not as
common as it should be... but even so, that makes an organization which
works well without it more portable than one which requires it.
|
|
--
Henry Spencer
|
|
Thus, the cost of opening a terminfo entry is two file system
lookups and a file open. But since mining the same entry from one
big database would have required a lookup and open for the database,
the incremental cost for terminfo's organization is at most one
file system lookup. Actually, it's less than that; it's the cost
difference between one file system lookup and whatever retrieval method
the one big database would have used. This is probably marginal, and
quite tolerable once per application at startup time.
Terminfo uses the file system itself as a simple hierarchical
database. This is a superb bit of constructive laziness, obeying the
Rule of Economy and the Rule of Transparency. It means that all the
ordinary tools for navigating, examining and modifying the file system
can be used to navigate, examine, and modify the terminfo database; no
special ones (other than
tic(1)
and
infocmp(1)
for packing and unpacking the individual records) need to be written
and debugged. It also means that work on speeding up database access
would be work on speeding up the file system itself, tuning that would
benefit many more applications than just users of
curses(3).
There is one additional advantage of this organization that
doesn't come up in the terminfo case; you get to use Unix's
permissions mechanism rather than having to invent your own
access-control layer with its own bugs. This falls out as a
consequence of adopting the “everything is a file”
philosophy of Unix rather than trying to fight it.
The terminfo directory layout is rather space-inefficient
on most Unix file systems. The entries are usually between 400 and
1400 bytes long, but file systems normally allocate a minimum of 4K
for every nonempty disk file. The designers accepted this cost
for the same reason they chose a packed binary format, to
cut the startup latency of terminfo-using programs to a minimum. Disk
capacity for constant price has exploded over a thousandfold since,
tending to vindicate that decision.
The contrast with the formats used by the Microsoft Windows
registry files is instructive. Registries are property databases used
by both Windows itself and applications. Each registry lives in one
big file. Registries contain a mix of text and binary data that
requires specialized editing tools. The one-big-file approach leads,
among other things, to the notorious ‘registry creep’
phenomenon; average access time rises without bound as new entries are
added. Because there is no standard API for editing the registry
provided by the system, applications use ad-hoc code to edit it
themselves, making it notoriously subject to corruption that can lock
up the entire system.
Using the Unix file system as a database is a tactic other
applications with simple database requirements might do well to emulate.
Good reasons not to do it are more likely to have to do with the
database keys not naturally looking like filenames than they are with
any performance problems. In any case, it's the sort of good fast
hack that can be very useful in prototyping.
[an error occurred while processing this directive]
|