A good notation has a subtlety and suggestiveness which at times
makes it almost seem like a live teacher.
-- Bertrand Russell, The World of Mathematics (1956)
One of the most consistent results from large-scale studies of
error patterns in software is that programmer error rates, measured in
defects per hundred lines of code, are largely independent of the
language in which the programmers are coding.[78] Higher-level
languages, which allow you to get more done in fewer lines, therefore
mean fewer bugs as well.
Unix has a long tradition of hosting little languages
specialized for a particular application domain, languages that can
enable you to drastically reduce the line count of your programs.
Domain-specific language examples include the numerous Unix
typesetting languages (troff, eqn, tbl, pic, grap), shell utilities
(awk, sed, dc, bc), and software development tools (make, yacc, lex).
There is a fuzzy boundary between domain-specific languages and the
more flexible sort of application run-control file (sendmail, BIND, X);
another with data-file formats; and another with scripting languages
(which we'll survey in Chapter 14).
Historically, domain-specific languages of this kind have been
called ‘little languages’ or ‘minilanguages’
in the Unix world, because early examples were small and low in
complexity relative to general-purpose languages; all three terms for
the category are in common use. But if the application domain is
complex (in that it has lots of different primitive operations or
involves manipulation of intricate data structures), an application
language for it may have to be rather more complex than some
general-purpose languages. We'll keep the traditional term
‘minilanguage’ to emphasize that the wise course
is usually to keep these designs as small and simple as
possible.
The domain-specific little language is an extremely powerful
design idea. It allows you to define your own higher-level language
to specify the appropriate methods, rules, and algorithms for the task
at hand, reducing global complexity relative to a design that uses
hardwired lower-level code for the same ends. You can get to a
minilanguage design in at least three ways, two of them good and one
of them dangerous.
One right way to get there is to realize up front that you can
use a minilanguage design to push a given specification of a programming
problem up a level, into a notation that is more compact and
expressive than you could support in a general-purpose language. As
with code generation and data-driven programming, a minilanguage lets
you take practical advantage of the fact that the defect rate per line
of code will be largely independent of the level of the language you
are using; more expressive languages mean shorter programs and fewer
bugs.
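To make the idea of pushing a specification up a level concrete, here
is a minimal sketch in Python. Everything in it is hypothetical and
invented for illustration: the three-column rule notation, the check
names (min, max, maxlen), and the function names. The point is only
that the rules live in a compact notation, while a small, fixed engine
interprets them.

```python
# A toy validation minilanguage. The syntax and all names here are
# assumptions made for this sketch, not taken from any real tool.

RULES = """
# field  check   argument
age      min     0
age      max     150
name     maxlen  40
"""

# The primitive operations of the language: one entry per check.
CHECKS = {
    "min": lambda value, arg: float(value) >= float(arg),
    "max": lambda value, arg: float(value) <= float(arg),
    "maxlen": lambda value, arg: len(str(value)) <= int(arg),
}


def parse_rules(text):
    """Turn rule text into a list of (field, check, argument) tuples."""
    rules = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'field check argument'")
        field, check, arg = parts
        if check not in CHECKS:
            raise ValueError(f"line {lineno}: unknown check {check!r}")
        rules.append((field, check, arg))
    return rules


def violations(record, rules):
    """Return the rules that the record (a dict) fails, as strings."""
    failed = []
    for field, check, arg in rules:
        if field in record and not CHECKS[check](record[field], arg):
            failed.append(f"{field} {check} {arg}")
    return failed
```

Calling violations({"age": 200, "name": "x"}, parse_rules(RULES))
reports the single failed rule "age max 150". Adding a new constraint
means adding one line to the rule text rather than another branch of
hardwired code, which is where the reduction in line count comes
from.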
The second right way to get to a minilanguage design is to
notice that one of your specification file formats is looking more and
more like a minilanguage — that is, it is developing complex
structures and implying actions in the application you are
controlling. Is it trying to describe control flow as well as data
layouts? If so, it may be time to promote that control flow from
being implicit to being explicit in your specification
language.
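As a sketch of what that promotion might look like, imagine a
hypothetical job file in which the application had quietly come to
treat certain entries as "run this only if the previous entry
failed". The Python fragment below assumes an invented two-verb
notation (run and iffail) that states the rule in the file itself;
the verbs, the execute function, and the action names are all
assumptions made for illustration.

```python
# A toy job language with explicit control flow. The verbs 'run' and
# 'iffail' are hypothetical, invented for this sketch.

SPEC = """
run    fetch
iffail notify    # explicit: only runs when the previous step failed
run    report
"""


def execute(spec, actions):
    """Interpret spec line by line; return a log of (name, succeeded)."""
    log = []
    last_ok = True  # status of the most recently executed action
    for lineno, line in enumerate(spec.splitlines(), start=1):
        words = line.split("#", 1)[0].split()
        if not words:
            continue
        if len(words) != 2:
            raise ValueError(f"line {lineno}: expected 'verb action'")
        verb, name = words
        if name not in actions:
            raise ValueError(f"line {lineno}: unknown action {name!r}")
        if verb == "run":
            should_run = True
        elif verb == "iffail":
            should_run = not last_ok
        else:
            raise ValueError(f"line {lineno}: unknown verb {verb!r}")
        if should_run:
            last_ok = bool(actions[name]())
            log.append((name, last_ok))
    return log
```

With actions such as {"fetch": lambda: False, "notify": lambda: True,
"report": lambda: True}, the notify step runs because fetch failed;
if fetch succeeds, it is skipped. The conditional is now visible to
anyone reading the file instead of being buried in the application's
handling of it.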
The wrong way to get to a minilanguage design is to extend your
way to it, one patch and crufty added feature at a time. On this path,
your specification file keeps sprouting more implied control flow and
more tangled special-purpose structures until it has become an ad-hoc
language without your noticing it. Some legendary nightmares have
been spawned this way; the example every Unix guru will think of (and
shudder over) is the sendmail.cf configuration
file associated with the sendmail mail
transport.
Sadly, most people do their first minilanguage the wrong way,
and only realize later what a mess it is. Then the question is: how
to clean it up? Sometimes the answer implies rethinking the entire
application design. Another notorious example of language-by-feature
creep was the editor TECO, which grew first
macros and then loops and conditionals as programmers wanted to use it
to package increasingly complex editing routines. The resulting
ugliness was eventually fixed by a redesign of the entire editor to be
based on an intentional language; this is how Emacs
Lisp
(which we'll survey below) evolved.
All sufficiently complicated specification files aspire to the
condition of minilanguages. Therefore, it will often be the case that
your only defense against designing a bad minilanguage is knowing how
to design a good one. This need not be a huge step or involve knowing
a lot of formal language theory; with modern tools, learning a few
relatively simple techniques and bearing good examples in mind as you
design should be sufficient.
In this chapter we'll examine all the kinds of minilanguages
normally supported under Unix, and try to identify the kinds of
situation in which each of them represents an effective design
solution. This chapter is not meant to be an exhaustive catalog of
Unix languages, but rather to bring out the design principles involved
in structuring an application around a minilanguage. We'll have much
more to say about languages for general-purpose programming in Chapter 14.
We'll need to start by doing a little taxonomy, so we'll
know what we're talking about later on.