Now that we have all these methods, what should we do with them?
The first thing to notice is that
tempfiles, the more
interactive sort of master/slave process relationship, sockets, RPC,
and all other methods of bidirectional IPC are at some level
equivalent — they're all just ways for programs to exchange data
during their lifetimes. Much of what we do in a sophisticated way
using sockets or shared memory we could do in a primitive way using
tempfiles as mailboxes and signals for notification. The differences
are at the edges, in how programs establish communication, where and
when one does the marshalling and unmarshalling of messages, in what
sorts of buffering problems you may have, and atomicity guarantees you
get on the messages (that is, to what extent you can know that the
result of a single send action from one side will show up as a single
receive event on the other).
We've seen from the PostgreSQL study that one effective way to
hold down complexity is to break an application into a client/server
pair. The PostgreSQL client and server communicate through an application
protocol over sockets, but very little about the design pattern would
change if they used any other bidirectional IPC method.
This kind of partitioning is particularly effective in
situations where multiple instances of the application must manage
access to resources that are shared among all. A single server process
may manage all resource contention, or cooperating peers may each take
responsibility for some critical resource.
Client-server partitioning can also help distribute cycle-hungry
applications across multiple hosts. Or it may make them suitable for
distributed computing across the Internet (as with Freeciv). We'll
discuss the related CLI server pattern in Chapter11.
Because all these peer-to-peer IPC techniques are alike at some
level, we should evaluate them mainly on the amount of
program-complexity overhead they incur, and how much opacity they
introduce into our designs. This, ultimately, is why BSD sockets have
won over other Unix IPC methods, and why RPC has generally failed to
get much traction.
Threads are fundamentally different. Rather than supporting
communication among different programs, they support a sort of
timesharing within an instance of a single program. Rather
than being a way to partition a big program into smaller ones
with simpler behavior, threading is strictly a performance hack. It
has all the problems normally associated with performance hacks, and
a few special ones of its own.
Accordingly, while we should seek ways to break up large
programs into simpler cooperating processes, the use of threads within
processes should be a last resort rather than a first. Often, you may
find you can avoid them. If you can use limited shared memory and
semaphores, asynchronous I/O using SIGIO, or
poll(2)/select(2)
rather than threading, do it that way. Keep it simple; use techniques
earlier on this list and lower on the complexity scale in preference
to later ones.
The combination of threads, remote-procedure-call interfaces, and
heavyweight object-oriented design is especially dangerous. Used
sparingly and tastefully, any of these techniques can be valuable —
but if you are ever invited onto a project that is supposed to feature
all three, fleeing in terror might well be an appropriate
reaction.
We have previously observed that programming in the real world
is all about managing complexity. Tools to manage complexity are good
things. But when the effect of those tools is to proliferate
complexity rather than to control it, we would be better off throwing them
away and starting from zero. An important part of the Unix wisdom
is to never forget this.
|