Unix Programming - Taxonomy of Unix IPC Methods - Peer-to-Peer Inter-Process Communication
All the communication methods we've discussed so far have a sort
of implicit hierarchy about them, with one program effectively
controlling or driving another and zero or limited feedback passing in
the opposite direction. In communications and networking we frequently
need channels that are peer-to-peer, usually (but
not necessarily) with data flowing freely in both directions. We'll
survey peer-to-peer communications methods under Unix here, and
develop some case studies in later chapters.
The use of tempfiles as communications drops between
cooperating programs is the oldest IPC technique there is. Despite
drawbacks, it's still useful in shellscripts, and in one-off programs
where a more elaborate and coordinated method of communication would
be overkill.
The most obvious problem with using tempfiles as an IPC
technique is that it tends to leave garbage lying around if processing
is interrupted before the tempfile can be deleted. A less obvious risk
is that of collisions between multiple instances of a program using
the same name for a tempfile. This is why it is conventional for
shellscripts that make tempfiles to include $$ in their names; this
shell variable expands to the process-ID of the enclosing shell and
effectively guarantees that the filename will be unique (the same
trick is supported in Perl).
Finally, if an attacker knows the location to which a tempfile
will be written, they can overwrite that name and possibly either read
the producer's data or spoof the consumer process by inserting
modified or spurious data into the file.[74]
This is a security risk. If the processes involved have root
privileges, this is a very serious risk. It can be mitigated by setting
the permissions on the tempfile directory carefully, but such
arrangements are notoriously likely to spring leaks.
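As a minimal sketch of how the collision and overwrite risks above can be reduced, the following Python fragment relies on the standard library's `tempfile.mkstemp`, which chooses an unpredictable name and creates the file exclusively with owner-only permissions. The function name `write_drop` and the `ipc-` prefix are hypothetical, chosen only for illustration.

```python
import os
import stat
import tempfile


def write_drop(payload: bytes) -> str:
    """Write payload to a fresh, private tempfile and return its path."""
    # mkstemp opens the file with O_CREAT | O_EXCL under an unpredictable
    # name, so an existing file or symlink at that name makes creation fail
    # instead of being silently reused.
    fd, path = tempfile.mkstemp(prefix="ipc-")  # prefix is an assumption
    try:
        with os.fdopen(fd, "wb") as drop:
            drop.write(payload)
    except BaseException:
        os.unlink(path)  # do not leave garbage behind on failure
        raise
    return path


if __name__ == "__main__":
    path = write_drop(b"hello, consumer\n")
    try:
        print(path, oct(stat.S_IMODE(os.stat(path).st_mode)))
    finally:
        os.unlink(path)  # whoever finishes last must clean up
```

This does not remove the cleanup problem: if the program is killed between creation and `os.unlink`, the file is still left behind.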
All these problems aside, tempfiles still have a niche because
they're easy to set up, they're flexible, and they're less vulnerable
to deadlocks or race conditions than more elaborate methods. And
sometimes, nothing else will do. The calling conventions of your
child process may require that it be handed a file to operate on. Our
first example of a shellout to an editor demonstrates this
perfectly.
The simplest and crudest way for two processes on the same
machine to communicate with each other is for one to send the other a
signal. Unix signals are a form of soft
interrupt; each one has a default effect on the receiving process
(usually to kill it). A process can declare a signal handler
that overrides the default action for the signal; the
handler is a function that is executed asynchronously when the signal
is received.
Signals were originally designed into Unix as a way for the
operating system to notify programs of certain errors and critical
events, not as an IPC facility. The SIGHUP signal,
for example, is sent to every program started from a given terminal
session when that session is terminated. The SIGINT
signal is sent to whatever process is currently attached to the
keyboard when the user enters the currently-defined interrupt
character (often control-C). Nevertheless, signals can be useful for
some IPC situations (and the POSIX-standard signal set includes two
signals, SIGUSR1 and SIGUSR2, intended for this use). They are often
employed as a control channel for daemons
(programs that run constantly, invisibly, in background), a way for an
operator or another program to tell a daemon that it needs to either
reinitialize itself, wake up to do work, or write
internal-state/debugging information to a known location.
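A minimal Python sketch of declaring a handler for SIGUSR1 and then signaling the process may make this concrete. Here the process signals itself as a stand-in for an operator or another program; the names `on_usr1` and `signal_self` are hypothetical.

```python
import os
import signal
import time

wakeups = 0  # incremented by the handler


def on_usr1(signum, frame):
    """Handler: just record the request; real work belongs in the main flow."""
    global wakeups
    wakeups += 1


def signal_self() -> int:
    """Install the handler, send ourselves SIGUSR1, report how many arrived."""
    before = wakeups
    previous = signal.signal(signal.SIGUSR1, on_usr1)  # replaces default (kill)
    try:
        os.kill(os.getpid(), signal.SIGUSR1)  # what `kill -USR1 <pid>` would do
        # The handler runs asynchronously, so wait (briefly) rather than assume.
        deadline = time.monotonic() + 1.0
        while wakeups == before and time.monotonic() < deadline:
            time.sleep(0.01)
        return wakeups - before
    finally:
        signal.signal(signal.SIGUSR1, previous)  # restore prior disposition


if __name__ == "__main__":
    print("handler invocations:", signal_self())
```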
I insisted SIGUSR1 and SIGUSR2 be invented for BSD. People
were grabbing system signals to mean what they needed them to mean for IPC,
so that (for example) some programs that segfaulted would not coredump
because SIGSEGV had been hijacked.
This is a general principle — people will want to hijack
any tools you build, so you have to design them to either be
un-hijackable or to be hijacked cleanly. Those are your only choices.
Except, of course, for being ignored—a highly reliable way to
remain unsullied, but less satisfying than might at first appear.
-- Ken Arnold
A technique often used with signal IPC is the so-called
pidfile. Programs that will need to be signaled
will write a small file to a known location (often in
/var/run or the invoking user's home directory)
containing their process ID or PID. Other programs can read that file
to discover that PID. The pidfile may also function as an implicit
lock file in cases where no more than one
instance of the daemon should be running simultaneously.
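The pidfile convention can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a hardened implementation: the function names are hypothetical, the caller supplies the path, and exclusive creation is used to give the pidfile its lock-file role.

```python
import os


def write_pidfile(path: str) -> None:
    """Create the pidfile; raise FileExistsError if another instance owns it."""
    # O_EXCL makes creation atomic, so the file doubles as a crude lock.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")


def read_pidfile(path: str):
    """Return the PID recorded in the pidfile if that process exists, else None."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the target
    except ProcessLookupError:
        return None  # stale pidfile: the process is gone
    except PermissionError:
        pass  # the process exists but belongs to another user
    return pid
```

A second program would call `read_pidfile` and, if it gets a PID back, pass it to `os.kill` with the agreed signal. Note that a stale pidfile left by a crashed daemon blocks `write_pidfile` until someone removes it.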
There are actually two different flavors of signals. In the
older implementations (notably V7, System III, and early System V), the
handler for a given signal is reset to the default for that signal
whenever the handler fires. The result of sending two of the same
signal in quick succession is therefore usually to kill the process,
no matter what handler was set.
The BSD 4.x versions of Unix changed to
“reliable” signals, which do not reset unless the user
explicitly requests it. They also introduced primitives to block or
temporarily suspend processing of a given set of signals. Modern
Unixes support both styles. You should use the BSD-style
nonresetting entry points for new code, but program defensively in
case your code is ever ported to an implementation that does not
support them.
Receiving N signals does not necessarily invoke the signal
handler N times. Under the older System V signal model, two or more
signals spaced very closely together (that is, within a single
timeslice of the target process) can result in various race
conditions[75] or anomalies. Depending on what
variant of signal semantics the system supports, the second and later
instances may be ignored, may cause an unexpected process kill, or may
have their delivery delayed until earlier instances have been
processed (on modern Unixes the last is most likely).
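The gap between signals sent and handler invocations can be observed with a short Python experiment. This sketch blocks SIGUSR1, sends it several times, then unblocks it and counts how often the handler ran. It assumes a single-threaded process and the standard (non-realtime) signal behavior of modern Unixes, where pending instances of the same signal are usually merged; the exact count is system-dependent, so none is claimed here. The name `count_deliveries` is hypothetical.

```python
import os
import signal
import time


def count_deliveries(sends: int = 3) -> int:
    """Send SIGUSR1 `sends` times while it is blocked; return handler runs."""
    delivered = 0

    def handler(signum, frame):
        nonlocal delivered
        delivered += 1

    previous = signal.signal(signal.SIGUSR1, handler)
    # Hold the signal pending while we send it repeatedly.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    try:
        for _ in range(sends):
            os.kill(os.getpid(), signal.SIGUSR1)
    finally:
        # Restoring the old mask releases whatever is still pending.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    try:
        time.sleep(0.05)  # give the interpreter a moment to run the handler
        return delivered
    finally:
        signal.signal(signal.SIGUSR1, previous)


if __name__ == "__main__":
    print("sent 3, handler ran", count_deliveries(3), "time(s)")
```

The practical consequence is that a handler should treat a signal as "at least one request arrived", not as a counter.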
The modern signals API is portable across all recent Unix
versions, but not to Windows or classic (pre-OS X) MacOS.
Many well-known system daemons accept SIGHUP
(originally the signal sent to programs on a serial-line drop, such as
was produced by hanging up a modem connection) as a signal to
reinitialize (that is, reload their configuration files); examples
include Apache and the Linux implementations of bootpd(8), gated(8),
inetd(8), mountd(8), named(8), nfsd(8), and ypbind(8). In
a few cases, SIGHUP is accepted in its original sense
of a session-shutdown signal (notably in Linux pppd(8)),
but that role nowadays generally goes to SIGTERM.
SIGTERM (‘terminate’) is
often accepted as a graceful-shutdown signal (this is as distinct from
SIGKILL, which does an immediate process
kill and cannot be blocked or handled). SIGTERM actions often involve cleaning up
tempfiles, flushing final updates out to databases, and the
like.
When writing daemons, follow the Rule of Least Surprise: use
these conventions, and read the manual pages to look for existing
models.
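A hypothetical daemon skeleton following these conventions might look like the sketch below: SIGHUP requests a reload, SIGTERM requests a graceful shutdown, and the handlers do nothing but set flags that the main loop acts on. The `Daemon` class, its attributes, and the scripted `demo` driver are all illustrative assumptions; the reload and cleanup steps are placeholders.

```python
import os
import signal


class Daemon:
    """Skeleton: handlers only set flags; the main loop does the real work."""

    def __init__(self):
        self.reload_requested = False
        self.stop_requested = False
        self.reloads = 0

    def _on_hup(self, signum, frame):
        self.reload_requested = True

    def _on_term(self, signum, frame):
        self.stop_requested = True

    def run(self, do_work):
        saved = {
            signal.SIGHUP: signal.signal(signal.SIGHUP, self._on_hup),
            signal.SIGTERM: signal.signal(signal.SIGTERM, self._on_term),
        }
        try:
            while not self.stop_requested:
                if self.reload_requested:
                    self.reload_requested = False
                    self.reloads += 1  # placeholder: re-read config files here
                do_work()
            # placeholder: remove tempfiles, flush final updates, and so on
        finally:
            for signum, handler in saved.items():
                signal.signal(signum, handler)


def demo() -> int:
    """Drive the daemon with one SIGHUP and then one SIGTERM."""
    daemon = Daemon()
    script = iter([signal.SIGHUP, signal.SIGTERM])

    def do_work():
        signum = next(script, None)
        if signum is not None:
            os.kill(os.getpid(), signum)  # stand-in for an operator's kill

    daemon.run(do_work)
    return daemon.reloads
```

Keeping the handlers trivial sidesteps most of the reentrancy problems that come with doing real work asynchronously.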
The fetchmail utility is normally set
up to run as a daemon in background, periodically collecting mail from
all remote sites defined in its run-control file and passing the mail
to the local SMTP listener on port 25 without user
intervention. fetchmail sleeps for a
user-defined interval (defaulting to 15 minutes) between collection
attempts, so as to avoid constantly loading the network.
When you invoke fetchmail with no arguments,
it checks to see if you have a fetchmail
daemon already running (it does this by looking for a pidfile). If no
daemon is running, fetchmail starts up
normally using whatever control information has been specified in its
run-control file. If a daemon is running, on the other hand, the new
fetchmail instance just signals the old one
to wake up and collect mail immediately; then the new instance
terminates. In addition, fetchmail -q sends a
termination signal to any running fetchmail
daemon.
Thus, typing fetchmail means, in effect,
“poll now and leave a daemon running to poll later; don't bother me
with the detail of whether a daemon was already running or not”.
Observe that the detail of which particular signals are used for
wakeup and termination is something the user doesn't have to know.
Sockets were developed in the BSD lineage of Unix
as a way to encapsulate access to data networks. Two programs
communicating over a socket typically see a bidirectional byte stream
(there are other socket modes and transmission methods, but they are
of only minor importance). The byte stream is both sequenced (that is,
even single bytes will be received in the same order sent) and
reliable (socket users are guaranteed that the underlying network will
do error detection and retry to ensure delivery). Socket descriptors,
once obtained, behave essentially like file descriptors.
Sockets differ from read/write in one important case. If
the bytes you send arrive, but the receiving machine fails to ACK, the
sending machine's TCP/IP stack will time out. So getting an error does
not necessarily mean that the bytes didn't
arrive; the receiver may be using them. This problem has profound
consequences for the design of reliable protocols, because you have to be
able to work properly when you don't know what was received in the
past. Local I/O is ‘yes/no’. Socket I/O is ‘yes/no/maybe’.
And nothing can ensure delivery — the remote machine
might have been destroyed by a comet.
-- Ken Arnold
At the time a socket is created, you specify a
protocol family which tells the network layer how
the name of the socket is interpreted. Sockets are usually thought of
in connection with the Internet, as a way of passing data between
programs running on different hosts; this is the AF_INET socket
family, in which addresses are interpreted as host-address and
service-number pairs. However, the AF_UNIX (aka AF_LOCAL) protocol
family supports the same socket abstraction for communication between
two processes on the same machine (names are interpreted as the
locations of special files analogous to bidirectional named pipes). As
an example, client programs and servers using the X windowing
system typically use AF_LOCAL sockets to communicate.
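To illustrate the point that the same socket abstraction works between two processes on one machine, here is a minimal Python sketch of an AF_UNIX stream socket whose name is a path in the file system. For brevity the "server" is a thread in the same process rather than a separate program; the socket path, the function names, and the upper-casing reply are all hypothetical.

```python
import os
import socket
import tempfile
import threading


def serve_one(server: socket.socket) -> None:
    """Accept a single connection and answer with the upper-cased request."""
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())


def unix_socket_roundtrip(message: bytes) -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # the socket's name is a path
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)  # creates the special file
            server.listen(1)
            worker = threading.Thread(target=serve_one, args=(server,))
            worker.start()
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)
                client.sendall(message)
                reply = client.recv(1024)
            worker.join()
        return reply


if __name__ == "__main__":
    print(unix_socket_roundtrip(b"ping"))
```

Switching the family to `AF_INET` and the address to a host and port pair would leave the rest of the code essentially unchanged, which is the attraction of the abstraction.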
All modern Unixes support BSD-style sockets, and as a
matter of design they are usually the right thing to use for
bidirectional IPC no matter where your cooperating processes are
located. Performance pressure may push you to use shared memory or
tempfiles or other techniques that make stronger locality assumptions,
but under modern conditions it is best to assume that your code will
need to be scaled up to distributed operation. More importantly,
those locality assumptions may mean that portions of your system get
chummier with each other's internals than ought to be the case in a
good design. The separation of address spaces that sockets enforce is
a feature, not a bug.
To use sockets gracefully, in the Unix tradition, start by
designing an application protocol for use between
your programs: a set of requests and responses which expresses the
semantics of what your programs will be communicating about in a
succinct way. We've already discussed some major issues in the
design of application protocols in Chapter 5.
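As an illustration of what "a set of requests and responses" can mean in practice, the sketch below defines a toy line-oriented text protocol and runs it over a connected socket pair. The protocol itself (the `ADD` and `QUIT` requests and the `OK`, `ERR`, and `BYE` replies) is invented for this example and comes from no real program.

```python
import socket
import threading


def handle_request(line: str) -> str:
    """Map one request line to one reply line (toy protocol)."""
    parts = line.split()
    if len(parts) == 3 and parts[0] == "ADD":
        try:
            return f"OK {int(parts[1]) + int(parts[2])}"
        except ValueError:
            return "ERR bad operand"
    if parts == ["QUIT"]:
        return "BYE"
    return "ERR unknown request"


def serve(conn: socket.socket) -> None:
    """Read newline-terminated requests and answer each until QUIT or EOF."""
    with conn, conn.makefile("r", encoding="ascii", newline="\n") as reader, \
            conn.makefile("w", encoding="ascii", newline="\n") as writer:
        for line in reader:
            reply = handle_request(line)
            writer.write(reply + "\n")
            writer.flush()
            if reply == "BYE":
                break


def converse(requests):
    """Send each request over a socket pair and collect the replies."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve, args=(server,))
    worker.start()
    replies = []
    with client, client.makefile("r", encoding="ascii", newline="\n") as reader, \
            client.makefile("w", encoding="ascii", newline="\n") as writer:
        for request in requests:
            writer.write(request + "\n")
            writer.flush()
            replies.append(reader.readline().rstrip("\n"))
    worker.join()
    return replies


if __name__ == "__main__":
    print(converse(["ADD 2 3", "FROB", "QUIT"]))
```

Because the protocol is plain text with one request per line, it can be exercised by hand and logged without special tools.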
Sockets are supported in all recent Unixes, under Windows, and
under classic MacOS as well.
PostgreSQL is an open-source database program. Had it been
implemented as a monster monolith, it would be a single program with
an interactive interface that manipulates database files on disk
directly. Interface would be welded together with implementation, and
two instances of the program attempting to manipulate the same
database at the same time would have serious contention and locking
issues.
Instead, the PostgreSQL suite includes a server called
postmaster and at least three client
applications. One postmaster server
process per machine runs in background and has exclusive access to the
database files. It accepts requests in the SQL query minilanguage through
TCP/IP sockets, and returns answers in a textual format as well. When the user runs a
PostgreSQL client, that client opens a session to
postmaster and does SQL transactions with
it. The server can handle several client sessions at once, and
sequences requests so that they don't interfere with each other.
Because the front end and back end are separate, the server
doesn't need to know anything except how to interpret SQL requests
from a client and send SQL reports back to it. The clients, on the
other hand, don't need to know anything about how the database is
stored. Clients can be specialized for different needs and have
different user interfaces.
This organization is quite typical for Unix databases — so
much so that it is often possible to mix and match SQL clients and SQL
servers. The interoperability issues are the SQL server's
TCP/IP port number, and whether client and server support the same dialect of SQL.
Whereas two processes using sockets to communicate may live on
different machines (and, in fact, be separated by an Internet
connection spanning half the globe), shared memory requires producers
and consumers to be co-resident on the same hardware. But, if your
communicating processes can get access to the same physical memory,
shared memory will be the fastest way to pass information between
them.
Shared memory may be disguised under different APIs, but on
modern Unixes the implementation normally depends on the use of
mmap(2) to map files into memory that can be shared between processes.
POSIX defines a shm_open(3)
facility with an API that supports using files as shared memory; this
is mostly a hint to the operating system that it need not flush the
pseudofile data to disk.
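A minimal sketch of memory shared through `mmap` follows, using Python's standard `mmap` module on Unix. It assumes an anonymous mapping (no backing file), which Python creates as shared by default on Unix, and a forked child as the second process. The one-byte length prefix and the function name are hypothetical simplifications.

```python
import mmap
import os


def pass_through_shared_memory(message: bytes) -> bytes:
    """Have a child process write message into memory the parent then reads."""
    if len(message) > 255:
        raise ValueError("toy format: length must fit in one byte")
    # File descriptor -1 requests an anonymous mapping; on Unix Python maps
    # it MAP_SHARED by default, so it stays shared across fork().
    region = mmap.mmap(-1, mmap.PAGESIZE)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: write the payload, then the length byte, then exit
            # without running the parent's cleanup code.
            try:
                region[1:1 + len(message)] = message
                region[0] = len(message)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)  # wait for the child so the data is complete
        length = region[0]
        return bytes(region[1:1 + length])
    finally:
        region.close()


if __name__ == "__main__":
    print(pass_through_shared_memory(b"written by the child"))
```

No data is copied through a pipe or socket here; the parent simply reads bytes the child stored. The `waitpid` call is the only synchronization, which is sufficient only because the child writes once and exits.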
Because access to shared memory is not automatically serialized
by a discipline resembling read and write calls, programs doing the
sharing must handle contention and deadlock issues themselves,
typically by using semaphore variables located in the shared segment.
The issues here resemble those in multithreading (see the end of this
chapter for discussion) but are more manageable because the default is
not to share memory. Thus, problems are better contained.
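The need for explicit serialization can be shown with a counter kept in shared memory and incremented by several processes. Note the substitution: Python's standard library does not expose a raw semaphore that can be placed inside a mapped segment, so this sketch uses an advisory `flock` lock on a separate lock file instead of the semaphore-in-segment technique described above. The function name and the worker and increment counts are hypothetical.

```python
import fcntl
import mmap
import os
import struct
import tempfile


def shared_counter(workers: int = 4, increments: int = 500) -> int:
    """Increment one shared 64-bit counter from several forked processes."""
    region = mmap.mmap(-1, 8)  # anonymous mapping, shared across fork()
    region[:8] = struct.pack("q", 0)
    fd, lock_path = tempfile.mkstemp()
    os.close(fd)
    try:
        pids = []
        for _ in range(workers):
            pid = os.fork()
            if pid == 0:
                status = 1
                try:
                    # Each child opens the lock file itself, so each has
                    # its own lock rather than a shared inherited one.
                    with open(lock_path, "w") as lock:
                        for _ in range(increments):
                            fcntl.flock(lock, fcntl.LOCK_EX)
                            try:
                                (n,) = struct.unpack("q", region[:8])
                                region[:8] = struct.pack("q", n + 1)
                            finally:
                                fcntl.flock(lock, fcntl.LOCK_UN)
                    status = 0
                finally:
                    os._exit(status)
            pids.append(pid)
        for pid in pids:
            os.waitpid(pid, 0)
        return struct.unpack("q", region[:8])[0]
    finally:
        os.unlink(lock_path)
        region.close()


if __name__ == "__main__":
    print(shared_counter())
```

Without the lock, the read-modify-write sequence in each child could interleave with the others and lose updates, which is the contention problem the text describes.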
On systems where it is available and reliable, the Apache web server's
scoreboard facility uses shared memory for communication between an
Apache master process and the load-sharing pool of Apache images that it
manages. Modern X implementations also use shared memory, to pass
large images between client and server when they are resident on the
same machine, to avoid the overhead of socket communication.
Both uses are performance hacks justified by experience and testing,
rather than being architectural choices.
The mmap(2) call is supported under all modern Unixes, including
Linux and the open-source BSD
versions; this is described in the Single Unix Specification. It will
not normally be available under Windows, MacOS classic, and other
operating systems.
Before purpose-built mmap(2)
was available, a common way for two processes to communicate was for
them to open the same file, and then delete that file. The file
wouldn't go away until all open filehandles were closed, but some old
Unixes took the link count falling to zero as a hint that they could
stop updating the on-disk copy of the file. The downside was that
your backing store was the file system rather than a swap device,
the file system the deleted file lived on couldn't be unmounted until the
programs using it closed, and attaching new processes to an existing
shared memory segment faked up in this way was tricky at best.
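The open-then-delete behavior this trick depended on is easy to demonstrate. The sketch below shows only the core fact, that an open descriptor keeps a file usable after its name is gone; it uses a single process, whereas the historical technique had two processes holding the file open. The function name is hypothetical.

```python
import os
import tempfile


def unlinked_scratch(payload: bytes):
    """Write and read back through a file that no longer has a name."""
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)  # link count drops to zero; the open fd survives
        still_named = os.path.exists(path)
        os.write(fd, payload)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
        return still_named, data
    finally:
        os.close(fd)  # only now is the storage actually released


if __name__ == "__main__":
    print(unlinked_scratch(b"scratch data"))
```

Since the name is gone, a process that did not already hold the descriptor has no way to open the file, which is why attaching new processes was awkward.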
After Version 7 and the split between the BSD and System V lineages,
the evolution of Unix interprocess communication took two different
directions. The BSD direction led to sockets. The AT&T lineage, on the other
hand, developed named pipes (as previously
discussed) and an IPC facility specifically designed for
passing binary data and based on shared-memory bidirectional message
queues. This is called ‘System V IPC’—or,
among old timers, ‘Indian Hill’ IPC after the AT&T
facility where it was first written.
The upper, message-passing layer of System V IPC has largely
fallen out of use. The lower layer, which consists of shared memory
and semaphores, still has significant applications under circumstances
in which one needs to do mutual-exclusion locking and some global data
sharing among processes running on the same machine. These System V
shared memory facilities evolved into the POSIX shared-memory API,
supported under Linux, the BSDs, MacOS X and Windows, but not classic
MacOS.
By using these shared-memory and semaphore facilities
(shmget(2), semget(2), and friends) one can avoid the overhead of copying data through the
network stack. Large commercial databases (including Oracle, DB2, Sybase, and
Informix) use this technique heavily.