Remote Procedure Calls
Despite occasional exceptions such as NFS (Network File System)
and the GNOME project, attempts to import CORBA, ASN.1, and other
forms of remote-procedure-call interface have largely failed —
these technologies have not been naturalized into the Unix
culture.
There seem to be several underlying reasons for this. One is
that RPC interfaces are not readily
discoverable;
that is, it is difficult to query these interfaces for their
capabilities, and difficult to monitor them in action without building
single-use tools as complex as the programs being monitored (we
examined some of the reasons for this in Chapter 6). They have the same version skew
problems as libraries, but those problems are harder to track because
they're distributed and not generally obvious at link time.
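The contrast is easy to demonstrate. The sketch below, in Python, interrogates a textual protocol (SMTP) for its capabilities using nothing but a socket; the host and port are assumptions for illustration, but the EHLO exchange itself is standard SMTP. No single-use tooling is required, and the same session could be replayed by hand with telnet(1):

    import socket

    HOST, PORT = "localhost", 25        # assumption: a mail server listening locally

    with socket.create_connection((HOST, PORT), timeout=5) as sock:
        f = sock.makefile("rwb")
        print(f.readline().decode().rstrip())    # server greeting, e.g. "220 ..."
        f.write(b"EHLO example.test\r\n")        # ask the server what it can do
        f.flush()
        while True:
            reply = f.readline().decode().rstrip()
            print(reply)                         # each "250-..." line names a capability
            if not reply.startswith("250-"):     # the final line is "250 <text>"
                break
        f.write(b"QUIT\r\n")
        f.flush()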
As a related issue, interfaces that have richer type signatures
also tend to be more complex, therefore more brittle. Over time, they
tend to succumb to ontology creep as the inventory of types that get
passed across interfaces grows steadily larger and the individual types
more elaborate. Ontology creep is a problem because structs are more
likely to mismatch than strings; if the ontologies of the programs on
each side don't exactly match, it can be very hard to teach them to
communicate at all, and fiendishly difficult to resolve bugs. The
most successful RPC applications, such as the Network File System,
are those in which the application domain naturally has only a
few simple data types.
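A short sketch makes the structs-versus-strings point concrete. The record layouts below are invented for illustration, but the failure mode is real: when one peer inserts a field, the binary reader silently misreads the bytes, while the textual reader degrades gracefully and the mismatch is visible in any packet dump:

    import struct

    # Version 1 of a hypothetical wire format: (uid: u32, gid: u32).
    # Version 2 inserts a field in the middle: (uid: u32, flags: u16, gid: u32).
    v2_packet = struct.pack("!IHI", 1000, 0x0001, 100)

    # A v1 reader applied to a v2 packet does not fail -- it quietly
    # reinterprets the bytes, which is far worse than an error.
    uid, gid = struct.unpack("!II", v2_packet[:8])
    print(uid, gid)      # 1000 65536 -- gid is garbage built from flags plus half of gid

    # The textual equivalent of the same record: unknown fields are
    # simply skipped, and the extra field is obvious on inspection.
    v2_text = b"uid=1000 flags=1 gid=100"
    fields = dict(item.split(b"=") for item in v2_text.split())
    print(int(fields[b"uid"]), int(fields[b"gid"]))    # 1000 100, despite the new field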
The usual argument for RPC is that it permits
“richer” interfaces than methods like text streams
— that is, interfaces with a more elaborate and
application-specific ontology of data types. But the Rule of
Simplicity applies! We observed in Chapter 4 that one of the functions of interfaces
is as choke points that prevent the implementation details of modules
from leaking into each other. Therefore, the main argument in favor
of RPC is also an argument that it increases global complexity rather
than minimizing it.
With classical RPC, it's too easy to do things in a complicated
and obscure way instead of keeping them simple. RPC seems to
encourage the production of large, baroque, over-engineered systems with
obfuscated interfaces, high global complexity, and serious
version-skew and reliability problems — a perfect example of
thick glue layers run amok.
Windows COM and DCOM are perhaps the archetypal examples of how
bad this can get, but there are plenty of others. Apple abandoned
OpenDoc, and both CORBA and the once wildly hyped Java
RMI have
receded from view in the Unix world as people have gained field
experience with them. This may well be because these methods don't
actually solve more problems than they cause.
Andrew S. Tanenbaum and Robbert van Renesse have given us a
detailed analysis of the general problem in A Critique of
the Remote Procedure Call Paradigm [Tanenbaum-VanRenesse], a paper which should serve as a
strong cautionary note to anyone considering an architecture based on
RPC.
All these problems may predict long-term difficulties for the
relatively few Unix projects that use RPC. Of these projects, perhaps
the best known is the GNOME desktop effort.[77] These problems also contribute to the
notorious security vulnerabilities of exposing NFS servers.
Unix tradition, on the other hand, strongly favors
transparent and
discoverable
interfaces. This is one of the forces behind the Unix culture's
continuing attachment to IPC through textual protocols. It is often
argued that the parsing overhead of textual protocols is a performance
problem relative to binary RPCs — but RPC interfaces tend to
have latency problems that are far worse, because (a) you can't
readily anticipate how much data marshaling and unmarshaling a given
call will involve, and (b) the RPC model tends to encourage
programmers to treat network transactions as cost-free. Adding even
one additional round trip to a transaction interface tends to add
enough network latency to swamp any overhead from parsing or
marshaling.
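Back-of-the-envelope arithmetic bears this out. In the sketch below the round-trip figures are assumptions (roughly half a millisecond on a LAN, forty milliseconds across the Internet), but they dwarf the measured cost of parsing a protocol line by three or more orders of magnitude:

    import timeit

    line = b"250-SIZE 52428800\r\n"    # a typical textual-protocol reply line
    parse_time = timeit.timeit(lambda: line.decode().rstrip().split(),
                               number=100_000) / 100_000

    LAN_RTT = 0.0005    # assumption: ~0.5 ms round trip on a local network
    WAN_RTT = 0.040     # assumption: ~40 ms round trip across the Internet

    print(f"parse one line: {parse_time * 1e6:.2f} microseconds")
    print(f"one LAN round trip costs as much as {LAN_RTT / parse_time:,.0f} parses")
    print(f"one WAN round trip costs as much as {WAN_RTT / parse_time:,.0f} parses")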
Even if text streams were less efficient than RPC, the
performance loss would be marginal and linear, the kind better
addressed by upgrading your hardware than by expending development
time or adding architectural complexity. Anything you might lose in
performance by using text streams, you gain back in the ability to
design systems that are simpler — easier to monitor, to model,
and to understand.
Today, RPC and the Unix attachment to text streams are
converging in an interesting way, through protocols like XML-RPC and
SOAP. These, being textual and transparent, are more palatable to Unix
programmers than the ugly and heavyweight binary serialization formats
they replace. While they don't solve all the more general
problems pointed out by Tanenbaum and van Renesse, they do in some
ways combine the advantages of both text-stream and RPC worlds.
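Python's standard library makes the transparency easy to see. In this minimal sketch (the method name and port are invented for the example), the request and reply travel as small, human-readable XML documents that tcpdump(1) will show in clear:

    import threading
    import xmlrpc.client
    from xmlrpc.server import SimpleXMLRPCServer

    # A toy XML-RPC service; "add" and port 8000 are assumptions for the example.
    server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
    server.register_function(lambda a, b: a + b, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # The client call looks like a procedure call, but the wire traffic is an
    # ordinary HTTP POST carrying a readable <methodCall> document.
    proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
    print(proxy.add(2, 3))    # prints 5
    server.shutdown()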