Another effect of fast processors is that performance is
usually bounded by the cost of I/O and — especially with
programs that use the Internet — network transactions. It's
therefore valuable to know how to design network protocols for
good performance.
The most important issue is avoiding protocol round trips as
much as possible. Every protocol transaction that requires a
handshake turns any latency in the connection into a potentially
serious slowdown. Avoiding such handshakes is not specifically a
Unix-tradition practice, but it's one that needs mention here because
so many protocol designs lose huge amounts of performance to
them.
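To make the point concrete, here is a minimal sketch in Python of a client speaking a hypothetical line-oriented query protocol (the protocol, host, and helper names are invented for illustration). The first version waits for each reply before sending the next request, paying one full round trip per query; the second pipelines all the queries into a single send and then collects the replies, paying the round-trip cost only once.

    import socket

    QUERIES = [b"GET alpha\n", b"GET beta\n", b"GET gamma\n"]

    def fetch_one_at_a_time(host, port):
        # One round trip per query: on a 50ms link, three queries cost
        # at least 150ms no matter how fast the network or server is.
        results = []
        with socket.create_connection((host, port)) as sock:
            stream = sock.makefile("rwb")
            for query in QUERIES:
                stream.write(query)
                stream.flush()
                results.append(stream.readline())  # block until the reply arrives
        return results

    def fetch_pipelined(host, port):
        # Send everything first, then read all the replies: the latency
        # cost is one round trip, however many queries are in the batch.
        with socket.create_connection((host, port)) as sock:
            stream = sock.makefile("rwb")
            stream.write(b"".join(QUERIES))
            stream.flush()
            return [stream.readline() for _ in QUERIES]

Decoupling the sending of requests from the waiting for answers is the same trick streaming protocols like X11 and HTTP/1.1 rely on: latency is paid once per batch rather than once per transaction.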
I cannot say enough about latency. X11 went well beyond
X10 in avoiding round trip requests: the Render extension goes even
further. X (and these days, HTTP/1.1) is a streaming protocol. For
example, on my laptop, I can execute over 4 million 1x1 rectangle
requests (8 million no-op requests) per second. But round
trips are hundreds or thousands of
times more expensive. Anytime you can get a client to do something
without having to contact the server, you have a tremendous
win.
-- Jim Gettys
In fact, a good rule of thumb is to design for the lowest
possible latency and ignore bandwidth costs until your profiling tells
you otherwise. Bandwidth problems can be solved later in development
by tricks like compressing a protocol stream on the fly; but getting
rid of high latency baked into an existing design is much, much harder
(often, effectively impossible).
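As an illustration of the "fix bandwidth later" half of that rule, a stream can be compressed transparently at the transport layer without touching the protocol design at all. The sketch below, using Python's zlib module, assumes a message-per-call framing chosen purely for illustration; the key detail is the sync flush, which gets each message onto the wire immediately instead of letting the compressor buffer it (which would add latency back in).

    import zlib

    # One compressor/decompressor pair per connection, so the shared
    # dictionary built from earlier traffic keeps improving the ratio.
    compressor = zlib.compressobj()
    decompressor = zlib.decompressobj()

    def send_compressed(sock, payload):
        # Z_SYNC_FLUSH pushes out everything buffered so far without
        # ending the stream, so the peer can decode this message now.
        wire = compressor.compress(payload) + compressor.flush(zlib.Z_SYNC_FLUSH)
        sock.sendall(wire)

    def recv_decompressed(chunk):
        # Feed raw bytes read from the socket; returns the decompressed payload.
        return decompressor.decompress(chunk)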
While this effect shows up most clearly in network protocol
design, the tradeoff between throughput and latency is a much more
general phenomenon. In writing applications, you will sometimes face a choice
between doing an expensive computation once in anticipation that it
will be used several times, or computing only when actually needed
(even if that means you will often recompute results). In most cases
where you face a tradeoff like this, the right thing to do is bias
toward low latency. That is, don't try to precompute expensive
operations unless you have a throughput requirement and know by actual
measurement that the throughput you are getting is too low.
Precomputation may seem efficient because it minimizes total use of
processor cycles, but processor cycles are cheap. Unless you are
doing one of a handful of monstrously compute-intensive applications
like data mining, animation rendering, or the aforementioned bomb
simulations, it is usually better to opt for short startup times and
quick response.
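Here is a minimal sketch of that bias toward computing on demand, with a hypothetical expensive_analysis() standing in for whatever costly operation is at stake: the eager version pays the whole cost at startup whether or not the results are ever wanted, while the lazy version answers the first request as soon as it arrives and remembers the answer for next time.

    import functools

    def expensive_analysis(key):
        # Stand-in for a genuinely costly computation.
        return sum(i * i for i in range(1_000_000)) ^ hash(key)

    def precompute_all(keys):
        # Eager: long startup, wasted work for keys nobody asks about.
        return {key: expensive_analysis(key) for key in keys}

    @functools.lru_cache(maxsize=None)
    def lookup(key):
        # Lazy: nothing is computed until somebody asks, and each
        # answer is computed at most once.
        return expensive_analysis(key)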
In Unix's early days this advice might have been considered
heretical. Processors were much slower and cost ratios were very
different then; also, the pattern of Unix use was tilted rather more
strongly toward server operations. The point about the value of low
latency needs to be made partly because even newer Unix developers
sometimes inherit an old-time cultural prejudice toward optimizing for
throughput. But times have changed.
Three general strategies for reducing latency are (a) batching
transactions that can share startup costs, (b) allowing transactions
to overlap, and (c) caching.
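Strategy (a) can be seen in miniature in the following sketch, which uses Python's sqlite3 module purely as a convenient stand-in for any transaction with a fixed startup cost: committing each row separately pays that cost every time, while batching the rows into one transaction pays it once.

    import sqlite3

    rows = [("alpha", 1), ("beta", 2), ("gamma", 3)]

    con = sqlite3.connect("example.db")
    con.execute("CREATE TABLE IF NOT EXISTS hits (name TEXT, count INTEGER)")

    # Unbatched: every row is its own transaction, so each one pays the
    # full commit overhead.
    for row in rows:
        con.execute("INSERT INTO hits VALUES (?, ?)", row)
        con.commit()

    # Batched: one transaction covers all the rows; the startup and
    # commit costs are shared across the whole batch.
    with con:
        con.executemany("INSERT INTO hits VALUES (?, ?)", rows)

Caching plays the same role for repeated reads (as in the lru_cache sketch above), and overlapping hides latency by keeping new requests in flight while earlier ones are still being answered.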