Unix Programming - Process Partitioning at the Design Level

Process Partitioning at the Design Level

Now that we have all these methods, what should we do with them?

The first thing to notice is that tempfiles, the more interactive sort of master/slave process relationship, sockets, RPC, and all other methods of bidirectional IPC are at some level equivalent — they're all just ways for programs to exchange data during their lifetimes. Much of what we do in a sophisticated way using sockets or shared memory we could do in a primitive way using tempfiles as mailboxes and signals for notification. The differences are at the edges, in how programs establish communication, where and when one does the marshalling and unmarshalling of messages, in what sorts of buffering problems you may have, and atomicity guarantees you get on the messages (that is, to what extent you can know that the result of a single send action from one side will show up as a single receive event on the other).

We've seen from the PostgreSQL study that one effective way to hold down complexity is to break an application into a client/server pair. The PostgreSQL client and server communicate through an application protocol over sockets, but very little about the design pattern would change if they used any other bidirectional IPC method.

This kind of partitioning is particularly effective in situations where multiple instances of the application must manage access to resources that are shared among all. A single server process may manage all resource contention, or cooperating peers may each take responsibility for some critical resource.

Client-server partitioning can also help distribute cycle-hungry applications across multiple hosts. Or it may make them suitable for distributed computing across the Internet (as with Freeciv). We'll discuss the related CLI server pattern in Chapter11.

Because all these peer-to-peer IPC techniques are alike at some level, we should evaluate them mainly on the amount of program-complexity overhead they incur, and how much opacity they introduce into our designs. This, ultimately, is why BSD sockets have won over other Unix IPC methods, and why RPC has generally failed to get much traction.

Threads are fundamentally different. Rather than supporting communication among different programs, they support a sort of timesharing within an instance of a single program. Rather than being a way to partition a big program into smaller ones with simpler behavior, threading is strictly a performance hack. It has all the problems normally associated with performance hacks, and a few special ones of its own.

Accordingly, while we should seek ways to break up large programs into simpler cooperating processes, the use of threads within processes should be a last resort rather than a first. Often, you may find you can avoid them. If you can use limited shared memory and semaphores, asynchronous I/O using SIGIO, or poll(2)/select(2) rather than threading, do it that way. Keep it simple; use techniques earlier on this list and lower on the complexity scale in preference to later ones.

The combination of threads, remote-procedure-call interfaces, and heavyweight object-oriented design is especially dangerous. Used sparingly and tastefully, any of these techniques can be valuable — but if you are ever invited onto a project that is supposed to feature all three, fleeing in terror might well be an appropriate reaction.

We have previously observed that programming in the real world is all about managing complexity. Tools to manage complexity are good things. But when the effect of those tools is to proliferate complexity rather than to control it, we would be better off throwing them away and starting from zero. An important part of the Unix wisdom is to never forget this.