Resource Locking (Practical mod

19.2. Resource Locking

Database locking is required if more than one process will try to modify the data. In an environment in which there are both reading and writing processes, the reading processes should use locking as well, since it's possible for another process to modify the resource at the same moment, in which case the reading process gets corrupted data.

We distinguish between shared-access and exclusive-access locks. Before doing an operation on the DBM file, an exclusive lock request is issued if a read/write access is required. Otherwise, a shared lock is issued.

19.2.1. Deadlocks

First let's make sure that you know how processes work with the CPU. Each process gets a tiny CPU time slice before another process takes over. Usually operating systems use a "round robin" technique to decide which processes should get CPU slices and when. This decision is based on a simple queue, with each process that needs CPU entering the queue at the end of it. Eventually the added process moves to the head of the queue and receives a tiny allotment of CPU time, depending on the processor speed and implementation (think microseconds). After this time slice, if it is still not finished, the process moves to the end of the queue again. Figure 19-1 depicts this process. (Of course, this diagram is a simplified one; in reality various processes have different priorities, so one process may get more CPU time slices than others over the same period of time.)

Figure 19-1. CPU time allocation

Now let's talk about the situation called deadlock. If two processes simultaneously try to acquire exclusive locks on two separate resources (databases), a deadlock is possible. Consider this example:

sub lock_foo {
    exclusive_lock('DB1');
    exclusive_lock('DB2');
}

sub lock_bar {
    exclusive_lock('DB2');
    exclusive_lock('DB1');
}

Suppose process A calls lock_foo( ) and process B calls lock_bar( ) at the same time. Process A locks resource DB1 and process B locks resource DB2. Now suppose process A needs to acquire a lock on DB2, and process B needs a lock on DB1. Neither of them can proceed, since they each hold the resource needed by the other. This situation is called a deadlock.

Using the same CPU-sharing diagram shown in Figure 19-1, let's imagine that process A gets an exclusive lock on DB1 at time slice 1 and process B gets an exclusive lock on DB2 at time slice 2. Then at time slice 4, process A gets the CPU back, but it cannot do anything because it's waiting for the lock on DB2 to be released. The same thing happens to process B at time slice 5. From now on, the two processes will get the CPU, try to get the lock, fail, and wait for the next chance indefinitely.

Deadlock wouldn't be a problem if lock_foo( ) and lock_bar( ) were atomic, which would mean that no other process would get access to the CPU before the whole subroutine was completed. But this never happens, because all the running processes get access to the CPU only for a few milliseconds or even microseconds at a time (called a time slice). It usually takes more than one CPU time slice to accomplish even a very simple operation.

For the same reason, this code shouldn't be relied on:

sub get_lock {
    sleep 1, until -e $lock_file;
    open LF, $lock_file or die $!;
    return 1;
}

The problem with this code is that the test and the action pair aren't atomic. Even if the -e test determines that the file doesn't exist, nothing prevents another process from creating the file in between the -e test and the next operation that tries to create it. Later we will see how this problem can be resolved.

19.2.2. Exclusive Locking Starvation

If a shared lock request is issued, it is granted immediately if the file is not locked or has another shared lock on it. If the file has an exclusive lock on it, the shared lock request is granted as soon as that lock is removed. The lock status becomes SHARED on success.

If an exclusive lock is requested, it is granted as soon as the file becomes unlocked. The lock status becomes EXCLUSIVE on success.

If the DB has a shared lock on it, a process that makes an exclusive lock request will poll until there are no reading or writing processes left. Lots of processes can successfully read the file, since they do not block each other. This means that a process that wants to write to the file may never get a chance to squeeze in, since it needs to obtain an exclusive lock.

Figure 19-2 represents a possible scenario in which everybody can read but no one can write. ("pX" represents different processes running at different times, all acquiring shared locks on the DBM file.)

Figure 19-2. Overlapping shared locks prevent an exclusive lock

The result is a starving process that will time out the request, which will fail to update the DB. Ken Williams solved this problem with his Tie::DB_Lock module, discussed later in this chapter.

There are several locking wrappers for DB_File on CPAN right now. Each one implements locking differently and has different goals in mind. It is worth knowing the differences between them, so that you can pick the right one for your application.