Principal Buffer Policies
DTrace permits tracing in highly constrained contexts in the kernel. In particular, DTrace
permits tracing in contexts in which kernel software may not reliably allocate memory.
As a consequence of this flexibility, there is always a possibility that DTrace
will attempt to trace data when no buffer space is available.
DTrace must have a policy to deal with such situations when they arise,
but you might wish to tune the policy based on the needs of
a given experiment. Sometimes the appropriate policy might be to discard the new
data. Other times it might be desirable to reuse the space containing the
oldest recorded data to trace new data. Most often, the desired policy is
to minimize the likelihood of running out of available space in the first
place. To accommodate these varying demands, DTrace supports several buffer policies.
The policy is selected with the bufpolicy option, which can be set on a per-consumer
basis. See Chapter 16, Options and Tunables for more details on setting options.
switch Policy
By default, the principal buffer has a switch buffer policy. Under this policy, per-CPU
buffers are allocated in pairs: one buffer is active and the other buffer
is inactive. When a DTrace consumer attempts to read a buffer, the kernel
first switches the inactive and active buffers. Buffer switching is done in such
a manner that there is no window in which tracing data may be
lost. Once the buffers are switched, the newly inactive buffer is copied out
to the DTrace consumer. This policy ensures that the consumer always sees a
self-consistent buffer: a buffer is never simultaneously traced to and copied out. This
technique also avoids introducing a window in which tracing is paused or otherwise
prevented. The rate at which the buffer is switched and read out is
controlled by the consumer with the switchrate option. As with any rate
option, switchrate may be specified with any time suffix, but defaults to rate-per-second.
For more details on switchrate and other options, see Chapter 16, Options and Tunables.
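The switch mechanism can be modeled in a few lines. The following Python sketch is a simplified, hypothetical model of one CPU's buffer pair (`SwitchBufferPair` and its methods are illustrative names, not part of DTrace). It is single-threaded and ignores buffer sizes, so it does not show how the kernel makes the switch itself safe against concurrently firing probes; it only shows why the consumer never copies out a buffer that is still being traced to.

```python
class SwitchBufferPair:
    """Hypothetical model of one CPU's active/inactive principal buffers."""

    def __init__(self):
        self.active = []     # buffer that probes currently trace into
        self.inactive = []   # buffer most recently handed to the consumer

    def trace(self, record):
        # Probe side: records always go to the active buffer.
        self.active.append(record)

    def switch_and_read(self):
        # Consumer side: switch first, so new records land in the other
        # buffer while the newly inactive one is copied out.
        self.active, self.inactive = self.inactive, self.active
        # The newly active buffer was consumed on the previous read; reuse it empty.
        self.active.clear()
        return list(self.inactive)   # the copy-out


pair = SwitchBufferPair()
pair.trace("a")
pair.trace("b")
first = pair.switch_and_read()   # ['a', 'b']
pair.trace("c")                  # goes to the other buffer, not the one just read
second = pair.switch_and_read()  # ['c']
```

In this model, the rate at which `switch_and_read()` is called plays the role of the switchrate option.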
Note - To process the principal buffer at user-level at a rate faster than the
default of once per second, tune the value of switchrate. The system processes
actions that induce user-level activity (such as printa() and system()) when the
corresponding record in the principal buffer is processed. The value of switchrate dictates the
rate at which the system processes such actions.
Under the switch policy, if a given enabled probe would trace more data
than there is space available in the active principal buffer, the data is
dropped and a per-CPU drop count is incremented. In the event of one
or more drops, dtrace(1M) displays a message similar to the following example:
dtrace: 11 drops on CPU 0
If a given record is larger than the total buffer size, the
record will be dropped regardless of buffer policy. You can reduce or eliminate
drops by either increasing the size of the principal buffer with the bufsize
option or by increasing the switching rate with the switchrate option.
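As a rough illustration of why both remedies work, the following Python sketch counts drops within one switch interval and estimates how large the active buffer must be to hold one interval's worth of data. It assumes a single CPU, record sizes given in bytes, and a steady tracing rate; `trace_interval` and `min_bufsize` are hypothetical helpers, not DTrace interfaces.

```python
import math


def trace_interval(record_sizes, bufsize):
    """Trace one switch interval's records into an empty active buffer.

    Returns (bytes_used, drops). A record that does not fit in the space
    remaining is dropped and the drop count is incremented; a record larger
    than bufsize can never fit, so it is always dropped.
    """
    used = 0
    drops = 0
    for size in record_sizes:
        if used + size > bufsize:
            drops += 1
        else:
            used += size
    return used, drops


def min_bufsize(bytes_per_sec, switchrate_hz):
    """Estimate the bytes one CPU traces between two buffer switches,
    assuming (as a simplification) a steady tracing rate."""
    return math.ceil(bytes_per_sec / switchrate_hz)


# 64 KB/s of trace data on one CPU:
print(min_bufsize(64 * 1024, 1))   # 65536 (default: one switch per second)
print(min_bufsize(64 * 1024, 10))  # 6554  (ten switches per second)
```

Under these assumptions, raising switchrate tenfold shrinks the space needed per interval roughly tenfold. Tracing rates are rarely uniform, so treat such an estimate only as a starting point for choosing bufsize.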
Under the switch policy, scratch space for copyin(), copyinstr(), and alloca() is allocated
out of the active buffer.
fill Policy
For some problems, you might wish to use a single in-kernel buffer. You can
approximate this behavior under the switch policy with D constructs, by incrementing
a variable and predicating an exit() action on its value, but such an
implementation does not eliminate the possibility of drops. To request a single, large
in-kernel buffer, and continue tracing until one or more of the per-CPU buffers
has filled, use the fill buffer policy. Under this policy, tracing continues until
an enabled probe attempts to trace more data than can fit in the
remaining principal buffer space. When insufficient space remains, the buffer is marked as
filled and the consumer is notified that at least one of its per-CPU
buffers has filled. Once dtrace(1M) detects a single filled buffer, tracing is stopped,
all buffers are processed and dtrace exits. No further data will be traced
to a filled buffer even if the data would fit in the buffer.
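The fill rules can be summarized in a small model. The following Python sketch is hypothetical (`FillBuffer` and `consumer_should_stop` are illustrative names, and sizes are plain byte counts); it shows a buffer being marked filled when a record does not fit, refusing later records that would fit, and the consumer stopping as soon as any per-CPU buffer has filled.

```python
class FillBuffer:
    """Hypothetical model of one CPU's principal buffer under the fill policy."""

    def __init__(self, bufsize):
        self.bufsize = bufsize
        self.used = 0
        self.records = []
        self.filled = False

    def trace(self, record, size):
        if self.filled:
            # Nothing more is traced to a filled buffer, even if it would fit.
            return False
        if self.used + size > self.bufsize:
            # Insufficient space remains: mark the buffer filled.
            self.filled = True
            return False
        self.records.append(record)
        self.used += size
        return True


def consumer_should_stop(buffers):
    # The consumer stops tracing once any per-CPU buffer has filled.
    return any(buf.filled for buf in buffers)


cpus = [FillBuffer(100), FillBuffer(100)]
cpus[0].trace("a", 60)
cpus[0].trace("b", 60)   # does not fit: CPU 0's buffer is now filled
cpus[0].trace("c", 10)   # would fit, but is refused
print(cpus[0].records, consumer_should_stop(cpus))  # ['a'] True
```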
To use the fill policy, set the bufpolicy option to fill. For example,
the following command traces every system call entry into a per-CPU 2K buffer
with the buffer policy set to fill:
# dtrace -n syscall:::entry -b 2k -x bufpolicy=fill
fill Policy and END Probes
END probes normally do not fire until tracing has been explicitly stopped by
the DTrace consumer. END probes are guaranteed to fire on only one CPU, but
the CPU on which the probe fires is undefined. With fill buffers, tracing
is explicitly stopped when at least one of the per-CPU principal buffers has
been marked as filled. If the fill policy is selected, the END probe
may fire on a CPU that has a filled buffer. To accommodate END
tracing in fill buffers, DTrace calculates the amount of space potentially consumed by
END probes and subtracts this space from the size of the principal buffer. If
the net size is negative, DTrace will refuse to start, and dtrace(1M)
will output a corresponding error message:
dtrace: END enablings exceed size of principal buffer
The reservation mechanism ensures that a full buffer always has sufficient space for
any END probes.
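The reservation can be sketched by extending the fill model with a second limit. In the following hypothetical Python sketch, `end_bytes` stands in for DTrace's own calculation of the space END probes could consume (here the caller simply supplies it), and `FillBufferWithEnd` is an illustrative name. Ordinary probes may only fill the buffer up to `bufsize - end_bytes`; END records may also use the reserved tail.

```python
class FillBufferWithEnd:
    """Hypothetical fill-policy buffer that reserves room for END records."""

    def __init__(self, bufsize, end_bytes):
        if bufsize - end_bytes < 0:
            # Mirrors dtrace(1M) refusing to start when the net size is negative.
            raise ValueError("END enablings exceed size of principal buffer")
        self.bufsize = bufsize
        self.limit = bufsize - end_bytes   # space available to ordinary probes
        self.used = 0
        self.filled = False

    def trace(self, size):
        # Ordinary probes may only use the space below the reservation.
        if self.filled or self.used + size > self.limit:
            self.filled = True
            return False
        self.used += size
        return True

    def trace_end(self, size):
        # END records may also use the reserved tail, so they still fit
        # after the buffer has been marked filled.
        if self.used + size > self.bufsize:
            return False
        self.used += size
        return True


buf = FillBufferWithEnd(bufsize=100, end_bytes=20)
buf.trace(50)      # fits below the 80-byte limit
buf.trace(40)      # 50 + 40 > 80: the buffer is marked filled
buf.trace_end(20)  # succeeds, using the reserved space
```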
ring Policy
The DTrace ring buffer policy helps you trace the events leading up to
a failure. If reproducing the failure takes hours or days, you might wish
to keep only the most recent data. Once a principal buffer has filled,
tracing wraps around to the first entry, thereby overwriting older tracing data. You
establish the ring buffer by setting the bufpolicy option to the string ring:
# dtrace -s foo.d -x bufpolicy=ring
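The wraparound behavior is that of a classic circular buffer. The following Python sketch is a hypothetical model that assumes fixed-size records occupying one slot each (DTrace's actual records vary in size); `RingBuffer` and its methods are illustrative names. It also reads the surviving records back from oldest to youngest, the order in which dtrace(1M) displays each CPU's ring buffer.

```python
class RingBuffer:
    """Hypothetical per-CPU ring buffer holding at most `capacity` records."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.next = 0          # slot the next record will be written to
        self.wrapped = False   # True once older records start being overwritten

    def trace(self, record):
        # Once wrapped, this overwrites the oldest record still in the buffer.
        self.slots[self.next] = record
        self.next = (self.next + 1) % len(self.slots)
        if self.next == 0:
            self.wrapped = True

    def oldest_to_youngest(self):
        if not self.wrapped:
            return self.slots[:self.next]
        # The oldest surviving record sits at the next write position.
        return self.slots[self.next:] + self.slots[:self.next]


ring = RingBuffer(3)
for event in [1, 2, 3, 4, 5]:
    ring.trace(event)
print(ring.oldest_to_youngest())  # [3, 4, 5]
```

Only the most recent `capacity` records survive, which is exactly the property that makes the policy useful for examining the events leading up to a failure.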
When used to create a ring buffer, dtrace(1M) will not display any
output until the process is terminated. At that time, the ring buffer is
consumed and processed. dtrace processes each ring buffer in CPU order. Within a CPU's
buffer, trace records will be displayed in order from oldest to youngest. Just
as with the switch buffering policy, no ordering is imposed on records from
different CPUs. If such an ordering is required, you should trace the
timestamp variable as part of your tracing request.
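Once each record carries a timestamp, a post-processing step can recover a global order. The following Python sketch contrasts the per-CPU display order with a merge by timestamp; it assumes the traced timestamp values are comparable across CPUs and that each CPU's records are already oldest to youngest. The functions and the sample records are hypothetical.

```python
import heapq


def cpu_order(per_cpu):
    """Records in the order dtrace displays ring buffers: CPU by CPU,
    each CPU's records oldest to youngest."""
    return [(ts, cpu, data)
            for cpu in sorted(per_cpu)
            for ts, data in per_cpu[cpu]]


def timestamp_order(per_cpu):
    """Merge the per-CPU streams into one order by traced timestamp.
    Assumes each CPU's list is already sorted oldest to youngest."""
    streams = [[(ts, cpu, data) for ts, data in per_cpu[cpu]]
               for cpu in sorted(per_cpu)]
    return list(heapq.merge(*streams, key=lambda rec: rec[0]))


# Hypothetical records: (timestamp in ns, traced data) per CPU.
per_cpu = {0: [(10, "a"), (30, "c")], 1: [(20, "b")]}
print(cpu_order(per_cpu))        # [(10, 0, 'a'), (30, 0, 'c'), (20, 1, 'b')]
print(timestamp_order(per_cpu))  # [(10, 0, 'a'), (20, 1, 'b'), (30, 0, 'c')]
```

A k-way merge is used rather than a full sort because each CPU's stream is already ordered.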
The following example demonstrates the use of a #pragma option directive to enable ring
buffering:
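#pragma D option bufpolicy=ring
#pragma D option bufsize=16k

/*
 * Trace the timestamp of every system call made by processes whose
 * name is the first script argument ($$1 forces it to be a string).
 */
syscall:::entry
/execname == $$1/
{
    trace(timestamp);
}

/* Stop tracing, and process the ring buffers, when any process exits. */
syscall::rexit:entry
{
    exit(0);
}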