Aggregations
DTrace stores the results of aggregating functions in objects called aggregations. The aggregation results
are indexed using a tuple of expressions similar to those used for associative
arrays. In D, the syntax for an aggregation is:
@name[ keys ] = aggfunc ( args );
where name is the name of the aggregation, keys is a comma-separated
list of D expressions, aggfunc is one of the DTrace aggregating functions,
and args is a comma-separated list of arguments appropriate for the aggregating function. The
aggregation name is a D identifier that is prefixed with the special character
@. All aggregations named in your D programs are global variables; there are
no thread- or clause-local aggregations. The aggregation names are kept in a separate
identifier namespace from other D global variables. Remember that a and @a are
not the same variable if you reuse names. The special aggregation name @
can be used to name an anonymous aggregation in simple D programs. The
D compiler treats this name as an alias for the aggregation name @_.
The DTrace aggregating functions are shown in the following table. Most aggregating functions
take just a single argument that represents the new datum.
Table 9-1 DTrace Aggregating Functions
Function Name |
Arguments |
Result |
count |
none |
The number of
times called. |
sum |
scalar expression |
The total value of the specified expressions. |
avg |
scalar expression |
The arithmetic average
of the specified expressions. |
min |
scalar expression |
The smallest value among the specified expressions. |
max |
scalar expression |
The
largest value among the specified expressions. |
lquantize |
scalar expression, lower bound, upper bound, step
value |
A linear frequency distribution, sized by the specified range, of the values of
the specified expressions. Increments the value in the highest bucket that is
less than the specified expression. |
quantize |
scalar expression |
A power-of-two frequency distribution of the values
of the specified expressions. Increments the value in the highest power-of-two bucket that is
less than the specified expression. |
For example, to count the number of write(2) system calls in the
system, you could use an informative string as a key and the count()
aggregating function:
syscall::write:entry
{
@counts["write system calls"] = count();
}
The dtrace command prints aggregation results by default when the process terminates, either
as the result of an explicit END action or when the user presses
Control-C. The following example output shows the result of running this command, waiting
for a few seconds, and pressing Control-C:
# dtrace -s writes.d
dtrace: script './writes.d' matched 1 probe
^C
write system calls 179
#
You can count system calls per process name using the execname variable as the
key to an aggregation:
syscall::write:entry
{
@counts[execname] = count();
}
The following example output shows the result of running this command, waiting for
a few seconds, and pressing Control-C:
# dtrace -s writesbycmd.d
dtrace: script './writesbycmd.d' matched 1 probe
^C
dtrace 1
cat 4
sed 9
head 9
grep 14
find 15
tail 25
mountd 28
expr 72
sh 291
tee 814
def.dir.flp 1996
make.bin 2010
#
Alternatively, you might want to further examine writes organized by both executable name
and file descriptor. The file descriptor is the first argument to write(2), so the
following example uses a key consisting of both execname and arg0:
syscall::write:entry
{
@counts[execname, arg0] = count();
}
Running this command results in a table with both executable name and file
descriptor, as shown in the following example:
# dtrace -s writesbycmdfd.d
dtrace: script './writesbycmdfd.d' matched 1 probe
^C
cat 1 58
sed 1 60
grep 1 89
tee 1 156
tee 3 156
make.bin 5 164
acomp 1 263
macrogen 4 286
cg 1 397
acomp 3 736
make.bin 1 880
iropt 4 1731
#
The following example displays the average time spent in the write system call,
organized by process name. This example uses the avg() aggregating function, specifying the
expression to average as the argument. The example averages the wall clock time
spent in the system call:
syscall::write:entry
{
self->ts = timestamp;
}
syscall::write:return
/self->ts/
{
@time[execname] = avg(timestamp - self->ts);
self->ts = 0;
}
The following example output shows the result of running this command, waiting for
a few seconds, and pressing Control-C:
# dtrace -s writetime.d
dtrace: script './writetime.d' matched 2 probes
^C
iropt 31315
acomp 37037
make.bin 63736
tee 68702
date 84020
sh 91632
dtrace 159200
ctfmerge 321560
install 343300
mcs 394400
get 413695
ctfconvert 594400
bringover 1332465
tail 1335260
#
The average can be useful, but often does not provide sufficient detail to
understand the distribution of data points. To understand the distribution in further detail,
use the quantize() aggregating function as shown in the following example:
syscall::write:entry
{
self->ts = timestamp;
}
syscall::write:return
/self->ts/
{
@time[execname] = quantize(timestamp - self->ts);
self->ts = 0;
}
Because each line of output becomes a frequency distribution diagram, the output of
this script is substantially longer than previous ones. The following example shows a
selection of sample output:
lint
value ------------- Distribution ------------- count
8192 | 0
16384 | 2
32768 | 0
65536 |@@@@@@@@@@@@@@@@@@@ 74
131072 |@@@@@@@@@@@@@@@ 59
262144 |@@@ 14
524288 | 0
acomp
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@ 840
16384 |@@@@@@@@@@@ 750
32768 |@@ 165
65536 |@@@@@@ 460
131072 |@@@@@@ 446
262144 | 16
524288 | 0
1048576 | 1
2097152 | 0
iropt
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@ 4149
16384 |@@@@@@@@@@ 1798
32768 |@ 332
65536 |@ 325
131072 |@@ 431
262144 | 3
524288 | 2
1048576 | 1
2097152 | 0
Notice that the rows for the frequency distribution are always power-of-two values. Each
rows indicates the count of the number of elements greater than or equal to the corresponding
value, but less than the next larger row value. For example, the above output
shows that iropt had 4,149 writes taking between 8,192 nanoseconds and 16,383 nanoseconds,
inclusive.
While quantize() is useful for getting quick insight into the data, you might
want to examine a distribution across linear values instead. To display a linear
value distribution, use the lquantize() aggregating function. The lquantize() function takes three arguments in
addition to a D expression: a lower bound, an upper bound, and
a step. For example, if you wanted to look at the distribution of
writes by file descriptor, a power-of-two quantization would not be effective. Instead, use
a linear quantization with a small range, as shown in the following example:
syscall::write:entry
{
@fds[execname] = lquantize(arg0, 0, 100, 1);
}
Running this script for several seconds yields a large amount of information. The
following example shows a selection of typical output:
mountd
value ------------- Distribution ------------- count
11 | 0
12 |@ 4
13 | 0
14 |@@@@@@@@@@@@@@@@@@@@@@@@@ 70
15 | 0
16 |@@@@@@@@@@@@ 34
17 | 0
xemacs-20.4
value ------------- Distribution ------------- count
6 | 0
7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 521
8 | 0
9 | 1
10 | 0
make.bin
value ------------- Distribution ------------- count
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3596
2 | 0
3 | 0
4 | 42
5 | 50
6 | 0
acomp
value ------------- Distribution ------------- count
0 | 0
1 |@@@@@ 1156
2 | 0
3 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6635
4 |@ 297
5 | 0
iropt
value ------------- Distribution ------------- count
2 | 0
3 | 299
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 20144
5 | 0
You can also use the lquantize() aggregating function to aggregate on time since
some point in the past. This technique allows you to observe a change
in behavior over time. The following example displays the change in system call
behavior over the lifetime of a process executing the date(1) command:
syscall::exec:return,
syscall::exece:return
/execname == "date"/
{
self->start = vtimestamp;
}
syscall:::entry
/self->start/
{
/*
* We linearly quantize on the current virtual time minus our
* process's start time. We divide by 1000 to yield microseconds
* rather than nanoseconds. The range runs from 0 to 10 milliseconds
* in steps of 100 microseconds; we expect that no date(1) process
* will take longer than 10 milliseconds to complete.
*/
@a["system calls over time"] =
lquantize((vtimestamp - self->start) / 1000, 0, 10000, 100);
}
syscall::rexit:entry
/self->start/
{
self->start = 0;
}
The preceding script provides greater insight into system call behavior when many date(1)
processes are executed. To see this result, run sh -c 'while true; do date >/dev/null; done' in one window, while
executing the D script in another. The script produces a profile of the
system call behavior of the date(1) command:
# dtrace -s dateprof.d
dtrace: script './dateprof.d' matched 218 probes
^C
system calls over time
value ------------- Distribution ------------- count
< 0 | 0
0 |@@ 20530
100 |@@@@@@ 48814
200 |@@@ 28119
300 |@ 14646
400 |@@@@@ 41237
500 | 1259
600 | 218
700 | 116
800 |@ 12783
900 |@@@ 28133
1000 | 7897
1100 |@ 14065
1200 |@@@ 27549
1300 |@@@ 25715
1400 |@@@@ 35011
1500 |@@ 16734
1600 | 498
1700 | 256
1800 | 369
1900 | 404
2000 | 320
2100 | 555
2200 | 54
2300 | 17
2400 | 5
2500 | 1
2600 | 7
2700 | 0
This output provides a rough idea of the different phases of the
date(1) command with respect to the services required of the kernel. To better
understand these phases, you might want to understand which system calls are being
called when. If so, you could change the D script to aggregate on
the variable probefunc instead of a constant string.