Data Normalization
When aggregating data over some period of time, you might want to normalize
the data with respect to some constant factor. This technique enables you to
compare disjoint data more easily. For example, when aggregating system calls, you might
want to output system calls as a per-second rate instead of as an
absolute value over the course of the run. The DTrace normalize() action enables
you to normalize data in this way. The parameters to normalize() are an aggregation
and a normalization factor. The output of the aggregation shows each value divided
by the normalization factor.
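As a minimal sketch of a constant normalization factor, the script below sums the bytes requested by write(2) calls and reports them in kilobytes. The probe, the aggregation name @bytes, and the printa() format string are illustrative assumptions rather than part of the examples in this section.

```d
#pragma D option quiet

/* Sum the requested byte count (arg2) of each write(2) call, per executable. */
syscall::write:entry
{
	@bytes[execname] = sum(arg2);
}

END
{
	/* Divide each sum by 1024 so that the output reads in kilobytes. */
	normalize(@bytes, 1024);
	printa("%-20s %@d KB\n", @bytes);
}
```

The division is integer division, so any sum below 1024 bytes is shown as 0 KB.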
The following example counts system calls by executable name and normalizes
the counts to a per-second rate:
#pragma D option quiet

BEGIN
{
	/*
	 * Get the start time, in nanoseconds.
	 */
	start = timestamp;
}

syscall:::entry
{
	@func[execname] = count();
}

END
{
	/*
	 * Normalize the aggregation based on the number of seconds we have
	 * been running. (There are 1,000,000,000 nanoseconds in one second.)
	 */
	normalize(@func, (timestamp - start) / 1000000000);
}
Running the above script for a brief period of time results in
the following output on a desktop machine:
# dtrace -s ./normalize.d
^C
syslogd 0
rpc.rusersd 0
utmpd 0
xbiff 0
in.routed 1
sendmail 2
echo 2
FvwmAuto 2
stty 2
cut 2
init 2
pt_chmod 3
picld 3
utmp_update 3
httpd 4
xclock 5
basename 6
tput 6
sh 7
tr 7
arch 9
expr 10
uname 11
mibiisa 15
dirname 18
dtrace 40
ksh 48
java 58
xterm 100
nscd 120
fvwm2 154
prstat 180
perfbar 188
Xsun 1309
.netscape.bin 3005
normalize() sets the normalization factor for the specified aggregation, but this action does
not modify the underlying data. denormalize() takes only an aggregation and removes its
normalization factor, so that subsequent output shows the raw values again. Adding the
denormalize() action to the preceding example prints both per-second rates and raw
system call counts:
#pragma D option quiet

BEGIN
{
	start = timestamp;
}

syscall:::entry
{
	@func[execname] = count();
}

END
{
	this->seconds = (timestamp - start) / 1000000000;
	printf("Ran for %d seconds.\n", this->seconds);
	printf("Per-second rate:\n");
	normalize(@func, this->seconds);
	printa(@func);
	printf("\nRaw counts:\n");
	denormalize(@func);
	printa(@func);
}
Running the above script for a brief period of time produces output similar
to the following example:
# dtrace -s ./denorm.d
^C
Ran for 14 seconds.
Per-second rate:
syslogd 0
in.routed 0
xbiff 1
sendmail 2
elm 2
picld 3
httpd 4
xclock 6
FvwmAuto 7
mibiisa 22
dtrace 42
java 55
xterm 75
adeptedit 118
nscd 127
prstat 179
perfbar 184
fvwm2 296
Xsun 829
Raw counts:
syslogd 1
in.routed 4
xbiff 21
sendmail 30
elm 36
picld 43
httpd 56
xclock 91
FvwmAuto 104
mibiisa 314
dtrace 592
java 774
xterm 1062
adeptedit 1665
nscd 1781
prstat 2506
perfbar 2581
fvwm2 4156
Xsun 11616
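The scripts above compute the elapsed time with integer division, so a run that is interrupted within the first second produces a factor of 0, which is not a usable divisor. One possible guard, sketched here as an assumption and not as part of the original examples, substitutes a factor of 1 in that case:

```d
#pragma D option quiet

BEGIN
{
	start = timestamp;
}

syscall:::entry
{
	@func[execname] = count();
}

END
{
	/* Integer division: a run shorter than one second yields 0 here. */
	this->seconds = (timestamp - start) / 1000000000;

	/* Fall back to a factor of 1 so that the divisor is never zero. */
	normalize(@func, this->seconds > 0 ? this->seconds : 1);
}
```

With the fallback, a sub-second run reports raw counts instead of rates, which may or may not be the behavior you want for very short runs.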
Aggregations can also be renormalized. If normalize() is called more than once for
the same aggregation, the normalization factor will be the factor specified in the
most recent call. The following example prints, every ten seconds, the per-second
rate averaged over the time since tracing began:
Example 9-1 renormalize.d: Renormalizing an Aggregation
#pragma D option quiet

BEGIN
{
	start = timestamp;
}

syscall:::entry
{
	@func[execname] = count();
}

tick-10sec
{
	normalize(@func, (timestamp - start) / 1000000000);
	printa(@func);
}
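Because the counts in renormalize.d keep accumulating, each report is an average over the whole run so far. If a rate for each individual interval is wanted instead, one possible variation (a sketch, assuming the tick-10sec probe fires close enough to every ten seconds for a constant factor of 10 to be acceptable) is to reset the aggregation with the clear() action after each report:

```d
#pragma D option quiet

syscall:::entry
{
	@func[execname] = count();
}

tick-10sec
{
	/* Each interval is nominally 10 seconds long. */
	normalize(@func, 10);
	printa(@func);

	/* Zero the counts but keep the keys, so the next report starts fresh. */
	clear(@func);
}
```

This version needs no start timestamp, at the cost of assuming that every interval has the same length.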