Examples
The following example script displays pertinent information for every I/O as it's issued:
#pragma D option quiet
BEGIN
{
printf("%10s %58s %2s\n", "DEVICE", "FILE", "RW");
}
io:::start
{
printf("%10s %58s %2s\n", args[1]->dev_statname,
args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W");
}
The output of the example when cold-starting Acrobat Reader on an x86 laptop
system resembles the following example:
# dtrace -s ./iosnoop.d
DEVICE FILE RW
cmdk0 /opt/Acrobat4/bin/acroread R
cmdk0 /opt/Acrobat4/bin/acroread R
cmdk0 <unknown> R
cmdk0 /opt/Acrobat4/Reader/AcroVersion R
cmdk0 <unknown> R
cmdk0 <unknown> R
cmdk0 <none> R
cmdk0 <unknown> R
cmdk0 <none> R
cmdk0 /usr/lib/locale/iso_8859_1/iso_8859_1.so.3 R
cmdk0 /usr/lib/locale/iso_8859_1/iso_8859_1.so.3 R
cmdk0 /usr/lib/locale/iso_8859_1/iso_8859_1.so.3 R
cmdk0 <none> R
cmdk0 <unknown> R
cmdk0 <unknown> R
cmdk0 <unknown> R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 <none> R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 <unknown> R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R
cmdk0 <none> R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R
cmdk0 <unknown> R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libAGM.so.3.0 R
cmdk0 <none> R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libAGM.so.3.0 R
cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libAGM.so.3.0 R
...
The <none> entries in the output indicate that the I/O doesn't correspond to
the data in any particular file: these I/Os are due to metadata of
one form or another. The <unknown> entries in the output indicate that the
pathname for the file is not known. This situation is relatively rare.
You could make the example script slightly more sophisticated by using an associative
array to track the time spent on each I/O, as shown in
the following example:
#pragma D option quiet
BEGIN
{
printf("%10s %58s %2s %7s\n", "DEVICE", "FILE", "RW", "MS");
}
io:::start
{
start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}
io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
this->elapsed = timestamp - start[args[0]->b_edev, args[0]->b_blkno];
printf("%10s %58s %2s %3d.%03d\n", args[1]->dev_statname,
args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W",
this->elapsed / 10000000, (this->elapsed / 1000) % 1000);
start[args[0]->b_edev, args[0]->b_blkno] = 0;
}
The output of the above example while hot-plugging a USB storage device into
an otherwise idle x86 laptop system is shown in the following example:
# dtrace -s ./iotime.d
DEVICE FILE RW MS
cmdk0 /kernel/drv/scsa2usb R 24.781
cmdk0 /kernel/drv/scsa2usb R 25.208
cmdk0 /var/adm/messages W 25.981
cmdk0 /kernel/drv/scsa2usb R 5.448
cmdk0 <none> W 4.172
cmdk0 /kernel/drv/scsa2usb R 2.620
cmdk0 /var/adm/messages W 0.252
cmdk0 <unknown> R 3.213
cmdk0 <none> W 3.011
cmdk0 <unknown> R 2.197
cmdk0 /var/adm/messages W 2.680
cmdk0 <none> W 0.436
cmdk0 /var/adm/messages W 0.542
cmdk0 <none> W 0.339
cmdk0 /var/adm/messages W 0.414
cmdk0 <none> W 0.344
cmdk0 /var/adm/messages W 0.361
cmdk0 <none> W 0.315
cmdk0 /var/adm/messages W 0.421
cmdk0 <none> W 0.349
cmdk0 <none> R 1.524
cmdk0 <unknown> R 3.648
cmdk0 /usr/lib/librcm.so.1 R 2.553
cmdk0 /usr/lib/librcm.so.1 R 1.332
cmdk0 /usr/lib/librcm.so.1 R 0.222
cmdk0 /usr/lib/librcm.so.1 R 0.228
cmdk0 /usr/lib/librcm.so.1 R 0.927
cmdk0 <none> R 1.189
...
cmdk0 /usr/lib/devfsadm/linkmod R 1.110
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_audio_link.so R 1.763
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_audio_link.so R 0.161
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_cfg_link.so R 0.819
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_cfg_link.so R 0.168
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_disk_link.so R 0.886
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_disk_link.so R 0.185
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_fssnap_link.so R 0.778
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_fssnap_link.so R 0.166
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_lofi_link.so R 1.634
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_lofi_link.so R 0.163
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_md_link.so R 0.477
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_md_link.so R 0.161
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_misc_link.so R 0.198
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_misc_link.so R 0.168
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_misc_link.so R 0.247
cmdk0 /usr/lib/devfsadm/linkmod/SUNW_misc_link_i386.so R 1.735
...
You can make several observations about the mechanics of the system based on
this output. First, note the long time to perform the first several I/Os,
which took about 25 milliseconds each. This time might have been due to
the cmdk0 device having been power managed on the laptop. Second, observe the
I/O due to the scsa2usb(7D) driver loading to deal with USB Mass Storage
device. Third, note the writes to /var/adm/messages as the device is reported. Finally,
observe the reading of the device link generators (the files ending in link.so)
, which presumably deal with the new device.
The io provider enables in-depth understanding of iostat(1M) output. Assume you observe
iostat output similar to the following example:
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
cmdk0 8.0 0.0 399.8 0.0 0.0 0.0 0.8 0 1
sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd2 0.0 109.0 0.0 435.9 0.0 1.0 8.9 0 97
nfs1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
You can use the iotime.d script to see these I/Os as they happen,
as shown in the following example:
DEVICE FILE RW MS
sd2 /mnt/archives.tar W 0.856
sd2 /mnt/archives.tar W 0.729
sd2 /mnt/archives.tar W 0.890
sd2 /mnt/archives.tar W 0.759
sd2 /mnt/archives.tar W 0.884
sd2 /mnt/archives.tar W 0.746
sd2 /mnt/archives.tar W 0.891
sd2 /mnt/archives.tar W 0.760
sd2 /mnt/archives.tar W 0.889
cmdk0 /export/archives/archives.tar R 0.827
sd2 /mnt/archives.tar W 0.537
sd2 /mnt/archives.tar W 0.887
sd2 /mnt/archives.tar W 0.763
sd2 /mnt/archives.tar W 0.878
sd2 /mnt/archives.tar W 0.751
sd2 /mnt/archives.tar W 0.884
sd2 /mnt/archives.tar W 0.760
sd2 /mnt/archives.tar W 3.994
sd2 /mnt/archives.tar W 0.653
sd2 /mnt/archives.tar W 0.896
sd2 /mnt/archives.tar W 0.975
sd2 /mnt/archives.tar W 1.405
sd2 /mnt/archives.tar W 0.724
sd2 /mnt/archives.tar W 1.841
cmdk0 /export/archives/archives.tar R 0.549
sd2 /mnt/archives.tar W 0.543
sd2 /mnt/archives.tar W 0.863
sd2 /mnt/archives.tar W 0.734
sd2 /mnt/archives.tar W 0.859
sd2 /mnt/archives.tar W 0.754
sd2 /mnt/archives.tar W 0.914
sd2 /mnt/archives.tar W 0.751
sd2 /mnt/archives.tar W 0.902
sd2 /mnt/archives.tar W 0.735
sd2 /mnt/archives.tar W 0.908
sd2 /mnt/archives.tar W 0.753
This output appears to show that the file archives.tar is being read from
cmdk0 (in /export/archives), and being written to device sd2 (in /mnt). This existence of
two files named archives.tar that are being operated on separately in parallel seems
unlikely. To investigate further, you can aggregate on device, application, process ID and
bytes transferred, as shown in the following example:
#pragma D option quiet
io:::start
{
@[args[1]->dev_statname, execname, pid] = sum(args[0]->b_bcount);
}
END
{
printf("%10s %20s %10s %15s\n", "DEVICE", "APP", "PID", "BYTES");
printa("%10s %20s %10d %15@d\n", @);
}
Running this script for a few seconds results in output similar to
the following example:
# dtrace -s ./whoio.d
^C
DEVICE APP PID BYTES
cmdk0 cp 790 1515520
sd2 cp 790 1527808
This output shows that this activity is a copy of the file archives.tar
from one device to another. This conclusion leads to another natural question: is
one of these devices faster than the other? Which device acts as the
limiter on the copy? To answer these questions, you need to know the
effective throughput of each device rather than the number of bytes per second
each device is transferring. You can determine the throughput with the following example
script:
#pragma D option quiet
io:::start
{
start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}
io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
/*
* We want to get an idea of our throughput to this device in KB/sec.
* What we have, however, is nanoseconds and bytes. That is we want
* to calculate:
*
* bytes / 1024
* ------------------------
* nanoseconds / 1000000000
*
* But we can't calculate this using integer arithmetic without losing
* precision (the denomenator, for one, is between 0 and 1 for nearly
* all I/Os). So we restate the fraction, and cancel:
*
* bytes 1000000000 bytes 976562
* --------- * ------------- = --------- * -------------
* 1024 nanoseconds 1 nanoseconds
*
* This is easy to calculate using integer arithmetic; this is what
* we do below.
*/
this->elapsed = timestamp - start[args[0]->b_edev, args[0]->b_blkno];
@[args[1]->dev_statname, args[1]->dev_pathname] =
quantize((args[0]->b_bcount * 976562) / this->elapsed);
start[args[0]->b_edev, args[0]->b_blkno] = 0;
}
END
{
printa(" %s (%s)\n%@d\n", @);
}
Running the example script for several seconds yields the following output:
sd2 (/devices/pci@0,0/pci1179,1@1d/storage@2/disk@0,0:r)
value ------------- Distribution ------------- count
32 | 0
64 | 3
128 | 1
256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2257
512 | 1
1024 | 0
cmdk0 (/devices/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0:a)
value ------------- Distribution ------------- count
128 | 0
256 | 1
512 | 0
1024 | 2
2048 | 0
4096 | 2
8192 |@@@@@@@@@@@@@@@@@@ 172
16384 |@@@@@ 52
32768 |@@@@@@@@@@@ 108
65536 |@@@ 34
131072 | 0
The output shows that sd2 is clearly the limiting device. The sd2 throughput
is between 256K/sec and 512K/sec, while cmdk0 is delivering I/O at anywhere from
8 MB/second to over 64 MB/second. The script prints out both the name
as seen in iostat, and the full path of the device. To
find out more about the device, you could specify the device path to
prtconf, as shown in the following example:
# prtconf -v /devices/pci@0,0/pci1179,1@1d/storage@2/disk@0,0
disk, instance #2 (driver name: sd)
Driver properties:
name='lba-access-ok' type=boolean dev=(29,128)
name='removable-media' type=boolean dev=none
name='pm-components' type=string items=3 dev=none
value='NAME=spindle-motor' + '0=off' + '1=on'
name='pm-hardware-state' type=string items=1 dev=none
value='needs-suspend-resume'
name='ddi-failfast-supported' type=boolean dev=none
name='ddi-kernel-ioctl' type=boolean dev=none
Hardware properties:
name='inquiry-revision-id' type=string items=1
value='1.04'
name='inquiry-product-id' type=string items=1
value='STORAGE DEVICE'
name='inquiry-vendor-id' type=string items=1
value='Generic'
name='inquiry-device-type' type=int items=1
value=00000000
name='usb' type=boolean
name='compatible' type=string items=1
value='sd'
name='lun' type=int items=1
value=00000000
name='target' type=int items=1
value=00000000
As the emphasized terms indicate, this device is a removable USB storage device.
The examples in this section have explored all I/O requests. However, you might
only be interested in one type of request. The following example tracks the
directories in which writes are occurring, along with the applications performing the writes:
#pragma D option quiet
io:::start
/args[0]->b_flags & B_WRITE/
{
@[execname, args[2]->fi_dirname] = count();
}
END
{
printf("%20s %51s %5s\n", "WHO", "WHERE", "COUNT");
printa("%20s %51s %5@d\n", @);
}
Running this example script on a desktop workload for a period of
time yields some interesting results, as shown in the following example output:
# dtrace -s ./whowrite.d
^C
WHO WHERE COUNT
su /var/adm 1
fsflush /etc 1
fsflush / 1
fsflush /var/log 1
fsflush /export/bmc/lisa 1
esd /export/bmc/.phoenix/default/78cxczuy.slt/Cache 1
fsflush /export/bmc/.phoenix 1
esd /export/bmc/.phoenix/default/78cxczuy.slt 1
vi /var/tmp 2
vi /etc 2
cat <none> 2
bash / 2
vi <none> 3
xterm /var/adm 3
fsflush /export/bmc 7
MozillaFirebird <none> 8
vim /export/bmc 9
MozillaFirebird /export/bmc 10
fsflush /var/adm 11
devfsadm /dev 14
ksh <none> 71
ksh /export/bmc 71
fsflush /export/bmc/.phoenix/default/78cxczuy.slt 119
MozillaFirebird /export/bmc/.phoenix/default/78cxczuy.slt 119
fsflush <none> 211
MozillaFirebird /export/bmc/.phoenix/default/78cxczuy.slt/Cache 591
fsflush /export/bmc/.phoenix/default/78cxczuy.slt/Cache 666
sched <none> 2385
As the output indicates, virtually all writes are associated with the Mozilla Firebird
cache. The writes labeled <none> are likely due to writes associated with the
UFS log, writes that are themselves induced by other writes in the filesystem.
See ufs(7FS) for details on logging. This example shows how to use the
io provider to discover a problem at a much higher layer of software.
In this case, the script has revealed a configuration problem: the web browser
would induce much less I/O (and quite likely none at all) if its
cache were in a directory in a tmpfs(7FS) filesystem.
The previous examples have used only the start and done probes. You
can use the wait-start and wait-done probes to understand why applications block
for I/O – and for how long. The following example script uses both
io probes and sched probes (see Chapter 26, sched Provider) to derive CPU time compared to
I/O wait time for the StarOffice software:
#pragma D option quiet
sched:::on-cpu
/execname == "soffice.bin"/
{
self->on = vtimestamp;
}
sched:::off-cpu
/self->on/
{
@time["<on cpu>"] = sum(vtimestamp - self->on);
self->on = 0;
}
io:::wait-start
/execname == "soffice.bin"/
{
self->wait = timestamp;
}
io:::wait-done
/self->wait/
{
@io[args[2]->fi_name] = sum(timestamp - self->wait);
@time["<I/O wait>"] = sum(timestamp - self->wait);
self->wait = 0;
}
END
{
printf("Time breakdown (milliseconds):\n");
normalize(@time, 1000000);
printa(" %-50s %15@d\n", @time);
printf("\nI/O wait breakdown (milliseconds):\n");
normalize(@io, 1000000);
printa(" %-50s %15@d\n", @io);
}
Running the example script during a cold start of the StarOffice software yields
the following output:
Time breakdown (milliseconds):
<on cpu> 3634
<I/O wait> 13114
I/O wait breakdown (milliseconds):
soffice.tmp 0
Office 0
unorc 0
sbasic.cfg 0
en 0
smath.cfg 0
toolboxlayout.xml 0
sdraw.cfg 0
swriter.cfg 0
Linguistic.dat 0
scalc.cfg 0
Views.dat 0
Store.dat 0
META-INF 0
Common.xml.tmp 0
afm 0
libsimreg.so 1
xiiimp.so.2 3
outline 4
Inet.dat 6
fontmetric 6
...
libucb1.so 44
libj641si_g.so 46
libX11.so.4 46
liblng641si.so 48
swriter.db 53
libwrp641si.so 53
liblocaledata_ascii.so 56
libi18npool641si.so 65
libdbtools2.so 69
ofa64101.res 74
libxcr641si.so 82
libucpchelp1.so 83
libsot641si.so 86
libcppuhelper3C52.so 98
libfwl641si.so 100
libsb641si.so 104
libcomphelp2.so 105
libxo641si.so 106
libucpfile1.so 110
libcppu.so.3 111
sw64101.res 114
libdb-3.2.so 119
libtk641si.so 126
libdtransX11641si.so 127
libgo641si.so 132
libfwe641si.so 150
libi18n641si.so 152
libfwi641si.so 154
libso641si.so 173
libpsp641si.so 186
libtl641si.so 189
<unknown> 189
libucbhelper1C52.so 195
libutl641si.so 213
libofa641si.so 216
libfwk641si.so 229
libsvl641si.so 261
libcfgmgr2.so 368
libsvt641si.so 373
libvcl641si.so 741
libsvx641si.so 885
libsfx641si.so 993
<none> 1096
libsw641si.so 1365
applicat.rdb 1580
As this output shows, much of the cold StarOffice start time is
due to waiting for I/O. (13.1 seconds waiting for I/O as opposed to
3.6 seconds on CPU.) Running the script on a warm start of the
StarOffice software reveals that page caching has eliminated the I/O time , as
shown in the following example output:
Time breakdown (milliseconds):
<I/O wait> 0
<on cpu> 2860
I/O wait breakdown (milliseconds):
temp 0
soffice.tmp 0
<unknown> 0
Office 0
The cold start output shows that the file applicat.rdb accounts for more I/O
wait time than any other file. This result is presumably due to many
I/Os to the file. To explore the I/Os performed to this file,
you can use the following D script:
io:::start
/execname == "soffice.bin" && args[2]->fi_name == "applicat.rdb"/
{
@ = lquantize(args[2]->fi_offset != -1 ?
args[2]->fi_offset / (1000 * 1024) : -1, 0, 1000);
}
This script uses the fi_offset field of the fileinfo_t structure to understand
which parts of the file are being accessed, at the granularity of a
megabyte. Running this script during a cold start of the StarOffice software results
in output similar to the following example:
# dtrace -s ./applicat.d
dtrace: script './applicat.d' matched 4 probes
^C
value ------------- Distribution ------------ count
< 0 | 0
0 |@@@ 28
1 |@@ 17
2 |@@@@ 35
3 |@@@@@@@@@ 72
4 |@@@@@@@@@@ 78
5 |@@@@@@@@ 65
6 | 0
This output indicates that only the first six megabytes of the file are
accessed, perhaps because the file is six megabytes in size. The output also
indicates that the entire file is not accessed. If you wanted to
improve the cold start time of StarOffice, you might want to understand the
access pattern of the file. If the needed sections of the file could
be largely contiguous, one way to improve StarOffice cold start time might be
to have a scout thread run ahead of the application, inducing the I/O
to the file before it's needed. (This approach is particularly straightforward if the
file is accessed using mmap(2).) However, the approximately 1.6 seconds that this strategy
would gain in cold start time does not merit the additional complexity and
maintenance burden in the application. Either way, the data gathered with the io
provider allows a precise understanding of the benefit that such work could ultimately
deliver.