openSolaris 2008 - Example - Solaris Dynamic Tracing Guide

Example

Examine the following output from vmstat(1M):

kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd s0 — —   in   sy   cs us sy id
 0 1 0 1341844 836720 26 311 1644 0 0 0  0 216 0  0  0  797  817  697  9 10 81
 0 1 0 1341344 835300 238 934 1576 0 0 0 0 194 0  0  0  750 2795  791  7 14 79
 0 1 0 1340764 833668 24 165 1149 0 0 0  0 133 0  0  0  637  813  547  5  4 91
 0 1 0 1340420 833024 24 394 1002 0 0 0  0 130 0  0  0  621 2284  653 14  7 79
 0 1 0 1340068 831520 14 202 380 0 0  0  0 59  0  0  0  482 5688 1434 25  7 68

The pi column in the above output denotes the number of pages paged in. The vminfo provider enables you to learn more about the source of these page-ins, as shown in the following example:

dtrace -n pgin'{@[execname] = count()}'
dtrace: description 'pgin' matched 1 probe
^C
  xterm                                                             1
  ksh                                                               1
  ls                                                                2
  lpstat                                                            7
  sh                                                               17
  soffice                                                          39
  javaldx                                                         103
  soffice.bin                                                    3065

The output shows that a process associated with the StarOffice^TM software, soffice.bin, is responsible for most of the page-ins. To get a better picture of soffice.bin in terms of virtual memory behavior, you could enable all vminfo probes. The following example runs dtrace(1M) while launching the StarOffice software:

dtrace -P vminfo'/execname == "soffice.bin"/{@[probename] = count()}'
dtrace: description 'vminfo' matched 42 probes
^C

  kernel_asflt                                                      1
  fspgin                                                           10
  pgout                                                            16
  execfree                                                         16
  execpgout                                                        16
  fsfree                                                           16
  fspgout                                                          16
  anonfree                                                         16
  anonpgout                                                        16
  pgpgout                                                          16
  dfree                                                            16
  execpgin                                                         80
  prot_fault                                                       85
  maj_fault                                                        88
  pgin                                                             90
  pgpgin                                                           90
  cow_fault                                                       859
  zfod                                                           1619
  pgfrec                                                         8811
  pgrec                                                          8827
  as_fault                                                       9495

The following example script provides more information about the virtual memory behavior of the StarOffice software during its startup:

vminfo:::maj_fault,
vminfo:::zfod,
vminfo:::as_fault
/execname == "soffice.bin" && start == 0/
{
    /*
     * This is the first time that a vminfo probe has been hit; record
     * our initial timestamp.
     */
    start = timestamp;
}

vminfo:::maj_fault,
vminfo:::zfod,
vminfo:::as_fault
/execname == "soffice.bin"/
{
    /*
     * Aggregate on the probename, and lquantize() the number of seconds
     * since our initial timestamp.  (There are 1,000,000,000 nanoseconds
     * in a second.)  We assume that the script will be terminated before
     * 60 seconds elapses.
     */
    @[probename] =
        lquantize((timestamp - start) / 1000000000, 0, 60);
}

Run the script while again starting the StarOffice software. Then, create a new drawing, create a new presentation, and then close all files and quit the application. Press Control-C in the shell running the D script. The results provide a view of some virtual memory behavior over time:

# dtrace -s ./soffice.d
dtrace: script './soffice.d' matched 10 probes
^C

 maj_fault                                         
           value  ------------- Distribution ------------- count    
               7 |                                         0        
               8 |@@@@@@@@@                                88       
               9 |@@@@@@@@@@@@@@@@@@@@                     194      
              10 |@                                        18       
              11 |                                         0        
              12 |                                         0        
              13 |                                         2        
              14 |                                         0        
              15 |                                         1        
              16 |@@@@@@@@                                 82       
              17 |                                         0        
              18 |                                         0        
              19 |                                         2        
              20 |                                         0        

  zfod                                              
           value  ------------- Distribution ------------- count    
             < 0 |                                         0        
               0 |@@@@@@@                                  525      
               1 |@@@@@@@@                                 605      
               2 |@@                                       208      
               3 |@@@                                      280      
               4 |                                         4        
               5 |                                         0        
               6 |                                         0        
               7 |                                         0        
               8 |                                         44       
               9 |@@                                       161      
              10 |                                         2        
              11 |                                         0        
              12 |                                         0        
              13 |                                         4        
              14 |                                         0        
              15 |                                         29       
              16 |@@@@@@@@@@@@@@                           1048     
              17 |                                         24       
              18 |                                         0        
              19 |                                         0        
              20 |                                         1        
              21 |                                         0        
              22 |                                         3        
              23 |                                         0        

  as_fault                                          
           value  ------------- Distribution ------------- count    
             < 0 |                                         0        
               0 |@@@@@@@@@@@@@                            4139     
               1 |@@@@@@@                                  2249     
               2 |@@@@@@@                                  2402     
               3 |@                                        594      
               4 |                                         56       
               5 |                                         0        
               6 |                                         0        
               7 |                                         0        
               8 |                                         189      
               9 |@@                                       929      
              10 |                                         39       
              11 |                                         0        
              12 |                                         0        
              13 |                                         6        
              14 |                                         0        
              15 |                                         297      
              16 |@@@@                                     1349     
              17 |                                         24       
              18 |                                         0        
              19 |                                         21       
              20 |                                         1        
              21 |                                         0        
              22 |                                         92       
              23 |                                         0

The output shows some StarOffice behavior with respect to the virtual memory system. For example, the maj_fault probe didn't fire until a new instance of the application was started. As you would hope, a “warm start” of StarOffice did not result in new major faults. The as_fault output shows an initial burst of activity, latency while the user located the menu to create a new drawing, another period of idleness, and a final burst of activity when the user clicked on a new presentation. The zfod output shows that creating the new presentation induced significant pressure for zero-filled pages, but only for a short period of time.

The next iteration of DTrace investigation in this example would depend on the direction you want to explore. If you want to understand the source of the demand for zero-filled pages, you could aggregate on ustack() in a zfod enabling. You might want to establish a threshold for zero-filled pages and use the stop() destructive action to stop the offending process when the threshold is exceeded. This approach would enable you to use more traditional debugging tools like truss(1) or mdb(1). The vminfo provider enables you to associate statistics seen in the output of conventional tools like vmstat(1M) with the applications that are inducing the systemic behavior.