openSolaris 2008 - Monitoring Network Performance - System Administration Guide: Network Services

Monitoring Network Performance

Table 30-1 describes the commands that are available for monitoring network performance.

Table 30-1 Network Monitoring Commands

Command	Description
`ping`	Look at the response of hosts on the network.
`spray`	Test the reliability of your packet sizes. This command can tell you whether the network is delaying packets or dropping packets.
`snoop`	Capture packets from the network and trace the calls from each client to each server.
`netstat`	Display network status, including the state of the interfaces that are used for TCP/IP traffic, the IP routing table, and the per-protocol statistics for `UDP`, `TCP`, `ICMP`, and `IGMP`.
`nfsstat`	Display a summary of server and client statistics that can be used to identify NFS problems.

How to Check the Response of Hosts on the Network

Check the response of hosts on the network with the ping command.

$ ping hostname

If you suspect a physical problem, you can use ping to find the response time of several hosts on the network. If the response from one host is not what you would expect, you can investigate that host. Physical problems could be caused by the following:

Loose cables or connectors
Improper grounding
No termination
Signal reflection

For more information about this command, see ping(1M).

Example 30-1 Checking the Response of Hosts on the Network

The simplest version of ping sends a single packet to a host on the network. If ping receives the correct response, the command prints the message host is alive.

$ ping elvis
elvis is alive

With the -s option, ping sends one datagram per second to a host. The command then prints each response and the time that was required for the round trip. An example follows.

$ ping -s pluto
64 bytes from pluto (123.456.78.90): icmp_seq=0. time=3.82 ms
64 bytes from pluto (123.456.78.90): icmp_seq=5. time=0.947 ms
64 bytes from pluto (123.456.78.90): icmp_seq=6. time=0.855 ms
^C
----pluto PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.855/1.87/3.82/1.7
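The summary line can be reproduced from the individual replies, which is useful when you collect `ping -s` output from several hosts and want to compare their response times. The following Python sketch assumes that each reply contains a field of the form `time=3.82 ms`, as in the example above; the function name `round_trip_stats` is hypothetical. It uses the sample standard deviation (dividing by n - 1), which matches the 1.7 shown in the summary for these three replies.

```python
import re
import statistics

# Matches the round-trip field of a ping -s reply, for example "time=3.82 ms".
# Assumption: replies use the layout shown in the example above.
RTT_PATTERN = re.compile(r"time=([\d.]+) ms")

# Replies copied from the ping -s example above.
SAMPLE = """\
64 bytes from pluto (123.456.78.90): icmp_seq=0. time=3.82 ms
64 bytes from pluto (123.456.78.90): icmp_seq=5. time=0.947 ms
64 bytes from pluto (123.456.78.90): icmp_seq=6. time=0.855 ms
"""


def round_trip_stats(ping_output: str) -> tuple[float, float, float, float]:
    """Return (min, avg, max, stddev) in milliseconds for the replies found."""
    times = [float(t) for t in RTT_PATTERN.findall(ping_output)]
    if not times:
        raise ValueError("no replies with a time= field found")
    # statistics.stdev is the sample standard deviation (divides by n - 1)
    # and needs at least two values.
    stddev = statistics.stdev(times) if len(times) > 1 else 0.0
    return min(times), statistics.mean(times), max(times), stddev


if __name__ == "__main__":
    lo, avg, hi, sd = round_trip_stats(SAMPLE)
    print(f"round-trip (ms) min/avg/max/stddev = {lo}/{avg:.2f}/{hi}/{sd:.1f}")
```

Run against the three replies above, the script prints 0.855/1.87/3.82/1.7, the same values as the summary line.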

How to Send Packets to Hosts on the Network

Test the reliability of your packet sizes with the spray command.

$ spray [-c count] [-d interval] [-l packet-size] hostname

-c count: Number of packets to send.
-d interval: Number of microseconds to pause between sending packets. If you do not use a delay, you might deplete the buffers.
-l packet-size: Size of each packet, in bytes.
hostname: System to which the packets are sent.

For more information about this command, see spray(1M).

Example 30-2 Sending Packets to Hosts on the Network

The following example sends 100 packets to a host (-c 100), with a packet size of 2048 bytes (-l 2048). The packets are sent with a delay of 20 microseconds between packets (-d 20).

$ spray -c 100 -d 20 -l 2048 pluto
sending 100 packets of length 2048 to pluto ...
no packets dropped by pluto
279 packets/sec, 573043 bytes/sec
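To relate the spray figures to link capacity, it can help to convert bytes per second into megabits per second. The following Python sketch assumes the rate-line format shown above (`N packets/sec, M bytes/sec`); `spray_rates` is a hypothetical helper, not part of spray. For the example, 573043 bytes/sec is about 4.58 Mbit/s, and bytes/sec divided by packets/sec comes out slightly above the 2048-byte packet size, presumably because the packets/sec figure is rounded down.

```python
import re

# Matches the spray rate line, for example "279 packets/sec, 573043 bytes/sec".
# Assumption: the summary uses the layout shown in the example above.
RATE_PATTERN = re.compile(r"(\d+) packets/sec, (\d+) bytes/sec")

# Sample spray output, copied from the example above.
SAMPLE = """\
sending 100 packets of length 2048 to pluto ...
no packets dropped by pluto
279 packets/sec, 573043 bytes/sec
"""


def spray_rates(spray_output: str) -> dict[str, float]:
    """Extract the spray rates and derive Mbit/s and average bytes per packet."""
    match = RATE_PATTERN.search(spray_output)
    if match is None:
        raise ValueError("rate line not found")
    packets_per_sec = int(match.group(1))
    bytes_per_sec = int(match.group(2))
    return {
        "packets_per_sec": packets_per_sec,
        "bytes_per_sec": bytes_per_sec,
        # 8 bits per byte, 1,000,000 bits per megabit.
        "mbit_per_sec": bytes_per_sec * 8 / 1_000_000,
        # Should be close to the -l packet size used for the run.
        "bytes_per_packet": bytes_per_sec / packets_per_sec if packets_per_sec else 0.0,
    }


if __name__ == "__main__":
    rates = spray_rates(SAMPLE)
    print(f"{rates['mbit_per_sec']:.2f} Mbit/s, "
          f"{rates['bytes_per_packet']:.0f} bytes per packet on average")
```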

How to Capture Packets From the Network

To capture packets from the network and trace the calls from each client to each server, use snoop. This command provides accurate timestamps that enable some network performance problems to be isolated quickly. For more information, see snoop(1M).

# snoop

Dropped packets could be caused by insufficient buffer space or an overloaded CPU.

How to Check the Network Status

To display network status information, such as statistics about the state of network interfaces, routing tables, and various protocols, use the netstat command.

$ netstat [-i] [-r] [-s]

-i: Displays the state of the TCP/IP interfaces
-r: Displays the IP routing table
-s: Displays statistics for the UDP, TCP, ICMP, and IGMP protocols

For more information, see netstat(1M).

Examples–Checking the Network Status

The following example shows output from the netstat -i command, which displays the state of the interfaces that are used for TCP/IP traffic.

$ netstat -i
Name  Mtu  Net/Dest    Address      Ipkts  Ierrs Opkts  Oerrs Collis Queue
lo0   8232 software    localhost     1280   0     1280     0       0    0
eri0   1500 loopback    venus      1628480   0   347070    16   39354    0

This display shows the number of packets that a machine has transmitted and has received on each interface. A machine with active network traffic should show both Ipkts and Opkts continually increasing.

Calculate the collision rate by dividing the number of collisions (Collis) by the number of output packets (Opkts). In the previous example, the collision rate for eri0 is 39354/347070, or about 11 percent. A network-wide collision rate that is greater than 5 to 10 percent can indicate a problem.

Calculate the error rate for input packets by dividing the number of input errors by the total number of input packets (Ierrs/Ipkts). The error rate for output packets is the number of output errors divided by the total number of output packets (Oerrs/Opkts). If the input error rate is high (over 0.25 percent), the host might be dropping packets.
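The calculations above can be applied to every interface at once. The following Python sketch parses `netstat -i` text laid out like the example (one header row, then one row per interface, with no spaces inside a field) and reports the collision, input error, and output error rates as percentages. The flagging thresholds come from the text: 5 percent, the lower end of the 5 to 10 percent range, for collisions, and 0.25 percent for input errors. Because the collision threshold in the text refers to a network-wide rate, a single host's figure is only an indication. The function names `percent` and `interface_rates` are hypothetical.

```python
# Sample netstat -i output, copied from the example above.
SAMPLE = """\
Name  Mtu  Net/Dest    Address      Ipkts  Ierrs Opkts  Oerrs Collis Queue
lo0   8232 software    localhost     1280   0     1280     0       0    0
eri0   1500 loopback    venus      1628480   0   347070    16   39354    0
"""

COLLISION_LIMIT_PCT = 5.0     # lower end of the 5 to 10 percent range
INPUT_ERROR_LIMIT_PCT = 0.25  # input error rate that suggests dropped packets


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def interface_rates(netstat_i_output: str) -> dict[str, dict[str, float]]:
    """Compute collision and error percentages for each interface row."""
    lines = netstat_i_output.strip().splitlines()
    header = lines[0].split()
    rates = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not line up with the header
        row = dict(zip(header, fields))
        opkts = int(row["Opkts"])
        rates[row["Name"]] = {
            "collision_pct": percent(int(row["Collis"]), opkts),
            "input_error_pct": percent(int(row["Ierrs"]), int(row["Ipkts"])),
            "output_error_pct": percent(int(row["Oerrs"]), opkts),
        }
    return rates


if __name__ == "__main__":
    for name, r in interface_rates(SAMPLE).items():
        warnings = []
        if r["collision_pct"] > COLLISION_LIMIT_PCT:
            warnings.append("high collision rate")
        if r["input_error_pct"] > INPUT_ERROR_LIMIT_PCT:
            warnings.append("high input error rate")
        print(f"{name}: collisions {r['collision_pct']:.1f}%, "
              f"input errors {r['input_error_pct']:.2f}%, "
              f"output errors {r['output_error_pct']:.4f}%",
              ", ".join(warnings))
```

For the sample, eri0 shows a collision rate of about 11.3 percent and is flagged, while lo0 shows no collisions.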

The following example shows output from the netstat -s command, which displays the per-protocol statistics for the UDP, TCP, IP, ICMP, and IGMP protocols.

$ netstat -s

UDP
    udpInDatagrams      =196543    udpInErrors         =     0
    udpOutDatagrams     =187820
 
TCP
    tcpRtoAlgorithm     =     4    tcpRtoMin           =   200
    tcpRtoMax           = 60000    tcpMaxConn          =    -1
    tcpActiveOpens      = 26952    tcpPassiveOpens     =   420
    tcpAttemptFails     =  1133    tcpEstabResets      =     9
    tcpCurrEstab        =    31    tcpOutSegs          =3957636
    tcpOutDataSegs      =2731494   tcpOutDataBytes     =1865269594
    tcpRetransSegs      = 36186    tcpRetransBytes     =3762520
    tcpOutAck           =1225849   tcpOutAckDelayed    =165044
    tcpOutUrg           =     7    tcpOutWinUpdate     =   315
    tcpOutWinProbe      =     0    tcpOutControl       = 56588
    tcpOutRsts          =   803    tcpOutFastRetrans   =   741
    tcpInSegs           =4587678
    tcpInAckSegs        =2087448   tcpInAckBytes       =1865292802
    tcpInDupAck         =109461    tcpInAckUnsent      =     0
    tcpInInorderSegs    =3877639   tcpInInorderBytes   =-598404107
    tcpInUnorderSegs    = 14756    tcpInUnorderBytes   =17985602
    tcpInDupSegs        =    34    tcpInDupBytes       = 32759
    tcpInPartDupSegs    =   212    tcpInPartDupBytes   =134800
    tcpInPastWinSegs    =     0    tcpInPastWinBytes   =     0
    tcpInWinProbe       =   456    tcpInWinUpdate      =     0
    tcpInClosed         =    99    tcpRttNoUpdate      =  6862
    tcpRttUpdate        =435097    tcpTimRetrans       = 15065
    tcpTimRetransDrop   =    67    tcpTimKeepalive     =   763
    tcpTimKeepaliveProbe=     1    tcpTimKeepaliveDrop =     0

IP
    ipForwarding        =     2    ipDefaultTTL        =   255
    ipInReceives        =11757234  ipInHdrErrors       =     0
    ipInAddrErrors      =     0    ipInCksumErrs       =     0
    ipForwDatagrams     =     0    ipForwProhibits     =     0
    ipInUnknownProtos   =     0    ipInDiscards        =     0
    ipInDelivers        =4784901   ipOutRequests       =4195180
    ipOutDiscards       =     0    ipOutNoRoutes       =     0
    ipReasmTimeout      =    60    ipReasmReqds        =  8723
    ipReasmOKs          =  7565    ipReasmFails        =  1158
    ipReasmDuplicates   =     7    ipReasmPartDups     =     0
    ipFragOKs           = 19938    ipFragFails         =     0
    ipFragCreates       =116953    ipRoutingDiscards   =     0
    tcpInErrs           =     0    udpNoPorts          =6426577
    udpInCksumErrs      =     0    udpInOverflows      =   473
    rawipInOverflows    =     0

ICMP
    icmpInMsgs          =490338    icmpInErrors        =     0
    icmpInCksumErrs     =     0    icmpInUnknowns      =     0
    icmpInDestUnreachs  =   618    icmpInTimeExcds     =   314
    icmpInParmProbs     =     0    icmpInSrcQuenchs    =     0
    icmpInRedirects     =   313    icmpInBadRedirects  =     5
    icmpInEchos         =   477    icmpInEchoReps      =    20
    icmpInTimestamps    =     0    icmpInTimestampReps =     0
    icmpInAddrMasks     =     0    icmpInAddrMaskReps  =     0
    icmpInFragNeeded    =     0    icmpOutMsgs         =   827
    icmpOutDrops        =   103    icmpOutErrors       =     0
    icmpOutDestUnreachs =    94    icmpOutTimeExcds    =   256
    icmpOutParmProbs    =     0    icmpOutSrcQuenchs   =     0
    icmpOutRedirects    =     0    icmpOutEchos        =     0
    icmpOutEchoReps     =   477    icmpOutTimestamps   =     0
    icmpOutTimestampReps=     0    icmpOutAddrMasks    =     0
    icmpOutAddrMaskReps =     0    icmpOutFragNeeded   =     0
    icmpInOverflows     =     0

IGMP:
        0 messages received
        0 messages received with too few bytes
        0 messages received with bad checksum
        0 membership queries received
        0 membership queries received with invalid field(s)
        0 membership reports received
        0 membership reports received with invalid field(s)
        0 membership reports received for groups to which we belong
        0 membership reports sent
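Because the TCP, IP, and ICMP sections use a regular `name = value` layout, the counters are easy to extract, for example to compare two runs. The following Python sketch collects every such pair into a dictionary. It assumes the layout shown above, ignores the IGMP section (which uses a sentence-style format), and accepts a leading minus sign because some counters, such as tcpInInorderBytes in the example, are printed as negative numbers. The ratio computed at the end, retransmitted segments as a percentage of data segments sent, only illustrates how counters can be combined; this guide gives no threshold for it. The name `parse_counters` is hypothetical.

```python
import re

# Counters appear as "name = value" pairs, often two per line.
# The value may be negative, as tcpInInorderBytes is in the example.
COUNTER_PATTERN = re.compile(r"([A-Za-z]\w*)\s*=\s*(-?\d+)")

# A few lines of netstat -s output, copied from the example above.
SAMPLE = """\
TCP
    tcpCurrEstab        =    31    tcpOutSegs          =3957636
    tcpOutDataSegs      =2731494   tcpOutDataBytes     =1865269594
    tcpRetransSegs      = 36186    tcpRetransBytes     =3762520
    tcpInInorderSegs    =3877639   tcpInInorderBytes   =-598404107
"""


def parse_counters(netstat_s_output: str) -> dict[str, int]:
    """Collect every "name = value" counter into a single dictionary."""
    return {name: int(value)
            for name, value in COUNTER_PATTERN.findall(netstat_s_output)}


if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    # Illustrative ratio only: retransmitted segments per data segment sent.
    pct = 100 * counters["tcpRetransSegs"] / counters["tcpOutDataSegs"]
    print(f"TCP retransmitted segments: {pct:.2f}% of data segments sent")
```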

The following example shows output from the netstat -r command, which displays the IP routing table.

$ netstat -r

Routing Table:
  Destination        Gateway           Flags  Ref   Use    Interface
------------------ -------------------- ----- ----- ------ ---------
localhost            localhost             UH       0   2817  lo0
earth-bb             pluto                 U        3  14293  eri0
224.0.0.0            pluto                 U        3      0  eri0
default              mars-gate             UG       0  14142

The fields in the netstat -r report are described in Table 30-2.

Table 30-2 Output From the `netstat` `-r` Command

Field Name	Flag	Description
`Flags`	`U`	The route is up.
	`G`	The route is through a gateway.
	`H`	The route is to a host.
	`D`	The route was dynamically created by using a redirect.
`Ref`		Shows the current number of routes that share the same link layer.
`Use`		Indicates the number of packets that were sent out.
`Interface`		Lists the network interface that is used for the route.
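The following Python sketch parses routing-table text laid out like the `netstat -r` example and spells out each flag letter by using the meanings in Table 30-2. It assumes that data rows follow the dashed separator line and that the Interface column can be empty, as it is for the default route in the example. Letters that Table 30-2 does not list are reported as unknown rather than guessed. The name `parse_routes` is hypothetical.

```python
# Flag meanings from Table 30-2.
FLAG_MEANINGS = {
    "U": "route is up",
    "G": "route is through a gateway",
    "H": "route is to a host",
    "D": "route was created dynamically by a redirect",
}

# Sample netstat -r output, copied from the example above.
SAMPLE = """\
Routing Table:
  Destination        Gateway           Flags  Ref   Use    Interface
------------------ -------------------- ----- ----- ------ ---------
localhost            localhost             UH       0   2817  lo0
earth-bb             pluto                 U        3  14293  eri0
224.0.0.0            pluto                 U        3      0  eri0
default              mars-gate             UG       0  14142
"""


def parse_routes(netstat_r_output: str) -> list[dict]:
    """Return one dictionary per route row, with the flags spelled out."""
    lines = netstat_r_output.splitlines()
    # Data rows begin after the separator line made of dashes.
    start = next(i for i, line in enumerate(lines) if line.startswith("---")) + 1
    routes = []
    for line in lines[start:]:
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete route row
        destination, gateway, flags, ref, use = fields[:5]
        routes.append({
            "destination": destination,
            "gateway": gateway,
            "flags": [FLAG_MEANINGS.get(f, f"unknown flag {f}") for f in flags],
            "ref": int(ref),
            "use": int(use),
            # The Interface column can be empty, as for the default route.
            "interface": fields[5] if len(fields) > 5 else None,
        })
    return routes


if __name__ == "__main__":
    for route in parse_routes(SAMPLE):
        print(route["destination"], "via", route["gateway"], "->",
              "; ".join(route["flags"]), "on", route["interface"] or "(none)")
```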

How to Display NFS Server and Client Statistics

The NFS distributed file service uses a remote procedure call (RPC) facility that translates local commands into requests for the remote host. The remote procedure calls are synchronous. The client application is blocked or suspended until the server has completed the call and has returned the results. One of the major factors that affects NFS performance is the retransmission rate.

If the file server cannot respond to a client's request, the client retransmits the request a specified number of times before the client quits. Each retransmission imposes system overhead and increases network traffic. Excessive retransmissions can cause network performance problems. If the retransmission rate is high, you could look for the following:

Overloaded servers that complete requests too slowly
An Ethernet interface that is dropping packets
Network congestion, which slows the packet transmission

Table 30-3 describes the nfsstat options to display client and server statistics.

Table 30-3 Commands for Displaying Client/Server Statistics

Command	Display
`nfsstat -c`	Client statistics
`nfsstat -s`	Server statistics
`nfsstat -m`	Statistics for each NFS-mounted file system

Use nfsstat -c to show client statistics, and nfsstat -s to show server statistics. Use nfsstat -m to display statistics for each NFS-mounted file system. For more information, see nfsstat(1M).

Examples–Displaying NFS Server and Client Statistics

The following example displays RPC and NFS data for the client pluto.

$ nfsstat -c

Client rpc:
Connection oriented:
calls    badcalls  badxids  timeouts newcreds  badverfs   timers     
1595799  1511      59       297      0         0          0          
cantconn nomem     interrupts 
1198      0         7          
Connectionless:
calls    badcalls  retrans  badxids  timeouts  newcreds   badverfs   
80785    3135      25029    193      9543      0          0          
timers   nomem     cantsend   
17399    0         0          

Client nfs:
calls    badcalls  clgets   cltoomany  
1640097  3112      1640097  0          
Version 2: (46366 calls)
null     getattr   setattr  root     lookup     readlink  read       
0 0%     6589 14%  2202 4%  0 0%     11506 24%  0 0%      7654 16%   
wrcache  write     create   remove   rename     link      symlink    
0 0%     13297 28% 1081 2%  0 0%     0 0%       0 0%      0 0%       
mkdir    rmdir     readdir  statfs     
24 0%    0 0%      906 1%   3107 6%    
Version 3: (1585571 calls)
null    getattr    setattr  lookup     access     readlink  read     
0 0%    508406 32% 10209 0% 263441 16% 400845 25% 3065 0%  117959 7%
write    create     mkdir    symlink    mknod    remove   rmdir 
69201 4% 7615 0%    42 0%    16 0%      0 0%     7875 0%  51 0%      
rename   link       readdir  readdir+   fsstat   fsinfo   pathconf   
929 0%   597 0%     3986 0%  185145 11% 942 0%   300 0%   583 0%     
commit     
4364 0%    
 
Client nfs_acl:
Version 2: (3105 calls)
null       getacl     setacl     getattr    access     
0 0%       0 0%       0 0%       3105 100%  0 0%       
Version 3: (5055 calls)
null       getacl     setacl     
0 0%       5055 100%  0 0%

The output of the nfsstat -c command is described in Table 30-4.

Table 30-4 Output From the `nfsstat -c` Command

Field	Description
`calls`	The total number of calls that were sent.
`badcalls`	The total number of calls that were rejected by RPC.
`retrans`	The total number of retransmissions. A retransmission rate of less than 1 percent of calls (for example, approximately 10 timeouts out of 6888 calls) might be caused by temporary failures. Higher rates might indicate a problem.
`badxid`	The number of times that a duplicate acknowledgment was received for a single NFS request.
`timeout`	The number of calls that timed out.
`wait`	The number of times a call had to wait because no client handle was available.
`newcred`	The number of times the authentication information had to be refreshed.
`timers`	The number of times the time-out value was greater than or equal to the specified time-out value for a call.
`readlink`	The number of times a `read` was made to a symbolic link. If this number is high, at over 10 percent, then there could be too many symbolic links.
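The retransmission rate discussed for `retrans` is the number of retransmissions divided by the number of calls. The following Python sketch reads one RPC section of `nfsstat -c` output and computes that rate. It assumes the layout in the example above: a row of counter names followed by a row of values, repeated until a blank line or the next section title. In the example, only the connectionless section has a retrans column; there, 25029 retransmissions out of 80785 calls is a rate of about 31 percent, far above the 1 percent level mentioned for `retrans`. The name `rpc_section` is hypothetical.

```python
# RPC portion of the nfsstat -c example above.
SAMPLE = """\
Client rpc:
Connection oriented:
calls    badcalls  badxids  timeouts newcreds  badverfs   timers
1595799  1511      59       297      0         0          0
cantconn nomem     interrupts
1198      0         7
Connectionless:
calls    badcalls  retrans  badxids  timeouts  newcreds   badverfs
80785    3135      25029    193      9543      0          0
timers   nomem     cantsend
17399    0         0

Client nfs:
"""


def rpc_section(nfsstat_output: str, title: str) -> dict[str, int]:
    """Return the counters listed under a section title such as "Connectionless:"."""
    lines = nfsstat_output.splitlines()
    start = next(i for i, line in enumerate(lines) if line.strip() == title) + 1
    block = []
    for line in lines[start:]:
        # Stop at a blank line or at the next section title.
        if not line.strip() or line.strip().endswith(":"):
            break
        block.append(line)
    counters = {}
    # Each row of counter names is followed by a row of values.
    for names, values in zip(block[::2], block[1::2]):
        counters.update(zip(names.split(), (int(v) for v in values.split())))
    return counters


if __name__ == "__main__":
    stats = rpc_section(SAMPLE, "Connectionless:")
    retrans_pct = 100 * stats["retrans"] / stats["calls"]
    print(f"connectionless retransmissions: {retrans_pct:.1f}% of calls")
```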

The following example shows output from the nfsstat -m command.

pluto$ nfsstat -m
/usr/man from pluto:/export/svr4/man
Flags: vers=2,proto=udp,auth=unix,hard,intr,dynamic,
        rsize=8192, wsize=8192,retrans=5
 Lookups: srtt=13 (32ms), dev=10 (50ms), cur=6 (120ms)
 All:     srtt=13 (32ms), dev=10 (50ms), cur=6 (120ms)

The fields in the nfsstat -m output are described in Table 30-5. For each field, the value in parentheses is expressed in milliseconds.

Table 30-5 Output From the `nfsstat -m` Command

Field	Description
`srtt`	The smoothed average of the round-trip times
`dev`	The average deviations
`cur`	The current “expected” response time

If you suspect that the hardware components of your network are creating problems, you need to look closely at the cabling and connectors.