openSolaris 2008 - IPMP Failure Detection and Recovery Features - System Administration Guide: IP Services

IPMP Failure Detection and Recovery Features

The in.mpathd daemon handles the following types of failure detection:

Link-based failure detection, if supported by the NIC driver
Probe-based failure detection, when test addresses are configured
Detection of interfaces that were missing at boot time

The in.mpathd(1M) man page completely describes how the in.mpathd daemon handles the detection of interface failures.

Link-Based Failure Detection

Link-based failure detection is always enabled, provided that the interface supports this type of failure detection. The following Sun network drivers are supported in the current release of the Solaris OS:

hme
eri
ce
ge
bge
qfe
dmfe

To determine whether a third-party interface supports link-based failure detection, refer to the manufacturer's documentation.

These network interface drivers monitor the interface's link state and notify the networking subsystem when that link state changes. When notified of a change, the networking subsystem either sets or clears the RUNNING flag for that interface, as appropriate. When the daemon detects that the interface's RUNNING flag has been cleared, the daemon immediately fails the interface.

Probe-Based Failure Detection

The in.mpathd daemon performs probe-based failure detection on each interface in the IPMP group that has a test address. Probe-based failure detection involves the sending and receiving of ICMP probe messages that use test addresses. These messages go out over the interface to one or more target systems on the same IP link. For an introduction to test addresses, refer to Test Addresses. For information on configuring test addresses, refer to How to Configure an IPMP Group With Multiple Interfaces.

The in.mpathd daemon determines which target systems to probe dynamically. Routers that are connected to the IP link are automatically selected as targets for probing. If no routers exist on the link, in.mpathd sends probes to neighbor hosts on the link. A multicast packet that is sent to the all hosts multicast address, 224.0.0.1 in IPv4 and ff02::1 in IPv6, determines which hosts to use as target systems. The first few hosts that respond to the echo packets are chosen as targets for probing. If in.mpathd cannot find routers or hosts that responded to the ICMP echo packets, in.mpathd cannot detect probe-based failures.

You can use host routes to explicitly configure a list of target systems to be used by in.mpathd. For instructions, refer to Configuring Target Systems.

To ensure that each interface in the IPMP group functions properly, in.mpathd probes all the targets separately through all the interfaces in the IPMP group. If no replies are made in response to five consecutive probes, in.mpathd considers the interface to have failed. The probing rate depends on the failure detection time (FDT). The default value for failure detection time is 10 seconds. However, you can tune the failure detection time in the /etc/default/mpathd file. For instructions, go to How to Configure the /etc/default/mpathd File.

For a repair detection time of 10 seconds, the probing rate is approximately one probe every two seconds. The minimum repair detection time is twice the failure detection time , 20 seconds by default, because replies to 10 consecutive probes must be received. The failure and repair detection times apply only to probe-based failure detection.

Group Failures

A group failure occurs when all interfaces in an IPMP group appear to fail at the same time. The in.mpathd daemon does not perform failovers for a group failure. Also, no failover occurs when all the target systems fail at the same time. In this instance, in.mpathd flushes all of its current target systems and discovers new target systems.

Detecting Physical Interface Repairs

For the in.mpathd daemon to consider an interface to be repaired, the RUNNING flag must be set for the interface. If probe-based failure detection is used, the in.mpathd daemon must receive responses to 10 consecutive probe packets from the interface before that interface is considered repaired. When an interface is considered repaired, any addresses that failed over to another interface then fail back to the repaired interface. If the interface was configured as “active” before it failed, after repair that interface can resume sending and receiving traffic.

What Happens During Interface Failover

The following two examples show a typical configuration and how that configuration automatically changes when an interface fails. When the hme0 interface fails, notice that all data addresses move from hme0 to hme1.

Example 30-1 Interface Configuration Before an Interface Failure

hme0: flags=9000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
     inet 192.168.85.19 netmask ffffff00 broadcast 192.168.85.255
     groupname test
hme0:1: flags=9000843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 
     index 2 inet 192.168.85.21 netmask ffffff00 broadcast 192.168.85.255
hme1: flags=9000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
8     inet 192.168.85.20 netmask ffffff00 broadcast 192.168.85.255
     groupname test
hme1:1: flags=9000843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 
     index 2 inet 192.168.85.22 netmask ffffff00 broadcast 192.168.85.255
hme0: flags=a000841<UP,RUNNING,MULTICAST,IPv6,NOFAILOVER> mtu 1500 index 2
     inet6 fe80::a00:20ff:feb9:19fa/10
     groupname test
hme1: flags=a000841<UP,RUNNING,MULTICAST,IPv6,NOFAILOVER> mtu 1500 index 2
     inet6 fe80::a00:20ff:feb9:1bfc/10
     groupname test

Example 30-2 Interface Configuration After an Interface Failure

hme0: flags=19000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,FAILED> mtu 0 index 2
        inet 0.0.0.0 netmask 0 
        groupname test
hme0:1: flags=19040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> 
        mtu 1500 index 2 inet 192.168.85.21 netmask ffffff00 broadcast 10.0.0.255
hme1: flags=9000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.85.20 netmask ffffff00 broadcast 192.168.85.255
        groupname test
hme1:1: flags=9000843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 
        index 2 inet 192.168.85.22 netmask ffffff00 broadcast 10.0.0.255
hme1:2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 6
        inet 192.168.85.19 netmask ffffff00 broadcast 192.168.18.255
hme0: flags=a000841<UP,RUNNING,MULTICAST,IPv6,NOFAILOVER,FAILED> mtu 1500 index 2
        inet6 fe80::a00:20ff:feb9:19fa/10 
        groupname test
hme1: flags=a000841<UP,RUNNING,MULTICAST,IPv6,NOFAILOVER> mtu 1500 index 2
        inet6 fe80::a00:20ff:feb9:1bfc/10 
        groupname test

You can see that the FAILED flag is set on hme0 to indicate that this interface has failed. You can also see that hme1:2 has been created. hme1:2 was originally hme0. The address 192.168.85.19 then becomes accessible through hme1.

Multicast memberships that are associated with 192.168.85.19 can still receive packets, but they now receive packets through hme1. When the failover of address 192.168.85.19 from hme0 to hme1 occurred, a dummy address 0.0.0.0 was created on hme0. The dummy address was created so that hme0 can still be accessed. hme0:1 cannot exist without hme0. The dummy address is removed when a subsequent failback takes place.

Similarly, failover of the IPv6 address from hme0 to hme1 occurred. In IPv6, multicast memberships are associated with interface indexes. Multicast memberships also fail over from hme0 to hme1. All the addresses that in.ndpd configured also moved. This action is not shown in the examples.

The in.mpathd daemon continues to probe through the failed interface hme0. After the daemon receives 10 consecutive replies for a default repair detection time of 20 seconds, the daemon determines that the interface is repaired. Because the RUNNING flag is also set on hme0, the daemon invokes the failback. After failback, the original configuration is restored.

For a description of all error messages that are logged on the console during failures and repairs, see the in.mpathd(1M) man page.