3.14. The GFS2 Withdraw Function
The GFS2 withdraw function is a data integrity feature of GFS2 file systems in a cluster. If the GFS2 kernel module detects an inconsistency in a GFS2 file system following an I/O operation, the file system becomes unavailable to the cluster. The I/O operation stops and the system waits for further I/O operations to fail with an error, which prevents further damage. When this occurs, you can manually stop any other services or applications, then reboot the node and remount the GFS2 file system to replay the journals. If the problem persists, unmount the file system from all nodes in the cluster and perform file system recovery with the fsck.gfs2 command. A withdraw is less severe than a kernel panic, which would cause another node to fence the affected node.
If your system is configured with the gfs2 startup script enabled and the GFS2 file system is included in the /etc/fstab file, the GFS2 file system will be remounted when you reboot. If the GFS2 file system withdrew because of perceived file system corruption, you should run the fsck.gfs2 command before remounting the file system. In this case, to prevent the file system from being remounted at boot time, perform the following procedure:
- Temporarily disable the startup script on the affected node with the following command:
# chkconfig gfs2 off
- Reboot the affected node and start the cluster software. The GFS2 file system will not be mounted.
- Unmount the file system from every node in the cluster.
- Run the fsck.gfs2 command on the file system from one node only to check for and repair file system corruption.
- Re-enable the startup script on the affected node with the following command:
# chkconfig gfs2 on
- Remount the GFS2 file system from all nodes in the cluster.
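Because the steps of this procedure run on different nodes and include a reboot, they cannot be executed as a single script. As a minimal sketch, the following prints the procedure as an ordered checklist, labeling where each command runs. The step function, the DEVICE and MOUNTPOINT variables, and the device path /dev/clustervg/gfs2lv are hypothetical placeholders, and the final mount command assumes the file system has an /etc/fstab entry, as described above. The -y option tells fsck.gfs2 to answer yes to every repair prompt; omit it to confirm each repair interactively.

```shell
#!/bin/sh
# Prints the withdraw-recovery procedure as an ordered checklist.
# Nothing is executed; each command is shown with the node it runs on.
# DEVICE and MOUNTPOINT are hypothetical placeholders; substitute your own.
DEVICE=${DEVICE:-/dev/clustervg/gfs2lv}
MOUNTPOINT=${MOUNTPOINT:-/mnt/gfs2}

# step WHERE COMMAND: print one checklist line, e.g. "[every node] umount ..."
step() {
    printf '[%s] %s\n' "$1" "$2"
}

step "affected node" "chkconfig gfs2 off"
step "affected node" "reboot, then start the cluster software"
step "every node"    "umount $MOUNTPOINT"
step "one node only" "fsck.gfs2 -y $DEVICE"
step "affected node" "chkconfig gfs2 on"
step "every node"    "mount $MOUNTPOINT"   # relies on the /etc/fstab entry
```

Printing rather than running the commands keeps the sketch safe to try anywhere; on a real cluster, run each line by hand on the node it names.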
An example of an inconsistency that would cause a GFS2 withdraw is an incorrect block count. When the GFS2 kernel module deletes a file from a file system, it systematically removes all the data and metadata blocks associated with that file. When it is done, it checks the block count, which should be exactly one because only the disk inode itself should remain. Any other value indicates a file system inconsistency, since the block count did not match the list of blocks found.
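The check reduces to simple arithmetic. The sketch below is a toy model, not GFS2 kernel code, and the function name check_after_delete and its two arguments are hypothetical. It assumes, as the paragraph above implies, that a file's block count covers the disk inode plus every data and metadata block, so freeing every block that was found should leave a count of exactly one.

```shell
#!/bin/sh
# Toy model (not GFS2 source) of the block-count check after a delete.
# check_after_delete COUNT FREED
#   COUNT: block count recorded for the file (disk inode + data + metadata)
#   FREED: number of data and metadata blocks actually found and freed
check_after_delete() {
    remaining=$(( $1 - $2 ))
    if [ "$remaining" -eq 1 ]; then
        # Only the disk inode is left: count and block list agree.
        echo "consistent"
    else
        # Count and block list disagree: GFS2 would withdraw here.
        echo "withdraw: block count $remaining, expected 1"
    fi
}

check_after_delete 5 4   # prints: consistent
check_after_delete 5 3   # prints: withdraw: block count 2, expected 1
```

In the second call the recorded count claims one more block than deletion actually found, which is exactly the mismatch the kernel module treats as corruption.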
You can override the GFS2 withdraw function by mounting the file system with the -o errors=panic option. With this option specified, any error that would normally cause the node to withdraw causes it to panic instead. The panic stops the node's cluster communications, which causes the node to be fenced.
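For example, the option can be given on the mount command line or in the file system's /etc/fstab entry. This is a minimal sketch: the device /dev/clustervg/gfs2lv and mount point /mnt/gfs2 are hypothetical placeholders. The default behavior corresponds to errors=withdraw.

```shell
# Mount with withdraw replaced by panic (placeholder device and mount point):
mount -t gfs2 -o errors=panic /dev/clustervg/gfs2lv /mnt/gfs2

# To make this persistent, use an /etc/fstab entry such as:
# /dev/clustervg/gfs2lv  /mnt/gfs2  gfs2  defaults,errors=panic  0 0
```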