Repairing Damaged Data
The following sections describe how to identify the type of data corruption and
how to repair the data, if possible.
ZFS uses checksumming, redundancy, and self-healing data to minimize the chances of data
corruption. Nonetheless, data corruption can occur if the pool isn't redundant, if corruption
occurred while the pool was degraded, or if an unlikely series of events conspired
to corrupt multiple copies of a piece of data. Regardless of the source,
the result is the same: the data is corrupted and therefore no longer
accessible. The action to take depends on the type of data that is corrupted and
its relative value. Two basic types of data can be corrupted:
Pool metadata – ZFS requires a certain amount of data to be parsed to open a pool and access datasets. If this data is corrupted, the entire pool or complete portions of the dataset hierarchy will become unavailable.
Object data – In this case, the corruption is within a specific file or directory. This problem might result in a portion of the file or directory being inaccessible, or this problem might cause the object to be broken altogether.
Data is verified during normal operation as well as through scrubbing. For more
information about how to verify the integrity of pool data, see Checking ZFS Data Integrity.
Identifying the Type of Data Corruption
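By default, the zpool status command shows only that corruption has occurred, but not
where this corruption occurred. Adding the -v option lists the affected objects. In the
following examples, the first command uses -v and the second does not:
# zpool status -v tank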
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       1     0     0
          mirror    ONLINE       1     0     0
            c2t0d0  ONLINE       2     0     0
            c1t1d0  ONLINE       2     0     0
errors: The following persistent errors have been detected:
          DATASET  OBJECT  RANGE
          tank     6       0-512
# zpool status
  pool: monkey
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        monkey      ONLINE       0     0     0
          c1t1d0s6  ONLINE       0     0     0
          c1t1d0s7  ONLINE       0     0     0
errors: 8 data errors, use '-v' for a list
Each error indicates only that an error occurred at a given point in
time; it is not necessarily still present on the system. Under normal
circumstances, the error is still present. However, certain temporary outages might result in data corruption
that is automatically repaired once the outage ends. A complete scrub of the
pool is guaranteed to examine every active block in the pool, so the
error log is reset whenever a scrub finishes. If you determine that the
errors are no longer present, and you don't want to wait for
a scrub to complete, reset all errors in the pool by using the
zpool clear command.
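As a rough illustration of checking whether errors are still being reported, the following sketch parses the errors: summary line of zpool status output. The function name data_error_count is hypothetical, and the parsing assumes only the two line shapes "errors: N data errors, ..." and "errors: No known data errors"; any other shape, such as the verbose listings produced by -v, yields no output.

```shell
#!/bin/sh
# data_error_count (hypothetical helper): read `zpool status` output on stdin
# and print the number of data errors from the "errors:" summary line.
# Assumed line shapes:
#   errors: 8 data errors, use '-v' for a list
#   errors: No known data errors
# Prints nothing when the summary line has any other shape.
data_error_count() {
    awk '
        /^[ \t]*errors: No known data errors/ { print 0; exit }
        /^[ \t]*errors: [0-9]+ data errors/   { print $2; exit }
    '
}
```

For example, zpool status monkey | data_error_count could be run once a scrub of the pool has finished; a result of 0 suggests that no errors remain to be reset.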
If the data corruption is in pool-wide metadata, the output is slightly different.
For example:
# zpool status -v morpheus
  pool: morpheus
    id: 1422736890544688191
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
   see: https://www.sun.com/msg/ZFS-8000-72
config:
        morpheus    FAULTED   corrupted data
          c1t10d0   ONLINE
In the case of pool-wide corruption, the pool is placed into the
FAULTED state, because the pool cannot possibly provide the needed redundancy level.
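A script that monitors pools could detect this condition from the state: line. The following is a minimal sketch, assuming the output layout shown in this section; the function names pool_state and pool_is_faulted are hypothetical.

```shell
#!/bin/sh
# pool_state (hypothetical helper): read `zpool status` output on stdin and
# print the value of the first "state:" line, for example ONLINE or FAULTED.
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# pool_is_faulted (hypothetical helper): succeed (exit status 0) only when
# the state reported on stdin is FAULTED.
pool_is_faulted() {
    [ "$(pool_state)" = "FAULTED" ]
}
```

For example, zpool status -v morpheus | pool_is_faulted succeeds for the output shown above and fails for a pool whose state is ONLINE.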
Repairing a Corrupted File or Directory
If a file or directory is corrupted, the system might still be
able to function depending on the type of corruption. Any damage is effectively unrecoverable
if no good copies of the data exist anywhere on the system. If
the data is valuable, you have no choice but to restore the
affected data from backup. Even so, you might be able to recover from
this corruption without restoring the entire pool.
If the damage is within a file data block, then the file
can safely be removed, thereby clearing the error from the system. Use the
zpool status -v command to display a list of filenames with persistent errors. For
example:
# zpool status -v
  pool: monkey
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        monkey      ONLINE       0     0     0
          c1t1d0s6  ONLINE       0     0     0
          c1t1d0s7  ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
        /monkey/a.txt
        /monkey/bananas/b.txt
        /monkey/sub/dir/d.txt
        /monkey/ghost/e.txt
        /monkey/ghost/boo/f.txt
Entries in the list of files with persistent errors take one of the following forms:
If the full path to the file is found and the dataset is mounted, the full path to the file is displayed. For example:
/monkey/a.txt
If the full path to the file is found, but the dataset is not mounted, then the dataset name with no preceding slash (/), followed by the path within the dataset to the file, is displayed. For example:
monkey/ghost:/e.txt
If the object number cannot be successfully translated to a file path, either due to an error or because the object doesn't have a real file path associated with it, as is the case for a dnode_t, then the dataset name followed by the object's number is displayed. For example:
monkey/dnode:<0x0>
If an object in the meta-object set (MOS) is corrupted, then a special tag of <metadata>, followed by the object number, is displayed.
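The forms listed above can be told apart mechanically. The following sketch classifies a single entry by its shape; the function name classify_error_entry and the labels it prints are hypothetical, and the patterns assume only the forms described in this list.

```shell
#!/bin/sh
# classify_error_entry ENTRY (hypothetical helper): print a label describing
# which form one entry from the persistent-error list takes.
classify_error_entry() {
    case "$1" in
        /*)             echo "mounted-path" ;;    # full path, dataset mounted
        '<metadata>'*)  echo "mos-metadata" ;;    # object in the MOS
        *:'<'*'>')      echo "object-number" ;;   # dataset:<object number>
        *:/*)           echo "unmounted-path" ;;  # dataset:/path, not mounted
        *)              echo "unknown" ;;
    esac
}
```

For example, classify_error_entry 'monkey/ghost:/e.txt' prints unmounted-path, which indicates that the dataset must be mounted before the file can be reached by path.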
If the damage is within a file data block, first try to locate the file by using
the find command, specifying the object number that is shown in the OBJECT column
of the zpool status -v output as the inode number to find. For example, assuming
the tank dataset is mounted at /tank:
# find /tank -inum 6
Then, try removing the file with the rm command. If this command doesn't
work, the corruption is within the file's metadata, and ZFS cannot determine which
blocks belong to the file in order to remove the corruption.
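The lookup step can be wrapped in a small helper. The following is a sketch only: the function name find_by_object is hypothetical, and it assumes that the object number reported by zpool status is the file's inode number within the dataset, as described above. Removal is left as a separate, deliberate step.

```shell
#!/bin/sh
# find_by_object MOUNTPOINT OBJECT (hypothetical helper): print the path of
# every file under MOUNTPOINT whose inode number equals OBJECT.
find_by_object() {
    # -xdev keeps find on the file system that contains MOUNTPOINT, so that
    # files with the same inode number in other mounted datasets are skipped.
    find "$1" -xdev -inum "$2" -print
}
```

For example, find_by_object /tank 6 lists the candidate file, assuming the dataset is mounted at /tank. After confirming the path, try removing the file with rm.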
If the corruption is within a directory or a file's metadata, the
only choice is to move the file elsewhere. You can safely move any
file or directory to a less convenient location, allowing the original object to
be restored in place.
Repairing ZFS Storage Pool-Wide Damage
If the damage is in pool metadata and that damage prevents the pool
from being opened, then you must restore the pool and all its data
from backup. The mechanism you use varies widely by the pool configuration and
backup strategy. First, save the configuration as displayed by zpool status so that you can
recreate it once the pool is destroyed. Then, use zpool destroy -f to
destroy the pool. Also, keep a file describing the layout of the datasets
and the various locally set properties somewhere safe, as this information will become
inaccessible if the pool is ever rendered inaccessible. With the pool configuration and
dataset layout, you can reconstruct your complete configuration after destroying the pool. The data
can then be populated by using whatever backup or restoration strategy you use.
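Because the dataset layout cannot be read once the pool is inaccessible, it has to be recorded while the pool is still healthy. The following sketch shows one way to do so; the function name save_pool_layout and the output file names are hypothetical, and the target directory is assumed to be on storage outside the pool.

```shell
#!/bin/sh
# save_pool_layout POOL OUTDIR (hypothetical helper): record the pool
# configuration, dataset hierarchy, and locally set properties of POOL as
# text files in OUTDIR. OUTDIR should not be on the pool being recorded.
save_pool_layout() {
    pool=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # Pool configuration as displayed by zpool status.
    zpool status "$pool" > "$outdir/$pool.status" || return 1
    # Dataset hierarchy, one dataset name per line.
    zfs list -H -r -o name "$pool" > "$outdir/$pool.datasets" || return 1
    # Properties whose source is "local", as name, property, value columns.
    zfs get -H -r -s local -o name,property,value all "$pool" \
        > "$outdir/$pool.localprops" || return 1
}
```

For example, save_pool_layout tank /var/tmp/zfs-layout could be run periodically, with the resulting files copied off the system along with the regular backups. The directory name here is only a placeholder.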