What Is ZFS?
The ZFS file system is a revolutionary new file system that fundamentally changes
the way file systems are administered, with features and benefits not found in
any other file system available today. ZFS has been designed to be robust,
scalable, and simple to administer.
ZFS Pooled Storage
ZFS uses the concept of storage pools to manage physical storage. Historically, file systems
were constructed on top of a single physical device. To address multiple devices
and provide for data redundancy, the concept of a volume manager was introduced to
provide the image of a single device so that file systems would not
have to be modified to take advantage of multiple devices. This design added
another layer of complexity and ultimately prevented certain file system advances, because the file
system had no control over the physical placement of data on the virtualized
volumes.
ZFS eliminates volume management altogether. Instead of forcing you to create virtualized
volumes, ZFS aggregates devices into a storage pool. The storage pool describes the
physical characteristics of the storage (device layout, data redundancy, and so on) and
acts as an arbitrary data store from which file systems can be created.
File systems are no longer constrained to individual devices, allowing them to share
space with all file systems in the pool. You no longer need to
predetermine the size of a file system, as file systems grow automatically within
the space allocated to the storage pool. When new storage is added, all
file systems within the pool can immediately use the additional space without additional
work. In many ways, the storage pool works like a virtual memory system:
when a memory DIMM is added to a system, the operating system doesn't
force you to run commands to configure the memory and assign it
to individual processes. All processes on the system automatically use the additional memory.
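To make the pooled model concrete, here is a minimal Python sketch; it is not ZFS code, and the `Pool` class and its methods are hypothetical names invented for illustration. It assumes space is counted in abstract units and ignores redundancy, showing only that file systems have no fixed size and that newly added capacity is immediately shared by all of them.

```python
# Toy model of pooled storage. Pool and its methods are hypothetical,
# not a ZFS API; sizes are abstract units and redundancy is ignored.

class Pool:
    def __init__(self, device_sizes):
        # Pool capacity is the sum of every device added to it.
        self.capacity = sum(device_sizes)
        self.used = {}  # file system name -> units used

    def create_fs(self, name):
        # Creating a file system preallocates nothing and sets no size.
        self.used[name] = 0

    def free(self):
        return self.capacity - sum(self.used.values())

    def write(self, name, units):
        # Any file system may consume any free space left in the pool.
        if units > self.free():
            raise OSError("pool is full")
        self.used[name] += units

    def add_device(self, size):
        # New capacity is immediately available to every file system.
        self.capacity += size


pool = Pool([100, 100])
pool.create_fs("home")
pool.create_fs("projects")
pool.write("home", 150)      # one file system can use most of the pool
pool.write("projects", 50)   # the pool is now full
pool.add_device(100)
pool.write("projects", 80)   # no resize step needed
print(pool.free())           # 20
```

Note that neither file system was ever given a size; the only limit is the free space in the pool as a whole.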
Transactional Semantics
ZFS is a transactional file system, which means that the file system state
is always consistent on disk. Traditional file systems overwrite data in place, which
means that if the machine loses power, for example, between the time a
data block is allocated and when it is linked into a directory, the
file system will be left in an inconsistent state. Historically, this problem was
solved through the use of the fsck command. This command was responsible for
going through and verifying file system state, making an attempt to repair any
inconsistencies in the process. This problem caused great pain to administrators and was
never guaranteed to fix all possible problems. More recently, file systems have introduced
the concept of journaling. The journaling process records actions in a separate
journal, which can then be replayed safely if a system crash occurs. This
process introduces unnecessary overhead, because the data needs to be written twice, and
often results in a new set of problems, such as when the journal
can't be replayed properly.
With a transactional file system, data is managed using copy-on-write semantics. Data
is never overwritten, and any sequence of operations is either entirely committed or
entirely ignored. This mechanism means that the file system can never be corrupted
through accidental loss of power or a system crash, so no equivalent of fsck is needed.
While the most recently written pieces of data might
be lost, the file system itself will always be consistent. In addition, synchronous
data (written using the O_DSYNC flag) is guaranteed to be on stable storage before the
write returns, so it is never lost.
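The following toy Python model illustrates the copy-on-write commit described above. It is a sketch under simplifying assumptions, not how ZFS lays out data: `CowStore`, `transaction`, and the `crash_before_commit` flag are hypothetical. New blocks are always written to fresh addresses, and a single root-pointer update (loosely analogous to the ZFS uberblock) makes a whole group of changes visible at once, so a crash before that update leaves the previous consistent tree intact.

```python
import itertools

class CowStore:
    """Toy copy-on-write store (hypothetical, not the ZFS on-disk format)."""

    def __init__(self):
        self.blocks = {}                 # address -> contents, never modified after write
        self._next = itertools.count()   # fresh addresses; old ones are never reused here
        # Root pointer, loosely analogous to the ZFS uberblock in this sketch.
        self.root = self._alloc({})      # the root block maps name -> data address

    def _alloc(self, contents):
        addr = next(self._next)
        self.blocks[addr] = contents
        return addr

    def read(self, name, root=None):
        root_block = self.blocks[self.root if root is None else root]
        return self.blocks[root_block[name]]

    def transaction(self, updates, crash_before_commit=False):
        # 1. Write every new data block to a fresh address.
        new_root = dict(self.blocks[self.root])
        for name, data in updates.items():
            new_root[name] = self._alloc(data)
        # 2. Write a new root block that references the new data.
        new_root_addr = self._alloc(new_root)
        if crash_before_commit:
            return  # simulated power loss: self.root still names the old tree
        # 3. Commit: one pointer update makes all the changes visible at once.
        self.root = new_root_addr


store = CowStore()
store.transaction({"a.txt": b"v1", "b.txt": b"v1"})
store.transaction({"a.txt": b"v2", "b.txt": b"v2"}, crash_before_commit=True)
print(store.read("a.txt"), store.read("b.txt"))  # b'v1' b'v1'
```

The interrupted transaction left orphaned blocks behind but never touched the committed tree, which is why no repair pass is needed afterward.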
Checksums and Self-Healing Data
With ZFS, all data and metadata is checksummed using a user-selectable algorithm. Traditional
file systems that do provide checksumming have performed it on a per-block basis,
out of necessity due to the volume management layer and traditional file system
design. The traditional design means that certain failure modes, such as writing a
complete block to an incorrect location, can result in properly checksummed data that
is actually incorrect. ZFS stores each block's checksum in the parent block pointer that
references it, rather than in the block itself, so these failure modes are detected and can
be recovered from gracefully. All checksumming and
data recovery is done at the file system layer, and is transparent to
applications.
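A minimal Python sketch of one way to catch such failures: the checksum of each block is kept in the parent pointer that references it, rather than with the block. It assumes SHA-256 as the selected algorithm (one of the options ZFS offers) and a dictionary standing in for the disk; `BlockPointer`, `write_block`, and `read_block` are hypothetical names. A misdirected write that lands a valid, self-consistent block at the wrong address is caught, because the parent's recorded checksum no longer matches what is stored there.

```python
import hashlib

def checksum(data: bytes) -> str:
    # SHA-256 stands in for whichever algorithm the pool is configured to use.
    return hashlib.sha256(data).hexdigest()

class BlockPointer:
    """Held by the parent: the child's address plus the child's checksum."""
    def __init__(self, addr: int, data: bytes):
        self.addr = addr
        self.cksum = checksum(data)

disk = {}  # toy "disk": address -> bytes

def write_block(addr: int, data: bytes) -> BlockPointer:
    disk[addr] = data
    return BlockPointer(addr, data)

def read_block(bp: BlockPointer) -> bytes:
    data = disk[bp.addr]
    # Verify against the checksum recorded in the parent, not in the block.
    if checksum(data) != bp.cksum:
        raise OSError(f"checksum mismatch at address {bp.addr}")
    return data


bp_a = write_block(10, b"block A, version 1")
bp_b = write_block(20, b"block B")
# Misdirected write: the new version of A lands at B's address.
disk[20] = b"block A, version 2"
print(read_block(bp_a))        # b'block A, version 1'
try:
    read_block(bp_b)
except OSError as err:
    print(err)                 # checksum mismatch at address 20
```

Had the checksum been stored alongside the block at address 20, it would have described "block A, version 2" and verified cleanly, which is exactly the failure mode described above.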
In addition, ZFS provides for self-healing data. ZFS supports storage pools with varying
levels of data redundancy, including mirroring and a variation on RAID-5 called RAID-Z. When a
bad data block is detected, ZFS fetches the correct data from another redundant
copy, and repairs the bad data, replacing it with the good copy.
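Self-healing on a mirror can be sketched as follows. This is an illustrative Python model, not ZFS internals: each mirror side is a dictionary, `mirrored_read` is a hypothetical helper, and the expected checksum is assumed to come from the parent block pointer. It covers mirroring only; the RAID-5-style parity reconstruction is not modeled.

```python
import hashlib

def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def mirrored_read(sides, addr, expected):
    """Read one block from an n-way mirror and repair any bad copies.

    sides:    list of dicts, one per mirror side (address -> bytes)
    expected: checksum recorded by the parent block pointer
    Returns (good_data, number_of_copies_repaired).
    """
    good = None
    for side in sides:
        data = side.get(addr)
        if data is not None and checksum(data) == expected:
            good = data
            break
    if good is None:
        raise OSError(f"no valid copy of block {addr} on any side")
    # Self-heal: rewrite every missing or corrupt copy with the verified one.
    repaired = 0
    for side in sides:
        data = side.get(addr)
        if data is None or checksum(data) != expected:
            side[addr] = good
            repaired += 1
    return good, repaired


original = b"important data"
disk0 = {7: b"imp0rtant data"}   # silently corrupted copy
disk1 = {7: original}
value, fixed = mirrored_read([disk0, disk1], 7, checksum(original))
print(value, fixed)              # b'important data' 1
print(disk0[7] == original)      # True
```

The application only ever sees the verified data; the repair of the corrupt side happens as a side effect of the read.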
Unparalleled Scalability
ZFS has been designed from the ground up to be the most scalable file system ever. The
file system itself is 128-bit, so it can address 2^128 bytes of storage, a figure often quoted
as 256 quadrillion zettabytes (counting a quadrillion as 2^50 and a zettabyte as 2^70). All
metadata is allocated dynamically, so there is no need to preallocate inodes or otherwise limit
the scalability of the file system when it is first created. All the algorithms have been
written with scalability in mind. Directories can have up to 2^48 (about 281 trillion) entries,
and no limit exists on the number of file systems or the number of files that can be
contained within a file system.
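The capacity figures above follow from simple powers of two. This short Python check assumes binary units (a zebibyte, ZiB, is 2^70 bytes) and shows that 2^128 bytes is 2^58 ZiB, which is where "256 quadrillion" comes from if a quadrillion is read as 2^50; in decimal terms it is about 2.9 × 10^17 ZiB. It also prints the exact value of 2^48, the directory-entry limit.

```python
# Binary units assumed throughout: 1 ZiB = 2**70 bytes.
ZIB = 2**70
max_bytes = 2**128                  # 128-bit byte addressing
max_zib = max_bytes // ZIB          # capacity expressed in zebibytes
dir_entries = 2**48                 # per-directory entry limit

print(max_zib == 2**58)             # True
print(max_zib // 2**50)             # 256, i.e. "256 quadrillion" if a quadrillion is 2**50
print(f"{max_zib:.3e}")             # 2.882e+17 (decimal count of ZiB)
print(f"{dir_entries:,}")           # 281,474,976,710,656
```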
ZFS Snapshots
A snapshot is a read-only copy of a file system or volume. Snapshots
can be created quickly and easily. Initially, snapshots consume no additional space within
the pool.
As data within the active dataset changes, the snapshot consumes space by continuing
to reference the old data. As a result, the snapshot prevents the data
from being freed back to the pool.
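The space accounting for snapshots can be sketched with a toy block map. This Python model is an illustration under stated assumptions, not the ZFS on-disk format: `Dataset`, `snapshot`, and `snapshot_unique_blocks` are hypothetical, every write allocates a fresh block id (copy-on-write), and a snapshot copies only the map of block references. Blocks referenced by the snapshot but no longer by the live file system (or any other snapshot) are the space the snapshot holds.

```python
class Dataset:
    """Toy dataset whose snapshots share blocks with the live file system."""

    def __init__(self):
        self.live = {}         # file name -> block id currently in use
        self.snapshots = {}    # snapshot name -> frozen copy of the block map
        self._next_block = 0

    def write(self, name):
        # Copy-on-write: every write goes to a newly allocated block id.
        self.live[name] = self._next_block
        self._next_block += 1

    def snapshot(self, snap):
        # Only the map of block references is copied, no data blocks.
        self.snapshots[snap] = dict(self.live)

    def snapshot_unique_blocks(self, snap):
        # Blocks only this snapshot still references: space it keeps from being freed.
        others = set(self.live.values())
        for other, blocks in self.snapshots.items():
            if other != snap:
                others |= set(blocks.values())
        return set(self.snapshots[snap].values()) - others


ds = Dataset()
ds.write("a")                                # block 0
ds.write("b")                                # block 1
ds.snapshot("monday")
print(ds.snapshot_unique_blocks("monday"))   # set()  (no extra space yet)
ds.write("a")                                # "a" moves to block 2
print(ds.snapshot_unique_blocks("monday"))   # {0}    (old "a" is pinned)
```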
Simplified Administration
Most importantly, ZFS provides a greatly simplified administration model. Through the use of
a hierarchical file system layout, property inheritance, and automatic management of mount points and NFS
share semantics, ZFS makes it easy to create and manage file systems without
needing multiple commands or editing configuration files. You can easily set quotas or
reservations, turn compression on or off, or manage mount points for numerous file
systems with a single command. Devices can be examined or repaired without having
to understand a separate set of volume manager commands. You can take an
unlimited number of instantaneous snapshots of file systems. You can back up and restore
individual file systems.
ZFS manages file systems through a hierarchy that allows for this simplified management
of properties such as quotas, reservations, compression, and mount points. In this model,
file systems become the central point of control. File systems themselves are very
inexpensive to create (comparable to creating a new directory), so you are encouraged to create a
file system for each user, project, workspace, and so on. This design allows
you to define fine-grained management points.
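Property inheritance through the hierarchy can be modeled in a few lines. This Python sketch uses a hypothetical `FileSystem` class and covers only inheritable properties such as compression; `tank/home` is just an example name. With the real command-line tools, the equivalent of the first `set` call below would be `zfs set compression=on tank/home`. A property set on a parent applies to every descendant unless a descendant overrides it locally.

```python
class FileSystem:
    """Toy node in a file system hierarchy (hypothetical, not a ZFS API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = {}   # properties set directly on this file system

    def set(self, prop, value):
        self.local[prop] = value

    def get(self, prop, default=None):
        # Walk toward the root until an ancestor has set the property locally.
        node = self
        while node is not None:
            if prop in node.local:
                return node.local[prop], node.name   # (value, source)
            node = node.parent
        return default, "default"


home = FileSystem("tank/home")
alice = FileSystem("tank/home/alice", parent=home)
bob = FileSystem("tank/home/bob", parent=home)

home.set("compression", "on")      # one setting covers every user below it
bob.set("compression", "off")      # a fine-grained override for one user
print(alice.get("compression"))    # ('on', 'tank/home')
print(bob.get("compression"))      # ('off', 'tank/home/bob')
```

Because each user has a separate file system, the override for one user is a single local setting rather than an exception carved out of a shared configuration file.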