1.1. Foreword

When migrating from one operating system, such as Microsoft Windows, to another, one thing that will profoundly affect the end user is the difference between their filesystems.

What are filesystems?

A filesystem is the set of methods and data structures that an operating system uses to keep track of files on a disk or partition; that is, the way the files are organized on the disk. The word is also used to refer to a partition or disk that is used to store the files, or to the type of the filesystem. Thus, one might say "I have two filesystems", meaning one has two partitions on which one stores files, or "I am using the extended filesystem", meaning the type of the filesystem.

The difference between a disk or partition and the filesystem it contains is important. A few programs (including, reasonably enough, programs that create filesystems) operate directly on the raw sectors of a disk or partition; if there is an existing filesystem there, it will be destroyed or seriously corrupted. Most programs operate on a filesystem, and therefore won't work on a partition that doesn't contain one (or that contains one of the wrong type).

Before a partition or disk can be used as a filesystem, it needs to be initialized, and the bookkeeping data structures need to be written to the disk. This process is called making a filesystem.

Most UNIX filesystem types have a similar general structure, although the exact details vary quite a bit. The central concepts are superblock, inode, data block, directory block, and indirection block. The superblock contains information about the filesystem as a whole, such as its size (the exact information here depends on the filesystem). An inode contains all information about a file, except its name. The name is stored in the directory, together with the number of the inode. A directory entry consists of a filename and the number of the inode which represents the file. The inode contains the numbers of several data blocks, which are used to store the data in the file. There is space only for a few data block numbers in the inode, however, and if more are needed, more space for pointers to the data blocks is allocated dynamically. These dynamically allocated blocks are indirect blocks; the name indicates that in order to find the data block, one has to find its number in the indirect block first.
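The separation between a file's name (held in the directory) and its inode (which holds everything else) can be observed from user space. The following is a minimal sketch, assuming a filesystem that supports hard links; the filenames and the function name inode_demo are hypothetical and chosen only for illustration. It gives one file two names and checks that both names lead to the same inode number.

```python
import os
import tempfile


def inode_demo():
    """Give one file two names and report what the inode says about it."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "report.txt")      # hypothetical filename
        second = os.path.join(tmp, "same-data.txt")  # hypothetical second name

        with open(first, "w") as handle:
            handle.write("hello\n")

        # A hard link adds a second directory entry that points at the
        # same inode; no file data is copied.
        os.link(first, second)

        info_first = os.stat(first)
        info_second = os.stat(second)
        return {
            "same_inode": info_first.st_ino == info_second.st_ino,
            "link_count": info_first.st_nlink,  # directory entries using this inode
            "size": info_first.st_size,         # stored in the inode, not the directory
        }


if __name__ == "__main__":
    print(inode_demo())
```

Because the name is not part of the inode, neither name is the "real" one: removing either directory entry leaves the data reachable through the other.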

Like UNIX, Linux chooses to have a single hierarchical directory structure. Everything starts from the root directory, represented by /, and then expands into sub-directories instead of having so-called 'drives'. In the Windows environment, one may put one's files almost anywhere: on the C drive, D drive, E drive and so on. In such a layout the hierarchy is managed largely by the programs themselves (program directories), not by the operating system. Linux, on the other hand, sorts directories descending from the root directory / according to their importance to the boot process.

If you're wondering why Linux uses the forward slash / instead of the backslash \ as in Windows, it is simply following the UNIX tradition. Linux, like UNIX, also chooses to be case sensitive. This means that the case of each character, whether capital or not, becomes very important: this is not the same as THIS. This feature accounts for a fairly large proportion of problems for new users, especially during file transfer operations, whether via removable media such as floppy disks or over the wire by way of FTP.
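Case sensitivity is easy to check for yourself. The sketch below is only an illustration; the function name count_case_variants is hypothetical. It creates files named this and THIS in a scratch directory and counts how many directory entries result. On a typical case-sensitive Linux filesystem one would expect two entries; on a case-insensitive filesystem the second name would refer to the first file, leaving one.

```python
import os
import tempfile


def count_case_variants(names=("this", "THIS")):
    """Create one file per name in a scratch directory and count the entries."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            # On a case-insensitive filesystem the second open() would
            # overwrite the first file instead of creating a new one.
            with open(os.path.join(tmp, name), "w") as handle:
                handle.write(name)
        return len(os.listdir(tmp))


if __name__ == "__main__":
    print(count_case_variants())
```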

The filesystem order is specific to the function of a file and not to its program context. (The majority of Linux filesystems are 'Second Extended File Systems', 'EXT2' for short (also known as 'ext2fs' or 'extfs2'), or descendants of it such as ext3; others, such as ReiserFS, are independent designs that present the same hierarchy.) It is within this filesystem that the operating system determines into which directories programs store their files.

If you install a program in Windows, it usually stores most of its files in its own directory structure. A help file, for instance, may be in C:\Program Files\[program-name]\ or in C:\Program Files\[program-name]\help or in C:\Program Files\[program-name]\humpty\dumpty\doo. In Linux, programs put their documentation into /usr/share/doc/[program-name], man(ual) pages into /usr/share/man/man[1-9] and info pages into /usr/share/info. They are merged into and with the system hierarchy.
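Because these locations depend on the kind of file rather than on the program, they can be computed from nothing more than the program name. The following sketch assumes the three locations named above; the function name documentation_paths and the example program name are hypothetical.

```python
import posixpath


def documentation_paths(program, man_section=1):
    """Return the conventional documentation locations for a program.

    Only the three locations mentioned in the text are covered; whether a
    given package actually installs files there is not checked.
    """
    if not 1 <= man_section <= 9:
        raise ValueError("manual sections run from 1 to 9")
    return {
        "doc": posixpath.join("/usr/share/doc", program),
        "man": posixpath.join("/usr/share/man", f"man{man_section}"),
        "info": "/usr/share/info",
    }


if __name__ == "__main__":
    for kind, path in documentation_paths("kermit").items():
        print(kind, path)
```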

As all Linux users know, unless you mount a partition or a device, the system does not know of the existence of that partition or device. This might not appear to be the easiest way to provide access to your partitions or devices; however, it offers far greater flexibility than the approach of other operating systems. This kind of layout, known as the unified filesystem, offers several advantages over the approach that Windows uses. Let's take the example of the /usr directory. This sub-directory of the root directory contains most of the system executables. With the Linux filesystem, you can choose to mount it off another partition, or even off another machine over the network using any of a number of protocols such as NFS (Sun), Coda (CMU) or AFS (IBM). Programs using the directory will not and need not know the difference. The location of the /usr directory is completely transparent: it appears to be a local directory that is part of the local directory structure.
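To see which mounted filesystem actually backs a given path, one can consult the kernel's mount table. The sketch below parses text in the whitespace-separated layout used by /proc/mounts on Linux (device, mount point, filesystem type, and further fields that are ignored here). The sample table, the host name fileserver and the function names are made up for illustration.

```python
import posixpath

# Hypothetical mount table in the /proc/mounts layout:
# device, mount point, filesystem type, options, dump, pass.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
fileserver:/export/usr /usr nfs ro 0 0
/dev/sda2 /home ext3 rw 0 0
"""


def parse_mounts(text):
    """Return (device, mount_point, fs_type) tuples from mount-table text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        entries.append(tuple(fields[:3]))
    return entries


def mount_for(path, entries):
    """Return the entry whose mount point is the longest prefix of path."""
    path = posixpath.normpath(path)
    best = None
    for entry in entries:
        mount_point = entry[1]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[1]):
                best = entry
    return best


if __name__ == "__main__":
    table = parse_mounts(SAMPLE_MOUNTS)
    print(mount_for("/usr/bin/sed", table))
    print(mount_for("/etc/passwd", table))
```

On a running Linux system the same functions could be fed the contents of /proc/mounts instead of the sample string. A program opening /usr/bin/sed does none of this; it simply uses the path, which is the transparency described above.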

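The filesystem standard classifies files along two independent distinctions, shareable versus unshareable and static versus variable. For example: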


 +---------+-----------------+-------------+
 |         | shareable       | unshareable |
 +---------+-----------------+-------------+
 |static   | /usr            | /etc        |
 |         | /opt            | /boot       |
 +---------+-----------------+-------------+
 |variable | /var/mail       | /var/run    |
 |         | /var/spool/news | /var/lock   |
 +---------+-----------------+-------------+

 "Shareable" files are defined as those that can be stored on one host and
 used on others. "Unshareable" files are those that are not shareable. For
 example, the files in user home directories are shareable whereas device
 lock files are not. "Static" files include binaries, libraries,
 documentation files and other files that do not change without system
 administrator intervention. "Variable" files are defined as files that
 are not static.
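The table lends itself to a simple lookup. The sketch below encodes only the eight example directories shown above; the dictionary and function names are hypothetical. It shows how an administrator's script might decide which directories are candidates for read-only mounting (static) or for sharing between hosts (shareable).

```python
# The example directories from the table, as (change behaviour, sharing) pairs.
FHS_EXAMPLES = {
    "/usr": ("static", "shareable"),
    "/opt": ("static", "shareable"),
    "/etc": ("static", "unshareable"),
    "/boot": ("static", "unshareable"),
    "/var/mail": ("variable", "shareable"),
    "/var/spool/news": ("variable", "shareable"),
    "/var/run": ("variable", "unshareable"),
    "/var/lock": ("variable", "unshareable"),
}


def is_static(directory):
    """True if the directory only changes through administrator action."""
    return FHS_EXAMPLES[directory][0] == "static"


def is_shareable(directory):
    """True if the directory can be stored on one host and used on others."""
    return FHS_EXAMPLES[directory][1] == "shareable"


def candidates(static, shareable):
    """List the example directories in one cell of the table."""
    return sorted(
        name for name in FHS_EXAMPLES
        if is_static(name) == static and is_shareable(name) == shareable
    )


if __name__ == "__main__":
    # Static and shareable: could be mounted read-only from a server.
    print(candidates(static=True, shareable=True))
```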

Another reason for this unified filesystem is that Linux, while it is running, caches a lot of disk accesses in system memory to speed them up. It is therefore vitally important that these buffers are flushed (that is, their content is written to disk) before the system shuts down. Otherwise files are left in an undetermined state, which is of course a very bad thing. Flushing is achieved by 'unmounting' the partitions during proper system shutdown. In other words, don't switch your system off while it's running! You may get away with it quite often, since the Linux filesystem is very robust, but you may also wreak havoc upon important files. Just hit ctrl-alt-del or use the proper commands (e.g. shutdown, poweroff, init 0). This shuts the system down in an orderly way and thus guarantees the integrity of your files.
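A single program can also ask for its own data to be flushed without waiting for shutdown. The sketch below is an illustration rather than a recommendation for every write; the function name write_durably is hypothetical. It pushes data first out of the program's buffer and then out of the kernel's cache using os.fsync.

```python
import os
import tempfile


def write_durably(path, text):
    """Write text to path and ask the kernel to flush it to the device."""
    with open(path, "w") as handle:
        handle.write(text)
        handle.flush()             # program buffer -> kernel cache
        os.fsync(handle.fileno())  # kernel cache -> storage device


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "important.txt")  # hypothetical file
        write_durably(target, "saved\n")
        with open(target) as handle:
            print(handle.read(), end="")
```

This protects only the one file. A clean shutdown remains necessary, because it flushes every cached write and unmounts each filesystem in a consistent state.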

Many of us in the Linux community have come to take for granted the existence of excellent books and documents about Linux, an example being those produced by the Linux Documentation Project. We are used to having various packages taken from different sources such as Linux FTP sites and distribution CD-ROMs integrate together smoothly. We have come to accept that we all know where critical files like mount can be found on any machine running Linux. We also take for granted CD-ROM based distributions that can be run directly from the CD and which consume only a small amount of physical hard disk or a RAM disk for some variable files like /etc/passwd, etc. This has not always been the case.

During the adolescent years of Linux in the early to mid-90s, each distributor had his own favorite scheme for locating files in the directory hierarchy. Unfortunately, this caused many problems. The Linux File System Structure is a document which was created to help end this anarchy. Often the group which creates this document, or the document itself, is referred to as the FSSTND. This is short for "file system standard". This document has helped to standardize the layout of file systems on Linux systems everywhere. Since the original release of the standard, most distributors have adopted it in whole or in part, much to the benefit of all Linux users.

Since the first draft of the standard, the FSSTND project has been coordinated by Daniel Quinlan and development of this standard has been through consensus by a group of developers and Linux enthusiasts. The FSSTND group set out to accomplish a number of specific goals. The first goal was to solve a number of problems that existed with the current distributions at the time. Back then, it was not possible to have a shareable /usr partition, there was no clear distinction between /bin and /usr/bin, it was not possible to set up a diskless workstation, and there was just general confusion about what files went where. The second goal was to ensure the continuation of some reasonable compatibility with the de-facto standards already in use in Linux and other UNIX-like operating systems. Finally, the standard had to gain widespread approval by the developers, distributors, and users within the Linux community. Without such support, the standard would be pointless, becoming just another way of laying out the file system.

Fortunately, the FSSTND has succeeded, though there are also some goals that the FSSTND project did not set out to achieve. The FSSTND does not try to emulate the scheme of any specific commercial UNIX operating system (e.g. SunOS, AIX, etc.). Furthermore, for many of the files covered by the FSSTND, the standard does not dictate whether the files should be present, merely where the files should be if they are present. Finally, for most files, the FSSTND does not attempt to dictate the format of the contents of the files. (There are some specific exceptions where several different packages may need to know the file formats to work together properly, for example lock files that contain the process ID of the process holding the lock.) The overall objective was to establish the location where common files could be found, if they existed on a particular machine. The FSSTND project began in early August 1993. Since then, there have been a number of public revisions of this document. The latest, v2.3, was released on January 29, 2004.

If you're asking "What's the purpose of all this?", the answer depends on who you are. If you are a Linux user and you don't administer your own system, then the FSSTND ensures that you will be able to find programs where you'd expect them to be if you've already had experience on another Linux machine. It also ensures that any documentation you may have makes sense. Furthermore, if you've already had some experience with Unix, then the FSSTND shouldn't be too different from what you're currently using, with a few exceptions. Perhaps the most important thing is that the development of a standard brings Linux to a level of maturity that authors and commercial application developers feel they can support.

If you administer your own machine then you gain all the benefits of the FSSTND mentioned above. You may also feel more secure in the ability of others to provide support for you, should you have a problem. Furthermore, periodic upgrades to your system are theoretically easier. Since there is an agreed-upon standard for the locations of files, package maintainers can provide instructions for upgrading that will not leave extra, older files lying around your system occupying valuable disk space. The FSSTND also means that there is more support from those providing source code packages for you to compile and install yourself. The provider knows, for example, where the executable for sed is to be found on a Linux machine and can use that in his installation scripts or Makefiles.

If you run a large network, the FSSTND may ease many of your NFS headaches, since it specifically addresses the problems which formerly made shared implementations of /usr impractical.

If you are a distributor, then you will be affected most by the Linux FSSTND. You may have to do a little extra work to make sure that your distribution is FSSTND-compliant, but your users (and hence your business) will gain by it. If your system is compliant, third party add-on packages (and possibly your own) will integrate smoothly with your system. Your users will, of course, gain all the benefits listed above, and many of your support headaches will be eased. You will benefit from all the discussion and thought that has been put into the FSSTND and avoid many of the pitfalls involved in designing a filesystem structure yourself. If you adhere to the FSSTND, you will also be able to take advantage of various features that the FSSTND was designed around. For example, the FSSTND makes possible "live" CD-ROMs containing everything except some of the files in the / and /var directories.

If you write documentation for Linux, the FSSTND makes it much easier to write material that makes sense to the whole Linux community. You no longer need to worry about the specific location of lock files on one distribution versus another, nor are you forced to write documentation that is only useful to the users of a specific distribution. The FSSTND is at least partly responsible for the recent explosion of Linux books being published.

If you are a developer, the existence of the FSSTND greatly reduces the potential for problems. You know where important system binaries are found, so you can use them from inside your programs or your shell scripts. Supporting users is also much easier, since you don't have to worry about things like the location of these binaries when resolving support issues. If you are the developer of a program that needs to integrate with the rest of the system, the FSSTND lets you be certain of the steps needed to do so. For example, applications such as kermit, which access the serial ports, need to know they can achieve exclusive access to the TTY device. The FSSTND specifies a common method of doing this so that all compliant applications can work together. That way you can concentrate on making more great software for Linux instead of worrying about how to detect and deal with the differences between flavors of Linux.

The widespread acceptance of the FSSTND by the Linux community has been crucial to the success of both the standard and the operating system. Nearly every modern distribution conforms to the Linux FSSTND. If your implementation isn't at least partially FSSTND compliant, then it is probably either very old or you built it yourself. The FSSTND itself contains a list of some of the distributions that aim to conform to the FSSTND. However, some distributions are known to cut corners in their implementation of the FSSTND.
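The common method for exclusive access to a TTY device mentioned above is a lock file. As we understand the convention, the lock lives in /var/lock, is named LCK.. followed by the base name of the device, and contains the holder's process ID as a ten-character, right-aligned decimal number followed by a newline (the HDB UUCP format). The sketch below follows that convention but writes into a directory supplied by the caller, so it can be tried without touching /var/lock; the function names are hypothetical.

```python
import os
import tempfile


def lock_file_name(device):
    """Return the lock file name for a device, e.g. /dev/ttyS0 -> LCK..ttyS0."""
    return "LCK.." + os.path.basename(device)


def lock_file_contents(pid):
    """Format a PID as ten right-aligned characters plus a newline."""
    return f"{pid:>10d}\n"


def acquire_lock(lock_dir, device, pid=None):
    """Try to create the lock file; return its path, or None if already locked.

    lock_dir would normally be /var/lock; it is a parameter here so the
    sketch can run in a scratch directory.
    """
    pid = os.getpid() if pid is None else pid
    path = os.path.join(lock_dir, lock_file_name(device))
    try:
        # O_EXCL makes creation fail if another process got there first.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as handle:
        handle.write(lock_file_contents(pid))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(acquire_lock(scratch, "/dev/ttyS0"))  # path of the new lock
        print(acquire_lock(scratch, "/dev/ttyS0"))  # None: already locked
```

A real application would also need to detect stale locks, for example by checking whether the recorded process still exists, and to remove its lock file when it is finished with the device; both are omitted here.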

This by no means implies that the standard itself is complete. There are still unresolved issues, such as the organization of architecture-independent scripts and data files in /usr/share. Up until now, the i386 has been the primary platform for Linux, so the need for standardization of such files was non-existent.

The rapid progress in porting Linux to other architectures (MC680x0, Alpha, MIPS, PowerPC) suggests that this issue will soon need to be dealt with. Another issue that is under some discussion is the creation of an /opt directory as in SVR4. The goal for such a directory would be to provide a location for large commercial or third party packages to install themselves without worrying about the requirements made by FSSTND for the other directory hierarchies. The FSSTND provides the Linux community with an excellent reference document and has proven to be an important factor in the maturation of Linux. As Linux continues to evolve, so will the FSSTND.

Now that we have seen how things should be, let's take a look at the real world. As you will see, the implementation of this concept on Linux isn't perfect, and since Linux has always attracted individualists who tend to be fairly opinionated, questions such as which directories certain files should be put into have been a bone of contention among users. With the arrival of different distributions, anarchy has once again descended upon us. Some distributions put mount directories for external media into the / directory, others into /mnt. Red Hat based distributions feature the /etc/sysconfig sub-hierarchy for configuration files concerning input and network devices. Other distributions do not have this directory at all and put the appropriate files elsewhere, or even use completely different mechanisms to do the same thing. Some distributions put KDE into /opt/, others into /usr.

But even within a given file system hierarchy, there are inconsistencies. For example, even though this was never the intention of the XFree86 group, XFree86 does indeed have its own directory hierarchy.

These problems don't manifest themselves as long as you compile programs yourself. You can adapt configure scripts or Makefiles to your system's configuration or to your preference. It's a different story if you install pre-compiled packages like RPMs, though. Often these are not adaptable from one file system hierarchy to another. What's worse, some RPMs might even create their own hierarchy. If you, say, install a KDE RPM from the SuSE Linux distribution on your Mandrake system, the binary will be put into /opt/kde2/bin, and thus it won't work, because Mandrake expects it to be in /usr/bin. There are of course ways to circumvent this problem, but the current situation is clearly untenable. Thus, all the leading Linux distributors have joined the Linux Standard Base project, which is attempting to create a common standard for Linux distributions. This isn't easy, since changing the file system hierarchy means a lot of work for distributors, so every distributor tries to push a standard which will allow them to keep as much of their own hierarchy as possible. The LSB will also encompass the proposals made by the Filesystem Hierarchy Standard project (FHS, formerly FSSTND).