
Unix for Advanced Users

17. Light Sysadmin Tasks

17.3. Checking and Fixing Filesystem Errors

17.3.1. How do filesystems sustain damage?

When a machine loses power without being properly shut down, data can be lost. On a multi-user machine, some data loss is almost certain. The reason is that, for efficiency's sake, not every change to a file is written directly to the file itself. Instead, changes are recorded in memory buffers that temporarily hold part of the file's contents. That way small changes can be collected in the buffer and written to disk all at once, sidestepping the slowness of disk I/O. The act of writing a buffer's contents to disk is called flushing or syncing. When a machine goes down without syncing, the changes still held in memory are lost.
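The buffering described above can be seen from a program's point of view. The following is a minimal sketch in Python, assuming a Unix-like system; the function name write_durably and the file name demo.txt are made up for illustration. It writes a small change, then explicitly flushes and syncs it so that it would survive a sudden power loss.

```python
import os
import tempfile


def write_durably(path, text):
    """Write text to path and ask the OS to push it to disk before returning."""
    with open(path, "w") as f:
        f.write(text)          # the change sits in the program's own buffer
        f.flush()              # hand the buffer over to the kernel's buffers
        os.fsync(f.fileno())   # ask the kernel to write its buffers to disk
    return os.path.getsize(path)


# Hypothetical demo file in a fresh temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
bytes_on_disk = write_durably(demo_path, "small change\n")
```

Without the flush and fsync calls the write would still normally reach the disk eventually, but only when the buffers happen to be flushed; a crash before that point would lose it.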

17.3.2. What is fsck?

Fortunately, it is at least possible to deduce which files were open when the machine went down and restore the filesystem to a consistent state. The command that does this work is called fsck (short for "filesystem check"). Since each Unix operating system supports multiple filesystem types, fsck is usually a front end for filesystem-specific programs. Because filesystems vary in their features, those back-end programs vary in their options. On operating systems where this is the case, fsck accepts filesystem-specific options and passes them on to the back ends.
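To make the front-end idea concrete, here is a rough sketch in Python of how such a dispatcher might choose a back end. It assumes the naming convention used on Linux systems, where the checker for a filesystem type is an executable named fsck.TYPE (for example fsck.ext4); other Unix implementations name and locate their back ends differently. The function names backend_name and find_backend are invented for this illustration, and nothing here actually runs a check.

```python
import shutil


def backend_name(fstype):
    """Return the back-end program name for a filesystem type.

    Assumes the Linux convention of fsck.<type>; this is not universal.
    """
    return "fsck." + fstype


def find_backend(fstype):
    """Return the full path of the back end, or None if it is not installed."""
    return shutil.which(backend_name(fstype))


ext4_checker = backend_name("ext4")
```

On a given machine, find_backend("ext4") returns a path only if that checker is installed and on the search path, which is one reason the options fsck accepts differ from system to system.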

In addition, fsck syntax varies among Unix implementations. However, the semantics are fairly consistent. The main useful options to fsck allow you to

17.3.3. Running fsck

fsck performs low-level operations on a disk, treating it as a raw device rather than a filesystem (i.e. a character device rather than a block device). For that reason, it is necessary to unmount a filesystem before checking it. Otherwise, ongoing high-level filesystem activity would cause the memory buffers to drift out of sync with the disk, recreating the very problem that required you to run fsck in the first place.
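Before running fsck it is worth confirming that the filesystem really is unmounted. The sketch below, in Python, parses a mount table in the format of Linux's /proc/mounts (device, mount point, type, options, and two numeric fields per line). The file location is Linux-specific, and the device names and the function names mounted_devices and safe_to_check are hypothetical; other systems expose the mount table differently, typically through the mount command.

```python
def mounted_devices(mounts_text):
    """Return {device: mount_point} parsed from /proc/mounts-style text."""
    table = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1]
    return table


def safe_to_check(device, mounts_text):
    """True if the device does not appear in the mount table."""
    return device not in mounted_devices(mounts_text)


# Hypothetical mount table; on Linux, read the real one from /proc/mounts.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sda2 /home ext4 rw,relatime 0 0
"""

home_is_safe = safe_to_check("/dev/sda2", SAMPLE)
spare_is_safe = safe_to_check("/dev/sdb1", SAMPLE)
```

In this sample, /dev/sda2 is still mounted on /home and would have to be unmounted first, while /dev/sdb1 does not appear in the table at all.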

The root (/) filesystem cannot be unmounted, since it contains the fsck executable itself. This is one reason to keep root filesystems small: it reduces the chance that they will sustain damage and need to be repaired. If the root filesystem does sustain damage, it must be repaired in one of two ways: either by booting the system in single-user mode, to ensure that there will be no filesystem activity other than the work of fsck itself, or by booting from emergency media that carry their own root image and another copy of the fsck executable, so that the damaged root filesystem can be checked while unmounted. The first method will fail if the fsck executable itself is damaged, so the second method is preferable.
