These classnotes are depreciated. As of 2005, I no longer teach the classes. Notes will remain online for legacy purposes

UNIX01/Understanding UNIX Files

Classnotes | UNIX01 | RecentChanges | Preferences

When you look at a hard disk under Unix, you see a tree of named directories and files. Normally you won't need to look any deeper than that, but it does become useful to know what's going on underneath if you have a disk crash and need to try to salvage files.

Low-level disk and file system structure

Unix divides the disk into disk partitions. Each partition is a continuous span of blocks that's used separately from any other partition, either as a file system or as swap space. The original reasons for partitions had to do with crash recovery in a world of much slower and more error-prone disks; the boundaries between them reduce the fraction of your disk likely to become inaccessible or corrupted by a random bad spot on the disk. The lowest-numbered partition on a disk is often treated specially, as a boot partition where you can put a kernel to be booted.

Each partition is either swap space (used to implement virtual memory) or a file system used to hold files. Swap-space partitions are just treated as a linear sequence of blocks. File systems, on the other hand, need a way to map file names to sequences of disk blocks. Because files grow, shrink, and change over time, a file's data blocks will not be a linear sequence but may be scattered all over its partition (from wherever the operating system can find a free block when it needs one). This scattering effect is called fragmentation, and is not as much of a problem under systems like Linux.

File names and directories

Within each file system, the mapping from names to blocks is handled through a structure called an i-node. There's a pool of these things near the ``bottom'' (lowest-numbered blocks) of each file system. Each i-node describes one file. File data blocks (including directories) live above the i-nodes (in higher-numbered blocks).

Every i-node contains a list of the disk block numbers in the file it describes. (Actually this is a half-truth, only correct for small files, but the rest of the details aren't important here.) Note that the i-node does not contain the name of the file.

Names of files live in directory structures. A directory structure just maps names to i-node numbers. This is why, in Unix, a file can have multiple true names (or hard links); they're just multiple directory entries that happen to point to the same i-node.

Mount points

In the simplest case, your entire Unix file system lives in just one disk partition. While you'll see this arrangement on some small personal Unix systems, it's unusual. More typical is for it to be spread across several disk partitions, possibly on different physical disks. So, for example, your system may have one small partition where the kernel lives, a slightly larger one where OS utilities live, and a much bigger one where user home directories live.

The only partition you'll have access to immediately after system boot is your root partition, which is (almost always) the one you booted from. It holds the root directory of the file system, the top node from which everything else hangs.

The other partitions in the system have to be attached to this root in order for your entire, multiple-partition file system to be accessible. About midway through the boot process, your Unix will make these non-root partitions accessible. It will mount each one onto a directory on the root partition.

For example, if you have a Unix directory called `/usr', it is probably a mount point to a partition that contains many programs installed with your Unix but not required during initial boot.

How a file gets looked up

Now we can look at the file system from the top down. When you open a file (such as, say, /home/esr/WWW/ldp/fundamentals.sgml) here is what happens:

Your kernel starts at the root of your Unix file system (in the root partition). It looks for a directory there called `home'. Usually `home' is a mount point to a large user partition elsewhere, so it will go there. In the top-level directory structure of that user partition, it will look for a entry called `esr' and extract an i-node number. It will go to that i-node, notice that its associated file data blocks are a directory structure, and look up `WWW'. Extracting that i-node, it will go to the corresponding subdirectory and look up `ldp'. That will take it to yet another directory i-node. Opening that one, it will find an i-node number for `fundamentals.sgml'. That i-node is not a directory, but instead holds the list of disk blocks associated with the file.



Classnotes | UNIX01 | RecentChanges | Preferences
This page is read-only | View other revisions
Last edited July 19, 2003 3:35 am (diff)
Search:
(C) Copyright 2003 Samuel Hart
Creative Commons License
This work is licensed under a Creative Commons License.