C SC 340 Lecture 10: File Systems
Part One: File Systems Interface
Introduction
A file is the logical unit of secondary storage. It is a named collection
of related information on secondary storage. Secondary storage includes most
magnetic and optical media such as disks, CDs, and tapes.
We will focus on disk storage. All user data on secondary storage must reside
in a file. The OS also uses secondary storage for process/memory/storage management
such as process swapping space. We will focus on user files.
A file system is the logical collection of files. We call it logical because a single device may contain multiple
file systems (e.g. partitions); likewise a single file system may occupy multiple devices (e.g. Unix device mounting).
File and File System Management Issues
We will cover a number of issues relevant to files and file systems. First we focus on file issues:
- types and internal structure (from an OS standpoint),
- data attributes,
- operations,
- access methods, and
- protection.
Then we introduce directories
as special types of files and revisit the same issues for them. Finally, we look at file systems, in particular file system
organization and storage allocation.
The OS View of File Types
Files of different types are created by different programs: word processors, compilers, etc.
Should the OS recognize file types as a service to the
user?
- Windows philosophy: use filename extensions, a period
followed by one or more characters. The user (or software) can associate a
filename extension with an application.
- Unix philosophy:
categorize files simply as being either text or binary (anything other than text). Not very
fine grained. Unix relies on file contents, not the name, to yield clues to the file's type, but only
very crudely: executable files contain a magic number which the OS can detect.
- Macintosh philosophy: places the name of the associated application directly into the file.
The OS View of File Structure
A file's internal structure depends on the program that created it.
A similar question applies here: Should the OS know anything about a file's structure?
Process analogy (we will do this frequently): A process has a particular
data structure including text segment, data segment, stack segment, and process
management variables. The OS knows a lot about the process structure! Segmented
memory management techniques organize the process in memory based on its segmented
structure.
- The file management equivalent of pages/frames is blocks, the unit
of disk storage. The OS may see the file as merely a collection of blocks
without regard to its internal structure.
- Unix philosophy: a file is a byte stream, an unstructured
sequence of bytes. It is the responsibility of the application to impose structure
on the raw bytes. This philosophy is reflected in the basic I/O system calls
read() and write(), which simply read/write the specified
number of bytes starting at the specified memory location (buffer).
- Alternative philosophy: OS uses knowledge of certain internal file
structures to improve performance of file operations in the applications that
use them. This ties the OS to the application, requiring such OS support to
be installed when a new application is installed.
File Attributes (data)
Certain file management information must be stored with the file. It is not normally
stored in the file itself, but in a directory structure. The kinds of
data stored include name, type (sometimes), disk location
(for OS use), size, protection, ownership, usage
times (creation, last modification, last use).
File Operations
The OS provides system calls through which programmers can work with files. Typical operations
include create, delete, rename, execute, open, close,
read, write, append, seek, and truncate.
The typical life-cycle of a file goes something like this:
- create, open, write, write, write, ... , close
- open, read, write, read, write, ... , close
- delete
Every read and write operation makes use of the current-file-position pointer.
- OS variable associated with an open file
- indicates where the next read/write operation will take place.
- initialized when the file is opened, to either the beginning or the end
(if opened for appending) of the file.
- updated with each read/write operation.
- can be manipulated by the programmer through the seek operation listed above.
The Unix system calls for basic file I/O are creat(), open(),
read(), write(), lseek(), and close(). Open
is the most complex of these, with options for read-only, write-only, read-write,
append, truncate, create-if-not-found, and more.
When a file is opened, an entry is added to an OS open-file table.
- programmer is returned a table entry pointer or similar data structure as
its handle for subsequent file access.
- A multiuser system maintains both a system-wide table and a per-process
table.
- system-wide table permits sharable files to occupy only one table entry
when opened by multiple processes.
- per-process table entry is deleted when the file is closed; the system-wide
table entry is not deleted until all processes using the file have closed
it (keep a file open count incremented upon open and decremented
upon close).
- Because every open file requires one or two OS table entries, it is important
for system performance that programmers close files when finished with them.
The OS limits the number of open files a process can have. To test this limit in Unix, write a program
containing an infinite loop that opens a file without closing it. The open() system call
returns an error value (-1) when your limit is reached.
File Access Methods
File data are accessed either sequentially or directly (randomly).
- Sequential is based on the tape concept: fixed read/write head and linear
media moving past it.
- Direct is based on the disk concept: movable read/write head and rotating
media.
- Either access method works well on physical disks; the direct access method
does not work well on physical tapes.
Addresses in a file may be specified in different units: records, blocks, or bytes.
- File addresses are numbered sequentially starting at 0.
- This is a logical address because it is the same regardless of
where the file is physically stored.
- Every location is thus relative to the beginning of the file.
- Process analogy: process logical memory address space, which starts at 0
regardless of where it will be physically loaded into memory.
Directory Structures
Mere mortal users would not be able to effectively manage files were they not organized by partitions
and directories (called folders in the Windows world).
The top level of organization is partitions, which are logical storage
devices. A physical device may contain one or more partitions; a partition may cover more than one
physical device. Partitions are sometimes also called volumes.
Each partition contains a device directory, a table with information about
all the files it contains. We will also assume the directory can contain other directories, called
subdirectories. The directory entry is where file attributes are stored.
Like files, directories have certain defined structures, attributes and operations.
Unlike a regular file, it is essential for the OS to be familiar with the internal
structure of a directory (its table). Operations such as creating, deleting,
reading and writing are applied to a directory but they must be implemented
in a special way.
Consider directory deletion. Deleting a non-empty directory requires OS policy -- should the files it contains be deleted too? If not,
who should adopt them? Windows deletes them; Unix allows the deletion of a non-empty directory but only through
a special command switch which deletes both the directory and its files;
DOS prohibits the deletion of a non-empty directory.
Directory operations include:
- create a file in the directory
- delete a file from the directory
- rename a file in the directory
- search a directory for a given file
- list a directory's contents
- traverse the file system
We assume the directories in a file system form a tree structure. This means the file system has a
single root directory which contains files and subdirectories, and the subdirectories in turn
contain other files and other subdirectories. Any given file or subdirectory is contained in exactly one
directory. A sketch that shows directories and folders as nodes and the "contains" relationship as links would
thus resemble a tree.
While working with a file system, each user has a current directory, which contains the files of current
interest.
The location of
every file can be specified using a pathname which lists each directory in the unique path through the
tree to its location. The pathname can either be relative starting from the current directory or
absolute starting from the root directory.
The user can traverse the file system using appropriate commands or actions. If commands are used, the
absolute or relative pathname of the destination directory can be specified directly.
The problem with tree-structured directories is they do not allow files to be shared among multiple
directories. Such shared files would allow all members of a group project to have shared files listed in
directories that they individually own. A file system that allows such sharing forms an acyclic graph
instead of a tree -- there can be more than one directory path to a given file.
Such sharing is implemented through links. There is one physical copy of the file plus one
or more links to it. There are two approaches to implementing links:
- A symbolic link (aka soft link) implements a link by storing the pathname of the shared file
in the directory table.
- A non-symbolic link (aka hard link) implements a link by
duplicating the directory table entry of the shared file.
Each approach has its issues, particularly where file movements and deletions are concerned.
- If the shared
file is renamed or moved to a different directory, its physical location on disk remains the same and a
hard link remains valid. A soft link would become invalid since the file pathname is changed.
- Deletion is
tricky too. If the shared file is deleted, all soft links to it become invalid. Deletion of a shared file
with hard links works OK if the file has an associated reference count, a count of the number of hard
links to it. If one user "deletes" the file, the reference count is decremented. The actual file is deleted only
if the reference count is reduced to 0.
File System Mounting
One of the OS responsibilities in maintaining a file system is mounting, which maps a physical
device to one or more logical partitions (file system entities). This is necessary before a user process can reference
the partition. Mounting is done at boot time and may also be done "on the fly" as
devices are attached and removed.
- Windows mounts devices to drive letters (A, C, D, etc) and presents the collection of logical drives as
its file system. Note that devices may be attached via network connections and mapped to logical drives.
- Macintosh searches devices for the presence of file system structures, including the file system name,
and places a corresponding icon on the desktop. The file systems are then referenced by name.
- Unix maintains a single file system with a root directory called "/" (slash). A device is mapped to a file
name using the mount command. Device files are normally found in the "/dev" directory. Thus individual
devices do not have a "root identity" as they do in Windows or Mac.
File Protection
The protection of files refers to assuring they cannot be "improperly" accessed. This includes
assuring that only authorized users may access the file, and assuring that only authorized operations can
be performed. The term controlled access refers to allowing access to some users but not others and
allowing some operations but not others.
Access controls can be defined for any of the file operations listed above.
Limiting access to authorized users is accomplished by defining an access
list for each file that the OS checks before allowing access. Two approaches
to implementing the access list are:
- a list of users, and permitted access for each user (access control list, or ACL)
- a fixed list of user categories with permitted access for each category
The first approach is very precise but is variable length and slow to use and maintain. The
second approach is less precise but is fixed length and quick to use and maintain.
File Access Control in Windows
To see Windows file protection, right-click on a file icon and select "Properties". Check the
resulting window for the "General" and "Security" tabs. The General tab implements some protections, such
as read-only access. The Security tab details which users or groups of users are allowed access, and what access is
allowed for them.
File Access Control in Unix/Linux
To see Unix/Linux protection, type the "ls -l" command to produce a detailed listing of files in the
current directory. The leftmost column will contain a string of 10 characters consisting of the following
characters: -, r, w, x, d. The string actually represents 4 groups of information:
- The first character indicates what kind of file it is: - for regular file, d for directory, and a
couple others more rarely seen such as l for link and s for socket.
- The next 3 characters represent access rights for the file owner. Every file has an owner ID as one of its attributes.
There are 3 kinds of
access: read, write, and execute. The first of the three characters indicates whether read access is allowed
(r) or not (-). The second indicates whether write access is allowed (w) or not (-). The third
indicates whether execute access is allowed (x) or not (-).
- The next 3 characters represent access rights for the file group. A named group containing a list of
its members is separately maintained. Every file has a group ID as one of its attributes. The 3 access rights
are defined the same as for owner.
- The last 3 characters represent access rights for everyone else, the world. The 3 access rights
are defined the same as for owner.
For example, a Unix file with protection code "-rw-r--r--" is a regular file to
which everyone has read access but only the owner has write access. A Unix file
with protection code "-rwx--x--x" is a regular file which anyone can execute (it
is an executable program or script), but only the owner can read or write it.
File protection values are set when the file is created, and changed by an authorized
user with the chmod (change mode) command or system call.
Part Two: File Systems Implementation
Introduction
OSs provide consistent control and access to secondary storage through a file system. Design
of a file system involves multiple layers of concern, here listed from the top down:
- logical file system which is the API
- file-organization module which maps logical file structures to physical ones.
- basic file system which interacts with device drivers to request operations on physical addresses
- device drivers and interrupt handlers to push bits between the device and memory
- storage devices themselves
Application programs then interface with the API to use the file system. Examples
include Unix command shells, Windows Explorer, word processors, compilers.
We have already covered a significant portion of the file system API: operations
for files and directories.
File System Storage Allocation
File space is allocated in logical units called blocks. These are ultimately translated into
physical storage locations. In the case of disks, the physical location of a block is
an ordered triple: < cylinder, track, sector >. The logical-physical translation
is non-trivial. We'll limit our discussion to logical blocks, which are typically 512 or 1024 bytes long.
Assume that disk blocks, like file blocks, are a linear resource numbered sequentially starting at 0.
This is analogous to memory frames and process pages; both are numbered sequentially starting at 0.
There are 3 basic strategies for allocating disk blocks to files:
- contiguous allocation, which allocates consecutively-numbered disk blocks.
- linked allocation, which allocates non-consecutive blocks and organizes them into a linked list.
- indexed allocation, a variation of linked allocation in which the links are organized into
a single per-file structure called the index block.
Contiguous allocation
Contiguous allocation has the same properties, advantages and disadvantages as its memory counterpart.
Overhead is low (directory entry need only store starting block and length) and both sequential and
direct access are fast
but finding the right hole (e.g. first fit, best fit) is expensive and external fragmentation results.
Files that dynamically grow are also bothersome.
Linked allocation
Any available disk block can be allocated to a file block, but there is additional overhead. The directory
entry need only store the starting block, but each data block must contain a link to the next block. This
means a file block holds slightly less data than a disk block, since the link occupies part of the block.
The link overhead can be reduced by allocating blocks in multi-block clusters. This however
increases internal fragmentation.
Linked allocation supports sequential access well (traversing the linked list) but is dismal for
direct access (reduces almost to the mag tape model).
Both the link overhead and direct access problems can be reduced by collecting all links into a
single file allocation table (FAT) stored at the beginning of a partition. The FAT contains
one entry per disk block and is indexed by disk block number. Each entry contains the link that would
otherwise occupy part of the disk block. If the FAT is cached in main memory, direct access becomes
considerably faster.
Indexed allocation
Any available disk block can be allocated to a file block. Instead of organizing the blocks into a linked
list or collecting the list pointers into a per-partition FAT, the indexed organization collects the list pointers
into a per-file table, indexed by the file's block number, called the index block. Each entry
contains the corresponding disk block number. A pointer to the index block is kept in the file's directory
entry. This is analogous to a process page table.
This solves the direct access problem of linked allocation, since the index block entry of any file block
can be reached in one access.
The problem with indexed allocation is how to organize the index block -- the number of entries needed
is the same as the file length in blocks. But a file
can be as small as one block or as large as the partition itself! This is analogous to the page table
size problem. Here are approaches to that problem:
- start with an index block which is one block long. When it is full, allocate another block for the
index block and link to it. Thus the index block is a linked list of blocks. Access to blocks
toward the end of the file is much slower than to those toward the beginning.
- Organize it in two or more levels with indirection. Each entry in the primary index block points to a secondary
level index block. This is analogous to multi-level page table organization. Access to any disk block
requires two (or however many levels there are) steps.
- Combine direct and indirect indexing. This is best illustrated by Unix and described below.
Unix combines direct and indirect indexing. The index block is contained in the
inode (short for index node). The inode contains:
- 12 (or so) entries that point directly to disk blocks. For a file which is 12 or fewer blocks long,
this is all that's needed.
- one entry that points to a single indirect block, which is a second level index block.
- one entry that points to a double indirect block, which is an index block containing
pointers to single indirect blocks.
- one entry that points to a triple indirect block, which is an index block containing
pointers to double indirect blocks.
- If you assume that blocks are 1024 (1K) bytes and block numbers are 4 bytes each, then a block
number can address about 4.2 billion (2^32) disk blocks, so the file system itself can span up to
4 terabytes. The maximum size of a single file is limited by the index structure instead:
12 + 256 + 256^2 + 256^3 blocks, about 16.8 million blocks, or roughly 16 GB.
- If file size is 12K bytes or less, only direct indexing is needed (most Unix files are very
small).
- The single indirect block handles the next 1024/4 = 256 blocks. The max length is 12K + 256K = 268K bytes
for a file, requiring no more than one level of indirection.
- The double indirect block extends an additional
256 single indirect blocks, or 256 * 256 = 65536 blocks. The max length is now 268K + 65536K = 65804K bytes
for a file (just over 64 MB), requiring no more than two levels of indirection.
How important is this?
The choice of allocation and indexing method is critical to OS performance simply because secondary storage
devices are SO SLOW compared to the memory and processing units. Nearly any performance improvement
will have significant impact on overall system performance. Suppose an indexing improvement saves 1 millisecond
per disk access. During 1 millisecond, the processor possibly executes over 100,000 machine instructions.
So even if the improvement requires an additional say 10,000 lines of C code, it is probably worthwhile.
File System Free Space Management
A file system's free-space list organizes (disk) blocks that are available to be allocated to growing files.
As with any disk operation, performance issues are critical because the devices are so slow.
Note that the FAT method of linked allocation incorporates the free-space list; it is just one additional
linked list.
The greatest performance benefit comes from caching the free-space list in main memory. Coherency is
an obvious issue, but there is also the issue of memory requirements. If each block is 1K and a drive's
capacity is 60 GB, the drive contains 60 million blocks and thus a drive could easily have a free-space
list 50 million or more blocks long!
Note that allocating blocks in clusters cuts the memory requirements considerably.
The free-space list may be implemented as a bit vector. The block number (0 to n) is bit position
index into a bit string. If a bit position contains 0, the corresponding block is available (if 1, it is
occupied). The advantage is the small space required; the disadvantage is the variable length of time
needed to find an available block (the vector must be searched).
An alternative is to maintain a linked list of free blocks. This is what the FAT does. It could
be organized either as a stack or a queue. Either one limits access only to the ends and thus access
time is constant.
Last updated:
Peter Sanderson (PSanderson@otterbein.edu)