Chapter 16 - File Systems
Persistent storage: storage that continues to exist after the program that uses or creates it completes.
Sometimes called secondary storage, since the devices commonly used to store persistent objects are farther down the storage hierarchy.
Examples: disks, CD-ROMs, tapes, etc.
Raw disks are user-unfriendly (imagine having to access information by sector number only).
Files and File Systems
A file system provides a convenient way for users to manage their data.
A file is a sequence of bytes of arbitrary length.
Files are implemented by the operating system to provide persistent storage.
A file system provides a way of storing, naming and protecting files.
File data is accessed through layers: the file system interface (system calls), the device driver interface and the disk hardware interface (Figure 16.1).
Tracing the file concept down through these layers shows what each part manages (Figure 16.2).
A file system typically separates two abstractions:
  File name space (directory structure): human-readable strings
  File data space: the actual data blocks of the files
The system calls reflect this split:
  open() maps a string to a local file ID.
  read(), write() and close() use the local file ID, not a string.
  Other (UNIX) file system calls that deal with the file name space also take strings (mkdir(), unlink(), etc.).
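The split between names and local file IDs can be seen from a script. This is a minimal sketch using Python's os module, which wraps the UNIX calls fairly directly (Python's documentation calls the small integer a "file descriptor"; these notes call it a local file ID). The scratch file name example.txt and the helper names are made up for illustration.

```python
import os
import tempfile

def read_first_bytes(path, count):
    """Open by name, then read and close using only the numeric file ID."""
    fd = os.open(path, os.O_RDONLY)   # name space: string -> small integer
    try:
        return os.read(fd, count)     # data space: file ID only, no name
    finally:
        os.close(fd)

def demo():
    # Create a scratch file so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")   # hypothetical file name
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        os.write(fd, b"hello, file system")
        os.close(fd)
        data = read_first_bytes(path, 5)
        os.unlink(path)               # name-space call: takes a string
        return data

if __name__ == "__main__":
    print(demo())
```

Only os.open() and os.unlink() ever see the path string; everything in between works on the integer.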
Logical File Structure
The operating system presents a file as a sequence of one of three possible atomic units (Figure 16.4):
  Bytes (“flat file”): typical of UNIX
  Fixed-length records: think tapes
  Variable-length records: think database records
In general, older OSes tended to support multiple file formats.
The trend is towards treating files as mere byte streams, letting the application layers decide how to impose structure.
High-end database servers can even manage the drive directly, bypassing the file system altogether (Figure 16.5).
File sizes range from 0 bytes to very large; the maximum is usually limited by the “native math” of the machine’s registers.
A 32-bit file offset allows for a theoretical maximum file size of 4 GB (2^32 bytes, assuming byte-addressed files).
Notice on xi, however, that the “df” command displays some file systems that “break” the 32-bit barrier :)
  df -k /real/barracuda9
As discussed earlier, files also have metadata (name, type, size, owner, group(s), permissions, timestamps, disk and data block pointers, etc.).
UNIX: stat() and fstat() return metadata; the ls command displays it.
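To poke at file metadata yourself, here is a small sketch using Python's os.stat() and os.fstat(), which wrap the UNIX calls of the same names. The describe() helper and the dictionary keys are made up for illustration; the st_* field names come from Python's os module. The constant at the top just spells out the 4 GB arithmetic.

```python
import os
import stat
import tempfile

# Number of bytes addressable with a 32-bit byte offset: 4 GB.
MAX_32BIT_FILE = 2 ** 32

def describe(path):
    """Return a few metadata fields, looked up by file name via stat()."""
    st = os.stat(path)
    return {
        "size": st.st_size,
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "permissions": stat.filemode(st.st_mode),  # ls-style string
        "modified": st.st_mtime,
        "inode": st.st_ino,
    }

def demo():
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 1000)
        f.flush()
        by_name = describe(f.name)           # stat(): takes a path string
        by_fd = os.fstat(f.fileno())         # fstat(): takes an open file ID
        return by_name, by_fd.st_size

if __name__ == "__main__":
    print(demo())
```

Note that the file name itself is not among the fields returned: on UNIX the name lives in the directory, not in the per-file metadata.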
File Naming
Virtually all modern OSes use a hierarchical file naming system.
The separator character between path components differs by system (/ = UNIX, \ = DOS/Win, : = Macintosh).
Systems also differ in which file names are legal and how long they can be:
  DOS FAT file system: case insensitive; “8+3” characters per component.
  Win95 FAT: kludge of the first order; up to 255 chars.
  WinNT NTFS: case insensitive; up to 255 chars.
  UNIX: case sensitive; from 14 (old limit) to 255 chars per component.
  Macintosh: case insensitive but case preserving; up to 31 chars per component.
Tree example: Figure 16.6. Note the presence of an alias used to connect children of different parents.
In UNIX this is done with the ln command, which creates two types of links: hard and soft (aka symbolic).
Windows calls this a shortcut; the Macintosh calls it an alias.
Absolute path name: the full path from the “root” of the file system tree (UNIX: “/” prefix; DOS/Win: “\” prefix).
Note that in DOS/Win the drive letters represent roots of separate trees.
Current (working) directory: allows the use of relative path names, which are resolved by prepending the working directory’s absolute path.
Displayed via the pwd command in UNIX and CD in DOS/Win.
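The difference between the two UNIX link types is easiest to see by deleting the original name. The sketch below assumes a UNIX-like system and uses os.link() and os.symlink(), Python's equivalents of "ln" and "ln -s"; the file names are made up.

```python
import os
import tempfile

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "original.txt")
        hard = os.path.join(tmp, "hard.txt")
        soft = os.path.join(tmp, "soft.txt")
        with open(target, "w") as f:
            f.write("data")

        os.link(target, hard)      # like: ln original.txt hard.txt
        os.symlink(target, soft)   # like: ln -s original.txt soft.txt

        # A hard link is a second name for the same file.
        same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
        link_count = os.stat(target).st_nlink
        # A symbolic link is a separate small file that stores a path.
        soft_is_link = os.path.islink(soft)

        os.unlink(target)          # remove the original name
        hard_survives = os.path.exists(hard)
        soft_dangles = not os.path.exists(soft)   # exists() follows the link
        return same_inode, link_count, soft_is_link, hard_survives, soft_dangles

if __name__ == "__main__":
    print(demo())
```

After the unlink the data is still reachable through the hard link, while the symbolic link points at a name that no longer exists.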
Note how the current working directory obeys the locality model we saw in the memory chapters: file objects that are used by a program tend to hang around together.
The hierarchical file system can allow for variations (Figure 16.7), but some of them can be dangerous (Figure 16.7-c).
UNIX (shells and web servers, actually) uses the ~ character in a path name to indicate the home directory of a particular user (~jtbauer == /home/cs46/jtbauer, or wherever my home may be).
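Both conveniences, the working-directory prefix and the ~ shorthand, amount to string manipulation on path names before the real lookup happens. A rough sketch with Python's os.path follows; the two helper names are made up, and the kernel actually starts a relative lookup from the working directory itself rather than pasting strings together, so treat the first function as a conceptual model only.

```python
import os

def resolve_relative(relative_path):
    """Conceptually: prefix the current working directory to a relative path."""
    return os.path.normpath(os.path.join(os.getcwd(), relative_path))

def expand_home(path):
    """Expand a leading ~ or ~user roughly the way a shell would."""
    return os.path.expanduser(path)

if __name__ == "__main__":
    print(resolve_relative("notes/todo.txt"))
    print(expand_home("~/notes.txt"))
    print(expand_home("/etc/passwd"))   # no leading ~, so returned unchanged
```

The ~ expansion is done by the shell (or library) before any system call is made; the kernel never sees the ~ character as anything special.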
File Naming Conventions
File naming conventions are sometimes a necessary part of the operating system semantics (.COM and .EXE files in DOS/Win) and sometimes merely conventions (most UNIX file extensions).
Typical extensions exist for a variety of OSes, programs and applications: .c, .txt, .s, .OBJ, .o, .a, .LIB, .EXE, .COM, .tex, .gif, .jpg, .mov, .avi, .ps, .Z, .gz, .mif, .DOC, .h, .cpp, .c++, .pas, etc., etc., etc.!
File System Operations
Figures 16.8, 16.9 and 16.10 categorize file system operations into three groups: operations on files, operations on open files and operations on directories.
File System Implementation
File systems are typically layered, to provide useful abstractions at various levels.
Figure 16.11 diagrams typical file system data structures:
  The process descriptor contains an open file pointer array, whose elements point to the process’s entries in the open file table.
  The open file table is a system-wide, OS-managed table with an entry for every open file. It typically contains:
    Current file position
    File status info (R/W, locks, file type, etc.)
    Pointer to the file descriptor/device driver/pipe data structure
  The file descriptor table is an in-memory copy of disk-resident file descriptors.
The file descriptor table points to information about a particular file: owner, file protection info, timestamps, location on disk.
Note that the disk drive also contains other information:
  File system info
  File descriptors
  Directories
  File data
In some file systems these data structures are intermingled.
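The three in-memory structures (per-process open file pointer array, system-wide open file table, file descriptor table) can be modelled in a few lines. This is a toy sketch, not how any real kernel is written: every class, field and function name here is invented for illustration, and a bytes object stands in for "location on disk".

```python
from dataclasses import dataclass, field

@dataclass
class FileDescriptor:
    """In-memory copy of a disk-resident file descriptor (file metadata)."""
    owner: str
    data: bytes            # stands in for the file's location on disk

@dataclass
class OpenFileEntry:
    """One entry of the system-wide open file table."""
    descriptor: FileDescriptor
    mode: str
    position: int = 0      # current file position

@dataclass
class Process:
    """Process descriptor holding the open file pointer array."""
    open_files: list = field(default_factory=list)

open_file_table = []       # system-wide, managed by the (toy) OS

def sys_open(proc, descriptor, mode="r"):
    """Create an open file table entry and return the local file ID."""
    entry = OpenFileEntry(descriptor, mode)
    open_file_table.append(entry)
    proc.open_files.append(entry)
    return len(proc.open_files) - 1

def sys_read(proc, file_id, count):
    """Read from the current position and advance it."""
    entry = proc.open_files[file_id]
    start = entry.position
    chunk = entry.descriptor.data[start:start + count]
    entry.position += len(chunk)
    return chunk

def demo():
    fdesc = FileDescriptor(owner="jtbauer", data=b"abcdefgh")
    proc = Process()
    a = sys_open(proc, fdesc)      # two opens of the same file ...
    b = sys_open(proc, fdesc)
    first = sys_read(proc, a, 3)
    second = sys_read(proc, a, 3)
    other = sys_read(proc, b, 3)   # ... keep independent positions
    return first, second, other

if __name__ == "__main__":
    print(demo())
```

The point of the model: the file position lives in the open file table entry, not in the file descriptor, so two separate opens of one file do not disturb each other.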
Control/data flow for open() (Figure 16.12) and read() (Figure 16.13):
  The left-hand side shows the data structures involved.
  The right-hand side shows the flow of control through the file system layers.
Notice the distinction between the logical file system and the physical file system:
  The logical file system deals with logical byte offsets, logical blocks and logical block numbers in a disk-independent fashion.
  The physical file system deals with physical blocks on actual disk drives.
Notice that memory caching is used to improve performance.
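The logical-to-physical translation is mostly integer arithmetic plus one table lookup. A small sketch follows, assuming a 4096-byte block size and a made-up per-file block map (the list of physical block numbers is invented); real file systems store this map in the file descriptor in various ways.

```python
BLOCK_SIZE = 4096   # assumed block size in bytes

def logical_location(byte_offset, block_size=BLOCK_SIZE):
    """Logical file system: byte offset -> (logical block number, offset in block)."""
    return divmod(byte_offset, block_size)

def physical_block(block_map, logical_block):
    """Physical file system: logical block number -> physical block on disk."""
    return block_map[logical_block]

def demo():
    # Hypothetical block map for one file: logical block i lives in
    # physical block block_map[i].
    block_map = [812, 813, 97, 1405]
    lblock, within = logical_location(10000)
    return lblock, within, physical_block(block_map, lblock)

if __name__ == "__main__":
    print(demo())   # byte 10000 is byte 1808 of logical block 2
```

Nothing in logical_location() depends on the disk; only the block map ties the file to a particular device.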
Physical file systems connect to the appropriate I/O system, identified by a device number.
A device switch (jump table) maps the device number to the address of the corresponding device driver (Figure 16.14).
UNIX-style operating systems use special files to address the device drivers (try “ls -l /dev”).
The fork() system call duplicates the parent’s open files for the child (Figure 16.15).
Other system calls modify various parts of the file system data structures, depending on the operation being performed (Figs. 16.16-16.18).
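One visible consequence of fork() duplicating the open files: on UNIX the child's copies refer to the same open file table entries as the parent's, so the two processes share a file position. The sketch below assumes a UNIX-like system (os.fork() is not available on Windows); the function name is made up.

```python
import os
import tempfile

def shared_offset_after_fork():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 0, os.SEEK_SET)       # rewind to the start

        pid = os.fork()
        if pid == 0:
            # Child: read 4 bytes through the inherited file ID, then exit
            # immediately without running the parent's cleanup code.
            try:
                os.read(fd, 4)
            finally:
                os._exit(0)

        os.waitpid(pid, 0)                 # parent waits for the child
        position = os.lseek(fd, 0, os.SEEK_CUR)
        rest = os.read(fd, 3)              # continues where the child stopped
        return position, rest
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    print(shared_offset_after_fork())
```

The parent never read anything before the fork, yet its position has moved, because the position belongs to the shared open file table entry rather than to either process.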
Notice how use of the VM system’s page tables allows for copy avoidance (Fig. 16.19).
File system directory implementation:
  Maps component names to file descriptors.
  Sometimes the file descriptors are stored in the directory itself; other times the directory contains pointers to file descriptors stored elsewhere.
Name/path resolution algorithm: Figure 16.20.
Notice that directories are typically implemented as files.
UNIX: a directory contains name-to-inode-number mappings; the inode (usually expanded as “index node”) is the UNIX term for a file descriptor, and it contains the metadata of the file.
Try “od -cx dirname” and “ls -i” on UNIX.
Skip section 16.6 (Example File System Implementation).
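Path resolution over name-to-inode mappings can be sketched as a loop over path components. This is a toy model and not necessarily the algorithm of Figure 16.20: the inode table, its numbers and its contents are invented, and the choice of 2 for the root inode is an assumption borrowed from several common UNIX file systems.

```python
ROOT_INODE = 2   # assumed root inode number (an illustrative choice)

# Hypothetical inode table: directories map component names to inode numbers.
inodes = {
    2:  {"type": "dir", "entries": {"home": 11, "etc": 12}},
    11: {"type": "dir", "entries": {"jtbauer": 21}},
    12: {"type": "dir", "entries": {"passwd": 31}},
    21: {"type": "dir", "entries": {"notes.txt": 41}},
    31: {"type": "file", "size": 1024},
    41: {"type": "file", "size": 20},
}

def resolve(path, cwd_inode=ROOT_INODE):
    """Walk the path one component at a time and return the final inode number."""
    # Absolute paths start at the root; relative ones at the working directory.
    current = ROOT_INODE if path.startswith("/") else cwd_inode
    for name in path.split("/"):
        if name in ("", "."):
            continue                       # skip empty components and "."
        node = inodes[current]
        if node["type"] != "dir":
            raise NotADirectoryError(name)
        if name not in node["entries"]:
            raise FileNotFoundError(path)
        current = node["entries"][name]    # follow the name -> inode mapping
    return current

if __name__ == "__main__":
    print(resolve("/home/jtbauer/notes.txt"))
    print(resolve("jtbauer/notes.txt", cwd_inode=11))
```

Each loop iteration corresponds to reading one directory, which is why long paths cost more to open and why caching directory lookups pays off.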