Computer Science 213 © 2006 Donald Acton 244 The Role of Unix I/O File system works at the block level Applications work at the byte level Unix I/O converts.

1 Computer Science 213 © 2006 Donald Acton 244 The Role of Unix I/O File system works at the block level Applications work at the byte level Unix I/O converts the byte level access to block level operations Application Unix I/O File System Disk Drive File System Layering

2 Computer Science 213 © 2006 Donald Acton 245 Unix I/O API Some of the most common Unix I/O API functions used by applications are: –open() –close() –read() –write() –lseek()

3 Computer Science 213 © 2006 Donald Acton 246 Opening Files Opening a file informs the kernel that an application wants to access a file Allows the kernel to set aside resources int source_fd; if ((source_fd = open(argv[1], O_RDONLY)) < 0) { perror("Open source failed:"); exit(2); }

4 Computer Science 213 © 2006 Donald Acton 247 Opening cont’d Open returns a small integer called a file descriptor Application passes this value back to the kernel in subsequent requests to work with a file Each process created starts with three open files: –0: standard input (stdin) –1: standard output (stdout) –2: standard error (stderr)

5 Computer Science 213 © 2006 Donald Acton 248 Closing Files Closing a file tells the kernel it may free resources associated with managing the file int rc; if ((rc = close(source_fd)) < 0){ perror("close"); exit(10); }

6 Computer Science 213 © 2006 Donald Acton 249 Reading Files Each open file has a notion of a current position in the stream of bytes read() copies bytes from the current file position to memory and updates the file position read() returns the number of bytes read –If bytes read < 0 → error –If bytes read = 0 → end of file –read may return fewer bytes than requested (short reads)

7 Computer Science 213 © 2006 Donald Acton 250 Read Example char buf[512]; int chars_read; chars_read = read(source_fd, buf, sizeof(buf)); while (chars_read > 0) { // Do something chars_read = read(source_fd, buf, sizeof(buf)); } if (chars_read < 0) { perror("Reading error:"); exit(5); }

8 Computer Science 213 © 2006 Donald Acton 251 Writing Files Writing copies bytes from memory to the file position and updates the position Returns the number of bytes written –If bytes written < 0 → error –It is possible that fewer bytes were written than requested (short writes); this is not an error, but certainly a challenge to deal with
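
One common way to deal with short writes is to loop until every byte has been accepted. The sketch below is not from the slides: write_all() and write_all_demo() are hypothetical names, and the demo simply pushes a few bytes through a pipe to check the helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call write() until all len bytes are written.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before any byte moved: retry */
            return -1;         /* a real error */
        }
        p += n;                /* short write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}

/* Self-check: send five bytes through a pipe and read them back. */
int write_all_demo(void)
{
    int fds[2];
    char back[6] = {0};
    if (pipe(fds) < 0)
        return -1;
    int ok = write_all(fds[1], "hello", 5) == 0
          && read(fds[0], back, 5) == 5
          && back[0] == 'h' && back[4] == 'o';
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

With a helper like this, a caller only has to distinguish success from failure, at the cost of possibly blocking until the whole buffer has been written.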

9 Computer Science 213 © 2006 Donald Acton 252 Writing Example while (chars_read > 0) { if (write(STDOUT_FILENO, buf, chars_read) < chars_read) { perror("Write problems:"); exit(4); } // Do another read and work }

10 Computer Science 213 © 2006 Donald Acton 253 Seek Causes the logical position in the file to change (i.e. where the next read or write will commence from) Position can be changed –To absolute offset in file –Relative to the current location –Relative to the end of the file

11 Computer Science 213 © 2006 Donald Acton 254 Seek example long new_offset; new_offset = lseek(fd, 2346, SEEK_CUR); new_offset = lseek(fd, 10, SEEK_SET); new_offset = lseek(fd, 25, SEEK_END);
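
A small illustration of the three seek modes working together: the sketch below finds a file's size by seeking to the end and then restores the original position. file_size() and file_size_demo() are hypothetical names, and the demo assumes /tmp is writable.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report the size of an open file without
 * disturbing its current position. Returns -1 on error. */
off_t file_size(int fd)
{
    assert(fd >= 0);
    off_t here = lseek(fd, 0, SEEK_CUR);   /* remember where we are */
    if (here < 0)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);    /* offset 0 from the end = size */
    if (end < 0)
        return -1;
    if (lseek(fd, here, SEEK_SET) < 0)     /* go back (absolute offset) */
        return -1;
    return end;
}

/* Self-check on a 10-byte temporary file. */
int file_size_demo(void)
{
    char path[] = "/tmp/seekdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* file vanishes on close */
    int ok = write(fd, "0123456789", 10) == 10
          && lseek(fd, 3, SEEK_SET) == 3
          && file_size(fd) == 10
          && lseek(fd, 0, SEEK_CUR) == 3;  /* position was restored */
    close(fd);
    return ok ? 0 : -1;
}
```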

12 Computer Science 213 © 2006 Donald Acton 255 Unix I/O Example Simple program that copies contents of file named by argument 1 to file named by argument 2 (i.e. the cp command) cs213copy fname1 [fname2]

13 Computer Science 213 © 2006 Donald Acton 256 Pseudo Code open argument 1 for input open argument 2 for output (if present) if arg 2 present then connect stdout to this file read from input while read succeeds write to stdout read from input

14 Computer Science 213 © 2006 Donald Acton 257 Unix I/O Copy Command // Includes int main(int argc, char **argv) { // Check arguments int source_fd; if ((source_fd = open(argv[1], O_RDONLY)) < 0) { perror("Open source failed:"); exit(2); } int dest_fd; if (argc > 2) { if ((dest_fd = open(argv[2], O_WRONLY | O_CREAT, 0600)) < 0) { perror("Destination open failed:"); int rc; if ((rc = close(source_fd)) < 0) { perror("close"); exit(10); } exit(3); } dup2(dest_fd, STDOUT_FILENO); } char buf[512]; int chars_read; chars_read = read(source_fd, buf, sizeof(buf)); while (chars_read > 0) { if (write(STDOUT_FILENO, buf, chars_read) < chars_read) { perror("Write problems:"); exit(4); } chars_read = read(source_fd, buf, sizeof(buf)); } if (chars_read < 0) { perror("Reading error:"); exit(5); }

15 Computer Science 213 © 2006 Donald Acton 258 1) Unix I/O #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <unistd.h> int main(int argc, char **argv) { if (argc <= 1) { printf("Usage: cs213cp source_file [destination_file]\n"); exit(1); }

16 Computer Science 213 © 2006 Donald Acton 259 2) Unix I/O int source_fd; if ((source_fd = open(argv[1], O_RDONLY)) < 0) { perror("Open source failed:"); exit(2); }

17 Computer Science 213 © 2006 Donald Acton 260 3) Unix I/O int dest_fd; if (argc > 2) { if ((dest_fd = open(argv[2], O_WRONLY | O_CREAT, 0600)) < 0) { perror("Destination open failed:"); int rc; if ((rc = close(source_fd)) < 0) { perror("close"); exit(10); } exit(3); } dup2(dest_fd, STDOUT_FILENO); }

18 Computer Science 213 © 2006 Donald Acton 261 4) Unix I/O char buf[512]; int chars_read; chars_read = read(source_fd, buf, sizeof(buf)); while (chars_read > 0) { if (write(STDOUT_FILENO, buf, chars_read) < chars_read) { perror("Write problems:"); exit(4); } chars_read = read(source_fd, buf, sizeof(buf)); } if (chars_read < 0) { perror("Reading error:"); exit(5); } return 0; }

19 Computer Science 213 © 2006 Donald Acton 262 Unix I/O By making everything appear to be a file, the kernel can provide a single simple interface for performing I/O to a variety of devices Recall the basic operations are: –Opening and closing files open() and close() –Changing the current file position lseek() –Reading and writing files read() and write()

20 Computer Science 213 © 2006 Donald Acton 263 Adding Other Devices Most devices tend to be producers or consumers of streams of data and fit the UNIX I/O API model described –Mouse: producer –Joystick: producer –Keyboard: producer –Display: consumer –Audio device: consumer –Tape: both

21 Computer Science 213 © 2006 Donald Acton 264 New Devices Disk UNIX I/O Application File System Disk Drive Keyboard Terminal Tape Audio

22 Computer Science 213 © 2006 Donald Acton 265 Getting data to/from the hardware There are 2 main issues to deal with – buffering of data going to and from the disk – I/O requests that are not block aligned or in block multiples Application Unix I/O File System Disk Drive File System Layering

23 Computer Science 213 © 2006 Donald Acton 266 File Descriptors Calls to routines like open(), socket(), accept() and pipe() return file descriptors A file descriptor is just a small integer When this “integer” is passed back to the kernel via calls like read() or write() the kernel manipulates the opened “file” the descriptor corresponds to

24 Computer Science 213 © 2006 Donald Acton 267 The Kernel’s View of a File Descriptor Each process has associated with it a fixed size file descriptor table The file descriptor is just the index into this table! Each active entry in the table identifies an entry in a shared system wide open file table Entries are created in the open file table each time open() succeeds

25 Computer Science 213 © 2006 Donald Acton 268 Open File Table Entries in the open file table identify the I/O target in a v-node table Open file table keeps current position and reference count of its usage v-node – virtual inode, basically a cache of an inode –may contain pointers to buffers/caches for the file/device –identifies legal operations on a file/device

26 Computer Science 213 © 2006 Donald Acton 269 The Kernel View fd 0 fd 1 fd 2 fd 3 fd 4 Descriptor table (one table per process) Open file table (shared by all processes) v-node table File pos refcnt=1... stderr stdout stdin File access... File size File type File A Adapted from: Computer Systems: A Programmer’s Perspective The above is one struct in the open file table

27 Computer Science 213 © 2006 Donald Acton 270 v-node role UNIX I/O Application File System Disk Drive Keyboard Terminal Tape Audio

28 Computer Science 213 © 2006 Donald Acton 271 To the Device Unix I/O uses the open file table and v-node table to determine the “device” specific code for the standard operations (open, close read, write…) These routines use buffers identified by the v-node table Buffers are caches of on disk blocks Changes to buffers result in writes being scheduled

29 Computer Science 213 © 2006 Donald Acton 272 write() lseek(fd, 931, SEEK_SET); –Change file position in open file table to 931 write(fd, buff, 128); –If block #1 (bytes 512 – 1023) not cached - read it –If block #2 (bytes 1024 – 1535) not cached - read it –Change bytes 931 – 1023 and 1024 – 1058 –Have blocks 1 and 2 scheduled for writing to disk

30 Computer Science 213 © 2006 Donald Acton 273 read() lseek(fd, 500, SEEK_SET); –Change file position in open file table to 500 read(fd, buff, 1024); –If any of blocks 0 (0 – 511), 1 (512 – 1023) or 2 (1024 – 1535) are not cached, order them read –Transfer bytes 500 – 511, 512 – 1023, and 1024 – 1523 to buff when the blocks are available
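
The block numbers in the write() and read() walk-throughs follow from integer division by the block size. A minimal sketch of that arithmetic, assuming 512-byte blocks as on the slides; block_span() is a hypothetical name, not a kernel routine.

```c
#include <assert.h>

/* Hypothetical helper: which blocks does a request for len bytes
 * starting at byte offset touch? Results go to *first and *last. */
void block_span(long offset, long len, long block_size,
                long *first, long *last)
{
    assert(offset >= 0 && len > 0 && block_size > 0);
    *first = offset / block_size;               /* block holding first byte */
    *last  = (offset + len - 1) / block_size;   /* block holding last byte */
}
```

For the write at offset 931 of 128 bytes this gives blocks 1 through 2, and for the read at offset 500 of 1024 bytes it gives blocks 0 through 2, matching the slides.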

31 Computer Science 213 © 2006 Donald Acton 274 Sharing Files At this point we have –File descriptors –The open file table –V-nodes It is relatively easy to explain what happens when file sharing results from: –Opens in the same process –Opens in different processes –Forks

32 Computer Science 213 © 2006 Donald Acton 275 Actions on open() fd 0 fd 1 fd 2 fd 3 fd 4 Descriptor table (one table per process) Open file table (shared by all processes) v-node table File pos refcnt=1... File pos refcnt=1... stderr stdout stdin File access... File size File type File access... File size File type File A File B fd = open("B",…) Adapted from: Computer Systems: A Programmer’s Perspective

33 Computer Science 213 © 2006 Donald Acton 276 Same File Different Process Descriptor table (one table per process) Open file table (shared by all processes) v-node table File pos refcnt=1... File pos refcnt=1... fd 0 fd 1 fd 2 fd 3 fd 4 stderr stdout stdin File access... File size File type File A fd = open("A",…) fd 0 fd 1 fd 2 fd 3 fd 4 stderr stdout stdin Adapted from: Computer Systems: A Programmer’s Perspective

34 Computer Science 213 © 2006 Donald Acton 277 Same File Same Process Descriptor table (one table per process) Open file table (shared by all processes) v-node table File pos refcnt=1... File pos refcnt=1... fd 0 fd 1 fd 2 fd 3 fd 4 stderr stdout stdin File access... File size File type File A fd = open("A",…); Adapted from: Computer Systems: A Programmer’s Perspective

35 Computer Science 213 © 2006 Donald Acton 278 Close() Empty fd 0 fd 1 fd 2 fd 3 fd 4 Descriptor table (one table per process) Open file table (shared by all processes) v-node table (shared by all processes) File pos refcnt=1... File pos refcnt=1... stderr stdout stdin File access... File size File type File access... File size File type File A File B close(4); refcnt=0

36 Computer Science 213 © 2006 Donald Acton 279 I/O Redirection COMOX(114): ls > /tmp/out The above causes standard output (file descriptor 1) to be set to /tmp/out fd 0 fd 1 fd 2 fd 3 fd 4 Process file descriptor table stderr stdout stdin File pos refcnt=4 terminal File access... File size File type File access... File size File type File pos refcnt=1... /tmp/out refcnt=3... Adapted from: Computer Systems: A Programmer’s Perspective

37 Computer Science 213 © 2006 Donald Acton 280 dup2 The Unix system call dup2, which has the form dup2(fd, newfd), copies fd to newfd in the descriptor table. a b fd 0 fd 1 fd 2 fd 3 fd 4 b b fd 0 fd 1 fd 2 fd 3 fd 4 dup2(4,1) Adapted from: Computer Systems: A Programmer’s Perspective
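
Because the file position lives in the open file table entry, a duplicated descriptor shares its position with the original, while a second open() of the same file gets its own. The sketch below checks this with dup(), the sibling of dup2 that picks the lowest free descriptor itself. shared_offset_demo() is a hypothetical name and the demo assumes /tmp is writable.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if dup() shares the file position and open() does not. */
int shared_offset_demo(void)
{
    char path[] = "/tmp/dupdemoXXXXXX";
    int a = mkstemp(path);
    if (a < 0)
        return -1;
    int b = dup(a);                  /* same open file table entry as a */
    int c = open(path, O_RDONLY);    /* new open file table entry */
    unlink(path);
    int ok = 0;
    if (b >= 0 && c >= 0 && write(a, "abcdef", 6) == 6) {
        assert(b != a && c != a);             /* three distinct descriptors */
        off_t pos_b = lseek(b, 0, SEEK_CUR);  /* moved along with a */
        off_t pos_c = lseek(c, 0, SEEK_CUR);  /* still at the start */
        ok = (pos_b == 6 && pos_c == 0);
    }
    if (b >= 0) close(b);
    if (c >= 0) close(c);
    close(a);
    return ok ? 0 : -1;
}
```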

38 Computer Science 213 © 2006 Donald Acton 281 dup2 example Process file descriptor table File pos terminal File access... File size File type File access... File size File type File pos... /tmp/out... open("/tmp/foo",…); dup2(4,1); close(4); refcnt=1 fd 0 fd 1 fd 2 fd 3 fd 4 refcnt=0 refcnt=2

39 Computer Science 213 © 2006 Donald Acton 282 Pipe and fork fd 0 fd 1 fd 2 fd 3 fd 4 Parent v-node table stderr stdout stdin fd 0 fd 1 fd 2 fd 3 fd 4 Child stderr stdout stdin KeyboardTerminalpipe1pipe2 pipe() fork() dup2(3,1) close(4) dup(3, 0) close(3) close(4) close(3) Data
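
A minimal sketch of the pipe-then-fork pattern: the child writes into the pipe and the parent reads from it. It leaves out the dup2 redirection onto stdin and stdout shown in the figure, and pipe_from_child() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes msg into a pipe; the parent reads it into out.
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_from_child(const char *msg, char *out, size_t cap)
{
    int fds[2];                      /* fds[0]: read end, fds[1]: write end */
    assert(msg != NULL && out != NULL && cap > 0);
    if (pipe(fds) < 0)
        return -1;
    pid_t pid = fork();              /* child inherits both descriptors */
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                  /* child: writer */
        close(fds[0]);
        ssize_t w = write(fds[1], msg, strlen(msg));
        close(fds[1]);
        _exit(w < 0 ? 1 : 0);
    }
    close(fds[1]);                   /* parent: reader; without this close,
                                        read() would never see end of file */
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fds[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);           /* reap the child */
    return (ssize_t)total;
}
```

Each side closes the pipe end it does not use; this keeps the reference counts in the open file table accurate so that end of file is reported once the writer is done.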

40 Computer Science 213 © 2006 Donald Acton 283 Protocols The mechanism to allow processes to communicate across machines Perhaps it should really be protocols Topics –Bandwidth/latency –Packets –Converting streams to packets –Making things reliable

41 Computer Science 213 © 2006 Donald Acton 284 Application Given what we know, are there interesting things we can do at the application layer to speed things up? Making a system call is several orders of magnitude more expensive than a function call Application Unix I/O File System Disk Drive File System Layering

42 Computer Science 213 © 2006 Donald Acton 285 Caching in the Application Applications can use caching to improve performance just like the kernel Most I/O has both –Spatial locality –Temporal locality –An application level cache in the form of the Standard I/O library attempts to take advantage of this Unix I/O File System Disk Drive File System Layering Buffered I/O Application

43 Computer Science 213 © 2006 Donald Acton 286 STDIO (Caching) Each Unix I/O call has a corresponding stdio call –open() → fopen(), close() → fclose() –read() → fread(), write() → fwrite() Instead of returning a file descriptor fopen() returns a FILE * The FILE struct contains: –actual file descriptor –pointer to a buffer –position in buffer –other bookkeeping information

44 Computer Science 213 © 2006 Donald Acton 287 How it works - writes When fwrite() is called bytes are copied to the stream buffer If the stream buffer fills during the fwrite() –write() called to “write” the stream buffer –Stream buffer cleared
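
To make the write path concrete, here is a toy stream buffer in the spirit of fwrite(). It is a sketch, not how any real stdio library is implemented: struct out_stream, out_write(), out_flush() and the 256-byte buffer size are all assumptions chosen for illustration. The syscalls field counts how many times write() is actually called.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_CAP 256                  /* illustrative buffer size */

struct out_stream {                  /* hypothetical stand-in for FILE */
    int    fd;
    size_t used;                     /* bytes currently in buf */
    long   syscalls;                 /* number of write() calls issued */
    char   buf[BUF_CAP];
};

/* Hand the buffered bytes to the kernel and empty the buffer. */
int out_flush(struct out_stream *s)
{
    size_t off = 0;
    while (off < s->used) {
        ssize_t n = write(s->fd, s->buf + off, s->used - off);
        s->syscalls++;
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    s->used = 0;
    return 0;
}

/* Copy bytes into the buffer; only call write() when it fills. */
int out_write(struct out_stream *s, const void *data, size_t len)
{
    const char *p = data;
    assert(s != NULL && (data != NULL || len == 0));
    while (len > 0) {
        size_t room = BUF_CAP - s->used;
        size_t n = len < room ? len : room;
        memcpy(s->buf + s->used, p, n);
        s->used += n;
        p += n;
        len -= n;
        if (s->used == BUF_CAP && out_flush(s) < 0)
            return -1;
    }
    return 0;
}

/* 1000 one-byte writes to /dev/null; returns the write() call count. */
long count_syscalls_demo(void)
{
    struct out_stream s = { .fd = open("/dev/null", O_WRONLY) };
    if (s.fd < 0)
        return -1;
    for (int i = 0; i < 1000; i++)
        if (out_write(&s, "x", 1) < 0)
            return -1;
    if (out_flush(&s) < 0)
        return -1;
    close(s.fd);
    return s.syscalls;
}
```

In the demo, 1000 one-byte requests turn into three full-buffer flushes plus one final flush of the remaining 232 bytes, so four system calls instead of a thousand.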

45 Computer Science 213 © 2006 Donald Acton 288 fwrite() Buffer Buffer offset fd Kernel boundary write() Cached File Block

46 Computer Science 213 © 2006 Donald Acton 289 How it works - reads When fread() is called bytes are copied from the stream buffer to the application designated location If the stream buffer empties during the fread() –read() called to refill the stream buffer –Position in stream buffer reset

47 Computer Science 213 © 2006 Donald Acton 290 fread() Buffer Buffer offset fd Kernel boundary read() Cached File Block

48 Computer Science 213 © 2006 Donald Acton 291 Analysis Costs over doing a system call –Need extra buffer space –One extra set of copies –Bookkeeping to ensure the stream buffer exactly matches real file location –I/O to random locations can be inefficient Advantage over system call –If application I/O requests much less data than underlying buffer holds then greatly reduces the number of system calls –System calls are very expensive

49 Computer Science 213 © 2006 Donald Acton 292 What are files good for? A bulk storage mechanism A more permanent form of storing information A form of interprocess communication –The mere existence of a file can mean something –Data in a file can be a message to a process that doesn’t exist yet

50 Computer Science 213 © 2006 Donald Acton 293 Sharing data on disk write() Application 1 Application 2 read() Hi ?

51 Computer Science 213 © 2006 Donald Acton 294 Two processes, same time As the gap between the two processes' file accesses narrows, just what one process sees relative to the actions of the other becomes unpredictable Two common problems –Lost update –Inconsistent retrievals

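52 Computer Science 213 © 2006 Donald Acton 295 The Lost Update Transaction 1: Withdraw(A, 4); Deposit(B, 4); –Bal = A.read(); 100 –A.write(Bal – 4); 96 –Bal = B.read(); 200 –B.write(Bal + 4); 204 Transaction 2: Withdraw(C, 3); Deposit(B, 3); –Bal = C.read(); 300 –C.write(Bal – 3); 297 –Bal = B.read(); 200 –B.write(Bal + 3); 203 Both transactions read B = 200 before either writes it, so one deposit is lost: B ends at 203 or 204 instead of 207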
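
The interleaving can be replayed deterministically in a few lines. This is only a single-threaded simulation of the schedule on the slide, with an array standing in for the account files; lost_update_demo() is a hypothetical name.

```c
#include <assert.h>

enum { ACCT_A, ACCT_B, ACCT_C };

/* Replays the slide's schedule and returns the final balance of B. */
int lost_update_demo(void)
{
    int acct[3] = { 100, 200, 300 };
    int bal1, bal2;                 /* each transaction's private copy */

    bal1 = acct[ACCT_A]; acct[ACCT_A] = bal1 - 4;   /* T1: Withdraw(A, 4) */
    bal2 = acct[ACCT_C]; acct[ACCT_C] = bal2 - 3;   /* T2: Withdraw(C, 3) */
    assert(acct[ACCT_A] == 96 && acct[ACCT_C] == 297);

    bal1 = acct[ACCT_B];            /* T1 reads B = 200 */
    bal2 = acct[ACCT_B];            /* T2 reads B = 200 as well */
    acct[ACCT_B] = bal1 + 4;        /* T1 writes 204 */
    acct[ACCT_B] = bal2 + 3;        /* T2 writes 203, wiping out T1's deposit */
    return acct[ACCT_B];
}
```

Run serially, the two deposits would leave B at 207; in this schedule it ends at 203.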

53 Computer Science 213 © 2006 Donald Acton 296 Aside - cache consistency The previous problem illustrates the issue of cache consistency The values read from disk and then used are cached Multiple programs cache and change the same data simultaneously without regard for one another Result

54 Computer Science 213 © 2006 Donald Acton 297 Inconsistent Retrievals Transaction 1: Withdraw(A, 5); Deposit(B, 5); –Bal = A.read(); 200 –A.write(Bal – 5); 195 –Bal = B.read(); 100 –B.write(Bal + 5); 105 Transaction 2: TotalAccounts(); –Bal = A.read(); 195 –Bal += B.read(); 295 –Bal += C.read(); … TotalAccounts sees A after the withdrawal but B before the deposit, so its total is 5 short

55 Computer Science 213 © 2006 Donald Acton 298 Are these familiar types of problems? How was it solved? Would the same solution work here? Would it scale?

56 Computer Science 213 © 2006 Donald Acton 299 The problems It would “work” but there are some obvious problems Suppose there were thousands of files –Need a rule to determine semaphore name from file name –Semaphores need to be created before being used and must exist even if the file isn’t open –What happens if a process exits and forgets to release the semaphore? –There are limits on the number of semaphores the system will support

57 Computer Science 213 © 2006 Donald Acton 300 Bigger Problems Semaphores only work if every application uses them – like to be able to force usage Semaphore would lock the whole file Locking sub-regions of a file –Requires additional semaphores –Compounds the problem of knowing which semaphore locks what file region –Need to know in advance what regions will be locked – what if the file grows?

58 Computer Science 213 © 2006 Donald Acton 301 Semaphores → File locks Locks only needed when a file is open Not all files need locks Inode/v-node combination can –Specify that locks are mandatory (inode) –Enforce locking (v-node) –Dynamically create/manage lock regions (v-node) –Track what process has a lock (v-node) –Automatically release locks when the process exits

59 Computer Science 213 © 2006 Donald Acton 302 Lock a file region int sharedData; Lock aLock; … aLock.acquire (); read or write sharedData aLock.release (); … Shared Region

60 Computer Science 213 © 2006 Donald Acton 303 lockf() lockf(fd, function, size) –fd: an open file descriptor that allows writing –function: one of F_ULOCK, F_LOCK, F_TLOCK, F_TEST –size: starting from the current file position, the number of bytes to lock

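61 Computer Science 213 © 2006 Donald Acton 304 Using lockf() int main(int argc, char **argv) { int status; int fd = open(argv[1], O_RDWR); if ((status = lockf(fd, F_TLOCK, 60)) < 0) { /* region busy: F_TLOCK does not wait */ printf("locked\n"); lockf(fd, F_LOCK, 60); /* F_LOCK blocks until the region is free */ } return 0; }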
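
A sketch of how a region lock could protect a read-modify-write such as a deposit. It assumes the file stores a single int at offset 0 and that every cooperating process goes through the same routine; locked_add() and locked_add_demo() are hypothetical names, and the demo assumes /tmp is writable.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Add delta to the int stored at offset 0 while holding a lock on it.
 * The new value is stored in *result. Returns 0 on success, -1 on error. */
int locked_add(int fd, int delta, int *result)
{
    int value = 0;
    assert(result != NULL);
    if (lseek(fd, 0, SEEK_SET) < 0)            /* lockf locks from here */
        return -1;
    if (lockf(fd, F_LOCK, sizeof value) < 0)   /* waits until region is free */
        return -1;
    int ok = read(fd, &value, sizeof value) >= 0;  /* empty file reads as 0 */
    value += delta;
    ok = ok && lseek(fd, 0, SEEK_SET) == 0
            && write(fd, &value, sizeof value) == (ssize_t)sizeof value;
    /* Move back to the start so the unlock covers the same region. */
    if (lseek(fd, 0, SEEK_SET) < 0 || lockf(fd, F_ULOCK, sizeof value) < 0)
        ok = 0;
    *result = value;
    return ok ? 0 : -1;
}

/* Self-check: two deposits on a fresh temporary file. */
int locked_add_demo(void)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd = mkstemp(path), v = 0;
    if (fd < 0)
        return -1;
    unlink(path);
    int ok = locked_add(fd, 5, &v) == 0 && locked_add(fd, 7, &v) == 0;
    close(fd);
    return ok ? v : -1;
}
```

Since lockf() measures the region from the current file position, the sketch seeks back to offset 0 before both the lock and the unlock.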

62 Computer Science 213 © 2006 Donald Acton 305 The Lost Update (2) Withdraw(A, 4); Deposit(B, 4); –Bal = A.read(); 100 –A.write(Bal – 4); 96 –Bal = B.read() 200 –B.write(Bal + 4) 204 Withdraw(C, 3); Deposit(B, 3); –Bal = C.read(); 300 –C.write(Bal – 3); 297 –Bal = B.read() 200 –B.write(Bal + 3) 203

63 Computer Science 213 © 2006 Donald Acton 306 Types of lock requests Regular lock (really a writer lock) –Only one acquisition allowed at a time Read lock –Allows multiple readers to hold the lock at the same time – increased concurrency –Basically prevents a writer from making changes Write lock –Only one acquisition allowed at a time –Prevents read lock from being acquired

64 Computer Science 213 © 2006 Donald Acton 307 Reader – Writer locks int sharedData; Lock aLock; … aLock.acquireWrite (); write sharedData aLock.release (); … aLock.acquireRead (); read sharedData aLock.release (); … Shared Region
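
On POSIX systems, read and write locks on file regions are available through fcntl(). The sketch below wraps that call; lock_region() and rw_lock_demo() are hypothetical names, and the 60-byte region simply mirrors the lockf() example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Apply a lock of the given type to bytes [start, start + len).
 * type is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK (release). */
int lock_region(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};
    assert(type == F_RDLCK || type == F_WRLCK || type == F_UNLCK);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;          /* start is an absolute offset */
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLKW, &fl); /* F_SETLKW waits for conflicting locks */
}

/* Self-check: take a read lock, upgrade it, then release it. */
int rw_lock_demo(void)
{
    char path[] = "/tmp/rwlockXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    int ok = lock_region(fd, F_RDLCK, 0, 60) == 0   /* many readers allowed */
          && lock_region(fd, F_WRLCK, 0, 60) == 0   /* now exclusive */
          && lock_region(fd, F_UNLCK, 0, 60) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

These locks belong to the process, so the demo never blocks against itself; contention only shows up between different processes.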

65 Computer Science 213 © 2006 Donald Acton 308 Implementing Locks Each lock requires –Lists of process IDs: the process with the lock and the processes waiting for the lock –Regions – what part of the file is being locked and how (read/write)

66 Computer Science 213 © 2006 Donald Acton 309 Where are locks implemented? Requirements –Must be (potentially) 1 per file –All processes must be able to locate the lock –Created on demand (sort of) What kernel data structure associated with file management has these properties?

67 Computer Science 213 © 2006 Donald Acton 310 Locking and Vnodes Descriptor table (one table per process) Open file table (shared by all processes) v-node table File pos refcnt=1... File pos refcnt=1... fd 0 fd 1 fd 2 fd 3 fd 4 stderr stdout stdin File access... File size File type File A fd = open("A",…) fd 0 fd 1 fd 2 fd 3 fd 4 stderr stdout stdin Adapted from: Computer Systems: A Programmer’s Perspective

68 Computer Science 213 © 2006 Donald Acton 311 Are locks enough? Locks can control concurrency Sometimes a collection of actions need to be atomic –Locks can’t ensure this in the face of failures –Undoing (rolling back) things can be a challenge

69 Computer Science 213 © 2006 Donald Acton 312 Transactions - Definition A transaction is a sequence of data operations with the following properties: –A Atomic – all or nothing –C Consistent - consistent state in => consistent state out –I Independent - partial results are not visible to concurrent transactions –D Durable - once completed, new state survives crashes

70 Computer Science 213 © 2006 Donald Acton 313 Transaction Operations tid = beginTX() –Start a new transaction and return a transaction identifier status = commitTX(tid) –Cause the transaction to commit –Return success indication if transaction committed otherwise return failure indication

71 Computer Science 213 © 2006 Donald Acton 314 Transaction Operations cont’d abortTX(tid) –Abort the transaction and cause all files to take on the values they had before the transaction started readTX(tid, file values) –Read the given “values” from a file and associate the read with the indicated transaction

72 Computer Science 213 © 2006 Donald Acton 315 Transaction Operations cont’d writeTX(tid, values) –Write the given values to the file and associate the write with the indicated transaction

73 Computer Science 213 © 2006 Donald Acton 316 Example transaction tid = beginTX(); readTX(tid, &a, file_to_read_from, …); readTX(tid, &b, file_to_read_from, …); perform computations writeTX(tid, &a, file_to_write_to,...); readTX(tid, &c, file_to_read_from, …); if (error reading) { abortTX(tid); return; } perform computations writeTX(tid, &c, file_to_write_to, …) commitTX(tid);

74 Computer Science 213 © 2006 Donald Acton 317 Ensuring Atomicity Problem –ensure all changes get made or none get made If no failure, it’s easy –just do the updates If failure occurs while updates are performed must either –Go back to the initial state –Go to the final state

75 Computer Science 213 © 2006 Donald Acton 318 Strategy Use another file, called a log file, to record our intentions Write information to indicate –That a transaction has started –The new values a file is to have –That a transaction has committed –That a transaction has aborted –The transaction can be truncated

76 Computer Science 213 © 2006 Donald Acton 319 Logging Persistent (on disk) log –records information to support recovery and abort Types of log records –begin, update, abort, commit, and truncate Atomic update –atomic operation is write of commit record to disk –transaction committed iff commit record in log

77 Computer Science 213 © 2006 Donald Acton 320 Ways to log the “values” Value logging –write new value of modified data to log –simple, but not always space efficient or easy –hard for some things such as malloc and system calls Operation logging –write name of operation and its arguments –usually used for roll forward logging

78 Computer Science 213 © 2006 Donald Acton 321 Transaction and persistent data transaction log data memory part of data

79 Computer Science 213 © 2006 Donald Acton 322 Logging for Roll Forward For each transactional update –Change in-memory copy –Write new value to log –Do not change on-disk copy until commit Commit –Write commit record to log –Write changed data to disk –Write truncate record to log Abort –Write abort record to log –Invalidate in-memory data –Nothing to do with on disk copies

80 Computer Science 213 © 2006 Donald Acton 323 Roll forward recovery When the system restarts after a failure –use log to roll forward committed transactions –normal access stopped until recovery is completed

81 Computer Science 213 © 2006 Donald Acton 324 Recovery Continued Complete committed, but un-truncated transactions –for every trans with a commit but no truncate –read new values from log and update disk values –write truncate record to log Abort all uncommitted trans –for every trans with no commit or abort –write abort record to log
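
The roll forward recovery pass can be sketched over an in-memory log. Everything here is an assumption made for illustration: the record layout, the REC_* names and the use of an int array in place of the on-disk data. The sketch only reapplies new values; it does not append the truncate or abort records that the slide also calls for.

```c
#include <assert.h>
#include <stddef.h>

enum rec_type { REC_BEGIN, REC_NVAL, REC_COMMIT, REC_TRUNC, REC_ABORT };

struct log_rec {                 /* hypothetical log record layout */
    enum rec_type type;
    int tid;                     /* transaction the record belongs to */
    int slot;                    /* REC_NVAL only: which data item */
    int value;                   /* REC_NVAL only: its new value */
};

/* Does transaction tid have a record of type t anywhere in the log? */
static int has_rec(const struct log_rec *log, size_t n,
                   int tid, enum rec_type t)
{
    for (size_t i = 0; i < n; i++)
        if (log[i].tid == tid && log[i].type == t)
            return 1;
    return 0;
}

/* Reapply, in log order, the new values of every transaction that has
 * a commit record but no truncate record. Returns the number applied. */
int redo_recover(const struct log_rec *log, size_t n, int *disk)
{
    int applied = 0;
    assert(log != NULL && disk != NULL);
    for (size_t i = 0; i < n; i++) {
        if (log[i].type != REC_NVAL)
            continue;
        if (has_rec(log, n, log[i].tid, REC_COMMIT) &&
            !has_rec(log, n, log[i].tid, REC_TRUNC)) {
            disk[log[i].slot] = log[i].value;
            applied++;
        }
    }
    return applied;
}
```

Transactions with no commit record are simply skipped, which is safe because roll forward logging never touches the on-disk copy before commit.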

82 Computer Science 213 © 2006 Donald Acton 325 Logging/Recover Example Application Actions –tid = beginTX –ReadTX(tid, &a, …) –ReadTX(tid, &b, …) –WriteTX(tid, &b, …) –WriteTX(tid, &a, …) –commitTX(tid) Write out a and b to real file Write truncate to log Log File Records –BEGIN –NVAL –COMMIT –TRUNC

83 Computer Science 213 © 2006 Donald Acton 326 Role of Locking Locks must still be acquired to prevent inconsistent retrieval and lost updates Upon first time access of a value its source must be locked Locks released after all writes to real file completed (or reads if no writes being done) Locks are also used on the log file

84 Computer Science 213 © 2006 Donald Acton 327 Log File Log file can be shared by different processes Writes are always done to the end Before doing a write, a lock is acquired and released upon write completion Write consists of one or more log records

85 Computer Science 213 © 2006 Donald Acton 328 Roll backwards logging This is the opposite of redo or roll-forward logging Instead of writing new values to the log file old values are written Real files are updated before commit is written On abort, log is used to restore old values

86 Computer Science 213 © 2006 Donald Acton 329 Undo logging - roll backward Normal operation For each transactional update –write old value to log –modify data and write to disk any time Commit –ensure that all updates have been written to disk –write commit record to log Abort –use log to recover disk to old values

87 Computer Science 213 © 2006 Donald Acton 330 Undo logging - roll backward Recovery When the system restarts after a failure –use log to rollback uncommitted transactions –normal access stopped until recovery completed Undo effect of any uncommitted transactions –for every trans with no commit or abort use log to recover disk to old values –write abort record to log
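
Undo recovery walks the log in the opposite direction. As with any in-memory sketch of this kind, the record layout, the U_* names and the int array standing in for the disk are assumptions, and appending the abort records is left out.

```c
#include <assert.h>
#include <stddef.h>

enum undo_type { U_BEGIN, U_OVAL, U_COMMIT, U_ABORT };

struct undo_rec {                /* hypothetical log record layout */
    enum undo_type type;
    int tid;
    int slot;                    /* U_OVAL only: which data item */
    int old_value;               /* U_OVAL only: value before the update */
};

/* Has transaction tid already committed or aborted? */
static int finished(const struct undo_rec *log, size_t n, int tid)
{
    for (size_t i = 0; i < n; i++)
        if (log[i].tid == tid &&
            (log[i].type == U_COMMIT || log[i].type == U_ABORT))
            return 1;
    return 0;
}

/* Restore old values of unfinished transactions, newest record first,
 * so the oldest logged value is the one that survives.
 * Returns the number of values restored. */
int undo_recover(const struct undo_rec *log, size_t n, int *disk)
{
    int restored = 0;
    assert(log != NULL && disk != NULL);
    for (size_t i = n; i-- > 0; ) {
        if (log[i].type == U_OVAL && !finished(log, n, log[i].tid)) {
            disk[log[i].slot] = log[i].old_value;
            restored++;
        }
    }
    return restored;
}
```

Scanning backwards matters when a transaction updated the same item more than once: the last value restored is then the one from before the transaction began.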

88 Computer Science 213 © 2006 Donald Acton 331 Logging/Recover Example Application Actions –tid = beginTX –ReadTX(tid, &a, …) –ReadTX(tid, &b, …) –WriteTX(tid, &b, …) –WriteTX(tid, &a, …) –commitTX(tid) Ensure updated a and b written to real file Write commit to log Log File Records –BEGIN –OVAL –COMMIT

89 Computer Science 213 © 2006 Donald Acton 332 Outstanding problems? What about disk write order? –When application writes to disk the operating system decides write time and order –This is a problem for transactions Keeping the log file from growing infinitely large –Log file truncation

90 Computer Science 213 © 2006 Donald Acton 333 fsync() The order of writes is important For example in redo logging –All new values must be written to the log file before the commit is written –All updates to the “real” files need to be onto disk before truncate is written fsync(fd) – will not return until all outstanding writes on the file descriptor are complete

91 Computer Science 213 © 2006 Donald Acton 334 fsync() cont’d fsync() does not guarantee that writes go to the disk in program order If disk write order is important (e.g. when commit is written) then –Call fsync() before writing commit –Write commit –Call fsync() again Could also open file with O_SYNC option
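
The fsync, write commit, fsync sequence might look like the sketch below. The textual record format and the names log_commit() and log_commit_demo() are assumptions; a real log would normally be opened with O_APPEND, and the demo assumes /tmp is writable.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the new-value records, force them to disk, and only then
 * write and force the commit record. Returns 0 on success. */
int log_commit(int log_fd, const char *records)
{
    assert(records != NULL);
    size_t len = strlen(records);
    if (write(log_fd, records, len) != (ssize_t)len)
        return -1;
    if (fsync(log_fd) < 0)           /* new values reach the disk first */
        return -1;
    if (write(log_fd, "COMMIT\n", 7) != 7)
        return -1;
    if (fsync(log_fd) < 0)           /* the commit record is now durable */
        return -1;
    return 0;
}

/* Self-check: returns the log size after one commit, or -1 on error. */
long log_commit_demo(void)
{
    char path[] = "/tmp/logdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    long size = -1;
    if (log_commit(fd, "NVAL a=1\n") == 0)
        size = (long)lseek(fd, 0, SEEK_END);
    close(fd);
    return size;
}
```

The first fsync() ensures that the commit record can never reach the disk ahead of the values it commits; the second makes the commit itself durable before the caller is told it succeeded.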

92 Computer Science 213 © 2006 Donald Acton 335 Shrinking the Log File (Truncation) Truncation is the process of –removing unneeded records from transaction log For redo logging –remove transactions with truncate or abort records For undo logging –Remove transactions with commit or abort records

93 Computer Science 213 © 2006 Donald Acton 336 Layering - revisited STDIO and transaction systems are layers within the application layer Notice that layers don’t have to extend completely across the level they are in When using a layer don’t circumvent it –Example - when using STDIO don’t get the file descriptor and then do your own reads or writes and continue to use the f*() calls

94 Computer Science 213 © 2006 Donald Acton 337 Application Layering Application STDIO Transaction System UNIX I/O File System Disk Drive Keyboard Terminal Tape Audio

95 Computer Science 213 © 2006 Donald Acton 338 Layering in the File System Disks present very similar interfaces but the precise way to control different disk types differs To simplify the task of dealing with different disk types the notion of a virtual disk interface is used Each time a new type of drive is introduced one simply implements the virtual interface
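
In C, a virtual interface of this kind is commonly expressed as a struct of function pointers that each driver fills in. The sketch below is an assumption about what such an interface might contain, not a real kernel API: struct vdisk, its two operations and the RAM-backed driver are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 512

struct vdisk {                       /* hypothetical virtual disk interface */
    int (*read_block)(struct vdisk *d, long blockno, void *buf);
    int (*write_block)(struct vdisk *d, long blockno, const void *buf);
    long nblocks;                    /* capacity in blocks */
    void *state;                     /* driver-private data */
};

/* One driver: a disk backed by ordinary memory. */
static int ram_read(struct vdisk *d, long b, void *buf)
{
    if (b < 0 || b >= d->nblocks)
        return -1;
    memcpy(buf, (char *)d->state + b * BLOCK_SIZE, BLOCK_SIZE);
    return 0;
}

static int ram_write(struct vdisk *d, long b, const void *buf)
{
    if (b < 0 || b >= d->nblocks)
        return -1;
    memcpy((char *)d->state + b * BLOCK_SIZE, buf, BLOCK_SIZE);
    return 0;
}

/* Fill in the interface; storage must hold nblocks * BLOCK_SIZE bytes. */
void ramdisk_init(struct vdisk *d, void *storage, long nblocks)
{
    assert(d != NULL && storage != NULL && nblocks > 0);
    d->read_block = ram_read;
    d->write_block = ram_write;
    d->nblocks = nblocks;
    d->state = storage;
}
```

Code in the file system layer would call d->read_block() and d->write_block() only, so supporting a new drive type means supplying another pair of functions and an init routine.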

96 Computer Science 213 © 2006 Donald Acton 339 Yet Another Layer Application STDIO Transaction System UNIX I/O File System Other Devices Virtual Disk Interface SCSI ESDI IDE Disk Drive

97 Computer Science 213 © 2006 Donald Acton 340 Extending the File System Layering makes it “easy” to extend the file system architecture provided the various boundaries are well defined Example: –Journaling/logging file systems –Network File Systems (NFS) –iSCSI Just insert the new service at the appropriate layer

98 Computer Science 213 © 2006 Donald Acton 341 Inserting New Functionality Application UNIX I/O File System: Unix FFS, Logging FS, NFS Client Other Devices Virtual Disk Interface SCSI IDE iSCSI Network Protocol Stack

99 Computer Science 213 © 2006 Donald Acton 342 Layering Yet Again! Application programs Operating system Hardware General Layering Structure Application Transport Network Link Network Layering Application Unix I/O File System Disk Drive File System Layering

