CPS 310 first midterm exam, 2/26/2014

Your name please:

Part 1. More fun with forks

    int main()
    {
      int i = 0;
      if (fork() != 0) {
        i = i + 1;
        if (fork() != 0)
          exit(0);
      }
      fork();
      printf("%d\n", i);
    }

(a) What is the output generated by this program? In fact the output is not uniquely defined, i.e., it is not necessarily the same in each execution. What are the possible outputs? (Assume that there are no errors or failures.) [20 pts]

Six possible combinations of two 1’s and two 0’s: 1100, 1010, 1001, 0110, 0101, 0011.

(b) Briefly justify/explain your answer for (a). Try to characterize the set of all possible outputs. [20 pts]

To get this right, you had to understand a few of the subtleties of fork(). We can see that any process running this program prints at most a single integer. If it prints an integer, then it prints just before it exits. The processes have no dependencies (e.g., wait*) that would constrain their execution order. Thus they are concurrent. Therefore they could complete in any order, and therefore their outputs could appear in any order.

There are four processes that print an integer. (Line numbers count from "int main()" as line 1.) The initial process (call it P0) forks a child C0 at line 4. The child C0 skips the if block but then forks a child (call it C01) at line 9. Both C0 and C01 print a “0”, since neither increments i. On the other branch, P0 continues into the if at line 4, increments i, and forks another child C1. P0 then exits at line 7. C1 continues and hits the fork at line 9 to fork a child C11. C1 and C11 both print a “1” since i==1 in C1 before the fork of C11, and therefore i==1 in C11 as well (C11 starts with a cloned copy of its parent C1’s virtual address space).
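One way to check the Part 1 answer empirically is to run the same fork logic with every process writing its digit into a shared pipe, so that a collector can read back the digits in arrival order. The following is only a sketch under assumptions: the names puzzle and run_puzzle are made up for this illustration, the pipe and the extra outer fork are not part of the exam program, and _exit replaces the implicit return from main.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The exam program's logic, except that each surviving process writes
   its digit to fd (instead of printf) and then leaves with _exit. */
static void puzzle(int fd)
{
    int i = 0;
    if (fork() != 0) {
        i = i + 1;
        if (fork() != 0)
            _exit(0);            /* P0 exits without printing */
    }
    fork();
    char c = (char)('0' + i);
    if (write(fd, &c, 1) != 1)
        _exit(1);
    _exit(0);
}

/* Runs the puzzle once. Stores the digits in arrival order in out
   (NUL-terminated, cap must be at least 1) and returns how many digits
   were collected, or -1 on error. */
int run_puzzle(char *out, size_t cap)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {              /* this child plays the role of P0 */
        close(fds[0]);
        puzzle(fds[1]);
    }
    close(fds[1]);               /* otherwise read would never see EOF */
    size_t n = 0;
    char c;
    while (n + 1 < cap && read(fds[0], &c, 1) == 1)
        out[n++] = c;
    out[n] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (int)n;
}
```

Calling run_puzzle repeatedly with a buffer of, say, 8 bytes may produce different orderings from run to run, but each run should collect exactly two 0’s and two 1’s.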
Part 2. True/False

The following true/false questions pertain to the classic C/Unix (or Android) environment as discussed in class. For each statement, indicate (in the space on the left) whether it is true (T) or false (F). Please add a brief comment to explain your answer in the space provided. [40 points]

(a) The first user-mode instruction that executes in any Unix process is in a system call stub.

True-ish: every process starts in a return from a fork() trap, which executes the instruction after the trap, which is in a system call stub. The possible exception is init, i.e., the first process after boot, which is “hand-crafted” by the kernel. Only a handful of students got this. Generous partial credit, but no points for saying stubs execute in kernel mode.

(b) Every thread context switch results from a timer interrupt or a sleep operation.

False: a context switch could also occur as a result of some thread exiting, making the core available, or a higher-priority thread becoming ready to run, preempting a running thread. Some people said (in effect) that a context switch could occur at any time, for any reason. That was half credit if you said that the kernel could switch any time it has control via a trap or a fault: generally it would not, but it is true that it could, and some may. But it’s only worth half, since there is a muuuch better answer. A few students said a context switch could also result from a thread blocking. But that’s what I meant by “a sleep operation”.

(c) A successful exec* system call trap never returns to the process that invoked it.

This is a bad question. Only one person gave me the answer I expected, a sure sign of a problem. Sure enough, a review of my slides reveals some ambiguity on this point, specifically a slide that says the exec* syscall “never returns”. So in recording scores I bumped everyone’s score up by 4 points. The correct answer is False!
Exec* does return control to the process that invoked it: refer to the familiar figure below. But the statement is “truthy”. Exec* makes dramatic changes to the process address space and to the saved thread context before returning from the trap to re-enter user mode. In particular, it overlays/replaces the process user address space with initial state for the program to be executed, and it makes changes to the saved register context to redirect control into main(). So an exec* syscall trap never returns to the program that invoked it. But it’s the same process, with the same process ID.

[Figure: process timeline (time axis) with labels: fork (parent), fork (child), parent program initializes child context, exec, exit, wait.]

Grading note: the true/false questions are 4 points each. I assigned points based on your T/F answer and the explanatory statement. A check means you got the points. A horizontal line means you got half the points. A backward slash means I didn’t give you any points.
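The point of (c), that exec* replaces the program but not the process, can be observed directly: the process ID a parent gets back from fork() should match the ID reported by whatever program the child execs. This is a small sketch under assumptions, not the course's code: the function name exec_keeps_pid is invented here, and it assumes /bin/sh exists and expands $$ to the shell's own process ID, as POSIX shells do.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the exec'd program reports the same pid that fork()
   gave the parent, 0 if the pids differ, and -1 on error. */
int exec_keeps_pid(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);   /* new program's stdout is the pipe */
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127);                    /* reached only if exec fails */
    }
    close(fds[1]);
    char buf[32];
    size_t n = 0;
    ssize_t r;
    while (n < sizeof buf - 1 &&
           (r = read(fds[0], buf + n, sizeof buf - 1 - n)) > 0)
        n += (size_t)r;
    buf[n] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n == 0)
        return -1;
    return strtol(buf, NULL, 10) == (long)pid;
}
```

Note that the line after execl runs only on failure: on success, control returns to user mode in the new program, which is the “truthy” part of the statement.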
(d) A successful exec* system call reads data from a file (among other things that it does).

True. In particular, the program to be executed is a file, and the exec* system call must read at least portions of that file (e.g., its header) to obtain a list of sections and their locations in the file and in the virtual memory, so that it knows how to set up the address space. Inside the kernel, the exec* syscall handler calls some of the internal code also used for the read syscall.

(e) A machine fault always indicates some kind of error in the software.

False: a fault could also be a virtual page fault, which is not an error, but a normal expected occurrence indicating a reference to a virtual page that is not in machine memory, i.e., a miss in the page cache. Some students said faults could also indicate a hardware error. That was worth 2 or 4 points depending on… e.g., mentioning a power-failing fault got you all the points.

(f) Interrupt handlers execute entirely in kernel mode.

True. Handlers for traps, faults, and interrupts execute in the kernel. Some answers said an interrupt handler could call into user mode. Interrupt handlers may interact with user-mode code, but they generally do that by waking up a thread that has been designated to handle the interaction and is waiting for the interrupt handler to wake it up. Once that thread is running, you’re not in the interrupt handler anymore. Alternatively, the kernel might notify a user program of an interrupt by redirecting a thread, i.e., changing its saved PC to point somewhere else in the user program before resuming it. For example, it might notify a user process of a timer tick interrupt (indicating a certain interval of time had passed) by redirecting the process main thread into a registered user-mode handler procedure for the SIGALRM signal. But then the signal handler executes on the thread, so it is not part of the interrupt handler.

(g) A child process runs with the user ID of its parent.
True or false. A child inherits the user ID of its parent, but the child may change its user ID by either of two means. First, it might execute a setuid system call (e.g., like the login program). Second, it might use an exec* syscall to execute a program file that has the setuid bit set (e.g., like the sudo program).

(h) A pipe can be used to communicate only among siblings of a common parent.

False. It is true that communication through a pipe is limited to processes that descend from the process that created the pipe (including the creator itself): the descriptors returned by the pipe() syscall can be inherited by a child, but that is the only way to transmit them to another process. Even so, a process may use a pipe to communicate with itself, or to communicate with a child, as in the childin/childpipe examples.
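Both non-sibling cases from (h) are short enough to sketch. This is a minimal illustration, not the childin/childpipe code from class; the function names pipe_to_self and pipe_to_parent are invented for it.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A process talks to itself: write into the pipe, then read it back.
   Returns 1 on success, 0 on mismatch, -1 if pipe() fails. */
int pipe_to_self(void)
{
    int fds[2];
    char buf[8] = {0};
    if (pipe(fds) != 0)
        return -1;
    ssize_t w = write(fds[1], "self", 4);
    ssize_t r = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    close(fds[1]);
    return w == 4 && r == 4 && strcmp(buf, "self") == 0;
}

/* A child talks to its parent: the child inherits both descriptors
   across fork() and writes; the parent reads. Same return values. */
int pipe_to_parent(void)
{
    int fds[2];
    char buf[8] = {0};
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(fds[0]);
        ssize_t w = write(fds[1], "child", 5);
        _exit(w == 5 ? 0 : 1);
    }
    close(fds[1]);
    ssize_t r = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return r == 5 && strcmp(buf, "child") == 0;
}
```

In neither function are the two endpoints siblings, which is enough to make the statement false.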
(i) A running program may read its standard input (stdin) from a network socket.

True. A socket descriptor may be used in the same way as any I/O descriptor, e.g., using read/write/close/dup2 system calls, and other obscure system calls that a process might use to operate on its stdin. For example, in a recitation I discussed a simple example of a multi-process server called catserver. It accepts a connection from the network, forks a child, passes the socket to the child (using dup2) as the child’s stdin and stdout, and exec*s a cat (or other designated program) in the child. You can connect to catserver through a network with a client program (e.g., telnet). Like any self-respecting network server, catserver can handle many concurrent client sessions.

Some students mentioned that sockets are bidirectional, i.e., they can transfer data from client to server or from server to client. That’s true, but it doesn’t stop them from being used as stdin, even though stdin only requires data transfer in one direction. In fact, because sockets are bidirectional, the same socket may be used for both stdin and stdout, as in the catserver example above.

(j) Multiple processes in a single pipeline can execute at the same time.

True. The processes in a pipeline job are concurrent. It is true that they might wait for one another, i.e., if a data transfer buffer for the pipe inside the kernel is full or empty. But they might not ever have to wait for one another. In that case they may execute at the same time, e.g., on different cores.
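The dup2-then-exec step described for catserver in (i) can be tried without a network by substituting a local socket pair for the accepted connection. This is a sketch under that assumption, not the recitation's catserver: the function name cat_over_socket is invented, socketpair() stands in for accept(), and it assumes a cat program can be found on the PATH.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends msg to a child running cat whose stdin and stdout are both the
   same socket, and collects what cat echoes back into reply
   (NUL-terminated, cap must be at least 1). Returns the number of bytes
   echoed, or -1 on error. */
int cat_over_socket(const char *msg, char *reply, size_t cap)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(sv[0]);
        dup2(sv[1], STDIN_FILENO);     /* socket becomes stdin... */
        dup2(sv[1], STDOUT_FILENO);    /* ...and stdout */
        close(sv[1]);
        execlp("cat", "cat", (char *)NULL);
        _exit(127);                    /* reached only if exec fails */
    }
    close(sv[1]);
    size_t len = strlen(msg);
    if (write(sv[0], msg, len) != (ssize_t)len) {
        close(sv[0]);
        waitpid(pid, NULL, 0);
        return -1;
    }
    shutdown(sv[0], SHUT_WR);          /* cat's next read returns 0 (EOF) */
    size_t n = 0;
    ssize_t r;
    while (n + 1 < cap && (r = read(sv[0], reply + n, cap - 1 - n)) > 0)
        n += (size_t)r;
    reply[n] = '\0';
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return (int)n;
}
```

The cat program itself is unchanged: it only reads descriptor 0 and writes descriptor 1, and has no idea that both name a socket.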
Part 3. Reference counts

The following questions pertain to the classic C/Unix (or Android) environment as discussed in class. Answer each question with a few phrases or maybe a sentence or two. [40 points]

(a) The Unix kernel uses reference counting to manage the lifetimes of various objects. The reference counts are incremented and decremented during the execution of various system calls. List five system calls that increment reference counts on objects in the kernel.

Fork, exec*, open, socket, pipe, dup*, link, mkdir, mmap, symlink.

(b) List five system calls that decrement reference counts on objects in the kernel.

Exit, exec*, close, dup2, unlink (remove), rmdir, munmap.

(c) Are there any cases in which a fault handler might increment or decrement reference counts on objects in the kernel? Cite example(s) and/or explain.

Yes. A fault handler might terminate a process, which is equivalent to exit: it closes all open descriptors and releases (unmaps) all virtual memory segments referenced by the process.

(d) Are there any cases in which the kernel might store a reference count on disk? Cite example(s) and/or explain.

Yes. In particular, hard links for a file (inode) are examples of reference counts. Those are stored on disk: the file system name tree (folders, directories, file names) persists across system restarts. Some students suggested that the kernel would not store a reference count on disk because it could not protect the count from a user process. But the kernel controls access to the disk. User programs access the disk via system calls, which pass through the kernel. The kernel does not allow user programs to corrupt/destroy the file system, e.g., by writing nonsense into inodes (unless the process is running with userID root/superuser).
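The on-disk reference count from (d) is visible to user programs as the st_nlink field returned by stat/fstat. The sketch below watches it move as link and unlink run; the function name link_counts and the scratch path under /tmp are assumptions made for this illustration, and it assumes the file system holding /tmp supports hard links.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Records the inode's link count after create, after link, and after
   unlink of the second name. Returns 0 on success, -1 on error. */
int link_counts(long counts[3])
{
    char path[] = "/tmp/nlinkXXXXXX";      /* hypothetical scratch file */
    char alias[sizeof path + 4];
    struct stat st;
    int rc = -1;

    int fd = mkstemp(path);                /* creates the file: one name */
    if (fd < 0)
        return -1;
    snprintf(alias, sizeof alias, "%s.ln", path);

    if (fstat(fd, &st) != 0)
        goto out;
    counts[0] = (long)st.st_nlink;
    if (link(path, alias) != 0)            /* second name, same inode */
        goto out;
    if (fstat(fd, &st) != 0) {
        unlink(alias);
        goto out;
    }
    counts[1] = (long)st.st_nlink;
    if (unlink(alias) != 0)                /* drop the second name */
        goto out;
    if (fstat(fd, &st) != 0)
        goto out;
    counts[2] = (long)st.st_nlink;
    rc = 0;
out:
    unlink(path);                          /* drop the first name too */
    close(fd);
    return rc;
}
```

On a typical local file system the three recorded counts would be 1, 2, and 1.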
(e) True or false: dangling references cannot occur when reference counting is used (correctly). Explain.

Uh, that’s true, right? That’s the purpose of reference counting: to free objects when you’re done with them, and not before. If you free them only after you’re done with them, then you won’t have a dangling reference, right? Because when you’re done with the reference, that means exactly that you won’t try to use it again. (?)

I intended this to be easy and straightforward but lots of students tried to second-guess me. You couuuuld have a dangling reference: the identifier might still be there in memory! If a dangling reference resides in memory and no thread tries to use it, is it still a dangling reference? Some students noted that you could have dangling references on one kind of object even if you used reference counting for some other kind of object. Ah…true…I hadn’t thought of that. Or (my favorite): even if you use reference counting correctly, you couuuld still make a mistake! Anyway, I gave credit as seemed appropriate for clarity and correctness.

(f) List two operations in Android that increment reference counts maintained by Android system software. What does Android do when these reference counts go to zero?

We discussed two examples of reference-counting in Android: (1) Components in an app process are reference-counted. Each time a component is activated (via an intent), Android increments the reference count on the process. (2) Android keeps a reference count for the number of clients bound to a Service component. When a new client binds to the Service (via an intent), Android increments the reference count on the Service. In general, when the reference count on a component goes to zero, the system MAY deactivate it, e.g., after leaving it cached in memory for a while in case it becomes active again.
Similarly (going back to (1)) if an app process has no active components, then the system MAY deactivate the process, e.g., if the system is low on available memory.
Part 4. Cats

As you know, cat is a simple standard Unix program that invokes read/write system calls in a loop to transfer bytes from its standard input (stdin) to its standard output (stdout). These questions ask you to explain various interactions of cats (processes running the cat program) with one another and with the kernel.

(a) Consider this command line to a standard shell: “cat <in | cat”. How does the second cat know when it is done, i.e., what causes it to exit? [10 points]

The second cat is done when it reads an EOF (end-of-file) from its stdin, i.e., from the first cat via a pipe. A pipe read returns EOF when the pipe buffer is empty AND no process has the pipe open for writing. I knocked just a few points off for missing some of the details.

Some students who mentioned EOF suggested that the first cat writes an EOF into the pipe. But it doesn’t: EOF isn’t really a character, even though you can signal an EOF at the terminal using ctrl-d (done?). EOF is a condition: the read system call returns zero. It means there is no more data to read: e.g., the pipe is closed, the terminal is ctrl-done, there are no more bytes in the file, the peer on a socket has disconnected, and all the bytes obtained from the object have already been read.

(b) Suppose that the program called empty is the null C program: int main() {}. Consider this command line to a standard shell: “cat | empty”. What does it do? How does the cat know when it is done, i.e., what causes it to exit? [10 points]

The second process (running empty) exits immediately, closing the read end of the pipe. The first process (running cat) sleeps waiting for terminal input, and after receiving a line, attempts to write the line to the pipe. If a pipe is closed for reading (no process has its read side open) then data written into the pipe can never be read. In this case, the kernel delivers a SIGPIPE signal to any process that tries to write to the pipe via a write system call.
The default action of SIGPIPE is to terminate the process.
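The fate of the cat in “cat | empty” can be reproduced in a few lines: a child writes into a pipe whose read end nobody holds open, and the parent inspects how the child died. This is a minimal sketch; the function name signal_from_readerless_write is invented for the illustration, and the read end is closed before the fork so that the child cannot race ahead and write while a reader still exists.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of the signal that terminated a child writing to a
   pipe with no reader, 0 if the child exited normally, -1 on error. */
int signal_from_readerless_write(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                   /* like empty exiting: no reader left */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGPIPE, SIG_DFL);    /* make sure the default action applies */
        if (write(fds[1], "line\n", 5) < 0)
            _exit(1);
        _exit(0);                    /* reached only if SIGPIPE did not kill us */
    }
    close(fds[1]);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With the default disposition in place, the returned value should equal SIGPIPE: the child never gets to see the write call return.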
(c) Consider this command line to a standard shell: “cat <in >out”. You may assume that the current directory resides on a disk, and that “in” is a file with some substantial amount of data in it. This question asks you to explain how this cat consumes memory and CPU time. Please answer on the following page. [60 points]

First, how much memory does it consume? Draw a picture of the page table and the segments of the virtual address space, with your best guess of the total size. I am looking for a rough sketch of these data structures as they would reside in the memory of the machine. Details vary, so you may make any reasonable simplifying assumptions about the machine or the cat program, but please note them in your answer.

For this answer, I was looking for a list/cartoon of segments in the address space, a page table indexed by Virtual Page Number and pointing into machine memory, and maybe (extra bonus) an inode block map pointing to where the files in and out are stored on disk. All of these figures can be found in the slides.

The important points are that cat reads (using a read syscall) from in into a fixed-size memory buffer and then writes the buffer contents to out (using a write system call). Since the file in is large, it might go through the read/write loop many times. But it reuses the same buffer each time: the amount of buffer memory is fixed. In fact, cat is very small: the text, stack, heap, and global data segments are probably one or two pages each. The buffer might be a few pages. A complete answer would note (or illustrate) that page tables are typically hierarchical, so that the page table would also be small, e.g., just a few page frames. I didn’t hear much about that, so this topic might need more exposure.

Second, how much CPU time does it consume? How does the CPU time vary with the size of the file in? How much of the time is spent in kernel mode vs. user mode?
Draw a rough sketch of how the cat transitions between user mode, kernel mode, and sleep states as it executes through time. What events cause the transitions?

As noted above, cat might go through its read/write loop many times. On each iteration:

- From cat program in user mode, trap to kernel for read syscall.
- Read syscall runs on the same thread in kernel mode, initiates disk I/O, and blocks (sleep).
- Disk I/O completes, interrupt handler wakes up thread.
- Thread, still running in kernel mode in read syscall, copies data from kernel buffer into cat’s buffer in user space.
- Return to user mode in cat program. Cat immediately traps back to kernel for write syscall.
- Write syscall runs on the same thread in kernel mode, copies data from cat’s buffer into kernel buffer.
- Write syscall initiates disk I/O, and blocks (sleep).
- Disk I/O completes, interrupt handler wakes up thread.
- Thread returns to cat program in user mode.
So, the important points are:

- Cat spends most of its time sleeping, waiting for disk I/O.
- Most of the rest of the time is spent in kernel mode (e.g., copying).
- Cat spends very little time per iteration in user mode.
- The total time spent in user mode and kernel mode scales with the number of iterations.
- i.e., it scales with the size of the file.

Your mileage may vary! In these answers, I was looking for a general confident awareness of what is going on. There are two kinds of trouble spots that lost points: omissions and garbled explanations. The most common omissions were the hierarchical page table, the kernel’s role in copying data to/from user space, and the detail that cats spend most of their time sleeping. I got several forms of garble:

- Some students are confused about what happens on a kernel trap. The core switches to kernel mode and the thread keeps executing in kernel mode, in the syscall handler. Some students said that the thread sleeps until the kernel returns. That seems like it makes sense: that’s what happens when you call a server with RPC. But kernel mode is different: the thread/process merely enters the kernel and keeps executing.
- Some students talked about pipes, but there are no pipes in the example. There was some confusion about how the buffering works: some students talked about the buffer being “full” or “empty”. That language applies to pipes, but file reads and writes transfer data between a kernel buffer and disk using DMA, and then interrupt when the transfer is done: the interrupt handler wakes up any thread waiting for the transfer. I generally let this go since we haven’t talked much about disk I/O and file systems yet.
- There are still some students who think cat would inhale (read) the entire input file into memory and only then write it out. E.g., some students said that the memory cost would scale with the size of the file. But cat and other Unix programs don’t work that way: they read and write in chunks iteratively.
It is true that the kernel may use surplus memory to cache the file, e.g., keep a copy in memory in case the file is accessed again later. But caching is optional: in fact, “cat <in >out” can handle files that may be much larger than memory.
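The chunked read/write loop that this answer keeps coming back to can be sketched as follows. This is not the source of any real cat: the function name copy_fd and the 4096-byte buffer size are assumptions chosen for illustration, and the sketch does not retry reads or writes interrupted by signals.

```c
#include <unistd.h>

enum { CHUNK = 4096 };   /* assumed buffer size: roughly one page */

/* Copies bytes from descriptor in to descriptor out until EOF, reusing
   one fixed-size buffer. Returns the number of bytes copied, or -1 on a
   read or write error. */
long copy_fd(int in, int out)
{
    char buf[CHUNK];
    long total = 0;
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) > 0) {    /* 0 means EOF */
        ssize_t off = 0;
        while (off < n) {                            /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

A main that simply calls copy_fd(STDIN_FILENO, STDOUT_FILENO) behaves like a bare-bones cat. The memory footprint is the same whether the input is five bytes or far larger than memory; only the number of loop iterations, and hence the number of traps into the kernel, grows with the file.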