CS162 Operating Systems and Systems Programming
Midterm Review
March 18, 2013
Overview
Synchronization, Critical Sections
Deadlock
CPU Scheduling
Memory Multiplexing, Address Translation
Caches, TLBs
Page Allocation and Replacement
Kernel/User Interface
Synchronization, Critical Sections
Definitions
Synchronization: using atomic operations to ensure cooperation between threads
Mutual exclusion: ensuring that only one thread does a particular thing at a time
–One thread excludes the other while doing its task
Critical section: piece of code that only one thread can execute at once
–A critical section is the result of mutual exclusion
–Critical section and mutual exclusion are two ways of describing the same thing
Locks: Using Interrupts
Key idea: maintain a lock variable and impose mutual exclusion only during operations on that variable

  int value = FREE;

  Acquire() {
    disable interrupts;
    if (value == BUSY) {
      put thread on wait queue;
      go to sleep();  // Enable interrupts?
    } else {
      value = BUSY;
    }
    enable interrupts;
  }

  Release() {
    disable interrupts;
    if (anyone on wait queue) {
      take thread off wait queue;
      place on ready queue;
    } else {
      value = FREE;
    }
    enable interrupts;
  }
Better Locks: Using test&set

  test&set(&address) {  /* most architectures */
    result = M[address];
    M[address] = 1;
    return result;
  }

  int guard = 0;
  int value = FREE;

  Acquire() {
    // Short busy-wait time
    while (test&set(guard));
    if (value == BUSY) {
      put thread on wait queue;
      go to sleep() & guard = 0;
    } else {
      value = BUSY;
      guard = 0;
    }
  }

  Release() {
    // Short busy-wait time
    while (test&set(guard));
    if (anyone on wait queue) {
      take thread off wait queue;
      place on ready queue;
    } else {
      value = FREE;
    }
    guard = 0;
  }
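As a compilable illustration of the test&set primitive itself, here is a minimal pure spinlock sketch using GCC's __sync_lock_test_and_set builtin. Note this collapses the slide's two-variable scheme (guard protecting value plus a wait queue) into a single spinning lock, so it only captures the "short busy-wait" part; the builtin choice is an assumption of this sketch.

  static volatile int guard = 0;

  void acquire(void) {
      // __sync_lock_test_and_set atomically stores 1 and returns the old
      // value: exactly the test&set behavior the slide assumes.
      while (__sync_lock_test_and_set(&guard, 1))
          ;  // old value was 1: someone else holds the lock, keep spinning
  }

  void release(void) {
      __sync_lock_release(&guard);  // atomically stores 0 (release semantics)
  }

The slide's version is better for long waits because a sleeping thread frees the CPU; a pure spinlock like this one only makes sense when the critical section is very short.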
Discussion
Does this busy-wait? If so, how long does it wait?
Why is this better than the version that disables interrupts?
Remember that a thread can be context-switched while holding a lock. What implications does this have for the performance of the system?
Semaphores
Definition: a semaphore has a non-negative integer value and supports the following two operations:
–P(): an atomic operation that waits for the semaphore to become positive, then decrements it by 1
»Think of this as the wait() operation
–V(): an atomic operation that increments the semaphore by 1, waking up a waiting P(), if any
»Think of this as the signal() operation
–You cannot read the integer value! The only methods are P() and V().
–Unlike a lock, there is no owner (and thus no priority donation). The thread that calls V() may never call P().
Example: producer/consumer
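A minimal bounded-buffer producer/consumer sketch with POSIX semaphores, where sem_wait plays P() and sem_post plays V(); the buffer size, names, and the mutex-protected indices are illustrative choices, not part of the slide.

  #include <pthread.h>
  #include <semaphore.h>

  #define BUF_SIZE 8
  int buffer[BUF_SIZE];
  int in = 0, out = 0;

  sem_t emptySlots;  // counts free slots;  sem_init(&emptySlots, 0, BUF_SIZE)
  sem_t fullSlots;   // counts filled slots; sem_init(&fullSlots, 0, 0)
  pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

  void producer(int item) {
      sem_wait(&emptySlots);            // P(): wait for a free slot
      pthread_mutex_lock(&mutex);
      buffer[in] = item;
      in = (in + 1) % BUF_SIZE;
      pthread_mutex_unlock(&mutex);
      sem_post(&fullSlots);             // V(): announce an item is available
  }

  int consumer(void) {
      sem_wait(&fullSlots);             // P(): wait for an item
      pthread_mutex_lock(&mutex);
      int item = buffer[out];
      out = (out + 1) % BUF_SIZE;
      pthread_mutex_unlock(&mutex);
      sem_post(&emptySlots);            // V(): announce a slot is free
      return item;
  }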
Condition Variables
Condition variable: a queue of threads waiting for something inside a critical section
–Key idea: allow sleeping inside the critical section by atomically releasing the lock at the time we go to sleep
–Contrast to semaphores: can't wait inside a critical section
Operations:
–Wait(&lock): atomically release lock and go to sleep; re-acquire lock later, before returning
–Signal(): wake up one waiter, if any
–Broadcast(): wake up all waiters
Rule: must hold the lock when doing condition variable operations!
Complete Monitor Example (with condition variable)
Here is an (infinite) synchronized queue:

  Lock lock;
  Condition dataready;
  Queue queue;

  AddToQueue(item) {
    lock.Acquire();           // Get lock
    queue.enqueue(item);      // Add item
    dataready.signal();       // Signal any waiters
    lock.Release();           // Release lock
  }

  RemoveFromQueue() {
    lock.Acquire();           // Get lock
    while (queue.isEmpty()) {
      dataready.wait(&lock);  // If nothing, sleep
    }
    item = queue.dequeue();   // Get next item
    lock.Release();           // Release lock
    return(item);
  }
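The same monitor pattern, sketched with pthreads; the fixed-size array queue and the names are illustrative. Pthread condition variables are Mesa-style (next slide), which is why the wait sits in a while loop here too.

  #include <pthread.h>

  pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  pthread_cond_t dataready = PTHREAD_COND_INITIALIZER;

  #define QSIZE 64
  int items[QSIZE];
  int head = 0, count = 0;

  void AddToQueue(int item) {
      pthread_mutex_lock(&lock);
      items[(head + count) % QSIZE] = item;  // add item (assumes not full)
      count++;
      pthread_cond_signal(&dataready);       // wake one waiter, if any
      pthread_mutex_unlock(&lock);
  }

  int RemoveFromQueue(void) {
      pthread_mutex_lock(&lock);
      while (count == 0)                        // re-check after waking (Mesa)
          pthread_cond_wait(&dataready, &lock); // atomically release and sleep
      int item = items[head];
      head = (head + 1) % QSIZE;
      count--;
      pthread_mutex_unlock(&lock);
      return item;
  }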
Mesa vs. Hoare Monitors
Need to be careful about the precise definition of signal and wait. Consider a piece of our dequeue code:

  while (queue.isEmpty()) {
    dataready.wait(&lock);  // If nothing, sleep
  }
  item = queue.dequeue();   // Get next item

Why didn't we do this?

  if (queue.isEmpty()) {
    dataready.wait(&lock);  // If nothing, sleep
  }
  item = queue.dequeue();   // Get next item

Answer: it depends on the type of scheduling
–Hoare-style (most textbooks):
»Signaler gives lock and CPU to the waiter; waiter runs immediately
»Waiter gives lock and processor back to the signaler when it exits the critical section or waits again
–Mesa-style (most real operating systems):
»Signaler keeps the lock and the processor
»Waiter placed on ready queue with no special priority
»Practically, need to check the condition again after wait
Readers/Writers
Two types of lock: read lock and write lock (i.e., shared lock and exclusive lock)
Shared locks are an example where you might want to do broadcast/wakeAll on a condition variable. If you broadcast in other situations, correctness is generally preserved, but it is inefficient: one thread gets in and then the rest of the woken threads go back to sleep.
Readers/Writers

  Reader() {
    // check into system
    lock.Acquire();
    while ((AW + WW) > 0) {
      WR++;
      okToRead.wait(&lock);
      WR--;
    }
    AR++;
    lock.Release();

    // read-only access
    AccessDbase(ReadOnly);

    // check out of system
    lock.Acquire();
    AR--;
    if (AR == 0 && WW > 0)
      okToWrite.signal();
    lock.Release();
  }

  Writer() {
    // check into system
    lock.Acquire();
    while ((AW + AR) > 0) {
      WW++;
      okToWrite.wait(&lock);
      WW--;
    }
    AW++;
    lock.Release();

    // read/write access
    AccessDbase(ReadWrite);

    // check out of system
    lock.Acquire();
    AW--;
    if (WW > 0) {
      okToWrite.signal();
    } else if (WR > 0) {
      okToRead.broadcast();
    }
    lock.Release();
  }

What if we remove this line?
Readers/Writers Revisited
Same code as above, except the reader's check-out now ends with okToWrite.broadcast() instead of okToWrite.signal(). What if we turn signal into broadcast?
Readers/Writers Revisited
Same code again, but with okToRead and okToWrite merged into a single condition variable okContinue: both readers and writers wait on okContinue, and the reader's check-out signals okContinue. What if we turn okToWrite and okToRead into okContinue?
Readers/Writers Revisited
With the single okContinue variable and signal (not broadcast) at reader check-out, consider this schedule:
–R1 arrives
–W1 and R2 arrive while R1 reads
–R1 finishes and signals once on okContinue
The single signal may wake R2 rather than W1; R2 re-checks its condition, sees WW > 0, and goes back to sleep. No thread is left to wake W1, so everyone sleeps forever.
Readers/Writers Revisited
Need to change the reader's check-out to okContinue.broadcast()! Why? With one condition variable shared by readers and writers, a single signal can be consumed by a thread whose condition still doesn't hold. Broadcast wakes every waiter; the one that can actually proceed will, and the rest go back to sleep.
Example Synchronization Problems
Fall 2012 Midterm Exam #3
Spring 2013 Midterm Exam #2
Hard Synchronization Problems from Past Exams:
http://inst.eecs.berkeley.edu/~cs162/fa13/hand-outs/synch-problems.html
Deadlock
Four Requirements for Deadlock
Mutual exclusion
–Only one thread at a time can use a resource
Hold and wait
–Thread holding at least one resource is waiting to acquire additional resources held by other threads
No preemption
–Resources are released only voluntarily by the thread holding the resource, after the thread is finished with it
Circular wait
–There exists a set {T1, …, Tn} of waiting threads
»T1 is waiting for a resource that is held by T2
»T2 is waiting for a resource that is held by T3
»…
»Tn is waiting for a resource that is held by T1
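A minimal sketch that meets all four conditions at once: two pthreads acquiring the same two mutexes in opposite order (thread and mutex names are illustrative). Mutexes give mutual exclusion and no preemption; the sleep makes each thread hold one lock while waiting for the other, and the opposite acquisition order closes the circular wait.

  #include <pthread.h>
  #include <unistd.h>

  pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;
  pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;

  void *t1(void *arg) {
      pthread_mutex_lock(&r1);  // hold r1...
      sleep(1);                 // ...long enough for t2 to grab r2
      pthread_mutex_lock(&r2);  // ...and wait for r2: hold-and-wait
      pthread_mutex_unlock(&r2);
      pthread_mutex_unlock(&r1);
      return NULL;
  }

  void *t2(void *arg) {
      pthread_mutex_lock(&r2);  // hold r2...
      sleep(1);
      pthread_mutex_lock(&r1);  // ...and wait for r1: circular wait with t1
      pthread_mutex_unlock(&r1);
      pthread_mutex_unlock(&r2);
      return NULL;
  }

  int main(void) {
      pthread_t a, b;
      pthread_create(&a, NULL, t1, NULL);
      pthread_create(&b, NULL, t2, NULL);
      pthread_join(a, NULL);    // never returns: the two threads deadlock
      return 0;
  }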
Resource Allocation Graph Examples
Recall:
–Request edge: directed edge Ti → Rj
–Assignment edge: directed edge Rj → Ti
(Figures: a simple resource allocation graph; an allocation graph with deadlock; an allocation graph with a cycle, but no deadlock.)
Deadlock Detection Algorithm
Only one of each type of resource → look for loops
More general deadlock detection algorithm:
–Let [X] represent an m-ary vector of non-negative integers (quantities of resources of each type):
»[FreeResources]: current free resources of each type
»[RequestX]: current requests from thread X
»[AllocX]: current resources held by thread X
–See if tasks can eventually terminate on their own:

  [Avail] = [FreeResources]
  Add all nodes to UNFINISHED
  do {
    done = true
    foreach node in UNFINISHED {
      if ([Request_node] <= [Avail]) {
        remove node from UNFINISHED
        [Avail] = [Avail] + [Alloc_node]
        done = false
      }
    }
  } until(done)

–Nodes left in UNFINISHED → deadlocked
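A compilable sketch of the detection loop above; the fixed sizes, global arrays, and the bitmask standing in for UNFINISHED are choices of this sketch, not part of the slide.

  #define NTHREADS 4
  #define NTYPES   3

  int request[NTHREADS][NTYPES];  // [Request_X]
  int alloc[NTHREADS][NTYPES];    // [Alloc_X]

  // avail[] starts as [FreeResources]; returns a bitmask of deadlocked threads.
  int detect(int avail[NTYPES]) {
      int unfinished = (1 << NTHREADS) - 1;  // all threads start UNFINISHED
      int done = 0;
      while (!done) {
          done = 1;
          for (int t = 0; t < NTHREADS; t++) {
              if (!(unfinished & (1 << t))) continue;
              int fits = 1;
              for (int r = 0; r < NTYPES; r++)
                  if (request[t][r] > avail[r]) { fits = 0; break; }
              if (fits) {                        // t can run to completion...
                  unfinished &= ~(1 << t);
                  for (int r = 0; r < NTYPES; r++)
                      avail[r] += alloc[t][r];   // ...and release what it holds
                  done = 0;
              }
          }
      }
      return unfinished;   // nonzero => those threads are deadlocked
  }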
Banker's Algorithm
Method of deadlock avoidance.
A process specifies the maximum resources it will ever request. A process can initially request less than its max and later ask for more.
Before a request is granted, the Banker's algorithm checks that the system remains "safe" if the requested resources are granted.
Roughly: some process could request its specified max and eventually finish; when it finishes, that process frees up its resources. A system is safe if there exists some sequence by which processes can request their max, terminate, and free up their resources so that all other processes can finish (in the same way).
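The safety check can be seen as the detection loop above run pessimistically: substitute each process's worst case, Need = Max - Alloc, for its current request. A sketch reusing the arrays and detect() from the previous code block (max_[][] and the crude overwriting of request[][] are assumptions of this sketch):

  int max_[NTHREADS][NTYPES];  // declared maximums, one row per process

  // Safe iff no thread is left unfinished when each is assumed to
  // demand everything it might still need.
  int is_safe(int avail[NTYPES]) {
      for (int t = 0; t < NTHREADS; t++)
          for (int r = 0; r < NTYPES; r++)
              request[t][r] = max_[t][r] - alloc[t][r];  // Need = Max - Alloc
      return detect(avail) == 0;
  }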
Example Deadlock Problems
Spring 2004 Midterm Exam #4
Spring 2011 Midterm Exam #2
CPU Scheduling
Scheduling Assumptions
CPU scheduling was a big area of research in the early 70's
Many implicit assumptions for CPU scheduling:
–One program per user
–One thread per program
–Programs are independent
In general these are unrealistic, but they simplify the problem
–For instance: is "fair" about fairness among users or programs?
»If I run one compilation job and you run five, you get five times as much CPU on many operating systems
The high-level goal: dole out CPU time to optimize some desired parameters of the system
(Figure: a timeline of CPU time doled out among USER1, USER2, USER3 in turn.)
Scheduling Metrics
Waiting time: time the job spends waiting in the ready queue
–Time between the job's arrival in the ready queue and the job being launched
Service (execution) time: time the job is running
Response (completion) time:
–Time between the job's arrival in the ready queue and the job's completion
–Response time is what the user sees:
»Time to echo a keystroke in the editor
»Time to compile a program
Response Time = Waiting Time + Service Time
Throughput: number of jobs completed per unit of time
–Throughput is related to response time, but not the same thing:
»Minimizing response time will lead to more context switching than if you only maximized throughput
Scheduling Policy Goals/Criteria
Minimize response time
–Minimize elapsed time to do an operation (or job)
Maximize throughput
–Two parts to maximizing throughput
»Minimize overhead (for example, context switching)
»Efficient use of resources (CPU, disk, memory, etc.)
Fairness
–Share CPU among users in some equitable way
–Fairness is not minimizing average response time:
»Better average response time by making the system less fair
–Tradeoff: fairness gained by hurting average response time!
First-Come, First-Served (FCFS) Scheduling
First-Come, First-Served (FCFS)
–Also "First In, First Out" (FIFO) or "run until done"
»In early systems, FCFS meant one program was scheduled until done (including I/O)
»Now, it means a thread keeps the CPU until it blocks
Example:
  Process  Burst Time
  P1       24
  P2       3
  P3       3
Suppose the processes arrive in the order P1, P2, P3. The Gantt chart for the schedule is:
  P1: 0-24 | P2: 24-27 | P3: 27-30
–Waiting time for P1 = 0; P2 = 24; P3 = 27
–Average waiting time: (0 + 24 + 27)/3 = 17
–Average completion time: (24 + 27 + 30)/3 = 27
Convoy effect: short processes stuck behind a long process
Round Robin (RR)
FCFS scheme: potentially bad for short jobs!
–Depends on submit order
–If you are first in line at the supermarket with milk, you don't care who is behind you; on the other hand…
Round Robin scheme
–Each process gets a small unit of CPU time (time quantum), usually 10-100 milliseconds
–After the quantum expires, the process is preempted and added to the end of the ready queue
–With n processes in the ready queue and time quantum q:
»Each process gets 1/n of the CPU time
»In chunks of at most q time units
»No process waits more than (n-1)q time units
Performance
–q large → FCFS
–q small → interleaved
–q must be large with respect to context-switch cost, otherwise the overhead is too high (all overhead)
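As a concrete check, here is a tiny simulation of RR with q = 4 on the FCFS example from the previous slide (P1 = 24, P2 = 3, P3 = 3, all arriving at time 0); it prints completions at 7, 10, and 30, so average completion time drops from 27 under FCFS to about 15.7. The quantum choice and the fixed scan order (valid here because all jobs arrive together) are assumptions of this sketch.

  #include <stdio.h>

  int main(void) {
      int burst[]     = {24, 3, 3};   // P1, P2, P3
      int remaining[] = {24, 3, 3};
      const int q = 4, n = 3;
      int time = 0, left = n;

      // Cycle through the ready jobs, running each for at most q units.
      while (left > 0) {
          for (int i = 0; i < n; i++) {
              if (remaining[i] == 0) continue;
              int run = remaining[i] < q ? remaining[i] : q;
              time += run;
              remaining[i] -= run;
              if (remaining[i] == 0) {
                  left--;
                  printf("P%d completes at %d (waited %d)\n",
                         i + 1, time, time - burst[i]);
              }
          }
      }
      return 0;
  }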
What if We Knew the Future?
Could we always mirror best FCFS?
Shortest Job First (SJF):
–Run whatever job has the least amount of computation to do
Shortest Remaining Time First (SRTF):
–Preemptive version of SJF: if a job arrives with a shorter time to completion than the remaining time on the current job, immediately preempt the CPU
These can be applied either to a whole program or to the current CPU burst of each program
–Idea is to get short jobs out of the system
–Big effect on short jobs, only small effect on long ones
–Result is better average response time
Discussion
SJF/SRTF are the best you can do at minimizing average response time
–Provably optimal (SJF among non-preemptive, SRTF among preemptive)
–Since SRTF is always at least as good as SJF, focus on SRTF
Comparison of SRTF with FCFS and RR
–What if all jobs are the same length?
»SRTF becomes the same as FCFS (i.e., FCFS is the best you can do if all jobs are the same length)
–What if jobs have varying length?
»SRTF (and RR): short jobs not stuck behind long ones
Summary
Scheduling: selecting a process from the ready queue and allocating the CPU to it
FCFS scheduling:
–Run threads to completion in order of submission
–Pros: simple (+)
–Cons: short jobs get stuck behind long ones (-)
Round-robin scheduling:
–Give each thread a small amount of CPU time when it executes; cycle between all ready threads
–Pros: better for short jobs (+)
–Cons: poor when jobs are the same length (-)
Summary (cont'd)
Shortest Job First (SJF) / Shortest Remaining Time First (SRTF):
–Run whatever job has the least amount of computation / least remaining computation to do
–Pros: optimal (average response time)
–Cons: hard to predict the future; unfair
Multi-level feedback scheduling:
–Multiple queues of different priorities
–Automatic promotion/demotion of process priority in order to approximate SJF/SRTF
Lottery scheduling:
–Give each thread a number of tokens (short tasks get more tokens)
–Reserve a minimum number of tokens for every thread to ensure forward progress/fairness
Example CPU Scheduling Problems
Spring 2011 Midterm Exam #4
Fall 2012 Midterm Exam #4
Spring 2013 Midterm Exam #5
Memory Multiplexing, Address Translation
Important Aspects of Memory Multiplexing
Controlled overlap:
–Processes should not collide in physical memory
–Conversely, we would like the ability to share memory when desired (for communication)
Protection:
–Prevent access to the private memory of other processes
»Different pages of memory can be given special behavior (read-only, invisible to user programs, etc.)
»Kernel data protected from user programs
»Programs protected from themselves
Translation:
–Ability to translate accesses from one address space (virtual) to a different one (physical)
–When translation exists, the processor uses virtual addresses, physical memory uses physical addresses
–Side effects:
»Can be used to avoid overlap
»Can be used to give a uniform view of memory to programs
Why Address Translation?
(Figure: two programs, each with its own virtual address space of code, data, heap, and stack, mapped by per-process translation maps into disjoint regions of a shared physical address space, alongside OS code, OS data, and the OS heap and stacks.)
Addr. Translation: Segmentation vs. Paging
Segmentation: the virtual address is split into a segment # and an offset. The segment # indexes a table of (base, limit, valid) entries; the physical address is base + offset, with an error raised if the offset exceeds the limit or the entry is invalid.
Paging: the virtual address is split into a virtual page # and an offset. The virtual page # indexes the page table (located by PageTablePtr) to get a physical page # and permission bits (V, R, W); after a permission check (access error if it fails), the physical address is the physical page # concatenated with the offset.
Review: Address Segmentation
Virtual address = seg # (top 2 bits) + offset. The virtual memory view has code, data, heap, and stack regions; each segment is mapped into physical memory by the segment table.
Segment table (seg # : base / limit):
–00 : 0001 0000 / 10 0000
–01 : 0101 0000 / 10 0000
–10 : 0111 0000 / 1 1000
–11 : 1011 0000 / 1 0000
Review: Address Segmentation
Same setup as above. What happens if the stack grows to 1110 0000?
Review: Address Segmentation
No room to grow!! The stack segment's limit is exceeded: buffer overflow error.
Review: Paging
Virtual address = page # + offset. The page table maps each virtual page # to a physical page # (null = no mapping). The virtual memory view (code, data, heap, stack) is scattered across physical memory page by page: contiguity within a page is preserved, but the pages themselves can land anywhere.
(Figure: virtual memory view, a 32-entry page table with many null entries, and the resulting physical memory view.)
Review: Paging
Same setup as above. What happens if the stack grows to 1110 0000?
Review: Paging
Allocate new pages where there is room! With paging the stack can keep growing as long as free frames exist.
Challenge: the page table size is equal to the number of pages in virtual memory!
Design a Virtual Memory System
Would you choose segmentation or paging?
Design a Virtual Memory System
How would you choose your page size?
Design a Virtual Memory System
How would you choose your page size? Consider:
–Page table size
–TLB size
–Internal fragmentation
–Disk access
Design a Virtual Memory System
You have a 32-bit address space and 2 KB pages. In a single-level page table, how many bits would you use for the index and how many for the offset?
Design a Virtual Memory System
2 KB = 2^11 bytes → 11-bit offset
32-bit address space - 11-bit offset = 21 bits → 21-bit virtual page number
(Address layout: bits 31-11 hold the virtual page number, bits 10-0 the byte offset.)
Design a Virtual Memory System
What does each page table entry (PTE) look like?
(Figure: the page table as an array indexed by VPN 0, 1, 2, 3, …, holding one PTE per entry.)
Design a Virtual Memory System
sizeof(VPN) == sizeof(PPN) == 21 bits → a 21-bit physical page number
Don't forget the 4 auxiliary bits:
–Valid (resident in memory)
–Read (has read permission)
–Write (has write permission)
–Execute (has execute permission)
That makes a 25-bit PTE; 32 - 25 = 7 unused bits
(Layout: PPN (21 bits) | V R W E (4 bits) | unused (7 bits))
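A sketch of this 32-bit PTE as C helpers. The exact bit positions are a choice of this sketch (PPN in the high 21 bits, flags in the low 4, matching the layout above).

  #include <stdint.h>

  #define PTE_PPN_SHIFT 11u        /* PPN occupies bits 31..11 (21 bits) */
  #define PTE_VALID  (1u << 3)
  #define PTE_READ   (1u << 2)
  #define PTE_WRITE  (1u << 1)
  #define PTE_EXEC   (1u << 0)
  /* Bits 10..4 are the 7 unused bits. */

  static inline uint32_t pte_make(uint32_t ppn, uint32_t flags) {
      return (ppn << PTE_PPN_SHIFT) | flags;  // pack 21-bit PPN + flag bits
  }

  static inline uint32_t pte_ppn(uint32_t pte) {
      return pte >> PTE_PPN_SHIFT;            // recover the PPN
  }

  static inline int pte_valid(uint32_t pte) {
      return (pte & PTE_VALID) != 0;
  }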
Design a Virtual Memory System
If you only had 16 MB of physical memory, how many bits of the PPN field would actually be used?
Design a Virtual Memory System
16 MB = 2^24 bytes. With an 11-bit offset: 24 - 11 = 13-bit physical page number.
(Layout: PPN (13 bits) | V R W E (4 bits) | unused (15 bits))
Design a Virtual Memory System
How large would the page table be?
Design a Virtual Memory System
How large would the page table be?
2^21 entries, each a 4-byte PTE. Total size: 2^21 x 4 B = 8 MB.
Review: Two-Level Paging
The virtual address is split into a level-1 page #, a level-2 page #, and an offset. Level-1 page table entries point to level-2 page tables (null = no level-2 table for that region of the address space); level-2 entries hold physical page numbers.
(Figure: a level-1 table with entries 000-111, mostly null, pointing to four level-2 tables that map the code, data, heap, and stack pages into physical memory.)
Review: Two-Level Paging
In the best case, the total size of the page tables ≈ the number of pages used by the program. But each translation requires one additional memory access!
X86_64: Four-Level Page Table!
48-bit virtual address: virtual P4, P3, P2, P1 indices (9 bits each) + 12-bit offset
4096-byte pages (12-bit offset); page tables are also 4 KB (pageable), holding 512 entries of 8 bytes each
Each 9-bit index selects one 8-byte entry at its level, starting from PageTablePtr; the final level yields the physical page #, which combines with the 12-bit offset to form the physical address (40-50 bits)
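A quick sketch of slicing a 48-bit x86_64 virtual address into its four table indices and offset; this is pure bit arithmetic (no page tables are walked), and the sample address is arbitrary.

  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
      uint64_t va = 0x00007f1234567abcULL;   // an arbitrary 48-bit address

      uint64_t offset = va & 0xfffULL;       // low 12 bits
      uint64_t p1 = (va >> 12) & 0x1ffULL;   // 9 bits per level
      uint64_t p2 = (va >> 21) & 0x1ffULL;
      uint64_t p3 = (va >> 30) & 0x1ffULL;
      uint64_t p4 = (va >> 39) & 0x1ffULL;   // top level, bits 47..39

      printf("P4=%llu P3=%llu P2=%llu P1=%llu offset=0x%llx\n",
             (unsigned long long)p4, (unsigned long long)p3,
             (unsigned long long)p2, (unsigned long long)p1,
             (unsigned long long)offset);
      return 0;
  }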
Review: Inverted Table
With an inverted table, a hash of the virtual page # yields the physical page #: hash(virt. page #) = physical page #.
Total size of the page table ≈ the number of pages used by the program. But the hash is more complex.
(Figure: virtual memory view, the hash-based inverted table holding entries only for in-use pages, and the physical memory view.)
Address Translation Comparison
–Segmentation: + fast context switching (segment mapping maintained by CPU); - external fragmentation
–Paging (single-level): + no external fragmentation; - large table size (~ size of virtual memory), internal fragmentation
–Paged segmentation: + no external fragmentation, table size ~ memory used by program; - multiple memory references per page access, internal fragmentation
–Multi-level paging: same tradeoffs as paged segmentation
–Inverted table: + table size ~ memory used by program; - hash function more complex
Example Memory Multiplexing, Address Translation Problems
Spring 2004 Midterm Exam #5
Fall 2009 Midterm Exam #4
Spring 2013 Midterm Exam #3b
Caches, TLBs
Review: Sources of Cache Misses
Compulsory (cold start): first reference to a block
–"Cold" fact of life: not a whole lot you can do about it
–Note: when running "billions" of instructions, compulsory misses are insignificant
Capacity:
–Cache cannot contain all the blocks accessed by the program
–Solution: increase cache size
Conflict (collision):
–Multiple memory locations mapped to the same cache location
–Solutions: increase cache size, or increase associativity
Two others:
–Coherence (invalidation): other process (e.g., I/O) updates memory
–Policy: due to non-optimal replacement policy
Where Does a Block Get Placed in a Cache?
Example: block 12 (binary 01100) of a 32-block address space, placed in an 8-block cache:
–Direct mapped: block 12 can go only into cache block 4 (12 mod 8); the address splits as tag 01, index 100
–Set associative (4 sets): block 12 can go anywhere in set 0 (12 mod 4); the address splits as tag 011, index 00
–Fully associative: block 12 can go anywhere; the whole address 01100 is the tag
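The placement arithmetic as a tiny C sketch; the 8-block cache and 4-set (2-way) organization come from the example above.

  #include <stdio.h>

  int main(void) {
      unsigned block = 12;              // block address 01100

      // Direct mapped: 8 cache blocks -> low 3 bits are the index.
      unsigned dm_index = block % 8;    // = 4 (100)
      unsigned dm_tag   = block / 8;    // = 1 (01)

      // 2-way set associative: 4 sets -> low 2 bits are the index.
      unsigned sa_set = block % 4;      // = 0 (00)
      unsigned sa_tag = block / 4;      // = 3 (011)

      // Fully associative: no index; the whole block address is the tag.
      printf("direct: tag=%u index=%u | set-assoc: tag=%u set=%u | full: tag=%u\n",
             dm_tag, dm_index, sa_tag, sa_set, block);
      return 0;
  }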
Review: Caching Applied to Address Translation
Problem: address translation is expensive (especially multi-level)
Solution: cache address translations (the TLB)
–Instruction accesses spend a lot of time on the same page (since accesses are sequential)
–Stack accesses have definite locality of reference
–Data accesses have less page locality, but still some
(Figure: the CPU sends a virtual address to the TLB; on a hit, the cached physical address goes straight to physical memory; on a miss, the MMU translates, the result is saved in the TLB, and the access proceeds.)
TLB Organization
How big does the TLB actually have to be?
–Usually small: 128-512 entries
–Not very big; can support higher associativity
TLB usually organized as a fully-associative cache
–Lookup is by virtual address
–Returns physical address
What happens when fully-associative is too slow?
–Put a small (4-16 entry) direct-mapped cache in front
–Called a "TLB slice"
When does TLB lookup occur?
–Before cache lookup?
–In parallel with cache lookup?
Reducing Translation Time Further
As described, TLB lookup is in series with cache lookup:
(Figure: virtual address = virtual page # + offset → TLB lookup yields valid bit, access rights, and the physical page #, which combines with the offset to form the physical address.)
Machines with TLBs go one step further: they overlap TLB lookup with cache access.
–Works because the offset is available early
Overlapping TLB & Cache Access
Here is how this might work with a 4K cache: the 20-bit virtual page # goes to the TLB (associative lookup) while the 12-bit offset (10-bit index into 1K blocks + 2 bits selecting within a 4-byte block) indexes the cache in parallel; the physical address from the TLB is then compared against the cache tag for hit/miss.
What if the cache size is increased to 8 KB?
–Overlap not complete
–Need to do something else. See CS152/252.
Another option: virtual caches
–Tags in the cache are virtual addresses
–Translation only happens on cache misses
Example Caches, TLB Problems
Spring 2011 Midterm Exam #6
Spring 2012 Midterm Exam #6
Spring 2013 Midterm Exam #3a
Putting Everything Together
Paging & Address Translation
(Figure: a virtual address split into virtual P1 index, virtual P2 index, and offset; PageTablePtr locates the first-level page table, whose entry points to a second-level table, whose entry supplies the physical page #; physical page # + offset form the physical address into physical memory.)
Translation Look-aside Buffer
(Figure: the same two-level page table walk, with a TLB caching virtual page # → physical page # pairs; on a TLB hit the two table lookups are skipped entirely.)
Caching
(Figure: the physical address produced by the TLB is split into tag, index, and byte offset to look up the data cache; a tag match selects the cached block.)
Page Allocation and Replacement
Demand Paging
Modern programs require a lot of physical memory
–Memory per system is growing faster than 25%-30%/year
But they don't use all their memory all of the time
–90-10 rule: programs spend 90% of their time in 10% of their code
–Wasteful to require all of a user's code to be in memory
Solution: use main memory as a cache for disk
(Figure: the caching hierarchy, from processor and on-chip cache through second-level cache (SRAM) and main memory (DRAM) down to secondary storage (disk) and tertiary storage (tape).)
Demand Paging Mechanisms
The PTE helps us implement demand paging
–Valid → page in memory, PTE points at the physical page
–Not valid → page not in memory; use info in the PTE to find it on disk when necessary
Suppose a user references a page with an invalid PTE?
–The Memory Management Unit (MMU) traps to the OS
»The resulting trap is a "page fault"
–What does the OS do on a page fault?
»Choose an old page to replace
»If the old page has been modified ("D=1"), write its contents back to disk
»Change its PTE and any cached TLB entry to be invalid
»Load the new page into memory from disk
»Update the page table entry, invalidate the TLB for the new entry
»Continue the thread from the original faulting location
–The TLB entry for the new page will be loaded when the thread is continued!
–While pulling pages off disk for one process, the OS runs another process from the ready queue
»The suspended process sits on the wait queue
What Factors Lead to Misses?
Compulsory misses:
–Pages that have never been paged into memory before
–How might we remove these misses?
»Prefetching: loading them into memory before they're needed
»Need to predict the future somehow! More later.
Capacity misses:
–Not enough memory. Must somehow increase the size.
–Can we do this?
»One option: increase the amount of DRAM (not a quick fix!)
»Another option: if multiple processes are in memory, adjust the percentage of memory allocated to each one
Conflict misses:
–Technically, conflict misses don't exist in virtual memory, since it is a "fully-associative" cache
Policy misses:
–Caused when pages were in memory but were kicked out prematurely because of the replacement policy
–How to fix? Better replacement policy
Replacement Policies (cont'd)
LRU (Least Recently Used):
–Replace the page that hasn't been used for the longest time
–Programs have locality, so if something has not been used for a while, it is unlikely to be used in the near future
–Seems like LRU should be a good approximation to MIN
How to implement LRU? Use a list!
–On each use, remove the page from the list and place it at the head
–The LRU page is at the tail
(Example list: head → page 6, page 7, page 1, page 2 ← tail (LRU))
Problems with this scheme for paging?
–List operations are complex
»Many instructions for each hardware access
In practice, people approximate LRU (more later)
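A sketch of the list-based LRU bookkeeping, as a doubly-linked list threaded through a small fixed set of page frames; the sizes and names are illustrative, and the point is how much work touch() does on every single reference.

  #define NPAGES 8

  // head = most recently used, tail = the LRU victim; -1 means "none".
  int next[NPAGES], prev[NPAGES];
  int head = -1, tail = -1;

  void lru_init(void) {                // call once before any touch()
      for (int i = 0; i < NPAGES; i++) next[i] = prev[i] = -1;
  }

  void touch(int p) {                  // called on every reference to page p
      if (head == p) return;
      // Unlink p if it is already somewhere in the list.
      if (prev[p] != -1) next[prev[p]] = next[p];
      if (next[p] != -1) prev[next[p]] = prev[p];
      if (tail == p) tail = prev[p];
      // Insert p at the head.
      prev[p] = -1;
      next[p] = head;
      if (head != -1) prev[head] = p;
      head = p;
      if (tail == -1) tail = p;
  }

  int victim(void) { return tail; }    // LRU page to evict

This is exactly the "many instructions per access" problem the slide names: doing this on every memory reference is far too slow, which is why real systems approximate LRU instead.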
Example: FIFO
Suppose we have 3 page frames, 4 virtual pages, and the following reference stream:
–A B C A B D A D B C B
Consider FIFO page replacement:

  Ref:      A  B  C  A  B  D  A  D  B  C  B
  Frame 1:  A              D           C
  Frame 2:     B              A
  Frame 3:        C                 B

–FIFO: 7 faults
–When referencing D, replacing A is a bad choice, since we need A again right away
Example: MIN
Suppose we have the same reference stream:
–A B C A B D A D B C B
Consider MIN page replacement (look for the page not referenced farthest in the future):

  Ref:      A  B  C  A  B  D  A  D  B  C  B
  Frame 1:  A
  Frame 2:     B
  Frame 3:        C        D           C

–MIN: 5 faults
What will LRU do?
–Same decisions as MIN here, but that won't always be true!
Implementing LRU & Second Chance
Perfect:
–Timestamp the page on each reference
–Keep a list of pages ordered by time of reference
–Too expensive to implement in reality, for many reasons
Second chance algorithm:
–Approximate LRU
»Replace an old page, not the oldest page
–FIFO with a "use" bit
Details:
–A "use" bit per physical page
–On page fault, check the page at the head of the queue
»If use bit = 1: clear the bit and move the page to the tail (give the page a second chance!)
»If use bit = 0: replace the page
–Moving pages to the tail is still complex
Clock Algorithm
Clock algorithm: a more efficient implementation of the second chance algorithm
–Arrange physical pages in a circle with a single clock hand
Details:
–On page fault:
»Check the use bit: 1 → used recently, clear it and leave the page alone; 0 → selected candidate for replacement
»Advance the clock hand (not real time)
–Will it always find a page, or loop forever?
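A sketch of the clock hand; the fixed frame count is illustrative, and in a real system use[i] would be set by the MMU on each reference rather than by software.

  #define NFRAMES 4

  int use[NFRAMES];    // hardware sets use[i] = 1 when frame i is referenced
  int hand = 0;

  // Returns the frame to evict: sweep, clearing use bits, until a 0 is found.
  int clock_victim(void) {
      for (;;) {
          if (use[hand] == 0) {
              int victim = hand;
              hand = (hand + 1) % NFRAMES;
              return victim;
          }
          use[hand] = 0;              // recently used: give it a second chance
          hand = (hand + 1) % NFRAMES;
      }
  }

This also answers the slide's question: the sweep cannot loop forever, because each pass clears use bits, so after at most one full revolution some frame has use = 0.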
Clock Replacement Illustration
Max page table size 4. Invariant: the hand points at the oldest page.
–Page B arrives (B u:0)
–Page A arrives (A u:0)
–Access page A (A u:1)
–Page D arrives (D u:0)
–Page C arrives (C u:0); all four frames are now full
–Page F arrives: the hand finds B with u:0, so B is evicted and F takes its frame
–Access page D (D u:1)
–Page E arrives: the hand sweeps past A (u:1 → u:0) and D (u:1 → u:0), clearing their use bits, then evicts the first u:0 page it reaches; E takes that frame
Thrashing
If a process does not have "enough" pages, the page-fault rate is very high. This leads to:
–Low CPU utilization
–The operating system spending most of its time swapping to disk
Thrashing ≡ a process is busy swapping pages in and out
Questions:
–How do we detect thrashing?
–What is the best response to thrashing?
Locality in a Memory-Reference Pattern
Program memory access patterns have temporal and spatial locality
–The group of pages accessed along a given time slice is called the "working set"
–The working set defines the minimum number of pages needed for the process to behave well
Not enough memory for the working set → thrashing
–Better to swap out the process?
Working-Set Model
Δ ≡ working-set window ≡ a fixed number of page references
–Example: 10,000 accesses
WSi (working set of process Pi) = total set of pages referenced in the most recent Δ (varies in time)
–If Δ is too small, it will not encompass the entire locality
–If Δ is too large, it will encompass several localities
–If Δ = ∞, it will encompass the entire program
D = Σ|WSi| ≡ total demand for frames
If D > physical memory → thrashing
–Policy: if D > physical memory, then suspend/swap out processes
–This can improve overall system behavior by a lot!
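A sketch of computing |WS| over a sliding window of the last Δ references; the toy sizes and the explicit reference log are assumptions of this sketch (a real kernel approximates the working set with use bits and sampling rather than logging every reference).

  #include <stdio.h>

  #define DELTA   8    // working-set window, in references
  #define NPAGES 16    // virtual pages, for the toy counts[] below

  int window[DELTA];   // circular log of the last DELTA page references
  int counts[NPAGES];  // how often each page appears in the window
  int pos = 0, ws_size = 0, filled = 0;

  // Record a reference to `page`; returns the current working-set size |WS|.
  int reference(int page) {
      if (filled == DELTA) {                  // evict the oldest reference
          int old = window[pos];
          if (--counts[old] == 0) ws_size--;  // page left the window entirely
      } else {
          filled++;
      }
      window[pos] = page;
      if (counts[page]++ == 0) ws_size++;     // page newly entered the window
      pos = (pos + 1) % DELTA;
      return ws_size;
  }

  int main(void) {
      int refs[] = {1, 2, 1, 3, 1, 2, 7, 7, 1, 2};
      for (int i = 0; i < 10; i++)
          printf("after page %d: |WS| = %d\n", refs[i], reference(refs[i]));
      return 0;
  }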
Kernel/User Interface
Dual-Mode Operation
Can an application modify its own translation maps or PTE bits?
–If it could, it could get access to all of physical memory
–Has to be restricted somehow
To assist with protection, hardware provides at least two modes (dual-mode operation):
–"Kernel" mode (or "supervisor" or "protected")
–"User" mode (normal program mode)
–Mode is set with bits in a special control register only accessible in kernel mode
Intel processors actually have four "rings" of protection:
–PL (Privilege Level) from 0 to 3
»PL0 has full access, PL3 has the least
–Typical OS kernels on Intel processors use only PL0 ("kernel") and PL3 ("user")
How to Get from Kernel → User
What does the kernel do to create a new user process?
–Allocate and initialize a process control block
–Read the program off disk and store it in memory
–Allocate and initialize the translation map
»Point at the code in memory so the program can execute
»Possibly point at statically initialized data
–Run the program:
»Set machine registers
»Set the hardware pointer to the translation table
»Set the processor status word for user mode
»Jump to the start of the program
How does the kernel switch between processes (we learned about this!)?
–Same saving/restoring of registers as before
–Save/restore the hardware pointer to the translation map
User → Kernel (System Call)
Can't let the inmate (user) get out of the padded cell on its own
–Would defeat the purpose of protection!
–So, how does the user program get back into the kernel?
System call: voluntary procedure call into the kernel
–Hardware for controlled user → kernel transition
–Can any kernel routine be called?
»No! Only specific ones
–System call ID encoded into the system call instruction
»Index forces a well-defined interface with the kernel
Examples:
–I/O: open, close, read, write, lseek
–Files: delete, mkdir, rmdir, chown
–Process: fork, exit, join
–Network: socket create, select
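On Linux, the raw mechanism is visible through syscall(2), which takes the system call ID directly; this sketch invokes the same kernel routine two ways (normally programs just call the libc wrapper).

  #define _GNU_SOURCE
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <stdio.h>

  int main(void) {
      // Same kernel entry, two ways: libc wrapper vs. raw system call ID.
      pid_t a = getpid();
      pid_t b = (pid_t)syscall(SYS_getpid);  // the ID selects the kernel routine
      printf("getpid() = %d, syscall(SYS_getpid) = %d\n", (int)a, (int)b);
      return 0;
  }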
User → Kernel (Exceptions: Traps and Interrupts)
A system call instruction causes a synchronous exception (or "trap")
–In fact, it is often called a software "trap" instruction
Other sources of synchronous exceptions:
–Divide by zero, illegal instruction, bus error (bad address, e.g., unaligned access)
–Segmentation fault (address out of range)
–Page fault
Interrupts are asynchronous exceptions
–Examples: timer, disk ready, network, etc.
–Interrupts can be disabled, traps cannot!
SUMMARY: on a system call, exception, or interrupt:
–Hardware enters kernel mode with interrupts disabled
–Saves the PC, then jumps to the appropriate handler in the kernel
–For some processors (x86), the processor also saves registers, changes the stack, etc.
Questions?