CS 162 Discussion Section Week 5 10/7 – 10/11
Today’s Section
● Project discussion (5 min)
● Quiz (5 min)
● Lecture Review (20 min)
● Worksheet and Discussion (20 min)
Project 1
● Autograder is still up: submit proj1-test
● Due 10/8 (Today!) at 11:59 PM: submit proj1-code
● Due 10/9 (Tomorrow!) at 11:59 PM: Final design doc & Project 1 Group Evals
– Template posted on Piazza last week
– Will post Group Evals link on Piazza Wed afternoon
● Questions?
Quiz…
Short Answer
[Use this one]
1. Name the 4 types of cache misses discussed in class, given their causes: [0.5 each]
a) Program initialization, etc. (nothing you can do about them) [Compulsory Misses]
b) Two addresses map to the same cache line [Conflict Misses]
c) The cache size is too small [Capacity Misses]
d) External processor or I/O interference [Coherence Misses]
[Choose 1]
2. Which is better when a small number of items are modified frequently: write-back caching or write-through caching? [Write-back]
3. Name one of the two types of locality discussed in lecture that can benefit from some type of caching. [Temporal or Spatial]
True/False
[Choose 2]
4. Memory is typically allocated in finer-grained units with segmentation than with paging. [False]
5. TLB lookups can be performed in parallel with data cache lookups. [True]
6. The size of an inverted page table is proportional to the number of pages in virtual memory. [False]
7. Conflict misses are possible in a 3-way set-associative cache. [True]
Lecture Review
10/2/2013 Anthony D. Joseph and John Canny CS162 ©UCB Fall 2013
Virtualizing Resources
Physical Reality: processes/threads share the same hardware
– Need to multiplex CPU (CPU scheduling)
– Need to multiplex use of memory (today)
Why worry about memory multiplexing?
– The complete working state of a process and/or the kernel is defined by its data in memory (and registers)
– Consequently, cannot just let different processes use the same memory
– Probably don’t want different processes to even have access to each other’s memory (protection)
Important Aspects of Memory Multiplexing
Controlled overlap:
– Processes should not collide in physical memory
– Conversely, would like the ability to share memory when desired (for communication)
Protection:
– Prevent access to private memory of other processes
» Different pages of memory can be given special behavior (read-only, invisible to user programs, etc.)
» Kernel data protected from user programs
Translation:
– Ability to translate accesses from one address space (virtual) to a different one (physical)
– When translation exists, the process uses virtual addresses and physical memory uses physical addresses
Two Views of Memory
Address space:
– All the addresses and state a process can touch
– Each process and the kernel has a different address space
Consequently, two views of memory:
– View from the CPU (what the program sees: virtual memory)
– View from memory (physical memory)
– The translation box (MMU) converts between the two views
Translation helps to implement protection:
– If task A cannot even gain access to task B’s data, there is no way for A to adversely affect B
With translation, every program can be linked/loaded into the same region of the user address space
[Figure: CPU issues virtual addresses → MMU → physical addresses; untranslated reads/writes bypass the MMU]
Address Segmentation
[Figure: virtual memory view (code, data, heap, stack) addressed as seg # | offset, mapped through a base/limit segment table into separate regions of the physical memory view]
Address Segmentation
[Same figure] What happens if the stack grows beyond its segment?
Address Segmentation
[Same figure] No room to grow!! Buffer overflow error, or resize the segment and move segments around to make room.
Paging
[Figure: virtual address = page # | offset; a per-process page table (mostly null entries) maps virtual pages of code, data, heap, and stack to scattered physical pages]
Paging
[Same figure] What happens if the stack grows?
Paging
[Same figure] Allocate new pages wherever there is room!
Challenge: table size is equal to the # of pages in virtual memory!
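The page # | offset split above can be sketched in a few lines. This is a toy model (dict-based page table, made-up 16-byte pages), not what hardware actually does:

```python
PAGE_SIZE = 16   # toy page size -> 4-bit offset (illustrative, not the slide's sizes)
OFFSET_BITS = 4

# Toy single-level page table: virtual page # -> physical page # (missing = unmapped)
page_table = {0: 3, 1: 7, 2: 2}

def translate(vaddr):
    """Split a virtual address into (page #, offset) and look up the frame."""
    vpn = vaddr >> OFFSET_BITS            # page # = high bits
    offset = vaddr & (PAGE_SIZE - 1)      # offset = low bits, copied unchanged
    ppn = page_table.get(vpn)
    if ppn is None:
        raise Exception("page fault: VPN %d not mapped" % vpn)
    return (ppn << OFFSET_BITS) | offset

print(hex(translate(0x12)))  # VPN 1 -> PPN 7, offset 0x2 -> 0x72
```

Note the table needs one entry per virtual page whether or not it is used, which is exactly the size challenge the slide points out.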
Two-Level Paging
[Figure: virtual address = page1 # | page2 # | offset; a level-1 page table points to level-2 page tables (null for unused regions), which map to physical pages]
Two-Level Paging
[Same figure, following example addresses 0x90 (virtual) and 0x80 (physical) through the tables]
In the best case, the total size of the page tables ≈ the number of pages used by the program in virtual memory. Requires one additional memory access!
Inverted Table
[Figure: hash(virtual page #) = physical page #; one table entry per physical page rather than per virtual page]
Total size of the page table ≈ the number of pages used by the program in physical memory. The hash function is more complex.
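An inverted table keyed by hash of the virtual page number can be sketched as below; the hash function, chaining scheme, and sizes are all stand-ins for illustration:

```python
NUM_FRAMES = 8   # table size tracks *physical* memory (toy size)

def h(vpn):
    return vpn % NUM_FRAMES   # stand-in hash; real ones are more complex

# Each bucket chains (virtual page #, physical frame) pairs to resolve collisions.
buckets = {h(0b10010): [(0b10010, 4)], h(0b00001): [(0b00001, 6)]}

def lookup(vpn):
    """Hash the VPN, then search the bucket's chain for an exact match."""
    for entry_vpn, frame in buckets.get(h(vpn), []):
        if entry_vpn == vpn:
            return frame
    raise Exception("page fault")

print(lookup(0b10010))  # frame 4
```

The chain search is the extra cost the slide alludes to: unlike a direct index, a hash lookup may have to compare against several entries.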
Address Translation Comparison
● Segmentation — Advantage: fast context switching (segment mapping maintained by CPU). Disadvantage: external fragmentation.
● Paging (single-level page) — Advantage: no external fragmentation; fast, easy allocation. Disadvantage: large table size ~ virtual memory.
● Paged segmentation / Two-level pages — Advantage: table size ~ # of pages in virtual memory; fast, easy allocation. Disadvantage: multiple memory references per page access.
● Inverted table — Advantage: table size ~ # of pages in physical memory. Disadvantage: hash function more complex.
Caching Concept
Cache: a repository for copies that can be accessed more quickly than the original
– Make the frequent case fast and the infrequent case less dominant
Caching at different levels
– Can cache: memory locations, address translations, pages, file blocks, file names, network routes, etc.
Only good if:
– The frequent case is frequent enough, and
– The infrequent case is not too expensive
Important measure:
Average Access Time = (Hit Rate × Hit Time) + (Miss Rate × Miss Time)
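The average access time formula above, with one made-up set of numbers plugged in:

```python
def average_access_time(hit_rate, hit_time, miss_time):
    """AMAT = (Hit Rate x Hit Time) + (Miss Rate x Miss Time), miss rate = 1 - hit rate."""
    return hit_rate * hit_time + (1 - hit_rate) * miss_time

# Illustrative numbers: 95% hits at 1 ns, misses costing 100 ns.
print(average_access_time(0.95, 1.0, 100.0))  # 5.95 ns
```

Even a 5% miss rate dominates the average here, which is why "infrequent case not too expensive" matters.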
Why Does Caching Help? Locality!
Temporal Locality (Locality in Time):
– Keep recently accessed data items closer to the processor
Spatial Locality (Locality in Space):
– Move contiguous blocks to the upper levels
[Figure: probability of reference across the address space 0 … 2^n − 1; blocks X and Y moving between upper-level and lower-level memory, to/from the processor]
Sources of Cache Misses
Compulsory (cold start): first reference to a block
– “Cold” fact of life: not a whole lot you can do about it
– Note: when running “billions” of instructions, compulsory misses are insignificant
Capacity:
– Cache cannot contain all blocks accessed by the program
– Solution: increase cache size
Conflict (collision):
– Multiple memory locations mapped to the same cache location
– Solutions: increase cache size, or increase associativity
Two others:
– Coherence (invalidation): another process (e.g., I/O) updates memory
– Policy: due to a non-optimal replacement policy
Where Does a Block Get Placed in a Cache?
Example: block 12 placed in an 8-block cache
– Direct mapped: block 12 (01100) can go only into block 4 (12 mod 8); address split as tag | index
– Set associative (2-way, 4 sets): block 12 can go anywhere in set 0 (12 mod 4); address split as tag | set index
– Fully associative: block 12 can go anywhere; the whole block address is the tag
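The three placement rules can be computed directly. A small sketch for the slide's 8-block cache (the function name is just for illustration):

```python
NUM_BLOCKS = 8   # cache size in blocks, as in the slide's example

def placement(block_addr, associativity):
    """Return the list of cache blocks where a memory block may be placed."""
    if associativity == 1:                       # direct mapped: one choice
        return [block_addr % NUM_BLOCKS]
    num_sets = NUM_BLOCKS // associativity
    if num_sets == 1:                            # fully associative: any block
        return list(range(NUM_BLOCKS))
    s = block_addr % num_sets                    # set associative: any block in set s
    return list(range(s * associativity, (s + 1) * associativity))

print(placement(12, 1))  # [4]      direct mapped: 12 mod 8
print(placement(12, 2))  # [0, 1]   2-way: set 0 = 12 mod 4
print(placement(12, 8))  # all 8 blocks: fully associative
```

Direct mapped is the degenerate case with 1-way sets, fully associative the case with a single set; everything in between is set associative.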
Which Block Should Be Replaced on a Miss?
Easy for direct mapped: only one possibility
Set associative or fully associative:
– Random
– LRU (Least Recently Used)
Miss rates:
         2-way           4-way           8-way
Size     LRU    Random   LRU    Random   LRU    Random
16 KB    5.2%   5.7%     4.7%   5.3%     4.4%   5.0%
64 KB    1.9%   2.0%     1.5%   1.7%     1.4%   1.5%
256 KB   1.15%  1.17%    1.13%  1.13%    1.12%  1.12%
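LRU replacement can be sketched with an ordered map that tracks recency of use. This toy version is fully associative and tracks hits/misses only, not data:

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with LRU replacement (illustrative sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # least recently used block is first

    def access(self, block):
        """Return 'hit' or 'miss'; on a miss with a full cache, evict the LRU block."""
        if block in self.blocks:
            self.blocks.move_to_end(block)    # mark as most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # evict least recently used
        self.blocks[block] = True
        return "miss"

c = LRUCache(2)
print([c.access(b) for b in "ABACB"])  # ['miss', 'miss', 'hit', 'miss', 'miss']
```

In the trace, accessing C evicts B (A was touched more recently), so the final B access misses again.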
What Happens on a Write?
Write through: the information is written both to the block in the cache and to the block in the lower-level memory
Write back: the information is written only to the block in the cache
– The modified cache block is written to main memory only when it is replaced
– Question: is the block clean or dirty?
Pros and cons of each?
– Write through:
» PRO: read misses cannot result in writes
» CON: processor held up on writes unless writes are buffered
– Write back:
» PRO: repeated writes not sent to DRAM; processor not held up on writes
» CON: more complex; a read miss may require writeback of dirty data
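The write-back "PRO" above is that repeated writes to one block generate no memory traffic until replacement. A one-block sketch with a dirty bit (names and structure are invented for illustration):

```python
class WriteBackCache:
    """One-block write-back cache: writes set a dirty bit; lower-level memory
    is updated only when a dirty block is replaced (toy sketch)."""
    def __init__(self):
        self.tag = None
        self.dirty = False
        self.writebacks = 0   # counts traffic to lower-level memory

    def write(self, tag, data):
        if self.tag is not None and self.tag != tag and self.dirty:
            self.writebacks += 1          # replacing a dirty block: flush it first
        self.tag, self.dirty = tag, True  # write hits only the cache; block is now dirty

c = WriteBackCache()
for value in (1, 2, 3):
    c.write("A", value)   # three writes to block A: still no memory traffic
c.write("B", 1)           # replacing dirty block A forces one writeback
print(c.writebacks)       # 1
```

A write-through cache would have sent all four writes to memory; this is why write-back wins when a small number of items are modified frequently (quiz question 2).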
Caching Applied to Address Translation
The question is one of page locality: does it exist?
– Instruction accesses spend a lot of time on the same page (since accesses are sequential)
– Stack accesses have definite locality of reference
– Data accesses have less page locality, but still some…
Can we have a TLB hierarchy?
– Sure: multiple levels at different sizes/speeds
[Figure: CPU sends a virtual address to the TLB; on a hit, the cached physical address is used directly; on a miss, the MMU translates and the result is saved in the TLB before the physical memory access]
Overlapping TLB & Cache Access (1/2)
Main idea:
– The offset in the virtual address exactly covers the “cache index” and “byte select” fields
– Thus the cached byte(s) can be selected in parallel with performing the address translation
[Figure: virtual address = virtual page # | offset; physical address = tag / page # | index | byte]
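The trick works whenever index bits + byte-select bits fit inside the page offset, because translation leaves the offset bits unchanged. A bit-slicing sketch with assumed sizes (4 KB pages, 32-byte lines, 128 sets — not the slide's figures):

```python
PAGE_OFFSET_BITS = 12   # 4 KB pages (assumed)
BYTE_SELECT_BITS = 5    # 32-byte cache lines (assumed)
INDEX_BITS = 7          # 128 sets, so index + byte select = 12 bits = page offset

def cache_index_and_byte(vaddr):
    """Both fields come from the low 12 bits, which are identical in the
    virtual and physical address, so the cache set can be read while the
    TLB translates the page number in parallel."""
    byte = vaddr & ((1 << BYTE_SELECT_BITS) - 1)
    index = (vaddr >> BYTE_SELECT_BITS) & ((1 << INDEX_BITS) - 1)
    return index, byte

# the overlap is only legal because these widths line up:
assert PAGE_OFFSET_BITS == INDEX_BITS + BYTE_SELECT_BITS
print(cache_index_and_byte(0x1234))  # (17, 20)
```

If the cache were larger (more index bits than fit in the offset), some index bits would depend on the translated page number and the lookups could no longer fully overlap.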
Putting Everything Together: Address Translation
[Figure: a virtual address (virtual P1 index | virtual P2 index | offset) walks the 1st- and 2nd-level page tables starting from PageTablePtr to produce a physical address (physical page # | offset) into physical memory]
Putting Everything Together: TLB
[Same figure, with a TLB caching (virtual page → physical page) translations to short-circuit the two-level page table walk]
Putting Everything Together: Cache
[Same figure, with the physical address split into tag | index | byte to look up the block in a physically addressed cache]
Worksheet …