
Slide 1: CS61C Improving Cache Memory and Virtual Memory Introduction
Lecture 18, April 2, 1999
Dave Patterson (http.cs.berkeley.edu/~patterson)
www-inst.eecs.berkeley.edu/~cs61c/schedule.html

Slide 2: Review
° Tag, index, offset: used to find matching data, support larger blocks, and reduce misses
° Where in the cache? Direct Mapped: Conflict Misses occur if memory addresses compete for the same block
° Fully Associative: lets memory data sit in any block, so no Conflict Misses
° Set Associative: a compromise; simpler hardware than Fully Associative, fewer misses than Direct Mapped
° LRU: use history to predict replacement

Slide 3: Outline
° Improving Miss Penalty
° Improving Writes
° Administrivia, More Survey Results
° Virtual Memory
° Virtual Memory and Cache
° Translation Lookaside Buffer
° Conclusion

Slide 4: Improving Caches
° In general, minimize Average Access Time:
  Average Access Time = Hit Time × (1 − Miss Rate) + Miss Penalty × Miss Rate
° So far, we have looked at improving the Miss Rate:
  Larger block size
  Larger cache
  Higher associativity
° What about the Miss Penalty?

Slide 5: Improving Miss Penalty
° When caches started becoming popular, the Miss Penalty was about 10 processor clock cycles
° Today: a 500 MHz processor (2 nanoseconds per clock cycle) and 200 ns to go to DRAM → 100 processor clock cycles!
° Solution: place another cache between memory and the processor cache: a Second Level (L2) Cache

Slide 6: Analyzing a Multi-Level Cache Hierarchy
° We consider the L2 hit and miss times to include the cost of not finding the data in the L1 cache
° Similarly, the L2 cache hit rate is measured only over the accesses that actually make it to the L2 cache

Slide 7: So How Do We Calculate It?
° Access time = L1 hit time × L1 hit rate + L1 miss penalty × L1 miss rate
° We treat the L1 miss penalty as the (average) access time of the L2 cache
° Access time = L1 hit time × L1 hit rate + (L2 hit time × L2 hit rate + L2 miss penalty × L2 miss rate) × L1 miss rate

Slide 8: Do the Numbers for the L2 Cache
° Assumptions: L1 hit time = 1 cycle, L1 hit rate = 90%; L2 hit time (also the L1 miss penalty) = 4 cycles, L2 miss penalty = 100 cycles, L2 hit rate = 90%
° Access time = L1 hit time × L1 hit rate + (L2 hit time × L2 hit rate + L2 miss penalty × (1 − L2 hit rate)) × L1 miss rate
  = 1 × 0.9 + (4 × 0.9 + 100 × 0.1) × (1 − 0.9) = 0.9 + 13.6 × 0.1 = 2.26 clock cycles

Slide 9: What Would It Be Without the L2 Cache?
° Assume the L1 miss penalty would then be 100 clock cycles
° 1 × 0.9 + 100 × 0.1 = 10.9 clock cycles, vs. 2.26 with the L2
° So we gain a real benefit from having a second, larger cache before main memory (a C check of this arithmetic is sketched below)
° Today's L1 cache sizes are 16 KB-64 KB; an L2 cache may be 512 KB to 4096 KB
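To make the arithmetic on the last two slides concrete, here is a minimal C sketch (the variable names are ours, not from the lecture) that evaluates both formulas with the assumed parameters:

```c
#include <stdio.h>

/* Two-level average access time; all times in processor clock cycles.
   Parameter values are the assumptions stated on the slides above. */
int main(void) {
    double l1_hit_time = 1.0, l1_hit_rate = 0.90;
    double l2_hit_time = 4.0, l2_hit_rate = 0.90;  /* L2 hit time = L1 miss penalty */
    double l2_miss_penalty = 100.0;                /* cost of going to DRAM */

    /* The L1 miss penalty is the average L2 access time. */
    double l1_miss_penalty =
        l2_hit_time * l2_hit_rate + l2_miss_penalty * (1.0 - l2_hit_rate);

    double with_l2 = l1_hit_time * l1_hit_rate
                   + l1_miss_penalty * (1.0 - l1_hit_rate);
    double without_l2 = l1_hit_time * l1_hit_rate
                      + l2_miss_penalty * (1.0 - l1_hit_rate);

    printf("with L2:    %.2f cycles\n", with_l2);     /* prints 2.26 */
    printf("without L2: %.2f cycles\n", without_l2);  /* prints 10.90 */
    return 0;
}
```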

Slide 10: What About Writes to Memory?
° Simplest policy: the information is written both to the block in the cache and to the block in the lower-level memory
° Problem: writes then operate at the speed of the lower-level memory!

Slide 11: Improving Cache Performance: Write Buffer
° A Write Buffer is added between the Cache and Memory
  Processor: writes data into the cache and the write buffer
  Memory controller: writes buffer contents to memory
° The write buffer is just a First-In First-Out (FIFO) queue
  Typical number of entries: 4
  Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
[Diagram: Processor → Cache, with a Write Buffer between the Cache and DRAM]
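As a software analogy, here is a minimal sketch of such a 4-entry FIFO write buffer (our own illustration with invented names; the real buffer is hardware):

```c
#include <stdint.h>
#include <stdbool.h>

#define WB_ENTRIES 4                 /* typical number of entries, per the slide */

struct wb_entry { uint32_t addr; uint32_t data; };

struct write_buffer {
    struct wb_entry q[WB_ENTRIES];   /* circular FIFO storage */
    int head, tail, count;
};

/* Processor side: returns false if the buffer is full (processor must stall). */
bool wb_enqueue(struct write_buffer *wb, uint32_t addr, uint32_t data) {
    if (wb->count == WB_ENTRIES) return false;
    wb->q[wb->tail] = (struct wb_entry){ addr, data };
    wb->tail = (wb->tail + 1) % WB_ENTRIES;
    wb->count++;
    return true;
}

/* Memory-controller side: drain one entry per DRAM write cycle. */
bool wb_dequeue(struct write_buffer *wb, struct wb_entry *out) {
    if (wb->count == 0) return false;
    *out = wb->q[wb->head];
    wb->head = (wb->head + 1) % WB_ENTRIES;
    wb->count--;
    return true;
}
```

This works exactly when stores arrive more slowly than the controller drains them, which is the "store frequency << 1 / DRAM write cycle" condition above.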

Slide 12: Improving Cache Performance: Write Back
° Option 2: data is written only to the cache block
° A modified cache block is written to main memory only when it is replaced
  A block is either unmodified (clean) or modified (dirty)
° This scheme is called "Write Back"; the original scheme is called "Write Through"
° Advantage of Write Back: repeated writes to the same location stay in the cache
° Disadvantage of Write Back: any block replacement can cause a write to memory
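A C sketch of the write-back bookkeeping for one block (our own illustration; the real state machine is hardware, and write_block_to_memory is a hypothetical stand-in):

```c
#include <stdint.h>
#include <stdbool.h>

struct cache_block {
    bool     valid;
    bool     dirty;      /* modified since it was brought in? */
    uint32_t tag;
    uint8_t  data[32];   /* a 32-byte block */
};

/* Hypothetical stand-in for the DRAM write of a whole block. */
static void write_block_to_memory(uint32_t tag, const uint8_t *data) {
    (void)tag; (void)data;
}

/* A store hit under Write Back touches only the cache... */
void write_hit(struct cache_block *b, int offset, uint8_t value) {
    b->data[offset] = value;
    b->dirty = true;     /* memory is now stale for this block */
}

/* ...so on replacement, a dirty block must first go back to memory. */
void evict(struct cache_block *b) {
    if (b->valid && b->dirty)
        write_block_to_memory(b->tag, b->data);
    b->valid = false;
    b->dirty = false;
}
```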

Slide 13: What to Do on a Write Miss?
° On a read miss, you must bring the block into the cache so that you can complete the read
° Option 1: just like a read, bring the whole block into the cache and then modify the bytes needed: "Write Allocate"
  Does a write indicate nearby accesses in the near future?
° Option 2: only update the lower-level memory, put nothing in the cache: "No Write Allocate"
  Perhaps we are just clearing memory, with no reuse?
° Which is best for Write Back vs. Write Through? (see the sketch below)
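The two policies, side by side, as a short C sketch (illustrative pseudologic; the helper functions are hypothetical stubs, not lecture code):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the machinery above. */
static void fetch_block_into_cache(uint32_t addr) { (void)addr; }
static void write_to_cache(uint32_t addr, uint32_t v) { (void)addr; (void)v; }
static void write_to_memory(uint32_t addr, uint32_t v) { (void)addr; (void)v; }

void handle_write_miss(uint32_t addr, uint32_t data, bool write_allocate) {
    if (write_allocate) {
        /* Write Allocate: bring the whole block in, then modify it in the
           cache; pairs naturally with Write Back. */
        fetch_block_into_cache(addr);
        write_to_cache(addr, data);
    } else {
        /* No Write Allocate: update memory only, cache untouched;
           pairs naturally with Write Through. */
        write_to_memory(addr, data);
    }
}
```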

Slide 14: Avoiding Write Buffer Saturation
° If store frequency > 1 / DRAM write cycle, the write buffer will overflow no matter how long you make the queue
° Solutions for write buffer saturation:
  Use a write-back first-level cache, or
  Install a second-level (write-back) cache
[Diagram: Processor → Cache → Write Buffer → L2 Cache → DRAM]

Slide 15: Another View of the Memory Hierarchy
[Diagram: levels from upper (faster) to lower (larger), with the unit moved between adjacent levels: Regs (Instr. Operands) Cache / L2 Cache (Blocks) Memory (Pages) Disk (Files) Tape. Thus far: Regs through Memory. Next: Virtual Memory]

Slide 16: Virtual Memory
° If the Principle of Locality lets caches offer (usually) the speed of cache memory with the size of DRAM, then why not apply it recursively at the next level to get the speed of DRAM with the size of disk?
° This is called "Virtual Memory"
  It also allows the OS to share memory and protect programs from each other
  Today it is more important for protection than as just another level of the memory hierarchy
  Historically, it predates caches

Slide 17: Administrivia
° Project 5, due 4/14: design and implement a cache (in software) and plug it into the instruction simulator
° 8th homework, due 4/7 7PM: Exercises 7.7, 7.8, 7.20, 7.22, 7.32
° Readings: 7.2, 7.3, 7.4

Slide 18: Suggestions
° Microphone volume? Will try a wired mike
° Separate project and homework due dates? Will look at separating project and HW when they fall in the same week
° Why not more credit for projects? To reduce the reward for not doing your own work
° Why not more tests? (Note: point totals = F98) Testing vs. lecture time; it takes time to make a good test
° Why not more units? Workload should match the units by the end of the semester

Slide 19: Questions
° Look at x86 since it's so prevalent? How many lectures would we dedicate to this? Can we make much of a dent in 1-2 lectures?
° Why recommend 7PM when the deadline is 8AM? Human nature: a goal plus allowed slip
° Why front-load the course? Many courses have more work in the 2nd half; also, too many students are trying 61C in Spring '99
° Why not finish in lab? What non-trivial task can everyone finish in 50 minutes? We reduced the homeworks and expanded lab

Slide 20: Comparing the 2 Levels of Hierarchy
  Cache version                        Virtual Memory version
  Block or Line                        Page
  Miss                                 Page Fault
  Block size: 32-64 B                  Page size: 4 KB-8 KB
  Placement: Direct Mapped,            Placement: Fully Associative
    N-way Set Associative
  Replacement: LRU or Random           Replacement: Least Recently Used (LRU)
  Write Through or Write Back          Write Back

Slide 21: Protection Leads to New Terms
° Each program is said to have its own "virtual address space" (e.g., 2^32 bytes)
° Each computer has a "physical address space" (e.g., 128 megabytes of DRAM), also called "real memory"
° Address translation: map a page from a virtual address to a physical address
  Allows multiple programs to use (different pages of) physical memory at the same time
  A program can only access what the OS permits
  Also allows some pages of virtual memory to be kept on disk rather than in main memory (the memory hierarchy)

Slide 22: Paging Organization (assume 1 KB pages)
° The page is the unit of mapping: the Addr Trans MAP takes virtual pages to physical pages
° The page is also the unit of transfer from disk to physical memory
[Diagram: Virtual Memory holds 1 KB pages 0, 1, 2, ..., 31 starting at virtual addresses 0, 1024, 2048, ..., 31744; Physical Memory holds 1 KB pages 0, 1, ..., 7 starting at physical addresses 0, 1024, ..., 7168; the Addr Trans MAP connects virtual pages to physical pages]
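With 1 KB pages, the low 10 bits of an address are the offset within the page and the remaining bits select the page. A small C check of the numbers in the diagram (our own illustration):

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE   1024u   /* 1 KB pages, as assumed on the slide */
#define OFFSET_BITS 10      /* log2(1024) */

int main(void) {
    uint32_t va = 31744;                      /* start of the last virtual page above */
    uint32_t page   = va >> OFFSET_BITS;      /* virtual page number */
    uint32_t offset = va & (PAGE_SIZE - 1);   /* offset within the page */
    printf("VA %u -> page %u, offset %u\n", va, page, offset);  /* page 31, offset 0 */
    return 0;
}
```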

Slide 23: Address Mapping: Page Table
° A Virtual Address is split into a page number and an offset
° The Page Table Base Register points at the Page Table, which is located in physical memory; the page number is the index into the table
° Each Page Table entry holds a Valid bit (V), Access Rights (A.R.), and a Physical Page Address (P.P.A.)
° The Physical Memory Address is formed from the Physical Page Address plus the offset (actually, concatenation is more likely than an add)
° Access Rights: None, Read Only, Read/Write, Executable
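A minimal C sketch of walking this page table in software (field and function names are our own; the OS is assumed to have set page_table, standing in for the Page Table Base Register):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define OFFSET_BITS 10                 /* 1 KB pages, as above */

struct pte {                           /* one page table entry */
    bool     valid;                    /* page present in physical memory? */
    uint8_t  access_rights;            /* None / RO / RW / Executable */
    uint32_t phys_page;                /* physical page number */
};

/* Stands in for the Page Table Base Register (set up by the OS). */
static struct pte *page_table;

/* Translate a virtual address; returns false on a page fault. Note the
   physical address is formed by concatenation, as the slide observes,
   not by an add. */
bool translate(uint32_t va, uint32_t *pa) {
    uint32_t vpn    = va >> OFFSET_BITS;
    uint32_t offset = va & ((1u << OFFSET_BITS) - 1);
    struct pte e = page_table[vpn];
    if (!e.valid) return false;        /* page fault: the OS takes over */
    *pa = (e.phys_page << OFFSET_BITS) | offset;
    return true;
}
```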

Slide 24: Page Table
° A page table is an operating system structure that contains the mapping from virtual addresses to physical locations
  There are several different ways, all up to the operating system, to keep this data around
° Each process running in the operating system has its own page table

Slide 25: Address Map, Mathematically Speaking
  V = {0, 1, ..., n − 1}  virtual address space (n > m)
  M = {0, 1, ..., m − 1}  physical address space
  MAP: V → M ∪ {θ}  address mapping function
  MAP(a) = a′ if the data at virtual address a is present at physical address a′ and a′ ∈ M
  MAP(a) = θ  if the data at virtual address a is not present in M
[Diagram: the Processor issues address a from Name Space V to the Addr Trans Mechanism; on a hit it produces physical address a′ into Main Memory; on a page fault (θ) the OS fault handler transfers the page from Secondary Memory to Main Memory; the OS performs this transfer]

Slide 26: Virtual Address and a Cache
[Diagram: Processor issues VA → Translation → PA → Cache; a hit returns data, a miss goes to Main Memory]
° This means an extra memory access just to translate the VA to a PA!
° That makes cache access very expensive, and you want to go as fast as possible
° Why access the cache with a physical address at all? Because virtually addressed caches have a problem!
° The synonym/alias problem: 2 different virtual addresses can map to the same physical address → 2 different cache entries holding data for the same physical address!

Slide 27: How to Translate Fast?
° Observation: since there is locality in the pages of data, there must be locality in the virtual addresses of those pages
° Why not use a cache of virtual-to-physical address translations to make translation fast? (small is fast)
° For historical reasons, this cache is called a Translation Lookaside Buffer, or TLB

Slide 28: Typical TLB Format
  Virtual Address | Physical Address | Dirty | Ref | Valid | Access Rights
° Dirty: since we use write back, we need to know whether or not to write the page to disk when it is replaced
° Ref: used to help calculate LRU on replacement
° The TLB is just a cache on the page table mappings
° TLB access time is comparable to cache access time (much less than main memory access time)
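That format as a C struct (a sketch; field widths and the entry count are our own choices, though 128-256 entries is typical, as a later slide notes):

```c
#include <stdint.h>
#include <stdbool.h>

struct tlb_entry {
    uint32_t vpn;            /* virtual page number (the tag) */
    uint32_t ppn;            /* physical page number */
    bool     valid;          /* does this entry hold a mapping? */
    bool     dirty;          /* page modified: must go to disk on eviction */
    bool     ref;            /* referenced recently: input to LRU */
    uint8_t  access_rights;  /* copied from the page table entry */
};

static struct tlb_entry tlb[128];   /* e.g., fully associative, 128 entries */
```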

Slide 29: TLB Miss (simplified format)
° If the address is not in the TLB, MIPS traps to the operating system
  While in the operating system, we don't do translation
° The operating system knows which program caused the TLB fault or page fault, and knows which virtual address was requested
  So we look the mapping up in the page table
[Diagram: a page table entry with valid, virtual, and physical fields]

Slide 30: If the Data Is in Memory
° We simply add the entry to the TLB, evicting an old entry from the TLB
[Diagram: the new valid/virtual/physical entry installed in the TLB]
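A sketch of that refill step in C (our own pseudologic, in the spirit of the MIPS software trap described above; it reuses struct pte, page_table, struct tlb_entry, and tlb from the earlier sketches, and pick_victim is a hypothetical helper standing in for the LRU choice):

```c
/* Hypothetical stand-in for choosing a victim entry, e.g. via the Ref bits. */
static int pick_victim(void) { return 0; }

/* Software TLB refill after a TLB miss trap. */
void tlb_refill(uint32_t vpn) {
    struct pte e = page_table[vpn];   /* look the mapping up in the page table */
    if (!e.valid)
        return;                       /* page fault: handled on the next slides */
    tlb[pick_victim()] = (struct tlb_entry){
        .vpn = vpn, .ppn = e.phys_page,
        .valid = true, .dirty = false, .ref = true,
        .access_rights = e.access_rights,
    };
}
```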

Slide 31: What If the Data Is on Disk?
° We load the page off the disk into a free block of memory, using a DMA transfer
  In the meantime, we switch to some other process that is waiting to run
° When the DMA is complete, we get an interrupt and update the process's page table
  So when we switch back to the task, the desired data will be in memory

Slide 32: What If We Don't Have Enough Memory?
° We choose some other page belonging to a program and transfer it to disk if it is dirty
  If it is clean (the disk copy is up to date), we just overwrite that data
  We choose the page to evict based on a replacement policy (e.g., LRU)
° And we update that program's page table to reflect the fact that its memory moved somewhere else
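A self-contained C sketch of that eviction decision (the names and the page_info bookkeeping are our own illustration, and write_page_to_disk stands in for the outbound DMA transfer):

```c
#include <stdint.h>
#include <stdbool.h>

struct page_info {
    bool     dirty;        /* modified since it was loaded from disk? */
    uint32_t disk_block;   /* where this page lives on disk */
};

/* Hypothetical stand-in for a DMA transfer out to disk. */
static void write_page_to_disk(uint32_t ppn, uint32_t disk_block) {
    (void)ppn; (void)disk_block;
}

/* Free a physical page frame so a new page can be brought in. */
uint32_t evict_victim(struct page_info *info, uint32_t victim_ppn) {
    if (info[victim_ppn].dirty)
        write_page_to_disk(victim_ppn, info[victim_ppn].disk_block);
    /* If clean, the disk copy is already up to date: just overwrite. */
    /* Then mark the victim's page table entry invalid (not shown). */
    return victim_ppn;
}
```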

Slide 33: Translation Look-Aside Buffers
° TLBs are usually small, typically 128-256 entries
° Like any other cache, the TLB can be fully associative, set associative, or direct mapped
[Diagram: Processor issues VA → TLB Lookup; on a TLB hit, the PA goes to the Cache (hit returns data, miss goes to Main Memory); on a TLB miss, the full Translation runs and then the access proceeds]

Slide 34: "And in Conclusion"
° Apply the Principle of Locality recursively
° Reduce the Miss Penalty? Add a second-level (L2) cache
° Manage memory to disk? Treat it as a cache
  Protection was included as a bonus; now it is critical
  Use a Page Table of mappings instead of tag/data in a cache
° Virtual-to-physical memory translation too slow? Add a cache of virtual-to-physical address translations, called a TLB
° Next: wrap up the Memory Hierarchy

