
1 EECS 252 Graduate Computer Architecture, Lecture 3 (continued): Review of Caches and Virtual Memory. January 27th, 2010. John Kubiatowicz, Electrical Engineering and Computer Sciences, University of California, Berkeley.

2 Review: Control and Pipelining
Control via state machines and microprogramming. Pipelining just overlaps tasks; easy if the tasks are independent. Speedup ≤ pipeline depth; if ideal CPI is 1, then Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction). Hazards limit performance on real computers: Structural: need more HW resources. Data (RAW, WAR, WAW): need forwarding, compiler scheduling. Control: delayed branch, prediction. Exceptions and interrupts add complexity.

3 Memory Hierarchy Review

4 Since 1980, CPU has outpaced DRAM ...
[Chart: performance (1/latency) vs. year, 1980-2000. CPU improves ~60% per year (2x in 1.5 years); DRAM improves ~9% per year (2x in 10 years); the processor-memory gap grew ~50% per year.] How do architects address this gap? Put small, fast "cache" memories between the CPU and DRAM, creating a "memory hierarchy".

5 1977: DRAM faster than microprocessors
Apple ][ (1977), Steve Wozniak and Steve Jobs. CPU: 1000 ns. DRAM: 400 ns.

6 Memory Hierarchy Take advantage of the principle of locality to:
Present as much memory as in the cheapest technology, and provide access at the speed offered by the fastest technology. [Diagram: datapath and registers, on-chip cache, second-level cache (SRAM), main memory (DRAM/FLASH/PCM), secondary storage (disk/FLASH/PCM), tertiary storage (tape/cloud storage). Speed ranges from ~1 ns at the registers to tens of seconds at tertiary storage; size ranges from hundreds of bytes at the registers to terabytes at tertiary storage.] Speaker notes: The design goal is to present the user with as much memory as is available in the cheapest technology (points to the disk), while, by taking advantage of the principle of locality, providing an average access speed very close to the speed offered by the fastest technology. (We will go over this slide in detail in the next lecture on caches.)

7 The Principle of Locality
Programs access a relatively small portion of the address space at any instant of time. Two different types of locality: Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse). Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access). For the last 15 years, HW has relied on locality for speed. Speaker notes: The principle of locality states that programs access a relatively small portion of the address space at any instant of time. This is kind of like real life: we all have a lot of friends, but at any given time most of us can only keep in touch with a small group of them. There are two different types of locality: temporal and spatial. Temporal locality is locality in time, which says that if an item is referenced, it will tend to be referenced again soon. This is like saying that if you just talked to one of your friends, it is likely that you will talk to him or her again soon; for example, if you just had lunch with a friend, you may say "let's go to the ball game this Sunday," so you will talk to him again soon. Spatial locality is locality in space: if an item is referenced, items whose addresses are close by tend to be referenced soon. Once again, using our analogy, we can usually divide our friends into groups: friends from high school, friends from work, friends from home. Say you just talked to one of your friends from high school, and she says, "Did you hear that so-and-so just won the lottery?" You will probably say, "No, I'd better give him a call and find out more." You just talked to a friend from your high school days and, as a result, you end up talking to another high school friend. That is an example of spatial locality.

8 Programs with locality cache well ...
[Figure: memory address (one dot per access) plotted against time, showing bands of temporal locality, regions of spatial locality, and an area of bad locality behavior. Source: Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3), 1971.]

9 Memory Hierarchy: Apple iMac G5
iMac G5, 1.6 GHz.
Level              Reg      L1 Inst   L1 Data   L2        DRAM     Disk
Size               1K       64K       32K       512K      256M     80G
Latency (cycles)   1        3         3         11        88       10^7
Latency (time)     0.6 ns   1.9 ns    1.9 ns    6.9 ns    55 ns    12 ms
Managed by: registers by the compiler; caches by hardware; DRAM and disk by the OS, hardware, and the application.
Goal: the illusion of large, fast, cheap memory. Let programs address a memory space that scales to the disk size, at a speed that is usually as fast as register access.

10 iMac’s PowerPC 970: All caches on-chip
[Annotated die photo of the PowerPC 970 showing the registers (1K), the L1 instruction cache (64K), the L1 data cache (32K), and the 512K L2 cache, all on-chip.]

11 Memory Hierarchy: Terminology
Hit: the data appears in some block in the upper level (example: Block X). Hit rate: the fraction of memory accesses found in the upper level. Hit time: time to access the upper level, which consists of the RAM access time plus the time to determine hit/miss. Miss: the data must be retrieved from a block in the lower level (Block Y). Miss rate = 1 - (hit rate). Miss penalty: time to replace a block in the upper level plus the time to deliver the block to the processor. Hit time << miss penalty (500 instructions on the 21264!). [Diagram: the upper level holds Blk X and the lower level holds Blk Y; blocks move between levels, and data moves to/from the processor.] Speaker notes: A HIT is when the data the processor wants to access is found in the upper level (Blk X). The fraction of memory accesses that hit is defined as the hit rate. Hit time is the time to access the upper level where the data is found (X); it consists of (a) the time to access this level and (b) the time to determine whether this is a hit or a miss. If the data the processor wants cannot be found in the upper level, we have a miss and need to retrieve the data (Blk Y) from the lower level. By the definition of hit rate, the miss rate is just 1 minus the hit rate. The miss penalty also consists of two parts: (a) the time it takes to replace a block (Blk Y into Blk X's place) in the upper level, and (b) the time it takes to deliver the new block to the processor. It is very important that your hit time be much smaller than your miss penalty; otherwise, there would be no reason to build a memory hierarchy.
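These terms combine into the standard average memory access time formula, AMAT = hit time + miss rate x miss penalty (not spelled out on the slide, but implied by it). A minimal sketch in C; the numbers in main are made-up illustrative values, not measurements:

    #include <stdio.h>

    /* Average Memory Access Time: hit_time + miss_rate * miss_penalty */
    static double amat(double hit_time_cycles, double miss_rate, double miss_penalty_cycles) {
        return hit_time_cycles + miss_rate * miss_penalty_cycles;
    }

    int main(void) {
        /* Hypothetical example: 1-cycle hit, 5% miss rate, 100-cycle miss penalty */
        printf("AMAT = %.2f cycles\n", amat(1.0, 0.05, 100.0));  /* prints 6.00 cycles */
        return 0;
    }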

12 4 Questions for Memory Hierarchy
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)

13 Q1: Where can a block be placed in the upper level?
Block 12 placed in an 8-block cache: fully associative, direct mapped, or 2-way set associative. Set-associative mapping: set = block number modulo number of sets. Direct mapped: (12 mod 8) = block 4. 2-way set associative: (12 mod 4) = set 0. Fully associative: any block frame. [Diagram: cache and memory block placement for each organization.]
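A small sketch of the placement rule above: for a cache with a given number of sets, a block maps to set (block number mod number of sets); the fully associative case is the degenerate one-set cache. The geometry values are the slide's 8-block example:

    #include <stdio.h>

    /* Where may a memory block go? set = block_number mod number_of_sets. */
    static unsigned placement_set(unsigned block_number, unsigned num_sets) {
        return block_number % num_sets;
    }

    int main(void) {
        unsigned block = 12, num_blocks = 8;

        /* Direct mapped: 8 sets of 1 block each -> (12 mod 8) = 4 */
        printf("direct mapped: set %u\n", placement_set(block, num_blocks));
        /* 2-way set associative: 4 sets of 2 blocks each -> (12 mod 4) = 0 */
        printf("2-way:         set %u\n", placement_set(block, num_blocks / 2));
        /* Fully associative: 1 set of 8 blocks -> the block may go anywhere */
        printf("fully assoc:   set %u (any way within it)\n", placement_set(block, 1));
        return 0;
    }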

14 Sources of Cache Misses
Compulsory (cold start or process migration; first reference): the first access to a block. A "cold" fact of life: there is not a whole lot you can do about it. Note: if you are going to run billions of instructions, compulsory misses are insignificant. Capacity: the cache cannot contain all the blocks accessed by the program. Solution: increase the cache size. Conflict (collision): multiple memory locations map to the same cache location. Solution 1: increase the cache size. Solution 2: increase associativity. Coherence (invalidation): another process (e.g., I/O) updates memory. Speaker notes: Capacity misses are due to the fact that the cache is simply not large enough to contain all the blocks accessed by the program; the solution is simple: increase the cache size. Here is a summary of the other types of cache miss. First are the compulsory misses, which we cannot avoid; they are caused when we first start the program. Then there are the conflict misses, caused by multiple memory locations being mapped to the same cache location. There are two solutions to reduce conflict misses: once again, increase the cache size, or increase the associativity, for example by using a 2-way set-associative cache instead of a direct-mapped cache. But keep in mind that the cache miss rate is only one part of the equation; you also have to worry about cache access time and miss penalty. Do NOT optimize miss rate alone. Finally, there is another source of cache misses we will not cover today: invalidation misses, caused by another process, such as I/O, updating main memory, which forces you to flush the cache to avoid inconsistency between memory and cache.

15 Q2: How is a block found if it is in the upper level?
[Address breakdown: block address = Tag | Index, followed by the block offset; the index is the set select, the offset is the data select.] The index is used to look up candidates in the cache: it identifies the set. The tag is used to identify the actual copy; if no candidates match, declare a cache miss. The block is the minimum quantum of caching. The data select field is used to select data within the block. Many caching applications don't have a data select field.
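The tag/index/offset split can be written directly as shift-and-mask arithmetic. A sketch assuming a 32-bit address and power-of-two block size and set count; the field widths in main are illustrative, not values fixed by the slide:

    #include <stdint.h>
    #include <stdio.h>

    /* Split a 32-bit address into tag | index | offset,
     * given log2(block size) and log2(number of sets). */
    struct addr_fields { uint32_t tag, index, offset; };

    static struct addr_fields split_address(uint32_t addr,
                                            unsigned offset_bits,
                                            unsigned index_bits) {
        struct addr_fields f;
        f.offset = addr & ((1u << offset_bits) - 1);                  /* data select */
        f.index  = (addr >> offset_bits) & ((1u << index_bits) - 1);  /* set select  */
        f.tag    = addr >> (offset_bits + index_bits);                /* identifies the copy */
        return f;
    }

    int main(void) {
        /* Example: 32-byte blocks (5 offset bits), 32 sets (5 index bits) */
        struct addr_fields f = split_address(0x0001443Cu, 5, 5);
        printf("tag=0x%x index=%u offset=%u\n",
               (unsigned)f.tag, (unsigned)f.index, (unsigned)f.offset);
        return 0;
    }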

16 Block Size and Spatial Locality
A block is the unit of transfer between the cache and memory. [Diagram: a 4-word block with b=2: tag, Word0, Word1, Word2, Word3; the CPU address is split into a block address (32-b bits) and an offset (b bits); 2^b = block size, a.k.a. line size, in bytes.] Larger block sizes have distinct hardware advantages: less tag overhead, and they exploit fast burst transfers from DRAM and over wide busses. What are the disadvantages of increasing block size? Speaker notes: A larger block size will reduce compulsory misses (the first miss to a block), but larger blocks may increase conflict misses since the number of blocks is smaller (fewer blocks => more conflicts), and they can waste bandwidth.

17 Review: Direct Mapped Cache
A direct-mapped 2^N-byte cache: the uppermost (32 - N) bits are always the cache tag; the lowest M bits are the byte select (block size = 2^M). Example: a 1 KB direct-mapped cache with 32 B blocks. The index chooses a potential block, the tag is checked to verify the block, and the byte select chooses a byte within the block. [Diagram: the 32-bit address split into cache tag (bits 31-10, e.g. 0x50), cache index (bits 9-5, e.g. 0x01), and byte select (bits 4-0, e.g. 0x00); a cache array of 32 entries, each with a valid bit, a cache tag, and 32 bytes of cache data (Byte 0 ... Byte 1023).] Speaker notes: Let's use a specific example with realistic numbers: assume we have a 1 KB direct-mapped cache with a block size of 32 bytes. In other words, each block associated with a cache tag will have 32 bytes in it. With a block size of 32 bytes, the 5 least significant bits of the address are used as the byte select within the cache block. Since the cache size is 1 KB, the upper 32 minus 10, or 22, bits of the address are stored as the cache tag. The rest of the address bits in the middle, bits 5 through 9, are used as the cache index to select the proper cache entry.
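A minimal software sketch of this lookup for the 1 KB / 32 B-block example (32 lines, 5-bit byte select, 5-bit index, 22-bit tag); the structure and function names are made up for illustration:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES   32   /* 1 KB cache / 32 B blocks */
    #define BLOCK_BYTES 32

    struct dm_line {
        bool     valid;
        uint32_t tag;                    /* upper 22 address bits */
        uint8_t  data[BLOCK_BYTES];
    };

    static struct dm_line cache[NUM_LINES];

    /* Returns true on a hit and copies the requested byte into *out. */
    static bool dm_lookup(uint32_t addr, uint8_t *out) {
        uint32_t byte_sel = addr & (BLOCK_BYTES - 1);      /* bits 4..0   */
        uint32_t index    = (addr >> 5) & (NUM_LINES - 1); /* bits 9..5   */
        uint32_t tag      = addr >> 10;                    /* bits 31..10 */

        struct dm_line *line = &cache[index];
        if (line->valid && line->tag == tag) {             /* tag check verifies the block */
            *out = line->data[byte_sel];
            return true;
        }
        return false;                                      /* miss: fetch from the lower level */
    }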

18 Review: Set Associative Cache
N-way set associative: N entries per cache index, i.e., N direct-mapped caches operating in parallel. Example: a two-way set-associative cache. The cache index selects a "set" from the cache; the two tags in the set are compared to the input in parallel; data is selected based on the tag result. [Diagram: the address split into cache tag (bits 31-9), cache index (bits 8-5), and byte select (bits 4-0); two banks of valid bit, cache tag, and cache data; the two comparators drive Sel1/Sel0 into a mux, and an OR gate produces Hit, selecting the matching cache block.] Speaker notes: This is called a 2-way set-associative cache because there are two cache entries for each cache index; essentially, you have two direct-mapped caches working in parallel. This is how it works: the cache index selects a set from the cache, and the two tags in the set are compared in parallel with the upper bits of the memory address. If neither tag matches the incoming address tag, we have a cache miss; otherwise, we have a cache hit and we select the data on the side where the tag match occurs. Simple enough. What are its disadvantages?
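The same lookup for a two-way organization, sized like the slide's example (1 KB, 32 B blocks, 2 ways, so 16 sets). In hardware both ways are probed in parallel; this software sketch simply checks them in a loop, and the names are illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS     2
    #define NUM_SETS 16     /* 1 KB / (32 B blocks * 2 ways) */
    #define BLOCK_B  32

    struct way {
        bool     valid;
        uint32_t tag;
        uint8_t  data[BLOCK_B];
    };

    static struct way sets[NUM_SETS][WAYS];

    static bool sa_lookup(uint32_t addr, uint8_t *out) {
        uint32_t byte_sel = addr & (BLOCK_B - 1);
        uint32_t index    = (addr >> 5) & (NUM_SETS - 1);  /* selects the set       */
        uint32_t tag      = addr >> 9;                     /* remaining upper bits  */

        for (int w = 0; w < WAYS; w++) {                   /* in HW: compared in parallel */
            struct way *line = &sets[index][w];
            if (line->valid && line->tag == tag) {
                *out = line->data[byte_sel];               /* mux selects the matching way */
                return true;
            }
        }
        return false;                                      /* neither tag matched: cache miss */
    }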

19 Review: Fully Associative Cache
Fully associative: any memory block can go in any cache entry. The address does not include a cache index; instead, the cache tags of all cache entries are compared in parallel. Example: with a block size of 32 B, we need N 27-bit comparators, and we still have the byte select to choose from within the block. [Diagram: the address split into a 27-bit cache tag (bits 31-5) and a byte select (bits 4-0, e.g. 0x01); every entry has a valid bit, a cache tag, 32 bytes of cache data, and its own comparator.] Speaker notes: While the direct-mapped cache is on the simple end of the cache design spectrum, the fully associative cache is on the most complex end. It is the N-way set-associative cache carried to the extreme, where N is the number of cache entries. In other words, we don't even bother to use any address bits as a cache index; we just store all the upper bits of the address (except the byte select) as the cache tag and have one comparator for every entry. The address is sent to all entries at once and compared in parallel, and only the one that matches is sent to the output. This is called an associative lookup. Needless to say, it is very hardware intensive; usually, a fully associative cache is limited to 64 or fewer entries. Since we are not doing any mapping with a cache index, we never push an item out of the cache because multiple memory locations map to the same cache location; therefore, by definition, the conflict miss rate is zero for a fully associative cache. This, however, does not mean the overall miss rate is zero. Assume we have 64 entries: the first 64 items we access can fit, but when we try to bring in the 65th item, we need to throw one out to make room. This brings us to the third type of cache miss: the capacity miss.

20 Q3: Which block should be replaced on a miss?
Easy for direct mapped. For set associative or fully associative: LRU (Least Recently Used): appealing, but hard to implement for high associativity. Random: easy, but how well does it work? Data cache miss rates, LRU vs. Random:
Size     2-way LRU / Ran     4-way LRU / Ran     8-way LRU / Ran
16K      5.2%  / 5.7%        4.7%  / 5.3%        4.4%  / 5.0%
64K      1.9%  / 2.0%        1.5%  / 1.7%        1.4%  / -
256K     1.15% / 1.17%       1.13% / -           1.12% / -
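A sketch of the two policies in the table: random picks any way, while LRU tracks a last-use timestamp per way. This counter-based form is an illustration only; real hardware uses cheaper approximations, especially at high associativity:

    #include <stdint.h>
    #include <stdlib.h>

    #define WAYS 4

    struct set_state {
        uint64_t last_used[WAYS];   /* timestamp of the most recent access per way */
    };

    /* Random replacement: easy to build, surprisingly competitive. */
    static int pick_victim_random(void) {
        return rand() % WAYS;
    }

    /* LRU replacement: evict the way with the oldest last-use time. */
    static int pick_victim_lru(const struct set_state *s) {
        int victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (s->last_used[w] < s->last_used[victim])
                victim = w;
        return victim;
    }

    /* Call on every hit so the LRU state stays current. */
    static void touch(struct set_state *s, int way, uint64_t now) {
        s->last_used[way] = now;
    }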

21 Q4: What happens on a write?
Write-Through: data written to the cache block is also written to lower-level memory. Debugging: easy. Do read misses produce writes? No. Do repeated writes make it to the lower level? Yes.
Write-Back: write data only to the cache; update the lower level when a block falls out of the cache. Debugging: hard. Do read misses produce writes? Yes. Do repeated writes make it to the lower level? No.
Additional option: let writes to an un-cached address allocate a new cache line ("write-allocate").
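A sketch contrasting the two policies on a store: write-through forwards every write to the next level, while write-back only sets a dirty bit and writes the block out on eviction. The lower-level function is a placeholder, not a real API:

    #include <stdbool.h>
    #include <stdint.h>

    /* Placeholder for the next level of the hierarchy. */
    void lower_level_write(uint32_t addr, uint32_t value);

    struct cache_line {
        bool     valid, dirty;
        uint32_t tag;
        uint32_t data[8];
    };

    /* Write-through: the cache and the lower level are updated together. */
    static void store_write_through(struct cache_line *line, unsigned word,
                                    uint32_t value, uint32_t addr) {
        line->data[word] = value;
        lower_level_write(addr, value);        /* every write reaches memory */
    }

    /* Write-back: only the cache is updated; the block is flushed on eviction. */
    static void store_write_back(struct cache_line *line, unsigned word, uint32_t value) {
        line->data[word] = value;
        line->dirty = true;                    /* remember the block is modified */
    }

    static void evict(struct cache_line *line, uint32_t block_addr) {
        if (line->valid && line->dirty)        /* this is why write-back read misses can produce writes */
            for (unsigned w = 0; w < 8; w++)
                lower_level_write(block_addr + 4 * w, line->data[w]);
        line->valid = false;
        line->dirty = false;
    }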

22 Write Buffers for Write-Through Caches
[Diagram: Processor -> Cache -> Write Buffer -> Lower-Level Memory.] The write buffer holds data awaiting write-through to lower-level memory. Q: Why a write buffer? A: So the CPU doesn't stall. Q: Why a buffer, why not just one register? A: Bursts of writes are common. Q: Are Read-After-Write (RAW) hazards an issue for the write buffer? A: Yes! Either drain the buffer before the next read, or check the write buffer for a match on reads.
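A sketch of a small write buffer with the RAW check just described: writes are queued instead of stalling the CPU, and reads search the buffer for a matching address before going to memory. The circular-FIFO shape and sizes are illustrative assumptions:

    #include <stdbool.h>
    #include <stdint.h>

    #define WB_ENTRIES 8

    struct write_buffer {
        uint32_t addr[WB_ENTRIES];
        uint32_t data[WB_ENTRIES];
        unsigned head, tail, count;        /* circular FIFO */
    };

    /* Queue a write so the CPU does not stall; the caller drains when full. */
    static bool wb_push(struct write_buffer *wb, uint32_t addr, uint32_t data) {
        if (wb->count == WB_ENTRIES)
            return false;                  /* full: CPU must stall or drain the buffer */
        wb->addr[wb->tail] = addr;
        wb->data[wb->tail] = data;
        wb->tail = (wb->tail + 1) % WB_ENTRIES;
        wb->count++;
        return true;
    }

    /* RAW check: on a read, look for the youngest buffered write to this address. */
    static bool wb_match(const struct write_buffer *wb, uint32_t addr, uint32_t *data) {
        bool found = false;
        for (unsigned i = 0, idx = wb->head; i < wb->count; i++, idx = (idx + 1) % WB_ENTRIES)
            if (wb->addr[idx] == addr) {   /* later entries override earlier matches */
                *data = wb->data[idx];
                found = true;
            }
        return found;
    }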

23 5 Basic Cache Optimizations
Reducing miss rate: 1. Larger block size (compulsory misses). 2. Larger cache size (capacity misses). 3. Higher associativity (conflict misses). Reducing miss penalty: 4. Multilevel caches. Reducing hit time: 5. Giving reads priority over writes, e.g., a read completes before earlier writes still sitting in the write buffer.

24 Administrivia Paper readings: important for your graduate career
Remember: everything is on the web site. Web site signup: make sure to sign up for the class if you haven't yet. Don't forget the ISCA retrospective.

25 RISC: The integrated systems view (Discussion of Papers)
"The Case for the Reduced Instruction Set Computer," Dave Patterson and David Ditzel. "Comments on 'The Case for the Reduced Instruction Set Computer,'" Doug Clark and William Strecker. "Retrospective on High-Level Language Computer Architecture," David Ditzel and David Patterson. In-class discussion of these papers.

26 What is virtual memory? [Diagram: a virtual address = virtual page number + offset; the page number, together with the page table base register, indexes into a page table located in physical memory; each entry holds a valid bit (V), access rights, and a physical page number, which is combined with the offset to form the physical address.] Virtual memory => treat main memory as a cache for the disk. Terminology: blocks in this cache are called "pages"; typical page size: 1K - 8K. The page table maps virtual page numbers to physical frames. "PTE" = Page Table Entry.
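A sketch of the translation the diagram describes: the virtual page number indexes a page table rooted at a base register, and the resulting physical frame is glued back onto the untouched offset. The page size and PTE fields here are illustrative assumptions, not the slide's exact format:

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12                  /* 4 KB pages (the slide says 1K - 8K is typical) */
    #define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

    struct pte {
        bool     valid;
        unsigned access_rights;            /* e.g. read-only vs. read-write */
        uint32_t frame;                    /* physical page number */
    };

    /* Page table base "register": one PTE per virtual page. */
    extern struct pte *page_table_base;

    /* Translate a virtual address; returns false on a page fault. */
    static bool translate(uint32_t vaddr, uint32_t *paddr) {
        uint32_t vpn    = vaddr >> PAGE_SHIFT;        /* index into the page table */
        uint32_t offset = vaddr & PAGE_MASK;

        struct pte *e = &page_table_base[vpn];
        if (!e->valid)
            return false;                             /* page fault: the OS takes over */
        *paddr = (e->frame << PAGE_SHIFT) | offset;
        return true;
    }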

27 What is in a Page Table Entry (PTE)?
What is in a Page Table Entry (or PTE)? A pointer to a next-level page table or to the actual page, plus permission bits: valid, read-only, read-write, write-only. Example: the Intel x86 architecture PTE. The address has the same format as on the previous slide (10 bits, 10 bits, 12-bit offset); the intermediate page tables are called "directories". P: present (same as the "valid" bit in other architectures). W: writeable. U: user accessible. PWT: page write transparent (external cache is write-through). PCD: page cache disabled (the page cannot be cached). A: accessed (the page has been accessed recently). D: dirty (PTE only: the page has been modified recently). L: L=1 means a 4 MB page (directory entry only); the bottom 22 bits of the virtual address then serve as the offset. [PTE layout, high bits to low: page frame number (physical page number) in bits 31-12; free for the OS in bits 11-9; then L, D, A, PCD, PWT, U, W, P in the low-order bits.]
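A sketch of decoding such a 32-bit PTE with shifts and masks. The bit positions below follow the conventional x86 layout (P in bit 0 up through the page-size bit in bit 7, OS-available bits 9-11, frame number in bits 12-31), which is an assumption about how the slide's field order maps to bit numbers:

    #include <stdbool.h>
    #include <stdint.h>

    /* Conventional 32-bit x86 PTE bit positions (low to high). */
    #define PTE_P    (1u << 0)   /* present (valid)                   */
    #define PTE_W    (1u << 1)   /* writeable                         */
    #define PTE_U    (1u << 2)   /* user accessible                   */
    #define PTE_PWT  (1u << 3)   /* page write transparent            */
    #define PTE_PCD  (1u << 4)   /* page cache disabled               */
    #define PTE_A    (1u << 5)   /* accessed recently                 */
    #define PTE_D    (1u << 6)   /* dirty (modified)                  */
    #define PTE_L    (1u << 7)   /* large (4 MB) page, directory only */

    static inline uint32_t pte_frame(uint32_t pte)   { return pte >> 12; }         /* bits 31-12 */
    static inline uint32_t pte_os_bits(uint32_t pte) { return (pte >> 9) & 0x7; }  /* bits 11-9, free for OS */

    static inline bool pte_allows_user_write(uint32_t pte) {
        return (pte & PTE_P) && (pte & PTE_W) && (pte & PTE_U);
    }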

28 Three Advantages of Virtual Memory
Translation: a program can be given a consistent view of memory, even though physical memory is scrambled. Makes multithreading reasonable (now used a lot!). Only the most important part of the program (the "working set") must be in physical memory. Contiguous structures (like stacks) use only as much physical memory as necessary, yet can still grow later.
Protection: different threads (or processes) are protected from each other. Different pages can be given special behavior (read only, invisible to user programs, etc.). Kernel data is protected from user programs. Very important for protection from malicious programs.
Sharing: the same physical page can be mapped into multiple users' address spaces ("shared memory").

29 Large Address Space Support
[Diagram: physical address = page # + 12-bit offset (4 KB pages); virtual address = P1 index (10 bits) + P2 index (10 bits) + offset; PageTablePtr points to a top-level table of 4-byte entries, each pointing to a second-level table of 4-byte PTEs.] Single-level page table: with 4 KB pages and a 32-bit address => 1M entries, and each process needs its own page table! Multi-level page table: allows sparseness of the page table, and portions of the table can be swapped to disk.
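A sketch of the two-level walk for the 10/10/12 split: the P1 index selects an entry in the top-level table, the P2 index selects a PTE in the second-level table, and either level can be absent so unused regions cost no table memory. The structure names are illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    #define ENTRIES 1024                       /* 10 bits per level */

    struct l2_table { uint32_t pte[ENTRIES]; };              /* bit 0 = present, bits 31-12 = frame */
    struct l1_table { struct l2_table *next[ENTRIES]; };     /* NULL = region not mapped */

    extern struct l1_table *page_table_ptr;    /* per-process root */

    bool walk(uint32_t vaddr, uint32_t *paddr) {
        uint32_t p1     = (vaddr >> 22) & 0x3FF;   /* top 10 bits   */
        uint32_t p2     = (vaddr >> 12) & 0x3FF;   /* next 10 bits  */
        uint32_t offset =  vaddr        & 0xFFF;   /* 12-bit offset */

        struct l2_table *l2 = page_table_ptr->next[p1];
        if (l2 == NULL)
            return false;                          /* sparse: whole region unmapped */

        uint32_t pte = l2->pte[p2];
        if (!(pte & 1u))
            return false;                          /* page not present: page fault */

        *paddr = (pte & ~0xFFFu) | offset;
        return true;
    }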

30 Translation Look-Aside Buffers
A Translation Look-Aside Buffer (TLB) is a cache on translations; it can be fully associative, set associative, or direct mapped. TLBs are small (typically not more than 128-256 entries) and fully associative. [Diagram, "Translation with a TLB": the CPU sends the VA to the TLB; on a hit, the PA goes straight to the cache; on a miss, the translation unit produces the PA, and the access then proceeds to the cache and main memory.]
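A sketch of the hit/miss flow in that diagram: a small fully associative TLB is searched first, and only on a miss is the page table walked and the result installed. The walk routine stands in for a page-table walker like the hypothetical one sketched earlier; the round-robin refill is an illustrative simplification:

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64                      /* small, fully associative */

    struct tlb_entry { bool valid; uint32_t vpn, pfn; };
    static struct tlb_entry tlb[TLB_ENTRIES];
    static unsigned tlb_next;                   /* trivial round-robin fill pointer */

    bool walk(uint32_t vaddr, uint32_t *paddr); /* page-table walk (see the earlier sketch) */

    static bool tlb_translate(uint32_t vaddr, uint32_t *paddr) {
        uint32_t vpn = vaddr >> 12, offset = vaddr & 0xFFF;

        for (int i = 0; i < TLB_ENTRIES; i++)   /* associative search (parallel in HW) */
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *paddr = (tlb[i].pfn << 12) | offset;   /* hit: no memory access for translation */
                return true;
            }

        if (!walk(vaddr, paddr))                /* miss: walk the page table */
            return false;                       /* invalid PTE: page fault */

        tlb[tlb_next].valid = true;             /* refill the TLB with the new mapping */
        tlb[tlb_next].vpn = vpn;
        tlb[tlb_next].pfn = *paddr >> 12;
        tlb_next = (tlb_next + 1) % TLB_ENTRIES;
        return true;
    }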

31 Caching Applied to Address Translation
[Diagram: the CPU sends a virtual address to the TLB; if the translation is cached, the physical address goes straight to physical memory; if not, the MMU translates it, the result is saved in the TLB, and the data read or write then proceeds untranslated.] The question is one of page locality: does it exist? Instruction accesses spend a lot of time on the same page (since accesses are sequential); stack accesses have definite locality of reference; data accesses have less page locality, but still some. Can we have a TLB hierarchy? Sure: multiple levels at different sizes/speeds.

32 What Actually Happens on a TLB Miss?
Hardware-traversed page tables: on a TLB miss, hardware in the MMU looks at the current page table to fill the TLB (it may walk multiple levels). If the PTE is valid, the hardware fills the TLB and the processor never knows; if the PTE is marked invalid, it causes a page fault, after which the kernel decides what to do. Software-traversed page tables (like MIPS): on a TLB miss, the processor receives a TLB fault, and the kernel traverses the page table to find the PTE. If the PTE is valid, the kernel fills the TLB and returns from the fault; if the PTE is marked invalid, it internally calls the page fault handler. Most chip sets provide hardware traversal. Modern operating systems tend to have more TLB faults since they use translation for many things, for example shared segments and user-level portions of the operating system.

33 Clock Algorithm: Not Recently Used
[Diagram: the set of all pages in memory arranged in a circle, with a single clock hand that advances only on a page fault, checking for pages not used recently and marking pages as not used recently; each page table entry carries "use" and "dirty" bits.] Clock algorithm: approximate LRU (an approximation to an approximation to MIN). Replace an old page, not necessarily the oldest page. Details: hardware keeps a "use" bit per physical page and sets it on each reference; if the use bit isn't set, the page has not been referenced in a long time. On a page fault: advance the clock hand (not in real time) and check the use bit. 1 => used recently: clear the bit and leave the page alone. 0 => selected candidate for replacement.
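A sketch of the clock hand: it advances only when a replacement is needed, clearing use bits as it passes and stopping at the first page whose use bit is already clear. Array sizes and field names are illustrative:

    #include <stdbool.h>

    #define NUM_FRAMES 1024

    struct frame { bool use; bool dirty; /* plus the mapping information */ };

    static struct frame frames[NUM_FRAMES];
    static unsigned clock_hand;              /* advances only on a page fault */

    /* Hardware (or the TLB-miss handler) sets the use bit on each reference. */
    static void on_reference(unsigned f) { frames[f].use = true; }

    /* Pick a victim frame: approximate LRU ("not recently used"). */
    static unsigned clock_select_victim(void) {
        for (;;) {
            struct frame *f = &frames[clock_hand];
            unsigned candidate = clock_hand;
            clock_hand = (clock_hand + 1) % NUM_FRAMES;

            if (f->use)
                f->use = false;              /* used recently: clear the bit, give it another lap */
            else
                return candidate;            /* not used since the last pass: replace this one */
        }
    }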

34 Example: MIPS R3000 Pipeline
[Pipeline diagram: Inst Fetch (TLB, I-Cache), Dcd/Reg (RF, operation), ALU/E.A., Memory (E.A., TLB, D-Cache), Write Reg (WB).] TLB: 64 entries, on-chip, fully associative, with a software TLB fault handler. Virtual address: ASID (6 bits), virtual page number (20 bits), offset (12 bits). Address regions by high-order bits: 0xx = user segment (caching based on the PT/TLB entry); 100 = kernel physical space, cached; 101 = kernel physical space, uncached; 11x = kernel virtual space. The ASID allows context switching among 64 user processes without a TLB flush.

35 Reducing translation time further
As described, the TLB lookup is in series with the cache lookup. [Diagram: virtual address = virtual page number + offset; the TLB lookup yields a valid bit, access rights, and the physical page number, which is concatenated with the offset to form the physical address.] Machines with TLBs go one step further: they overlap the TLB lookup with the cache access. This works because the offset is available early.

36 Overlapping TLB & Cache Access
Here is how this might work with a 4K cache: [Diagram: the 32-bit virtual address splits into a 20-bit page # and a 12-bit displacement; the TLB does an associative lookup on the page # while the 10-bit index portion of the displacement selects one of 1K cache entries of 4 bytes; the frame number (FN) from the TLB is compared with the cache tag to produce hit/miss and the data.] What if the cache size is increased to 8 KB? The overlap is not complete; we need to do something else (see CS152/252). Another option: virtual caches, where the tags in the cache are virtual addresses and translation only happens on cache misses.

37 Problems With Overlapped TLB Access
Overlapped access requires that the address bits used to index into the cache do not change as a result of translation. This usually limits things to small caches, large page sizes, or highly set-associative caches if you want a large cache. Example: suppose everything is the same except that the cache is increased to 8 KB instead of 4 KB: [Diagram: the cache index now extends one bit beyond the 12-bit displacement into the 20-bit virtual page number; that bit is changed by VA translation but is needed for the cache lookup.] Solutions: go to 8 KB page sizes; go to a 2-way set-associative cache (10-bit index, two 4-byte blocks per set); or have SW guarantee VA[13]=PA[13].
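The constraint can be checked arithmetically: overlap works only if all block-offset and set-index bits fall inside the untranslated page offset. A small sketch under the slide's assumptions (4 KB pages, 4-byte blocks); the helper names are made up:

    #include <stdio.h>

    /* log2 for exact powers of two */
    static unsigned log2u(unsigned x) { unsigned n = 0; while (x > 1) { x >>= 1; n++; } return n; }

    /* Address bits consumed below the tag: block offset + set index. */
    static unsigned untagged_bits(unsigned cache_bytes, unsigned block_bytes, unsigned ways) {
        return log2u(block_bytes) + log2u(cache_bytes / (block_bytes * ways));
    }

    int main(void) {
        unsigned page_offset_bits = 12;   /* 4 KB pages */

        /* 4 KB direct-mapped, 4 B blocks: needs 12 bits -> fits, overlap works. */
        printf("4KB DM : %u bits vs %u\n", untagged_bits(4096, 4, 1), page_offset_bits);
        /* 8 KB direct-mapped, 4 B blocks: needs 13 bits -> one translated bit leaks in. */
        printf("8KB DM : %u bits vs %u\n", untagged_bits(8192, 4, 1), page_offset_bits);
        /* 8 KB 2-way, 4 B blocks: back to 12 bits -> overlap works again. */
        printf("8KB 2W : %u bits vs %u\n", untagged_bits(8192, 4, 2), page_offset_bits);
        return 0;
    }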

38 Summary #1/3: The Cache Design Space
Several interacting dimensions: cache size, block size, associativity, replacement policy, write-through vs. write-back, write allocation. The optimal choice is a compromise: it depends on access characteristics (workload; use as I-cache, D-cache, or TLB) and on technology/cost. Simplicity often wins. [Figure: qualitative design-space curves showing how cache size, associativity, and block size trade off, from Bad to Good as Factor A or Factor B goes from Less to More.] Speaker notes: No fancy replacement policy is needed for a direct-mapped cache. As a matter of fact, that is what causes the direct-mapped cache trouble to begin with: there is only one place to go in the cache, which causes conflict misses. Besides working at Sun, I also teach people how to fly whenever I have time. Statistics have shown that if a pilot crashes after an engine failure, he or she is more likely to get killed in a multi-engine light airplane than in a single-engine airplane. The joke among us flight instructors is: sure, when the engine quits in a single-engine airplane, you have one option: sooner or later, you land, probably sooner. But in a multi-engine airplane with one engine stopped, you have a lot of options, and it is the need to make a decision that kills those people.

39 Summary #2/3: Caches The Principle of Locality:
The Principle of Locality: programs access a relatively small portion of the address space at any instant of time. Temporal locality: locality in time. Spatial locality: locality in space. Three major categories of cache misses: Compulsory misses: sad facts of life; example: cold-start misses. Capacity misses: increase the cache size. Conflict misses: increase the cache size and/or associativity (nightmare scenario: the ping-pong effect!). Write policy: write-through vs. write-back. Today CPU time is a function of (ops, cache misses) rather than just f(ops); this affects compilers, data structures, and algorithms.

40 Summary #3/3: TLB, Virtual Memory
Page tables map virtual addresses to physical addresses. TLBs are important for fast translation, and TLB misses are significant in processor performance: funny times, as most systems can't access all of the 2nd-level cache without TLB misses! Caches, TLBs, and virtual memory can all be understood by examining how they deal with 4 questions: 1) Where can a block be placed? 2) How is a block found? 3) What block is replaced on a miss? 4) How are writes handled? Today VM allows many processes to share a single memory without having to swap all processes to disk; today VM protection is more important than the memory hierarchy benefits, but computers remain insecure. Prepare for the debate + quiz on Wednesday. Speaker notes: Let's do a short review of what you learned last time. Virtual memory was originally invented as another level of the memory hierarchy so that programmers, faced with main memory much smaller than their programs, did not have to manage the loading and unloading of portions of their program in and out of memory. It was a controversial proposal at the time, because very few programmers believed software could manage the limited memory resource as well as a human. This all changed as DRAM sizes grew exponentially over the last few decades. Nowadays, the main function of virtual memory is to allow multiple processes to share the same main memory so we don't have to swap all the non-active processes to disk; consequently, the most important function of virtual memory these days is to provide memory protection. The most common technique, though we'd like to emphasize not the only technique, to translate virtual memory addresses to physical memory addresses is to use a page table. The TLB, or translation lookaside buffer, is one of the most popular hardware techniques to reduce address translation time. Since the TLB is so effective at reducing address translation time, TLB misses have a significant negative impact on processor performance.

