ECE 445 – Computer Organization
The Memory Hierarchy (Lecture #23)
The slides included herein were taken from the materials accompanying Computer Organization and Design, 4th Edition, by Patterson and Hennessy, and were used with permission from Morgan Kaufmann Publishers.
Material to be covered ... Chapter 5: Sections 1 – 5, 11 – 12 Fall 2010 ECE 445 - Computer Organization
Associative Caches
Associative Caches
- Fully associative
  - Allows a given block to go in any cache entry
  - Requires all entries to be searched at once
  - One comparator per entry (expensive)
- n-way set associative
  - Each set contains n entries
  - Block number determines the set: (Block number) modulo (# of sets in cache)
  - Search all entries in the given set at once
  - n comparators (less expensive)
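The set-selection rule above can be sketched in a few lines of Python. The function names `num_sets` and `set_index` are illustrative choices, not part of the course materials, and the example uses block #12 with an 8-entry cache, the values that appear in the slides that follow.

```python
def num_sets(num_entries: int, ways: int) -> int:
    """Number of sets in a cache with num_entries entries and `ways` entries per set."""
    return num_entries // ways


def set_index(block_number: int, sets: int) -> int:
    """Set that a memory block maps to: (block number) modulo (# of sets)."""
    return block_number % sets


# Block #12 in a cache with 8 entries, at three points on the associativity spectrum.
for label, ways in [("direct mapped", 1), ("2-way", 2), ("fully associative", 8)]:
    sets = num_sets(8, ways)
    print(label, "-> set", set_index(12, sets), "of", sets)
```

With one way per set the block can only go in entry 4; with a single set (fully associative) every block maps to set 0 and may occupy any of the 8 entries.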
Associative Cache Example
- The memory address being accessed is in block #12 of main memory.
Spectrum of Associativity
- For a cache with 8 entries
Associativity Example
- Compare 4-block caches
  - Direct mapped (a.k.a. 1-way set associative)
  - 2-way set associative
  - Fully associative
- Block access sequence: 0, 8, 0, 6, 8
Associativity Example: Direct mapped
- # of Cache Blocks = 4
- Cache index = (Block Address) modulo (# of Cache Blocks)

Block address  Cache index  Hit/miss  Cache content after access
                                      Index 0  Index 1  Index 2  Index 3
0              0            miss      Mem[0]   -        -        -
8              0            miss      Mem[8]   -        -        -
0              0            miss      Mem[0]   -        -        -
6              2            miss      Mem[0]   -        Mem[6]   -
8              0            miss      Mem[8]   -        Mem[6]   -
Associativity Example: 2-way set associative
- # of Cache Sets = 2
- # of Entries per Set = 2
- Cache index = (Block Address) modulo (# of Sets in Cache)
- The least recently used entry in the set is replaced

Block address  Cache index  Hit/miss  Cache content after access
                                      Set 0             Set 1
0              0            miss      Mem[0]  -         -  -
8              0            miss      Mem[0]  Mem[8]    -  -
0              0            hit       Mem[0]  Mem[8]    -  -
6              0            miss      Mem[0]  Mem[6]    -  -
8              0            miss      Mem[8]  Mem[6]    -  -
Associativity Example: Fully associative
- # of Cache Sets = 1
- # of Entries per Set = 4
- Any memory block can be placed in any entry of the cache.

Block address  Hit/miss  Cache content after access
0              miss      Mem[0]  -       -       -
8              miss      Mem[0]  Mem[8]  -       -
0              hit       Mem[0]  Mem[8]  -       -
6              miss      Mem[0]  Mem[8]  Mem[6]  -
8              hit       Mem[0]  Mem[8]  Mem[6]  -
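The three associativity examples can be reproduced with a small simulator. This is a minimal sketch, not code from the course: the function name `simulate` is hypothetical, and it assumes LRU replacement within each set, which matters only for the 2-way case.

```python
from collections import OrderedDict


def simulate(accesses, num_blocks, ways):
    """Return 'hit'/'miss' for each block address in `accesses`.

    The cache has num_blocks entries split into sets of `ways` entries.
    Assumption: the least recently used entry in a full set is replaced.
    """
    num_sets = num_blocks // ways
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for block in accesses:
        entries = sets[block % num_sets]
        if block in entries:
            entries.move_to_end(block)       # mark as most recently used
            results.append("hit")
        else:
            if len(entries) == ways:
                entries.popitem(last=False)  # evict least recently used
            entries[block] = True
            results.append("miss")
    return results


sequence = [0, 8, 0, 6, 8]
for label, ways in [("direct mapped", 1), ("2-way", 2), ("fully associative", 4)]:
    outcome = simulate(sequence, num_blocks=4, ways=ways)
    print(f"{label}: {outcome} ({outcome.count('miss')} misses)")
```

For this access sequence the sketch reports 5 misses for the direct-mapped cache, 4 for the 2-way cache, and 3 for the fully associative cache.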
How Much Associativity
- Increased associativity decreases miss rate
  - But with diminishing returns
- Simulation of a system with a 64KB D-cache and 16-word blocks, using the SPEC2000 benchmark
  - 1-way: 10.3%
  - 2-way: 8.6%
  - 4-way: 8.3%
  - 8-way: 8.1%
Set Associative Cache Organization
- 4-way set associative cache
Replacement Policy
Replacement Policy
- Direct mapped: no choice
  - Each block in main memory maps to exactly one location in the cache
- Set associative
  - Prefer a non-valid entry, if there is one
  - Otherwise, choose among the entries in the set
- Least-recently used (LRU)
  - Choose the entry unused for the longest time
  - Simple for 2-way, manageable for 4-way, too hard beyond that
- Random
  - Gives approximately the same performance as LRU for high associativity
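The victim-selection rules above might be sketched as follows. Everything here is illustrative: the `Entry` record, its `last_used` timestamp field, and the two function names are assumptions made for the example. Real hardware approximates LRU with a few status bits rather than full timestamps.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    valid: bool = False
    tag: int = 0
    last_used: int = 0  # hypothetical access timestamp; larger means more recent


def first_invalid(entries):
    """Index of the first non-valid entry in the set, or None if all are valid."""
    for i, entry in enumerate(entries):
        if not entry.valid:
            return i
    return None


def choose_victim_lru(entries):
    """Prefer a non-valid entry; otherwise pick the entry unused for the longest time."""
    i = first_invalid(entries)
    if i is not None:
        return i
    return min(range(len(entries)), key=lambda k: entries[k].last_used)


def choose_victim_random(entries, rng=random):
    """Prefer a non-valid entry; otherwise pick any entry in the set at random."""
    i = first_invalid(entries)
    if i is not None:
        return i
    return rng.randrange(len(entries))


ways = [Entry(True, 0x1A, last_used=7), Entry(True, 0x2B, last_used=3),
        Entry(True, 0x3C, last_used=9), Entry(True, 0x4D, last_used=5)]
print("LRU victim:", choose_victim_lru(ways))  # entry 1 has the oldest timestamp
```

The random policy needs no per-entry bookkeeping at all, which is why it becomes attractive as associativity grows.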
Multilevel Caches
Multilevel Caches
- Primary (L1) cache attached to CPU
  - Small, but fast
- Level-2 (L2) cache services misses from the primary cache
  - Larger, slower, but still faster than main memory
- Main memory services L2 cache misses
- Some high-end systems include an L3 cache
- Diagram: CPU, L1, L2, Main Memory
Multilevel Cache Example
- Given
  - CPU base CPI = 1, clock rate = 4 GHz (cycle time = 0.25 ns)
  - Miss rate/instruction = 2%
  - Main memory access time = 100 ns
- With just primary cache
  - Miss penalty = 100 ns / 0.25 ns = 400 cycles
  - Effective CPI = 1 + 0.02 × 400 = 9
Example (cont.)
- Now add L2 cache
  - Access time = 5 ns
  - Global miss rate to main memory = 0.5%
- Primary miss with L2 hit
  - Penalty = 5 ns / 0.25 ns = 20 cycles
- Primary miss with L2 miss
  - Extra penalty = 100 ns / 0.25 ns = 400 cycles, incurred on 0.5% of instructions
- CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4
- Performance ratio = 9/3.4 ≈ 2.6
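The arithmetic in this example can be checked with a short script. The variable and function names are illustrative, and the numbers are the ones given in the example (4 GHz clock, 2% primary miss rate, 100 ns main memory, 5 ns L2, 0.5% global miss rate).

```python
CLOCK_RATE_HZ = 4e9
CYCLE_TIME_NS = 1e9 / CLOCK_RATE_HZ  # 0.25 ns per cycle


def ns_to_cycles(latency_ns):
    """Convert an access time in nanoseconds to clock cycles."""
    return latency_ns / CYCLE_TIME_NS


base_cpi = 1.0
l1_miss_rate = 0.02       # misses per instruction in the primary cache
global_miss_rate = 0.005  # misses per instruction that go all the way to main memory

main_memory_penalty = ns_to_cycles(100)  # 400 cycles
l2_hit_penalty = ns_to_cycles(5)         # 20 cycles

# Primary cache only: every L1 miss pays the full main memory penalty.
cpi_l1_only = base_cpi + l1_miss_rate * main_memory_penalty

# With L2: every L1 miss pays the L2 access time; the misses that also
# miss in L2 pay the main memory penalty on top of that.
cpi_with_l2 = (base_cpi
               + l1_miss_rate * l2_hit_penalty
               + global_miss_rate * main_memory_penalty)

print(f"CPI with L1 only: {cpi_l1_only:.1f}")
print(f"CPI with L1 + L2: {cpi_with_l2:.1f}")
print(f"Performance ratio: {cpi_l1_only / cpi_with_l2:.1f}")
```

Note that the 0.5% is a global rate (per instruction), so it multiplies the main memory penalty directly rather than being scaled by the L1 miss rate.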
Multilevel Cache Considerations
- Primary cache
  - Focus on minimal hit time
- L2 cache
  - Focus on low miss rate to avoid main memory access
  - Hit time has less overall impact
- Results
  - L1 cache is usually smaller than it would be if it were the only cache
  - L1 block size is smaller than L2 block size
Virtual Memory
The term virtual memory as defined by Merriam-Webster: “a section of a hard drive that can be used as if it were an extension of a computer's random-access memory”
The term virtual memory as defined by Wikipedia: “Virtual memory is a computer system technique which gives an application program the impression that it has contiguous working memory (an address space), while in fact it may be physically fragmented and may even overflow on to disk storage.”
Figure courtesy of Ehamberg (Wikipedia: Virtual Memory)
Virtual Memory
- Use main memory as a “cache” for secondary (disk) storage
  - Managed jointly by CPU hardware and the operating system (OS)
- Programs share main memory
  - Each gets a private virtual address space holding its frequently used code and data
  - Protected from other programs
- CPU and OS translate virtual addresses to physical addresses
  - VM “block” is called a page
  - VM translation “miss” is called a page fault
Address Translation
- Fixed-size pages (e.g., 4 KB)
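With fixed-size pages, an address splits cleanly into a page number and an offset within the page. The sketch below assumes the 4 KB page size mentioned above, which gives a 12-bit offset; the function names and the sample addresses are made up for illustration.

```python
PAGE_SIZE = 4096                           # 4 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 bits of page offset


def split_virtual_address(vaddr):
    """Return (virtual page number, page offset) for a virtual address."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def make_physical_address(physical_page, offset):
    """Combine a physical page number with the unchanged page offset."""
    return (physical_page << OFFSET_BITS) | offset


vpn, offset = split_virtual_address(0x12345)
print(hex(vpn), hex(offset))  # 0x12 0x345
# Suppose (hypothetically) that virtual page 0x12 is held in physical page 0x7.
print(hex(make_physical_address(0x7, offset)))  # 0x7345
```

Only the page number is translated; the offset passes through unchanged, which is why the page size must be a power of two.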
Page Fault Penalty
- On page fault, the page must be fetched from disk
  - Takes millions of clock cycles
  - Handled by OS code
- Try to minimize page fault rate
  - Fully associative placement
  - Smart replacement algorithms
Page Tables
According to Wikipedia: “A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. Virtual addresses are those unique to the accessing process. Physical addresses are those unique to the CPU, i.e., RAM.”
Page Tables
- Stores placement information
  - Array of page table entries (PTEs), indexed by virtual page number
  - Page table register in CPU points to page table in physical memory
- If page is present in memory
  - PTE stores the physical page number
  - Plus other status bits (referenced, dirty, …)
- If page is not present
  - PTE can refer to location in swap space on disk
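A page table lookup along these lines could be modeled as below. This is a simplified sketch under stated assumptions: 4 KB pages, a flat Python list standing in for the page table, and hypothetical names (`PTE`, `PageFault`, `translate`, `swap_location`). In a real system the lookup is done by hardware and the fault is handled by OS code.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumed 4 KB pages
OFFSET_BITS = 12


@dataclass
class PTE:
    valid: bool = False                   # is the page present in main memory?
    physical_page: Optional[int] = None   # physical page number when present
    swap_location: Optional[int] = None   # hypothetical disk location when not present
    referenced: bool = False
    dirty: bool = False


class PageFault(Exception):
    """Raised when the requested page is not present in main memory."""

    def __init__(self, vpn):
        super().__init__(f"page fault on virtual page {vpn}")
        self.vpn = vpn


def translate(page_table, vaddr, is_write=False):
    """Translate a virtual address using a page table indexed by virtual page number."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    pte = page_table[vpn]
    if not pte.valid:
        raise PageFault(vpn)   # the OS would now fetch the page from disk
    pte.referenced = True      # status bits updated on every access
    if is_write:
        pte.dirty = True
    return (pte.physical_page << OFFSET_BITS) | offset


table = [PTE() for _ in range(4)]
table[1] = PTE(valid=True, physical_page=5)
table[2] = PTE(valid=False, swap_location=42)

print(hex(translate(table, 0x1ABC)))  # 0x5abc
try:
    translate(table, 0x2000)
except PageFault as fault:
    print("fault on page", fault.vpn, "stored at swap location", table[fault.vpn].swap_location)
```

The referenced and dirty bits recorded here are the inputs a replacement algorithm would use when deciding which page to evict and whether it must be written back to disk.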
Translation Using a Page Table (figure)
Mapping Pages to Storage (figure)
Questions?