Lecture 4.1 Memory Hierarchy: Introduction
Learning Objectives
- Outline the memory hierarchy
- Explain the principle of locality: spatial locality and temporal locality
- Understand the abstract view of the cache in computer organization (the cache is transparent to the processor)
- Calculate hit rate and miss rate
Coverage: Textbook Chapter 5.1
Processor-Memory Performance Gap
[Figure: processor performance improves ~55%/year (2X/1.5yr, "Moore's Law") while DRAM performance improves ~7%/year (2X/10yrs), so the processor-memory performance gap grows ~50%/year.]
Notes: The memory baseline is a 64KB DRAM in 1980, with three years to the next generation until 1996 and two years thereafter, and a 7% per year improvement in latency. The processor curve assumes a 35% improvement per year until 1986, then 55% per year until 2003, then 5% per year. The processor needs to be supplied with an instruction and a data word every clock cycle. In 1980 there were no caches (and no need for them); by 1995 most systems had two-level caches (e.g., 60% of the transistors on the Alpha 21164 were in the caches).
The "Memory Wall"
Processor vs. DRAM speed disparity continues to grow.
[Figure: clocks per instruction vs. clocks per DRAM access, diverging over time]
Good memory hierarchy (cache) design is increasingly important to overall performance.
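A quick back-of-the-envelope check of these growth rates (my own calculation, not from the slides): compounding 55%/year against 7%/year means the gap itself compounds at about 1.55/1.07 ≈ 1.45 per year, consistent with the slide's "grows 50%/year".

```c
#include <stdio.h>

/* Illustration only: compound the per-year improvement rates from the
 * slide to see how the processor-DRAM gap grows over two decades. */
int main(void) {
    double cpu = 1.0, dram = 1.0;
    for (int year = 0; year < 20; year++) {
        cpu  *= 1.55;   /* processor performance: ~55% improvement per year */
        dram *= 1.07;   /* DRAM performance: ~7% improvement per year */
    }
    printf("after 20 years the processor-DRAM gap is about %.0fx\n",
           cpu / dram); /* roughly 1.45^20, i.e. a gap in the thousands */
    return 0;
}
```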
Memory Technology
- Static RAM (SRAM): 0.5ns – 2.5ns, $2000 – $5000 per GB
- Dynamic RAM (DRAM): 50ns – 70ns, $20 – $75 per GB
- Magnetic disk: 5ms – 20ms, $0.20 – $2 per GB
Ideal memory: the access time of SRAM with the capacity and cost/GB of disk.
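To see why an "ideal memory" is worth chasing, here is a quick sketch of the speed ratios implied by the numbers above (using rough midpoints of the ranges; the midpoints are my choice, not from the slide):

```c
#include <stdio.h>

/* Illustrative ratios using rough midpoints of the ranges on this slide:
 * 1.5ns SRAM, 60ns DRAM, 12.5ms disk. */
int main(void) {
    double sram_ns = 1.5, dram_ns = 60.0, disk_ns = 12.5e6;
    printf("DRAM is ~%.0fx slower than SRAM\n", dram_ns / sram_ns); /* ~40x */
    printf("disk is ~%.0fx slower than DRAM\n", disk_ns / dram_ns); /* ~200,000x */
    return 0;
}
```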
Memory Hierarchy: The Pyramid
[Figure: the memory-hierarchy pyramid, from small/fast/expensive levels at the top to large/slow/cheap levels at the bottom]
A Typical Memory Hierarchy
Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.
[Figure: on-chip components (RegFile, instruction and data caches, ITLB/DTLB, datapath, control), a second-level cache (SRAM), main memory (DRAM), and secondary memory (disk)]
Notes: The memory system of a modern computer consists of a series of black boxes ranging from the fastest to the slowest. Besides varying in speed, these boxes also vary in size (smallest to biggest) and cost. What makes this arrangement work is one of the most important principles in computer design: the principle of locality, which states that programs access a relatively small portion of the address space at any instant of time. The design goal is to present the user with as much memory as is available in the cheapest technology (the disk) while, by taking advantage of locality, providing an average access speed close to that offered by the fastest technology. (We will go over this slide in detail in the next lectures on caches.)

Level:           RegFile   L1 Cache   L2 Cache   Main Memory   Disk
Speed (cycles):  1/2's     1's        10's       100's         10,000's
Size (bytes):    100's     10K's      M's        G's           T's
Cost per byte:   highest   <--------------------------------->  lowest
Inside the Processor
[Figure: die photo of the AMD Barcelona, with 4 processor cores]
Principle of Locality
Programs access a small proportion of their address space at any time.
- Temporal locality (locality in time): items accessed recently are likely to be accessed again soon (e.g., instructions in a loop, induction variables). Keep the most recently accessed items in the cache.
- Spatial locality (locality in space): items near those accessed recently are likely to be accessed soon (e.g., sequential instruction access, array data). Move blocks consisting of contiguous words closer to the processor.
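To make the two kinds of locality concrete, here is a small C example (my own illustration, not from the slides):

```c
#include <stdio.h>

#define N 1024

/* The summation loop exhibits both kinds of locality:
 *  - temporal: sum and i are reused on every iteration, and the loop's
 *    instructions are fetched over and over;
 *  - spatial: a[0], a[1], ... are contiguous words, so a cache that
 *    copies a multi-word block brings in the next few elements "for free". */
int main(void) {
    static int a[N];
    int sum = 0;
    for (int i = 0; i < N; i++)
        a[i] = i;
    for (int i = 0; i < N; i++)
        sum += a[i];            /* sequential accesses: spatial locality */
    printf("sum = %d\n", sum);
    return 0;
}
```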
Taking Advantage of Locality
- Memory hierarchy: store everything on disk
- Copy recently accessed (and nearby) items from disk to a smaller DRAM memory (main memory)
- Copy more recently accessed (and nearby) items from DRAM to a smaller SRAM memory (cache memory attached to the CPU)
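Conceptually, each access checks the fastest level first, and a miss copies the item up from the slower level. The toy simulation below is my own sketch (the one-word "cache", its size, and the access pattern are made up for illustration); it shows a small, repeatedly accessed working set being served mostly from the fast level:

```c
#include <stdio.h>
#include <stdbool.h>

/* Toy two-level hierarchy: a tiny direct-mapped "cache" in front of a
 * "main memory" array. On a miss, the word is copied up from the slower
 * level before the access is satisfied. */
#define MEM_WORDS  1024
#define CACHE_SETS 8

static int  memory[MEM_WORDS];          /* slow level (stands in for DRAM) */
static int  cache_data[CACHE_SETS];     /* fast level (stands in for SRAM) */
static int  cache_tag[CACHE_SETS];
static bool cache_valid[CACHE_SETS];
static long hits, misses;

int read_word(int addr) {
    int set = addr % CACHE_SETS;
    int tag = addr / CACHE_SETS;
    if (cache_valid[set] && cache_tag[set] == tag) {
        hits++;                          /* hit: satisfied by the upper level */
    } else {
        misses++;                        /* miss: copy from the lower level */
        cache_data[set]  = memory[addr];
        cache_tag[set]   = tag;
        cache_valid[set] = true;
    }
    return cache_data[set];
}

int main(void) {
    for (int i = 0; i < MEM_WORDS; i++) memory[i] = i;
    long sum = 0;
    for (int pass = 0; pass < 10; pass++)       /* reuse: temporal locality */
        for (int i = 0; i < CACHE_SETS; i++)    /* small working set */
            sum += read_word(i);
    printf("hits=%ld misses=%ld hit ratio=%.2f (sum=%ld)\n",
           hits, misses, (double)hits / (hits + misses), sum);
    return 0;
}
```

Only the first pass misses (8 misses), and the remaining nine passes hit, giving a hit ratio of 0.90.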
Memory Hierarchy Levels
- Block (aka cache line): the unit of copying; may be multiple words
- If the accessed data is present in the upper level:
  - Hit: access satisfied by the upper level
  - Hit ratio: hits/accesses
  - Hit time: time to access the block + time to determine hit/miss
- If the accessed data is absent:
  - Miss: data not in the upper level
  - Miss ratio: misses/accesses = 1 - hit ratio
  - Miss penalty: time to access the block in the lower level + time to transmit that block to the level that experienced the miss + time to insert the block in that level + time to pass the block to the requestor
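A short worked example (the access counts and timings are made-up numbers, not from the slides), using the standard estimate: average memory access time = hit time + miss ratio × miss penalty.

```c
#include <stdio.h>

/* Worked example with made-up numbers: 1000 accesses, 950 of them hits. */
int main(void) {
    long accesses = 1000, hits = 950;
    double hit_ratio  = (double)hits / accesses;  /* 0.95 */
    double miss_ratio = 1.0 - hit_ratio;          /* 0.05 */

    double hit_time     = 1.0;    /* cycles to access the upper level */
    double miss_penalty = 100.0;  /* cycles to fetch the block from below */

    /* average memory access time = hit time + miss ratio * miss penalty */
    double amat = hit_time + miss_ratio * miss_penalty;
    printf("hit ratio = %.2f, miss ratio = %.2f, AMAT = %.1f cycles\n",
           hit_ratio, miss_ratio, amat);          /* 0.95, 0.05, 6.0 */
    return 0;
}
```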