CSE378 Intro to caches

Memory Hierarchy
Memory: a hierarchy of components of various speeds and capacities
The hierarchy is driven by cost and performance
In early days
–Primary memory = main memory
–Secondary memory = disks
Nowadays, there is a hierarchy within the primary memory itself
–One or more levels of on-chip caches (SRAM; expensive, fast)
–Often one level of off-chip cache (SRAM or DRAM; less expensive, slower)
–Main memory (DRAM; slower, cheaper, more capacity)

Goal of a memory hierarchy
Keep close to the ALU the information that will be needed now and in the near future
–Memory closest to the ALU is the fastest but also the most expensive
So, keep close to the ALU only the information that will be needed now and in the near future
Technology trends
–Speed of processors (and SRAM) increases by about 60% every year
–Latency of DRAMs decreases by only about 7% every year
–Hence the processor-memory gap, or the "memory wall" bottleneck: the gap widens by roughly 1.6/1.07 ≈ 1.5x per year

Processor-Memory Performance Gap
[Chart: x86 CPU speed (100x over 10 years, from the 386 through the Pentium, Pentium Pro, Pentium III, and Pentium IV) plotted against memory latency improvement (10x over 8 years, while densities improved 100x over the same period); the widening distance between the two curves is the "memory gap" or "memory wall".]

Typical numbers
Technology   Typical access time            $/MByte
SRAM         1-20 ns                        $100-$250
DRAM         40-120 ns                      $1-$10
Disk         milliseconds (about 10^6 ns)

Principle of locality
A memory hierarchy works because code and data are not accessed randomly
Computer programs exhibit the principle of locality
–Temporal locality: data/code used in the past is likely to be reused in the future (e.g., code in loops, data in stacks)
–Spatial locality: data/code close (in memory addresses) to the data/code currently being referenced will be referenced in the near future (e.g., straight-line code sequences, traversing an array)
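Both forms of locality show up in the simplest C code. In the minimal sketch below (the array size and the two traversal orders are purely illustrative), the loop instructions themselves exhibit temporal locality, while the two nested loops differ only in how well they exploit spatial locality:

    #include <stdio.h>

    #define N 1024

    static int a[N][N];        /* C stores this array row-major */

    int main(void) {
        long sum = 0;

        /* Good spatial locality: consecutive iterations touch consecutive
           addresses, so every word of a fetched cache block gets used. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];

        /* Poor spatial locality: consecutive iterations are N*sizeof(int)
           bytes apart, touching a different cache block every time. */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];

        printf("%ld\n", sum);
        return 0;
    }

On typical hardware the first (row-major) loop runs several times faster than the second, purely because of spatial locality.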

Caches
Registers are not sufficient to keep enough data with locality close to the ALU
Main memory (DRAM) is too "far": it takes many cycles to access it
–Instruction memory is accessed every cycle
Hence the need for a fast memory between main memory and the registers. This fast memory is called a cache.
–A cache is much smaller (in amount of storage) than main memory
Goal: keep in the cache what's most likely to be referenced in the near future

Basic use of caches
When fetching an instruction, first check to see whether it is in the (instruction) cache
–If so (cache hit), bring the instruction from the cache to the IF/ID pipeline register
–If not (cache miss), go to the next level of the memory hierarchy, until the instruction is found
When performing a load, first check to see whether the data is in the (data) cache
–If cache hit, send the data from the cache to the destination register
–If cache miss, go to the next level of the memory hierarchy, until the data is found
When performing a store, there are several possibilities (discussed later)
–Ultimately, though, the store has to percolate down to main memory
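In software terms, the load path can be sketched as follows. This is a minimal stand-in, not the course's hardware design: the "cache" is a fake one-entry store, and all names (cache_lookup, next_level_access) are made up for illustration.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Fake one-entry "cache" so the sketch is self-contained. */
    static uint32_t cached_addr = UINT32_MAX;   /* no valid entry yet */
    static uint32_t cached_data;

    static bool cache_lookup(uint32_t addr, uint32_t *data) {
        if (addr == cached_addr) { *data = cached_data; return true; }
        return false;
    }

    /* Stand-in for the next level of the hierarchy (another cache or
       main memory); here "memory" just contains addr * 2. */
    static uint32_t next_level_access(uint32_t addr) {
        return addr * 2;
    }

    /* A load probes the cache first; only on a miss does the request go
       to the next level, and the result is kept in the cache for reuse. */
    static uint32_t load(uint32_t addr) {
        uint32_t data;
        if (cache_lookup(addr, &data))      /* cache hit  */
            return data;
        data = next_level_access(addr);     /* cache miss */
        cached_addr = addr;                 /* fill the cache */
        cached_data = data;
        return data;
    }

    int main(void) {
        printf("%u\n", (unsigned)load(100));  /* miss, then fill */
        printf("%u\n", (unsigned)load(100));  /* hit */
        return 0;
    }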

Levels in the memory hierarchy
–ALU registers
–On-chip caches, split I-cache and D-cache: 8-64 KB at level 1, 64 KB-2 MB at level 2 (SRAM; a few ns)
–Off-chip cache: 128 KB-8 MB (SRAM/DRAM)
–Main memory: up to 4 GB (DRAM)
–Secondary memory: 's of GB (a few milliseconds)
–Archival storage

Caches are ubiquitous
Not a new idea: the first cache appeared in the IBM System/360 Model 85 (late 60's)
The concept of caching is used in many other aspects of computer systems
–disk caches, network server caches, web caches, etc.
Caching works because programs exhibit locality
Lots of research on caches in the last 25 years because of the increasing gap between processor speed and (DRAM) memory latency
Every current microprocessor has a cache hierarchy with at least one level on-chip

Main memory access (review)
Recall:
–In a load (or store), the address is an index into the memory array
–Each byte of memory has a unique address, i.e., the mapping between memory address and memory location is unique
[Diagram: the ALU sends an address to main memory]
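A tiny C experiment makes the "address is an index" point concrete; the small array stands in for the memory array, and the printed addresses are simply whatever the platform assigns:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t mem[8] = {0};

        /* An address is just an index: each byte of the array (and of
           memory in general) has its own unique address. */
        for (int i = 0; i < 8; i++)
            printf("byte %d is at address %p\n", i, (void *)&mem[i]);
        return 0;
    }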

Cache access for a load or an instruction fetch
The cache is much smaller than main memory
–Not all memory locations have a corresponding entry in the cache at a given time
When a memory reference is generated, i.e., when the ALU generates an address:
–There is a look-up in the cache: if the memory location is mapped in the cache, we have a cache hit. The contents of the cache location are returned to the ALU.
–If we don't have a cache hit (cache miss), we have to look in the next level of the memory hierarchy (i.e., another cache or main memory)

Cache access
[Diagram: the ALU sends an address to the cache; on a hit the cache responds, on a miss the request goes on to main memory]
–How do you know where to look?
–How do you know if there is a hit?
Main memory is accessed only if there was a cache miss

Some basic questions on cache design
When do we bring the contents of a memory location into the cache?
Where do we put it?
How do we know it's there?
What happens if the cache is full and we want to bring in something new?
–In fact, a better question is: "what happens if we want to bring in something new and the place where it's supposed to go is already occupied?"

Some "top level" answers
When do we bring the contents of a memory location into the cache? -- When there is a cache miss for that location, that is, "on demand"
Where do we put it? -- Depends on the cache organization (see the next slides)
How do we know it's there? -- Each entry in the cache carries its own name, or tag
What happens if the cache is full and we want to bring in something new? -- An entry currently in the cache will be replaced by the new one
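A plausible C rendering of such a cache entry is sketched below; the 32-byte block size is an assumed design parameter, and the valid bit (made explicit on the next slide) is standard practice rather than something this slide states:

    #include <stdbool.h>
    #include <stdint.h>

    #define BLOCK_BYTES 32  /* assumed block size; a design parameter */

    /* Each entry "carries its own name": the tag records which memory
       block currently lives here. The valid bit marks whether the entry
       holds anything at all; e.g., it is clear for every entry right
       after power-up. */
    struct cache_entry {
        bool     valid;
        uint32_t tag;
        uint8_t  data[BLOCK_BYTES];
    };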

Generic cache organization
[Diagram: a cache entry (also called a cache block or cache line) holds an address tag and data; the ALU generates an address to compare against the tags]
If the address (tag) generated by the ALU matches the address (tag) of a cache entry, we have a cache hit; the data in the cache entry is valid

Cache organizations
The mapping of a memory location to a cache entry can range from full generality to very restrictive
–In general, the data portion of a cache block contains several words
If a memory location can be mapped anywhere in the cache (full generality), we have a fully associative cache
If a memory location can be mapped to only a single cache entry (most restrictive), we have a direct-mapped cache
If a memory location can be mapped to one of several cache entries, we have a set-associative cache
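For a direct-mapped cache, the "single cache entry" is computed directly from the address bits. The sketch below assumes an illustrative geometry (32-byte blocks, 256 entries, i.e. an 8 KB cache) that is not from the slides:

    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 5u   /* 32-byte block (assumed) */
    #define INDEX_BITS  8u   /* 256 entries  (assumed)  */

    int main(void) {
        uint32_t addr = 0x12345678u;

        uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);  /* byte within the block */
        uint32_t index  = (addr >> OFFSET_BITS)
                          & ((1u << INDEX_BITS) - 1);        /* the one entry it can use */
        uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);/* the entry's "name" */

        printf("addr 0x%08x -> tag 0x%x, index %u, offset %u\n",
               (unsigned)addr, (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }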

How to check for a hit?
For a fully associative cache
–Check all tag (address) fields to see if there is a match with the address generated by the ALU
–Very expensive if it has to be done fast, because all the comparisons must be performed in parallel
–Fully associative organizations are therefore not used for general-purpose caches
For a direct-mapped cache
–Check only the tag field of the single possible entry
For a set-associative cache
–Check the tag fields of the set of possible entries
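The three checks differ only in how many tags must be compared. Below is a C sketch of the set-associative case, with an assumed 4-way associativity; hardware would perform the comparisons in parallel, while software must loop:

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS 4  /* assumed associativity, for illustration */

    struct way { bool valid; uint32_t tag; };

    /* Only the WAYS entries of the selected set are candidates, so only
       their tags are compared. With WAYS == 1 this degenerates to the
       direct-mapped check; a fully associative cache would have to
       compare every entry in the cache. */
    bool is_hit(const struct way set[WAYS], uint32_t tag) {
        for (int w = 0; w < WAYS; w++)
            if (set[w].valid && set[w].tag == tag)
                return true;
        return false;
    }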