CSE 490/590 Computer Architecture, Spring 2011: Cache II. Steve Ko, Computer Science and Engineering, University at Buffalo.

Presentation transcript:

CSE 490/590 Computer Architecture: Cache II. Steve Ko, Computer Science and Engineering, University at Buffalo.

Last time…
- Dynamic RAM (DRAM) is the main form of main memory storage in use today
  – Holds values on small capacitors, needs refreshing (hence "dynamic")
  – Slow multi-step access: precharge, read row, read column
- Static RAM (SRAM) is faster but more expensive
  – Used to build on-chip memory for caches
- A cache holds a small set of values in fast memory (SRAM) close to the processor
  – Need a search scheme to find values in the cache, and a replacement policy to make space for newly accessed locations
- Caches exploit two forms of predictability in memory reference streams
  – Temporal locality: the same location is likely to be accessed again soon
  – Spatial locality: neighboring locations are likely to be accessed soon

Some Basics (Again)
- Block: the unit of access/storage in the cache
- Word: the unit of access by the CPU
- A block contains multiple words.
  – Why multiple words?
- On a cache miss:
  – Memory access
  – Cache block (re)placement
  – Why keep it?
- Five things to decide (see the sketch after this list):
  – After fetching a block from memory, where do we place it inside the cache?
  – If the line is taken or the cache is already full, which block do we evict?
  – How many words per block?
  – How big?
  – What happens on a write?
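
As a concrete (purely illustrative) way to see that these decisions are independent knobs, here is a minimal C sketch of a configuration record; the struct and field names are assumptions for illustration, not anything defined in the slides.

/* Illustrative only: one way to record the cache design decisions. */
enum write_policy { WRITE_THROUGH, WRITE_BACK };

struct cache_config {
    unsigned block_size_bytes;   /* how many words (bytes) per block */
    unsigned num_sets;           /* with associativity: where a block may be placed */
    unsigned associativity;      /* 1 = direct-mapped; num_blocks = fully associative */
    enum write_policy on_write;  /* what happens on a write */
    /* the replacement policy (which block to evict) is a function of per-set state */
};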

Placement Policy
[Diagram] Where can memory block number 12 be placed in an 8-block cache?
- Fully associative: anywhere in the cache
- (2-way) set associative: anywhere in set 0 (12 mod 4, with 4 sets)
- Direct mapped: only into block 4 (12 mod 8)
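
A small sketch (assuming the 8-block cache from the example above) of how the candidate location for memory block 12 falls out of each organization; the code is illustrative, not from the slides.

#include <stdio.h>

int main(void) {
    unsigned block = 12, cache_blocks = 8;

    /* Direct mapped: exactly one candidate line. */
    printf("direct mapped: line %u\n", block % cache_blocks);

    /* 2-way set associative: one candidate set (of 2 lines). */
    unsigned sets = cache_blocks / 2;
    printf("2-way set associative: set %u\n", block % sets);

    /* Fully associative: any of the 8 lines is a candidate. */
    printf("fully associative: any line 0..%u\n", cache_blocks - 1);
    return 0;
}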

Direct-Mapped Cache
[Diagram] The address is split into a Tag (t bits), an Index (k bits), and a Block Offset (b bits). The index selects one of 2^k lines; each line holds a valid bit (V), a tag, and a data block. The stored tag is compared with the address tag; a match on a valid line signals HIT, and the block offset selects the data word or byte.
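
A minimal C sketch of the lookup the diagram describes, assuming 32-bit addresses and illustrative field widths (B offset bits, K index bits); a real controller would also handle refill and writes.

#include <stdint.h>
#include <stdbool.h>

#define B 5            /* block offset bits: 32-byte blocks (illustrative) */
#define K 7            /* index bits: 2^7 = 128 lines (illustrative) */
#define LINES (1u << K)

struct line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[1u << B];
};

static struct line cache[LINES];

/* Returns true on a hit and copies the requested byte into *byte_out. */
bool lookup(uint32_t addr, uint8_t *byte_out) {
    uint32_t offset = addr & ((1u << B) - 1);
    uint32_t index  = (addr >> B) & ((1u << K) - 1);
    uint32_t tag    = addr >> (B + K);

    struct line *l = &cache[index];
    if (l->valid && l->tag == tag) {     /* tag match on a valid line => HIT */
        *byte_out = l->data[offset];
        return true;
    }
    return false;                        /* MISS: refill handled elsewhere */
}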

2-Way Set-Associative Cache
[Diagram] The address is again split into Tag (t bits), Index (k bits), and Block Offset (b bits), but the index now selects a set containing two (V, tag, data block) entries. The address tag is compared against both stored tags in parallel; a match on either valid entry signals HIT and selects the data word or byte from that way.
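
Continuing the previous sketch (same illustrative line struct and B, K constants), a 2-way lookup checks both ways of the selected set; the loop stands in for the two parallel tag comparators in hardware.

#define WAYS 2
#define SETS (1u << K)

struct set { struct line way[WAYS]; };
static struct set sa_cache[SETS];

bool sa_lookup(uint32_t addr, uint8_t *byte_out) {
    uint32_t offset = addr & ((1u << B) - 1);
    uint32_t index  = (addr >> B) & ((1u << K) - 1);
    uint32_t tag    = addr >> (B + K);

    /* Hardware compares both tags in parallel; software just loops. */
    for (int w = 0; w < WAYS; w++) {
        struct line *l = &sa_cache[index].way[w];
        if (l->valid && l->tag == tag) {
            *byte_out = l->data[offset];
            return true;
        }
    }
    return false;
}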

Fully Associative Cache
[Diagram] There is no index field; the address is split only into a Tag (t bits) and a Block Offset (b bits). The address tag is compared against the tags of all valid lines in parallel (one comparator per line); any match signals HIT, and the block offset selects the data word or byte.
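
Again reusing the illustrative line struct and B from the direct-mapped sketch: a fully associative lookup has no index field and must check every line. Hardware does this with one comparator per line (CAM-style); the loop here is only a software stand-in.

#define NUM_LINES 64     /* illustrative; any line can hold any block */
static struct line fa_cache[NUM_LINES];

bool fa_lookup(uint32_t addr, uint8_t *byte_out) {
    uint32_t offset = addr & ((1u << B) - 1);
    uint32_t tag    = addr >> B;          /* no index field at all */

    for (int i = 0; i < NUM_LINES; i++) {
        if (fa_cache[i].valid && fa_cache[i].tag == tag) {
            *byte_out = fa_cache[i].data[offset];
            return true;
        }
    }
    return false;
}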

Replacement Policy
In an associative cache, which block from a set should be evicted when the set becomes full?
- Random
- Least Recently Used (LRU)
  – LRU cache state must be updated on every access
  – True implementation only feasible for small sets (2-way)
  – Pseudo-LRU binary tree often used for 4-8 ways
- First In, First Out (FIFO), a.k.a. Round-Robin
  – Used in highly associative caches
- Not Most Recently Used (NMRU)
  – FIFO with an exception for the most recently used block or blocks
This is a second-order effect. Why? Replacement only happens on misses.
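
A minimal sketch of the "pseudo-LRU binary tree" mentioned above for one 4-way set, assuming three tree bits per set; the bit encoding below is an illustrative convention, not the slides' definition.

#include <stdint.h>

/* 3 PLRU bits per 4-way set: bit 0 = root, bit 1 = left pair (ways 0,1),
 * bit 2 = right pair (ways 2,3). Convention: a bit value of 0 means the
 * victim is on the left side, 1 means it is on the right side. */
typedef uint8_t plru_t;

/* On an access to 'way', point every bit on its path AWAY from it. */
void plru_touch(plru_t *bits, int way) {
    if (way < 2) {               /* accessed the left pair */
        *bits |=  1u;            /* root now points right */
        if (way == 0) *bits |=  (1u << 1);   /* within the pair, point to way 1 */
        else          *bits &= ~(1u << 1);   /* ...or to way 0 */
    } else {                     /* accessed the right pair */
        *bits &= ~1u;            /* root now points left */
        if (way == 2) *bits |=  (1u << 2);   /* within the pair, point to way 3 */
        else          *bits &= ~(1u << 2);   /* ...or to way 2 */
    }
}

/* Follow the bits from the root to find the approximate LRU victim. */
int plru_victim(plru_t bits) {
    if ((bits & 1u) == 0)                    /* victim in the left pair */
        return ((bits >> 1) & 1u) ? 1 : 0;
    else                                     /* victim in the right pair */
        return ((bits >> 2) & 1u) ? 3 : 2;
}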

Block Size and Spatial Locality
[Diagram] A block is the unit of transfer between the cache and memory. The CPU address is split into a block address (32 - b bits, the tag portion) and an offset (b bits), where 2^b = block size, a.k.a. line size (in bytes); the figure shows a 4-word block with b = 2 (words Word0..Word3).
Larger block sizes have distinct hardware advantages:
- Less tag overhead
- Exploit fast burst transfers from DRAM
- Exploit fast burst transfers over wide busses
What are the disadvantages of increasing block size?
- Fewer blocks => more conflicts
- Can waste bandwidth
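
A small illustrative example of the address split in the figure, assuming a 4-word block (b = 2) and word addressing; the address value is arbitrary.

#include <stdio.h>

int main(void) {
    /* Illustrative numbers matching the figure: a 4-word block, b = 2
       (the offset selects one of 4 words within the block). */
    unsigned b = 2;
    unsigned word_addr = 0x1234567;

    unsigned offset     = word_addr & ((1u << b) - 1);  /* which word in the block */
    unsigned block_addr = word_addr >> b;               /* remaining high-order bits */

    printf("block address = 0x%x, offset = %u\n", block_addr, offset);
    return 0;
}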

CPU-Cache Interaction (5-stage pipeline)
[Diagram] The PC addresses a primary instruction cache that feeds the IR (with a +0x4 next-PC adder); decode/register fetch, the ALU, and a primary data cache follow in the usual stages, and cache refill data comes from the lower levels of the memory hierarchy (to memory control). Both caches produce "hit?" signals; the policy shown is to stall the entire CPU on a data cache miss.

Improving Cache Performance
Average memory access time = Hit time + Miss rate x Miss penalty
To improve performance:
- Reduce the hit time
- Reduce the miss rate
- Reduce the miss penalty
What is the simplest design strategy?
- Biggest cache that doesn't increase hit time past 1-2 cycles (approx. 8-32 KB in modern technology)
- [Design issues are more complex with out-of-order superscalar processors]
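
As a worked example with illustrative numbers (not from the slides): a 1-cycle hit time, 5% miss rate, and 20-cycle miss penalty give an average memory access time of 1 + 0.05 x 20 = 2 cycles; halving the miss rate to 2.5%, or halving the miss penalty to 10 cycles, each brings it down to 1.5 cycles.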

Serial- versus Parallel Cache and Memory Access
[Diagram: processor accesses the cache and then main memory (serial), or the cache and main memory at the same time (parallel)]
α is the HIT RATIO: the fraction of references that hit in the cache; 1 - α is the MISS RATIO: the remaining references.
- Average access time for serial search: t_cache + (1 - α) t_mem
- Average access time for parallel search: α t_cache + (1 - α) t_mem
Savings are usually small: t_mem >> t_cache, and the hit ratio α is high.
- High bandwidth required for the memory path
- Complexity of handling parallel paths can slow t_cache
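
As a worked example with illustrative numbers: for α = 0.95, t_cache = 1 ns, and t_mem = 50 ns, serial search averages 1 + 0.05 x 50 = 3.5 ns while parallel search averages 0.95 x 1 + 0.05 x 50 = 3.45 ns, a saving of only about 1.4%, which is why the savings are usually small.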

CSE 490/590 Administrivia
- Feedback on lectures
  – If you have any feedback/concern, please send it along to me
  – Thanks to those who already did
  – Please ask questions if things are not clear
  – Or you can simply scream, "TOO FAST!"
  – Please utilize my office hours (I will change them to sometime in the afternoon)
- Very important to attend
  – Recitations this week & next week
- Quiz 1
  – Fri, 2/11
  – Closed book, in-class
  – Includes lectures until last Monday (1/31)
  – Review: Wed (2/9)

Acknowledgements
These slides contain material developed and copyrighted by:
- Krste Asanovic (MIT/UCB)
- David Patterson (UCB)
And also by:
- Arvind (MIT)
- Joel Emer (Intel/MIT)
- James Hoe (CMU)
- John Kubiatowicz (UCB)
MIT material derived from course 6.823; UCB material derived from course CS252.