331 Lec20.1Fall 2003 14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 13 Basics of Cache [Adapted from Dave Patterson’s UCB CS152.

Slides:



Advertisements
Similar presentations
CMSC 611: Advanced Computer Architecture Cache Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from.
Advertisements

Cs 325 virtualmemory.1 Accessing Caches in Virtual Memory Environment.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Caches P & H Chapter 5.1, 5.2 (except writes)
CS.305 Computer Architecture Memory: Structures Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made.
CSCE 212 Chapter 7 Memory Hierarchy Instructor: Jason D. Bakos.
331 Week13.1Spring :332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson’s UCB CS152.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 20 - Memory.
Computer Systems Organization: Lecture 11
Cache Memory Adapted from lectures notes of Dr. Patterson and Dr. Kubiatowicz of UC Berkeley.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon, Oct 31, 2005 Topic: Memory Hierarchy Design (HP3 Ch. 5) (Caches, Main Memory and.
Memory Chapter 7 Cache Memories.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Nov. 3, 2003 Topic: Memory Hierarchy Design (HP3 Ch. 5) (Caches, Main Memory and.
Memory Hierarchy.1 Review: Major Components of a Computer Processor Control Datapath Memory Devices Input Output.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
361 Computer Architecture Lecture 14: Cache Memory
CIS °The Five Classic Components of a Computer °Today’s Topics: Memory Hierarchy Cache Basics Cache Exercise (Many of this topic’s slides were.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
331 Lec20.1Spring :332:331 Computer Architecture and Assembly Language Spring 2005 Week 13 Basics of Cache [Adapted from Dave Patterson’s UCB CS152.
EEM 486 EEM 486: Computer Architecture Lecture 6 Memory Systems and Caches.
Computer Architecture and Design – ECEN 350 Part 9 [Some slides adapted from M. Irwin, D. Paterson. D. Garcia and others]
CPE432 Chapter 5A.1Dr. W. Abu-Sufah, UJ Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Adapted from Slides by Prof. Mary Jane Irwin, Penn State University.
CPE232 Memory Hierarchy1 CPE 232 Computer Organization Spring 2006 Memory Hierarchy Dr. Gheith Abandah [Adapted from the slides of Professor Mary Irwin.
CSE331 W15.1Irwin&Li Fall 2006 PSU CSE 331 Computer Organization and Design Fall 2006 Week 15 Section 1: Mary Jane Irwin (
CSIE30300 Computer Architecture Unit 07: Main Memory Hsin-Chou Chi [Adapted from material by and
CMPE 421 Parallel Computer Architecture
CSE431 L22 TLBs.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 22. Virtual Memory Hardware Support Mary Jane Irwin (
EEE-445 Review: Major Components of a Computer Processor Control Datapath Memory Devices Input Output Cache Main Memory Secondary Memory (Disk)
CS1104 – Computer Organization PART 2: Computer Architecture Lecture 10 Memory Hierarchy.
CSIE30300 Computer Architecture Unit 08: Cache Hsin-Chou Chi [Adapted from material by and
1  1998 Morgan Kaufmann Publishers Recap: Memory Hierarchy of a Modern Computer System By taking advantage of the principle of locality: –Present the.
EEL5708/Bölöni Lec 4.1 Fall 2004 September 10, 2004 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Review: Memory Hierarchy.
1 Computer Architecture Cache Memory. 2 Today is brought to you by cache What do we want? –Fast access to data from memory –Large size of memory –Acceptable.
CML CML CS 230: Computer Organization and Assembly Language Aviral Shrivastava Department of Computer Science and Engineering School of Computing and Informatics.
The Goal: illusion of large, fast, cheap memory Fact: Large memories are slow, fast memories are small How do we create a memory that is large, cheap and.
1 Chapter Seven. 2 Users want large and fast memories! SRAM access times are ns at cost of $100 to $250 per Mbyte. DRAM access times are ns.
Nov. 15, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 8: Memory Hierarchy Design * Jeremy R. Johnson Wed. Nov. 15, 2000 *This lecture.
CS.305 Computer Architecture Memory: Caches Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available.
Caches Hiding Memory Access Times. PC Instruction Memory 4 MUXMUX Registers Sign Ext MUXMUX Sh L 2 Data Memory MUXMUX CONTROLCONTROL ALU CTL INSTRUCTION.
CPE232 Cache Introduction1 CPE 232 Computer Organization Spring 2006 Cache Introduction Dr. Gheith Abandah [Adapted from the slides of Professor Mary Irwin.
1 Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4.
1 CMPE 421 Parallel Computer Architecture PART3 Accessing a Cache.
Princess Sumaya Univ. Computer Engineering Dept. Chapter 5:
1 Chapter Seven. 2 Users want large and fast memories! SRAM access times are ns at cost of $100 to $250 per Mbyte. DRAM access times are ns.
Improving Memory Access 2/3 The Cache and Virtual Memory
Additional Slides By Professor Mary Jane Irwin Pennsylvania State University Group 1.
The Memory Hierarchy (Lectures #17 - #20) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.
Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Summary of caches: The Principle of Locality: –Program likely to access a relatively small portion of the address space at any instant of time. Temporal.
1 Chapter Seven. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value.
CS35101 Computer Architecture Spring 2006 Lecture 18: Memory Hierarchy Paul Durand ( ) [Adapted from M Irwin (
CSE431 L18 Memory Hierarchy.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 18: Memory Hierarchy Review Mary Jane Irwin (
CPEG3231 Integration of cache and MIPS Pipeline  Data-path control unit design  Pipeline stalls on cache misses.
1Chapter 7 Memory Hierarchies Outline of Lectures on Memory Systems 1. Memory Hierarchies 2. Cache Memory 3. Virtual Memory 4. The future.
1 Memory Hierarchy Design Chapter 5. 2 Cache Systems CPUCache Main Memory Data object transfer Block transfer CPU 400MHz Main Memory 10MHz Bus 66MHz CPU.
CMSC 611: Advanced Computer Architecture
COSC3330 Computer Architecture
Yu-Lun Kuo Computer Sciences and Information Engineering
The Goal: illusion of large, fast, cheap memory
Improving Memory Access 1/3 The Cache and Virtual Memory
CSE 331 Computer Organization and Design Fall 2007 Week 15
Morgan Kaufmann Publishers Memory & Cache
Mary Jane Irwin ( ) CSE 431 Computer Architecture Fall 2005 Lecture 19: Cache Introduction Review Mary Jane Irwin (
CMSC 611: Advanced Computer Architecture
EE108B Review Session #6 Daxia Ge Friday February 23rd, 2007
Morgan Kaufmann Publishers Memory Hierarchy: Introduction
ENG3380 Computer Organization and Architecture “Cache Memory Part II”
Chapter Five Large and Fast: Exploiting Memory Hierarchy
Cache Memory Rabi Mahapatra
CS Computer Architecture Spring Lecture 19: Cache Introduction
Presentation transcript:

331 Lec20.1Fall :332:331 Computer Architecture and Assembly Language Fall 2003 Week 13 Basics of Cache [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

331 Lec20.2Fall 2003 Head’s Up  This week’s material l Basics of caches -Reading assignment – PH 7.2  Reminders

331 Lec20.3Fall 2003 Second Level Cache (SRAM) Review: A Typical Memory Hierarchy Control Datapath Secondary Memory (Disk) On-Chip Components RegFile Main Memory (DRAM) Data Cache Instr Cache ITLB DTLB eDRAM Speed: 1 ns 2 ns 10 ns 50 ns 1,000 ns Size: 128 B 64 KB 256 KB 4 GB TB’s Cost: highest lowest  By taking advantage of the Principle of Locality: l Present the user with as much memory as is available in the cheapest technology at the access speed offered by the fastest technology.

331 Lec20.4Fall 2003 Review: Principle of Locality  Temporal Locality l Keep most recently accessed data items closer to the processor  Spatial Locality l Move blocks consisting of contiguous words to the upper levels  Hit Time << Miss Penalty l Hit: data appears in some block in the upper level (Blk X) -Hit Rate: the fraction of accesses found in the upper level -Hit Time: Time to access the upper level = RAM access time + Time to determine hit/miss l Miss: data needs to be retrieve from a lower level block (Blk Y) -Miss Rate = 1 - (Hit Rate) -Miss Penalty: Time to replace a block in the upper level with a block from the lower level + Time to deliver this block to the processor Lower Level Memory Upper Level Memory To Processor From Processor Blk X Blk Y  In general, Average Access Time: l = Hit Time + Miss Penalty x Miss Rate

331 Lec20.5Fall 2003 Review: How is the Hierarchy Managed?  registers memory l by compiler (programmer?)  cache main memory l by the hardware  main memory disks l by the hardware and operating system (virtual memory) l by the programmer (files)

331 Lec20.6Fall 2003  Two questions to answer (in hardware): l Q1: How do we know if a data item is in the cache? l Q2: If it is, how do we find it?  First method: l Direct mapped -For each item of data at the lower level, there is exactly one location in the cache where it might be (i.e., lots of items at the lower level share locations in the upper level) l Block size is one word of data l Mapping: (word address) modulo (# of words in the cache) Cache

331 Lec20.7Fall 2003 Caching: A Simple First Example Cache Main Memory Q2: How do we find it? Use low order 2 memory address bits to determine which cache block (i.e., modulo the number of blocks in the cache) TagData Q1: Is it there? Compare the cache tag to the high order 2 memory address bits to tell if the memory block is in the cache Valid

331 Lec20.8Fall 2003 Direct Mapped Cache  Consider the main memory reference string Mem(0) 00 Mem(1) 00 Mem(0) 00 Mem(1) 00 Mem(2) miss hit 00 Mem(0) 00 Mem(1) 00 Mem(2) 00 Mem(3) 01 Mem(4) 00 Mem(1) 00 Mem(2) 00 Mem(3) 01 Mem(4) 00 Mem(1) 00 Mem(2) 00 Mem(3) 01 Mem(4) 00 Mem(1) 00 Mem(2) 00 Mem(3) Start with an empty cache - all blocks marked as not valid 00 Mem(1) 00 Mem(2) 00 Mem(3)

331 Lec20.9Fall 2003 Another Reference String Mapping  Now consider the main memory reference string l Ping pong effect due to conflict misses - two memory locations that map into the same cache block Start with an empty cache - all blocks marked as not valid miss 00 Mem(0) Mem(4) Mem(0) Mem(0) Mem(0) Mem(4) Mem(4) 0 00

331 Lec20.10Fall 2003 Sources of Cache Misses  Compulsory (cold start or process migration, first reference): first access to a block l “Cold” fact of life, not a whole lot you can do about it l If you are going to run “billions” of instruction, Compulsory Misses are insignificant  Conflict (collision): l Multiple memory locations mapped to the same cache location l Solution 1: increase cache size l Solution 2: increase associativity  Capacity: l Cache cannot contain all blocks accessed by the program l Solution: increase cache size

331 Lec20.11Fall 2003  One word/block, cache size = 1K words MIPS Direct Mapped Cache Example Hit 20Tag 10 Index DataIndexTagValid Byte offset 20 Data 32

331 Lec20.12Fall 2003 Cache Summary  The Principle of Locality: l Program likely to access a relatively small portion of the address space at any instant of time -Temporal Locality: Locality in Time -Spatial Locality: Locality in Space  Three Major Categories of Cache Misses: l Compulsory Misses: sad facts of life. Example: cold start misses l Conflict Misses: increase cache size and/or associativity Nightmare Scenario: ping pong effect! l Capacity Misses: increase cache size  Cache Design Space l total size, block size, associativity (replacement policy) l write-hit policy (write-through, write-back) l write-miss policy (write allocate, write buffers)

331 Lec20.13Fall 2003  The off-chip interconnect and memory architecture can affect overall system performance in dramatic ways. Memory Systems that Support Caches CPU Cache Memory bus One word wide organization (one word wide bus and one word wide memory)  Assume 1. 1 clock cycle (2 ns) to send the address clock cycles (50 ns) for DRAM cycle time, 8 clock cycles (16 ns) access time 3. 1 clock cycle (2ns) to return a word of data  Memory-Bus to Cache bandwidth number of bytes accessed from memory and transferred to cache/CPU per clock cycle 32-bit data & 32-bit addr per cycle on-chip

331 Lec20.14Fall 2003 One Word Wide Memory Organization CPU Cache Memory bus on-chip  If the block size is one word, then for a memory access due to a cache miss, the pipeline will have to stall the number of cycles required to return one data word from memory cycle to send address cycles to read DRAM cycle to return data total clock cycles miss penalty  Number of bytes transferred per clock cycle (bandwidth) for a single miss is bytes per clock /27 = 0.148

331 Lec20.15Fall 2003 One Word Wide Memory Organization, con’t CPU Cache Memory bus on-chip  What if the block size is four words? cycle to send 1 st address cycles to read DRAM cycles to return last data word total clock cycles miss penalty  Number of bytes transferred per clock cycle (bandwidth) for a single miss is bytes per clock 25 cycles 1 4 x 25 = (4 x 4)/102 = 0.157

331 Lec20.16Fall 2003 One Word Wide Memory Organization, con’t CPU Cache Memory bus on-chip  What if the block size is four words and if a fast page mode DRAM is used? cycle to send 1 st address cycles to read DRAM cycles to return last data word total clock cycles miss penalty  Number of bytes transferred per clock cycle (bandwidth) for a single miss is bytes per clock 25 cycles 8 cycles *8 = (4 x 4)/51 = 0.314

331 Lec20.17Fall 2003 Interleaved Memory Organization  For a block size of four words cycle to send 1 st address cycles to read DRAM cycles to return last data word total clock cycles miss penalty CPU Cache Memory bank 1 bus on-chip Memory bank 0 Memory bank 2 Memory bank 3  Number of bytes transferred per clock cycle (bandwidth) for a single miss is bytes per clock 25 cycles (4 x 4)/30 = =

331 Lec20.18Fall 2003 DRAM Memory System Summary  Its important to match the cache characteristics l caches access one block at a time (usually more than one word)  with the DRAM characteristics l use DRAMs that support fast multiple word accesses, preferably ones that match the block size of the cache  with the memory-bus characteristics l make sure the memory-bus can support the DRAM access rates and patterns l with the goal of increasing the Memory-Bus to Cache bandwidth