CS-447: Computer Architecture, Fall 2007. Majd F. Sakr. M/W 10-11:20am. Lecture 20: Cache Memories (November 12th, 2007).

Locality
°A principle that makes having a memory hierarchy a good idea
°If an item is referenced:
-temporal locality: it will tend to be referenced again soon
-spatial locality: nearby items will tend to be referenced soon
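To make the two kinds of locality concrete, here is a minimal C sketch (the loop and array are our own illustration, not from the slides): the accumulator sum is touched on every iteration (temporal locality), while the array is walked element by element in address order (spatial locality).

    #include <stdio.h>

    int main(void) {
        int a[1024];
        for (int i = 0; i < 1024; i++)
            a[i] = i;

        int sum = 0;                 /* reused every iteration: temporal locality */
        for (int i = 0; i < 1024; i++)
            sum += a[i];             /* sequential accesses a[0], a[1], ...: spatial locality */

        printf("%d\n", sum);
        return 0;
    }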

A View of the Memory Hierarchy
[Figure: levels from the upper level (faster, smaller) to the lower level (larger, slower): Registers, Cache, L2 Cache, Memory, Disk, Tape. Units of transfer between adjacent levels: instructions and operands, blocks, pages, files.]

Cache
°Our initial focus: two levels (upper, lower)
-block: minimum unit of data
-hit: data requested is in the upper level
-miss: data requested is not in the upper level

Cache Design
°How do we organize the cache?
°Where does each memory address map to? (Remember that the cache is a subset of memory, so multiple memory addresses map to the same cache location.)
°How do we know which elements are in the cache?
°How do we quickly locate them?

Direct Mapped Cache
°Mapping: address is modulo the number of blocks in the cache
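A minimal sketch of the modulo mapping (the 8-block cache size and the use of block addresses are our own assumptions for illustration):

    #include <stdio.h>

    #define NUM_BLOCKS 8   /* assumed: an 8-block cache, a power of two */

    /* Direct mapping: cache index = block address modulo number of blocks. */
    unsigned cache_index(unsigned block_addr) {
        return block_addr % NUM_BLOCKS;   /* equivalently block_addr & (NUM_BLOCKS - 1) */
    }

    int main(void) {
        /* Block addresses 1, 9, and 17 all collide at cache index 1. */
        printf("%u %u %u\n", cache_index(1), cache_index(9), cache_index(17));
        return 0;
    }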

Direct-Mapped Cache (1/2)
°In a direct-mapped cache, each memory address is associated with one possible block within the cache
-Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache
-Block is the unit of transfer between cache and memory

Direct-Mapped Cache (2/2)
°Cache location 0 can be occupied by data from:
-Memory location 0, 4, 8, ...
-With 4 blocks: any memory location that is a multiple of 4
[Figure: memory addresses 0-F alongside a 4-byte direct-mapped cache with cache indices 0-3; each memory address maps to cache index (address mod 4).]

Issues with Direct-Mapped
°Since multiple memory addresses map to the same cache index, how do we tell which one is in there?
°What if we have a block size > 1 byte?
°Answer: divide the memory address into three fields:

ttttttttttttttttt iiiiiiiiii oooo
-tag: to check if we have the correct block
-index: to select the block
-offset: byte within the block

Direct-Mapped Cache Terminology
°All fields are read as unsigned integers.
°Index: specifies the cache index (which "row" of the cache we should look in)
°Offset: once we've found the correct block, specifies which byte within the block we want (i.e., which "column")
°Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location
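As a sketch, the three fields can be peeled off a 32-bit address with shifts and masks. The field widths below assume the 16 KB cache with 16-byte blocks worked out in the example that follows (4 offset bits, 10 index bits, 18 tag bits); the function name is our own:

    #include <stdio.h>
    #include <stdint.h>

    #define OFFSET_BITS 4    /* 16-byte blocks -> 4 offset bits  */
    #define INDEX_BITS 10    /* 1024 blocks    -> 10 index bits  */

    /* Split a 32-bit address into tag / index / offset fields. */
    void split_address(uint32_t addr, uint32_t *tag, uint32_t *index, uint32_t *offset) {
        *offset = addr & ((1u << OFFSET_BITS) - 1);
        *index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        *tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    }

    int main(void) {
        uint32_t t, i, o;
        split_address(0x00008014u, &t, &i, &o);          /* address from the worked example below */
        printf("tag=%u index=%u offset=%u\n", t, i, o);  /* prints: tag=2 index=1 offset=4 */
        return 0;
    }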

Direct Mapped Cache (for MIPS)

Direct-Mapped Cache Example (1/3)
°Suppose we have 16 KB of data in a direct-mapped cache with 4-word blocks
°Determine the size of the tag, index, and offset fields if we're using a 32-bit architecture
°Offset: need to specify the correct byte within a block
-a block contains 4 words = 16 bytes = 2^4 bytes
-need 4 bits to specify the correct byte

Direct-Mapped Cache Example (2/3)
°Index: (~index into an "array of blocks")
-need to specify the correct row in the cache
-the cache contains 16 KB = 2^14 bytes
-a block contains 2^4 bytes (4 words)
-# blocks/cache = (bytes/cache) / (bytes/block) = (2^14 bytes/cache) / (2^4 bytes/block) = 2^10 blocks/cache
-need 10 bits to specify this many rows

Direct-Mapped Cache Example (3/3)
°Tag: use the remaining bits as the tag
-tag length = addr length - offset - index = 32 - 4 - 10 = 18 bits
-so the tag is the leftmost 18 bits of the memory address
°Why not use the full 32-bit address as the tag?
-All bytes within a block share the same address apart from the offset bits (4 bits), so the offset need not be stored
-The index must be the same for every address within a block, so it is redundant in the tag check and can be left off to save memory (10 bits in this example)
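The same arithmetic as a small C sketch (the function and variable names are ours): given the cache size, block size, and address width, derive the three field widths.

    #include <stdio.h>

    /* Number of bits needed to index 'n' items; assumes n is a power of two. */
    static int log2_int(unsigned n) {
        int bits = 0;
        while (n > 1) { n >>= 1; bits++; }
        return bits;
    }

    int main(void) {
        unsigned cache_bytes = 16 * 1024;   /* 16 KB of data */
        unsigned block_bytes = 16;          /* 4 words = 16 bytes */
        int addr_bits = 32;                 /* 32-bit architecture */

        int offset_bits = log2_int(block_bytes);                /* 4  */
        int index_bits  = log2_int(cache_bytes / block_bytes);  /* 10 */
        int tag_bits    = addr_bits - offset_bits - index_bits; /* 18 */

        printf("offset=%d index=%d tag=%d\n", offset_bits, index_bits, tag_bits);
        return 0;
    }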

TIO cache mnemonic
°AREA (cache size, B) = HEIGHT (# of blocks) × WIDTH (size of one block, B/block)
°2^(H+W) = 2^H × 2^W
°Tag | Index | Offset

Caching Terminology
°When we try to read memory, 3 things can happen:
1. cache hit: the cache block is valid and contains the proper address, so read the desired word
2. cache miss: nothing is in the cache at the appropriate block, so fetch it from memory
3. cache miss, block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory (the cache always holds a copy)

Accessing data in a direct mapped cache
°Ex.: 16 KB of data, direct-mapped, 4-word blocks
°Read 4 addresses:
1. 0x00000014
2. 0x0000001C
3. 0x00000034
4. 0x00008014
°Memory values (only the cache/memory level of the hierarchy is shown):

Address (hex)   Value of Word
0x00000010      a
0x00000014      b
0x00000018      c
0x0000001C      d
0x00000030      e
0x00000034      f
0x00000038      g
0x0000003C      h
0x00008010      i
0x00008014      j
0x00008018      k
0x0000801C      l

Accessing data in a direct mapped cache
°4 addresses: 0x00000014, 0x0000001C, 0x00000034, 0x00008014
°4 addresses divided (for convenience) into Tag, Index, and Byte Offset fields:

Tag (18 bits)        Index (10 bits)   Offset (4 bits)   Address
000000000000000000   0000000001        0100              0x00000014
000000000000000000   0000000001        1100              0x0000001C
000000000000000000   0000000011        0100              0x00000034
000000000000000010   0000000001        0100              0x00008014
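A short, self-contained harness (our own illustration) that reproduces this field breakdown for the four addresses:

    #include <stdio.h>
    #include <stdint.h>

    #define OFFSET_BITS 4
    #define INDEX_BITS 10

    int main(void) {
        uint32_t addrs[] = { 0x00000014, 0x0000001C, 0x00000034, 0x00008014 };
        for (int k = 0; k < 4; k++) {
            uint32_t a = addrs[k];
            uint32_t offset = a & ((1u << OFFSET_BITS) - 1);
            uint32_t index  = (a >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
            uint32_t tag    = a >> (OFFSET_BITS + INDEX_BITS);
            printf("0x%08X -> tag=%u index=%u offset=0x%X\n", a, tag, index, offset);
        }
        return 0;
    }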

16 KB Direct Mapped Cache, 16B blocks
°Valid bit: determines whether anything is stored in that row (when the computer is initially turned on, all entries are invalid)
[Cache table: one row per Index, with columns Valid, Tag, and data bytes 0x0-3, 0x4-7, 0x8-b, 0xc-f; every Valid bit starts at 0.]

1. Read 0x00000014
°Tag field = 0, Index field = 1, Offset = 0x4

So we read block 1 (Index field = 0000000001)

No valid data (the Valid bit at index 1 is 0)

So load that data into the cache, setting the tag and valid bit
°Index 1 now holds: Valid = 1, Tag = 0, words a b c d

Read from the cache at the offset, return word b
°Offset 0x4 selects word b

2. Read 0x0000001C
°Tag field = 0, Index field = 1, Offset = 0xC

Index is Valid (the Valid bit at index 1 is set)

Index valid, Tag matches (stored Tag = 0)

Index valid, Tag matches, return d
°Offset 0xC selects word d

3. Read 0x00000034
°Tag field = 0, Index field = 3, Offset = 0x4

So read block 3

No valid data (the Valid bit at index 3 is 0)

Load that cache block, return word f
°Index 3 now holds: Valid = 1, Tag = 0, words e f g h; Offset 0x4 selects word f

4. Read 0x00008014
°Tag field = 2, Index field = 1, Offset = 0x4

So read Cache Block 1; Data is Valid

Cache Block 1 Tag does not match (0 != 2)

Miss, so replace block 1 with new data & tag
°Index 1 now holds: Valid = 1, Tag = 2, words i j k l

And return word j
°Offset 0x4 selects word j

Do an example yourself. What happens?
°Choose from: Cache: Hit, Miss, Miss w. replace; Values returned: a, b, c, d, e, ..., k, l
°Read address 0x00000030?
°Read address 0x0000001c?
[Cache state: Index 1 holds Valid = 1, Tag = 2, words i j k l; Index 3 holds Valid = 1, Tag = 0, words e f g h.]

Answers
°0x00000030 is a hit
-Index = 3, Tag matches, Offset = 0, value = e
°0x0000001c is a miss with replacement
-Index = 1, Tag mismatch, so replace from memory; Offset = 0xc, value = d
°Since these are reads, the values must equal the memory values whether or not they are cached:
-0x00000030 = e
-0x0000001c = d
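To check these answers mechanically, here is a small direct-mapped cache simulator in C that replays the whole access sequence (a sketch of the slides' example; the struct layout and the letter encoding of memory are our own). Each set holds a valid bit, a tag, and one 16-byte block; a lookup either hits, fills an invalid block, or replaces a block whose tag mismatches.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define OFFSET_BITS 4
    #define INDEX_BITS  10
    #define NUM_SETS    (1u << INDEX_BITS)

    typedef struct {
        int      valid;
        uint32_t tag;
        char     words[4];   /* one letter per word, matching the slides' a..l values */
    } Line;

    static Line cache[NUM_SETS];

    /* Backing "memory": the letter stored at a word address (our encoding). */
    static char mem_word(uint32_t addr) {
        switch (addr & ~0xFu) {
        case 0x00000010: return "abcd"[(addr >> 2) & 3];
        case 0x00000030: return "efgh"[(addr >> 2) & 3];
        case 0x00008010: return "ijkl"[(addr >> 2) & 3];
        default:         return '?';
        }
    }

    static char read_cache(uint32_t addr) {
        uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
        uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
        Line *l = &cache[index];

        if (l->valid && l->tag == tag) {
            printf("0x%08X: hit,            ", addr);
        } else {
            printf("0x%08X: miss%s, ", addr, l->valid ? " w/ replace" : "           ");
            l->valid = 1;                      /* fill (or replace) the block */
            l->tag = tag;
            uint32_t base = addr & ~0xFu;      /* block-aligned base address */
            for (int w = 0; w < 4; w++)
                l->words[w] = mem_word(base + 4u * w);
        }
        return l->words[(addr >> 2) & 3];      /* word select within the block */
    }

    int main(void) {
        uint32_t trace[] = { 0x00000014, 0x0000001C, 0x00000034,
                             0x00008014, 0x00000030, 0x0000001C };
        memset(cache, 0, sizeof cache);        /* power-on: all entries invalid */
        for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++)
            printf("value = %c\n", read_cache(trace[i]));
        return 0;
    }

Running it reproduces the walkthrough: miss (b), hit (d), miss (f), miss w/ replace (j), then hit (e) and miss w/ replace (d) for the do-it-yourself addresses.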

Hits vs. Misses
°Read hits: this is what we want!
°Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart

Hits vs. Misses
°Write hits:
-replace the data in the cache and memory (write-through)
-write the data only into the cache, and write it back to memory later (write-back)
°Write misses:
-read the entire block into the cache, then write the word
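A hedged sketch of the two write-hit policies in C (the helper functions and the dirty-bit bookkeeping are our own assumptions; the slides only name the policies):

    #include <stdint.h>

    /* Assumed helpers, not defined on the slides. */
    extern void memory_write(uint32_t addr, uint32_t word);
    extern void cache_store(uint32_t addr, uint32_t word);
    extern void mark_dirty(uint32_t addr);

    /* Write-through: update the cache and memory on every write hit. */
    void write_hit_through(uint32_t addr, uint32_t word) {
        cache_store(addr, word);
        memory_write(addr, word);      /* memory stays consistent at all times */
    }

    /* Write-back: update only the cache now; memory is updated later,
     * when this (dirty) block is evicted. */
    void write_hit_back(uint32_t addr, uint32_t word) {
        cache_store(addr, word);
        mark_dirty(addr);              /* remember to write back on eviction */
    }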

Block Size Tradeoff (1/3)
°Benefits of larger block size:
-Spatial locality: if we access a given word, we're likely to access other nearby words soon
-Very applicable with the stored-program concept: if we execute a given instruction, it's likely that we'll execute the next few as well
-Works nicely in sequential array accesses too

Block Size Tradeoff (2/3)
°Drawbacks of larger block size:
-Larger block size means a larger miss penalty: on a miss, it takes longer to load the new block from the next level
-If the block size is too big relative to the cache size, there are too few blocks; result: the miss rate goes up
°In general, minimize Average Access Time = Hit Time × Hit Rate + Miss Penalty × Miss Rate
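Plugging invented sample numbers into the slide's formula (all values below are assumptions for illustration):

    #include <stdio.h>

    int main(void) {
        double hit_time     = 1.0;    /* cycles, assumed */
        double miss_penalty = 50.0;   /* cycles, assumed */
        double miss_rate    = 0.05;   /* assumed */
        double hit_rate     = 1.0 - miss_rate;

        /* Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate
           (formula as stated on the slide) */
        double avg = hit_time * hit_rate + miss_penalty * miss_rate;
        printf("average access time = %.2f cycles\n", avg);  /* 0.95 + 2.50 = 3.45 */
        return 0;
    }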

Block Size Tradeoff (3/3)
°Hit Time = time to find and retrieve data from the current level cache
°Miss Penalty = average time to retrieve data on a current-level miss (includes the possibility of misses at successive levels of the memory hierarchy)
°Hit Rate = % of requests that are found in the current level cache
°Miss Rate = 1 - Hit Rate

Block Size Tradeoff Conclusions
[Three sketches as functions of block size: Miss Penalty increases with block size; Miss Rate first falls (exploits spatial locality) and then rises once there are too few blocks (compromises temporal locality); Average Access Time therefore has a minimum at an intermediate block size.]

Performance
°Increasing the block size tends to decrease the miss rate
[Figure: miss rate vs. block size for several cache sizes.]

Performance
°Simplified model:
-execution time = (execution cycles + stall cycles) × cycle time
-stall cycles = # of instructions × miss ratio × miss penalty
°Two ways of improving performance:
-decreasing the miss ratio
-decreasing the miss penalty
°What happens if we increase the block size?
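A worked instance of the simplified model (all numbers are assumptions for illustration; the miss ratio is per instruction, as the stall-cycle formula implies):

    #include <stdio.h>

    int main(void) {
        double instructions  = 1e9;     /* assumed program size */
        double exec_cycles   = 1.5e9;   /* assumed base CPI of 1.5 */
        double miss_ratio    = 0.02;    /* assumed misses per instruction */
        double miss_penalty  = 40.0;    /* assumed cycles per miss */
        double cycle_time_ns = 0.5;     /* assumed 2 GHz clock */

        double stall_cycles = instructions * miss_ratio * miss_penalty;
        double exec_time_ns = (exec_cycles + stall_cycles) * cycle_time_ns;

        printf("stall cycles   = %.3g\n", stall_cycles);     /* 8e8 */
        printf("execution time = %.3g ns\n", exec_time_ns);  /* (1.5e9 + 8e8) * 0.5 = 1.15e9 */
        return 0;
    }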