CS1104 – Computer Organization. Part 2: Computer Architecture. Lecture 10: Memory Hierarchy

Topics
- Memories: types
- Memory hierarchy: why?
- Basics of caches
- Measuring cache performance
- Improving cache performance
- Framework for memory hierarchies

Memories: Review
- SRAM:
  - value is stored on a pair of inverting gates
  - very fast, but takes up more space than DRAM (4 to 6 transistors)
  - access time: 5-25 ns
  - cost (US$) per MByte in 1997: 100 to 250
- DRAM:
  - value is stored as a charge on a capacitor (must be refreshed)
  - very small, but slower than SRAM (by a factor of 5 to 10)
  - access time: 60-120 ns
  - cost (US$) per MByte in 1997: 5 to 10

Memory Hierarchy: why?
- Users want large and fast memories!
  - SRAM access times are 5-25 ns, at a cost of $100 to $250 per MByte.
  - DRAM access times are 60-120 ns, at a cost of $5 to $10 per MByte.
  - Disk access times are 10 to 20 million ns, at a cost of $0.10 to $0.20 per MByte.
- Try and give it to them anyway: build a memory hierarchy.
[Figure: the memory hierarchy as a pyramid, with the CPU above Level 1, Level 2, ..., Level n; size increases and speed decreases moving away from the CPU.]

Memory Hierarchy: requirements
- If a level is closer to the processor, it must:
  - be smaller
  - be faster
  - contain a subset (the most recently used data) of the level beneath it
- Each level contains all the data held in the levels above it.
- The lowest level (usually disk, or main memory) contains all the available data.

Locality
- The principle that makes having a memory hierarchy a good idea.
- If an item is referenced:
  - temporal locality: it will tend to be referenced again soon
  - spatial locality: nearby items will tend to be referenced soon
- Our initial focus: two levels (upper, lower)
  - block: the minimum unit of data
  - hit: the data requested is in the upper level
  - miss: the data requested is not in the upper level
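
To see both kinds of locality in one place, here is a minimal C sketch of my own (not from the slides): summing an array reuses sum on every iteration (temporal locality) and touches consecutive addresses (spatial locality).

    #include <stdio.h>

    int main(void) {
        int a[1024];
        for (int i = 0; i < 1024; i++) a[i] = i;

        int sum = 0;
        for (int i = 0; i < 1024; i++) {
            /* temporal locality: sum and i are referenced on every pass */
            /* spatial locality: a[i] sits right next to a[i-1], so one  */
            /* fetched cache block covers several upcoming elements      */
            sum += a[i];
        }
        printf("sum = %d\n", sum);
        return 0;
    }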

Exploiting locality: caches

Cache
- Two issues:
  - How do we know if a data item is in the cache?
  - If it is, how do we find it?
- Our first example:
  - block size is one word of data
  - "direct mapped": for each item of data at the lower level, there is exactly one location in the cache where it might be
  - i.e., lots of items at the lower level share locations in the upper level

Direct Mapped Cache
- Cache location 0 can be occupied by data from:
  - memory locations 0, 4, 8, ...
  - in general: any memory location that is a multiple of 4
[Figure: a 16-location memory (addresses 0 through F) feeding a 4-byte direct-mapped cache; each memory address maps to cache index (address mod 4).]

Direct Mapped Cache
- Mapping: cache index = (block address) modulo (number of blocks in the cache)
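
As a sketch of that rule, here is a small C program of mine (sizes chosen to match the 4-entry cache on the previous slide; not part of the original deck):

    #include <stdio.h>

    #define NUM_BLOCKS 4   /* blocks in the toy cache (a power of two) */
    #define BLOCK_SIZE 1   /* one byte per block in this example       */

    /* Direct-mapped placement: every block has exactly one possible index. */
    unsigned cache_index(unsigned byte_address) {
        unsigned block_address = byte_address / BLOCK_SIZE;
        return block_address % NUM_BLOCKS;
    }

    int main(void) {
        /* Memory locations 0, 4, 8 and C all land in cache index 0. */
        for (unsigned a = 0; a < 16; a++)
            printf("memory address 0x%X -> cache index %u\n", a, cache_index(a));
        return 0;
    }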

Issues with Direct Mapped Caches
1. Since multiple memory addresses map to the same cache index, how do we tell which one is in there?
2. What if we have a block size > 1 byte?
- Solution: divide the memory address into three fields:

    | tag (tttttttttttttttttt) | index (iiiiiiiiii) | byte offset (oooo) |

  - tag: to check if we have the correct block
  - index: to select the block
  - byte offset: to select the byte within the block

Direct Mapped Caches: Terminology
- All fields are read as unsigned integers.
- Index: specifies the cache index (which "row" of the cache we should look in)
- Offset: once we have found the correct block, specifies which byte within the block we want
- Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location

Direct Mapped Cache: Example
- Suppose we have a 16 KB direct-mapped cache with 4-word blocks.
- Determine the size of the tag, index and offset fields, assuming a 32-bit architecture.
- Offset
  - need to specify the correct byte within a block
  - a block contains 4 words = 16 bytes = 2^4 bytes
  - need 4 bits to specify the correct byte

Direct Mapped Cache: Example
- Index
  - need to specify the correct row in the cache
  - the cache contains 16 KB = 2^14 bytes
  - a block contains 2^4 bytes (4 words)
  - #rows/cache = #blocks/cache (since there is one block per row)
               = (bytes/cache) / (bytes/row)
               = 2^14 / 2^4
               = 2^10 rows/cache
  - need 10 bits to specify this many rows

Direct Mapped Cache: Example
- Tag
  - use the remaining bits as the tag
  - tag length = memory address length - offset - index = 32 - 4 - 10 = 18 bits
  - so the tag is the leftmost 18 bits of the memory address
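
Putting the worked example together, this is an illustrative C helper of my own (the names and layout are assumptions, not the slides') that splits a 32-bit address for this 16 KB, 4-word-block cache:

    #include <stdio.h>

    #define OFFSET_BITS 4    /* 16-byte blocks         */
    #define INDEX_BITS  10   /* 2^10 rows in the cache */

    /* Split a 32-bit byte address into tag / index / byte offset. */
    void split_address(unsigned addr) {
        unsigned offset = addr & ((1u << OFFSET_BITS) - 1);
        unsigned index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        unsigned tag    = addr >> (OFFSET_BITS + INDEX_BITS); /* top 18 bits */
        printf("addr 0x%08x -> tag 0x%05x, index %u, offset %u\n",
               addr, tag, index, offset);
    }

    int main(void) {
        split_address(0x00000014);  /* tag 0, index 1, offset 4  */
        split_address(0x00008014);  /* same index, different tag */
        return 0;
    }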

Direct Mapped Cache
- For MIPS:
[Figure: direct-mapped cache organization; the address (bit positions shown) splits into tag, index and byte offset; the index selects a row holding a valid bit, a tag and data; the stored tag is compared with the address tag, and the comparison ANDed with the valid bit yields the Hit signal while the Data field is read out.]

#Bits required (example)
- 32-bit byte addresses
- Direct-mapped cache of size 2^n words with one-word (4-byte) blocks
- What is the size of the tag field?
  - 32 - (n + 2) bits (2 bits for the byte offset and n bits for the index)
- What is the total number of bits in the cache?
  - 2^n x (block size + tag size + valid field size)
    = 2^n x (32 + (32 - n - 2) + 1), because the block size is 32 bits
    = 2^n x (63 - n)
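
A quick numeric check of that formula; this throwaway C snippet is mine, not part of the deck:

    #include <stdio.h>

    /* Total bits in a direct-mapped cache with 2^n one-word blocks:
       2^n x (data + tag + valid) = 2^n x (63 - n).                  */
    unsigned long cache_bits(unsigned n) {
        return (1ul << n) * (32 + (32 - n - 2) + 1);
    }

    int main(void) {
        /* e.g. n = 10: 1024 x 53 = 54272 bits for 4 KB of data */
        printf("n = 10: %lu bits\n", cache_bits(10));
        return 0;
    }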

Direct Mapped Cache
- Taking advantage of spatial locality:
[Figure: direct-mapped cache with multi-word blocks; the address (bit positions shown) now includes a block offset as well as tag and index, and a multiplexer selects the requested word out of the block.]

Accessing data in a direct mapped cache
- Example: 16 KB, direct-mapped, 4-word blocks
- Read 4 addresses:
  - 0x00000014, 0x0000001C, 0x00000034, 0x00008014
- Memory values on the right (only the cache/memory level of the hierarchy):

    Address (hex)   Value of word
    00000010        a
    00000014        b
    00000018        c
    0000001C        d
    00000030        e
    00000034        f
    00000038        g
    0000003C        h
    00008010        i
    00008014        j
    00008018        k
    0000801C        l

Accessing data in a direct mapped cache
- 4 addresses:
  - 0x00000014, 0x0000001C, 0x00000034, 0x00008014
- The 4 addresses divided (for convenience) into Tag, Index, Byte Offset fields:

    Address       Tag   Index   Offset
    0x00000014    0     1       0x4
    0x0000001C    0     1       0xC
    0x00000034    0     3       0x4
    0x00008014    2     1       0x4

Hits vs. Misses
- Read hits
  - this is what we want!
- Read misses
  - stall the CPU, fetch the block from memory, deliver it to the cache, restart the load instruction
- Write hits
  - write the data into both the cache and memory (write-through)
  - write the data only into the cache, and write the block back to memory later (write-back)
- Write misses
  - read the entire block into the cache, then write the word (allocate on write miss)
  - do not read the cache line; just write to memory (no allocate on write miss)
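
To tie the read and write policies together, here is a hedged C sketch (the naming and structure are entirely my own, sized like the 16 KB example) of a direct-mapped cache with write-through, no-allocate stores:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define NUM_ROWS    1024           /* 2^10 rows     */
    #define BLOCK_BYTES 16             /* 4-word blocks */

    struct line { bool valid; uint32_t tag; uint8_t data[BLOCK_BYTES]; };
    static struct line cache[NUM_ROWS];
    static uint8_t MEM[1 << 16];       /* toy 64 KB backing memory */

    static void mem_read_block(uint32_t addr, uint8_t *dst) {
        memcpy(dst, &MEM[addr], BLOCK_BYTES);
    }
    static void mem_write_word(uint32_t addr, uint32_t v) {
        memcpy(&MEM[addr], &v, 4);
    }

    uint32_t load_word(uint32_t addr) {
        uint32_t index = (addr >> 4) & (NUM_ROWS - 1);
        uint32_t tag   = addr >> 14;
        struct line *l = &cache[index];
        if (!l->valid || l->tag != tag) {   /* read miss: fetch whole block */
            mem_read_block(addr & ~(uint32_t)(BLOCK_BYTES - 1), l->data);
            l->valid = true;
            l->tag   = tag;
        }
        uint32_t word;                      /* read hit (possibly after fill) */
        memcpy(&word, &l->data[addr & (BLOCK_BYTES - 4)], 4);
        return word;
    }

    void store_word(uint32_t addr, uint32_t value) {
        uint32_t index = (addr >> 4) & (NUM_ROWS - 1);
        uint32_t tag   = addr >> 14;
        struct line *l = &cache[index];
        if (l->valid && l->tag == tag)      /* write hit: update the cache */
            memcpy(&l->data[addr & (BLOCK_BYTES - 4)], &value, 4);
        mem_write_word(addr, value); /* write-through; no-allocate on a miss */
    }

    int main(void) {
        store_word(0x0014, 123);
        printf("%u\n", load_word(0x0014));  /* read miss fills the block */
        return 0;
    }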

Hardware Issues
- Make reading multiple words easier by using banks of memory.
- It can get a lot more complicated...

Performance
- Increasing the block size tends to decrease the miss rate:
[Figure: miss rate versus block size, plotted for several cache sizes.]

Performance
- Use split caches because there is more spatial locality in code:
[Table: instruction and data miss rates for split versus combined caches.]

Memory access times
- #clock cycles to send the address (say 1)
- #clock cycles to initiate each DRAM access (say 15)
- #clock cycles to transfer a word of data (say 1)
- Clock cycles required to access 4 words:
  - one-word-wide memory: 1 + 4x15 + 4x1 = 65
  - interleaved memory (4 banks): 1 + 1x15 + 4x1 = 20
  - four-word-wide memory: 1 + 1x15 + 1x1 = 17
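
The same arithmetic as a tiny C check; the organization labels follow the usual textbook presentation, and the numbers are the slide's assumptions:

    #include <stdio.h>

    int main(void) {
        int addr = 1, dram = 15, xfer = 1, words = 4;
        /* one-word-wide: every word pays a DRAM access and a transfer */
        printf("one-word-wide: %d cycles\n", addr + words * dram + words * xfer);
        /* interleaved: one DRAM latency overlapped across 4 banks */
        printf("interleaved:   %d cycles\n", addr + dram + words * xfer);
        /* four-word-wide: one access, one wide transfer */
        printf("4-word-wide:   %d cycles\n", addr + dram + xfer);
        return 0;
    }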

Improving performance
- Two ways of improving performance:
  - decreasing the miss ratio: associativity
  - decreasing the miss penalty: multilevel caches

Decreasing miss ratio with associativity
[Figure: an eight-block cache organized as direct mapped (1 block per set), 2-way set-associative (2 blocks per set), 4-way set-associative (4 blocks per set), and fully associative (8 blocks per set).]

4-way set-associative cache
[Figure: a 4-way set-associative cache; the index selects a set, the four tags in the set are compared in parallel, and the comparator outputs drive a multiplexer that selects the data from the matching way.]

Tag size versus associativity
- Cache of 4K blocks, four-word block size (i.e., four-word cache lines), 32-bit addresses.
- Direct mapped
  - byte offset = 4 bits (each block = 4 words = 16 bytes)
  - index + tag = 32 - 4 = 28 bits
  - for 4K blocks, 12 index bits are required
  - #tag bits per block = 28 - 12 = 16
  - total #tag bits = 16 x 4K = 64 Kbits
- 4-way set-associative
  - #sets = 1K, so 10 index bits are required
  - #tag bits per block = 28 - 10 = 18
  - total #tag bits = 4 x 18 x 1K = 72 Kbits
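
The pattern generalizes: raising associativity shrinks the index and widens every tag. A small C helper of my own that reproduces both totals above:

    #include <stdio.h>

    /* Tag storage (bits) for `blocks` 16-byte blocks, 32-bit addresses,
       associativity `ways` (both powers of two).                        */
    unsigned long tag_storage_bits(unsigned long blocks, unsigned ways) {
        unsigned index_bits = 0;
        for (unsigned long sets = blocks / ways; sets > 1; sets >>= 1)
            index_bits++;                    /* log2 of the number of sets */
        unsigned tag_bits = 32 - 4 - index_bits;  /* 4-bit byte offset     */
        return (unsigned long)tag_bits * blocks;  /* one tag per block     */
    }

    int main(void) {
        printf("direct mapped: %lu Kbits\n", tag_storage_bits(4096, 1) / 1024);
        printf("4-way:         %lu Kbits\n", tag_storage_bits(4096, 4) / 1024);
        return 0;   /* prints 64 and 72 */
    }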

Block replacement policy
- In a direct mapped cache, when a miss occurs, the requested block can go in only one position.
- In a set-associative cache, there are multiple positions in a set where each block can be stored. If all the positions are filled, which block should be replaced?
  - Least Recently Used (LRU) policy: replace the block that has gone unused the longest
  - Random: choose a block at random and replace it
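
A minimal sketch of LRU bookkeeping for one 4-way set (illustrative C; the structure and names are mine, not the slides'): each way records a timestamp on every access, and the victim is the way with the oldest one.

    #include <stdint.h>

    #define WAYS 4

    struct way { int valid; uint32_t tag; uint64_t last_used; };

    /* Pick the victim way in a set: a free way if one exists,
       otherwise the least recently used one.                   */
    int choose_victim(const struct way set[WAYS]) {
        int victim = 0;
        for (int w = 0; w < WAYS; w++) {
            if (!set[w].valid) return w;        /* empty slot wins */
            if (set[w].last_used < set[victim].last_used)
                victim = w;                     /* older timestamp */
        }
        return victim;
    }

    /* On every hit or fill: set[w].last_used = ++global_clock; */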

Common framework for memory hierarchies
- Q1: Where can a block be placed?
  - (note: a block is in this case a cache line)
  - direct mapped cache: one position
  - n-way set-associative cache: n positions (typically 2-8)
  - fully associative: anywhere
- Q2: How is a block found?
  - direct mapped: the index part of the address indicates the entry
  - n-way: use the index to select the set, then search all n blocks of that set
  - fully associative: check all tags

Common framework for memory hierarchies
- Q3: Which block should be replaced on a miss?
  - direct mapped: no choice
  - associative caches: use a replacement algorithm, such as LRU
- Q4: What happens on a write?
  - write-through
  - write-back
  - on a write miss: allocate or no-allocate

Common framework for memory hierarchies
- Understanding (cache) misses: the three Cs
  - compulsory miss: the first access to a block
  - capacity miss: the cache cannot hold all the blocks the program needs
  - conflict miss: too many blocks compete for the same set
[Figure: miss rate versus cache size for direct-mapped (1-way), 2-way, 4-way and fully associative caches, with the compulsory and capacity components indicated.]

Reading
- 3rd edition of the textbook: Chapter 7, Sections 7.1-7.3 and Section 7.5
- 2nd edition of the textbook: Chapter 7, Sections 7.1-7.3 and Section 7.5