CSCI 232 © 2005 JW Ryder. Cache Memory Organization: Direct Mapping, Fully Associative, Set Associative (very popular), Sector Mapping.

Presentation transcript:

Slide 2: Some Tools
- M = 2^N: address space size (usually physical)
- B = 2^b: line size (words/block)
- S = 2^c: cache size in blocks (2^(b+c) = total # words in cache)
- B words to a block, 2^b = B
- S cache blocks, 2^c = S
Example: b = 2, c = 2, S = 4, B = 4, M = 16
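The definitions above can be checked with a few lines of Python. This is only a sketch: the function name `cache_geometry` is hypothetical, and N = 8 is borrowed from the later "Direct Continued" slide rather than from this one.

```python
def cache_geometry(N: int, b: int, c: int) -> dict:
    """Derive the sizes defined on the slide from the three exponents."""
    return {
        "M": 2 ** N,                  # address space size
        "B": 2 ** b,                  # line size (words per block)
        "S": 2 ** c,                  # cache size in blocks
        "cache_words": 2 ** (b + c),  # total words held by the cache
    }

# b = 2 and c = 2 as in the slide example; N = 8 is an assumption.
g = cache_geometry(N=8, b=2, c=2)
print(g)
```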

Slide 3: Direct Mapping Cache
Address fields: | Block tag (N - c - b bits) | Block # in S (c bits) | Word # in block (b bits) |
- Block j in MM maps to block frame number j mod S
- Previous example: blocks 0, 4, 8, ... map to block 0
- More than one MM block maps to each cache line: every S-th MM block maps to the same line
- To see which MM block is in a cache frame, use the (N - c - b) bits as a block tag
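The j mod S rule can be sketched as follows, using S = 4 from the earlier example. The helper name `frame_for_block` and the choice to list the first 16 MM blocks are illustrative assumptions.

```python
S = 4  # cache size in blocks, from the earlier example (c = 2)

def frame_for_block(j: int, num_frames: int = S) -> int:
    """Direct mapping: MM block j goes to block frame j mod S."""
    return j % num_frames

# Group the first 16 MM blocks by the block frame they compete for.
frames = {f: [j for j in range(16) if frame_for_block(j) == f] for f in range(S)}
for f, blocks in frames.items():
    print(f"frame {f}: MM blocks {blocks}")
```

Each frame is shared by every S-th MM block, which is why a tag is needed to tell them apart.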

Slide 4: Direct Continued
N = 8, b = 2, c = 2
Address fields: | N - c - b | c | b |
S = 2^c, B = 2^b
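With N = 8, b = 2 and c = 2 the tag is N - c - b = 4 bits wide. A minimal sketch of the field split follows; the function name `split_direct` and the sample address are assumptions chosen for illustration.

```python
def split_direct(addr: int, N: int = 8, c: int = 2, b: int = 2):
    """Split an N-bit address into (tag, frame, word) fields."""
    if not 0 <= addr < (1 << N):
        raise ValueError("address does not fit in N bits")
    word = addr & ((1 << b) - 1)           # low b bits: word # in block
    frame = (addr >> b) & ((1 << c) - 1)   # middle c bits: block frame #
    tag = addr >> (b + c)                  # leading N - c - b bits: block tag
    return tag, frame, word

# Sample address written as tag_frame_word: 1011 | 01 | 10
tag, frame, word = split_direct(0b1011_01_10)
print(f"tag={tag:04b} frame={frame:02b} word={word:02b}")
```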

Slide 5: Operation
Simultaneously:
- Use the middle c bits of the address to look up the tag register value for the block frame
- Look up the word in cache line c
Compare the (N - c - b) tag register bits with the tag bits of the address
- 128 blocks? j mod 128 ==> blocks # 3, 131, 259, etc. all map to block frame 3
Tags match means hit, otherwise miss. On a miss:
- Accessed word suppressed
- Victim pre-selected. No choice with direct mapped caches. The line indicated by c is the one that must be the victim
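The hit/miss behaviour can be sketched with a small tag-only model. This is a simplification under stated assumptions: the class `DirectMappedCache` is hypothetical, it stores tags but no data, and it models the parallel lookup sequentially. With c = 7 there are 128 block frames, as in the slide's example.

```python
class DirectMappedCache:
    def __init__(self, c: int, b: int):
        self.c, self.b = c, b
        self.tags = [None] * (1 << c)   # one tag register per block frame

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, replace the forced victim."""
        frame = (addr >> self.b) & ((1 << self.c) - 1)
        tag = addr >> (self.b + self.c)
        if self.tags[frame] == tag:
            return True
        self.tags[frame] = tag          # the victim is always the indexed frame
        return False

cache = DirectMappedCache(c=7, b=2)     # 128 block frames, 4 words per block
# Word 0 of MM blocks 3, 3, 131, 3: blocks 3 and 131 share block frame 3.
results = [cache.access(block << 2) for block in (3, 3, 131, 3)]
print(results)
```

The last access to block 3 misses because block 131 displaced it from the only frame it can occupy.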

Slide 6: Hardware Needed
- Tag registers: S registers of (N - c - b) bits each
- 1 comparator to match tags
- Clean/dirty bit and hardware per block frame
- No replacement hardware
- Associative hardware for tag matching not needed
- If control flip-flops between MM blocks k and k + nS, where n ∈ Z (n ≠ 0), we have thrashing
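Thrashing is easy to reproduce by counting misses on a block reference string. The sketch below assumes S = 4, k = 1 and n = 3; the function name `direct_mapped_misses` is hypothetical, and it stores whole block numbers instead of tags, which is equivalent for counting misses.

```python
def direct_mapped_misses(block_refs, S):
    """Count misses for a sequence of MM block numbers in a direct mapped cache."""
    frames = {}                  # block frame -> MM block currently held
    misses = 0
    for j in block_refs:
        f = j % S                # the only frame block j may occupy
        if frames.get(f) != j:
            misses += 1
            frames[f] = j        # forced replacement
    return misses

S = 4
k, n = 1, 3
thrash = [k, k + n * S] * 5      # alternate between blocks k and k + nS
friendly = [k, k + 1] * 5        # two blocks that use different frames

print(direct_mapped_misses(thrash, S), direct_mapped_misses(friendly, S))
```

Every reference in the alternating sequence misses, while the second sequence misses only on its first two references.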

Slide 7: Fully Associative
Address fields: | Block tag (N - b bits) | Word number (b bits) |
- A block in MM can map to any cache frame
- Leading (N - b) bits of the address are stored as the block tag with each frame
- Comparisons of block tags all need to be done at the same time. Tag registers are searched associatively with the (N - b) bits as the MM key
- Contents of the matching block frame are accessed only on a hit
- Cache set up as associative storage (content addressable memory)
- Allows fast access on hit
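A minimal sketch of the associative lookup follows. The names `split_fully` and `lookup`, and the frame contents, are illustrative assumptions; the loop stands in for comparisons that the hardware performs simultaneously.

```python
def split_fully(addr: int, b: int):
    """Split an address into (block tag, word number)."""
    return addr >> b, addr & ((1 << b) - 1)

def lookup(frames, addr: int, b: int):
    """frames: list of (tag, words) entries or None for an empty frame."""
    tag, word = split_fully(addr, b)
    for entry in frames:                 # hardware compares all tags at once
        if entry is not None and entry[0] == tag:
            return entry[1][word]        # hit: read the word from the matching frame
    return None                          # miss

# Four block frames, one of which holds the block with tag 101101.
frames = [None, (0b101101, ["w0", "w1", "w2", "w3"]), None, None]
print(lookup(frames, 0b101101_10, b=2))
print(lookup(frames, 0b000000_10, b=2))
```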

Slide 8: Fully Continued
- Slower than direct mapped: no read ahead is possible before the match
- Victim: any line, so a usage history must be maintained for each line

Slide 9: Hardware
- S comparators (N - b bits per comparator)
- Block status bits (usage, clean/dirty)
- We have victim choice
- Costliest of all cache designs
- Best cache utilization
- Cycle time slower (associative search hardware)
- Permits wide variety of replacement algorithms
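As one illustration of victim choice, the sketch below uses least-recently-used (LRU) replacement. LRU is an assumption made here for the example; the slides do not name a specific algorithm. The class name `FullyAssociativeLRU` is also hypothetical, and only tags are modelled.

```python
from collections import OrderedDict

class FullyAssociativeLRU:
    def __init__(self, S: int):
        self.S = S
        self.frames = OrderedDict()          # block tag -> None, oldest use first

    def access(self, tag: int) -> bool:
        """Return True on a hit; on a miss, evict the LRU block if the cache is full."""
        if tag in self.frames:
            self.frames.move_to_end(tag)     # record the use (usage history)
            return True
        if len(self.frames) == self.S:
            self.frames.popitem(last=False)  # victim: least recently used block
        self.frames[tag] = None
        return False

cache = FullyAssociativeLRU(S=4)
hits = [cache.access(t) for t in (1, 5, 9, 13, 1, 5, 17, 9)]
print(hits)
```

Blocks 1, 5, 9 and 13 would all compete for one frame in a direct mapped cache with S = 4; here they coexist until a fifth block arrives.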

Slide 10: Fully Associative
S = 2^c
- Direct Mapping: cheap, poor performance in terms of replacement choices, fast cycle time
- Fully Associative: expensive, very good performance in terms of replacement choices, slower cycle time (no early read out)

Slide 11: Set Associative Cache
Combination of Direct Mapped and Fully Associative
[Diagram: block frames Block 0 to Block 7 grouped into four sets: Set 0 (K=0), Set 1 (K=1), Set 2 (K=2), Set 3 (K=3)]

Slide 12: Set Assoc. Continued
- S blocks divided into K sets, K = 2^m
- S / K = 2^c / 2^m = 2^(c-m)
Example: c = 3, m = 2
- S = 2^c = 8 blocks
- K = 2^m = 2^2 = 4 sets
- S / K = blocks per set = 8 / 4 = 2
- S / K = 2^c / 2^m = 2^(c-m) = 2^(3-2) = 2^1 = 2
This is an "S by K"-way (S / K-way) Set Associative Cache
- Here: a 2-way Set Associative Cache
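The arithmetic on this slide can be reproduced directly. The function name `set_geometry` is a hypothetical helper introduced for the example.

```python
def set_geometry(c: int, m: int):
    """Return (S blocks, K sets, S / K blocks per set) for the given exponents."""
    S = 2 ** c
    K = 2 ** m
    return S, K, S // K

S, K, ways = set_geometry(c=3, m=2)
print(f"{S} blocks, {K} sets, {ways}-way set associative")
```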

Slide 13: Set Assoc. Continued
- Block # j in MM can be in any block frame (Fully Assoc. part) within set number j mod K (Direct Mapping part)
Address fields: | Block tag (N - m - b bits) | Set # (m bits) | Word # (b bits) |
- Set chosen by the middle m bits of the address
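A sketch of the set associative field split, showing that the middle m bits equal the block number mod K. The function name `split_set_assoc` is hypothetical; m = 2 follows the slides' example, while b = 2 is an assumption.

```python
def split_set_assoc(addr: int, m: int = 2, b: int = 2):
    """Split an address into (block tag, set #, word #)."""
    word = addr & ((1 << b) - 1)           # low b bits
    set_no = (addr >> b) & ((1 << m) - 1)  # middle m bits
    tag = addr >> (b + m)                  # leading N - m - b bits
    return tag, set_no, word

addr = 0b0110_01_10                        # tag 0110 | set 01 | word 10
tag, set_no, word = split_set_assoc(addr)

# The set number is the MM block number j taken mod K, with K = 2^m sets.
j = addr >> 2                              # drop the b = 2 word bits
K = 2 ** 2
print(tag, set_no, word, j % K)
```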

Slide 14: Set Assoc. Diagram
[Diagram: four sets, Set 00 (K=0), Set 01 (K=1), Set 10 (K=2), Set 11 (K=3), with (N - m - b) bit block tags]
S = 2^c, c = 3
K = 2^m, m = 2
Address fields: | N - m - b | m | b |

Slide 15: Diagram Continued
- Map address to set 01 (m)
- Read out S / K tags (2) and S / K lines (2)
- Compare S / K tags: find match on 0000 in set 01
- Send word 10 (b) of the already read out data to the CPU from tag set 01
On miss
- Compare S / K tags and don't find a match on tags from set 01
- Suppress data lines read out from set 01
- Select victim from one of the S / K lines in set 01
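The hit and miss paths can be sketched with a tag-only model of the 2-way cache. Several things here are assumptions: the class name `SetAssociativeCache`, b = 2, and LRU as the victim policy within a set (the slides only say a victim is selected from the set).

```python
class SetAssociativeCache:
    def __init__(self, c: int = 3, m: int = 2, b: int = 2):
        self.m, self.b = m, b
        self.ways = 1 << (c - m)                 # S / K lines per set
        self.sets = [[] for _ in range(1 << m)]  # per set: tags, LRU first

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, pick a victim inside the set."""
        set_no = (addr >> self.b) & ((1 << self.m) - 1)
        tag = addr >> (self.b + self.m)
        tags = self.sets[set_no]
        if tag in tags:                # compare against the S / K tags of the set
            tags.remove(tag)
            tags.append(tag)           # mark as most recently used
            return True
        if len(tags) == self.ways:
            tags.pop(0)                # assumed policy: evict the LRU line
        tags.append(tag)
        return False

cache = SetAssociativeCache()
# All addresses select set 01 and word 10; only the tag differs.
refs = [0b0000_01_10, 0b0001_01_10, 0b0000_01_10, 0b0010_01_10, 0b0001_01_10]
results = [cache.access(a) for a in refs]
print(results)
```

The third reference hits on tag 0000 in set 01; the fourth forces a victim choice between the two lines of that set.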

Slide 16: Pipelined Operations
Stage 1
- Bring out each line from a set into intermediate latches (tag and data)
- Each line within a set is in a different bank, so interleaving is possible for fast access
Stage 2
- Does the associative compare

Slide 17: Hardware
- S / K comparators of (N - m - b) bits each (associative logic) per set, S total
- Status and clean/dirty bits and associated hardware
- Registers to hold the data and tags read out
- Can pipeline the tag/data read out and the compare
- Performance approaches that of a Fully Associative cache
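The comparator figures from the three Hardware slides can be tabulated side by side. This is a rough sketch: the function name `comparator_cost` is hypothetical, and N = 8, b = 2 are assumed values combined with the c = 3, m = 2 example.

```python
def comparator_cost(N: int, b: int, c: int, m: int) -> dict:
    """Return (number of comparators, bits per comparator) per organization."""
    S, K = 2 ** c, 2 ** m
    return {
        "direct": (1, N - c - b),
        "set_associative": (S // K, N - m - b),   # count is per set
        "fully_associative": (S, N - b),
    }

cost = comparator_cost(N=8, b=2, c=3, m=2)
for name, (count, width) in cost.items():
    print(f"{name}: {count} comparator(s) of {width} bits")
```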

Slide 18: Sector Mapped Cache
Address fields: | Sector tag (N - r - b bits) | Block # (r bits) | Word # (b bits) |
- MM divided into sectors with Q = 2^r blocks / sector
- A validity tag is also associated with each block frame within a sector (in the cache) to indicate whether the contents of that block frame are valid

Slide 19: Sector Mapped Diagram
[Diagram: Sector 0 and Sector 1, each with a tag and blocks Block 0, Block 1, ..., Block 2^r - 1]
Q = 2^r blocks / sector, r = 2

Slide 20: Operation
Leading (N - r - b) bits are used to associatively locate the sector in the cache
On hit
- Use block # (r) to locate the block in the sector
- Not a real hit yet, only a sector hit
- If the validity tag is VALID, get the word (b) from the block; else load the block from MM (block miss)
On miss
- Select victim 'sector frame'
- Reset validity tags on all blocks within the sector frame
- After writing back any dirty blocks, load only the missing block and set its validity bit on
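The three outcomes (hit, block miss, sector miss) can be sketched as below. Assumptions: the class name `SectorCache`, r = 2 and b = 2, two sector frames, and evicting the oldest loaded sector as the victim policy. Dirty-block write-back is omitted from this model.

```python
class SectorCache:
    def __init__(self, num_sector_frames: int, r: int, b: int):
        self.r, self.b = r, b
        self.capacity = num_sector_frames
        self.frames = {}   # sector tag -> validity bits, one per block frame

    def access(self, addr: int) -> str:
        block = (addr >> self.b) & ((1 << self.r) - 1)
        sector = addr >> (self.b + self.r)
        valid = self.frames.get(sector)      # associative search on the sector tag
        if valid is None:                    # sector miss
            if len(self.frames) == self.capacity:
                victim = next(iter(self.frames))  # assumed policy: oldest loaded
                del self.frames[victim]
            valid = [False] * (1 << self.r)  # reset all validity tags
            self.frames[sector] = valid
            valid[block] = True              # load only the missing block
            return "sector miss"
        if not valid[block]:                 # sector hit, but block not loaded
            valid[block] = True
            return "block miss"
        return "hit"

cache = SectorCache(num_sector_frames=2, r=2, b=2)
refs = [0b0000_00_00, 0b0000_00_01, 0b0000_01_00, 0b0001_00_00, 0b0010_00_00, 0b0000_00_00]
outcomes = [cache.access(a) for a in refs]
print(outcomes)
```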

Slide 21: Sector Mapping vs Set Associative Caches
Sector Mapping:
- Associative mapping to sector frame
- Direct mapping to block
Set Associative:
- Direct mapping to set
- Associative mapping to block within set
Associative Hardware
- Similar to Set Associative
- Not as easy to pipeline