Memory Hierarchy and Cache

A Mystery…

Memory Main memory = RAM : Random Access Memory – Read/write – Multiple flavors – DDR SDRAM most common, 64 bits wide DDR : Double Data Rate S : Synchronous D : Dynamic

Memory SRAM : Static RAM – Register technology – Maintains state as long as power is on – Flip flops – 4-6 transistors each

Memory DRAM : Dynamic RAM – Main memory technology – Each cell is only one transistor and one capacitor Capacitor charge represents the value – Slower to read/write – Must be refreshed

Since 1980, CPU has outpaced DRAM… Then: ~3 cycle delay for a memory access

Since 1980, CPU has outpaced DRAM… Now (Core i7): ~107 cycle delay for a main memory access

Cache Cache memory – Small, fast (SRAM) memory – Stores subset of main memory we think is most important

Cache L1 – closest/fastest to CPU – Often separate instruction/data caches – ~64KB

Cache L2 & L3 – May be on chip or board – May be shared by cores – ~ 1 MB (L2) ~5-10 MB (L3)

Differences No hard rules about – What cache you have – Where it lives

Cache How important is it?

Hierarchy Cache / Main Memory are part of a larger memory hierarchy

Process I need memory location 0x000E – Is it in L1 cache? Yes : Hit – use it No : Miss – go search next level – Is it in L2? Yes : Hit – use it No : Miss – go search next level – Is it in L3… – Is it in memory…
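The level-by-level search above can be sketched in a few lines of Python; the cache contents and addresses here are illustrative, not taken from the slides:

```python
# Sketch of the hierarchical lookup described above (illustrative contents).
# Check each level in order; the first level holding the address is a hit.
def find_level(address, levels):
    """Return the name of the first level containing `address`, else 'memory'."""
    for name, contents in levels:
        if address in contents:
            return name           # hit : use it
        # miss : go search the next level
    return "memory"               # main memory always holds the data

levels = [
    ("L1", {0x0004, 0x0008}),
    ("L2", {0x0004, 0x0008, 0x000E}),
    ("L3", {0x0004, 0x0008, 0x000E, 0x0010}),
]

print(find_level(0x000E, levels))   # misses L1, hits L2 -> "L2"
print(find_level(0x0100, levels))   # misses every cache -> "memory"
```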

Memory Access Speedup Assume only L1 cache and main memory – S : Speedup – t_m : time to access main memory – t_c : time to access cache – h : hit ratio S = t_m / (h·t_c + (1 - h)·t_m)

Memory Access Speedup Divide through by t_m and call t_c/t_m "k" – k : ratio of cache access time to memory access time S = 1 / (h·k + (1 - h))

Speedup vs Hit Rate If cache is 100x faster than main memory (k = 0.01): – Need a high hit rate for a large speedup, e.g. h = 0.90 gives S ≈ 9 but h = 0.99 gives S ≈ 50
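The effect is easy to check numerically. This sketch assumes the speedup formula S = 1 / (h·k + (1 - h)) with k = t_c / t_m, and a cache 100x faster than main memory (k = 0.01):

```python
# Speedup S = 1 / (h*k + (1 - h)), where k = t_c / t_m.
# k = 0.01 models a cache 100x faster than main memory.
def speedup(h, k=0.01):
    return 1.0 / (h * k + (1.0 - h))

for h in (0.50, 0.90, 0.99):
    print(f"hit rate {h:.2f} -> speedup {speedup(h):.1f}x")
# Even a 50% hit rate barely doubles performance; large gains
# only appear as the hit rate approaches 1.
```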

Cache & Locality Cache effectiveness is based on: – Temporal locality : Recently used things tend to be needed again soon – Spatial locality : Memory accesses tend to cluster, e.g. sequential instruction access

Memory Units Main memory – Byte addressed

Memory Units Main memory – Byte addressed Registers – Words of 2-8 bytes (memory viewed as Word 0, Word 1, Word 2, …)

Memory Units Main memory – Byte addressed Registers – Words of 2-8 bytes Cache – Lines of 1+ words (memory viewed as Line 0, Line 1, …)

Process I need memory location 0x000E – Is it in L1 cache? Yes : Hit – return it No : Miss – go search next level and bring back the whole line – Is it in L2? Yes : Hit – return line No : Miss – go search next level and bring back the whole line – Is it in L3… – Is it in memory…

Associative Memory Data is looked up with a key rather than an address

Associativity – Determines what chunks of memory can go in which cache lines

Fully Associative Fully associative cache – Any memory line can go in any cache entry

Fully Associative Memory address – 4 bytes per word – 2 words per line – xxx lines

Fully Associative Address Decoding

Fully Associative Line 2 could be in any of the cache lines – Must check all tags in parallel for a match

Fully Associative Line 2 could be in any of the cache lines – Must check all tags in parallel for a match – Large amounts of hardware Only practical for very small caches
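In software, that tag check looks like the loop below; hardware performs all the comparisons at once, one comparator per entry, which is why the hardware cost grows with cache size. The line geometry matches the slides; the tag values are illustrative:

```python
# Fully associative lookup: the tag is the entire line number, and a match
# can be in ANY cache entry, so hardware compares every tag at once.
# This sequential loop is a software stand-in for those parallel comparators.
LINE_BYTES = 8  # 2 words x 4 bytes, as in the slides

def fa_lookup(address, cache_tags):
    tag = address // LINE_BYTES          # line number serves as the tag
    for entry, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:
            return ("hit", entry)
    return ("miss", None)

cache_tags = [5, 2, 7, 0]                # illustrative tags, one per entry
print(fa_lookup(0x10, cache_tags))       # 0x10 // 8 = line 2 -> hit in entry 1
```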

Direct Mapping Direct mapping : every memory block has one cache entry it can use

Direct Mapped Cache 4 byte words 2 word lines (8 bytes) Cache of 4 lines (32 bytes)

Direct Mapped Cache Direct Mapped Cache : Every line mapped to one cache slot slot = line % 4

Direct Mapped Cache Direct Mapped Cache : Need to track which line is in the slot – line 0? 4? 8?
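Using the slide's geometry (8-byte lines, 4 slots), slot and tag fall out of simple integer arithmetic; a minimal sketch:

```python
# Direct mapped placement from the slides: 8-byte lines, 4 cache slots.
# slot = line % 4 picks the one slot a line may use; the tag remembers
# which of the competing lines (0? 4? 8? ...) currently occupies it.
LINE_BYTES = 8
NUM_SLOTS = 4

def place(address):
    line = address // LINE_BYTES
    return line % NUM_SLOTS, line // NUM_SLOTS   # (slot, tag)

for addr in (0x00, 0x20, 0x40):   # lines 0, 4, 8 -> all compete for slot 0
    print(hex(addr), "-> slot %d, tag %d" % place(addr))
```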

Direct Mapped Cache Set: Group of lines equal to the size of the cache Tag: Records which set each cached line is from

Direct Mapped Cache Address format based on – 4 bytes per word – 2 words per line – 4 lines per set – xxx sets of total memory

Direct Mapped Cache Address Decoding

Address Decoding Direct Mapped Cache

Using tags Hit: Tag shows the cached line is from the right set

Using tags Miss: Tag shows the wrong set is cached – fetch the correct line

Scaled Up Byte-addressable memory of 2^14 bytes Cache has 16 blocks, each has 8 bytes What do addresses look like?

Scaled Up Byte-addressable memory of 2^32 bytes Words of 4 bytes Cache has 16 lines, each has 8 words What do addresses look like? – 32 bit address – 2 bits for byte in word – 3 bits for word in line – 4 bits for line – Set is the leftovers… 23 bits
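Those field widths translate directly into shifts and masks; a sketch of decoding such a 32-bit address (the example address is arbitrary):

```python
# Field extraction for the 32-bit layout above:
# 2 bits byte-in-word, 3 bits word-in-line, 4 bits line, 23 bit set/tag.
def split_address(addr):
    byte = addr & 0x3            # low 2 bits
    word = (addr >> 2) & 0x7     # next 3 bits
    line = (addr >> 5) & 0xF     # next 4 bits
    tag  = addr >> 9             # remaining 23 bits
    return tag, line, word, byte

# 0x2AB = 0b10_1010_1011 -> tag 1, line 5, word 2, byte 3
print(split_address(0x2AB))
```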

Issue : Thrashing Direct Mapped Cache

Issue : Thrashing 0x0040 = 0x0020 + 0x0020 Fetch line 0/word 0 Replace with 1/0 Replace with 2/0 0x0044 = 0x0024 + 0x0020 Fetch 0/1 Replace with 1/1 Replace with 2/1 Direct Mapped Cache
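With the 4-slot, 8-bytes-per-line cache from the earlier slides, thrashing is easy to demonstrate: addresses 0x0000, 0x0020 and 0x0040 all map to slot 0, so cycling through them misses on every access. The access pattern below is illustrative:

```python
# Thrashing in a 4-slot direct mapped cache with 8-byte lines:
# 0x0000, 0x0020 and 0x0040 are lines 0, 4 and 8, which all map to
# slot 0, so alternating between them evicts the line every time.
LINE_BYTES = 8
NUM_SLOTS = 4
slots = [None] * NUM_SLOTS       # the tag currently held by each slot
misses = 0

for addr in [0x0000, 0x0020, 0x0040] * 3:   # cycle through conflicting lines
    line = addr // LINE_BYTES
    slot, tag = line % NUM_SLOTS, line // NUM_SLOTS
    if slots[slot] != tag:       # conflict miss: evict and refill
        slots[slot] = tag
        misses += 1

print(misses, "misses out of 9 accesses")   # every access misses
```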

Set Associative n-way Set Associative : every memory block has n slots it can be in 2-way

Set Associative n-way Set Associative : every memory block has n slots it can be in 4-way

Set Associative Address 2 way set associative:

Set Associative Address Need to check all slots in parallel for right tag
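A sketch of that lookup for a small 2-way arrangement. The geometry and tag values are illustrative; hardware checks both ways in parallel, so the loop here is a sequential stand-in:

```python
# 2-way set associative lookup sketch: the index bits pick a set of 2
# entries, and both tags in that set are checked (in hardware, in parallel).
LINE_BYTES = 8
NUM_SETS = 2   # illustrative: 4 entries arranged as 2 sets of 2 ways

def sa_lookup(address, sets):
    line = address // LINE_BYTES
    index, tag = line % NUM_SETS, line // NUM_SETS
    for way, stored in enumerate(sets[index]):
        if stored == tag:
            return ("hit", index, way)
    return ("miss", index, None)

sets = [[0, 2], [1, 3]]                 # one tag per way, illustrative
print(sa_lookup(0x20, sets))            # line 4 -> set 0, tag 2 -> hit, way 1
```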

Replacement Strategies How do we decide what block to kick out? – FIFO : Track age – Least Used : Track accesses Very susceptible to thrashing – Least Recently Used : Track age of accesses Very complex – Random
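A list-based sketch of the Least Recently Used policy for one set (real hardware typically uses cheaper approximations; the capacity and tags here are illustrative):

```python
# LRU for one cache set, kept as an ordered list:
# front = least recently used, back = most recently used.
def access(lru_order, capacity, tag):
    """Touch `tag`; return True on a hit. Evicts the LRU tag when full."""
    if tag in lru_order:
        lru_order.remove(tag)    # hit: move to most-recently-used position
        lru_order.append(tag)
        return True
    if len(lru_order) == capacity:
        lru_order.pop(0)         # miss + full: evict least recently used
    lru_order.append(tag)
    return False

order = []
hits = [access(order, 2, t) for t in (1, 2, 1, 3, 2)]
print(hits)   # 1:miss 2:miss 1:hit 3:miss (evicts 2) 2:miss (evicts 1)
```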

Set Associative Performance Larger caches = higher hit rate Smaller caches benefit more from associativity

What do they use? Intel Haswell generation AMD

Bad Situations for Cache Data with poor locality – Complex object-oriented programming structures – Large 2D arrays traversed in column-major order… (row-major access vs column-major access)
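The two traversal orders can be contrasted with a small list-of-lists as a stand-in for a row-major array:

```python
# Row-major vs column-major traversal of an N x N array stored row by row.
# Row-major order visits consecutive elements (good spatial locality);
# column-major order jumps a whole row between accesses.
N = 4
a = [[r * N + c for c in range(N)] for r in range(N)]

row_major = [a[r][c] for r in range(N) for c in range(N)]
col_major = [a[r][c] for c in range(N) for r in range(N)]

print(row_major[:6])   # [0, 1, 2, 3, 4, 5]  : sequential
print(col_major[:6])   # [0, 4, 8, 12, 1, 5] : strided
```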