How to Build a CPU Cache COMP25212 – Lecture 2


Learning Objectives
To understand:
–how cache is logically structured
–how cache operates: CPU reads, CPU writes
–key cache performance parameters
–making cache more efficient

Cache Functionality
The CPU gives the cache a full (e.g. 32-bit) address to access some data.
The cache contains a (small) selection of values from main memory – each of those values came from a particular main memory location.
How do we find the data? It may or may not be in the cache.

Fully Associative Cache (1)
The memory stores both addresses and data.
Hardware compares the input address (from the CPU) with all stored addresses, in parallel.
The cache is small – it can only hold a few memory values (32k?).
[Diagram: address from CPU compared against every stored Address/Data entry]
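The lookup described above can be sketched in software. This is a minimal, hypothetical Python model (not from the lecture): a dictionary plays the role of the parallel address comparison, and a fixed capacity forces eviction.

```python
# Hypothetical sketch of a fully associative cache (not the lecture's code).
# The dict membership test stands in for the hardware's parallel compare
# of the input address against every stored address.
class FullyAssociativeCache:
    def __init__(self, capacity):
        self.capacity = capacity      # number of address/data entries held
        self.entries = {}             # address -> data

    def read(self, address, memory):
        if address in self.entries:   # address found: cache hit (fast)
            return self.entries[address], True
        data = memory[address]        # cache miss: go to main memory (slow)
        if len(self.entries) >= self.capacity:
            # cache full: reject some entry to make room
            self.entries.pop(next(iter(self.entries)))
        self.entries[address] = data
        return data, False
```

A second read of the same address then hits without touching main memory.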

Fully Associative Cache (2)
If the address is found – this is a ‘cache hit’ – data is read (or written) with no need to access main RAM memory (fast).
If the address is not found – this is a ‘cache miss’ – we must go to main RAM memory (slow).
But how does data get into the cache? If we have to go to main memory, should we just leave the cache as it was?

Locality
Caches rely on locality.
Temporal locality – things, once used, will be used again soon (e.g. instructions and data in loops).
Spatial locality – things close together (adjacent addresses) in store are often used together (e.g. instructions & arrays).
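A tiny illustration of why a loop over an array exhibits both kinds of locality. The addresses below are hypothetical, assuming 4-byte integers:

```python
# Toy illustration (not from the lecture): the data addresses touched
# when a loop walks an n-element array of 4-byte values.
def data_addresses(base, n, elem_bytes=4):
    # each iteration re-runs the same loop instructions (temporal locality)
    # and touches the next adjacent address (spatial locality)
    return [base + i * elem_bytes for i in range(n)]
```

Consecutive accesses differ by only 4 bytes, so they frequently fall in the same cached region.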

Cache Hit Rate
The fraction of cache accesses which ‘hit’ in the cache.
Need > 98% to hide a 50:1 ratio of memory speed to instruction speed.
Hit rates for instructions are usually better than for data (more regular access patterns – more locality).
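The 98% figure follows from the standard average-access-time calculation. A sketch, assuming (hypothetically) a 1-cycle cache and a 50-cycle main memory to match the 50:1 ratio above:

```python
# Average memory access time in cycles. The 1-cycle hit time and
# 50-cycle miss penalty are assumptions chosen to match the slide's
# 50:1 memory-to-instruction speed ratio.
def average_access_time(hit_rate, hit_time=1, miss_penalty=50):
    return hit_time + (1 - hit_rate) * miss_penalty
```

At a 98% hit rate the average is about 2 cycles; at 90% it is already about 6 cycles, so even small drops in hit rate are costly.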

What to do on a cache miss
Temporal locality says it is a good idea to put recently used data in the cache. Assume a store read for the moment. As well as using the data:
–Put the newly read value into the cache
–But the cache may already be full
–Need to choose a location to reject (replacement policy)

Cache replacement policy
Least Recently Used (LRU) – makes sense, but expensive to implement (in hardware).
Round Robin (or cyclic) – cycle round the locations, replacing the least recently fetched from memory.
Random – easy to implement, and not as bad as it might seem.
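LRU is the policy that is awkward in hardware but easy in software. A minimal sketch (illustrative, not the lecture's code) using an ordered dictionary to track use order:

```python
from collections import OrderedDict

# Illustrative LRU replacement: the OrderedDict keeps entries in use
# order (oldest first), so eviction pops from the front.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()             # address -> data

    def access(self, address, memory):
        if address in self.entries:
            self.entries.move_to_end(address)    # mark most recently used
            return self.entries[address], True   # hit
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)     # evict least recently used
        self.entries[address] = memory[address]  # miss: fetch and fill
        return self.entries[address], False
```

The hardware cost the slide mentions comes from maintaining exactly this ordering across every access, in parallel, for every cache set.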

Cache Write Strategy (1)
Writes are slightly more complex than reads: we must do an address comparison first to see if the address is in the cache.
Cache hit:
–Update the value in the cache
–But what about the value in RAM?
–If we write to RAM every time it will be slow
–Does RAM need updating every time?

Cache Hit Write Strategy
Write Through
–Every cache write is also done to memory
Write Through with buffers
–The write through is buffered, i.e. the processor doesn’t wait for it to finish (but multiple writes could back up)
Copy Back
–The write is only done to the cache (the entry is marked ‘dirty’)
–RAM is updated when a ‘dirty’ cache entry is replaced (or the cache is flushed, e.g. on a process switch)
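The difference between write through and copy back can be made concrete. This is a minimal sketch under assumed names and structure (dict-based cache entries with a dirty flag; none of this is the lecture's code):

```python
# Illustrative write-hit handling. Each cache entry is a dict holding
# the data and a 'dirty' flag; 'cache' and 'memory' are plain dicts.
def write_hit(cache, memory, address, value, policy="copy_back"):
    cache[address] = {"data": value, "dirty": policy == "copy_back"}
    if policy == "write_through":
        memory[address] = value   # RAM updated on every write (slow)
    # copy_back: RAM is left stale until this dirty entry is evicted

def evict(cache, memory, address):
    entry = cache.pop(address)
    if entry["dirty"]:            # copy the modified value back to RAM
        memory[address] = entry["data"]
```

With copy back, repeated writes to a hot location cost one RAM write at eviction instead of one per store.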

Cache Miss Write Strategy
Cache miss on a write:
Write Allocate
–Find a location (reject an entry if necessary)
–Assign the cache location and write the value
–Write through back to RAM, or rely on copy back later
Write Around (or Write no-Allocate)
–Just write to RAM
–A subsequent read will cache it if necessary

Overall Cache Write Strategy
Fastest is Write Allocate / Copy Back.
But then cache and main memory are not ‘coherent’ (i.e. they can hold different values). Does this matter?
–Autonomous I/O devices
–Exceptions
–Multi-processors
These may need special handling (later).

Cache Data Width
Each cache line holds an Address and its Data.
If the address is e.g. a 32-bit byte address, logically the data is one byte.
In practice we hold more data per address:
–e.g. a 4-byte (32-bit) word
–the stored address needed is then only 30 bits
In real practice (later) we would hold even more data per address.
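Why only 30 bits: with 4-byte words, the bottom 2 bits of the 32-bit byte address select a byte within the word, so they need not be stored. A small sketch of the split (function name is my own, not from the slides):

```python
# Split a byte address into the stored (word) address and the byte
# offset within the word. With word_bytes = 4, the offset is 2 bits
# and the stored address is the remaining 30 bits.
def split_word_address(address, word_bytes=4):
    offset_bits = word_bytes.bit_length() - 1    # log2(word_bytes) = 2
    byte_offset = address & (word_bytes - 1)     # low 2 bits
    stored_address = address >> offset_bits      # top 30 bits
    return stored_address, byte_offset
```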

Real Cache Implementations
A fully associative cache is logically what we want. But storing full (32-bit?) addresses and the hardware comparison are expensive (and use a lot of power).
It is possible to achieve most of the functionality using ordinary small and fast (usually static) RAM memory in special ways.
Few real processor caches are fully associative.

Direct Mapped Cache (1)
Uses standard RAM to implement the functionality of an ideal cache (with some limitations).
Usually uses static RAM – although more expensive than DRAM, it is faster.
But overall it is considerably cheaper to implement than fully associative.

Direct Mapped Cache (2)
The address is divided into two parts:
–the index (e.g. 15 bits) is used to address the (32k-entry) Tag RAM and Data RAM directly
–the rest of the address, the tag, is stored and compared with the incoming tag; if they are the same, the data is read (hit), otherwise it is a miss
[Diagram: address split into tag and index; index addresses Tag RAM and Data RAM; tag comparison produces hit/miss and data]
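The tag/index split and the lookup can be sketched directly. This is an illustrative Python model of the slide's 15-bit-index design (the dicts stand in for the Tag and Data RAMs; the structure is my own):

```python
# Illustrative direct-mapped lookup with a 15-bit index, matching the
# slide's 32k-entry RAMs. tag_ram and data_ram are dicts standing in
# for the two RAM arrays, both addressed by the same index.
INDEX_BITS = 15                   # 2**15 = 32k entries

def tag_and_index(address):
    index = address & ((1 << INDEX_BITS) - 1)   # low 15 bits
    tag = address >> INDEX_BITS                 # rest of the address
    return tag, index

def lookup(tag_ram, data_ram, address):
    tag, index = tag_and_index(address)
    if tag_ram.get(index) == tag:  # compare stored tag with incoming tag
        return data_ram[index], True            # hit: data is read
    return None, False                          # miss
```

Note that two addresses differing only in their tag bits map to the same index and so compete for the same single location.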

Direct Mapped Cache (3)
It is really just a hash table implemented in hardware.
The index / tag could be selected from the address in many ways.
Most other aspects of operation are identical to fully associative – except for the replacement policy.

Direct Mapped Replacement
New incoming tag/data can only be stored in one place, dictated by the index.
Using the LSBs as the index exploits the principle of spatial locality to minimize displacing recently used data.
Much cheaper than associative (and faster?) but a lower ‘hit rate’ (due to the inflexible replacement strategy).

Set Associative Caches (1)
A compromise!
A set associative cache is simply a small number (e.g. 2 or 4) of direct mapped caches operating in parallel.
If one matches, we have a hit and select the appropriate data.

E.g. 2-Way Set Associative
[Diagram: two direct mapped caches searched in parallel, with a multiplexer selecting the data from whichever way hits]

Set Associative Caches (2)
Now the replacement strategy can be a bit more flexible – e.g. in a 4-way cache we can choose any one of the 4, using LRU, cyclic, etc.
Now (e.g.) 4 entries, all with the same index but different tags, can be stored.
Less ‘interference’ than direct mapped.
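Putting the pieces together, a set associative lookup is just the direct-mapped split plus a small per-index set of ways. A hypothetical sketch (class name, FIFO eviction, and 4-bit index are my own choices, not the lecture's):

```python
# Illustrative N-way set-associative cache. Each index selects a "set"
# that can hold up to `ways` entries with different tags, so same-index
# addresses no longer always displace one another.
class SetAssociativeCache:
    def __init__(self, ways=2, index_bits=4):
        self.ways = ways
        self.index_mask = (1 << index_bits) - 1
        self.sets = [dict() for _ in range(1 << index_bits)]  # tag -> data

    def access(self, address, memory):
        index = address & self.index_mask
        tag = address >> self.index_mask.bit_length()
        cache_set = self.sets[index]
        if tag in cache_set:              # one of the ways matched: hit
            return cache_set[tag], True
        if len(cache_set) >= self.ways:   # set full: evict one way
            cache_set.pop(next(iter(cache_set)))
        cache_set[tag] = memory[address]  # miss: fetch and fill
        return cache_set[tag], False
```

With 2 ways, two addresses sharing an index can both stay resident, which a direct mapped cache cannot do.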

Set Associative Caches (3)
The hit rate gets better with an increasing number of ways.
Obviously a higher number of ways gets more expensive.
In fact, an N-location fully associative cache is the same as an N-way set associative cache with a single location per way!