COMP25212 SYSTEM ARCHITECTURE: PRACTICAL CACHES
Sergio Davies, Feb/Mar 2014 (COMP25212 Lecture 3)

Learning Objectives

To understand:
- additional control bits in cache lines
- cache line size tradeoffs
- separate I&D caches
- multiple level caches

Cache Control Bits

We have so far ignored the details of cache initialization: at some point the cache must start empty. We need a valid bit for each entry to indicate that it holds meaningful data. We also need a 'dirty' bit per line if we are using 'write back' rather than 'write through'.
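As a concrete illustration, here is a minimal sketch in C of how a cache line with these control bits might be modelled. The field widths and the 2-words-per-line layout are illustrative assumptions, not taken from the lecture.

```c
#include <stdint.h>

#define WORDS_PER_LINE 2   /* assumed line width, matching the 2-word example below */

/* One cache line: control bits plus tag and data.
 * The valid bit marks the line as holding real data (cleared on reset);
 * the dirty bit marks a line modified under a write-back policy. */
typedef struct {
    uint32_t valid : 1;               /* 1 = entry contains meaningful data  */
    uint32_t dirty : 1;               /* 1 = modified, must be written back  */
    uint32_t tag   : 30;              /* address tag (width is illustrative) */
    uint32_t data[WORDS_PER_LINE];    /* the cached words                    */
} cache_line_t;

/* On power-up / reset, every line is simply marked invalid (and clean). */
static void cache_reset(cache_line_t *lines, int n) {
    for (int i = 0; i < n; i++) {
        lines[i].valid = 0;
        lines[i].dirty = 0;
    }
}
```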

Exploiting Spatial Locality

Storing and comparing the address, or tag (part of the address), is expensive. So far we have assumed that each address relates to a single data item (byte or word). We can instead use a wider cache 'line' and store more data per address/tag. Spatial locality suggests we will make use of it (e.g. a series of instructions).
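A small, hedged example of the access pattern this exploits: sequential traversal touches every word in a fetched line, while a large stride wastes most of it. The array size and stride here are arbitrary illustrative choices.

```c
#include <stddef.h>

#define N (1 << 20)
static int a[N];

/* Sequential walk: consecutive words share a cache line, so after one
 * miss fills the line, the accesses to the rest of the line all hit. */
long sum_sequential(void) {
    long s = 0;
    for (size_t i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* Strided walk: with a stride larger than the line size, every access
 * touches a new line and the extra words fetched are never used. */
long sum_strided(size_t stride) {
    long s = 0;
    for (size_t i = 0; i < N; i += stride)
        s += a[i];
    return s;
}
```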

Direct Mapped Cache (2 words per line)

[Figure: the address is split into tag and index; the index selects a tag RAM entry whose stored tag is compared with the address tag to signal a hit, and a multiplexer selects Word 0 or Word 1 as the output data.]

Multiple Word Line Size

Now the bottom bits of the address are used to select which word within the line. This can be used with fully or set associative caches as well. A typical line size is 16 or 32 bytes, e.g. 4 or 8 32-bit words (i.e. 2 or 4 64-bit words in 64-bit architectures). Transfers from RAM are done in 'blocks', usually equal to the line size (using burst-mode memory access).
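The word-select logic amounts to carving bit fields out of the address. Below is a minimal sketch for a direct-mapped cache with 32-bit addresses; the parameter values (4-byte words, 8 words per line, 256 lines) are illustrative assumptions rather than figures from the lecture.

```c
#include <stdint.h>

/* Illustrative parameters (not from the lecture): */
#define ALIGN_BITS 2    /* 4-byte words: byte offset within a word */
#define WORD_BITS  3    /* 8 words per line: selects word in line  */
#define INDEX_BITS 8    /* 256 lines in the cache                  */

/* Decompose a 32-bit address into the fields a cache would use. */
static inline uint32_t word_in_line(uint32_t addr) {
    return (addr >> ALIGN_BITS) & ((1u << WORD_BITS) - 1);
}
static inline uint32_t line_index(uint32_t addr) {
    return (addr >> (ALIGN_BITS + WORD_BITS)) & ((1u << INDEX_BITS) - 1);
}
static inline uint32_t tag_of(uint32_t addr) {
    return addr >> (ALIGN_BITS + WORD_BITS + INDEX_BITS);
}
```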

The Effect of Line Size

Spatial locality means that if we access data, then data close by will often be accessed soon after. So a larger line size means we get that nearby data into the cache and avoid misses. But if the line size is too big:
- some of the fetched data may never be used
- it displaces other, possibly useful, data
- larger RAM accesses take longer

The Effect of Line Size (typical characteristic)

[Figure: miss rate plotted against line size; the miss rate falls as the line grows, then rises again once lines become too large.]

Separate Instruction & Data (I&D) Caches

- An instruction fetch occurs on every instruction; a data access on roughly every 3 instructions.
- Instructions and data usually occupy separate address areas.
- Access patterns differ: instruction access runs in serial sections, so the I-cache can use lower associativity.
- Separate caches give better utilization.
- This arrangement is called the 'Harvard' architecture.

Split Level 1 (L1) Caches

[Figure: the CPU fetches instructions from an on-chip L1 instruction cache and reads/writes data through an on-chip L1 data cache; both connect to RAM (main memory).]

Multiple Level Caches (1)

- Bigger caches have lower miss rates.
- As chips get bigger we could build bigger caches to perform better.
- But bigger caches always run slower, and the L1 cache needs to run at processor speed.
- Instead, put another cache (Level 2) between L1 and RAM.

Multiple Level Caches

[Figure: as before, but with an L2 cache placed between the split L1 instruction/data caches and RAM.]

Multiple Level Caches (2)

- L2 cache is typically 16x bigger than L1.
- L2 cache is typically 4x slower than L1 (but still 10x faster than RAM).
- If only 1 in 50 accesses misses in L1, and similarly in L2, then only a very small number of accesses go all the way to RAM.
- It is not quite that easy in practice, but it works well.

Multiple Level Caches (3)

- Vital to the performance of modern processors.
- L2 is usually shared by L1I and L1D.
- The replacement strategy and write policy obviously get more complex.

Example: Xeon E (available 2013 Q2)

- 4-core, 8-thread
- Per-core private caches:
  - 32KB L1 I-cache
  - 32KB L1 D-cache
  - 256KB L2 cache (I+D)
- 8MB L3 cache (I+D, shared by all cores)
- 2-channel DDR3

Cache Address Splitting (exercise)

[Figure: a 32-bit address split into Tag | Index | Word id | Align fields, driving a tag RAM, a comparator producing 'hit', and a multiplexer over Word 0 ... Word 7. Question: how many bits go to each field for an 8MB cache?]

Cache Address Splitting (solution)

For an 8MB cache with 8 x 32-bit words per line and a 32-bit address:
- Align: 2 bits (byte within a 32-bit word)
- Word id: 3 bits (selects one of Word 0 ... Word 7)
- Index: 18 bits (8MB / 32 bytes per line = 256K lines)
- Tag: the remaining 32 - 18 - 3 - 2 = 9 bits
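A quick sanity check of this arithmetic, as a hedged sketch; the parameters are exactly those assumed above (8MB cache, 8 x 32-bit words per line, 32-bit addresses).

```c
#include <stdio.h>

/* Number of bits needed to index n choices (n a power of 2). */
static int log2i(unsigned long n) {
    int b = 0;
    while (n > 1) { n >>= 1; b++; }
    return b;
}

int main(void) {
    const unsigned long cache_bytes    = 8UL << 20;  /* 8MB cache        */
    const unsigned long bytes_per_word = 4;          /* 32-bit words     */
    const unsigned long words_per_line = 8;          /* Word 0 .. Word 7 */
    const int addr_bits = 32;

    int align = log2i(bytes_per_word);                                  /* 2  */
    int word  = log2i(words_per_line);                                  /* 3  */
    int index = log2i(cache_bytes / (bytes_per_word * words_per_line)); /* 18 */
    int tag   = addr_bits - index - word - align;                       /* 9  */

    printf("tag=%d index=%d word=%d align=%d\n", tag, index, word, align);
    return 0;
}
```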

Cache Example

Assume a CPU with a simple L1 cache only:
- L1 cache 98% hit rate
- L1 access time = 1 CPU cycle
- RAM access = 50 cycles
- Suggestion: consider 100 accesses

What is the effective overall memory access time? Assume the CPU makes a memory access (fetch) every cycle.

Cache Example (solution)

For every 100 accesses:
- 98 hit in the cache: 98 * 1 cycle = 98 cycles
- 2 miss in the cache and go to RAM: 2 * (1 + 50) cycles = 102 cycles
- Total: 98 + 102 = 200 cycles

Average access time = 200/100 = 2 cycles, so on average the CPU runs at only half speed.
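The same calculation expressed as a tiny C helper: this is the standard average-memory-access-time formula, with the timing parameters taken from the example above.

```c
#include <stdio.h>

/* Average access time for one cache level backed by a slower memory:
 * every access pays hit_time; misses additionally pay miss_penalty. */
static double amat(double hit_rate, double hit_time, double miss_penalty) {
    return hit_time + (1.0 - hit_rate) * miss_penalty;
}

int main(void) {
    /* 98% L1 hit rate, 1-cycle L1 access, 50-cycle RAM access */
    printf("L1-only: %.2f cycles\n", amat(0.98, 1.0, 50.0));  /* prints 2.00 */
    return 0;
}
```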

Two Level Cache

Now assume an L2 cache between L1 and RAM:
- L2 access time = 4 cycles
- L2 hit rate = 90%

Considering L2 alone, every 100 accesses take (90 * 4) + 10 * (4 + 50) = 900 cycles, so the average access on the L2 side = 9 cycles.

Two Level Cache Example

Assume a CPU with L1 and L2 caches:
- L1: 98% hit rate, 1 CPU cycle
- L2 + RAM: average 9 CPU cycles
- Suggestion: consider 100 accesses

What is the effective overall memory access time? Assume the CPU makes a memory access (fetch) every cycle.

Two Level Cache Example (solution)

Back at L1, every 100 accesses take (98 * 1) + 2 * (1 + 9) = 118 cycles. Average access = 1.18 cycles, so the CPU now runs at approximately 85% of its potential full speed.

Alternatively... consider 1000 accesses:
- 980 hit in L1 (98%)
- 18 hit in L2 (90% of the remaining 20)
- 2 go to main memory

So the total access time is 980 * 1 + 18 * (1 + 4) + 2 * (1 + 4 + 50) = 1180 cycles. Average = 1180/1000 = 1.18 cycles.
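Extending the earlier sketch to two levels reproduces both routes to the same 1.18-cycle figure; again a hedged illustration using only the example's parameters.

```c
#include <stdio.h>

int main(void) {
    /* Example parameters: 98% L1 hit (1 cycle), 90% L2 hit (4 cycles),
     * 50-cycle RAM. */
    double l1_hit = 0.98, l2_hit = 0.90;
    double t_l1 = 1.0, t_l2 = 4.0, t_ram = 50.0;

    /* Route 1: fold L2+RAM into a single average miss penalty for L1
     * (this is the 9-cycle figure from the Two Level Cache slide). */
    double l2_side = l2_hit * t_l2 + (1.0 - l2_hit) * (t_l2 + t_ram); /* 9.0 */
    double avg1 = t_l1 + (1.0 - l1_hit) * l2_side;

    /* Route 2: count 1000 accesses explicitly, as on the last slide. */
    double total = 980 * t_l1 + 18 * (t_l1 + t_l2)
                 + 2 * (t_l1 + t_l2 + t_ram);                         /* 1180 */
    double avg2 = total / 1000.0;

    printf("avg1 = %.2f cycles, avg2 = %.2f cycles\n", avg1, avg2);   /* 1.18 */
    return 0;
}
```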