Prof. Hakim Weatherspoon

Presentation transcript:

Caches 3. Prof. Hakim Weatherspoon, CS 3410, Spring 2015, Computer Science, Cornell University. See P&H Chapter: 5.1-5.4, 5.8, 5.10, 5.15; also 5.13 & 5.17.

Overview. Writing to caches: policies, performance. Cache tradeoffs and performance.

What about Stores? Where should you write the result of a store? If that memory location is in the cache: send it to the cache. Should we also send it to memory right away (write-through policy), or wait until we evict the block (write-back policy)? If it is not in the cache: allocate the line, i.e. put it in the cache (write-allocate policy), or write it directly to memory without allocation (no-write-allocate policy)?

Cache Write Policies. Q: How to write data? [Figure: the CPU sends addr and data to the cache (SRAM), which is backed by memory (DRAM).] If data is already in the cache: No-Write: writes invalidate the cache and go directly to memory. Write-Through: writes go to main memory and cache. Write-Back: the CPU writes only to the cache; the cache writes to main memory later (when the block is evicted).

Write Allocation Policies. Q: How to write data? [Figure: the CPU sends addr and data to the cache (SRAM), which is backed by memory (DRAM).] If data is not in the cache: Write-Allocate: allocate a cache line for the new data (and maybe write-through). No-Write-Allocate: ignore the cache, just go to main memory.
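The hit and miss policies above combine into a small decision table. The following C sketch is one way to write it down; the type and function names (`hit_policy`, `miss_policy`, `store_action`, `on_store`) are hypothetical, and the No-Write policy from the previous slide is left out for brevity.

```c
#include <stdbool.h>

/* Hypothetical names: the slides describe these policies only in prose. */
typedef enum { WRITE_THROUGH, WRITE_BACK } hit_policy;
typedef enum { WRITE_ALLOCATE, NO_WRITE_ALLOCATE } miss_policy;

typedef struct {
    bool fetch_block;      /* read the block from memory into the cache */
    bool write_cache;      /* update the cached copy */
    bool write_memory_now; /* send the store to memory immediately */
} store_action;

/* Decide what a single store does, given whether it hit in the cache. */
store_action on_store(bool hit, hit_policy hp, miss_policy mp)
{
    store_action a = { false, false, false };
    if (!hit && mp == NO_WRITE_ALLOCATE) {
        a.write_memory_now = true;   /* ignore the cache, go to memory */
        return a;
    }
    a.fetch_block = !hit;            /* write-allocate brings the line in */
    a.write_cache = true;
    a.write_memory_now = (hp == WRITE_THROUGH);
    return a;
}
```

For example, `on_store(false, WRITE_BACK, WRITE_ALLOCATE)` fetches the block and updates only the cache, deferring the memory write until eviction.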

No write (contacts analogy). 2-way set associative cache with 8 sets; use the first letter of the name to index the set, and the initial as the offset. 4 bytes each, 8 bytes per line. Block size: 2 contact numbers. [Figure: a 16-entry contacts list (Baker, J. through Zhang, W.) alongside a speed dial with keys 2 to 9 (ABC, DEF, GHI, JKL, MNO, PQRS, TUV, WXYZ); the entry for Harris, F. in the contacts list changes from 666-666-6666 to 123-456-7890.]

Write through (contacts analogy). 2-way set associative cache with 8 sets; use the first letter of the name to index the set, and the initial as the offset. 4 bytes each, 8 bytes per line. Block size: 2 contact numbers. [Figure: a 16-entry contacts list (Baker, J. through Zhang, W.) alongside a speed dial with keys 2 to 9 (ABC, DEF, GHI, JKL, MNO, PQRS, TUV, WXYZ); the entry for Harris, F. in the contacts list changes from 666-666-6666 to 123-456-7890.]

Write back (contacts analogy). 2-way set associative cache with 8 sets; use the first letter of the name to index the set, and the initial as the offset. 4 bytes each, 8 bytes per line. Block size: 2 contact numbers. [Figure: a 16-entry contacts list (Baker, J. through Zhang, W.) alongside a speed dial with keys 2 to 9 (ABC, DEF, GHI, JKL, MNO, PQRS, TUV, WXYZ).]

Next Goal: How does a write-through cache work? Assume write-allocate.

Handling Stores (Write-Through). Using byte addresses in this example. Address bus = 5 bits. Fully associative cache: 2 cache lines, 2-word blocks, 4-bit tag field, 1-bit block offset field. Each line holds V, tag, and data. Assume write-allocate policy. Reference sequence: LB $1 ← M[1]; LB $2 ← M[7]; SB $2 → M[0]; SB $1 → M[5]; LB $2 ← M[10]; SB $1 → M[5]; SB $1 → M[10]. Initial memory, M[0] to M[15]: 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225. Misses: 0. Hits: 0.

Write-Through (REF 1). LB $1 ← M[1], Addr: 00001 (tag 0000, block offset 1). Miss: the block M[0..1] = 78, 29 is loaded into a cache line with V = 1; $1 = 29. Misses: 1. Hits: 0.

Write-Through (REF 2). LB $2 ← M[7], Addr: 00111 (tag 0011, block offset 1). Miss: the block M[6..7] = 162, 173 is loaded into the other line; $2 = 173. Misses: 2. Hits: 0.

Write-Through (REF 3). SB $2 → M[0], Addr: 00000. Hit on tag 0000: the line becomes 173, 29 and M[0] is updated to 173 in memory as well. Misses: 2. Hits: 1.

Write-Through (REF 4). SB $1 → M[5], Addr: 00101 (tag 0010). Miss: the LRU line (tag 0011) is replaced by the block M[4..5] = 71, 150; the store then writes 29 into the line (71, 29) and into M[5]. Misses: 3. Hits: 1.

Write-Through (REF 5). LB $2 ← M[10], Addr: 01010 (tag 0101). Miss: the LRU line (tag 0000) is replaced by the block M[10..11] = 33, 28 without writing anything to memory; $2 = 33. Misses: 4. Hits: 1.

Write-Through (REF 6). SB $1 → M[5], Addr: 00101. Hit on tag 0010: 29 is written to the cache line and to M[5]. Misses: 4. Hits: 2.

Write-Through (REF 7). SB $1 → M[10], Addr: 01010. Hit on tag 0101: the line becomes 29, 28 and M[10] is updated to 29. Misses: 4. Hits: 3.

How Many Memory References? Write-through performance: each miss (read or write) reads a block from memory: 4 misses, so 8 memory reads. Each store writes an item to memory: 4 memory writes. Evictions don't need to write to memory, so there is no need for a dirty bit.

Write-Through (REF 8, 9). Two further stores both hit, and each is written through to memory. Misses: 4. Hits: 5.

Summary: Write Through. Write-through policy with write allocate. Cache miss: read the entire block from memory. Write: write only the updated item to memory. Eviction: no need to write to memory.

Next Goal: Write-Through vs. Write-Back. Can we also design the cache NOT to write all stores immediately to memory? Keep the most current copy in the cache, and update memory when that data is evicted (write-back policy). Do we need to write back all evicted lines? No, only blocks that have been stored into (written).

Write-Back Meta-Data. Line layout: V, D, Tag, Byte 1, Byte 2, … Byte N. V = 1 means the line has valid data. D = 1 means the bytes are newer than main memory. When allocating a line: set V = 1, D = 0, fill in Tag and Data. When writing a line: set D = 1. When evicting a line: if D = 0, just set V = 0; if D = 1, write back Data, then set D = 0, V = 0.
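The allocate, write, and evict rules above map directly onto a small record with two flag bits. This is a minimal C sketch under assumptions: the names (`cache_line`, `line_allocate`, `line_write`, `line_evict`) are hypothetical, and the 2-byte line size is chosen only to match the worked example's blocks.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { LINE_BYTES = 2 };  /* assumed line size, matching the example's 2-byte blocks */

typedef struct {
    bool valid;                 /* V: the line holds valid data */
    bool dirty;                 /* D: the bytes are newer than main memory */
    uint32_t tag;
    uint8_t data[LINE_BYTES];
} cache_line;

/* Allocate: set V = 1, D = 0, fill in tag and data from memory. */
void line_allocate(cache_line *l, uint32_t tag, const uint8_t *block)
{
    l->valid = true;
    l->dirty = false;
    l->tag = tag;
    memcpy(l->data, block, LINE_BYTES);
}

/* Write: update one byte in the line and set D = 1. */
void line_write(cache_line *l, unsigned offset, uint8_t value)
{
    l->data[offset % LINE_BYTES] = value;
    l->dirty = true;
}

/* Evict: write the data back only if D = 1, then clear D and V.
 * Returns true when a write-back to memory happened. */
bool line_evict(cache_line *l, uint8_t *block)
{
    bool wrote_back = l->valid && l->dirty;
    if (wrote_back)
        memcpy(block, l->data, LINE_BYTES);
    l->dirty = false;
    l->valid = false;
    return wrote_back;
}
```

A clean line is dropped for free on eviction; only a line that was stored into costs a memory write.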

Write-Back Example: How does a write-back cache work? Assume write-allocate.

Handling Stores (Write-Back). Using byte addresses in this example. Address bus = 5 bits. Fully associative cache: 2 cache lines, 2-word blocks, 4-bit tag field, 1-bit block offset field. Each line holds V, D, tag, and data. Assume write-allocate policy. Reference sequence: LB $1 ← M[1]; LB $2 ← M[7]; SB $2 → M[0]; SB $1 → M[5]; LB $2 ← M[10]; SB $1 → M[5]; SB $1 → M[10]. Initial memory, M[0] to M[15]: 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225. Misses: 0. Hits: 0.

Write-Back (REF 1). LB $1 ← M[1], Addr: 00001 (tag 0000, block offset 1). Miss: the block M[0..1] = 78, 29 is loaded with V = 1, D = 0; $1 = 29. Misses: 1. Hits: 0.

Write-Back (REF 2). LB $2 ← M[7], Addr: 00111 (tag 0011, block offset 1). Miss: the block M[6..7] = 162, 173 is loaded into the other line with D = 0; $2 = 173. Misses: 2. Hits: 0.

Write-Back (REF 3). SB $2 → M[0], Addr: 00000. Hit on tag 0000: the line becomes 173, 29 and D is set to 1. Memory is not updated; M[0] is still 78. Misses: 2. Hits: 1.

Write-Back (REF 4). SB $1 → M[5], Addr: 00101 (tag 0010). Miss: the LRU line (tag 0011, D = 0) is evicted without a write-back and replaced by the block M[4..5] = 71, 150; the store then makes the line 71, 29 with D = 1. Misses: 3. Hits: 1.

Write-Back (REF 5). LB $2 ← M[10], Addr: 01010 (tag 0101). Miss: the LRU line (tag 0000) is dirty, so its block 173, 29 is first written back to M[0..1]; then the block M[10..11] = 33, 28 is loaded with D = 0; $2 = 33. Misses: 4. Hits: 1.

Write-Back (REF 6). SB $1 → M[5], Addr: 00101. Hit on tag 0010, which is already dirty: 29 is written to the cache line only. Misses: 4. Hits: 2.

Write-Back (REF 7). SB $1 → M[10], Addr: 01010. Hit on tag 0101: the line becomes 29, 28 and D is set to 1. Misses: 4. Hits: 3.

How Many Memory References? Write-back performance: each miss (read or write) reads a block from memory: 4 misses, so 8 memory reads. Some evictions write a block to memory: 1 dirty eviction, so 2 memory writes (plus 2 dirty evictions later, so 4 more memory writes).

Write-Back (REF 8, 9). Two further stores both hit in lines that are already dirty, so nothing is written to memory. Misses: 4. Hits: 5.

How Many Memory References? Write-back performance: each miss (read or write) reads a block from memory: 4 misses, so 8 memory reads. Some evictions write a block to memory: 1 dirty eviction, so 2 memory writes (plus 2 dirty evictions later, so 4 more memory writes). By comparison, write-through was: reads: eight words; writes: 4/6/8/10/12/… words, growing with every further store.
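The counts for both policies can be reproduced with a small simulator. The C sketch below models only tags and memory traffic, not data, for a 2-line fully associative LRU cache with write-allocate. The names (`mem_ref`, `stats`, `simulate`) are hypothetical, and the reference sequence is taken to be addresses 1, 7, 0, 5, 10, 5, 10, following the REF 1 to REF 7 frames.

```c
#include <stdbool.h>

enum { NLINES = 2, BLOCK = 2 };   /* 2 lines, 2 items per block, as in the example */

typedef struct { bool is_store; unsigned addr; } mem_ref;
typedef struct { int misses, hits, mem_reads, mem_writes; } stats;

/* Fully associative, LRU, write-allocate cache; counts memory traffic only. */
stats simulate(const mem_ref *refs, int n, bool write_back)
{
    struct { bool valid, dirty; unsigned tag; int last_use; } line[NLINES] = {0};
    stats s = {0, 0, 0, 0};

    for (int t = 0; t < n; t++) {
        unsigned tag = refs[t].addr / BLOCK;
        int hit = -1;
        for (int i = 0; i < NLINES; i++)
            if (line[i].valid && line[i].tag == tag)
                hit = i;

        if (hit >= 0) {
            s.hits++;
        } else {
            /* Pick an invalid line if there is one, otherwise the LRU line. */
            int victim = 0;
            for (int i = 1; i < NLINES; i++) {
                int age_i = line[i].valid ? line[i].last_use : -1;
                int age_v = line[victim].valid ? line[victim].last_use : -1;
                if (age_i < age_v)
                    victim = i;
            }
            if (write_back && line[victim].valid && line[victim].dirty)
                s.mem_writes += BLOCK;   /* dirty eviction writes the whole block */
            s.misses++;
            s.mem_reads += BLOCK;        /* every miss fetches a whole block */
            line[victim].valid = true;
            line[victim].dirty = false;
            line[victim].tag = tag;
            hit = victim;
        }

        line[hit].last_use = t;
        if (refs[t].is_store) {
            if (write_back)
                line[hit].dirty = true;  /* defer the memory write */
            else
                s.mem_writes += 1;       /* write-through: one item per store */
        }
    }
    return s;
}
```

Running the seven references with `write_back` false gives 4 misses, 3 hits, 8 memory reads, and 4 memory writes; with `write_back` true the memory writes drop to 2, with two dirty lines still waiting in the cache.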

So is write back just better? What are other performance tradeoffs between write-through and write-back? How can we further reduce penalty for cost of writes to memory?

Performance Tradeoffs Q: Hit time: write-through vs. write-back? A: Write-through slower on writes Q: Miss penalty: write-through vs. write-back? A: Write-back slower on evictions

Performance: An Example. Write-back versus write-through. Assume: a large associative cache (with LRU, LFU, etc.), 16-byte lines, and arrays of N words. Loop A: for (i=1; i<n; i++) A[0] += A[i]; Reads miss on every W'th A[i]. Write-through: n/16 reads, n writes. Write-back: n/16 reads, 1 write (a single eviction). Loop B: for (i=0; i<n; i++) B[i] = A[i]; Reads miss on every W'th A[i] or B[i]. Write-through: 2 x n/16 reads, n writes. Write-back: 2 x n/16 reads, n writes.

Write Buffering. Q: Writes to main memory are slow! A: Use a write-back buffer: a small queue holding dirty lines. Add to the end upon eviction; remove from the front upon completion. Q: When does it help? A: Short bursts of writes (but not sustained writes). A: Fast eviction reduces miss penalty.
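The write-back buffer is a FIFO queue, which a fixed-size ring buffer captures well. This is a minimal C sketch under assumptions: the depth of 4, the `dirty_line` layout, and the names `wb_push` and `wb_pop` are all hypothetical, since the slide only says the queue is small.

```c
#include <stdbool.h>
#include <stdint.h>

enum { WB_SLOTS = 4 };   /* assumed depth; the slide only says "small" */

typedef struct { uint32_t tag; uint32_t data; } dirty_line;   /* hypothetical layout */

typedef struct {
    dirty_line slot[WB_SLOTS];
    int head;    /* index of the oldest entry */
    int count;   /* number of queued entries */
} write_buffer;

/* Eviction: append to the end. Returns false when the buffer is full,
 * in which case the cache would have to stall. */
bool wb_push(write_buffer *b, dirty_line l)
{
    if (b->count == WB_SLOTS)
        return false;
    b->slot[(b->head + b->count) % WB_SLOTS] = l;
    b->count++;
    return true;
}

/* Memory write completed: remove from the front. Returns false when empty. */
bool wb_pop(write_buffer *b, dirty_line *out)
{
    if (b->count == 0)
        return false;
    *out = b->slot[b->head];
    b->head = (b->head + 1) % WB_SLOTS;
    b->count--;
    return true;
}
```

The full-buffer case shows why the buffer helps with short bursts but not sustained writes: once evictions arrive faster than memory drains them, `wb_push` fails and the miss penalty is exposed again.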

Write-through vs. Write-back. Write-through is slower, but simpler (memory is always consistent). Write-back is almost always faster: the write-back buffer hides the large eviction cost. But what about multiple cores with separate caches but sharing memory? Write-back requires a cache coherency protocol: the cores can have inconsistent views of memory and need to "snoop" in each other's caches. These protocols are extremely complex and very hard to get right.

Cache coherency. Q: Multiple readers and writers? A: Potentially inconsistent views of memory. [Figure: four CPUs with L1 caches, two L2 caches, memory, disk, and network; copies of A sit in several places while one CPU holds a modified A'.] A cache coherency protocol may need to snoop on other CPUs' cache activity, invalidate a cache line when another CPU writes, and flush write-back caches before another CPU reads. Or the reverse: before writing/reading… Extremely complex protocols, very hard to get right.

Summary: Write-Through and Write-Back. Write-through policy with write allocate. Cache miss: read the entire block from memory. Write: write only the updated item to memory. Eviction: no need to write to memory. Slower, but cleaner. Write-back policy with write allocate. Cache miss: read the entire block from memory, but may need to write a dirty cacheline first. Write: nothing to memory. Eviction: have to write the entire cacheline to memory, because we don't know what is dirty (only 1 dirty bit). Faster, but complicated with multicore.

Next Goal. Performance: What is the average memory access time (AMAT) for a cache? AMAT = %hit x hit time + %miss x miss time.

Cache Performance Example. Average Memory Access Time (AMAT). Cache performance (very simplified): L1 (SRAM): 512 x 64-byte cache lines, direct mapped. Data cost: 3 cycles per word access. Lookup cost: 2 cycles. Mem (DRAM): 4 GB. Data cost: 50 cycles for the first word, plus 3 cycles per subsequent word. A line holds 16 words (i.e. 64 / 4 = 16). Hit rate 90%: .90 x 5 + .10 x 100 = 4.5 + 10.0 = 14.5 cycles on average. Hit rate 95%: .95 x 5 + .05 x 100 = 9.75 cycles on average.

Cache Performance Example. Average Memory Access Time (AMAT). Cache performance (very simplified): L1 (SRAM): 512 x 64-byte cache lines, direct mapped. Data cost: 3 cycles per word access. Lookup cost: 2 cycles. Mem (DRAM): 4 GB. Data cost: 50 cycles for the first word, plus 3 cycles per subsequent word. A line holds 16 words (i.e. 64 / 4 = 16). AMAT = %hit x hit time + %miss x miss time. Hit time = 2 (lookup) + 3 (data) = 5 cycles. Miss time = hit time + 50 (first word) + 15 x 3 (remaining words) = 100 cycles. If %hit = 90%, then AMAT = .9 x 5 + .1 x 100 = 14.5 cycles. If %hit = 95%, then AMAT = .95 x 5 + .05 x 100 = 9.75 cycles.
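The arithmetic on this slide is easy to check in code. The C sketch below uses hypothetical helper names (`amat`, `miss_time`) and encodes the slide's simplified cost model: a miss pays the L1 hit time, the first DRAM word, and then each remaining word of the line.

```c
#include <assert.h>

/* AMAT = hit_rate * hit_time + (1 - hit_rate) * miss_time, in cycles. */
double amat(double hit_rate, double hit_time, double miss_time_cycles)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return hit_rate * hit_time + (1.0 - hit_rate) * miss_time_cycles;
}

/* Miss time in the slide's simplified model: the L1 hit time, the first
 * DRAM word, then the remaining words of the line. */
double miss_time(double hit_time, double first_word, double next_word,
                 int words_per_line)
{
    assert(words_per_line >= 1);
    return hit_time + first_word + (words_per_line - 1) * next_word;
}
```

With a hit time of 5 cycles, `miss_time(5, 50, 3, 16)` is 100 cycles, and `amat(0.90, 5, 100)` and `amat(0.95, 5, 100)` come to about 14.5 and 9.75 cycles respectively.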

Multi Level Caching. Cache performance (very simplified): L1 (SRAM): 512 x 64-byte cache lines, direct mapped. Hit time: 5 cycles. L2 cache: bigger. Hit time: 20 cycles. Mem (DRAM): 4 GB. Hit rate: 90% in L1, 90% in L2. Often: L1 is fast and direct mapped, L2 is bigger and has higher associativity. AMAT = %hit x hit time + %miss x miss time. AMAT = .9 x 5 + .1 x (.9 x 20 + .1 x 120) = 4.5 + .1 x (18 + 12) = 7.5 cycles.
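The two-level formula nests one AMAT inside another: the L1 miss time is itself the average access time of L2. A minimal C sketch, with `amat2` as a hypothetical helper name and the 120-cycle memory time taken from the slide:

```c
/* Two-level AMAT: an L1 miss is served by L2, and an L2 miss by memory.
 * All times are in cycles. */
double amat2(double l1_hit_rate, double l1_hit_time,
             double l2_hit_rate, double l2_hit_time, double mem_time)
{
    /* Average cost of an L1 miss = average access time seen at L2. */
    double l1_miss_time = l2_hit_rate * l2_hit_time
                        + (1.0 - l2_hit_rate) * mem_time;
    return l1_hit_rate * l1_hit_time + (1.0 - l1_hit_rate) * l1_miss_time;
}
```

Calling `amat2(0.9, 5, 0.9, 20, 120)` reproduces the slide's figure of about 7.5 cycles.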

Next Goal: How to decide on cache organization. Q: How to decide block size? A: Try it and see. But: it depends on cache size, workload, associativity, … Experimental approach!

Experimental Results. Top to bottom: a bigger cache means fewer misses. For a given total cache size, larger block sizes mean fewer lines, so fewer tags (and smaller tags for associative caches), so less overhead, and fewer cold misses (within-block "prefetching"). But there are fewer blocks available (for scattered accesses!), so more conflicts, and the miss penalty (time to fetch a block) is larger.

Tradeoffs. For a given total cache size, larger block sizes mean fewer lines, so fewer tags and less overhead, and fewer cold misses (within-block "prefetching"). But also fewer blocks available (for scattered accesses!), so more conflicts, and a larger miss penalty (time to fetch a block). The same holds for other parameters: mostly experimental.

Associativity

Other Designs Multilevel caches

Performance Summary
Average memory access time (AMAT) depends on cache architecture and size:
    access time for hit, miss penalty, miss rate
Cache design is a very complex problem:
    Cache size, block size (aka line size)
    Number of ways of set-associativity (1, N, or fully associative)
    Eviction policy
    Number of levels of caching, parameters for each
    Separate I-cache from D-cache, or unified cache
    Prefetching policies / instructions
    Write policy

Cache Conscious Programming
// H = 12, W = 10
int A[H][W];
for (x = 0; x < W; x++)
    for (y = 0; y < H; y++)
        sum += A[y][x];
(Figure: grid of A with accesses numbered 1 to 12 in the order they occur.)
Every access is a cache miss! (unless entire matrix can fit in cache)

Cache Conscious Programming
// H = 12, W = 10
int A[H][W];
for (y = 0; y < H; y++)
    for (x = 0; x < W; x++)
        sum += A[y][x];
(Figure: grid of A with accesses numbered 1, 2, 3, …, 13, … in the order they occur.)
Block size = 4 → 75% hit rate
Block size = 8 → 87.5% hit rate
Block size = 16 → 93.75% hit rate
And you can easily prefetch to warm the cache
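
The difference between the two loop orders can be reproduced with a toy cache model. The sketch below is not from the slides: `count_hits` is a hypothetical function that simulates a direct-mapped cache, measures block size in ints, and assumes `A[0][0]` sits at the start of a block.

```c
#include <assert.h>

#define H 12
#define W 10
#define MAX_LINES 64

/* Count cache hits for one full traversal of an H x W int matrix through
 * a direct-mapped cache of num_lines lines, each block_words ints wide.
 * row_major != 0 walks A[y][x] with x in the inner loop; row_major == 0
 * walks with y in the inner loop (down each column). */
int count_hits(int num_lines, int block_words, int row_major)
{
    long tags[MAX_LINES];
    int hits = 0;

    assert(num_lines >= 1 && num_lines <= MAX_LINES && block_words >= 1);
    for (int i = 0; i < num_lines; i++)
        tags[i] = -1;                               /* -1 marks an empty line */

    for (int outer = 0; outer < (row_major ? H : W); outer++) {
        for (int inner = 0; inner < (row_major ? W : H); inner++) {
            int y = row_major ? outer : inner;
            int x = row_major ? inner : outer;
            long block = (y * W + x) / block_words; /* block holding A[y][x] */
            int line = (int)(block % num_lines);    /* direct-mapped index */
            if (tags[line] == block)
                hits++;
            else
                tags[line] = block;                 /* miss: fill the line */
        }
    }
    return hits;
}
```

With a single line of 4 ints, the row-major order gets 90 hits out of 120 accesses (75%) and the column-major order gets none, because consecutive column accesses are 10 ints apart and never share a block. With 8-int blocks the row-major count is 105 of 120 (87.5%). For this small matrix the 16-int case lands slightly below the slide's 93.75%, since 120 elements do not fill a whole number of 16-int blocks.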

A Real Example
Lenovo Yoga 2 Pro laptop, dual-core Intel i7-4500 CPU @ 2.4 GHz (purchased in 2014)
> dmidecode -t cache
Cache Information
    Socket Designation: L1 Cache
    Configuration: Enabled, Not Socketed, Level 1
    Operational Mode: Write Back
    Location: Internal
    Installed Size: 32 kB
    Maximum Size: 32 kB
    Supported SRAM Types: Synchronous
    Installed SRAM Type: Synchronous
    Speed: Unknown
    Error Correction Type: Single-bit ECC
    System Type: Instruction
    Associativity: 8-way Set-associative
    Socket Designation: L2 Cache
    Configuration: Enabled, Not Socketed, Level 2
    Installed Size: 256 kB
    Maximum Size: 256 kB
    System Type: Unified
Cache Information
    Socket Designation: L3 Cache
    Configuration: Enabled, Not Socketed, Level 3
    Operational Mode: Write Back
    Location: Internal
    Installed Size: 4096 kB
    Maximum Size: 4096 kB
    Supported SRAM Types: Synchronous
    Installed SRAM Type: Synchronous
    Speed: Unknown
    Error Correction Type: Single-bit ECC
    System Type: Unified
    Associativity: 16-way Set-associative

A Real Example
Dual-core 3.16GHz Intel (purchased in 2011)
> dmidecode -t cache
Cache Information
    Configuration: Enabled, Not Socketed, Level 1
    Operational Mode: Write Back
    Installed Size: 128 KB
    Error Correction Type: None
Cache Information
    Configuration: Enabled, Not Socketed, Level 2
    Operational Mode: Varies With Memory Address
    Installed Size: 6144 KB
    Error Correction Type: Single-bit ECC
> cd /sys/devices/system/cpu/cpu0; grep . cache/*/*
cache/index0/level:1
cache/index0/type:Data
cache/index0/ways_of_associativity:8
cache/index0/number_of_sets:64
cache/index0/coherency_line_size:64
cache/index0/size:32K
cache/index1/level:1
cache/index1/type:Instruction
cache/index1/ways_of_associativity:8
cache/index1/number_of_sets:64
cache/index1/coherency_line_size:64
cache/index1/size:32K
cache/index2/level:2
cache/index2/type:Unified
cache/index2/shared_cpu_list:0-1
cache/index2/ways_of_associativity:24
cache/index2/number_of_sets:4096
cache/index2/coherency_line_size:64
cache/index2/size:6144K

A Real Example
Dual-core 3.16GHz Intel (purchased in 2009)
Dual 32K L1 Instruction caches
    8-way set associative
    64 sets
    64 byte line size
Dual 32K L1 Data caches
    Same as above
Single 6M L2 Unified cache
    24-way set associative (!!!)
    4096 sets
4GB Main memory
1TB Disk
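
The sizes listed here follow from the geometry: capacity = ways x sets x line size. A small sketch, with hypothetical function names, that checks the numbers and shows how an address selects a set. The 64-byte line size for the L2 is assumed from the `coherency_line_size:64` entry in the sysfs listing, since this slide does not state it.

```c
#include <assert.h>

/* Total data capacity of a set-associative cache, in bytes. */
unsigned long cache_size_bytes(unsigned long ways, unsigned long sets,
                               unsigned long line_bytes)
{
    return ways * sets * line_bytes;
}

/* Set that an address maps to: drop the offset within the line, then
 * take the line number modulo the number of sets. */
unsigned long set_index(unsigned long addr, unsigned long sets,
                        unsigned long line_bytes)
{
    assert(sets > 0 && line_bytes > 0);
    return (addr / line_bytes) % sets;
}
```

`cache_size_bytes(8, 64, 64)` gives 32768 bytes (32K) for each L1, and `cache_size_bytes(24, 4096, 64)` gives 6291456 bytes (6144K, i.e. 6M) for the L2.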

By the end of the cache lectures…

Summary
Memory performance matters!
    often more than CPU performance
    … because it is the bottleneck, and not improving much
    … because most programs move a LOT of data
Design space is huge
    Gambling against program behavior
    Cuts across all layers: users → programs → os → hardware
Multi-core / Multi-Processor is complicated
    Inconsistent views of memory
    Extremely complex protocols, very hard to get right

Have a great Spring Break!!