Lecture 32: Chapter 5 Today’s topic –Cache performance assessment –Associative caches Reminder –HW8 due next Friday 11/21/2014 –HW9 due Wednesday 12/03/2014.

Presentation transcript:

Lecture 32: Chapter 5
Today’s topic
–Cache performance assessment
–Associative caches
Reminder
–HW8 due next Friday 11/21/2014
–HW9 due Wednesday 12/03/2014
–Grades so far

Direct Mapped Cache
Location determined by address
Direct mapped: only one choice
–(Block address) modulo (#Blocks in cache)
#Blocks is a power of 2
Use low-order address bits
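The mapping rule above can be sketched in a few lines of Python. This is only an illustration: the function name `direct_mapped_index` is hypothetical, and the check that both methods agree is there to show why a power-of-2 block count lets hardware use the low-order bits directly.

```python
def direct_mapped_index(block_address: int, num_blocks: int) -> int:
    """Return the cache index of a block in a direct-mapped cache.

    Hypothetical helper for illustration; not taken from the lecture.
    """
    # The low-order-bits shortcut is only valid for a power-of-2 block count.
    if num_blocks <= 0 or num_blocks & (num_blocks - 1) != 0:
        raise ValueError("num_blocks must be a power of 2")

    index_by_modulo = block_address % num_blocks       # the rule on the slide
    index_by_mask = block_address & (num_blocks - 1)   # keep low-order bits only
    assert index_by_modulo == index_by_mask
    return index_by_mask


if __name__ == "__main__":
    # Block address 12 = 0b1100 in an 8-block cache: the low 3 bits are 0b100.
    print(direct_mapped_index(12, 8))
```

Because the modulo by a power of 2 equals a bit mask, no divider is needed to find the cache entry.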

Address Subdivision (figure slide; the diagram is not included in the transcript)
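Since the figure is missing, here is a minimal sketch of how a byte address could be divided into tag, index, and offset fields for a direct-mapped cache. The field widths used in the example (10 index bits, 2 offset bits) and the name `split_address` are assumptions chosen for illustration, not values taken from the slide.

```python
def split_address(address, index_bits, offset_bits):
    """Split a byte address into (tag, index, offset).

    Hypothetical helper; field widths are supplied by the caller.
    """
    offset = address & ((1 << offset_bits) - 1)                 # lowest bits
    index = (address >> offset_bits) & ((1 << index_bits) - 1)  # middle bits
    tag = address >> (offset_bits + index_bits)                 # remaining high bits
    return tag, index, offset


if __name__ == "__main__":
    # Assumed layout: 1024 entries (10 index bits), 4-byte blocks (2 offset bits).
    tag, index, offset = split_address(0x12345678, index_bits=10, offset_bits=2)
    print(hex(tag), hex(index), offset)
```

The index selects the cache entry, and the stored tag is compared with the tag field of the address to decide hit or miss.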

Measuring Cache Performance
Components of CPU time
–Program execution cycles
  Includes cache hit time
–Memory stall cycles
  Mainly from cache misses
With simplifying assumptions:
–Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty

Cache Performance Example
Given
–I-cache miss rate = 2%
–D-cache miss rate = 4%
–Miss penalty = 100 cycles
–Base CPI (ideal cache) = 2
–Loads & stores are 36% of instructions
Miss cycles per instruction
–I-cache: 0.02 × 100 = 2
–D-cache: 0.36 × 0.04 × 100 = 1.44
Actual CPI = 2 + 2 + 1.44 = 5.44
–A CPU with an ideal cache is 5.44/2 = 2.72 times faster
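The arithmetic in this example can be checked with a short Python sketch. The function name `actual_cpi` and its parameter names are hypothetical; the numbers are the ones given on the slide, and the model assumes every instruction makes exactly one instruction-cache access.

```python
def actual_cpi(base_cpi, icache_miss_rate, dcache_miss_rate,
               mem_access_fraction, miss_penalty):
    """CPI including memory stall cycles (hypothetical helper)."""
    # Every instruction is fetched, so the I-cache is accessed once per instruction.
    icache_stalls = icache_miss_rate * miss_penalty
    # Only loads and stores access the D-cache.
    dcache_stalls = mem_access_fraction * dcache_miss_rate * miss_penalty
    return base_cpi + icache_stalls + dcache_stalls


if __name__ == "__main__":
    cpi = actual_cpi(base_cpi=2, icache_miss_rate=0.02, dcache_miss_rate=0.04,
                     mem_access_fraction=0.36, miss_penalty=100)
    print(f"Actual CPI: {cpi:.2f}")
    print(f"Slowdown vs. ideal cache: {cpi / 2:.2f}x")
```

Changing the miss penalty or miss rates in the call shows how quickly stall cycles come to dominate the base CPI.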

Average Access Time
Hit time is also important for performance
Average memory access time (AMAT)
–AMAT = Hit time + Miss rate × Miss penalty
Example
–CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
–AMAT = 1 + 0.05 × 20 = 2 cycles = 2ns
  2 cycles per instruction
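The AMAT formula is easy to evaluate directly. The sketch below uses a hypothetical function `amat_cycles` and the slide's numbers; it works in cycles and converts to nanoseconds using the 1ns clock.

```python
def amat_cycles(hit_time_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time in cycles (hypothetical helper)."""
    return hit_time_cycles + miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    clock_ns = 1.0  # 1ns clock, from the slide
    cycles = amat_cycles(hit_time_cycles=1, miss_rate=0.05, miss_penalty_cycles=20)
    print(f"AMAT = {cycles:.2f} cycles = {cycles * clock_ns:.2f} ns")
```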

Associative Caches
Fully associative
–Allow a given block to go in any cache entry
–Requires all entries to be searched at once
–Comparator per entry (expensive)
n-way set associative
–Each set contains n entries
–Block number determines which set
  (Block number) modulo (#Sets in cache)
–Search all entries in a given set at once
–n comparators (less expensive)

Associative Cache Example
Where the block with address 12 is placed depends on the type of cache and its degree of associativity.
In a direct-mapped cache of 8 blocks, memory block 12 maps to cache block 4: 12 = (1100)₂, so (12 modulo 8) = 4 = (100)₂ gives the location in the cache.
In a 2-way set-associative cache of the same size, there are 4 sets. The block therefore goes into set (12 modulo 4) = 0 and may occupy either entry of that set.
In a fully associative cache, memory block 12 can appear in any of the eight cache blocks.
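The three placements for block 12 can be reproduced with a small Python sketch. The function `candidate_blocks` is hypothetical, and it assumes the entries of a set are numbered contiguously (set s holds cache blocks s × ways through s × ways + ways − 1), which is a numbering convention chosen here rather than something stated in the lecture.

```python
def candidate_blocks(block_address, num_blocks, ways):
    """Return (set index, list of cache blocks that may hold the block).

    Hypothetical helper. ways=1 is direct mapped; ways=num_blocks is
    fully associative. Assumes each set's entries are numbered contiguously.
    """
    num_sets = num_blocks // ways
    set_index = block_address % num_sets   # (Block number) modulo (#Sets)
    first = set_index * ways
    return set_index, list(range(first, first + ways))


if __name__ == "__main__":
    for name, ways in [("direct mapped", 1), ("2-way", 2), ("fully associative", 8)]:
        set_index, blocks = candidate_blocks(12, num_blocks=8, ways=ways)
        print(f"{name}: set {set_index}, candidate cache blocks {blocks}")
```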

Spectrum of Associativity
For a cache with 8 entries (figure slide; the diagram is not included in the transcript)

Associativity Example
Compare 4-block caches
–Direct mapped, 2-way set associative, fully associative
–Block access sequence: 0, 8, 0, 6, 8

Direct mapped

Block address  Cache index  Hit/miss  Cache content after access
                                      0        1        2        3
0              0            miss      Mem[0]
8              0            miss      Mem[8]
0              0            miss      Mem[0]
6              2            miss      Mem[0]            Mem[6]
8              0            miss      Mem[8]            Mem[6]

Associativity Example

2-way set associative

Block address  Cache index  Hit/miss  Cache content after access
                                      Set 0             Set 1
0              0            miss      Mem[0]
8              0            miss      Mem[0]   Mem[8]
0              0            hit       Mem[0]   Mem[8]
6              0            miss      Mem[0]   Mem[6]
8              0            miss      Mem[8]   Mem[6]

Fully associative

Block address  Hit/miss  Cache content after access
0              miss      Mem[0]
8              miss      Mem[0]   Mem[8]
0              hit       Mem[0]   Mem[8]
6              miss      Mem[0]   Mem[8]   Mem[6]
8              hit       Mem[0]   Mem[8]   Mem[6]
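The hit/miss columns for all three organizations can be reproduced with a small simulator. This is a sketch under one stated assumption: replacement within a set is least recently used (LRU). The transcript does not name the policy, but LRU is consistent with the contents shown. The function name `simulate` is hypothetical.

```python
from collections import OrderedDict


def simulate(accesses, num_blocks, ways):
    """Return a list of "hit"/"miss" results for a block access sequence.

    Hypothetical helper; assumes LRU replacement within each set.
    """
    num_sets = num_blocks // ways
    # One OrderedDict per set; key order runs from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for block in accesses:
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)        # mark as most recently used
            results.append("hit")
        else:
            if len(cache_set) == ways:
                cache_set.popitem(last=False)   # evict the least recently used
            cache_set[block] = True
            results.append("miss")
    return results


if __name__ == "__main__":
    sequence = [0, 8, 0, 6, 8]
    for name, ways in [("direct mapped", 1), ("2-way", 2), ("fully associative", 4)]:
        results = simulate(sequence, num_blocks=4, ways=ways)
        print(f"{name}: {results} ({results.count('miss')} misses)")
```

With the same five accesses, the miss count drops from 5 (direct mapped) to 4 (2-way) to 3 (fully associative), which is the point of the comparison.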

How Much Associativity
Increased associativity decreases miss rate
–But with diminishing returns
Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000
–1-way: 10.3%
–2-way: 8.6%
–4-way: 8.3%
–8-way: 8.1%
Why is this the case?

Set Associative Cache Organization (figure slide; the diagram is not included in the transcript)