Problem 6.15: 1024-byte direct-mapped cache, 16-byte blocks, sizeof(int) = 4, struct algae_position grid[16][16]


Problem 6.15

1024-byte direct-mapped cache, 16-byte blocks
sizeof(int) = 4

struct algae_position { int x; int y; };
struct algae_position grid[16][16];   /* grid begins at memory address 0 */

Some code:

int total_x = 0, total_y = 0;
int i, j;
for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        total_x += grid[i][j].x;
    }
}
for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        total_y += grid[i][j].y;
    }
}

Problem 6.15

What is the total number of reads?
What is the total number of reads that miss in the cache?
What is the miss rate?

Problem 6.15 Total number of reads:

Problem 6.15

Total number of reads:
- One read per iteration of each loop.
- 16 * 16 = 256 iterations in loop 1
- 16 * 16 = 256 iterations in loop 2
- 256 + 256 = 512 reads total

Problem 6.15

What is the total number of reads that miss the cache?
- What does our cache look like?

Problem 6.15

1024-byte direct-mapped cache, 16-byte blocks
- There are 1024 bytes / (16 bytes/block) = 64 blocks
- sizeof(int) = 4

struct algae_position { int x; int y; };
- sizeof(struct algae_position) = 8
- Each block can hold two algae_positions

struct algae_position grid[16][16]; grid begins at memory address 0

Problem 6.15

struct algae_position { int x; int y; };
- sizeof(struct algae_position) = 8
- Each block can hold two algae_positions

struct algae_position grid[16][16]; grid begins at memory address 0
- An access to grid[0][0] or grid[0][1] will cause both ([0][0] and [0][1]) to be stored in block 0.
- An access to grid[0][2] or grid[0][3] will cause both to be stored in block 1.
- Etc.

Problem 6.15

1024-byte direct-mapped cache, 16-byte blocks
- There are 1024 bytes / (16 bytes/block) = 64 blocks

struct algae_position { int x; int y; };
- sizeof(struct algae_position) = 8
- Each block can hold two algae_positions

struct algae_position grid[16][16]; grid begins at memory address 0
- An access to grid[0][0] or grid[0][1] will cause both ([0][0] and [0][1]) to be stored in block 0.

There are 64 blocks total, and each block can hold two algae_positions.
- The first row of grid[][] will take up the first 16 / 2 = 8 blocks.
- The cache will be full after we read the first 64 / 8 = 8 rows of grid[][].
- Any reads after that will evict previously stored blocks.

Problem 6.15

There are 64 blocks total, and each block can hold two algae_positions.
- The first row of grid[][] will take up the first 8 blocks.
- The cache will be full after we read the first 64 / 8 = 8 rows of grid[][].
- Any reads after that will evict previously stored blocks.

This means we can consider each loop separately, since every block the second loop needs from the first 8 rows will have been evicted by the first loop's later reads (of the last 8 rows), and vice versa.

Problem 6.15

/* First loop */
for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        total_x += grid[i][j].x;
    }
}

In the first loop:
- Access grid[0][0].x -- Miss (both grid[0][0] and grid[0][1] are loaded into block 0)
- Access grid[0][1].x -- Hit
- Access grid[0][2].x -- Miss
- Access grid[0][3].x -- Hit
- ...

256 reads, 128 of them are misses.

Problem 6.15

/* Second loop */
for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        total_y += grid[i][j].y;
    }
}

In the second loop:
- Access grid[0][0].y -- Miss (both grid[0][0] and grid[0][1] are loaded into block 0)
- Access grid[0][1].y -- Hit
- Access grid[0][2].y -- Miss
- Access grid[0][3].y -- Hit
- ...

256 reads, 128 of them are misses.

Problem 6.15

What is the total number of reads?
- 512

What is the total number of reads that miss in the cache?
- 128 (first loop) + 128 (second loop) = 256

What is the miss rate?
- 256 misses / 512 reads = 50%

Problem 6.16

What if our loop looked like this?

for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        total_x += grid[j][i].x;
        total_y += grid[j][i].y;
    }
}


Problem 6.16

What if our loop looked like this?

for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        total_x += grid[j][i].x;
        total_y += grid[j][i].y;
    }
}

We are going down the array column by column to sum the coordinates. The cache can only hold half of the elements in the array, so a read of grid[8][0] will evict the block that was loaded when we read grid[0][0]. Since that block also contained grid[0][1], the first read of grid[0][1] will be a miss. Each iteration will have one miss (the .x read, which loads the block) and one hit (the .y read). This means we have 256 misses and 256 hits (a 50% miss rate).

Problem 6.17

What if our loop looked like this?

for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        total_x += grid[i][j].x;
        total_y += grid[i][j].y;
    }
}

Problem 6.17

What if our loop looked like this?

for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        total_x += grid[i][j].x;
        total_y += grid[i][j].y;
    }
}

This is a stride-1 reference pattern. We read grid[0][0].x (cold miss), grid[0][0].y (hit), grid[0][1].x (hit), and grid[0][1].y (hit). Each subsequent pair of iterations repeats this pattern: one miss followed by three hits. So 25% of the 512 reads (128 reads) will be misses.