Caches and Blocking <TA Names> 26th February 2018.


Agenda
  Caching review
  Blocking to reduce cache misses
  Cache alignment

Reminders
  Cache Lab is due Thursday!
  Exam 1 is next week (week of March 5th)!
  Start doing practice problems.
  Come to the review session.

What Type of Locality?
The following function exhibits which type of locality? Consider only array accesses.

    void who(int *arr, int size) {
        for (int i = 0; i < size - 1; ++i)
            arr[i] = arr[i + 1];
    }

  A. Spatial
  B. Temporal
  C. Both A and B
  D. Neither A nor B

Answer: C. The loop walks through consecutive elements (spatial), and the element read in one iteration is written in the next (temporal).

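What Type of Locality?
The following function exhibits which type of locality? Consider only array accesses.

    void coo(int *arr, int size) {
        for (int i = size - 2; i >= 0; --i)
            arr[i] = arr[i + 1];
    }

  A. Spatial
  B. Temporal
  C. Both A and B
  D. Neither A nor B

Answer: C. The loop walks through consecutive elements in descending order (spatial), and the element written in one iteration is read in the next (temporal).
Exercise: write out the address stream, assuming the array starts at 0x100.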
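The address-stream exercise can be sketched in code. This is a minimal illustration, not part of the original slides: the helper names (elem_addr, who_stream, coo_stream, print_stream) are hypothetical, and it assumes 4-byte ints and that each iteration reads arr[i+1] before writing arr[i].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define INT_BYTES 4u /* assumption: sizeof(int) == 4 */

/* Address of arr[i] when the array starts at base. */
uint32_t elem_addr(uint32_t base, int i) {
    return base + INT_BYTES * (uint32_t)i;
}

/* Addresses touched by who(): i ascends; read arr[i+1], then write arr[i].
   out must have room for 2 * (size - 1) entries. Returns the count. */
size_t who_stream(uint32_t base, int size, uint32_t *out) {
    assert(out != NULL);
    size_t n = 0;
    for (int i = 0; i < size - 1; ++i) {
        out[n++] = elem_addr(base, i + 1); /* read  */
        out[n++] = elem_addr(base, i);     /* write */
    }
    return n;
}

/* Addresses touched by coo(): same accesses, but i descends. */
size_t coo_stream(uint32_t base, int size, uint32_t *out) {
    assert(out != NULL);
    size_t n = 0;
    for (int i = size - 2; i >= 0; --i) {
        out[n++] = elem_addr(base, i + 1); /* read  */
        out[n++] = elem_addr(base, i);     /* write */
    }
    return n;
}

/* Print a recorded stream on one line. */
void print_stream(const char *name, const uint32_t *s, size_t n) {
    printf("%s:", name);
    for (size_t k = 0; k < n; ++k)
        printf(" 0x%X", (unsigned)s[k]);
    printf("\n");
}
```

Calling coo_stream(0x100, 4, buf) and printing the result shows each address appearing twice in a row (temporal locality) with neighbors 4 bytes apart (spatial locality).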

Interlude: terminology
A direct-mapped cache contains only one line per set. This means E = 2^e = 1.
[Figure: memory addresses 000 through 111 alongside a cache shown at three granularities: bytes (B0, B1), lines (L0), and sets (S0).]

Interlude: terminology
A fully associative cache has 1 set, and many lines for that one set. This means S = 2^s = 1.
[Figure: memory addresses 000 through 111 alongside a cache shown at three granularities: bytes (B0, B1), lines (L0), and sets (S0).]

Direct-Mapped Cache Example
Assume a 32-bit address (i.e. m = 32). The cache is direct-mapped, each block stores 8 bytes, and there are 4 sets. How many bits are used for the tag (t), set index (s), and block offset (b)?
[Figure: Set 0 through Set 3, each with one line holding a valid bit, a tag, and a cache block.]

Answer: t = 27, s = 2, b = 3 (option B on the slide). The block offset needs log2(8) = 3 bits, the set index needs log2(4) = 2 bits, and the tag takes the remaining 32 - 3 - 2 = 27 bits.
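The bit-width arithmetic for this example can be sketched as follows. The names log2_pow2, addr_bits, and split_bits are hypothetical helpers introduced here for illustration, and the sketch assumes the set count and block size are powers of two.

```c
#include <assert.h>

/* log2 of a power of two, e.g. 8 -> 3. */
unsigned log2_pow2(unsigned x) {
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned bits = 0;
    while (x > 1) {
        x >>= 1;
        ++bits;
    }
    return bits;
}

typedef struct {
    unsigned t; /* tag bits          */
    unsigned s; /* set index bits    */
    unsigned b; /* block offset bits */
} addr_bits;

/* Split an m-bit address for a cache with num_sets sets and
   block_bytes bytes per block. */
addr_bits split_bits(unsigned m, unsigned num_sets, unsigned block_bytes) {
    addr_bits r;
    r.b = log2_pow2(block_bytes);
    r.s = log2_pow2(num_sets);
    assert(m >= r.s + r.b);
    r.t = m - r.s - r.b;
    return r;
}
```

For the slide's parameters, split_bits(32, 4, 8) yields t = 27, s = 2, b = 3.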

Which Set Is It?
Which set may the address 0xFA1C be located in? The cache has 8 bytes per data block and E = 1 line per set.
[Figure: Set 0 through Set 3, each with a valid bit, a tag, and a cache block; the 32-bit address is split into a 27-bit tag, a 2-bit set index, and a 3-bit block offset.]

  A. 0
  B. 1
  C. 2
  D. 3
  E. More than one of the above

Answer: D (set 3). In binary, 0xFA1C = 11111010000 11 100: the tag (leading zeros omitted), then the set index 11, then the block offset 100.
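A minimal sketch of the same field extraction with shifts and masks. The function names are hypothetical; the field widths are passed in as parameters rather than taken from any real cache.

```c
#include <assert.h>
#include <stdint.h>

/* Low b_bits of the address: byte position within the block. */
uint32_t block_offset(uint32_t addr, unsigned b_bits) {
    assert(b_bits < 32);
    return addr & ((1u << b_bits) - 1u);
}

/* The s_bits just above the block offset: which set the block maps to. */
uint32_t set_index(uint32_t addr, unsigned s_bits, unsigned b_bits) {
    assert(s_bits + b_bits < 32);
    return (addr >> b_bits) & ((1u << s_bits) - 1u);
}

/* Everything above the set index. */
uint32_t tag_of(uint32_t addr, unsigned s_bits, unsigned b_bits) {
    assert(s_bits + b_bits < 32);
    return addr >> (s_bits + b_bits);
}
```

With s = 2 and b = 3, set_index(0xFA1C, 2, 3) is 3, block_offset(0xFA1C, 3) is 4, and tag_of(0xFA1C, 2, 3) is 0x7D0 (binary 11111010000).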

Cache Block Range
What range of addresses will be in the same block as address 0xFA1C? The cache has 8 bytes per data block.
[Figure: Set 0 through Set 3, each with a valid bit, a tag, and a cache block; the 32-bit address is split into a 27-bit tag, a 2-bit set index, and a 3-bit block offset.]

  A. 0xFA1C
  B. 0xFA1C – 0xFA23
  C. 0xFA1C – 0xFA1F
  D. 0xFA18 – 0xFA1F
  E. It depends on the access size (byte, word, etc.)

Answer: D. Clearing the 3 block-offset bits of 0xFA1C gives the block's first byte, 0xFA18, and the block extends 8 bytes from there.
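The block range can be computed by masking off the offset bits. This is a small sketch with hypothetical names (addr_range, block_range), assuming the block size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t first; /* lowest address in the block  */
    uint32_t last;  /* highest address in the block */
} addr_range;

/* Range of byte addresses sharing a cache block with addr. */
addr_range block_range(uint32_t addr, uint32_t block_bytes) {
    assert(block_bytes != 0 && (block_bytes & (block_bytes - 1u)) == 0);
    addr_range r;
    r.first = addr & ~(block_bytes - 1u); /* clear the offset bits */
    r.last = r.first + block_bytes - 1u;
    return r;
}
```

For example, block_range(0xFA1C, 8) gives first = 0xFA18 and last = 0xFA1F, regardless of the access size.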

Cache Misses
If N = 16, how many bytes of a does the loop access?

    int foo(int* a, int N) {
        int i, sum = 0;
        for (i = 0; i < N; i++)
            sum += a[i];
        return sum;
    }

  Accessed bytes
  A. 4
  B. 16
  C. 64
  D. 256

Answer: C, assuming 4-byte ints: 16 elements × 4 bytes = 64 bytes.
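A short sketch of the byte count, plus how many cache blocks those bytes span. The helper names are hypothetical, and the 8-byte block size used in the example calls is an assumption carried over from the earlier direct-mapped example, not something this slide states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes read by foo() when it sums n ints. */
size_t bytes_accessed(size_t n) {
    return n * sizeof(int);
}

/* Number of distinct cache blocks covered by nbytes starting at base.
   Each distinct block costs at least one (cold) miss on a first pass. */
size_t blocks_touched(uint32_t base, size_t nbytes, uint32_t block_bytes) {
    assert(nbytes > 0 && block_bytes > 0);
    uint32_t first = base / block_bytes;
    uint32_t last = (base + (uint32_t)nbytes - 1u) / block_bytes;
    return (size_t)(last - first) + 1;
}
```

With 4-byte ints, bytes_accessed(16) is 64. An array aligned at 0x100 spans blocks_touched(0x100, 64, 8) = 8 blocks, while the same array starting at 0x104 straddles 9, which is one reason alignment matters.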

Cache Misses
N still equals 16.

    int foo(int* a, int N) {
        int i, sum = 0;
        for (i = 0; i < N; i++)
            sum += a[i];
        return sum;
    }

  Misses
  A.
  B. 8
  C. 12
  D. 14
  E. 16

Two accesses per line, so there is already a maximum of 16 misses. i = 2, 5 are never evicted, so they are not misses on the second call. That gives 14 total.

Very Hard Cache Problem
We will use a direct-mapped cache with 2 sets, each of which can hold up to 4 ints. How can we copy A into B, shifted over by 1 position, in the most efficient way? (Use temp!)
[Figure: arrays A and B, each drawn as a row of indexed int cells.]

[Figure: temp, A, and B drawn as rows of indexed int cells, stepping through the copy via temp.]
Number of misses:
Could've been 16 misses otherwise! We would save even more if the block size were larger, or if temp were already cached.
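The effect of staging through temp can be explored with a tiny cache simulation. This is a sketch under several assumptions not stated on the slides: A and B each hold 8 four-byte ints, B sits directly after A in memory (so A[i] and B[i] map to the same set), the shift wraps around (B[(i+1) % 8] = A[i]), and temp lives in registers so it costs no cache accesses. All names and addresses here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSETS 2        /* direct-mapped: one line per set        */
#define BLOCK_BYTES 16 /* 4 ints per block, assuming 4-byte ints */
#define N 8            /* assumed array length                   */
#define A_BASE 0x100u  /* assumed placement of A                 */
#define B_BASE 0x120u  /* assumed: B directly follows A          */

typedef struct { int valid; uint32_t tag; } line_t;

static line_t cache[NSETS];
static int misses;

static void cache_reset(void) {
    for (int s = 0; s < NSETS; s++) cache[s].valid = 0;
    misses = 0;
}

/* Simulate one access; count a miss and install the block if absent. */
static void touch(uint32_t addr) {
    uint32_t block = addr / BLOCK_BYTES;
    uint32_t set = block % NSETS;
    uint32_t tag = block / NSETS;
    if (!cache[set].valid || cache[set].tag != tag) {
        misses++;
        cache[set].valid = 1;
        cache[set].tag = tag;
    }
}

static uint32_t a_addr(int i) { assert(i >= 0 && i < N); return A_BASE + 4u * (uint32_t)i; }
static uint32_t b_addr(int i) { assert(i >= 0 && i < N); return B_BASE + 4u * (uint32_t)i; }

/* Element by element: read A[i], then write B[(i+1) % N]. */
int naive_misses(void) {
    cache_reset();
    for (int i = 0; i < N; i++) {
        touch(a_addr(i));
        touch(b_addr((i + 1) % N));
    }
    return misses;
}

/* Blocked: read a whole block of A into temp, then write it out to B. */
int blocked_misses(void) {
    cache_reset();
    for (int base = 0; base < N; base += 4) {
        for (int i = base; i < base + 4; i++) touch(a_addr(i));
        for (int i = base; i < base + 4; i++) touch(b_addr((i + 1) % N));
    }
    return misses;
}
```

Under these assumptions the element-by-element copy misses on every access, because A and B keep evicting each other from the same set, while the blocked version misses only when it first touches a block. The blocked count depends on the assumptions above (in particular temp being free), so it may differ from the figure on the original slide.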

If You Get Stuck
  Please read the writeup. Read it again after doing ~25% of the lab.
  CS:APP Chapter 6
  View lecture notes and course FAQ at http://www.cs.cmu.edu/~213
  Office hours: Sunday through Friday, (generally) 5:00-9:00pm in WeH 5207
  Post a private question on Piazza
  man malloc, man gdb, gdb's help command
  http://csapp.cs.cmu.edu/public/waside/waside-blocking.pdf

Appendix: C Programming Style
  Properly document your code
    Header comments, overall operation of large blocks, any tricky bits
  Write robust code: check error and failure conditions
  Write modular code
    Use interfaces for data structures, e.g. create/insert/remove/free functions for a linked list
  No magic numbers: use #define
  Formatting
    80 characters per line
    Consistent braces and whitespace
  No memory or file descriptor leaks