Recitation 7: Caching
By yzhuang

Announcements
Pick up your exam from the ECE course hub
◦ Average is 43/60
◦ Final grade computation? See the syllabus
If you downloaded cachelab before noon on September 30, you should re-download the tarball. See the writeup for details.

Memory Hierarchy
Registers
SRAM
DRAM
Local secondary storage
Remote secondary storage
Today: we study the interaction between SRAM (the cache) and DRAM (main memory) to give you an idea of how caching works

SRAM vs DRAM
SRAM (cache)
◦ Faster (L1 cache: ~1 CPU cycle)
◦ Smaller (megabytes)
◦ More expensive
DRAM (main memory)
◦ Slower (~100 CPU cycles)
◦ Larger (gigabytes)
◦ Cheaper

Caching
Temporal locality
◦ A memory location that is accessed is likely to be accessed again multiple times in the near future
◦ After accessing address X, save those bytes in the cache for future accesses
Spatial locality
◦ If a location is accessed, then nearby locations are likely to be accessed in the near future
◦ After accessing address X, save the block of memory around X in the cache for future accesses
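To make the two kinds of locality concrete, here is a hypothetical snippet (not from the lab handout). The row-major sweep gets spatial locality from touching consecutive addresses; the reuse of sum on every iteration is temporal locality.

    /* Row-major traversal: consecutive A[i][j] accesses fall in the
     * same cache block (spatial locality), and `sum` is reused on
     * every iteration (temporal locality). Swapping the two loops to
     * column-major order strides through memory and misses far more. */
    int sum_matrix(int n, int A[n][n]) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sum += A[i][j];
        return sum;
    }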

Memory Address
64-bit addresses on the shark machines
◦ Block offset: b bits (low-order)
◦ Set index: s bits (next)
◦ Tag: the remaining 64 - s - b bits (high-order)
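In the simulator you will need to pull these fields out of each address. A minimal sketch; the function names are illustrative, not from the handout:

    #include <stdint.h>

    /* Decompose a 64-bit address into block offset, set index, and tag,
     * given b block-offset bits and s set-index bits. */
    uint64_t get_offset(uint64_t addr, int b)     { return addr & ((1ULL << b) - 1); }
    uint64_t get_set(uint64_t addr, int s, int b) { return (addr >> b) & ((1ULL << s) - 1); }
    uint64_t get_tag(uint64_t addr, int s, int b) { return addr >> (s + b); }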

Cache
A cache is an array of 2^s cache sets
A cache set is a set of E cache lines
◦ E is called the associativity
◦ If E = 1, the cache is called "direct-mapped"
Each cache line stores one block
◦ Each block has 2^b bytes
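As a worked check of the geometry: the total data capacity is C = (number of sets) x (lines per set) x (bytes per block) = 2^s x E x 2^b bytes. With the Part (b) parameters below (s = 5, E = 1, b = 5), that is 32 x 1 x 32 = 1024 bytes = 1 KB.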

Cachelab
Part (a): Building a cache simulator
Part (b): Optimizing matrix transpose

Part (a): Cache simulator
A cache simulator is NOT a cache!
◦ Memory contents are NOT stored
◦ Block offsets are NOT used
◦ It simply counts hits, misses, and evictions
Your cache simulator needs to work for different s, b, and E, given at run time (see the getopt sketch below).
Use the LRU (least recently used) replacement policy.
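One way to read s, E, and b at run time is getopt. A minimal sketch; the exact flag set (including the trace-file option shown here as -t) is an assumption, so follow whatever the writeup specifies:

    #include <getopt.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int s = 0, E = 0, b = 0;
        char *trace = NULL;
        int opt;

        /* -s, -E, -b take numeric arguments; -t names the trace file. */
        while ((opt = getopt(argc, argv, "s:E:b:t:")) != -1) {
            switch (opt) {
            case 's': s = atoi(optarg); break;
            case 'E': E = atoi(optarg); break;
            case 'b': b = atoi(optarg); break;
            case 't': trace = optarg;   break;
            default:
                fprintf(stderr, "usage: %s -s <s> -E <E> -b <b> -t <trace>\n", argv[0]);
                return 1;
            }
        }
        printf("s=%d E=%d b=%d trace=%s\n", s, E, b, trace ? trace : "(none)");
        return 0;
    }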

Cache simulator: Hints
A cache is just a 2D array of cache lines:
◦ struct cache_line cache[S][E];
◦ S = 2^s is the number of sets
◦ E is the associativity
Each cache_line has:
◦ A valid bit
◦ A tag
◦ An LRU counter (see the sketch below)
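One way to flesh out that hint. The field names and the heap allocation are illustrative (S and E are only known at run time, so a fixed-size 2D array will not work directly):

    #include <stdint.h>
    #include <stdlib.h>

    struct cache_line {
        int      valid; /* 1 if this line currently holds a block */
        uint64_t tag;   /* high-order address bits identifying the block */
        uint64_t lru;   /* time of last access, for LRU eviction */
    };

    /* Allocate S sets of E lines each, zero-initialized (all invalid). */
    struct cache_line *make_cache(int S, int E) {
        return calloc((size_t)S * E, sizeof(struct cache_line));
    }

    /* Access set i, line j as cache[i * E + j]. */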

Part (b): Efficient Matrix Transpose
Matrix Transpose (A -> B)
[Figure: matrix A on the left, its transpose B on the right]
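As a baseline, the straightforward transpose looks like this (a hypothetical reference, not code from the handout); Part (b) asks you to beat its miss count:

    /* Naive transpose: B[j][i] = A[i][j]. The row-major reads of A have
     * good spatial locality, but the column-major writes to B touch a
     * different cache block on every iteration, causing many misses. */
    void transpose_naive(int N, int A[N][N], int B[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                B[j][i] = A[i][j];
    }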

Part (b): Efficient Matrix Transpose
Matrix Transpose (A -> B)
Suppose the block size is 8 bytes (2 ints)
[Figure: matrices A and B, with the accesses below numbered 1-6]
1. Access A[0][0]: cache miss
2. Access B[0][0]: cache miss
3. Access A[0][1]: cache hit (same block as A[0][0])
4. Access B[1][0]: cache miss
Question: after we handle accesses 1 & 2, should we handle 3 & 4 first, or 5 & 6 first?

Part (b): Hint
What inspiration do you get from the previous slide?
◦ Divide the matrix into sub-matrices
◦ This is called blocking (CSAPP2e p. 629)
◦ The size of the sub-matrix depends on the cache block size, the cache size, and the input matrix size
◦ Try different sub-matrix sizes (see the sketch below)
We hope you invent more tricks to reduce the number of misses!
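A minimal sketch of blocking applied to transpose. BSIZE here is illustrative; picking a good sub-matrix size for each test matrix is the actual exercise:

    #define BSIZE 8 /* illustrative; tune per matrix size and cache geometry */

    /* Blocked transpose: work on BSIZE x BSIZE sub-matrices so the
     * cache lines of A and B touched within one sub-matrix stay
     * resident while we finish it. */
    void transpose_blocked(int N, int A[N][N], int B[N][N]) {
        for (int ii = 0; ii < N; ii += BSIZE)
            for (int jj = 0; jj < N; jj += BSIZE)
                for (int i = ii; i < ii + BSIZE && i < N; i++)
                    for (int j = jj; j < jj + BSIZE && j < N; j++)
                        B[j][i] = A[i][j];
    }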

Part (b)
Cache:
◦ You get 1 kilobyte of cache
◦ Direct-mapped (E = 1)
◦ Block size is 32 bytes (b = 5)
◦ There are 32 sets (s = 5)
Test matrices:
◦ 32 x 32, 64 x 64, 61 x 67

The End
Good luck!