Simulations of Memory Hierarchy LAB 2: CACHE LAB.


OVERVIEW
- Objectives
- Cache Set-Up
- Command line parsing
- Least Recently Used (LRU)
- Matrix Transposition
- Cache-Friendly Code

OBJECTIVE
There are two parts to this lab:
- Part A: Cache Simulator — simulate a cache using the LRU replacement algorithm.
- Part B: Optimizing Matrix Transpose — write "cache-friendly" code that minimizes cache misses in a matrix transpose function.
When submitting your lab, please submit the handin.tar file as described in the instructions.

MEMORY HIERARCHY Pick your poison: smaller, faster, and costlier, or larger, slower, and cheaper

CACHE ADDRESSING
- X-bit memory addresses (in Part A, X <= 64 bits)
- Block offset: b bits
- Set index: s bits
- Tag: X - b - s bits
- The cache is a collection of S = 2^s cache sets
- A cache set is a collection of E cache lines
- E is the associativity of the cache; if E = 1, the cache is called "direct-mapped"
- Each cache line stores a block of B = 2^b bytes of data

ADDRESS ANATOMY

CACHE TABLE BASICS
Parameters:
- Number of sets (S)
- Block size in bytes (B)
- Lines per set (E)
Note that the total capacity of this cache is S * E * B bytes.
Blocks are the fundamental units of the cache.

CACHE TABLE CORRESPONDENCE WITH ADDRESS

Example for a 32-bit address
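The address split described above can be sketched with a few bit operations. This is an illustrative sketch, not the lab's required code; the helper names are our own:

```c
#include <stdint.h>

/* Split an address into block offset, set index, and tag, given
   b offset bits and s set-index bits (helper names are illustrative). */
static uint64_t block_offset(uint64_t addr, int b) {
    return addr & ((1ULL << b) - 1);          /* low b bits */
}

static uint64_t set_index(uint64_t addr, int s, int b) {
    return (addr >> b) & ((1ULL << s) - 1);   /* next s bits */
}

static uint64_t tag_bits(uint64_t addr, int s, int b) {
    return addr >> (s + b);                   /* everything above */
}
```

For example, with s = 4 and b = 4, the address 0x12345678 splits into offset 0x8, set 0x7, and tag 0x123456.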

CACHE SET LOOK-UP
1. Determine the set index and the tag bits from the memory address.
2. Locate the corresponding cache set and determine whether it holds a valid cache line with a matching tag.
3. If a cache miss occurs:
- If there is an empty cache line, utilize it.
- If the set is full, a cache line must be evicted.
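The tag search within a set can be sketched as a linear scan over the E lines. The struct and field names here are illustrative assumptions, not the lab's required layout:

```c
#include <stdint.h>
#include <stdbool.h>

/* One line of a simulated cache set (data block omitted: the simulator
   only needs valid bit and tag). Names are illustrative. */
typedef struct {
    bool valid;
    uint64_t tag;
} cache_line;

/* Scan the E lines of one set for a valid line with a matching tag.
   Returns the line index on a hit, or -1 on a miss. */
static int find_line(const cache_line *set, int E, uint64_t tag) {
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag)
            return i;
    }
    return -1;
}
```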

TYPES OF CACHE MISSES
- Compulsory miss: the first access to a block is always a miss.
- Conflict miss: the level-k cache is large enough, but multiple data objects all map to the same level-k set.
- Capacity miss: occurs when the working set (the blocks of memory being used) is larger than the cache.

PART A: CACHE SIMULATION

YOUR OWN CACHE SIMULATOR
- NOT a real cache
- Block offsets are NOT used, but they are important for understanding the concept of a cache
- s, b, and E are given at runtime

FUNCTIONS TO USE FOR COMMAND LINE PARSING
int getopt(int argc, char *const *argv, const char *options)
long long int strtoll(const char *str, char **endptr, int base)

LEAST RECENTLY USED (LRU) ALGORITHM
- A least recently used algorithm should be used to determine which cache lines to evict, and in what order.
- Each cache line will need some sort of "time" field, which should be updated each time that cache line is referenced.
- If a cache miss occurs in a full cache set, the cache line with the oldest time field should be evicted.
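One common way to realize the "time" field above is a globally increasing counter: stamp a line on every access, and evict the line with the smallest stamp. A sketch under those assumptions (struct and names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* A line with an LRU timestamp: last_used holds the tick of the most
   recent access, taken from a global counter that only increases. */
typedef struct {
    bool valid;
    uint64_t tag;
    uint64_t last_used;
} cache_line;

/* In a full set, the victim is the valid line with the oldest stamp. */
static int lru_victim(const cache_line *set, int E) {
    int victim = 0;
    for (int i = 1; i < E; i++) {
        if (set[i].last_used < set[victim].last_used)
            victim = i;
    }
    return victim;
}
```

On every hit or fill, the simulator would do `set[i].last_used = ++tick;` so the ordering stays current.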

PART B: OPTIMIZING MATRIX TRANSPOSE

WHAT IS A MATRIX TRANSPOSITION?
- The transpose of a matrix A is denoted A^T.
- The rows of A^T are the columns of A, and the columns of A^T are the rows of A.
- Example: if A = [[1, 2], [3, 4]], then A^T = [[1, 3], [2, 4]].

GENERAL MATRIX TRANSPOSITION
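The general (naive) transpose is a straightforward double loop. This sketch fixes the matrix size with a macro for illustration; for large N, the column-order stores into B touch a new cache block on nearly every iteration, which is exactly what Part B asks you to improve:

```c
#define N 4   /* illustrative size */

/* Naive transpose: B[j][i] = A[i][j], reading A row by row but
   writing B column by column. */
static void transpose_naive(int A[N][N], int B[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            B[j][i] = A[i][j];
}
```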

CACHE-FRIENDLY CODE
In order to have fewer cache misses, you must make good use of:
- Temporal locality: reuse the current cache block while it is resident (avoid conflict misses, i.e. thrashing)
- Spatial locality: reference data at nearby storage locations
Tips:
- Cache blocking
- Optimized access patterns
- Your code will probably look ugly if done correctly

CACHE BLOCKING
- Partition the matrix in question into sub-matrices, dividing the larger problem into smaller sub-problems.
- Main idea: iterate over blocks as you perform the transpose, as opposed to the simplistic algorithm that goes index by index, row by row.
- Determining the size of these blocks will take some amount of thought and experimentation.
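The blocking idea can be sketched as four nested loops: the outer pair walks tile by tile, the inner pair transposes within a tile. The sizes here (N = 32, 8x8 tiles) are illustrative assumptions; the lab expects you to tune the tile size yourself:

```c
#define N 32      /* matrix size, illustrative */
#define BSIZE 8   /* tile size: tune by experiment (assumes BSIZE divides N) */

/* Blocked transpose: process the matrix in BSIZE x BSIZE tiles so the
   rows of A and columns of B touched by one tile stay cache-resident. */
static void transpose_blocked(int A[N][N], int B[N][N]) {
    for (int ii = 0; ii < N; ii += BSIZE)
        for (int jj = 0; jj < N; jj += BSIZE)
            /* transpose one tile */
            for (int i = ii; i < ii + BSIZE; i++)
                for (int j = jj; j < jj + BSIZE; j++)
                    B[j][i] = A[i][j];
}
```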

QUESTIONS TO PONDER
- What would happen if, instead of accessing each index in row order, you alternated with jumping from row to row within the same column?
- What would happen if you declared only 4 local variables as opposed to 12?
- Is it possible to get rid of the local variables altogether?
- What happens when accessing elements along the diagonal?
- What happens when the program is run in a different directory?

(XKCD)