Published by Nickolas Wheeler. Modified over 9 years ago.
1
Simulations of Memory Hierarchy LAB 2: CACHE LAB
2
OVERVIEW
- Objectives
- Cache Set-Up
- Command line parsing
- Least Recently Used (LRU)
- Matrix Transposition
- Cache-Friendly Code
3
OBJECTIVE
There are two parts to this lab:
- Part A: Cache Simulator. Simulate a cache that uses the least recently used (LRU) replacement policy.
- Part B: Optimizing Matrix Transpose. Write "cache-friendly" code that minimizes cache misses in the implementation of a matrix transpose function.
When submitting your lab, please submit the handin.tar file as described in the instructions.
4
MEMORY HIERARCHY
Pick your poison: smaller, faster, and costlier, or larger, slower, and cheaper.
5
CACHE ADDRESSING
- X-bit memory addresses (in Part A, X <= 64 bits)
- Block offset: b bits
- Set index: s bits
- Tag: t = X - b - s bits
- A cache is a collection of S = 2^s cache sets
- A cache set is a collection of E cache lines; E is the associativity of the cache
- If E = 1, the cache is called "direct-mapped"
- Each cache line stores a block of B = 2^b bytes of data
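A minimal sketch in C of how a simulator might split an address into these three fields with shifts and masks. The helper name `split_address`, the `addr_parts` struct, and the example parameters are hypothetical; the sketch assumes s + b < 64 so that every shift is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an address for a cache with S = 2^s sets and B = 2^b-byte blocks. */
typedef struct {
    uint64_t tag;     /* high t = X - b - s bits */
    uint64_t set;     /* middle s bits */
    uint64_t offset;  /* low b bits */
} addr_parts;

/* Hypothetical helper: split an address (up to 64 bits) into tag, set index,
 * and block offset. Assumes s + b < 64. */
static addr_parts split_address(uint64_t addr, int s, int b) {
    assert(s >= 0 && b >= 0 && s + b < 64);
    addr_parts p;
    uint64_t off_mask = (1ULL << b) - 1;  /* b = 0 gives an empty mask */
    uint64_t set_mask = (1ULL << s) - 1;  /* s = 0 gives an empty mask */
    p.offset = addr & off_mask;
    p.set = (addr >> b) & set_mask;
    p.tag = addr >> (s + b);
    return p;
}
```

For example, with s = 5 and b = 6, the 32-bit address 0x12345678 splits into tag 0x2468A, set index 25, and block offset 0x38.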
6
ADDRESS ANATOMY
7
CACHE TABLE BASICS
Parameters:
- Number of sets (S)
- Block size in bytes (B)
- Lines per set (E)
The total data capacity of this cache is C = S * E * B bytes.
Blocks are the fundamental units of the cache.
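One possible in-memory layout for the cache table, sketched in C. The type and function names are hypothetical, and the sketch assumes the simulator keeps only per-line metadata (valid bit, tag, LRU time stamp) rather than the B data bytes, since the block contents are not needed to count hits and misses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Metadata for one cache line; the B data bytes are not stored (assumption). */
typedef struct {
    int valid;                /* 1 once the line holds a block */
    uint64_t tag;             /* tag bits of the cached block */
    unsigned long last_used;  /* time stamp for LRU replacement */
} cache_line;

/* The cache table: S = 2^s sets, each an array of E lines. */
typedef struct {
    int s, E, b;              /* B = 2^b bytes per block */
    cache_line **sets;
} cache;

/* Free every set, then the array of sets (free(NULL) is a no-op). */
static void cache_free(cache *c) {
    if (c->sets == NULL) return;
    for (size_t i = 0; i < ((size_t)1 << c->s); i++)
        free(c->sets[i]);
    free(c->sets);
    c->sets = NULL;
}

/* Allocate S sets of E zeroed (invalid) lines. Returns 1 on success, 0 on failure. */
static int cache_init(cache *c, int s, int E, int b) {
    assert(s >= 0 && s < 32 && E > 0 && b >= 0 && b < 32);
    size_t S = (size_t)1 << s;
    c->s = s; c->E = E; c->b = b;
    c->sets = calloc(S, sizeof *c->sets);
    if (c->sets == NULL) return 0;
    for (size_t i = 0; i < S; i++) {
        c->sets[i] = calloc((size_t)E, sizeof **c->sets);
        if (c->sets[i] == NULL) { cache_free(c); return 0; }
    }
    return 1;
}

/* Total data capacity in bytes: C = S * E * B. */
static size_t cache_capacity(const cache *c) {
    return ((size_t)1 << c->s) * (size_t)c->E * ((size_t)1 << c->b);
}
```

For instance, s = 4, E = 2, b = 5 describes 16 sets of 2 lines holding 32-byte blocks, for a capacity of 16 * 2 * 32 = 1024 bytes.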
8
CACHE TABLE CORRESPONDENCE WITH ADDRESS
9
Example for a 32-bit address
10
CACHE SET LOOK-UP
1. Determine the set index and the tag bits from the memory address.
2. Locate the corresponding cache set and check whether it contains a valid cache line with a matching tag (a hit).
3. If a cache miss occurs:
   - If there is an empty (invalid) cache line, use it.
   - If the set is full, a cache line must be evicted.
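The look-up steps can be sketched as a single scan over one set. The `line_t` type, the result names, and `lookup_set` are hypothetical; choosing which line to evict is deliberately left to the caller, since that is the job of the replacement policy.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal per-line state needed for a look-up (hypothetical type). */
typedef struct {
    int valid;
    uint64_t tag;
} line_t;

typedef enum { HIT, MISS_EMPTY_LINE, MISS_NEEDS_EVICTION } lookup_result;

/* Look up `tag` in one set of E lines.
 * HIT:                 *idx is the matching valid line.
 * MISS_EMPTY_LINE:     *idx is the first invalid line, which the caller fills.
 * MISS_NEEDS_EVICTION: the set is full; *idx is -1 and the caller must pick
 *                      a victim with its replacement policy (e.g. LRU). */
static lookup_result lookup_set(const line_t *set, int E, uint64_t tag, int *idx) {
    assert(E > 0);
    int empty = -1;
    for (int i = 0; i < E; i++) {
        /* keep scanning past an empty line: a matching line may come later */
        if (set[i].valid && set[i].tag == tag) {
            *idx = i;
            return HIT;
        }
        if (!set[i].valid && empty < 0)
            empty = i;
    }
    *idx = empty;
    return (empty >= 0) ? MISS_EMPTY_LINE : MISS_NEEDS_EVICTION;
}
```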
11
TYPES OF CACHE MISSES
- Compulsory (cold) miss: the first access to a block is always a miss.
- Conflict miss: the level-k cache is large enough, but multiple data objects all map to the same level-k block.
- Capacity miss: occurs when the working set of blocks (the blocks of memory in active use) is larger than the cache.
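To see compulsory and conflict misses concretely, here is a toy direct-mapped (E = 1) cache in C. The parameters s = 4 and b = 4 (16 sets of 16-byte blocks) and all names are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define S_BITS 4               /* hypothetical: 16 sets */
#define B_BITS 4               /* hypothetical: 16-byte blocks */
#define NSETS (1 << S_BITS)

/* A direct-mapped cache: one valid bit and one tag per set. */
typedef struct {
    int valid[NSETS];
    uint64_t tag[NSETS];
} dm_cache;

/* Returns 1 on a hit, 0 on a miss (the new block then replaces the old one). */
static int dm_access(dm_cache *c, uint64_t addr) {
    uint64_t set = (addr >> B_BITS) & (NSETS - 1);
    uint64_t tag = addr >> (S_BITS + B_BITS);
    if (c->valid[set] && c->tag[set] == tag)
        return 1;
    c->valid[set] = 1;
    c->tag[set] = tag;
    return 0;
}

/* Count misses for a sequence of addresses, starting from an empty (cold) cache. */
static int count_misses(const uint64_t *addrs, int n) {
    assert(n >= 0);
    dm_cache c;
    memset(&c, 0, sizeof c);
    int misses = 0;
    for (int i = 0; i < n; i++)
        misses += !dm_access(&c, addrs[i]);
    return misses;
}
```

Addresses 0x000 and 0x100 both map to set 0, so alternating between them misses every time: two compulsory misses followed by two conflict misses, even though the other 15 sets stay empty. Alternating 0x000 and 0x010 (sets 0 and 1) misses only twice, both compulsory.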
12
PART A: CACHE SIMULATION
13
YOUR OWN CACHE SIMULATOR
- NOT a real cache
- Block offsets are NOT used, but they are important for understanding the concept of a cache
- s, b, and E are given at runtime
14
FUNCTIONS TO USE FOR COMMAND LINE PARSING
- int getopt(int argc, char *const *argv, const char *options)
  See: http://www.gnu.org/software/libc/manual/html_node/Example-of-Getopt.html#Example-of-Getopt
- long long int strtoll(const char *str, char **endptr, int base)
  See: http://www.cplusplus.com/reference/cstdlib/strtoll/
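A sketch of how getopt and strtoll might fit together. The flag letters (-s, -E, -b), the `sim_config` struct, and the helper names are assumptions for illustration; check the lab handout for the exact options your simulator must accept. The `endptr` argument of strtoll is used to reject input with trailing characters, and base 16 handles hexadecimal addresses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical simulator parameters read from the command line. */
typedef struct {
    int s, E, b;
} sim_config;

/* Parse a non-negative decimal integer; reject empty input, trailing
 * characters, and out-of-range values. Returns 0 on success, -1 on error. */
static int parse_int(const char *str, int *out) {
    char *end;
    errno = 0;
    long long v = strtoll(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0' || v < 0 || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}

/* Read -s <set bits>, -E <lines per set>, -b <block bits> (assumed flags).
 * Returns 0 if all three were given and valid, -1 otherwise. */
static int parse_args(int argc, char *argv[], sim_config *cfg) {
    assert(cfg != NULL);
    int opt, seen = 0;
    optind = 1;  /* restart scanning so the function can be called more than once */
    while ((opt = getopt(argc, argv, "s:E:b:")) != -1) {
        switch (opt) {
        case 's': if (parse_int(optarg, &cfg->s) != 0) return -1; seen |= 1; break;
        case 'E': if (parse_int(optarg, &cfg->E) != 0) return -1; seen |= 2; break;
        case 'b': if (parse_int(optarg, &cfg->b) != 0) return -1; seen |= 4; break;
        default:  return -1;  /* unknown option or missing argument ('?') */
        }
    }
    return (seen == 7 && cfg->E > 0) ? 0 : -1;
}

/* Parse a hexadecimal address such as "7ff000388"; an optional 0x prefix is
 * accepted and parsing stops at the first non-hex character. */
static int parse_hex_address(const char *str, long long *addr) {
    char *end;
    errno = 0;
    long long v = strtoll(str, &end, 16);
    if (errno != 0 || end == str)
        return -1;
    *addr = v;
    return 0;
}
```

Addresses above LLONG_MAX would overflow strtoll; strtoull is the unsigned counterpart if the full 64-bit range matters.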
15
LEAST RECENTLY USED (LRU) ALGORITHM
- Use a least recently used policy to decide which cache line to evict.
- Each cache line needs some sort of "time" field, which should be updated each time that cache line is referenced.
- If a cache miss occurs in a full cache set, evict the cache line with the oldest time field (the least recently referenced line).
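A minimal sketch of LRU with a global reference counter used as the "time" field. The `lru_line` type, the `lru_access` name, and the return codes are hypothetical; this is one common way to implement LRU, not necessarily the one the lab expects.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int valid;
    uint64_t tag;
    unsigned long last_used;  /* counter value at this line's most recent reference */
} lru_line;

/* Reference `tag` in one set of E lines with LRU replacement.
 * *ticks counts references and supplies the time stamps.
 * Returns 0 for a hit, 1 for a miss that filled an empty line,
 * and 2 for a miss that evicted the least recently used line. */
static int lru_access(lru_line *set, int E, uint64_t tag, unsigned long *ticks) {
    assert(E > 0);
    unsigned long now = ++*ticks;
    int empty = -1, lru = 0;
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {
            set[i].last_used = now;  /* hit: this line becomes the most recent */
            return 0;
        }
        if (!set[i].valid && empty < 0)
            empty = i;
        if (set[i].last_used < set[lru].last_used)
            lru = i;                 /* smallest time stamp = least recently used */
    }
    int target = (empty >= 0) ? empty : lru;
    set[target].valid = 1;
    set[target].tag = tag;
    set[target].last_used = now;
    return (empty >= 0) ? 1 : 2;
}
```

With E = 2, the sequence A, B, A, C evicts B (A was touched more recently), and a following reference to B then evicts A.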
16
PART B: OPTIMIZING MATRIX TRANSPOSE
17
WHAT IS A MATRIX TRANSPOSITION?
The transpose of a matrix A is denoted A^T.
The rows of A^T are the columns of A, and the columns of A^T are the rows of A.
Example:
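As a small illustration with arbitrarily chosen values, transposing a 2 × 3 matrix gives a 3 × 2 matrix, with each entry mirrored across the main diagonal:

```latex
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix},
\qquad
A^{T} = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix},
\qquad
(A^{T})_{ij} = A_{ji}.
```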
18
GENERAL MATRIX TRANSPOSITION
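A straightforward transpose in C, as a baseline. The function name and the signature (an N × M input A written into an M × N output B, using C99 variable-length array parameters) are assumptions for illustration.

```c
#include <assert.h>

/* Simple row-by-row transpose: B = A^T.
 * A has N rows and M columns; B has M rows and N columns. */
static void transpose_naive(int M, int N, int A[N][M], int B[M][N]) {
    assert(M > 0 && N > 0);
    for (int i = 0; i < N; i++)        /* walk A row by row...           */
        for (int j = 0; j < M; j++)
            B[j][i] = A[i][j];         /* ...while writing B column by column */
}
```

Reads of A are sequential, but consecutive writes to B are a full row of B (N ints) apart, so for large matrices each write may land in a different cache block.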
19
CACHE-FRIENDLY CODE
To have fewer cache misses, you must make good use of:
- Temporal locality: reuse the current cache block if possible (avoid conflict misses, i.e., thrashing)
- Spatial locality: reference data at nearby storage locations
Tips:
- Cache blocking
- Optimized access patterns
- Your code should look ugly if done correctly
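The effect of access order on spatial locality shows up in two loops that sum the same array. Both return the same result; the array size is a hypothetical choice, and the comments describe the expected cache behavior for a row-major C array rather than measured numbers.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 256
#define COLS 256

/* Row-major order: consecutive iterations touch adjacent addresses, so every
 * element of a fetched cache block is used before moving on (spatial locality). */
static long sum_row_major(int a[ROWS][COLS]) {
    long sum = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major order over the same array: consecutive accesses are
 * COLS * sizeof(int) bytes apart, so each one may need a different block. */
static long sum_col_major(int a[ROWS][COLS]) {
    long sum = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}
```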
20
CACHE BLOCKING
- Partition the matrix into sub-matrices, dividing the larger problem into smaller sub-problems.
- Main idea: iterate over blocks as you perform the transpose, as opposed to the simplistic algorithm, which goes index by index, row by row.
- Choosing the size of these blocks will take some thought and experimentation.
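A basic blocked transpose in C, as a starting point for experimentation. The function name, the signature, and the tile size `BSIZE` of 8 are hypothetical; the best tile size depends on the cache parameters, and this version does not special-case elements on the diagonal.

```c
#include <assert.h>

#define BSIZE 8  /* hypothetical tile size; tune it for the target cache */

/* Blocked transpose: B = A^T, processed in BSIZE x BSIZE tiles so the rows of
 * A and B touched inside one tile have a chance to stay cached together.
 * Tiles at the edges are clipped when M or N is not a multiple of BSIZE. */
static void transpose_blocked(int M, int N, int A[N][M], int B[M][N]) {
    assert(M > 0 && N > 0);
    for (int ii = 0; ii < N; ii += BSIZE)
        for (int jj = 0; jj < M; jj += BSIZE)
            for (int i = ii; i < ii + BSIZE && i < N; i++)
                for (int j = jj; j < jj + BSIZE && j < M; j++)
                    B[j][i] = A[i][j];
}
```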
21
QUESTIONS TO PONDER
- What would happen if, instead of accessing each index in row order, you alternated by jumping from row to row within the same column?
- What would happen if you declared only 4 local variables as opposed to 12?
- Is it possible to get rid of the local variables altogether?
- What happens when accessing elements along the diagonal?
- What happens when the program is run in a different directory?
22
(XKCD)