Simulations of Memory Hierarchy LAB 2: CACHE LAB.


1 Simulations of Memory Hierarchy LAB 2: CACHE LAB

2 OVERVIEW
- Objectives
- Cache set-up
- Command line parsing
- Least Recently Used (LRU)
- Matrix transposition
- Cache-friendly code

3 OBJECTIVE
There are two parts to this lab:
- Part A: Cache Simulator. Simulate a cache table using the LRU algorithm.
- Part B: Optimizing Matrix Transpose. Write “cache-friendly” code that minimizes cache misses in the implementation of a matrix transpose function.
When submitting your lab, please submit the handin.tar file as described in the instructions.

4 MEMORY HIERARCHY Pick your poison: smaller, faster, and costlier, or larger, slower, and cheaper

5 CACHE ADDRESSING
- X-bit memory addresses (in Part A, X <= 64)
- Block offset: b bits
- Set index: s bits
- Tag: X - s - b bits
- The cache is a collection of S = 2^s cache sets
- A cache set is a collection of E cache lines
- E is the associativity of the cache; if E = 1, the cache is called “direct-mapped”
- Each cache line stores a block of B = 2^b bytes of data
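The field layout above can be sketched in C. This is a minimal illustration, assuming 64-bit addresses; the names addr_parts and split_address are hypothetical and not part of the lab handout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three fields of an address. */
typedef struct {
    uint64_t tag;
    uint64_t set_index;
    uint64_t block_offset;
} addr_parts;

/* Split a 64-bit address using s set-index bits and b block-offset bits. */
addr_parts split_address(uint64_t addr, unsigned s, unsigned b)
{
    assert(s + b < 64);  /* keeps every shift below the word width */

    addr_parts p;
    p.block_offset = addr & ((UINT64_C(1) << b) - 1);        /* low b bits  */
    p.set_index    = (addr >> b) & ((UINT64_C(1) << s) - 1); /* next s bits */
    p.tag          = addr >> (s + b);                        /* the rest    */
    return p;
}
```

For instance, with s = 4 and b = 4, the address 0x1234 splits into block offset 0x4, set index 0x3, and tag 0x12.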

6 ADDRESS ANATOMY

7 CACHE TABLE BASICS
Parameters:
- Number of sets (S)
- Block size in bytes (B)
- Lines per set (E)
Note that the total data capacity of this cache is S * E * B bytes.
Blocks are the fundamental units of the cache.
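One possible way to hold these parameters in a simulator is sketched below. The struct layout, the field names, and the helpers cache_init and cache_capacity_bytes are assumptions made for illustration; the lab does not prescribe them. No data bytes are stored, only the bookkeeping per line.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One cache line: bookkeeping only (hypothetical layout). */
typedef struct {
    int      valid;
    uint64_t tag;
    uint64_t last_used;  /* "time" field for a replacement policy */
} cache_line;

/* A cache with S = 2^s sets of E lines each, and B = 2^b bytes per block. */
typedef struct {
    unsigned    s, E, b;
    cache_line *lines;   /* S * E lines; set i starts at lines[i * E] */
} cache;

/* Allocate all lines zeroed (invalid). Returns 0 on success, -1 on failure. */
int cache_init(cache *c, unsigned s, unsigned E, unsigned b)
{
    assert(c != NULL && E > 0 && s < 32 && b < 32);
    size_t S = (size_t)1 << s;
    c->s = s;
    c->E = E;
    c->b = b;
    c->lines = calloc(S * E, sizeof *c->lines);
    return c->lines ? 0 : -1;
}

/* Total data capacity in bytes: S * E * B. */
uint64_t cache_capacity_bytes(const cache *c)
{
    return (UINT64_C(1) << c->s) * c->E * (UINT64_C(1) << c->b);
}
```

With s = 5, E = 1, and b = 5, this describes a direct-mapped cache of 32 sets with 32-byte blocks, 1024 bytes in total.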

8 CACHE TABLE CORRESPONDENCE WITH ADDRESS

9 Example for a 32-bit address

10 CACHE SET LOOK-UP
- Determine the set index and the tag bits based on the memory address
- Locate the corresponding cache set and determine whether or not it contains a valid cache line with a matching tag
- If a cache miss occurs:
  - If there is an empty cache line, use it
  - If the set is full, a cache line must be evicted
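The look-up steps above can be sketched for a single set as follows. The types line_t and access_result and the function lookup_set are hypothetical names; the choice of which line to evict is deliberately left out here, since it belongs to the replacement policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int      valid;
    uint64_t tag;
} line_t;

typedef enum {
    ACCESS_HIT,           /* valid line with a matching tag found        */
    ACCESS_MISS,          /* miss, an empty line was filled              */
    ACCESS_MISS_SET_FULL  /* miss, every line is valid: eviction needed  */
} access_result;

/* Look up `tag` in one set of E lines. */
access_result lookup_set(line_t *set, size_t E, uint64_t tag)
{
    assert(set != NULL && E > 0);

    line_t *empty = NULL;
    for (size_t i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag)
            return ACCESS_HIT;
        if (!set[i].valid && empty == NULL)
            empty = &set[i];  /* remember the first free line */
    }

    if (empty != NULL) {
        empty->valid = 1;
        empty->tag = tag;
        return ACCESS_MISS;
    }
    return ACCESS_MISS_SET_FULL;  /* caller must pick a victim */
}
```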

11 TYPES OF CACHE MISSES
- Compulsory miss: the first access to a block has to be a miss
- Conflict miss: the level k cache is large enough, but multiple data objects all map to the same level k block
- Capacity miss: occurs when the working set of blocks (the blocks of memory being used) is larger than the cache

12 PART A: CACHE SIMULATION

13 YOUR OWN CACHE SIMULATOR
- NOT a real cache
- Block offsets are NOT used, but they are important for understanding the concept of a cache
- s, b, and E are given at runtime

14 FUNCTIONS TO USE FOR COMMAND LINE PARSING
- int getopt(int argc, char *const *argv, const char *options)
  See: http://www.gnu.org/software/libc/manual/html_node/Example-of-Getopt.html#Example-of-Getopt
- long long int strtoll(const char *str, char **endptr, int base)
  See: http://www.cplusplus.com/reference/cstdlib/strtoll/
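A minimal sketch of how the two functions might be combined is shown below. The option letters -s, -E, and -b, the struct cache_params, and the function parse_params are assumptions for illustration; check the lab handout for the options your simulator is actually required to accept.

```c
/* Ask for the POSIX declarations of getopt, optarg, and optind. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical holder for the three runtime parameters. */
typedef struct {
    long long s, E, b;
} cache_params;

/* Parse "-s <n> -E <n> -b <n>". Returns 0 on success, -1 on bad input. */
int parse_params(int argc, char *const argv[], cache_params *out)
{
    assert(out != NULL);
    out->s = out->E = out->b = -1;  /* -1 marks "not given" */
    optind = 1;                     /* restart scanning at argv[1] */

    int opt;
    while ((opt = getopt(argc, argv, "s:E:b:")) != -1) {
        if (opt == '?')
            return -1;              /* unknown option or missing argument */

        char *end;
        long long v = strtoll(optarg, &end, 10);
        if (end == optarg || *end != '\0' || v < 0)
            return -1;              /* not a clean non-negative integer */

        switch (opt) {
        case 's': out->s = v; break;
        case 'E': out->E = v; break;
        case 'b': out->b = v; break;
        }
    }
    /* All three must be present, and a set needs at least one line. */
    return (out->s >= 0 && out->E > 0 && out->b >= 0) ? 0 : -1;
}
```

Checking the endptr result of strtoll is what catches arguments such as "-s x", which would otherwise be read silently as 0.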

15 LEAST RECENTLY USED (LRU) ALGORITHM
- A least recently used algorithm should be used to determine which cache line to evict
- Each cache line needs some sort of “time” field, which should be updated each time that cache line is referenced
- If a cache miss occurs in a full cache set, the cache line with the oldest time field (the least recently used line) should be evicted
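One common way to implement the “time” field is a counter that increases on every access, so the smallest stored value marks the least recently used line. The sketch below assumes that approach; lru_line, lru_access, and the counter parameter are hypothetical names, and other encodings of recency would work equally well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int      valid;
    uint64_t tag;
    uint64_t last_used;  /* value of the access counter at last reference */
} lru_line;

/*
 * Access `tag` in one set of E lines.
 * Returns 1 on a hit and 0 on a miss; *evicted is set to 1 only when a
 * valid line had to be replaced. *clock counts accesses across calls.
 */
int lru_access(lru_line *set, size_t E, uint64_t tag,
               uint64_t *clock, int *evicted)
{
    assert(set != NULL && E > 0 && clock != NULL && evicted != NULL);

    *evicted = 0;
    uint64_t now = ++*clock;
    lru_line *victim = &set[0];

    for (size_t i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {
            set[i].last_used = now;  /* refresh recency on a hit */
            return 1;
        }
        /* Prefer an empty line; otherwise keep the oldest time stamp. */
        if (victim->valid &&
            (!set[i].valid || set[i].last_used < victim->last_used))
            victim = &set[i];
    }

    *evicted = victim->valid;        /* replacing a valid line is an eviction */
    victim->valid = 1;
    victim->tag = tag;
    victim->last_used = now;
    return 0;
}
```

In a two-line set, the access sequence 1, 2, 1, 3 evicts tag 2 on the last access, because tag 1 was referenced more recently.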

16 PART B: OPTIMIZING MATRIX TRANSPOSE

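17 WHAT IS A MATRIX TRANSPOSITION?
- The transpose of a matrix A is denoted A^T
- The rows of A^T are the columns of A, and the columns of A^T are the rows of A
Example: the transpose of the 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] is the 3 x 2 matrix [[1, 4], [2, 5], [3, 6]].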

18 GENERAL MATRIX TRANSPOSITION
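As a baseline, a straightforward transpose can be sketched as below. The function name transpose_naive and its signature are assumptions for illustration and may differ from the signature the lab requires. It reads A row by row but writes B column by column, which is what makes it unfriendly to the cache.

```c
#include <assert.h>

/* Baseline transpose: B[j][i] = A[i][j] for a rows x cols matrix A. */
void transpose_naive(int rows, int cols,
                     int A[rows][cols], int B[cols][rows])
{
    assert(rows > 0 && cols > 0);
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            B[j][i] = A[i][j];  /* row-wise reads, column-wise writes */
}
```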

19 CACHE-FRIENDLY CODE
To have fewer cache misses, you must make good use of:
- Temporal locality: reuse the current cache block if possible (avoid conflict misses [thrashing])
- Spatial locality: reference data in nearby storage locations
Tips:
- Cache blocking
- Optimized access patterns
- Your code should look ugly if done correctly

20 CACHE BLOCKING
- Partition the matrix in question into sub-matrices
- Divide the larger problem into smaller sub-problems
- Main idea: iterate over blocks as you perform the transpose, as opposed to the simplistic algorithm, which goes index by index, row by row
- Determining the size of these blocks will take some amount of thought and experimentation
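The blocking idea can be sketched as follows. This is a generic illustration rather than a tuned solution: the name transpose_blocked, its signature, and the bsize parameter are assumptions, and a good block size depends on the cache parameters and matrix dimensions you are targeting.

```c
#include <assert.h>

/* Transpose A (rows x cols) into B (cols x rows), one bsize x bsize
 * sub-matrix at a time. Edge blocks are clipped to the matrix bounds. */
void transpose_blocked(int rows, int cols, int bsize,
                       int A[rows][cols], int B[cols][rows])
{
    assert(rows > 0 && cols > 0 && bsize > 0);

    for (int ii = 0; ii < rows; ii += bsize) {
        for (int jj = 0; jj < cols; jj += bsize) {
            int i_end = (ii + bsize < rows) ? ii + bsize : rows;
            int j_end = (jj + bsize < cols) ? jj + bsize : cols;

            /* Work inside one block so its lines stay cached. */
            for (int i = ii; i < i_end; i++)
                for (int j = jj; j < j_end; j++)
                    B[j][i] = A[i][j];
        }
    }
}
```

The result is identical to the simple algorithm; only the order of accesses changes, so different block sizes can be compared by miss count alone.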

21 QUESTIONS TO PONDER
- What would happen if, instead of accessing each index in row order, you jumped from row to row within the same column?
- What would happen if you declared only 4 local variables as opposed to 12 local variables?
- Is it possible to get rid of the local variables altogether?
- What happens when accessing elements along the diagonal?
- What happens when the program is run in a different directory?

22 (XKCD)

