CacheLab 10/10/2011 by Gennady Pekhimenko

Outline
- Memory organization
- Caching
  - Different types of locality
  - Cache organization
- Cachelab
  - Warnings are errors
  - Part (a): Building a cache simulator
  - Part (b): Efficient matrix transpose (blocking)

Memory Hierarchy
- Registers
- SRAM
- DRAM
- Local secondary storage
- Remote secondary storage
Today we study the interaction between these levels, to give you an idea of how caching works.

SRAM vs. DRAM tradeoff
SRAM (cache):
- Faster (L1 cache: ~1 CPU cycle)
- Smaller (kilobytes for L1, megabytes for L2)
- More expensive and "energy-hungry"
DRAM (main memory):
- Slower (hundreds of CPU cycles)
- Larger (gigabytes)
- Cheaper

Outline
- Memory organization
- Caching
  - Different types of locality
  - Cache organization
- Cachelab
  - Technical questions
  - Part (a): Building a cache simulator
  - Part (b): Efficient matrix transpose (blocking)

Caching
Temporal locality:
- A memory location that is accessed is likely to be accessed again multiple times in the future.
- After accessing address X, save its bytes in the cache for future accesses.
Spatial locality:
- If a location is accessed, then nearby locations are likely to be accessed in the near future.
- After accessing address X, save the block of memory around X in the cache for future accesses.
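Both kinds of locality show up in a simple loop nest. The sketch below (a generic illustration, not part of the lab handout) sums a matrix in row-major and in column-major order; both compute the same result, but the row-major version touches consecutive addresses and therefore exploits spatial locality, while the column-major version strides through memory.

```c
#define N 512

static int a[N][N];

/* Row-major traversal: consecutive addresses, good spatial locality. */
long sum_rowwise(void) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];   /* a[i][j] and a[i][j+1] share a cache block */
    return s;
}

/* Column-major traversal: stride of N ints, poor spatial locality. */
long sum_colwise(void) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];   /* each access may land in a different block */
    return s;
}
```

Same arithmetic, very different miss counts once N*N ints exceed the cache size.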

Memory Address
- 64-bit addresses on the shark machines
- Block offset: b bits
- Set index: s bits
- Tag: the remaining (64 - s - b) bits

Cache
- A cache is an array of 2^s cache sets.
- A cache set is a set of E cache lines.
  - E is called the associativity.
  - If E = 1, the cache is called "direct-mapped".
- Each cache line stores one block of 2^b bytes.
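Concretely, the tag, set index, and block offset can be extracted from a 64-bit address with shifts and masks. A minimal sketch (the helper name `split_address` is my own, not from the handout):

```c
#include <stdint.h>

typedef struct {
    uint64_t tag;
    uint64_t set;     /* set index: s bits */
    uint64_t offset;  /* block offset: b bits */
} addr_parts;

/* Split a 64-bit address into tag | set index | block offset. */
addr_parts split_address(uint64_t addr, int s, int b) {
    addr_parts p;
    p.offset = addr & ((1ULL << b) - 1);          /* low b bits */
    p.set    = (addr >> b) & ((1ULL << s) - 1);   /* next s bits */
    p.tag    = addr >> (s + b);                   /* everything above */
    return p;
}
```

Two addresses map to the same set whenever their set-index bits agree; they hit the same line only if the tags also match.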

Outline
- Memory organization
- Caching
  - Different types of locality
  - Cache organization
- Cachelab
  - Warnings are errors
  - Part (a): Building a cache simulator
  - Part (b): Efficient matrix transpose (blocking)

Cachelab
- Warnings are errors!
- Include proper header files
- Part (a): Building a cache simulator
- Part (b): Optimizing matrix transpose

Warnings are Errors
We compile with strict compilation flags. Reasons:
- Avoid potential errors that are hard to debug
- Learn good habits from the beginning

Missing Header Files
If a function declaration is missing:
- Find the corresponding header file
- Use: man
Live example: man 3 getopt

Getopt function

Part (a): Cache Simulator
A cache simulator is NOT a cache!
- Memory contents are NOT stored
- Block offsets are NOT used
- It simply counts hits, misses, and evictions
Your cache simulator needs to work for different values of s, b, and E, given at run time.
Use the LRU (least-recently-used) replacement policy.

Cache Simulator: Hints
A cache is just a 2D array of cache lines:
- struct cache_line cache[S][E];
- S = 2^s is the number of sets
- E is the associativity
Each cache_line has:
- A valid bit
- A tag
- An LRU counter
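Putting these hints together, one possible shape for the per-access logic is sketched below. This is one way to do it under the stated hints, not the reference solution; the function and field names are my own.

```c
#include <stdbool.h>
#include <stdint.h>

struct cache_line {
    bool     valid;
    uint64_t tag;
    uint64_t lru;   /* timestamp of last use; larger = more recent */
};

/* Simulate one access to a set of E lines.
   Returns 0 on a hit, 1 on a cold miss, 2 on a miss with eviction. */
int access_set(struct cache_line *set, int E, uint64_t tag, uint64_t now) {
    int empty = -1, victim = 0;
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {  /* hit: refresh LRU */
            set[i].lru = now;
            return 0;
        }
        if (!set[i].valid && empty < 0)
            empty = i;                            /* remember a free line */
        if (set[i].lru < set[victim].lru)
            victim = i;                           /* least recently used */
    }
    if (empty >= 0) {                             /* cold miss: fill */
        set[empty] = (struct cache_line){ true, tag, now };
        return 1;
    }
    set[victim] = (struct cache_line){ true, tag, now };  /* evict LRU */
    return 2;
}
```

The caller would pass `cache[set_index]` for the set and a counter it increments on every access as `now`.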

Part (b): Efficient Matrix Transpose
Matrix transpose (A -> B). [Figure: matrix A and its transpose B]

Part (b): Efficient Matrix Transpose
Suppose the block size is 8 bytes (2 ints). [Figure: matrices A and B with numbered accesses]
1. Access A[0][0]: cache miss
2. Access B[0][0]: cache miss
3. Access A[0][1]: cache hit
4. Access B[1][0]: cache miss
Question: after we handle accesses 1 and 2, should we handle 3 and 4 first, or 5 and 6 first?

Blocking
What inspiration do you get from the previous slide?
- Divide the matrix into sub-matrices. This is called blocking (CS:APP2e p. 629).
- The size of a sub-matrix depends on the cache block size, the cache size, and the input matrix size.
- Try different sub-matrix sizes.
We hope you invent more tricks to reduce the number of misses!
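As a generic illustration of the idea (not the tuned transpose the lab asks for), the loop nest below transposes one BSIZE x BSIZE sub-matrix at a time; BSIZE = 8 is an arbitrary starting point to experiment with.

```c
#define BSIZE 8   /* sub-matrix size: an arbitrary starting point to tune */

/* Transpose an n x m matrix A into B, one BSIZE x BSIZE block at a time.
   Each block of A, and the matching block of B, is small enough to stay
   in the cache while it is being worked on. */
void transpose_blocked(int n, int m, int A[n][m], int B[m][n]) {
    for (int ii = 0; ii < n; ii += BSIZE)
        for (int jj = 0; jj < m; jj += BSIZE)
            for (int i = ii; i < ii + BSIZE && i < n; i++)
                for (int j = jj; j < jj + BSIZE && j < m; j++)
                    B[j][i] = A[i][j];
}
```

The handout's transpose function has a fixed signature and extra constraints (for example, a limit on local variables), so treat this only as the shape of the technique.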

Part (b)
Cache:
- You get 1 kilobyte of cache
- Direct-mapped (E = 1)
- Block size is 32 bytes (b = 5)
- There are 32 sets (s = 5)
Test matrices: 32 x 32, 64 x 64, 61 x 67

The End
Good luck!