Cache Tuning Student: João Gabriel Gazolla

Presentation transcript:

Cache Tuning – Global Cyber Bridges. Student: João Gabriel Gazolla. Professor: Dr. S. Masoud Sadjadi.

Sections: Cache Concepts; Locality; Cache Hit and Miss; Memory Hierarchy; Kinds of Cache; Cache Coherence; Specifics; Thrashing; Cache Exercises; Conclusion; Discussion.

Cache Concepts: The CPU time required to perform an operation is the sum of the clock cycles spent executing instructions (for example ADD A,B,C; MOVE B,A; MUL A,B,C) and the clock cycles spent waiting for memory.
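
As a rough sketch of the breakdown above, the two cycle counts can be added and divided by the clock frequency to get a time. The function name `cpu_time_seconds` and its parameters are hypothetical, chosen only for illustration; they do not come from the slides.

```cpp
#include <cassert>

// Hypothetical helper (not from the slides): total CPU time in seconds.
// exec_cycles         - clock cycles spent executing instructions
// memory_stall_cycles - clock cycles spent waiting for memory
// clock_hz            - assumed CPU clock frequency in cycles per second
double cpu_time_seconds(double exec_cycles, double memory_stall_cycles, double clock_hz) {
    assert(clock_hz > 0.0);
    return (exec_cycles + memory_stall_cycles) / clock_hz;
}
```

With these assumed inputs, shrinking `memory_stall_cycles` through better cache use lowers the total even when the instruction count stays the same.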

Cache Concepts: The CPU cannot perform useful work while it is waiting for data to arrive from memory.

Cache Concepts: The memory system is a major factor in determining the performance of your program, and a large part of that is how you use the cache.


Cache Concepts: Other Comments.

Interleaving: Sequential elements are stored together (Fortran style). The bank cycle time is 4-8 times the CPU clock cycle, so if the banks can be accessed in parallel the problem is solved: more data is fetched at once and then put together.

Temporal Locality: “When an item is referenced, it will be referenced again soon.” Programs spend 90% of their time in 10% of the code, so cache it!

    #include <iostream>
    // ...
    int main() {
        long long a = 0;  // 64-bit: the running sum does not fit in a 32-bit int
        for (int i = 0; i < 987654; i++) {
            a = a + i;    // a and i are reused on every iteration
            std::cout << a << std::endl;
        }
        return 0;
    }

Spatial Locality: “When an item is referenced, items whose addresses are nearby will tend to be referenced soon.” When fetching data item N, also fetch N+1, N+2, N+3, N+4, but not too many more.

Cache Hit: MAXIMIZE it! What is the cache hit rate?

Cache Miss: MINIMIZE it! What is the cache miss rate? What is the cache miss penalty?
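
One way to make the three questions above concrete is the sketch below. It uses the usual textbook definitions: the hit rate is the fraction of accesses served by the cache, the miss rate is its complement, and the miss penalty is the extra cost of going to the next memory level. The function names and the cycle-based units are assumptions made for this example.

```cpp
#include <cassert>

// Fraction of memory accesses that were found in the cache.
double hit_rate(long hits, long accesses) {
    assert(accesses > 0 && hits >= 0 && hits <= accesses);
    return static_cast<double>(hits) / static_cast<double>(accesses);
}

// Fraction of memory accesses that had to go to the next memory level.
double miss_rate(long hits, long accesses) {
    return 1.0 - hit_rate(hits, accesses);
}

// Average cost of one access, in cycles:
// hit time + miss rate * miss penalty.
double average_access_cycles(double hit_cycles, double miss_rate_value,
                             double miss_penalty_cycles) {
    return hit_cycles + miss_rate_value * miss_penalty_cycles;
}
```

The last function shows why both goals on the slides matter: the average cost falls when the miss rate drops and when the miss penalty drops.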

Memory Hierarchy Sizes: 1 KByte = 1024 Bytes; 1 MByte = 1024 KBytes; 1 GByte = 1024 MBytes.

There are 3 kinds of cache: direct mapped cache, set associative cache, and fully associative cache.

Direct Mapped Cache: How does it work? It uses the MOD operation.
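
The MOD operation mentioned above can be sketched as follows. The block size and number of cache lines are illustrative parameters, and the names `CacheSlot` and `direct_mapped_slot` are made up for this example; real hardware typically does the same arithmetic by slicing address bits.

```cpp
#include <cassert>
#include <cstdint>

// Where a memory block lands in a direct mapped cache.
struct CacheSlot {
    std::uint64_t line;  // the one cache line this block may occupy
    std::uint64_t tag;   // distinguishes blocks that share that line
};

// block_bytes and num_lines are assumed example parameters.
CacheSlot direct_mapped_slot(std::uint64_t address,
                             std::uint64_t block_bytes,
                             std::uint64_t num_lines) {
    assert(block_bytes > 0 && num_lines > 0);
    const std::uint64_t block = address / block_bytes;  // block number in memory
    return CacheSlot{block % num_lines, block / num_lines};
}
```

Two addresses whose block numbers differ by a multiple of `num_lines` get the same `line` and different `tag` values, so in this scheme each one evicts the other.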

Thrashing: The process does not have enough pages, so the page-fault rate is extremely high. CPU usage is low, and the reaction is “let’s increase multiprogramming”.

Fully Associative Cache.

Set Associative Cache: This is a trade-off between direct mapped and fully associative caches.

Cache Block Replacement: direct mapped cache.

Cache Block Replacement: set associative cache. Policies: FIFO, Random, LRU. “When an item is referenced, it will be referenced again soon.”
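
Of the policies listed above, LRU (least recently used) is the one that leans directly on temporal locality. Below is a minimal sketch of a single cache set with LRU replacement. The class name `LruSet` and the vector-based bookkeeping are assumptions made for readability and are not how hardware implements it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One set of a set associative cache with LRU replacement (illustrative only).
class LruSet {
public:
    explicit LruSet(std::size_t ways) : ways_(ways) { assert(ways > 0); }

    // Returns true on a hit. On a miss the block is loaded, evicting the
    // least recently used block if the set is already full.
    bool access(unsigned long tag) {
        auto it = std::find(tags_.begin(), tags_.end(), tag);
        const bool hit = (it != tags_.end());
        if (hit) {
            tags_.erase(it);             // re-inserted below as most recent
        } else if (tags_.size() == ways_) {
            tags_.erase(tags_.begin());  // front holds the least recently used tag
        }
        tags_.push_back(tag);            // back holds the most recently used tag
        return hit;
    }

private:
    std::size_t ways_;
    std::vector<unsigned long> tags_;    // ordered from least to most recent
};
```

A FIFO variant would differ only in the hit case: it would leave the order untouched instead of moving the tag to the back.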

Cache Specifics: Itanium, SGI Origin 2000, Pentium III. Cache size, replacement, access time, and commands to measure performance. For the specifics and their technology, go to: tinyurl.com/gcbcache2

Cache Coherence: Data A; Copy 1 of Data A; Copy 2 of Data A; Copy 3 of Data A.

Cache Coherence: Snoop Protocol. Processors up to PN share MEMORY. When one processor writes to line 4, the other copies of line 4 are not valid anymore.
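
The invalidation step on this slide can be modelled with one valid bit per processor for a single cache line. This is only a sketch of a write-invalidate scheme, which is an assumption about the snoop protocol meant here; the type `SnoopedLine` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Valid bits for ONE cache line, one bit per processor cache (illustrative).
struct SnoopedLine {
    std::vector<bool> valid;

    explicit SnoopedLine(std::size_t processors) : valid(processors, false) {}

    // A read brings a valid copy of the line into processor p's cache.
    void read(std::size_t p) { valid.at(p) = true; }

    // A write is seen (snooped) by every other cache, which invalidates its
    // copy; only the writer keeps a valid copy.
    void write(std::size_t writer) {
        assert(writer < valid.size());
        for (std::size_t p = 0; p < valid.size(); ++p) {
            valid[p] = (p == writer);
        }
    }
};
```

After a write, any other processor that touches the line again misses and has to fetch the up-to-date data.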

Cache Coherence: Directory Based Protocol. Cache lines contain extra bits that indicate which other processors have a copy of that cache line, and the status of the cache line: clean (the cache line does not need to be sent back to main memory) or dirty (main memory needs to be updated with the content of the cache line). Hardware Cache Coherence: Cache coherence on the Origin computer is maintained in hardware, transparent to the programmer.

Cache Coherence: False Sharing

    struct foo {
        volatile int x;
        volatile int y;  // adjacent to x, so both usually share one cache line
    };

    foo f;

    // Only reads f.x.
    int sum_a() {
        int s = 0;
        for (int i = 0; i < 1000000; ++i)
            s += f.x;
        return s;
    }

    // Only writes f.y, yet each write invalidates the line that sum_a reads.
    void inc_b() {
        ++f.y;
    }
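
A common way to avoid the false sharing shown above is to place the two fields on different cache lines. The sketch below assumes a 64-byte line, which is typical but hardware-specific. The names `padded_foo` and `kAssumedLineBytes` are made up for this example.

```cpp
#include <cstddef>

// Assumed cache line size in bytes; check the target hardware before relying on it.
constexpr std::size_t kAssumedLineBytes = 64;

// Same fields as foo, but each one starts on its own (assumed) cache line.
struct padded_foo {
    alignas(kAssumedLineBytes) volatile int x;
    alignas(kAssumedLineBytes) volatile int y;
};

// Compile-time check that x and y are at least one assumed line apart.
static_assert(offsetof(padded_foo, y) - offsetof(padded_foo, x) >= kAssumedLineBytes,
              "x and y should not share a cache line");
```

The cost is memory: under these assumptions the struct grows from 8 bytes to 128 bytes, so padding is usually reserved for data that different threads really do update concurrently.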

Cache Exercises: Examples of Locality

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

Data: accessing the array elements in series is spatial locality; referencing sum in each iteration is temporal locality.
Instructions: executing instructions in sequence is spatial locality; cycling through the loop repeatedly is temporal locality.

Cache Exercises: Does this function have good locality?

    int sumarrayrows(int a[M][N])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

Cache Exercises: Does this function have good locality?

    int sumarraycols(int a[M][N])
    {
        int i, j, sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }
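
One way to reason about the two exercises is to compare how far apart consecutive accesses are in memory. C and C++ store `a[M][N]` in row-major order, so element `a[i][j]` sits `i*N + j` elements from the start. The helper names below are hypothetical and exist only to make that arithmetic explicit.

```cpp
#include <cstddef>

// Offset, in elements, of a[i][j] inside a row-major array with n_cols columns.
constexpr std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t n_cols) {
    return i * n_cols + j;
}

// Distance between consecutive accesses in the inner loop of sumarrayrows (j -> j+1).
constexpr std::size_t row_traversal_stride(std::size_t n_cols) {
    return row_major_offset(0, 1, n_cols) - row_major_offset(0, 0, n_cols);
}

// Distance between consecutive accesses in the inner loop of sumarraycols (i -> i+1).
constexpr std::size_t col_traversal_stride(std::size_t n_cols) {
    return row_major_offset(1, 0, n_cols) - row_major_offset(0, 0, n_cols);
}
```

The row-wise loop moves one element at a time and so uses every element of each cache line it loads. The column-wise loop jumps `N` elements per access, which for large `N` means touching a new cache line on almost every iteration.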

Conclusions.

Sources: Slides prepared from the CI-Tutor Courses at NCSA by S. Masoud Sadjadi. Memória Cache, Simone Martins, 2008. Wikipedia. www.ariadne.ac.uk. parasol.tamu.edu/~rwerger/Courses/654/cachecoherence1.pdf. www.cs.unc.edu/~montek/teaching/fall-05/lectures/lecture-16.ppt. http://www.ic.uff.br/~simone/sistemascomp/. David A. Patterson; John L. Hennessy. Organização e Projeto de Computadores, A Interface Hardware/Software. LTC, 2000. English-language book page.

Sources: Randal E. Bryant and David R. O'Hallaron. Computer Systems: A Programmer's Perspective. Prentice Hall, 2002. Book page. Many Google Image queries.

Questions? Comments? Extras? Download the presentation: www.gabrielgazolla.com/gcbCT.zip