Cache Tuning – Global Cyber Bridges
Student: João Gabriel Gazolla
Professor: Dr. S. Masoud Sadjadi
Sections
Cache Concepts
Locality
Cache Hit and Miss
Memory Hierarchy
Kinds of Cache
Cache Coherence
Specifics
Thrashing
Cache Exercises
Conclusion
Discussion
Cache Concepts
The CPU time (in clock cycles) required to perform an operation is:
CPU time = clock cycles executing instructions (e.g. ADD A,B,C; MOVE B,A; MUL A,B,C) + clock cycles waiting for memory
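The sum above can be made concrete with the usual textbook breakdown (Patterson and Hennessy): memory stall cycles = memory accesses × miss rate × miss penalty. The sketch below is a minimal illustration under that assumption; the function names `total_cycles` and `cpu_time_seconds` are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: total clock cycles = cycles executing instructions
   + cycles waiting for memory, where the waiting (stall) cycles are
   modeled as memory accesses x miss rate x miss penalty. */
double total_cycles(double exec_cycles, double mem_accesses,
                    double miss_rate, double miss_penalty)
{
    double stall_cycles = mem_accesses * miss_rate * miss_penalty;
    return exec_cycles + stall_cycles;
}

/* CPU time in seconds = total cycles / clock rate (in Hz). */
double cpu_time_seconds(double cycles, double clock_hz)
{
    return cycles / clock_hz;
}
```

With made-up numbers (1000 execution cycles, 400 memory accesses, a 5% miss rate and a 100-cycle miss penalty), the 2000 stall cycles are twice the execution cycles, which is why the time spent waiting for memory matters so much.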
Cache Concepts
The CPU cannot do useful work while it is waiting for data to arrive from memory.
Cache Concepts
The memory system is a major factor in determining the performance of your program, and a large part of it is how you use the cache.
Cache Concepts: Other Comments
[Figure]
Interleaving
Sequential elements are stored together (Fortran style).
The bank cycle time is 4-8 times the CPU clock cycle, so a single bank cannot keep up with the CPU. If the banks can be accessed in parallel, the problem is solved: more data is fetched at once and put together.
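A minimal sketch of how parallel banks can be arranged, assuming low-order interleaving (the slide does not name the scheme): word address w goes to bank w MOD number of banks, so consecutive words land in different banks and their bank cycles overlap. `bank_of` and `row_in_bank` are hypothetical names.

```c
#include <assert.h>

/* Low-order interleaving (assumed scheme): word address w lives in bank
   w MOD nbanks, at row w / nbanks inside that bank. Consecutive words
   therefore fall into different banks and can be fetched in parallel. */
unsigned bank_of(unsigned word_addr, unsigned nbanks)
{
    return word_addr % nbanks;
}

unsigned row_in_bank(unsigned word_addr, unsigned nbanks)
{
    return word_addr / nbanks;
}
```

With 4 banks, words 0, 1, 2 and 3 go to banks 0, 1, 2 and 3, so a sequential walk never waits on the same bank twice in a row.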
Temporal Locality
“When an item is referenced, it will be referenced again soon.”
Cache it! 90% of the time is spent in 10% of the code.

    #include <iostream>
    // ...
    int main() {
        long long a = 0;  // long long: the running sum exceeds the range of int
        for (int i = 0; i < 987654; i++) {
            a = a + i;  // a and i are referenced again on every iteration
            std::cout << a << std::endl;
        }
        return 0;
    }
Spatial Locality
“When an item is referenced, items whose addresses are nearby will tend to be referenced soon.”
Get data N and also N+1, N+2, N+3, N+4... but not too many.
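One way to see "N+1, N+2, ... but not too many" is to count cache lines: a miss brings in a whole line, so nearby elements come along for free, up to the line size. The sketch below assumes a line-aligned array start and takes the line size as a parameter (64 bytes in the example is an assumption, not a value from the slides); `lines_touched` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many distinct cache lines are touched when reading n elements
   of elem_size bytes with the given stride (in elements), starting at a
   line-aligned base. Offsets never decrease, so a new line is counted
   whenever the line number changes. */
size_t lines_touched(size_t n, size_t stride, size_t elem_size, size_t line_size)
{
    size_t count = 0;
    size_t prev_line = 0;
    for (size_t i = 0; i < n; i++) {
        size_t line = (i * stride * elem_size) / line_size;
        if (i == 0 || line != prev_line)
            count++;
        prev_line = line;
    }
    return count;
}
```

With 4-byte elements and 64-byte lines, reading 16 consecutive elements touches 1 line, while reading 16 elements with a stride of 16 touches 16 lines.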
Cache Hit: MAXIMIZE it!
What is the cache hit rate?
Cache Miss: MINIMIZE it!
What is the cache miss rate?
What is the cache miss penalty?
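The three questions fit together in the standard formulas: hit rate = hits / accesses, miss rate = 1 − hit rate, and average memory access time AMAT = hit time + miss rate × miss penalty. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>

/* Hit rate = hits / accesses (0 when there were no accesses). */
double hit_rate(unsigned long hits, unsigned long accesses)
{
    return accesses ? (double)hits / (double)accesses : 0.0;
}

/* Miss rate = 1 - hit rate. */
double miss_rate(unsigned long hits, unsigned long accesses)
{
    return accesses ? 1.0 - hit_rate(hits, accesses) : 0.0;
}

/* Average memory access time in cycles:
   AMAT = hit time + miss rate x miss penalty. */
double amat(double hit_time, double miss_rate_value, double miss_penalty)
{
    return hit_time + miss_rate_value * miss_penalty;
}
```

Assuming a 1-cycle hit time and a 100-cycle miss penalty, lowering the miss rate from 10% to 2% drops the AMAT from 11 to 3 cycles: maximizing hits and minimizing misses are the same goal.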
Memory Hierarchy Sizes
Bytes ×1024 → KBytes ×1024 → MBytes ×1024 → GBytes
There are 3 kinds of cache:
Direct mapped cache
Set associative cache
Fully associative cache
Direct Mapped Cache
How does it work? Each memory block can go into exactly one cache line, chosen with the MOD operation (block address MOD number of lines).
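The MOD rule can be simulated in a few lines. This is a minimal sketch assuming a hypothetical cache of 8 lines (`DM_LINES`) addressed by block number; the line is block MOD 8 and the tag is the rest of the address (block / 8).

```c
#include <assert.h>
#include <stdbool.h>

#define DM_LINES 8  /* hypothetical number of cache lines */

/* Direct mapped cache: block address b can only live in line
   b MOD DM_LINES; the tag b / DM_LINES tells which block is there. */
struct dm_cache {
    bool valid[DM_LINES];
    unsigned tag[DM_LINES];
};

void dm_init(struct dm_cache *c)
{
    for (int i = 0; i < DM_LINES; i++) {
        c->valid[i] = false;
        c->tag[i] = 0;
    }
}

/* Returns true on a hit. On a miss the block replaces whatever is in its line. */
bool dm_access(struct dm_cache *c, unsigned block)
{
    unsigned line = block % DM_LINES;
    unsigned tag = block / DM_LINES;
    if (c->valid[line] && c->tag[line] == tag)
        return true;
    c->valid[line] = true;
    c->tag[line] = tag;
    return false;
}
```

Blocks 0 and 8 both map to line 0, so alternating between them misses every time even while the other 7 lines are empty; this conflict is the weak point of direct mapping.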
Thrashing
The process does not have enough pages, so its page-fault rate is extremely high.
CPU usage becomes low, and the system reacts: “Let’s increase multiprogramming!”, which gives each process even fewer pages and makes the problem worse.
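"Not enough pages" can be shown with a page-fault counter. The sketch below assumes LRU page replacement and a hypothetical cap of 16 frames; a process that cycles through 4 pages faults on every reference when it has 3 frames, but only on the first 4 references when it has 4.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16  /* hypothetical upper bound for this sketch */

/* Counts page faults for a reference string under LRU replacement with
   `frames` physical frames (clamped to MAX_FRAMES). */
size_t lru_page_faults(const int *refs, size_t n, size_t frames)
{
    int page[MAX_FRAMES];
    size_t last_use[MAX_FRAMES];
    size_t used = 0, faults = 0;

    if (frames == 0)
        return n;                /* nothing can stay resident */
    if (frames > MAX_FRAMES)
        frames = MAX_FRAMES;

    for (size_t t = 0; t < n; t++) {
        size_t i;
        for (i = 0; i < used; i++)
            if (page[i] == refs[t])
                break;
        if (i < used) {          /* page resident: no fault */
            last_use[i] = t;
            continue;
        }
        faults++;
        if (used < frames) {     /* a free frame is available */
            i = used++;
        } else {                 /* evict the least recently used page */
            i = 0;
            for (size_t k = 1; k < used; k++)
                if (last_use[k] < last_use[i])
                    i = k;
        }
        page[i] = refs[t];
        last_use[i] = t;
    }
    return faults;
}
```

Taking away one frame turns 4 faults into 12 for the same 12 references, so the process spends its time waiting for the disk and the CPU sits idle.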
Fully Associative Cache
A memory block can be placed in any cache line.
Set Associative Cache
This is a trade-off between direct mapped and fully associative cache: each block maps to one set (using MOD) and can be placed in any line of that set.
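The trade-off can be seen in the set-index calculation alone. In this minimal sketch (hypothetical function names), a cache of `lines` lines is split into sets of `ways` lines; setting ways to 1 gives a direct mapped cache and setting it to `lines` gives a fully associative one.

```c
#include <assert.h>

/* With `lines` cache lines grouped into sets of `ways` lines each, there
   are lines / ways sets, and block b may go in any line of set
   b MOD (lines / ways).
   ways == 1     -> direct mapped (one line per set)
   ways == lines -> fully associative (a single set) */
unsigned num_sets(unsigned lines, unsigned ways)
{
    return lines / ways;
}

unsigned set_index(unsigned block, unsigned lines, unsigned ways)
{
    return block % num_sets(lines, ways);
}
```

For an 8-line cache, block 13 goes to line 5 when direct mapped, to set 1 (of 4) when 2-way set associative, and to the single set 0 when fully associative.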
Cache Block Replacement
Direct mapped cache: there is no choice; the new block replaces the block already in its line.
Cache Block Replacement
Set associative cache: FIFO, Random, LRU
LRU relies on temporal locality: “When an item is referenced, it will be referenced again soon.”
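To compare two of the listed policies, the sketch below simulates a single set of a set associative cache (all blocks are assumed to map to that set; `MAX_WAYS` and the names are hypothetical). Random replacement is not modeled. FIFO evicts the block loaded first; LRU evicts the block used least recently, which is where temporal locality pays off.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAYS 8  /* hypothetical upper bound for this sketch */

enum policy { FIFO, LRU };

/* Counts misses for a sequence of blocks that all map to one set of
   `ways` lines (clamped to MAX_WAYS). */
size_t set_misses(const int *blocks, size_t n, size_t ways, enum policy p)
{
    int tag[MAX_WAYS];
    size_t stamp[MAX_WAYS];  /* load time (FIFO) or last use time (LRU) */
    size_t used = 0, misses = 0;

    if (ways == 0)
        return n;
    if (ways > MAX_WAYS)
        ways = MAX_WAYS;

    for (size_t t = 0; t < n; t++) {
        size_t i;
        for (i = 0; i < used; i++)
            if (tag[i] == blocks[t])
                break;
        if (i < used) {              /* hit */
            if (p == LRU)
                stamp[i] = t;        /* only LRU refreshes on use */
            continue;
        }
        misses++;
        if (used < ways) {
            i = used++;
        } else {                     /* evict the line with the oldest stamp */
            i = 0;
            for (size_t k = 1; k < used; k++)
                if (stamp[k] < stamp[i])
                    i = k;
        }
        tag[i] = blocks[t];
        stamp[i] = t;
    }
    return misses;
}
```

For the sequence 1, 2, 1, 3, 1 in a 2-way set, LRU keeps the frequently reused block 1 and misses 3 times, while FIFO evicts it when block 3 arrives and misses 4 times.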
Cache Specifics
Specifics and their technology. Go to: tinyurl.com/gcbcache2
Systems: Itanium, SGI Origin 2000, Pentium III
For each system: cache size, replacement policy, access time, commands to measure performance
Cache Coherence
[Figure: Data A in memory; Copy 1, Copy 2 and Copy 3 of Data A held in different caches]
Cache Coherence: Snoop Protocol
[Figure: processors ... PN, each with its own cache, connected to MEMORY]
When a processor writes on line 4, the other caches snoop the write and their copy of line 4 is not valid anymore.
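A minimal sketch of the write-invalidate behavior on the slide, assuming hypothetical sizes (4 processors, 8 lines per cache) and tracking only a valid bit per line; data values and the bus itself are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 4   /* hypothetical number of processors */
#define NLINES 8  /* hypothetical number of lines per cache */

/* valid[p][l] is true when processor p's cache holds a valid copy of line l. */
struct snoop_system {
    bool valid[NPROC][NLINES];
};

void snoop_init(struct snoop_system *s)
{
    for (int p = 0; p < NPROC; p++)
        for (int l = 0; l < NLINES; l++)
            s->valid[p][l] = false;
}

/* Processor p reads line l: it obtains a valid copy. */
void snoop_read(struct snoop_system *s, int p, int l)
{
    s->valid[p][l] = true;
}

/* Processor p writes line l: the write is seen by every other cache,
   which invalidates its own copy of line l. */
void snoop_write(struct snoop_system *s, int p, int l)
{
    for (int q = 0; q < NPROC; q++)
        if (q != p)
            s->valid[q][l] = false;
    s->valid[p][l] = true;
}
```

If processors 0, 1 and 2 all read line 4 and processor 1 then writes it, only processor 1 still holds a valid copy; the others miss on their next access and fetch the new value.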
Cache Coherence: Directory Based Protocol
Cache lines contain extra bits that indicate which other processors have a copy of that cache line, and the status of the cache line: clean (the cache line does not need to be sent back to main memory) or dirty (main memory needs to be updated with the content of the cache line).
Hardware Cache Coherence
Cache coherence on the Origin computer is maintained in hardware, transparently to the programmer.
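The "extra bits" can be modeled as a presence bitmask plus a dirty bit. This is a minimal sketch of the bookkeeping only, with a hypothetical `dir_entry` per cache line (at most 32 processors, one bit each); real hardware, including the Origin, is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical directory entry for one cache line: bit p of `presence`
   is set when processor p holds a copy; `dirty` means main memory must
   be updated with the content of the line. */
struct dir_entry {
    unsigned presence;
    bool dirty;
};

/* Processor p reads the line: record it as a holder. */
void dir_read(struct dir_entry *e, int p)
{
    e->presence |= 1u << p;
}

/* Processor p writes the line: return the bitmask of processors whose
   copies must be invalidated, then record p as the only (dirty) holder. */
unsigned dir_write(struct dir_entry *e, int p)
{
    unsigned to_invalidate = e->presence & ~(1u << p);
    e->presence = 1u << p;
    e->dirty = true;
    return to_invalidate;
}
```

Unlike snooping, invalidations go only to the processors named in the bitmask: if processors 0, 2 and 3 hold the line and processor 2 writes it, only 0 and 3 are contacted.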
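Cache Coherence: False Sharing

    struct foo {
        volatile int x;
        volatile int y;
    };

    foo f;

    int sum_a() {
        int s = 0;
        for (int i = 0; i < 1000000; ++i)
            s += f.x;
        return s;
    }

    void inc_b() {
        ++f.y;
    }

If sum_a and inc_b run on different processors, x and y are independent but sit in the same cache line, so every write to f.y invalidates the line that sum_a keeps reading.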
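A common remedy is to place the two fields on separate cache lines. The sketch below is written in C11 with `_Alignas` and assumes a 64-byte line (`LINE_SIZE` is an assumption, not a value from the slides); it only shows the layout, not the two threads.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_SIZE 64  /* assumed cache line size in bytes */

/* Original layout: x and y are adjacent, so they usually share a line. */
struct foo {
    volatile int x;
    volatile int y;
};

/* Padded layout: each field starts on its own cache line, so a thread
   writing y no longer invalidates the line holding x. */
struct foo_padded {
    _Alignas(LINE_SIZE) volatile int x;
    _Alignas(LINE_SIZE) volatile int y;
};
```

The cost is memory: the padded struct takes at least two full lines instead of 8 bytes, so it is worth doing only for data that different processors write concurrently.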
Cache Exercises
Examples of Locality:

    int sumvec(int a[], int n)  /* hypothetical wrapper name */
    {
        int i, sum = 0;
        for (i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

Data:
  Accessing array elements in series (a[i]): Spatial
  Reference to sum in each iteration: Temporal
Instructions:
  Instructions executed in sequence: Spatial
  Always walking through the loop: Temporal
Cache Exercises

    int sumarrayrows(int a[M][N])  /* M and N are compile-time constants */
    {
        int i, j, sum = 0;
        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

Does this function have good locality?
[Figure: 4 × 4 example array]
Cache Exercises

    int sumarraycols(int a[M][N])  /* M and N are compile-time constants */
    {
        int i, j, sum = 0;
        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Does this function have good locality?
[Figure: 4 × 4 example array]
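One way to check both exercises is to follow the addresses. C stores a[i][j] in row-major order, so sumarrayrows walks memory with stride 1 and sumarraycols with stride N. The sketch below uses a deliberately simplified model (an assumption): it counts a new line fetch every time the accessed cache line differs from the previous one, as if the cache held a single line. `line_fetches` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: sums a rows x cols int array stored in row-major order
   and counts how often the accessed cache line changes from one access to
   the next (each change treated as a new line fetch).
   by_rows != 0 follows sumarrayrows; by_rows == 0 follows sumarraycols. */
size_t line_fetches(size_t rows, size_t cols, size_t line_size, int by_rows)
{
    size_t fetches = 0, prev = 0;
    size_t outer = by_rows ? rows : cols;
    size_t inner = by_rows ? cols : rows;
    for (size_t o = 0; o < outer; o++)
        for (size_t k = 0; k < inner; k++) {
            size_t i = by_rows ? o : k;  /* row index */
            size_t j = by_rows ? k : o;  /* column index */
            size_t line = ((i * cols + j) * sizeof(int)) / line_size;
            if ((o == 0 && k == 0) || line != prev)
                fetches++;
            prev = line;
        }
    return fetches;
}
```

For a 4 × 4 array and a line that holds 4 ints, the row-wise walk fetches 4 lines while the column-wise walk changes line on all 16 accesses, which is why sumarrayrows has good spatial locality and sumarraycols does not.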
Conclusions
Sources
Slides prepared from the CI-Tutor courses at NCSA by S. Masoud Sadjadi.
Simone Martins. Memória Cache, 2008.
Wikipedia.
www.ariadne.ac.uk
parasol.tamu.edu/~rwerger/Courses/654/cachecoherence1.pdf
www.cs.unc.edu/~montek/teaching/fall-05/lectures/lecture-16.ppt
http://www.ic.uff.br/~simone/sistemascomp/
David A. Patterson and John L. Hennessy. Organização e Projeto de Computadores: A Interface Hardware/Software. LTC, 2000. (Book page in English.)
Randal E. Bryant and David R. O'Hallaron. Computer Systems: A Programmer's Perspective. Prentice Hall, 2002. (Book page.)
Many Google Image queries.
Doubts? Comments? Extras?
Download the presentation: www.gabrielgazolla.com/gcbCT.zip