Introduction To Computer Systems

Presentation transcript:

Introduction To Computer Systems
Carnegie Mellon, 15-213/18-243, Spring 2011
Recitation 7 (Performance), Monday, February 21

Agenda
- Performance review
- Program optimization
- Memory hierarchy and caches

Performance Review
Efficient programs result from:
- Good algorithms and data structures
- Code that the compiler can effectively optimize and turn into an efficient executable
Program optimization is about the second point.

Performance Review (cont.)
Modern compilers use sophisticated techniques to optimize programs. However:
- Their ability to understand code is limited
- They are conservative
The programmer can greatly influence the compiler's ability to optimize.

Optimization Blockers
Procedure calls
- The compiler's ability to perform inter-procedural optimization is limited
- Solution: replace the call with the procedure body; this can result in much faster programs
- Inlining and macros can help preserve modularity
Loop invariants
- Expressions that do not change in the loop body
- Solution: code motion (hoist the invariant computation out of the loop)
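As a minimal sketch of code motion (the function and variable names here are illustrative, not from the slides), consider hoisting a loop-invariant call such as strlen(s) out of a loop condition:

```c
#include <string.h>

/* Before: strlen(s) is re-evaluated on every iteration, making the
   loop quadratic in the string length.  The compiler often cannot
   hoist the call itself because it is conservative about whether the
   loop body might change the string. */
int count_lower_slow(const char *s) {
    int count = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] >= 'a' && s[i] <= 'z')
            count++;
    return count;
}

/* After code motion: the invariant strlen(s) is computed once. */
int count_lower_fast(const char *s) {
    int count = 0;
    size_t len = strlen(s);   /* hoisted loop invariant */
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 'a' && s[i] <= 'z')
            count++;
    return count;
}
```

Both versions compute the same count; only the amount of redundant work per iteration changes.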

Optimization Blockers (cont.)
Memory aliasing
- Accessing memory can have side effects that are difficult for the compiler to analyze (e.g., aliasing)
- Solution: scalar replacement; copy elements into temporary variables, operate on them, then store the result back
- Particularly important if the memory references are in the innermost loop
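A minimal scalar-replacement sketch (the function names are assumptions for this snippet): because *res might alias an element of a[], the compiler must keep the running sum in memory in the first version; accumulating in a local variable removes that concern.

```c
/* Aliasing-blocked version: the compiler cannot keep *res in a
   register, since a store to *res might change a[i], so each
   iteration does a memory read-modify-write. */
void sum_slow(const double *a, int n, double *res) {
    *res = 0.0;
    for (int i = 0; i < n; i++)
        *res += a[i];
}

/* Scalar replacement: accumulate in a local (register-allocatable)
   variable and store the result back once at the end. */
void sum_fast(const double *a, int n, double *res) {
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += a[i];
    *res = acc;   /* single store at the end */
}
```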

Loop Unrolling
A technique for reducing loop overhead:
- Perform more data operations in a single iteration
- The resulting program has fewer iterations, which translates into fewer condition checks and jumps
- Enables more aggressive scheduling of loops
However, too much unrolling can be bad:
- It results in larger code
- The code may no longer fit in the instruction cache
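A sketch of 4x unrolling with two accumulators (names are illustrative): the main loop runs a quarter of the condition checks and jumps, and the independent accumulators expose instruction-level parallelism; a cleanup loop handles the leftover elements.

```c
/* Sum an array with a 4x-unrolled loop and two accumulators. */
long sum_unrolled(const long *a, int n) {
    long acc0 = 0, acc1 = 0;
    int i;
    for (i = 0; i + 3 < n; i += 4) {   /* main unrolled loop */
        acc0 += a[i]     + a[i + 2];
        acc1 += a[i + 1] + a[i + 3];
    }
    for (; i < n; i++)                 /* cleanup for the remaining 0-3 elements */
        acc0 += a[i];
    return acc0 + acc1;
}
```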

Other Techniques
- Out-of-order processing
- Branch prediction
These are less crucial in this class.

Caches
Definition: memory with a short access time, used to store frequently or recently used instructions or data.
Performance metrics:
- Hit rate
- Miss rate (more commonly used)
- Miss penalty
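These metrics combine into the standard average memory access time formula, AMAT = hit time + miss rate * miss penalty. A minimal sketch (the example numbers in the test below are assumptions for illustration, not measurements):

```c
/* Average memory access time in cycles: the cost of a hit, plus the
   expected extra cost of going to the next level on a miss. */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}
```

For example, a 2-cycle hit time, a 25% miss rate, and a 40-cycle miss penalty give an average of 12 cycles per access.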

Cache Misses
Types of misses:
- Compulsory: the first access to a block; occurs when the cache is cold
- Conflict: referenced blocks map to the same cache set and evict one another
- Capacity: the working set is larger than the cache

Locality
The reason caches work.
- Temporal locality: programs tend to use the same data and instructions over and over
- Spatial locality: programs tend to use data and instructions with addresses near those they have recently used
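A small sketch of both kinds of locality (the names and matrix size are assumptions): summing a matrix row by row follows C's row-major layout, so consecutive iterations touch adjacent addresses (spatial locality), while the repeated use of the accumulator gives temporal locality. A column-by-column traversal computes the same sum but strides across rows, losing spatial locality.

```c
#define N 64

/* Row-major traversal: good spatial locality. */
long sum_rowwise(long m[N][N]) {
    long total = 0;                     /* reused every iteration: temporal locality */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += m[i][j];
    return total;
}

/* Column-major traversal: same result, poor spatial locality,
   since consecutive accesses are N*sizeof(long) bytes apart. */
long sum_colwise(long m[N][N]) {
    long total = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            total += m[i][j];
    return total;
}
```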

Memory Hierarchy (figure)

Cache Miss Analysis Exercise
Assume:
- Cache blocks are 16 bytes
- The only memory accesses are to the entries of grid
Determine the cache performance of the following:

struct algae_position { int x; int y; };
struct algae_position grid[16][16];
int total_x = 0, total_y = 0, i, j;

for (i = 0; i < 16; i++)
    for (j = 0; j < 16; j++) {
        total_x += grid[i][j].x;
        total_y += grid[i][j].y;
    }
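One way to check the expected answer (each 16-byte block holds two 8-byte structs, so a sequential scan incurs one compulsory miss per four int accesses, a 25% miss rate, assuming grid starts on a block boundary) is a small counting sketch; the harness and its names are illustrative, and the alignment is forced explicitly since the analysis assumes it.

```c
#include <stdint.h>

struct algae_position { int x; int y; };
_Alignas(16) struct algae_position grid[16][16];   /* block-aligned, as the analysis assumes */

/* Count accesses and misses for the exercise's sequential scan,
   with the given block size and a cold cache: since the scan never
   revisits a block, each new block touched is one compulsory miss. */
void scan_misses(int block_size, int *accesses, int *misses) {
    uintptr_t last_block = (uintptr_t)-1;
    *accesses = 0;
    *misses = 0;
    for (int i = 0; i < 16; i++)
        for (int j = 0; j < 16; j++) {
            int *fields[2] = { &grid[i][j].x, &grid[i][j].y };
            for (int f = 0; f < 2; f++) {
                uintptr_t blk = (uintptr_t)fields[f] / block_size;
                (*accesses)++;
                if (blk != last_block) {   /* first touch of this block */
                    (*misses)++;
                    last_block = blk;
                }
            }
        }
}
```

With 16-byte blocks this counts 512 accesses and 128 misses, i.e. the 25% miss rate from the analysis.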

Techniques for Increasing Locality
Rearranging loops (increases spatial locality).
Analyze the cache miss rate for the following, assuming 32-byte lines and array elements that are doubles:

void ijk(int n, double A[n][n], double B[n][n], double C[n][n]) {
    int i, j, k;
    double sum;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (k = 0; k < n; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] += sum;
        }
}

void kij(int n, double A[n][n], double B[n][n], double C[n][n]) {
    int i, j, k;
    double r;
    for (k = 0; k < n; k++)
        for (i = 0; i < n; i++) {
            r = A[i][k];
            for (j = 0; j < n; j++)
                C[i][j] += r * B[k][j];
        }
}
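For reference, the usual analysis (with 32-byte lines, four doubles per line, and n large relative to the cache): ijk misses on A once every four inner iterations (0.25) and on B every iteration (1.0, since it walks a column), about 1.25 misses per inner iteration; kij keeps A[i][k] in a register and walks B and C along rows, about 0.25 + 0.25 = 0.5. The two orders differ only in access pattern, not in result, which this self-contained sketch (fixed size and names mm_ijk/mm_kij are assumptions mirroring the slide's ijk/kij) can confirm:

```c
#define N 8

/* Slide's ijk order: B is walked down a column in the inner loop. */
void mm_ijk(double A[N][N], double B[N][N], double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];   /* column stride through B */
            C[i][j] += sum;
        }
}

/* Slide's kij order: B and C are walked along rows in the inner loop. */
void mm_kij(double A[N][N], double B[N][N], double C[N][N]) {
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++) {
            double r = A[i][k];             /* held in a register */
            for (int j = 0; j < N; j++)
                C[i][j] += r * B[k][j];     /* row stride through B and C */
        }
}
```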

Techniques for Increasing Locality (cont.)
Blocking (increases temporal locality).
Analyze the cache miss rate for the following, assuming 32-byte lines and array elements that are doubles:

void naive(int n, double A[n][n], double B[n][n], double C[n][n]) {
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                C[i][j] += A[i][k] * B[k][j];
}

void blocking(int n, int b, double A[n][n], double B[n][n], double C[n][n]) {
    int i, j, k, i1, j1, k1;
    for (i = 0; i < n; i += b)
        for (j = 0; j < n; j += b)
            for (k = 0; k < n; k += b)
                for (i1 = i; i1 < (i + b); i1++)
                    for (j1 = j; j1 < (j + b); j1++)
                        for (k1 = k; k1 < (k + b); k1++)
                            C[i1][j1] += A[i1][k1] * B[k1][j1];
}
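Blocking reuses each b-by-b tile of A and B many times while it is still resident in the cache, which is where the temporal-locality win comes from. This self-contained sketch (fixed sizes and the names naive_mm/blocked_mm are assumptions mirroring the slide's naive/blocking, with the block size dividing the matrix size for simplicity) checks that the blocked version computes the same product:

```c
#define N 8
#define BLK 4   /* must divide N in this simplified sketch */

/* Straightforward triple loop. */
void naive_mm(double A[N][N], double B[N][N], double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Blocked version: the three outer loops pick a pair of BLK x BLK
   tiles, the three inner loops multiply them while they stay cached. */
void blocked_mm(double A[N][N], double B[N][N], double C[N][N]) {
    for (int i = 0; i < N; i += BLK)
        for (int j = 0; j < N; j += BLK)
            for (int k = 0; k < N; k += BLK)
                for (int i1 = i; i1 < i + BLK; i1++)
                    for (int j1 = j; j1 < j + BLK; j1++)
                        for (int k1 = k; k1 < k + BLK; k1++)
                            C[i1][j1] += A[i1][k1] * B[k1][j1];
}
```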

Questions?
- Program optimization
- Writing cache-friendly code
- Cache lab