Introduction To Computer Systems

Slides:

Advertisements

Similar presentations

ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto

Advertisements

Simulations of Memory Hierarchy LAB 2: CACHE LAB.

Fall 2011SYSC 5704: Elements of Computer Systems 1 SYSC 5704 Elements of Computer Systems Optimization to take advantage of hardware.

Spring 2003CSE P5481 Introduction Why memory subsystem design is important CPU speeds increase 55% per year DRAM speeds increase 3% per year rate of increase.

1 Chapter Seven Large and Fast: Exploiting Memory Hierarchy.

Reducing Cache Misses (Sec. 5.3) Three categories of cache misses: 1.Compulsory –The very first access to a block cannot be in the cache 2.Capacity –Due.

1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Oct. 30, 2002 Topic: Caches (contd.)

CS 524 (Wi 2003/04) - Asim LUMS 1 Cache Basics Adapted from a presentation by Beth Richardson

Virtual Memory Topics Virtual Memory Access Page Table, TLB Programming for locality Memory Mountain Revisited.

Systems I Locality and Caching

ECE Dept., University of Toronto

Compiler Code Optimizations. Introduction Introduction Optimized codeOptimized code Executes faster Executes faster efficient memory usage efficient memory.

Computer Orgnization Rabie A. Ramadan Lecture 7. Wired Control Unit What are the states of the following design:

CSC 4250 Computer Architectures December 5, 2006 Chapter 5. Memory Hierarchy.

The Memory Hierarchy 21/05/2009Lecture 32_CA&O_Engr Umbreen Sabir.

Recitation 7: 10/21/02 Outline Program Optimization –Machine Independent –Machine Dependent Loop Unrolling Blocking Annie Luo

King Fahd University of Petroleum and Minerals King Fahd University of Petroleum and Minerals Computer Engineering Department Computer Engineering Department.

Compiler Optimizations ECE 454 Computer Systems Programming Topics: The Role of the Compiler Common Compiler (Automatic) Code Optimizations Cristiana Amza.

Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.

Memory Hierarchy: Terminology Hit: data appears in some block in the upper level (example: Block X)  Hit Rate : the fraction of memory access found in.

Systems I Cache Organization

Prefetching Techniques. 2 Reading Data prefetch mechanisms, Steven P. Vanderwiel, David J. Lilja, ACM Computing Surveys, Vol. 32, Issue 2 (June 2000)

Optimizing for the Memory Hierarchy Topics Impact of caches on performance Memory hierarchy considerations Systems I.

1 How will execution time grow with SIZE? int array[SIZE]; int sum = 0; for (int i = 0 ; i < ; ++ i) { for (int j = 0 ; j < SIZE ; ++ j) { sum +=

Problem byte direct-mapped cache 16-byte blocks sizeof(int) = 4 struct algae_position{ int x; int y; }; Struct algae_position grid[16][16]; grid.

CSCI206 - Computer Organization & Programming

Cache Memories.

Code Optimization Overview and Examples

Code Optimization.

CSE 351 Section 9 3/1/12.

Cache Performance Samira Khan March 28, 2017.

Optimization III: Cache Memories

Cache Memories CSE 238/2038/2138: Systems Programming

Ramya Kandasamy CS 147 Section 3

Multilevel Memories (Improving performance using alittle “cash”)

How will execution time grow with SIZE?

The Hardware/Software Interface CSE351 Winter 2013

Section 7: Memory and Caches

Lecture: Cache Hierarchies

Today How’s Lab 3 going? HW 3 will be out today

Cache Memory Presentation I

Morgan Kaufmann Publishers

Code Optimization I: Machine Independent Optimizations

CS 105 Tour of the Black Holes of Computing

The Memory Hierarchy : Memory Hierarchy - Cache

Lecture: Cache Hierarchies

Lecture 21: Memory Hierarchy

Lecture 21: Memory Hierarchy

Bojian Zheng CSCD70 Spring 2018

Lecture 14 Virtual Memory and the Alpha Memory Hierarchy

Memory Hierarchies.

Lecture 08: Memory Hierarchy Cache Performance

November 14 6 classes to go! Read

Lecture 22: Cache Hierarchies, Memory

Compiler Code Optimizations

Siddhartha Chatterjee

Caches III CSE 351 Autumn 2018 Instructor: Justin Hsia

Lecture 22: Cache Hierarchies, Memory

CS-447– Computer Architecture Lecture 20 Cache Memories

Cache Memories Professor Hugh C. Lauer CS-2011, Machine Organization and Assembly Language (Slides include copyright materials from Computer Systems:

ECE 463/563, Microprocessor Architecture, Prof. Eric Rotenberg

Lecture 21: Memory Hierarchy

Loop Optimization “Programs spend 90% of time in loops”

Lecture 9: Caching and Demand-Paged Virtual Memory

Cache Memory and Performance

Optimizing single thread performance

10/18: Lecture Topics Using spatial locality

Caches III CSE 351 Spring 2019 Instructor: Ruth Anderson

Writing Cache Friendly Code

Presentation transcript:

Introduction To Computer Systems Carnegie Mellon Introduction To Computer Systems 15-213/18-243, Spring 2011 Recitation 7 (performance) Monday, February 21

Agenda Performance review Program optimization Carnegie Mellon Agenda Performance review Program optimization Memory hierarchy and caches

Performance Review Program optimization Carnegie Mellon Performance Review Program optimization Efficient programs result from: Good algorithms and data structures Code that the compiler can effectively optimize and turn into efficient executable The topic of program optimization relates to the second

Performance Review (cont) Carnegie Mellon Performance Review (cont) Modern compilers use sophisticated techniques to optimize programs However, Their ability to understand code is limited They are conservative Programmer can greatly influence compiler’s ability to optimize

Optimization Blockers Carnegie Mellon Optimization Blockers Procedure calls Compiler’s ability to perform inter-procedural optimization is limited Solution: replace call by procedure body Can result in much faster programs Inlining and macros can help preserve modularity Loop invariants Expression that do not change in loop body Solution: code motion

Optimization Blockers (cont) Carnegie Mellon Optimization Blockers (cont) Memory aliasing Accessing memory can have side effects difficult for the compiler to analyze (e.g., aliasing) Solution: scalar replacement Copy elements into temporary variables, operate, then store result back Particularly important if memory references are in innermost loop

Loop Unrolling A technique for reducing loop overhead Carnegie Mellon Loop Unrolling A technique for reducing loop overhead Perform more data operations in single iteration Resulting program has fewer iterations, which translates into fewer condition checks and jumps Enables more aggressive scheduling of loops However, too much unrolling can be bad Results in larger code Code may not fit in instruction cache

Other Techniques Out of order processing Branch prediction Carnegie Mellon Other Techniques Out of order processing Branch prediction Less crucial in this class

Caches Definition Performance metrics Memory with short access time Carnegie Mellon Caches Definition Memory with short access time Used for storage of frequently or recently used instructions or data Performance metrics Hit rate Miss rate (commonly used) Miss penalty

Cache Misses Types of misses Carnegie Mellon Cache Misses Types of misses Compulsory: due to cold cache (happens at beginning) Conflict: When referenced data maps to the same block Capacity: when working set is larger than cache

Locality Reason why caches work Temporal locality Spatial locality Carnegie Mellon Locality Reason why caches work Temporal locality Programs tend to use the same data and instructions over and over Spatial locality Program tend to use data and instructions with addresses near to those they have recently used

Carnegie Mellon Memory Hierarchy

Cache Miss Analysis Exercise Carnegie Mellon Cache Miss Analysis Exercise Assume: Cache blocks are 16-byte Only memory accesses are to the entries of grid Determine the cache performance of the following: struct algae_position { int x; int y; } struct algae_position_grid[16][16]; int total_x = 0, total_y = 0, i, j; for (i = 0; i < 16; i++) for (j = 0; j < 16; j++) total_x += grid[i][j].x total_y += grid[i][j].y

Techniques for Increasing Locality Carnegie Mellon Techniques for Increasing Locality Rearranging loops (increases spatial locality) Analyze the cache miss rate for the following: Assume 32-byte lines, array elements are doubles void ijk(A[], B[], C[], n) { int i, j, k; double sum; for (i = 0; i < n; i++) for (j = 0; j < n; j++) { sum = 0.0 for (k = 0; k < n; k++) sum += A[i][k]*B[k][j] C[i][j] += sum } void kij(A[], B[], C[], n) { int i, j, k; double r; for (k = 0; k< n; k++) for (i = 0; i < n; i++) { r = A[i][k]; for (j = 0; j < n; j++) C[i][j] += r*B[k][j]; }

Techniques for Increasing Locality (cont) Carnegie Mellon Techniques for Increasing Locality (cont) Blocking (increases temporal locality) Analyze the cache miss rate for the following: Assume 32-byte lines, array elements are doubles void naive(A[], B[], C[], n) { int i, j, k; for (i = 0; i < n; i++) for (j = 0; j < n; j++) for (k = 0; k < n; k++) C[i][j] += A[i][k]*B[k][j]; } void blocking (A[], B[], C[], n, b) { int i, j, k, i1, j1, k1; for (i = 0; i < n; i += b) for (j = 0; j < n; j += b) for (k = 0; k < n; k += b) for (i1 = i; i1 < (i + b); i1++) for (j1 = j; j1 < (j + b); j1++) for (k1 = k; k1 < (k + b); k1++) c[i1][j1] += A[i1][k1]*B[k1][j1]; }

Questions? Program optimization Writing friendly cache code Cache lab Carnegie Mellon Questions? Program optimization Writing friendly cache code Cache lab