Optimization of C Code The C for Speed

1 Optimization of C Code The C for Speed
Ahmed Helmi Maaroufi Pang Hui EL 2310 Scientific Programming

2 Why optimize your C code?
For lower memory consumption. For faster computation. These two goals sometimes conflict with each other, so you may have to find a balance.

3 First Things First!
Optimize your algorithm before you even start writing code. There is no point in optimizing code that is slow by design. Make the common case fast. Use a profiler to identify performance bottlenecks.

4 So what can we do?
Line level: peephole optimization
Memory level: properly arrange data types; quick access to large arrays; hot and cold data separation
Function level: strength reduction; jumps/branches; condition checking; loop tricks
Compiler level: avoid memory aliasing; function calls using global variables

5 Peephole Optimization
Performed over a very small set of instructions in a segment of generated code. Works by recognizing sets of instructions that can be replaced by shorter or faster sets of instructions.

6 Peephole Optimization
Prefer the compound operators +=, -=, *=, and /= over +, -, *, and / where they apply. When the old value is not needed, use the prefix form (++obj) instead of the postfix form (obj++); this matters mainly for objects and rarely for plain C integers. Use the shift operators >> and << instead of integer division and multiplication by powers of two, where possible. Testing whether a value equals zero is often cheaper than comparing two arbitrary numbers.

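7 Peephole Optimization
Before:
    x = y % 32;
    x = y * 8;
    x = y / w + z / w;
    if (a == b && c == d && e == f) {...}
After:
    x = y & 31;
    x = y << 3;
    x = (y + z) / w;
    if (((a - b) | (c - d) | (e - f)) == 0) {...}
The & and << forms match the originals only for unsigned or non-negative y. The merged division is exact in real arithmetic but can differ under integer division (1/2 + 1/2 is 0, while (1 + 1)/2 is 1) or floating-point rounding.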
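A minimal sketch of when these rewrites hold, assuming unsigned operands. The function names are hypothetical and chosen only for illustration. The last two helpers make the integer-division mismatch visible.

```c
/* Remainder by a power of two: for unsigned y, y % 32 equals y & 31. */
unsigned int mod32(unsigned int y) { return y & 31u; }

/* Multiplication by a power of two as a left shift (unsigned). */
unsigned int times8(unsigned int y) { return y << 3; }

/* Three equality tests folded into one comparison. Unsigned subtraction
 * wraps around, so a - b is zero exactly when a == b. */
int all_equal(unsigned int a, unsigned int b,
              unsigned int c, unsigned int d,
              unsigned int e, unsigned int f)
{
    return ((a - b) | (c - d) | (e - f)) == 0;
}

/* Integer division truncates, so these two are NOT interchangeable. */
int split_div(int y, int z, int w)  { return y / w + z / w; }
int merged_div(int y, int z, int w) { return (y + z) / w; }
```

With y = 1, z = 1, w = 2, split_div gives 0 and merged_div gives 1, so the division rewrite is only safe when that difference is acceptable.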

8 Strength Reduction Replace expensive operations with equivalent but less expensive operations. Many compilers will do this for you automatically. The classic example of strength reduction converts "strong" multiplications inside a loop into "weaker" additions – something that frequently occurs in array addressing.

9 Strength Reduction
Before:
    c = 7;
    for (i = 0; i < N; i++) { y[i] = c * i; }
After:
    c = 7;
    k = 0;
    for (i = 0; i < N; i++) { y[i] = k; k = k + c; }
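The same transformation as a pair of complete functions, as a sketch. The names fill_multiply and fill_add are hypothetical; both are expected to produce the same array as long as c * i does not overflow.

```c
#include <stddef.h>

/* Before: one multiplication per iteration. */
void fill_multiply(int *y, size_t n, int c)
{
    for (size_t i = 0; i < n; i++)
        y[i] = c * (int)i;
}

/* After: the multiplication is replaced by a running sum. */
void fill_add(int *y, size_t n, int c)
{
    int k = 0;                 /* k always holds c * i */
    for (size_t i = 0; i < n; i++) {
        y[i] = k;
        k += c;
    }
}
```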

10 Minimize jumps/branches
The elimination of branching is an important concern with today's deeply pipelined processor architectures. "Mispredicted" branches often cost many cycles.

11 Minimize jumps/branches
Use inline functions for short functions to eliminate function-call overhead. Move loops inside functions instead of calling a function on every iteration. Prefer iteration over recursion.

12 Minimize jumps/branches
Before:
    for (i = 0; i < N; i++) { DoSomething(i); }
After:
    void DoSomething(int N) { int i; for (i = 0; i < N; i++) {…} }
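A small sketch of moving the loop inside the function. The scaling task and all names here are hypothetical stand-ins for DoSomething. An optimizing compiler may inline scale_one anyway, so the gain is not guaranteed.

```c
#include <stddef.h>

/* Before: the caller pays one function call per element. */
double scale_one(double x, double factor) { return x * factor; }

void scale_each(double *v, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++)
        v[i] = scale_one(v[i], factor);
}

/* After: a single call; the loop lives inside the function. */
void scale_all(double *v, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= factor;
}
```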

13 Minimize Condition Checking
Checking a condition does no useful processing by itself. Whenever possible, replace chains of if statements with a switch statement. If a switch is not possible, put the most common cases at the beginning of the if chain. Try to remove the "else" clause if the probabilities are lopsided.
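A sketch of both forms, assuming a hypothetical Op enumeration in which OP_ADD is the most common case. Whether the switch is actually faster depends on the compiler and the target.

```c
typedef enum { OP_ADD, OP_SUB, OP_MUL } Op;

/* if chain: the case assumed to be most common is tested first. */
int apply_if_chain(Op op, int a, int b)
{
    if (op == OP_ADD) return a + b;
    if (op == OP_SUB) return a - b;
    return a * b;              /* remaining case: OP_MUL */
}

/* switch: the compiler is free to emit a jump table or a compare tree. */
int apply_switch(Op op, int a, int b)
{
    switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    }
    return 0;                  /* not reached for valid Op values */
}
```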

14 Loop Tricks
Loop unrolling: reduce the number of iterations and replicate the loop body to reduce loop overhead.
Loop jamming: combine adjacent loops that iterate over the same range of the same variable.
Early loop breaking: leave the loop as soon as the result is known; it is often not necessary to process the entire loop.

15 Loop Unrolling
Before:
    for (int i = 0; i < 1000; i++) { a[i] = b[i] + c[i]; }
After:
    for (int i = 0; i < 1000; i += 2) { a[i] = b[i] + c[i]; a[i+1] = b[i+1] + c[i+1]; }
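The unrolled loop on the slide relies on the trip count (1000) being even. A sketch for an arbitrary length n, using a hypothetical function add_unrolled, handles the leftover element separately:

```c
#include <stddef.h>

/* Unrolled by two, with a tail step for odd n. */
void add_unrolled(int *a, const int *b, const int *c, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {      /* two elements per iteration */
        a[i]     = b[i]     + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
    }
    if (i < n)                       /* at most one element remains */
        a[i] = b[i] + c[i];
}
```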

16 Loop Jamming
Before:
    for (i = 0; i < MAX; i++)
        for (j = 0; j < MAX; j++)
            a[i][j] = 0.0;
    for (i = 0; i < MAX; i++)
        a[i][i] = 1.0;
After:
    for (i = 0; i < MAX; i++) {
        for (j = 0; j < MAX; j++)
            a[i][j] = 0.0;
        a[i][i] = 1.0;
    }
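Both versions as complete functions, so that they can be compared side by side. The value MAX = 4 and the function names are assumptions made for this sketch.

```c
#define MAX 4   /* assumed size, for illustration only */

/* Before: two separate passes over the row index. */
void identity_separate(double a[MAX][MAX])
{
    int i, j;
    for (i = 0; i < MAX; i++)
        for (j = 0; j < MAX; j++)
            a[i][j] = 0.0;
    for (i = 0; i < MAX; i++)
        a[i][i] = 1.0;
}

/* After: the diagonal is set while the row is still being processed. */
void identity_jammed(double a[MAX][MAX])
{
    int i, j;
    for (i = 0; i < MAX; i++) {
        for (j = 0; j < MAX; j++)
            a[i][j] = 0.0;
        a[i][i] = 1.0;
    }
}
```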

17 Early loop breaking
Before:
    found = false;
    for (i = 0; i < 10000; i++) {
        if (list[i] == -99) found = true;
    }
    if (found) printf("…");
After:
    found = false;
    for (i = 0; i < 10000; i++) {
        if (list[i] == -99) { found = true; break; }
    }
    if (found) printf("…");
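A sketch of the early-exit search as a reusable function. The name contains and the visited counter are hypothetical; the counter exists only to show how many elements were examined. Note that the break must sit inside the braces of the if, otherwise the loop would stop after its first iteration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if target occurs in list[0..n-1]. *visited receives the
 * number of elements examined before the loop ended. */
bool contains(const int *list, size_t n, int target, size_t *visited)
{
    bool found = false;
    size_t i;
    for (i = 0; i < n; i++) {
        if (list[i] == target) {
            found = true;
            i++;               /* count the matching element too */
            break;             /* nothing more to learn: stop here */
        }
    }
    *visited = i;
    return found;
}
```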

18 Memory and Cache
Arrange the members of a structure so that they align well. To align data in memory, the compiler inserts empty padding bytes between the members. Try to avoid casts where possible. Integer and floating-point instructions often operate on different registers, so a cast between the two requires copying the value from one register set to the other.

19 Memory and Cache
Pass pointers to large objects instead of copying them. Access data in the order in which it is stored in physical memory: traverse a matrix row by row. Use memcpy() or structure assignment to copy large arrays, and memset() to clear them.
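A sketch of the two traversal orders for a C matrix, which is stored row by row. The dimensions and function names are assumptions. Both functions return the same sum; only the memory access pattern differs, which is what matters for the cache on large matrices.

```c
#define ROWS 3   /* assumed dimensions, for illustration only */
#define COLS 4

/* Row by row: the inner loop walks adjacent addresses. */
long sum_by_rows(int m[ROWS][COLS])
{
    long total = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            total += m[i][j];
    return total;
}

/* Column by column: each inner step jumps COLS ints ahead in memory. */
long sum_by_columns(int m[ROWS][COLS])
{
    long total = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            total += m[i][j];
    return total;
}
```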

20 Memory and Cache
Before:
    int a[3][3][3];
    int b[3][3][3];
    ...
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            for (k = 0; k < 3; k++)
                b[i][j][k] = a[i][j][k];
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            for (k = 0; k < 3; k++)
                a[i][j][k] = 0;
After:
    typedef struct { int element[3][3][3]; } Three3DType;
    Three3DType a, b;
    ...
    b = a;
    memset(&a, 0, sizeof(a));
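The "After" version as a compilable sketch. The helper copy_and_clear is hypothetical. Because a is a structure rather than an array, memset needs its address.

```c
#include <string.h>

typedef struct { int element[3][3][3]; } Three3DType;

/* Copies *src into *dst with one assignment, then zeroes *src. */
void copy_and_clear(Three3DType *dst, Three3DType *src)
{
    *dst = *src;                     /* structure assignment copies the array */
    memset(src, 0, sizeof *src);     /* clear all 27 ints in one call */
}
```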

21 Hot & Cold Data Separation: splitting your data structures into frequently accessed ("hot") and rarely accessed ("cold") sections.

22 Hot & Cold Data Separation
Before:
    struct Customer { int ID; int AccountNumber; char Name[128]; char Address[256]; };
    struct Customer customers[1000];
After:
    struct CustomerData { char Name[128]; char Address[256]; };
    struct CustomerAccount { int ID; int AccountNumber; struct CustomerData *pData; };
    struct CustomerAccount customers[1000];
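A sketch of why the split helps, with the structures rewritten as typedefs and lower-case member names (my own naming, not the slide's). A loop that touches only the hot fields steps over small records instead of striding past 384 bytes of name and address each time. The function sum_account_numbers is a hypothetical example of such a loop.

```c
#include <stddef.h>

/* Cold data: rarely accessed, kept out of the hot array. */
typedef struct { char name[128]; char address[256]; } CustomerData;

/* Hot data: small records, so many fit in one cache line. */
typedef struct { int id; int account_number; CustomerData *data; } CustomerAccount;

/* Unsplit layout, kept only for the size comparison. */
typedef struct { int id; int account_number; char name[128]; char address[256]; } Customer;

/* Touches only hot fields; the cold data is never loaded. */
long sum_account_numbers(const CustomerAccount *c, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += c[i].account_number;
    return total;
}
```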

23 Data Alignment
Before:
    struct structure1 { int id1; char name1; int id2; char name2; float percentage; };
After:
    struct structure1 { int id1; int id2; char name1; char name2; float percentage; };
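A sketch that makes the padding measurable. The structure names are hypothetical. On a typical platform with 4-byte int and float, the interleaved layout is expected to take 20 bytes and the grouped one 16, but the exact numbers are implementation-defined.

```c
#include <stddef.h>

/* Each char is followed by padding so the next int stays aligned. */
struct Interleaved { int id1; char name1; int id2; char name2; float percentage; };

/* The two chars share one padded slot. */
struct Grouped { int id1; int id2; char name1; char name2; float percentage; };

/* Bytes saved by grouping members of the same type together. */
size_t bytes_saved(void)
{
    return sizeof(struct Interleaved) - sizeof(struct Grouped);
}
```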

24 Compiler
Write source code that the compiler can effectively optimize into efficient executable code. It is therefore important to understand the capabilities and limitations of optimizing compilers.

25 Compiler Optimization in GCC
Most compilers, including GCC, provide users with some control over which optimizations they apply. The simplest control is to specify the optimization level: invoking GCC with the command-line flag -O1, -O2, or -O3 causes it to apply a different level of optimization. Optimization may expand the program size and make the program more difficult to debug with standard debugging tools.

26 Safe Optimizations Compilers must be careful to apply only safe optimizations to a program. In performing only safe optimizations, the compiler must assume that different pointers may be aliased.

27 Memory Aliasing Therefore, memory aliasing can severely limit the opportunities for a compiler to generate optimized code. Programmers using GCC must put more effort into writing programs in a way that simplifies the compiler’s task of generating efficient code.

28 Memory Aliasing void twiddle1(int *xp, int *yp) { *xp += *yp; }
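The slide shows only a fragment, so the following is a sketch of a commonly used illustration rather than the slide's own code. The functions add_twice and add_double are hypothetical. They agree when xp and yp point to different objects and disagree when both point to the same one, which is why the compiler may not rewrite the first as the second.

```c
/* Adds *yp to *xp twice, reading *yp from memory each time. */
void add_twice(int *xp, int *yp)
{
    *xp += *yp;
    *xp += *yp;
}

/* Looks equivalent, but reads *yp only once. */
void add_double(int *xp, int *yp)
{
    *xp += 2 * *yp;
}
```

With distinct x = 3 and y = 2, both functions leave x at 7. When both pointers refer to the same x = 3, add_twice yields 12 (3 becomes 6, then 12), while add_double yields 9.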

29 Function Calls Using Global Variables
Most compilers do not try to determine whether a function is free of side effects. Instead, they assume the worst case and leave function calls intact. Code involving function calls can be optimized by a process known as inline substitution, in which the function call is replaced by the code of the function body.

30 Function Calls Using Global Variables
    int f();
    int func1() { return f() + f() + f() + f(); }
    int func2() { return 4 * f(); }
Assume this case:
    int counter = 0;
    int f() { return counter++; }
This f has a side effect: it modifies part of the global program state. Changing the number of times it is called changes the program's behavior, so the compiler cannot rewrite func1 as func2.
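The case above as a compilable sketch. The helper reset_counter is hypothetical and added only so that both functions can start from the same state.

```c
static int counter = 0;

/* Side effect: every call changes the global counter. */
int f(void) { return counter++; }

int func1(void) { return f() + f() + f() + f(); }  /* calls f four times */
int func2(void) { return 4 * f(); }                /* calls f once */

void reset_counter(void) { counter = 0; }
```

Starting from counter = 0, func1 returns 0 + 1 + 2 + 3 = 6 and func2 returns 4 * 0 = 0, so the two functions are not interchangeable.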

31 Questions ?

