Optimization: The Art of Computing
Intel Challenge experience and other tricks …
Mathieu Gravey
Golden principle of optimizing
[Figure: Algorithm, Implementation, Hardware as optimization levels; axes labeled "…-term" and "Performance"]
Example: Prime Number Algorithm
For i=2 to N
    bool isPrime=true;
    For j=2 to N
        If (mod(i,j)==0 and i != j)
            isPrime=false;
            break;
        end if
    end for
    if (isPrime) add i to the listOfPrimeNumber
End for
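A minimal C++ sketch of this first version, assuming `N` fits in an `int` and `listOfPrimeNumber` is a `std::vector<int>`; the function name `primesNaive` is hypothetical. It keeps the slide's structure, including the `i != j` guard, so a prime `i` costs N-1 modulo operations.

```cpp
#include <vector>

// Naive trial division following the slide: every j from 2 to n is tried,
// and the `i != j` guard stops i from being "divided" by itself.
// Returns the primes in [2, n] in increasing order.
std::vector<int> primesNaive(int n) {
    std::vector<int> listOfPrimeNumber;
    for (int i = 2; i <= n; ++i) {
        bool isPrime = true;
        for (int j = 2; j <= n; ++j) {
            if (i % j == 0 && i != j) {
                isPrime = false;
                break;  // first proper divisor found: i is composite
            }
        }
        if (isPrime) listOfPrimeNumber.push_back(i);
    }
    return listOfPrimeNumber;
}
```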
Example: Prime Number Algorithm
For i=2 to N
    bool isPrime=true;
    For j=2 to i-1
        If (mod(i,j)==0)
            isPrime=false;
            break;
        end if
    end for
    if (isPrime) add i to the listOfPrimeNumber
End for
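A C++ sketch of this version, with a hypothetical `modCount` output parameter added only to make the work visible: stopping `j` at `i-1` removes the need for the `i != j` test, and for N = 10 the loop performs 15 modulo operations.

```cpp
#include <vector>

// Trial division with the inner loop stopped at i-1, so i is never tested
// against itself. modCount (hypothetical, for illustration) receives the
// number of modulo operations performed.
std::vector<int> primesBelowI(int n, long long& modCount) {
    std::vector<int> listOfPrimeNumber;
    modCount = 0;
    for (int i = 2; i <= n; ++i) {
        bool isPrime = true;
        for (int j = 2; j <= i - 1; ++j) {
            ++modCount;
            if (i % j == 0) {
                isPrime = false;
                break;
            }
        }
        if (isPrime) listOfPrimeNumber.push_back(i);
    }
    return listOfPrimeNumber;
}
```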
Example: Prime Number Algorithm
For i=2 to N
    bool isPrime=true;
    For j=2 to √i
        If (mod(i,j)==0)
            isPrime=false;
            break;
        end if
    end for
    if (isPrime) add i to the listOfPrimeNumber
End for
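The √i bound works because a composite `i = a·b` always has a factor `a ≤ √i`. A C++ sketch, again with a hypothetical `modCount` counter: for N = 10 it performs 8 modulo operations. The bound is written `j <= i / j` rather than calling `sqrt`, which avoids floating-point rounding and the `j * j` overflow near `INT_MAX`.

```cpp
#include <vector>

// Trial division up to sqrt(i). `j <= i / j` is equivalent to j*j <= i for
// positive integers, without computing j*j (which could overflow).
// modCount (hypothetical, for illustration) counts the modulo operations.
std::vector<int> primesSqrtBound(int n, long long& modCount) {
    std::vector<int> listOfPrimeNumber;
    modCount = 0;
    for (int i = 2; i <= n; ++i) {
        bool isPrime = true;
        for (int j = 2; j <= i / j; ++j) {
            ++modCount;
            if (i % j == 0) {
                isPrime = false;
                break;
            }
        }
        if (isPrime) listOfPrimeNumber.push_back(i);
    }
    return listOfPrimeNumber;
}
```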
Example: Prime Number Algorithm
// the job
For i=2 to N
    bool isPrime=true;
    For j=2 to √i
        If (mod(i,j)==0)
            isPrime=false;
            break;
        end if
    end for
    if (isPrime) add i to the listOfPrimeNumber
End for
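One reading of the "// the job" annotation is that each outer iteration is an independent unit of work: testing `i` reads nothing that another iteration writes. A minimal OpenMP sketch under that assumption (the name `primesParallel` is hypothetical). Flags are written per index and gathered serially afterwards, so the list stays in order. `schedule(dynamic, 64)` is one choice for balancing the load, since the cost per `i` grows with √i.

```cpp
#include <cstddef>
#include <vector>

// Parallel outer loop: each i is tested independently.
// Compile with -fopenmp (GCC/Clang) or -qopenmp (Intel); without OpenMP the
// pragma is ignored and the loop simply runs serially.
std::vector<int> primesParallel(int n) {
    // char rather than bool: std::vector<bool> packs bits, so different
    // threads writing neighbouring elements would race.
    std::size_t size = n > 1 ? static_cast<std::size_t>(n) + 1 : 0;
    std::vector<char> isPrimeFlag(size, 0);
#pragma omp parallel for schedule(dynamic, 64)
    for (int i = 2; i <= n; ++i) {
        bool isPrime = true;
        for (int j = 2; j <= i / j; ++j) {
            if (i % j == 0) {
                isPrime = false;
                break;
            }
        }
        isPrimeFlag[i] = isPrime;
    }
    // Serial gather keeps listOfPrimeNumber in increasing order.
    std::vector<int> listOfPrimeNumber;
    for (int i = 2; i <= n; ++i)
        if (isPrimeFlag[i]) listOfPrimeNumber.push_back(i);
    return listOfPrimeNumber;
}
```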
Example: Prime Number Algorithm
// the job
For i=2 to N
    bool isPrime=true;
    // vectorize the job
    For j=2 to √i
        isPrime = isPrime && (mod(i,j)!=0);
    end for
    if (isPrime) add i to the listOfPrimeNumber
End for
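Removing the `break` makes every iteration do the same work, which is what a vectorizer needs. A C++ sketch of the idea, with hypothetical names. In C++, `&&` short-circuits and can reintroduce a branch, so the sketch accumulates with `&=` into an `int` and requests a SIMD reduction (`#pragma omp simd`, honored with `-fopenmp` or `-fopenmp-simd`, ignored otherwise). Whether it actually vectorizes depends on the compiler and CPU: mainstream x86 SIMD instruction sets have no integer-division instruction, so the gain on the modulo itself may be small.

```cpp
#include <cmath>
#include <vector>

// Largest r with r*r <= n (n >= 0); the loops correct any rounding of std::sqrt.
int isqrtFloor(int n) {
    int r = static_cast<int>(std::sqrt(static_cast<double>(n)));
    while (r > 0 && r > n / r) --r;      // r*r > n: too big
    while (r + 1 <= n / (r + 1)) ++r;    // (r+1)^2 <= n: too small
    return r;
}

// Branch-free inner loop: every j in [2, sqrt(i)] is tested, no early exit.
bool isPrimeBranchless(int i) {
    if (i < 2) return false;
    const int limit = isqrtFloor(i);   // loop-invariant bound, as omp simd requires
    int nonDivisible = 1;              // stays 1 while no divisor has been seen
#pragma omp simd reduction(& : nonDivisible)
    for (int j = 2; j <= limit; ++j)
        nonDivisible &= (i % j != 0);
    return nonDivisible != 0;
}

std::vector<int> primesBranchless(int n) {
    std::vector<int> listOfPrimeNumber;
    for (int i = 2; i <= n; ++i)
        if (isPrimeBranchless(i)) listOfPrimeNumber.push_back(i);
    return listOfPrimeNumber;
}
```

The trade-off: composites no longer stop at their first divisor, so the branch-free version does more arithmetic in exchange for a predictable, vectorizable loop body.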
Example: Prime Number Algorithm
add 2 to the listOfPrimeNumber
// the job
For i=3 to N step 2
    bool isPrime=true;
    // vectorize the job
    For j=3 to √i step 2
        isPrime = isPrime && (mod(i,j)!=0);
    end for
    if (isPrime) add i to the listOfPrimeNumber
End for
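Since 2 is the only even prime, it can be added up front and both loops can step by 2 over odd numbers only, roughly halving the candidates and the divisors. A C++ sketch (the name `primesOddOnly` is hypothetical), keeping the branch-free update from the slide.

```cpp
#include <vector>

// Odd-only trial division: 2 is added first, then only odd i are tested,
// and only odd divisors j are tried (an odd i has no even divisor).
std::vector<int> primesOddOnly(int n) {
    std::vector<int> listOfPrimeNumber;
    if (n >= 2) listOfPrimeNumber.push_back(2);
    for (int i = 3; i <= n; i += 2) {
        bool isPrime = true;
        for (int j = 3; j <= i / j; j += 2)
            isPrime &= (i % j != 0);  // branch-free update, no early exit
        if (isPrime) listOfPrimeNumber.push_back(i);
    }
    return listOfPrimeNumber;
}
```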
Example: Prime Number Algorithm
// the job
For i=2 to N
    bool isPrime=true;
    // vectorize the job
    For j in listOfPrimeNumber while j ≤ √i
        isPrime = isPrime && (mod(i,j)!=0);
    end for
    if (isPrime) add i to the listOfPrimeNumber (in order)
End for
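Dividing only by primes already found is enough: a composite `i` has a prime factor `≤ √i`, and because the list is built in increasing order, all such primes are already in it. Note the bound must be `j ≤ √i`; with a strict `<`, squares of primes such as 4 and 9 would be reported as prime. A C++ sketch (the name `primesFromList` is hypothetical); unlike the slide, it exits the inner loop early.

```cpp
#include <cstddef>
#include <vector>

// Trial division by the primes found so far, stopping once j > sqrt(i).
std::vector<int> primesFromList(int n) {
    std::vector<int> listOfPrimeNumber;
    for (int i = 2; i <= n; ++i) {
        bool isPrime = true;
        for (std::size_t k = 0; k < listOfPrimeNumber.size(); ++k) {
            const int j = listOfPrimeNumber[k];
            if (j > i / j) break;  // j*j > i: no further divisor is possible
            if (i % j == 0) {
                isPrime = false;
                break;
            }
        }
        if (isPrime) listOfPrimeNumber.push_back(i);  // appended in order
    }
    return listOfPrimeNumber;
}
```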
Example: Prime Number Algorithm
add 2 and 3 to the listOfPrimeNumber
// the job
For i=5 to N, only i with mod(i,6)==1 or mod(i,6)==5 (i.e. i = 6k±1)
    bool isPrime=true;
    // vectorize the job
    For j in listOfPrimeNumber while j ≤ √i
        isPrime = isPrime && (mod(i,j)!=0);
    end for
    if (isPrime) add i to the listOfPrimeNumber (in order)
End for
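Every prime above 3 has the form 6k±1 (the other residues mod 6 are divisible by 2 or 3), so only a third of the integers need testing. A C++ sketch under that reasoning, with the hypothetical name `primesWheel6`: it seeds the list with 2 and 3, walks `i = 5, 7, 11, 13, …`, and skips 2 and 3 as divisors since no candidate is divisible by them.

```cpp
#include <cstddef>
#include <vector>

// 6k±1 wheel: candidates are i = 6k-1 and i+2 = 6k+1, tested against the
// primes already found, up to sqrt(i).
std::vector<int> primesWheel6(int n) {
    std::vector<int> primes;
    if (n >= 2) primes.push_back(2);
    if (n >= 3) primes.push_back(3);
    auto isPrime = [&primes](int i) {
        // Candidates are never divisible by 2 or 3, so start at primes[2] (= 5).
        for (std::size_t k = 2; k < primes.size(); ++k) {
            const int p = primes[k];
            if (p > i / p) break;  // p*p > i
            if (i % p == 0) return false;
        }
        return true;
    };
    for (int i = 5; i <= n; i += 6) {
        if (isPrime(i)) primes.push_back(i);                   // 6k-1
        if (i + 2 <= n && isPrime(i + 2)) primes.push_back(i + 2);  // 6k+1
    }
    return primes;
}
```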
Basic principles
Pareto principle
Structure
Parallelization
Vectorization
(Image: inotes4you.files.wordpress.com)
Basic principles
Start with the main issues: global view, critical issues
Monkey development: start simple, go to complex
Iterative process
Optimizing: start by slowing down
Global picture!
(Image: http://bestofpicture.com/)
Rules / Guidelines
Be lazy: don’t reinvent the wheel
Don’t be idle
Design patterns
Global variables are your enemies
Don’t overgeneralize
Rules / Guidelines
Trust the compiler
Simple for you = simple for the compiler | computer
Share your knowledge with the compiler
Rules / Guidelines
Think different: try, change and try again …
Don’t aim for the best, but for something good, then better
Concrete trick: memory
Array vs. list
Prefetch | random access
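A small C++ sketch of the array-vs-list contrast, with hypothetical helper names. `std::vector` stores elements contiguously, so a linear scan walks consecutive cache lines that the hardware prefetcher can fetch ahead, and any element is reachable in O(1). `std::list` puts each element in its own node reached through a pointer, so each step may land on an unrelated cache line and there is no random access.

```cpp
#include <chrono>
#include <list>
#include <numeric>
#include <vector>

// Contiguous storage: sequential, prefetch-friendly scan.
long long sumVector(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// Node-based storage: each step follows a pointer to the next node.
long long sumList(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), 0LL);
}

// Times a single call of f in microseconds (repeat for real measurements).
template <class F>
long long timeMicroseconds(F f) {
    auto start = std::chrono::steady_clock::now();
    volatile long long sink = f();  // volatile store keeps the call from being optimised away
    auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Timing both sums on a large container (for instance a few million elements) is one way to observe the prefetch effect; the ratio depends on the machine and on how scattered the list nodes are in memory.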
Concrete trick: first step
Optimization
Compiler optimization: icpc myCodeFile -O3 -xHost -o myCompiledProgram
⚠ -g
const: no writes
inline
restrict / __restrict__: no reloads after writes (no aliasing)
Loop unrolling
__builtin_expect((x),(y))
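A sketch of how these hints look in source, with a hypothetical `scaleAndClamp` function. `const` tells the compiler the input is never written; `__restrict__` promises that `in` and `out` do not overlap, so a store to `out[i]` cannot change `in`; `__builtin_expect` marks the clamp as the rare path. `__restrict__` and `__builtin_expect` are extensions provided by GCC, Clang and the Intel compiler, not standard C++.

```cpp
#include <cstddef>

// Branch-prediction hints (compiler extensions, not standard C++).
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// const: `in` is read-only. __restrict__: `in` and `out` never alias, so the
// compiler may keep values in registers and vectorize without overlap checks.
inline void scaleAndClamp(const float* __restrict__ in, float* __restrict__ out,
                          std::size_t n, float scale, float maxValue) {
    for (std::size_t i = 0; i < n; ++i) {
        float v = in[i] * scale;
        // Clamping is assumed rare; the hint keeps the common path as fall-through.
        if (UNLIKELY(v > maxValue)) v = maxValue;
        out[i] = v;
    }
}
```

Calling it with overlapping `in` and `out` would break the `__restrict__` promise and is undefined behavior, so the hint is only safe when the caller guarantees separate buffers.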
Concrete trick: OpenMP
Vectorization => SIMD: #pragma omp simd
    multiple operations with one instruction
    ⚠ non-aligned data
Multi-thread
    communication through the L3 cache
    shared memory
How to use:
#pragma omp parallel for default(none) shared(x,y) firstprivate(array) reduction(max:MaxValue) schedule(static)
for(int i=0; i<10000; i++){
    something …
}
#pragma omp critical
#pragma omp barrier
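A minimal sketch of the two levels, with hypothetical function names: a multi-threaded `parallel for` with `default(none)` and a `max` reduction (OpenMP 3.1 or later), and a single-thread `simd` loop with a `+` reduction. A reduction gives each thread (or SIMD lane) a private partial result that OpenMP combines at the end, which avoids serializing every update through `#pragma omp critical`. Without `-fopenmp` the pragmas are ignored and both functions run serially with the same results.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Multi-thread max: each thread keeps a private maxValue (initialised to the
// lowest double), and the partial maxima are combined after the loop.
double parallelMax(const std::vector<double>& data) {
    double maxValue = std::numeric_limits<double>::lowest();
    const double* d = data.data();  // pointer to const, the pointer itself is not const
    long n = static_cast<long>(data.size());  // non-const so shared(n) is valid on all GCC versions
#pragma omp parallel for default(none) shared(d, n) reduction(max : maxValue) schedule(static)
    for (long i = 0; i < n; ++i) {
        if (d[i] > maxValue) maxValue = d[i];
    }
    return maxValue;
}

// SIMD dot product: each lane accumulates a partial sum, so the summation
// order (and, for non-integer data, the rounding) may differ from a serial loop.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    double sum = 0.0;
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
#pragma omp simd reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) sum += x[i] * y[i];
    return sum;
}
```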
Multi-chip | Multi-socket
NUMA (non-uniform memory access): remote memory is slower than local memory
Position in memory => first touch
Parallelize the initialisation with schedule(static)
Read-only data => copy in each local memory
Thread affinity
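A first-touch sketch, assuming an OS that places a page on the NUMA node of the thread that first writes it (the default policy on Linux). The array is allocated without value-initialization, then filled in parallel with `schedule(static)`; later loops using the same schedule, bounds and thread count then mostly touch pages local to each thread. The helper names are hypothetical. A `std::vector<double>(n)` would instead zero every element on the calling thread, placing all pages on that thread's node.

```cpp
#include <memory>

// Allocate without initialising the elements, then let each thread write
// (first-touch) the block of indices that schedule(static) assigns to it.
std::unique_ptr<double[]> allocateFirstTouch(long n) {
    std::unique_ptr<double[]> a(new double[n]);  // elements left uninitialised
    double* p = a.get();
#pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) p[i] = static_cast<double>(i);
    return a;
}

// Same schedule(static) and bounds as the initialisation, so each thread
// mainly reads the pages it touched first.
double sumStatic(const double* p, long n) {
    double sum = 0.0;
#pragma omp parallel for schedule(static) reduction(+ : sum)
    for (long i = 0; i < n; ++i) sum += p[i];
    return sum;
}
```

Thread affinity keeps this placement useful: pinning threads, for example with the standard OpenMP environment variables `OMP_PROC_BIND=close` and `OMP_PLACES=cores`, stops the OS from migrating a thread away from the node that holds its pages.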
Questions?