Improving Loop-Level Parallelism
Chen Jian (陈健), 2002/11
Copyright © 2002 Intel Corporation
Agenda
– Introduction
– Who Cares?
– Definition
– Loop Dependence and Removal
– Dependency Identification
– Lab
– Summary
Introduction
Loops must meet certain criteria…
– Iteration Independence
– Memory Disambiguation
– High Loop Count
– Etc…
Who Cares?
Achieving true parallelism:
– OpenMP
– Auto Parallelization…
Explicit instruction-level parallelism (ILP):
– Streaming SIMD (MMX, SSE, SSE2, …)
– Software Pipelining on Intel® Itanium™ Processor
– Remove Dependencies for the Out-of-Order Core
– More instructions run in parallel on the Intel Itanium Processor
Automatic compiler parallelization:
– High Level Optimizations
Definition
Loop Independence: Iteration Y of a loop is independent of when or whether iteration X happens
    int a[MAX];
    for (J=0;J<MAX;J++) { a[J] = b[J]; }
Legend
– OpenMP: True Parallelism
– SIMD: Vectorization
– SWP: Software Pipelining
– OOO: Out-of-Order Core
– ILP: Instruction Level Parallelism
– Green: Benefits from concept
– Yellow: Some benefit from concept
– Red: No benefit from concept
Flow Dependency
Read After Write
Cross-Iteration Flow Dependence: Variables written then read in different iterations
    for (J=1; J<MAX; J++) { A[J]=A[J-1]; }
First iterations: A[1]=A[0]; A[2]=A[1];
Anti-Dependency
Write After Read
Cross-Iteration Anti-Dependence: Variables read then written in different iterations
    for (J=1; J<MAX; J++) { A[J]=A[J+1]; }
First iterations: A[1]=A[2]; A[2]=A[3];
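An anti-dependence can often be removed by renaming, that is, by reading from a copy that no iteration writes. The following is a minimal sketch, not part of the original slides; the function names and the `old` snapshot array are hypothetical, and both arrays are assumed to hold `n + 1` elements.

```c
#include <stddef.h>

/* Original form: iteration j reads a[j+1], which iteration j+1 then
   overwrites. Correct only if iterations run in increasing order. */
void shift_in_place(int *a, size_t n)
{
    for (size_t j = 1; j < n; j++)
        a[j] = a[j + 1];
}

/* Renamed form: every read comes from `old`, a snapshot that is never
   written inside the loop, so the iterations can run in any order. */
void shift_from_copy(int *a, const int *old, size_t n)
{
    for (size_t j = 1; j < n; j++)
        a[j] = old[j + 1];
}
```

The price of the renamed form is the extra storage and the time to take the snapshot.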
Output Dependency
Write After Write
Cross-Iteration Output Dependence: Variables written then written again in a different iteration
    for (J=1; J<MAX; J++) { A[J]=B[J]; A[J+1]=C[J]; }
First iterations: A[1]=B[1]; A[2]=C[1]; A[2]=B[2]; A[3]=C[2];
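In this particular loop, every value stored from C except the last one is overwritten by the next iteration, so the loop can be rewritten with a single write per element. A small sketch of that rewrite, assuming `a` holds `n + 1` elements; the function names are hypothetical.

```c
#include <stddef.h>

/* Serial original: a[j+1] is written by iteration j and then
   overwritten by iteration j+1 (write after write). */
void write_twice(int *a, const int *b, const int *c, size_t n)
{
    for (size_t j = 1; j < n; j++) {
        a[j] = b[j];
        a[j + 1] = c[j];
    }
}

/* Same final contents with one write per element: the loop body is
   now independent across iterations. */
void write_once(int *a, const int *b, const int *c, size_t n)
{
    for (size_t j = 1; j < n; j++)
        a[j] = b[j];
    if (n > 1)
        a[n] = c[n - 1];   /* the only value from c that survives */
}
```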
Intra-Iteration Dependency
– Dependency within an iteration
– Hurts ILP
– May be automatically removed by compiler
    K = 1;
    for (J=1; J<MAX; J++) { A[J]=A[J] + 1; B[K]=A[K] + 1; K = K + 2; }
First iteration: A[1] = A[1] + 1; B[1]= A[1] + 1;
Remove Dependencies
– Best Choice
– Requirement for true Parallelism
– Not all dependencies can be removed
With the dependency:
    for (J=1; J<MAX; J++) { A[J]=A[J-1] + 1; }
Dependency removed:
    for (J=1; J<MAX; J++) { A[J]= A[0] + J; }
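The two loops on this slide can be checked against each other. A minimal sketch, with hypothetical function names, showing that the closed form `A[0] + J` produces the same array as the chained version:

```c
#include <stddef.h>

/* Chained form: each element needs the one before it. */
void fill_serial(int *a, size_t n)
{
    for (size_t j = 1; j < n; j++)
        a[j] = a[j - 1] + 1;
}

/* Closed form: each element depends only on a[0] and the loop index,
   so iterations are independent. */
void fill_independent(int *a, size_t n)
{
    const int base = a[0];
    for (size_t j = 1; j < n; j++)
        a[j] = base + (int)j;
}
```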
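Increasing ILP, without removing dependencies
Good: Unroll Loop
Original:
    for (J=1;J<MAX;J++) { A[J] =A[J-1] + B[J]; }
Unrolled by two:
    for (J=1;J<MAX;J+=2) { A[J]=A[J-1] + B[J]; A[J+1]=A[J-1] + (B[J] + B[J+1]); }
– Both statements in the unrolled body depend only on A[J-1], so they can execute in parallel
– Make sure the compiler can't or didn't do this for you
– Compiler should not apply common sub-expression elimination (reusing A[J-1] + B[J] would put the second statement back on the dependence chain)
– Also notice that if this is floating point data, precision could be altered
– As written, the unrolled loop assumes an even number of iterations; otherwise the last pass touches A[MAX] and B[MAX]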
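A sketch of the unrolled loop with a remainder step, so that it stays in bounds for any trip count. The function names are hypothetical, and integer data is used so the two versions match exactly (with floating point the regrouped sum may round differently).

```c
#include <stddef.h>

/* Original loop: one addition per iteration, each waiting on the last. */
void running_sum(int *a, const int *b, size_t n)
{
    for (size_t j = 1; j < n; j++)
        a[j] = a[j - 1] + b[j];
}

/* Unrolled by two: both statements depend only on a[j-1], so the two
   additions that feed them can overlap. */
void running_sum_unrolled(int *a, const int *b, size_t n)
{
    size_t j = 1;
    for (; j + 1 < n; j += 2) {
        a[j] = a[j - 1] + b[j];
        a[j + 1] = a[j - 1] + (b[j] + b[j + 1]);
    }
    if (j < n)              /* remainder when the trip count is odd */
        a[j] = a[j - 1] + b[j];
}
```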
Induction Variables
– Induction variables are incremented on each trip through the loop
– Fix by replacing increment expressions with a pure function of the loop index
With induction variables:
    i1 = 0; i2 = 0;
    for (J=0; J<MAX; J++) { i1 = i1 + 1; B[i1] = …; i2 = i2 + J; A[i2] = …; }
As functions of the loop index (i1 is J+1 and i2 is (J*J + J)/2 at the point of use):
    for (J=0; J<MAX; J++) { B[J+1] = …; A[(J*J + J)/2] = …; }
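A sketch that checks the closed-form indices against the incremented ones. Because `i1` starts at 0 and is incremented before use, its value in iteration J is `J + 1`; `i2` accumulates 0 + 1 + … + J, which is `(J*J + J)/2`. The slide elides the right-hand sides, so the stored value `(int)j` and the function names are assumptions made only for illustration.

```c
#include <stddef.h>

/* Induction-variable form: i1 and i2 carry values between iterations.
   b needs n + 1 elements, a needs (n*n - n)/2 + 1 elements. */
void fill_with_induction(int *a, int *b, size_t n)
{
    size_t i1 = 0, i2 = 0;
    for (size_t j = 0; j < n; j++) {
        i1 = i1 + 1;
        b[i1] = (int)j;       /* hypothetical payload */
        i2 = i2 + j;
        a[i2] = (int)j;       /* hypothetical payload */
    }
}

/* Closed-form version: each index is a pure function of j, so no
   value is carried from one iteration to the next. */
void fill_with_index(int *a, int *b, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        b[j + 1] = (int)j;
        a[(j * j + j) / 2] = (int)j;
    }
}
```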
Reductions
Reductions collapse array data to scalar data via associative operations:
    for (J=0; J<MAX; J++) sum = sum + c[J];
– Take advantage of associativity and compute partial sums or local maxima in private storage
– Next, combine partial results into the shared result, taking care to synchronize access
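The partial-sum pattern can be sketched in plain serial C. This is an illustration of the data layout only, not a threaded implementation; `NUM_PARTS` and the function names are hypothetical. In a real parallel version each chunk would run on its own thread and the final combine step would need synchronization.

```c
#include <stddef.h>

#define NUM_PARTS 4   /* hypothetical number of workers */

/* Reference: the serial reduction from the slide. */
long sum_serial(const int *c, size_t n)
{
    long sum = 0;
    for (size_t j = 0; j < n; j++)
        sum = sum + c[j];
    return sum;
}

long sum_partial(const int *c, size_t n)
{
    long partial[NUM_PARTS] = {0};

    /* Phase 1: each part sums its own contiguous chunk into private
       storage. The NUM_PARTS outer iterations are independent. */
    for (size_t p = 0; p < NUM_PARTS; p++) {
        size_t begin = n * p / NUM_PARTS;
        size_t end = n * (p + 1) / NUM_PARTS;
        for (size_t j = begin; j < end; j++)
            partial[p] += c[j];
    }

    /* Phase 2: combine the partial results into one shared result. */
    long sum = 0;
    for (size_t p = 0; p < NUM_PARTS; p++)
        sum += partial[p];
    return sum;
}
```

Integer addition is used here so both versions agree exactly; with floating point data the regrouping can change rounding.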
Data Ambiguity and the Compiler
    void func(int *a, int *b) { for (J=0;J<MAX;J++) { a[J] = b[J]; } }
– Are the loop iterations independent?
– The C++ compiler cannot tell from the pointers alone
– No chance for optimization: to generate correct code, the compiler must assume that a and b may overlap
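One way to give the compiler the missing information in C is the C99 `restrict` qualifier, shown here as a sketch rather than as the method the slides had in mind. It is a promise by the programmer that the two pointers do not refer to overlapping memory; if the promise is broken the behavior is undefined. `restrict` is not part of standard C++, although many C++ compilers accept `__restrict` as an extension. The function name is hypothetical.

```c
#include <stddef.h>

/* `restrict` tells the compiler that a and b never overlap, so it is
   free to treat the iterations as independent (for example, to
   vectorize the copy). The caller is responsible for that guarantee. */
void copy_no_alias(int *restrict a, const int *restrict b, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j];
}
```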
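Function Calls
    for (J=0;J<MAX;J++) { compute(a[J],b[J]); a[J][1]=sin(b[J]); }
– Generally, function calls inhibit ILP
– Exceptions:
  – Transcendental functions (such as sin)
  – Builds with interprocedural optimization (IPO), where the compiler can see into the called function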
Function Calls with State
Many routines maintain state across calls:
– Memory allocation
– Pseudo-random number generators
– I/O routines
– Graphics libraries
– Third-party libraries
Parallel access to such routines is unsafe unless synchronized
Check documentation for specific functions to determine thread-safety
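To illustrate why hidden state matters, here is a sketch of a toy pseudo-random number generator written two ways. Both functions and the variable `hidden_seed` are hypothetical; the constants are those of a common linear congruential generator and are used for illustration only.

```c
#include <stdint.h>

/* Version with hidden state: every call reads and writes the same
   static variable, so unsynchronized calls from several threads race. */
static uint32_t hidden_seed = 1;

uint32_t next_random_shared(void)
{
    hidden_seed = hidden_seed * 1664525u + 1013904223u;
    return hidden_seed;
}

/* Version with caller-owned state: each thread (or loop iteration)
   can keep its own seed, so no synchronization is needed. */
uint32_t next_random_private(uint32_t *seed)
{
    *seed = *seed * 1664525u + 1013904223u;
    return *seed;
}
```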
A Simple Test
1. Reverse the loop order and rerun in serial
2. If results are unchanged, the loop is Independent*
Original:
    for(J=0;J<MAX;J++){ compute(J,...); }
Reversed:
    for(J=MAX-1;J>=0;J--){ compute(J,...); }
*Exception: Loops with induction variables
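The reversal test can be sketched as a small harness. Everything here is hypothetical scaffolding: the `loop_body` type, the 64-element limit, and the two sample bodies. A matching result on one input is evidence rather than proof; for example, a flow-dependent loop also passes when all input elements happen to be equal.

```c
#include <stddef.h>
#include <string.h>

#define MAX_N 64   /* hypothetical limit for this sketch */

typedef void (*loop_body)(int *a, size_t j);

/* Runs `body` for j = 1 .. n-1 in forward and in reverse order on two
   copies of `init`. Returns 1 if the results match, 0 if they differ
   (or if n exceeds the sketch's limit). */
int same_when_reversed(loop_body body, const int *init, size_t n)
{
    int fwd[MAX_N], rev[MAX_N];
    if (n > MAX_N)
        return 0;
    if (n < 2)
        return 1;                      /* nothing to iterate over */
    memcpy(fwd, init, n * sizeof *init);
    memcpy(rev, init, n * sizeof *init);
    for (size_t j = 1; j < n; j++)
        body(fwd, j);
    for (size_t j = n - 1; j >= 1; j--)
        body(rev, j);
    return memcmp(fwd, rev, n * sizeof *init) == 0;
}

/* Independent body: touches only element j. */
void body_independent(int *a, size_t j) { a[j] = a[j] + 1; }

/* Flow-dependent body: reads what the previous iteration wrote. */
void body_flow_dependent(int *a, size_t j) { a[j] = a[j - 1]; }
```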
Summary
– Loop Independence: loop iterations are independent of each other
– Explained its importance: ILP and parallelism
– Identified common causes of loop dependence: flow dependency, anti-dependency, output dependency
– Taught some methods of fixing loop dependence
– Reinforced concepts through lab