Published by Paola Back. Modified over 9 years ago.
1
Improving Loop-Level Parallelism
Chen Jian (陈健), 2002/11
Copyright © 2002 Intel Corporation
2
Agenda
– Introduction
– Who Cares?
– Definition
– Loop Dependence and Removal
– Dependency Identification
– Lab
– Summary
3
Introduction
Loops must meet certain criteria…
– Iteration Independence
– Memory Disambiguation
– High Loop Count
– Etc…
4
Who Cares?
Achieve true parallelism:
– OpenMP
– Auto Parallelization…
Explicit instruction-level parallelism (ILP):
– Streaming SIMD (MMX, SSE, SSE2, …)
– Software Pipelining on Intel® Itanium™ Processor
– Remove Dependencies for the Out-of-Order Core
– More instructions run in parallel on the Intel Itanium Processor
Automatic compiler parallelization:
– High Level Optimizations
5
Definition
Loop Independence: iteration Y of a loop is independent of when or whether iteration X happens.
    int a[MAX];
    for (J=0; J<MAX; J++) {
        a[J] = b[J];
    }
6
Legend
– OpenMP: True Parallelism
– SIMD: Vectorization
– SWP: Software Pipelining
– OOO: Out-of-Order Core
– ILP: Instruction Level Parallelism
– Green: Benefits from concept
– Yellow: Some benefit from concept
– Red: No benefit from concept
7
Agenda
8
Flow Dependency
Read After Write
Cross-Iteration Flow Dependence: variables written then read in different iterations
    for (J=1; J<MAX; J++) {
        A[J]=A[J-1];
    }
First two iterations:
    A[1]=A[0];
    A[2]=A[1];
9
Anti-Dependency
Write After Read
Cross-Iteration Anti-Dependence: variables read then written in different iterations
    for (J=1; J<MAX; J++) {
        A[J]=A[J+1];
    }
First two iterations:
    A[1]=A[2];
    A[2]=A[3];
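An anti-dependence exists only because reads and writes share the same storage. The sketch below shows one common way to remove it, which the slide itself does not show: read from a snapshot of the array and write to the original. The function names `shift_left_in_place` and `shift_left_from_copy` are hypothetical, and the loop bounds are chosen so that `a[j + 1]` stays inside an array of `n` elements.

```c
#include <assert.h>
#include <stddef.h>

/* In place: iteration j reads a[j+1], which iteration j+1 then overwrites.
   Correct only when the iterations run in increasing order. */
void shift_left_in_place(int *a, size_t n)
{
    for (size_t j = 0; j + 1 < n; j++)
        a[j] = a[j + 1];
}

/* From a snapshot: every read comes from old[], every write goes to a[],
   so no iteration can overwrite a value that another one still needs.
   The loop runs in reverse only to show that order no longer matters. */
void shift_left_from_copy(int *a, const int *old, size_t n)
{
    assert(a != old);   /* the snapshot must be separate storage */
    if (n < 2)
        return;
    for (size_t j = n - 1; j >= 1; j--)
        a[j - 1] = old[j];
}
```

The price is the extra copy, so this is worthwhile only when the parallel speedup outweighs the additional memory traffic.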
10
Output Dependency
Write After Write
Cross-Iteration Output Dependence: variables written then written again in a different iteration
    for (J=1; J<MAX; J++) {
        A[J]=B[J];
        A[J+1]=C[J];
    }
First two iterations (A[2] is written by both):
    A[1]=B[1]; A[2]=C[1];
    A[2]=B[2]; A[3]=C[2];
11
Intra-Iteration Dependency
Dependency within an iteration
– Hurts ILP
– May be automatically removed by the compiler
    K = 1;
    for (J=1; J<MAX; J++) {
        A[J]=A[J] + 1;
        B[K]=A[K] + 1;
        K = K + 2;
    }
First iteration:
    A[1] = A[1] + 1;
    B[1] = A[1] + 1;
12
Remove Dependencies
– Best choice
– Requirement for true parallelism
– Not all dependencies can be removed
Before:
    for (J=1; J<MAX; J++) {
        A[J]=A[J-1] + 1;
    }
After:
    for (J=1; J<MAX; J++) {
        A[J]= A[0] + J;
    }
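As a quick check that the rewrite preserves results, the two loops can be wrapped in functions and compared on the same input. This is a minimal sketch; the names `fill_dependent` and `fill_independent` are hypothetical and not from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: iteration j reads the value written by iteration j-1. */
void fill_dependent(int *a, size_t n)
{
    assert(a != NULL && n >= 1);
    for (size_t j = 1; j < n; j++)
        a[j] = a[j - 1] + 1;
}

/* Rewritten form: each iteration reads only a[0] and its own index,
   so the iterations can run in any order. */
void fill_independent(int *a, size_t n)
{
    assert(a != NULL && n >= 1);
    for (size_t j = 1; j < n; j++)
        a[j] = a[0] + (int)j;
}
```

Both functions produce the same array for the same `a[0]`; only the second one has iterations that are independent of each other.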
13
Increasing ILP, without removing dependencies
Good: Unroll Loop
– Make sure the compiler can't or didn't do this for you
– Compiler should not apply common sub-expression elimination
– Also notice that if this is floating point data, precision could be altered
Before:
    for (J=1;J<MAX;J++) {
        A[J] =A[J-1] + B[J];
    }
After:
    for (J=1;J<MAX;J+=2) {
        A[J]=A[J-1] + B[J];
        A[J+1]=A[J-1] + (B[J] + B[J+1]);
    }
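The unrolled loop on the slide steps by two, so a complete version also needs a remainder step when the trip count is odd; otherwise the second statement would write one element past the end. The sketch below adds that step. The function names `prefix_rolled` and `prefix_unrolled` are hypothetical, and integer data is used so that the reassociation `(b[j] + b[j + 1])` cannot change the result.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: every iteration waits for the result of the previous one. */
void prefix_rolled(int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t j = 1; j < n; j++)
        a[j] = a[j - 1] + b[j];
}

/* Unrolled by two: both statements read a[j-1] and neither reads the
   other's result, so the two additions into a[] can issue together. */
void prefix_unrolled(int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t j = 1;
    for (; j + 1 < n; j += 2) {
        a[j]     = a[j - 1] + b[j];
        a[j + 1] = a[j - 1] + (b[j] + b[j + 1]);
    }
    /* Remainder iteration, needed when n - 1 is odd. */
    if (j < n)
        a[j] = a[j - 1] + b[j];
}
```

The dependence on `a[j - 1]` is still there; the unrolled form only shortens the chain from one link per element to one link per pair of elements.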
14
Induction Variables
Induction variables are incremented on each trip through the loop
Fix by replacing increment expressions with a pure function of the loop index
Before:
    i1 = 0;
    i2 = 0;
    for (J=0; J<MAX; J++) {
        i1 = i1 + 1;
        B[i1] = …
        i2 = i2 + J;
        A[i2] = …
    }
After:
    for (J=0; J<MAX; J++) {
        B[J+1] = ...
        A[(J*J + J)/2] = ...
    }
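The closed forms can be checked against the incremented versions. Because i1 starts at 0 and is incremented before it is used, its value in iteration J is J + 1; i2 accumulates 0 + 1 + … + J, which is (J² + J)/2. The sketch below records both index sequences each way; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Incremented form: i1 and i2 carry a value from one iteration to the next. */
void indices_incremental(size_t *i1_out, size_t *i2_out, size_t n)
{
    assert(i1_out != NULL && i2_out != NULL);
    size_t i1 = 0, i2 = 0;
    for (size_t j = 0; j < n; j++) {
        i1 = i1 + 1;
        i2 = i2 + j;
        i1_out[j] = i1;
        i2_out[j] = i2;
    }
}

/* Closed form: each index is a pure function of j, so no iteration
   needs anything computed by another iteration. */
void indices_closed_form(size_t *i1_out, size_t *i2_out, size_t n)
{
    assert(i1_out != NULL && i2_out != NULL);
    for (size_t j = 0; j < n; j++) {
        i1_out[j] = j + 1;
        i2_out[j] = (j * j + j) / 2;
    }
}
```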
15
Reductions
Reductions collapse array data to scalar data via associative operations:
    for (J=0; J<MAX; J++)
        sum = sum + c[J];
– Take advantage of associativity and compute partial sums or local maxima in private storage
– Next, combine partial results into the shared result, taking care to synchronize access
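The partial-sum idea can be sketched without any threading library. The code below is serial and only illustrates the data layout: each of `NUM_PARTS` hypothetical workers owns one slot of a private array, and the results are combined at the end. The name `sum_by_parts` and the value of `NUM_PARTS` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PARTS 4   /* hypothetical number of workers */

/* Sum c[0..n-1] by giving each worker a private partial sum. */
long sum_by_parts(const int *c, size_t n)
{
    assert(c != NULL || n == 0);
    long partial[NUM_PARTS] = {0};

    /* Each value of p touches only partial[p] and its own slice of c,
       so the iterations of this outer loop are independent. */
    for (size_t p = 0; p < NUM_PARTS; p++) {
        size_t begin = n * p / NUM_PARTS;
        size_t end   = n * (p + 1) / NUM_PARTS;
        for (size_t j = begin; j < end; j++)
            partial[p] += c[j];
    }

    /* Combine step: in a threaded version, this is where access to the
       shared result would need to be synchronized. */
    long sum = 0;
    for (size_t p = 0; p < NUM_PARTS; p++)
        sum += partial[p];
    return sum;
}
```

With integer data the result matches the plain loop exactly. With floating-point data the different order of additions can change the rounding.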
16
Data Ambiguity and the Compiler
    void func(int *a, int *b) {
        for (J=0;J<MAX;J++) {
            a[J] = b[J];
        }
    }
Are the loop iterations independent?
– The C++ compiler has no idea
– No chance for optimization: in order to run error-free, the compiler must assume that a and b overlap
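The sketch below shows why the compiler has to be conservative, and one way a programmer can remove the ambiguity in C. The `restrict` qualifier is a C99 keyword, not standard C++; many C++ compilers offer a similar extension such as `__restrict`. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* No promise about aliasing: a and b may overlap, so the compiler must
   keep the iterations in order. */
void copy_may_alias(int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t j = 0; j < n; j++)
        a[j] = b[j];
}

/* restrict is the caller's promise that a and b do not overlap, which
   leaves the compiler free to reorder or vectorize the loop.
   Calling this with overlapping arrays is undefined behavior. */
void copy_no_alias(int *restrict a, const int *restrict b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t j = 0; j < n; j++)
        a[j] = b[j];
}
```

Calling `copy_may_alias(buf + 1, buf, 4)` on `int buf[5] = {1, 2, 3, 4, 5}` turns the copy into `buf[j + 1] = buf[j]`, a cross-iteration flow dependence that propagates `buf[0]` through the whole array. That legal call is the case the compiler has to keep working.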
17
Function Calls
    for (J=0;J<MAX;J++) {
        compute(a[J],b[J]);
        a[J][1]=sin(b[J]);
    }
Generally function calls inhibit ILP
Exceptions:
– Transcendentals
– IPO (interprocedural optimization) compiles
18
Function Calls with State
Many routines maintain state across calls:
– Memory allocation
– Pseudo-random number generators
– I/O routines
– Graphics libraries
– Third-party libraries
Parallel access to such routines is unsafe unless synchronized
Check documentation for specific functions to determine thread-safety
19
A Simple Test
1. Reverse the loop order and rerun in serial
2. If results are unchanged, the loop is independent*
Original:
    for(J=0;J<MAX;J++){
        compute(J,...)
    }
Reversed:
    for(J=MAX-1;J>=0;J--){
        compute(J,...)
    }
*Exception: Loops with induction variables
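The reversal test can be automated for small cases. The sketch below runs a loop body forward and in reverse on copies of the same input and compares the results. The names `loop_body`, `same_when_reversed`, `body_independent`, `body_flow_dependent`, and the bound `TEST_MAX` are all hypothetical. The loops start at index 1 so that the dependent body can read `a[j - 1]`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TEST_MAX 64   /* hypothetical upper bound on the array length */

typedef void (*loop_body)(int *a, size_t j);

/* Independent body: iteration j touches only a[j]. */
void body_independent(int *a, size_t j) { a[j] = a[j] * 2; }

/* Dependent body: iteration j reads a[j-1], written by iteration j-1. */
void body_flow_dependent(int *a, size_t j) { a[j] = a[j - 1] + 1; }

/* Runs body for j = 1..n-1 forward and in reverse on copies of init.
   Returns 1 if both orders give identical arrays, 0 otherwise. */
int same_when_reversed(loop_body body, const int *init, size_t n)
{
    int fwd[TEST_MAX], rev[TEST_MAX];
    assert(n >= 1 && n <= TEST_MAX);
    memcpy(fwd, init, n * sizeof *init);
    memcpy(rev, init, n * sizeof *init);

    for (size_t j = 1; j < n; j++)
        body(fwd, j);
    for (size_t j = n - 1; j >= 1; j--)
        body(rev, j);

    return memcmp(fwd, rev, n * sizeof *init) == 0;
}
```

A mismatch proves that the loop is order-dependent. A match is only evidence of independence for the inputs tried, and, as the slide notes, loops with induction variables are an exception to the test.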
20
Summary
Loop Independence: loop iterations are independent of each other.
Explained its importance
– ILP and Parallelism
Identified common causes of loop dependence
– Flow Dependency, Anti-Dependency, Output Dependency
Taught some methods of fixing loop dependence
Reinforced concepts through lab