Controlling parallelization in the IBM XL Fortran and C/C++ parallelizing compilers
Priya Unnikrishnan, IBM Toronto Lab, priyau@ca.ibm.com
CASCON 2005, October 2005
Software Group © 2005 IBM Corporation
Overview
- Parallelization in IBM XL compilers
- Outlining
- Automatic parallelization
- Cost analysis
- Controlled parallelization
- Future work
Parallelization
- The IBM XL compilers support Fortran 77/90/95, C, and C++
- They implement both OpenMP and auto-parallelization; both target SMP (shared-memory parallel) machines
- Non-threadsafe code is generated by default
  – Use the _r invocations (xlf_r, xlc_r, ...) to generate threadsafe code
Parallelization options
- -qsmp=noopt : parallelizes code with minimal optimization, to allow better debugging of OpenMP applications
- -qsmp=omp : parallelizes code containing OpenMP directives
- -qsmp=auto : automatically parallelizes loops
- -qsmp=noauto : no auto-parallelization; processes IBM and OpenMP parallel directives
Outlining: the parallelization transformation
Outlining

Original code:

    int main() {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            a[i] = const;
            ......
        }
    }

After outlining, main contains a call into the SMP runtime:

    long main() {
        @_xlsmpEntry0 = _xlsmpInitializeRTE();
        if (n > 0) then
            _xlsmpParallelDoSetup_TPO(2208, &main@OL@1, 0, n, 5, 0,
                                      @_xlsmpEntry0, 0, 0, 0, 0, 0, 0)
        endif
        return main;
    }

and the loop body moves into an outlined routine:

    void main@OL@1(unsigned @LB, unsigned @UB) {
        @CIV1 = 0;
        do {
            a[(long)@LB + @CIV1] = const;
            ......
            @CIV1 = @CIV1 + 1;
        } while ((unsigned)@CIV1 < (@UB - @LB));
        return;
    }
SMP parallel runtime

    _xlsmpParallelDoSetup_TPO(&main@OL@1, 0, n, ...)
        main@OL@1(0, 9)
        main@OL@1(10, 19)
        main@OL@1(20, 29)
        main@OL@1(30, 39)

The outlined function is parameterized: it can be invoked for different ranges in the iteration space.
Auto-parallelization
- Integrated framework for OpenMP and auto-parallelization
- Auto-parallelization is restricted to loops
- Auto-parallelization is done at the link step when possible; this lets various interprocedural analyses and optimizations run before automatic parallelization
Auto-parallelization transformation

    int main() {
        for (int i = 0; i < n; i++) {
            a[i] = const;
            ......
        }
    }

becomes

    int main() {
        #auto-parallel-loop
        for (int i = 0; i < n; i++) {
            a[i] = const;
            ......
        }
    }

followed by outlining.
Auto-parallelizing OpenMP applications

We can auto-parallelize OpenMP applications while skipping user-parallel code, a good thing!

    int main() {
        for (int i = 0; i < n; i++) {      // serial loop: auto-parallelized
            a[i] = const;
            ......
        }
        #pragma omp parallel for           // user-parallel loop: left alone
        for (int j = 0; j < n; j++) {
            b[j] = a[j];
        }
    }

The serial loop gets the #auto-parallel-loop marker and is outlined; the OpenMP loop is handled by its own directive.
Pre-parallelization phase
- Loop normalization (normalize countable loops)
- Scalar privatization
- Array privatization
- Reduction variable analysis
- Loop interchange (where it helps parallelization)
Cost analysis
- Automatic parallelization applies two tests:
  – Dependence analysis: is it safe to parallelize?
  – Cost analysis: is it worthwhile to parallelize?
- Cost analysis estimates the total workload of the loop:
  LoopCost = IterationCount * ExecTimeOfLoopBody
- If the cost is known at compile time, the decision is trivial; runtime cost analysis is more complex
Conditional parallelization

A runtime cost check selects between the parallel and the serial path:

    long main() {
        @_xlsmpEntry0 = _xlsmpInitializeRTE();
        if (n > 0) then
            if (loop_cost > threshold) {
                _xlsmpParallelDoSetup_TPO(2208, &main@OL@1, 0, n, 5, 0,
                                          @_xlsmpEntry0, 0, 0, 0, 0, 0, 0)
            } else
                main@OL@1(0, 0, (unsigned)n, 0)
        endif
        return main;
    }

The original loop and the outlined routine main@OL@1 are unchanged.
Runtime cost analysis challenges

Runtime checks should be:
- Lightweight: they should not introduce large overhead in applications that are mostly serial
- Free of overflow problems, which lead to incorrect, and costly, decisions:
  loopcost = (((c1*n1) + (c2*n2) + const) * n3) * ...
- Restricted to integer operations
- Accurate

All of these factors must be balanced.
Runtime dependence test

The cost check can be combined with a runtime dependence test (work by Peng Zhao):

    long main() {
        @_xlsmpEntry0 = _xlsmpInitializeRTE();
        if (n > 0) then
            if (<runtime dependence test> && loop_cost > threshold) {
                _xlsmpParallelDoSetup_TPO(2208, &main@OL@1, 0, n, 5, 0,
                                          @_xlsmpEntry0, 0, 0, 0, 0, 0, 0)
            } else
                main@OL@1(0, 0, (unsigned)n, 0)
        endif
        return main;
    }

The original loop and the outlined routine main@OL@1 are unchanged.
Controlled parallelization
- Cost analysis selects big loops, but selection alone is not enough
- Parallel performance depends on both the amount of work and the number of processors used
- Using a large number of processors for a small loop causes huge degradations!
[Chart: measured on a 64-way Power5 processor. Small is good!]
Controlled parallelization
- Introduce another runtime parameter, IPT (minimum iterations per thread)
- IPT = function(loop_cost, memory access info, ...)
- The IPT is passed to the SMP runtime
- The SMP runtime limits the number of threads working on the parallel loop based on IPT
Controlled parallelization

IPT is computed in the guard and passed to the runtime as an extra parameter:

    long main() {
        @_xlsmpEntry0 = _xlsmpInitializeRTE();
        if (n > 0) then
            if (loop_cost > threshold) {
                IPT = func(loop_cost)
                _xlsmpParallelDoSetup_TPO(2208, &main@OL@1, 0, n, 5, 0,
                                          @_xlsmpEntry0, 0, 0, 0, 0, 0, IPT)
            } else
                main@OL@1(0, 0, (unsigned)n, 0)
        endif
        return main;
    }

The original loop and the outlined routine main@OL@1 are unchanged.
SMP parallel runtime

    _xlsmpParallelDoSetup_TPO(&main@OL@1, 0, n, ..., IPT) {
        threadsUsed = IterCount / IPT
        if (threadsUsed > threadsAvailable)
            threadsUsed = threadsAvailable
        .....
    }
Controlled parallelization for OpenMP
- Improves performance and scalability
- Allows fine-grained control at loop-level granularity
- Can be applied to OpenMP loops as well
- Adjusts the number of threads when the environment variable OMP_DYNAMIC is turned on
- Issues with threadprivate data
- Encouraging results in galgel
[Chart: measured on a 64-way Power5 processor]
Future work
- Improve the cost analysis algorithm and fine-tune heuristics
- Implement interprocedural cost analysis
- Extend cost analysis and controlled parallelization to non-loop constructs in user-parallel code, for scalability
- Implement interprocedural dependence analysis