
Automatic optimization of parallel linear algebra software


1 Automatic optimization of parallel linear algebra software
Domingo Giménez, Department of Programming, Languages and Systems (teaches Algorithms and Parallel Programming). Javier's Ph.D. director, in collaboration with José Gonzalez (Department of Computer Architecture).
Javier Cuenca, Department of Computer Architecture (teaches Computer Structure). Ph.D. student: Automatic optimization of parallel linear algebra software.
University of Murcia, Spain
ICL, September 2001

2 Current Situation of Parallel Linear Algebra Routines
Linear algebra operations are highly optimizable, but the optimizations are platform-specific. The traditional method, hand-optimization for each platform, is:
- time-consuming
- incompatible with hardware evolution
- incompatible with changes in the system (architecture and basic libraries)
- unsuitable for systems with a variable workload
- prone to misuse by non-expert users

3 Solutions to this situation?
Some groups and projects: ATLAS, GrADS, LAWRA, FLAME, I-LIB. But the problem is very complex.

4 Our approach
Parameterised routines, with two kinds of parameters: system parameters and algorithmic parameters.
- System parameters are obtained at installation time, using an analytical model of the routine and simple installation routines. Only a reduced number of executions is needed at installation time.
- Algorithmic parameters are obtained from the analytical model, using the system parameters measured in the installation process.

5 Our approach: the scheme
[Diagram of the scheme. DESIGN (LAR-DESIGNER): LAR modelling → LAR-MOD; implementation of LAR-ERs → LAR-ERs. INSTALLATION (SYSTEM MANAGER, with BL and LAR-IF): execution of LAR-ERs → LAR-SPF; OAP selection → LAR-OAPF; inclusion process → LIBRARY.]

6 Design: Modelling the LAR
[Diagram: LAR-DESIGNER; LAR modelling → LAR-MOD.]

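7 LAR-MOD: Analytical Model of the LAR
The model defines the behaviour of the algorithm on the platform:
Texec = f(SPs, n, APs), where the system parameters may themselves depend on the problem size and the algorithmic parameters, SPs = f(n, APs).
Here SPs are the system parameters, APs the algorithmic parameters and n the problem size.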
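The slide gives only the general form of the model. The sketch below instantiates it for a hypothetical block LU factorization on a logical r × c process grid. The cost formula, the `SystemParams` and `AlgorithmicParams` classes and the function `t_exec` are illustrative assumptions, not the authors' model. It shows the point made by SPs = f(n, APs): the per-flop costs k2 and k3 are looked up for the chosen block size before the model is evaluated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SystemParams:
    """Hypothetical system parameters (SPs), all in seconds."""
    ts: float                   # message start-up time
    tw: float                   # time to send one word
    k2: Callable[[int], float]  # cost per flop of BLAS-2 work, as a function of block size b
    k3: Callable[[int], float]  # cost per flop of BLAS-3 work, as a function of block size b


@dataclass
class AlgorithmicParams:
    b: int  # block size
    r: int  # rows of the logical r x c process grid
    c: int  # columns of the logical process grid


def t_exec(sps: SystemParams, n: int, aps: AlgorithmicParams) -> float:
    """Toy model Texec = f(SPs, n, APs) for a block LU factorization (illustrative only)."""
    p = aps.r * aps.c
    # SPs = f(n, APs): the per-flop costs are evaluated for the chosen block size.
    k2, k3 = sps.k2(aps.b), sps.k3(aps.b)
    # Arithmetic: trailing-matrix updates (about 2n^3/3 BLAS-3 flops over p processes)
    # plus panel factorizations (about n^2*b/2 BLAS-2 flops over one column of r processes).
    t_arith = k3 * 2 * n**3 / (3 * p) + k2 * n**2 * aps.b / (2 * aps.r)
    # Communication: n/b steps with two broadcasts each (panel along rows,
    # row block along columns); roughly n^2/2 * (1/r + 1/c) words per process.
    t_comm = sps.ts * 2 * n / aps.b + sps.tw * n**2 / 2 * (1 / aps.r + 1 / aps.c)
    return t_arith + t_comm
```

Evaluating `t_exec` for several candidate `AlgorithmicParams` with the same `SystemParams` lets configurations be compared without running the routine itself.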

8 LAR-MOD: Analytical Model of the LAR
System Parameters (SPs) describe what determines the performance of LARs: the hardware platform (its physical characteristics and current conditions) and the basic libraries.

9 LAR-MOD: Analytical Model of the LAR
There are two kinds of SPs: Communication System Parameters (CSPs) and Arithmetic System Parameters (ASPs).

10 LAR-MOD: Analytical Model of the LAR
Communication System Parameters (CSPs): ts, the start-up time, and tw, the word-sending time.
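One common way to obtain ts and tw is to time messages of several sizes and fit the linear model t(m) = ts + m·tw by least squares. The timing can be done, for example, with an MPI ping-pong that halves the round-trip time. This procedure is assumed here, not taken from the slides. The function name `fit_ts_tw` is hypothetical, and only the fitting step is shown because the timing itself is platform-specific.

```python
def fit_ts_tw(sizes, times):
    """Least-squares fit of the linear communication model t(m) = ts + m * tw.

    sizes: message sizes in words; times: measured one-way times in seconds.
    Returns the tuple (ts, tw).
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    n = len(sizes)
    mean_m = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((m - mean_m) ** 2 for m in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes, times))
    tw = sxy / sxx            # slope: cost per word
    ts = mean_t - tw * mean_m  # intercept: start-up cost
    return ts, tw
```

The message sizes should span the range that the LAR actually sends, since ts and tw fitted on very different sizes may not describe the routine's communications well.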

11 LAR-MOD: Analytical Model of the LAR
Arithmetic System Parameters (ASPs): tc, the arithmetic cost; when BLAS is used, k1, k2 and k3.
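The slides do not show the cost formula itself. As an illustration only, a model built from these SPs often takes the generic form below. It assumes that k1, k2 and k3 denote the cost of one floating-point operation performed through BLAS level 1, 2 and 3 routines respectively, and that they may depend on the APs (for example the block size b). F_1, F_2, F_3, M and W are hypothetical counts that depend on the particular LAR:

```latex
T_{exec}(n, \mathrm{APs}) \approx k_3 F_3(n, \mathrm{APs}) + k_2 F_2(n, \mathrm{APs}) + k_1 F_1(n, \mathrm{APs}) + t_s M(n, \mathrm{APs}) + t_w W(n, \mathrm{APs})
```

Here F_i is the number of flops done in level-i BLAS calls, and M and W are the numbers of messages and words sent on the critical path. Without BLAS, the three arithmetic terms reduce to tc multiplied by the total flop count.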

12 LAR-MOD: Analytical Model of the LAR
How is each SP estimated?
1. Obtain the kernel of the LAR that determines its performance cost.
2. Build an estimation routine from this kernel.

13 Design
[Diagram: as on slide 6.]

14 Design: Making the LAR-ERs
[Diagram: as on slide 6, plus implementation of LAR-ERs → LAR-ERs.]

15 LAR-ERs: Estimation Routines
Arithmetic System Parameters (ASPs): the computation kernel of the LAR becomes an estimation routine that uses a similar storage scheme and a similar quantity of data.
Communication System Parameters (CSPs): the communication kernel of the LAR becomes an estimation routine that uses a similar kind of communication.
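A minimal sketch of an arithmetic estimation routine, assuming that k3 is measured per block size by timing a block update and dividing by its flop count. The function names `estimate_cost_per_flop` and `make_block_update` are hypothetical. The pure-Python triple loop is only a stand-in for the real computation kernel, which in an actual installation would call the same BLAS routine the LAR uses (e.g. `dgemm`) on data stored the same way.

```python
import time
from typing import Callable


def estimate_cost_per_flop(kernel: Callable[[], None], flops: float, repeats: int = 5) -> float:
    """Run `kernel` several times and return seconds per floating-point operation.

    The minimum over the repetitions is kept to filter out interference
    from other processes on the machine.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel()
        best = min(best, time.perf_counter() - start)
    return best / flops


def make_block_update(b: int) -> Callable[[], None]:
    """Stand-in computation kernel: C += A * B on b x b blocks (2 * b**3 flops)."""
    a = [[1.0] * b for _ in range(b)]
    bm = [[1.0] * b for _ in range(b)]
    c = [[0.0] * b for _ in range(b)]

    def kernel() -> None:
        for i in range(b):
            ci, ai = c[i], a[i]
            for k in range(b):
                aik, bk = ai[k], bm[k]
                for j in range(b):
                    ci[j] += aik * bk[j]  # one multiply and one add per iteration

    return kernel


if __name__ == "__main__":
    # k3 may depend on the block size, so it is estimated for each candidate b.
    for b in (8, 16, 32):
        k3 = estimate_cost_per_flop(make_block_update(b), 2 * b**3)
        print(f"b={b:3d}  k3 ~ {k3:.3e} s/flop")
```

A communication estimation routine follows the same pattern. It times the kind of message exchange the LAR performs (broadcasts, point-to-point sends) and derives ts and tw from the measured times.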

16 Design
[Diagram: as on slide 14.]

17 Design: the process has finished
[Diagram: as on slide 14.] The design is hand-made, and it is done only once.

18 Installation: Running the LAR-ERs
[Diagram: the design phase as on slide 17, plus INSTALLATION (SYSTEM MANAGER, with BL and LAR-IF): execution of LAR-ERs → LAR-SPF.]

19 Installation: obtaining the OAP
[Diagram: as on slide 18, plus OAP selection → LAR-OAPF.]

20 Installation: obtaining the OAP
Algorithmic Parameters (APs): once the SP values are known, the optimum values of the APs (OAP) are calculated:
- b, the block size
- p, the number of processors
- r × c, the logical topology (grid configuration as a logical 2D mesh)
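Once the SPs are known, the OAP can be found by evaluating the model instead of running the routine. Below is a minimal sketch that assumes a brute-force search over a small candidate set, since the slides do not say which search strategy is used. `select_oap`, `Model` and `toy_model` are hypothetical names, and the toy cost formula and its SP values are illustrative only.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

# model(n, b, r, c) -> predicted execution time in seconds
Model = Callable[[int, int, int, int], float]


def select_oap(model: Model, n: int, block_sizes: Iterable[int],
               max_procs: int) -> Optional[Tuple[int, int, int, float]]:
    """Exhaustively search for the (b, r, c) that minimizes the model.

    Every logical grid r x c with r * c <= max_procs is tried, so the number
    of processors p = r * c is chosen together with its topology.
    Returns (b, r, c, predicted_time), or None if block_sizes is empty.
    """
    grids = [(r, c) for r in range(1, max_procs + 1)
             for c in range(1, max_procs // r + 1)]
    best = None
    for b, (r, c) in product(block_sizes, grids):
        t = model(n, b, r, c)
        if best is None or t < best[3]:
            best = (b, r, c, t)
    return best


if __name__ == "__main__":
    def toy_model(n: int, b: int, r: int, c: int) -> float:
        # Hypothetical SP values; a real installation would read the
        # values measured by the estimation routines.
        ts, tw, k2, k3 = 5e-5, 1e-8, 4e-9, 1e-9
        p = r * c
        arith = k3 * 2 * n**3 / (3 * p) + k2 * n**2 * b / (2 * r)
        # Toy assumption: broadcast start-ups grow linearly with the grid size.
        comm = ts * (n / b) * (r + c) + tw * n**2 / 2 * (1 / r + 1 / c)
        return arith + comm

    print(select_oap(toy_model, n=2048, block_sizes=[16, 32, 64, 128], max_procs=8))
```

Each model evaluation costs only a few arithmetic operations. An exhaustive search over block sizes and grid shapes is therefore cheap compared with executing the routine for every combination.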

21 Installation
[Diagram: as on slide 19.]

22 Installation: putting it all together
[Diagram: the complete scheme, as on slide 5: slide 19 plus inclusion process → LIBRARY.]

23 Installation process finished
[Diagram: the complete scheme, as on slide 5.]

24 Experiments
- LAR: one-sided block Jacobi method for the symmetric eigenvalue problem. Platform: SGI Origin 2000.
- LAR: Gaussian elimination. Platform: NoW (network of workstations; heterogeneous system).
- LAR: block LU factorization. Platforms: IBM SP2, SGI Origin 2000, NoW.
- Basic libraries: reference BLAS, machine BLAS, ATLAS.

25 Jacobi on Origin 2000
Comparison of execution times using different sets of algorithmic parameters (8 processors).

26 LU on IBM SP2
Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 and 8 processors.

27 LU on Origin 2000
Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4, 8 and 16 processors.

28 LU on NoW
Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 processors, using machine BLAS and ATLAS as basic libraries.
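The quotient reported in slides 26 to 28 can be computed as below. This assumes that the "optimum execution time" is the fastest time measured over all parameter combinations tried experimentally. The function name `quotient` and the dictionary layout are hypothetical, and the timings in the usage example are invented for illustration, not data from the slides.

```python
from typing import Dict, Hashable


def quotient(measured: Dict[Hashable, float], model_choice: Hashable) -> float:
    """Execution time with the model-selected parameters divided by the optimum.

    `measured` maps each tested parameter combination (e.g. (b, r, c)) to its
    measured execution time; the optimum is the fastest of them. A result of
    1.0 means the model picked the best combination; 1.10 means 10% slower.
    """
    t_opt = min(measured.values())
    return measured[model_choice] / t_opt


if __name__ == "__main__":
    # Hypothetical timings in seconds, keyed by (b, r, c).
    times = {(32, 1, 4): 10.0, (64, 2, 2): 8.0, (16, 4, 1): 12.0}
    print(quotient(times, (32, 1, 4)))  # 1.25
```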

29 Gaussian elimination on Heterogeneous NoW
Panels: homogeneous, hybrid and heterogeneous systems. Quotient between the execution time with the parameters from the installation routine and the optimum execution time.

30 Future Work
We aim to develop a methodology valid for a wide range of systems and to include it in the design of linear algebra libraries. To this end:
- the methodology must be analysed on more systems and with more routines;
- the basic linear algebra library to use can be considered as another parameter;
- an installation strategy common to a set of routines must be developed;
- at the moment we analyse routines individually, but it could be preferable to analyse algorithmic schemes.
