1
Automatic optimization of parallel linear algebra software
Domingo Giménez: Department of Programming, Languages and Systems. Teaching: Algorithms and Parallel Programming. Javier’s Ph.D. director, in collaboration with José Gonzalez (Department of Computer Architecture).
Javier Cuenca: Department of Computer Architecture. Teaching: Computer Structure. Ph.D. student: Automatic optimization of parallel linear algebra software.
University of Murcia, Spain
ICL, September 2001
2
Current Situation of Linear Algebra Parallel Routines
Linear algebra: highly optimizable operations, but the optimizations are platform-specific.
Traditional method: hand-optimization for each platform.
  Time-consuming
  Incompatible with hardware evolution
  Incompatible with changes in the system (architecture and basic libraries)
  Unsuitable for systems with variable workload
  Misuse by non-expert users
3
Solutions to this situation?
Some groups and projects: ATLAS, GrADS, LAWRA, FLAME, I-LIB.
But the problem is very complex.
4
Our approach
Parameterised routines: system parameters and algorithmic parameters.
System parameters, obtained at installation time:
  An analytical model of the routine and simple installation routines are used to obtain the system parameters
  A reduced number of executions at installation time
Algorithmic parameters:
  Obtained from the analytical model, with the system parameters obtained in the installation process
5
Our approach: the scheme
[Diagram of the scheme. Design phase labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. Installation phase labels: SYSTEM MANAGER, BL, EXECUTION OF LAR-ERs, LAR-IF, LAR-SPF, OAP SELECTION, LAR-OAPF, INCLUSION PROCESS, LIBRARY.]
6
Design: Modelling the LAR
[Diagram, design phase. Labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD.]
7
LAR-MOD: Analytical Model of LAR
The behaviour of the algorithm on the platform is defined by:
  Texec = f(SPs, n, APs)
where
  SPs = f(n, APs): System Parameters
  APs: Algorithmic Parameters
  n: Problem Size
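As a rough illustration of the form Texec = f(SPs, n, APs), the sketch below codes a model as a plain function. The cost formula, the names `SystemParams` and `t_exec`, and the numeric values are assumptions made only for illustration; they are not the model presented in the talk. The SP names (ts, tw, tc) and AP names (b, p) follow the ones used on the later slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemParams:
    ts: float  # start-up time of a message (seconds)
    tw: float  # time to send one word (seconds)
    tc: float  # cost of one arithmetic operation (seconds)


def t_exec(sp: SystemParams, n: int, b: int, p: int) -> float:
    """Hypothetical model: arithmetic shared among p processes, plus one
    message of n*b words per block step when more than one process is used."""
    arithmetic = (2.0 * n**3 / 3.0) * sp.tc / p
    steps = n / b
    communication = 0.0 if p == 1 else steps * (sp.ts + n * b * sp.tw)
    return arithmetic + communication


# Illustrative values only; real SPs would come from the installation step.
sp = SystemParams(ts=1e-4, tw=1e-8, tc=1e-9)
for p in (1, 4, 8):
    print(p, t_exec(sp, n=2048, b=64, p=p))
```

Keeping the model a pure function of SPs, n and APs means the same code can be evaluated for many candidate AP values once the SPs are known.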
8
LAR-MOD: Analytical Model of LAR
System Parameters (SPs):
  Hardware platform: physical characteristics, current conditions
  Basic libraries
LARs performance
9
LAR-MOD: Analytical Model of LAR
System Parameters (SPs):
  Hardware platform: physical characteristics, current conditions
  Basic libraries
Two kinds of SPs:
  Communication System Parameters (CSPs)
  Arithmetic System Parameters (ASPs)
LARs performance
10
LAR-MOD: Analytical Model of LAR
System Parameters (SPs):
  Hardware platform: physical characteristics, current conditions
  Basic libraries
Two kinds of SPs:
  Communication System Parameters (CSPs):
    ts: start-up time
    tw: word-sending time
  Arithmetic System Parameters (ASPs)
LARs performance
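One common way to obtain ts and tw is to time messages of several sizes and fit a straight line. The sketch below assumes the usual linear model time = ts + tw * size, which the slides do not state explicitly; the function name `fit_ts_tw` is hypothetical, and the timings are synthetic values generated from known parameters only to check the fit.

```python
def fit_ts_tw(sizes, times):
    """Least-squares fit of time = ts + tw * size (assumed linear model)."""
    k = len(sizes)
    mean_m = sum(sizes) / k
    mean_t = sum(times) / k
    sxx = sum((m - mean_m) ** 2 for m in sizes)
    sxy = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes, times))
    tw = sxy / sxx               # slope: time per word
    ts = mean_t - tw * mean_m    # intercept: start-up time
    return ts, tw


# Synthetic timings built from known values, not measurements.
true_ts, true_tw = 5e-5, 2e-8
sizes = [1_000, 10_000, 100_000, 1_000_000]
times = [true_ts + true_tw * m for m in sizes]

ts, tw = fit_ts_tw(sizes, times)
print(ts, tw)
```

On a real platform the `times` list would hold measured point-to-point transfer times for the same kind of communication the routine performs.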
11
LAR-MOD: Analytical Model of LAR
System Parameters (SPs):
  Hardware platform: physical characteristics, current conditions
  Basic libraries
Two kinds of SPs:
  Communication System Parameters (CSPs)
  Arithmetic System Parameters (ASPs):
    tc: arithmetic cost. Using BLAS: k1, k2 and k3
LARs performance
12
LAR-MOD: Analytical Model of LAR
System Parameters (SPs):
  Hardware platform: physical characteristics, current conditions
  Basic libraries
How to estimate each SP?
  1. Obtain the kernel of performance cost of the LAR
  2. Make an Estimation Routine from this kernel
LARs performance
13
Design
[Diagram, design phase. Labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD.]
14
Design: Making the LAR-ERs
[Diagram, design phase. Labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs.]
15
LAR-ERs: Estimation Routines
Arithmetic System Parameters (ASPs):
  Estimation Routine made from the computation kernel of the LAR
    Similar storage scheme
    Similar quantity of data
Communication System Parameters (CSPs):
  Estimation Routine made from the communication kernel of the LAR
    Similar kind of communication
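A minimal sketch of what an estimation routine for an ASP could look like: time a computation kernel and divide by its operation count to get a cost per arithmetic operation (tc). The kernel here is a plain matrix multiplication on flat row-major lists, chosen as an assumption; in the approach described above the kernel, storage scheme and data quantity would mirror those of the actual LAR. The names `matmul_kernel` and `estimate_tc` are hypothetical.

```python
import time


def matmul_kernel(a, b, c, n):
    """c += a * b for n x n matrices stored as flat row-major lists."""
    for i in range(n):
        for k in range(n):
            aik = a[i * n + k]
            for j in range(n):
                c[i * n + j] += aik * b[k * n + j]


def estimate_tc(n, repeats=3):
    """Return the best observed time per floating-point operation."""
    a = [1.0] * (n * n)
    b = [2.0] * (n * n)
    best = float("inf")
    for _ in range(repeats):
        c = [0.0] * (n * n)
        start = time.perf_counter()
        matmul_kernel(a, b, c, n)
        best = min(best, time.perf_counter() - start)
    return best / (2.0 * n**3)  # the kernel performs 2*n^3 operations


tc = estimate_tc(n=48)
print(f"estimated tc: {tc:.3e} s per operation")
```

Taking the minimum over a few repetitions is one simple way to reduce the effect of transient load; a system with variable workload might instead call for re-estimation at run time.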
16
Design
[Diagram, design phase. Labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs.]
17
Design: Process has finished
[Diagram, design phase, annotated HAND-MADE, ONLY ONCE. Labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs.]
18
Installation: Running the LAR-ERs
[Diagram. Design phase labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. Installation phase labels: SYSTEM MANAGER, BL, EXECUTION OF LAR-ERs, LAR-IF, LAR-SPF.]
19
Installation: obtaining the OAP
[Diagram. Design phase labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. Installation phase labels: SYSTEM MANAGER, BL, EXECUTION OF LAR-ERs, LAR-IF, LAR-SPF, OAP SELECTION, LAR-OAPF.]
20
Installation: obtaining the OAP
Algorithmic Parameters (APs)
Once the SP values are known, the optimum values for the APs (OAP) are calculated:
  b: block size
  p: number of processors
  r × c: logical topology, grid configuration (logical 2D mesh)
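The selection step can be sketched as an exhaustive search: evaluate the model for every candidate (b, r, c) and keep the combination with the smallest predicted time. Everything below is an assumption for illustration: the cost formula in `model_time`, the function `select_oap`, the candidate block sizes and the SP values are hypothetical and are not taken from the talk.

```python
from itertools import product


def model_time(n, b, r, c, ts, tw, tc):
    """Hypothetical cost model for an r x c logical process grid."""
    p = r * c
    arithmetic = (2.0 * n**3 / 3.0) * tc / p
    if p == 1:
        return arithmetic
    steps = n / b
    communication = steps * ((r + c) * ts + n * b * (1.0 / r + 1.0 / c) * tw)
    return arithmetic + communication


def select_oap(n, ts, tw, tc, max_procs, block_sizes):
    """Try every (b, r, c) with r * c <= max_procs and keep the cheapest."""
    best = None
    grid_sides = range(1, max_procs + 1)
    for b, r, c in product(block_sizes, grid_sides, grid_sides):
        if r * c > max_procs:
            continue
        t = model_time(n, b, r, c, ts, tw, tc)
        if best is None or t < best[0]:
            best = (t, {"b": b, "p": r * c, "r": r, "c": c})
    return best


# Illustrative SP values; real ones would come from the estimation routines.
t, oap = select_oap(n=2048, ts=1e-4, tw=1e-8, tc=1e-9,
                    max_procs=8, block_sizes=(16, 32, 64, 128))
print(oap, t)
```

Because the search only evaluates a formula, it is cheap compared with running the routine itself, which is what allows the choice to be made with a reduced number of executions.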
21
Installation
[Diagram. Design phase labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. Installation phase labels: SYSTEM MANAGER, BL, EXECUTION OF LAR-ERs, LAR-IF, LAR-SPF, OAP SELECTION, LAR-OAPF.]
22
Installation: putting it all together
[Diagram. Design phase labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. Installation phase labels: SYSTEM MANAGER, BL, EXECUTION OF LAR-ERs, LAR-IF, LAR-SPF, OAP SELECTION, LAR-OAPF, INCLUSION PROCESS, LIBRARY.]
23
Installation process finished
[Diagram. Design phase labels: LAR-DESIGNER, LAR, MODELLING, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. Installation phase labels: SYSTEM MANAGER, BL, EXECUTION OF LAR-ERs, LAR-IF, LAR-SPF, OAP SELECTION, LAR-OAPF, INCLUSION PROCESS, LIBRARY.]
24
Experiments
LAR: One-sided Block Jacobi Method to solve the Symmetric Eigenvalue Problem. Platform: SGI Origin 2000
LAR: Gaussian elimination. Platform: NoW (heterogeneous system)
LAR: block LU factorization. Platforms: IBM SP2, SGI Origin 2000, NoW
Basic libraries: reference BLAS, machine BLAS, ATLAS
25
Jacobi on Origin 2000
[Figure: comparison of execution times using different sets of Algorithmic Parameters (8 processors).]
26
LU on IBM SP2
[Figure: quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 and 8 processors.]
27
LU on Origin 2000
[Figure: quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4, 8 and 16 processors.]
28
LU on NoW
[Figure: quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 processors, using machine BLAS and ATLAS as basic libraries.]
29
Gaussian elimination on Heterogeneous NoW
[Figure with panels Homogeneous, Hybrid and Heterogeneous: quotient between the execution time with the parameters from the Installation Routine and the optimum execution time.]
30
Future Work
We are trying to develop a methodology valid for a wide range of systems, and to include it in the design of linear algebra libraries:
  The methodology must be analysed on more systems and with more routines
  The basic linear algebra library to use can be considered as another parameter
  An installation strategy common to a set of routines must be developed
  At the moment we are analysing routines individually, but it could be preferable to analyse algorithmic schemes