Javier Cuenca, José González Department of Ingeniería y Tecnología de Computadores Domingo Giménez Department of Informática y Sistemas University of Murcia.


1 Towards the Design of an Automatically Tuned Linear Algebra Library
Javier Cuenca, José González (Department of Ingeniería y Tecnología de Computadores)
Domingo Giménez (Department of Informática y Sistemas)
University of Murcia, Spain

2 Current Situation of Linear Algebra Parallel Routines
- Linear algebra: highly optimizable operations, but the optimizations are platform-specific
- Traditional method: hand-optimization for each platform
  - Time-consuming
  - Incompatible with hardware evolution
  - Incompatible with changes in the system (architecture and basic libraries)
  - Unsuitable for systems with variable workload
  - Misuse by non-expert users

3 Solutions to this situation?
Some groups and projects: ATLAS, GrADS, LAWRA, FLAME, I-LIB.
But the problem is very complex.

4 Our approach
- Parameterised routines: system parameters and algorithmic parameters
- System parameters are obtained at installation time
  - An analytical model of the routine and simple installation routines are used to obtain the system parameters
  - Only a reduced number of executions is needed at installation time
- Algorithmic parameters are obtained from the analytical model, using the system parameters obtained in the installation process

5 Our approach: the scheme
[Diagram labels. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD, IMPLEMEN. OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUT. OF LAR-ERs, LAR-IF, BL, LAR-SPF, OAP SELECTION, LAR-OAPF, LIBRARY INCLUSION PROCESS.]

6 Design: Modelling the LAR
[Diagram labels. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD.]

7 LAR-MOD: Analytical Model of LAR
The behaviour of the algorithm on the platform is defined by
$T_{exec} = f(SPs, n, APs)$
- $SPs = f(n, APs)$: System Parameters
- $APs$: Algorithmic Parameters
- $n$: Problem Size
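
The relation between execution time, system parameters, problem size and algorithmic parameters can be sketched as a plain function. The cost formula below is a hypothetical model for a block-based parallel routine, chosen only for illustration; it is not the model used by the authors. The names `SystemParams`, `AlgParams` and `t_exec` are assumptions, and the sketch treats the SPs as constants, whereas the slide allows them to depend on n and the APs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemParams:
    t_c: float  # arithmetic cost per floating-point operation (seconds)
    t_s: float  # communication start-up time (seconds)
    t_w: float  # word-sending time (seconds per word)


@dataclass(frozen=True)
class AlgParams:
    b: int  # block size
    p: int  # number of processors


def t_exec(sp: SystemParams, n: int, ap: AlgParams) -> float:
    """Hypothetical model: T_exec = f(SPs, n, APs).

    Assumed shape: a cubic arithmetic term shared among p processors,
    plus one message of b * n words per block step when p > 1.
    """
    arithmetic = (2.0 / 3.0) * n**3 * sp.t_c / ap.p
    if ap.p == 1:
        return arithmetic  # sequential case: no communication
    steps = n / ap.b  # assumed number of block steps
    communication = steps * (sp.t_s + ap.b * n * sp.t_w)
    return arithmetic + communication


if __name__ == "__main__":
    sp = SystemParams(t_c=1e-9, t_s=1e-5, t_w=1e-8)
    for p in (1, 4, 8):
        print(p, t_exec(sp, 2000, AlgParams(b=32, p=p)))
```

With a model of this kind, choosing the algorithmic parameters reduces to evaluating the function for candidate values of b and p instead of running the routine itself.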

8 LAR-MOD: Analytical Model of LAR
System Parameters (SPs) describe what determines the LAR's performance:
- Hardware platform: physical characteristics and current conditions
- Basic libraries

9 LAR-MOD: Analytical Model of LAR
System Parameters (SPs) describe what determines the LAR's performance:
- Hardware platform: physical characteristics and current conditions
- Basic libraries
Two kinds of SPs:
- Communication System Parameters (CSPs)
- Arithmetic System Parameters (ASPs)

10 LAR-MOD: Analytical Model of LAR
System Parameters (SPs) describe what determines the LAR's performance:
- Hardware platform: physical characteristics and current conditions
- Basic libraries
Two kinds of SPs:
- Communication System Parameters (CSPs): $t_s$ (start-up time) and $t_w$ (word-sending time)
- Arithmetic System Parameters (ASPs)
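
As an illustration of how the two CSPs might be obtained from timings, the sketch below fits t_s and t_w by least squares, assuming the usual linear message-cost model time = t_s + t_w * words. The function name `fit_linear_cost` and the sample timings are hypothetical; the slides do not say how the authors fit these values.

```python
def fit_linear_cost(sizes, times):
    """Least-squares fit of time = t_s + t_w * size.

    sizes: message sizes in words; times: measured transfer times in seconds.
    Returns (t_s, t_w).
    """
    count = len(sizes)
    if count < 2 or count != len(times):
        raise ValueError("need at least two (size, time) pairs")
    mean_x = sum(sizes) / count
    mean_y = sum(times) / count
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    t_w = sxy / sxx                # slope: cost per word
    t_s = mean_y - t_w * mean_x    # intercept: start-up cost
    return t_s, t_w


if __name__ == "__main__":
    # Synthetic timings generated from assumed values t_s = 50 us, t_w = 10 ns.
    sizes = [1000, 2000, 4000, 8000]
    times = [5e-5 + 1e-8 * m for m in sizes]
    print(fit_linear_cost(sizes, times))
```

In a real installation the (size, time) pairs would come from timing actual message exchanges between processors rather than from synthetic data.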

11 LAR-MOD: Analytical Model of LAR
System Parameters (SPs) describe what determines the LAR's performance:
- Hardware platform: physical characteristics and current conditions
- Basic libraries
Two kinds of SPs:
- Communication System Parameters (CSPs)
- Arithmetic System Parameters (ASPs): $t_c$ (arithmetic cost). Using BLAS: $k_1$, $k_2$ and $k_3$

12 LAR-MOD: Analytical Model of LAR
System Parameters (SPs) describe what determines the LAR's performance:
- Hardware platform: physical characteristics and current conditions
- Basic libraries
How to estimate each SP?
1. Obtain the kernel that accounts for the performance cost of the LAR
2. Build an Estimation Routine from this kernel

13 Design
[Diagram labels. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD.]

14 Design: Making the LAR-ERs
[Diagram labels. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD, IMPLEMEN. OF LAR-ERs, LAR-ERs.]

15 LAR-ERs: Estimation Routines
Arithmetic System Parameters (ASPs): the computation kernel of the LAR becomes the Estimation Routine
- Similar storage scheme
- Similar quantity of data
Communication System Parameters (CSPs): the communication kernel of the LAR becomes the Estimation Routine
- Similar kind of communication
- Similar quantity of data
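
A minimal sketch of an arithmetic estimation routine follows. It assumes the computation kernel is a square block matrix product and uses a pure-Python loop as a stand-in; an actual installation would time the BLAS kernel that the LAR really calls, with the same storage scheme and a comparable amount of data. The names `matmul_kernel` and `estimate_arithmetic_cost` are hypothetical.

```python
import time


def matmul_kernel(a, b_mat, c, size):
    """Stand-in computation kernel: c += a * b_mat on size x size blocks."""
    for i in range(size):
        row_a = a[i]
        row_c = c[i]
        for k in range(size):
            a_ik = row_a[k]
            row_b = b_mat[k]
            for j in range(size):
                row_c[j] += a_ik * row_b[j]


def estimate_arithmetic_cost(block_size, repeats=3):
    """Estimate the cost per floating-point operation (seconds) of the kernel.

    Times the kernel on blocks of the size the routine would use and keeps
    the best of several runs to reduce noise from the current system load.
    """
    flops = 2 * block_size**3  # one multiply and one add per inner iteration
    best = float("inf")
    for _ in range(repeats):
        a = [[1.0] * block_size for _ in range(block_size)]
        b_mat = [[1.0] * block_size for _ in range(block_size)]
        c = [[0.0] * block_size for _ in range(block_size)]
        start = time.perf_counter()
        matmul_kernel(a, b_mat, c, block_size)
        best = min(best, time.perf_counter() - start)
    return best / flops


if __name__ == "__main__":
    for b in (16, 32, 64):
        print(b, estimate_arithmetic_cost(b))
```

Running the estimate for several block sizes gives one value per size, which is one way to reflect that the arithmetic parameters can depend on the algorithmic parameters.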

16 Design
[Diagram labels. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD, IMPLEMEN. OF LAR-ERs, LAR-ERs.]

17 Design: Process has finished
[Diagram labels. DESIGN (LAR-DESIGNER; HAND-MADE, ONLY ONCE): LAR, MODELLING LAR, LAR-MOD, IMPLEMEN. OF LAR-ERs, LAR-ERs.]

18 Installation: Running the LAR-ERs
[Diagram labels. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD, IMPLEMEN. OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUT. OF LAR-ERs, LAR-IF, BL, LAR-SPF.]

19 Installation: obtaining the OAP
[Diagram labels. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD, IMPLEMEN. OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUT. OF LAR-ERs, LAR-IF, BL, LAR-SPF, OAP SELECTION, LAR-OAPF.]

20 Installation: obtaining the OAP
Algorithmic Parameters (APs): once the SP values are known, the optimum values of the APs (OAP) are calculated:
- b: block size
- p: number of processors
- r × c: logical topology, grid configuration (logical 2D mesh)
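
One simple way to calculate the OAP, sketched here under assumptions, is an exhaustive search: evaluate the analytical model for every candidate block size and every r × c factorisation of each processor count, and keep the minimum. The names `grid_shapes`, `select_oap` and `toy_model` are hypothetical, and the toy model is invented for the demonstration; the slides do not state which search the authors use.

```python
def grid_shapes(p):
    """All logical 2D mesh shapes (r, c) with r * c == p."""
    return [(r, p // r) for r in range(1, p + 1) if p % r == 0]


def select_oap(model, n, block_sizes, processor_counts):
    """Return (predicted_time, params) minimising model(n, b, r, c).

    model is any callable implementing the analytical model with the
    system parameters already substituted in.
    """
    best = None
    for p in processor_counts:
        for r, c in grid_shapes(p):
            for b in block_sizes:
                predicted = model(n, b, r, c)
                if best is None or predicted < best[0]:
                    best = (predicted, {"b": b, "p": p, "r": r, "c": c})
    if best is None:
        raise ValueError("no candidate parameters supplied")
    return best


def toy_model(n, b, r, c):
    """Invented cost model, used only to exercise the search."""
    p = r * c
    arithmetic = 1e-9 * n**3 / p
    communication = 1e-4 * (n / b) * (r + c) + 1e-3 * b
    return arithmetic + communication


if __name__ == "__main__":
    print(select_oap(toy_model, 2000, [16, 32, 64, 128], [1, 2, 4, 8]))
```

Because only the model is evaluated, the search costs a handful of function calls rather than executions of the routine.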

21 Installation
[Diagram labels. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD, IMPLEMEN. OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUT. OF LAR-ERs, LAR-IF, BL, LAR-SPF, OAP SELECTION, LAR-OAPF.]

22 Installation: putting it all together
[Diagram labels. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD, IMPLEMEN. OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUT. OF LAR-ERs, LAR-IF, BL, LAR-SPF, OAP SELECTION, LAR-OAPF, LIBRARY INCLUSION PROCESS.]

23 Installation process finished
[Diagram labels. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD, IMPLEMEN. OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUT. OF LAR-ERs, LAR-IF, BL, LAR-SPF, OAP SELECTION, LAR-OAPF, LIBRARY INCLUSION PROCESS.]

24 Experiments
- LAR: Least Squares Toeplitz routine. Platform: network of PCs
- LAR: One-sided Block Jacobi method to solve the Symmetric Eigenvalue Problem. Platform: SGI Origin 2000
- LAR: Gaussian elimination. Platform: NoW (heterogeneous system)
- LAR: block LU factorization. Platforms: IBM SP2, SGI Origin 2000, NoW
Basic libraries: reference BLAS, machine BLAS, ATLAS

25 LU on IBM SP2
[Figure: quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 and 8 processors.]

26 LU on Origin 2000
[Figure: quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4, 8 and 16 processors.]

27 LU on NoW
[Figure: quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 processors, using machine BLAS and ATLAS as basic libraries.]

28 Future Work
- We are trying to develop a methodology that is valid for a wide range of systems and to include it in the design of linear algebra libraries: the methodology must be analysed on more systems and with more routines
- The basic linear algebra library to use can be considered as another parameter
- An installation strategy common to a set of routines must be developed
- At the moment we are analysing routines individually, but it could be preferable to analyse algorithmic schemes
- We are working on the design of a strategy for parameter selection in dynamic systems

