1
Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load
Javier Cuenca, Domingo Giménez, José González, Jack Dongarra, Kenneth Roche
2
Optimisation of Linear Algebra Routines
Traditional method: hand-optimisation for each platform
› Time-consuming
› Incompatible with hardware evolution
› Incompatible with changes in the system (architecture and basic libraries)
› Unsuitable for systems with variable load
› Prone to misuse by non-expert users
3
Solutions to this situation?
Some groups and projects: ATLAS, GrADS, LAWRA, FLAME, I-LIB
But the problem is very complex.
4
Our Approach
Modelling the Linear Algebra Routine (LAR): T_exec = f(SP, AP, n)
› SP: System Parameters
› AP: Algorithmic Parameters
› n: Problem size
Stages:
› DESIGN: modelling the LAR
› INSTALLATION: estimation of SP
› RUN-TIME: selection of AP values, execution of LAR
5
Our Approach
[Diagram: overall scheme in three stages]
› DESIGN: LAR → Modelling the LAR → MODEL; Implementation of SP-Estimators → SP-Estimators
› INSTALLATION: Estimation of Static-SP (with Basic Libraries and Installation-File) → Static-SP-File
› RUN-TIME: Call to NWS → NWS Information; Dynamic Adjustment of SP → Current-SP; Selection of Optimum AP → Optimum-AP; Execution of LAR
7
Our Approach
LARs:
› Jacobi methods for the symmetric eigenvalue problem
› Gauss elimination
› LU factorisation
› QR factorisation
Platforms:
› Cluster of Workstations
› Cluster of PCs
› SGI Origin 2000
› IBM SP2
Static Model of LAR: situation of the platform at installation time
Dynamic Model of LAR: situation of the platform at run-time
8
DESIGN PROCESS
LAR: Linear Algebra Routine, made by the LAR designer
Example of LAR: Parallel Block LU factorisation
10
Modelling the LAR
[Diagram, DESIGN stage: LAR → Modelling the LAR → MODEL]
T_exec = f(SP, AP, n)
› SP: System Parameters
› AP: Algorithmic Parameters
› n: Problem size
Made by the LAR designer, only once per LAR
11
Modelling the LAR
LAR: Parallel Block LU factorisation
MODEL:
› SP: k_3, k_2, t_s, t_w
› AP: p, b
› n: Problem size
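As a rough illustration of a model of the form T_exec = f(SP, AP, n) for a parallel block LU factorisation, the sketch below combines an arithmetic term and a communication term. The slide's own formula is not available in this transcript, so the cost terms here are textbook-style assumptions and not the authors' model. The names `SystemParams`, `AlgorithmicParams`, and `t_exec` are invented for this example, and the meanings given to k_3, k_2, and t_s are assumed.

```python
from dataclasses import dataclass
from math import log2


@dataclass(frozen=True)
class SystemParams:
    k3: float  # arithmetic cost parameter (assumed: level-3 BLAS operations)
    k2: float  # arithmetic cost parameter (assumed: level-2 BLAS operations)
    ts: float  # communication parameter (assumed: message start-up time)
    tw: float  # word-sending time


@dataclass(frozen=True)
class AlgorithmicParams:
    p: int  # number of processors
    b: int  # block size


def t_exec(sp: SystemParams, ap: AlgorithmicParams, n: int) -> float:
    """Hypothetical T_exec = f(SP, AP, n); every term is an illustrative assumption."""
    # Arithmetic: cubic bulk of the factorisation plus a quadratic panel term,
    # both shared evenly among the p processors.
    arithmetic = (2.0 / 3.0) * sp.k3 * n**3 / ap.p + sp.k2 * ap.b * n**2 / ap.p
    # Communication: one broadcast of roughly n * b words per block step,
    # with a log2(p) factor so that the term vanishes when p = 1.
    steps = n / ap.b
    communication = steps * (sp.ts + sp.tw * n * ap.b) * log2(ap.p)
    return arithmetic + communication
```

The point of such a function is that it is cheap to evaluate, so it can be called for many candidate (p, b) pairs before the routine is actually run.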
13
Implementation of SP-Estimators
[Diagram, DESIGN stage: Implementation of SP-Estimators → SP-Estimators]
Estimators of Arithmetic-SP:
› Computation kernel of the LAR
› Similar storage scheme
› Similar quantity of data
Estimators of Communication-SP:
› Communication kernel of the LAR
› Similar kind of communication
› Similar quantity of data
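A minimal sketch of what an arithmetic-SP estimator could look like: it times a small computation kernel resembling the one in the LAR and divides by the operation count. A plain b-by-b matrix product in pure Python stands in for the tuned BLAS kernel a real estimator would call. The function name `estimate_k3` and the choice of kernel are assumptions made for illustration.

```python
import time


def estimate_k3(b: int, repeats: int = 3) -> float:
    """Estimate the time per floating-point operation of a b x b matrix product."""
    a = [[1.0] * b for _ in range(b)]
    c = [[2.0] * b for _ in range(b)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Kernel under test: a dense product on blocks of the size the LAR would use.
        _ = [[sum(a[i][k] * c[k][j] for k in range(b)) for j in range(b)]
             for i in range(b)]
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the least disturbed measurement
    flops = 2.0 * b**3  # one multiply and one add per inner iteration
    return best / flops


if __name__ == "__main__":
    for block in (16, 32):
        print(block, estimate_k3(block))
```

Taking the minimum over a few repetitions is one common way to reduce noise from other processes; an estimator for communication parameters would time a message exchange of a comparable size in the same manner.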
14
INSTALLATION PROCESS
› Only once per platform
› Done by the system manager
16
Estimation of Static-SP
[Diagram, INSTALLATION stage: Basic Libraries and Installation-File → Estimation of Static-SP → Static-SP-File]
Basic Libraries:
› Basic communication library: MPI, PVM
› Basic linear algebra library: reference BLAS, machine-specific BLAS, ATLAS
Installation File:
› SP values are obtained using the information (n and AP values) in this file.
17
Estimation of Static-SP
Platform: Cluster of Pentium III + Fast Ethernet
Basic Libraries: ATLAS and MPI

Estimation of the Static-SP t_w-static (in sec)
Message size (Kbytes)   32      256     1024    2048
t_w-static              0.700   0.690   0.680   0.675

Estimation of the Static-SP k_3-static (in sec)
Block size              16      32      64      128
k_3-static              0.0038  0.0033  0.0030  0.0027
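The measured values on this slide could be stored and queried as small tables. The sketch below is one possible way to do that, assuming linear interpolation between measured points and clamping outside the measured range. The slides do not say how intermediate sizes are handled, and the names `K3_STATIC`, `TW_STATIC`, and `lookup` are hypothetical.

```python
from bisect import bisect_left

# Values copied from this slide (cluster of Pentium III + Fast Ethernet, ATLAS and MPI).
K3_STATIC = {16: 0.0038, 32: 0.0033, 64: 0.0030, 128: 0.0027}    # by block size
TW_STATIC = {32: 0.700, 256: 0.690, 1024: 0.680, 2048: 0.675}    # by message size (Kbytes)


def lookup(table: dict, x: float) -> float:
    """Piecewise-linear interpolation over the table, clamped at both ends."""
    keys = sorted(table)
    if x <= keys[0]:
        return table[keys[0]]
    if x >= keys[-1]:
        return table[keys[-1]]
    hi = bisect_left(keys, x)  # index of the first key >= x
    x0, x1 = keys[hi - 1], keys[hi]
    weight = (x - x0) / (x1 - x0)
    return table[x0] + weight * (table[x1] - table[x0])


if __name__ == "__main__":
    print(lookup(K3_STATIC, 32))   # a measured block size
    print(lookup(K3_STATIC, 48))   # halfway between two measured block sizes
```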
18
RUN-TIME PROCESS
19
RUN-TIME PROCESS: Static approach
[Diagram, RUN-TIME stage: Static-SP-File → Selection of Optimum AP → Optimum-AP]
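Selecting the optimum AP amounts to evaluating the model for each candidate AP and keeping the cheapest one. A minimal sketch, assuming an exhaustive search over a small candidate set (the slides do not state which search strategy is used). `select_optimum_ap` and `toy_model` are hypothetical names, and the toy cost function is invented solely to exercise the search.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def select_optimum_ap(
    model: Callable[[int, int, int], float],
    n: int,
    processors: Iterable[int],
    block_sizes: Iterable[int],
) -> Tuple[int, int]:
    """Return the (p, b) pair with the smallest modelled execution time."""
    return min(product(processors, block_sizes),
               key=lambda ap: model(ap[0], ap[1], n))


def toy_model(p: int, b: int, n: int) -> float:
    """Made-up cost function: not the model from the slides."""
    arithmetic = n**3 / p * (1.0 + 8.0 / b)
    communication = 50.0 * n**2 * (p - 1) + 1000.0 * n * b
    return arithmetic + communication


if __name__ == "__main__":
    p, b = select_optimum_ap(toy_model, 1024, [1, 2, 4, 8], [16, 32, 64, 128])
    print("p =", p, "b =", b)
```

Because the model is only evaluated, never executed, the search cost is negligible next to the routine itself, which is what makes the selection feasible at run-time.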
20
LU on IBM SP2
[Figure: quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 and 8 processors]
22
RUN-TIME PROCESS: Static approach
[Diagram, RUN-TIME stage: Selection of Optimum AP → Optimum-AP → Execution of LAR]
p = 4
n       opt      Static MODEL   dev (Static MODEL)
512     0.25     0.25           0%
1024    1.36     1.36           0%
1536    3.22     3.22           0%
2048    6.76     6.76           0%
2560    11.81    11.81          0%
3072    19.28    19.41          1%
23
RUN-TIME PROCESS: Static approach
p = 8
n       opt      Static MODEL   dev (Static MODEL)
1024    0.93     0.99           6%
2048    4.98     4.98           0%
3072    13.81    13.81          0%
4096    27.65    29.31          6%
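The deviation column can be reproduced from the two time columns. The sketch below assumes, consistently with the figures in the p = 4 and p = 8 tables, that the deviation is the relative excess of the model-selected time over the optimum, rounded to a whole percent. `deviation_percent` is a name chosen here.

```python
def deviation_percent(t_opt: float, t_model: float) -> int:
    """Relative excess of the model-selected time over the optimum, in whole percent."""
    return round(100.0 * (t_model - t_opt) / t_opt)


# (n, optimum time, time with the model's parameters), taken from the p = 8 table.
rows_p8 = [
    (1024, 0.93, 0.99),
    (2048, 4.98, 4.98),
    (3072, 13.81, 13.81),
    (4096, 27.65, 29.31),
]

if __name__ == "__main__":
    for n, t_opt, t_model in rows_p8:
        print(n, f"{deviation_percent(t_opt, t_model)}%")
```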
24
RUN-TIME PROCESS: Dynamic approach
25
Call to NWS
[Diagram, RUN-TIME stage: Call to NWS → NWS Information]
26
Call to NWS
The NWS is called and it reports:
› the fraction of available CPU (f_CPU)
› the current word-sending time (t_w-current) for specific n and AP values (n_0, AP_0).
Then the fraction of available network is calculated:
[Equation not preserved in the transcript]
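One reading consistent with the quantities named on this slide, offered as an assumption and not as the authors' definition, is the ratio of the installation-time word-sending time to the current one, both taken for the same (n_0, AP_0). The symbol f_network is a label introduced here.

```latex
f_{\mathrm{network}} = \frac{t_{w\mathrm{-static}}(n_0, AP_0)}{t_{w\mathrm{-current}}(n_0, AP_0)}
```

Under this reading, an unloaded network gives f_network = 1, and a current word-sending time of 0.8 against a static value of 0.7 gives 0.7 / 0.8 = 0.875.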
28
Dynamic Adjustment of SP
[Diagram, RUN-TIME stage: Static-SP-File and NWS Information → Dynamic Adjustment of SP → Current-SP]
29
Dynamic Adjustment of SP
The values of the SP are adjusted according to the current situation:
[Equations not preserved in the transcript]
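A sketch of how such an adjustment might work, under the assumption that a resource which is only a fraction f available behaves as if it were 1 / f times slower. The adjustment rule, the function name `adjust_sp`, and its parameters are illustrative guesses and are not taken from the slides.

```python
from typing import Tuple


def adjust_sp(k_static: float, tw_static: float,
              f_cpu: float, f_network: float) -> Tuple[float, float]:
    """Scale installation-time SP values by the currently available fractions.

    Assumed rule: current value = static value / available fraction.
    """
    if not (0.0 < f_cpu <= 1.0 and 0.0 < f_network <= 1.0):
        raise ValueError("availability fractions must lie in (0, 1]")
    k_current = k_static / f_cpu            # arithmetic gets slower as CPU share drops
    tw_current = tw_static / f_network      # sending gets slower as network share drops
    return k_current, tw_current


if __name__ == "__main__":
    # Static values for block size 64 and the smallest message size, with 60% CPU available.
    print(adjust_sp(0.0030, 0.700, f_cpu=0.6, f_network=1.0))
```

With both fractions equal to 1 the static values are returned unchanged, so the static approach is the special case of an unloaded platform.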
31
Selection of Optimum AP
[Diagram, RUN-TIME stage: Current-SP → Selection of Optimum AP → Optimum-AP]
33
Execution of LAR
[Diagram, RUN-TIME stage: Optimum-AP → Execution of LAR]
35
Platform load: different situations studied
               node1  node2  node3  node4  node5  node6  node7  node8
Situation A
  CPU avail.   100%   100%   100%   100%   100%   100%   100%   100%
  t_w-current  0.7 sec (all nodes)
Situation B
  CPU avail.   80%    80%    80%    80%    100%   100%   100%   100%
  t_w-current  0.8 sec (nodes 1-4), 0.7 sec (nodes 5-8)
Situation C
  CPU avail.   60%    60%    60%    60%    100%   100%   100%   100%
  t_w-current  1.8 sec (nodes 1-4), 0.7 sec (nodes 5-8)
Situation D
  CPU avail.   60%    60%    60%    60%    100%   100%   80%    80%
  t_w-current  1.8 sec (nodes 1-4), 0.7 sec (nodes 5-6), 0.8 sec (nodes 7-8)
Situation E
  CPU avail.   60%    60%    60%    60%    100%   100%   50%    50%
  t_w-current  1.8 sec (nodes 1-4), 0.7 sec (nodes 5-6), 4.0 sec (nodes 7-8)
37
Optimum AP for the different situations studied

Block size, by situation of the platform load
n       A     B     C     D     E
1024    32    32    64    64    64
2048    64    64    64    128   128
3072    64    64    128   128   128

Number of nodes to use (p = r × c), by situation of the platform load
n       A     B     C     D     E
1024    4×2   4×2   2×2   2×2   2×1
2048    4×2   4×2   2×2   2×2   2×1
3072    4×2   4×2   2×2   2×2   2×1
38
Experimental Time: deviations from the Optimum
41
Conclusions and Future Work
› The use of the proposed methodology is viable in systems where the load is stable or variable.
› Software like NWS is suitable for the adjustment of the system parameters' values obtained at installation time.
› The heterogeneous load case offers many more possibilities than the one studied.