Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load Javier Cuenca Domingo Giménez José González Jack Dongarra Kenneth.

1 Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load
Javier Cuenca, Domingo Giménez, José González, Jack Dongarra, Kenneth Roche

2 Optimisation of Linear Algebra Routines
Traditional method: hand-optimisation for each platform
› Time-consuming
› Incompatible with hardware evolution
› Incompatible with changes in the system (architecture and basic libraries)
› Unsuitable for systems with variable load
› Misuse by non-expert users

3 Solutions to this situation?
Some groups and projects: ATLAS, GrADS, LAWRA, FLAME, I-LIB.
But the problem is very complex.

4 Our Approach
Modelling the Linear Algebra Routine (LAR): T_exec = f(SP, AP, n)
› SP: System Parameters
› AP: Algorithmic Parameters
› n: Problem size
Phases:
› DESIGN: Modelling the LAR
› INSTALLATION: Estimation of SP
› RUN-TIME: Selection of AP values, Execution of LAR

5 Our Approach
[Diagram: overall workflow]
› DESIGN: LAR → Modelling the LAR → MODEL → Implementation of SP-Estimators → SP-Estimators
› INSTALLATION: Basic Libraries and Installation-File → Estimation of Static-SP → Static-SP-File
› RUN-TIME: Call to NWS → NWS Information → Dynamic Adjustment of SP → Current-SP → Selection of Optimum AP → Optimum-AP → Execution of LAR

6 Our Approach
LARs: Jacobi methods for the symmetric eigenvalue problem, Gauss elimination, LU factorisation, QR factorisation
Platforms: Cluster of Workstations, Cluster of PCs, SGI Origin 2000, IBM SP2
Static Model of LAR: situation of the platform at installation time

7 Our Approach
LARs: Jacobi methods for the symmetric eigenvalue problem, Gauss elimination, LU factorisation, QR factorisation
Platforms: Cluster of Workstations, Cluster of PCs, SGI Origin 2000, IBM SP2
Static Model of LAR: situation of the platform at installation time
Dynamic Model of LAR: situation of the platform at run-time

8 DESIGN PROCESS
LAR: Linear Algebra Routine, made by the LAR Designer
Example of LAR: Parallel Block LU factorisation

9 Modelling the LAR
[Diagram, DESIGN phase: LAR → Modelling the LAR → MODEL]

10 Modelling the LAR
[Diagram, DESIGN phase: LAR → Modelling the LAR → MODEL]
T_exec = f(SP, AP, n)
› SP: System Parameters
› AP: Algorithmic Parameters
› n: Problem size
Made by the LAR Designer, only once per LAR

11 Modelling the LAR
[Diagram, DESIGN phase: LAR → Modelling the LAR → MODEL]
LAR: Parallel Block LU factorisation
MODEL parameters:
› SP: k_3, k_2, t_s, t_w
› AP: p, b
› n: Problem size
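To make the shape of T_exec = f(SP, AP, n) concrete, the following is a minimal sketch of how such a model could be evaluated for a block LU factorisation. The slide's own formula is not present in this transcript, so the cost expression below is a hypothetical textbook-style estimate, not the authors' model. Reading k_3 and k_2 as the costs of a basic level-3 and level-2 BLAS operation and t_s as the message start-up time is also an assumption; t_w is the word-sending time named on the later slides. The names `SystemParams` and `t_exec` are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemParams:
    """System parameters (SP), all in microseconds (assumed unit)."""
    k3: float  # assumed: cost of one level-3 BLAS floating-point operation
    k2: float  # assumed: cost of one level-2 BLAS floating-point operation
    ts: float  # assumed: message start-up time
    tw: float  # word-sending time


def t_exec(sp: SystemParams, p: int, b: int, n: int) -> float:
    """Hypothetical execution-time estimate for block LU, in microseconds.

    p is the number of processes, b the block size and n the problem size.
    The formula is an illustrative stand-in, not the model from the slides.
    """
    # Arithmetic: about (2/3) n^3 level-3 flops plus about (1/2) n^2 b
    # level-2 flops in the panel factorisations, shared among p processes.
    arithmetic = ((2.0 / 3.0) * n**3 * sp.k3 + 0.5 * n**2 * b * sp.k2) / p
    # Communication: one tree broadcast of about n*b words per block step.
    steps = math.ceil(n / b)
    hops = math.ceil(math.log2(p)) if p > 1 else 0
    communication = steps * hops * (sp.ts + n * b * sp.tw)
    return arithmetic + communication


# k3 and tw are taken from the static-SP tables on slide 17;
# k2 and ts are made-up placeholder values.
sp = SystemParams(k3=0.0030, k2=0.0100, ts=100.0, tw=0.7)
print(t_exec(sp, p=4, b=64, n=2048) / 1e6, "seconds (illustrative only)")
```

The point of the sketch is the interface: once SP values are known, the model is a cheap function of the AP (p, b) and n, so it can be evaluated many times when choosing AP values.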

12 Implementation of SP-Estimators
[Diagram, DESIGN phase: LAR → Modelling the LAR → MODEL → Implementation of SP-Estimators → SP-Estimators]

13 Implementation of SP-Estimators
[Diagram, DESIGN phase: LAR → Modelling the LAR → MODEL → Implementation of SP-Estimators → SP-Estimators]
Estimators of Arithmetic-SP: computation kernel of the LAR
› Similar storage scheme
› Similar quantity of data
Estimators of Communication-SP: communication kernel of the LAR
› Similar kind of communication
› Similar quantity of data

14 INSTALLATION PROCESS
[Diagram: DESIGN phase completed (MODEL, SP-Estimators); INSTALLATION phase begins]
Installation process: only once per platform, done by the System Manager

15 Estimation of Static-SP
[Diagram, INSTALLATION phase: Basic Libraries and Installation-File → Estimation of Static-SP → Static-SP-File]

16 Estimation of Static-SP
[Diagram, INSTALLATION phase: Basic Libraries and Installation-File → Estimation of Static-SP → Static-SP-File]
Basic Libraries
› Basic Communication Library: MPI, PVM
› Basic Linear Algebra Library: reference-BLAS, machine-specific-BLAS, ATLAS
Installation File
› SP values are obtained using the information (n and AP values) in this file.

17 Estimation of Static-SP
Platform: Cluster of Pentium III + Fast Ethernet
Basic Libraries: ATLAS and MPI
Estimation of the Static-SP t_w-static (in μsec)
Message size (Kbytes)   32       256      1024     2048
t_w-static              0.700    0.690    0.680    0.675
Estimation of the Static-SP k_3-static (in μsec)
Block size              16       32       64       128
k_3-static              0.0038   0.0033   0.0030   0.0027
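The static-SP tables above can be held in a small lookup structure. The sketch below stores the measured values from this slide and returns an SP value for a given message size or block size. How the original software handles sizes between the measured points is not stated on the slides; linear interpolation with clamping at the ends is an assumption made here, and the names `TW_STATIC`, `K3_STATIC` and `lookup_sp` are hypothetical.

```python
from bisect import bisect_left

# Values from slide 17 (Cluster of Pentium III + Fast Ethernet, ATLAS and MPI).
TW_STATIC = {32: 0.700, 256: 0.690, 1024: 0.680, 2048: 0.675}   # Kbytes -> μsec
K3_STATIC = {16: 0.0038, 32: 0.0033, 64: 0.0030, 128: 0.0027}   # block size -> μsec


def lookup_sp(table: dict, x: float) -> float:
    """Return the SP value for size x.

    Assumption: interpolate linearly between measured sizes and clamp
    to the nearest measured value outside the measured range.
    """
    xs = sorted(table)
    if x <= xs[0]:
        return table[xs[0]]
    if x >= xs[-1]:
        return table[xs[-1]]
    i = bisect_left(xs, x)          # xs[i-1] < x <= xs[i]
    x0, x1 = xs[i - 1], xs[i]
    w = (x - x0) / (x1 - x0)
    return table[x0] + w * (table[x1] - table[x0])


print(lookup_sp(K3_STATIC, 64))     # a measured block size
print(lookup_sp(K3_STATIC, 48))     # between two measured block sizes
print(lookup_sp(TW_STATIC, 4096))   # beyond the largest measured message
```

Keeping the values keyed by size reflects what the tables show: both k_3 and t_w depend on the block or message size, so a single scalar per parameter would lose information.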

18 RUN-TIME PROCESS
[Diagram: DESIGN and INSTALLATION phases completed (MODEL, SP-Estimators, Static-SP-File); RUN-TIME phase begins]

19 RUN-TIME PROCESS: Static approach
[Diagram, RUN-TIME phase: Static-SP-File → Selection of Optimum AP → Optimum-AP]

20 LU on IBM SP2
Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 and 8 processors.

21 RUN-TIME PROCESS: Static approach
[Diagram, RUN-TIME phase: Static-SP-File → Selection of Optimum AP → Optimum-AP → Execution of LAR]

22 RUN-TIME PROCESS: Static approach
[Diagram, RUN-TIME phase: Static-SP-File → Selection of Optimum AP → Optimum-AP → Execution of LAR]
p = 4
n       opt      Static MODEL   dev
512     0.25     0.25           0%
1024    1.36     1.36           0%
1536    3.22     3.22           0%
2048    6.76     6.76           0%
2560    11.81    11.81          0%
3072    19.28    19.41          1%

23 RUN-TIME PROCESS: Static approach
[Diagram, RUN-TIME phase: Static-SP-File → Selection of Optimum AP → Optimum-AP → Execution of LAR]
p = 8
n       opt      Static MODEL   dev
1024    0.93     0.99           6%
2048    4.98     4.98           0%
3072    13.81    13.81          0%
4096    27.65    29.31          6%
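The dev column in the p = 4 and p = 8 tables is not defined on the slides. The sketch below assumes it is the relative excess of the model-selected time over the optimum time, rounded to a whole percent; this inferred definition reproduces every tabulated value, but it remains an assumption. The function name `deviation_percent` is hypothetical.

```python
def deviation_percent(t_model: float, t_opt: float) -> int:
    """Relative excess of t_model over t_opt, rounded to a whole percent.

    Assumed definition of the 'dev' column; not stated on the slides.
    """
    return round(100.0 * (t_model - t_opt) / t_opt)


# (n, opt, Static MODEL, dev) rows copied from slides 22 and 23.
rows_p4 = [(512, 0.25, 0.25, 0), (1024, 1.36, 1.36, 0), (1536, 3.22, 3.22, 0),
           (2048, 6.76, 6.76, 0), (2560, 11.81, 11.81, 0), (3072, 19.28, 19.41, 1)]
rows_p8 = [(1024, 0.93, 0.99, 6), (2048, 4.98, 4.98, 0),
           (3072, 13.81, 13.81, 0), (4096, 27.65, 29.31, 6)]

for n, t_opt, t_model, dev in rows_p4 + rows_p8:
    # Each recomputed deviation matches the tabulated one.
    assert deviation_percent(t_model, t_opt) == dev
```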

24 RUN-TIME PROCESS: Dynamic Approach
[Diagram: DESIGN and INSTALLATION phases completed (MODEL, SP-Estimators, Static-SP-File); RUN-TIME phase begins]

25 Call to NWS
[Diagram, RUN-TIME phase: Call to NWS → NWS Information]

26 Call to NWS
[Diagram, RUN-TIME phase: Call to NWS → NWS Information]
The NWS is called and it reports:
› the fraction of available CPU (f_CPU)
› the current word-sending time (t_w-current) for specific n and AP values (n_0, AP_0).
Then the fraction of available network is calculated:
[Formula: image not preserved in the transcript]

27 Call to NWS
[Diagram, RUN-TIME phase: Call to NWS → NWS Information]

28 Dynamic Adjustment of SP
[Diagram, RUN-TIME phase: Static-SP-File and NWS Information → Dynamic Adjustment of SP → Current-SP]

29 Dynamic Adjustment of SP
[Diagram, RUN-TIME phase: Static-SP-File and NWS Information → Dynamic Adjustment of SP → Current-SP]
The values of the SP are adjusted according to the current situation:
[Formula: image not preserved in the transcript]
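The adjustment formulas on slides 26 and 29 were images and are missing here, so the following is only one natural reading of the surrounding text, offered as an assumption: the available-network fraction is the ratio of the static to the current word-sending time measured at (n_0, AP_0), and each static SP is divided by the corresponding available fraction. The function names are hypothetical.

```python
def network_fraction(tw_static: float, tw_current: float) -> float:
    """Assumed definition: f_net = t_w-static(n0, AP0) / t_w-current(n0, AP0)."""
    if tw_static <= 0 or tw_current <= 0:
        raise ValueError("word-sending times must be positive")
    return tw_static / tw_current


def adjust_sp(k3_static: float, tw_static: float, f_cpu: float, f_net: float):
    """Assumed adjustment: scale each static SP by the inverse of what is available.

    Returns (k3_current, tw_current). With f_cpu = f_net = 1 the static
    values are returned unchanged.
    """
    if not 0 < f_cpu <= 1 or f_net <= 0:
        raise ValueError("f_cpu must be in (0, 1] and f_net must be positive")
    return k3_static / f_cpu, tw_static / f_net


# Example with figures that appear on the slides: 60% CPU available and a
# current word-sending time of 1.8 μsec against a static 0.7 μsec.
f_net = network_fraction(tw_static=0.7, tw_current=1.8)
k3_cur, tw_cur = adjust_sp(k3_static=0.0030, tw_static=0.7, f_cpu=0.60, f_net=f_net)
print(f_net, k3_cur, tw_cur)
```

Under this reading, a node with 60% of its CPU available is modelled as if each arithmetic operation took 1/0.6 times longer, which is the simplest way to fold NWS measurements into values obtained at installation time.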

30 Dynamic Adjustment of SP
[Diagram, RUN-TIME phase: Static-SP-File and NWS Information → Dynamic Adjustment of SP → Current-SP]

31 Selection of Optimum AP
[Diagram, RUN-TIME phase: Current-SP → Selection of Optimum AP → Optimum-AP]

32 Selection of Optimum AP
[Diagram, RUN-TIME phase: Static-SP-File and NWS Information → Dynamic Adjustment of SP → Current-SP → Selection of Optimum AP → Optimum-AP]
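The slides do not say how the optimum AP is searched for. A minimal sketch, assuming an exhaustive search over a small set of candidate process meshes (r × c) and block sizes, is shown below. The candidate lists, the function `select_optimum_ap` and the stand-in `toy_model` are all hypothetical; in practice the model would be the LAR's T_exec evaluated with the current SP values.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Mesh = Tuple[int, int]  # (r, c) process mesh, with p = r * c


def select_optimum_ap(model: Callable[[Mesh, int, int], float],
                      n: int,
                      meshes: Sequence[Mesh],
                      block_sizes: Sequence[int]) -> Tuple[Mesh, int]:
    """Return the (mesh, block size) pair with the lowest predicted time.

    Exhaustive search is an assumption; it is affordable because the model
    is cheap to evaluate and the candidate set is small.
    """
    return min(product(meshes, block_sizes),
               key=lambda ap: model(ap[0], ap[1], n))


def toy_model(mesh: Mesh, b: int, n: int) -> float:
    """Stand-in cost function for demonstration only, not the slides' model."""
    p = mesh[0] * mesh[1]
    return n**3 / p + 1e6 * p + n**2 * abs(b - 64)


meshes = [(1, 1), (2, 1), (2, 2), (4, 2)]
block_sizes = [16, 32, 64, 128]
print(select_optimum_ap(toy_model, 1024, meshes, block_sizes))
```

Passing the model in as a callable keeps the search independent of any particular routine, which matches the deck's separation between modelling (once per LAR) and selection (at every run).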

33 Execution of LAR
[Diagram, RUN-TIME phase: Call to NWS → NWS Information → Dynamic Adjustment of SP → Current-SP → Selection of Optimum AP → Optimum-AP → Execution of LAR]

34 Execution of LAR
[Diagram, RUN-TIME phase: Call to NWS → NWS Information → Dynamic Adjustment of SP → Current-SP → Selection of Optimum AP → Optimum-AP → Execution of LAR]

35 Platform load: different situations studied
Nodes:         node1 node2 node3 node4 node5 node6 node7 node8
Situation A
  CPU avail.:  100%  100%  100%  100%  100%  100%  100%  100%
  t_w-current: 0.7 μsec
Situation B
  CPU avail.:  80%   80%   80%   80%   100%  100%  100%  100%
  t_w-current: 0.8 μsec, 0.7 μsec
Situation C
  CPU avail.:  60%   60%   60%   60%   100%  100%  100%  100%
  t_w-current: 1.8 μsec, 0.7 μsec
Situation D
  CPU avail.:  60%   60%   60%   60%   100%  100%  80%   80%
  t_w-current: 1.8 μsec, 0.7 μsec, 0.8 μsec
Situation E
  CPU avail.:  60%   60%   60%   60%   100%  100%  50%   50%
  t_w-current: 1.8 μsec, 0.7 μsec, 4.0 μsec

36 Platform load: different situations studied

37 Optimum AP for the different situations studied
Block size, by situation of the platform load
n       A     B     C     D     E
1024    32    32    64    64    64
2048    64    64    64    128   128
3072    64    64    128   128   128
Number of nodes to use (p = r × c), by situation of the platform load
n       A      B      C      D      E
1024    4 × 2  4 × 2  2 × 2  2 × 2  2 × 1
2048    4 × 2  4 × 2  2 × 2  2 × 2  2 × 1
3072    4 × 2  4 × 2  2 × 2  2 × 2  2 × 1

38 Experimental Time: deviations from the Optimum

41 Conclusions and Future Work
› The proposed methodology is viable in systems where the load is stable or variable.
› Software such as NWS is suitable for adjusting the system parameter values obtained at installation time.
› The heterogeneous load case offers many more possibilities than the one studied.

