1 Resolution of large symmetric eigenproblems on a world-wide grid
  Laurent Choy, Serge Petiton, Mitsuhisa Sato
  CNRS/LIFL and HPCS Lab., University of Tsukuba
  2nd NEGST workshop, Tokyo, May 2007
2 Outline
  Introduction
  Distribution of the numerical method
  Experiments
    Experiments on world-wide grids: platforms, numerical settings
    Experiments on Grid'5000: motivations, platforms, numerical settings
    Results
  YML
    Progress of YML
    YvetteML workflow of the real symmetric eigenproblem
    First experiments
  Conclusion
3 Outline
  ➔ Introduction
  Distribution of the numerical method
  Experiments
    Experiments on world-wide grids: platforms, numerical settings
    Experiments on Grid'5000: motivations, platforms, numerical settings
    Results
  YML
    Progress of YML
    YvetteML workflow of the real symmetric eigenproblem
    First experiments
  Conclusion
4 Introduction
  Huge number of nodes connected to the Internet
    Clusters and NOWs of institutions, PCs of individual (volunteer) users
    Constant availability of nodes, on-demand access
  HPC and large grid computing are complementary
    We do not target the highest performance
    We target a different community of users
  Why the real symmetric eigenproblem?
    Requires a lot of resources on the nodes
    Communications, synchronization points
    A useful problem
    Few similar studies for very large grid computing
5 Outline
  Introduction
  ➔ Distribution of the numerical method
  Experiments
    Experiments on world-wide grids: platforms, numerical settings
    Experiments on Grid'5000: motivations, platforms, numerical settings
    Results
  YML
    Progress of YML
    YvetteML workflow of the real symmetric eigenproblem
    First experiments
  Conclusion
6 Distribution of the numerical method (1/2)
  Real symmetric eigenproblem: Au = λu, with A real symmetric
  Main steps:
    Lanczos tridiagonalization: T = Q^t A Q, with T real symmetric tridiagonal
      Data accessed only by means of matrix-vector products (MVPs)
    Bisection and Inverse Iteration: Tv = λv, same eigenvalues as A (Ritz eigenvalues)
      Communication-free parallelism: task-farming
    Ritz eigenvector computation: u = Qv
    Accuracy tests: ||Au − λu||_2 < eps
  (A minimal single-machine sketch of these steps follows below.)
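To make the pipeline concrete, here is a minimal single-machine sketch in Python/NumPy, not the authors' code: the function lanczos, the test matrix, and all sizes are illustrative. On the grid, the MVP is the distributed operation and bisection/inverse iteration is the task-farmed step.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos(matvec, n, m, seed=0):
    """m-step Lanczos: builds T = Q^t A Q, touching A only through MVPs."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m)
    q = rng.standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(m):
        w = matvec(Q[:, j])                          # the only access to A
        alpha[j] = Q[:, j] @ w
        w = w - alpha[j] * Q[:, j]
        if j > 0:
            w = w - beta[j - 1] * Q[:, j - 1]
        w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # reorthogonalization
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return alpha, beta[:m - 1], Q[:, :m]

# Small dense stand-in for A (the real runs use sparse N ~ 5e4 .. 4e5).
n, m = 500, 25
A = np.random.default_rng(1).standard_normal((n, n))
A = (A + A.T) / 2

alpha, beta, Q = lanczos(lambda v: A @ v, n, m)

# Bisection + inverse iteration on T: SciPy's select='i' path uses the
# LAPACK stebz/stein pair; on the grid each task handles a spectrum slice.
theta, V = eigh_tridiagonal(alpha, beta, select='i',
                            select_range=(m - 3, m - 1))

U = Q @ V                                            # Ritz eigenvectors u = Qv
for lam, u in zip(theta, U.T):                       # accuracy test
    print(f"lambda = {lam: .5f}   ||Au - lu||_2 = "
          f"{np.linalg.norm(A @ u - lam * u):.2e}")
```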
7 Distribution of the numerical method (2/2)
  Reducing the memory usage
    Out-of-core storage
    Restarted scheme
      Bounds the cost of reorthogonalization, Bisection, Inverse Iteration
      Reduces the disk usage too
  Reducing the volume of communications
    Data persistence (A and Q)
  Reducing the number of communications
    Task-farming
  Other issue to be improved: distribution of A
  (See the out-of-core sketch below.)
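A hedged sketch of two of these devices: out-of-core storage of the Lanczos basis, and the bound that restarting puts on it. The file name Q.dat and the sizes are hypothetical; this is not the authors' implementation.

```python
import numpy as np

n, m = 47792, 15                        # sizes as in the step-1 experiments

# Out-of-core: the basis Q lives on disk ('Q.dat' is a hypothetical file),
# so RAM never holds more than a couple of its columns at once.
Q = np.memmap('Q.dat', dtype=np.float64, mode='w+', shape=(n, m))

def reorthogonalize(w, ncols):
    """One Gram-Schmidt pass against the columns already written to disk."""
    for j in range(ncols):
        qj = np.array(Q[:, j])          # explicit read: ~0.4 MB per column
        w -= (qj @ w) * qj
    return w

# Restarted scheme (skeleton): Q never grows past m columns, which bounds
# memory and, since Q is overwritten at each restart, disk usage too.
# Data persistence (A and Q stay on the worker nodes between OmniRPC calls)
# then ensures only the *new* column of Q crosses the network each step.
```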
8 Outline
  Introduction
  Distribution of the numerical method
  ➔ Experiments
    ➔ Experiments on world-wide grids: platforms, numerical settings
    ➔ Experiments on Grid'5000: motivations, platforms, numerical settings
    Results
  YML
    Progress of YML
    YvetteML workflow of the real symmetric eigenproblem
    First experiments
  Conclusion
9 World-wide grid experiments: experimental platforms and numerical settings (1/2)
  Computing and network resources
    University of Tsukuba: homogeneous dedicated clusters, dual Xeon ~3 GHz, 1 to 4 GB RAM
    University of Lille 1: heterogeneous NOWs, Celeron 1.4 GHz to P4 3.2 GHz, 128 MB to 1 GB RAM, shared with students
    Sites interconnected through the Internet
10 World-wide grid experiments: experimental platforms and numerical settings (2/2)
  4 platforms, with OmniRPC
    2 local platforms: 29 / 58 nodes, Lille
    2 world-wide platforms:
      58 nodes (29 Lille + 29 Tsukuba dual-proc.)
      116 nodes (58 Lille + 58 Tsukuba dual-proc.)
  Matrix: N = …, … million elements, avg. 48 nnz/row
  Parameters: m = 10, 15, 20, 25; k = 1, 2, 3, 4
11 Grid'5000 experiments: presentation, motivations
  Up to 9 sites distributed across France
  Dedicated PCs with a reservation policy
  Fast, dedicated network: RENATER (1 Gbit/s to 10 Gbit/s)
  PCs are homogeneous (few exceptions)
  Homogeneous environment (deployment strategy)
  For these experiments:
    Orsay: up to 300 single-CPU nodes
    Lille: up to 60 single-CPU nodes
    Sophia (near Nice): up to 60 dual-CPU nodes
    Rennes: up to 70 dual-CPU nodes
12 Grid'5000 experiments: platforms and numerical settings (1/2)
  Step 1 goal: improving the previous analysis
  Platforms:
    29 Orsay, single-proc.
    58 Orsay, single-proc.
    58: Lille single-proc. + Sophia dual-proc.
    116: Orsay single-proc. + Sophia dual-proc. (1 core/proc.)
    116: Orsay + Lille single-proc. + Sophia dual-proc. (1 core/proc.)
    1 process per dual-processor node
  Numerical settings (consistency check below):
    Matrix: N = 47792, 2.5 million elements, avg. 48 nnz/row
    Parameters: m = 10, 15, 20, 25; k = 1, 2, 3, 4
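As a rough consistency check of these settings (a sketch, not measured data): with about 48 nonzeros per row, the stated element count and an estimate of the in-memory size of A follow directly from N.

```python
n, nnz_per_row = 47792, 48
nnz = n * nnz_per_row
print(f"{nnz:.2e} elements")           # ~2.3e6, the "2.5 million" order
print(f"{nnz * 8 / 2**20:.1f} MiB")    # ~17.5 MiB of double-precision values
```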
13 Grid'5000 experiments: platforms and numerical settings (2/2)
  Step 2 goal: increasing the size of the problem (in progress)
  Matrix: N = 430128, 193 million elements
  Platforms:
    7 OmniRPC relay nodes, 206 CPUs, 3 sites
    11 OmniRPC relay nodes, 412 CPUs, 4 sites
  Parameters: k = 1, m = 15
  (Rough sizing sketch below.)
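A back-of-the-envelope sketch of why step 2 needs distribution and out-of-core techniques: the values of A alone take about 1.5 GB in double precision, which already strains the 128 MB to 4 GB nodes described earlier.

```python
n, nnz = 430128, 193_000_000
print(f"{nnz * 8 / 2**30:.2f} GiB")    # ~1.44 GiB for the values of A
print(f"{nnz / n:.0f} nnz/row")        # ~449 nonzeros per row on average:
                                       # a much denser matrix than in step 1
```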
14 Outline
  Introduction
  Distribution of the numerical method
  ➔ Experiments
    Experiments on world-wide grids: platforms, numerical settings
    Experiments on Grid'5000: motivations, platforms, numerical settings
    ➔ Results
  YML
    Progress of YML
    YvetteML workflow of the real symmetric eigenproblem
    First experiments
  Conclusion
15 World-wide grid experiments: results
  [Chart of wall-clock times; legend: 29 single-proc. Lille; 58 single-proc. Lille; 58 = single-proc. Lille + dual-proc. Tsukuba (all processors used); 116 = single-proc. Orsay + dual-proc. Tsukuba (all processors used)]
16 Grid'5000 experiments, step 1: results
  [Chart of wall-clock times; legend: 29 single-proc. Orsay; 58 single-proc. Orsay; 58 = single-proc. Lille + dual-proc. Sophia (all processors used); 116 = single-proc. Orsay + dual-proc. Sophia (all processors used); 116 = single-proc. Orsay + single-proc. Lille + dual-proc. Sophia (1 processor used)]
17 Grid'5000 experiments, step 2: results
  Details for N = 430128, m = 15, k = 1; wall-clock times in seconds; accuracy test ||Au − λu||_2 < eps

  Step                             206 CPUs                          412 CPUs
  Lanczos tridiagonalization       MVP: …; reorthog.: 159;           MVP: …; reorthog.: 129;
                                   send new column of Q: 20          send new column of Q: 22
  Bisection + Inverse Iteration    9                                 <1
  Ritz eigenvector                 11                                9

  Evaluation of the wall-clock time for one MVP with the matrix A (arithmetic sketch below):
    Tridiagonalization: 15 (m) × 5 (restarts) = 75 MVPs; 134 s (206 CPUs) and 164 s (412 CPUs) per MVP
    Convergence tests: 5 MVPs (one per restart); 138 s (206 CPUs) and 162 s (412 CPUs) per MVP
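A small arithmetic sketch, using only the figures quoted on this slide, showing that the MVPs dominate the tridiagonalization and that doubling the CPU count makes each MVP slower, not faster:

```python
mvps = 15 * 5                          # m steps per restart x 5 restarts
for cpus, sec_per_mvp in [(206, 134), (412, 164)]:
    total_h = mvps * sec_per_mvp / 3600
    print(f"{cpus} CPUs: {mvps} MVPs x {sec_per_mvp} s = {total_h:.1f} h")
# More CPUs mean more communication per MVP here, hence no speed-up.
```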
18 Outline
  Introduction
  Distribution of the numerical method
  Experiments
    Experiments on world-wide grids: platforms, numerical settings
    Experiments on Grid'5000: motivations, platforms, numerical settings
    Results
  ➔ YML
    ➔ Progress of YML
    ➔ YvetteML workflow of the real symmetric eigenproblem
    ➔ First experiments
  Conclusion
19 Progress of YML
  Stability, error reporting
  Collections of data, out-of-core
  Variable lists of parameters
  Parameters in/out of the workflow
  Mainly developed at the PRiSM laboratory, University of Versailles (Olivier Delannoy, Nahid Emad)
20 Resolution of the eigenproblem with YML
  No data persistence yet (future work: binary cache)
  Re-usability / aggregation of components
21 Experiments with YML and the OmniRPC back-end
  [Table: wall-clock times in minutes for YML + OmniRPC back-end vs. plain OmniRPC, and the resulting overhead in %]
  Sources of overhead:
    No computation in the YvetteML workflow itself
    Scheduler, (un)packing of the parameters
    Transfers of binaries
  (Overhead formula sketch below.)
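For reference, the overhead row would be derived as below; the timings passed in are made-up placeholders, not the measured values from the talk.

```python
def overhead_pct(t_yml_min: float, t_omnirpc_min: float) -> float:
    """Relative cost (%) of the YML layer over the raw OmniRPC back-end."""
    return 100.0 * (t_yml_min - t_omnirpc_min) / t_omnirpc_min

print(f"{overhead_pct(60.0, 50.0):.0f} %")   # hypothetical: 20 % overhead
```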
22 Outline
  Introduction
  Distribution of the numerical method
  Experiments
    Experiments on world-wide grids: platforms, numerical settings
    Experiments on Grid'5000: motivations, platforms, numerical settings
    Results
  YML
    Progress of YML
    YvetteML workflow of the real symmetric eigenproblem
    First experiments
  ➔ Conclusion
23 Conclusion (1/3)
  Reminder of the scope of this work
    Large grid computing and HPC are complementary tools
    Used by people who have no access to HPC
    Significant computations (size of the problem)
    We do not (and cannot) target the highest performance
      The resources are not dedicated
      Slow networks, heterogeneous machines, external perturbations, etc.
    Linear algebra problems are useful for many general applications
  Differences with HPC and cluster computing
    We must not take a "speed-up" approach to the computations
    Recommendations to save resources on the nodes
24 Conclusion (2/3)
  We propose a scalable real symmetric eigensolver for large grids
    Next expected limiting factor: disk space, for much larger or very dense matrices
  Key choices must be made before implementing the method
    Numerical methods and programming paradigms:
      Bisection (task-farming)
      Restarted scheme (memory and disk)
      Out-of-core (memory)
      Data persistence (communication)
  New version of YML
    Workflow of the eigensolver and re-usable components
    In progress
25 Conclusion (3/3)
  Topics of study for the eigensolver
    Improving the distribution of A
    Testing more matrices
      Different kinds of matrices (e.g. sparse, dense)
      Larger matrices
    Scheduling level: adapting the workload balancing to the heterogeneity of the platforms
  Current and future work on YML
    Finishing the multi back-end support
    Binary cache