Tools and Utilities for parallel and serial codes in ENEA-GRID environment
CRESCO Project: Salvatore Raia, SubProject I.2
C.R. ENEA-Portici. 11/12/2007
OUTLINE:
- GRID, cluster and parallel computing (intro)
- ENEA-GRID: architecture and functionality
- My activity for the CRESCO project and results on ENEA-GRID
- Conclusion and objectives
What is a cluster?
A collection of resources (HW, SW) connected via a public or private network.
- Each CPU runs a separate instance of the operating system
- Administration: local
Supercomputer = a computer with many processors that are connected via a high-speed bus and share the memory (SMP). It runs a single operating system.
[Diagram: cluster 1, supercomputer]
How to get a Grid?
A collection of interconnected, geographically distributed clusters.
- Administration: the clusters sometimes belong to different departments or companies
GRID = nodes made of clusters; each node may have a shared or distributed memory architecture (hybrid), and the nodes share processes.
ENEA-GRID has the same structure, with 6 clusters: Bologna, Casaccia, Frascati, Portici, Trisaia, Brindisi.
[Diagram: cluster 1, cluster 2, cluster 3, ... cluster N forming GRID 1]
ENEA-GRID structure (HW)
GRID features
Pro:
- Shared resources
- Low costs (clock ?): frequency scaling (domain ?), power consumption P = C × V² × F
- Open systems
- Scalability
Con:
- Several platforms
- Load balancing
- User access
How is it managed on ENEA-GRID?
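The power formula on this slide can be made concrete with a small worked example. The sketch below assumes the usual reading of P = C × V × V × F as dynamic power, with C the switched capacitance, V the supply voltage and F the clock frequency; the slide itself does not define the symbols, and the 10% reductions are illustrative values only.

```python
def dynamic_power(c, v, f):
    """Dynamic power P = C * V * V * F (symbol meanings are an assumption)."""
    return c * v * v * f


# Normalized baseline: all quantities set to 1.0 (illustrative, not measured).
baseline = dynamic_power(1.0, 1.0, 1.0)

# Lower both the voltage and the frequency by 10%.
scaled = dynamic_power(1.0, 0.9, 0.9)

# Because the voltage enters squared, power drops faster than the clock does.
ratio = scaled / baseline
print(f"relative power after scaling: {ratio:.3f}")
```

Under these assumptions a 10% slower clock at a 10% lower voltage needs roughly 73% of the original power, which is one way to read the "low costs (clock ?)" item.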
ENEA-GRID structure (SW)
- Resources management
- ICA client
- File System
- Operating Systems
User Interface
- USER ACCESS: ICA client, ssh or telnet, web
- Switch host
- Run Appl.
- Jobs status
My activity on ENEA-GRID (CRESCO project)
Serial and parallel (MPI) codes
Problems with:
- Multiple platforms
- Load balancing
- User access
How to cope with them?
- User interfaces
- LSF utilities
- Software development
Tools for Serial and Parallel (MPI) codes
Multi platform
- Serial codes: compilers (GNU, PGI, IBM)
- Parallel codes (MPI): MPI implementations (MPICH, LAM-MPI, POE). Problems with execution too.
…So we need many binaries, one for each platform.
…tools
Launcher: after compiling our source code on each platform, we have “binary1” … “binaryN” for host1, … hostN. The launcher is a shell script (placed on AFS) that selects the right “binary” for the selected host.
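The launcher idea can be sketched in a few lines. The original is a shell script on AFS; the version below is only a Python illustration of the same selection logic. The platform keys, the paths under `/shared/app/bin` and the function names are all hypothetical, and a real launcher would key on whatever each host actually reports.

```python
import platform

# Hypothetical table: one pre-built binary per platform, stored on a shared
# file system. Keys and paths are illustrative, not the ENEA-GRID layout.
BINARIES = {
    ("Linux", "x86_64"): "/shared/app/bin/app.linux_x86_64",
    ("AIX", "powerpc"): "/shared/app/bin/app.aix_powerpc",
}


def select_binary(system, machine, binaries=BINARIES):
    """Return the binary built for the given platform, or fail clearly."""
    try:
        return binaries[(system, machine)]
    except KeyError:
        raise RuntimeError(f"no binary built for {system}/{machine}") from None


def launch_command(args, system=None, machine=None):
    """Build the command line for this host; the caller decides how to run it."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return [select_binary(system, machine)] + list(args)


# Example: the command a Linux x86_64 host would run.
print(launch_command(["input.dat"], system="Linux", machine="x86_64"))
```

Returning the command instead of executing it keeps the selection step easy to check on its own; a real launcher would pass the result to something like `subprocess.run`.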
Some MPI problems
Results: tools for serial and parallel (MPI) codes
Serial:
- Program for compiling serial Fortran 77/90, C and C++ codes (see Java Interface)
- Launcher for the “NS2” application (uses external libraries)
Parallel (MPI):
- Launcher for running a test program (check command)
- Launcher for the HPL test on AIX and Linux
[Diagram: user1 installation, user2 installation]
Analyzing LSF utilities
Serial and parallel codes: LSF resources
Serial codes:
- Resources definition: “NS2” application
- Serial LSF utilities: job array (Multicase), “lsgrun”
Parallel codes (MPI):
- Parallel LSF utilities: “mpijob” (MPICH), “poejob” (POE)
[Diagram labels: Correlation, No correlation]
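The job array (multicase) item can be illustrated as follows. This is a minimal sketch, assuming the standard LSF conventions that `bsub -J "name[1-N]"` submits N array elements and that each element sees its own index in the `LSB_JOBINDEX` environment variable. The job name, the `./ns2_case` command and the `case_{index}.in` file pattern are hypothetical, and the sketch only builds the command without submitting anything.

```python
def job_array_command(name, n_cases, command):
    """Build a bsub command line that submits n_cases elements as one job array."""
    if n_cases < 1:
        raise ValueError("a job array needs at least one case")
    return ["bsub", "-J", f"{name}[1-{n_cases}]"] + list(command)


def case_input(env, pattern="case_{index}.in"):
    """Pick the input file for this array element from its LSB_JOBINDEX."""
    return pattern.format(index=int(env["LSB_JOBINDEX"]))


# Submission side: one command covers all ten cases (hypothetical names).
print(job_array_command("multicase", 10, ["./ns2_case"]))

# Execution side: element 3 of the array selects its own input file.
print(case_input({"LSB_JOBINDEX": "3"}))
```

Inside a running job the environment would be `os.environ`; a plain dictionary is passed here so the selection can be tried without an LSF installation.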
Results: integration with other applications
(My) Java Interface
- Serial codes
- Parallel codes (MPI)
Conclusion and objectives
Launcher + LSF utilities + user interface make it possible to create a homogeneous environment.
Objectives:
- Optimize the programs that launch serial and parallel codes, including checks for the resources the application needs to run (e.g. libraries, other programs)
- Exploit LSF utilities (mpijob, poejob, etc.) to make running MPI programs and load balancing easier
- Improve error handling in the user interfaces
- …
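The resource check named in the first objective could look roughly like this. It is a sketch of one possible approach, not the project's implementation: it uses Python's standard `shutil.which` and `ctypes.util.find_library` to look for helper programs and shared libraries before launching, and the program name used in the example is a hypothetical placeholder.

```python
import shutil
from ctypes.util import find_library


def missing_resources(programs=(), libraries=()):
    """Return a list describing every required program or library not found."""
    missing = []
    for prog in programs:
        # shutil.which searches PATH the same way the shell would.
        if shutil.which(prog) is None:
            missing.append(f"program: {prog}")
    for lib in libraries:
        # find_library takes the name without the "lib" prefix or suffix.
        if find_library(lib) is None:
            missing.append(f"library: {lib}")
    return missing


# Example: refuse to launch when something the application needs is absent.
problems = missing_resources(programs=["no_such_program_xyz_123"])
if problems:
    print("cannot launch, missing:", ", ".join(problems))
```

Collecting all missing items before reporting, rather than stopping at the first one, gives the user a single complete message, which also fits the error-handling objective.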
Andrew File System
LSF: Load Sharing Facility