Distributed Resource Management and Parallel Computation
Dr Michael Rudgyard
Streamline Computing Ltd
A spin-out of Warwick (& Oxford) Universities
Specialising in distributed (technical) computing
–Cluster and GRID computing technology
14 employees & growing; focussed expertise in:
–Scientific Computing
–Computer systems and support
–Presently 5 PhDs in HPC and Parallel Computation
–Expect growth to 20+ people in 2003
Strategy
Establish an HPC systems integration company...
...but re-invest profits into software
–Exploiting IP and significant expertise
–First software product released
–Two more products in prototype stage
Two complementary ‘businesses’
–Both high growth
Track Record (2001 – date)
Installations include:
–Largest Sun HPC cluster in Europe (176 processors)
–Largest Sun / Myrinet cluster in the UK (128 processors)
–AMD, Intel and Sun clusters at 21 UK Universities
–Commercial clients include Akzo Nobel, Fujitsu, McLaren F1, Rolls-Royce, Schlumberger, Texaco…
Delivered a 264-processor Intel/Myrinet cluster:
–1.3 Tflop/s peak!
–Forms part of the White Rose Computational Grid
Streamline and Grid Computing
Pre-configured ‘grid’-enabled systems:
–Clusters and farms
–The SCore parallel environment
–Virtual ‘desktop’ clusters
Grid-enabled software products:
–The Distributed Debugging Tool
–Large-scale distributed graphics
–Scalable, intelligent & fault-tolerant parallel computing
‘Grid’-enabled turnkey clusters
Choice of DRMs and schedulers:
–(Sun) GridEngine
–PBS / PBS-Pro
–LSF / ClusterTools
–Condor
–Maui Scheduler
Globus 2.x gatekeeper (Globus 3?)
Customised access portal
The SCore parallel environment
Developed by the Real World Computing Partnership in Japan
Unique features that are unavailable in most parallel environments:
–Low-latency, high-bandwidth MPI drivers
–Network transparency: Ethernet, Gigabit Ethernet and Myrinet (see the sketch below)
–Multi-user time-sharing (gang scheduling)
–O/S-level checkpointing and failover
–Integration with PBS and SGE
–MPICH-G port
–Cluster management functionality
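For reference, a minimal MPI program is sketched below (generic MPI in C, not SCore-specific code): because the network-dependent drivers sit beneath the MPI layer, the same source runs unchanged whether SCore carries it over Ethernet, Gigabit Ethernet or Myrinet.

    /* Minimal MPI program: the interconnect (Ethernet, Gigabit Ethernet,
     * Myrinet) is chosen by the runtime environment, not by the source. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        printf("Process %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }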
‘Desktop’ Clusters
Linux Workstation Strategy
–Integrated software stack for HPTC (compilers, tools & libraries) – cf. UNIX workstations
Aim to provide a GRID at point of sale:
–Single point of administration for several machines
–Files served from front-end
–Resource management
–Globus enabled
–Portal
A cluster with monitors!
The Distributed Debugging Tool
A debugger for distributed parallel applications
–Launched at Supercomputing 2002
Aim is to be the de facto HPC debugging tool
–Linux ports for GNU, Absoft, Intel and PGI compilers
–IA64 and Solaris ports; AIX and HP-UX soon…
–Commodity pricing structure!
Existing architecture lends itself to the GRID:
–Thin-client GUI + XML middleware + back-end
–Expect GRID-enabled version in 2003
Distributed Graphics Software
Aims
–To enable very large models to be viewed and manipulated using commodity clusters
–Visualisation on a (local or remote) graphics client
Technology
–Sophisticated data-partitioning and parallel I/O tools
–Compression using distributed model simplification
–Parallel (real-time) rendering (see the sketch below)
To be GRID-enabled within the e-Science ‘Gviz’ project
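As a rough illustration of the parallel-rendering idea only (not the Streamline implementation), the sketch below assumes a placeholder render_partition() and a trivial compositing rule: each cluster node renders its own partition of the model and the partial images are combined before being sent to the graphics client.

    /* Sketch of sort-last parallel rendering: each rank renders its own
     * partition into a local image, the images are combined, and rank 0
     * ships the result to the client.  render_partition() is a
     * hypothetical placeholder, not an existing API. */
    #include <mpi.h>
    #include <string.h>

    #define W 640
    #define H 480

    static void render_partition(int rank, int nprocs, float *image)
    {
        (void)rank; (void)nprocs;
        /* A real renderer would draw this rank's share of the
         * (possibly simplified) model into 'image'. */
        memset(image, 0, W * H * sizeof(float));
    }

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        static float local[W * H], combined[W * H];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        render_partition(rank, nprocs, local);

        /* Composite: a real compositor would resolve depth; here the
         * per-pixel maximum intensity stands in for that step. */
        MPI_Reduce(local, combined, W * H, MPI_FLOAT, MPI_MAX,
                   0, MPI_COMM_WORLD);

        /* Rank 0 would now compress 'combined' and send it to the client. */
        MPI_Finalize();
        return 0;
    }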
Parallel Compiler and Tools Strategy
Aim to invest in new computing paradigms
Developing parallel applications is far from trivial:
–OpenMP does not map naturally onto cluster architectures
–MPI is too low-level (see the sketch below)
–Few skills in the marketplace!
–Yet growth of MPPs is exponential…
Most existing applications are not GRID-friendly:
–# of processors fixed
–No fault tolerance
–Little interaction with the DRM
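To illustrate why MPI is considered low-level, the sketch below (plain MPI in C) shows that even a simple one-dimensional halo exchange forces the programmer to manage ranks, neighbours, buffers and message tags by hand.

    /* Even a simple 1-D halo exchange requires explicit bookkeeping:
     * neighbour ranks, halo cells, tags and send/receive pairing. */
    #include <mpi.h>

    #define NLOC 1000

    int main(int argc, char **argv)
    {
        double u[NLOC + 2];            /* local data plus two halo cells */
        int rank, size, left, right, i;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (i = 0; i < NLOC + 2; i++)
            u[i] = (double)rank;

        left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
        right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

        /* Exchange boundary values with both neighbours. */
        MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  0,
                     &u[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOC],     1, MPI_DOUBLE, right, 1,
                     &u[0],        1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }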
DRM for Parallel Computation
Throughput of parallel jobs is limited by:
–Static submission model: ‘mpirun -np …’
–Static execution model: # of processors fixed (see the sketch below)
–Scalability: many jobs use too many processors!
–Job starvation
Available tools can only solve some of these issues:
–Advanced reservation and back-fill (e.g. Maui)
–Multi-user time-sharing (gang scheduling)
The application itself must take responsibility!
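The static execution model is easy to see in a typical MPI code: the processor count is fixed by ‘mpirun -np N’ once at start-up and the decomposition is frozen for the whole run. A minimal, generic illustration:

    /* The static execution model in practice: the processor count comes
     * from 'mpirun -np N' and the decomposition is fixed for the run. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int nprocs, rank, nglobal = 1000000, nlocal;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Data is divided once, at start-up; if the DRM later wanted
         * the job to shrink or grow, the application could not react. */
        nlocal = nglobal / nprocs;
        printf("Rank %d owns %d elements until the job ends\n", rank, nlocal);

        MPI_Finalize();
        return 0;
    }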
Dynamic Job Submission
The job scheduler should decide the available processor resource!
The application then requires:
–Built-in partitioning / data management
–An appropriate parallel I/O model
–Hooks into the DRM (sketched below)
The DRM requires:
–Typical memory and processor requirements
–LOS information
–Hooks into the application
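Purely as an illustration of what such hooks might look like (drm_register_requirements() and drm_allocated_procs() are invented names, not an existing API), the application could describe its needs and then let the scheduler choose the processor count:

    /* Hypothetical submission-time hooks between application and DRM. */
    #include <stdio.h>

    struct job_requirements {
        long mem_per_proc_mb;   /* typical memory per processor       */
        int  min_procs;         /* smallest useful processor count    */
        int  max_procs;         /* beyond this, scaling is poor       */
    };

    /* Stubs standing in for a real DRM interface. */
    static void drm_register_requirements(const struct job_requirements *r) { (void)r; }
    static int  drm_allocated_procs(void) { return 16; }

    int main(void)
    {
        struct job_requirements req = { 512, 4, 64 };
        int nprocs;

        drm_register_requirements(&req);   /* tell the DRM what we need  */
        nprocs = drm_allocated_procs();    /* let the scheduler decide   */

        printf("Scheduler granted %d processors\n", nprocs);
        /* The application would now partition its data for nprocs. */
        return 0;
    }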
Dynamic Parallel Execution
Additional resources may become available, or be required by other applications, during execution…
Ideal situation:
–DRM informs the application
–Application dynamically re-partitions itself (see the sketch below)
Other issues:
–DRM requires knowledge of the application (the benefit of data redistribution must outweigh its cost!)
–Frequency of dynamic scheduling
–Message passing must have dynamic capabilities
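A hypothetical execution-time loop is sketched below; drm_resize_requested(), repartition() and compute_timestep() are placeholder names used only to show where the DRM interaction and the data redistribution would sit.

    /* Hypothetical execution-time loop: every few timesteps the
     * application asks the DRM whether its allocation has changed
     * and, if so, redistributes its data. */
    #define NSTEPS         10000
    #define CHECK_INTERVAL   100   /* how often to talk to the DRM */

    static int  drm_resize_requested(int *new_np) { *new_np = 0; return 0; }
    static void repartition(int new_np)           { (void)new_np; }
    static void compute_timestep(void)            { }

    int main(void)
    {
        int step, new_np;

        for (step = 0; step < NSTEPS; step++) {
            if (step % CHECK_INTERVAL == 0 &&
                drm_resize_requested(&new_np)) {
                /* Only worthwhile if the redistribution cost is
                 * outweighed by the benefit of the new allocation. */
                repartition(new_np);
            }
            compute_timestep();
        }
        return 0;
    }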
The Intelligent Parallel Application
Optimal scheduling requires more information:
–How well the application scales
–Peak and average memory requirements
–Application performance vs. architecture
The application ‘cookie’ concept (sketched below):
–The application (and/or DRM) should gather information about its own capabilities
–The DRM can then limit the # of available processors
–Ideally requires hooks into the programming paradigm…
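One way to picture the ‘cookie’ (an assumed data layout, not an existing format) is a small record of measured scaling and memory behaviour that the DRM could consult before fixing the processor count:

    /* A hypothetical application 'cookie': a record of measured behaviour
     * that the DRM could read before choosing a processor count. */
    struct app_cookie {
        char   app_name[64];
        int    procs_tested[8];   /* processor counts previously used  */
        double speedup[8];        /* measured speed-up at each count   */
        long   peak_mem_mb;       /* peak memory per processor         */
        long   avg_mem_mb;        /* average memory per processor      */
    };

    /* The DRM might cap the allocation where the measured parallel
     * efficiency (speed-up / processors) drops below a threshold. */
    int max_useful_procs(const struct app_cookie *c, double min_efficiency)
    {
        int i, best = 1;
        for (i = 0; i < 8 && c->procs_tested[i] > 0; i++) {
            double efficiency = c->speedup[i] / c->procs_tested[i];
            if (efficiency >= min_efficiency)
                best = c->procs_tested[i];
        }
        return best;
    }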
Fault Tolerance
On large MPPs, processors and other components will fail!
Applications need fault tolerance:
–Checkpointing + RAID-like redundancy (cf. SCore) – see the sketch below
–Dynamic repartitioning capabilities
–Interaction with the DRM
–Transparency from the user’s perspective
Fault tolerance relies on many of the capabilities described above…
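A minimal sketch of application-level checkpoint/restart is given below (ordinary C file I/O with invented helper names, not SCore's O/S-level mechanism): state is saved periodically so that a restarted job resumes from the last checkpoint rather than from step zero.

    /* Hypothetical application-level checkpointing: state is written
     * every CHECKPOINT_INTERVAL steps so that, after a failure, the DRM
     * can restart the job from the last saved step. */
    #include <stdio.h>

    #define NSTEPS              10000
    #define CHECKPOINT_INTERVAL   500
    #define STATE_SIZE           1024

    static void compute_timestep(double *state) { (void)state; }

    static void write_checkpoint(int step, const double *state)
    {
        FILE *f = fopen("checkpoint.dat", "wb");
        if (!f) return;
        fwrite(&step, sizeof step, 1, f);
        fwrite(state, sizeof *state, STATE_SIZE, f);
        fclose(f);
    }

    static int read_checkpoint(double *state)
    {
        int step = 0;
        FILE *f = fopen("checkpoint.dat", "rb");
        if (!f) return 0;                  /* no checkpoint: start at 0 */
        if (fread(&step, sizeof step, 1, f) != 1)
            step = 0;
        if (fread(state, sizeof *state, STATE_SIZE, f) != STATE_SIZE)
            step = 0;
        fclose(f);
        return step;
    }

    int main(void)
    {
        double state[STATE_SIZE] = { 0.0 };
        int step = read_checkpoint(state); /* resume if a checkpoint exists */

        for (; step < NSTEPS; step++) {
            compute_timestep(state);
            if (step % CHECKPOINT_INTERVAL == 0)
                write_checkpoint(step, state);
        }
        return 0;
    }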
Conclusions
Commitment to near-term GRID objectives
–Turnkey clusters, farms and storage installations
–Ongoing development of ‘GRID-enabled’ tools
–Driven by existing commercial opportunities…
‘Blue-sky’ project for next-generation applications
–Exploits existing IP and an advanced prototype
–Expect moderate income from focussed exploitation
–Strategic positioning: existing paradigms will ultimately be a barrier to the success of (V-)MPP computers / clusters!