Virtual Machines for HPC
Paul Lu, Cam Macdonell
Dept of Computing Science

The Problems
1. Making applications run faster
  – Not discussed today
  – Parallelism is not always the answer
2. Making it easier to use different clusters
  – Packaging of applications, scripts, and libraries
  – Dealing with differences in environment
3. Making it easier to manage your files
  – Distributed file systems

Making Use of Clusters
Heterogeneity creates complexity
How can a scientist make use of all these clusters, without becoming a computing scientist?
[Figure labels: Scientific Linux, Red Hat Linux, GROMACS, BLAST, Python, Python 2.2, FFTW, Globus, Trellis, Library X]

Shrink-Wrapped VMs
Package once
  – OS (e.g., Linux)
  – Libraries
  – Application(s)
Run many places
  – Busby
  – Glacier
  – Favourite workstation
[Figure labels: Linux, GROMACS, Trellis; Linux, Windows, Mac OS; VM]

HPC using VMs
Packaged once, run on many x86 clusters
Using Trellis, data is automatically moved from local to remote, and back
[Figure labels: Glacier; Busby, AICT; four stacks of GROMACS, Linux, Trellis; File Server, Laptop; Local; Remote]

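The local-to-remote-and-back pattern on this slide can be illustrated with a small sketch. This is not Trellis and does not use any Trellis API: the function name `run_with_staging` and its parameters are hypothetical, and the "remote" side is assumed to be just another directory, so the copy steps stand in for whatever transport a real system would use.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_with_staging(input_file, output_name, command, remote_root):
    """Stage a file in, run a command next to it, and stage the result back.

    Assumption: remote_root is an ordinary directory standing in for the
    remote cluster's scratch space. A real system would copy over a network.
    """
    input_file = Path(input_file)
    # A private scratch directory that is removed when the block exits.
    with tempfile.TemporaryDirectory(dir=remote_root) as scratch:
        scratch = Path(scratch)
        # Stage in: local -> "remote".
        shutil.copy2(input_file, scratch / input_file.name)
        # Run the job inside the scratch directory; fail loudly on error.
        subprocess.run(command, cwd=scratch, check=True)
        # Stage out: "remote" -> local, next to the original input.
        result = input_file.parent / output_name
        shutil.copy2(scratch / output_name, result)
    return result
```

The caller never touches the scratch directory, which is the point of the pattern: the job sees its files where it expects them, and the results end up back beside the input.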
GROMACS on VM and HW
Concluding Remarks
Small performance hit with VMs
Much easier to package and use
Potentially, access to many more compute nodes

There is hope! Virtualization!
What is Computing Science?
“So…you…like…write programs or something?”
Can you fix my printer?

Scientific Computing
Scientific applications are on the leading edge of computing
  – Lots of resources
  – Complex interactions
  – Huge amounts of data

Fastest Supercomputer
  – IBM LLNL
Previously fastest
  – NEC Earth Simulator
Are computers good at solving problems in natural science?

Computing in Canada
Canada lacks world-class computing facilities
We have to be able to aggregate resources from numerous institutions
The CISS experiments explored aggregating computing resources
  – 4000 CPUs, 19 administrative domains (ADs)

Aggregating is difficult
Different administrative domains
Running GROMACS
  – Requires FFTW
  – Doesn’t like new compilers
  – Files must be in certain locations
And this is just for one application!

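To make the per-cluster checking burden concrete, here is a minimal sketch of a preflight check a user might run on each cluster before submitting a job. The function name `check_environment` is hypothetical, and the actual executable, library, and path names for a given GROMACS installation are assumptions the caller has to supply.

```python
import os
import shutil
from ctypes.util import find_library


def check_environment(executables=(), libraries=(), paths=()):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Is each required program reachable through PATH?
    for exe in executables:
        if shutil.which(exe) is None:
            problems.append(f"executable not on PATH: {exe}")
    # Can the system locate each shared library (name given without "lib")?
    for lib in libraries:
        if find_library(lib) is None:
            problems.append(f"shared library not found: {lib}")
    # Does each file or directory the application expects exist?
    for path in paths:
        if not os.path.exists(path):
            problems.append(f"required path missing: {path}")
    return problems
```

A call such as `check_environment(executables=["mdrun"], libraries=["fftw3"], paths=["/scratch/input"])` uses illustrative names only. The list of things to check grows with every application and every cluster, which is the cost a packaged VM image is meant to avoid.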
Virtualization
Is it appropriate for Scientific Computing?
  – Performance has improved
  – Pricing has improved (it’s become free)

Virtual Images
Positives
  – Completely portable
      Less administration
  – Control entire environment within Virtual Image
      We can run any application in them
      We can bundle data control software within them

Virtual Images
Negatives
  – Large size
      GBs for virtual disks
  – Performance Loss
      Virtualization is slower than running on hardware

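One simple way to quantify the performance loss is the relative slowdown of the VM run against the same run on hardware. The sketch below assumes wall-clock seconds are the metric; the helper names `time_call` and `overhead_percent` are hypothetical, and no measured numbers from the talk are reproduced here.

```python
import time


def time_call(fn):
    """Return the wall-clock seconds taken by one call to fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def overhead_percent(native_seconds, vm_seconds):
    """Relative slowdown of the VM run, as a percentage of the native run."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    # Positive means the VM was slower; negative means it was faster.
    return 100.0 * (vm_seconds - native_seconds) / native_seconds
```

For example, a run that takes 100 s on hardware and 110 s in the VM gives `overhead_percent(100.0, 110.0)`, an overhead of 10 percent. Timing the same workload several times and comparing the spread helps separate real overhead from run-to-run noise.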
VMware on Busby
GROMACS test run on Busby1

Future Directions
Resolve performance anomaly
More accurate timings of phases
Run other applications
Get all 4 nodes running concurrently