Virtual Appliances. CTS Conference 2011, Philadelphia, May 2011. Geoffrey Fox, Director, Digital Science Center, Pervasive Technology Institute; Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington
Exploit electronic infrastructure to enhance learning. Several quite old approaches remain critical and dominant – “Just a bunch of web pages”, aka the digital library – Video conferencing – Shared material, as in Webex and Adobe Connect. Note that asynchronous interaction via Twitter, Blackboard, Google Docs, etc. is much easier (and more successful) than synchronous approaches (Polycom, Access Grid, Webex). Interactive web learning environments, such as virtual worlds like Second Life, have not taken off, but some think this will change as the performance of clients and networks improves dramatically (VRML failed ~1999). We must move to an environment consistent with the world view of current students, aka the “Twitter University”.
C4: Continuous Collaborative Computational Cloud [Diagram centered on “C4 Intelligence”] Motivating issues: job/education mismatch; Higher Ed rigidity; interdisciplinary work; Engineering v. Science, Little v. Big science; Modeling & Simulation; C(DE)SE. C4 Intelligent Economy, C4 Intelligent People, C4 Intelligent Society. NSF: Educate the “Net Generation”; re-educate the pre-“Net Generation” in Science and Engineering. Exploiting and developing C4: C4 curricula and programs; C4 experiences (delivery mechanism); C4 REUs, internships, fellowships. Computational Thinking; Internet & Cyberinfrastructure; Higher Education 2020. CDESE is Computational and Data-enabled Science and Engineering.
Educational appliances: one component of C4. A flexible, extensible platform for hands-on, lab-oriented education (on FutureGrid). It must support appliances representing clusters of resources. Virtual machines + social/virtual networking create sandboxed modules – Virtual “Grid” appliances: self-contained, pre-packaged execution environments – Group VPNs: simple management of virtual clusters by students and educators
Why use Virtualization? Traditional ways of delivering hands-on training and education in parallel/distributed computing have non-trivial dependences on the environment. It is difficult to replicate the same environment on different resources (e.g. HPC clusters, desktops), and difficult to cope with changes in the environment (e.g. software upgrades). Virtualization technologies remove key software dependences through a layer of indirection.
Appliance Infrastructure - guiding principles. Fidelity: education/training modules should use full-fledged, executable software – Learn using the proper tools. Reproducibility: content creators should be able to install, configure, and test their modules once, and be assured of the same functional behavior regardless of where the module is deployed – Incentive to invest effort in developing, testing and documenting new modules
Appliance Infrastructure - guiding principles. Deployability: students and users should be able to deploy modules simply, on a variety of resources – Reduce barriers to entry; avoid dependences on a particular infrastructure. Community-oriented: modules should be simple to share, discover, reuse, and expand – Create conditions for “viral” growth
Towards this vision in FutureGrid Executable modules – virtual appliances – Deployable on FutureGrid resources – Deployable on other cloud platforms, as well as virtualized desktops Community sharing – Web 2.0 portal, appliance image repositories – An aggregation hub for executable modules and documentation
What is a virtual appliance? An appliance that packages the software and configuration needed for a particular purpose into a virtual machine “image”. The virtual appliance has no hardware, just software and configuration. The image is a (big) file. It can be instantiated on hardware.
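Because the image is just a (big) file, making a copy for another site is an ordinary file copy. A minimal sketch of copying an image and checking that the copy is bit-identical (one simple way to support the reproducibility principle) is below; the file names and the use of SHA-256 are illustrative assumptions, not part of the Grid appliance tooling.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so multi-GB images never sit fully in memory


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a (possibly very large) file."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_image(src: Path, dst: Path) -> str:
    """Copy an appliance image and confirm the copy matches the original byte for byte."""
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    actual = sha256_of(dst)
    if actual != expected:
        raise IOError(f"copy of {src} is corrupt: {actual} != {expected}")
    return actual
```

Publishing the digest next to the image in a repository would let anyone who downloads it run the same check.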
Virtual appliance example [Figure: a Hadoop image containing Linux, Java, Hadoop, and configuration scripts is copied and instantiated on the virtualization layer, giving a Hadoop worker, another Hadoop worker, and so on]
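The "copy, instantiate, repeat" pattern does not have to duplicate the whole image for every worker. The slides do not name a hypervisor; assuming QEMU/KVM, one common approach is a thin copy-on-write overlay per worker that reads unchanged blocks from the shared base image. This sketch only builds the `qemu-img create` command lines; the paths and the `overlay_commands` helper are hypothetical.

```python
from pathlib import Path


def overlay_commands(base: Path, count: int, out_dir: Path,
                     base_format: str = "qcow2") -> list[list[str]]:
    """Build one qemu-img command per worker. Each command creates a thin
    qcow2 overlay that reads unmodified blocks from the shared base image
    and stores only that worker's own writes."""
    commands = []
    for i in range(1, count + 1):
        overlay = out_dir / f"worker{i}.qcow2"
        commands.append([
            "qemu-img", "create",
            "-f", "qcow2",        # format of the new overlay file
            "-b", str(base),      # backing file: the shared appliance image
            "-F", base_format,    # format of the backing file
            str(overlay),
        ])
    return commands
```

Each command could then be executed with `subprocess.run(cmd, check=True)`. Passing an absolute base path avoids surprises, since qemu-img resolves a relative backing path relative to the overlay's directory.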
What about the network? Multiple Web servers might be completely independent of each other. Parallel processing workers are not – They need to communicate and coordinate with each other – Each worker needs an IP address and uses TCP/IP sockets. Cluster middleware stacks assume a collection of machines, typically on a LAN (Local Area Network).
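To make the point concrete, here is a minimal sketch of workers coordinating over TCP/IP sockets: each worker connects to a coordinator at a known IP address and port, announces itself, and receives a task. For a self-contained demo everything runs as threads on `127.0.0.1`; on a real cluster the host would be the coordinator's (virtual) IP. The message format and function names are hypothetical.

```python
import socket
import threading


def recv_all(sock: socket.socket) -> bytes:
    """Read until the peer closes its sending side (EOF)."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def coordinator(server: socket.socket, n_workers: int, greetings: list) -> None:
    """Accept one connection per worker, record its greeting, reply with a task id."""
    for task_id in range(n_workers):
        conn, _addr = server.accept()
        with conn:  # closing the connection signals EOF to the worker
            greetings.append(recv_all(conn).decode())
            conn.sendall(f"task {task_id}".encode())


def worker(name: str, host: str, port: int, replies: list) -> None:
    """Connect to the coordinator's IP address and port, announce itself, read its task."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(f"ready {name}".encode())
        sock.shutdown(socket.SHUT_WR)  # end of greeting, so the coordinator's recv_all returns
        replies.append(recv_all(sock).decode())


def run_demo(n_workers: int = 3, host: str = "127.0.0.1"):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS choose a free port
    server.listen()
    port = server.getsockname()[1]
    greetings, replies = [], []
    boss = threading.Thread(target=coordinator, args=(server, n_workers, greetings))
    boss.start()
    workers = [threading.Thread(target=worker, args=(f"w{i}", host, port, replies))
               for i in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    boss.join()
    server.close()
    return greetings, replies
```

The order in which workers are served is not fixed, so which worker gets which task id varies from run to run; that nondeterminism is typical of real cluster coordination too.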
Virtual cluster appliances: virtual appliance + virtual network [Figure: a Hadoop + virtual network image is copied and instantiated as virtual machines (a Hadoop worker, another Hadoop worker, repeat…) joined by a virtual network]
Virtual cluster appliances: virtual appliance + virtual network [Figure: an MPI + virtual network image is copied and instantiated as virtual machines (an MPI node, another MPI node, repeat…) joined by a virtual network]
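Once MPI nodes sit on a shared virtual network, the MPI launcher needs to know their addresses. Assuming Open MPI, whose hostfile format is one `host slots=N` line per node, a small helper can render that file from the nodes' virtual IPs. The helper name and the `10.10.0.x` addresses are hypothetical.

```python
import ipaddress


def mpi_hostfile(virtual_ips: list[str], slots_per_node: int = 1) -> str:
    """Render an Open MPI-style hostfile: one line per node, 'address slots=N'."""
    lines = []
    for ip in virtual_ips:
        addr = ipaddress.ip_address(ip)  # rejects malformed addresses early
        lines.append(f"{addr} slots={slots_per_node}")
    return "\n".join(lines) + "\n"
```

For example, `mpi_hostfile(["10.10.0.2", "10.10.0.3"], 2)` yields two lines, `10.10.0.2 slots=2` and `10.10.0.3 slots=2`; saved as `hosts.txt`, it could be used with `mpirun --hostfile hosts.txt -np 4 ./app`.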
Background: Virtual appliances – Encapsulate the software environment in an image: virtual disk file(s) and virtual hardware configuration. The Grid appliance – Encapsulates cluster software environments; current examples: Condor, MPI, Hadoop – Homogeneous images at each node – Virtual LAN connecting nodes to form a cluster – Deploy within or across domains
Grid appliance in a nutshell: Plug-and-play clusters with a pre-configured software environment – Linux + (Hadoop, Condor, MPI, …) – Scripts for zero-configuration – “Virtual machine” appliance; open-source software; runs on Linux, Windows, Mac. Hands-on examples, bootstrap infrastructure, and zero-configuration software: you’re off to a quick start.
Grid appliance in a nutshell: Creating an equivalent Grid on your own resources, or on cloud providers, is also easy – Deploy the image on FutureGrid or Amazon EC2 – Copy the same appliance to clusters and PC labs. Simple deployment and management of ad-hoc clusters for – Opportunistic computing – Testing, evaluation – Education, training
Virtual Clusters in FutureGrid [Figure: appliance image deployed via Nimbus and Eucalyptus on FutureGrid; uses: education, training]
Social virtual private networks. Education/training: deploy your own cluster! [Figure: an MPI + virtual network image is copied and instantiated as virtual machines (an MPI worker, another MPI worker, repeat…); the machines join a Group VPN using GroupVPN credentials obtained from a Web site, and each receives a virtual IP via DHCP]
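The slide does not describe how GroupVPN is implemented internally. As a purely hypothetical toy model of the idea it depicts (nodes present a group credential obtained from a Web site, then receive a virtual IP via a DHCP-like step), the sketch below hands out addresses from a private subnet only to holders of the group's credential, and gives a returning node the same address. The class name, subnet, and credential string are illustrative assumptions.

```python
import hmac
import ipaddress


class GroupVPNAllocator:
    """Toy DHCP-style address allocator for one group's virtual network.

    Hypothetical sketch: nodes that present the group's shared credential
    receive a virtual IP from a private IPv4 subnet; a node that asks again
    keeps the address it already leased.
    """

    def __init__(self, subnet: str, group_credential: str):
        self._free = ipaddress.IPv4Network(subnet).hosts()  # lazy iterator of usable addresses
        self._credential = group_credential.encode()
        self._leases: dict[str, ipaddress.IPv4Address] = {}

    def request(self, node_id: str, credential: str) -> ipaddress.IPv4Address:
        # constant-time comparison of the presented credential with the group's
        if not hmac.compare_digest(credential.encode(), self._credential):
            raise PermissionError(f"{node_id}: credential not valid for this group")
        if node_id not in self._leases:
            try:
                self._leases[node_id] = next(self._free)
            except StopIteration:
                raise RuntimeError("virtual subnet exhausted") from None
        return self._leases[node_id]
```

With `GroupVPNAllocator("10.128.0.0/24", "group-secret")`, the first two nodes receive 10.128.0.1 and 10.128.0.2, and a node with the wrong credential is refused. Keeping one allocator per group mirrors the slide's point that each class or team manages its own private virtual cluster.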
Where to go from here? You can download Grid appliances and run them on your own resources. You can create private virtual clusters and manage groups of users. You can customize appliances with other middleware, create images, and share them with other users. Tutorials are available at FutureGrid.org. More information on Grid appliances is also available at Grid-appliance.org. Contact Renato Figueiredo for more information about …