Cloud Computing with Nimbus
FNAL, January 2009
Kate Keahey
University of Chicago, Argonne National Laboratory

10/20/08 The Nimbus Toolkit: http://workspace.globus.org

Cloud Computing
Science Clouds
Elastic computing, pay-as-you-go, capital expense, operational expense

Everything-as-a-Service
IaaS, PaaS, SaaS

The Quest Begins
- Code complexity
- Resource control

Workspaces
- Dynamically provisioned environments
  - Environment control
  - Resource control
- Hardware implementations vs. virtualization

A Brief History of Nimbus
[Timeline figure; milestones shown:]
- Research on agreement-based services
- Xen released
- First Workspace Service release
- EC2 goes online
- EC2 gateway available
- STAR production runs on EC2
- Nimbus Cloud comes online
- Support for EC2 interfaces

Nimbus Overview
- Goal: open source, extensible IaaS implementation and tools
  - Specifically targeting the scientific community
  - A platform for experimentation with features for scientific needs
  - Set up private clouds (privacy, expense considerations)
- Tools
  - IaaS layer (Workspace Service)
  - Orchestration layer (Context Broker, gateway)

The Workspace Service
[Figure: VWS Service and a pool of twelve pool nodes]

The Workspace Service
[Figure: VWS Service, pool nodes, Trusted Computing Base (TCB)]
The workspace service publishes information on each workspace as standard WSRF Resource Properties.
Users can query those properties to find out information about their workspace (e.g., which IP address the workspace was bound to).
Users can interact directly with their workspaces the same way they would with a physical machine.

Workspace Service Interfaces and Clients
- Web Services based
- Web Service Resource Framework (WSRF)
  - GT-based
- Elastic Compute Cloud (EC2)
  - Supported: ec2-describe-images, ec2-run-instances, ec2-describe-instances, ec2-terminate-instances, ec2-reboot-instances, ec2-add-keypair, ec2-delete-keypair
  - Unsupported: availability zones, security groups, elastic IP assignment, REST
- Used alongside WSRF interfaces
  - E.g., the University of Chicago cloud allows you to connect via the cloud client or via the EC2 client

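Because only a subset of the EC2 command-line operations is listed as supported, a client script may want to check a planned sequence of commands before sending anything. The following is a minimal sketch of such a check, not part of Nimbus: the command names in the supported set are copied from the slide, while `is_supported`, `partition_commands`, and the two example unsupported commands (`ec2-allocate-address`, `ec2-authorize`) are illustrative assumptions.

```python
# Commands the slide lists as supported by the EC2 interface.
SUPPORTED_EC2_COMMANDS = frozenset({
    "ec2-describe-images",
    "ec2-run-instances",
    "ec2-describe-instances",
    "ec2-terminate-instances",
    "ec2-reboot-instances",
    "ec2-add-keypair",
    "ec2-delete-keypair",
})


def is_supported(command: str) -> bool:
    """Return True if the command name is in the supported subset."""
    return command.strip().lower() in SUPPORTED_EC2_COMMANDS


def partition_commands(commands):
    """Split a planned command sequence into (supported, unsupported) lists."""
    supported = [c for c in commands if is_supported(c)]
    unsupported = [c for c in commands if not is_supported(c)]
    return supported, unsupported


if __name__ == "__main__":
    # Hypothetical plan: the last two commands relate to elastic IPs and
    # security groups, which the slide lists as unsupported features.
    plan = [
        "ec2-describe-images",
        "ec2-run-instances",
        "ec2-allocate-address",
        "ec2-authorize",
    ]
    ok, rejected = partition_commands(plan)
    print("supported:", ok)
    print("unsupported:", rejected)
```

A check like this only compares names; whether a given option of a supported command works against a particular cloud still has to be verified against that installation.
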
Security
- GSI authentication and authorization
  - PKI credential required
  - Works with Grid proxies
  - VOMS, Shibboleth (via GridShib), custom PDPs
- Secure access to VMs
  - EC2 key generation or accessed from .ssh
- Validating images and image data
  - Collaboration with Vienna University of Technology

Networking
- Network configuration
  - External: public IPs or private IPs (via VPN)
  - Internal: private network via a local cluster network
- Each VM can specify multiple NICs mixing private and public networks (WSRF only)
  - E.g., cluster worker nodes on a private network, headnode on both public and private networks

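The cluster example on this slide (workers on a private network, headnode on both) can be pictured as a small data model. This is a sketch under stated assumptions: `Nic`, `VmRequest`, `build_cluster`, and the interface names `eth0`/`eth1` are hypothetical and do not reflect the actual WSRF request schema.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Nic:
    """One virtual network interface (hypothetical model)."""
    name: str      # e.g. "eth0"
    network: str   # "public" or "private"


@dataclass
class VmRequest:
    """A VM together with the NICs it asks for (hypothetical model)."""
    name: str
    nics: List[Nic] = field(default_factory=list)

    def networks(self) -> Set[str]:
        """Set of network kinds this VM is attached to."""
        return {nic.network for nic in self.nics}


def build_cluster(n_workers: int) -> List[VmRequest]:
    """Headnode on public and private networks, workers on private only."""
    head = VmRequest("headnode", [Nic("eth0", "public"), Nic("eth1", "private")])
    workers = [
        VmRequest(f"worker{i}", [Nic("eth0", "private")])
        for i in range(n_workers)
    ]
    return [head] + workers


def publicly_reachable(cluster: List[VmRequest]) -> List[str]:
    """Names of the VMs that have at least one public NIC."""
    return [vm.name for vm in cluster if "public" in vm.networks()]


if __name__ == "__main__":
    cluster = build_cluster(3)
    print(publicly_reachable(cluster))
```

In this layout only the headnode is exposed, and all nodes share the private network for internal traffic.
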
The Back Story
[Figure: VWS Service, pool nodes, Trusted Computing Base (TCB)]
Workspace WSRF front-end that allows clients to deploy and manage virtual workspaces
Workspace back-end:
- Resource manager for a pool of physical nodes
- Deploys and manages Workspaces on the nodes
- Each node must have a VMM (Xen) installed, as well as the workspace control program that manages individual nodes

Workspace Components
[Diagram labels: workspace control, workspace resource manager, workspace pilot, workspace client, workspace service, EC2, WSRF]

Workspace Control
- VM image propagation
- Image management and reconstruction
  - Creating blank partitions, sharing partitions
- VM control
  - Starting, stopping, pausing, etc.
- Integrating a VM into the network
  - Assigning MAC addresses and IP addresses
  - DHCP delivery tool
  - Building up a trusted (non-spoofable) networking layer
- Contextualization information management
- Talks to the workspace service via ssh
- Standalone component
- Some functionality overlap with libvirt
- Implementations in Xen and KVM (queued up for release)

The Workspace Resource Manager
- Basic slot fitting
- Implements immediate leases
- Extensible vehicle to experiment with different leases
- Open source resource manager for multiple different VMMs
- Datacenter technology equivalent
  - Can be replaced by OpenNebula or other datacenter technologies
- Deployment
  - University of Chicago, University of Florida, Purdue, Masaryk University and all the other Science Cloud sites

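To make "basic slot fitting" with "immediate leases" concrete, here is a minimal sketch. The slides do not say which fitting strategy the resource manager uses, so first-fit on free memory is an assumption made purely for illustration, and `Node`, `SlotFitter`, and their methods are hypothetical names. The one property taken from the slide is that an immediate lease is either placed right away or rejected, with no queueing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A physical pool node with some free memory (hypothetical model)."""
    name: str
    free_mem_mb: int


class SlotFitter:
    """First-fit placement of immediate leases (illustrative assumption)."""

    def __init__(self, nodes: List[Node]):
        self.nodes = list(nodes)
        self.leases: Dict[str, Tuple[Node, int]] = {}

    def request(self, lease_id: str, mem_mb: int) -> Optional[str]:
        """Place the lease now on the first node that fits, or reject.

        Returns the node name on success and None on rejection.
        """
        if lease_id in self.leases:
            raise ValueError(f"lease {lease_id!r} already exists")
        for node in self.nodes:
            if node.free_mem_mb >= mem_mb:
                node.free_mem_mb -= mem_mb
                self.leases[lease_id] = (node, mem_mb)
                return node.name
        return None  # immediate lease: no queueing, the request is rejected

    def release(self, lease_id: str) -> None:
        """End a lease and return its memory to the node."""
        node, mem_mb = self.leases.pop(lease_id)
        node.free_mem_mb += mem_mb


if __name__ == "__main__":
    fitter = SlotFitter([Node("n1", 2048), Node("n2", 2048)])
    for lease, mem in [("a", 1536), ("b", 1024), ("c", 1024), ("d", 1024)]:
        print(lease, "->", fitter.request(lease, mem))
```

Other lease types, such as the best-effort leases mentioned for the pilot, would need a queue or a preemption policy on top of this.
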
The Workspace Pilot
- Challenge: how can I provide a virtualization solution without disrupting the current operation of my cluster?
- Flying Low: the Workspace Pilot
  - Integrates with popular LRMs (such as PBS, SGE)
  - Implements best-effort leases
  - Glidein approach: submits a pilot program that claims a resource slot
  - Includes administrator tools
- Deployment
  - U of Victoria (Atlas), Ian Gable and collaborators
  - Being adapted for use by Atlas at CERN, Omer Khalid

Cloud Closure
[Diagram labels: workspace control, workspace resource manager, workspace pilot, workspace client, cloud client, storage service, workspace service, EC2, WSRF]

IaaS Gateway
- Goals
  - Access to different IaaS infrastructures
  - Account management
  - Facilitate movement between academic and commercial clouds and creation of meta-clouds
  - Combine higher-level tools and IaaS
- Released as a service, not as code
- First online in June 2007, currently being rewritten
- Used to move, e.g., HEP STAR experiments between Science Clouds and EC2

The IaaS Gateway
[Diagram labels: IaaS gateway, EC2, potentially other providers, workspace control, workspace resource manager, workspace pilot, workspace client, cloud client, storage service, workspace service, EC2, WSRF]

One-click Virtual Clusters
[Figure labels: IP1 HK1, IP2 HK2, IP3 HK3, MPI]
Reciprocal exchange of information: networking and security
- Parameterizable appliance
- Tightly-coupled clusters

Context Broker
[Figure: Context Broker and nodes labeled with IP1, IP2, IP3 and HK1, HK2, HK3]

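The reciprocal exchange shown on the virtual cluster slides can be sketched as a tiny in-memory broker: each node reports its own identity, and once every expected node has reported, each one can fetch the identities of all the others. This is an illustration only, not the Context Broker protocol. It assumes that "HK" in the figure stands for a host key, and `NodeIdentity`, `ContextBroker`, and `known_hosts_lines` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NodeIdentity:
    """What one node contributes: its IP and (assumed) host key."""
    ip: str
    host_key: str


class ContextBroker:
    """Collects identities and hands the full set back to each node."""

    def __init__(self, expected_nodes: int):
        self.expected_nodes = expected_nodes
        self._registered: Dict[str, NodeIdentity] = {}

    def register(self, node_id: str, ip: str, host_key: str) -> None:
        """A node reports its own IP and host key after boot."""
        self._registered[node_id] = NodeIdentity(ip, host_key)

    def is_complete(self) -> bool:
        """True once every expected node has registered."""
        return len(self._registered) >= self.expected_nodes

    def context_for(self, node_id: str) -> Optional[List[NodeIdentity]]:
        """Return all identities, or None while the cluster is incomplete."""
        if node_id not in self._registered:
            raise KeyError(f"unknown node {node_id!r}")
        if not self.is_complete():
            return None
        return [self._registered[k] for k in sorted(self._registered)]


def known_hosts_lines(context: List[NodeIdentity]) -> List[str]:
    """Render the context as simple 'ip key' lines (format is illustrative)."""
    return [f"{node.ip} {node.host_key}" for node in context]


if __name__ == "__main__":
    broker = ContextBroker(expected_nodes=3)
    for i in (1, 2, 3):
        broker.register(f"n{i}", f"10.0.0.{i}", f"HK{i}")
    print(known_hosts_lines(broker.context_for("n2")))
```

Returning nothing until the set is complete mirrors the idea that a tightly-coupled cluster is only usable once every member knows every other member.
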
Goals for Context Broker
- Can work with every appliance
  - Appliance schema, can be implemented in terms of many configuration systems
- Can work with every cloud provider
  - Simple and minimal conditions on generic context delivery
- Can work across multiple cloud providers, in a distributed environment

Status for Context Broker
- Release history:
  - In alpha testing since August 07
  - First released July 08 (v 1.3.3)
  - Latest update January 09 (v 2.2)
- Used to contextualize hundreds of nodes for EC2 STAR runs
- Contextualized images on workspace marketplace
- Working with rPath to make contextualization easier for the user

End of Nimbus Tour
[Diagram labels: workspace control, workspace resource manager, workspace pilot, workspace service, workspace client, cloud client, IaaS gateway, context broker, context client, EC2, potentially other providers, storage service, EC2, WSRF]

Science Clouds
- Make it easy for scientific projects to experiment with cloud computing
  - Can cloud computing be used for science?
- Evolve software in response to the needs of scientific projects
  - Start with EC2-like functionality and evolve to serve scientific projects: virtual clusters, diverse resource leases
  - Federating clouds: moving between cloud resources in academic and commercial space

Science Cloud Resources
- University of Chicago (Nimbus):
  - First cloud, online since March 4th 2008
  - 16 nodes of UC TeraPort cluster, public IPs
- University of Florida
  - Online since 05/08
  - nodes, access via VPN
- Other Science Clouds
  - Masaryk University, Brno, Czech Republic (08/08), Purdue (09/08)
  - Installations in progress: IU, Grid5K, others
- Using EC2 for overflow
- Minimal governance model

Cloud Use
- ~100 DNs
- Utilization:
  - Overall: 16%
  - Peak per week: 86% (week of 7/14)
- Requests rejected:
  - None until 7/14
  - Lots afterwards ;-)
[Chart note: data scaled to the number of days]

Who Runs on Nimbus?
Project diversity: science, CS, education, build & test…

Hadoop over ManyClouds
- CS research: investigate latency-sensitive apps, e.g., Hadoop
- Need access to distributed resources, and a high level of privilege to run a ViNE router
- Virtual workspace: ViNE router + application VMs
- Paper: CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications, by Andréa Matsunaga, Maurício Tsugawa and José Fortes. eScience
[Figure labels: U of Florida, U of Chicago, ViNE router, ViNE router]

Alice HEP Experiment at CERN
- CHEP paper in preparation

STAR
- STAR: a high-energy physics experiment
- Need resources with the right configuration
  - Complex environments: correct versions of operating systems, libraries, tools, etc. all have to be installed
  - Consistent environments: require validation
- A virtual OSG STAR cluster
  - OSG cluster
    - OSG CE (headnode), gridmapfiles, host certificates, NFS, PBS
  - STAR worker nodes: SL4 + STAR conf
- Requirements
  - One-click virtual cluster deployment
  - Migration: Science Clouds -> EC2

STAR (cont'd)
- From proof-of-concept to production runs
  - ~2 years ago: proof-of-concept
  - Last September: EC2 runs of up to 100 nodes (production scale, non-critical codes)
  - Testing for critical production deployment
- Performance
  - Within 10% of expected performance for applications
- Work by Jerome Lauret, Doug Olson, Leve Hajdu, Lidia Didenko

Scalability Testing
- Motivation
  - Test scalability of various Globus components
  - Test on different platforms
- Workspaces
  - Globus others
- Requirements
  - Very short-term but flexible access to diverse platforms
- Work by various members of the Globus community (Tom Howe and John Bresnahan)
- Resulted in provisioning a private cloud for Globus
- Typically very short-lived communities of one

Montage Workflows
- Evaluating a cloud from a user's perspective
Paper: Exploration of the Applicability of Cloud Computing to Large-Scale Scientific Workflows, C. Hoffa, T. Freeman, G. Mehta, E. Deelman, K. Keahey, SWBES08: Challenging Issues in Workflow Applications

Cloud Computing Ecosystem
[Diagram labels:]
- User Environments
- VMM/datacenter/IaaS
- Appliance Providers: marketplaces, commercial providers, communities
- Deployment Orchestrator: orchestrate the deployment of environments across possibly many cloud providers

Open Source IaaS Implementations
- OpenNebula
  - Open source datacenter implementation
  - University of Madrid, I. Llorente & team, 03/2008
- Eucalyptus
  - Open source implementation of EC2
  - UCSB, R. Wolski & team, 06/2008
- Cloud-enabled Nimrod-G
  - Open source implementation of EC2
  - Monash University, MeSsAGE Lab, 01/2009
- Industry efforts
  - openQRM, Enomalism

Friends and Family
- Committers: Kate Keahey & Tim Freeman (ANL/UC), Ian Gable (UVIC)
- A lot of help from the community, see:
- Collaborations:
  - Cumulus: S3 implementation (Globus team)
  - EBS implementation with IU
  - Appliance management: rPath and Bcfg2 project
  - Virtual network overlays: University of Florida
  - Security: Vienna University of Technology

To the Future and Beyond
- Increasing importance of appliance providers
- Cloud computing tools
- Increased interest in cloud interoperability
  - Standards: rough consensus & working code
  - Image formats, contextualization capabilities, cloud interfaces, etc.
- Cloud markets