Enabling Cost-Effective Resource Leases with Virtual Machines Borja Sotomayor University of Chicago Ian Foster Argonne National Laboratory/ University of Chicago Tim Freeman Argonne National Laboratory/ University of Chicago Kate Keahey Argonne National Laboratory/ University of Chicago HPDC 2007 Hot Topics Session

Motivation Leasing resources for short periods of time can be of great value to many applications: workflows, real-time applications, and applications requiring resource co-scheduling. Leasing semantics: the glidein approach (Condor glideins, MyCluster, and Falkon); advance reservations (meta-scheduling, deadlines, demos), which suffer from utilization problems. We argue that virtualization can make resource leasing cost-effective, despite the overhead of using VMs, thus: providing an incentive for resource providers to allow short-term leasing of resources, and creating an opportunity for scientific applications (resource consumers) that require multi-level scheduling.

Approach Separate resource provisioning from execution management. Resource provisioning is handled by a new component called the Lease Manager. Execution management can continue to be handled by a site's current scheduler (PBS/Maui, SGE, Condor, ...). All provisioning is handled via the use of VMs, including provisioning resources for a batch job. Use VM suspend/resume mechanisms to backfill and to suspend non-interactive/batch applications.
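The separation of concerns described above can be sketched in a few lines of hypothetical Python. All class and method names here are illustrative, not the prototype's actual API: the point is that the Lease Manager only starts, suspends, and resumes the VMs backing each lease, while job execution stays with the site's existing scheduler.

```python
class VirtualMachine:
    def __init__(self, image):
        self.image = image
        self.state = "stopped"

    def start(self):
        self.state = "running"

    def suspend(self):
        # Serialize VM state to disk so the job can continue later.
        self.state = "suspended"

    def resume(self):
        # Restore the saved state and continue from where it left off.
        self.state = "running"


class LeaseManager:
    """Handles resource provisioning only: starting, suspending, and
    resuming the VMs that back each lease."""

    def __init__(self):
        self.leases = {}

    def grant(self, lease_id, image):
        vm = VirtualMachine(image)
        vm.start()
        self.leases[lease_id] = vm
        return vm

    def preempt(self, lease_id):
        # Suspend a batch lease to free nodes for an advance reservation.
        self.leases[lease_id].suspend()

    def restore(self, lease_id):
        self.leases[lease_id].resume()


# Execution management (which job runs inside which VM) would remain with
# the site scheduler, e.g. PBS/Maui, SGE, or Condor.
lm = LeaseManager()
vm = lm.grant("batch-42", image="scientific-linux.img")  # hypothetical image name
lm.preempt("batch-42")   # an AR arrives: suspend instead of killing the job
lm.restore("batch-42")   # the AR finished: resume where the job left off
assert vm.state == "running"
```

The key design choice this sketch illustrates: because preemption is a VM suspend rather than a job kill, the batch workload loses no work when an advance reservation claims its nodes.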

[Architecture diagram: the LRM comprises a Lease Manager (handling short-term leases) and an Execution Manager (handling batch computation), on top of VMM-enabled worker nodes.]

[Diagram: scheduling a short-term lease without virtualization vs. scheduling the lease using virtualization.]

Experiment Setting Simulated testbed of 8 nodes connected by a 100 Mbps network, such that at most two VMs can run simultaneously on one node. We consider the best and worst cases. Traces: artificial traces combining serial batch requests and ARs, which would require 10h to run on the testbed (assuming perfect utilization). VM runtime overhead is assumed to be 10%.
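A quick back-of-the-envelope check of the simulated testbed, using only the parameters stated above (8 nodes, at most two VMs per node, 10% VM runtime overhead):

```python
NODES = 8
VMS_PER_NODE = 2
VM_OVERHEAD = 0.10          # running inside a VM inflates runtime by 10%

# Concurrent VMs the testbed can host.
vm_slots = NODES * VMS_PER_NODE
assert vm_slots == 16

def vm_runtime(runtime_minutes):
    """Runtime of a job inside a VM, given the assumed 10% overhead."""
    return runtime_minutes * (1 + VM_OVERHEAD)

# A 15-minute batch request (the best-case average) takes about
# 16.5 minutes when run inside a VM.
assert abs(vm_runtime(15) - 16.5) < 1e-9
```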

Experiment I Is using VMs for suspend/resume backfill worth the overhead? Assumption: we are using only one VM image. Prototype scheduler supporting batch serial requests and advance reservations, using backfilling or suspend/resume to plan around the ARs. A Resource Management Model for VM-Based Virtual Workspaces, B. Sotomayor, Master's paper, University of Chicago, February 2007.
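The two alternatives compared in this experiment can be illustrated with a small model (hypothetical code, not the prototype's scheduler): when does a batch job finish under plain backfilling vs. suspend/resume, around a single advance reservation (AR) that blocks the node from ar_start to ar_end?

```python
def finish_backfill(now, duration, ar_start, ar_end):
    # Backfilling only starts the job if it fits entirely before the AR;
    # otherwise the job must wait until the AR ends.
    if now + duration <= ar_start:
        return now + duration
    return ar_end + duration

def finish_suspend_resume(now, duration, ar_start, ar_end, suspend_cost=1):
    # Suspend/resume starts the job immediately, suspends its VM for the
    # duration of the AR, and resumes it afterwards (suspend_cost is an
    # assumed fixed charge for saving and restoring the VM state).
    if now + duration <= ar_start:
        return now + duration
    done_before_ar = ar_start - now
    remaining = duration - done_before_ar
    return ar_end + remaining + suspend_cost

# A 15-minute job arriving 5 minutes before a 30-minute AR (times in minutes):
now, dur, ar_start, ar_end = 0, 15, 5, 35
assert finish_backfill(now, dur, ar_start, ar_end) == 50
assert finish_suspend_resume(now, dur, ar_start, ar_end) == 46
```

Under these assumed numbers, suspend/resume finishes the long job earlier because it salvages the 5 minutes of work done before the AR, which is exactly why the long-request (best-case) trace benefits from it.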

Best-case trace Trace characteristics Duration of batch requests: Avg=15 min. AR resource consumption: 75% - 100% Proportion of Batch/AR: 75%/25% Benefits from suspend/resume because the large number of relatively long batch requests limits the efficiency of backfilling.

One Image (best case) Baseline Not using VMs (no runtime overhead) and backfilling instead of suspend/resume

One Image (best case) Add Runtime Overhead Running inside a VM adds runtime overhead, but not a big hit since images are predeployed.

One Image (best case) Use Suspend/Resume Allows for better resource utilization than backfilling, even better than baseline (because of long batch requests)

Worst-case trace Same as previous trace, but with shorter batch requests (avg=5 minutes) This also entails that there are more batch requests, since the total running time of the trace is still 10h With a large number of relatively short requests, backfilling is already very effective, and little is gained from suspend/resume. Furthermore, many more images have to be deployed in this case, which increases the preparation overhead.
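A simple sanity check of the claim above, with an illustrative workload size (only the ratio matters): holding the total batch workload fixed while shortening the average request from 15 to 5 minutes triples the number of requests, and therefore, with no image reuse, the number of image deployments.

```python
def num_requests(total_batch_minutes, avg_duration_minutes):
    # Approximate request count for a trace of fixed total batch workload.
    return total_batch_minutes / avg_duration_minutes

TOTAL = 450   # illustrative total batch minutes; any value gives the same ratio
best_case_requests = num_requests(TOTAL, 15)
worst_case_requests = num_requests(TOTAL, 5)
assert worst_case_requests == 3 * best_case_requests
```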

One Image (worst case) Baseline Not using VMs (no runtime overhead) and backfilling instead of suspend/resume

One Image (worst case) Add Runtime Overhead Running inside a VM adds runtime overhead, but not a big hit since images are predeployed.

One Image (worst case) Use Suspend/Resume Doesn't provide any significant advantage over backfilling because of short batch requests.

Experiment II How much do we pay for the added flexibility of operating in multiple virtualized environments? Assumption: we are using multiple images. The scheduler also has application-specific knowledge (i.e., it knows it is scheduling VMs), so it is able to also schedule timely VM image transfers. Image reuse strategies: realistically, not all images will be different. Modification of Experiment I: use 37 possible 600MB VM images; 7 images account for 70% of requests.
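A rough model of the image-reuse benefit, using only the parameters stated above (600 MB images, the 100 Mbps network from the experiment setting, and 7 "popular" images covering 70% of requests). The steady-state simplification, that a cached popular image needs no transfer at all, is our assumption, not a result from the experiment:

```python
IMAGE_MB = 600
LINK_MBPS = 100
POPULAR_FRACTION = 0.7      # 7 of 37 images account for 70% of requests

# Transferring one 600 MB image over a 100 Mbps link takes 48 seconds.
transfer_seconds = IMAGE_MB * 8 / LINK_MBPS
assert transfer_seconds == 48.0

# Expected per-request deployment overhead once the 7 popular images are
# cached on the nodes (ignoring their one-time initial transfers): only
# the remaining 30% of requests still pay the full transfer cost.
expected_overhead = (1 - POPULAR_FRACTION) * transfer_seconds
assert abs(expected_overhead - 14.4) < 1e-9
```

This is why image reuse largely compensates for the deployment overhead in the results that follow: most requests hit an image that is already on the node.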

Multiple Images (best case) Baseline Not using VMs (no runtime overhead) and backfilling instead of suspend/resume

Multiple Images (best case) Transferring images Adds deployment overhead, which delays the start times of batch requests.

Multiple Images (best case) Adding Runtime Overhead Makes total running time even longer.

Multiple Images (best case) Use Suspend/Resume Better resource utilization compensates for deployment overhead.

Multiple Images (best case) Image Reuse Improves performance slightly.

Multiple Images (worst case) Baseline Not using VMs (no runtime overhead) and backfilling instead of suspend/resume

Multiple Images (worst case) Transferring images Adds deployment overhead, which delays the start times of batch requests.

Multiple Images (worst case) Adding Runtime Overhead Relatively small performance hit (the least of our concerns here)

Multiple Images (worst case) Use Suspend/Resume Doesn't improve significantly over backfilling, which already does a good job thanks to the short batch requests.

Multiple Images (worst case) Image Reuse Compensates for the deployment overhead. Still not as good as the baseline, but the difference is relatively small.

Conclusions Using virtualization can make short-term leasing with interesting semantics cost-effective, even in the presence of runtime overhead. Given reasonable strategies for managing deployment overhead, the cost of using multiple images is acceptable. However, only artificial stress traces have been used so far. Preliminary results with real traces suggest that short-term leases can be integrated into real workloads and still be cost-effective (we will release these results as soon as they are solid).

Ongoing Work Develop a better scheduler: handle parallel batch submissions, and integrate this virtualized resource manager with an existing LRM. This work is our top-down effort; we also have a bottom-up effort: better modeling of traces, based on real-world batch submissions; non-uniform overhead; understanding VM overhead in practice. Virtualization in Practice:

Questions? Borja Sotomayor University of Chicago Ian Foster Argonne National Laboratory/ University of Chicago Tim Freeman Argonne National Laboratory/ University of Chicago Kate Keahey Argonne National Laboratory/ University of Chicago