Understanding and Exploiting Amazon EC2 Spot Instances

Slides:



Advertisements
Similar presentations
Collaborative QoS Prediction in Cloud Computing Department of Computer Science & Engineering The Chinese University of Hong Kong Hong Kong, China Rocky.
Advertisements

Exploiting Spatial Locality in Data Caches using Spatial Footprints Sanjeev Kumar, Princeton University Christopher Wilkerson, MRL, Intel.
Distributed Approximate Spectral Clustering for Large- Scale Datasets FEI GAO, WAEL ABD-ALMAGEED, MOHAMED HEFEEDA PRESENTED BY : BITA KAZEMI ZAHRANI 1.
Yue Han and Lei Yu Binghamton University.
Efficient Autoscaling in the Cloud using Predictive Models for Workload Forecasting Roy, N., A. Dubey, and A. Gokhale 4th IEEE International Conference.
E E Module 17 W. D. Grover TRLabs & University of Alberta © Wayne D. Grover 2002, 2003 ATM VP-based (or MPLS path) Restoration with Controlled Over-
Multi-Variate Analysis of Mobility Models for Network Protocol Performance Evaluation Carey Williamson Nayden Markatchev
Wei Zhu, Xiang Tian, Fan Zhou and Yaowu Chen IEEE TCE, 2010.
Incremental Learning of Temporally-Coherent Gaussian Mixture Models Ognjen Arandjelović, Roberto Cipolla Engineering Department, University of Cambridge.
1 Emulating AQM from End Hosts Presenters: Syed Zaidi Ivor Rodrigues.
Statistical Critical Path Selection for Timing Validation Kai Yang, Kwang-Ting Cheng, and Li-C Wang Department of Electrical and Computer Engineering University.
1 Efficient Management of Data Center Resources for Massively Multiplayer Online Games V. Nae, A. Iosup, S. Podlipnig, R. Prodan, D. Epema, T. Fahringer,
Multimedia and Mobile communications Laboratory Augmenting Mobile 3G Using WiFi Aruna Balasubramanian, Ratul Mahajan, Arun Venkataramani Jimin.
Dynamic and Decentralized Approaches for Optimal Allocation of Multiple Resources in Virtualized Data Centers Wei Chen, Samuel Hargrove, Heh Miao, Liang.
Storage Allocation in Prefetching Techniques of Web Caches D. Zeng, F. Wang, S. Ram Appeared in proceedings of ACM conference in Electronic commerce (EC’03)
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Environment.
HOW TO BID THE CLOUD Liang Zheng, Carlee Joe-Wong, Chee Wei Tan, Mung Chiang, and Xinyu Wang SIGCOMM 2015 | London, UK | August
Web Cache Replacement Policies: Properties, Limitations and Implications Fabrício Benevenuto, Fernando Duarte, Virgílio Almeida, Jussara Almeida Computer.
A Broker for Cost-efficient QoS aware Resource Allocation in EC2. Kurt Vermeersch Coordinator: Kurt Vanmechelen.
1 Time & Cost Sensitive Data-Intensive Computing on Hybrid Clouds Tekin Bicer David ChiuGagan Agrawal Department of Compute Science and Engineering The.
20 October 2006Workflow Optimization in Distributed Environments Dynamic Workflow Management Using Performance Data David W. Walker, Yan Huang, Omer F.
1 Challenge the future KOALA-C: A Task Allocator for Integrated Multicluster and Multicloud Environments Presenter: Lipu Fei Authors: Lipu Fei, Bogdan.
Autonomic scheduling of tasks from data parallel patterns to CPU/GPU core mixes Published in: High Performance Computing and Simulation (HPCS), 2013 International.
MC 2 : Map Concurrency Characterization for MapReduce on the Cloud Mohammad Hammoud and Majd Sakr 1.
Job scheduling algorithm based on Berger model in cloud environment Advances in Engineering Software (2011) Baomin Xu,Chunyan Zhao,Enzhao Hua,Bin Hu 2013/1/251.
Adaptive Multi-Threading for Dynamic Workloads in Embedded Multiprocessors 林鼎原 Department of Electrical Engineering National Cheng Kung University Tainan,
I can be You: Questioning the use of Keystroke Dynamics as Biometrics —Paper by Tey Chee Meng, Payas Gupta, Debin Gao Presented by: Kai Li Department of.
L. Bertazzi, B. Golden, and X. Wang Route 2014 Denmark June
Exploiting Network Structure for Proactive Spam Mitigation Shobha Venkataraman * Joint work with Subhabrata Sen §, Oliver Spatscheck §, Patrick Haffner.
SpotADAPT: Spot-Aware (re-)Deployment of Analytical Processing Tasks on Amazon EC2 by Dalia Kaulakiene, Aalborg University (Denmark) Christian Thomsen,
1 Self Similar Video Traffic Carey Williamson Department of Computer Science University of Calgary.
Kingfisher: A System for Elastic Cost-aware Provisioning in the Cloud
Network Computing Laboratory Load Balancing and Stability Issues in Algorithms for Service Composition Bhaskaran Raman & Randy H.Katz U.C Berkeley INFOCOM.
Facets: Fast Comprehensive Mining of Coevolving High-order Time Series Hanghang TongPing JiYongjie CaiWei FanQing He Joint Work by Presenter:Wei Fan.
Jiahao Chen, Yuhui Deng, Zhan Huang 1 ICA3PP2015: The 15th International Conference on Algorithms and Architectures for Parallel Processing. zhangjiajie,
Online Parameter Optimization for Elastic Data Stream Processing Thomas Heinze, Lars Roediger, Yuanzhen Ji, Zbigniew Jerzak (SAP SE) Andreas Meister (University.
A Hierarchical Edge Cloud Architecture for Mobile Computing IEEE INFOCOM 2016 Liang Tong, Yong Li and Wei Gao University of Tennessee – Knoxville 1.
Resource Specification Prediction Model Richard Huang joint work with Henri Casanova and Andrew Chien.
Managing The GLOBAL Business
Bank Contingent Capital: Valuation and the Role of market discipline
GENESYS Redevelopment Strawman Proposal
An Empirical Examination of Transaction- and Firm-Level Influences on the Vertical Boundaries of the Firm Leiblein, Michael.
Introduction to Wireless Sensor Networks
Amazon Instance Purchasing Options
Understanding Latency Variation in Modern DRAM Chips Experimental Characterization, Analysis, and Optimization Kevin Chang Abhijith Kashyap, Hasan Hassan,
Written by : Thomas Ristenpart, Eran Tromer, Hovav Shacham,
Scheduling Jobs Across Geo-distributed Datacenters
AWS Batch Overview A highly-efficient, dynamically-scaled, batch computing service May 2017.
Server Allocation for Multiplayer Cloud Gaming
CPSC 531: System Modeling and Simulation
EECS 582 Final Review Mosharaf Chowdhury EECS 582 – F16.
CPSC 531: System Modeling and Simulation
Statistical Methods Carey Williamson Department of Computer Science
A New Multipath Routing Protocol for Ad Hoc Wireless Networks
Bot Contests - Learnings from Trading Agent Contest for SCM
CARP: Compression-Aware Replacement Policies
Pei Fan*, Ji Wang, Zibin Zheng, Michael R. Lyu
Congestion Control in SDN-Enabled Networks
ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC
The incidence of Mandated Maternity Benefits
Developing Advanced Applications with Windows Azure
Statistical Data Analysis
Carey Williamson Department of Computer Science University of Calgary
A Dynamic System Analysis of Simultaneous Recurrent Neural Network
Kyoungwoo Lee, Minyoung Kim, Nikil Dutt, and Nalini Venkatasubramanian
Realizing Closed-loop, Online Tuning and Control for Configurable-Cache Embedded Systems: Progress and Challenges Islam S. Badreldin*, Ann Gordon-Ross*,
Energy-Delay Tradeoffs in Smartphone Applications
Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks
Congestion Control in SDN-Enabled Networks
Approximate Mean Value Analysis of a Database Grid Application
Presentation transcript:

Understanding and Exploiting Amazon EC2 Spot Instances Carey Williamson Department of Computer Science University of Calgary

Introduction and Motivation Within Amazon’s Elastic Compute Cloud (EC2) offerings, clients have a choice between services: On-demand instances: reserve for given price and duration Spot instances: cheaper price; unpredictable duration Challenges: Spot instance prices fluctuate a lot! Duration of instance availability unknown Bid failures and revocations adversely impact QoS How to “optimize” cost and performance in EC2 markets?

Example: On-Demand Prices in Amazon EC2

An Example of Spot Instance Dynamics

More Examples of Spot Instance Dynamics

Spot Instance Terminology

Two main contributions: This Paper Research Questions: After bidding, how long does it take to get an instance? What is the expected lifetime of an instance? What is the average cost incurred for an instance? What is the probability of instance revocations? How can spot instance features be exploited to devise efficient and cost-effective procurement algorithms? Two main contributions: Empirical study of Amazon EC2 instances (2015 and 2017) Two case studies to evaluate effectiveness of models C. Wang, Q. Liang, and B. Urgaonkar, “An Empirical Analysis of Amazon EC2 Spot Instance Features Affecting Cost-Effective Resource Procurement”, ACM ToMPECS, Vol. 3, No. 2, Article 6, pp. 1-24, March 2018. (Extended journal version of earlier ICPE 2017 paper)

Empirical Analysis (preview) Feature 1: Lifetime of spot instances Exhibits sufficient temporal dependence for prediction Feature 2: Average cost during lifetime Feature 3: Simultaneous revocations Some spatial dependence; weaker temporal dependence Feature 4: Time to start a spot instance No obvious structural properties that can be exploited

Figure 3 Simple CDF-based approaches from the literature are not sufficient for this problem , since they do not distinguish the following three (synthetic) scenarios Need to explicitly consider “lifetimes” of instances

Figure 4 Illustration of “lifetime” for a successful bid Illustration of “average cost” for a successful bid

Figure 5 Some markets may be strongly correlated (right), while others may only be weakly correlated (left) Correlated markets might have correlated bid failures (i.e., multiple spot instances revoked at similar time) Results are conditional upon bid values though!

Empirical correlation results Observations: Figure 6 Empirical correlation results Observations: Stronger spatial relationship within regions than across Detected by both metrics Proposed approach detects some spatial corr across regions Weak spatial correlation observed for instance types

Figure 7 Some temporal correlation in start up times No obvious correlation structure based on bids

Figure 8 Empirical measurements of “effective capacity” Relatively small variation for any of the metrics

Empirical Analysis (review) Feature 1: Lifetime of spot instances Exhibits sufficient temporal dependence for prediction Using 7-14 days of recent past history gives best results Feature 2: Average cost during lifetime Feature 3: Simultaneous revocations Some spatial dependence; weaker temporal dependence Suggests splitting jobs across regions and instance types Feature 4: Time to start a spot instance No obvious structural properties that can be exploited

Case Studies (preview) Application 1: Latency-sensitive data processing In-memory key-value store (memcached) Application 2: Delay-tolerant batch computing Primary spot instances with backup on-demand instances For each application, two types of evaluation: Trace-driven simulation (90 days of spot instance pricing) System prototyping (24-hour EC2 expt, real-time control)

Figure 9 Wikipedia workload characteristics Clear diurnal patterns Time-varying working set size

Figure 10 Trace-driven simulation evaluation of approaches Data loss performance for Application 1

Figure 11 Experimental evaluation/comparison of approaches Application 1: memcached

Figure 12 Trace-driven simulation evaluation of approaches Application 2: batch computing

Figure 13 Experimental evaluation/comparison of approaches Application 2: batch computing

Application 1: Latency-sensitive data processing Case Studies (review) Application 1: Latency-sensitive data processing In-memory key-value store (memcached) Comparable cost but much better performance Application 2: Delay-tolerant batch computing Primary spot instances with backup on-demand instances Comparable performance with 18% lower costs For each application, two types of evaluation: Trace-driven simulation (90 days of spot instance pricing) System prototyping (24-hour EC2 expt, real-time control)

Summary and Conclusions There are four key features of Amazon EC2 instances that need to be considered (i.e., spot instance lifetime; average price during lifetime; simultaneous revocations; time to start) Temporal and spatial dependence in some features can be exploited to improve predictions Need to consider impacts of (simultaneous) revocations Need to condition results based on bid price Two real-world case studies demonstrate the effectiveness of the proposed approach in practice Can save 20-70% on costs versus on-demand instances Can provide flexible tradeoffs between cost and performance