The Only Constant is Change: Incorporating Time-Varying Bandwidth Reservations in Data Centers Di Xie, Ning Ding, Y. Charlie Hu, Ramana Kompella 1.

Presentation transcript:

The Only Constant is Change: Incorporating Time-Varying Bandwidth Reservations in Data Centers Di Xie, Ning Ding, Y. Charlie Hu, Ramana Kompella 1

Cloud Computing is Hot 2 Private Cluster

Key Factors for Cloud Viability Cost Performance 3

Performance Variability in Cloud BW variation in cloud due to contention [Schad10 VLDB] Causing unpredictable performance 4

Reserving BW in Data Centers SecondNet [Guo10] – Per VM-pair, per VM access bandwidth reservation Oktopus [Ballani11] – Virtual Cluster (VC) – Virtual Oversubscribed Cluster (VOC) 5

How BW Reservation Works 1. Determine the model 2. Allocate and enforce the model Only fixed-BW reservation [Figure: Virtual Cluster model with N VMs on a virtual switch; the request reserves bandwidth B from time 0 to T] 6

Network Usage for MapReduce Jobs Hadoop Sort, 4GB per VM Hadoop Word Count, 2GB per VM Hive Join, 6GB per VM Hive Aggregation, 2GB per VM 7 Time-varying network usage

Motivating Example 4 machines, 2 VMs/machine, non-oversubscribed network Hadoop Sort – N: 4 VMs – B: 500Mbps/VM Not enough BW [Figure labels: 1Gbps, 500Mbps] 8

Under Fixed-BW Reservation Model [Figure: Virtual Cluster model, bandwidth vs. time; Job1, Job2, Job3; levels at 1Gbps and 500Mbps] 10

Under Time-Varying Reservation Model Doubles VM utilization, network utilization, and job throughput for Hadoop Sort [Figure: TIVC model, bandwidth vs. time; Job1 to Job5 (J1 to J5) interleaved; levels at 1Gbps and 500Mbps] 11

Temporally-Interleaved Virtual Cluster (TIVC) Key idea: Time-Varying BW Reservations Compared to fixed-BW reservation – Improves utilization of data center Better network utilization Better VM utilization – Increases cloud providers' revenue – Reduces cloud users' cost – Without sacrificing job performance 12

Challenges in Realizing TIVC Q1: What are the right model functions? Q2: How to automatically derive the models? [Figure: Virtual Cluster model with N VMs on a virtual switch; request plots of bandwidth B over time 0 to T]

Challenges in Realizing TIVC Q3: How to efficiently allocate TIVC? Q4: How to enforce TIVC? 14

Challenges in Realizing TIVC What are the right model functions? How to automatically derive the models? How to efficiently allocate TIVC? How to enforce TIVC? 15

How to Model Time-Varying BW? 17 Hadoop Hive Join

TIVC Models 18 Virtual Cluster T 11 T 32

Hadoop Sort 19

Hadoop Word Count 20 v

Hadoop Hive Join 21

Hadoop Hive Aggregation 22

Challenges in Realizing TIVC What are the right model functions? How to automatically derive the models? How to efficiently allocate TIVC? How to enforce TIVC? 23

Possible Approach White-box approach – Given source code and data of cloud application, analyze quantitative networking requirement – Very difficult in practice Observation: Many jobs are repeated many times – E.g., 40% of jobs are recurring in Bing's production data center [Agarwal12] – Of course, the data itself may change across runs, but its size remains about the same 24

Our Approach Solution: Black-box profiling based approach 1. Collect traffic trace from profiling run 2. Derive TIVC model from traffic trace Profiling: Same configuration as production runs – Same number of VMs – Same input data size per VM – Same job/VM configuration How much BW should we give to the application? 25

Impact of BW Capping 26 No-elongation BW threshold

Choosing BW Cap Tradeoff between performance and cost – Cap > threshold: same performance, costs more – Cap < threshold: lower performance, may cost less Our Approach: Expose tradeoff to user 1. Profile under different BW caps (only below threshold ones) 2. Expose run times and cost to user 3. User picks the appropriate BW cap 27

From Profiling to Model Generation Collect traffic trace from each VM – Instantaneous throughput in 10ms bins Generate models for individual VMs Combine to obtain the overall job's TIVC model – Simplify allocation by working with one model – Does not lose efficiency since per-VM models are roughly similar for MapReduce-like applications 28

Generate Model for Individual VM 1. Choose B_b 2. Periods where B > B_b, set to B_cap [Figure: BW vs. time, with levels B_cap and B_b] 29

Maximal Efficiency Model Enumerate B_b to find the maximal efficiency model [Figure: BW vs. time, with levels B_cap and B_b] 30
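The two steps above (choose B_b, raise periods above it to B_cap, then enumerate B_b) can be sketched as follows. This is only an illustration, not the authors' implementation: the slides do not define efficiency, so the sketch assumes it means traffic carried within the reservation divided by total reserved bandwidth, and it assumes periods at or below B_b are reserved at B_b. The function names and the sample trace are hypothetical.

```python
from typing import List, Sequence, Tuple


def pulse_model(trace: Sequence[float], b_base: float, b_cap: float) -> List[float]:
    """Reserve b_cap in bins whose measured throughput exceeds b_base, else b_base.

    Assumption: bins at or below b_base are reserved at b_base.
    """
    return [b_cap if x > b_base else b_base for x in trace]


def efficiency(trace: Sequence[float], model: Sequence[float]) -> float:
    """Assumed definition: traffic that fits in the reservation / total reserved."""
    reserved = sum(model)
    if reserved == 0:
        return 0.0
    used = sum(min(x, r) for x, r in zip(trace, model))
    return used / reserved


def best_model(
    trace: Sequence[float], b_cap: float, candidates: Sequence[float]
) -> Tuple[float, List[float]]:
    """Enumerate candidate base bandwidths and keep the most efficient model."""
    best = max(candidates, key=lambda b: efficiency(trace, pulse_model(trace, b, b_cap)))
    return best, pulse_model(trace, best, b_cap)


if __name__ == "__main__":
    # Hypothetical per-bin throughput (Mbps) with one burst in the middle.
    trace = [10, 10, 400, 500, 10, 10]
    base, model = best_model(trace, b_cap=500, candidates=[0, 10, 50, 100, 500])
    print(base, model, round(efficiency(trace, model), 3))
```

A real trace would use the 10ms bins from profiling, and a base bandwidth below actual demand may elongate the job, which this sketch does not check.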

Challenges in Realizing TIVC What are the right model functions? How to automatically derive the models? How to efficiently allocate TIVC? How to enforce TIVC? 31

TIVC Allocation Algorithm Spatio-temporal allocation algorithm – Extends VC allocation algorithm to time dimension – Employs dynamic programming Properties – Locality aware – Efficient and scalable: 99th percentile of 28ms when scheduling 5,000 jobs on a 64,000-VM data center 32
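To illustrate what extending allocation to the time dimension implies, here is a minimal sketch of a per-link feasibility check only. It is not the dynamic-programming algorithm from the talk, which the slides do not spell out. It assumes time is divided into equal slots and that a link tracks the bandwidth already reserved in each slot; all names and numbers are hypothetical.

```python
from typing import List, Sequence


def fits(link_reserved: Sequence[float], capacity: float,
         demand: Sequence[float], start: int) -> bool:
    """True if the time-varying demand fits on the link when it begins at slot `start`."""
    if start < 0 or start + len(demand) > len(link_reserved):
        return False
    return all(link_reserved[start + i] + d <= capacity for i, d in enumerate(demand))


def reserve(link_reserved: List[float], demand: Sequence[float], start: int) -> None:
    """Record the reservation on the link, slot by slot."""
    for i, d in enumerate(demand):
        link_reserved[start + i] += d


if __name__ == "__main__":
    capacity = 1000.0              # hypothetical 1Gbps link, in Mbps
    link = [0.0] * 6               # six time slots, nothing reserved yet
    pulse = [100.0, 900.0, 100.0]  # hypothetical time-varying reservation
    reserve(link, pulse, start=0)

    # A second pulse-shaped job fits one slot later because the peaks interleave.
    print(fits(link, capacity, pulse, start=1))
    # The same job reserved at its fixed peak does not fit in that window.
    print(fits(link, capacity, [900.0] * 3, start=1))
```

The contrast between the last two calls mirrors the motivating example: a fixed peak reservation is rejected where an interleaved time-varying one is admitted.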

Challenges in Realizing TIVC What are the right model functions? How to automatically derive the models? How to efficiently allocate TIVC? How to enforce TIVC? 33

Enforcing TIVC Reservation Possible to enforce completely in hypervisor – Does not have control over upper level links – Requires online rate monitoring and feedback – Increases hypervisor overhead and complexity Observation: Few jobs share a link simultaneously – Most small jobs will fit into a rack – Only a few large jobs cross the core – In our simulations, < 26 jobs share a link in 64,000-VM data center 34

Enforcing TIVC Reservation Enforcing BW reservation in switches – Avoid complexity in hypervisors – Can be implemented on commodity switches Cisco Nexus 7000 supports 16k policers 35

Challenges in Realizing TIVC What are the right model functions? How to automatically derive the models? How to efficiently allocate TIVC? How to enforce TIVC? 36

Proteus: Implementing TIVC Models 1. Determine the model 2. Allocate and enforce the model

Evaluation Large-scale simulation – Performance – Cost – Allocation algorithm Prototype implementation – Small-scale testbed 38

Simulation Setup 3-level tree topology – 16,000 Hosts x 4 VMs – 4:1 oversubscription Workload – N: exponential distribution around mean 49 – B(t): derived from real Hadoop apps [Figure: tree topology; labels: 50Gbps, 10Gbps, 1Gbps, 20 Aggr Switch, 20 ToR Switch, 40 Hosts] 39

Batched Jobs Scenario: 5,000 time-insensitive jobs Completion time reduction: 42%, 21%, 23%, 35% Mixed workload: 1/3 of each type All remaining results are for the mixed workload 40

Varying Oversubscription and Job Size % reduction for non-oversubscribed network

Dynamically Arriving Jobs Scenario: Accommodate users' requests in shared data center – 5,000 jobs, Poisson arrival, varying load Rejected: VC: 9.5% TIVC: 3.4% 42

Analysis: Higher Concurrency Under 80% load 43 7% higher job concurrency 28% higher VM utilization Rejected jobs are large 28% higher revenue Charge VMs VM

Tenant Cost and Provider Revenue Charging model – VM time T and reserved BW volume B – Cost = N (k_v T + k_b B) – k_v = $0.004/hr, k_b = $/GB 12% less cost for tenants Providers make more money Amazon target utilization 44
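The charging model above is simple enough to write down directly. In this sketch the function name is hypothetical, T is taken to be hours per VM and B gigabytes of reserved bandwidth volume per VM (units inferred from the $/hr and $/GB prices), and the value passed for k_b in the example is an arbitrary placeholder because the slide transcript does not preserve it.

```python
def tenant_cost(n_vms: int, vm_hours: float, reserved_gb: float,
                k_v: float, k_b: float) -> float:
    """Cost = N * (k_v * T + k_b * B), following the charging model on the slide.

    n_vms       -- N, number of VMs in the job
    vm_hours    -- T, VM time (assumed hours per VM)
    reserved_gb -- B, reserved bandwidth volume (assumed GB per VM)
    k_v, k_b    -- unit prices in $/hr and $/GB
    """
    return n_vms * (k_v * vm_hours + k_b * reserved_gb)


if __name__ == "__main__":
    # k_v comes from the slide; k_b = 0.01 is a made-up placeholder value.
    print(tenant_cost(n_vms=4, vm_hours=2.0, reserved_gb=10.0, k_v=0.004, k_b=0.01))
```

Under this model a time-varying reservation lowers B for the same job, which is one way the tenant's bill can shrink without changing T.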

Testbed Experiment Setup – 18 machines – Tc and NetFPGA rate limiter Real MapReduce jobs Procedure – Offline profiling – Online reservation 45

Testbed Result TIVC finishes job faster than VC, Baseline finishes the fastest Baseline suffers elongation, TIVC achieves similar performance as VC 46

Conclusion Network reservations in cloud are important – Previous work proposed fixed-BW reservations – However, cloud apps exhibit time-varying BW usage We propose TIVC abstraction – Provides time-varying network reservations – Uses simple pulse functions – Automatically generates model – Efficiently allocates and enforces reservations Proteus shows TIVC benefits both cloud provider and users significantly 47

Backup slides 48

Adding Cushions to Model 49 Without cushion With 60s cushion

Network Utilization VC reserves 26.4% abs. more bandwidth But less actual utilization (8.9% vs. 20.1%) 50

BW Variability on Cloud 51 [Ballani11]

Model Refinement Can we further reduce BW for low-efficiency pulses without elongation? – This potentially allows us to fit more jobs Hadoop Hive Join 52

Model Refinement (cont.) If efficiency of a pulse < γ, lower the cap so that efficiency = α – γ = 8%, α = 20% 53
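The refinement rule can be sketched as a small function. This is a hedged illustration: it assumes a pulse's efficiency is the traffic volume it carries divided by cap times duration, which the slides do not state, and it does not check whether the lowered cap still avoids elongation. The function name and the sample numbers are hypothetical; the defaults for γ and α are the values on the slide.

```python
def refine_cap(volume: float, duration: float, cap: float,
               gamma: float = 0.08, alpha: float = 0.20) -> float:
    """Return the (possibly lowered) bandwidth cap for one pulse.

    Assumed efficiency: volume / (cap * duration).
    If it falls below gamma, pick the cap that makes efficiency equal alpha.
    """
    eff = volume / (cap * duration)
    if eff < gamma:
        return volume / (alpha * duration)
    return cap


if __name__ == "__main__":
    # Hypothetical pulse: 50 units of traffic over 10 time units at cap 100.
    new_cap = refine_cap(volume=50.0, duration=10.0, cap=100.0)
    print(new_cap, 50.0 / (new_cap * 10.0))
```

A pulse that already meets the γ threshold keeps its original cap, so only sparsely used pulses are narrowed.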