VirtualKnotter: Online Virtual Machine Shuffling for Congestion Resolving in Virtualized Datacenter
Xitao Wen, Kai Chen, Yan Chen, Yongqiang Liu, Yong Xia, Chengchen Hu

Datacenter as Infrastructure

Congestion in Datacenter
[Figure: oversubscription ratios of 10:1 to 100:1 and 2:1 to 10:1 in the datacenter network]
– Packet loss
– Queuing delay
– Degraded throughput

Outline: Congestion in the Wild, General Approaches, Problem Formulation, Main Design, Evaluation
Current section: Congestion in the Wild

Spatial Pattern
Unbalanced utilization
– Hotspots: hot links account for fewer than 10% of core links [IMC10]
– Spatially unbalanced utilization
[Figure: traffic by sender and receiver]
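As a rough illustration of the hotspot statistic, the sketch below computes the fraction of core links whose utilization exceeds a threshold. The function name `hot_link_fraction`, the 0.7 threshold, and the sample utilizations are hypothetical; the slides do not say how a "hot" link is defined.

```python
def hot_link_fraction(utilization, threshold=0.7):
    """Return the fraction of links whose utilization exceeds `threshold`.

    utilization: dict mapping a core-link ID to its utilization in [0, 1].
    The default threshold of 0.7 is an assumption, not taken from the slides.
    """
    if not utilization:
        return 0.0
    hot = sum(1 for u in utilization.values() if u > threshold)
    return hot / len(utilization)


if __name__ == "__main__":
    # Made-up utilizations for 10 core links; only one of them is hot.
    samples = [0.2, 0.3, 0.95, 0.1, 0.25, 0.4, 0.15, 0.3, 0.2, 0.35]
    core = {f"core{i}": u for i, u in enumerate(samples)}
    print(hot_link_fraction(core))  # 0.1
```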

Temporal Pattern
Long congestion events
– Last for tens of minutes
– Each individual event has a clear spatial pattern
[Figure: congestion over time, by core link index]

Traffic Stability
Bursty at a fine granularity
– Not predictable at timescales of tens or hundreds of milliseconds [IMC10][SIGCOMM09]
Predictable at a timescale of tens of minutes
– 40% to 70% of pairwise traffic can be expected to be stable
– More than 90% of the traffic aggregated at core links is predictable
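To make "stable pairwise traffic" concrete, here is a minimal sketch that compares two consecutive measurement intervals and counts the VM pairs whose traffic changed by at most a relative tolerance. The 20% tolerance, the function name `stable_pair_fraction`, and the sample data are assumptions for illustration; the slides do not give the exact stability criterion.

```python
def stable_pair_fraction(tm_prev, tm_curr, tolerance=0.2):
    """Fraction of VM pairs whose traffic changed by at most `tolerance`
    (relative to the previous interval) between two intervals.

    tm_prev, tm_curr: dict mapping (src_vm, dst_vm) -> bytes in the interval.
    Pairs with no traffic in the previous interval are skipped.
    The 20% default tolerance is an assumed definition of "stable".
    """
    pairs = [p for p, v in tm_prev.items() if v > 0]
    if not pairs:
        return 0.0
    stable = 0
    for p in pairs:
        prev = tm_prev[p]
        curr = tm_curr.get(p, 0)
        if abs(curr - prev) <= tolerance * prev:
            stable += 1
    return stable / len(pairs)


if __name__ == "__main__":
    prev = {("a", "b"): 100, ("a", "c"): 200, ("b", "c"): 50, ("c", "a"): 80}
    curr = {("a", "b"): 110, ("a", "c"): 100, ("b", "c"): 55, ("c", "a"): 200}
    print(stable_pair_fraction(prev, curr))  # 0.5
```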

Current section: General Approaches

General Approaches
Network Layer
– Increase network bandwidth (Fat-tree, BCube, OSA…): expensive; requires upgrading the entire DC network
– Optimize flow routing (Hedera, MicroTE): not scalable; requires hardware support; depends on rich path diversity
Application Layer
– Optimize VM placement: scalable; lightweight deployment; suitable for existing oversubscribed networks

Background on Virtualized DC
Virtualization Layer: VM Live Migration
– Keeps the service running continuously while migrating
– Transfers 1.1x to 1.4x of the VM's memory
[Figure: a VM migrating between servers across the DC network; the migration traffic is the major cost]
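Because live migration moves roughly 1.1x to 1.4x of a VM's memory, the network cost of a re-placement plan can be bounded with simple arithmetic. The sketch below applies that range to a list of VMs to be moved; the function name and the VM sizes are hypothetical.

```python
def migration_traffic_gb(moved_vm_memory_gb, factor_low=1.1, factor_high=1.4):
    """Estimate the (low, high) migration traffic in GB for a set of moved VMs.

    moved_vm_memory_gb: memory sizes (GB) of the VMs a plan would migrate.
    The 1.1x to 1.4x factors come from the slide; everything else is illustrative.
    """
    total = sum(moved_vm_memory_gb)
    return total * factor_low, total * factor_high


if __name__ == "__main__":
    # Moving a 4 GB VM and an 8 GB VM: roughly 13.2 GB to 16.8 GB on the wire.
    low, high = migration_traffic_gb([4, 8])
    print(f"{low:.1f} GB to {high:.1f} GB")
```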

Optimize VM Placement
– Offload traffic from the congested link
[Figure: active and idle VMs]

Current section: Problem Formulation

Design Goal
Objective: mitigate congestion
– Measured by maximum link utilization (MLU)
Constraint: controllable migration traffic (i.e., moving VMs)
– Less than the traffic it reduces
Constraint: reasonable runtime overhead
– Far less than the target timescale (tens of minutes)
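The objective and the two constraints can be read as an acceptance test for a candidate plan. The sketch below is a hypothetical encoding: the function name, the 10-minute default timescale, and the choice to model "far less than" as at most 10% of the timescale are assumptions, not values from the slides.

```python
def plan_meets_design_goals(mlu_before, mlu_after, migration_bytes,
                            reduced_bytes, runtime_s,
                            timescale_s=600.0, runtime_fraction=0.1):
    """Hypothetical acceptance test for a VM re-placement plan.

    - Objective: the plan must lower the maximum link utilization (MLU).
    - Constraint: migration traffic must be less than the traffic it reduces.
    - Constraint: runtime must be far below the target timescale; this is
      modeled (as an assumption) as at most `runtime_fraction` of it.
    """
    return (mlu_after < mlu_before
            and migration_bytes < reduced_bytes
            and runtime_s <= runtime_fraction * timescale_s)


if __name__ == "__main__":
    print(plan_meets_design_goals(0.60, 0.53, migration_bytes=10,
                                  reduced_bytes=20, runtime_s=30))  # True
```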

Problem Statement
Input
– Topology and routing of physical servers
– Traffic matrix among VMs
– Current placement
Variable and output
– Optimized placement
NP-hardness
– Proof: by reduction from the Quadratic Bottleneck Assignment Problem
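Given the three inputs above, the MLU of any placement can be evaluated directly: route each VM pair's traffic over the path between their servers and take the largest load-to-capacity ratio. This is a minimal sketch assuming single-path routing between servers; the function name, data layout, and two-server example are hypothetical.

```python
from collections import defaultdict


def max_link_utilization(traffic, placement, paths, capacity):
    """Compute the maximum link utilization (MLU) of a VM placement.

    traffic:   dict (src_vm, dst_vm) -> traffic rate
    placement: dict vm -> server
    paths:     dict (src_server, dst_server) -> list of link IDs
               (single-path routing is assumed)
    capacity:  dict link ID -> capacity, in the same units as the rates
    """
    load = defaultdict(float)
    for (u, v), rate in traffic.items():
        su, sv = placement[u], placement[v]
        if su == sv:
            continue  # co-located VMs do not use the network
        for link in paths[(su, sv)]:
            load[link] += rate
    return max((load[l] / capacity[l] for l in capacity), default=0.0)


if __name__ == "__main__":
    paths = {("A", "B"): ["A->B"], ("B", "A"): ["B->A"]}
    capacity = {"A->B": 10.0, "B->A": 10.0}
    traffic = {("x", "y"): 6.0, ("y", "z"): 2.0}
    before = {"x": "A", "y": "B", "z": "B"}
    after = {"x": "A", "y": "A", "z": "B"}  # move y next to its heavy peer x
    print(max_link_utilization(traffic, before, paths, capacity))  # 0.6
    print(max_link_utilization(traffic, after, paths, capacity))   # 0.2
```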

Related Work
Optimize VM placement
– Server consolidation [SOSP07]
– Fault tolerance [ICS07]
– Network scalability [INFOCOM10]

Current section: Main Design

Inspiration
– First, stretch the knot hard, making it looser and less tangled.
– Then undo each knot gently, by carefully reeving the end back out of it.

Two-step Algorithm
Step 1: fast and greedy
– Searches for a placement that localizes overall traffic
– May get stuck in a local minimum
Step 2: fine-grained and randomized
– Searches for a placement that mitigates traffic on the most congested links
– Helps avoid local minima

Multiway Θ-Kernighan-Lin (KL)
– Top-down graph cut improvement
– Θ is introduced to limit the number of moves
– Complexity: O(n² log n)
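To show what a Θ-limited KL pass does, the sketch below applies one multiway Kernighan-Lin-style pass at a single level (for example, VMs across racks): it repeatedly commits the best swap among unlocked VMs in different groups, even when that swap temporarily increases the cut, locks both VMs, stops after at most Θ swaps, and keeps only the best prefix of the swap sequence. This is an illustration, not the authors' implementation: it re-evaluates the cut from scratch, so it does not achieve the O(n² log n) bound, and the top-down recursion through the topology is not shown. All names and the example data are hypothetical.

```python
import itertools


def cut_weight(weights, group):
    """Total traffic between VMs placed in different groups (e.g. racks)."""
    return sum(w for (u, v), w in weights.items() if group[u] != group[v])


def theta_kl_pass(weights, group, theta):
    """One multiway KL-style pass limited to at most `theta` swaps.

    weights: dict (vm_u, vm_v) -> traffic between the two VMs
    group:   dict vm -> group ID; swapping keeps every group's size fixed
    Returns (new_group, applied_swaps).
    """
    work = dict(group)
    locked = set()
    start = cut_weight(weights, work)
    swaps, cum_gains = [], []
    for _ in range(theta):
        best = None
        for a, b in itertools.combinations(sorted(work), 2):
            if a in locked or b in locked or work[a] == work[b]:
                continue
            work[a], work[b] = work[b], work[a]  # tentatively swap
            cost = cut_weight(weights, work)
            work[a], work[b] = work[b], work[a]  # undo
            if best is None or cost < best[0]:
                best = (cost, a, b)
        if best is None:
            break  # no unlocked cross-group pair is left
        cost, a, b = best
        # Commit the best swap even if it raises the cut, then lock both VMs.
        work[a], work[b] = work[b], work[a]
        locked.update((a, b))
        swaps.append((a, b))
        cum_gains.append(start - cost)
    if not cum_gains or max(cum_gains) <= 0:
        return dict(group), []
    k = cum_gains.index(max(cum_gains)) + 1  # best prefix of the swap sequence
    result = dict(group)
    for a, b in swaps[:k]:
        result[a], result[b] = result[b], result[a]
    return result, swaps[:k]


if __name__ == "__main__":
    weights = {("a", "c"): 10, ("b", "d"): 10, ("a", "b"): 1, ("c", "d"): 1}
    group = {"a": "R1", "b": "R1", "c": "R2", "d": "R2"}
    new_group, applied = theta_kl_pass(weights, group, theta=2)
    print(cut_weight(weights, group), "->", cut_weight(weights, new_group))  # 20 -> 2
    print(applied)  # [('a', 'd')]
```

Limiting the pass to Θ swaps bounds how many VMs a single pass may migrate, which mirrors the slide's motivation for Θ.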

Simulated Annealing Searching (SA)
– Randomized global search
– Terminates when a satisfactory solution is obtained or a predefined maximum depth is reached
[Figure: example placements with MLU = 0.60 and MLU = 0.53]
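A generic simulated-annealing loop matching the two stopping rules on the slide (a satisfactory MLU target or a maximum search depth) might look like the sketch below. The neighbor move (swapping two VMs on different servers), the geometric cooling schedule, the starting temperature, and the toy cost function are all assumptions made for illustration; the slides do not specify them.

```python
import math
import random


def simulated_annealing(placement, cost, target, max_depth,
                        t0=1.0, cooling=0.95, seed=0):
    """Randomized search over VM placements (sketch with assumed parameters).

    placement: dict vm -> server (initial state, e.g. the output of step 1)
    cost:      function(placement) -> MLU, lower is better
    target:    stop once the best cost is <= target (satisfactory solution)
    max_depth: stop after this many iterations (predefined maximum depth)
    """
    rng = random.Random(seed)
    vms = sorted(placement)
    current = dict(placement)
    cur_cost = cost(current)
    best, best_cost = dict(current), cur_cost
    temp = t0
    for _ in range(max_depth):
        if best_cost <= target:
            break
        a, b = rng.sample(vms, 2)
        if current[a] == current[b]:
            continue  # swapping two co-located VMs changes nothing
        cand = dict(current)
        cand[a], cand[b] = cand[b], cand[a]
        cand_cost = cost(cand)
        delta = cand_cost - cur_cost
        # Always accept improvements; accept worse moves with prob. exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = dict(current), cur_cost
        temp *= cooling
    return best, best_cost


def toy_mlu(p):
    """Toy cost: two servers joined by one link of capacity 10."""
    traffic = {("x", "y"): 6, ("z", "w"): 4, ("x", "z"): 1}
    cross = sum(r for (u, v), r in traffic.items() if p[u] != p[v])
    return cross / 10.0


if __name__ == "__main__":
    start = {"x": "S1", "z": "S1", "y": "S2", "w": "S2"}  # MLU = 1.0
    best, best_cost = simulated_annealing(start, toy_mlu, target=0.2, max_depth=200)
    print(best, best_cost)
```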

Current section: Evaluation

Methodology
Baseline algorithm
– Clustering-based algorithm
– Pro: best-known static optimality
– Con: high runtime and migration overhead
Metrics
– MLU reduction without migration overhead
– Overhead: migration traffic and runtime overhead
– Simulation results

MLU Reduction without Overhead
VirtualKnotter achieves static performance similar to that of Clustering.

Migration Traffic
VirtualKnotter incurs significantly less migration traffic than Clustering.

Runtime Overhead
VirtualKnotter's runtime overhead is reasonable.

Simulation Results
– 53% less congestion
Altogether, VirtualKnotter achieves a significant gain in resolving congestion.

Conclusions
– Collaborative VM migration can substantially resolve long-term congestion in the DC
– Trading off optimality against migration traffic is essential to harvesting the benefit
DC networking projects of Northwestern LIST:

Thank you!

Backup

General Approaches
Approach                 Cost   Hardware Support   Scalability   Other Dependency
Increase Bandwidth       High   Yes                Varies        –
Optimize Routing         Low    Yes                Low           Rich path diversity
Optimize VM Placement    Low    No                 High          VM deployment

Problem Statement
Objective
– Minimize the maximum link utilization (MLU)
– Cool down the hottest spot
Constraints
– Migration traffic
– Server hardware capacity
– Inseparable VMs
NP-hardness
– Proof: by reduction from the Quadratic Bottleneck Assignment Problem
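One way to write this objective and these constraints formally is sketched below; the notation is assumed, not taken from the slides. V is the set of VMs, S the servers, L the links with capacities c_ℓ, T_{uv} the traffic from VM u to VM v, P(s, t) the set of links on the route from server s to server t, x the new placement, and x_0 the current one. M_v is the migration cost of VM v (roughly 1.1x to 1.4x its memory), B is a migration-traffic budget, r_v is VM v's resource demand, and R_s is server s's capacity. Because x is a function, every VM lands on exactly one server, which is one reading of "inseparable VM".

```latex
\mathrm{load}_\ell(x) = \sum_{\substack{u,v \in V \\ x(u) \neq x(v),\ \ell \in P(x(u),\,x(v))}} T_{uv},
\qquad
\mathrm{MLU}(x) = \max_{\ell \in L} \frac{\mathrm{load}_\ell(x)}{c_\ell}

\begin{aligned}
\min_{x:\,V \to S}\quad & \mathrm{MLU}(x) \\
\text{s.t.}\quad & \sum_{v \in V:\ x(v) \neq x_0(v)} M_v \le B && \text{(migration traffic)} \\
& \sum_{v \in V:\ x(v) = s} r_v \le R_s \quad \forall s \in S && \text{(server hardware capacity)}
\end{aligned}
```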

Observation Summary
– Unbalanced congestion (spatial)
– Long-term congestion (temporal)
– Predictable at a timescale of tens of minutes (stability)

Two-step Algorithm
– Multiway Θ-Kernighan-Lin Algorithm (KL): fast search for an approximate solution
– Simulated Annealing Searching (SA): fine-grained search for a better solution