DONAR Decentralized Server Selection for Cloud Services Patrick Wendell, Princeton University Joint work with Joe Wenjie Jiang, Michael J. Freedman, and.

Slides:



Advertisements
Similar presentations
Network Aware Forward Caching Presenter: Alexandre Gerber Jeffrey Erman, Mohammad T. Hajiaghayi, Dan Pei, Oliver Spatscheck AT&T Labs Research April 24.
Advertisements

SkipNet: A Scalable Overlay Network with Practical Locality Properties Nick Harvey, Mike Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman Microsoft Research.
C. Mastroianni, D. Talia, O. Verta - A Super-Peer Model for Resource Discovery Services in Grids A Super-Peer Model for Building Resource Discovery Services.
A Cloud Data Center Optimization Approach using Dynamic Data Interchanges Prof. Stephan Robert University of Applied Sciences.
Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.
The Platform as a Service Model for Networking Eric Keller, Jennifer Rexford Princeton University INM/WREN 2010.
Tapestry: Decentralized Routing and Location SPAM Summer 2001 Ben Y. Zhao CS Division, U. C. Berkeley.
New Opportunities for Load Balancing in Network-Wide Intrusion Detection Systems Victor Heorhiadi, Michael K. Reiter, Vyas Sekar UNC Chapel Hill UNC Chapel.
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
High throughput chain replication for read-mostly workloads
Scalable Content-Addressable Network Lintao Liu
Consistency and Replication Chapter 7 Part II Replica Management & Consistency Protocols.
Akamai DNS Offerings RSA © Conference ©2013 AKAMAI | FASTER FORWARD TM Akamai DNS Solutions Enhanced DNS (eDNS) Scalable, outsourced, DNS solution.
Chord: A scalable peer-to- peer lookup service for Internet applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashock, Hari Balakrishnan.
Content Distribution Networks Costin Raiciu Advanced Topics in Distributed Systems Fall 2012.
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Network Architecture for Joint Failure Recovery and Traffic Engineering Martin Suchara in collaboration with: D. Xu, R. Doverspike, D. Johnson and J. Rexford.
SCAN: A Dynamic, Scalable, and Efficient Content Distribution Network Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy,
Scaling Distributed Machine Learning with the BASED ON THE PAPER AND PRESENTATION: SCALING DISTRIBUTED MACHINE LEARNING WITH THE PARAMETER SERVER – GOOGLE,
Beneficial Caching in Mobile Ad Hoc Networks Bin Tang, Samir Das, Himanshu Gupta Computer Science Department Stony Brook University.
Anycast for Any Service Michael J. Freedman Karthik Lakshminarayanan David Mazières
Peer-to-Peer Based Multimedia Distribution Service Zhe Xiang, Qian Zhang, Wenwu Zhu, Zhensheng Zhang IEEE Transactions on Multimedia, Vol. 6, No. 2, April.
Locality-Aware Request Distribution in Cluster-based Network Servers 1. Introduction and Motivation --- Why have this idea? 2. Strategies --- How to implement?
Anycast Jennifer Rexford Advanced Computer Networks Tuesdays/Thursdays 1:30pm-2:50pm.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Rethinking Internet Traffic Management: From Multiple Decompositions to a Practical Protocol Jiayue He Princeton University Joint work with Martin Suchara,
1 Routing as a Service Karthik Lakshminarayanan (with Ion Stoica and Scott Shenker) Sahara/i3 retreat, January 2004.
Wide-area cooperative storage with CFS
1 Web Content Delivery Reading: Section and COS 461: Computer Networks Spring 2007 (MW 1:30-2:50 in Friend 004) Ioannis Avramopoulos Instructor:
Multipath Protocol for Delay-Sensitive Traffic Jennifer Rexford Princeton University Joint work with Umar Javed, Martin Suchara, and Jiayue He
Building a Strong Foundation for a Future Internet Jennifer Rexford ’91 Computer Science Department (and Electrical Engineering and the Center for IT Policy)
Cutting the Electric Bill for Internet-Scale Systems Andreas Andreou Cambridge University, R02
1 Content Distribution Networks. 2 Replication Issues Request distribution: how to transparently distribute requests for content among replication servers.
Additional SugarCRM details for complete, functional, and portable deployment.
Active Network Applications Tom Anderson University of Washington.
Symmetric Replication in Structured Peer-to-Peer Systems Ali Ghodsi, Luc Onana Alima, Seif Haridi.
Lecture 15 – Amazon Network as a Service. Recall IaaS Server as a Service Storage as a Service Network as a Service.
Cloud MapReduce : a MapReduce Implementation on top of a Cloud Operating System Speaker : 童耀民 MA1G Authors: Huan Liu, Dan Orban Accenture.
{ Content Distribution Networks ECE544 Dhananjay Makwana Principal Software Engineer, Semandex Networks 5/2/14ECE544.
Oasis: Anycast for Any Service Michael J. Freedman Karthik Lakshminarayanan David Mazières in NSDI 2006 Presented by: Sailesh Kumar.
Ao-Jan Su, David R. Choffnes, Fabián E. Bustamante and Aleksandar Kuzmanovic Department of EECS Northwestern University Relative Network Positioning via.
PIC: Practical Internet Coordinates for Distance Estimation Manuel Costa joint work with Miguel Castro, Ant Rowstron, Peter Key Microsoft Research Cambridge.
Technology Overview. Agenda What’s New and Better in Windows Server 2003? Why Upgrade to Windows Server 2003 ?  From Windows NT 4.0  From Windows 2000.
Software-Defined Networks Jennifer Rexford Princeton University.
Monitoring Latency Sensitive Enterprise Applications on the Cloud Shankar Narayanan Ashiwan Sivakumar.
DONAR Decentralized Server Selection for Cloud Services B96B02016 生化科技四 張煥基 B 電機三 姜慧如
An Efficient Approach for Content Delivery in Overlay Networks Mohammad Malli Chadi Barakat, Walid Dabbous Planete Project To appear in proceedings of.
Unit – I CLIENT / SERVER ARCHITECTURE. Unit Structure  Evolution of Client/Server Architecture  Client/Server Model  Characteristics of Client/Server.
Tony McGregor RIPE NCC Visiting Researcher The University of Waikato DAR Active measurement in the large.
Scalable Multi-Class Traffic Management in Data Center Backbone Networks Amitabha Ghosh (UtopiaCompression) Sangtae Ha (Princeton) Edward Crabbe (Google)
Optimal Content Delivery with Network Coding Derek Leong, Tracey Ho California Institute of Technology Rebecca Cathey BAE Systems CISS 2009 March 19, 2009.
OPERETTA: An Optimal Energy Efficient Bandwidth Aggregation System Karim Habak†, Khaled A. Harras‡, and Moustafa Youssef† †Egypt-Japan University of Sc.
Jan 30, 2001CSCI {4,6}900: Ubiquitous Computing1 Announcements Project Milestone 2 due today. Undergraduate projects should have 3 students per project.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
1 An Arc-Path Model for OSPF Weight Setting Problem Dr.Jeffery Kennington Anusha Madhavan.
CS 6401 Overlay Networks Outline Overlay networks overview Routing overlays Resilient Overlay Networks Content Distribution Networks.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
DNS Traffic Management and DNS data mining Making Windows DNS Server Cloud Ready ~Kumar Ashutosh, Microsoft.
Coral: A Peer-to-peer Content Distribution Network
Pytheas: Enabling Data-Driven Quality of Experience Optimization Using Group-Based Exploration-Exploitation Junchen Jiang (CMU) Shijie Sun (Tsinghua Univ.)
Alternative system models
Mohammad Malli Chadi Barakat, Walid Dabbous Alcatel meeting
CHAPTER 3 Architectures for Distributed Systems
EE5900: Cyber-Physical Systems
Phillipa Gill University of Toronto
Distributed File Systems
Content Distribution Networks
AWS Cloud Computing Masaki.
Small-Scale Peer-to-Peer Publish/Subscribe
Dynamic Replica Placement for Scalable Content Delivery
Presentation transcript:

DONAR Decentralized Server Selection for Cloud Services Patrick Wendell, Princeton University Joint work with Joe Wenjie Jiang, Michael J. Freedman, and Jennifer Rexford

Outline Server selection background Constraint-based policy interface Scalable optimization algorithm Production deployment

User Facing Services are Geo-Replicated

Reasoning About Server Selection Service Replicas Client Requests Mapping Nodes

Example: Distributed DNS Client 1 Client C DNS 1 DNS 2 DNS 10 Servers Auth. Nameservers Client 2 ClientsMapping NodesService Replicas DNS Resolvers

Example: HTTP Redir/Proxying Client 1 Client C Datacenters HTTP Proxies Client 2 ClientsMapping NodesService Replicas HTTP Clients Proxy 1 Proxy 2 Proxy 500

Reasoning About Server Selection Service Replicas Client Requests Mapping Nodes

Reasoning About Server Selection Service Replicas Client Requests Mapping Nodes Outsource to DONAR

Outline Server selection background Constraint-based policy interface Scalable optimization algorithm Production deployment

Naïve Policy Choices Load-Aware: Round Robin Service Replicas Client Requests Mapping Nodes

Naïve Policy Choices Location-Aware: Closest Node Service Replicas Client Requests Mapping Nodes Goal: support complex policies across many nodes.

Policies as Constraints Replicas DONAR Nodes bandwidth_cap = 10,000 req/m split_ratio = 10% allowed_dev = ± 5%

Eg. 10-Server Deployment How to describe policy with constraints?

No Constraints Equivalent to Closest Node Requests per Replica

No Constraints Equivalent to Closest Node Requests per Replica Impose 20% Cap

Cap as Overload Protection Requests per Replica

12 Hours Later… Requests per Replica

Load Balance (split = 10%, tolerance = 5%) Requests per Replica

Load Balance (split = 10%, tolerance = 5%) Requests per Replica Trade-off network proximity & load distribution

12 Hours Later… Requests per Replica Large range of policies by varying cap/weight

Outline Server selection background Constraint-based policy interface Scalable optimization algorithm Production deployment

Optimization: Policy Realization Global LP describing optimal pairing Clients: c CNodes: n NReplica Instances: i I Minimize network cost Server loads within tolerance Bandwidth caps met s.t.

Optimization Workflow Measure Traffic Track Replica Set Calculate Optimal Assignment

Optimization Workflow Per-customer!

Optimization Workflow Measure Traffic Track Replica Set Calculate Optimal Assignment Continuously! (respond to underlying traffic)

By The Numbers DONAR Nodes Customers replicas/customer client groups/ customer Problem for each customer: 10 2 * 10 4 = 10 6

Measure Traffic & Optimize Locally? Service Replicas Mapping Nodes

Not Accurate! Service Replicas Mapping Nodes Client Requests No one node sees entire client population No one node sees entire client population

Aggregate at Central Coordinator? Service Replicas Mapping Nodes

Aggregate at Central Coordinator? Service Replicas Mapping Nodes Share Traffic Measurements (10 6 )

Aggregate at Central Coordinator? Service Replicas Mapping Nodes Optimize

Aggregate at Central Coordinator? Service Replicas Mapping Nodes Return assignments (10 6 )

So Far AccurateEfficientReliable Local only NoYes Central Coordinator YesNo

Decomposing Objective Function Traffic from c prob of mapping c to i cost of mapping c to i = Traffic to this node We also decompose constraints (more complicated)

Decomposed Local Problem For Some Node (n*) min load i = f(prevailing load on each server + load I will impose on each server) Local distance minimization Global load information

DONAR Algorithm Service Replicas Mapping Nodes Solve local problem

DONAR Algorithm Service Replicas Mapping Nodes Solve local problem Share summary data w/ others (10 2 )

DONAR Algorithm Service Replicas Mapping Nodes Solve local problem

DONAR Algorithm Service Replicas Mapping Nodes Share summary data w/ others (10 2 )

DONAR Algorithm Provably converges to global optimum Requires no coordination Reduces message passing by 10 4 Service Replicas Mapping Nodes

Better! AccurateEfficientReliable Local only NoYes Central Coordinator YesNo DONAR Yes

Outline Server selection background Constraint-based policy interface Scalable optimization algorithm Production deployment

Production and Deployment Publicly deployed 24/7 since November 2009 IP2Geo data from Quova Inc. Production use: – All MeasurementLab Services (incl. FCC Broadband Testing) – CoralCDN Services around 1M DNS requests per day

Systems Challenges (See Paper!) Network availability Anycast with BGP Reliable data storage Chain-Replication with Apportioned Queries Secure, reliable updates Self-Certifying Update Protocol

CoralCDN Replicas DONAR Nodes Client Requests CoralCDN Experimental Setup split_weight =.1 tolerance =.02

Results: DONAR Curbs Volatility Closest Node policy DONAR Equal Split Policy

Results: DONAR Minimizes Distance

Conclusions Dynamic server selection is difficult – Global constraints – Distributed decision-making Services reap benefit of outsourcing to DONAR. – Flexible policies – General: Supports DNS & HTTP Proxying – Efficient distributed constraint optimization Interested in using? Contact me or visit

Questions?

Related Work (Academic and Industry) Academic – Improving network measurement iPlane: An informationplane for distributed services H. V. Madhyastha, T. Isdal, M. Piatek, C. Dixon, T. Anderson, A. Krishnamurthy, and A. Venkataramani,, in OSDI, Nov – Application Layer Anycast OASIS: Anycast for Any Service Michael J. Freedman, Karthik Lakshminarayanan, and David Mazières Proc. 3rd USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI '06) San Jose, CA, May 2006.NSDI '06 Proprietary – Amazon Elastic Load Balancing – UltraDNS – Akamai Global Traffic Management

Doesnt [Akamai/UltraDNS/etc] Already Do This? Existing approaches use alternative, centralized formulations. Often restrict the set of nodes per-service. Lose benefit of large number of nodes (proxies/DNS servers/etc).