C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection Marco Canini (UCL) with Lalith Suresh, Stefan Schmid, Anja Feldmann (TU Berlin)

Presentation transcript:

C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection Marco Canini (UCL) with Lalith Suresh, Stefan Schmid, Anja Feldmann (TU Berlin)

2 Latency matters: an extra 500 ms of latency led to a revenue decrease of 1.2% [Eric Schurman, Bing]; every 100 ms of latency cost 1% in sales [Greg Linden, Amazon]

3

4 One user request: tens to hundreds of servers involved

5 Variability of response times at system components inflates end-to-end latencies and leads to high tail latencies. [Figure: latency ECDF; label: size and system complexity] At Bing, 30% of services have a 95th-percentile latency > 3x the median latency [Jalaparti et al., SIGCOMM’13]

6 Performance fluctuations are the norm: queueing delays, skewed access patterns, resource contention, background activities [Figure: CDF]

7 How can we enable datacenter applications* to achieve more predictable performance? * in this work: low-latency data stores

8 Replica selection: which replica should serve the {request}?

9 Replica Selection Challenges: service-time variations, herd behavior

10 Service-time variations [Diagram: a request facing replicas with service times of 4 ms, 5 ms, and 30 ms]

11 Herd Behavior [Diagram: several requests, each facing the same choice of replica]

12 Load Pathology in Practice Cassandra on EC2: 15-node cluster, skewed access pattern, workload generated by 120 YCSB generators. Observe the load on the most heavily utilized node.

13 Load Pathology in Practice: 99.9th-percentile latency ~ 10x median latency

14 Load Conditioning in Our Approach

15 C3: an adaptive replica selection mechanism that is robust to service-time heterogeneity

16 C3: Replica Ranking, Distributed Rate Control

17 C3: Replica Ranking, Distributed Rate Control

18 [Diagram: two clients and two servers, with mean service times µ⁻¹ = 2 ms and µ⁻¹ = 6 ms]

19 [Diagram: two clients and two servers, with mean service times µ⁻¹ = 2 ms and µ⁻¹ = 6 ms]

20 Server-side Feedback Client Server Clients maintain EWMAs of these metrics
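
The client-side bookkeeping on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the feedback metrics are the server's queue size and service time (the { q s, 1/µ s } pair shown on the closing slide), and the smoothing weight `alpha`, the class names, and the method names are all hypothetical.

```python
class Ewma:
    """Exponentially weighted moving average of a stream of samples."""

    def __init__(self, alpha: float = 0.9):
        # alpha is the weight kept on the previous average (assumed value).
        self.alpha = alpha
        self.value = None

    def update(self, sample: float) -> float:
        if self.value is None:
            # The first sample initializes the average.
            self.value = sample
        else:
            self.value = self.alpha * self.value + (1 - self.alpha) * sample
        return self.value


class ServerFeedback:
    """Per-server state a client keeps, fed by metrics returned with responses."""

    def __init__(self, alpha: float = 0.9):
        self.queue_size = Ewma(alpha)        # smoothed q_s
        self.service_time_ms = Ewma(alpha)   # smoothed 1/mu_s

    def on_response(self, queue_size: float, service_time_ms: float) -> None:
        self.queue_size.update(queue_size)
        self.service_time_ms.update(service_time_ms)
```

A client would hold one `ServerFeedback` per server and call `on_response` whenever a reply arrives.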

21 Server-side Feedback Concurrency compensation

22 Server-side Feedback outstanding requests

23

24 Potentially long queue sizes hamper reactivity

25 Penalizing Long Queues

26 Scoring Function Clients rank servers according to a scoring function (the formula itself is not captured in this transcript)
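
The transcript does not carry the formula from this slide, so the sketch below is only an assumption about its shape. It combines the ingredients named on slides 20 to 25 (smoothed server feedback, concurrency compensation via outstanding requests, and a penalty for long queues) into a cubic score, which is how the published C3 design is usually described. The exact formula, the exponent default, and every function and parameter name here are hypothetical.

```python
def queue_estimate(outstanding: int, num_clients: int, queue_ewma: float) -> float:
    # Concurrency compensation (assumed form): this client's outstanding
    # requests are scaled by the number of clients, on the guess that the
    # other clients behave similarly.
    return 1 + outstanding * num_clients + queue_ewma


def score(response_time_ms: float, service_time_ms: float,
          outstanding: int, num_clients: int, queue_ewma: float,
          exponent: int = 3) -> float:
    # Lower is better. Raising the queue estimate to a power > 1 penalizes
    # long queues more than proportionally (assumed exponent of 3).
    q_hat = queue_estimate(outstanding, num_clients, queue_ewma)
    return response_time_ms - service_time_ms + (q_hat ** exponent) * service_time_ms


def rank(servers: dict) -> list:
    # servers maps a server name to the keyword arguments of score().
    return sorted(servers, key=lambda name: score(**servers[name]))
```

With these assumed numbers, a 2 ms server holding a queue of 3 scores worse than a 6 ms server holding a queue of 1, which illustrates why service time alone is not enough to rank replicas.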

27 C3: Replica Ranking, Distributed Rate Control

28 Need for distributed rate control Replica ranking is insufficient. How to avoid saturating individual servers? What about non-internal sources of performance fluctuations?

29 Cubic Rate Control Clients adjust sending rates according to a cubic function. If the receive rate isn’t increasing further, multiplicatively decrease the sending rate.
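
The slide names a cubic function without giving it. The sketch below assumes the familiar shape from CUBIC-style congestion control: after a multiplicative decrease, the rate starts at (1 - beta) times the rate at which saturation was last seen, flattens as it approaches that rate, and then probes beyond it. The constants `beta` and `gamma` and the function name are illustrative assumptions, not values from the talk.

```python
def cubic_rate(elapsed_ms: float, saturation_rate: float,
               beta: float = 0.2, gamma: float = 1e-6) -> float:
    """Sending-rate limit as a function of time since the last decrease.

    saturation_rate is the rate at which the receive rate last stopped
    increasing. beta is the multiplicative-decrease factor and gamma sets
    how quickly the curve grows (both assumed values).
    """
    # Time at which the curve returns to the saturation rate.
    k = (beta * saturation_rate / gamma) ** (1 / 3)
    return gamma * (elapsed_ms - k) ** 3 + saturation_rate
```

At `elapsed_ms = 0` the function evaluates to `(1 - beta) * saturation_rate`, so resetting the timer when the receive rate stalls is what performs the multiplicative decrease in this sketch.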

30 Putting everything together On receiving a request, the client sorts the replica servers by the ranking function and selects the first replica server that is within its rate limit. If all replicas have exceeded their rate limits, the request is retained in a backlog queue until a replica server is available.
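
The selection steps on this slide can be sketched as below. The `RateLimiter` class is a hypothetical stand-in that allows a fixed number of sends per window; how the real limiter is driven by the rate controller is not shown in the transcript. Scores are assumed to be precomputed, with lower meaning better.

```python
from collections import deque


class RateLimiter:
    """Hypothetical per-server limiter: at most `limit` sends per window."""

    def __init__(self, limit: int):
        self.limit = limit
        self.sent = 0

    def try_acquire(self) -> bool:
        if self.sent < self.limit:
            self.sent += 1
            return True
        return False


def select_replica(replicas, scores, limiters, backlog, request):
    """Return the best-ranked replica within its rate limit, or None.

    When every replica is over its limit, the request is appended to the
    backlog queue so it can be retried once a replica becomes available.
    """
    for server in sorted(replicas, key=lambda s: scores[s]):
        if limiters[server].try_acquire():
            return server
    backlog.append(request)
    return None


# Usage with made-up scores and limits.
backlog = deque()
scores = {"r1": 1.0, "r2": 2.0, "r3": 3.0}
limiters = {"r1": RateLimiter(1), "r2": RateLimiter(1), "r3": RateLimiter(0)}
first = select_replica(["r3", "r1", "r2"], scores, limiters, backlog, "read-1")
second = select_replica(["r3", "r1", "r2"], scores, limiters, backlog, "read-2")
third = select_replica(["r3", "r1", "r2"], scores, limiters, backlog, "read-3")
```

Falling through to the next-best replica, rather than waiting for the best one, is what keeps a single highly ranked server from being saturated.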

31 C3 implementation in Cassandra

32

33 [Diagram: node A receives Read(“key”); replicas R1, R2, R3]

34 [Diagram: node A receives Read(“key”) and must choose among replicas R1, R2, R3]

35 Dynamic snitching: who is the best replica? [Diagram: node A receives Read(“key”); replicas R2, R3]

36 C3 Implementation: replacement for Dynamic Snitching; scheduler and backlog queue per replica group; ~400 lines of code

37 Evaluation: Amazon EC2, controlled testbed, simulations

38 Evaluation on Amazon EC2: 15-node Cassandra cluster. Workloads generated using YCSB: read-heavy, update-heavy, read-only. 500M 1 KB records (larger than memory). Compared against Cassandra’s Dynamic Snitching.

39 2x to 3x improved 99.9th-percentile latencies

40 Up to 43% improved throughput

41 Improved load conditioning

42 Improved load conditioning

43 How does C3 react to dynamic workload changes? Begin with 80 read-heavy workload generators; 40 update-heavy generators join the system after 640 s. Observe the latency profile with and without C3.

44 Latency profile degrades gracefully with C3

45 Summary of other results: higher system load, skewed record sizes, SSDs instead of HDDs. > 3x better 99th-percentile latency; > 50% higher throughput

46 Ongoing work: stability analysis of C3; deployment in production settings at Spotify and SoundCloud; study of networking effects

47 Summary C3 combines careful replica ranking and distributed rate control to reduce tail, mean, and median latencies, and to improve load conditioning and throughput.

48 Thank you! [Diagram: client and server, with server feedback { q_s, 1/µ_s }]