Cake: Enabling High-level SLOs on Shared Storage
Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Randy Katz, Ion Stoica



Trend toward consolidation of front-end and back-end storage systems

[Figure: today, a front-end cluster serves users while a separate back-end cluster runs data analysis; copying data from the front-end to the back-end takes minutes to hours.]

[Figure: a consolidated cluster serves users and runs analysis together, improving provisioning, utilization, and analysis time.]

Nope: consolidation is not straightforward. Open issues include cluster scheduling, privacy, failure domains, and performance isolation; this talk focuses on performance isolation.

Without performance isolation, consolidation causes a 4x increase in front-end latency.

Objective
1. Combine two workloads
2. Meet front-end latency requirements
3. Then maximize batch throughput

Throughput vs. latency

High-percentile latency (95th, 99th)

“…purely targeted at controlling performance at the 99.9th percentile.”

High-level operations: get, put, scan

A service-level objective (SLO): a requirement on a performance metric for a type of request.

Example SLO: 99th percentile latency of 100 milliseconds for get requests.
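As an illustration (not Cake's actual API), an SLO like the example above can be represented and checked against measured latencies; the class and field names here are hypothetical:

```python
import math

class SLO:
    """A service-level objective: a requirement on a performance
    metric (percentile latency) for a type of request."""
    def __init__(self, percentile, latency_ms, request_type):
        self.percentile = percentile      # e.g. 99
        self.latency_ms = latency_ms      # e.g. 100
        self.request_type = request_type  # e.g. "get"

    def is_met(self, latencies_ms):
        """True if the latency samples satisfy the SLO."""
        if not latencies_ms:
            return True
        ordered = sorted(latencies_ms)
        # nearest-rank percentile: smallest value with at least
        # `percentile`% of samples at or below it
        rank = math.ceil(self.percentile / 100 * len(ordered))
        return ordered[rank - 1] <= self.latency_ms

slo = SLO(percentile=99, latency_ms=100, request_type="get")
```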

Approach
Cake is a coordinated, multi-resource scheduler for consolidated storage workloads. It adds scheduling points within the system and dynamically adjusts allocations to meet the SLO of the front-end workload. Built on HBase/HDFS.

[Request walkthrough: a front-end web server or a batch MapReduce task sends a request through the client to HBase, which performs an index lookup, reads data from HDFS, processes it, and returns the response. HBase is mostly CPU usage, HDFS is mostly disk usage, and the layers are separated by clean RPC interfaces.]

[Architecture: 1st-level schedulers are placed inside HBase and HDFS, and a 2nd-level scheduler above them coordinates their allocations.]

1st-Level Schedulers
Provide effective, low-latency control over access to the underlying hardware resource, and expose scheduling knobs to the 2nd-level scheduler.

Three Requirements
1. Differentiated scheduling
2. Chunk large requests
3. Limit the number of outstanding requests

Differentiated Scheduling
Requests are handled by HBase threads, which wait in the OS to be scheduled on a CPU. A single FIFO queue is unfair: front-end requests get stuck behind batch requests. Cake instead keeps separate queues for each client and schedules among them based on allocations set by the 2nd-level scheduler.
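The separate per-client queues can be sketched as a proportional-share scheduler; the weights stand in for the allocations set by the 2nd-level scheduler, and all names are illustrative rather than taken from the Cake implementation:

```python
import random
from collections import deque

class DifferentiatedScheduler:
    def __init__(self, weights):
        # weights: client name -> proportional share,
        # e.g. {"frontend": 9, "batch": 1}
        self.weights = dict(weights)
        self.queues = {client: deque() for client in weights}

    def enqueue(self, client, request):
        self.queues[client].append(request)

    def next_request(self, rng=random):
        """Pick a non-empty queue with probability proportional to
        its weight, then dequeue its oldest request."""
        candidates = [(c, w) for c, w in self.weights.items()
                      if self.queues[c]]
        if not candidates:
            return None
        total = sum(w for _, w in candidates)
        r = rng.uniform(0, total)
        for client, w in candidates:
            r -= w
            if r <= 0:
                return self.queues[client].popleft()
        return self.queues[candidates[-1][0]].popleft()
```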

Chunk Large Requests
Large requests tie up resources while being processed, and there is no preemption. Cake splits large requests into multiple smaller chunks, so other requests only wait for a chunk rather than an entire large request.
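A minimal sketch of the chunking idea, assuming a row-oriented scan; the 100-row chunk size is an illustrative choice, not Cake's actual setting:

```python
def chunk_request(start_row, num_rows, chunk_rows=100):
    """Split a scan of num_rows rows into (start, count) chunks so
    the scheduler can interleave other work between chunks."""
    chunks = []
    row, remaining = start_row, num_rows
    while remaining > 0:
        count = min(chunk_rows, remaining)
        chunks.append((row, count))
        row += count
        remaining -= count
    return chunks
```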

Limit # of Outstanding Requests
Too many outstanding requests create long queues outside of Cake, which hurt latency, so Cake must pick a level of overcommit. It uses a TCP Vegas-like scheme that treats latency as an estimate of the load on a resource and dynamically adjusts the number of threads: additively increase the thread count when underloaded (latency below a lower bound), multiplicatively decrease it when overloaded (latency above an upper bound). This converges in the general case.
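The additive-increase/multiplicative-decrease thread sizing can be sketched as follows; the latency bounds and step sizes are illustrative assumptions, not values from the paper:

```python
class ThreadPoolSizer:
    """TCP Vegas-like sizing of the thread pool: grow additively
    while measured latency is below the lower bound, shrink
    multiplicatively when it exceeds the upper bound."""
    def __init__(self, low_ms, high_ms, min_threads=1, max_threads=64):
        self.low_ms, self.high_ms = low_ms, high_ms
        self.min_threads, self.max_threads = min_threads, max_threads
        self.threads = min_threads

    def adjust(self, measured_latency_ms):
        if measured_latency_ms < self.low_ms:
            # underloaded: additive increase
            self.threads = min(self.threads + 1, self.max_threads)
        elif measured_latency_ms > self.high_ms:
            # overloaded: multiplicative decrease
            self.threads = max(self.threads // 2, self.min_threads)
        return self.threads
```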

1st-level Schedulers
Separate queues enable scheduling at each resource; chunking large requests improves preemption; dynamically sized thread pools prevent long queues outside of Cake. More details are in the paper.

2nd-level Scheduler

Coordinates allocations at the 1st-level schedulers, and monitors and enforces the front-end SLO. It runs two allocation phases: SLO compliance-based and queue occupancy-based. (What follows is just a sketch; full details are in the paper.)

SLO Compliance-based
Disk is the bottleneck in storage workloads, so improving front-end performance requires increasing its scheduling allocation at the disk (HDFS): increase the allocation when the SLO is not being met, decrease it when the SLO is easily met.
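A sketch of this compliance loop, assuming the front-end holds a proportional share of the disk; the step size and slack threshold are hypothetical, not values from the paper:

```python
def adjust_disk_share(share, measured_p99_ms, target_p99_ms,
                      step=0.05, slack=0.7, min_share=0.1, max_share=0.9):
    """One iteration of SLO compliance-based allocation for the
    front-end's share of the disk (HDFS)."""
    if measured_p99_ms > target_p99_ms:
        # not meeting the SLO: give the front-end more of the disk
        share = min(share + step, max_share)
    elif measured_p99_ms < slack * target_p99_ms:
        # easily meeting the SLO: release capacity to the batch workload
        share = max(share - step, min_share)
    return share
```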

Queue Occupancy-based
The HBase and HDFS allocations need to be balanced, since HBase can incorrectly throttle HDFS. Cake uses queue occupancy, the percentage of time a client's request is waiting in the queue at a resource, as a measure of queuing at that resource: increase the HBase allocation when there is more queuing at HBase, decrease it when there is more queuing at HDFS.
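Queue occupancy as described can be computed from per-request queue wait intervals; this is a sketch under the assumption that enqueue/dequeue timestamps are recorded per client:

```python
def queue_occupancy(wait_intervals, window_start, window_end):
    """Fraction of [window_start, window_end] during which the client
    had at least one request waiting in the queue at a resource.
    wait_intervals: list of (enqueue_time, dequeue_time) pairs."""
    window = window_end - window_start
    if window <= 0:
        return 0.0
    # clip intervals to the window, then merge overlaps
    clipped = sorted((max(s, window_start), min(e, window_end))
                     for s, e in wait_intervals
                     if e > window_start and s < window_end)
    busy, cur_start, cur_end = 0.0, None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return busy / window
```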

Details
The paper also covers using both proportional share and reservations at the 1st-level schedulers, a linear performance model, and an evaluation of convergence properties.

Recap
The 1st-level schedulers provide effective control over resource consumption at HBase and HDFS. The 2nd-level scheduler decides the allocations at the 1st-level schedulers and enforces the front-end client's high-level SLO.

Evaluation
Consolidated workloads: a front-end YCSB client issuing 1-row gets and a batch YCSB/MapReduce client issuing 500-row scans, running on c1.xlarge EC2 instances.

Diurnal Workload
The front-end runs a web-serving workload while a batch YCSB client runs at maximum throughput. This evaluates adaptation to a dynamic workload and the latency vs. throughput tradeoff.

[Figure: the diurnal front-end workload varies 3x peak-to-trough.]

Front-end 99th SLO   % Meeting SLO   Batch Throughput
100 ms               98.77%          23 req/s
150 ms               99.72%          38 req/s
200 ms               99.88%          45 req/s

Analysis Cycle
A 20-node cluster; the front-end YCSB client runs the diurnal pattern while a batch MapReduce job scans over 386 GB of data. This evaluates analytics time, provisioning cost, and SLO compliance.

Scenario         Time     Speedup   Nodes
Unconsolidated   1722 s   1.0x      40
Consolidated     1030 s   1.7x      20

Front-end 99th SLO   % Requests Meeting SLO
100 ms               99.95%

Evaluation Summary
Cake adapts to dynamic front-end workloads, and setting different SLOs flexibly moves the system along the latency vs. throughput tradeoff. Consolidation yields significant gains: a 1.7x speedup in analysis time and a 50% reduction in provisioning cost.

Conclusion
Cake is a coordinated, multi-resource scheduler for storage workloads that directly enforces high-level SLOs: high-percentile latency targets on high-level storage operations. Consolidation brings significant benefits in both provisioning cost and analytics time.
