Sparrow: Distributed Low-Latency Spark Scheduling
Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica

Outline: the Spark scheduling bottleneck; Sparrow's fully distributed, fault-tolerant technique; Sparrow's near-optimal performance.

Spark Today: a single Spark Context serves Users 1, 2, and 3, handling query compilation, storage, and scheduling for all of the workers.

Job Latencies Rapidly Decreasing: from roughly 10 minutes down toward 1 ms. 2004: MapReduce batch job; 2009: Hive query; 2010: Dremel query; 2010: in-memory Spark query; 2012: Impala query; 2013: Spark Streaming.

Job latencies rapidly decreasing

+ Spark deployments growing in size = Scheduling bottleneck!

Spark scheduler throughput: ~1,500 tasks/second. [Plot: cluster size (number of 16-core machines) the scheduler can sustain, for task durations of 10 seconds, 1 second, and 100 ms.]
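A back-of-the-envelope check of that plot (a sketch; only the 1,500 tasks/second figure and the 16-core machine size come from the slides, and the SchedulerCapacity object and maxMachines helper are illustrative): a 16-core machine running tasks of duration d seconds finishes 16/d tasks per second, so a scheduler launching 1,500 tasks per second can keep roughly 1500*d/16 machines busy.

    // Back-of-the-envelope: how many 16-core machines can one centralized Spark
    // scheduler keep busy at its measured throughput of ~1,500 task launches/second?
    object SchedulerCapacity extends App {
      val schedulerThroughput = 1500.0 // task launches per second (from the slide)
      val coresPerMachine     = 16.0

      // A machine running tasks of duration d seconds consumes 16 / d tasks per
      // second, so the scheduler saturates at throughput * d / cores machines.
      def maxMachines(taskDurationSec: Double): Double =
        schedulerThroughput * taskDurationSec / coresPerMachine

      Seq(10.0, 1.0, 0.1).foreach { d =>
        println(f"task duration ${d}%5.1f s -> ~${maxMachines(d)}%.0f machines")
      }
    }

With 10-second tasks one scheduler can feed on the order of a thousand machines, but with 100 ms tasks it saturates at roughly ten, which is why falling task durations turn the centralized scheduler into the bottleneck.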

Optimizing the Spark Scheduler: in 0.8, monitoring code was moved off the critical path; in 0.8.1, result deserialization was moved off the critical path. Future improvements may yield 2-3x higher throughput.

Is the scheduler the bottleneck in my cluster?

The cluster scheduler sends task launches to each worker and receives task-completion messages back.

The gap between a task completing and the next task launching on that worker is the scheduler delay: if workers sit idle waiting for new tasks, the scheduler is the bottleneck.
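A minimal sketch of that diagnostic (the Event case class, field names, and log format here are hypothetical, not Spark's actual event log): group launch/completion timestamps by worker and measure the gap from each completion to the next launch.

    // Hypothetical log format: (timeMs, workerId, kind) with kind "launch" or "complete".
    case class Event(timeMs: Long, workerId: String, kind: String)

    // For each worker, measure the gap from a task completing to the next task
    // launching there; large gaps mean workers idle waiting on the scheduler.
    def schedulerDelays(events: Seq[Event]): Seq[Long] =
      events
        .groupBy(_.workerId)
        .values
        .flatMap { perWorker =>
          val sorted = perWorker.sortBy(_.timeMs)
          sorted.sliding(2).collect {
            case Seq(Event(tDone, _, "complete"), Event(tLaunch, _, "launch")) =>
              tLaunch - tDone // ms the worker waited for its next task
          }
        }
        .toSeq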

Future Spark: each user has its own scheduler and query compilation, talking directly to the workers. Benefits: high throughput and fault tolerance.

Future Spark: storage is handled by Tachyon.

Scheduling with Sparrow: a Sparrow scheduler places each stage's tasks on the workers.

Batch Sampling: place the stage's m tasks on the least loaded of d·m randomly probed workers (4 probes for m = 2 tasks with d = 2).
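A sketch of batch sampling as just described (illustrative only; the batchSample and queueLength names are stand-ins for Sparrow's probe RPCs, not its API): probe d·m randomly chosen workers and assign the m tasks to the least-loaded m of them.

    import scala.util.Random

    // Batch sampling sketch: probe d*m random workers, then place the stage's m
    // tasks on the m least-loaded workers among those probed.
    def batchSample(workers: Seq[String], m: Int, d: Int,
                    queueLength: String => Int): Seq[String] = {
      val probed = Random.shuffle(workers.toList).take(d * m) // e.g. 4 probes for m = 2, d = 2
      probed.sortBy(queueLength).take(m)                      // the m least-loaded probed workers
    }

Sampling d·m probes for the whole stage at once, instead of d per task independently, lets the scheduler share probe information across tasks, which is the point of batch sampling.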

Queue length is a poor predictor of wait time: the tasks queued at a worker may run for 80 ms, 155 ms, or 530 ms, so sampling on queue length performs poorly on heterogeneous workloads.

Late Binding: place m tasks on the least loaded of d·m workers (4 probes, d = 2), but do not bind tasks to workers at probe time.

Late Binding: a probed worker requests a task from the scheduler when it is ready to run one; the scheduler assigns tasks to the first workers that ask.
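A sketch of late binding under those rules (illustrative; the LateBindingScheduler class and workerRequestsTask call are hypothetical names, not Sparrow's RPC interface): the probes leave placeholders in the d·m worker queues, and the scheduler hands out the m real tasks to the first m workers that reach a placeholder.

    import scala.collection.mutable

    // Late-binding sketch: tasks are bound to workers only when a worker is
    // actually free to run one and calls back to the scheduler.
    class LateBindingScheduler(stageTasks: Seq[String]) {
      private val pending = mutable.Queue(stageTasks: _*)

      // Invoked by a worker when it reaches one of this stage's placeholders.
      // The first m callers receive the m real tasks; later callers get None
      // and simply discard the placeholder.
      def workerRequestsTask(workerId: String): Option[String] = synchronized {
        if (pending.nonEmpty) Some(pending.dequeue()) else None
      }
    }

Because an idle worker, not a stale queue-length sample, triggers the assignment, heterogeneous task durations no longer mislead the scheduler.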

What about constraints?

Per-Task Constraints: for constrained tasks, the scheduler probes separately for each task rather than batch sampling across the stage.
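A sketch of that per-task probing (illustrative; probeTargets and the allowed-worker map are hypothetical): each constrained task draws its d probes only from the workers it is allowed to run on, for example the machines holding its input data.

    import scala.util.Random

    // Per-task constraints sketch: instead of one batch of d*m probes for the
    // stage, draw d probes per task from that task's own allowed worker set.
    def probeTargets(allowedWorkers: Map[String, Seq[String]], d: Int): Map[String, Seq[String]] =
      allowedWorkers.map { case (task, allowed) =>
        task -> Random.shuffle(allowed.toList).take(d)
      }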

Technique Recap: batch sampling + late binding + per-task constraints.

How well does Sparrow perform?

How does Sparrow compare to Spark's native scheduler? (16-core EC2 nodes, 10 tasks/job, 10 schedulers, 80% load)

TPC-H Queries: Background. TPC-H is a common benchmark for analytics workloads; the queries run on Shark (a SQL execution engine) over Spark, scheduled by Sparrow.

TPC-H Queries (16-core EC2 nodes, 10 schedulers, 80% load): across response-time percentiles, Sparrow is within 12% of an ideal scheduler, with a median queuing delay of 9 ms.

Policy Enforcement: each worker keeps separate queues per user (e.g., User A with a 75% share, User B with 25%) or per priority level (high vs. low). Fair shares: serve the queues using weighted fair queuing. Priorities: serve the queues based on strict priorities.
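A sketch of how a single worker might serve those queues (illustrative; the WorkerQueues class and its methods are hypothetical, and Sparrow enforces these policies at each worker without central coordination): under strict priorities, always serve the highest non-empty priority; under weighted fair sharing, serve the user whose launched-task count is furthest below its weight.

    import scala.collection.mutable

    // Per-worker queue policy sketch. Each user (or priority level) has its own
    // queue of reserved tasks on this worker.
    class WorkerQueues(weights: Map[String, Double]) {
      private val queues   = mutable.Map.empty[String, mutable.Queue[String]]
      private val launched = mutable.Map.empty[String, Long].withDefaultValue(0L)

      def enqueue(user: String, task: String): Unit =
        queues.getOrElseUpdate(user, mutable.Queue.empty[String]) += task

      // Strict priorities: serve the highest-priority non-empty queue.
      def nextByPriority(priorityOrder: Seq[String]): Option[String] =
        priorityOrder.find(u => queues.get(u).exists(_.nonEmpty))
          .map(u => queues(u).dequeue())

      // Weighted fair sharing: serve the user furthest below its weighted share
      // of tasks launched so far.
      def nextByFairShare(): Option[String] = {
        val eligible = queues.collect { case (u, q) if q.nonEmpty => u }
        if (eligible.isEmpty) None
        else {
          val user = eligible.minBy(u => launched(u) / weights(u))
          launched(user) += 1
          Some(queues(user).dequeue())
        }
      }
    }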

Weighted Fair Sharing

Fault Tolerance: if Scheduler 1 fails, Spark clients fail over to Scheduler 2. Detection timeout: 100 ms; failover: 5 ms; re-launching in-flight queries: 15 ms.
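A sketch of that client-side failover (illustrative; submitWithFailover and the submit callback are hypothetical, and only the 100 ms timeout comes from the slide): on timeout or error, the client moves to the next Sparrow scheduler in its list and re-submits the query.

    import scala.concurrent.duration._

    // Client-side failover sketch: try each known scheduler in turn, re-launching
    // the query after a timeout or error, until one accepts it.
    def submitWithFailover(schedulers: List[String], query: String,
                           submit: (String, String, FiniteDuration) => Boolean): Boolean =
      schedulers match {
        case Nil => false // no scheduler reachable
        case s :: rest =>
          val ok = try submit(s, query, 100.millis) catch { case _: Exception => false }
          ok || submitWithFailover(rest, query, submit)
      }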

Making Sparrow feature-complete: interfacing with the UI, delay scheduling, speculation.

Recap: (1) diagnosing a Spark scheduling bottleneck; (2) distributed, fault-tolerant scheduling with Sparrow.