Geo-distributed Data Analytics
Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica
Global DC Footprints
Apps are deployed in many DCs and edges
– To provide low-latency access
– Streaming videos, search, communication (OTT)
Analyze data logs in geo-distributed DCs
– 99th percentile of search query latency
– Top-k congested ISPs (for re-routing calls)
“Centralized” Data Analytics Paradigm
[Figure: perf. counters and user activities from Seattle, Berkeley, Beijing, and London are all shipped over the WAN to a single site; wasteful & slow.]
Geo-distributed Data Analytics
A single logical analytics cluster across all sites (Seattle, Berkeley, Beijing, London, connected over the WAN)
Unchanged intra-DC queries run geo-distributed
A single logical analytics system across all sites
Incorporating WAN bandwidths is key to geo-distributed analytics performance.
Inter-DC WAN Bandwidths
Heterogeneous and (relatively) limited
– Higher utilization than intra-DC bandwidths
Growth is slowing down (expensive to lay)
– Costly to use inter-DC links
Used by intermediate tasks for data transfer
– E.g., reduce shuffles or join operators
Geo-distributed Query Execution
A query is a graph of dependent tasks
[Figure: intermediate outputs (40GB, 20GB, 40GB) at Seattle and London are shuffled across the WAN.]
Incorporating WAN Bandwidths
Task placement
– Decides the destinations of network transfers
Data placement
– Decides the sources of network transfers
Example Query [1]
SELECT time_window, percentile(latency, 99) GROUP BY time_window
[Figure: intermediate data of 40GB, 20GB, and 40GB spread across Seattle and London; WAN links of 800 MB/s and 200 MB/s.]
Query [1]: Impact of Task Placement
[Table: task fractions and input data (GB) at Seattle and London, with the resulting upload and download times (s); bandwidth-aware task placement shortens the longest transfer by 2.5x.]
How to generally solve for any #sites, BW heterogeneity, and data skew?
Task Placement (TP Solver)
Inputs: M sites, N tasks, an MxN data matrix, upload BWs, download BWs
Output: a task-to-site assignment (e.g., Task 1 -> London, Task 2 -> Beijing, Task 5 -> London, ...)
Optimization goal: minimize the longest transfer across all links (formulated as the linear program sketched below)
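The TP Solver can be cast as a small linear program. Below is a minimal sketch in Python/SciPy, assuming a reduce-style shuffle: r_i is the fraction of tasks placed at site i, z bounds the longest upload/download transfer, and the data sizes and bandwidths are illustrative placeholders rather than the paper's numbers.

import numpy as np
from scipy.optimize import linprog

D = np.array([40.0, 20.0, 40.0])   # intermediate data per site (GB); illustrative
U = np.array([0.8, 0.2, 0.5])      # upload bandwidth per site (GB/s); illustrative
W = np.array([0.8, 0.2, 0.5])      # download bandwidth per site (GB/s); illustrative
M = len(D)
total = D.sum()

# Variables x = [r_1, ..., r_M, z]; minimize z, the longest transfer time.
c = np.zeros(M + 1)
c[-1] = 1.0

A_ub, b_ub = [], []
for i in range(M):
    # Upload at site i: D_i * (1 - r_i) / U_i <= z
    row = np.zeros(M + 1); row[i] = -D[i] / U[i]; row[-1] = -1.0
    A_ub.append(row); b_ub.append(-D[i] / U[i])
    # Download at site i: (total - D_i) * r_i / W_i <= z
    row = np.zeros(M + 1); row[i] = (total - D[i]) / W[i]; row[-1] = -1.0
    A_ub.append(row); b_ub.append(0.0)

A_eq = [np.append(np.ones(M), 0.0)]  # task fractions sum to 1
b_eq = [1.0]
bounds = [(0, 1)] * M + [(0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("task fractions:", res.x[:M])
print("longest transfer (s):", res.x[-1])

The resulting per-site fractions place more tasks at well-connected sites, and the value of z identifies the bottleneck link that data placement then tries to relieve.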
Query [2]: Impact of Data Placement
[Table: input data (100GB, 40GB, 160GB), task fractions, and upload/download times (s) for Seattle and London; moving data during the query lag, before the query arrives, improves response time by 2x.]
How to jointly optimize data and task placement?
Iridium
Goal: improve query response time
Approach: jointly optimize data and task placement with a greedy heuristic
Constraints: bandwidth, query lags, etc.
Iridium with a Single Dataset
Iterative heuristic for joint task-data placement (see the sketch below):
1. Identify the bottleneck site by solving task placement (TP Solver)
2. Assess: find the amount of data to move to alleviate the current bottleneck
Repeat until the query arrives.
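A hedged sketch of this loop, assuming a helper solve_task_placement(D, U, W) that wraps the LP above and returns (task fractions, longest transfer time, index of the bottleneck site); the chunk size and the improvement test are illustrative choices, not the paper's exact mechanics.

def place_data(D, U, W, chunk=1.0, max_steps=1000):
    """D: list of data sizes per site (GB); U, W: up/down bandwidths (GB/s)."""
    for _ in range(max_steps):           # in Iridium, iteration stops when the query arrives
        _, cur_time, bottleneck = solve_task_placement(D, U, W)
        best_time, best_dst = cur_time, None
        for dst in range(len(D)):
            if dst == bottleneck or D[bottleneck] < chunk:
                continue
            # Tentatively move one chunk of data off the bottleneck site.
            D[bottleneck] -= chunk; D[dst] += chunk
            _, new_time, _ = solve_task_placement(D, U, W)
            D[bottleneck] += chunk; D[dst] -= chunk
            if new_time < best_time:
                best_time, best_dst = new_time, dst
        if best_dst is None:
            break                         # no single move helps; stop
        D[bottleneck] -= chunk            # commit the best move
        D[best_dst] += chunk
    return D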
Iridium with Multiple Datasets
Prioritize high-value datasets (scored as in the sketch below): score = value x urgency / cost
– value = sum(timeReduction) over all queries
– urgency = 1 / avg(query_lag)
– cost = amount of data moved
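The score follows directly from the slide's formula; the field names on the candidate move below are assumptions for illustration.

def score(move):
    value = sum(move.time_reduction_per_query)   # total response-time savings across queries
    urgency = 1.0 / move.avg_query_lag           # datasets queried sooner rank higher
    cost = move.bytes_moved                      # WAN bytes spent on the move
    return value * urgency / cost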
Iridium: Putting It Together
Placement of data (before query arrival)
– Prioritize moves of high-value datasets
Placement of tasks (during query execution)
– Constrained TP Solver
See paper for: estimation of query arrivals, contention between data moves and queries, etc.
Evaluation
Spark and HDFS
– Override Spark's task scheduler with ours
– Data placement creates copies in cross-site HDFS
Geo-distributed deployment across 8 regions
– Tokyo, Singapore, Sydney, Frankfurt, Ireland, São Paulo, Virginia (US), and California (US)
Workloads: Spark jobs, SQL queries, and streaming queries
– Conviva: video session parameters
– Bing Edge: running dashboard, streaming
– TPC-DS: decision-support queries for retail
– AMP BDB: mix of Hive and Spark queries
Baselines:
– "In-place": leave data unmoved + Spark's scheduling
– "Centralized": aggregate all data onto one site
How well does Iridium perform?
Iridium Outperforms the Baselines
[Bar chart: reduction (%) in query response time for Conviva, Bing-Edge, TPC-DS, and Big-Data Benchmark.]
vs. In-place: 4x-19x speedup (10x, 19x, 7x, 4x across the four workloads)
vs. Centralized: 3x-4x speedup
Iridium subsumes both baselines!
vs. Centralized: data placement has the higher contribution
vs. In-place: roughly equal contributions from the two techniques

Median reduction (%) in query response time:
                  Vs. Centralized   Vs. In-place
Task placement         18%              24%
Data placement         38%              30%
Iridium (both)         75%              63%
Bandwidth Cost
Using the inter-DC WAN has a cost ($/byte); #bytes sent should be low
Budgeted scheme (sketched below):
1. Bandwidth-optimal scheme (MinBW)
2. Budget of B * MinBW; B ≥ 1
3. Never violate the budget at any point in time
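A minimal sketch of enforcing such a budget, assuming MinBW (the WAN bytes of a bandwidth-optimal placement) is computed separately; the class and method names are illustrative.

class BandwidthBudget:
    def __init__(self, min_bw_bytes, B=1.3):
        self.budget = B * min_bw_bytes   # B >= 1; never exceeded at any point in time
        self.used = 0

    def admit(self, bytes_to_move):
        """Admit a data move only if cumulative WAN usage stays within the budget."""
        if self.used + bytes_to_move > self.budget:
            return False                 # reject: would violate the budget
        self.used += bytes_to_move
        return True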
[Plot: reduction (%) in query response time vs. reduction (%) in WAN usage for budgets 1*MinBW, 1.3*MinBW, and 1.5*MinBW; one labeled point at (64%, 19%).]
MinBW: a scheme that minimizes bandwidth usage
Iridium: budget the bandwidth usage to be B * MinBW
Iridium can speed up queries while using near-optimal WAN bandwidth cost
Current Limitations
Modeling:
– Compute constraints, complex DAGs
– WAN topologies beyond a congestion-free core
– What is the optimal placement?
Bandwidth cost vs. latency tradeoff with "look-ahead" capability
Hierarchical GDA (semi-autonomy per site)
Network-aware query optimizer
Related Work
JetStream (NSDI'14)
– Data aggregation and adaptive filtering
– Does not support arbitrary queries, nor does it optimize task and data placement
WANalytics (CIDR'15), Geode (NSDI'15)
– Optimize BW usage for SQL & general DAG jobs
– Can lead to poor query response times
Low Latency Geo-distributed Data Analytics
Data is geographically distributed
– Services with global footprints
– Analyze logs across DCs: "99th percentile movie rating", "Median Skype call setup latency"
Abstraction: a single logical analytics cluster across all sites
Incorporating WAN bandwidths reduces response time over baselines by 3x - 19x