Low Latency Geo-distributed Data Analytics
Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica
Geo-distributed Data Analytics
Sites connected over the WAN: Seattle, Berkeley, Beijing, London
Data generated at each site: perf. counters, user activities, …
“Centralized” data analytics paradigm: slow and wasteful
A single logical analytics cluster across all sites.
[Figure: Seattle, Berkeley, Beijing and London connected over the WAN]
A single logical analytics system across all sites.
Incorporating WAN bandwidths is key to geo-distributed analytics performance.
Incorporating WAN bandwidths
- Task placement: decides the destinations of network transfers
- Data placement: decides the sources of network transfers
Example Analytics Job
SELECT time_window, percentile(latency, 99) GROUP BY time_window
[Figure: Seattle and London connected over the WAN; labels in order: 40GB, 20GB, 40GB, 800 MB/s, 200 MB/s]
Calculating Transfer Time
[Table: Input Data (GB), Task Fractions, Upload Time (s) and Download Time (s) for Seattle and London; surviving values: 40, 12.5s, 50s, 2.5s, 2.5x]
How to solve the general case, with more sites, BW heterogeneity and data skew?
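A minimal sketch of the kind of transfer-time arithmetic this slide walks through. It assumes a simple model that the slide itself does not spell out: each site holds some data, runs a fraction of the tasks, uploads the share of its data consumed elsewhere, and downloads the share of remote data consumed locally. The function names, the decimal GB-to-MB conversion and the example numbers are illustrative assumptions, not the values from the talk.

```python
from typing import List, Sequence, Tuple

MB_PER_GB = 1000  # assumption: decimal units (1 GB = 1000 MB)


def transfer_times(
    data_gb: Sequence[float],
    up_mb_s: Sequence[float],
    down_mb_s: Sequence[float],
    task_fractions: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return per-site (upload_times, download_times) in seconds."""
    total_gb = sum(data_gb)
    uploads: List[float] = []
    downloads: List[float] = []
    for s, u, d, r in zip(data_gb, up_mb_s, down_mb_s, task_fractions):
        # Data held at this site but consumed by tasks placed elsewhere.
        uploads.append((1.0 - r) * s * MB_PER_GB / u)
        # Data held at other sites but consumed by tasks placed here.
        downloads.append(r * (total_gb - s) * MB_PER_GB / d)
    return uploads, downloads


def bottleneck_s(
    data_gb: Sequence[float],
    up_mb_s: Sequence[float],
    down_mb_s: Sequence[float],
    task_fractions: Sequence[float],
) -> float:
    """Longest single upload or download across all sites, in seconds."""
    uploads, downloads = transfer_times(data_gb, up_mb_s, down_mb_s, task_fractions)
    return max(uploads + downloads)


# Hypothetical two-site example (not the numbers on the slide).
data_gb = [30.0, 10.0]
up_mb_s = [100.0, 500.0]
down_mb_s = [100.0, 500.0]
even = bottleneck_s(data_gb, up_mb_s, down_mb_s, [0.5, 0.5])
skewed = bottleneck_s(data_gb, up_mb_s, down_mb_s, [0.9, 0.1])
print(f"even split: {even:.1f}s, skewed split: {skewed:.1f}s")
```

Under these assumed numbers, placing more tasks at the site with the slow links shortens the bottleneck (roughly 150 s for the even split versus 90 s for the skewed one), because less data has to cross the slow uplink.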
Task Placement (TP Solver)
Inputs: M sites, N tasks, data matrix (MxN), upload BWs, download BWs
Output: Task 1 -> London, Task 2 -> Beijing, Task 5 -> London, …
Optimization goal: minimize the longest transfer of all links
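A minimal sketch of the optimization goal, "minimize the longest transfer of all links". The slides do not describe the solver's internals, so everything here is an assumption: the sketch works with a fraction of tasks per site rather than a per-task assignment over the MxN data matrix, it uses the simple upload/download model in the comments, and it finds the optimum by bisecting on a target transfer time instead of whatever method the TP Solver actually uses. The names `fractions_for_deadline` and `place_tasks` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def fractions_for_deadline(
    data: Sequence[float], up: Sequence[float], down: Sequence[float], deadline: float
) -> Optional[List[float]]:
    """Task fractions that finish every transfer within `deadline`, or None."""
    total = sum(data)
    lows: List[float] = []
    highs: List[float] = []
    for s, u, d in zip(data, up, down):
        # Upload limit: (1 - r) * s / u <= deadline gives a lower bound on r.
        low = max(0.0, 1.0 - deadline * u / s) if s > 0 else 0.0
        # Download limit: r * (total - s) / d <= deadline gives an upper bound on r.
        others = total - s
        high = min(1.0, deadline * d / others) if others > 0 else 1.0
        if low > high:
            return None
        lows.append(low)
        highs.append(high)
    if sum(lows) > 1.0 or sum(highs) < 1.0:
        return None  # the fractions cannot sum to 1 within the bounds
    # Start at the lower bounds and hand out the remaining share greedily.
    slack = 1.0 - sum(lows)
    fractions: List[float] = []
    for low, high in zip(lows, highs):
        extra = min(high - low, slack)
        fractions.append(low + extra)
        slack -= extra
    return fractions


def place_tasks(
    data: Sequence[float], up: Sequence[float], down: Sequence[float], iterations: int = 60
) -> Tuple[float, List[float]]:
    """Bisect on the deadline; return (bottleneck time, task fractions)."""
    total = sum(data)
    lo = 0.0
    # This deadline is long enough for any placement, so it is always feasible.
    hi = max([s / u for s, u in zip(data, up)] + [(total - s) / d for s, d in zip(data, down)])
    best = fractions_for_deadline(data, up, down, hi)
    if best is None:
        raise ValueError("no feasible placement found for the initial deadline")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        candidate = fractions_for_deadline(data, up, down, mid)
        if candidate is not None:
            hi, best = mid, candidate
        else:
            lo = mid
    return hi, best


# Hypothetical inputs: data in GB, bandwidths in GB/s.
time_s, fractions = place_tasks([30.0, 10.0], [0.1, 0.5], [0.1, 0.5])
print(f"bottleneck ~{time_s:.2f}s with task fractions {[round(f, 3) for f in fractions]}")
```

Bisection works here because, in this assumed model, each site's constraints only bound its own fraction, so feasibility for a given deadline reduces to checking per-site intervals and whether they can sum to 1. A linear-programming formulation would express the same goal.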
Another example
[Table: Input Data (GB), Task Fractions, Upload Time (s) and Download Time (s) for Seattle and London, with a Query Lag; surviving values: 100GB, 40GB, 160GB, 100, 50s, 6.25s, 6s, 50s, 2x]
How to jointly optimize data and task placement?
Iridium
Approach: jointly optimize data and task placement with a greedy heuristic
Goal: improve query response time
Constraints: bandwidth, query arrivals, etc.
Iridium with Single Dataset
Iterative heuristic for joint task-data placement. Repeat until the query arrives:
1. Identify bottlenecks by solving task placement (TP Solver).
2. Assess: find the amount of data to move to alleviate the current bottleneck.
Iridium with Multiple Datasets
Prioritize high-value datasets: score = value x urgency / cost
- value = sum(timeReduction) for all queries
- urgency = 1/avg(query_lag)
- cost = amount of data moved
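The scoring rule above can be sketched as follows. The formula is the one on the slide; the `DatasetMove` record, its field names, the units and the example numbers are hypothetical, and the sketch assumes every candidate has at least one query lag and a positive amount of data to move.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetMove:
    """A candidate data move for one dataset (hypothetical record)."""
    name: str
    time_reductions_s: List[float]  # estimated time saved per query
    query_lags_s: List[float]       # lag until each query arrives
    data_moved_gb: float            # amount of data the move transfers


def score(move: DatasetMove) -> float:
    """score = value x urgency / cost, as defined on the slide."""
    value = sum(move.time_reductions_s)
    avg_lag = sum(move.query_lags_s) / len(move.query_lags_s)
    urgency = 1.0 / avg_lag
    cost = move.data_moved_gb
    return value * urgency / cost


def prioritize(moves: List[DatasetMove]) -> List[DatasetMove]:
    """Order candidate moves from highest to lowest score."""
    return sorted(moves, key=score, reverse=True)


# Illustrative numbers only.
a = DatasetMove("A", [10.0, 20.0], [60.0, 120.0], 5.0)
b = DatasetMove("B", [5.0], [10.0], 1.0)
for move in prioritize([a, b]):
    print(move.name, round(score(move), 4))
```

In this example B ranks first even though A saves more total time, because B's query arrives sooner and its move is cheaper.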
Iridium: putting it together
Placement of data
- Before query arrival
- Prioritize moving high-value datasets
Placement of tasks
- During query execution
- Constrained solver (TP Solver)
Not covered: estimation of query arrivals, contention between data movement and queries, etc.
Evaluation
Spark and HDFS
- Override Spark’s task scheduler with ours
- Data placement creates copies in cross-site HDFS
Geo-distributed EC2 deployment across 8 regions
- Tokyo, Singapore, Sydney, Frankfurt, Ireland, Sao Paulo, Virginia (US) and California (US)
Spark jobs, SQL queries and streaming queries
- Conviva: video session parameters
- Bing Edge: running dashboard, streaming
- TPC-DS: decision support queries for retail
- AMP BDB: mix of Hive and Spark queries
Baselines:
- “In-place”: leave data unmoved + Spark’s scheduling
- “Centralized”: aggregate all data onto one site
How well does Iridium perform?
Iridium outperforms
- vs. In-place: 4x-19x
- vs. Centralized: 3x-4x
[Chart: Reduction (%) in Query Response Time for Conviva, Bing-Edge, TPC-DS and Big-Data; bar labels: 10x, 19x, 7x, 4x and 3x, 4x, 3x]
Iridium subsumes both baselines!
vs. Centralized: data placement has higher contribution
vs. In-place: equal contributions from the two techniques

Median Reduction (%)   Vs. Centralized   Vs. In-place
Task placement         18%               24%
Data placement         38%               30%
Iridium (both)         75%               63%
Bandwidth Cost
MinBW: a scheme that minimizes bandwidth usage, to Bmin
Iridium: budget the bandwidth usage to be m*Bmin
[Chart: Reduction (%) in WAN Usage; points labeled 1xBmin, 1.3xBmin, 1.5xBmin; annotation "(64%, 19%) better"]
Iridium can speed up queries while using near-optimal bandwidth cost
Related work
JetStream (NSDI’14)
- Data aggregation and adaptive filtering
- Does not support arbitrary queries, nor optimize task and data placement
WANalytics (CIDR’15), Geode (NSDI’15)
- Optimize BW usage for SQL & general DAG jobs
- Can lead to poor query response time
Low Latency Geo-distributed Data Analytics
Data is geographically distributed
- Services with global footprints
- Analyze logs across DCs: “99 percentile movie rating”, “Median Skype call setup latency”
Abstraction: single logical analytics cluster across all sites
- Incorporating WAN bandwidths
- Reduce response time over baselines by 3x-19x