Hopkins Storage Systems Lab, Department of Computer Science Network-Aware Join Processing in Global-Scale Database Federations X. Wang, R. Burns, A. Terzis.


1 Network-Aware Join Processing in Global-Scale Database Federations
X. Wang, R. Burns, A. Terzis (Johns Hopkins University)
A. Deshpande (University of Maryland)
Hopkins Storage Systems Lab, Department of Computer Science

2 Time/Cost Trade-off for Reaching Isla Mujeres
[Figure: route map. Place labels: You are here, Playa Tortuga, Puerto Juarez, Downtown, Isla Mujeres. Leg labels: 5 min, 7p; 35 min, 150p; 20 min, 7p; 5 min, 30p; 30 min, 70p. Route totals: 55 min, 107 pesos; 40 min, 157 pesos.]

3 Outline
Target Application
– Join scheduling in SkyQuery
– Incorporating network structure
Balanced network utilization metric
– Exploit high throughput paths
– Limitations
Algorithms
– Two-approximate, MST-based solution
– Heuristic extensions (clustering, semi-joins, bushy plans)

4 SkyQuery
Publicly accessible federation of sky surveys (a virtual telescope)
Autonomous, heterogeneous, and geographically distributed sites (30 across NA, EA, EU)
Data intensive workload
– Terabyte data sets
– Hundred megabyte intermediate join results
– Queries take ten to over a hundred seconds
– Network transfers consume up to 70% of the time
Principal federated query is cross-match

5 Cross-Match Queries
Join by increasing cardinality (count *)
– Minimal I/O
– Fewer bytes on the network
[Figure labels: Query, Mediator, Probe Query, Result, Count: 30, Count: 100, Count: 800]
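The count-based ordering on this slide can be sketched in a few lines. This is only an illustration, assuming each site has already answered a probe query with its count *. The site names and the function name `order_by_cardinality` are hypothetical; the counts are the three shown on the slide.

```python
def order_by_cardinality(counts):
    """Return site names sorted by ascending probe count (count *)."""
    return sorted(counts, key=counts.get)


# Hypothetical site names; counts taken from the slide (30, 100, 800).
probe_counts = {"site_x": 800, "site_y": 30, "site_z": 100}

# The serial join visits the smallest result first, so fewer bytes
# are shipped on the early hops.
plan = order_by_cardinality(probe_counts)
print(plan)  # ['site_y', 'site_z', 'site_x']
```

Note that this ordering looks only at cardinality and ignores which network links the data crosses.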

6 Incorporating Network Structure

7 Balanced Network Utilization Metric
Exploit excess capacity and avoid long haul paths
Minimizes aggregate time on the network
Similar metrics used for stream-processing, multicast, and optimal link layer routing (Bertsekas & Gallager)
Minimizes response time for serial schedules
– Avoid over-utilizing resources for bushy schedules
– Does not account for I/O
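One way to read "aggregate time on the network" for a serial schedule is the sum, over hops, of bytes shipped divided by link throughput. The sketch below assumes that reading; the slide does not give the exact formula. The function name, throughput table, and sizes are all hypothetical.

```python
def network_time(route, throughput, shipped_mb):
    """Total transfer time (seconds) for a serial schedule.

    route:      sites in visiting order
    throughput: dict mapping (src, dst) to MB/s (assumed known)
    shipped_mb: MB sent on each hop, one entry per hop
    """
    hops = zip(route, route[1:])
    return sum(mb / throughput[hop] for hop, mb in zip(hops, shipped_mb))


# Hypothetical symmetric throughputs in MB/s: A-C is a slow long-haul link.
links = {("A", "B"): 10.0, ("B", "C"): 10.0, ("A", "C"): 1.0}
throughput = {**links, **{(v, u): t for (u, v), t in links.items()}}

fast = network_time(["A", "B", "C"], throughput, [100, 100])  # 10 + 10
slow = network_time(["A", "C", "B"], throughput, [100, 100])  # 100 + 10
print(fast, slow)  # 20.0 110.0
```

With the same data volume per hop, the schedule that avoids the slow link spends far less time on the network, which is the effect the metric is meant to capture.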

8 How to Extract Network Structure?

9 Volatility in TCP Throughput

10 Limitations
Perfect join selectivity assumption
– Observations against the same sky
– Allows for polynomial-time solutions
No attribute aggregation
– Address heuristically
Local optimizations at the mediators
– Decentralized to achieve scale using aggregate stats
– Routing at the application layer
– Improve end performance and preserve I/O

11 Spanning Tree Approximation (STA)
[Figure: graph of sites A, B, C, D, E, F, G, H]

12 STA: Find MST
[Figure: graph of sites A, B, C, D, E, F, G, H with the MST highlighted]
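The MST step can be illustrated with Prim's algorithm. This is a generic sketch, not the authors' implementation: the four-site graph and its edge costs (for example, transfer time per MB) are hypothetical, and the slides do not say which MST algorithm is used.

```python
import heapq


def minimum_spanning_tree(weights, root):
    """Prim's algorithm on an undirected graph.

    weights: dict node -> {neighbor: cost}
    Returns a list of (parent, child, cost) tree edges, oriented away from root.
    """
    visited = {root}
    heap = [(cost, root, nbr) for nbr, cost in weights[root].items()]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(weights):
        cost, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale entry: v was reached by a cheaper edge
        visited.add(v)
        tree.append((u, v, cost))
        for nbr, c in weights[v].items():
            if nbr not in visited:
                heapq.heappush(heap, (c, v, nbr))
    return tree


# Hypothetical link costs between four sites.
weights = {
    "A": {"B": 1, "C": 4, "D": 3},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"A": 3, "B": 5, "C": 1},
}
mst = minimum_spanning_tree(weights, "A")
print(mst)  # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 1)]
```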

13 STA: Join Using Paths on the MST
[Figure: sites A, B, C, D, E, F, G, H with traversal steps numbered 1 to 13]

14 STA: Shortcutting in Metric Regions
[Figure: sites A, B, C, D, E, F, G, H with traversal steps numbered 1 to 10]
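Walking the MST and then skipping revisits is the classic shortcutting idea behind MST-based 2-approximations. The sketch below assumes link costs obey the triangle inequality everywhere, whereas the slide shortcuts only in metric regions. The tree, its costs, and the function name are hypothetical.

```python
def preorder_route(tree_edges, root):
    """Visit each site once, in depth-first preorder of the tree.

    tree_edges: list of (parent, child, cost), oriented away from root.
    A full walk along tree paths would backtrack through earlier sites;
    the preorder route jumps directly to the next unvisited site instead.
    """
    children = {}
    for parent, child, _ in tree_edges:
        children.setdefault(parent, []).append(child)
    route, stack = [], [root]
    while stack:
        node = stack.pop()
        route.append(node)
        # Reverse so the first-listed child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return route


# Hypothetical tree: A has children B and C; B has child D.
tree = [("A", "B", 1), ("A", "C", 2), ("B", "D", 1)]
print(preorder_route(tree, "A"))  # ['A', 'B', 'D', 'C']
```

In this example the unshortcut walk would go A, B, D, back through B and A, then C; the shortcut route goes from D straight to C.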

15 C-STA: Clustering TCP Throughput

16 C-STA: Clustering TCP Throughput

17 C-STA: Combine STA & Count *
[Figure: sites A, B, C, D, E, F, G, H with traversal steps numbered 1 to 9]
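The slides do not spell out how C-STA forms clusters or combines them with count * ordering. As one plausible reading, the sketch below groups sites connected by links above a throughput threshold and then orders the sites inside each cluster by ascending count. The threshold, site data, and function names are all assumptions made for illustration.

```python
def cluster_sites(sites, throughput, threshold):
    """Group sites connected by links with throughput >= threshold (union-find)."""
    parent = {s: s for s in sites}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for (u, v), mbps in throughput.items():
        if mbps >= threshold:
            parent[find(u)] = find(v)

    clusters = {}
    for s in sites:
        clusters.setdefault(find(s), []).append(s)
    return list(clusters.values())


def order_within_clusters(clusters, counts):
    """Inside each cluster, visit sites by ascending count *."""
    return [sorted(cluster, key=counts.get) for cluster in clusters]


# Hypothetical measurements: two well-connected pairs joined by slow links.
sites = ["A", "B", "C", "D"]
throughput = {("A", "B"): 50, ("C", "D"): 40, ("A", "C"): 2, ("B", "D"): 3}
counts = {"A": 800, "B": 30, "C": 100, "D": 60}

clusters = cluster_sites(sites, throughput, threshold=10)
print(clusters)                                 # [['A', 'B'], ['C', 'D']]
print(order_within_clusters(clusters, counts))  # [['B', 'A'], ['D', 'C']]
```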

18 STA-SJ: Semi-joins and Attribute Agg.
[Figure: sites A, B, C, D, E, F, G, H with traversal steps numbered 1 to 13; labels: Aggregation, Join Attr.]

19 STA-BP: Exploring Bushy Plans
Poly-time DP algorithm that explores bushy plans using MST paths
– Evaluates regions in parallel when beneficial (avoids sending data down the tree)
– May operate on larger intermediate results
Intuition: Do not need to traverse STA paths twice if sites have low cardinality
[Figure labels: R ≤ 2R, R > 2R]

20 Experiments: Network Utilization

21 Experiments: I/O Overhead

22 Experiments: Algorithms Compared

23 Discussion
DP solution w/o selectivity, aggregation, MST-based assumptions
– T: O(n·3^n), S: O(n·2^n)
– Applicability beyond SkyQuery (distributed OLAP/DSS)
– May tolerate exponential complexity
– Value in capturing network structure
Don’t address multi-query optimization
– Incomplete info about link layer
– Global knowledge incurs high overhead

24 Which Path to Choose?
[Figure: route map. Place labels: You are here, Playa Tortuga, Puerto Juarez, Downtown, Isla Mujeres. Leg labels: 20 min, 7p; 5 min, 30p; 30 min, 70p. Route total: 55 min, 107 pesos.]

25 Questions?
