Hopkins Storage Systems Lab, Department of Computer Science Network-Aware Join Processing in Global-Scale Database Federations X. Wang, R. Burns, A. Terzis.

Presentation transcript:

Hopkins Storage Systems Lab, Department of Computer Science Network-Aware Join Processing in Global-Scale Database Federations X. Wang, R. Burns, A. Terzis Johns Hopkins University A. Deshpande University of Maryland

Time/Cost Trade-off for Reaching Isla Mujeres
[Figure: route map with stops You are here, Playa Tortuga, Puerto Juarez, Downtown, Isla Mujeres. Leg labels: 5 min, 7p; 35 min, 150p; 20 min, 7p; 5 min, 30p; 30 min, 70p. Route totals: 55 min, 107 pesos; 40 min, 157 pesos]

Outline
Target Application
  – Join scheduling in SkyQuery
  – Incorporating network structure
Balanced network utilization metric
  – Exploit high throughput paths
  – Limitations
Algorithms
  – Two-approximate, MST-based solution
  – Heuristic extensions (clustering, semi-joins, bushy plans)

SkyQuery
Publicly accessible federation of sky surveys (a virtual telescope)
Autonomous, heterogeneous, and geographically distributed sites (30 across NA, EA, EU)
Data-intensive workload
  – Terabyte data sets
  – Hundred megabyte intermediate join results
  – Queries take ten to over a hundred seconds
  – Network transfers consume up to 70% of the time
Principal federated query is cross-match

Cross-Match Queries
Join by increasing cardinality (count *)
  – Minimal I/O
  – Fewer bytes on the network
[Figure labels: Query Mediator, Probe Query, Result, Count: 30, Count: 100, Count: 800]
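Ordering a cross-match by increasing cardinality can be illustrated with a minimal sketch. It assumes the mediator has already collected one probe count per site; the site names and the function `order_by_cardinality` are hypothetical, and only the three counts come from the slide.

```python
from typing import Dict, List


def order_by_cardinality(probe_counts: Dict[str, int]) -> List[str]:
    """Return site names sorted by ascending probe count (ties broken by name)."""
    return sorted(probe_counts, key=lambda site: (probe_counts[site], site))


# Hypothetical site names; the counts are the ones shown on the slide.
counts = {"site_a": 800, "site_b": 30, "site_c": 100}
plan = order_by_cardinality(counts)
print(plan)  # ['site_b', 'site_c', 'site_a']
```

Starting at the smallest site keeps the intermediate result that travels to the next site as small as possible, which is the motivation the slide gives for fewer bytes on the network.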

Incorporating Network Structure

Balanced Network Utilization Metric
Exploit excess capacity and avoid long-haul paths
Minimizes aggregate time on the network
Similar metrics used for stream-processing, multicast, and optimal link layer routing (Bertsekas & Gallager)
Minimizes response time for serial schedules
  – Avoid over-utilizing resources for bushy schedules
  – Does not account for I/O
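The slide describes the metric only as total time spent on the network. One plausible reading, assumed here and not confirmed by the slides, is that a serial schedule costs the sum over its hops of bytes shipped divided by the throughput of that hop. The function `network_time`, the site names, and all numbers below are hypothetical.

```python
from typing import Dict, List, Tuple


def network_time(schedule: List[str],
                 result_bytes: List[float],
                 throughput: Dict[Tuple[str, str], float]) -> float:
    """Sum transfer times (bytes / bytes-per-second) over consecutive hops.

    Assumption: one intermediate result is shipped per hop, and I/O cost
    is ignored, as the slide notes the metric does not account for I/O.
    """
    if len(result_bytes) != len(schedule) - 1:
        raise ValueError("need one intermediate result size per hop")
    hops = zip(schedule, schedule[1:])
    return sum(size / throughput[hop] for hop, size in zip(hops, result_bytes))


# Hypothetical throughputs in bytes per second.
tput = {("A", "B"): 10e6, ("B", "C"): 1e6, ("A", "C"): 2e6, ("C", "B"): 1e6}
sizes = [100e6, 20e6]  # bytes shipped on the first and second hop

fast_first = network_time(["A", "B", "C"], sizes, tput)
slow_first = network_time(["A", "C", "B"], sizes, tput)
print(fast_first, slow_first)
```

With these made-up numbers the schedule that sends the large first result over the high-throughput link costs 30 seconds against 70, which is the kind of difference a network-aware scheduler is meant to exploit.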

How to Extract Network Structure?

Volatility in TCP Throughput

Limitations
Perfect join selectivity assumption
  – Observations against the same sky
  – Allows for polynomial-time solutions
No attribute aggregation
  – Address heuristically
Local optimizations at the mediators
  – Decentralized to achieve scale using aggregate stats
  – Routing at the application layer
  – Improve end performance and preserve I/O

Spanning Tree Approximation (STA)
[Figure: graph over sites A through H]

STA: Find MST
[Figure: graph over sites A through H]

STA: Join Using Paths on the MST
[Figure: graph over sites A through H]

STA: Shortcutting in Metric Regions
[Figure: graph over sites A through H]
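The STA slides name three steps: find an MST, join along MST paths, and shortcut where the costs are metric. The sketch below follows the classic MST-based two-approximation pattern (Prim's algorithm, then a preorder walk that skips already-visited sites). It is an illustration of that general technique, not the authors' implementation; the graph, its costs, and all function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # symmetric edge costs between sites


def build_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (u, v, cost) triples."""
    graph: Graph = {}
    for u, v, cost in edges:
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


def minimum_spanning_tree(graph: Graph, root: str) -> Dict[str, List[str]]:
    """Prim's algorithm; returns the MST as a parent -> children map."""
    tree: Dict[str, List[str]] = {node: [] for node in graph}
    visited = {root}
    frontier = [(cost, root, nbr) for nbr, cost in graph[root].items()]
    heapq.heapify(frontier)
    while frontier and len(visited) < len(graph):
        _, parent, node = heapq.heappop(frontier)
        if node in visited:
            continue  # stale entry: node was reached by a cheaper edge
        visited.add(node)
        tree[parent].append(node)
        for nbr, cost in graph[node].items():
            if nbr not in visited:
                heapq.heappush(frontier, (cost, node, nbr))
    return tree


def preorder_with_shortcuts(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Preorder walk of the tree; each site appears once, so backtracking
    along tree edges is replaced by a direct hop to the next unvisited site."""
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree[node]))
    return order


# Hypothetical costs that satisfy the triangle inequality.
g = build_graph([("A", "B", 1), ("B", "C", 2), ("A", "C", 3),
                 ("A", "D", 3), ("B", "D", 4), ("C", "D", 5)])
mst = minimum_spanning_tree(g, "A")
print(preorder_with_shortcuts(mst, "A"))  # ['A', 'B', 'C', 'D']
```

The shortcut step is only safe under the triangle inequality: a direct hop then never costs more than the tree path it replaces, which is what bounds the walk at twice the MST weight.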

C-STA: Clustering TCP Throughput

C-STA: Clustering TCP Throughput

C-STA: Combine STA & Count *
[Figure: graph over sites A through H]

STA-SJ: Semi-joins and Attribute Agg.
[Figure: graph over sites A through H; labels: Aggregation, Join Attr.]

STA-BP: Exploring Bushy Plans
Poly-time DP algorithm that explores bushy plans using MST paths
  – Evaluates regions in parallel when beneficial (avoids sending data down the tree)
  – May operate on larger intermediate results
Intuition: Do not need to traverse STA paths twice if sites have low cardinality
[Figure labels: R ≤ 2R, R > 2R]

Experiments: Network Utilization

Experiments: I/O Overhead

Experiments: Algorithms Compared

Discussion
DP solution w/o selectivity, aggregation, MST-based assumptions
  – T: O(n·3^n), S: O(n·2^n)
  – Applicability beyond SkyQuery (distributed OLAP/DSS)
  – May tolerate exponential complexity
  – Value in capturing network structure
Don’t address multi-query optimization
  – Incomplete info about link layer
  – Global knowledge incurs high overhead

Which Path to Choose?
[Figure: route map with stops You are here, Playa Tortuga, Puerto Juarez, Downtown, Isla Mujeres. Leg labels: 20 min, 7p; 5 min, 30p; 30 min, 70p. Route total: 55 min, 107 pesos]

Questions?