Ten Thousand SQLs Kalmesh Nyamagoudar 2010MCS3494.


CONTENTS
- Example
- Definitions
- Algorithm
  - CN Generation
  - CN Evaluation: Sequential Algorithm, CLP: Naïve, CLP: New, OLP, DLP
- Performance Studies

BANKS Model
Steiner trees: Author1 - Paper1 - Author2; Author1 - Paper2 - Author2

DISCOVER Model
Schema: AUTHOR(TID, NAME), PAPER(TID, NAME), WRITES(TID, AID, PID), CITE(TID, PID1, PID2)
Joining networks of tuples:
- Author1 - (Author1: Paper1) - Paper1 - (Author2: Paper1) - Author2
- Author1 - (Author1: Paper2) - Paper2 - (Author2: Paper2) - Author2
Joining network of tuple sets: AUTHOR^{Author1} ⋈ WRITES^{} ⋈ PAPER^{} ⋈ WRITES^{} ⋈ AUTHOR^{Author2}
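A minimal sketch of how one joining network of tuple sets could be evaluated as a single SQL join, using the toy tuples from the slide in an in-memory SQLite database. The column layout, the `LIKE`-based keyword match, and the names `build_db`, `CN_SQL` and `evaluate_cn` are assumptions for illustration, not the queries used in the paper.

```python
import sqlite3

def build_db():
    """Create the toy AUTHOR / PAPER / WRITES tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE AUTHOR (TID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE PAPER  (TID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE WRITES (TID INTEGER PRIMARY KEY, AID INTEGER, PID INTEGER);
        INSERT INTO AUTHOR VALUES (1, 'Author1'), (2, 'Author2');
        INSERT INTO PAPER  VALUES (1, 'Paper1'), (2, 'Paper2');
        INSERT INTO WRITES VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 2, 2);
    """)
    return conn

# One SQL statement for AUTHOR^{kw1} join WRITES^{} join PAPER^{} join WRITES^{} join AUTHOR^{kw2}.
CN_SQL = """
SELECT a1.NAME, p.NAME, a2.NAME
FROM AUTHOR a1
JOIN WRITES w1 ON w1.AID = a1.TID
JOIN PAPER  p  ON p.TID  = w1.PID
JOIN WRITES w2 ON w2.PID = p.TID
JOIN AUTHOR a2 ON a2.TID = w2.AID
WHERE a1.NAME LIKE ? AND a2.NAME LIKE ?
ORDER BY p.NAME
"""

def evaluate_cn(conn, kw1, kw2):
    """Return the joining networks of tuples produced by the CN."""
    return conn.execute(CN_SQL, (f"%{kw1}%", f"%{kw2}%")).fetchall()

conn = build_db()
results = evaluate_cn(conn, "Author1", "Author2")
for row in results:
    print(" - ".join(row))
```

Each candidate network becomes one such statement, which is where the large number of SQL queries in the title comes from.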

Background: DISCOVER (October 13, 2011)

Background: DISCOVER. Schema graph (TPC-H)

Background: DISCOVER. Example data. Source: Discover [3]

Background: DISCOVER. Query: "Smith, Miller". Source: Discover [3]
Joining networks of tuples:
SIZE  RESULT
2     O1 - C1 - O2
4     O1 - C1 - N1 - C2 - O3

Background: DISCOVER. Joining network of tuple sets. Source: Discover [2]

Background: DISCOVER. Candidate Network (CN) Generation
- Complete: every possible MTJNT (minimal total joining network of tuples) is produced by a candidate network output by the algorithm
- Minimal: the algorithm does not produce any redundant candidate networks
- Example:
  - ORDERS^{Smith} ⋈ CUSTOMER^{} ⋈ ORDERS^{Miller}
  - ORDERS^{Smith} ⋈ CUSTOMER^{} ⋈ ORDERS^{Miller} ⋈ CUSTOMER^{}
  - ORDERS^{Smith} ⋈ CUSTOMER^{} ⋈ ORDERS^{}
  - ORDERS^{Smith} ⋈ LINEITEM^{} ⋈ ORDERS^{Miller}
- Tmax: maximum number of tuple sets in a CN
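The four example expressions can be screened with a simplified sketch of two pruning conditions from DISCOVER [2]: a leaf must not be a free tuple set (otherwise the network is not minimal), and a pattern R ← S → R is dropped when S holds a foreign key to R, since one S tuple cannot reference two different R tuples. The path-only representation, the `FOREIGN_KEYS` set and the function name are assumptions made for this illustration.

```python
# Hypothetical foreign-key edges of a TPC-H-like schema: (child, parent)
# means every child tuple references exactly one parent tuple.
FOREIGN_KEYS = {
    ("ORDERS", "CUSTOMER"),
    ("LINEITEM", "ORDERS"),
    ("CUSTOMER", "NATION"),
}

def is_valid_path_cn(path, tmax):
    """path is a list of (relation, keyword or None); None marks a free tuple set."""
    if len(path) > tmax:
        return False
    # Minimality: both ends of the path must be non-free tuple sets.
    if path[0][1] is None or path[-1][1] is None:
        return False
    # Adjacent tuple sets must be connected by a foreign key in the schema graph.
    for (r, _), (s, _) in zip(path, path[1:]):
        if (r, s) not in FOREIGN_KEYS and (s, r) not in FOREIGN_KEYS:
            return False
    # Pruning: R <- S -> R with S referencing R would force the same R tuple twice.
    for (r1, _), (s, _), (r2, _) in zip(path, path[1:], path[2:]):
        if r1 == r2 and (s, r1) in FOREIGN_KEYS:
            return False
    return True

examples = [
    [("ORDERS", "Smith"), ("CUSTOMER", None), ("ORDERS", "Miller")],
    [("ORDERS", "Smith"), ("CUSTOMER", None), ("ORDERS", "Miller"), ("CUSTOMER", None)],
    [("ORDERS", "Smith"), ("CUSTOMER", None), ("ORDERS", None)],
    [("ORDERS", "Smith"), ("LINEITEM", None), ("ORDERS", "Miller")],
]
verdicts = [is_valid_path_cn(p, tmax=4) for p in examples]
print(verdicts)
```

Under these two rules only the first expression survives; the check that every query keyword is covered (totality) is left out of the sketch.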

CN Generation (figures). Source: Discover [2]

CN Evaluation

Sequential Algorithm: Example. Source: TTS [1]
Dataset: DBLP, with schema AUTHOR(TID, NAME), PAPER(TID, NAME), WRITE(TID, AID, PID), CITE(TID, PID1, PID2)

CN Evaluation: state-of-the-art sequential algorithm

Sequential Algorithm: Execution Graph. Source: TTS [1]

New Solution
- Use of multi-core architecture
- Why not existing parallel multi-query processing?
  - Large number of queries
  - Large sharing between queries
  - Large intermediate results
- What do we need on multi-core architectures?
  - CNs on the same core share the most computational cost
  - CNs on different cores share the least computational cost
  - Handle high workload skew
  - Handle errors caused by estimation adaptively

CN Level Parallelism (CLP): Straightforward Approach. Source: TTS [1]
- Largest-first rule: take the CNs in decreasing order of cost and assign each one to the partition with the least workload
- Final cost: max(cost of each core) = 1949
- Selecting the core: O(n)
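The largest-first rule can be sketched as a greedy assignment. The CN names and costs below are invented for illustration and do not reproduce the figure of 1949 from the slide. The sketch also keeps the cores in a min-heap, which is a substitution for the O(n) scan over cores mentioned on the slide.

```python
import heapq

def largest_first_partition(cn_costs, num_cores):
    """Assign CNs to cores, largest cost first, always to the least-loaded core."""
    heap = [(0, core) for core in range(num_cores)]  # (current load, core id)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}
    for cn, cost in sorted(cn_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, core = heapq.heappop(heap)             # least-loaded core
        assignment[core].append(cn)
        heapq.heappush(heap, (load + cost, core))
    final_cost = max(load for load, _ in heap)       # max(cost of each core)
    return assignment, final_cost

# Hypothetical estimated costs, ignoring any sharing between CNs.
costs = {"CN1": 7, "CN2": 5, "CN3": 4, "CN4": 3, "CN5": 1}
assignment, final_cost = largest_first_partition(costs, num_cores=2)
print(assignment, final_cost)
```

Because every CN is costed in isolation, this baseline ignores sub-expressions that several CNs have in common.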

CLP: Sharing-Aware CN Partitioning
- Which CN to distribute first? The one with the largest non-shared (extra) cost.
- To which partition? The one with maximum sharing, provided this does not destroy the workload balance.
- Total cost for a partition = cost after sharing sub-expressions among all CNs in that partition
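A rough sketch of the sharing-aware idea, under stated assumptions: each CN is modeled as a set of sub-expressions with estimated costs, the extra cost of a CN on a core counts only sub-expressions not yet present there, and the balance test (the `slack` factor) is a placeholder rule invented for this example rather than the condition used in the paper.

```python
def sharing_aware_partition(cns, op_cost, num_cores, slack=1.5):
    """cns maps a CN name to its set of sub-expressions; op_cost maps each to a cost."""
    parts = [set() for _ in range(num_cores)]   # sub-expressions placed on each core
    assign = [[] for _ in range(num_cores)]     # CNs placed on each core

    def load(i):
        return sum(op_cost[o] for o in parts[i])

    def extra(cn, i):
        # Cost of the sub-expressions of cn that core i does not already compute.
        return sum(op_cost[o] for o in cns[cn] - parts[i])

    cores = range(num_cores)
    remaining = set(cns)
    while remaining:
        # Distribute first the CN whose cheapest placement is still the most expensive.
        cn = max(sorted(remaining), key=lambda c: min(extra(c, i) for i in cores))
        best_share = min(cores, key=lambda i: extra(cn, i))
        least_loaded = min(cores, key=lambda i: load(i) + extra(cn, i))
        # Assumed balance test: accept maximum sharing only within a slack factor.
        after_share = load(best_share) + extra(cn, best_share)
        after_least = load(least_loaded) + extra(cn, least_loaded)
        target = best_share if after_share <= slack * after_least else least_loaded
        parts[target] |= cns[cn]
        assign[target].append(cn)
        remaining.remove(cn)
    return assign, [load(i) for i in cores]

# Hypothetical sub-expressions and costs.
op_cost = {"AW": 4, "WP": 3, "PC": 5, "CP": 2}
cns = {
    "CN1": {"AW", "WP"},
    "CN2": {"AW", "WP", "PC"},
    "CN3": {"PC", "CP"},
    "CN4": {"CP"},
}
assign, loads = sharing_aware_partition(cns, op_cost, num_cores=2)
print(assign, loads)
```

In this toy input the summed cost of the four CNs without sharing is 28, while the two partitions end up with loads 12 and 7 because shared sub-expressions are counted once per core.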

[Figures: five animation steps of sharing-aware CN partitioning across Core 1, Core 2 and Core 3. Labels: CN, MinCost, MaxHeap, "Non-Exec Graph of Core 3"; graph nodes A, P, W, C.]

CLP: Sharing-Aware CN Partitioning. Source: TTS [1]

CLP: Sharing-Aware CN Partitioning: Initialization. Source: TTS [1]

CLP: Error Accumulation. Source: TTS [1]

Operator Level Parallelism (OLP)

Operator Level Parallelism. Source: TTS [1]

OLP: Overcoming Error Accumulation

OLP: Overcoming Accumulated Cost. Source: TTS [1]

Operator Level Parallelism. Source: TTS [1]

Data Level Parallelism (DLP)
- Each operation in G_E (the execution graph) can be performed on multiple cores
- Uses operator-level parallelism if there is no workload skew
- Partitions the data adaptively each time before workload skew would occur
- Which node to partition? The most costly node, if it is dominant
- When to merge the sub-results? In the final phase
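A small sketch of the two decisions listed above, run sequentially for clarity: picking a dominant node, and splitting one join input so that each chunk could be processed on a separate core with the sub-results merged only at the end. The dominance test (cost above the average per-core workload), the round-robin split and all names are assumptions for illustration.

```python
def dominant_node(node_costs, num_cores):
    """Return the most costly node if it exceeds the average per-core workload."""
    node, cost = max(node_costs.items(), key=lambda kv: kv[1])
    average = sum(node_costs.values()) / num_cores
    return node if cost > average else None

def partitioned_join(left, right, key_left, key_right, num_parts):
    """Join left and right on the given column positions, one left chunk at a time."""
    chunks = [left[i::num_parts] for i in range(num_parts)]   # round-robin split
    index = {}
    for r in right:                                           # hash index on right
        index.setdefault(r[key_right], []).append(r)
    # Each list below is the sub-result one core would produce independently.
    partial = [
        [(l, r) for l in chunk for r in index.get(l[key_left], [])]
        for chunk in chunks
    ]
    # Merge the sub-results in the final phase.
    return [pair for part in partial for pair in part]

# Hypothetical estimated node costs and toy tuples.
node_costs = {"W_join_P": 90, "A_join_W": 10, "P_join_C": 20}
chosen = dominant_node(node_costs, num_cores=3)

writes = [(1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 2, 2)]         # (TID, AID, PID)
papers = [(1, "Paper1"), (2, "Paper2")]                       # (TID, NAME)
merged = partitioned_join(writes, papers, key_left=2, key_right=0, num_parts=3)
print(chosen, len(merged))
```

The merged output contains the same pairs as an unpartitioned join, only in a different order.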

Data Level Parallelism. Source: TTS [1]
[Figure: execution graph distributed over Core 1, Core 2 and Core 3]

Data Level Parallelism. Source: TTS [1]
- Select the child node to be partitioned
- Divide the tuples of the child node
- Make copies of the selected child node and all its father nodes
- Add the corresponding edges
- Re-estimate

Performance Studies

Performance Studies (figures). Source: TTS [1]

References
1. Lu Qin, Jeffrey Xu Yu, Lijun Chang. Ten Thousand SQLs: Parallel Keyword Queries Computing. Proceedings of the VLDB Endowment, Volume 3, Issue 1-2, September 2010, Singapore.
2. Vagelis Hristidis, Yannis Papakonstantinou. DISCOVER: Keyword Search in Relational Databases. VLDB '02: Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong.
3. [PPT] DISCOVER: Keyword Search in Relational Databases.
