Wei Wang University of New South Wales, Australia

Presentation transcript:

Efficient Processing of XML Path Queries Using the Disk-based F&B Index
Wei Wang, University of New South Wales, Australia
With Hongzhi Wang (HIT), Hongjun Lu (HKUST), Haifeng Jiang (IBM), Xuemin Lin (UNSW), Jianzhong Li (HIT)
Dr. Wei Wang @ CSE, UNSW

XML Query Processing
- XML is modeled as a labeled tree
- Query by structural constraint
  - Simple path queries, e.g., //Customer//Name
  - Branching/twig queries, e.g., //Customer[//Zipcode]//Name
7/4/2019 VLDB 2005 Dr. Wei Wang @ CSE, UNSW

Index or Join?
- Example query Q1: /a/b
- Index-based approaches: DataGuide, 1-index, F&B Index, and a few approximate indexes
- Join-based approaches: structural join, twig join
- There are also hybrid approaches, e.g., the mixed-mode paper from Wisconsin in VLDB 2003
- If the XML data is a tree, all of these indexes are trees
- Join-based approaches appear to be more actively researched!

Outline
- Introduction
- Disk-based F&B Index
- Experiment
- Conclusions

XML Structural Indexes
- "Exact" indexes
  - 1-index: based on backward bisimilarity; covers all simple path queries
  - F&B Index: based on backward and forward bisimilarity; covers all branching queries (optimally)
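
To make the 1-index bullet concrete, here is a minimal sketch under one stated assumption: the data is a tree. On a tree, two nodes are backward bisimilar exactly when they share the same root-to-node label path, so the 1-index can be built by grouping nodes on that path. The tree, the node ids, and the names `NODES`, `label_path`, and `one_index` are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical XML-like tree: node id -> (tag, parent id); the root has parent None.
NODES = {
    1: ("a", None),
    2: ("b", 1), 3: ("b", 1), 7: ("b", 1), 8: ("b", 1),
    4: ("c", 2), 5: ("d", 2),
    6: ("d", 3),
    9: ("c", 8), 10: ("d", 8),
}

def label_path(node_id, nodes):
    """Return the root-to-node label path, e.g. '/a/b'."""
    tags = []
    while node_id is not None:
        tag, parent = nodes[node_id]
        tags.append(tag)
        node_id = parent
    return "/" + "/".join(reversed(tags))

def one_index(nodes):
    """Group nodes by label path; each group is the extent of one 1-index node."""
    extents = defaultdict(list)
    for node_id in sorted(nodes):
        extents[label_path(node_id, nodes)].append(node_id)
    return dict(extents)

INDEX = one_index(NODES)
for path, extent in sorted(INDEX.items()):
    print(path, extent)
```

A simple path query such as /a/b is then answered by reading the extent stored under that path, without touching the document itself.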

A Running Example
- Q1: /a/b
- Q2: /a/b[d]
- Q3: /a/b[c][d]
- extent: {b, b, b}
- The F&B index is refined from the 1-index
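
The refinement from the 1-index to the F&B index can be illustrated with a naive partition-refinement sketch. It is not the construction algorithm used in the talk: it simply starts from a partition by tag and keeps splitting blocks by (own block, parent's block, set of children's blocks) until nothing changes, which yields the coarsest partition that is stable in both directions. The tree and all names are hypothetical.

```python
# Hypothetical XML-like tree: node id -> (tag, parent id); the root has parent None.
NODES = {
    1: ("a", None),
    2: ("b", 1), 3: ("b", 1), 7: ("b", 1), 8: ("b", 1),
    4: ("c", 2), 5: ("d", 2),
    6: ("d", 3),
    9: ("c", 8), 10: ("d", 8),
}

def fb_partition(nodes):
    """Map each node id to the id of its forward-and-backward bisimulation block."""
    children = {n: [] for n in nodes}
    for n, (_, parent) in nodes.items():
        if parent is not None:
            children[parent].append(n)

    block = {n: tag for n, (tag, _) in nodes.items()}  # initial partition: by tag
    while True:
        ids = {}
        new_block = {}
        for n in sorted(nodes):
            parent = nodes[n][1]
            signature = (
                block[n],                                  # keeps the refinement monotone
                block[parent] if parent is not None else None,  # backward direction
                frozenset(block[c] for c in children[n]),  # forward direction
            )
            new_block[n] = ids.setdefault(signature, len(ids))
        # A refinement with the same number of blocks is the same partition: stop.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block

FB = fb_partition(NODES)
print(FB)
```

In this toy tree all four b nodes share one 1-index node (same path /a/b), but the F&B partition separates them by their children: nodes 2 and 8 (with c and d children) stay together, while node 3 (only d) and node 7 (no children) get their own blocks. A branching query such as /a/b[d] can therefore be answered from whole F&B extents.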

Problems with F&B Index?
- Lack of scalability
  - Usually large in practice
  - No immediate solution when it cannot be accommodated in memory
  - Unbalanced, all-leaf-nodes tree
  - Naïve solutions (e.g., B+-tree, pre-order clustering in Lore, subtree clustering in Natix) do not work well
- Lack of efficiency
  - Non-deterministic searching
  - The //-axis requires traversing whole subtrees
  - Much more costly when the index is not in memory
- 100M XMark, 2M doc nodes → 0.5 million F&B nodes if treated as a tree

Disk-based F&B Index
- Overcome the memory limit by putting the F&B index on disk
- The naïve method does not work well
- For the query Q1: /a/b, it needs to touch all the pages, with random I/O on top

Basic Idea
- Moral: clustering is important
  - Cluster by tag → tape
  - Cluster by parent → segment & block
  - Cluster by 1-index ID → chunk
- Benefits
  - Optimized tree traversals
  - Enables other intelligent algorithms

Q1: /a/b (figure)

Q.P. by Tree Traversal
- Dim 1: DFS/BFS
- Dim 2: Path/Branching Path
- Dim 3: / or //
- Example queries: Q5: /a/b/c, Q2: /a/b[d], Q4: /a//c
- Problem: still have to traverse entire subtrees to process //

Q.P. by RangeFetch
- H(chunkID, tagName), e.g., H(1, c) = [3, 6]
- Example query Q4: /a//c
- Restriction: can only answer /p//q, where p is a simple path
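
The slide gives only the lookup H(1, c) = [3, 6], keyed by (chunkID, tagName). The sketch below shows one plausible way such a lookup could be used. It assumes, and the slide does not say this, that the interval is an inclusive range of positions in the tape for that tag. The tape contents and the names `TAPES`, `H`, and `range_fetch` are hypothetical.

```python
# Hypothetical tape for tag "c": index nodes with tag c stored contiguously.
TAPES = {"c": ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]}

# Lookup table keyed by (chunkID, tagName); the single entry mirrors the slide's
# H(1, c) = [3, 6]. Reading it as inclusive tape positions is an assumption.
H = {(1, "c"): (3, 6)}

def range_fetch(chunk_id, tag):
    """Return the slice of the tag's tape covered by H(chunk_id, tag), or [] if absent."""
    bounds = H.get((chunk_id, tag))
    if bounds is None:
        return []
    lo, hi = bounds
    return TAPES[tag][lo:hi + 1]

# For a query like /a//c: resolve the simple path /a to a chunk (chunk 1 here,
# by assumption), then fetch all c entries below it in one sequential range read.
RESULT = range_fetch(1, "c")
print(RESULT)
```

Under this reading, the // step costs one table lookup plus one contiguous read instead of a subtree traversal, which is consistent with the restriction that the prefix p must be a simple path.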

More Data Structures
- 3 more tapes:
  - Extents Tape: add a region code for each d-node in the extents
    - Use physical (start, end) codes
    - Sort d-nodes according to (start, end)
  - Doc Tape
  - Value Tape
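
As an illustration of region codes, the sketch below assigns (start, end) pairs with a depth-first traversal and uses them for an ancestor test. It uses a simple counter rather than the physical codes mentioned on the slide, and the tree and names are hypothetical.

```python
def assign_region_codes(children, root):
    """Assign (start, end) to every node so that ancestors strictly enclose descendants."""
    codes = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter               # stamped when the node is entered
        for child in children.get(node, []):
            visit(child)
        counter += 1
        codes[node] = (start, counter)  # end is stamped when the node is left

    visit(root)
    return codes

def is_ancestor(a, d):
    """True if region a strictly contains region d."""
    return a[0] < d[0] and d[1] < a[1]

# Hypothetical document tree: parent -> ordered list of children.
CHILDREN = {"a1": ["b1", "b2"], "b1": ["c1", "d1"], "b2": ["d2"]}
CODES = assign_region_codes(CHILDREN, "a1")

# Sorting by (start, end) puts the d-nodes in document order.
for node, code in sorted(CODES.items(), key=lambda item: item[1]):
    print(node, code)
```

With such codes, an ancestor-descendant check needs only two comparisons, which is what allows join algorithms to work directly on the extents.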

Example (figure)

SegSJ
- Key observation: the structural relationship between two segments can be inferred from the relationship between the first d-nodes in their extents
- SegSJ(/p//q)
  - R(s, e): A = /p
  - S(s, e): D = //q
  - Structural join of R and S, using a partition-based or sorting-based SJ algorithm
  - Take the (s, e) of the first d-node in each segment
- Example: b1 → (10, 78), (210, 297), …; d1 → (19, 25), (54, 66), …
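
The join step can be sketched with a generic stack-based structural join over (start, end) pairs. This is a standard sort-based technique, not necessarily the exact variant used by SegSJ, and it assumes both inputs are sorted by start and that regions are properly nested. The inputs below are the values listed on the slide; the elided entries ("…") are left out, and the function name is hypothetical.

```python
def structural_join(ancestors, descendants):
    """Return all (a, d) pairs where region a strictly contains region d.

    Both inputs must be sorted by start, and regions must be properly nested.
    """
    result = []
    stack = []  # chain of nested ancestor regions that are still open
    i = 0
    for d in descendants:
        # Push every ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            while stack and stack[-1][1] < a[0]:
                stack.pop()  # that region ended before a starts
            stack.append(a)
            i += 1
        # Drop regions that ended before d starts.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Everything left on the stack is open at d; emit the enclosing regions.
        result.extend((a, d) for a in stack if d[1] < a[1])
    return result

# (s, e) of the first d-node of each segment, as listed on the slide.
R = [(10, 78), (210, 297)]   # from b1, the /p side
S = [(19, 25), (54, 66)]     # from d1, the //q side

PAIRS = structural_join(R, S)
print(PAIRS)
```

Here both listed d1 regions fall inside (10, 78) and neither falls inside (210, 297), so only the first b1 segment contributes matches.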

Experiments Setup
- Datasets: DBLP/XMark/TreeBank
- 8 representative queries
  - Dim 1: PC/AD
  - Dim 2: Path/Twig
  - Dim 3: Large/Small
- Algorithms: DFS, BFS, RangeFetch, SegSJ; NoK, TwigStack, Kaushik's algorithm in [SIGMOD 04]
- Metric: time/PIO/LIO
* Kaushik: On the integration …

Varying Buffer Size (PC-Path) (figure)

Varying Buffer Size (PC-Twig) (figure)

Varying Buffer Size (AD-Path) (figure)

Varying Buffer Size (AD-Twig) (figure)

Buffer Hit Ratio (figure)

Scalability (figure)

Comparing with Other Systems (figure)

Conclusions
- Disk-based F&B Index
  - Store and cluster the index on disk
  - More efficient and intelligent query processing algorithms
  - Demonstrated good scalability and query efficiency
- Expecting new query processing algorithms based on index probing (in addition to join-based approaches)

Q&A
Thank you!

Related Work
- Indexes
  - Exact: DataGuide, 1-index, F&B Index
  - Approx: Approx. DataGuide, A(k)-index, D(k)-index, M*(k)-index
- Join-based approaches
- Hybrid approaches
  - "Mixed-mode" in Niagara [VLDB 03] combines tree traversals and joins
  - [SIGMOD 04] uses the 1-index to accelerate joins
- Clustering
  - Lore: pre-order
  - Natix: subtree