Efficient Processing of XML Path Queries Using the Disk-based F&B Index
Wei Wang, University of New South Wales, Australia
With Hongzhi Wang (HIT), Hongjun Lu (HKUST), Haifeng Jiang (IBM), Xuemin Lin (UNSW), Jianzhong Li (HIT)
7/4/2019

XML Query Processing
- XML is modeled as a labeled tree
- Query by structural constraint
  - Simple path queries, e.g., //Customer//Name
  - Branching/twig queries, e.g., //Customer[//Zipcode]//Name
VLDB 2005
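
To make the two query classes concrete, here is a minimal sketch that evaluates both example queries by brute force over a small, made-up document. The document content and the helper names are assumptions for illustration only, and the twig predicate is read as "has a Zipcode descendant". It assumes Customer elements are not nested.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
DOC = """
<Customers>
  <Customer><Name>Ann</Name><Address><Zipcode>2052</Zipcode></Address></Customer>
  <Customer><Name>Bob</Name></Customer>
</Customers>
"""

def descendants(elem, tag):
    """Proper descendants of elem with the given tag, in document order."""
    return [d for d in elem.iter(tag) if d is not elem]

def simple_path(root):
    """//Customer//Name: a chain of structural constraints."""
    return [n.text for c in root.iter("Customer") for n in descendants(c, "Name")]

def twig(root):
    """//Customer[//Zipcode]//Name: the Customer must also have a Zipcode descendant."""
    return [
        n.text
        for c in root.iter("Customer")
        if descendants(c, "Zipcode")
        for n in descendants(c, "Name")
    ]

root = ET.fromstring(DOC)
path_result = simple_path(root)   # names of all customers
twig_result = twig(root)          # names of customers that have a zipcode
```

Both functions scan the whole document; avoiding that scan is what the indexes and joins discussed next are for.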

Index or Join?
- Example query Q1: /a/b
- Index-based approaches
  - DataGuide, 1-index
  - F&B Index and a few approximate indexes
- Join-based approaches
  - Structural join
  - Twig join
- There are also hybrid approaches, e.g., the mixed-mode paper from Wisconsin in VLDB 2003.
- If the XML document is a tree, all of these indexes are trees.
- Join-based approaches appear to be more actively researched!

Outline
- Introduction
- Disk-based F&B Index
- Experiment
- Conclusions

XML Structural Indexes
- “Exact” indexes
  - 1-index
    - Based on backward bisimilarity
    - Covers all simple path queries
  - F&B Index
    - Based on backward and forward bisimilarity
    - Covers all branching queries (optimally)
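
For tree-shaped data, two nodes are backward bisimilar exactly when they have the same root-to-node label path, so a 1-index can be sketched by grouping nodes on that path. The tree below is hypothetical and the function names are made up for this illustration; it is not the construction algorithm from the talk.

```python
from collections import defaultdict

# Hypothetical document tree: node id -> (label, parent id); the root has parent None.
NODES = {
    0: ("a", None),
    1: ("b", 0), 2: ("b", 0), 3: ("b", 0),
    4: ("c", 1), 5: ("d", 1),
    6: ("c", 2), 7: ("d", 2),
    8: ("d", 3),
}

def label_path(node_id):
    """Root-to-node label path, e.g. '/a/b'."""
    labels = []
    while node_id is not None:
        label, parent = NODES[node_id]
        labels.append(label)
        node_id = parent
    return "/" + "/".join(reversed(labels))

def one_index():
    """Map each label path (one index node) to its extent of document nodes."""
    extents = defaultdict(list)
    for node_id in sorted(NODES):
        extents[label_path(node_id)].append(node_id)
    return dict(extents)

index = one_index()
```

A simple path query such as /a/b is then answered by one lookup, `index["/a/b"]`, without touching the document.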

A Running Example
- Q1: /a/b
- Q2: /a/b[d]
- Q3: /a/b[c][d]
- Extent: {b, b, b}
- The F&B index is refined from the 1-index
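
The refinement from 1-index to F&B index can be sketched as a fixpoint computation: start from the label partition and keep splitting classes until every class agrees on its parent's class and on the set of its children's classes. The slide's figure is not part of this transcript, so the tree below is a hypothetical stand-in with three b nodes, and `fb_partition` is an illustrative name.

```python
# Hypothetical document tree: node id -> (label, parent id); the root has parent None.
NODES = {
    0: ("a", None),
    1: ("b", 0), 2: ("b", 0), 3: ("b", 0),
    4: ("c", 1), 5: ("d", 1),
    6: ("c", 2), 7: ("d", 2),
    8: ("d", 3),
}

def fb_partition(nodes):
    """Return node id -> class id for the coarsest forward-and-backward stable partition."""
    children = {n: [] for n in nodes}
    for n, (_, parent) in nodes.items():
        if parent is not None:
            children[parent].append(n)

    block = {n: label for n, (label, _) in nodes.items()}  # start from labels
    while True:
        ids = {}
        new_block = {}
        for n, (label, parent) in nodes.items():
            signature = (
                label,
                block[parent] if parent is not None else None,  # backward
                frozenset(block[c] for c in children[n]),        # forward
            )
            new_block[n] = ids.setdefault(signature, len(ids))
        # Each round only splits classes, so an unchanged count means a fixpoint.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block

blocks = fb_partition(NODES)
```

In this example the single 1-index node for /a/b, with extent {1, 2, 3}, splits into two F&B nodes: {1, 2} (children c and d) and {3} (child d only). A query like /a/b[c][d] can then be answered from the first class alone.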

Problems with F&B Index?
- Lack of scalability
  - Usually large in practice
  - No immediate solution when it cannot be accommodated in memory
  - Unbalanced, all-leaf-nodes tree
  - Naïve solutions (e.g., B+-tree, pre-order clustering in Lore, subtree clustering in Natix) do not work well
- Lack of efficiency
  - Non-deterministic searching
  - The //-axis requires traversing whole subtrees
  - Much more costly when the index is not in memory
- Note: 100M XMark, 2M doc nodes → 0.5 million F&B nodes if treated as a tree.

Disk-based F&B Index
- Overcome the memory limit by putting the F&B index on disk
- The naïve method does not work well
  - For query Q1: /a/b, it needs to touch all the pages and incurs random I/O

Basic Idea
- Moral: clustering is important
  - Cluster by tag → tape
  - Cluster by parent → segment & block
  - Cluster by 1-index ID → chunk
- Benefits:
  - Optimized tree traversals
  - Enables other intelligent algorithms
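
One way to picture the three clustering keys is as a nested sort over the index nodes. The sketch below is only an illustration under stated assumptions: the slide names the keys but not how they nest, so the order used here (tape, then chunk, then segment) is assumed, blocks are left out, and the node records and the `layout` function are hypothetical.

```python
from itertools import groupby

# Hypothetical F&B index nodes: (node id, tag, 1-index id, parent node id).
INDEX_NODES = [
    (1, "a", 1, 0),
    (2, "b", 2, 1), (3, "b", 2, 1),
    (4, "c", 3, 2),
    (5, "d", 4, 2), (6, "d", 4, 3),
]

def layout(nodes):
    """Group nodes as tag -> 1-index id -> parent id -> [node ids] (assumed nesting)."""
    ordered = sorted(nodes, key=lambda n: (n[1], n[2], n[3], n[0]))
    tapes = {}
    for tag, tape in groupby(ordered, key=lambda n: n[1]):
        chunks = {}
        for chunk_id, chunk in groupby(tape, key=lambda n: n[2]):
            chunks[chunk_id] = {
                parent: [n[0] for n in segment]
                for parent, segment in groupby(chunk, key=lambda n: n[3])
            }
        tapes[tag] = chunks
    return tapes

tapes = layout(INDEX_NODES)
```

With this ordering, all nodes a traversal step needs next (same tag, same parent) sit next to each other, which is the point of the "clustering is important" moral.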

Q1: /a/b

Query Processing by Tree Traversal
- Dim 1: DFS/BFS
- Dim 2: Path/Branching Path
- Dim 3: / or //
- Example queries: Q5: /a/b/c, Q2: /a/b[d], Q4: /a//c
- Problem: still have to traverse entire subtrees to process //
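
The cost difference between / and // shows up directly in a traversal-based evaluator. The following is a minimal in-memory sketch for simple paths only (no branching predicates); the index tree, the virtual document node and the function names are assumptions for illustration.

```python
# Hypothetical index tree: node id -> (tag, [child ids]). DOC is a virtual node above the root.
DOC = -1
TREE = {
    DOC: ("#doc", [0]),
    0: ("a", [1, 2]),
    1: ("b", [3, 4]),
    2: ("b", [5]),
    3: ("c", []),
    4: ("d", []),
    5: ("d", [6]),
    6: ("c", []),
}

def subtree(node):
    """Pre-order DFS over all proper descendants; this is the expensive part of //."""
    for child in TREE[node][1]:
        yield child
        yield from subtree(child)

def evaluate(steps):
    """steps is a list of (axis, tag) pairs with axis '/' or '//'; returns matching node ids."""
    current = [DOC]
    for axis, tag in steps:
        matched = []
        for node in current:
            candidates = TREE[node][1] if axis == "/" else subtree(node)
            for cand in candidates:
                if TREE[cand][0] == tag and cand not in matched:
                    matched.append(cand)
        current = matched
    return current

q5 = evaluate([("/", "a"), ("/", "b"), ("/", "c")])   # /a/b/c
q4 = evaluate([("/", "a"), ("//", "c")])              # /a//c
```

For /a/b/c only the children of matched nodes are inspected, whereas /a//c walks the entire subtree under a.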

Query Processing by RangeFetch
- H(1, c) = [3, 6], where the key is (chunkID, tagName)
- Example query Q4: /a//c
- Restriction: can only answer /p//q, where p is a simple path
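
A rough sketch of the lookup, under assumptions: the simple path p is first resolved to a chunk ID, H maps (chunkID, tagName) to an inclusive range, and that range is read as positions in the tape for tag q. How the range is actually interpreted on disk is not stated on the slide, and the tables and `range_fetch` below are hypothetical.

```python
# All three tables are made-up stand-ins for on-disk structures.
CHUNK_OF_PATH = {"/a": 1}                      # simple path p -> chunk ID
H = {(1, "c"): (3, 6)}                         # (chunkID, tagName) -> inclusive range
TAPES = {"c": [f"c{i}" for i in range(8)]}     # tag -> entries in tape order

def range_fetch(p, q):
    """Answer /p//q with one hash lookup and one contiguous read (assumed semantics)."""
    chunk_id = CHUNK_OF_PATH[p]
    lo, hi = H[(chunk_id, q)]
    return TAPES[q][lo:hi + 1]

q4 = range_fetch("/a", "c")   # entries at tape positions 3..6
```

The restriction on the slide follows from the first line of `range_fetch`: p must identify a single chunk, which only a simple path can do.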

More Data Structures
- 3 more tapes:
  - Add a region code for each d-node in the extents → Extents Tape
    - Use physical (start, end) codes
    - Sort d-nodes according to (start, end)
  - Add Doc Tape
  - Add Value Tape
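
Region codes can be sketched as follows. The slide uses physical (start, end) codes; this illustration substitutes a simple counter assigned during a depth-first walk, which keeps the same containment property. The tree and function names are hypothetical.

```python
# Hypothetical document tree: node -> list of children.
TREE = {"a": ["b1", "b2"], "b1": ["c1"], "b2": [], "c1": []}

def assign_regions(tree, root):
    """Assign (start, end) so that an ancestor's region strictly contains its descendants'."""
    regions = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in tree[node]:
            visit(child)
        counter += 1
        regions[node] = (start, counter)

    visit(root)
    return regions

def is_ancestor(anc, desc):
    """True if region anc strictly contains region desc."""
    return anc[0] < desc[0] and desc[1] < anc[1]

regions = assign_regions(TREE, "a")
sorted_extent = sorted(regions.values())   # d-nodes ordered by (start, end)
```

With these codes a structural relationship is decided by comparing two pairs of numbers, with no tree navigation.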

Example

SegSJ
- Key observation: the structural relationship between two segments can be inferred from the relationship between the first d-nodes in their extents.
- SegSJ(/p//q)
  - R(s, e): A = /p
  - S(s, e): D = //q
  - Structural join of R and S, using a partition-based or sorting-based SJ algorithm
- Take the (s, e) of the first d-node in each segment:
  - b1: (10,78), (210, 297), …
  - d1: (19,25), (54, 66), …
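
The join step can be sketched with a stack-based merge over start-sorted inputs, one common sorting-based structural join. This is a generic illustration, not the exact SegSJ procedure; the function name is made up, and the inputs are only the (s, e) pairs shown on the slide (the elided entries are left out).

```python
def structural_join(ancestors, descendants):
    """Return (a, d) pairs where region a contains region d.

    Both inputs are lists of (start, end) sorted by start, taken from one
    properly nested document, so any two regions are nested or disjoint.
    """
    result = []
    stack = []   # chain of nested ancestor regions, outermost first
    i = 0
    for d in descendants:
        # Bring in every ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            while stack and stack[-1][1] < ancestors[i][0]:
                stack.pop()          # closed before this candidate starts
            stack.append(ancestors[i])
            i += 1
        # Drop candidates that closed before d starts.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        result.extend((a, d) for a in stack if d[1] < a[1])
    return result

R = [(10, 78), (210, 297)]   # first d-nodes of the segments matching /p (from the slide)
S = [(19, 25), (54, 66)]     # first d-nodes of the segments matching //q (from the slide)
pairs = structural_join(R, S)
```

Because only the first d-node of each segment enters the join, the inputs are much smaller than the full extents, which is what the key observation buys.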

Experiments Setup
- Datasets: DBLP/XMark/TreeBank
- 8 representative queries
  - Dim 1: PC/AD (parent-child / ancestor-descendant)
  - Dim 2: Path/Twig
  - Dim 3: Large/Small
- Algorithms: DFS, BFS, RangeFetch, SegSJ
- Compared against: NoK, TwigStack, Kaushik’s algorithm in [SIGMOD 04]*
- Metric: time/PIO/LIO (physical I/O, logical I/O)
* Kaushik: On the integration …

Varying Buffer Size (PC-Path)

Varying Buffer Size (PC-Twig)

Varying Buffer Size (AD-Path)

Varying Buffer Size (AD-Twig)

Buffer Hit Ratio

Scalability

Comparing with Other Systems

Conclusions
- Disk-based F&B Index
  - Store and cluster the index on the disk
  - More efficient and intelligent query processing algorithms
  - Demonstrated good scalability and query efficiency
- Expecting new query processing algorithms based on index probing (in addition to join-based approaches)

Q&A
Thank You!

Related Work
- Indexes
  - Exact: DataGuide, 1-index, F&B Index
  - Approx: Approx. DataGuide, A(k)-index, D(k)-index, M*(k)-index
- Join-based approaches
- Hybrid approach: “mixed-mode” in [VLDB 03]
  - Niagara [VLDB 03] combines tree traversals + joins
  - [SIGMOD 04] uses the 1-index to accelerate joins
- Clustering
  - Lore: pre-order
  - Natix: subtree
