Published by Pierre-Louis Ménard. Modified over 5 years ago.
Efficient Processing of XML Path Queries Using the Disk-based F&B Index
Wei Wang, University of New South Wales, Australia
With Hongzhi Wang (HIT), Hongjun Lu (HKUST), Haifeng Jiang (IBM), Xuemin Lin (UNSW), Jianzhong Li (HIT)
7/4/2019, Dr. Wei Wang, CSE, UNSW
XML Query Processing
XML is modeled as a labeled tree.
Queries are expressed as structural constraints:
  Simple path queries, e.g., //Customer//Name
  Branching (twig) queries, e.g., //Customer[//Zipcode]//Name
VLDB 2005
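The two query forms above can be illustrated with a small sketch that evaluates them directly on a labeled tree, without any index. The `Node` class, the `eval_path` helper, and the sample document are assumptions made for illustration; they are not from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every proper descendant of node in document (pre-order) order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def eval_path(root, steps):
    """Evaluate a simple path given as [(axis, tag), ...] with axis '/' or '//'.

    Evaluation starts at a virtual document node whose only child is root.
    """
    context = [Node("#doc", [root])]
    for axis, tag in steps:
        result, seen = [], set()
        for ctx in context:
            candidates = ctx.children if axis == "/" else descendants(ctx)
            for n in candidates:
                if n.tag == tag and id(n) not in seen:  # avoid duplicates
                    seen.add(id(n))
                    result.append(n)
        context = result
    return context


# Hypothetical document: two customers, only the first has a Zipcode.
name1, name2 = Node("Name"), Node("Name")
c1 = Node("Customer", [name1, Node("Address", [Node("Zipcode")])])
c2 = Node("Customer", [name2])
root = Node("Customers", [c1, c2])

# Simple path query //Customer//Name
names = eval_path(root, [("//", "Customer"), ("//", "Name")])

# Twig query //Customer[//Zipcode]//Name: filter customers, then descend.
customers = eval_path(root, [("//", "Customer")])
with_zip = [c for c in customers
            if any(d.tag == "Zipcode" for d in descendants(c))]
twig = [n for c in with_zip for n in descendants(c) if n.tag == "Name"]
```

Every `//` step here walks whole subtrees of the data, which is the cost that structural indexes and structural joins try to avoid.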
Index or Join?
Example query: Q1: /a/b
Index-based approaches
  DataGuide, 1-index
  F&B Index and a few approximate indexes
Join-based approaches
  Structural join
  Twig join
Hybrid approaches also exist, e.g., the mixed-mode paper from Wisconsin in VLDB 2003.
If the XML document is a tree, all of these indexes are trees.
Join-based approaches appear to be more actively researched!
Outline
  Introduction
  Disk-based F&B Index
  Experiment
  Conclusions
XML Structural Indexes
"Exact" indexes:
  1-index
    Based on backward bisimilarity
    Covers all simple path queries
  F&B Index
    Based on backward and forward bisimilarity
    Covers all branching queries (optimally)
A Running Example
  Q1: /a/b
  Q2: /a/b[d]
  Q3: /a/b[c][d]
Extent of an index node: {b, b, b}
The F&B index is refined from the 1-index.
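To make the refinement concrete, here is a minimal sketch that computes both partitions on a tree by iterated signature refinement. This is a textbook-style fixpoint computation, not necessarily how the authors build the index. The sample tree (one a with three b children carrying c, d, and c plus d) is an assumption chosen to match the queries above, since the original figure is not recoverable.

```python
def refine(tags, parent, use_children):
    """Partition nodes 0..n-1 into index blocks by signature refinement.

    tags[i] is the label of node i; parent[i] is its parent's id or None.
    use_children=False gives backward bisimilarity (1-index);
    use_children=True gives backward and forward bisimilarity (F&B).
    """
    n = len(tags)
    children = [[] for _ in range(n)]
    for i, p in enumerate(parent):
        if p is not None:
            children[p].append(i)

    ids = {}
    block = [ids.setdefault(t, len(ids)) for t in tags]  # start: group by tag
    while True:
        ids, new_block = {}, []
        for i in range(n):
            up = block[parent[i]] if parent[i] is not None else None
            down = (frozenset(block[c] for c in children[i])
                    if use_children else None)
            sig = (block[i], up, down)
            new_block.append(ids.setdefault(sig, len(ids)))
        if len(ids) == len(set(block)):  # no block was split: fixpoint
            return new_block
        block = new_block


# Hypothetical data tree: a(b(c), b(d), b(c, d)), nodes numbered 0..7.
tags = ["a", "b", "b", "b", "c", "d", "c", "d"]
parent = [None, 0, 0, 0, 1, 2, 3, 3]

one_index = refine(tags, parent, use_children=False)
fb_index = refine(tags, parent, use_children=True)
```

On this tree the 1-index keeps all three b nodes in one block, which is enough for Q1. The F&B partition separates them by their children, so Q2 and Q3 can be answered from index blocks alone, at the price of more index nodes (8 instead of 4 here).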
Problems with F&B Index?
Lack of scalability
  Usually large in practice (100M XMark: 2M doc nodes, 0.5 million F&B nodes if treated as a tree)
  No immediate solution when it cannot be accommodated in memory
  Unbalanced, all-leaf-nodes tree
  Naïve solutions (e.g., B+-tree, pre-order clustering in Lore, subtree clustering in Natix) do not work well
Lack of efficiency
  Non-deterministic searching
  The //-axis requires traversing whole subtrees
  Much more costly when the index is not in memory
Disk-based F&B Index
Overcome the memory limit by storing the F&B index on disk.
The naïve method does not work well: for the query Q1: /a/b, it needs to touch all the pages and incurs random I/O.
Basic Idea
Moral: clustering is important
  Cluster by tag: tape
  Cluster by parent: segment and block
  Cluster by 1-index ID: chunk
Benefits:
  Optimized tree traversals
  Enables other intelligent algorithms
Example: Q1: /a/b
Q.P. (Query Processing) by Tree Traversal
  Dim 1: DFS/BFS
  Dim 2: Path/Branching path
  Dim 3: / or //
Example queries: Q5: /a/b/c, Q2: /a/b[d], Q4: /a//c
Problem: entire subtrees still have to be traversed to process //
Q.P. by RangeFetch
H(1, c) = [3, 6], where H is keyed by (chunkID, tagName)
Example: Q4: /a//c
Restriction: can only answer /p//q, where p is a simple path.
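A minimal sketch of the lookup follows, under two assumptions that the slide does not spell out: that [3, 6] is an inclusive range of positions on the tape for tag c, and that chunk 1 is the chunk reached by the simple path /a. The tape contents and the `range_fetch` name are placeholders.

```python
def range_fetch(tapes, H, chunk_id, tag_name):
    """Return the index nodes that H associates with (chunk_id, tag_name)."""
    entry = H.get((chunk_id, tag_name))
    if entry is None:
        return []              # no node with this tag for the chunk
    lo, hi = entry             # assumed: inclusive positions on the tape
    return tapes[tag_name][lo:hi + 1]


# Hypothetical tape of F&B index nodes with tag "c".
tapes = {"c": [f"c{i}" for i in range(8)]}
H = {(1, "c"): (3, 6)}         # the slide's H(1, c) = [3, 6]

# Q4: /a//c becomes one table lookup plus one contiguous read.
hits = range_fetch(tapes, H, 1, "c")
```

Under these assumptions the // step costs a single sequential range read instead of a subtree traversal, which is consistent with the restriction to queries of the form /p//q.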
More Data Structures
3 more tapes:
  Extents Tape: add a region code for each d-node in the extents
    Use physical (start, end) codes
    Sort d-nodes according to (start, end)
  Doc Tape
  Value Tape
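Region codes can be sketched as follows. This version numbers nodes with a simple counter during a depth-first walk; the slide's "physical" codes presumably come from positions in the stored document instead, so treat the numbering scheme, the nested-tuple tree format, and the function names as assumptions.

```python
def assign_regions(tree):
    """Assign (start, end) region codes to a tree of (tag, [children]) tuples.

    Returns a list of (start, end, tag) sorted by (start, end).
    """
    out = []
    counter = 0

    def visit(node):
        nonlocal counter
        tag, kids = node
        counter += 1
        start = counter          # code taken when the node is entered
        for kid in kids:
            visit(kid)
        counter += 1             # code taken when the node is left
        out.append((start, counter, tag))

    visit(tree)
    out.sort()
    return out


def is_ancestor(a, d):
    """True if region a strictly contains region d."""
    return a[0] < d[0] and d[1] < a[1]


# Hypothetical document: a(b(c), b)
codes = assign_regions(("a", [("b", [("c", [])]), ("b", [])]))
```

With such codes an ancestor-descendant test is two integer comparisons, and keeping extents sorted by (start, end) lets joins consume them in one sequential pass.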
Example
SegSJ
Key observation: the structural relationship between two segments can be inferred from the relationship between the first d-nodes in their extents.
SegSJ(/p//q):
  R(s, e): A = /p
  S(s, e): D = //q
  Structural join of R and S, using a partition-based or sorting-based SJ algorithm
  Take the (s, e) of the first d-node in each segment, e.g.,
    b1: (10,78), (210,297), …
    d1: (19,25), (54,66), …
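The join step can be sketched with a standard stack-based merge over region codes, used here as a stand-in for whichever partition-based or sorting-based structural join the authors actually use. Segments b1 and d1 carry the codes shown on the slide, truncated where the slide elides them; segment d2 and all function names are hypothetical.

```python
def first_codes(segments):
    """Represent each segment by the (s, e) of the first d-node in its extent.

    Assumes every extent is already sorted by (start, end).
    """
    return sorted((ext[0][0], ext[0][1], seg) for seg, ext in segments.items())


def structural_join(anc, desc):
    """Ancestor-descendant join of two lists of (s, e, id) sorted by s.

    Assumes the regions are properly nested, as they are in a tree.
    """
    result, stack, i = [], [], 0
    for d in desc:
        # Push every ancestor candidate that starts before d.
        while i < len(anc) and anc[i][0] < d[0]:
            while stack and stack[-1][1] < anc[i][0]:
                stack.pop()      # ended before this candidate starts
            stack.append(anc[i])
            i += 1
        # Drop candidates that ended before d starts.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Everything left on the stack contains d.
        result.extend((a[2], d[2]) for a in stack)
    return result


# b1 and d1 use the slide's codes; d2 is a made-up non-matching segment.
R = first_codes({"b1": [(10, 78), (210, 297)]})
S = first_codes({"d1": [(19, 25), (54, 66)], "d2": [(400, 410)]})
pairs = structural_join(R, S)
```

Because only one (s, e) pair per segment enters the join, the inputs are far smaller than the full extents, which is the point of the key observation.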
Experiments Setup
Datasets: DBLP, XMark, TreeBank
8 representative queries
  Dim 1: PC/AD (parent-child / ancestor-descendant)
  Dim 2: Path/Twig
  Dim 3: Large/Small
Algorithms: DFS, BFS, RangeFetch, SegSJ
Compared against: NoK, TwigStack, Kaushik's algorithm in [SIGMOD 04] (Kaushik: On the integration …)
Metrics: time/PIO/LIO
[Figure] Varying Buffer Size (PC-Path)
[Figure] Varying Buffer Size (PC-Twig)
[Figure] Varying Buffer Size (AD-Path)
[Figure] Varying Buffer Size (AD-Twig)
[Figure] Buffer Hit Ratio
[Figure] Scalability
[Figure] Comparing with Other Systems
Conclusions
Disk-based F&B Index
  Store and cluster the index on disk
  More efficient and intelligent query processing algorithms
  Demonstrated good scalability and query efficiency
We expect new query processing algorithms based on index probing (in addition to join-based approaches).
Q&A
Thank You!
Related Work
Indexes
  Exact: DataGuide, 1-index, F&B Index
  Approximate: Approx. DataGuide, A(k)-index, D(k)-index, M*(k)-index
Join-based approaches
Hybrid approaches
  "mixed-mode" in [VLDB 03]: Niagara combines tree traversals and joins
  [SIGMOD 04] uses the 1-index to accelerate joins
Clustering
  Lore: pre-order
  Natix: subtree