Published by Junior Elliott. Modified over 9 years ago.
1
ViST: a dynamic index method for querying XML data by tree structures Authors: Haixun Wang, Sanghyun Park, Wei Fan, Philip Yu Presenter: Elena Zheleva, November 2003
2
Overview Modeling XML Queries Structure-encoded sequences Indexing ViST Experimental Results
3
Modeling XML Queries
4
DTD of purchase records:
<!ELEMENT purchases (purchase*)>
<!ELEMENT purchase (seller, buyer)>
<!ATTLIST seller ID ID location CDATA name CDATA>
<!ELEMENT seller (item*)>
<!ATTLIST buyer ID ID location CDATA name CDATA>
<!ELEMENT item (item*)>
<!ATTLIST item name CDATA manufacturer CDATA>
5
Modeling XML Queries Focus in XML query language design: ability to express complex structural or graphical queries
6
Modeling XML Queries Querying XML data = finding substructures of the data graph that match the structure of the query. Structure-encoded sequences: a sequential representation of both XML data and XML queries
7
Structure-Encoded Sequences
8
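Maps both the XML data and the queries to sequences. Answers a query by matching it as a (not necessarily contiguous) subsequence of the data. Purpose: to avoid as many join operations as possible. Def.: a sequence of (symbol, prefix) pairs, where the prefix is the path from the root to the symbol's node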
9
Mapping Data Represent XML document/tree in preorder Represent in structure-encoded seq
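The mapping on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `structure_encoded_sequence` and the sample document are hypothetical, siblings are visited in document order, attributes are ignored, and text values appear verbatim as leaf symbols.

```python
import xml.etree.ElementTree as ET

def structure_encoded_sequence(elem, prefix=()):
    """Return the preorder list of (symbol, prefix) pairs for elem's subtree."""
    seq = [(elem.tag, prefix)]
    path = prefix + (elem.tag,)          # prefix seen by everything below elem
    text = (elem.text or "").strip()
    if text:
        seq.append((text, path))         # assumption: a text value is a leaf node
    for child in elem:                   # assumption: siblings in document order
        seq.extend(structure_encoded_sequence(child, path))
    return seq

# Hypothetical sample record, loosely following the purchase DTD.
doc = ET.fromstring(
    "<purchase><seller><name>dell</name><item><name>ibm</name></item></seller>"
    "<buyer><name>panasia</name></buyer></purchase>"
)
seq = structure_encoded_sequence(doc)
for symbol, prefix in seq:
    print(symbol, "/".join(prefix))
```

Each pair carries the full root path of its node, which is what later lets a query be matched without joining separately indexed paths.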
10
Mapping Queries Benefit of sequence matching: query gets processed as whole Path Expression
11
Structure-Encoded Sequences Query Data
12
Querying XML through Structure-Encoded Sequence Matching
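As a rough sketch of what matching means here, the following checks whether a query sequence occurs as a (not necessarily contiguous) subsequence of a data sequence. It assumes exact (symbol, prefix) pairs with no wildcards, single-letter symbols (P = purchase, S = seller, B = buyer, N = name, I = item) and prefixes written as strings; the function name `matches` is hypothetical, and the scan is linear rather than index-based.

```python
def matches(query_seq, data_seq):
    """True if query_seq is a subsequence of data_seq (gaps allowed)."""
    it = iter(data_seq)
    # "pair in it" advances the iterator until pair is found, so order is kept.
    return all(pair in it for pair in query_seq)

data = [("P", ""), ("S", "P"), ("N", "PS"), ("I", "PS"), ("B", "P"), ("N", "PB")]

seller_item = [("P", ""), ("S", "P"), ("I", "PS")]   # path purchase/seller/item
buyer_item = [("P", ""), ("B", "P"), ("I", "PB")]    # path purchase/buyer/item

print(matches(seller_item, data))
print(matches(buyer_item, data))
```

The whole query is tested in one pass, which is the point of the sequence representation: no per-path results have to be joined afterwards.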
13
Indexing
14
Role of Indexing To provide an algorithm to perform this sequence matching Desired features for algorithm: –Efficient support for subsequence matching –Use well-supported DB indexing techniques such as B+ trees –Allow dynamic index insertion
15
What is indexing useful for Auxiliary access structures –Used to speed up the retrieval of records –In response to certain search conditions Provide efficient support for arbitrary structured queries –Using wild-cards // and *
16
Indexing State-of-the-art approaches –Indexes on paths –Indexes on nodes –Indexes on both (structures) – ViST
17
ViST
18
Algorithms Naïve Algorithm based on Suffix Trees RIST: Relationships Indexed Suffix Tree ViST: Virtual Suffix Tree
19
Algorithm Using Suffix Trees Suffix Tree: a compact index to all distinct, contiguous substrings of a string D-Ancestorship – in XML doc tree Through structure-encoded sequence S-Ancestorship – in suffix tree
20
Example Using Suffix Trees
21
Algorithm Using Suffix Trees Searches –first by S-Ancestorship: searching under the current node in the suffix tree –then by D-Ancestorship: matching nodes and prefixes Disadvantages: –Costly: traverses a large portion of the subtree –Suffix trees are not supported by most commercial DBMSs
22
RIST: Indexing by Ancestor-Descendant Relationships Jumps directly to the nodes Y of which X is both a D-Ancestor and an S-Ancestor Index Construction: uses B+ trees
23
RIST: Indexing by Ancestor-Descendant Relationships Subsequence Matching Determine D-Ancestorship by prefixes Determine S-Ancestorship by labels: x: suffix tree node (root of S-tree); n_x: prefix traversal order of x; size_x: number of descendants of x
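The two ancestorship tests can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the "suffix tree" is modelled as a plain trie of whole sequences built from nested dicts, symbols are single letters so prefixes can be strings, and all function names are hypothetical. The S-ancestor test uses the range check n_x < n_y <= n_x + size_x on the (n, size) labels.

```python
def build_trie(sequences):
    """Insert each sequence into a nested-dict trie keyed by (symbol, prefix)."""
    root = {}
    for seq in sequences:
        node = root
        for pair in seq:
            node = node.setdefault(pair, {})
    return root

def label(trie):
    """Map each trie node (identified by its path) to its (n, size) label."""
    labels = {}
    counter = 0
    def visit(node, path):
        nonlocal counter
        n = counter                      # preorder number of this node
        counter += 1
        for pair, child in node.items():
            visit(child, path + (pair,))
        labels[path] = (n, counter - 1 - n)   # size = number of descendants
    visit(trie, ())
    return labels

def is_s_ancestor(x, y):
    """x, y are (n, size) labels: is x above y in the trie?"""
    (nx, sizex), (ny, _) = x, y
    return nx < ny <= nx + sizex

def is_d_ancestor(x, y):
    """x, y are (symbol, prefix) pairs: is x above y in the document tree?"""
    (sym_x, prefix_x), (_, prefix_y) = x, y
    return prefix_y.startswith(prefix_x + sym_x)

P, S, N, B = ("P", ""), ("S", "P"), ("N", "PS"), ("B", "P")
labels = label(build_trie([[P, S, N], [P, B]]))
print(labels[(P, S)], labels[(P, B)])
print(is_s_ancestor(labels[(P, S)], labels[(P, S, N)]), is_d_ancestor(S, B))
```

Because both tests reduce to a string-prefix check and an integer range check, they can be answered from ordinary ordered indexes rather than by walking the tree.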
24
ViST: the Virtual Suffix Tree Same sequence algorithm as RIST BUT supports dynamic insertions Uses dynamic method to assign labels Once assigned, the labels are fixed and are not affected by subsequent data insertion or deletion Labeling the suffix tree w/o building it Relies on statistical information about the XML data
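One way to picture labeling without building the tree is to give every node a numeric scope and carve child scopes out of it on demand. The sketch below is only an assumption-laden illustration: the `Node` class, the `insert` helper, the total scope of 1024 and the fixed parameter `lam` (each new child receives 1/lam of the parent's remaining scope) are all hypothetical stand-ins for the statistics-driven allocation the slide mentions.

```python
class Node:
    def __init__(self, n, size):
        self.n = n            # label of this node, fixed once assigned
        self.size = size      # this node owns the labels n .. n + size
        self.used = 0         # part of the scope already handed to children
        self.children = {}

    def child(self, key, lam=2):
        """Return the child for key, allocating a sub-scope on first use."""
        if key in self.children:
            return self.children[key]
        remaining = self.size - self.used
        if remaining < 1:
            raise OverflowError("scope exhausted")
        share = max(1, remaining // lam)   # labels for the child and its subtree
        node = Node(self.n + self.used + 1, share - 1)
        self.used += share
        self.children[key] = node
        return node

def insert(root, seq):
    """Walk or extend the virtual tree along seq and return the last node."""
    node = root
    for pair in seq:
        node = node.child(pair)
    return node

root = Node(0, 1024)                       # assumed total label space
a = insert(root, [("P", ""), ("S", "P")])
before = (a.n, a.size)
b = insert(root, [("P", ""), ("B", "P")])
print(before, (a.n, a.size), (b.n, b.size))
```

Labels handed out earlier do not move when later sequences arrive, and the range test n_x < n_y <= n_x + size_x still decides ancestorship. The price is that a scope can run out, which this sketch simply reports as an error.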
25
ViST: the Virtual Suffix Tree Index structure contains the sequence: Sequence to be inserted: Dynamic scope of x =
26
ViST: the Virtual Suffix Tree
27
Experimental Results Datasets used –DBLP: CS bibliography; 289,627 records/publications; each publication is a tree of max depth 6; avg length of structure-encoded seq = 31 –XMARK: 1 record; complicated tree structure –Synthetic
28
Experimental Results Comparison Methods –Index Fabric: indexes XML paths –XISS: uses nodes as the basic query unit –ViST: takes approx. 1/10 of the time to perform queries, since the other methods need (multiple) join operations
29
Experimental Results Index Structure and Size (1/3 less than the suffix tree) –DocId B+ tree: N elements –Combined D-ancestor and S-ancestor B+ tree: N x L elements Index Construction
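To make the two index structures concrete, here is a small sketch of how a lookup might combine them. Sorted Python lists queried with `bisect` stand in for B+ trees, which is an assumption made for brevity; the names `s_index`, `doc_index`, `descendants` and `search`, the toy labels, and the choice to store each document under the label of the node where its sequence ends are illustrative, not taken from the slides.

```python
from bisect import bisect_left, bisect_right

# Combined index: (symbol, prefix) key -> sorted (n, size) labels.
s_index = {
    ("P", ""): [(1, 3)],
    ("S", "P"): [(2, 1)],
    ("N", "PS"): [(3, 0)],
    ("B", "P"): [(4, 0)],
}
# DocId index: (label n where a document's sequence ends, document id).
doc_index = [(3, "doc1"), (4, "doc2")]

def descendants(key, n, size):
    """Labels stored under key that fall in the range (n, n + size]."""
    labels = s_index.get(key, [])
    lo = bisect_right(labels, (n, float("inf")))
    hi = bisect_right(labels, (n + size, float("inf")))
    return labels[lo:hi]

def search(query, n=0, size=4):
    """Return ids of documents whose sequence contains query as a subsequence."""
    if not query:
        lo = bisect_left(doc_index, (n, ""))
        hi = bisect_right(doc_index, (n + size, chr(0x10FFFF)))
        return [doc for _, doc in doc_index[lo:hi]]
    found = []
    for child_n, child_size in descendants(query[0], n, size):
        found.extend(search(query[1:], child_n, child_size))
    return found

print(search([("P", ""), ("S", "P")]))
print(search([("P", "")]))
```

Every step is a key lookup followed by a range scan, which is the access pattern B+ trees are built for.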
30
Conclusion XML Queries = Subsequence Matching Advantages of ViST, an algorithm for subsequence matching: –Avoids expensive join operations –Indexes both the content and the structure of XML documents –Uses B+ trees, which are well supported for disk-based data –Supports dynamic data insertion and deletion