Efficient Structural Joins on Indexed XML Documents. Shu-Yao Chien, Zografoula Vagena, Donghui Zhang, Vassilis J. Tsotras, Carlo Zaniolo. VLDB'02, Aug 20.

Presentation transcript:

Efficient Structural Joins on Indexed XML Documents
Shu-Yao Chien, Zografoula Vagena, Donghui Zhang, Vassilis J. Tsotras, Carlo Zaniolo
VLDB 2002, Aug 20

Overview
- Motivation
- Problem Description
- Structural Joins
- Structural Joins using B+-trees
- Structural Joins using R-trees
- Problem Variations
- Experimental Results

Motivation (1)
Query languages for XML qualify documents for retrieval both by their structure and by the values of their elements.
Example (path-expression query): section[title="Overview"]//figure[caption="R-tree"]

Motivation (2)
When the XML document is combined with a numbering scheme, path expression queries require the computation of structural joins.
Numbering schemes: each node is assigned a unique interval. The interval of a parent node contains the intervals of all its children.

Motivation (2), continued
From path expressions to structural joins: two nodes qualify for a path expression query if one is an ancestor of the other. With intervals, this is equivalent to interval containment: a is an ancestor of d exactly when a.start < d.start and d.end < a.end.
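To make the interval idea concrete, here is a minimal sketch that numbers a small tree with a depth-first counter and then tests ancestry by containment. The tree literal, the function names `number_nodes` and `is_ancestor`, and the dense counter are illustrative assumptions; the durable numbering schemes mentioned in the slides leave gaps between numbers so that later insertions do not force renumbering.

```python
from typing import Dict, List, Tuple

# A node is (tag, children); this toy tree is a hypothetical document.
Tree = Tuple[str, List["Tree"]]
Interval = Tuple[int, int]  # (start, end)


def number_nodes(tree: Tree) -> Dict[str, Interval]:
    """Assign each node a (start, end) interval from a depth-first counter."""
    labels: Dict[str, Interval] = {}
    counter = 0

    def visit(node: Tree) -> None:
        nonlocal counter
        tag, children = node
        counter += 1          # number taken when entering the node
        start = counter
        for child in children:
            visit(child)
        counter += 1          # number taken when leaving the node
        labels[tag] = (start, counter)

    visit(tree)
    return labels


def is_ancestor(a: Interval, d: Interval) -> bool:
    """a is an ancestor of d exactly when a's interval strictly contains d's."""
    return a[0] < d[0] and d[1] < a[1]


doc: Tree = ("section", [("title", []), ("figure", [("caption", [])])])
labels = number_nodes(doc)
print(labels)
print(is_ancestor(labels["section"], labels["caption"]))
print(is_ancestor(labels["title"], labels["figure"]))
```

Because every child is entered after and left before its parent, the parent's interval always encloses the child's, which is the property the structural join relies on.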

Problem Description
Structural join: let A and D be two lists containing the instances of two particular tags in an XML document; join A and D using their containment associations as the join condition.
[Al-Khalifa et al. 2002] proposed non-indexed structural join algorithms.
We extend their algorithms to take advantage of existing indices on the two lists.

Structural Joins, no indices
Let a, d be the first elements of A and D
while (A, D are not empty or the stack is not empty) do
    if (a.start > stack.top.end and d.start > stack.top.end) then
        stack.pop()
    else if (a.start < d.start) then
        stack.push(a)
        let a be the next element in A
    else
        output d as a descendant of all elements in the stack
        let d be the next element in D
    endif
endwhile
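The stack-based loop above can be sketched in runnable form as follows. This is an illustrative reading of the pseudocode, not the authors' code: elements are assumed to be (start, end) tuples, both lists are assumed sorted by start and properly nested, and the loop stops once D is exhausted because no further output is possible. The name `stack_tree_join` and the sample D list are hypothetical; the A intervals are the ones used in the containment forest example later in the talk.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end)


def stack_tree_join(A: List[Interval], D: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return all (ancestor, descendant) pairs; A and D must be sorted by start."""
    out: List[Tuple[Interval, Interval]] = []
    stack: List[Interval] = []   # chain of nested A elements still "open"
    i = j = 0
    INF = float("inf")           # sentinel start once A is exhausted
    while j < len(D) and (i < len(A) or stack):
        a_start = A[i][0] if i < len(A) else INF
        d_start = D[j][0]
        if stack and a_start > stack[-1][1] and d_start > stack[-1][1]:
            stack.pop()                       # top ends before both cursors
        elif a_start < d_start:
            stack.append(A[i])                # a is nested in the current top
            i += 1
        else:
            for anc in stack:                 # every stacked element contains d
                out.append((anc, D[j]))
            j += 1
    return out


A = [(10, 500), (150, 250), (300, 400), (800, 900),
     (830, 860), (1400, 2000), (1530, 1560), (1700, 1800)]
D = [(160, 170), (600, 610), (840, 850), (1900, 1910)]
pairs = stack_tree_join(A, D)
for anc, desc in pairs:
    print(anc, "contains", desc)
```

Each element of A and D is visited once, which is why the non-indexed algorithm is linear in the input plus output size but cannot avoid reading elements that never join.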

Example
[Figure: example input with ancestor elements a1 to a4 and descendant elements d1 to d3]

Structural Joins using B+-trees
Existing structural join algorithms sequentially scan the input lists.
Durable numbering schemes have enabled indexing of XML files with mainstream indices.
Such indices can result in sub-linear access time, as they provide the facility to skip elements that don't participate in the join.

Motivation for using the B+-tree index (1)
[Figure: ancestor elements a1 to a12 and descendant elements d1, d2]

Motivation for using the B+-tree index (2)
[Figure: ancestor elements a1, a2 and descendant elements d1 to d14]

Structural Joins using B+-trees
Put pointers a and d at the beginning of lists A and D
while (not at the end of A or D) do
    if (a is an ancestor of d) then
        push into the stack all elements in A that are ancestors of d
        join d with all elements in the stack and let d = d->next
    else if (a.start < d.start) then    // jump ancestor A
        pop all elements in the stack which are before d
        move a forward by skipping the sub-trees of the last element popped
    else    // a is after d; jump descendant D
        join d with all elements in the stack
        move d forward by skipping all D elements with start < a.start
    endif
endwhile
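The skipping idea can be illustrated with a simplified sketch. It is not the authors' B+-tree algorithm: binary search (`bisect`) over sorted Python lists stands in for a B+-tree range probe, the control flow is a simplified variant that keeps the two skip rules (jump over the subtree of an A element that ends before d, and jump over D elements that start before the next A element when the stack is empty), and the function name `skip_join` and sample data are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end)


def skip_join(A: List[Interval], D: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Structural join with skipping; A and D sorted by start, properly nested."""
    a_starts = [a[0] for a in A]   # stand-in for a B+-tree keyed on start
    d_starts = [d[0] for d in D]
    out: List[Tuple[Interval, Interval]] = []
    stack: List[Interval] = []
    i = j = 0
    while j < len(D):
        d = D[j]
        while stack and stack[-1][1] < d[0]:
            stack.pop()                       # drop ancestors that end before d
        if i < len(A) and A[i][0] < d[0]:
            a = A[i]
            if a[1] > d[0]:
                stack.append(a)               # a contains d
                i += 1
            else:
                # a ends before d: skip a and every A element inside it
                i = bisect_right(a_starts, a[1], i + 1)
        elif stack:
            out.extend((anc, d) for anc in stack)
            j += 1
        elif i == len(A):
            break                             # no ancestors left at all
        else:
            # d has no ancestor: skip all D elements starting before A[i]
            j = bisect_right(d_starts, A[i][0], j)
    return out


A = [(10, 500), (150, 250), (300, 400), (800, 900),
     (830, 860), (1400, 2000), (1530, 1560), (1700, 1800)]
D = [(160, 170), (600, 610), (840, 850), (1900, 1910)]
skip_pairs = skip_join(A, D)
print(skip_pairs)
```

On this input the sketch never touches (830, 860) or the inside of a skipped subtree more than the index probe requires, which is the behaviour the motivation slides argue for; with a real B+-tree each `bisect_right` call would be a root-to-leaf search.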

Containment forest
- A structure linking elements that belong to the same tag.
- Each element corresponds to a node in the structure and is linked to other elements via parent, first-child and right-sibling pointers.
- Can be embedded within the associated B+-tree.
- Improves CPU time.

Containment forest example
[Figure: forest over the A elements (10,500), (150,250), (300,400), (800,900), (830,860), (1400,2000), (1530,1560), (1700,1800)]
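As a rough illustration of the structure, the sketch below builds a containment forest with parent, first-child and right-sibling pointers from the eight A intervals of the example. The `Node` class, `build_forest` and `children` are hypothetical names, and the forest is built in memory from a sorted list; the talk describes embedding the pointers in the B+-tree and maintaining them dynamically, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    start: int
    end: int
    parent: Optional["Node"] = None
    first_child: Optional["Node"] = None
    right_sibling: Optional["Node"] = None


def build_forest(intervals: List[Tuple[int, int]]) -> List[Node]:
    """Link properly nested intervals into a forest; returns the root nodes."""
    roots: List[Node] = []
    stack: List[Node] = []                    # currently open ancestors
    last_child: Dict[int, Node] = {}          # id(parent) -> most recent child
    for start, end in sorted(intervals):
        node = Node(start, end)
        while stack and stack[-1].end < start:
            stack.pop()                       # closed before this node starts
        parent = stack[-1] if stack else None
        node.parent = parent
        prev = last_child.get(id(parent))
        if prev is not None:
            prev.right_sibling = node         # append to the sibling chain
        elif parent is not None:
            parent.first_child = node
        last_child[id(parent)] = node
        if parent is None:
            roots.append(node)
        stack.append(node)
    return roots


def children(node: Node) -> List[Tuple[int, int]]:
    """Walk first_child / right_sibling pointers and list child intervals."""
    result = []
    child = node.first_child
    while child is not None:
        result.append((child.start, child.end))
        child = child.right_sibling
    return result


forest = build_forest([(150, 250), (10, 500), (800, 900), (1400, 2000),
                       (300, 400), (830, 860), (1530, 1560), (1700, 1800)])
for root in forest:
    print((root.start, root.end), "->", children(root))
```

With these pointers, skipping the subtree of an element amounts to following its right-sibling (or an ancestor's right-sibling) instead of scanning every enclosed element.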

Containment forest properties
- The (start, end) interval of each node contains all intervals in its subtree.
- The start numbers in the forest follow a preorder traversal.
- The start (end) numbers of sibling nodes are in increasing order.
- The containment forest can be dynamically maintained, with efficient algorithms for element insertion and deletion.

Structural Join using R-trees (1)
The interval (start, end) of an element e can be mapped to a point (e.start, e.end) in 2-D space, which is then indexed by an R-tree.
An R-tree can also be used to index the element (start, end) ranges as 1-D intervals.
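Under the point mapping, finding descendants or ancestors becomes a 2-D window query. The sketch below states the two windows as predicates and evaluates them with a linear scan, which is only a stand-in: an R-tree would answer the same windows by descending into the pages whose bounding rectangles overlap them. The function names and sample points are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (start, end) of an element, viewed as a 2-D point


def descendants_of(a: Point, points: List[Point]) -> List[Point]:
    """Points inside the window a.start < start and end < a.end."""
    return [p for p in points if a[0] < p[0] and p[1] < a[1]]


def ancestors_of(d: Point, points: List[Point]) -> List[Point]:
    """Points inside the window start < d.start and end > d.end."""
    return [p for p in points if p[0] < d[0] and d[1] < p[1]]


points = [(10, 500), (150, 250), (300, 400), (800, 900),
          (830, 860), (1400, 2000), (1530, 1560), (1700, 1800)]
print(descendants_of((10, 500), points))
print(ancestors_of((840, 850), points))
```

The descendant window is a bounded rectangle, while the ancestor window is open toward small start and large end values, so the two query directions can touch quite different numbers of index pages.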

Structural Join using R-trees (2)
[Figure; surviving labels: "two points", "two pages"]

Problem Variations
- Self joins: a non-indexed algorithm that traverses the element list exactly once.
- Structural join in a pipelining environment: feedback between modules can help to skip elements that don't take part in the join.

Performance Analysis (1)
Effect of skipping only ancestors on join performance
[Table: columns Join Ancestors, no-index, B+, B+psp, B+sp, R*, R*2; rows by percentage of joining ancestors, starting at 90%; cell values not preserved]

Performance Analysis (2)
Effect of skipping only descendants on join performance

Performance Analysis (3)
Effect of skipping both ancestors and descendants

Performance Analysis (4)
Comparison of B+-tree and B+psp algorithms

Conclusions
- We presented efficient ways to perform structural joins over XML data utilizing existing indices.
- Experimental results showed that, among the indexed approaches, the B+-tree with sibling pointers performs the best.
- It is an easily maintainable solution that provided a drastic improvement over the no-index case.