Indexing and Querying XML Data for Regular Path Expressions Quanzhong Li and Bongki Moon Dept. of Computer Science University of Arizona VLDB 2001.


Querying XML
- XML has a tree-structured data model.
- Queries navigate the data using regular path expressions (e.g., XPath), for example /chapter/_*/figure[@caption="Tree Frogs"].
- Evaluating such queries requires accessing all elements with the same name string and determining the ancestor-descendant relationship between elements.

Contribution
- A new system for indexing and querying XML data, based on a numbering scheme for elements.
- Join algorithms for processing complex regular path expressions.

Outline
- Numbering scheme
- Index structure
- Join algorithms
- Experimental results

Path expression evaluation
Previous approach: conventional tree traversals.
- Disadvantage: overhead of traversal when path lengths are long or unknown.
New approach:
- Indexing for efficient element access.
- A numbering scheme for the ancestor-descendant relationship.

Dietz's Numbering Scheme
For two given nodes x and y of a tree T, x is an ancestor of y if and only if x occurs before y in the preorder traversal of T and after y in the postorder traversal.
[Figure: example tree with nodes labeled (preorder, postorder): (1,7) (2,4) (3,1) (4,2) (5,3) (6,6) (7,5)]
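
The rule above can be checked with a short sketch. The node names a to g and the tree shape are assumptions, chosen only because they reproduce the (preorder, postorder) labels listed on the slide; the original figure is not available here.

```python
from itertools import count


def dietz_labels(tree, root):
    """Return {node: (preorder, postorder)} for a tree given as {node: [children]}."""
    pre, post = {}, {}
    pre_counter, post_counter = count(1), count(1)

    def visit(node):
        pre[node] = next(pre_counter)       # numbered on the way down
        for child in tree.get(node, []):
            visit(child)
        post[node] = next(post_counter)     # numbered on the way back up

    visit(root)
    return {node: (pre[node], post[node]) for node in pre}


def is_ancestor(labels, x, y):
    """Dietz's test: x precedes y in preorder and follows y in postorder."""
    (pre_x, post_x), (pre_y, post_y) = labels[x], labels[y]
    return pre_x < pre_y and post_x > post_y


# Hypothetical tree shape (an assumption) consistent with the slide's labels.
tree = {"a": ["b", "f"], "b": ["c", "d", "e"], "f": ["g"]}
labels = dietz_labels(tree, "a")
print(labels)
print(is_ancestor(labels, "a", "g"), is_ancestor(labels, "b", "g"))
```

Because every label is a dense traversal position, inserting a node shifts the numbers of many other nodes, which is the update cost the proposed scheme is meant to reduce.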

Proposed numbering scheme
Each node is associated with a pair of numbers (order, size) as follows:
- For a tree node y and its parent x, order(x) < order(y) and order(y) + size(y) <= order(x) + size(x).
- For two sibling nodes x and y, if x is the predecessor of y in preorder traversal, then order(x) + size(x) < order(y).
[Figure: example tree with nodes labeled (order, size): (1,100) (10,30) (11,5) (17,5) (25,5) (41,10) (45,5)]
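
One way to produce labels that satisfy these conditions is sketched below. The function names, the `slack` parameter, and the tree shape are assumptions for illustration; the slides do not say how the system chooses order and size values, only which inequalities they must satisfy.

```python
def assign_labels(tree, root, slack=2):
    """Return {node: (order, size)}; `slack` spare numbers are reserved in each subtree."""
    labels = {}

    def visit(node, order):
        next_free = order + 1                  # children start strictly after the parent
        for child in tree.get(node, []):
            child_size = visit(child, next_free)
            next_free += child_size + 1        # next sibling starts after order + size
        size = (next_free - 1 - order) + slack  # cover all children, plus reserved room
        labels[node] = (order, size)
        return size

    visit(root, 1)
    return labels


def satisfies_scheme(tree, labels):
    """Check the parent-child and sibling conditions for every node."""
    for parent, children in tree.items():
        p_order, p_size = labels[parent]
        for child in children:
            c_order, c_size = labels[child]
            if not (p_order < c_order and c_order + c_size <= p_order + p_size):
                return False
        for left, right in zip(children, children[1:]):
            if not (labels[left][0] + labels[left][1] < labels[right][0]):
                return False
    return True


# Hypothetical tree shape (an assumption); node names are illustrative only.
tree = {"a": ["b", "f"], "b": ["c", "d", "e"], "f": ["g"]}
labels = assign_labels(tree, "a", slack=2)

# The (order, size) labels from the slide, mapped onto the same assumed shape.
slide_labels = {"a": (1, 100), "b": (10, 30), "c": (11, 5), "d": (17, 5),
                "e": (25, 5), "f": (41, 10), "g": (45, 5)}
print(satisfies_scheme(tree, labels), satisfies_scheme(tree, slide_labels))
```

A larger `slack` leaves more unused numbers inside each subtree, trading label space for fewer relabelings on later insertions.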

Advantages
- Efficient updates: extra space can be reserved to accommodate future insertions.

Ancestor-descendant relationship
For two given nodes x and y of a tree T, x is an ancestor of y if and only if order(x) < order(y) <= order(x) + size(x).
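
A minimal sketch of this test, using the (order, size) pairs from the numbering-scheme slide. The inserted node (42, 2) is a hypothetical addition, included to illustrate how reserved space lets a new node be labeled without touching existing labels.

```python
def is_ancestor(x, y):
    """x and y are (order, size) pairs; True if x is an ancestor of y."""
    order_x, size_x = x
    order_y, _ = y
    return order_x < order_y <= order_x + size_x


root = (1, 100)
left = (10, 30)
right = (41, 10)
leaf = (45, 5)

print(is_ancestor(root, leaf))    # root's range 1..101 contains 45
print(is_ancestor(right, leaf))   # 41 < 45 <= 51
print(is_ancestor(left, leaf))    # 45 lies outside 10..40

# Hypothetical insertion: a new child of `right` placed in the unused gap 42..44.
new_node = (42, 2)
print(is_ancestor(right, new_node))
```

The test needs only two comparisons per pair of nodes and no tree traversal.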

Index and Data Organization
[Figure: XISS architecture. Labeled components: XML raw data, document loader, element index, attribute index, structure index, name index, value table, paged file, query processor. A query enters XISS and a result is returned.]

Element Index
[Figure labels: a B+-tree keyed by element nid; a document ID list; for each document, the list of elements with the same name in the same document (B+-tree); an element record holding depth and parent ID.]

Structure Index
[Figure labels: a B+-tree keyed by document ID (did); for each document, an array of all elements and attributes in the same document; record fields shown include nid, parent order, child order, sibling order, attribute order.]

Regular Path expression
Complex regular path expressions, e.g., /chapter/_*/figure[@caption="Tree Frogs"].
Symbol : Function of symbol
_ : any single node
/ : separator between nodes in a path
| : union of nodes
* : zero or more occurrences of a node
@ : attributes

Regular expression Decomposition
A regular path expression can be decomposed into a combination of the following basic subexpressions:
1. A subexpression with a single element or a single attribute.
2. A subexpression with an element and an attribute (e.g., figure[@caption = "Tree Frogs"]).
3. A subexpression with two elements (e.g., chapter/figure or chapter/_*/figure).
4. A subexpression that is a Kleene closure (+, *) of another subexpression.
5. A subexpression that is a union of two other subexpressions.

Example
(E1/E2)*/E3/((E4[@A = v]) | (E5/_*/E6))
[Figure: decomposition tree of the expression above. Its operators are labeled EE-Join, KC-Join, EA-Join, and Union.]

Join algorithms
- Element-Attribute join (EA-Join)
- Element-Element join (EE-Join)
- Kleene-Closure join (KC-Join)

EA-Join Algorithm
Input: {E1..Em}: each Ei is a set of elements having a common document identifier;
       {A1..An}: each Aj is a set of attributes having a common document identifier.
Output: A set of (e, a) pairs such that the element e is the parent of the attribute a.
  // Sort-merge {Ei} and {Aj} by document identifier.
  For each Ei and Aj with the same did do
    // Sort-merge Ei and Aj by PARENT-CHILD relationship.
    For each e in Ei and a in Aj do
      If (e is a parent of a) then output (e, a);
    End
  End
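
The pseudocode above can be sketched as follows. The record layouts (`Element`, `Attribute`) and the `parent_order` field are assumptions made for illustration; the slides do not specify how the parent of an attribute is stored. Grouping by `did` with a dictionary stands in for the outer sort-merge on document identifiers.

```python
from collections import defaultdict
from typing import NamedTuple


class Element(NamedTuple):
    did: int      # document identifier
    order: int
    size: int


class Attribute(NamedTuple):
    did: int
    order: int
    parent_order: int   # assumed field: order of the owning element


def ea_join(elements, attributes):
    """Return (element, attribute) pairs where the element is the attribute's parent."""
    elems_by_doc, attrs_by_doc = defaultdict(list), defaultdict(list)
    for e in elements:
        elems_by_doc[e.did].append(e)
    for a in attributes:
        attrs_by_doc[a.did].append(a)

    result = []
    for did in sorted(elems_by_doc.keys() & attrs_by_doc.keys()):
        elems = sorted(elems_by_doc[did], key=lambda e: e.order)
        attrs = sorted(attrs_by_doc[did], key=lambda a: a.parent_order)
        i = 0
        for a in attrs:
            # Merge step: skip elements that precede this attribute's parent.
            while i < len(elems) and elems[i].order < a.parent_order:
                i += 1
            if i < len(elems) and elems[i].order == a.parent_order:
                result.append((elems[i], a))
    return result


elements = [Element(1, 10, 30), Element(1, 41, 10), Element(2, 10, 5)]
attributes = [Attribute(1, 11, 10), Attribute(1, 42, 41),
              Attribute(2, 3, 2), Attribute(1, 12, 10)]
pairs = ea_join(elements, attributes)
print(pairs)
```

Both inputs are scanned once per document after sorting, since the element pointer never moves backwards.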

Example
[Figure: element tree with nodes labeled book, chapter, appendix, and figure]

Attribute-element position
[Figure: three chapter elements, each shown with a name attribute]

EE-Join Algorithm
Input: {E1..Em} and {F1..Fn}: each Ei and each Fj is a set of elements having a common document identifier.
Output: A set of (e, f) pairs such that the element e is an ancestor of the element f.
  // Sort-merge {Ei} and {Fj} by document identifier.
  For each Ei and Fj with the same did do
    // Sort-merge Ei and Fj by ANCESTOR-DESCENDANT relationship.
    For each e in Ei and f in Fj do
      If (e is an ancestor of f) then output (e, f);
    End
  End
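
A sketch of the same idea using (order, size) labels. The `Node` record and the dictionary-based grouping by `did` are assumptions for illustration, not the system's actual storage layout. Within a document, the descendants of e form a contiguous run in order-sorted input, so the inner loop scans only that run.

```python
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    did: int      # document identifier
    order: int
    size: int


def ee_join(ancestors, descendants):
    """Return (e, f) pairs with e an ancestor of f in the same document."""
    e_by_doc, f_by_doc = defaultdict(list), defaultdict(list)
    for e in ancestors:
        e_by_doc[e.did].append(e)
    for f in descendants:
        f_by_doc[f.did].append(f)

    result = []
    for did in sorted(e_by_doc.keys() & f_by_doc.keys()):
        es = sorted(e_by_doc[did], key=lambda n: n.order)
        fs = sorted(f_by_doc[did], key=lambda n: n.order)
        start = 0
        for e in es:
            # Skip candidates that come at or before e; `start` only moves forward.
            while start < len(fs) and fs[start].order <= e.order:
                start += 1
            j = start
            # Every f with order(e) < order(f) <= order(e) + size(e) is a descendant.
            while j < len(fs) and fs[j].order <= e.order + e.size:
                result.append((e, fs[j]))
                j += 1
    return result


chapters = [Node(1, 10, 30), Node(1, 41, 10)]
figures = [Node(1, 11, 5), Node(1, 17, 5), Node(1, 45, 5), Node(2, 12, 1)]
pairs = ee_join(chapters, figures)
print(pairs)
```

When ancestor candidates are nested inside one another, the same descendants are rescanned for each enclosing ancestor, so the output (and the work) can grow with the nesting depth.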

Extreme case of EE-Join
[Figure: tree of chapter and figure elements]

KC-Join Algorithm
Input: {E1..Em}: where each Ei is a group of elements from an XML document.
Output: A Kleene closure of {E1..Em}.
  // Apply the EE-Join algorithm repeatedly.
  Set i = 1;
  Set K1 = {E1..Em};
  Repeat
    Set i = i + 1;
    Set Ki = EE-Join(Ki-1, K1);
  Until (Ki is empty);
  Output union of K1, K2, .., Ki-1.
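
The loop can be sketched as below for a single document. Several choices here are assumptions: each Ki is represented as a list of chains (tuples of (order, size) labels), the join step extends a chain by any element that is a descendant of the chain's last element, and a nested-loop join is used in place of the sort-merge EE-Join for brevity.

```python
def is_ancestor(x, y):
    """x and y are (order, size) pairs; True if x is an ancestor of y."""
    return x[0] < y[0] <= x[0] + x[1]


def extend_chains(chains, elements):
    """Join step: append every element that descends from a chain's last element."""
    return [chain + (f,)
            for chain in chains
            for f in elements
            if is_ancestor(chain[-1], f)]


def kc_join(elements):
    """Return all chains e1, e2, .., ek (k >= 1) in which each element is an ancestor of the next."""
    levels = [[(e,) for e in elements]]      # K1: chains of length one
    while True:
        next_level = extend_chains(levels[-1], elements)   # Ki from Ki-1 and K1
        if not next_level:                   # stop once Ki is empty
            break
        levels.append(next_level)
    return [chain for level in levels for chain in level]  # union of K1..Ki-1


elements = [(1, 100), (10, 30), (11, 5)]
closure = kc_join(elements)
print(len(closure))
```

The loop terminates because each iteration lengthens every chain by one strictly deeper element, and tree depth is finite.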

Experiment Results
- Comparison with top-down and bottom-up evaluation methods.
- Comparison for EE-Join (E1/_*/E2) and EA-Join.
- Scalability test.

EE-Join performance

EA-Join performance

Results
- The EE-Join algorithm outperformed the bottom-up method.
- The EA-Join algorithm was comparable with the top-down method and outperformed the bottom-up method.
- Both algorithms scaled linearly.