1 CS 561 Presentation: Indexing and Querying XML Data for Regular Path Expressions A Paper by Quanzhong Li and Bongki Moon Presented by Ming Li.



2 Our Objective Developing a system that will enable us to perform XML data queries efficiently.

3 XML Query Languages Used for retrieving data from XML files. They use a regular path expression syntax, e.g. XPath, XQuery.

4 Queries Today - Inefficient Queries are usually evaluated by XML tree traversals, which is inefficient: –Top-Down Approach –Bottom-Up Approach –An example: the query /chapter/_*/figure (finding all figures in all chapters).

5 Our Objective - Refined Developing a system that will enable us to perform XML data queries efficiently. Developing such a system consists of: –Developing a way to efficiently store XML data. –Developing efficient algorithms for processing regular path expressions (e.g. XQuery expressions).

6 Storing XML Documents - XISS XISS - XML Indexing and Storage System. Provides us with ways to: –efficiently find all elements or attributes with the same name string, grouped by the document to which they belong. –quickly determine the ancestor-descendant relationship between elements and/or attributes in the hierarchy of XML data.

7 Determining Ancestor-Descendant Relationship According to Dietz's numbering scheme: for two given nodes x and y of a tree T, x is an ancestor of y iff x occurs before y in the preorder traversal and after y in the postorder traversal.
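A minimal sketch of the Dietz test, assuming a small hand-built tree. The `Node` class and the names `number_tree` and `is_ancestor` are illustrative and do not come from the paper or the slides.

```python
class Node:
    """A tree node carrying its preorder and postorder ranks."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.pre = 0
        self.post = 0

def number_tree(root):
    """Assign preorder and postorder ranks (starting at 1) in one traversal."""
    counters = {"pre": 0, "post": 0}
    def visit(node):
        counters["pre"] += 1
        node.pre = counters["pre"]
        for child in node.children:
            visit(child)
        counters["post"] += 1
        node.post = counters["post"]
    visit(root)

def is_ancestor(x, y):
    """x is an ancestor of y iff x is earlier in preorder and later in postorder."""
    return x.pre < y.pre and x.post > y.post

# Hypothetical document: book -> (chapter -> (title, figure), appendix)
title, figure = Node("title"), Node("figure")
chapter = Node("chapter", [title, figure])
appendix = Node("appendix")
book = Node("book", [chapter, appendix])
number_tree(book)

print(is_ancestor(chapter, figure))   # True
print(is_ancestor(appendix, figure))  # False
```

The test itself is two integer comparisons, which is why it runs in constant time once the ranks are assigned.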

8 Determining Ancestor-Descendant Relationship – cont. Advantage: the ancestor-descendant relationship can be determined in constant time. Disadvantage: a lack of flexibility. –e.g. inserting a new node requires recomputing the numbers of many tree nodes.

9 Determining Ancestor-Descendant Relationship – cont. A new numbering scheme: –Each node is associated with a pair <order, size>. –For a tree node y and its parent x: $[\mathrm{order}(y),\ \mathrm{order}(y) + \mathrm{size}(y)] \subseteq (\mathrm{order}(x),\ \mathrm{order}(x) + \mathrm{size}(x)]$, where the left end of the parent's interval is exclusive. –For two sibling nodes x and y, if x is the predecessor of y in preorder traversal, then $\mathrm{order}(x) + \mathrm{size}(x) < \mathrm{order}(y)$.

10 Determining Ancestor-Descendant Relationship – cont. Fact: for two given nodes x and y of a tree T, x is an ancestor of y iff: $\mathrm{order}(x) < \mathrm{order}(y) \le \mathrm{order}(x) + \mathrm{size}(x)$

11 Determining Ancestor-Descendant Relationship – cont. Properties: –the ancestor-descendant relationship can be determined in constant time. –flexibility – node insertion usually doesn't require renumbering other tree nodes. –an element can be uniquely identified in a document by its order value.
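The numbering scheme and its properties can be sketched as follows. This is an illustration under stated assumptions, not the XISS implementation: the fixed `slack` reserved at the end of each node's range and the helper names `assign`, `is_ancestor` and `insert_last_child` are hypothetical choices made for the example.

```python
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.order = 0
        self.size = 0

def assign(node, start, slack):
    """Number a subtree in preorder, leaving `slack` unused values at the
    end of every node's range. Returns the next free order value."""
    node.order = start
    nxt = start + 1
    for child in node.children:
        nxt = assign(child, nxt, slack)
    # Cover every descendant's range, plus the reserved slack (assumption).
    node.size = (nxt - 1 - start) + slack
    return node.order + node.size + 1

def is_ancestor(x, y):
    """order(x) < order(y) <= order(x) + size(x)"""
    return x.order < y.order <= x.order + x.size

def insert_last_child(parent, name):
    """Add a leaf as the last child using only the parent's unused range.
    Returns None when no slack is left and renumbering would be needed."""
    last_used = parent.order
    if parent.children:
        last = parent.children[-1]
        last_used = last.order + last.size
    if last_used + 1 > parent.order + parent.size:
        return None
    child = Node(name)
    child.order, child.size = last_used + 1, 0
    parent.children.append(child)
    return child

# Hypothetical document: book -> (chapter -> (title, figure), appendix)
title, figure = Node("title"), Node("figure")
chapter = Node("chapter", [title, figure])
appendix = Node("appendix")
book = Node("book", [chapter, appendix])
assign(book, start=1, slack=2)

table = insert_last_child(chapter, "table")  # fits in chapter's slack
print(is_ancestor(chapter, table), figure.order)
```

Inserting `table` changes no existing order or size value, which is the flexibility the slide refers to. Once the slack is exhausted, `insert_last_child` returns `None` and some renumbering becomes unavoidable.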

12 XISS System Overview

13 Name Index and Value Table Objective: minimizing the storage and computation overhead by eliminating replicated strings and string comparisons. Name Index - maps distinct name strings to unique name identifiers (nid). Value Table - maps distinct value strings (i.e. attribute values and text values) to unique value identifiers (vid). Both are implemented as B+-trees.
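The idea of replacing strings with identifiers can be sketched as below. This is only an illustration: a Python dict stands in for the B+-tree described on the slide, and the class name `IdTable` and its methods are hypothetical.

```python
class IdTable:
    """Maps each distinct string to a small integer id, and back.
    A dict is used here as a stand-in for the B+-tree (assumption)."""
    def __init__(self):
        self._ids = {}
        self._strings = []

    def intern(self, s):
        """Return the id for s, allocating a new one on first sight."""
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def lookup(self, ident):
        """Return the string stored under an id."""
        return self._strings[ident]

name_index = IdTable()   # name string  -> nid
value_table = IdTable()  # value string -> vid

nid_chapter = name_index.intern("chapter")
nid_figure = name_index.intern("figure")
vid_a1 = value_table.intern("A1")

# Later comparisons use integers instead of strings.
print(name_index.intern("chapter") == nid_chapter)
```

Each distinct string is stored once, and name or value equality reduces to an integer comparison.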

14 The Element Index Objective: quickly finding all elements with the same name string. Structure:

15 The Attribute Index Objective: quickly finding all attributes with the same name string. Structure: –Same structure as the Element Index, except that each record in the Attribute Index has a value identifier (vid), which is a key used to obtain the attribute value from the Value Table.

16 The Structure Index Objectives: –Finding the parent element and child elements (or attributes) for a given element. –Finding the parent element for a given attribute. Structure:

17 The Structure Index – cont. Structure: –B+-tree using the document identifier (did) as a key. –Leaf nodes: linear arrays with records for all elements and attributes from an XML document. –Each record: {nid, <order, size>, Parent order, Child order, Sibling order, Attribute order}. –Records are ordered by order value.
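A sketch of one leaf of the Structure Index, assuming that "Child order" holds the order of the first child and "Sibling order" the order of the next sibling; the slides do not spell this out, so both readings are assumptions. The names `StructureRecord` and `DocumentLeaf` are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import Optional

@dataclass
class StructureRecord:
    nid: int
    order: int
    size: int
    parent_order: Optional[int]
    child_order: Optional[int]      # assumed: order of the first child
    sibling_order: Optional[int]    # assumed: order of the next sibling
    attribute_order: Optional[int]  # assumed: order of the first attribute

class DocumentLeaf:
    """Records of one document, kept sorted by order value."""
    def __init__(self, records):
        self.records = sorted(records, key=lambda r: r.order)
        self.orders = [r.order for r in self.records]

    def find(self, order):
        """Binary search for the record with the given order value."""
        i = bisect.bisect_left(self.orders, order)
        if i < len(self.orders) and self.orders[i] == order:
            return self.records[i]
        return None

    def parent(self, order):
        rec = self.find(order)
        if rec is None or rec.parent_order is None:
            return None
        return self.find(rec.parent_order)

    def children(self, order):
        """Follow the first-child link, then the sibling chain."""
        rec = self.find(order)
        nxt = rec.child_order if rec else None
        result = []
        while nxt is not None:
            child = self.find(nxt)
            if child is None:
                break
            result.append(child)
            nxt = child.sibling_order
        return result

# Hypothetical document: book(1) -> (chapter(2) -> (title(3), figure(6)), appendix(11))
leaf = DocumentLeaf([
    StructureRecord(0, 1, 14, None, 2, None, None),   # book
    StructureRecord(1, 2, 8, 1, 3, 11, None),         # chapter
    StructureRecord(2, 3, 2, 2, None, 6, None),       # title
    StructureRecord(3, 6, 2, 2, None, None, None),    # figure
    StructureRecord(4, 11, 2, 1, None, None, None),   # appendix
])
print([r.order for r in leaf.children(1)])
```

Because the records are sorted by order value, any record can be located by binary search within the leaf array.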

18 Querying Method Decomposing path expressions into simple path expressions. Applying algorithms on simple path expressions and their intermediate results.

19 Decomposition of Path Expressions The main idea: –A complex path expression is decomposed into several simple path expressions. –Each simple path expression produces an intermediate result that can be used in the subsequent stage of processing. –The results of the simple path expressions are then combined or joined together to obtain the final result of the given query.

20 Basic Subexpressions - Example Decomposition of (E1/E2)*/E3/((E4 | (E5/_*/E6)) into basic subexpression types: (1) Single Element/Attribute (2) Element-Attribute (3) Element-Element (4) Kleene Closure (5) Union. [Figure: decomposition tree of the expression, with each operator node labeled by its subexpression type.]

21 Example: EA-Join: Element and Attribute Join

22 EA-Join: Element and Attribute Join Input: {E1, …, Em}: each Ei is a set of elements having a common document identifier (did); {A1, …, An}: each Aj is a set of attributes having a common document identifier (did). Output: A set of (e, a) pairs such that the element e is the parent of the attribute a.

23 EA-Join: Element and Attribute Join The Algorithm:
// Sort-merge {Ei} and {Aj} by did.
(1) foreach Ei and Aj with the same did do
    // Sort-merge Ei and Aj by PARENT-CHILD relationship.
    (2) foreach e ∈ Ei and a ∈ Aj do
        (3) if (e is a parent of a) then output (e, a)
end
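The algorithm above can be sketched as a two-pointer merge. This is a minimal illustration under several assumptions that are not in the slides: each attribute record carries its parent's order value (`parent_order`), both input lists are already sorted by order, attributes are numbered before their sibling elements, and a set intersection of document identifiers stands in for the first sort-merge by did. The names `Element`, `Attribute` and `ea_join` are hypothetical.

```python
from collections import namedtuple

Element = namedtuple("Element", ["order", "size"])
Attribute = namedtuple("Attribute", ["order", "parent_order", "vid"])

def ea_join(elements_by_did, attributes_by_did):
    """Return (did, element, attribute) triples where the element is the
    parent of the attribute. Inputs map did -> list sorted by order."""
    results = []
    # Stage 1: pair up inputs that share a did (stand-in for the merge by did).
    common = sorted(elements_by_did.keys() & attributes_by_did.keys())
    for did in common:
        elems = elements_by_did[did]
        attrs = attributes_by_did[did]
        i = j = 0
        # Stage 2: single scan over both lists, driven by order values.
        while i < len(elems) and j < len(attrs):
            e, a = elems[i], attrs[j]
            if a.parent_order == e.order:
                results.append((did, e, a))
                j += 1        # the same element may own further attributes
            elif a.parent_order < e.order:
                j += 1        # parent of this attribute is not a candidate
            else:
                i += 1        # this element has no more attributes
    return results

# Hypothetical inputs: "Ele" elements and "Att" attributes in three documents.
elements = {1: [Element(2, 3), Element(6, 3), Element(10, 3)],
            2: [Element(1, 5)]}
attributes = {1: [Attribute(3, 2, 0), Attribute(11, 10, 1)],
              3: [Attribute(2, 1, 0)]}

pairs = ea_join(elements, attributes)
print(len(pairs))  # 2
```

With attributes numbered before their sibling elements, the parent order values seen while scanning the attribute list never decrease, so neither pointer has to move backwards.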

24 EA-Join – Example Consider the XML document: And the query: Ele Att

25 EA-Join – Querying Sort-merging the "Ele"s and "Att"s by parent-child relationship gives us a list of (element, attribute) pairs. Finding the "Ele" elements with a child attribute "Att" whose value is "A1" from the resulting list is easy using the information in the Element Record.

26 EA-Join – Comments Only a two-stage sort-merge operation, without additional cost of sorting: –First merge: by did. –Second merge: by examining the parent-child relationship. This merge is based on the order values of the element and attribute as defined by the numbering scheme. Attributes should be placed before their sibling elements in the order of the numbering scheme. –This guarantees that elements and attributes with the same did can be merged in a single scan.

27 Conclusions XISS can efficiently process regular path expression queries. It improves performance over conventional methods by up to an order of magnitude. Future work: optimal page size or the break-even point between the two criteria.

28 Thank you so much!