QUANZHONG LI BONGKI MOON Indexing & Querying XML Data for../Regular Path Expressions/* SUNDAR SUPRIYA.

Need for this paper
- XML has emerged as a popular standard for data representation and data exchange on the Internet.
- XML query languages use regular path expressions to query the data.
- Conventional approaches to indexing and searching this data are based on tree traversals, which break down under heavy access requests.
- Traversing the hierarchy of XML data becomes an overhead if the path lengths are long or unknown.
What can be done?

Try our System and the Algorithms!
New system for indexing and storing XML data: XISS
- New numbering scheme for elements and attributes: quick at figuring out the ancestor-descendant relationship
- New index structures: easier to find all elements and attributes with a given name string
Join algorithms for processing regular path expression queries
- EE-Join: to search paths from element to element
- EA-Join: to find element-attribute pairs
- KC-Join: to find the Kleene closure (*) of repeated paths or elements

Go XISS!
In general, XML data can be queried by value or by structure.
- By value: get me "document"; get me "element='node1'" or "attribute=10"
- By structure: get me the parent and child elements/attributes of a given element
Components:
- Index structure: element index, attribute index and structure index
- Data loader
- Query processor
Numbering scheme first...

Dietz vs. Li-Moon...
Dietz says: "If x and y are nodes of a tree T, x is an ancestor of y iff x comes before y when I climb down the tree (preorder) and after y when I climb up (postorder)."
- His scheme determines the ancestor-descendant relationship in constant time.
Li and Moon say: "But this lacks flexibility."
- It leads to many re-computations when a new node is inserted.
Hmm... let us check out Li-Moon's scheme.
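To make the preorder/postorder test concrete, here is a minimal Python sketch. The `Node` class and the function names are illustrative assumptions, not taken from the paper or from XISS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical in-memory tree node used only for this illustration.
    name: str
    children: list = field(default_factory=list)
    pre: int = 0
    post: int = 0


def number_tree(root):
    """Assign preorder and postorder ranks in one depth-first walk."""
    counters = {"pre": 0, "post": 0}

    def visit(node):
        counters["pre"] += 1
        node.pre = counters["pre"]      # rank when first reached (going down)
        for child in node.children:
            visit(child)
        counters["post"] += 1
        node.post = counters["post"]    # rank when left for good (going up)

    visit(root)


def is_ancestor(x, y):
    """x is an ancestor of y iff x is earlier in preorder and later in postorder."""
    return x.pre < y.pre and x.post > y.post


figure = Node("figure")
section = Node("section", [figure])
title = Node("title")
chapter = Node("chapter", [title, section])
number_tree(chapter)
print(is_ancestor(chapter, figure))  # True
print(is_ancestor(title, figure))    # False
```

The test itself is two comparisons, but inserting a node shifts the ranks of every node that follows it in either traversal, which is the inflexibility the slide refers to.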

Li-Moon's Numbering...
Hey folks, we are going to extend this preorder numbering to cover a range of descendants.
- Just associate a pair of numbers <order, size> with each node.
- Parent node x says to its child node y: "I came before you, so my order is less than yours, and my order + my size >= your order + your size, so your interval is always contained in my interval."
- If x and y are siblings (same parent) and x comes before y, then order(x) + size(x) < order(y).
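One possible way to hand out such pairs is sketched below in Python. The tree encoding as `(name, children)` tuples, the `slack` parameter and the assumption that element names are unique are all choices made for this illustration; the paper does not prescribe how much space to reserve.

```python
def number(tree, order=1, slack=2, labels=None):
    """Label a (name, children) tree with (order, size) pairs.

    Returns (labels, size) where labels maps name -> (order, size).
    `slack` unused numbers are reserved after every child and at the
    end of every node's interval (an assumption of this sketch).
    """
    if labels is None:
        labels = {}
    name, children = tree
    next_order = order + 1
    for child in children:
        _, child_size = number(child, next_order, slack, labels)
        # Skip the child's own interval, then leave a gap for insertions.
        next_order += child_size + 1 + slack
    size = (next_order - 1 - order) + slack
    labels[name] = (order, size)
    return labels, size


book = ("book", [("chapter", [("title", []), ("figure", [])]),
                 ("appendix", [])])
labels, _ = number(book)
print(labels["chapter"])   # (2, 12)
print(labels["figure"])    # (8, 2)
print(labels["appendix"])  # (17, 2)
```

Here the chapter interval [2, 14] contains the figure interval [8, 10], and chapter ends before its sibling appendix starts at 17, so both rules on the slide hold.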

Voila! Here it goes:
- For any node x, size(x) >= the sum of the sizes of all its direct children [size(x) can be laaarge!]
- That being said: "Given nodes x and y of a tree T, x is an ancestor of y iff order(x) < order(y) <= order(x) + size(x)."
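The ancestor test above translates directly into a two-comparison check. In this Python sketch the `Label` type and the concrete numbers are made up for illustration.

```python
from typing import NamedTuple


class Label(NamedTuple):
    # Hypothetical container for the <order, size> pair of one node.
    order: int
    size: int


def is_ancestor(x, y):
    """x is an ancestor of y iff order(x) < order(y) <= order(x) + size(x)."""
    return x.order < y.order <= x.order + x.size


chapter = Label(order=2, size=12)    # covers the interval [2, 14]
figure = Label(order=8, size=2)      # inside the chapter interval
appendix = Label(order=17, size=2)   # a later sibling of chapter

print(is_ancestor(chapter, figure))    # True
print(is_ancestor(chapter, appendix))  # False
```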

Good news!
- Future insertions are accommodated easily: more flexible.
- Global reordering is not necessary until the reserved space runs out.
- The order in the pair is a unique identifier for each element and attribute in the document.
- Attribute nodes are placed before their sibling elements in the order. Why?
- How does this scheme help? Wait for the algorithms!
Switching back to XISS...
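A small Python sketch of how reserved space can absorb an insertion without touching existing labels. The function `allocate_child` and its append-as-last-child policy are assumptions of this illustration, not the paper's procedure.

```python
def allocate_child(parent, children, size=0):
    """Pick an (order, size) label for a new last child of `parent`.

    `parent` and each entry of `children` are (order, size) pairs.
    Returns None when the parent's reserved space is exhausted, which
    is the point at which renumbering would become necessary.
    """
    parent_order, parent_size = parent
    # End of the last occupied interval; the parent's own order if no children.
    last_end = max((o + s for o, s in children), default=parent_order)
    order = last_end + 1
    if order + size <= parent_order + parent_size:
        return (order, size)
    return None


chapter = (2, 12)            # interval [2, 14]
kids = [(3, 2), (8, 2)]      # title and figure, ending at 5 and 10
print(allocate_child(chapter, kids))          # (11, 0)
print(allocate_child(chapter, kids, size=3))  # (11, 3)
print(allocate_child(chapter, kids, size=4))  # None
```

No existing label changes in the first two calls; only the third would force a reordering.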

Internals of XISS Index Structure Overview

More structures… Element Index Structure Index

Path Join Algorithms
Conventional approaches (top-down, bottom-up and hybrid traversals) are not effective.
Main idea of the proposed algorithm, for a given query "chapter/-*/figure":
- find all 'chapter' elements
- find all 'figure' elements
- join the qualified 'chapter-figure' pairs without traversing the XML data trees (possible if the ancestor-descendant relationship is obtained quickly)
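The three steps can be sketched end to end in Python with the standard `xml.etree.ElementTree` parser. The in-memory dictionary stands in for the element name index, the slack-based numbering is one arbitrary choice, and attributes are ignored; none of this reflects the on-disk layout of XISS.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_name_index(xml_text, slack=2):
    """Map each element name to a sorted list of (order, size) labels."""
    index = defaultdict(list)

    def walk(elem, order):
        next_order = order + 1
        for child in elem:
            next_order += walk(child, next_order) + 1 + slack
        size = (next_order - 1 - order) + slack
        index[elem.tag].append((order, size))
        return size

    walk(ET.fromstring(xml_text), 1)
    for labels in index.values():
        labels.sort()
    return index


def descendant_pairs(index, ancestor_name, descendant_name):
    """Pair labels by interval containment; the tree itself is never walked."""
    return [(a, d)
            for a in index.get(ancestor_name, [])
            for d in index.get(descendant_name, [])
            if a[0] < d[0] <= a[0] + a[1]]


doc = ("<book><chapter><section><figure/></section></chapter>"
       "<chapter><title/></chapter><figure/></book>")
index = build_name_index(doc)
print(descendant_pairs(index, "chapter", "figure"))  # [((2, 12), (4, 2))]
```

Only the first chapter qualifies: the second chapter has no figure, and the last figure is a direct child of book.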

Complex -> Simple
- A complex path expression is decomposed into several simple path expressions.
- Intermediate results are joined to get the final result.
- Different types of sub-expressions

EA-Join Algorithm
Joins intermediate results from sub-expressions: a list of elements with a list of attributes.
E.g.
Attributes should be placed before sibling elements in the order by the numbering scheme.

EA-Join Algorithm
Input: a list of "figure" elements and a list of "caption" attributes, grouped by document
Steps (2 stages):
- Element sets and attribute sets are merged by document ID (single scan)
- Elements and attributes are merged by determining the parent-child relationship from their order values (single scan)
Output: a set of (e, a) pairs where e is the parent of a
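A rough Python sketch of a single-scan element-attribute merge follows. The `Rec` layout, and in particular the `depth` field used to tell a parent from a more distant ancestor, are assumptions of this illustration; both stages from the slide are folded into one pass over lists sorted by (document, order).

```python
from typing import NamedTuple


class Rec(NamedTuple):
    # Hypothetical index record: document id, <order, size>, tree depth.
    doc: int
    order: int
    size: int
    depth: int


def ea_join(elements, attributes):
    """Return (element, attribute) pairs where the element is the parent."""
    elements = sorted(elements, key=lambda r: (r.doc, r.order))
    attributes = sorted(attributes, key=lambda r: (r.doc, r.order))
    pairs = []
    i = -1  # index of the last element that precedes the current attribute
    for a in attributes:
        while (i + 1 < len(elements)
               and (elements[i + 1].doc, elements[i + 1].order) < (a.doc, a.order)):
            i += 1
        if i < 0:
            continue
        e = elements[i]
        # Attributes are numbered before their sibling elements, so if the
        # parent is in the list it is the closest preceding element.
        if (e.doc == a.doc
                and e.order < a.order <= e.order + e.size
                and a.depth == e.depth + 1):
            pairs.append((e, a))
    return pairs


outer = Rec(doc=1, order=5, size=10, depth=2)
inner = Rec(doc=1, order=10, size=3, depth=3)
cap_outer = Rec(doc=1, order=6, size=0, depth=3)
cap_inner = Rec(doc=1, order=11, size=0, depth=4)
print(len(ea_join([outer, inner], [cap_outer, cap_inner])))  # 2
```

Because neither pointer ever moves backwards, each list is read once, which is why placing attributes before sibling elements matters.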

EE-Join Algorithm
Joins intermediate results, each of which is a list of elements from a sub-expression, e.g. "chapter/-*/figure".
Input: a list of "chapter" elements and a list of "figure" elements
Steps (2 stages, similar to the EA-Join algorithm):
- Both element sets are merged by document ID (single scan)
- Chapter elements and figure elements are merged by determining the ancestor-descendant relationship from their <order, size> values
Output: a set of (e, f) pairs where e is an ancestor of f

EE-Join Algorithm
- The second stage cannot be done in a single scan.
- In this example, a "figure" element can be a descendant of more than one "chapter" element (see book1.xml).
- order(figure) will then lie in more than one chapter interval [order(chapter), order(chapter) + size(chapter)].
- Even with multiple scans, this is still highly effective for searching long or unknown-length paths compared with conventional tree traversals.
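The rescanning behaviour can be seen in the following Python sketch. The `Rec` record and the restartable inner scan are assumptions made for illustration, not the paper's exact procedure.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    # Hypothetical index record: document id plus the <order, size> pair.
    doc: int
    order: int
    size: int


def ee_join(ancestors, descendants):
    """Return (a, d) pairs where a is an ancestor of d in the same document."""
    ancestors = sorted(ancestors, key=lambda r: (r.doc, r.order))
    descendants = sorted(descendants, key=lambda r: (r.doc, r.order))
    pairs = []
    start = 0  # first descendant that can still match the current ancestor
    for a in ancestors:
        # Drop descendants that come before this ancestor; they cannot
        # match it or any later ancestor.
        while (start < len(descendants)
               and (descendants[start].doc, descendants[start].order)
               <= (a.doc, a.order)):
            start += 1
        # Scan the ancestor's interval. Nested ancestors rescan part of it,
        # so this stage is not a single pass over the descendant list.
        j = start
        while (j < len(descendants)
               and descendants[j].doc == a.doc
               and descendants[j].order <= a.order + a.size):
            pairs.append((a, descendants[j]))
            j += 1
    return pairs


outer = Rec(doc=1, order=2, size=20)   # chapter
inner = Rec(doc=1, order=5, size=8)    # chapter nested in a chapter
fig = Rec(doc=1, order=7, size=0)      # figure inside both
print(len(ee_join([outer, inner], [fig])))  # 2
```

The figure at order 7 is reported twice, once per enclosing chapter, which is exactly the case the slide describes.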

KC-Join Algorithm
Processes a regular path expression with zero, one or more occurrences of a sub-expression, e.g. "chapter*", "chapter+".
Input: a set of elements from an XML document
Steps:
- Each stage applies the EE-Join algorithm to the previous stage's result
- Repeat until the result no longer changes
Output: the Kleene closure of all elements in the given input set
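The iterate-until-no-change idea might look like this in Python. Representing results as chains of `(order, size)` labels from a single document, and linking consecutive elements by the ancestor-descendant test, are simplifying assumptions of this sketch.

```python
def is_ancestor(x, y):
    """x, y are (order, size) pairs; containment test from the numbering scheme."""
    return x[0] < y[0] <= x[0] + x[1]


def kleene_closure(elements):
    """Return all chains (e1, ..., ek), k >= 1, each element an ancestor of the next.

    Stage 1 holds the single elements; every later stage joins the previous
    stage with the input set again, until a stage produces nothing new.
    """
    stage = [(e,) for e in elements]
    result = list(stage)
    while stage:
        stage = [chain + (e,)
                 for chain in stage
                 for e in elements
                 if is_ancestor(chain[-1], e)]
        result.extend(stage)
    return result


a, b, c, d = (1, 10), (2, 5), (3, 1), (9, 0)   # a contains b, c, d; b contains c
chains = kleene_closure([a, b, c, d])
print(len(chains))          # 9
print((a, b, c) in chains)  # True
```

The loop terminates because each join step moves strictly deeper into the tree, so chain length is bounded by the nesting depth.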

Experiments
- A prototype of XISS was implemented.
- Query interface: C++; XML parsing: Gnome XML parser; B+-tree: GiST C++ library
Workstation:
- Sun UltraSPARC-II running Solaris 2.7
- RAM: 256 MB; hard disk: 20 GB
Data sets:
- Shakespeare's Plays
- SIGMOD Record
- NITF100 and NITF1

Performance Comparison
EE-Join query:
- Outperformed the bottom-up method by a wide margin
- Real-world data sets: an order of magnitude faster
- Synthetic data sets: 6 to 10 times faster
- Disk I/O was the dominant cost factor: 60% to 90% of total elapsed time
EA-Join query:
- Performed better than both the top-down and bottom-up approaches
KC-Join query:
- Performance was not measured; it depends on EE-Join's performance

THE END! Hope this presentation was useful THANKS!