

TAX: A Tree Algebra for XML
H.V. Jagadish (Univ. of Michigan), Laks V.S. Lakshmanan (Univ. of British Columbia), Divesh Srivastava (AT&T Labs-Research), Keith Thompson (Univ. of Michigan)
Work supported by NSF and NSERC.

Overview
- Why an algebra for XML?
- Main challenges
- Data model
- Patterns & Witnesses
- Tree Value Functions
- Some Example Operators
- Translation Example: XQuery

Overview (contd.)
- Main Results
- Optimization Examples
- Implementation
- Summary & Future Work

Why an Algebra (for XML)? (aka Related Work)
- Bulk algebra for tree manipulation: efficient implementation of XML queries
- Algebra for manipulating trees (has been attempted before)
- Feature algebras: linguistics; efficient implementation?
- Grammar-based algebra for trees [Tompa+ 87, Gyssens+ 89]
- Aqua project [Zdonik+ 95]

Why XML algebra? [Related work] (contd.)
- GraphLog, Hy+ [Consens+ 90], GOOD [Paredaens+ 92]: cannot exploit special properties of trees (e.g., support for arbitrary recursion vs. ancestors, order)
- Semistructured (SS) data: Lorel [Abiteboul+ 96], UnQL [Buneman+ 96]
- XML algebras: [Beech+ 99], [Fernandez+ 00] (mainly type system issues), [Christophides+ 00] (trees → tuples), [Ludascher+ 00] (nodes, not trees), SAL [Beeri+ 99] (ordered lists of nodes)

Why? (contd.)
- be close to relational model, but direct support for (collections of) trees
- express at least RA + aggregation
- capture substantial fragment of XQuery
- admit efficient implementation and effective query optimization

Main Challenges
- Capture rich variety of manipulations in a simple algebra
- Handle heterogeneity in tree collections
  - structure
  - "schema" of nodes of the same "type"
- Handle order (documents are ordered)
  - sometimes important (e.g., author list)
  - sometimes not (e.g., publisher vs. authors)

Data Model
- Data tree = rooted ordered tree
- Data in node = set of attr-val pairs
- Special attribute: pedigree (where did I come from?)
  - "doc id + offset in doc"
  - preserved for (copies of) original nodes through manipulations
  - plays an important role in grouping, sorting, etc.
  - null for new nodes
- Collections (of trees): unordered
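As a minimal sketch of the data model described above, the following Python represents a node as attribute-value pairs plus a pedigree and an ordered child list. The class name `Node`, the helper `copy_tree`, the `(doc id, offset)` tuple encoding, and the file name `bib.xml` are illustrative assumptions, not part of TAX itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Attribute-value pairs stored in the node, e.g. {"tag": "book"}.
    attrs: Dict[str, str]
    # Pedigree as (doc id, offset in doc); None models "null for new nodes".
    pedigree: Optional[Tuple[str, int]] = None
    # Children are kept in a list because data trees are ordered.
    children: List["Node"] = field(default_factory=list)


def copy_tree(n: Node) -> Node:
    """Copy a subtree; copies of original nodes keep their pedigree."""
    return Node(dict(n.attrs), n.pedigree, [copy_tree(c) for c in n.children])


# Hypothetical document "bib.xml" with made-up offsets.
book = Node({"tag": "book"}, ("bib.xml", 1), [
    Node({"tag": "title", "content": "Principia Mathematica"}, ("bib.xml", 2)),
    Node({"tag": "year", "content": "1910"}, ("bib.xml", 3)),
])

# A node created during query processing has a null pedigree,
# while the copied book subtree underneath it keeps its own.
wrapper = Node({"tag": "result"}, None, [copy_tree(book)])

# Collections are unordered; a list is used here only for convenience.
collection = [book, wrapper]
```

Keeping the pedigree on copies is what would later let grouping or sorting refer back to the original document position.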

Patterns & Witnesses
- first challenge: how do you get at nodes and/or attributes?
- our solution: patterns
  - enable specification of parameters for most operations
  - only show parts of interest: need not know/care about entire structure of trees in collection

Patterns & Witnesses (contd.)
Example P1:
- Structural part: $1 is the root, with a pc edge to $2 and an ad edge to $3
- Condition part: $1.tag = book & $2.tag = year & $2.content < 2000 & $3.tag = author
- Edge labels: pc = direct (parent-child), ad = transitive (ancestor-descendant)
- Additional parameters possible: e.g., selection/projection lists, grouping, ordering, etc.

Patterns & Witnesses (contd.)
What does a pattern do for you?
- generates witnesses against the input collection
  - one for each matching of the pattern against the input
  - conditions must be respected
  - (sub)structure preserved in the output
- e.g., witness trees for pattern P1: one tree for each author of each book published before 2000, showing year & author
  - the book-author link may be transitive in the input but is necessarily direct in the output
- source trees = trees witnesses "came from"
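The witness generation described above can be sketched for pattern P1 as follows. This is a hard-coded matcher for P1 only, not a general pattern engine; the names `Node`, `walk`, and `witnesses_p1` are hypothetical, author names are flattened into a single string, and years are stored as integers (with -560 standing in for 560 BC) so that the `< 2000` condition can be evaluated.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List


@dataclass
class Node:
    tag: str
    content: Any = None
    children: List["Node"] = field(default_factory=list)


def walk(n: Node) -> Iterator[Node]:
    """Yield n and all of its descendants in document (preorder) order."""
    yield n
    for c in n.children:
        yield from walk(c)


def witnesses_p1(tree: Node) -> List[Node]:
    """Return one witness tree per embedding of pattern P1 in `tree`."""
    out = []
    for b in walk(tree):
        if b.tag != "book":                      # $1.tag = book
            continue
        pos = [id(n) for n in walk(b)]           # document order below this book
        for a in walk(b):                        # ad edge: any proper descendant
            if a is b or a.tag != "author":      # $3.tag = author
                continue
            for y in b.children:                 # pc edge: direct children only
                if (y.tag == "year" and isinstance(y.content, int)
                        and y.content < 2000):   # $2.content < 2000
                    # Matched nodes keep their relative document order.
                    kids = sorted((a, y), key=lambda n: pos.index(id(n)))
                    out.append(Node("book", None,
                                    [Node(k.tag, k.content) for k in kids]))
    return out


bib = Node("bib", None, [
    Node("book", None, [
        Node("author", "Whitehead"), Node("author", "Russell"),
        Node("title", "Principia Mathematica"), Node("year", 1910)]),
    Node("book", None, [
        # The author sits below a wrapper node, so the link is transitive here.
        Node("authors", None, [Node("author", "Panini")]),
        Node("title", "Ashtadhyayi"), Node("year", -560)]),
])

for w in witnesses_p1(bib):
    print(w.tag, [(c.tag, c.content) for c in w.children])
```

In the second book the author is reached through an intermediate node in the input, yet it appears as a direct child of `book` in the witness, which mirrors the remark that an ad link becomes direct in the output.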

Tree Value Functions (TVF)
- What are they? Primitive recursive functions on structure of source trees
- Where are they used? grouping, ordering, aggregation, etc.
- Here is an example: f: T → value of author, number of authors, tuple of authors, {author tuple, title}, etc.
- Complete example coming up …
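The example functions listed above might look like the following when written as simple recursive functions over a tree. This is only a sketch: the `Node` class and the function names `authors`, `num_authors`, `title_of`, and `authors_and_title` are assumptions made for illustration, and author values are flattened to strings.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Node:
    tag: str
    content: Any = None
    children: List["Node"] = field(default_factory=list)


def authors(t: Node) -> Tuple[Any, ...]:
    """Tuple of author values, computed by recursion on the tree structure."""
    found = (t.content,) if t.tag == "author" else ()
    for c in t.children:
        found += authors(c)
    return found


def num_authors(t: Node) -> int:
    """Number of authors in the tree."""
    return len(authors(t))


def title_of(t: Node) -> Optional[Any]:
    """Content of the first title node in document order, or None."""
    if t.tag == "title":
        return t.content
    for c in t.children:
        found = title_of(c)
        if found is not None:
            return found
    return None


def authors_and_title(t: Node) -> Tuple[Tuple[Any, ...], Optional[Any]]:
    """The {author tuple, title} style of value mentioned on the slide."""
    return (authors(t), title_of(t))


book = Node("book", None, [
    Node("author", "Whitehead"), Node("author", "Russell"),
    Node("title", "Principia Mathematica"), Node("year", 1910)])

print(authors(book), num_authors(book), authors_and_title(book))
```

Any of these functions could serve as a sort key or grouping key, for example `sorted(collection, key=num_authors)`.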

Example Database
bib
  book
    author
      name
        first: Alfred
        mid: North
        last: Whitehead
      deg: Sc.D., FRS
    author
      name
        first: Bertrand
        last: Russell
      deg: M.A., FRS
    title: Principia Mathematica
    year: 1910
  book
    author
      name: Panini
    title: Ashtadhyayi (First book on Sanskrit Grammar)
    year: 560 BC

Example Operators – Selection
- Input: collection; parameters: pattern, selection list (pattern nodes)
- Example
  - pattern P1 and empty SL: same witness trees as before
  - pattern P1 with SL = {$1}: whole book subtrees (i.e. retain $1's descendants)
- In general a one-to-zero-or-more operator: one input tree yields zero or more output trees
- Could retain other "relatives" instead (e.g., siblings)

Selection with P1 (empty SL)
[Figure: witness trees, each a book node with an author child and a year child; years shown are 1910, 560 BC, and 1910.]
Whole author subtree included when SL = {$3}.

Example operators – Projection
- Input: collection; parameters: pattern, projection list
- Example
  - Pattern P1 w/ PL = {$1, $2, $3}: one tree for each book published before 2000, showing year and author(s)
  - Pattern P1 w/ PL = {$3}: one tree for each author of aforementioned books
- '*' in PL causes descendants to be retained
- One-to-zero-or-more operator (for reasons different from select)

Projection: P1 w/ PL = {$1,$2,$3}
[Figure: output trees, each a book node with its author and year nodes; years shown are 1910 and 560 BC.]
With $3*, can include whole author subtrees.
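A rough sketch of projection with pattern P1 and PL = {$1, $2, $3} follows: for each qualifying book, only the nodes matched by some embedding are retained, giving one output tree per book rather than one per author. The names `Node`, `walk`, and `project_p1` are hypothetical, the matcher is hard-coded for P1, years are integers (-560 for 560 BC), and the sketch assumes books are not nested inside other books.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List


@dataclass
class Node:
    tag: str
    content: Any = None
    children: List["Node"] = field(default_factory=list)


def walk(n: Node) -> Iterator[Node]:
    """Yield n and all of its descendants in document (preorder) order."""
    yield n
    for c in n.children:
        yield from walk(c)


def project_p1(collection: List[Node]) -> List[Node]:
    """Projection with pattern P1 and PL = {$1, $2, $3} (no '*')."""
    out = []
    for tree in collection:
        for b in walk(tree):
            if b.tag != "book":
                continue
            years = [y for y in b.children            # pc edge
                     if y.tag == "year" and isinstance(y.content, int)
                     and y.content < 2000]
            authors = [a for a in walk(b)             # ad edge
                       if a is not b and a.tag == "author"]
            if not years or not authors:
                continue                              # no embedding: book dropped
            keep = {id(n) for n in years + authors}
            # Retained nodes become direct children, in document order.
            kids = [Node(n.tag, n.content) for n in walk(b) if id(n) in keep]
            out.append(Node("book", None, kids))
    return out


bib = Node("bib", None, [
    Node("book", None, [
        Node("author", "Whitehead"), Node("author", "Russell"),
        Node("title", "Principia Mathematica"), Node("year", 1910)]),
    Node("book", None, [
        Node("authors", None, [Node("author", "Panini")]),
        Node("title", "Ashtadhyayi"), Node("year", -560)]),
])

for t in project_p1([bib]):
    print(t.tag, [(c.tag, c.content) for c in t.children])
```

The single input tree rooted at `bib` yields two output trees because the root is not in the projection list, which is one way a projection can be one-to-many.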

Selection vs. Projection Example
Selection:
  FOR $b IN document(“doc.xml”)//book
  FOR $y IN $b/year[data() $y $a
versus
Projection:
  FOR $b IN document(“doc.xml”)//book[/year/data() $b/year $b/author

Example operators – grouping
- Input: collection; parameters: pattern, grouping TVF, ordering TVF
- Example
  - input: collection of books
  - pattern: $1 is the root, with a pc edge to $2 and an ad edge to $3; $3 has a pc edge to $4
  - condition: $1.tag = book & $2.tag = title & $3.tag = author & $4.tag = name
  - f_g(T) = "$4.content"
  - f_o(T) = "$2.content"

Grouping (contd.)
Here is what the output looks like:
[Figure: tax_group_root trees, each with two children: tax_group_basis (holding the author) and tax_group_subroot (holding the books).]
Books are ordered by title in each group.
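The grouping output described above could be produced along these lines: books are grouped by author name (the grouping TVF) and ordered by title within each group (the ordering TVF). The function name `group_by_author` and the `Node`/`walk` helpers are assumptions for this sketch, the sample titles and names are made up, and sorting the groups by name is done only to make the result deterministic, since collections are unordered.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class Node:
    tag: str
    content: Any = None
    children: List["Node"] = field(default_factory=list)


def walk(n: Node) -> Iterator[Node]:
    """Yield n and all of its descendants in document (preorder) order."""
    yield n
    for c in n.children:
        yield from walk(c)


def title(b: Node) -> str:
    """Ordering value: content of the title child ($2), or '' if absent."""
    return next((c.content for c in b.children if c.tag == "title"), "")


def group_by_author(books: List[Node]) -> List[Node]:
    """Group books by author name ($4.content), ordered by title ($2.content)."""
    groups: Dict[str, List[Node]] = defaultdict(list)
    for b in books:
        names = {n.content
                 for a in walk(b) if a.tag == "author"    # $3: any descendant
                 for n in a.children if n.tag == "name"}  # $4: direct child
        for name in names:
            groups[name].append(b)   # a book appears once per matching group
    out = []
    for name in sorted(groups):
        basis = Node("tax_group_basis", None,
                     [Node("author", None, [Node("name", name)])])
        subroot = Node("tax_group_subroot", None,
                       sorted(groups[name], key=title))
        out.append(Node("tax_group_root", None, [basis, subroot]))
    return out


def make_book(t: str, names: List[str]) -> Node:
    kids = [Node("title", t)]
    kids += [Node("author", None, [Node("name", n)]) for n in names]
    return Node("book", None, kids)


books = [make_book("Trees", ["Ann", "Bob"]), make_book("Algebra", ["Bob"])]
for g in group_by_author(books):
    basis, subroot = g.children
    print(basis.children[0].children[0].content,
          [title(b) for b in subroot.children])
```

A book with several authors lands in several groups, so grouping here does not partition the input.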

Other operators
- Derived operators: various joins
- Set operations: when are two data trees the "same"?
  - Equality (shallow/deep) vs. isomorphism (include pedigree or not?)
- Multiset versions of operators
- Aggregation, Reordering, Renaming
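The question of when two data trees are the "same" can be made concrete with a small sketch. The slide only raises the alternatives, so the definitions below are assumptions chosen for illustration: `shallow_equal` compares root nodes only, `deep_equal` also compares children recursively and in order, and a `use_pedigree` flag decides whether pedigrees take part in the comparison. The document names `d1.xml` and `d2.xml` are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    attrs: Dict[str, str]
    pedigree: Optional[Tuple[str, int]] = None
    children: List["Node"] = field(default_factory=list)


def shallow_equal(a: Node, b: Node, use_pedigree: bool = False) -> bool:
    """Compare only the two root nodes (attributes, optionally pedigree)."""
    return a.attrs == b.attrs and (not use_pedigree or a.pedigree == b.pedigree)


def deep_equal(a: Node, b: Node, use_pedigree: bool = False) -> bool:
    """Compare the roots and, recursively, the children in order."""
    return (shallow_equal(a, b, use_pedigree)
            and len(a.children) == len(b.children)
            and all(deep_equal(x, y, use_pedigree)
                    for x, y in zip(a.children, b.children)))


# Same content taken from two hypothetical documents.
t1 = Node({"tag": "book"}, ("d1.xml", 1),
          [Node({"tag": "title", "content": "X"}, ("d1.xml", 2))])
t2 = Node({"tag": "book"}, ("d2.xml", 7),
          [Node({"tag": "title", "content": "X"}, ("d2.xml", 8))])
# Same root, different subtree.
t3 = Node({"tag": "book"}, ("d1.xml", 9),
          [Node({"tag": "title", "content": "Y"}, ("d1.xml", 10))])

print(shallow_equal(t1, t3), deep_equal(t1, t3))
print(deep_equal(t1, t2), deep_equal(t1, t2, use_pedigree=True))
```

Whether pedigree is included changes the outcome of set operations: with it, two value-identical trees from different documents stay distinct; without it, they would collapse into one.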

Translation Examples – XQuery
FOR $b IN
RETURN $b/title
  IF SOME $a IN $b//author SATISFIES $a/data() = “divesh”
  THEN $b/author

XQuery Translation (contd.)
Pre-IF part E:
- select w/ pattern nodes $1, $2: $1.tag=book & $2.tag=author & $2.hobby=tennis; SL = $1*
- then project w/ pattern nodes $3, $4: $3.tag=book & $4.tag=title; PL = $3, $4

XQuery Translation (contd.)
IF part F:
- select w/ pattern nodes $5, $6: $5.tag=book & $6.tag=author & $6.content = divesh; SL = $5*
- then project w/ pattern nodes $7, $8: $7.tag=book & $8.tag=author; PL = $7, $8

XQuery Translation (contd.)
- Do a left outerjoin of E with F w/ the condition $3 = $7
  [Figure: tax_prod_root with two book children; one book has a title child, the other has author children.]
- → Project w/ pattern node $9: $9.tag != book; PL = $9
- → Rename tax_prod_root → sportydiveshbook

Main Results
- Duplicate elimination by value can be expressed in TAX.
- The operators in TAX are independent.
- TAX is complete for relational algebra w/ aggregation.
- TAX can capture the fragment of XQuery FLWR expressions w/o function calls or recursion, w/ all path expressions using only constants, wildcards, and / & //, when no new ancestor-descendant relationships are created.

Optimization Examples
- Revisit translation example:
  - E can be simplified to: project w/ pattern nodes $1, $2, $3: $1.tag=book & $2.tag=author & $2.hobby=tennis & $3.tag=title; PL = $1, $3
  - Similar simplification applies to F
- Self-join can sometimes be eliminated
- Associativity, commutativity issues

Implementation
- TIMBER system at Univ. of Michigan
- Find pattern tree matches via
  - Index scans
  - Full scans
  - Twig joins
- Joins implemented on streams
- Pedigree: implemented as position of element within document
- Pedigrees similar to RID at impl. level

Summary & Future Work
- TAX: extension of RA for handling heterogeneous collections of ordered labeled trees
- Simplicity; few more operators
- Recognize selective importance of order and handle elegantly
- Bulk algebra for efficient implementation of XML querying
- Stay tuned for TIMBER release(s)
- Future
  - Arbitrary restructuring: copy-and-paste
  - Updates: principled via operators