1 Introduction to XML Algebra. Wan Liu, Bintou Kane. Advanced Database. Instructor: Elka. 2/11/2002



2 Outline: Reasons for XML algebra; Niagara algebra; AT&T algebra

3 Data Model and Design: We need a clear framework to design a database. A data model is like a set of data structures chosen to suit a program's needs; it is an abstract type system. A relational database is implemented with tables; XML is a newer format used for information integration.

4 Why XML Algebra? It is common to translate a query language into an algebra. First, the algebra gives the query language a semantics. Second, the algebra supports query optimization.

5 XML Algebra History
Lore Algebra (August 1999): Stanford University
IBM Algebra (September 1999): Oracle; IBM; Microsoft Corp
YAT Algebra (May 2000)
AT&T Algebra (June 2000): AT&T; Bell Labs
Niagara Algebra (2001): University of Wisconsin-Madison

6 NIAGARA. Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation. By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier.

7 Outline: Concepts of Niagara Algebra; Operations; Optimization

8 Goals of Niagara Algebra
Be independent of schema information
Query on both structure and content
Generate simple, flexible, yet powerful algebraic expressions
Allow re-use of traditional optimization techniques

9 Example: XML Source Documents
Invoice.xml: 2 AT&T $ Sprint $ AT&T $0.75
Customer.xml: 1 Tom 2 George

10 XML Data Model and Tree Graph
Example: Invoice_Document, an ordered tree graph of semi-structured data.
[Figure: tree rooted at Invoice_Document; each Invoice node has the children number, carrier, and total.]

11 XML Data Model [GVDNM01]
A collection of bags of vertices. Vertices in a bag have no order.
Example: [Root "invoice.xml", invoice, invoice.account_number]
[Figure: the bag drawn as Root invoice.xml, invoice (with the invoice element content), and invoice.account_number (with its element content).]

12 Data Model: Bag elements are reachable by path expressions. A path expression consists of two parts: an entry point and a relative forward part. Example: account_number:invoice (entry point invoice, forward part account_number).

13 Operators: Source (S), Follow (Φ), Select (σ), Join (⋈), Rename (ρ), Expose, Vertex, Group, Union (∪), Intersection (∩), Difference (−), Cartesian Product (×).

14 Source Operator S
Input: a list of documents
Output: a collection of singleton bags
Examples:
S(*): all known XML documents
S(invoice*.xml): all XML documents whose filename matches "invoice*.xml"
S(*, schema.dtd): all known XML documents that conform to schema.dtd

15 Follow operator Φ
Input: a path expression in entry point notation
Functionality: extracts the vertices reachable by the path expression
Output: a new bag that consists of the extracted vertex plus all the contents of the original bag (in the case of unnesting follow)

16 Follow operator (Example*)
Φ(carrier:invoice)
Input: {[Root invoice.xml, invoice]}
Output: {[Root invoice.xml, invoice, invoice.carrier]}
*Unnesting follow
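The unnesting follow above can be sketched in Haskell. This is only an illustrative model, not the Niagara implementation: the `Vertex` and `Bag` types, the function `followUnnest`, and the sample invoice values are all assumptions introduced here, and the path is restricted to a single step.

```haskell
-- Hypothetical, simplified model of vertices and bags (not Niagara's own types).
data Vertex = Vertex
  { vTag  :: String     -- element name
  , vText :: String     -- text content ("" for inner elements)
  , vKids :: [Vertex]   -- child elements
  } deriving (Eq, Show)

-- A bag pairs entry-point names (e.g. "invoice") with vertices.
-- A list is used for simplicity; its order carries no meaning.
type Bag = [(String, Vertex)]

-- Unnesting follow for a one-step path "child:entry":
-- one output bag per matching child, keeping the original bag contents.
followUnnest :: String -> String -> [Bag] -> [Bag]
followUnnest child entry bags =
  [ bag ++ [(entry ++ "." ++ child, k)]
  | bag <- bags
  , Just v <- [lookup entry bag]   -- skip bags without the entry point
  , k <- vKids v
  , vTag k == child
  ]

-- Assumed sample data, loosely modeled on the invoice example.
sampleInvoice :: Vertex
sampleInvoice = Vertex "invoice" ""
  [ Vertex "account_number" "1" []
  , Vertex "carrier" "Sprint" []
  , Vertex "total" "$1.20" []
  ]

followed :: [Bag]
followed = followUnnest "carrier" "invoice" [[("invoice", sampleInvoice)]]
```

Loaded in GHCi, `map (map fst) followed` lists the entry points of each output bag: the original `invoice` plus the new `invoice.carrier`.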

17 Select operator σ
Input: a set of bags
Functionality: filters the bags of a collection using a predicate
Output: a set of bags that conform to the predicate
Predicate: logical operators (∧, ∨, ¬) or simple qualifications (=, ≠, <, ≤, >, ≥)

18 Select operator (Example)
σ invoice.carrier = Sprint
Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}
Output: {[Root invoice.xml, invoice], …}

19 Join operator
Input: two collections of bags
Functionality: joins the two collections based on a predicate
Output: the concatenation of the pairs of bags that satisfy the predicate

20 Join operator (Example)
Predicate: account_number:invoice = number:customer
Inputs: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}
Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}

21 Expose operator
Input: a list of path expressions of the vertices to be exposed
Output: a set of bags that contain the vertices in the parameter list, in the same order

22 Expose operator (Example)
Expose (bill_period, carrier)
Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}

23 Vertex operator Creates the actual XML vertex that will encompass everything created by an expose operator Example : (Customer_invoice)[  ( (account)[invoice.account_number], (inv_total)[invoice.total])]

24 Other operators
Group: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g. average) can be used with the group operator.
Rename ρ: changes the entry point annotation of the elements of a bag. Example: ρ(invoice.bill_period, date)

25 Example: XML Source Documents
Invoice.xml: 2 AT&T $ Sprint $ $0.75 maria
Customer.xml: 1 Tom 2 George

26 XQuery Example
List the account number, customer name, and invoice total for all invoices that have carrier = "Sprint".
FOR $i in (invoices.xml)//invoice,
    $c in (customers.xml)//customer
WHERE $i/carrier = "Sprint" and $i/account_number = $c/account
RETURN $i/account_number, $c/name, $i/total

27 Example: XQuery output: 1 Tom $1.20

28 Algebra Tree Execution
Source (Invoices.xml) -> Follow (*.invoice) -> invoice(1), invoice(2), invoice(3) -> Select (carrier = "Sprint") -> invoice(2)
Source (customers.xml) -> Follow (*.customer) -> customer(1), customer(2)
Join (*.invoice.account_number = *.customer.account) -> invoice(2) customer(1)
Expose (*.account_number, *.name, *.total) -> account_number, name, total
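The Select, Join, Expose pipeline of this algebra tree can be sketched in Haskell as follows. This is a rough sketch under stated assumptions, not the Niagara engine: bags are flattened to lists of (path, text) pairs, the names `selectBags`, `joinBags`, `exposePaths`, and `sameAccount` are hypothetical, and the sample values are assumed (the source documents on the slides are incomplete).

```haskell
-- Hypothetical flattened bag: each entry pairs a path name with its text content.
type Bag = [(String, String)]

-- Assumed sample data; only the Sprint invoice and customer "Tom" matter here.
invoices, customers :: [Bag]
invoices =
  [ [("invoice.account_number", "2"), ("invoice.carrier", "AT&T"),   ("invoice.total", "$0.25")]
  , [("invoice.account_number", "1"), ("invoice.carrier", "Sprint"), ("invoice.total", "$1.20")]
  , [("invoice.account_number", "2"), ("invoice.carrier", "AT&T"),   ("invoice.total", "$0.75")]
  ]
customers =
  [ [("customer.account", "1"), ("customer.name", "Tom")]
  , [("customer.account", "2"), ("customer.name", "George")]
  ]

-- Select: keep the bags that satisfy a predicate.
selectBags :: (Bag -> Bool) -> [Bag] -> [Bag]
selectBags = filter

-- Join: concatenate every pair of bags that satisfies the join predicate.
joinBags :: (Bag -> Bag -> Bool) -> [Bag] -> [Bag] -> [Bag]
joinBags p ls rs = [ l ++ r | l <- ls, r <- rs, p l r ]

-- Expose: project the listed paths, in the order given.
exposePaths :: [String] -> [Bag] -> [[String]]
exposePaths paths bags =
  [ [ v | p <- paths, Just v <- [lookup p bag] ] | bag <- bags ]

-- Join predicate: both account values must be present and equal.
sameAccount :: Bag -> Bag -> Bool
sameAccount i c =
  case (lookup "invoice.account_number" i, lookup "customer.account" c) of
    (Just a, Just b) -> a == b
    _                -> False

-- Select is applied to the invoices before the join, as in the tree.
result :: [[String]]
result =
  exposePaths ["invoice.account_number", "customer.name", "invoice.total"]
    (joinBags sameAccount
       (selectBags (\b -> lookup "invoice.carrier" b == Just "Sprint") invoices)
       customers)
```

With this sample data, `result` holds the single row `["1","Tom","$1.20"]`, matching the XQuery output on slide 27.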

29 Optimization with Niagara
The optimizer is based on the Niagara algebra
Use the operations more efficiently
Produce simpler expressions by combining operations

30 Language Convention
A and B are path expressions
A < B: path expression A is a prefix of B
AnB: common prefix of paths A and B
AńB: greatest common prefix of paths A and B
┴: null path expression

31 Use of Rule 8.5: Taking advantage of Rule 8.5 allows optimization based on path selectivity when applying the unnesting follow operation Φμ.

32 Interchangeability of Follow operations
Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]
True when there exists C such that C < A and C < B with C = AńB, or when AnB = ┴.

33 Application of 8.5 with Invoice
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]   (*)
?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]   (**)
Both share the common prefix invoice. Case: AńB = invoice

34 Benefit of Rule Application: Note that if acc_Num is required for each invoice element but carrier is not, then using * Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] makes more sense than **. Why?

35 Reduction of input size in the first sub-operation, Φμ(carrier:invoice). Should we, or can we, apply Rule 8.5 below? Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)] Why?

36 acc_Num:invoice and acc_Num:Customer are entirely different paths, so the case is AnB = ┴ and the answer is yes.
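The interchangeability of the two follow operations on invoice, and the benefit of running the more selective one first, can be checked on a toy model. The sketch below rests on assumptions: the `Vertex` and `Bag` types, the `follow` function (a one-step unnesting follow), and the two sample invoices are introduced here for illustration and are not taken from the Niagara paper.

```haskell
import Data.List (sort)

-- Hypothetical vertex: tag, text content, children.
data Vertex = Vertex String String [Vertex]
  deriving (Eq, Ord, Show)

type Bag = [(String, Vertex)]

-- One-step unnesting follow for the path "child:entry".
follow :: String -> String -> [Bag] -> [Bag]
follow child entry bags =
  [ bag ++ [(entry ++ "." ++ child, k)]
  | bag <- bags
  , Just (Vertex _ _ kids) <- [lookup entry bag]
  , k@(Vertex t _ _) <- kids
  , t == child
  ]

-- Bags are unordered, so results are compared after sorting.
normalize :: [Bag] -> [Bag]
normalize = sort . map sort

-- Assumed data: every invoice has acc_Num, only the first has a carrier.
invoices :: [Bag]
invoices =
  [ [("invoice", Vertex "invoice" "" [Vertex "acc_Num" "1" [], Vertex "carrier" "Sprint" []])]
  , [("invoice", Vertex "invoice" "" [Vertex "acc_Num" "2" []])]
  ]

-- (*): carrier is followed first, then acc_Num.
carrierFirst :: [Bag]
carrierFirst = follow "acc_Num" "invoice" (follow "carrier" "invoice" invoices)

-- (**): acc_Num is followed first, then carrier.
accNumFirst :: [Bag]
accNumFirst = follow "carrier" "invoice" (follow "acc_Num" "invoice" invoices)

-- Both orders yield the same collection of bags.
commutes :: Bool
commutes = normalize carrierFirst == normalize accNumFirst

-- Sizes of the intermediate results after the first follow of each order.
intermediateSizes :: (Int, Int)
intermediateSizes =
  ( length (follow "carrier" "invoice" invoices)
  , length (follow "acc_Num" "invoice" invoices) )
```

On this data both orders give the same final bags, but following carrier first leaves one intermediate bag instead of two, which is the input-size reduction slide 35 refers to.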

37 Rules 8.7, 8.9, 8.11: These rules are interesting because they help identify when and where to apply selection σ to decrease the input size of subsequent operations. Example: in the algebra tree on slide 28, Select is applied before Join.

38 A useful addition would be a procedure that automatically detects when a rule can be applied in a given case and then applies it.

39 AT&T Algebra


41 AT&T Algebra Introduction: The algebra is derived from the nested relational algebra. The AT&T algebra makes heavy use of list comprehensions, a standard notation in the functional programming community. It uses the functional programming language Haskell as a notation for presenting the algebra.

42 AT&T data model
The data model merges attribute and element nodes, and eliminates comments.
Basic type: Node.
  text :: String -> Node
  elem :: Tag -> [Node] -> Node
  ref  :: Node -> Node
Example (the bib document for the book "Data on the Web"):
  elem "bib" [ elem "book" [ elem [ text "1999" ], elem "title" [ text "Data on the Web" ] ] ]

43 Basic Type Declarations
To find the type of a node:
  isText :: Node -> Bool
  isElem :: Node -> Bool
  isRef  :: Node -> Bool
For a text node:
  string :: Node -> String
For an element node:
  tag      :: Node -> Tag
  children :: Node -> [Node]
For a reference node:
  dereference :: Node -> Node
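One possible way to realize these declarations in Haskell is sketched below. The slides give only the type signatures; the `data Node` representation, the error messages, and the `"year"` tag in the sample document are assumptions made here. Prelude's `elem` is hidden because the algebra reuses that name.

```haskell
import Prelude hiding (elem)

type Tag = String

-- Assumed concrete representation behind the signatures on the slides.
data Node
  = Text String      -- text node
  | Elem Tag [Node]  -- element node with tag and children
  | Ref Node         -- reference node
  deriving (Eq, Show)

-- Constructors, with the signatures given on slide 42.
text :: String -> Node
text = Text

elem :: Tag -> [Node] -> Node
elem = Elem

ref :: Node -> Node
ref = Ref

-- Type tests.
isText, isElem, isRef :: Node -> Bool
isText (Text _)   = True
isText _          = False
isElem (Elem _ _) = True
isElem _          = False
isRef (Ref _)     = True
isRef _           = False

-- Accessors; each is defined only for its own kind of node.
string :: Node -> String
string (Text s) = s
string _        = error "string: not a text node"

tag :: Node -> Tag
tag (Elem t _) = t
tag _          = error "tag: not an element node"

children :: Node -> [Node]
children (Elem _ ns) = ns
children _           = error "children: not an element node"

dereference :: Node -> Node
dereference (Ref n) = n
dereference _       = error "dereference: not a reference node"

-- Sample document; the "year" tag is an assumption of this sketch.
bib0 :: Node
bib0 = elem "bib"
  [ elem "book"
      [ elem "year"  [text "1999"]
      , elem "title" [text "Data on the Web"]
      ]
  ]
```

For example, `tag bib0` evaluates to `"bib"` and `map tag (children (head (children bib0)))` lists the tags of the book's children.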

44 Nested relational algebra…
In the nested relational approach, data is composed of tuples and lists. Tuple values and tuple types are written in round brackets.
  (1999, "Data on the Web", ["Abiteboul"]) :: (Int, String, [String])
Decomposing values:
  year :: (Int, String, [String]) -> Int
  year (x, y, l) = x

45 Nested relational algebra…
Comprehensions: list comprehensions can be used to express the fundamental query operations: navigation, cartesian product, nesting, and joins.
Example:
  [ value x | x <- children book0, is "author" x ] ==> [ "Abiteboul" ]
General form: [ exp | qual1, ..., qualn ], where each qualifier is either a boolean expression (bool-exp) or a generator (pat <- list-exp).

46 Nested relational algebra…
Using comprehensions to write queries.
Navigation:
  follow :: Tag -> Node -> [Node]
  follow t x = [ y | y <- children x, is t y ]
Cartesian product:
  [ (value y, value z) | x <- follow "book" bib0, y <- follow "title" x, z <- follow "author" x ]
  ==> [ ("Data on the Web", "Abiteboul") ]
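The navigation and cartesian-product comprehensions above can be run against a small document as follows. This is a sketch under assumptions: the `Node` representation, the helper definitions of `is` and `value`, and the three-author sample book are supplied here and are not spelled out on the slides; `children` returns an empty list for text nodes to keep the sketch total.

```haskell
import Prelude hiding (elem)

type Tag = String

-- Assumed minimal node representation (reference nodes omitted).
data Node = Text String | Elem Tag [Node]
  deriving (Eq, Show)

text :: String -> Node
text = Text

elem :: Tag -> [Node] -> Node
elem = Elem

children :: Node -> [Node]
children (Elem _ ns) = ns
children _           = []

-- is t x: x is an element node with tag t (assumed helper).
is :: Tag -> Node -> Bool
is t (Elem t' _) = t == t'
is _ _           = False

-- value: text of an element holding exactly one text child (assumed helper).
value :: Node -> String
value (Elem _ [Text s]) = s
value _                 = error "value: expected an element with one text child"

-- Navigation, as defined on the slide.
follow :: Tag -> Node -> [Node]
follow t x = [ y | y <- children x, is t y ]

-- Assumed sample document.
bib0 :: Node
bib0 = elem "bib"
  [ elem "book"
      [ elem "title"  [text "Data on the Web"]
      , elem "author" [text "Abiteboul"]
      , elem "author" [text "Buneman"]
      , elem "author" [text "Suciu"]
      ]
  ]

-- Cartesian product of the titles and authors of each book.
titleAuthorPairs :: [(String, String)]
titleAuthorPairs =
  [ (value y, value z)
  | x <- follow "book" bib0
  , y <- follow "title" x
  , z <- follow "author" x
  ]
```

Because this sample book has three authors, `titleAuthorPairs` contains three pairs, one per author, each with the same title.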

47 Nested relational algebra…
Joins.
  elem "reviews" [ elem "book" [ elem "title" [ text "Data on the Web" ], elem "review" [ text "This is great!" ] ] ]
  elem "bib" [ elem "book" [ elem [ text "1999" ], elem "title" [ text "Data on the Web" ] ] ]
  [ (value y, int (value z), value w)
  | x <- follow "book" bib0, y <- follow "title" x, z <- follow x,
    u <- follow "book" reviews0, v <- follow "title" u, w <- follow u,
    y == v ]
  ==> [ ("Data on the Web", 1999, "This is great!") ]
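A runnable version of this join might look like the sketch below. Several pieces are assumptions: the tags `"year"` and `"review"` stand in for the arguments missing from the slide, `int` is taken to be `read`, and the `Node` type and helpers are minimal stand-ins rather than the algebra's actual definitions.

```haskell
import Prelude hiding (elem)

type Tag = String

-- Assumed minimal node representation.
data Node = Text String | Elem Tag [Node]
  deriving (Eq, Show)

text :: String -> Node
text = Text

elem :: Tag -> [Node] -> Node
elem = Elem

-- Child elements of x that carry tag t.
follow :: Tag -> Node -> [Node]
follow t x = [ y | y@(Elem t' _) <- kids x, t == t' ]
  where
    kids (Elem _ ns) = ns
    kids _           = []

-- Text of an element holding exactly one text child (assumed helper).
value :: Node -> String
value (Elem _ [Text s]) = s
value _                 = error "value: expected an element with one text child"

-- Assumed conversion from text to an integer.
int :: String -> Int
int = read

-- Sample documents; the "year" tag is an assumption of this sketch.
bib0, reviews0 :: Node
bib0 = elem "bib"
  [ elem "book" [ elem "year" [text "1999"], elem "title" [text "Data on the Web"] ] ]
reviews0 = elem "reviews"
  [ elem "book" [ elem "title" [text "Data on the Web"], elem "review" [text "This is great!"] ] ]

-- Join books and reviews on equal title elements.
joined :: [(String, Int, String)]
joined =
  [ (value y, int (value z), value w)
  | x <- follow "book" bib0
  , y <- follow "title" x
  , z <- follow "year" x
  , u <- follow "book" reviews0
  , v <- follow "title" u
  , w <- follow "review" u
  , y == v
  ]
```

The join condition `y == v` compares whole title elements, so the two titles must match exactly, including capitalization.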

48 Nested relational algebra…
Regular expression matching
  match :: Reg a -> Node -> [a]
  ( [ (x,y,u) | x <- item y <- item "title", u <- rep (item "author") ] ) :: Reg (Node, Node, [Node])
  match reg0 book0
  ==> [ ( elem [text "1999"],
          elem "title" [text "Data on the Web"],
          [ elem "author" [text "Abiteboul"], elem "author" [text "Buneman"], elem "author" [text "Suciu"] ] ) ]

49 Nested relational algebra…
Sorting
  sortBy :: (a -> a -> Bool) -> [a] -> [a]
  sortBy (<=) [3,1,2,1] == [1,1,2,3]
Grouping
  groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
  groupBy (==) [3,1,2,1] == [[2],[1,1],[3]]
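Functions with these signatures can be sketched as follows. Note that they differ from the `sortBy` and `groupBy` in Haskell's `Data.List`, which take an `Ordering`-valued comparison and group only adjacent elements. The definitions below are one assumed implementation, and the order in which groups are returned (by first occurrence) is a choice of this sketch, so it differs from the order shown on the slide.

```haskell
-- Insertion sort driven by a "less than or equal" test.
sortBy :: (a -> a -> Bool) -> [a] -> [a]
sortBy le = foldr insert []
  where
    insert x [] = [x]
    insert x (y:ys)
      | le x y    = x : y : ys
      | otherwise = y : insert x ys

-- Group all elements related by eq, not only adjacent ones.
-- Groups appear in order of the first occurrence of their members.
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []     = []
groupBy eq (x:xs) =
  (x : filter (eq x) xs) : groupBy eq (filter (not . eq x) xs)

sortedExample :: [Int]
sortedExample = sortBy (<=) [3, 1, 2, 1]

groupedExample :: [[Int]]
groupedExample = groupBy (==) [3, 1, 2, 1]
```

Here `sortedExample` is `[1,1,2,3]` and `groupedExample` is `[[3],[1,1],[2]]`: the same groups as on the slide, listed in a different order.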

50 Cross Comparison of the Algebras
Niagara and AT&T are both standalone XML algebras.
Niagara was proposed after the W3C had selected its proposed standard; its operators work on sets of bags.
The AT&T algebra was chosen as the proposed standard by the W3C. Its expressions resemble a high-level query language, and the latest version of the document is referred to as "Semantics of XML Query Language XQuery".

51 Future Work
More evaluation strategies are needed to allow flexible query plans.
Develop physical operators that take advantage of physical storage structures, and generate a mapping from the query tree to a physical query plan.