1 Introduction to XML Algebra
Based on a talk prepared for CS561 by Wan Liu and Bintou Kane
2 Data Model
A data model comprises the core data structures and data types supported by a DBMS.
- A relational database uses a table (set-oriented) data model.
- XML uses a tree-structured, hierarchical model.
3 Why XML Algebra?
It is common to translate a query language into an algebra.
- First, the algebra is used to give a semantics for the query language.
- Second, the algebra is used to support query optimization.
4 XML Algebra History
- Lore Algebra (August 1999): Stanford University
- IBM Algebra (September 1999): Oracle; IBM; Microsoft Corp
- YAT Algebra (May 2000)
- AT&T Algebra (June 2000): AT&T; Bell Labs
- Niagara Algebra (2001): University of Wisconsin-Madison
5 NIAGARA
Title: Following the paths of XML Data: An algebraic framework for XML query evaluation
By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier. Univ. of Wisconsin
6 Outline
- Concepts of Niagara Algebra
- Operations
- Optimization
7 Goals of Niagara Algebra
- Be independent of schema information
- Query on both structure and content
- Generate simple, flexible, yet powerful algebraic expressions
- Allow re-use of traditional optimization techniques
8 Example: XML Source Documents
Invoice.xml
  account_number  carrier  total
  2               AT&T     $0.25
  1               Sprint   $1.20
  1               AT&T     $0.75
Customer.xml
  account  name
  1        Tom
  2        George
9 XML Data Model and Tree Graph
Example: Invoice_Document
[Figure: ordered tree graph. Invoice_Document has Invoice children, each with number, carrier, and total leaves, for example (2, AT&T, $0.25) and (1, Sprint, $1.20).]
Ordered tree graph, semi-structured data
10 XML Data Model [GVDNM01]
- A collection of bags of vertices.
- Vertices in a bag have no order.
Example: [Root "invoice.xml", invoice, invoice.account_number]
[Figure: a bag holding the root of invoice.xml, an invoice vertex with its element content, and an invoice.account_number vertex with its element content.]
11 Data Model
Bag elements are reachable by path expressions. A path expression consists of two parts:
- An entry point
- A relative forward part
Example: account_number:invoice (entry point invoice, relative forward part account_number)
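A minimal Python sketch of this data model, assuming a bag can be represented as a dict keyed by entry-point name and that the notation reads forward_part:entry_point. The PathExpression class and its method names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathExpression:
    forward: str       # relative forward part, e.g. "account_number"
    entry_point: str   # entry point, e.g. "invoice"

    @classmethod
    def parse(cls, text):
        # Assumed notation: "<forward part>:<entry point>"
        forward, entry_point = text.split(":", 1)
        return cls(forward.strip(), entry_point.strip())

    def result_name(self):
        # Name under which a reached vertex would be stored in the bag.
        return f"{self.entry_point}.{self.forward}"


# A bag maps entry-point names to vertices; lookups do not depend on order.
# The string values are placeholders standing in for real XML vertices.
bag = {"Root invoice.xml": "<document>", "invoice": "<invoice element>"}
path = PathExpression.parse("account_number:invoice")
```

Here path.entry_point selects the vertex in the bag to start from, and path.result_name() gives the name invoice.account_number used in the slide's example bag.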
12 Operators
Source (S), Follow (Φ), Select, Join, Rename, Expose, Vertex, Group, Union, Intersection, Difference (-), Cartesian Product.
13 Source Operator S
- Input: a list of documents
- Output: a collection of singleton bags
Examples:
- S (*): all known XML documents
- S (invoice*.xml): all XML documents whose filename matches "invoice*.xml"
- S (*,schema.dtd): all known XML documents that conform to schema.dtd
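A rough Python sketch of the Source operator, assuming the set of known documents is just a list of filenames and that wildcard matching behaves like shell-style globbing (Python's fnmatch). The function name source and the bag layout are illustrative assumptions; the DTD-conformance variant is not modeled.

```python
from fnmatch import fnmatch


def source(known_documents, pattern="*"):
    """Return a collection of singleton bags, one per matching document."""
    return [
        {f"Root {name}": name}      # the value is a placeholder for the document root
        for name in known_documents
        if fnmatch(name, pattern)
    ]


known = ["invoice1.xml", "invoice2.xml", "customer.xml"]   # hypothetical filenames
all_docs = source(known)                  # S (*)
invoices = source(known, "invoice*.xml")  # S (invoice*.xml)
```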
14 Follow operator
- Input: a path expression in entry point notation
- Functionality: extracts the vertices reachable by the path expression
- Output: a new bag that consists of the extracted vertex plus all contents of the original bag (in the case of an unnesting Follow)
15 Follow operator (Example*)
Input: {[Root invoice.xml, invoice]}
Follow (carrier:invoice)
Output: {[Root invoice.xml, invoice, invoice.carrier]}
[Figure: the input bag holds the root of invoice.xml and an invoice vertex with its element content; the output bag additionally holds an invoice.carrier vertex with the carrier element content.]
*Unnesting Follow
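The unnesting Follow can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration under assumptions: bags are dicts keyed by entry-point name, the path reads forward_part:entry_point, only direct children are followed, and the sample XML is made up.

```python
import xml.etree.ElementTree as ET


def follow(bags, path):
    """Unnesting Follow: emit one output bag per vertex reached from the entry point."""
    forward, entry = path.split(":")
    out = []
    for bag in bags:
        for child in bag[entry].findall(forward):   # direct children named `forward`
            new_bag = dict(bag)                     # keep all contents of the original bag
            new_bag[f"{entry}.{forward}"] = child   # add the extracted vertex
            out.append(new_bag)
    return out


invoice = ET.fromstring(
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier></invoice>"
)
bags = [{"Root invoice.xml": "invoice.xml", "invoice": invoice}]
result = follow(bags, "carrier:invoice")
missing = follow(bags, "bill_period:invoice")   # no such child: no output bags
```

In this sketch a bag whose entry point has no matching child produces no output at all, which is the behavior the later optimization slides rely on.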
16 Select operator
- Input: a set of bags
- Functionality: filters the bags of a collection using a predicate
- Output: a set of bags that conform to the predicate
- Predicate: logical operators (∧, ∨, ¬) or simple qualifications (<, >, ≤, ≥, =, ≠)
17 Select operator (Example)
Predicate: invoice.carrier = Sprint
Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}
Output: {[Root invoice.xml, invoice], …}
[Figure: of the input bags, each holding the root of invoice.xml and an invoice vertex with its element content, only the bags that satisfy the predicate remain in the output.]
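A small Python sketch of Select, assuming bags are dicts and the predicate is any Python callable over a bag. The data values mirror the invoice example; the key names and numeric totals are illustrative assumptions.

```python
def select(bags, predicate):
    """Keep only the bags for which the predicate holds."""
    return [bag for bag in bags if predicate(bag)]


bags = [
    {"invoice.carrier": "AT&T", "invoice.total": 0.25},
    {"invoice.carrier": "Sprint", "invoice.total": 1.20},
    {"invoice.carrier": "AT&T", "invoice.total": 0.75},
]

# Simple qualification: invoice.carrier = Sprint
sprint = select(bags, lambda b: b["invoice.carrier"] == "Sprint")

# Logical operators combine simple qualifications.
large_att = select(
    bags, lambda b: b["invoice.carrier"] == "AT&T" and b["invoice.total"] > 0.5
)
```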
18 Join operator
- Input: two collections of bags
- Functionality: joins the two collections based on a predicate
- Output: the concatenation of pairs of bags that satisfy the predicate
19 Join operator (Example)
Predicate: account_number:invoice = number:customer
Inputs: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}
Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}
[Figure: one bag with the root of invoice.xml and an invoice vertex, and one bag with the root of customer.xml and a customer vertex, are concatenated into a single bag that holds all four vertices.]
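Join can be sketched as a nested loop that concatenates matching pairs of bags. This Python version assumes dict-shaped bags with distinct entry-point names on each side (otherwise the right bag would overwrite entries of the left one); the vertices are plain dicts standing in for XML elements.

```python
def join(left, right, predicate):
    """Concatenate every pair of bags (l, r) that satisfies the predicate."""
    out = []
    for l in left:
        for r in right:
            if predicate(l, r):
                out.append({**l, **r})   # concatenation of the two bags
    return out


invoices = [
    {"invoice": {"account_number": 2, "carrier": "AT&T"}},
    {"invoice": {"account_number": 1, "carrier": "Sprint"}},
]
customers = [
    {"customer": {"number": 1, "name": "Tom"}},
    {"customer": {"number": 2, "name": "George"}},
]

joined = join(
    invoices,
    customers,
    lambda l, r: l["invoice"]["account_number"] == r["customer"]["number"],
)
```

A real evaluator would likely pick a hash or merge join for an equality predicate; the nested loop is used here only because it is the shortest correct form.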
20 Expose operator
- Input: a list of path expressions of vertices to be exposed
- Output: a set of bags that contain the vertices named in the parameter list, in the same order
21 Expose operator (Example)
Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
Expose (bill_period,carrier)
Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}
[Figure: the input bag holds the root of invoice.xml, the invoice vertex, and the carrier and bill_period vertices with their element content; the output bag keeps the root and the bill_period and carrier vertices, in the order given in the parameter list.]
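Expose behaves much like a projection. The Python sketch below assumes dict-shaped bags and relies on dicts preserving insertion order, so the output follows the parameter list. Listing the document root explicitly is a choice of this sketch; the slide's output keeps the root without naming it.

```python
def expose(bags, names):
    """Project each bag onto the named vertices, in the order given."""
    return [{name: bag[name] for name in names} for bag in bags]


bags = [
    {
        "Root invoice.xml": "invoice.xml",    # placeholder values, not real vertices
        "invoice": "<invoice>",
        "invoice.carrier": "Sprint",
        "invoice.bill_period": "period-1",    # made-up value for illustration
    }
]
exposed = expose(bags, ["Root invoice.xml", "invoice.bill_period", "invoice.carrier"])
```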
22 Vertex operator
Creates the actual XML vertex that will encompass everything created by an expose operator.
Example: (Customer_invoice)[((account)[invoice.account_number], (inv_total)[invoice.total])]
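One way to picture the Vertex operator is as element construction. The Python sketch below builds the Customer_invoice element from the example with xml.etree.ElementTree; the helper name vertex and the flat (tag, text) child list are assumptions made for illustration, and the values come from the Sprint invoice in the sample data.

```python
import xml.etree.ElementTree as ET


def vertex(tag, children):
    """Build a new element `tag` whose children wrap the exposed text values."""
    element = ET.Element(tag)
    for child_tag, text in children:
        ET.SubElement(element, child_tag).text = text
    return element


bag = {"invoice.account_number": "1", "invoice.total": "$1.20"}
node = vertex(
    "Customer_invoice",
    [("account", bag["invoice.account_number"]), ("inv_total", bag["invoice.total"])],
)
xml_text = ET.tostring(node, encoding="unicode")
# xml_text == "<Customer_invoice><account>1</account><inv_total>$1.20</inv_total></Customer_invoice>"
```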
23 Other operators
- Group: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g., average) can be used with the group operator.
- Rename: changes the entry point annotation of elements of a bag. Example: (invoice.bill_period,date)
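A short Python sketch of both operators, assuming dict-shaped bags. The function names, the numeric totals, and the bill-period value are illustrative assumptions; only the (invoice.bill_period, date) rename comes from the slide.

```python
from collections import defaultdict
from statistics import mean


def rename(bags, old, new):
    """Change the entry-point name `old` to `new` in every bag."""
    return [{(new if key == old else key): value for key, value in bag.items()}
            for bag in bags]


def group_average(bags, key, value):
    """Group bags by bag[key] and average bag[value] within each group."""
    groups = defaultdict(list)
    for bag in bags:
        groups[bag[key]].append(bag[value])
    return {group: mean(values) for group, values in groups.items()}


bags = [
    {"invoice.carrier": "AT&T", "invoice.total": 0.25},
    {"invoice.carrier": "Sprint", "invoice.total": 1.20},
    {"invoice.carrier": "AT&T", "invoice.total": 0.75},
]
averages = group_average(bags, "invoice.carrier", "invoice.total")
renamed = rename([{"invoice.bill_period": "period-1"}], "invoice.bill_period", "date")
```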
24 Example: XML Source Documents
Invoice.xml
  2  AT&T    $0.25
  1  Sprint  $1.20
  1          $0.75  maria
Customer.xml
  1  Tom
  2  George
25 XQuery Example
List the account number, customer name, and invoice total for all invoices that have carrier = "Sprint".
  FOR $i in (invoices.xml)//invoice,
      $c in (customers.xml)//customer
  WHERE $i/carrier = "Sprint" and $i/account_number = $c/account
  RETURN $i/account_number, $c/name, $i/total
26 Example: XQuery output
1 Tom $1.20
27 Algebra Tree Execution
Operators are listed bottom-up; the bags flowing out of each step are shown in brackets.
Invoice branch:
- Source (Invoices.xml)
- Follow (*.invoice) [invoice(1), invoice(2), invoice(3)]
- Select (carrier = "Sprint") [invoice(2)]
Customer branch:
- Source (customers.xml)
- Follow (*.customer) [customer(1), customer(2)]
Combined:
- Join (*.invoice.account_number = *.customer.account) [invoice(2) customer(1)]
- Expose (*.account_number, *.name, *.total) [account_number, name, total]
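The plan above can be traced end to end in a few lines of Python. This is a sketch, not the Niagara implementation: the XML markup (root elements invoices and customers, and the child element names) is reconstructed from the query and is an assumption, and each operator is reduced to a list comprehension over dict-shaped bags.

```python
import xml.etree.ElementTree as ET

# Assumed markup; only the values and queried element names come from the slides.
INVOICES = """<invoices>
<invoice><account_number>2</account_number><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>
<invoice><account_number>1</account_number><carrier>Sprint</carrier><total>$1.20</total></invoice>
<invoice><account_number>1</account_number><carrier>AT&amp;T</carrier><total>$0.75</total></invoice>
</invoices>"""
CUSTOMERS = """<customers>
<customer><account>1</account><name>Tom</name></customer>
<customer><account>2</account><name>George</name></customer>
</customers>"""


def follow(bags, entry, tag):
    """Unnesting Follow: one output bag per `tag` element under bag[entry]."""
    return [{**bag, tag: v} for bag in bags for v in bag[entry].iter(tag)]


# Source + Follow on each branch
invoices = follow([{"invoices.xml": ET.fromstring(INVOICES)}], "invoices.xml", "invoice")
customers = follow([{"customers.xml": ET.fromstring(CUSTOMERS)}], "customers.xml", "customer")

# Select (carrier = "Sprint")
sprint = [b for b in invoices if b["invoice"].findtext("carrier") == "Sprint"]

# Join (invoice.account_number = customer.account)
joined = [
    {**i, **c}
    for i in sprint
    for c in customers
    if i["invoice"].findtext("account_number") == c["customer"].findtext("account")
]

# Expose (account_number, name, total)
result = [
    (b["invoice"].findtext("account_number"),
     b["customer"].findtext("name"),
     b["invoice"].findtext("total"))
    for b in joined
]
print(result)  # [('1', 'Tom', '$1.20')]
```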
28 Optimization with Niagara
The optimizer is based on the Niagara algebra:
- Use operations more efficiently
- Produce simpler expressions by combining operations
29 Language Convention
A and B are path expressions.
- A < B: path expression A is a prefix of B
- AnB: common prefix of paths A and B
- AńB: greatest common prefix of paths A and B
- ┴: null path expression
30 Heuristics using Rewrite Rules
Allow optimization based on path selectivity when applying the unnesting Follow operation Φ μ
31 Interchangeability of Follow operation
Φ μ (A) [Φ μ (B)] = Φ μ (B) [Φ μ (A)]
TRUE when there exists C such that C < A and C < B and C = AńB, or when AnB = ┴
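The side condition of this rule can be written as a small predicate. The Python sketch below makes several assumptions that are not in the slides: a path is read as forward_part:entry_point and turned into a root-first tuple of steps, dots separate steps in the forward part, "<" is treated as proper prefix, and the null path is the empty tuple.

```python
def steps(path):
    """Turn 'forward:entry' into a root-first tuple, e.g. ('invoice', 'carrier')."""
    forward, entry = path.split(":")
    return tuple([entry] + forward.split("."))


def greatest_common_prefix(a, b):
    """Longest shared leading run of steps; () plays the role of the null path."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def follows_commute(path_a, path_b):
    """True when the two unnesting Follows may be interchanged under the rule."""
    a, b = steps(path_a), steps(path_b)
    c = greatest_common_prefix(a, b)
    if not c:                                    # no common prefix at all
        return True
    return len(c) < len(a) and len(c) < len(b)   # C is a proper prefix of both


same_entry = follows_commute("acc_Num:invoice", "carrier:invoice")
disjoint = follows_commute("acc_Num:invoice", "acc_Num:customer")
nested = follows_commute("carrier:invoice", "carrier.name:invoice")  # hypothetical path
```

Under these assumptions the first two checks succeed (shared prefix invoice, and no shared prefix), while the third fails because one path is a prefix of the other.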
32 Application of Rule on Invoice
Φ μ (acc_Num:invoice)[Φ μ (carrier:invoice)]   (*)
=?=
Φ μ (carrier:invoice)[Φ μ (acc_Num:invoice)]   (**)
33 Application of Rule on Invoice
Φ μ (acc_Num:invoice)[Φ μ (carrier:invoice)] ?= Φ μ (carrier:invoice)[Φ μ (acc_Num:invoice)]
Equivalent, because both paths share the common prefix "invoice".
Case: AńB = invoice
34 Benefit of Rule Application
NOTE: Assume that acc_Num is required for each invoice element, while carrier is not.
THEN: Φ μ (acc_Num:invoice)[Φ μ (carrier:invoice)] ?= Φ μ (carrier:invoice)[Φ μ (acc_Num:invoice)]
Which algebra tree do we prefer? Φ μ (acc_Num:invoice)[Φ μ (carrier:invoice)] (*) makes more sense than (**). Why?
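35 Discussion
Reduction of input size on the first sub-operation: Φ μ (carrier:invoice). Because carrier is optional, applying this Follow first leaves only the invoices that have a carrier, so the Follow on acc_Num then processes fewer bags.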
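The effect can be checked on the three sample invoices, one of which has no carrier. The Python sketch below assumes that an unnesting Follow emits nothing for a bag whose entry point lacks the requested child, and it uses plain dicts in place of XML vertices.

```python
def follow(bags, entry, field):
    """Unnesting Follow over dict-shaped vertices; bags lacking `field` yield no output."""
    return [
        {**bag, f"{entry}.{field}": bag[entry][field]}
        for bag in bags
        if field in bag[entry]
    ]


invoices = [
    {"invoice": {"acc_Num": 2, "carrier": "AT&T"}},
    {"invoice": {"acc_Num": 1, "carrier": "Sprint"}},
    {"invoice": {"acc_Num": 1}},                     # carrier is optional
]

# Inner Follow on carrier first: only 2 bags reach the outer Follow.
carrier_first = follow(invoices, "invoice", "carrier")
plan_carrier_inner = follow(carrier_first, "invoice", "acc_Num")

# Inner Follow on acc_Num first: all 3 bags reach the outer Follow.
acc_first = follow(invoices, "invoice", "acc_Num")
plan_acc_inner = follow(acc_first, "invoice", "carrier")
```

Both plans return the same two bags, but the plan that follows carrier first hands a smaller intermediate result to the second operator.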
36 Should we/can we apply the rule below?
Φ μ (acc_Num:invoice)[Φ μ (acc_Num:Customer)]
37 "acc_Num:invoice" and "acc_Num:customer" are two totally different paths.
Case: AnB = ┴
So yes, the rule is valid.