1
Introduction to XML Algebra
CS561
2
Data Model
A data model is the set of core data structures and data types supported by a DBMS.
The relational data model is table-based (set-oriented).
The XML data model is tree-structured (hierarchical).
3
Why Query Algebra (for XML) ?
It is common to translate a query language into an algebra. First, the algebra is used to give a semantics for the query language. Second, the algebra is used to support query optimization.
4
XML Algebra History
Lore Algebra (August 1999) -- Stanford University
IBM Algebra (September 1999) -- Oracle; IBM; Microsoft Corp
YAT Algebra (May 2000)
AT&T Algebra (June 2000) -- AT&T; Bell Labs
Niagara Algebra (2001) -- University of Wisconsin-Madison
5
NIAGARA
Title: Following the paths of XML Data: An algebraic framework for XML query evaluation
By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier. Univ. of Wisconsin
6
Outline
Concepts of Niagara Algebra
Operations
Optimization
7
Goals of Niagara Algebra
Be independent of schema information
Query on both structure and content
Generate simple, flexible, yet powerful algebraic expressions
Allow re-use of traditional optimization techniques
8
Example: XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice No="1">
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
  <invoice>
    …
    <total>$0.75</total>
  </invoice>
</Invoice_Document>

Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>
9
XML Data Model and Tree Graph
Example: Invoice_Document
<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
</Invoice_Document>
…
Tree graph (figure): the Invoice_Document root has invoice children; each invoice has number, carrier, and total children, with leaf values (2, AT&T, $0.25) and (1, Sprint, $1.20).
Ordered tree graph, semi-structured data
10
XML Data Model (for Querying)
SQL: relations in, relation out.
Relational Algebra: relations in, relation out.
XQuery: XML doc in, XML docs out.
XML Algebra: ??
11
XML Data Model [GVDNM01]
Collection of bags of vertices.
Vertices in a bag have no order.
Example: the bag [Root "invoice.xml", invoice, invoice.account_number], where
Root "invoice.xml" is the document root,
invoice is the vertex <invoice> invoice-element-content </invoice>,
invoice.account_number is the vertex <account_number> element-content </account_number>.
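The bag in the example above can be pictured with a small Python sketch. This is only an illustration, not the Niagara implementation: the choice of a dict keyed by entry-point name, and the one-invoice sample document, are assumptions made here.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-invoice document, modelled on the slide's Invoice.xml.
doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number></invoice>"
    "</Invoice_Document>"
)
invoice = doc.find("invoice")

# Assumption: a bag is a dict from entry-point name to vertex.
# Lookups are by name, never by position, which mirrors
# "vertices in a bag have no order".
bag = {
    "Root invoice.xml": doc,
    "invoice": invoice,
    "invoice.account_number": invoice.find("account_number"),
}

# The data model is a collection of such bags.
collection = [bag]
print(bag["invoice.account_number"].text)  # 2
```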
12
Data Model
Bag elements are reachable by path expressions.
A path expression consists of two parts:
An entry point
A relative forward part
Example: account_number:invoice
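A minimal sketch of how such a path expression could be split into its two parts. It assumes the notation "forward part:entry point" seen in the example, and it assumes that multi-step forward parts are dot-separated; `PathExpression` and `parse_path` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class PathExpression(NamedTuple):
    entry_point: str          # name of a vertex already present in the bag
    forward: Tuple[str, ...]  # relative forward steps from the entry point


def parse_path(text: str) -> PathExpression:
    # Assumed notation: "<forward part>:<entry point>".
    forward, entry_point = text.split(":")
    return PathExpression(entry_point, tuple(forward.split(".")))


path = parse_path("account_number:invoice")
print(path.entry_point, path.forward)
```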
13
Outline
Concepts of Niagara Algebra
Operations
Optimization
14
Operators
Source (S), Follow (Φ), Select, Join, Rename, Expose, Vertex, Group, Union, Intersection, Difference (−), Cartesian Product.
15
Source Operator S
Input: a list of documents
Output: a collection of singleton bags
Examples:
S (*): all known XML documents
S (invoice*.xml): all XML documents whose filenames match "invoice*.xml"
S (*, schema.dtd): all known XML documents that conform to schema.dtd
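The first two examples can be sketched as follows. The `source` function and the `known` registry of documents are hypothetical, the document roots are plain strings standing in for real vertices, and the DTD variant is left out.

```python
from fnmatch import fnmatchcase


def source(known_documents, pattern="*"):
    """Return one singleton bag per document whose file name matches pattern."""
    return [
        {f"Root {name}": root}
        for name, root in sorted(known_documents.items())
        if fnmatchcase(name, pattern)
    ]


# Hypothetical registry of known documents (file name -> document root).
known = {
    "invoice1.xml": "<root-1>",
    "invoice2.xml": "<root-2>",
    "customer.xml": "<root-3>",
}

print(len(source(known)))                  # S (*)             -> 3
print(len(source(known, "invoice*.xml")))  # S (invoice*.xml)  -> 2
```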
16
Follow operator
Input: a path expression in entry point notation
Functionality: extracts the vertices reachable by the path expression
Output: a new bag that consists of the extracted vertex plus all contents of the original bag (in the case of unnesting follow)
17
Follow operator (Example*)
Input: {[Root invoice.xml, invoice]}, where invoice is <invoice> invoice-element-content </invoice>
Operator: Follow (carrier:invoice)
Output: {[Root invoice.xml, invoice, invoice.carrier]}, where invoice.carrier is <carrier> carrier-element-content </carrier>
*Unnesting
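A sketch of unnesting follow on the example above, assuming a bag is a dict from entry-point name to vertex and that the forward part is a single child step. The function name `follow` and the sample invoice are illustrative only.

```python
import xml.etree.ElementTree as ET


def follow(collection, path):
    """Unnesting follow: emit one output bag per vertex reached by the path."""
    forward, entry = path.split(":")      # e.g. "carrier:invoice"
    result = []
    for bag in collection:
        for vertex in bag[entry].findall(forward):
            new_bag = dict(bag)           # keep all contents of the original bag
            new_bag[f"{entry}.{forward}"] = vertex
            result.append(new_bag)
    return result


# Hypothetical invoice vertex; the root is left as None for brevity.
invoice = ET.fromstring(
    "<invoice><carrier>Sprint</carrier><total>$1.20</total></invoice>"
)
bags = follow([{"Root invoice.xml": None, "invoice": invoice}], "carrier:invoice")
print(sorted(bags[0]))  # ['Root invoice.xml', 'invoice', 'invoice.carrier']
```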
18
Select operator
Input: a set of bags
Functionality: filters the bags of a collection using a predicate
Output: the set of bags that satisfy the predicate
Predicate: logical operators (∧, ∨, ¬) or simple qualifications (>, <, ≥, ≤, =, ≠)
19
Select operator (Example)
Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}, where each invoice is <invoice> invoice-element-content </invoice>
Operator: Select (invoice.carrier = Sprint)
Output: {[Root invoice.xml, invoice], …}
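The selection above can be sketched as a plain filter over bags. `select` and the helper `carrier_is` are hypothetical names, and the two invoices are sample data chosen for the illustration.

```python
import xml.etree.ElementTree as ET


def select(collection, predicate):
    """Keep only the bags for which the predicate holds."""
    return [bag for bag in collection if predicate(bag)]


def carrier_is(name):
    # Hypothetical helper building the predicate invoice.carrier = name.
    return lambda bag: bag["invoice"].findtext("carrier", "").strip() == name


invoices = [
    {"invoice": ET.fromstring("<invoice><carrier>AT&amp;T</carrier></invoice>")},
    {"invoice": ET.fromstring("<invoice><carrier>Sprint</carrier></invoice>")},
]
sprint = select(invoices, carrier_is("Sprint"))
print(len(sprint))  # 1
```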
20
Join operator
Input: two collections of bags
Functionality: joins the two collections based on a predicate
Output: the concatenation of the pairs of bags that satisfy the predicate
21
Join operator (Example)
Input: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}, where invoice is <invoice> invoice-element-content </invoice> and customer is <customer> customer-element-content </customer>
Operator: Join (account_number:invoice = number:customer)
Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}
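A nested-loop sketch of the join, assuming bags are dicts so that concatenation is a dict merge. The function names are hypothetical, and the sample customers use the `account` element from Customer.xml rather than the `number` path named in the example.

```python
import xml.etree.ElementTree as ET


def join(left, right, predicate):
    """Nested-loop join: concatenate every pair of bags satisfying predicate."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


def same_account(invoice_bag, customer_bag):
    inv = invoice_bag["invoice"].findtext("account_number", "").strip()
    cust = customer_bag["customer"].findtext("account", "").strip()
    return inv == cust


invoices = [
    {"invoice": ET.fromstring("<invoice><account_number>1 </account_number></invoice>")},
]
customers = [
    {"customer": ET.fromstring("<customer><account>1 </account><name>Tom </name></customer>")},
    {"customer": ET.fromstring("<customer><account>2 </account><name>George </name></customer>")},
]
joined = join(invoices, customers, same_account)
print(len(joined), sorted(joined[0]))  # 1 ['customer', 'invoice']
```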
22
Expose operator
Input: a list of path expressions of the vertices to be exposed
Output: a set of bags that contain the vertices in the parameter list, in the same order
23
Expose operator (Example)
Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
Operator: Expose (bill_period, carrier)
Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}
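A sketch of the projection step, assuming dict-shaped bags. Python dicts keep insertion order, so the output can follow the order of the parameter list. The string values are stand-ins for real vertices, and keeping the document root in the output is left out for brevity.

```python
def expose(collection, names):
    """Keep only the named vertices, in the order given by the parameter list."""
    return [{name: bag[name] for name in names} for bag in collection]


# Stand-in vertices (plain strings) for illustration only.
bags = [{
    "invoice": "<invoice>",
    "invoice.carrier": "<carrier>",
    "invoice.bill_period": "<bill_period>",
}]
exposed = expose(bags, ["invoice.bill_period", "invoice.carrier"])
print(list(exposed[0]))  # ['invoice.bill_period', 'invoice.carrier']
```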
24
Vertex operator
Creates the actual XML vertex that will encompass everything created by an expose operator.
Example: (Customer_invoice)[((account)[invoice.account_number], (inv_total)[invoice.total])]
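The construction in the example can be approximated with `xml.etree.ElementTree`. The helpers `vertex` and `wrap` are hypothetical, and the sketch assumes that each new child carries the text content of the exposed vertex.

```python
import xml.etree.ElementTree as ET


def vertex(tag, children):
    """Create a new XML vertex that encompasses the given child vertices."""
    element = ET.Element(tag)
    for child in children:
        element.append(child)
    return element


def wrap(tag, source):
    # Hypothetical helper: a new vertex named `tag` with the source's text content.
    element = ET.Element(tag)
    element.text = source.text
    return element


bag = {
    "invoice.account_number": ET.fromstring("<account_number>1</account_number>"),
    "invoice.total": ET.fromstring("<total>$1.20</total>"),
}
result = vertex("Customer_invoice", [
    wrap("account", bag["invoice.account_number"]),
    wrap("inv_total", bag["invoice.total"]),
])
print(ET.tostring(result, encoding="unicode"))
```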
25
Other operators
Group: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g., average) can be used with the group operator.
Rename: changes the entry point annotation of elements of a bag.
Example: (invoice.bill_period, date)
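A rough sketch of both operators over dict-shaped bags. The function names and the sample values are assumptions made for the illustration, with plain Python values standing in for vertices.

```python
from collections import defaultdict
from statistics import mean


def rename(collection, old, new):
    """Change the entry-point annotation `old` to `new` in every bag."""
    return [
        {(new if key == old else key): value for key, value in bag.items()}
        for bag in collection
    ]


def group_average(collection, key, value):
    """Group bags by bag[key] and average bag[value] within each group."""
    groups = defaultdict(list)
    for bag in collection:
        groups[bag[key]].append(bag[value])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical bags with plain values in place of vertices.
bags = [
    {"invoice.carrier": "AT&T", "invoice.total": 0.25},
    {"invoice.carrier": "Sprint", "invoice.total": 1.20},
    {"invoice.carrier": "AT&T", "invoice.total": 0.75},
]
print(group_average(bags, "invoice.carrier", "invoice.total"))
print(rename([{"invoice.bill_period": "2001-01"}], "invoice.bill_period", "date"))
```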
26
Example: XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice>
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
  <invoice>
    …
    <total>$0.75</total>
  </invoice>
  <auditor>maria</auditor>
</Invoice_Document>

Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>
27
XQuery Example
List account number, customer name, and invoice total for all invoices that have carrier = "Sprint".
FOR $i in (invoices.xml)//invoice,
    $c in (customers.xml)//customer
WHERE $i/carrier = "Sprint" and $i/account_number = $c/account
RETURN
  <Sprint_invoices>
    $i/account_number, $c/name, $i/total
  </Sprint_invoices>
28
Example: XQuery output
<Sprint_Invoice>
  <account_number>1</account_number>
  <name>Tom</name>
  <total>$1.20</total>
</Sprint_Invoice>
29
Algebra Tree Execution
Each operator is shown with the bags it outputs; the operators indented beneath it are its inputs.
Expose (*.account_number, *.name, *.total) -> account_number, name, total
  Join (*.invoice.account_number = *.customer.account) -> invoice (2), customer (1)
    Select (carrier = "Sprint") -> invoice (2)
      Follow (*.invoice) -> invoice (1), invoice (2), invoice (3)
        Source (Invoices.xml)
    Follow (*.customer) -> customer (1), customer (2)
      Source (customers.xml)
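The plan can be traced end to end with a short Python sketch. It is not the Niagara engine: each operator is reduced to a list comprehension, bags are dicts, and the sample documents contain only the two invoices whose fields are fully known.

```python
import xml.etree.ElementTree as ET

# Sample documents (an assumption: only the fully specified invoices are used).
INVOICES = """<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier><total>$1.20</total></invoice>
</Invoice_Document>"""
CUSTOMERS = """<Customer_Document>
  <customer><account>1</account><name>Tom</name></customer>
  <customer><account>2</account><name>George</name></customer>
</Customer_Document>"""


def text(vertex, tag):
    return vertex.findtext(tag, "").strip()


# Source + Follow: one bag per invoice vertex and per customer vertex.
invoices = [{"invoice": v} for v in ET.fromstring(INVOICES).findall("invoice")]
customers = [{"customer": v} for v in ET.fromstring(CUSTOMERS).findall("customer")]

# Select (carrier = "Sprint")
sprint = [bag for bag in invoices if text(bag["invoice"], "carrier") == "Sprint"]

# Join (*.invoice.account_number = *.customer.account)
joined = [
    {**i, **c}
    for i in sprint
    for c in customers
    if text(i["invoice"], "account_number") == text(c["customer"], "account")
]

# Expose (*.account_number, *.name, *.total)
rows = [
    (text(b["invoice"], "account_number"), text(b["customer"], "name"), text(b["invoice"], "total"))
    for b in joined
]
print(rows)  # [('1', 'Tom', '$1.20')]
```

Selecting before joining keeps the join input small, which is the same ordering the tree uses.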
30
Outline
Concepts of Niagara Algebra
Operations
Optimization
31
Optimization with Niagara
Optimizer based on the Niagara algebra:
Use the operations more efficiently
Produce simpler expressions by combining operations
32
Language Convention
A and B are path expressions
A < B: path expression A is a prefix of B
AnB: common prefix of paths A and B
AńB: greatest common prefix of paths A and B
⊥: null path expression
33
Heuristics using Rewrite Rules
Allow optimization based on path selectivity when applying unnesting with the operation Φμ.
34
Interchangeability of Follow operation
Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]: TRUE or FALSE?
TRUE when there exists C such that C < A, C < B, and C = AńB; or when AnB = ⊥.
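One way to read the condition above is as a check on path prefixes. The sketch below assumes that paths are written "forward:entry", that a full path is the entry point followed by the forward steps, and that C < A means C is a proper prefix of A. All function names are hypothetical.

```python
def to_steps(path):
    # "acc_Num:invoice" -> ("invoice", "acc_Num")
    forward, entry = path.split(":")
    return (entry, *forward.split("."))


def greatest_common_prefix(a, b):
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def follows_commute(path_a, path_b):
    a, b = to_steps(path_a), to_steps(path_b)
    c = greatest_common_prefix(a, b)
    if not c:
        return True                          # no common prefix (the null path case)
    return len(c) < len(a) and len(c) < len(b)  # C is a proper prefix of both


print(follows_commute("acc_Num:invoice", "carrier:invoice"))       # True
print(follows_commute("acc_Num:invoice", "acc_Num:customer"))      # True
print(follows_commute("carrier:invoice", "carrier.name:invoice"))  # False
```

The last call uses a made-up path in which one path is a prefix of the other, so the second follow depends on the first and the two cannot be swapped under this reading.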
35
Application of Rule on Invoice
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] == Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] ? TRUE or FALSE?
36
Application of Rule on Invoice
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] = Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]
TRUE, because both paths share the common prefix "invoice" (case AńB = invoice).
37
Benefit of Rule Application
NOTE: Assume acc_Num is required for each invoice element, while carrier is not.
THEN: Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] == Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]
Which algebra tree do we prefer?
38
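Discussion
Reduction of input size on the first sub-operation:
Φμ(carrier:invoice) vs Φμ(acc_Num:invoice)
Because carrier is optional, applying Φμ(carrier:invoice) first passes on only the invoices that have a carrier, so the second follow sees a smaller input.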
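The effect on intermediate sizes can be checked with a toy example. The sketch assumes that unnesting follow emits no bag when the path reaches no vertex, and it uses nested dicts in place of XML vertices; the four invoices are made-up data.

```python
def unnest_follow(bags, field):
    """Toy unnesting follow over dict-shaped invoices."""
    out = []
    for bag in bags:
        if field in bag["invoice"]:  # assumption: no match means no output bag
            out.append({**bag, f"invoice.{field}": bag["invoice"][field]})
    return out


# Hypothetical data: acc_Num is always present, carrier only sometimes.
invoices = [
    {"invoice": {"acc_Num": "2", "carrier": "AT&T"}},
    {"invoice": {"acc_Num": "1", "carrier": "Sprint"}},
    {"invoice": {"acc_Num": "3"}},
    {"invoice": {"acc_Num": "4"}},
]

carrier_first = unnest_follow(invoices, "carrier")   # intermediate result: 2 bags
acc_first = unnest_follow(invoices, "acc_Num")       # intermediate result: 4 bags

plan_a = unnest_follow(carrier_first, "acc_Num")
plan_b = unnest_follow(acc_first, "carrier")
print(len(carrier_first), len(acc_first), plan_a == plan_b)  # 2 4 True
```

Both orders give the same final bags, but the carrier-first plan hands half as many bags to the second follow in this data.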
39
Can we apply the rule below?
Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)]
40
Example
"acc_Num:invoice" and "acc_Num:customer" are two totally different paths.
Case: AnB = ⊥
So yes, the rule is valid.
41
Summary
XML Algebra
Operations
Optimization