Annotated XML: Queries and Provenance Nate Foster TJ Green Val Tannen University of Pennsylvania Symposium on Database Provenance University of Edinburgh.

Presentation transcript:

Annotated XML: Queries and Provenance Nate Foster TJ Green Val Tannen University of Pennsylvania Symposium on Database Provenance University of Edinburgh May 21, 2008

Need to Track XML Provenance
For scientific data processing [Buneman+ 01]
– Tree-structured data, heterogeneous sources
– XML is the natural data model
– Data annotated with source info; annotations need to be propagated during query processing
For incomplete/probabilistic data [Sen.&Abit. 06]
– Query output annotated with Boolean formulas
– Annotations indicate correlations between source data and output data
For data warehousing [Cui+ 00]
– Even when data is relational, often have XML views

Provenance for Relational Algebra Views
V := π_AB((π_AC(R) ⋈ π_C(R)) ∪ (π_AB(R) ⋈ π_BC(R)))
source R
A B C
a b c
d b e
f g e
view V (provenance of each tuple: ?)
A B
a c
a e
d c
d e
f e

Semiring-Annotated Relations [PODS07]
Associate each tuple in database with an annotation from a commutative semiring (K, +, ·, 0, 1)
Combine and propagate annotations during (positive) relational query processing
– ⋈, ×, ∩ combine annotations using ·
– π, ∪ combine annotations using +
– σ multiplies annotations by 0 or 1
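
The propagation rules above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: a K-relation is assumed to be a dict from tuples to non-zero annotations, and the names `Semiring`, `NAT`, `union`, `project`, `product` and `select`, as well as the sample multiplicities in `R`, are hypothetical.

```python
from collections import namedtuple

# Hypothetical encoding of a commutative semiring (K, +, ·, 0, 1).
Semiring = namedtuple("Semiring", "add mul zero one")
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)  # bag semantics

def union(K, r1, r2):
    """Union adds the annotations of equal tuples."""
    out = dict(r1)
    for t, k in r2.items():
        out[t] = K.add(out.get(t, K.zero), k)
    return out

def project(K, r, cols):
    """Projection adds the annotations of tuples that collapse together."""
    out = {}
    for t, k in r.items():
        key = tuple(t[c] for c in cols)
        out[key] = K.add(out.get(key, K.zero), k)
    return out

def product(K, r1, r2):
    """Cartesian product multiplies annotations."""
    return {t1 + t2: K.mul(k1, k2)
            for t1, k1 in r1.items() for t2, k2 in r2.items()}

def select(K, r, pred):
    """Selection multiplies by 1 (keep) or 0 (drop); zeros are not stored."""
    out = {}
    for t, k in r.items():
        k2 = K.mul(k, K.one if pred(t) else K.zero)
        if k2 != K.zero:
            out[t] = k2
    return out

# Sample relation with made-up multiplicities (assumption, not from the slides).
R = {("a", "b", "c"): 2, ("d", "b", "e"): 5, ("f", "g", "e"): 1}
print(project(NAT, R, [1]))  # {('b',): 7, ('g',): 1}
```

Swapping `NAT` for another semiring value changes the meaning of the annotations without touching the operators, which is the point of the slide.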

Annotated Relations Example
V := π_AB((π_AC(R) ⋈ π_C(R)) ∪ (π_AB(R) ⋈ π_BC(R)))
R
A B C | annotation
a b c | p
d b e | r
f g e | s
V
A B | annotation
a c | 2p²
a e | pr
d c |
d e | 2r² + rs
f e | 2s² + rs

Semiring Bestiary
(𝔹, ∨, ∧, ⊥, ⊤): Set semantics
(ℕ, +, ·, 0, 1): Bag semantics
(PosBool(B), ∨, ∧, ⊥, ⊤): Incomplete dbs
(𝒫(Ω), ∪, ∩, ∅, Ω): Probabilistic dbs
(𝒫(𝒫(X)), ∪, ⋓, ∅, {∅}): Why-provenance, where A ⋓ B := {a ∪ b : a ∈ A, b ∈ B}
(C, min, max, absent, public): Security clearances
(ℕ[X], +, ·, 0, 1): Prov. polynomials
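
A few of these structures can be written down and spot-checked against the commutative semiring laws. The sketch below is an assumption-laden illustration: the `Semiring` record, the integer ranks used for clearances, and the helper `is_commutative_semiring` are hypothetical, and the check only tests the laws on a handful of sample values rather than proving them.

```python
from collections import namedtuple
from itertools import product

Semiring = namedtuple("Semiring", "add mul zero one")

BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
# Why-provenance: sets of sets of source ids; mul is the pairwise union.
WHY = Semiring(lambda a, b: a | b,
               lambda a, b: frozenset(x | y for x in a for y in b),
               frozenset(), frozenset({frozenset()}))
# Clearances P < C < S < T < 0 encoded as ranks 0..4 (an assumed encoding).
SEC = Semiring(min, max, 4, 0)

def is_commutative_semiring(K, samples):
    """Test the semiring laws on every triple drawn from `samples`."""
    for a, b, c in product(samples, repeat=3):
        ok = (K.add(a, b) == K.add(b, a)
              and K.mul(a, b) == K.mul(b, a)
              and K.add(K.add(a, b), c) == K.add(a, K.add(b, c))
              and K.mul(K.mul(a, b), c) == K.mul(a, K.mul(b, c))
              and K.mul(a, K.add(b, c)) == K.add(K.mul(a, b), K.mul(a, c))
              and K.add(a, K.zero) == a
              and K.mul(a, K.one) == a
              and K.mul(a, K.zero) == K.zero)
        if not ok:
            return False
    return True

x, y = frozenset({"x"}), frozenset({"y"})
why_samples = [WHY.zero, WHY.one, frozenset({x}), frozenset({x, y}),
               frozenset({x | y})]
print(is_commutative_semiring(WHY, why_samples))     # True
print(is_commutative_semiring(SEC, [0, 1, 2, 3, 4])) # True
```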

Our Contribution: Annotated XML
We show how to decorate unordered XML data with semiring annotations: K-UXML
We propagate the annotations for K-UXQuery (based on a large fragment of positive XQuery)
We do this by generalizing the semantics of Nested Relational Calculus (NRC) to handle annotated values and to incorporate a recursive tree type and structural recursion on trees
We prove a commutation with homomorphisms theorem, and show that it enables applications in security and incomplete databases

K-UXML
No attributes, no text values, no repeated children (inessential); no order (essential!)
Each node decorated with a value k from semiring K (1 neutral, 0 not present)
K-collection: a finite set of elements annotated with values from K
Formally, the children of a node form a K-collection of subtrees (to annotate root, also have a top-level K-collection)

Example: XPath on K-UXML
Query: element r { $T//c }
[Figure: source tree $T, whose nodes carry annotations x₁, x₂, y₁, y₂, y₃, and the answer tree rooted at r; in the answer, a child c carries annotation x₁ · y₃ + y₁ · y₂]
Omitted annotations are 1 (and omitted subtrees have annotation 0)

Example: For-Loops in K-UXQuery
Source, $S: a^z[ b^x₁[ d^y₁ ], c^x₂[ d^y₂, e^y₃ ] ] (written label^annotation[children])
Query: element p { for $t in $S return for $x in ($t)/* return ($x)/* } (i.e., element p { $S/*/* })
Answer: p[ d^(z · x₁ · y₁ + z · x₂ · y₂), e^(z · x₂ · y₃) ]

Outline of Technical Approach
Extend NRC with a recursive tree type
– satisfies: tree = label × { tree }
and an operation for structural recursion on trees (srt) [Robertson+ 07]
– apply to each child subtree, collect results using NRC big union
Generalize NRC + srt to handle semiring-annotated complex values ⇒ NRC_K + srt
Define semantics of K-UXQuery by translation to NRC_K + srt

Semantics of Small Union
Sums annotations
⟦e₁ ∪ e₂⟧_K(x) := ⟦e₁⟧_K(x) + ⟦e₂⟧_K(x)
Example:
Source: $S = a^x[ b^y ] and $T = a^x[ b^y ], a^x[ b^z ] (written label^annotation[children])
Query: return ($S, $T) (in NRC: $S ∪ $T)
Answer: a^2x[ b^y ], a^x[ b^z ]

Semantics of Big Union
Sums and multiplies annotations
⟦∪(x ∈ e₁) e₂⟧_K(y) := Σ_{i=1..n} ⟦e₁⟧_K(aᵢ) · ⟦e₂⟧_K[x := aᵢ](y)
where the support (the set of elements with non-zero annotations) of ⟦e₁⟧_K is {a₁, ..., aₙ}
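
The big-union rule can be sketched directly over dictionaries. This is a minimal illustration under stated assumptions, not the paper's formal semantics: a K-collection is a dict from elements to non-zero annotations, a tree is a pair of a label and a frozen set of (subtree, annotation) pairs, and the names `tree`, `children` and `big_union` are hypothetical. The sample forest follows the shape suggested by the for-loop example (a with children b and c), with made-up natural numbers standing in for z, x₁, x₂, y₁, y₂, y₃.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "add mul zero one")
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)

def tree(label, kids=None):
    """A tree is (label, frozenset of (subtree, annotation) pairs)."""
    return (label, frozenset((kids or {}).items()))

def children(t):
    """The K-collection of child subtrees of t, as a dict."""
    return dict(t[1])

def big_union(K, coll, body):
    """U(x in coll) body(x): sum over x of coll[x] * body(x)[y]."""
    out = {}
    for x, kx in coll.items():
        for y, ky in body(x).items():
            out[y] = K.add(out.get(y, K.zero), K.mul(kx, ky))
    return out

# Made-up values for the annotations (assumption, for illustration only).
z, x1, x2, y1, y2, y3 = 2, 3, 5, 7, 11, 13
d, e = tree("d"), tree("e")
S = {tree("a", {tree("b", {d: y1}): x1,
                tree("c", {d: y2, e: y3}): x2}): z}

# $S/*/* as two nested big unions over the children of each tree.
answer = big_union(NAT, big_union(NAT, S, children), children)
print(answer[d] == z * x1 * y1 + z * x2 * y2)  # True
print(answer[e] == z * x2 * y3)                # True
```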

Big Union Example With K = ℕ
Query: return $T/*/* (in NRC: ∪(x ∈ $T) ∪(y ∈ x) { y })
[Figure: source forest $T, with annotated nodes b² and c³; answer c⁷ ≡ c, c, c, c, c, c, c]

XPath Descendant Operator Uses srt
//* applied to forest $T translates to
∪(x ∈ $T) π₁((srt(b, s). f) x)
where f := let self = Tree(b, ∪(x ∈ s) { π₂(x) }) in
let matches = ∪(x ∈ s) { π₁(x) } in
(matches ∪ {self}, self)
//a: similar to above
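
The effect of this translation on annotations can be sketched with plain recursion. The code below is an assumption-laden illustration rather than an encoding of srt itself: it computes, for each tree in a forest, the tree itself plus everything matched in its children, multiplying annotations along the path and adding them across paths. The names `tree` and `descendants_or_self` and the sample forest are hypothetical.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "add mul zero one")
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)

def tree(label, kids=None):
    """A tree is (label, frozenset of (subtree, annotation) pairs)."""
    return (label, frozenset((kids or {}).items()))

def descendants_or_self(K, forest):
    """K-collection of all subtrees reachable from the trees of `forest`."""
    out = {}
    for t, k in forest.items():
        # {self}: the tree itself, carrying its own annotation k.
        out[t] = K.add(out.get(t, K.zero), k)
        # matches: everything found below t, scaled by k.
        for y, ky in descendants_or_self(K, dict(t[1])).items():
            out[y] = K.add(out.get(y, K.zero), K.mul(k, ky))
    return out

# Hypothetical forest: a[ b^2[ c^3 ], c^5 ] with top-level annotation 1.
c = tree("c")
T = {tree("a", {tree("b", {c: 3}): 2, c: 5}): 1}

result = descendants_or_self(NAT, T)
only_c = {t: k for t, k in result.items() if t[0] == "c"}  # analogue of //c
print(only_c[c])  # 11, i.e. 2*3 + 5
```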

Application: Security Clearances
Data annotated with clearance levels from total order C: P < C < S < T < 0
Joint use of data (·) requires access to both (max of clearances); alternative use of data (+) requires access to either (min of clearances)
(C, min, max, 0, P) is a commutative semiring
Query: element p { $S/*/* }
Source $S: a^P[ b^C[ d^C ], c^C[ d^S, e^T ] ]
Answer: p[ d^min(max(P,C,C), max(P,C,S)), e^max(P,C,T) ] = p[ d^C, e^T ]
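
The two answer annotations can be checked by hand or with a short script. In this sketch the list `LEVELS` and the helpers `plus` and `times` are hypothetical names; only the order P < C < S < T < 0 and the min/max reading of + and · come from the slide.

```python
# Clearance levels in increasing order; "0" is the semiring zero.
LEVELS = ["P", "C", "S", "T", "0"]
RANK = {name: i for i, name in enumerate(LEVELS)}

def plus(*ks):
    """Alternative use (+): access to either suffices, so take the min."""
    return min(ks, key=RANK.__getitem__)

def times(*ks):
    """Joint use (·): access to all is needed, so take the max."""
    return max(ks, key=RANK.__getitem__)

d = plus(times("P", "C", "C"), times("P", "C", "S"))
e = times("P", "C", "T")
print(d, e)  # C T
```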

Security Condition: Non-Interference
For any given clearance level (e.g., C), want the following diagram to commute:
a^P[ b^C[ d^C ], c^C[ d^S, e^T ] ] --query--> p^P[ d^C, e^T ]
erase > C (applied to both the source and the answer)
a^P[ b^C[ d^C ], c^C ] --query--> p^P[ d^C ]

Application: Incomplete XML
Data annotated with Boolean expressions; tree T represents set of possible worlds Mod(T)
[Figure: an incomplete tree T with nodes c annotated y₁, y₂, y₃, and Mod(T), its set of 7 possible worlds]
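
Enumerating possible worlds is easy to sketch for small trees. The code below makes several simplifying assumptions that are not in the slides: each annotation is either absent (true) or a single Boolean variable rather than a general PosBool expression, the tree `T` is a made-up example and not the one in the figure, and the names `world` and `mod` are hypothetical.

```python
from itertools import product

def world(t, nu):
    """Instantiate tree t under valuation nu, dropping false children."""
    label, kids = t
    return (label, frozenset(world(child, nu)
                             for child, var in kids
                             if var is None or nu[var]))

def mod(t, variables):
    """Set of distinct possible worlds of t over all valuations."""
    return {world(t, dict(zip(variables, bits)))
            for bits in product([False, True], repeat=len(variables))}

# Hypothetical incomplete tree: a[ b^y1[ c^y2 ] ].
T = ("a", [(("b", [(("c", []), "y2")]), "y1")])

worlds = mod(T, ["y1", "y2"])
print(len(worlds))  # 3
```

Four valuations give only three distinct worlds here, because y2 is irrelevant whenever y1 is false; the same effect explains how a tree can have fewer worlds than valuations.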

Correctness: Possible Worlds
For every incomplete tree T, and every UXQuery query q, want this diagram to commute:
T --Mod--> Mod(T)
q applied on both sides
q(T) --Mod--> q(Mod(T)) = Mod(q(T))

Commutation with Homomorphisms
Theorem: Let h : K₁ → K₂ be a semiring homomorphism. Then for any UXQuery query q, and for any K₁-UXML document D, we have h(q(D)) = q(h(D)).
Ex: security clearances
h_c : C → C
h_c(k) := if k ≤ c then k else 0
Ex: incomplete dbs
ν : B → 𝔹
Eval_ν : PosBool(B) → 𝔹
Ex: duplicate elimination
δ : ℕ → 𝔹
δ(k) := if k = 0 then ⊥ else ⊤
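
The theorem can be spot-checked on one query and one document using the duplicate-elimination homomorphism δ. This sketch checks a single instance and proves nothing in general; the tree encoding, the helper names (`tree`, `children`, `big_union`, `query`, `apply_hom`, `delta`) and the sample document are all assumptions made for illustration.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "add mul zero one")
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)

def tree(label, kids=None):
    """A tree is (label, frozenset of (subtree, annotation) pairs)."""
    return (label, frozenset((kids or {}).items()))

def children(t):
    return dict(t[1])

def big_union(K, coll, body):
    out = {}
    for x, kx in coll.items():
        for y, ky in body(x).items():
            out[y] = K.add(out.get(y, K.zero), K.mul(kx, ky))
    return out

def query(K, doc):
    """The query $S/*/* from the earlier slides."""
    return big_union(K, big_union(K, doc, children), children)

def apply_hom(h, K2, coll):
    """Apply h to every annotation, recursively; merge trees that coincide."""
    out = {}
    for (label, kids), k in coll.items():
        hk = h(k)
        if hk == K2.zero:
            continue  # zero-annotated elements are not stored
        t = (label, frozenset(apply_hom(h, K2, dict(kids)).items()))
        out[t] = K2.add(out.get(t, K2.zero), hk)
    return out

def delta(k):
    """Duplicate elimination: N -> B."""
    return k != 0

d, e = tree("d"), tree("e")
D = {tree("a", {tree("b", {d: 7}): 3, tree("c", {d: 11, e: 13}): 5}): 2}

lhs = apply_hom(delta, BOOL, query(NAT, D))   # h(q(D))
rhs = query(BOOL, apply_hom(delta, BOOL, D))  # q(h(D))
print(lhs == rhs)  # True
```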

Related Work
Bag semantics for NRC [Libkin&Wong 97]
Incomplete XML [Kanza+ 99, Abiteboul+ 06]
Probabilistic XML [Nierman&Jagadish 02, van Keulen+ 05, Abit.&Senellart 06, Sen.&Abit. 07, Hung+ 07]
XML provenance [Buneman+ 01]
NRC provenance [Hidders+ 07]
Semiring-annotated XPath [Grahne+ 07]
Negation, expressiveness of RA_K [Geerts&Poggi 08]

Conclusion
We showed how to annotate unordered XML trees (complex values) with values from a commutative semiring K, and propagate those annotations in queries for a large, positive fragment of XQuery (NRC + srt)
We saw novel applications in security and incomplete dbs, made possible by a fundamental property of our framework, commutation with homomorphisms

Future Work
Practical applications based on framework
– Security clearances
– Jointly recording provenance, security, multiplicities, uncertainty, etc. (product of semirings is also a semiring!)
Query optimization: containment/equivalence wrt annotated semantics depends on K
– In paper, we show K-equivalence for UXQuery is the same as 𝔹-equivalence when K is a distributive lattice

K-UXQuery Syntax