
Annotated XML: Queries and Provenance Nate Foster TJ Green Val Tannen University of Pennsylvania Symposium on Database Provenance University of Edinburgh.


1 Annotated XML: Queries and Provenance
Nate Foster, TJ Green, Val Tannen (University of Pennsylvania)
Symposium on Database Provenance, University of Edinburgh, May 21, 2008

2 Need to Track XML Provenance
For scientific data processing [Buneman+ 01]
– Tree-structured data, heterogeneous sources
– XML is the natural data model
– Data annotated with source info; annotations need to be propagated during query processing
For incomplete/probabilistic data [Sen.&Abit. 06]
– Query output annotated with Boolean formulas
– Annotations indicate correlations between source data and output data
For data warehousing [Cui+ 00]
– Even when data is relational, often have XML views

3 Provenance for Relational Algebra Views
Source R:
  A  B  C
  a  b  c
  d  b  e
  f  g  e
View V:
  A  C
  a  c
  a  e
  d  c
  d  e
  f  e
\( V := \pi_{AC}\big((\pi_{AC}(R) \bowtie \pi_{BC}(R)) \cup (\pi_{AB}(R) \bowtie \pi_{BC}(R))\big) \)
Provenance of the tuples in V: ?

4 Semiring-Annotated Relations [PODS07]
Associate each tuple in the database with an annotation from a commutative semiring \( (K, +, \cdot, 0, 1) \)
Combine and propagate annotations during (positive) relational query processing
– \( \bowtie, \times, \cap \) combine annotations using \( \cdot \)
– \( \pi, \cup \) combine annotations using \( + \)
– \( \sigma \) multiplies annotations by 0 or 1
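
The propagation rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a K-relation is modeled as a dict from tuples to annotations, and the names `Semiring`, `NAT`, `BOOL`, `union`, `project`, `select` and `product` are hypothetical helpers, not part of the talk.

```python
from typing import Any, Callable, NamedTuple


class Semiring(NamedTuple):
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# Two instances from the talk: bag semantics (N) and set semantics (B).
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)


def union(K, r, s):
    """Union adds the annotations of tuples present in both inputs."""
    out = dict(r)
    for t, k in s.items():
        out[t] = K.plus(out.get(t, K.zero), k)
    return out


def project(K, r, cols):
    """Projection adds the annotations of tuples that collapse together."""
    out = {}
    for t, k in r.items():
        key = tuple(t[c] for c in cols)
        out[key] = K.plus(out.get(key, K.zero), k)
    return out


def select(K, r, pred):
    """Selection multiplies by 1 or 0; tuples annotated 0 are dropped."""
    out = {}
    for t, k in r.items():
        ann = K.times(k, K.one if pred(t) else K.zero)
        if ann != K.zero:
            out[t] = ann
    return out


def product(K, r, s):
    """Cartesian product multiplies annotations."""
    return {t + u: K.times(k, l) for t, k in r.items() for u, l in s.items()}


r = {("a", "b"): 2, ("a", "c"): 3}
print(project(NAT, r, [0]))             # annotations 2 and 3 are added
print(product(NAT, r, {("x",): 4}))     # annotations are multiplied
```

Swapping `NAT` for `BOOL` (with Boolean annotations) gives ordinary set semantics from the same code, which is the point of parameterizing by the semiring.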

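5 Annotated Relations Example
Source R:
  A  B  C   annotation
  a  b  c   p
  d  b  e   r
  f  g  e   s
View V:
  A  C   annotation
  a  c   \( 2p^2 \)
  a  e   \( pr \)
  d  c   \( pr \)
  d  e   \( 2r^2 + rs \)
  f  e   \( 2s^2 + rs \)
\( V := \pi_{AC}\big((\pi_{AC}(R) \bowtie \pi_{BC}(R)) \cup (\pi_{AB}(R) \bowtie \pi_{BC}(R))\big) \)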
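
The annotations in this example can be recomputed mechanically. The sketch below is an assumption-laden illustration, not the authors' code: it takes the view to be the projection, onto the attributes holding the a/d/f and c/e values (called A and C here), of the union of the two joins, with the second operand of each join being the projection of R onto B and C. Polynomials over the variables p, r, s are stored as `Counter` objects mapping a sorted tuple of variables (a monomial) to its coefficient; `row`, `var`, `padd` and `pmul` are hypothetical helper names.

```python
from collections import Counter


def var(name):
    """The polynomial consisting of a single variable."""
    return Counter({(name,): 1})


def padd(f, g):
    """Polynomial addition: add coefficients of equal monomials."""
    return Counter(f) + Counter(g)


def pmul(f, g):
    """Polynomial multiplication: multiply every pair of monomials."""
    out = Counter()
    for m1, c1 in f.items():
        for m2, c2 in g.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out


def row(**attrs):
    """A tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())


def project(rel, cols):
    out = {}
    for t, k in rel.items():
        key = frozenset((c, v) for c, v in t if c in cols)
        out[key] = padd(out.get(key, Counter()), k)
    return out


def join(r1, r2):
    """Natural join: rows agreeing on shared attributes are merged."""
    out = {}
    for t1, k1 in r1.items():
        for t2, k2 in r2.items():
            d1, d2 = dict(t1), dict(t2)
            if all(d1[c] == d2[c] for c in d1.keys() & d2.keys()):
                t = t1 | t2
                out[t] = padd(out.get(t, Counter()), pmul(k1, k2))
    return out


def union(r1, r2):
    out = dict(r1)
    for t, k in r2.items():
        out[t] = padd(out.get(t, Counter()), k)
    return out


R = {
    row(A="a", B="b", C="c"): var("p"),
    row(A="d", B="b", C="e"): var("r"),
    row(A="f", B="g", C="e"): var("s"),
}

V = project(
    union(join(project(R, "AC"), project(R, "BC")),
          join(project(R, "AB"), project(R, "BC"))),
    "AC",
)

for t, poly in V.items():
    print(dict(t), dict(poly))
```

Under these assumptions the five view tuples come out with the polynomials listed in the slide, for example the monomial (r, r) with coefficient 2 plus (r, s) with coefficient 1 for the tuple (d, e).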

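6 Semiring Bestiary
\( (\mathbb{B}, \vee, \wedge, \bot, \top) \): Set semantics
\( (\mathbb{N}, +, \cdot, 0, 1) \): Bag semantics
\( (\mathrm{PosBool}(B), \vee, \wedge, \bot, \top) \): Incomplete dbs
\( (\mathcal{P}(\Omega), \cup, \cap, \emptyset, \Omega) \): Probabilistic dbs
\( (\mathcal{P}(\mathcal{P}(X)), \cup, \Cup, \emptyset, \{\emptyset\}) \): Why-provenance, where \( A \Cup B := \{a \cup b : a \in A, b \in B\} \)
\( (C, \min, \max, \text{absent}, \text{public}) \): Security clearances
\( (\mathbb{N}[X], +, \cdot, 0, 1) \): Prov. polynomials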
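
As a rough illustration of how one provenance polynomial specializes to several of these semirings, the following sketch evaluates \( 2r^2 + rs \) under bag, set and why-provenance semantics. The representation (a semiring as a `(plus, times, zero, one)` tuple, a polynomial as a list of `(coefficient, variables)` pairs) and the names `evaluate`, `NAT`, `BOOL`, `WHY` are assumptions made for this example only.

```python
from functools import reduce

# Each semiring is a tuple (plus, times, zero, one).
NAT = (lambda a, b: a + b, lambda a, b: a * b, 0, 1)
BOOL = (lambda a, b: a or b, lambda a, b: a and b, False, True)
WHY = (
    lambda A, B: A | B,                                     # union
    lambda A, B: frozenset(a | b for a in A for b in B),    # pairwise union
    frozenset(),                                            # empty set
    frozenset({frozenset()}),                               # {empty set}
)


def evaluate(K, poly, valuation):
    """Evaluate a polynomial with natural-number coefficients in K.

    poly is a list of (coefficient, [variables]); a coefficient n means
    the monomial is added to the total n times.
    """
    plus, times, zero, one = K
    total = zero
    for coeff, variables in poly:
        mono = reduce(times, (valuation[v] for v in variables), one)
        for _ in range(coeff):
            total = plus(total, mono)
    return total


poly = [(2, ["r", "r"]), (1, ["r", "s"])]   # 2r^2 + rs

bag = evaluate(NAT, poly, {"r": 2, "s": 3})
present = evaluate(BOOL, poly, {"r": True, "s": False})
why = evaluate(WHY, poly, {"r": frozenset({frozenset({"r"})}),
                           "s": frozenset({frozenset({"s"})})})
print(bag, present, why)
```

With r = 2 and s = 3 the bag reading gives 2·4 + 6 = 14, and the why-provenance reading collapses the coefficient and exponent, leaving the witness sets {r} and {r, s}.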

7 Our Contribution: Annotated XML
We show how to decorate unordered XML data with semiring annotations: K-UXML
We propagate the annotations for K-UXQuery (based on a large fragment of positive XQuery)
We do this by generalizing the semantics of Nested Relational Calculus (NRC) to handle annotated values and to incorporate a recursive tree type and structural recursion on trees
We prove a commutation-with-homomorphisms theorem, and show that it enables applications in security and incomplete databases

8 K-UXML
No attributes, no text values, no repeated children (inessential); no order (essential!)
Each node decorated with a value k from semiring K (1 neutral, 0 not present)
K-collection: a finite set of elements annotated with values from K
Formally, the children of a node form a K-collection of subtrees (to annotate the root, also have a top-level K-collection)

9 Example: XPath on K-UXML
Query: element r { $T//c }
[Figure: the source tree $T, with nodes labeled a, b, c, d and annotations \( x_1, x_2, y_1, y_2, y_3 \), and the answer tree rooted at r, in which a c node carries the annotation \( x_1 \cdot y_3 + y_1 \cdot y_2 \).]
Omitted annotations are 1 (and omitted subtrees have annotation 0)

10 Example: For-Loops in K-UXQuery
Source, $S: \( a^{z}[\, b^{x_1}[\, d^{y_1} \,],\; c^{x_2}[\, d^{y_2},\, e^{y_3} \,] \,] \)
Query: element p { for $t in $S return for $x in ($t)/* return ($x)/* } (i.e., element p { $S/*/* })
Answer: \( p[\, d^{\,z \cdot x_1 \cdot y_1 + z \cdot x_2 \cdot y_2},\; e^{\,z \cdot x_2 \cdot y_3} \,] \)
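
A small numeric check of this example, as a sketch only: annotations are taken from K = N, with arbitrary numbers substituted for z, x1, x2, y1, y2, y3, and the source is assumed to be the tree a with children b (holding one d) and c (holding d and e), which is the shape the answer annotations imply. A tree is a `(label, children)` pair whose children are `(subtree, annotation)` pairs; `tree` and `child_step` are hypothetical names.

```python
def tree(label, *children):
    """Build a hashable tree; children are (subtree, annotation) pairs."""
    return (label, frozenset(children))


def child_step(forest):
    """The path step /* over K = N.

    forest maps trees to annotations. Each child is emitted with the
    product of its parent's and its own annotation; equal children
    reached along different paths have their annotations added.
    """
    out = {}
    for (_label, kids), k in forest.items():
        for sub, ann in kids:
            out[sub] = out.get(sub, 0) + k * ann
    return out


# Arbitrary numbers standing in for the symbolic annotations.
z, x1, x2, y1, y2, y3 = 2, 3, 5, 7, 11, 13

d, e = tree("d"), tree("e")
S = {
    tree("a",
         (tree("b", (d, y1)), x1),
         (tree("c", (d, y2), (e, y3)), x2)): z
}

answer = child_step(child_step(S))          # $S/*/*
p = tree("p", *answer.items())              # element p { ... }
print(answer[d], answer[e])
```

The two nested for-loops of the query correspond to the two applications of `child_step`; the annotation of d is the sum over both paths that reach a d node.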

11 Outline of Technical Approach
Extend NRC with a recursive tree type
– satisfies: \( \text{tree} = \text{label} \times \{\text{tree}\} \)
and an operation for structural recursion on trees (srt) [Robertson+ 07]
– apply to each child subtree, collect results using NRC big union
Generalize NRC + srt to handle semiring-annotated complex values \( \Rightarrow \) \( \mathrm{NRC}_K \) + srt
Define semantics of K-UXQuery by translation to \( \mathrm{NRC}_K \) + srt

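12 Semantics of Small Union
Sums annotations
\( [\![ e_1 \cup e_2 ]\!]_K(x) := [\![ e_1 ]\!]_K(x) + [\![ e_2 ]\!]_K(x) \)
Example:
Source: forests $S and $T containing, between them, the trees \( a^{x}[\, b^{y} \,] \), \( a^{x}[\, b^{y} \,] \) and \( a^{x}[\, b^{z} \,] \)
Query: return ($S, $T) (in NRC: \( \$S \cup \$T \))
Answer: \( a^{2x}[\, b^{y} \,],\; a^{x}[\, b^{z} \,] \)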

13 Semantics of Big Union
Sums and multiplies annotations
\( [\![ \textstyle\bigcup (x \in e_1)\; e_2 ]\!]_K(y) := \sum_{i=1}^{n} [\![ e_1 ]\!]_K(a_i) \cdot [\![ e_2 ]\!]_K[x := a_i](y) \)
where the support (the set of elements with non-zero annotations) of \( [\![ e_1 ]\!]_K \) is \( \{a_1, \ldots, a_n\} \)

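14 Big Union Example With \( K = \mathbb{N} \)
Query: return $T/*/* (in NRC: \( \bigcup (x \in \$T)\; \bigcup (y \in x)\; \{ y \} \))
Source, $T: \( b^{2}[\, c^{3} \,],\; b[\, c \,] \;\equiv\; b[\, c, c, c \,],\; b[\, c, c, c \,],\; b[\, c \,] \)
Answer: \( c^{7} \;\equiv\; c, c, c, c, c, c, c \)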
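
The big-union semantics with K = N can be sketched as follows. The encoding is an assumption for illustration: a collection is a dict from elements to multiplicities, inner collections are stored as frozensets of `(element, multiplicity)` pairs so that they can serve as dict keys, and the b labels are dropped since only the nesting matters for the NRC form of the query. `big_union` and `singleton` are hypothetical names.

```python
def big_union(coll, body):
    """Big union over K = N.

    For every element a in the support of coll, evaluate body(a) and
    add coll[a] * body(a)[y] to the annotation of each result y.
    """
    out = {}
    for a, k in coll.items():
        if k == 0:              # not in the support
            continue
        for y, l in body(a).items():
            out[y] = out.get(y, 0) + k * l
    return out


def singleton(y):
    """The collection { y } with annotation 1."""
    return {y: 1}


# Two copies of a collection holding c three times, one copy holding c once.
three_c = frozenset({("c", 3)})
one_c = frozenset({("c", 1)})
T = {three_c: 2, one_c: 1}

# U(x in T) U(y in x) { y }
answer = big_union(T, lambda x: big_union(dict(x), singleton))
print(answer)
```

The outer multiplicities 2 and 1 multiply the inner ones 3 and 1, and the two contributions to c are added: 2·3 + 1·1 = 7.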

15 XPath Descendant Operator Uses srt
//* applied to forest $T translates to
\( \bigcup (x \in \$T)\; \pi_1((\mathrm{srt}(b, s).\, f)\; x) \)
where f :=
  let self = \( \mathrm{Tree}(b, \bigcup (x \in s)\; \{ \pi_2(x) \}) \) in
  let matches = \( \bigcup (x \in s)\; \{ \pi_1(x) \} \) in
  (matches \( \cup \) {self}, self)
//a, similar to above

16 Application: Security Clearances
Data annotated with clearance levels from the total order C: P < C < S < T < 0
Joint use of data (\( \cdot \)) requires access to both (max of clearances); alternative use of data (+) requires access to either (min of clearances)
\( (C, \min, \max, 0, P) \) is a commutative semiring
Source, $S: \( a^{P}[\, b^{C}[\, d^{C} \,],\; c^{C}[\, d^{S},\, e^{T} \,] \,] \)
Query: element p { $S/*/* }
Answer: \( p[\, d^{\min(\max(P,C,C),\, \max(P,C,S))},\; e^{\max(P,C,T)} \,] = p[\, d^{C},\; e^{T} \,] \)
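
A minimal sketch of the clearance semiring, assuming the levels are represented by the strings "P", "C", "S", "T", "0" in increasing order; `RANK`, `plus`, `times`, `ZERO` and `ONE` are names chosen for this example.

```python
from itertools import product as all_combinations

LEVELS = ["P", "C", "S", "T", "0"]              # P < C < S < T < 0
RANK = {level: i for i, level in enumerate(LEVELS)}


def plus(a, b):
    """Alternative use: access to either suffices, so take the minimum."""
    return min(a, b, key=RANK.get)


def times(a, b):
    """Joint use: access to both is needed, so take the maximum."""
    return max(a, b, key=RANK.get)


ZERO, ONE = "0", "P"

# Annotations of d and e in the answer of element p { $S/*/* }.
d = plus(times(times("P", "C"), "C"), times(times("P", "C"), "S"))
e = times(times("P", "C"), "T")
print(d, e)

# Brute-force check of some semiring laws over all levels.
laws_hold = all(
    times(a, plus(b, c)) == plus(times(a, b), times(a, c))  # distributivity
    and plus(a, ZERO) == a                                  # 0 neutral for +
    and times(a, ONE) == a                                  # 1 neutral for .
    and times(a, ZERO) == ZERO                              # 0 annihilates
    for a, b, c in all_combinations(LEVELS, repeat=3)
)
print(laws_hold)
```

The d node is reachable through a path needing only clearance C and through one needing S, so the minimum, C, suffices; the only path to e needs T.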

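17 Security Condition: Non-Interference
For any given clearance level (e.g., C), want the following diagram to commute:
Source: \( a^{P}[\, b^{C}[\, d^{C} \,],\; c^{C}[\, d^{S},\, e^{T} \,] \,] \)
query(Source): \( p^{P}[\, d^{C},\; e^{T} \,] \)
erase > C (Source): \( a^{P}[\, b^{C}[\, d^{C} \,],\; c^{C} \,] \)
query(erase > C (Source)) = erase > C (query(Source)): \( p^{P}[\, d^{C} \,] \)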

18 Application: Incomplete XML
Data annotated with Boolean expressions; tree T represents a set of possible worlds Mod(T)
[Figure: an incomplete tree T with nodes labeled a, b, c, d, whose c nodes are annotated with \( y_1, y_2, y_3 \), and its set Mod(T) of 7 possible worlds.]

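19 Correctness: Possible Worlds
For every incomplete tree T, and every UXQuery query q, want this diagram to commute:
T maps to Mod(T) under Mod and to q(T) under q; Mod(T) maps to q(Mod(T)) under q; q(T) maps to Mod(q(T)) under Mod
\( q(\mathrm{Mod}(T)) = \mathrm{Mod}(q(T)) \)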

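20 Commutation with Homomorphisms
Theorem: Let \( h : K_1 \to K_2 \) be a semiring homomorphism. Then for any UXQuery query q, and for any \( K_1 \)-UXML document D, we have \( h(q(D)) = q(h(D)) \).
Ex: security clearances \( h_c : C \to C \), \( h_c(k) := \text{if } k \le c \text{ then } k \text{ else } 0 \)
Ex: incomplete dbs \( \nu : B \to \mathbb{B} \), \( \mathrm{Eval}_\nu : \mathrm{PosBool}(B) \to \mathbb{B} \)
Ex: duplicate elimination \( \delta : \mathbb{N} \to \mathbb{B} \), \( \delta(k) := \text{if } k = 0 \text{ then } \bot \text{ else } \top \)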
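
The theorem can be spot-checked on one document for the duplicate-elimination homomorphism. The sketch below is a test on a single example, not a proof, and it rests on assumptions: trees are `(label, children)` pairs with children stored as `(subtree, annotation)` pairs, the query is the two-step path /*/*, and `child_step`, `apply_hom` and `delta` are hypothetical names. Applying a homomorphism can make previously distinct subtrees equal, so `apply_hom` merges them with the target semiring's addition and drops elements whose annotation becomes zero.

```python
NAT = (lambda a, b: a + b, lambda a, b: a * b, 0)
BOOL = (lambda a, b: a or b, lambda a, b: a and b, False)


def child_step(K, forest):
    """The path step /* over a semiring K = (plus, times, zero)."""
    plus, times, _zero = K
    out = {}
    for (_label, kids), k in forest.items():
        for sub, ann in kids:
            v = times(k, ann)
            out[sub] = plus(out[sub], v) if sub in out else v
    return out


def apply_hom(h, K2, pairs):
    """Apply h to every annotation of a collection of (tree, annotation) pairs."""
    plus, _times, zero = K2
    out = {}
    for (label, kids), k in pairs:
        mapped = (label, frozenset(apply_hom(h, K2, kids).items()))
        out[mapped] = plus(out[mapped], h(k)) if mapped in out else h(k)
    return {t: k for t, k in out.items() if k != zero}


def delta(k):
    """Duplicate elimination: N -> B."""
    return k != 0


def query(K, forest):
    return child_step(K, child_step(K, forest))   # /*/*


c = ("c", frozenset())
b3 = ("b", frozenset({(c, 3)}))
b1 = ("b", frozenset({(c, 1)}))
D = {("a", frozenset({(b3, 2), (b1, 1)})): 1}

lhs = apply_hom(delta, BOOL, query(NAT, D).items())    # h(q(D))
rhs = query(BOOL, apply_hom(delta, BOOL, D.items()))   # q(h(D))
print(lhs, rhs)
```

Over N the query yields c with multiplicity 7; eliminating duplicates before or after running the query gives the same Boolean-annotated answer on this input.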

21 Related Work
Bag semantics for NRC [Libkin&Wong 97]
Incomplete XML [Kanza+ 99, Abiteboul+ 06]
Probabilistic XML [Nierman&Jagadish 02, van Keulen+ 05, Abit.&Senellart 06, Sen.&Abit. 07, Hung+ 07]
XML provenance [Buneman+ 01]
NRC provenance [Hidders+ 07]
Semiring-annotated XPath [Grahne+ 07]
Negation, expressiveness of \( RA_K \) [Geerts&Poggi 08]

22 Conclusion
We showed how to annotate unordered XML trees (complex values) with values from a commutative semiring K, and propagate those annotations in queries for a large, positive fragment of XQuery (NRC + srt)
We saw novel applications in security and incomplete dbs, made possible by a fundamental property of our framework, commutation with homomorphisms

23 Future Work
Practical applications based on framework
– Security clearances
– Jointly recording provenance, security, multiplicities, uncertainty, etc. (product of semirings is also a semiring!)
Query optimization: containment/equivalence wrt annotated semantics depends on K
– In paper, we show K-equivalence for UXQuery is the same as \( \mathbb{B} \)-equivalence when K is a distributive lattice
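
The remark that a product of semirings is again a semiring can be illustrated with a componentwise construction. This is a sketch under assumptions: semirings are `(plus, times, zero, one)` tuples, clearance levels are encoded as the integers 0 to 4 (P = 0 up to the top element 0 = 4), and `product_semiring` is a hypothetical helper name.

```python
def product_semiring(K1, K2):
    """Componentwise product of two semirings given as (plus, times, zero, one)."""
    p1, t1, z1, o1 = K1
    p2, t2, z2, o2 = K2
    return (
        lambda a, b: (p1(a[0], b[0]), p2(a[1], b[1])),
        lambda a, b: (t1(a[0], b[0]), t2(a[1], b[1])),
        (z1, z2),
        (o1, o2),
    )


NAT = (lambda a, b: a + b, lambda a, b: a * b, 0, 1)   # multiplicities
CLEARANCE = (min, max, 4, 0)                           # ranks: P=0, C=1, S=2, T=3, top=4

plus, times, zero, one = product_semiring(NAT, CLEARANCE)

# An annotation pairs a multiplicity with a clearance rank.
x = (2, 1)      # two copies, clearance C
y = (3, 2)      # three copies, clearance S

print(plus(x, y))    # alternative use
print(times(x, y))   # joint use
```

A single query evaluation over the product semiring then tracks both kinds of annotation at once, each component following its own rules.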


25 K-UXQuery Syntax

