Published by Edith Parks. Modified over 9 years ago.
1
Answering Tree Pattern Queries Using Views
Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao
University of British Columbia, Vancouver, BC
Amazon.com
2
Outline
Motivation
Problems Studied
Without schema
With schema
Recursive schemas
Related Work
Summary & Future Work
3
Motivation 1/3
Integration of existing data sources.
Local as view (LAV): one of the well-known approaches.
Each source = a materialized view over some global database.
Answer to query over global DB = answer to query using (materialized) views.
4
Motivation 2/3
[Figure: contents of the source, nodes (3) (4) John Doe … (10) Complete (11) (12) Jen Bloe … (14) (15) Mary Moore …]
Source = view “//Trials//Trial” over some DB containing clinical data: trials, their status, patient data, etc.
Consider query Q: //Trials[//Status]//Trial over the [unknown] original DB.
How can and should we answer it using the above source?
5
Motivation 3/3
[Figure: one possible original DB, nodes (1) (2) (3) (4) John Doe … (10) Complete (11) (12) Jen Bloe … (13) (14) (15) Mary Moore …]
[Figure: result of //Trials//Trial on it, nodes (3) (4) John Doe … (10) Complete (11) (12) Jen Bloe … (14) (15) Mary Moore …]
6
Motivation 3/3
[Figure: one possible original DB and the result of //Trials//Trial on it, as on the previous slide]
Q: //Trials[//Status]//Trial
7
Motivation 3/3
[Figure: one possible original DB and the result of //Trials//Trial on it, as on the previous slide]
Contained rewriting: //Trials//Trial ◦ “●[//Status]” returns { (3) }.
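The running example can be replayed with a small script. This is only a sketch: the XML document below is a hypothetical original DB (the element names PharmaLab and Patient, the id attributes, and the nesting are assumptions chosen to match the node numbers on the slide), and the three functions are illustrative helpers, not part of the authors' system.

```python
import xml.etree.ElementTree as ET

# Hypothetical original DB; element names, ids and nesting are assumptions.
DB = """
<PharmaLab id="1">
  <Trials id="2">
    <Trial id="3">
      <Patient id="4">John Doe</Patient>
      <Status id="10">Complete</Status>
    </Trial>
    <Trial id="11"><Patient id="12">Jen Bloe</Patient></Trial>
  </Trials>
  <Trials id="13">
    <Trial id="14"><Patient id="15">Mary Moore</Patient></Trial>
  </Trials>
</PharmaLab>
"""


def trials_below(root, need_status):
    """Trial nodes below a Trials node; optionally require Status below Trials."""
    seen, out = set(), []
    for trials in root.iter("Trials"):
        if need_status and trials.find(".//Status") is None:
            continue
        for trial in trials.iter("Trial"):
            if id(trial) not in seen:  # avoid duplicates under nested Trials
                seen.add(id(trial))
                out.append(trial)
    return out


def view(root):
    """V = //Trials//Trial."""
    return trials_below(root, need_status=False)


def compensation(view_nodes):
    """E = ●[//Status]: keep view nodes that have a Status descendant."""
    return [t for t in view_nodes if t.find(".//Status") is not None]


def query(root):
    """Q = //Trials[//Status]//Trial, evaluated directly on the DB."""
    return trials_below(root, need_status=True)


root = ET.fromstring(DB)
rewriting_ids = [t.get("id") for t in compensation(view(root))]
query_ids = [t.get("id") for t in query(root)]
print(rewriting_ids, query_ids)
```

On this particular DB the rewriting returns only node 3, while Q itself also returns node 11, so the rewriting is contained in Q but not equivalent to it. The compensation step looks only inside the Trial subtrees, which is all a source holding the view would have available.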
8
Problems Studied 1/3
Equivalent Rewriting: Given Q and views V, find an equivalent rewriting of Q using V, i.e., an expression E s.t. V◦E ≡ Q over all possible input DBs. Appropriate for query optimization.
Contained Rewriting: Given Q and V, find an expression E s.t. V◦E ⊆ Q over all possible input DBs, and V◦E is maximal among all such rewritings. Most appropriate for information integration [Halevy, Lenzerini, Pottinger & Halevy].
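Containment between two tree patterns can be checked with a pattern homomorphism. The sketch below is the classical homomorphism test for patterns with /, //, and [ ], used here as a sufficient condition for containment; it is not the useful-embedding procedure of this talk. The Node class and function names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    axis: str = "/"      # edge from the parent: "/" (child) or "//" (descendant)
    out: bool = False    # True for the distinguished (answer) node
    children: List["Node"] = field(default_factory=list)


def descendants(p):
    """Yield all proper descendants of pattern node p."""
    for c in p.children:
        yield c
        yield from descendants(c)


def maps_to(q, p):
    """True if the subpattern rooted at q maps into the one rooted at p."""
    if q.label != p.label or (q.out and not p.out):
        return False
    for qc in q.children:
        if qc.axis == "/":
            # a child edge must be matched by a child edge
            candidates = [pc for pc in p.children if pc.axis == "/"]
        else:
            # a descendant edge may be matched by any downward path
            candidates = list(descendants(p))
        if not any(maps_to(qc, pc) for pc in candidates):
            return False
    return True


def contained_in(r, q):
    """Sufficient test for r ⊆ q: a homomorphism from q into r."""
    if q.axis == "/":
        return r.axis == "/" and maps_to(q, r)
    return any(maps_to(q, p) for p in [r, *descendants(r)])


# Q = //Trials[//Status]//Trial
Q = Node("Trials", "//", children=[
    Node("Status", "//"),
    Node("Trial", "//", out=True),
])
# R = V ◦ E = //Trials//Trial[//Status]
R = Node("Trials", "//", children=[
    Node("Trial", "//", out=True, children=[Node("Status", "//")]),
])

print(contained_in(R, Q), contained_in(Q, R))
```

For the motivating example the test confirms that the rewriting R is contained in Q, and finds no homomorphism in the other direction, matching the observation that R is a contained rather than an equivalent rewriting.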
9
Problems Studied 2/3
No Schema: Given Q and V, find a maximally contained rewriting (MCR) of Q using V.
With Schema: Given Q and V, and a schema prescribing possible input DBs, find a maximally contained rewriting of Q using V.
Focus: Tree Pattern Queries (XP{/, //, [ ]}). Schema without cycles, union, and recursion.
10
Problems Studied 3/3
Given Q & V: R ≡ V ◦ E ⊆ Q, where E is the compensation query and R is the rewriting query.
Want MCR in the absence and in the presence of a schema.
[Figure: example patterns //a[//b]/c and //a with nodes b and c]
11
Without Schema 1/6
Question 1: Does an MCR always exist?
[Figure: view V (/a, b, c), query Q1 (/b, d), and query Q2 (/a, b, d), with the distinguished (answer) node marked]
No MCR for Q1 and for Q2. What went wrong?
12
Without Schema 2/6
[Figure: patterns Q, V, and E over //Trials, Trial, Status, Patient, with nodes (1), (2), (3) and a mapping f between Q and V]
f: useful embedding.
Unfulfilled obligations: Clip Away Tree (CAT).
13
Without Schema 3/6
Theorem: Let Q and V be tree pattern queries. Then Q is answerable using V iff there is a useful embedding from Q to V.
Testing existence of MCR:
[Figure: view V (nodes numbered 1 to 7) and query Q over labels a, b, c, d, e, rooted at //a; node annotations 1,2; 1:{2}, 2:{}; 2:{6}; 6:{7}; 1:{2,3}, 2:{3}; 2:{6}, 3:{4}; 4:{5}, 6:{7}]
14
Without Schema 4/6
Two embeddings, with corresponding irredundant CRs.
[Figure: two CRs over //a, a, b, c, d, e]
Union is needed for expressing the MCR!
15
Without Schema 5/6
Can test existence of MCR in polynomial time.
However, MCRs can be exponentially large (closure issue).
[Figure: view V and query Q over //a, a, b, c, d, e]
How many irredundant CRs are possible?
16
Without Schema 5/6
[Figure: V and Q as on the previous slide, plus a CR over //a, a, b, c with labels d, e]
17
Without Schema 5/6
[Figure: V and Q as on the previous slide, plus a CR over //a, a, b, c with labels d, a/b/c/e]
18
Without Schema 5/6
[Figure: V and Q as on the previous slide, plus a CR over //a, a, b, c with labels e, a/b, e, c]
19
Without Schema 5/6
[Figure: V and Q as on the previous slide, plus a CR over //a, a, b, c with labels a/b/c/e, a/b, e, c]
MCR = union of exponentially many CRs in the worst case!
20
Without Schema 6/6
Summary:
Can test existence of MCR in polynomial time.
Exact characterization.
MCR may be a union of exponentially many CRs in the worst case.
Algorithm for generating MCR.
21
With Schema 1/6
Given query Q, view V, schema S:
Infer all constraints C implied by S.
Chase V w.r.t. C.
Look for MCR of Q w.r.t. the chased view.
22
With Schema 2/6
[Figure: schema graph over Auctions, Auction, open_auction, closed_auction, bids, person, item, name, with edges labelled *, ?, or +]
Example constraints:
closed_auction (c_a) has ≤ 1 bids child.
Every Auction having a person descendant also has an item descendant.
Every path from Auction to name goes via bids.
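The three kinds of constraints listed above can be read off a schema graph mechanically. The sketch below assumes a hypothetical acyclic schema: the edge set and quantifiers are guesses, since the slide's figure did not survive extraction, and the function names are illustrative. It uses simple sufficient checks and is not the inference algorithm of the talk.

```python
# Hypothetical DAG schema: parent -> {child: quantifier}.
# "1" = exactly one, "?" = zero or one, "+" = one or more, "*" = zero or more.
SCHEMA = {
    "Auctions": {"Auction": "*"},
    "Auction": {"open_auction": "*", "closed_auction": "?"},
    "open_auction": {"bids": "+"},
    "closed_auction": {"bids": "?"},
    "bids": {"person": "+", "item": "+"},
    "person": {"name": "+"},
}


def at_most_one_child(schema, parent, child):
    """Constraint: every `parent` has at most one `child` child."""
    return schema.get(parent, {}).get(child) in ("?", "1")


def paths(schema, src, dst):
    """Yield every schema path from src to dst as a list of labels (DAG assumed)."""
    if src == dst:
        yield [src]
        return
    for child in schema.get(src, {}):
        for rest in paths(schema, child, dst):
            yield [src] + rest


def always_via(schema, src, dst, mid):
    """Constraint: every path from src to dst goes through mid."""
    all_paths = list(paths(schema, src, dst))
    return bool(all_paths) and all(mid in p for p in all_paths)


def guaranteed_desc(schema, node, target):
    """True if every `node` element must have a `target` descendant."""
    return any(
        q in ("+", "1") and (c == target or guaranteed_desc(schema, c, target))
        for c, q in schema.get(node, {}).items()
    )


def co_occurs(schema, anc, x, y):
    """Sufficient check: every anc with an x descendant also has a y descendant."""
    return all(
        any(guaranteed_desc(schema, n, y) for n in p)
        for p in paths(schema, anc, x)
    )


print(at_most_one_child(SCHEMA, "closed_auction", "bids"))
print(co_occurs(SCHEMA, "Auction", "person", "item"))
print(always_via(SCHEMA, "Auction", "name", "bids"))
```

Under the assumed schema all three checks succeed, mirroring the three example constraints. Enumerating paths is exponential in general, so this is only meant to make the constraint types concrete.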
23
With Schema 3/6
[Figure: view V, //Auction with o_a and c_a nodes, each with a bids node]
[Figure: query Q, //Auction with two bids nodes and person, item, name nodes]
24
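With Schema 3/6
[Figure: the schema graph (Auctions, Auction, open_auction, closed_auction, bids, person, item, name; edges labelled *, ?, +) next to view V (//Auction, o_a, c_a, bids, bids) shown with person, item, name (p, i, n) nodes]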
25
With Schema 4/6
[Figure: view V (//Auction, o_a, c_a, bids, bids) shown with person, item, name (p, i, n) nodes, next to query Q (//Auction, bids, bids, person, item, name)]
MCR = identity query.
26
With Schema 5/6
Another example:
[Figure: schema graph over Auctions, Auction, closed_auction, open_auction, bids, buyer, person, item, name, with edges labelled *, ?, or +]
[Figure: query Q, //Auction with name, item, person; view V, //Auction]
How to answer Q using V?
27
With Schema 5/6
Another example:
[Figure: the same schema graph and query Q (//Auction with name, item, person); the view is now shown as //Auction with name and item]
So what’s the compensation query?
28
With Schema 5/6
Another example:
[Figure: the same schema graph and query Q (//Auction with name, item, person); the view is shown as //Auction with name and item]
MCR = V ◦ “●//name”
29
With Schema 6/6
Challenges and highlights:
Naïve chase can explode. Make the chase context-aware.
Exact characterization of schemas without recursion and union in terms of constraints.
Efficient algorithms for inferring the constraints, for the chase, and for finding the MCR.
MCR is unique, if it exists.
30
Recursive Schemas 1/2
[Figure: recursive schema over a, b, c, d with edges labelled *, *, ?; view V, //a with b; query Q over b, b, c, d]
What is the MCR?
31
Recursive Schemas 2/2
[Figure: schema, V, and Q as on the previous slide, plus a CR over b, c, d]
32
Recursive Schemas 2/2
[Figure: schema, V, and Q as on the previous slide, plus a CR over b, c, d, b]
33
Recursive Schemas 2/2
[Figure: schema, V, and Q as on the previous slide, plus another CR over b, c, d, b]
34
Recursive Schemas 2/2
[Figure: schema, V, and Q as on the previous slide, plus a CR over b, c, b, d, b]
MCR = union of four CRs. Behavior similar to the no-schema case.
35
Related Work 1/2
Query answering using views (QAV) for relational DBs: huge body of work [Halevy 01].
Regular path queries and semi-structured DBs [Grahne & Thomo 03, Calvanese 00, Papakonstantinou & Vassalos 99].
Equivalent rewritings for fragments of XQuery and XPath [Deutsch & Tannen 03, Tang & Zhou 05, Xu & Ozsoyoglu 05].
36
Related Work 2/2
Key differences between equivalent and contained rewriting:
Unique rewriting (even without schema).
MCR may involve a union of (possibly exponentially many) CRs.
Study of contained rewriting in the presence of a schema.
Lots of work on semantic caching [Chen+ 02], heuristics for using materialized views for optimizing XPath [Balmin+ 04], mining views worth materializing, XPath containment, ….
37
Summary & Future Work 1/2
QAV using (maximally) contained rewriting (for information integration).
Without schema: existence, characterization, closure, generation of MCR.
With schema: extract essence using constraints, chase, similar problems as above.
Impact of recursion.
Experiments.
38
Summary & Future Work 2/2
Impact of wildcard, disjunction, order, …
Impact of union, recursion, …
Other integration models (e.g., GLAV).
QAV for XQuery.