Incomplete Answers over Semistructured Data Kanza, Nutt, Sagiv PODS 1999 Slides by Yaron Kanza.


1 Incomplete Answers over Semistructured Data Kanza, Nutt, Sagiv PODS 1999 Slides by Yaron Kanza

2 Queries with Incomplete Answers
Dealing with incomplete data, in increasing level of incompleteness:
– Queries with complete answers
– Queries with AND semantics
– Queries with weak semantics
– Queries with OR semantics

3 Queries and Matchings
The queries are labeled, rooted, directed graphs
–labels are on the edges
Query nodes are variables
Database nodes are objects
Matchings are assignments of database nodes to the query variables according to
–the constraints specified in the query, and
–the semantics of the query

4 Constraints on Exact Matchings
Root constraint: satisfied if the query root is mapped to the database root
Edge constraint: satisfied if a query edge with label l is mapped to a database edge with label l
(Figure: the query root r mapped to the database root 1; a query edge from x to y labeled l mapped to a database edge from 12 to 25 labeled l.)
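The two constraints can be sketched in Python as follows. This is only an illustration under assumed conventions that are not in the slides: a graph is a set of (source, label, target) triples, a matching is a dictionary from query variables to database nodes (or None for null), and the names `edge_satisfied` and `is_exact_matching` are hypothetical.

```python
from typing import Dict, Hashable, Optional, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]          # (source, label, target)
Matching = Dict[Hashable, Optional[Hashable]]  # query variable -> database node or None


def edge_satisfied(edge: Edge, matching: Matching, db_edges: Set[Edge]) -> bool:
    """Edge constraint: both endpoints are mapped and the database has the same-labeled edge."""
    src, label, dst = edge
    s, t = matching.get(src), matching.get(dst)
    return s is not None and t is not None and (s, label, t) in db_edges


def is_exact_matching(q_root, q_edges, db_root, db_edges, matching) -> bool:
    """All variables are non-null, the root constraint holds, and every edge constraint holds."""
    variables = {q_root} | {n for s, _, t in q_edges for n in (s, t)}
    if any(matching.get(v) is None for v in variables):
        return False
    if matching[q_root] != db_root:
        return False
    return all(edge_satisfied(e, matching, db_edges) for e in q_edges)


# Toy data (hypothetical, loosely modeled on the movie example).
db_edges = {(1, "Movie", 11), (11, "Actor", 22)}
q_edges = {("r", "Movie", "x"), ("x", "Actor", "y")}
print(is_exact_matching("r", q_edges, 1, db_edges, {"r": 1, "x": 11, "y": 22}))    # True
print(is_exact_matching("r", q_edges, 1, db_edges, {"r": 1, "x": 11, "y": None}))  # False
```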

5 An Exact Matching
(Figure: the movie database, rooted at node 1, with edges labeled Movie, Actor, Uncredited Actor, Director, Producer, Name, Title, Year and Date of birth, and values such as Star Wars, 1977, Hook, Harrison Ford, Mark Hamill, Dustin Hoffman, George Lucas, Steven Spielberg and 14 May 1944; next to it, a query with variables r, x, y, z, u, v and an assignment of database nodes to these variables.)
All the nodes are mapped to non-null values
The root constraint and all the edge constraints are satisfied

6 (Figure: the movie database and the query.)
Consider the case where node 35 (Date of birth: 14 May 1944) is removed from the database
No exact matching exists! Allow partial matchings

7 Not Every Partial Assignment Is of Interest
(Figure: the movie database and the query, with a partial assignment in which some variables are mapped to NULL.)
This is not interesting, since the query returns data that has no connection to the query

8 The Reachability Constraint on Partial Matchings
A query node v that is mapped to a database object o satisfies the reachability constraint if there is a path from the query root to v such that all edge constraints along this path are satisfied
(Figure: a query with nodes r, v, w, x, y, z and edges labeled l1 to l6, a database, and partial matchings of the query into the database.)
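A minimal sketch of the reachability check, assuming a graph is a set of (source, label, target) triples and a matching is a dictionary with None standing for null. The helper names `reachable_variables` and `satisfies_reachability` are hypothetical, not taken from the slides.

```python
def edge_satisfied(edge, matching, db_edges):
    """Both endpoints are mapped and the database has the same-labeled edge."""
    src, label, dst = edge
    s, t = matching.get(src), matching.get(dst)
    return s is not None and t is not None and (s, label, t) in db_edges


def reachable_variables(q_root, q_edges, matching, db_edges):
    """Query variables reachable from the query root along satisfied edges only."""
    reached = {q_root}
    frontier = [q_root]
    while frontier:
        v = frontier.pop()
        for edge in q_edges:
            src, _, dst = edge
            if src == v and dst not in reached and edge_satisfied(edge, matching, db_edges):
                reached.add(dst)
                frontier.append(dst)
    return reached


def satisfies_reachability(q_root, q_edges, matching, db_edges):
    """Every variable mapped to a database node must be reachable in the above sense."""
    reached = reachable_variables(q_root, q_edges, matching, db_edges)
    return all(v in reached for v, image in matching.items() if image is not None)


# Toy data (hypothetical): a chain r -l1-> x -l2-> y.
db_edges = {(1, "l1", 5), (5, "l2", 8)}
q_edges = {("r", "l1", "x"), ("x", "l2", "y")}
print(satisfies_reachability("r", q_edges, {"r": 1, "x": 5, "y": 8}, db_edges))     # True
print(satisfies_reachability("r", q_edges, {"r": 1, "x": None, "y": 8}, db_edges))  # False
```

In the second call y is mapped although the only path to it passes through the null variable x, which is the kind of disconnected answer the constraint rules out.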

9 "AND" Matchings
A partial matching is an AND matching if
–The root constraint is satisfied
–The reachability constraint is satisfied by every query node that is mapped to a database node
–If a query node is mapped to a database node, all the incoming edge constraints are satisfied
(Figure: a query fragment with variables r, x, y, z and edges labeled Director, Actor and Producer.)
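The three conditions translate directly into a checker. The sketch below assumes the same illustrative encoding as before (edge triples, a dictionary with None for null); the function names and the toy data are hypothetical.

```python
def edge_satisfied(edge, matching, db_edges):
    """Both endpoints are mapped and the database has the same-labeled edge."""
    src, label, dst = edge
    s, t = matching.get(src), matching.get(dst)
    return s is not None and t is not None and (s, label, t) in db_edges


def reachable_variables(q_root, q_edges, matching, db_edges):
    """Query variables reachable from the query root along satisfied edges only."""
    reached, frontier = {q_root}, [q_root]
    while frontier:
        v = frontier.pop()
        for edge in q_edges:
            if edge[0] == v and edge[2] not in reached and edge_satisfied(edge, matching, db_edges):
                reached.add(edge[2])
                frontier.append(edge[2])
    return reached


def is_and_matching(q_root, q_edges, db_root, db_edges, matching):
    if matching.get(q_root) != db_root:                # root constraint
        return False
    reached = reachable_variables(q_root, q_edges, matching, db_edges)
    for v, image in matching.items():
        if image is None:
            continue
        if v not in reached:                           # reachability constraint
            return False
        incoming = [e for e in q_edges if e[2] == v]   # all incoming edges must hold
        if not all(edge_satisfied(e, matching, db_edges) for e in incoming):
            return False
    return True


# Toy data (hypothetical): z has two incoming edges, but the database lacks the Producer edge.
db_edges = {(1, "Movie", 11), (11, "Director", 27)}
q_edges = {("r", "Movie", "x"), ("x", "Director", "z"), ("x", "Producer", "z")}
print(is_and_matching("r", q_edges, 1, db_edges, {"r": 1, "x": 11, "z": 27}))    # False
print(is_and_matching("r", q_edges, 1, db_edges, {"r": 1, "x": 11, "z": None}))  # True
```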

10 An AND Matching
(Figure: the movie database and the query, with an AND matching in which the variable u is mapped to NULL.)

11 (Figure: the movie database and the query.)
Suppose that we remove the edges that are labeled with Uncredited Actor
In an AND matching, node z must be null!

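12 Weak Satisfaction of Edge Constraints
An edge constraint is weakly satisfied if either
–it is satisfied (as defined earlier), or
–one (or more) of its nodes is mapped to a null value
(Figure: a query edge from x to y labeled l, shown with both nodes mapped to the database edge from 12 to 25, with only one of the two nodes mapped, and with both nodes mapped to null.)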

13 Weak Matchings
A partial matching is a weak matching if
–The root constraint is satisfied
–The reachability constraint is satisfied by every query node that is mapped to a database node
–Every edge constraint is weakly satisfied
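A sketch of a weak-matching checker, again assuming edge triples and a dictionary with None for null. The names `weakly_satisfied` and `is_weak_matching` and the toy data are hypothetical.

```python
def edge_satisfied(edge, matching, db_edges):
    """Both endpoints are mapped and the database has the same-labeled edge."""
    src, label, dst = edge
    s, t = matching.get(src), matching.get(dst)
    return s is not None and t is not None and (s, label, t) in db_edges


def weakly_satisfied(edge, matching, db_edges):
    """Satisfied, or at least one endpoint is mapped to null."""
    src, _, dst = edge
    if matching.get(src) is None or matching.get(dst) is None:
        return True
    return edge_satisfied(edge, matching, db_edges)


def reachable_variables(q_root, q_edges, matching, db_edges):
    """Query variables reachable from the query root along satisfied edges only."""
    reached, frontier = {q_root}, [q_root]
    while frontier:
        v = frontier.pop()
        for edge in q_edges:
            if edge[0] == v and edge[2] not in reached and edge_satisfied(edge, matching, db_edges):
                reached.add(edge[2])
                frontier.append(edge[2])
    return reached


def is_weak_matching(q_root, q_edges, db_root, db_edges, matching):
    if matching.get(q_root) != db_root:                              # root constraint
        return False
    reached = reachable_variables(q_root, q_edges, matching, db_edges)
    if any(img is not None and v not in reached for v, img in matching.items()):
        return False                                                 # reachability constraint
    return all(weakly_satisfied(e, matching, db_edges) for e in q_edges)


# Toy data (hypothetical): z is reachable through x; y has no image in the first database.
q_edges = {("r", "a", "x"), ("r", "b", "y"), ("x", "c", "z"), ("y", "d", "z")}
db_edges = {(1, "a", 2), (2, "c", 4)}
print(is_weak_matching("r", q_edges, 1, db_edges, {"r": 1, "x": 2, "y": None, "z": 4}))  # True
db_edges2 = db_edges | {(1, "b", 3)}
print(is_weak_matching("r", q_edges, 1, db_edges2, {"r": 1, "x": 2, "y": 3, "z": 4}))    # False
```

The first matching is weak but not AND, because z is mapped while its incoming edge from the null variable y is not satisfied. The second fails because the edge from y to z has both endpoints mapped but no corresponding database edge.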

14 A Weak Matching
(Figure: the movie database and the query, with a weak matching in which some variables are mapped to NULL; the figure marks the edges that are weakly satisfied.)

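15 (Figure: four options for a query edge from x to y labeled l: both nodes mapped and the edge constraint satisfied; x mapped and y null; both nodes null; x null and y mapped.)
In a weak matching, all four options are permitted
In an AND matching, only the first three options are permitted, because a mapped node y requires all its incoming edge constraints to be satisfied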

16 (Figure: the movie database and the query.)
Consider the case where edges labeled with Producer are removed
In a weak matching, node z must be null!

17 "OR" Matchings
A partial matching is an OR matching if
–The root constraint is satisfied
–The reachability constraint is satisfied by every query node that is mapped to a database node
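Under the same illustrative encoding (edge triples, a dictionary with None for null), an OR-matching checker needs only the root and reachability tests. The function names and the toy data below are hypothetical.

```python
def edge_satisfied(edge, matching, db_edges):
    """Both endpoints are mapped and the database has the same-labeled edge."""
    src, label, dst = edge
    s, t = matching.get(src), matching.get(dst)
    return s is not None and t is not None and (s, label, t) in db_edges


def reachable_variables(q_root, q_edges, matching, db_edges):
    """Query variables reachable from the query root along satisfied edges only."""
    reached, frontier = {q_root}, [q_root]
    while frontier:
        v = frontier.pop()
        for edge in q_edges:
            if edge[0] == v and edge[2] not in reached and edge_satisfied(edge, matching, db_edges):
                reached.add(edge[2])
                frontier.append(edge[2])
    return reached


def is_or_matching(q_root, q_edges, db_root, db_edges, matching):
    if matching.get(q_root) != db_root:       # root constraint
        return False
    reached = reachable_variables(q_root, q_edges, matching, db_edges)
    # reachability constraint for every mapped variable; unsatisfied edges are tolerated
    return all(v in reached for v, img in matching.items() if img is not None)


# Toy data (hypothetical): the edge y -d-> z has no counterpart in the database.
q_edges = {("r", "a", "x"), ("r", "b", "y"), ("x", "c", "z"), ("y", "d", "z")}
db_edges = {(1, "a", 2), (1, "b", 3), (2, "c", 4)}
print(is_or_matching("r", q_edges, 1, db_edges, {"r": 1, "x": 2, "y": 3, "z": 4}))        # True
print(is_or_matching("r", q_edges, 1, db_edges, {"r": 1, "x": None, "y": None, "z": 4}))  # False
```

The first matching is an OR matching even though the edge from y to z is not even weakly satisfied: z is still reachable through x.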

18 An OR Matching
(Figure: the movie database and the query, with an OR matching in which some variables are mapped to NULL; the figure marks an edge which is not weakly satisfied.)

19 Increasing Level of Incompleteness
An exact matching is an AND matching
An AND matching is a weak matching
A weak matching is an OR matching

20 Maximal Matchings
A matching is maximal if no other matching subsumes it, i.e., if there is no other matching that is equal on all mapped variables and has additional mapped variables
A query result consists of maximal matchings only
The maximality of a matching may depend on the semantics considered (i.e., OR, weak, AND)
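Subsumption and the filtering of non-maximal matchings can be sketched as follows, assuming matchings are dictionaries over the same variables with None for null. The names `subsumes` and `maximal_matchings` are hypothetical, and the quadratic filter is just the simplest way to apply the definition.

```python
def subsumes(m1, m2):
    """m1 subsumes m2: they agree wherever m2 is mapped, and m1 maps additional variables."""
    agrees = all(m1.get(v) == img for v, img in m2.items() if img is not None)
    maps_more = any(img is not None and m2.get(v) is None for v, img in m1.items())
    return agrees and maps_more


def maximal_matchings(matchings):
    """Keep only the matchings that no other matching in the list subsumes."""
    return [m for m in matchings if not any(subsumes(other, m) for other in matchings)]


# Toy data (hypothetical).
a = {"r": 1, "x": 11, "y": None}
b = {"r": 1, "x": 11, "y": 22}
c = {"r": 1, "x": 12, "y": None}
print(maximal_matchings([a, b, c]))  # b and c remain; a is subsumed by b
```

The input list is expected to contain only matchings that are valid under one fixed semantics, since maximality is relative to the semantics considered.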

21 (Figure: the movie database and the query, with a partial matching in which some variables are mapped to NULL.)
Is this an AND matching? Is it maximal?

22 (Figure: the movie database and the query, with a partial matching in which some variables are mapped to NULL.)
Is this a weak matching? Is it maximal?

23 (Figure: the movie database and the query, with a partial matching in which some variables are mapped to NULL.)
Is this an OR matching? Is it maximal?

24 (Figure: the movie database and the query, with a partial matching in which some variables are mapped to NULL.)
Is this an AND matching? Weak matching? OR matching? Is it maximal (for each option)?

25 (Figure: a university database with edges labeled University, Course, Lab, Teacher, Instructor, Title and Name, and values A. Cohen, B. Levi, C. Katz, Logic, OS, Compilers and Databases; next to it, a query with variables v, u, w, x and edges labeled University, Course, Lab and Teacher.)
Find all maximal answers under AND semantics, OR semantics and weak semantics

26 Computing Maximal Answers How can we systematically compute all maximal answers? Can we compute all answers in polynomial time? We will see an algorithm to compute all maximal answers of a DAG Query under AND Semantics

27 Intuition
Sort the nodes of the query in a topological order
Start with the set of matchings containing only the matching (root of query/root of database)
Iterate over the nodes v_i according to the order
–extend each matching by all possible images of v_i that yield AND matchings
–if there are no appropriate images, then extend with v_i mapped to null

28 Eval-Dag-Query-AND-Semantics(Q, D)
let v_0 < v_1 < … < v_k be a topological ordering of the nodes of Q
let S_0 = {(v_0/root(D))}
for i = 1 to k do
    S_i = ∅
    for each μ ∈ S_(i-1) do
        E = {u ∈ D | μ ⊕ (v_i/u) is an AND matching}
        if E = ∅
            then S_i = S_i ∪ {μ ⊕ (v_i/null)}
            else S_i = S_i ∪ {μ ⊕ (v_i/u) | u ∈ E}
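A possible Python rendering of this procedure, offered as a sketch rather than the authors' implementation. It assumes graphs are sets of (source, label, target) triples and uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) for the ordering. One simplification is an assumption of this sketch: because nodes are processed in topological order, testing whether the extension is an AND matching is reduced to checking that every incoming edge of v_i is satisfied by the images chosen so far.

```python
from graphlib import TopologicalSorter


def eval_dag_query_and(q_root, q_edges, db_root, db_edges):
    """Return the matchings built level by level for a rooted DAG query under AND semantics."""
    q_nodes = {q_root} | {n for s, _, t in q_edges for n in (s, t)}
    db_nodes = {db_root} | {n for s, _, t in db_edges for n in (s, t)}
    # Map each query node to its predecessors; static_order() lists predecessors first
    # and raises graphlib.CycleError if the query is not a DAG.
    preds = {v: {s for s, _, t in q_edges if t == v} for v in q_nodes}
    order = [v for v in TopologicalSorter(preds).static_order() if v != q_root]

    matchings = [{q_root: db_root}]                      # S_0
    for v in order:
        incoming = [(s, label) for s, label, t in q_edges if t == v]
        extended = []                                    # S_i
        for mu in matchings:
            # Candidate images: every incoming edge must be satisfied, which also
            # makes v reachable through an already mapped predecessor.
            images = [u for u in db_nodes
                      if incoming and all(mu[s] is not None and (mu[s], label, u) in db_edges
                                          for s, label in incoming)]
            if images:
                extended.extend({**mu, v: u} for u in images)
            else:
                extended.append({**mu, v: None})
        matchings = extended
    return matchings


# Toy data (hypothetical): movie 11 has an actor and a director, movie 12 has neither.
db_edges = {(1, "Movie", 11), (1, "Movie", 12), (11, "Actor", 22), (11, "Director", 27)}
q_edges = {("r", "Movie", "x"), ("x", "Actor", "y"), ("x", "Director", "z")}
for answer in eval_dag_query_and("r", q_edges, 1, db_edges):
    print(answer)
```

On this toy input the sketch yields two matchings, one per movie; the one for movie 12 maps y and z to None. The order in which they are printed is not fixed, since the database nodes are held in a set.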

29 Analyzing the Algorithm Why is the algorithm correct? What is the runtime of the algorithm? What are the memory requirements of the algorithm? Can this algorithm easily be adapted for general graph queries (which may contain cycles)?

30 AND Semantics – Cyclic Queries
Determining whether there is an AND matching that maps at least one non-root node to a non-null value is NP-complete
–why is it in NP?
–NP-hardness by reduction from Hamiltonian cycle

31 Hamiltonian Cycle Given a graph G, a Hamiltonian cycle is a simple cycle that traverses each node in the graph exactly once Determining if there is a Hamiltonian cycle is NP-Complete!

32 Can You Find One Here?

33

34 Reduction
We show how, given a solution to the problem of finding matchings under AND semantics, we can solve the Hamiltonian cycle problem
Given a graph G, we
–create a database D and a query Q such that
–G has a Hamiltonian cycle if and only if there is an AND matching that maps a non-root node to a non-null value

35 Creating the Database
Suppose that the graph G has nodes n_1, …, n_k
We create a database with nodes u_0, u_1, …, u_k
–u_0 is the root of the database
–there is an edge labeled node from u_0 to each node u_i
–for each pair of nodes u_i, u_j (i ≥ 1, j ≥ 1, i ≠ j) there is an edge labeled neql from u_i to u_j
–there is an edge labeled succ from u_i to u_j if there is an edge from n_i to n_j in G

36 Example: Create the Database for this Graph

37 Creating the Query
Suppose that the graph G has nodes n_1, …, n_k
We create a query with nodes v_0, v_1, …, v_k
–v_0 is the root of the query
–there is an edge labeled node from v_0 to each node v_i
–for each pair of nodes v_i, v_j (i ≥ 1, j ≥ 1, i ≠ j) there is an edge labeled neql from v_i to v_j
–there is an edge labeled succ from v_(i-1) to v_i (for all i > 1) and an edge labeled succ from v_k to v_1

38 Example: Create the Query for this Graph

39 How Does the Reduction Work?
Mapping the root of the query to the root of the database is an AND matching
–can any additional nodes be mapped?
If there is a Hamiltonian cycle, then this gives rise to a complete mapping of the query to the database
If there is a matching that maps something other than the root to a non-null value, then:
–it must map all the nodes (because of the cycle of succ edges)
–it must map all query nodes to different database nodes (because of the neql edges)
–therefore, the images of the query nodes correspond to a Hamiltonian cycle (because of the succ edges)
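The construction of the database and the query can be sketched in Python. The encoding is an assumption of this sketch: the nodes of G are numbered 1 to k, G is given as a set of directed (i, j) pairs, and graphs are sets of (source, label, target) triples. The function names are hypothetical, and the brute-force search over permutations is exponential; it is there only to check the construction on tiny graphs.

```python
from itertools import permutations


def build_reduction(k, g_edges):
    """Build the database and query edge sets for a directed graph G on nodes 1..k."""
    db_edges, q_edges = set(), set()
    for i in range(1, k + 1):
        db_edges.add(("u0", "node", f"u{i}"))
        q_edges.add(("v0", "node", f"v{i}"))
        for j in range(1, k + 1):
            if i != j:
                db_edges.add((f"u{i}", "neql", f"u{j}"))
                q_edges.add((f"v{i}", "neql", f"v{j}"))
            if (i, j) in g_edges:
                db_edges.add((f"u{i}", "succ", f"u{j}"))
        successor = i % k + 1          # v_k wraps around to v_1
        q_edges.add((f"v{i}", "succ", f"v{successor}"))
    return db_edges, q_edges


def has_complete_matching(k, db_edges, q_edges):
    """Try every injective assignment of v_1..v_k to u_1..u_k (exponential, demo only)."""
    for perm in permutations(range(1, k + 1)):
        matching = {"v0": "u0"}
        matching.update({f"v{i}": f"u{p}" for i, p in enumerate(perm, start=1)})
        if all((matching[s], label, matching[t]) in db_edges for s, label, t in q_edges):
            return True
    return False


# Toy graphs (hypothetical): a directed triangle and a directed path.
triangle = {(1, 2), (2, 3), (3, 1)}
path = {(1, 2), (2, 3)}
print(has_complete_matching(3, *build_reduction(3, triangle)))  # True
print(has_complete_matching(3, *build_reduction(3, path)))      # False
```

For an undirected graph, one option under this encoding is to list each edge in both directions.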

