Incomplete Answers over Semistructured Data Kanza, Nutt, Sagiv PODS 1999 Slides by Yaron Kanza.

Queries with Incomplete Answers
Dealing with incomplete data, in order of increasing level of incompleteness:
– Queries with complete answers
– Queries with AND semantics
– Queries with weak semantics
– Queries with OR semantics

Queries and Matchings
– The queries are labeled rooted directed graphs
  – labels are on the edges
– Query nodes are variables
– Database nodes are objects
– Matchings are assignments of database nodes to the query variables according to
  – the constraints specified in the query, and
  – the semantics of the query

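Constraints on Exact Matchings
– Root constraint: satisfied if the query root is mapped to the database root
– Edge constraint: a query edge from x to y with label l is satisfied if x and y are mapped to database nodes that are connected by an edge with label l
[Figure: the query root r is mapped to the database root 1, and a query edge from x to y labeled l is mapped to a database edge labeled l]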

Example: An Exact Matching
[Figure: a movie database with root 1 and edges labeled Movie, Title, Year, Actor, Uncredited Actor, Director, Producer, Name and Date of birth. Its values include the movies Star Wars (1977) and Hook, the names Harrison Ford, Mark Hamill, Dustin Hoffman, George Lucas and Steven Spielberg, and the date of birth 14 May 1944. The query variables r, x, y, z, u and v are all mapped to database nodes.]
All the nodes are mapped to non-null values, and the root constraint and all the edge constraints are satisfied.
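A minimal Python sketch can make the two constraints and the definition of an exact matching concrete. The encoding is an assumption and does not come from the slides: query and database edges are (source, label, target) triples, a matching is a dict from query variables to database nodes, and None stands for null. The function names and the toy data are hypothetical.

```python
from typing import Any, Dict, Optional, Set, Tuple

# Assumed encoding (not from the slides): edges are (source, label, target)
# triples; a matching maps query variables to database nodes, None = null.
Edge = Tuple[Any, str, Any]
Matching = Dict[Any, Optional[Any]]


def root_ok(m: Matching, q_root: Any, db_root: Any) -> bool:
    """Root constraint: the query root is mapped to the database root."""
    return m.get(q_root) == db_root


def edge_ok(m: Matching, edge: Edge, db_edges: Set[Edge]) -> bool:
    """Edge constraint: query edge x -l-> y is mapped onto a database edge labeled l."""
    x, label, y = edge
    mx, my = m.get(x), m.get(y)
    return mx is not None and my is not None and (mx, label, my) in db_edges


def is_exact_matching(m: Matching, q_root: Any, q_edges: Set[Edge],
                      db_root: Any, db_edges: Set[Edge]) -> bool:
    """Every variable is non-null, and the root and all edge constraints hold."""
    variables = {q_root} | {x for x, _, _ in q_edges} | {y for _, _, y in q_edges}
    return (all(m.get(v) is not None for v in variables)
            and root_ok(m, q_root, db_root)
            and all(edge_ok(m, e, db_edges) for e in q_edges))


# Hypothetical toy data, loosely modeled on the movie example.
db_edges = {("root", "Movie", "m1"), ("m1", "Director", "d1")}
q_edges = {("r", "Movie", "x"), ("x", "Director", "y")}
print(is_exact_matching({"r": "root", "x": "m1", "y": "d1"}, "r", q_edges, "root", db_edges))  # True
print(is_exact_matching({"r": "root", "x": "m1", "y": None}, "r", q_edges, "root", db_edges))  # False
```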

[Figure: the same movie database without node 35, the Date of birth node with value 14 May 1944]
Consider the case where node 35 is removed from the database. No exact matching exists! We therefore allow partial matchings.

Not Every Partial Assignment Is of Interest
[Figure: a partial assignment that maps the query root r to the database root 1 and a variable to node 31, with the remaining variables mapped to null]
This assignment is not interesting: it returns data that has no connection to the query, because node 31 is not linked to the root through any satisfied edge constraint.

The Reachability Constraint on Partial Matchings
A query node v that is mapped to a database object o satisfies the reachability constraint if there is a path from the query root to v such that all the edge constraints along this path are satisfied.
[Figure: a query and a database with edges labeled l1 to l6, illustrating a path from the root to v along which all edge constraints are satisfied]
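Here is a minimal Python sketch of the reachability check. It uses the same assumed encoding (edge triples, dict matchings, None for null), and the names are hypothetical. The check runs a breadth-first search from the query root that follows only query edges whose edge constraints are satisfied. The toy example maps the query cycle x, y onto a database cycle that is not connected to the root. That is exactly the kind of assignment the constraint rules out.

```python
from collections import deque


def reachable_variables(m, q_root, q_edges, db_edges):
    """Query variables reachable from the root along edges whose constraints hold."""
    def satisfied(x, label, y):
        return m.get(x) is not None and m.get(y) is not None and (m[x], label, m[y]) in db_edges

    seen, queue = {q_root}, deque([q_root])  # the root is reached by the empty path
    while queue:
        x = queue.popleft()
        for src, label, y in q_edges:
            if src == x and y not in seen and satisfied(x, label, y):
                seen.add(y)
                queue.append(y)
    return seen


def reachability_ok(m, q_root, q_edges, db_edges):
    """Every variable mapped to a database object satisfies the reachability constraint."""
    reached = reachable_variables(m, q_root, q_edges, db_edges)
    return all(v in reached for v, obj in m.items() if obj is not None)


# Hypothetical toy data: the database cycle o2 -> o3 -> o2 is disconnected from the root.
q_edges = {("r", "a", "x"), ("x", "b", "y"), ("y", "c", "x")}
db_edges = {("root", "a", "o1"), ("o2", "b", "o3"), ("o3", "c", "o2")}
print(reachability_ok({"r": "root", "x": "o2", "y": "o3"}, "r", q_edges, db_edges))  # False
print(reachability_ok({"r": "root", "x": "o1", "y": None}, "r", q_edges, db_edges))   # True
```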

“AND” Matchings
A partial matching is an AND matching if
– the root constraint is satisfied,
– the reachability constraint is satisfied by every query node that is mapped to a database node, and
– if a query node is mapped to a database node, all the incoming edge constraints of that node are satisfied
[Figure: a query with root r and variables x, y and z, with edges labeled Actor, Director and Producer]
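The three conditions can be checked directly. This sketch uses the same hypothetical encoding (edge triples, dict matchings, None for null). The incoming-edge condition is checked one edge at a time: whenever the target y of a query edge is non-null, the edge itself must be satisfied. This in particular forces its source x to be non-null.

```python
from collections import deque


def is_and_matching(m, q_root, q_edges, db_root, db_edges):
    """Root constraint, incoming edges of non-null nodes satisfied, reachability."""
    def satisfied(x, label, y):
        return m.get(x) is not None and m.get(y) is not None and (m[x], label, m[y]) in db_edges

    if m.get(q_root) != db_root:
        return False
    # A node mapped to a database node must have all its incoming edges satisfied.
    if any(m.get(y) is not None and not satisfied(x, label, y) for x, label, y in q_edges):
        return False
    # Reachability: breadth-first search along satisfied edges.
    seen, queue = {q_root}, deque([q_root])
    while queue:
        x = queue.popleft()
        for src, label, y in q_edges:
            if src == x and y not in seen and satisfied(x, label, y):
                seen.add(y)
                queue.append(y)
    return all(v in seen for v, obj in m.items() if obj is not None)


# Hypothetical toy data: a movie with a director but no producer.
q_edges = {("r", "Movie", "x"), ("x", "Director", "y"), ("x", "Producer", "z")}
db_edges = {("root", "Movie", "m1"), ("m1", "Director", "d1")}
print(is_and_matching({"r": "root", "x": "m1", "y": "d1", "z": None}, "r", q_edges, "root", db_edges))  # True
print(is_and_matching({"r": "root", "x": None, "y": "d1", "z": None}, "r", q_edges, "root", db_edges))  # False
```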

Example: An AND Matching
[Figure: an AND matching of the query into the movie database, in which u is mapped to null]

[Figure: the movie database with the edges labeled Uncredited Actor removed]
Suppose that we remove the edges that are labeled with Uncredited Actor. In an AND matching, node z must be null!

Weak Satisfaction of Edge Constraints
An edge constraint is weakly satisfied if either
– it is satisfied (as defined earlier), or
– one (or both) of its nodes is mapped to a null value
[Figure: a query edge from x to y labeled l and the ways it can be weakly satisfied]

Weak Matchings
A partial matching is a weak matching if
– the root constraint is satisfied,
– the reachability constraint is satisfied by every query node that is mapped to a database node, and
– every edge constraint is weakly satisfied
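Below is a sketch of weak satisfaction and weak matchings under the same assumed encoding, with hypothetical names and data. In the first toy case the database has no movie, so x is null, yet p can still be mapped through the Person edge. The edge from the null x into p is only weakly satisfied, so this is a weak matching but not an AND matching. The second case fails because both ends of the Director edge are non-null but the database has no matching Director edge.

```python
from collections import deque


def is_weak_matching(m, q_root, q_edges, db_root, db_edges):
    """Root constraint, reachability of non-null variables, all edges weakly satisfied."""
    def satisfied(x, label, y):
        return m.get(x) is not None and m.get(y) is not None and (m[x], label, m[y]) in db_edges

    def weakly_satisfied(x, label, y):
        # Satisfied, or at least one endpoint is mapped to null.
        return m.get(x) is None or m.get(y) is None or satisfied(x, label, y)

    if m.get(q_root) != db_root:
        return False
    if not all(weakly_satisfied(*e) for e in q_edges):
        return False
    seen, queue = {q_root}, deque([q_root])
    while queue:
        x = queue.popleft()
        for src, label, y in q_edges:
            if src == x and y not in seen and satisfied(x, label, y):
                seen.add(y)
                queue.append(y)
    return all(v in seen for v, obj in m.items() if obj is not None)


q_edges = {("r", "Movie", "x"), ("r", "Person", "p"), ("x", "Director", "p")}
db1 = {("root", "Person", "s")}  # no movie: x must be null, p can still be mapped
print(is_weak_matching({"r": "root", "x": None, "p": "s"}, "r", q_edges, "root", db1))  # True
db2 = {("root", "Movie", "m1"), ("root", "Person", "s")}  # m1 is not directed by s
print(is_weak_matching({"r": "root", "x": "m1", "p": "s"}, "r", q_edges, "root", db2))  # False
```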

Example: A Weak Matching
[Figure: a weak matching of the query into the movie database, in which u is mapped to null; the edges that are only weakly satisfied are highlighted]

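Consider a query edge from x to y with label l. There are four options:
(1) x and y are mapped to the endpoints of a database edge labeled l;
(2) x is mapped to a database node and y is mapped to null;
(3) both x and y are mapped to null;
(4) x is mapped to null and y is mapped to a database node.
In a weak matching, all four options are permitted. In an AND matching, only the first three options are permitted, because a node that is mapped to a database node must have all its incoming edge constraints satisfied.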

[Figure: the movie database with the edges labeled Producer removed]
Consider the case where the edges labeled Producer are removed. In a weak matching, node z must be null!

“OR” Matchings
A partial matching is an OR matching if
– the root constraint is satisfied, and
– the reachability constraint is satisfied by every query node that is mapped to a database node
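An OR matching keeps only the root and reachability conditions. The sketch below uses the same assumed encoding, with hypothetical names and data. The first toy matching is an OR matching even though its Director edge is not weakly satisfied: both ends are non-null and the database has no Director edge. It is therefore an OR matching that is not a weak matching.

```python
from collections import deque


def is_or_matching(m, q_root, q_edges, db_root, db_edges):
    """Root constraint plus reachability of every non-null variable."""
    def satisfied(x, label, y):
        return m.get(x) is not None and m.get(y) is not None and (m[x], label, m[y]) in db_edges

    if m.get(q_root) != db_root:
        return False
    seen, queue = {q_root}, deque([q_root])
    while queue:
        x = queue.popleft()
        for src, label, y in q_edges:
            if src == x and y not in seen and satisfied(x, label, y):
                seen.add(y)
                queue.append(y)
    return all(v in seen for v, obj in m.items() if obj is not None)


q_edges = {("r", "Movie", "x"), ("r", "Person", "p"), ("x", "Director", "p")}
db_edges = {("root", "Movie", "m1"), ("root", "Person", "s")}
# The Director edge is violated, but x and p are both reachable from the root.
print(is_or_matching({"r": "root", "x": "m1", "p": "s"}, "r", q_edges, "root", db_edges))   # True
# m1 is not reachable as a Person of the root, so p cannot be mapped to it.
print(is_or_matching({"r": "root", "x": None, "p": "m1"}, "r", q_edges, "root", db_edges))  # False
```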

Example: An OR Matching
[Figure: an OR matching of the query into the movie database, in which u is mapped to null and one edge is not even weakly satisfied]

Increasing Level of Incompleteness
– An exact matching is an AND matching
– An AND matching is a weak matching
– A weak matching is an OR matching

Maximal Matchings
– A matching is maximal if no other matching subsumes it, i.e., if there is no other matching that agrees with it on all of its mapped variables and maps additional variables
– A query result consists of maximal matchings only
– The maximality of a matching may depend on the semantics considered (OR, weak or AND)
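Subsumption and maximality can be sketched as follows. The names are hypothetical, and matchings are dicts with None for null. The input list should contain matchings of a single semantics only, since maximality depends on the semantics considered.

```python
def subsumed_by(m1, m2):
    """True if m2 agrees with m1 on every variable that m1 maps to a non-null object."""
    return all(m2.get(v) == obj for v, obj in m1.items() if obj is not None)


def maximal(matchings):
    """Keep the matchings that are not strictly subsumed by another one in the list."""
    return [m for m in matchings
            if not any(other != m and subsumed_by(m, other) for other in matchings)]


ms = [
    {"r": "root", "x": "m1", "y": None},  # subsumed by the next matching
    {"r": "root", "x": "m1", "y": "d1"},
    {"r": "root", "x": "m2", "y": None},  # maps x differently, so it is kept
]
print(maximal(ms))  # the second and third matchings
```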

[Figure: a partial matching of the query into the movie database, with u mapped to null]
Is this an AND matching? Is it maximal?

[Figure: a partial matching of the query into the movie database, with u mapped to null]
Is this a weak matching? Is it maximal?

[Figure: a partial matching of the query into the movie database, with u mapped to null]
Is this an OR matching? Is it maximal?

[Figure: another partial matching of the query into the movie database, with u mapped to null]
Is this an AND matching? A weak matching? An OR matching? Is it maximal (for each option)?

[Figure: a university database (nodes 1 to 4) with edges labeled University, Course, Lab, Teacher, Instructor, Title and Name. The teachers are A. Cohen, B. Levi and C. Katz, and the course titles are Logic, OS, Compilers and Databases. The query has variables u, v, w and x.]
Find all maximal answers under AND semantics, OR semantics and weak semantics.

Computing Maximal Answers
– How can we systematically compute all maximal answers?
– Can we compute all answers in polynomial time?
– We will see an algorithm that computes all maximal answers of a DAG query under AND semantics

Intuition
– Sort the nodes of the query in a topological order
– Start with the set containing the single matching that maps the query root to the database root
– Iterate over the nodes v_i according to this order:
  – extend each matching by all possible images of v_i that yield AND matchings
  – if there are no appropriate images, extend the matching with v_i mapped to null

Eval-Dag-Query-AND-Semantics(Q, D)
  let v_0 < v_1 < … < v_k be a topological ordering of the nodes of Q
  let S_0 = {(v_0 / root(D))}
  for i = 1 to k do
    S_i = ∅
    for each μ ∈ S_{i-1} do
      E = { u ∈ D | μ ⊕ (v_i / u) is an AND matching }
      if E = ∅ then S_i = S_i ∪ { μ ⊕ (v_i / null) }
      else S_i = S_i ∪ { μ ⊕ (v_i / u) | u ∈ E }
  return S_k
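The following is a runnable Python rendering of the pseudocode, offered as a sketch under the same assumed encoding. It uses graphlib.TopologicalSorter from the Python standard library (3.9+) for the topological order, and that class raises graphlib.CycleError on cyclic queries. The test "μ ⊕ (v_i / u) is an AND matching" is reduced to checking the incoming edges of v_i. In a topological order, all predecessors of v_i are already assigned and none of its successors is. A satisfied incoming edge from a non-null predecessor also gives reachability. The function name and the toy data are hypothetical.

```python
import graphlib


def eval_dag_query_and(q_root, q_edges, db_root, db_edges):
    """Sketch of Eval-Dag-Query-AND-Semantics: all maximal AND matchings of a DAG query."""
    # Predecessor map of the query graph, as expected by graphlib.TopologicalSorter.
    preds = {q_root: set()}
    for x, _, y in q_edges:
        preds.setdefault(x, set())
        preds.setdefault(y, set()).add(x)
    if [v for v, p in preds.items() if not p] != [q_root]:
        raise ValueError("query must be rooted: only the root may lack incoming edges")
    order = list(graphlib.TopologicalSorter(preds).static_order())  # CycleError on cycles
    db_nodes = sorted({db_root} | {s for s, _, _ in db_edges} | {t for _, _, t in db_edges}, key=str)
    incoming = {v: [(x, label) for x, label, y in q_edges if y == v] for v in order}

    matchings = [{q_root: db_root}]  # S_0
    for v in order[1:]:  # order[0] is the root, the only node without predecessors
        extended = []  # S_i
        for m in matchings:
            # E: images u of v for which every incoming edge x -label-> v is satisfied.
            images = [u for u in db_nodes
                      if all(m[x] is not None and (m[x], label, u) in db_edges
                             for x, label in incoming[v])]
            if images:
                extended.extend({**m, v: u} for u in images)
            else:
                extended.append({**m, v: None})
        matchings = extended
    return matchings


q_edges = {("r", "Movie", "x"), ("x", "Director", "y"), ("x", "Producer", "z")}
db_edges = {("root", "Movie", "m1"), ("m1", "Director", "d1"),
            ("root", "Movie", "m2"), ("m2", "Producer", "p2")}
for m in eval_dag_query_and("r", q_edges, "root", db_edges):
    print(m)
```

On this toy data the result contains two matchings. One maps x to m1 and y to d1, with z null. The other maps x to m2 and z to p2, with y null. In general, the number of matchings kept in S_i can grow exponentially with the size of the query.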

Analyzing the Algorithm
– Why is the algorithm correct?
– What is the runtime of the algorithm?
– What are the memory requirements of the algorithm?
– Can this algorithm easily be adapted for general graph queries (which may contain cycles)?

AND Semantics – Cyclic Queries
Determining whether there is an AND matching that maps at least one non-root node to a non-null value is NP-complete
– why is it in NP?
– NP-hardness by a reduction from the Hamiltonian cycle problem

Hamiltonian Cycle
Given a graph G, a Hamiltonian cycle is a simple cycle that traverses each node in the graph exactly once. Determining whether there is a Hamiltonian cycle is NP-complete!
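Here is a brute-force check for directed Hamiltonian cycles. It is only a sketch for tiny graphs, since it tries every ordering of the nodes. An undirected graph can be passed by listing each edge in both directions. The function name and the edge-pair encoding are assumptions.

```python
from itertools import permutations


def has_hamiltonian_cycle(nodes, edges):
    """True if the directed graph (nodes, edges) has a Hamiltonian cycle; brute force."""
    nodes = list(nodes)
    if not nodes:
        return False
    edge_set = set(edges)
    first, rest = nodes[0], nodes[1:]
    # Fixing the first node avoids trying every rotation of the same cycle.
    for perm in permutations(rest):
        cycle = [first, *perm]
        if all((cycle[i], cycle[(i + 1) % len(cycle)]) in edge_set for i in range(len(cycle))):
            return True
    return False


triangle = [("a", "b"), ("b", "c"), ("c", "a")]
print(has_hamiltonian_cycle(["a", "b", "c"], triangle))                # True
print(has_hamiltonian_cycle(["a", "b", "c"], [("a", "b"), ("b", "c")]))  # False
```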

Can You Find One Here?

Reduction
We show how, given a procedure that decides the AND-matching problem above, we can solve the Hamiltonian cycle problem. Given a graph G, we create, in polynomial time, a database D and a query Q such that G has a Hamiltonian cycle if and only if there is an AND matching that maps a non-root node to a non-null value.

Creating the Database
Suppose that the graph G has nodes n_1, …, n_k. We create a database with nodes u_0, u_1, …, u_k:
– u_0 is the root of the database
– there is an edge labeled node from u_0 to each node u_i (i ≥ 1)
– for each pair of nodes u_i, u_j (i ≥ 1, j ≥ 1, i ≠ j) there is an edge labeled neql from u_i to u_j
– there is an edge labeled succ from u_i to u_j if there is an edge from n_i to n_j in G
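Here is a sketch of the database construction in Python. It relies on two assumptions that are not on the slides. First, G is given as a list of nodes and a list of directed (n_i, n_j) pairs. Second, the database node u_i is represented by the integer i, with the root u_0 = 0. The construction adds k node edges, k(k-1) neql edges and one succ edge per edge of G, so it is polynomial in the size of G.

```python
def build_database(graph_nodes, graph_edges):
    """Database D of the reduction; u_i is the integer i and the root u_0 is 0."""
    index = {n: i for i, n in enumerate(graph_nodes, start=1)}  # n_i -> u_i
    k = len(graph_nodes)
    edges = set()
    for i in range(1, k + 1):
        edges.add((0, "node", i))  # u_0 -node-> u_i
        for j in range(1, k + 1):
            if i != j:
                edges.add((i, "neql", j))  # u_i -neql-> u_j for i != j
    for a, b in graph_edges:
        edges.add((index[a], "succ", index[b]))  # u_i -succ-> u_j iff n_i -> n_j in G
    return 0, edges


root, edges = build_database(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
print(len(edges))  # 12 = 3 node + 6 neql + 3 succ edges
```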

Example: Create the Database for this Graph

Creating the Query
Suppose that the graph G has nodes n_1, …, n_k. We create a query with nodes v_0, v_1, …, v_k:
– v_0 is the root of the query
– there is an edge labeled node from v_0 to each node v_i (i ≥ 1)
– for each pair of nodes v_i, v_j (i ≥ 1, j ≥ 1, i ≠ j) there is an edge labeled neql from v_i to v_j
– there is an edge labeled succ from v_{i-1} to v_i (for all i > 1) and an edge labeled succ from v_k to v_1
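The query can be built in the same way. In this sketch the variable names "v0", …, "vk" are hypothetical, and the code assumes k ≥ 1. As in the database, neql edges are created only among v_1, …, v_k.

```python
def build_query(k):
    """Query Q of the reduction for a graph with k >= 1 nodes; variables are 'v0'..'vk'."""
    if k < 1:
        raise ValueError("the graph must have at least one node")
    v = [f"v{i}" for i in range(k + 1)]
    edges = set()
    for i in range(1, k + 1):
        edges.add((v[0], "node", v[i]))  # v_0 -node-> v_i
        for j in range(1, k + 1):
            if i != j:
                edges.add((v[i], "neql", v[j]))  # v_i -neql-> v_j for i != j
    for i in range(2, k + 1):
        edges.add((v[i - 1], "succ", v[i]))  # v_{i-1} -succ-> v_i
    edges.add((v[k], "succ", v[1]))  # close the cycle: v_k -succ-> v_1
    return v[0], edges


root, edges = build_query(3)
print(sorted(e for e in edges if e[1] == "succ"))
# [('v1', 'succ', 'v2'), ('v2', 'succ', 'v3'), ('v3', 'succ', 'v1')]
```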

Example: Create the Query for this Graph

How Does the Reduction Work?
– Mapping the root of the query to the root of the database is an AND matching
  – can any additional nodes be mapped?
– If there is a Hamiltonian cycle, then it gives rise to a complete mapping of the query to the database (map v_i to the database node of the i-th node on the cycle)
– If there is an AND matching that maps some node other than the root to a non-null value, then:
  – it must map all the nodes (because of the cycle of succ edges)
  – it must map all query nodes to different database nodes (because of the neql edges)
  – therefore, the images of v_1, …, v_k correspond to a Hamiltonian cycle (because of the succ edges)
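To see the argument on small inputs, the sketch below rebuilds D and Q using the same assumed encoding. It then searches by brute force for an AND matching that maps some v_i to a non-null value. The search is exponential and is meant only as an illustration on toy graphs. The expected answers follow from whether each graph has a Hamiltonian cycle: True for the directed triangle, False for the path a, b, c, and True for the 4-node graph with cycle a, c, b, d. All function names are hypothetical.

```python
from itertools import product


def reduction_instance(graph_nodes, graph_edges):
    """Query edges and database edges built from G as on the slides (encoding assumed)."""
    k = len(graph_nodes)
    idx = {n: i for i, n in enumerate(graph_nodes, start=1)}  # n_i -> database node i
    rng = range(1, k + 1)
    db = {(0, "node", i) for i in rng}  # the database root u_0 is the integer 0
    db |= {(i, "neql", j) for i in rng for j in rng if i != j}
    db |= {(idx[a], "succ", idx[b]) for a, b in graph_edges}
    q = {("v0", "node", f"v{i}") for i in rng}
    q |= {(f"v{i}", "neql", f"v{j}") for i in rng for j in rng if i != j}
    q |= {(f"v{i - 1}", "succ", f"v{i}") for i in range(2, k + 1)}
    q.add((f"v{k}", "succ", "v1"))
    return q, db


def is_and_matching(m, q_edges, db_edges):
    """AND semantics: root constraint, incoming edges of non-null nodes, reachability."""
    def sat(x, label, y):
        return m[x] is not None and m[y] is not None and (m[x], label, m[y]) in db_edges

    if m["v0"] != 0:
        return False
    if any(m[y] is not None and not sat(x, label, y) for x, label, y in q_edges):
        return False
    seen, stack = {"v0"}, ["v0"]
    while stack:
        x = stack.pop()
        for src, label, y in q_edges:
            if src == x and y not in seen and sat(x, label, y):
                seen.add(y)
                stack.append(y)
    return all(v in seen for v, obj in m.items() if obj is not None)


def nontrivial_and_matching_exists(graph_nodes, graph_edges):
    """Try all (k+1)^k assignments of v_1..v_k; exponential, for tiny graphs only."""
    q_edges, db_edges = reduction_instance(graph_nodes, graph_edges)
    k = len(graph_nodes)
    variables = [f"v{i}" for i in range(1, k + 1)]
    for images in product([None, *range(1, k + 1)], repeat=k):
        m = {"v0": 0, **dict(zip(variables, images))}
        if any(o is not None for o in images) and is_and_matching(m, q_edges, db_edges):
            return True
    return False


triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
path = (["a", "b", "c"], [("a", "b"), ("b", "c")])
square = (["a", "b", "c", "d"], [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")])
for name, (nodes, edges) in [("triangle", triangle), ("path", path), ("square", square)]:
    print(name, nontrivial_and_matching_exists(nodes, edges))
```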