University of Washington Database Group The Complexity of Causality and Responsibility for Query Answers and non-Answers Alexandra Meliou, Wolfgang Gatterbauer,

Slides:



Advertisements
Similar presentations
TWO STEP EQUATIONS 1. SOLVE FOR X 2. DO THE ADDITION STEP FIRST
Advertisements

한양대학교 정보보호 및 알고리즘 연구실 이재준 담당교수님 : 박희진 교수님
Chapter 13: Query Processing
By D. Fisher Geometric Transformations. Reflection, Rotation, or Translation 1.
Answering Approximate Queries over Autonomous Web Databases Xiangfu Meng, Z. M. Ma, and Li Yan College of Information Science and Engineering, Northeastern.
Source of slides: Introduction to Automata Theory, Languages and Computation.
and 6.855J Cycle Canceling Algorithm. 2 A minimum cost flow problem , $4 20, $1 20, $2 25, $2 25, $5 20, $6 30, $
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
Coordinate Plane Practice The following presentation provides practice in two skillsThe following presentation provides practice in two skills –Graphing.
0 - 0.
Inequalities and their Graphs Objective: To write and graph simple inequalities with one variable.
ALGEBRAIC EXPRESSIONS
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
MULTIPLYING MONOMIALS TIMES POLYNOMIALS (DISTRIBUTIVE PROPERTY)
ADDING INTEGERS 1. POS. + POS. = POS. 2. NEG. + NEG. = NEG. 3. POS. + NEG. OR NEG. + POS. SUBTRACT TAKE SIGN OF BIGGER ABSOLUTE VALUE.
MULTIPLICATION EQUATIONS 1. SOLVE FOR X 3. WHAT EVER YOU DO TO ONE SIDE YOU HAVE TO DO TO THE OTHER 2. DIVIDE BY THE NUMBER IN FRONT OF THE VARIABLE.
SUBTRACTING INTEGERS 1. CHANGE THE SUBTRACTION SIGN TO ADDITION
MULT. INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
Addition Facts
Containment of Conjunctive Queries on Annotated Relations TJ Green University of Pennsylvania Symposium on Database Provenance University of Edinburgh.
Dr. Alexandra I. Cristea CS 319: Theory of Databases.
Reductions Complexity ©D.Moshkovitz.
Notes 15 ECE Microwave Engineering
Dimitri DeFigueiredo Earl Barr S. (Felix) Wu Adobe Systems Inc. UC Davis UC Davis International Conference on Privacy, Security, Risk and Trust
ABC Technology Project
3 Logic The Study of What’s True or False or Somewhere in Between.
O X Click on Number next to person for a question.
© S Haughton more than 3?
演 算 法 實 驗 室演 算 法 實 驗 室 On the Minimum Node and Edge Searching Spanning Tree Problems Sheng-Lung Peng Department of Computer Science and Information Engineering.
A D ICHOTOMY ON T HE C OMPLEXITY OF C ONSISTENT Q UERY A NSWERING FOR A TOMS W ITH S IMPLE K EYS Paris Koutris Dan Suciu University of Washington.
Although, but, however All of these words join clauses in sentences, but they are different parts of speech. This presentation explains the impact of the.
1 Directed Depth First Search Adjacency Lists A: F G B: A H C: A D D: C F E: C D G F: E: G: : H: B: I: H: F A B C G D E H I.
1 Evaluations in information retrieval. 2 Evaluations in information retrieval: summary The following gives an overview of approaches that are applied.
Twenty Questions Subject: Twenty Questions
Graphing Ax + By = C Topic
Absolute-Value Equations and Inequalities
Past Tense Probe. Past Tense Probe Past Tense Probe – Practice 1.
Faster Query Answering in Probabilistic Databases using Read-Once Functions Sudeepa Roy Joint work with Vittorio Perduca Val Tannen University of Pennsylvania.
1 P, NP, and NP-Complete Dr. Ying Lu RAIK 283 Data Structures & Algorithms.
This, that, these, those Number your paper from 1-10.
Scalable and Dynamic Quorum Systems Moni Naor & Udi Wieder The Weizmann Institute of Science.
Problems and Their Classes
Bi-intervals for backtracking on temporal constraint networks Jean-François Baget and Sébastien Laborie.
Chapter 5 Test Review Sections 5-1 through 5-4.
Addition 1’s to 20.
25 seconds left…...
Test B, 100 Subtraction Facts
6.4 Logarithmic Functions
11 = This is the fact family. You say: 8+3=11 and 3+8=11
2 4 Theorem:Proof: What shall we do for an undirected graph?
Week 1.
Vanderbilt Business Objects Users Group 1 Linking Data from Multiple Sources.
We will resume in: 25 Minutes.
O X Click on Number next to person for a question.
Chapter 11 Limitations of Algorithm Power Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
Bounded Conjunctive Queries Yang Cao 1,2, Wenfei Fan 1,2, Tianyu Wo 2, Wenyuan Yu 3 1 University of Edinburgh, 2 Beihang University, 3 Facebook Inc.
Equivalence Relations
Copyright © Cengage Learning. All rights reserved.
1 Zi Yang, Wei Li, Jie Tang, and Juanzi Li Knowledge Engineering Group Department of Computer Science and Technology Tsinghua University, China {yangzi,
Finding Skyline Nodes in Large Networks. Evaluation Metrics:  Distance from the query node. (John)  Coverage of the Query Topics. (Big Data, Cloud Computing,
2005conjunctive-ii1 Query languages II: equivalence & containment (Motivation: rewriting queries using views)  conjunctive queries – CQ’s  Extensions.
CPSC 504: Data Management Discussion on Chandra&Merlin 1977 Laks V.S. Lakshmanan Dept. of CS UBC.
Queries with Difference on Probabilistic Databases Sanjeev Khanna Sudeepa Roy Val Tannen University of Pennsylvania 1.
Efficient Query Evaluation on Probabilistic Databases
1 Computing Full Disjunctions Yaron Kanza Yehoshua Sagiv The Selim and Rachel Benin School of Engineering and Computer Science The Hebrew University of.
A Dichotomy in the Complexity of Deletion Propagation with Functional Dependencies 2012 ACM SIGMOD/PODS Conference Scottsdale, Arizona, USA PODS 2012 Benny.
Approximate Lineage for Probabilistic Databases
Queries with Difference on Probabilistic Databases
Presentation transcript:

University of Washington Database Group The Complexity of Causality and Responsibility for Query Answers and non-Answers Alexandra Meliou, Wolfgang Gatterbauer, Katherine Moore, and Dan Suciu

Motivating Example: Explanations ? QueryIMDB Database Schema Relevant lineage: 137 tuples !! “What genres does Tim Burton direct?”

Example cont. (Musicals) Ranking Provenance important tuples unimportant tuple Goal: Rank tuples in order of importance

Solution: Causality  The fundamental question of causality:  “What is the cause of an effect?”  Causality theory has long been studied in AI and philosophy.  [Lewis73, EiterLucasiewicz02, HalpernPearl05, Menzies08]  Offers a metric (responsibility) for measuring the contribution of a variable to an outcome ranking [ChocklerHalpern04]

Contributions  We suggest responsibility as an effective measure for ranking provenance.  Explanations  Error tracing  We define causality and responsibility in a database context.  Complete complexity analysis for computing causality and responsibility for the case of conjunctive queries without self- joins  Interesting dichotomy result.  Non-trivial algorithm for computing responsibility in the PTIME cases.

Endogenous/exogenous tuples Partition the data into 2 groups:  Exogenous tuples (denoted by )  tuples that we consider correct/verified/trusted. They are not candidate causes  E.g. the Genre, and Movie_Director tables  Endogenous tuples (denoted by )  Untrusted tuples, or simply of interest to the user. They are potential causes  E.g. the Director and Movie tables

Counterfactuals  A variable is a counterfactual cause if a change in its value, changes the value of the result  E.g.  Limitations: disjunctive causes  E.g. A and B are both counterfactual causes of C

Contingencies  Generalize counterfactual causes  A contingency is a hypothetical setting of the endogenous variables that makes a tuple counterfactual A is a cause under the contingency B=0

Responsibility (intuition)  Measures the degree of causality, the contribution of a tuple  A larger contingency, means a tuple has smaller degree of causality  Counterfactual causes have the most contribution (empty contingency set)

Causality for Conjunctive Queries Definition: Causality (contingency) Definition: Responsibility Intuition: If the removal of t removes the answer, then t is counterfactual If there is a set of tuples whose removal makes t counterfactual, t is a cause Intuition: The more tuples that need to be removed, the less important t is (an answer to q)(endogenous tuple)(database) (endogenous tuples)

Example Query: Database: Lineage expression: (Datalog notation) Responsibility: Assume all endogenous NOTE: If is exogenous, is not a cause.

Complexity Results (Data Complexity) dichotomy answersnon-answers

Responsibility: PTIME Queries  Assume conjunctive queries with no self joins  A simple case: The lineage of q will be of the form: What is the responsibility of PTIME

Responsibility: PTIME Queries  More interesting: easy ✔ Intuition: a cut in the graph interrupts the s-t flow. The addition of t re-instantiates it. t becomes counterfactual * * (R tuples)(S tuples)

Responsibility: Hard Queries endogenous If unspecified, it could be either Theorem: The following queries are NP-hard:

Query Dual Hypergraph Query hypergraph Query dual hypergraph Definition: Linear Queries There exists an ordering of the nodes of the dual hypergraph, such that every hyperedge is a consecutive subsequence. Theorem: Computing responsibility for all linear queries is in PTIME. None of these are linear

Weakenings R is exogenous, and therefore its tuples cannot be part of the contingency set Expand R with the domain of z. Responsibility of T tuples is not affected! Dissociation PTIME NP-hard

Responsibility Dichotomy Dichotomy Theorem: (data complexity) If q is weakly linear, then computing responsibility for q is in PTIME If q is not weakly linear, then it is NP- hard Definition: Weakly Linear Queries A query is weakly linear, if there exists a set of weakenings that leads to a linear query

Conclusions  Defined causality and responsibility for conjunctive queries  Complete complexity analysis for CQ without self-joins  Interesting dichotomy result  Non-trivial algorithm for PTIME cases  Open problem:  Self-joins