University of Washington Database Group
Reverse Data Management … and the case for Reverse What-If queries
Alexandra Meliou, Wolfgang Gatterbauer, Dan Suciu
Forward and Backward Paradigm
Source data -> Target data: forward transformations
e.g. query processing, data integration, data mining, clustering, indexing
Target data -> Source data: backward transformations, i.e. Reverse Data Management (RDM)
e.g. data cleaning, provenance, causality, data generation, view updates
The Problem Space of RDM
Target Data
- explicit specification: a specific data instance, or diffs between versions, e.g. before and after a view update
- implicit specification: described indirectly, through constraints and statistics, e.g. declarative data generation
Source Data
- reference source: source data needs to be modified in order to achieve the desired effect in the output, e.g. view updates
- no source reference: no source data is provided as a reference; it needs to be computed from scratch, e.g. inverse schema mappings
Problems in this space
- View updates: modify the source data to achieve the desired effect, while minimizing side-effects
- Provenance, Causality: trace the source tuples that correspond to the target tuples of interest
- Constraint-based repair: repair a data instance in order to satisfy a constraint
- Inversion mappings
- Data Generation
Introducing Reverse What-If Queries
Reverse What-If, or How-To, queries join the problem space of RDM (target data: explicit or implicit specification; source data: reference source or no source reference), next to:
- Inversion mappings
- View updates
- Provenance, Causality
- Data Generation
- Constraint-based repair
Hypothetical (What-If) Queries
Example from [Balmin et al. VLDB 2000]:
“An analyst of a brokerage company wants to know what would be the effect on the return of customers’ portfolios if during the last 3 years they had suggested Intel stocks instead of Motorola”
How would the target data change, given a change in the source?
Forward:
1. Change something in the source (hypothesis)
2. Observe the effect in the target
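The forward direction of a what-if query can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `holdings` rows, the `growth` factors, the ticker symbols, and the helper names `portfolio_returns` and `what_if` do not come from [Balmin et al. VLDB 2000] or from the slides. It only shows the pattern: apply a hypothesis to the source, rerun the same forward transformation, and compare the targets.

```python
# Hypothetical toy source data: (customer, stock, amount invested).
holdings = [
    ("alice", "MOT", 1000.0),
    ("bob", "MOT", 500.0),
    ("bob", "IBM", 500.0),
]
# Hypothetical 3-year growth per stock (made-up numbers, not market data).
growth = {"MOT": -0.25, "INTC": 0.5, "IBM": 0.0}


def portfolio_returns(rows, growth):
    """Forward transformation: source holdings -> return per customer."""
    invested, final = {}, {}
    for customer, stock, amount in rows:
        invested[customer] = invested.get(customer, 0.0) + amount
        final[customer] = final.get(customer, 0.0) + amount * (1 + growth[stock])
    return {c: final[c] / invested[c] - 1 for c in invested}


def what_if(rows, old, new):
    """Hypothesis: every position in `old` had been placed in `new` instead."""
    return [(c, new if s == old else s, a) for c, s, a in rows]


actual = portfolio_returns(holdings, growth)
hypothetical = portfolio_returns(what_if(holdings, "MOT", "INTC"), growth)
```

Comparing `actual` with `hypothetical` is the "observe the effect in the target" step; the source change itself is fixed in advance by the analyst.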
Reverse What-If, or How-To queries
Modified example:
“An analyst wants to figure out how to achieve a 10% return in customer portfolios, with the least number of trades”
What is the best hypothetical scenario that achieves the desired outcome?
Reverse:
1. Declare a desired effect in the target
2. Find changes to the source that achieve the desired effect
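The reverse direction can be sketched the same way. The sketch below is a brute-force illustration, not the evaluation strategy of the talk: it declares a target return and searches for the smallest set of trades that reaches it. The toy `positions`, the `growth` factors, the helper name `how_to`, and the simplification to a single aggregate return over all customers are assumptions.

```python
from itertools import combinations, product

# Hypothetical toy source data: (customer, stock, amount invested).
positions = [("alice", "MOT", 1000.0), ("bob", "MOT", 500.0), ("bob", "IBM", 500.0)]
# Hypothetical 3-year growth per stock (made-up numbers).
growth = {"MOT": -0.25, "INTC": 0.5, "IBM": 0.0}


def total_return(rows):
    """Forward transformation: aggregate return over all positions."""
    invested = sum(amount for _, _, amount in rows)
    final = sum(amount * (1 + growth[stock]) for _, stock, amount in rows)
    return final / invested - 1


def how_to(rows, target):
    """Smallest list of trades (row index, new stock) reaching `target`, else None."""
    for k in range(len(rows) + 1):                    # fewest trades first
        for idxs in combinations(range(len(rows)), k):
            for stocks in product(growth, repeat=k):
                if any(rows[i][1] == s for i, s in zip(idxs, stocks)):
                    continue                          # same stock: not a trade
                changed = list(rows)
                for i, s in zip(idxs, stocks):
                    customer, _, amount = rows[i]
                    changed[i] = (customer, s, amount)
                if total_return(changed) >= target:
                    return list(zip(idxs, stocks))
    return None


plan = how_to(positions, 0.10)
```

The exhaustive search grows exponentially with the number of positions, which is one way to see why the talk lists efficient evaluation among the open challenges.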
Example
Company reorganization: a company going through financial strain wants to reduce operational costs by 10%.
- Variables: lay-offs, salary decreases, or department and project merging
- Constraints, as specified by the company’s requirements:
  - any salary decreases should be uniform across employees of the same department
  - every project should have at least a certain number of employee hours devoted to it
- Optimization objective: the solution should be achieved with the minimum number of employee reassignments
Declarative Problem Specification

Variable definitions:
    CREATE REPLACEMENT LineItem_N AS
        (SELECT ok, pk, sk, VAR(quant) AS quant'
         FROM LineItem)

Problem constraints:
    CREATE CONSTRAINT Constr1 AS
        NOT EXISTS
        (SELECT ok, sum(quant') AS c
         FROM LineItem_N
         GROUP BY ok
         HAVING c > 100)

Optimization criterion:
    CREATE OBJECTIVE Obj1 AS
        SELECT sum(*)
        FROM (SELECT quant - quant'
              FROM LineItem AS L1, LineItem_N AS L2
              WHERE L1.ok = L2.ok AND L1.pk = L2.pk AND L1.sk = L2.sk)

Problem statement query:
    HOW TO minimize(Obj1) SUBJECT TO Constr1
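One way to read the declarations above is as two plain functions over a candidate replacement relation: a feasibility check for Constr1 and a cost for Obj1. The Python sketch below spells out that reading on a made-up three-row LineItem table; the rows and the function names `constr1` and `obj1` are assumptions, and the code checks a given candidate rather than finding one.

```python
from collections import defaultdict

# Hypothetical LineItem rows: (ok, pk, sk, quant).
line_item = [(1, 10, 100, 70), (1, 11, 100, 50), (2, 10, 101, 30)]


def constr1(line_item_n):
    """Constr1: no order key `ok` has a total quant' above 100."""
    totals = defaultdict(int)
    for ok, _pk, _sk, quant_n in line_item_n:
        totals[ok] += quant_n
    return all(c <= 100 for c in totals.values())


def obj1(line_item, line_item_n):
    """Obj1: sum of quant - quant' over rows joined on (ok, pk, sk)."""
    new = {(ok, pk, sk): quant_n for ok, pk, sk, quant_n in line_item_n}
    return sum(quant - new[(ok, pk, sk)] for ok, pk, sk, quant in line_item)


# A candidate LineItem_N: the first row's quantity is lowered from 70 to 50.
candidate = [(1, 10, 100, 50), (1, 11, 100, 50), (2, 10, 101, 30)]
feasible_before = constr1(line_item)   # order 1 totals 120, so this is False
feasible_after = constr1(candidate)    # order 1 now totals 100
cost = obj1(line_item, candidate)      # total quantity removed
```

The HOW TO statement then asks for the candidate with the smallest `obj1` among those that pass `constr1`.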
System Architecture
[Architecture diagram] A How-To query enters the How-To Engine. The How-To parser extracts variables, constraints, and objectives and passes them to How-To evaluation, which works against the DB and returns the How-To answer.
User Input:
- Support variable, constraint and objective specifications
- Maintain declarativity
Evaluation requirements:
- Efficiency!
How-To Evaluation
[Pipeline diagram] User Input and DB -> LP/IP transformation (LP reduction) -> LP/IP Solver -> Map LP/IP solution to data -> How-To answer
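The three evaluation stages can be sketched end to end on the LineItem example. This is a toy illustration under several assumptions: the rows are made up, the bounds 0 <= quant' <= quant are added here and are not part of the slide's constraint, the function names are hypothetical, and an exhaustive search stands in for a real LP/IP solver.

```python
from itertools import product

# Hypothetical LineItem rows: (ok, pk, sk, quant).
LINE_ITEM = [(1, 10, 100, 70), (1, 11, 100, 50), (2, 10, 101, 30)]


def to_ip(rows, limit=100):
    """Stage 1: one integer variable per row, one capacity constraint per order."""
    bounds = [(0, r[3]) for r in rows]       # assumed: 0 <= quant' <= quant
    groups = {}
    for i, r in enumerate(rows):
        groups.setdefault(r[0], []).append(i)  # sum of quant' per ok <= limit
    constant = sum(r[3] for r in rows)       # objective = constant - sum(x)
    return bounds, list(groups.values()), limit, constant


def solve_ip(bounds, groups, limit, constant):
    """Stage 2: exhaustive search standing in for a real LP/IP solver."""
    best, best_cost = None, None
    for x in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        if any(sum(x[i] for i in g) > limit for g in groups):
            continue                         # violates a per-order constraint
        cost = constant - sum(x)
        if best_cost is None or cost < best_cost:
            best, best_cost = x, cost
    return best, best_cost


def map_back(rows, x):
    """Stage 3: write the solver's values back as the LineItem_N relation."""
    return [(ok, pk, sk, q_new) for (ok, pk, sk, _q), q_new in zip(rows, x)]


solution, cost = solve_ip(*to_ip(LINE_ITEM))
line_item_n = map_back(LINE_ITEM, solution)
```

On this toy input the minimum cost is 20, the amount by which order 1 exceeds the limit. Several assignments reach that cost, so which one is returned depends on the search order; a production system would hand stage 2 to a dedicated solver.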
Conclusions
Reverse Data Management
- Encompasses many important database problems
- Harder in general: the inverse of a function is not always a function
How-To queries (reverse what-if)
- Implement optimization problems within a DBMS
- Plenty of challenges:
  - Declarative input specification
  - Efficient evaluation
  - Optimization (combination of Integer Programming and DB techniques)
  - Under-specified and over-specified problem handling
  - Solution “stability” and “sensitivity”