Query-Based Data Pricing. Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu. University of Washington. PODS 2012.



Motivation

Data is increasingly sold and bought on the web.

Websites that sell data:
– AggData
– Xignite (financial data)
– Gnip (social media)

Data marketplace services:
– Windows Azure Marketplace (100+ datasets) [datamarket.azure.com]
– Infochimps (15,000 datasets)

Query-based pricing customized for buyers

Current Pricing (1)

A fixed price for the whole dataset or for a specific set of views.

Example: CustomLists
– USA Business Database for $399
– addresses for $299
– Businesses in WA for $199

Limitations:
– Restaurants in WA?
– Businesses in cities with population > 100,000?

Current Pricing (2)

API subscriptions (Azure Marketplace, Infochimps):
– Allow queries over the data
– Pay by number of transactions (page of results)

Issues with Pricing

Buyers today need to buy a superset of the data they are interested in.
Sellers cannot easily anticipate all the queries that buyers might ask.
Solution: we need a more flexible pricing scheme, parameterized by queries.

Outline

1. The Pricing Framework
2. The Pricing Formula
3. The Complexity of Pricing
4. Dichotomy and Algorithms for Selections

The Pricing Framework

The seller defines price points (view-price pairs): S = {(V_1, p_1), (V_2, p_2), …}
A buyer can buy any query Q.
The system computes price_D^S(Q).

[Figure: the seller supplies the price points (V_1, p_1), (V_2, p_2), … to the pricing system, which holds the database D; the buyer asks for Q(D) and is charged price_D^S(Q).]

Instance-Based Determinacy

Definition. V = V_1, …, V_k determine Q given D, denoted D ⊢ V ↠ Q, if for all D', whenever V(D) = V(D'), we have Q(D) = Q(D').

Intuitively, "V_1, …, V_k determine Q" means that Q(D) can be answered from V_1(D), …, V_k(D) alone, without accessing the database instance D.
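The definition can be checked by brute force when the domain is tiny. The sketch below is an illustration under assumptions that are not in the slides: a single binary relation S(x, y) over a two-by-two domain, views and queries modeled as Python functions, and hypothetical helper names `all_instances`, `determines_given`, and `select_x`. It enumerates every candidate instance D', so it is exponential in the domain size.

```python
from itertools import combinations, product

def all_instances(universe):
    """Yield every database instance D' as a frozenset of tuples from the universe."""
    tuples = sorted(universe)
    for size in range(len(tuples) + 1):
        for chosen in combinations(tuples, size):
            yield frozenset(chosen)

def determines_given(D, views, query, universe):
    """Return True iff V(D') = V(D) implies Q(D') = Q(D) for every D'."""
    view_answers = [view(D) for view in views]
    answer = query(D)
    for other in all_instances(universe):
        same_views = [view(other) for view in views] == view_answers
        if same_views and query(other) != answer:
            return False  # 'other' is indistinguishable via the views but answers Q differently
    return True

def select_x(value):
    """Hypothetical selection view: all tuples whose first column equals 'value'."""
    return lambda D: frozenset(t for t in D if t[0] == value)

# Assumed toy setting: every possible tuple of S(x, y) over a 2 x 2 domain.
UNIVERSE = set(product(["a1", "a2"], ["b1", "b2"]))
full_relation = lambda D: frozenset(D)
has_b1 = lambda D: any(y == "b1" for _, y in D)

D = frozenset({("a1", "b1")})
print(determines_given(D, [select_x("a1"), select_x("a2")], full_relation, UNIVERSE))
print(determines_given(D, [select_x("a1")], full_relation, UNIVERSE))
print(determines_given(D, [select_x("a1")], has_b1, UNIVERSE))
print(determines_given(frozenset(), [select_x("a1")], has_b1, UNIVERSE))
```

The last two calls use the same view and query but different instances, which shows why the definition is instance-based: on D the view already reveals a tuple with y = b1, while on the empty instance it does not rule one out.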

Arbitrage-Free

Suppose V determines Q and price_D(Q) > price_D(V). Then we can
1. buy V(D) for price_D(V)
2. compute Q(D) from V(D)
3. and so answer Q at a price p < price_D(Q)

Axiom 1. Given D, the pricing function price_D(Q) is arbitrage-free if, for all views V_1, …, V_k and every query Q where D ⊢ V_1, …, V_k ↠ Q:
price_D(Q) ≤ price_D(V_1) + … + price_D(V_k)
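As a hedged illustration of Axiom 1, the sketch below checks a finite table of prices against a list of known determinacy facts. The function name `arbitrage_violations`, the dictionary layout, and the sample prices are assumptions made for this example; the slides do not prescribe an implementation.

```python
def arbitrage_violations(prices, determinacy_facts):
    """Return every (views, query) fact that breaks Axiom 1.

    prices: dict mapping a view or query name to its price (hypothetical layout).
    determinacy_facts: iterable of (views, query) pairs, each asserting that
    the listed views determine the query on the current database.
    """
    violations = []
    for views, query in determinacy_facts:
        bundle_price = sum(prices[v] for v in views)
        if prices[query] > bundle_price:
            violations.append((views, query))
    return violations

# Hypothetical data: four $2 views that together determine Q.
views = ("V1", "V2", "V3", "V4")
facts = [(views, "Q")]
fair = {"V1": 2, "V2": 2, "V3": 2, "V4": 2, "Q": 8}
overpriced = dict(fair, Q=10)

print(arbitrage_violations(fair, facts))        # Q costs no more than the bundle
print(arbitrage_violations(overpriced, facts))  # Q at 10 can be undercut by the bundle at 8
```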

Discount-Free

The intuition is that the price points represent discounts that the seller offers relative to the price of the whole database.
A pricing function is discount-free if it is maximal.

Axiom 2. The pricing function price_D(Q) should not offer any additional discounts beyond the explicit price points defined by the seller.


Example: Origami Database

Database S:
Shape | Color | Picture
Swan | White | .....
Swan | Yellow | .....
Dragon | Yellow | .....
Car | Yellow | .....
Fish | White | .....

Price points:
View | Price
V_1(x,y,z) :- S(x,y,z), x=‘Swan’ | $2
V_2(x,y,z) :- S(x,y,z), x=‘Dragon’ | $2
V_3(x,y,z) :- S(x,y,z), x=‘Car’ | $2
V_4(x,y,z) :- S(x,y,z), x=‘Fish’ | $2
W_1(x,y,z) :- S(x,y,z), y=‘White’ | $3
W_2(x,y,z) :- S(x,y,z), y=‘Yellow’ | $3
W_3(x,y,z) :- S(x,y,z), y=‘Red’ | $3

Get all dragon origami for $2.
Get all red origami for $3.

What is the price of the entire database?
Q(x,y,z) :- S(x,y,z)

Exhausts the active domain:
– V_1, V_2, V_3, V_4 determine Q: price(Q) ≤ $8
– W_1, W_2, W_3 determine Q: price(Q) ≤ $9
price(Q) = $8

Example: Origami Database

What is the price of the full join?
Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v)

R:
Shape | Instructions
Swan | fold, cut, fold…
Dragon | cut, fold, cut,…

S:
Shape | Color | Picture
Swan | White | .....
Swan | Yellow | .....
Dragon | Yellow | .....
Car | Yellow | .....
Fish | White | .....

T:
Color | PaperSpecs
White | 15g/100, $10
Black | 20g/100, $15

Price points shown on the slide: p(σ_shape) = $99, p(σ_color) = $50, p(σ_color) = $5, p(σ_shape) = $2


The Query Pricing Formula

Given:
1. Price points S = {(V_1, p_1), …, (V_k, p_k)}
2. Database instance D
3. Query Q
Compute: price_D^S(Q)

Properties: (a) arbitrage-free, (b) discount-free, (c) price_D^S(V_i) = p_i
If such a pricing function exists, we say that the price points are consistent.

Method:
– Consider all subsets of V = {V_1, …, V_k} that determine Q
– Let C be the subset with the minimum total price Σ_i p_i, for V_i in C
– Define p_D(Q) = Σ_i p_i, for V_i in C

Theorem.
(a) The price points are consistent iff p_D(V_i) = p_i for every price point i = 1, …, k
(b) price_D^S(Q) = p_D(Q) is the unique arbitrage-free, discount-free pricing function that agrees with the price points
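The method can be sketched directly as a search over subsets of price points. The code below is a minimal illustration rather than the authors' implementation: it reuses the origami price points, and it assumes a hypothetical determinacy test `determines_full_relation` that is valid only for the full-relation query Q(x,y,z) :- S(x,y,z) (every possible shape and color pair must be covered by a purchased selection). The names `PRICE_POINTS` and `cheapest_determining_subset` are likewise made up for this example, and the search is exponential in the number of price points.

```python
from itertools import combinations, product

SHAPES = ["Swan", "Dragon", "Car", "Fish"]
COLORS = ["White", "Yellow", "Red"]

# Price points from the origami example: each selection view is keyed by (column, value).
PRICE_POINTS = {("shape", s): 2 for s in SHAPES}
PRICE_POINTS.update({("color", c): 3 for c in COLORS})

def determines_full_relation(views):
    """Assumed test: the views determine S iff every (shape, color) cell is covered."""
    chosen = set(views)
    return all(("shape", s) in chosen or ("color", c) in chosen
               for s, c in product(SHAPES, COLORS))

def cheapest_determining_subset(price_points, determines):
    """Return (price, subset) for the cheapest subset of views that determines the query."""
    views = list(price_points)
    best = None
    for size in range(len(views) + 1):
        for subset in combinations(views, size):
            if not determines(subset):
                continue
            cost = sum(price_points[v] for v in subset)
            if best is None or cost < best[0]:
                best = (cost, subset)
    return best  # None if no subset determines the query

price, subset = cheapest_determining_subset(PRICE_POINTS, determines_full_relation)
print(price)   # total price of the cheapest determining subset
print(subset)  # the views in that subset
```

On these inputs the search returns the four shape selections at a total of 8, which matches price(Q) = $8 in the origami example.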

Discussion

If the result of Q_1 is always a subset of the result of Q_2, should Q_1 be priced less than Q_2? No!

Example:
– V(x,y) :- Fortune500(x,y)
– Q(x,y) :- Fortune500(x,y), StrongBuyRec(x)
– price(Q) >> price(V)

We ignore computation costs in our framework:
– Cost of computing query Q
– Q(D) = f(V(D)), but f can be hard to compute


Determinacy

Definition. [Instance-independent] V determines Q, denoted V ↠ Q, if for all D, D', whenever V(D) = V(D'), we have Q(D) = Q(D'). [Nash, Segoufin, Vianu ’07]

V ↠ Q
iff there exists a function f such that Q(D) = f(V(D)) for all D
iff for every D, we have D ⊢ V ↠ Q

Definition. [Instance-dependent] V determines Q given D, denoted D ⊢ V ↠ Q, if for all D', whenever V(D') = V(D), we have Q(D) = Q(D').

Complexity of Determinacy

Setting | V, Q are UCQ | V, Q are CQ
Instance-independent (V ↠ Q) | Undecidable [NSV ’07] | ?
Instance-dependent (D ⊢ V ↠ Q), data complexity | coNP-complete [this paper] | coNP-complete [this paper]
Instance-dependent (D ⊢ V ↠ Q), combined complexity | Π_2^p [this paper] | Π_2^p [this paper]

Open question: is the bound on the combined complexity tight?

Complexity of Pricing

Corollary. Deciding whether price_D^S(Q) ≤ k is:
– Combined complexity [input S, D]: Σ_2^p
– Data complexity [input D]: coNP-hard

Proposition. Pricing is at least as hard as determinacy.

How do we deal with the hardness of computation?


Restricting Price Points to Selections

A seller can specify only the prices of selection queries of the form σ_{R.X=a}, that is, prices on columns.
The domain of each column is finite and known to buyers and sellers.
Price points on selections are how prices are set in most cases today.

Dichotomy Theorem

Theorem. Assuming selection views only, for any conjunctive query Q without self-joins, one of the following holds (data complexity):
(a) computing price_D^S(Q) is in PTIME
(b) checking whether price_D^S(Q) ≤ k is NP-complete

PTIME:
– Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v) [Chains]
– Q(x_1,…,x_k) :- R_1(x_1,x_2), …, R_k(x_k,x_1) [Cycles]

NP-complete:
– Q(x) :- R(x,y) [Projections]
– Q(x,y,z) :- R(x,y,z), S(x), T(y), U(z)

Algorithm for PTIME Cases

The algorithm uses a reduction to maximum flow.
Edges of finite capacity represent price points.
A set of finite-capacity edges is a cut iff the corresponding views determine the query.

Example: chain query Q(x,y) :- R(x), S(x,y), T(y)

Dom(X) = {a_1, a_2, a_3, a_4}
Dom(Y) = {b_1, b_2, b_3}

R:
X
a_1
a_2

S:
X | Y
a_1 | b_1
a_2 | b_2
a_2 | b_2
a_3 | b_2
a_4 | b_1

T:
Y
b_1
b_3

Flow Graph

[Figure: flow graph for the chain query over R, S and T, with layers of nodes a_1, …, a_4 and b_1, b_2, b_3, shown next to the three relations.]

A set of finite-capacity edges is a cut iff the corresponding views determine the query.
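To make the cut idea concrete, here is a minimal sketch for the simplest case only: the single-relation query Q(x,y,z) :- S(x,y,z) with the origami price points. It is not the chain-query construction shown on the slide. It assumes that a set of selection views determines Q exactly when every (shape, color) pair has its shape or its color purchased, which turns pricing into a minimum cut between a hypothetical source `s` and sink `t`. The Edmonds-Karp routine `max_flow` and the helper `origami_graph` are illustrative names, not taken from the paper.

```python
from collections import defaultdict, deque

INF = float("inf")

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, neighbours in capacity.items():
        for v, cap in neighbours.items():
            residual[u][v] += cap
            residual[v][u] += 0  # make sure the reverse edge exists
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # breadth-first search
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: flow equals the minimum cut
        bottleneck, v = INF, sink
        while parent[v] is not None:  # find the smallest residual capacity on the path
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:  # push the bottleneck along the path
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

def origami_graph(shape_prices, color_prices):
    """Finite edges carry price points; infinite edges tie each shape to each color."""
    capacity = {"s": {}}
    for shape, price in shape_prices.items():
        capacity["s"][("shape", shape)] = price
        capacity[("shape", shape)] = {("color", c): INF for c in color_prices}
    for color, price in color_prices.items():
        capacity[("color", color)] = {"t": price}
    return capacity

shape_prices = {"Swan": 2, "Dragon": 2, "Car": 2, "Fish": 2}
color_prices = {"White": 3, "Yellow": 3, "Red": 3}
print(max_flow(origami_graph(shape_prices, color_prices), "s", "t"))
```

Because the infinite edges can never be cut, any finite cut must remove all shape edges or all color edges, so the minimum cut is the cheaper of the two bundles: 8 for the prices above, in agreement with price(Q) = $8 in the origami example.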

Conclusions

Summary:
– The seller sets prices on some views, while the system computes the price of any query
– Interesting application of query determinacy
– Complexity: dichotomy for CQs without self-joins

Future work:
– Pricing in the presence of updates
– How do we handle pricing for intractable queries?
– Connection between pricing and privacy

Thank you!