Query-Based Data Pricing. Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu. University of Washington. PODS 2012.

1 Query-Based Data Pricing
Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu
University of Washington
PODS 2012

2 Motivation
Data is increasingly bought and sold on the web
Websites that sell data:
– AggData [www.aggdata.com]
– Xignite (financial data) [www.xignite.com]
– Gnip (social media) [www.gnip.com]
Data marketplace services:
– Windows Azure Marketplace (100+ datasets) [datamarket.azure.com]
– Infochimps (15,000 datasets) [www.infochimps.com]
Query-based pricing customized for buyers

3 Current Pricing (1)
A fixed price for the whole dataset or for a specific set of views
Example: CustomLists
– USA Business Database for $399
– Email addresses for $299
– Businesses in WA for $199
Limitations:
– Restaurants in WA?
– Businesses in cities with population >100,000?

4 Current Pricing (2)
API subscriptions (Azure Marketplace, Infochimps)
– Allow queries over the data
– Pay by number of transactions (pages of results)

5 Issues with Pricing
Buyers today need to buy a superset of the data they are interested in
Sellers can't easily anticipate all possible queries that buyers might ask
Solution: we need a more flexible pricing scheme, parameterized by queries

6 Outline
1. The Pricing Framework
2. The Pricing Formula
3. The Complexity of Pricing
4. Dichotomy and Algorithms for Selections

7 The Pricing Framework
The seller defines price points (view-price pairs): S = {(V_1, p_1), (V_2, p_2), …}
A buyer can buy any query Q
The system will compute price_D^S(Q)
[Diagram: the seller supplies the price points (V_1, p_1), (V_2, p_2), … to the pricing system, which holds the database D; the buyer asks for Q(D) and the system returns price_D^S(Q).]

8 Instance-Based Determinacy
Definition. V = V_1, …, V_k determine Q given D, denoted D ⊢ V ↠ Q, if:
for all D', if V(D) = V(D'), then Q(D) = Q(D')
Intuitively, "V_1, …, V_k determine Q" means that Q(D) can be answered from V_1(D), …, V_k(D) alone, without accessing the database instance D
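To make the definition concrete, here is a minimal brute-force sketch of the check D ⊢ V ↠ Q. It assumes a single relation over a tiny, known universe of possible tuples, so that every candidate instance D' can be enumerated. The names `all_instances`, `determines`, and `select` are illustrative and do not come from the slides.

```python
from itertools import chain, combinations, product

def all_instances(universe):
    """Enumerate every subset of `universe`, i.e. every candidate instance D'."""
    items = list(universe)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return (frozenset(s) for s in subsets)

def determines(views, query, D, universe):
    """True if D |- V ->> Q: every D' that agrees with D on all views agrees on Q."""
    target = [v(D) for v in views]
    answer = query(D)
    for D2 in all_instances(universe):
        if [v(D2) for v in views] == target and query(D2) != answer:
            return False  # D2 is a counterexample to determinacy
    return True

def select(col, value):
    """Selection view: keep the tuples whose column `col` equals `value`."""
    return lambda D: frozenset(t for t in D if t[col] == value)

# Toy universe: two shapes times two colors (hypothetical data).
universe = set(product(["Swan", "Dragon"], ["White", "Yellow"]))
D = frozenset({("Swan", "White"), ("Dragon", "Yellow")})
identity = lambda D: frozenset(D)
by_shape = [select(0, "Swan"), select(0, "Dragon")]

print(determines(by_shape, identity, D, universe))            # all shapes bought
print(determines([select(0, "Swan")], identity, D, universe)) # one shape only
```

The enumeration is exponential in the size of the universe, so this is only useful for checking small examples by hand.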

9 Arbitrage-Free
Suppose V determines Q and price_D(Q) > price_D(V). Then we can:
1. buy V(D) for price_D(V)
2. compute Q(D) from V(D)
3. now we have answered Q at some price p < price_D(Q)
Axiom 1. Given D, the pricing function price_D(Q) is arbitrage-free if for all views V_1, …, V_k and query Q where D ⊢ V_1, …, V_k ↠ Q:
price_D(Q) ≤ price_D(V_1) + … + price_D(V_k)
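Axiom 1 can be checked mechanically once the determinacy facts are known. The sketch below assumes those facts are given as a list of (views, query) pairs and that prices are stored in a plain dictionary. The function name `arbitrage_violations` and the sample prices are hypothetical.

```python
def arbitrage_violations(price, determinacy_facts):
    """Return the (views, query) pairs that violate Axiom 1.

    price: dict mapping a view or query name to its price.
    determinacy_facts: list of (views, query) pairs, each meaning that
        the listed views determine the query on the current instance D.
    """
    return [(views, q) for views, q in determinacy_facts
            if price[q] > sum(price[v] for v in views)]

# Hypothetical prices: four views at $2 each determine Q.
facts = [(("V1", "V2", "V3", "V4"), "Q")]
overpriced = {"V1": 2, "V2": 2, "V3": 2, "V4": 2, "Q": 10}
fair = dict(overpriced, Q=8)

print(arbitrage_violations(overpriced, facts))  # Q costs more than the views
print(arbitrage_violations(fair, facts))        # no violation
```

With Q priced at $10, a buyer could pay $8 for the four views and compute Q from them, which is exactly the arbitrage the axiom rules out.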

10 Discount-Free
The intuition is that the price points represent discounts that the seller offers relative to the price of the whole database
A pricing function is discount-free if it is maximal
Axiom 2. The pricing function price_D(Q) should not offer any additional discounts beyond the explicit price points defined by the seller.

11 Example: Origami Database

12 Example: Origami Database
Database S:
Shape    Color    Picture
Swan     White    .....
Swan     Yellow   .....
Dragon   Yellow   .....
Car      Yellow   .....
Fish     White    .....
Price points:
View                                     Price
V_1(x,y,z) :- S(x,y,z), x='Swan'         $2
V_2(x,y,z) :- S(x,y,z), x='Dragon'       $2
V_3(x,y,z) :- S(x,y,z), x='Car'          $2
V_4(x,y,z) :- S(x,y,z), x='Fish'         $2
W_1(x,y,z) :- S(x,y,z), y='White'        $3
W_2(x,y,z) :- S(x,y,z), y='Yellow'       $3
W_3(x,y,z) :- S(x,y,z), y='Red'          $3
Get all dragon origami for $2
Get all red origami for $3
What is the price of the entire database? Q(x,y,z) :- S(x,y,z)
Each group of selections exhausts the active domain of its column:
– V_1, V_2, V_3, V_4 determine Q: price(Q) ≤ $8
– W_1, W_2, W_3 determine Q: price(Q) ≤ $9
price(Q) = $8

13 Example: Origami Database
Relation S:
Shape    Color    Picture
Swan     White    .....
Swan     Yellow   .....
Dragon   Yellow   .....
Car      Yellow   .....
Fish     White    .....
Relation R:
Shape    Instructions
Swan     fold, cut, fold…
Dragon   cut, fold, cut,…
Relation T:
Color    PaperSpecs
White    15g/100, $10
Black    20g/100, $15
Selection prices shown on the slide (their attachment to the columns of R, S, T is not preserved in this transcript): p(σ_shape) = $99, p(σ_color) = $50, p(σ_color) = $5, p(σ_shape) = $2
What is the price of the full join? Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v)

15 The Query Pricing Formula
Given:
1. Price points S = {(V_1, p_1), …, (V_k, p_k)}
2. Database instance D
3. Query Q
Compute: price_D^S(Q)
Properties: (a) arbitrage-free, (b) discount-free, (c) price_D^S(V_i) = p_i
If such a pricing function exists, we say that the price points are consistent
Method:
– Consider all subsets of V = {V_1, …, V_k} that determine Q
– Let C be the subset with the minimum price, Σ_i p_i, for V_i in C
– Define p_D(Q) = Σ_i p_i, summing over the V_i in C
Theorem.
(a) The price points are consistent iff p_D(V_i) = p_i for every price point i = 1, …, k
(b) price_D^S(Q) = p_D(Q) is the unique arbitrage-free, discount-free pricing function that agrees with the price points
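The method can be sketched as a brute-force search over subsets of the priced views. The determinacy test is passed in as an oracle, `determines_query`, which is a hypothetical callable and not part of the slides. For the usage example, the oracle encodes one assumption: a set of selections determines the whole origami table exactly when every (shape, color) combination is covered by a purchased selection.

```python
from itertools import combinations

def price(price_points, determines_query):
    """Cheapest total price over all subsets of views that determine the query.

    price_points: dict mapping a view name to its price p_i.
    determines_query: hypothetical oracle; takes a tuple of view names and
        returns True if those views determine Q on the current instance D.
    Returns None when no subset of the priced views determines Q.
    """
    names = sorted(price_points)
    best = None
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if determines_query(subset):
                cost = sum(price_points[v] for v in subset)
                if best is None or cost < best:
                    best = cost
    return best

# Origami price points: selections on Shape at $2 and on Color at $3.
SHAPES = ["Swan", "Dragon", "Car", "Fish"]
COLORS = ["White", "Yellow", "Red"]
points = {("Shape", s): 2 for s in SHAPES}
points.update({("Color", c): 3 for c in COLORS})

def determines_whole_table(subset):
    # Assumption: Q(x,y,z) :- S(x,y,z) is determined exactly when every
    # (shape, color) combination is covered by some purchased selection.
    bought = set(subset)
    return all(("Shape", s) in bought or ("Color", c) in bought
               for s in SHAPES for c in COLORS)

whole_db_price = price(points, determines_whole_table)
print(whole_db_price)  # 8, matching price(Q) = $8 in the origami example
```

The search visits all 2^k subsets, so it only illustrates the definition; it says nothing about how a real system would compute prices efficiently.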

16 Discussion
If the result of Q_1 is always a subset of Q_2, should Q_1 be priced less than Q_2? No!
Example:
– V(x,y) :- Fortune500(x,y)
– Q(x,y) :- Fortune500(x,y), StrongBuyRec(x)
– price(Q) >> price(V)
We ignore computation costs in our framework:
– Cost of computing query Q
– Q(D) = f(V(D)), but f can be hard to compute

18 Determinacy
Definition. [Instance-independent] V determines Q, denoted V ↠ Q, if:
for all D, D', if V(D) = V(D'), then Q(D) = Q(D') [Nash, Segoufin, Vianu '07]
V ↠ Q
iff there exists a function f such that Q(D) = f(V(D)) for all D
iff for every D, we have that D ⊢ V ↠ Q
Definition. [Instance-dependent] V determines Q given D, denoted D ⊢ V ↠ Q, if:
for all D', if V(D') = V(D), then Q(D) = Q(D')

19 Complexity of Determinacy
                                          V, Q are UCQ                  V, Q are CQ
Instance-independent, V ↠ Q               Undecidable [NSV '07]         ?
Instance-dependent, D ⊢ V ↠ Q (data)      coNP-complete [this paper]    coNP-complete [this paper]
Instance-dependent, D ⊢ V ↠ Q (combined)  Π_2^P [this paper]            Π_2^P [this paper]
Open question: is the bound on the combined complexity tight?

20 Complexity of Pricing
Corollary. Deciding whether price_D^S(Q) ≤ k is:
– Combined complexity [input S, D]: Σ_2^P
– Data complexity [input D]: coNP-hard
Proposition. Pricing is at least as hard as determinacy
How do we deal with the hardness of computation?

22 Restricting Price Points to Selections
A seller can specify only the prices of selection queries of the form σ_{R.X=a} (prices on columns)
The domain of each column is finite and known to buyers and sellers
Price points on selections are how prices are set in most cases today

23 Dichotomy Theorem
Theorem. Assuming selection views only, for any conjunctive query Q without self-joins, one of the following holds (data complexity):
(a) price_Q^S(D) is in PTIME
(b) checking whether price_Q^S(D) ≤ k is NP-complete
PTIME:
– Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v) [Chains]
– Q(x_1,…,x_k) :- R_1(x_1,x_2), …, R_k(x_k,x_1) [Cycles]
NP-complete:
– Q(x) :- R(x,y) [Projections]
– Q(x,y,z) :- R(x,y,z), S(x), T(y), U(z)

24 Algorithm for PTIME Cases
The algorithm uses a reduction to maximum flow
Edges of finite capacity represent price points
A set of edges of finite cost is a cut iff they determine the query
Example:
– Chain query Q(x,y) :- R(x), S(x,y), T(y)
– Dom(X) = {a_1, a_2, a_3, a_4}, Dom(Y) = {b_1, b_2, b_3}
Relation R:
X
a_1
a_2
Relation S:
X     Y
a_1   b_1
a_2   b_2
a_2   b_2
a_3   b_2
a_4   b_1
Relation T:
Y
b_1
b_3

25 Flow Graph
[Figure: flow graph for the chain query, with a layer of edges for R labeled a_1, …, a_4, layers for S labeled a_1, …, a_4 and b_1, …, b_3, and a layer for T labeled b_1, …, b_3, shown next to the instance of R, S, and T from the previous slide.]
A set of edges of finite cost is a cut iff they determine the query
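The role of the cut can be illustrated on a toy graph. The sketch below is a generic Edmonds-Karp maximum-flow routine, not the construction from the paper: the graph, node names, and prices are hypothetical, and the exact way the flow graph is built from R, S, and T is not recoverable from this transcript. By the max-flow/min-cut theorem, the returned value equals the total price of the cheapest set of finite-capacity edges that separates source from sink.

```python
from collections import defaultdict, deque

INF = float("inf")

def max_flow(edges, source, sink):
    """Edmonds-Karp maximum flow; equals the minimum cut capacity."""
    residual = defaultdict(lambda: defaultdict(float))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0.0  # make sure the reverse edge exists
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck capacity along the path.
        bottleneck, v = INF, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        if bottleneck == INF:
            return INF  # a path of unpriced edges: no finite cut exists
        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

# Hypothetical toy graph: priced edges are finite, structural edges infinite.
toy = [("s", "a1", 3), ("s", "a2", 1),
       ("a1", "b1", INF), ("a2", "b1", INF),
       ("b1", "t", 2)]
print(max_flow(toy, "s", "t"))  # cheapest cut is the single edge b1 -> t
```

In this toy graph, cutting the one edge priced 2 is cheaper than cutting both source edges (3 + 1), which mirrors how a minimum cut picks the cheapest set of price points.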

26 Conclusions
Summary:
– The seller sets prices for some views, while the system computes the price of any query
– Interesting application of query determinacy
– Complexity: dichotomy for CQs without self-joins
Future work:
– Pricing in the presence of updates
– How do we handle pricing for intractable queries?
– Connection between pricing and privacy

27 Thank you!

