Published by Stanley Tate. Modified over 9 years ago.
1
Toward Practical Query Pricing with QueryMarket
Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu
University of Washington
SIGMOD 2013
2
Motivation

Data is increasingly sold and bought on the web.
Websites that sell data:
– Xignite (financial)
– Gnip (social)
Data marketplace services:
– Windows Azure Marketplace
– Infochimps
– Factual
– DataMarket
3
A Pricing Scenario (1)

English-German dictionary T:
  english   german
  thanks    Danke
  car       Auto
  day       Tag
  road      Strasse
  road      Weg
  …         …

Pricing schemes:
– Sell the whole table T for a fixed price. Q: translate only the word “thanks”. The user pays for redundant information.
– Price per output tuple. Q: Does the word “thanks” translate to “Auto”? An empty result still carries information.
4
A Pricing Scenario (2)

English-German dictionary T:
  english   german
  thanks    Danke
  car       Auto
  day       Tag
  road      Strasse
  road      Weg
  …         …

Word Frequency Stats WF:
  word       frequency   genre     rank
  rock       0.025       music     20
  pop        0.030       music     10
  database   0.001       science   1453
  …          …           …         …

Current systems do not sell queries that combine datasets.
Queries issued by a user may have overlapping content:
– Q1: Return all translations to German of top 10 words in the genre “music”
– Q2: Return all translations to German of top 20 words in the genre “music”
5
How To Price Data

English-German dictionary T:
  english   german
  thanks    Danke
  car       Auto
  day       Tag
  road      Strasse
  road      Weg
  …         …

Price points:
– are selection queries on a single table
– exhaust the possible values (Col_A) of some attribute A
– may select on values not in the active domain

Examples:
p(σ[T.english=‘thanks’]) = $0.1
p(σ[T.english=‘day’]) = $0.1
p(σ[T.english=‘road’]) = $0.15
p(σ[T.english=‘cat’]) = $0.05
p(σ[T.english=‘car’]) = $0.1
p(σ[T.german=‘Auto’]) = $0.5
…
6
QueryMarket: Contributions

A formal pricing framework where:
– sellers specify a set of price points as selection queries
– buyers can purchase any query on the database
– the system automatically computes the price of the query
Support efficient computation of prices for a large class of SQL queries.
Support the necessary functionality for a marketplace:
– Pricing queries with overlapping information content
– Database updates
– Revenue sharing among different sellers
7
Outline
1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing
8
The Pricing Framework [Koutris et al., PODS 2012]

The seller defines price points (view-price pairs): S = { (V_1, p_1), (V_2, p_2), … }
A buyer can buy any query Q.
The system will compute price_D^S(Q).

[Diagram: the seller gives price points to the pricing system, which holds database D; the buyer asks for Q(D) and is charged price_D^S(Q)]
9
Properties of Prices

Arbitrage-free: Given D, price_D(Q) is arbitrage-free if for all views V_1, …, V_k that determine Q:
  price_D(Q) ≤ price_D(V_1) + … + price_D(V_k)
Discount-free: price_D(Q) must not offer additional discounts except for the explicit price points defined by the seller.

We say that the views V_1, …, V_k determine Q if one can compute Q(D) from V_1(D), …, V_k(D) without access to D.
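The two notions above can be written out as follows. This is a sketch only: reading "without access to D" as a quantification over every alternative database D' that agrees with D on the views is an assumption about how the slide's informal wording is made precise.

```latex
% Determinacy (assumed instance-based reading): V_1..V_k determine Q on D
% if every database D' that agrees with D on all views also agrees on Q.
\[
  \forall D' :\ \Bigl(\bigwedge_{i=1}^{k} V_i(D') = V_i(D)\Bigr)
  \;\Longrightarrow\; Q(D') = Q(D)
\]
% Arbitrage-freeness, as stated on the slide.
\[
  \mathrm{price}_D(Q) \;\le\; \sum_{i=1}^{k} \mathrm{price}_D(V_i)
  \quad \text{whenever } V_1,\dots,V_k \text{ determine } Q .
\]
```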
10
The Pricing Formula

Arbitrage-Price: The price of the cheapest set of views from price points S that determine the query Q.
It is unique + arbitrage-free + discount-free + agrees with price points.

Example: Q(y) = R(x), S(x,y)
Table R (column A): a_1
Table S (columns A, B): (a_1, b), (a_2, b)
Col_A = { a_1, a_2, a_3 }, Col_B = { b }
Prices: $1 per selection on R.A, $2 per selection on S.A, $3 per selection on S.B

– {σ[R.A=a_1], σ[S.B=b]} determines Q: cost = 1 + 3 = 4
– {σ[R.A=a_1], σ[S.A=a_1]} also determines Q: cost = 1 + 2 = 3 (cheapest possible)
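The example above is small enough to check by brute force. The sketch below is not the talk's algorithm: it assumes that "determines" means every database over Col_A and Col_B that agrees with D on the purchased views gives the same answer to Q, and it enumerates all such databases and all subsets of price points. All function and variable names are made up for the illustration.

```python
from itertools import chain, combinations, product

COL_A = ["a1", "a2", "a3"]
COL_B = ["b"]

# The instance from the slide: R = {a1}, S = {(a1, b), (a2, b)}.
R = frozenset({"a1"})
S = frozenset({("a1", "b"), ("a2", "b")})

# Price points: one selection per column value, priced per column.
PRICE_POINTS = {("R.A", a): 1 for a in COL_A}
PRICE_POINTS.update({("S.A", a): 2 for a in COL_A})
PRICE_POINTS.update({("S.B", b): 3 for b in COL_B})


def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def query(r, s):
    """Q(y) = R(x), S(x, y)."""
    return frozenset(y for (x, y) in s if x in r)


def view(point, r, s):
    """Answer of the selection view named by a price point."""
    attr, value = point
    if attr == "R.A":
        return frozenset(x for x in r if x == value)
    if attr == "S.A":
        return frozenset(t for t in s if t[0] == value)
    return frozenset(t for t in s if t[1] == value)  # "S.B"


# Every possible database over the column domains (8 x 8 = 64 of them).
ALL_DBS = [
    (frozenset(r), frozenset(s))
    for r in powerset(COL_A)
    for s in powerset(product(COL_A, COL_B))
]


def determines(points):
    """True if no database agreeing with D on the views changes the answer to Q."""
    target = query(R, S)
    for r, s in ALL_DBS:
        agrees = all(view(p, r, s) == view(p, R, S) for p in points)
        if agrees and query(r, s) != target:
            return False
    return True


def arbitrage_price():
    """Cheapest subset of price points that determines Q."""
    best_cost, best_set = None, None
    for subset in powerset(PRICE_POINTS):
        cost = sum(PRICE_POINTS[p] for p in subset)
        if (best_cost is None or cost < best_cost) and determines(subset):
            best_cost, best_set = cost, subset
    return best_cost, best_set
```

Calling `arbitrage_price()` returns cost 3 with the views σ[R.A=a_1] and σ[S.A=a_1], matching the slide. The enumeration is exponential in both the number of price points and the domain size, so it only serves to confirm small examples.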
11
Outline
1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing
12
Computing the Price

The problem of computing the arbitrage price even for SELECT-PROJECT-JOIN queries is coNP-complete.
For some queries, the price can be computed fast: selections, joins w/o projection.
We describe pricing as an Integer Linear Program (ILP) and then use fast ILP solvers (e.g. GLPK, CPLEX).
Classes of queries supported:
– Selections/Projections/Joins
– Unions
– User-Defined Functions (UDF)
– Bundles of queries
13
ILP Construction (1)

Price the query Q(x,y) = R(x), S(x,y)

Table R (column A): a_1
Table S (columns A, B): (a_1, b), (a_2, b)
Col_A = { a_1, a_2, a_3 }, Col_B = { b }
Prices: $1 per selection on R.A, $2 per selection on S.A, $3 per selection on S.B

Introduce a {0/1} variable x[attribute,value] for each price point:
x[R.A,a_2], x[S.A,a_1], x[S.B,b], …
14
ILP Construction (2)

Q(x,y) = R(x), S(x,y)

Table R (column A): a_1
Table S (columns A, B): (a_1, b), (a_2, b)
Col_A = { a_1, a_2, a_3 }, Col_B = { b }

Minimize (independent of the query):
price = x[R.A,a_1] + x[R.A,a_2] + x[R.A,a_3] + 2x[S.A,a_1] + 2x[S.A,a_2] + 2x[S.A,a_3] + 3x[S.B,b]

Constraints:
– (a_1,b) in Q: x[R.A,a_1] ≥ 1 and x[S.A,a_1] + x[S.B,b] ≥ 1
– (a_2,b) not in Q: x[R.A,a_2] ≥ 1
– (a_3,b) not in Q: x[R.A,a_3] + x[S.A,a_3] + x[S.B,b] ≥ 1
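With only seven 0/1 variables, the program above can be solved by trying every assignment. The sketch below does that in plain Python as a stand-in for an ILP solver such as GLPK or CPLEX; the encoding of constraints as lists of variables and the function name `solve` are choices made for this illustration, not part of QueryMarket.

```python
from itertools import product

# Price-point variables and their prices, copied from the objective above.
PRICES = {
    ("R.A", "a1"): 1, ("R.A", "a2"): 1, ("R.A", "a3"): 1,
    ("S.A", "a1"): 2, ("S.A", "a2"): 2, ("S.A", "a3"): 2,
    ("S.B", "b"): 3,
}

# Each constraint lists the variables whose sum must be at least 1.
CONSTRAINTS = [
    [("R.A", "a1")],                               # (a1, b) in Q
    [("S.A", "a1"), ("S.B", "b")],                 # (a1, b) in Q
    [("R.A", "a2")],                               # (a2, b) not in Q
    [("R.A", "a3"), ("S.A", "a3"), ("S.B", "b")],  # (a3, b) not in Q
]


def solve():
    """Exhaustively minimize the objective over all 0/1 assignments."""
    variables = list(PRICES)
    best_cost, best_assignment = None, None
    for bits in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, bits))
        if all(sum(x[v] for v in c) >= 1 for c in CONSTRAINTS):
            cost = sum(PRICES[v] * x[v] for v in variables)
            if best_cost is None or cost < best_cost:
                best_cost, best_assignment = cost, x
    return best_cost, best_assignment
```

For this instance `solve()` finds a minimum of 5. Two assignments reach it: buying σ[R.A=a_1], σ[R.A=a_2] and σ[S.B=b], or buying all three R.A selections plus σ[S.A=a_1].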
15
ILP Construction (3)

Projection: Q(y) = R(x), S(x,y)

Table R (column A): a_1, a_2
Table S (columns A, B): (a_1, b), (a_2, b)
Col_A = { a_1, a_2, a_3 }, Col_B = { b }

New variable z_i for each tuple in Q_full.

Constraints:
– (a_1,b) in Q_full: x[R.A,a_1] ≥ z_1 and x[S.A,a_1] + x[S.B,b] ≥ z_1
– (a_2,b) in Q_full: x[R.A,a_2] ≥ z_2 and x[S.A,a_2] + x[S.B,b] ≥ z_2
– (b) in Q: z_1 + z_2 ≥ 1
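The projection constraints can be checked the same way. The sketch below enumerates the seven price-point variables together with the two witness variables z_1 and z_2, using only the constraints listed on this slide and the same prices as the earlier ILP slides ($1, $2, $3 per selection on R.A, S.A, S.B). It is an exhaustive stand-in for an ILP solver, and the constraint encoding is an assumption made for the illustration.

```python
from itertools import product

PRICES = {
    ("R.A", "a1"): 1, ("R.A", "a2"): 1, ("R.A", "a3"): 1,
    ("S.A", "a1"): 2, ("S.A", "a2"): 2, ("S.A", "a3"): 2,
    ("S.B", "b"): 3,
}
Z_VARS = ["z1", "z2"]  # one witness variable per tuple of Q_full

# (left-hand variables, right-hand side); the right-hand side is either
# the name of a z variable or the constant 1.
CONSTRAINTS = [
    ([("R.A", "a1")], "z1"),
    ([("S.A", "a1"), ("S.B", "b")], "z1"),
    ([("R.A", "a2")], "z2"),
    ([("S.A", "a2"), ("S.B", "b")], "z2"),
    (["z1", "z2"], 1),  # at least one witness must be established for (b)
]


def price_with_projection():
    """Minimum cost over all 0/1 assignments of the x and z variables."""
    names = list(PRICES) + Z_VARS
    best = None
    for bits in product((0, 1), repeat=len(names)):
        val = dict(zip(names, bits))
        feasible = all(
            sum(val[v] for v in lhs) >= (val[rhs] if isinstance(rhs, str) else rhs)
            for lhs, rhs in CONSTRAINTS
        )
        if feasible:
            cost = sum(PRICES[p] * val[p] for p in PRICES)
            if best is None or cost < best:
                best = cost
    return best
```

Here `price_with_projection()` returns 3: it is enough to establish one witness, for example by buying σ[R.A=a_1] and σ[S.A=a_1], rather than every tuple of Q_full.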
16
QueryMarket System

Runs on top of any SQL database.
Information stored in the database:
– Price points are stored in the database in price tables
– An index table keeps track of the price tables
The dataset:
– English-German translation: T_en,gr(w, w’)
– English-French translation: T_en,fr(w, w’)
– UDF to find hashtags: IsHashtag(w)
– Word frequency stats: WF(w, genre, frequency, rank)
17
Price Computation (1)

Small dataset where columns have size ~10^2.
[Plots: selections; 2-way joins w/o projections; 2-way joins with projections; 3-way join]
18
Price Computation (2)

Larger dataset where columns have size ~10^3.
[Plots: selections; 2-way joins w/o projections; 2-way joins with projections; 3-way join]
19
Outline
1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing
20
Query History

A user asks a sequence of queries over time, with varying information overlap: Q = Q_1, Q_2, …, Q_k
Experiment with 30 selection/join queries.

– Oblivious pricing: each query is priced independently
– Bundle pricing: each query Q_i is priced p(Q_1, …, Q_i) − p(Q_1, …, Q_{i−1})
– View pricing: when a query is purchased, the purchased views are free for later queries
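The three strategies can be compared on a toy history. The sketch below makes a strong simplifying assumption: each query comes with an explicit list of the alternative view sets that determine it, so no determinacy check or ILP is needed. The views v1 to v3, their prices, and the queries Q1 and Q2 are invented for the illustration and are not the 30 queries of the experiment.

```python
from itertools import product

# Hypothetical price points and, per query, the alternative view sets
# that determine it (assumed to be given).
VIEW_PRICE = {"v1": 2, "v2": 3, "v3": 2}
ALTERNATIVES = {
    "Q1": [{"v1"}, {"v2"}],
    "Q2": [{"v2"}, {"v1", "v3"}],
}


def cost(views, owned=frozenset()):
    """Total price of the views that the buyer does not own yet."""
    return sum(VIEW_PRICE[v] for v in views if v not in owned)


def bundle_price(queries):
    """p(Q_1, ..., Q_i): cheapest union of one alternative per query."""
    if not queries:
        return 0
    choices = product(*(ALTERNATIVES[q] for q in queries))
    return min(cost(set().union(*choice)) for choice in choices)


def oblivious_pricing(history):
    return [bundle_price([q]) for q in history]


def bundle_pricing(history):
    return [
        bundle_price(history[: i + 1]) - bundle_price(history[:i])
        for i in range(len(history))
    ]


def view_pricing(history):
    owned, charges = set(), []
    for q in history:
        # Pick the alternative that is cheapest given the views already bought.
        best = min(ALTERNATIVES[q], key=lambda alt: cost(alt, owned))
        charges.append(cost(best, owned))
        owned |= best
    return charges
```

On the history `["Q1", "Q2"]` the charges are `[2, 3]` for oblivious pricing, `[2, 1]` for bundle pricing and `[2, 2]` for view pricing. View pricing commits to v1 for Q1 and cannot revise that choice later, which is why it ends up between the other two in this example.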
21
Query History (2)
[Figure]
22
View Pricing

View Pricing is our proposed strategy:
– Computationally efficient
– Low storage overhead
– Close to optimal (bundle) price
View Pricing can be used for dynamic databases: if view V is purchased at some point and then updated, the user pays only an update price.
23
Outline
1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing
24
Revenue Sharing

How is the revenue shared between sellers if several datasets contribute to the answer?
What if the cheapest set of views that determines a query is not unique?
Example:
– Q(‘sigmod13’) = isHashtag(‘sigmod13’), isNoun(‘sigmod13’)
– Seller 1 charges $1 per entry of isHashtag; seller 2 charges $1 per entry of isNoun
– If both isHashtag and isNoun are false and each entry costs $1, purchasing either of the entries answers Q
25
Revenue Sharing: Solution

For a seller s, share(s, Q) is the maximum revenue of s over all minimum-cost sets of price points that determine Q.
share(s, Q) can be computed in our framework.
Solution: split price(Q) among sellers proportionally to their shares.
Example:
– Both shares are $1
– The revenue of each seller will be $0.5, since their shares are equal
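The split rule can be sketched in a few lines once the minimum-cost sets are known. The code below assumes those sets are given as lists of (seller, price) pairs; how they are enumerated is outside the sketch, and the function names are made up for the illustration.

```python
from fractions import Fraction


def shares(min_cost_sets):
    """share(s, Q): the most seller s earns in any one minimum-cost set."""
    best = {}
    for points in min_cost_sets:
        revenue = {}
        for seller, price in points:
            revenue[seller] = revenue.get(seller, 0) + price
        for seller, amount in revenue.items():
            best[seller] = max(best.get(seller, 0), amount)
    return best


def split_revenue(price, min_cost_sets):
    """Split price(Q) among sellers in proportion to their shares."""
    seller_shares = shares(min_cost_sets)
    total = sum(seller_shares.values())
    return {
        seller: Fraction(price) * Fraction(share) / Fraction(total)
        for seller, share in seller_shares.items()
    }


# The example above: either entry alone answers Q, and each costs $1.
example_sets = [[("seller1", 1)], [("seller2", 1)]]
print(split_revenue(1, example_sets))
```

For the example, both shares are 1 and each seller receives half of the $1 price. Exact fractions are used so that the parts always sum to price(Q).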
26
Conclusions

QueryMarket: the first system that supports pricing a large class of SQL queries within a formal framework.
We presented solutions to address the requirements of a real-world marketplace.
Future work includes:
– Scaling the price computation (bucketization)
– Full SQL support (aggregates, negation)
– Query answering under limited budget
27
Thank you!