A Model and Algorithms for Pricing Queries Tang Ruiming, Wu Huayu, Bao Zhifeng, Stephane Bressan, Patrick Valduriez.


2 Overview Aggdata

3 Overview Windows Azure Marketplace

4 Motivation and existing works
People may want to buy data by asking queries. As stated by Koutris et al. [Koutris et al., 2012], current pricing schemes have limitations: they assign prices to entire datasets, or to predefined views to which consumers are then restricted, and they may lead to arbitrage situations. E.g., ten accounts that each allow 10 free applications can be used to get 100 applications.
The frameworks of [Koutris et al., 2012], [Koutris et al., 2013], and [Li et al., 2012] assign prices to predefined views. The price of a query is the price of the cheapest set of predefined views that determines the query; computing it is NP-hard.

5 Framework
In our framework, prices are assigned to individual tuples. For a query, we track the source tuples that contribute to the query result (its provenance). Each contributing source tuple is charged only once, no matter how many times it contributes, which reflects the nature of information goods [Balazinska et al., 2011].

6 Minimal provenance
(Provenance) Let Q be a query and D a database, and let Q(D) be the query result. A provenance of Q(D) is a set of tuples $L \subseteq D$ such that $Q(D) \subseteq Q(L)$.
(Minimal provenance) A minimal provenance of Q(D) is a provenance L of Q(D) such that $L' \subseteq L \Rightarrow L' = L$, where L' is a provenance of Q(D).
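The two definitions can be illustrated with a brute-force sketch. It assumes that a provenance is a subset L of D on which the query still returns every tuple of Q(D); the toy relation, the `cities` query, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def is_provenance(query, database, subset):
    """Assumed reading: L is a provenance if Q(L) still contains all of Q(D)."""
    return query(database) <= query(subset)

def minimal_provenances(query, database):
    """Brute force over subsets, smallest first.

    A subset is kept when it is a provenance and contains no provenance
    that was already found, so no proper subset of it is a provenance.
    """
    tuples = sorted(database)
    found = []
    for size in range(len(tuples) + 1):
        for combo in combinations(tuples, size):
            candidate = frozenset(combo)
            if any(m <= candidate for m in found):
                continue  # contains a smaller provenance, hence not minimal
            if is_provenance(query, database, candidate):
                found.append(candidate)
    return found

# Hypothetical toy database of (name, city) tuples.
D = {("alice", "paris"), ("bob", "paris"), ("carol", "rome")}

def cities(rows):
    """Toy query: projection on the city attribute."""
    return {city for _, city in rows}

for L in minimal_provenances(cities, D):
    print(sorted(L))
```

Here "paris" can be derived from either alice's or bob's tuple, so the result {paris, rome} has two minimal provenances, each containing carol's tuple and one of the two Paris tuples. The enumeration is exponential in the size of D and is meant only to make the definitions concrete.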

7 Minimal provenance

8 Pricing function
The pricing setting function maps each tuple in the database to its price. The pricing function takes a query as input and returns its price.
Properties of the pricing function:
Contribution monotonicity: if a query uses fewer source tuples than another query, the price of the first query should be lower.
Contribution arbitrage-freedom: if a query uses fewer source tuples than a set of queries, the price of the first query should be lower than the total price of the set of queries.
Bounded price: the price of a query is never higher than the total price of the source tuples in the relations involved in the query.

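9 Pricing function
The price of a query Q in a database D is defined as the price of the cheapest minimal provenance of Q(D): $\mathrm{price}_D(Q) = \min_{L} \|L\|_p$, with L ranging over the minimal provenances of Q(D), where $\|L\|_p = \left(\sum_{t \in L} s(t)^p\right)^{1/p}$ is the p-norm of L and $s(t)$ denotes the price of tuple t. Increasing the value of p never increases the p-norm, and decreases it whenever L contains more than one tuple with a nonzero price. The data seller can use the p-norm to adjust prices for different categories of data consumers.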
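As a small worked example of this definition, the sketch below computes the p-norm price of each minimal provenance and takes the minimum. The tuple identifiers, the prices, and the function names are hypothetical; the minimal provenances are assumed to be given.

```python
def p_norm(prices, p):
    """p-norm of a list of non-negative tuple prices."""
    return sum(x ** p for x in prices) ** (1.0 / p)

def query_price(minimal_provenances, tuple_price, p=1):
    """Price of a query: the cheapest minimal provenance under the p-norm."""
    return min(
        p_norm([tuple_price[t] for t in L], p) for L in minimal_provenances
    )

# Hypothetical prices and minimal provenances of some query result.
tuple_price = {"t1": 3, "t2": 4, "t3": 6}
provenances = [{"t1", "t2"}, {"t3"}]

print(query_price(provenances, tuple_price, p=1))  # 6.0, from {t3}: 3 + 4 = 7 > 6
print(query_price(provenances, tuple_price, p=2))  # 5.0, from {t1, t2}: sqrt(9 + 16) = 5 < 6
```

With p = 1 the price is a plain sum, so the single tuple t3 is cheapest. With p = 2 the two-tuple provenance becomes cheaper, which shows that the choice of p can change both the price and the provenance that realizes it.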

10 Algorithms for price computation
We assume that, for each result tuple, its set of minimal provenances is available. We aim to find the cheapest minimal provenance of the set of result tuples. We prove that this problem is NP-hard.
Exact algorithm: enumerate all the provenances of the query result (there are exponentially many) and choose the cheapest one.
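A minimal sketch of such an exhaustive search follows. It assumes one particular way of enumerating candidates: pick one minimal provenance per result tuple and take the union of the picks. The function name, the input layout (a list of lists of frozensets), and the sample prices are hypothetical.

```python
from itertools import product

def exact_price(provenances_per_tuple, tuple_price, p=1):
    """Try every combination of one minimal provenance per result tuple.

    The number of combinations is the product of the list lengths,
    so the running time is exponential in the number of result tuples.
    """
    best_price, best_set = float("inf"), None
    for choice in product(*provenances_per_tuple):
        union = frozenset().union(*choice)  # each source tuple is charged once
        price = sum(tuple_price[t] ** p for t in union) ** (1.0 / p)
        if price < best_price:
            best_price, best_set = price, union
    return best_price, best_set

# Hypothetical instance with two result tuples.
tuple_price = {"a": 5, "b": 1, "c": 2, "d": 1}
provenances_per_tuple = [
    [frozenset({"a"}), frozenset({"b", "c"})],       # result tuple r1
    [frozenset({"a", "d"}), frozenset({"c"})],       # result tuple r2
]
price, chosen = exact_price(provenances_per_tuple, tuple_price)
print(price, sorted(chosen))  # 3.0 ['b', 'c']
```

The four candidate unions cost 6, 7, 9, and 3, so the search returns {b, c}. Sharing tuple c between the two result tuples is what makes this combination the cheapest.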

11 Approximation algorithms
We devise several approximation algorithms. For the worst case, Khanna et al. [Khanna et al., 2000] prove that this problem is approximable only within a factor polynomial in the size of the input.

12 Approximation algorithms
Heuristic 1: choose the cheapest minimal provenance for each individual result tuple independently (greedy).
Heuristic 2: choose the minimal provenance with the lowest average price for each individual result tuple independently (greedy).
Heuristic 3: Heuristic 1, but taking previous choices into account (semi-greedy).
Heuristic 4: Heuristic 2, but taking previous choices into account (semi-greedy).
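The four heuristics can be sketched with a single function and two flags. This is one possible reading, not necessarily the authors' implementation: "consider previous choices" is assumed to mean that tuples already bought for earlier result tuples cost nothing, "average price" is assumed to be the payable cost divided by the number of tuples in the provenance, and prices are summed (p = 1). All names and the sample data are hypothetical.

```python
def approximate_price(provenances_per_tuple, tuple_price,
                      use_average=False, reuse_previous=False):
    """Greedy choice of one minimal provenance per result tuple.

    use_average=False, reuse_previous=False -> Heuristic 1
    use_average=True,  reuse_previous=False -> Heuristic 2
    use_average=False, reuse_previous=True  -> Heuristic 3
    use_average=True,  reuse_previous=True  -> Heuristic 4
    Provenances are assumed to be non-empty sets.
    """
    bought = set()
    for candidates in provenances_per_tuple:
        def score(L):
            # Assumption: already bought tuples are free in the semi-greedy variants.
            payable = L - bought if reuse_previous else L
            cost = sum(tuple_price[t] for t in payable)
            return cost / len(L) if use_average else cost
        bought |= min(candidates, key=score)
    # Each source tuple is charged once, however many provenances contain it.
    return sum(tuple_price[t] for t in bought), bought

# Hypothetical instance: the exact price is 6, realized by {a, b}.
tuple_price = {"a": 5, "b": 1, "c": 2}
provenances_per_tuple = [
    [frozenset({"a"})],                              # result tuple r1
    [frozenset({"a", "b"}), frozenset({"c"})],       # result tuple r2
]
print(approximate_price(provenances_per_tuple, tuple_price)[0])                       # 7
print(approximate_price(provenances_per_tuple, tuple_price, reuse_previous=True)[0])  # 6
```

On this instance the independent choice picks {c} for r2 because it looks cheaper in isolation, and pays 7 in total. The semi-greedy variant notices that a is already paid for, picks {a, b}, and reaches the exact price of 6.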


14 Experiments
Effectiveness: the ratio between the approximate price and the exact price.
Efficiency: the running time of the approximation algorithms.
Setup: The number of result tuples is 10 for measuring effectiveness (the ratio in the worst case is 10). The number of result tuples varies from 1,000 to 5,000 for measuring efficiency. For each result tuple, the number of minimal provenances and the size of each minimal provenance are sampled uniformly from [1,5].
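A synthetic instance following this setup could be generated roughly as below. Only the two uniform ranges [1,5] come from the setup; the size of the source-tuple pool, the price range, the seed, and the function name are assumptions made for illustration. As a simplification, the sketch does not check that the sampled sets for one result tuple are minimal with respect to each other.

```python
import random

def random_instance(num_result_tuples, num_source_tuples=50,
                    max_price=100, seed=0):
    """Generate random minimal provenances per result tuple and tuple prices.

    num_source_tuples, max_price, and seed are hypothetical parameters.
    """
    rng = random.Random(seed)
    # Assumed: integer prices drawn uniformly from [1, max_price].
    tuple_price = {t: rng.randint(1, max_price)
                   for t in range(num_source_tuples)}
    provenances_per_tuple = []
    for _ in range(num_result_tuples):
        count = rng.randint(1, 5)  # number of minimal provenances
        provenances_per_tuple.append([
            # Size of each minimal provenance is also uniform in [1, 5].
            frozenset(rng.sample(range(num_source_tuples), rng.randint(1, 5)))
            for _ in range(count)
        ])
    return provenances_per_tuple, tuple_price

provenances_per_tuple, tuple_price = random_instance(10)
print(len(provenances_per_tuple), len(tuple_price))  # 10 50
```

An effectiveness run would then divide the price returned by a heuristic by the exact price on instances with 10 result tuples, while an efficiency run would time the heuristics on instances with 1,000 to 5,000 result tuples.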

15 Effectiveness (50,000 runs)

16 Efficiency

17 Conclusion
We propose a framework for pricing queries based on the source tuples that contribute to the query result. The price of a query is the price of the cheapest minimal provenance of the query result. We propose a baseline algorithm to compute the exact price of a query and four heuristics to compute an approximate price. We conduct experiments to show the effectiveness and efficiency of the heuristics.

