1 Monotone Estimation Framework and Applications for Scalable Analytics of Large Data Sets
Edith Cohen, Google Research and Tel Aviv University

2 A Monotone Sampling Scheme
Random seed u ∼ U[0,1]; data domain V ⊂ ℝ^d; data vector v ∈ V.
Outcome S(v,u): a function of the data v and the seed u.
The seed value u is available together with the outcome.
S(v,u) can be interpreted as the set of all data vectors consistent with the outcome and u.
Sampling: the randomization is captured entirely by the seed u.
Monotonicity: fixing v, the information in S(v,u) is non-increasing with u (a lower seed reveals more).
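A minimal sketch (hypothetical names, assuming a PPS-style threshold scheme with threshold τ = 100) of one concrete monotone sampling scheme: the outcome reveals exactly the entries of v that are at least τ·u, so a lower seed reveals weakly more.

```python
import random

def S(v, u, tau=100.0):
    """Outcome of a PPS-style monotone sampling scheme: reveal entry i of the
    data vector v iff v[i] >= tau * u. A lower seed reveals weakly more entries,
    so fixing v, the information in S(v, u) is non-increasing in u."""
    return {i: x for i, x in enumerate(v) if x >= tau * u}

v = (24.0, 15.0)
u = random.random()      # the seed is available together with the outcome
print(u, S(v, u))        # e.g. u = 0.16 reveals only v[0] = 24.0
```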

3 Monotone Estimation Problem (MEP)
A monotone sampling scheme (V, S): data domain V ⊂ ℝ^d and sampling scheme S: V × [0,1].
A nonnegative function f ≥ 0 on V.
Goal: estimate f(v) from the sample, i.e., specify an estimator f̂(S(v,u), u).
Desired properties of the estimator f̂:
Unbiased: for all v, ∫_0^1 f̂(S(v,x), x) dx = f(v) (important because we use sums of many independent MEPs).
Nonnegative: keeps the estimate f̂ in the same range as f.
(Pareto) optimal, i.e., admissible: any estimator with smaller Var_{u∼U}[f̂(S(v,u), u)] for some v has, for some v′, larger Var_{u∼U}[f̂(S(v′,u), u)]. Each v has its own variance, and we cannot improve it for one v without hurting another.

4 Bounds on f(v) from S and u
Fix the data v. The lower the seed u is, the more we know about v and hence about f(v).
[Plot: the information (lower bound) we have on f(v), as a function of the seed u ∈ (0,1].]
If the information does not converge to f(v) as u → 0 (for all v), we cannot hope for an unbiased estimator; if the convergence is not fast enough, we cannot bound the variance.

5 Estimators for MEPs: Unbiased, Nonnegative, Bounded variance
Admissible: "Pareto optimal" in terms of variance.
Results preview:
Explicit expressions for estimators, for any MEP for which such an estimator exists.
The solution is not unique; we consider estimators with natural properties. If we also require monotonicity (f̂(S,u) non-increasing with u), we get uniqueness.
A notion of "competitiveness" of estimators.
We will come back to these results, but first look at some applications.

6 MEP applications in data analysis
Scalable computation of approximate statistics and queries over large data sets: the data is sampled (a composable, distributed scheme) and the sample is used to estimate statistics/queries expressed as a sum of multiple MEPs.
Key-value pairs with multiple sets of values (instances): take coordinated samples of the instances; we get a MEP for each key.
Sketching graph-based influence and similarity functions: "distance"-sketch the utility values (the relation of a node to all others); we get a MEP for each target node from the sketches of the seed nodes.
Sketching generalized coverage functions: a coordinated weighted sample of the "utility" vector of each element; a MEP for each item.
Next: some motivation and the relation to coordinated sampling.

7 Social/Communication data
An activity value v(x) is associated with each key x = (b,c), e.g., the number of messages sent from b to c.
Take a weighted sample of the keys, for example bottom-k ("weighted reservoir") or PPS (Probability Proportional to Size).
PPS with threshold τ > 0 and i.i.d. seeds u(x) ∼ U[0,1]: x ∈ S ⟺ v(x) ≥ τ·u(x).
With bottom-k, τ is set to obtain a fixed sample size k.
Without-replacement sampling: x ∈ S ⟺ v(x) ≥ −τ·ln u(x).
[Table: Monday activity per key, e.g., (a,b) 40, (h,c) 20, (a,z) 10, …, (h,f) 10; Monday sample: (a,b) 40, (a,z) 10, …, (f,s) 10.]
The sampling scheme is fully composable.
The example is social data, but this can be any weighted data set of key-value pairs.
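A minimal sketch (hypothetical function and variable names) of PPS sampling of a single day's activity: i.i.d. per-key seeds u(x) are drawn, and key x enters the sample iff v(x) ≥ τ·u(x); the seeds are kept because the estimators need them.

```python
import random

def pps_sample(activity, tau, seeds=None):
    """PPS sample: key x is included iff activity[x] >= tau * seeds[x].
    Seeds are drawn i.i.d. U[0,1] if not supplied, and returned with the sample."""
    if seeds is None:
        seeds = {x: random.random() for x in activity}
    sample = {x: v for x, v in activity.items() if v >= tau * seeds[x]}
    return sample, seeds

monday = {('a', 'b'): 40, ('h', 'c'): 20, ('a', 'z'): 10, ('h', 'f'): 10}
monday_sample, seeds = pps_sample(monday, tau=100)
```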

8 Samples of multiple days
Coordinated samples: the values differ across days, but each key is sampled with the same seed u(x) in every day.
[Table: Monday, Tuesday and Wednesday activity per key and the corresponding samples; e.g., Monday sample (a,b) 40, (a,z) 10, …, (f,s) 10; Tuesday sample (g,c), (a,z) 50, …, (g,h); Wednesday sample (a,b) 30, (b,f) 20, …, (d,h) 10.]
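A sketch of coordination (hypothetical names): one seed per key, shared across all days, so a key with a small seed tends to be sampled in every day in which it is active.

```python
import random

def coordinated_samples(instances, tau):
    """instances: dict day -> {key: value}. Draw one seed per key and reuse it
    for every day, so the per-day PPS samples are coordinated."""
    keys = set().union(*(inst.keys() for inst in instances.values()))
    seeds = {x: random.random() for x in keys}      # same u(x) in all instances
    samples = {day: {x: v for x, v in inst.items() if v >= tau * seeds[x]}
               for day, inst in instances.items()}
    return samples, seeds

days = {'Mon': {('a', 'b'): 40, ('a', 'z'): 10},
        'Tue': {('a', 'b'): 3,  ('a', 'z'): 50}}
samples, seeds = coordinated_samples(days, tau=100)
```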

9 Matrix view: keys × instances
In our example, the keys x = (a,b) are user pairs and the instances are days.
[Table: activity matrix over keys (a,b), (g,c), (h,c), (a,z), (h,f), (f,s), (d,h) and days Su–Sa, with sparse activity values (e.g., 40, 30, 10, 43, 55, 20 for key (a,b)); column alignment was lost in extraction.]

10 Example statistics
Specify a segment of the keys Y ⊂ X, for example: one user in CA and one in NY, or apple device to android.
Queries/statistics have the form Σ_{x∈Y} f(v₁(x), v₂(x), …, v_d(x)):
Total communication of the segment on Wednesday: Σ_{x∈Y} v₁(x).
L_p^p distance / weighted Jaccard (change in activity of the segment between Friday and Saturday): Σ_{x∈Y} |v₁(x) − v₂(x)|^p.
L_p^p increase/decrease: Σ_{x∈Y} max{0, v₁(x) − v₂(x)}^p.
Coverage of segment Y in a set of days D: Σ_{x∈Y} max_{i∈D} v_i(x).
Average/sum of median/max/min/top-3/concave aggregates of the activity values over days D.
We would like to compute an estimate from the sample.
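As a point of reference, these statistics are straightforward to compute exactly when the full matrix is available; a small sketch (hypothetical names and data) of the per-key functions f used above:

```python
def f_lpp(v1, v2, p=1):
    """|v1 - v2|^p: per-key term of the L_p^p distance."""
    return abs(v1 - v2) ** p

def f_increase(v1, v2, p=1):
    """max(0, v1 - v2)^p: per-key term of the L_p^p increase."""
    return max(0.0, v1 - v2) ** p

def f_coverage(*values):
    """max over the chosen days D: per-key term of the coverage of Y in D."""
    return max(values)

# exact statistic over a segment Y, given the full data {key: (v_Fri, v_Sat)}:
data = {('a', 'b'): (55.0, 20.0), ('a', 'z'): (15.0, 7.0)}
print(sum(f_lpp(*data[x]) for x in data))   # L_1 change between the two days
```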

11 Matrix view: keys × instances
Coordinated PPS sample with τ = 100 for all entries.
[Table: the activity matrix with the sampled entries marked; the per-key seeds u (e.g., 0.33, 0.22, 0.82, 0.16, 0.92, 0.77) are shown alongside.]
Sampling a set of keys and then using the same set each day would not work well for some functions.
In general τ can differ per entry; here each column (day) is PPS sampled.
Coordination (using the same u for the same key in all instances): a key that is sampled in one instance (has a small u) is more likely to be sampled in the others.

12 Estimate sum statistics, one key at a time
π‘₯βˆˆπ‘Œ 𝑓(𝒗(𝒙)) Sum over keys π‘₯βˆˆπ‘Œ of 𝑓(𝒗 π‘₯ ), where 𝒗(π‘₯)=( 𝑣 1 π‘₯ , 𝑣 2 (π‘₯)…) For 𝐿 𝑝 distance: 𝑓(𝒗)= |𝑣 1 βˆ’ 𝑣 2 | 𝑝 Estimate one key at a time: π‘₯βˆˆπ‘Œ 𝑓 (𝑆(𝑣(π‘₯)) The estimator for 𝑓(𝒗) is applied to the sample of 𝒗

13 Easy statistics: sums over entries. Estimate a single entry at a time
Example: total communication of segment Y on Monday.
Inverse-probability estimate (Horvitz-Thompson) [HT52]: sum, over sampled x ∈ Y, of v_monday(x) / p_monday(x).
The inclusion probability p_i(x) can be computed from v_i(x) and τ_i: since x ∈ S ⟺ v_i(x) ≥ τ_i·u(x), we have p_i(x) = Pr_{u∼U}[v_i(x) ≥ τ_i·u] = min{1, v_i(x)/τ_i}.
To apply this we need to know the inclusion probability; when we know v and u we can compute it. The estimator is unbiased. An example follows.
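A minimal sketch (hypothetical names) of the single-instance inverse-probability estimate from a PPS sample with threshold τ: unsampled keys contribute 0, and a sampled key contributes v/p with p = min(1, v/τ).

```python
def ht_segment_total(sample, segment, tau):
    """Inverse-probability (Horvitz-Thompson) estimate of sum_{x in segment} v(x)
    from a PPS sample with threshold tau.
    Keys not sampled contribute 0; a sampled key contributes v / p, p = min(1, v/tau)."""
    total = 0.0
    for x, v in sample.items():
        if x in segment:
            p = min(1.0, v / tau)
            total += v / p
    return total
```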

14 HT estimator (single-instance)
Coordinated PPS sample with τ = 100.
[Table: the sampled data, with the sampled entries marked and the per-key seeds u (0.33, 0.22, 0.82, 0.14, 0.92, 0.16, 0.77) shown alongside.]

15 HT estimator (single-instance)
τ = 100. Day: Wednesday. Segment: CA-NY.
[Table: the same matrix, with the entries selected by the query highlighted.]
We only see the sample; within our selection there are two sampled entries.

16 HT estimator for single-instance
τ = 100. Day: Wednesday. Segment: CA-NY.
Wednesday column: (a,b) 43, (h,c) 60, (a,z) 24, (h,f) 3, (f,s) 20. Exact segment total: 43 + 60 + 20 = 123.
The HT estimate is 0 for keys that are not sampled and v/p when a key is sampled.
Two entries in our selection are sampled: (a,b) with v = 43, p = 0.43 and (f,s) with v = 20, p = 0.20.
HT estimate: 43/0.43 + 20/0.20 = 100 + 100 = 200.
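A self-contained check of the worked example (hypothetical variable names): each sampled entry in the segment contributes v/p with p = min(1, v/τ).

```python
tau = 100
wednesday_sample = {('a', 'b'): 43.0, ('f', 's'): 20.0}   # sampled entries in the CA-NY segment
estimate = sum(v / min(1.0, v / tau) for v in wednesday_sample.values())
print(estimate)   # 43/0.43 + 20/0.20 = 200.0
```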

17 Inverse-Probability (HT) estimator
Unbiased: (1 − p(x))·0 + p(x)·f(v(x))/p(x) = f(v(x)).
Nonnegative: v(x) ≥ 0, so v(x)/p(x) ≥ 0.
Bounded variance (for all v).
Monotone: more information ⇒ higher estimate.
Optimal: UMVU, the unique minimum-variance (unbiased, nonnegative) estimator for sums.
This works when f depends on a single entry. What about general f?

18 Queries involving multiple columns
L_p^p distance: f(v) = |v₁ − v₂|^p; L_p^p increase: f(v) = max{0, v₁ − v₂}^p.
The HT estimate is positive only when f(v) = |v₁ − v₂|^p is determined by the sample. But when v₂ = 0 and v₁ > 0 we have f(v) > 0, yet the sample never reveals f(v), because the second entry is never sampled. Thus HT is biased here.
Even when it is unbiased, HT may not be optimal: e.g., when v₁ is sampled and we can deduce from τ₂ and u that v₂ ≤ a < v₁, we know that f(v) ≥ v₁ − a. An optimal estimator will use this incomplete information.
We want estimators with the same nice properties as HT, plus optimality: even when all v_i > 0 guarantee positive inclusion probabilities, we still want to do something useful when the information is incomplete.

19 Sampled data. Coordinated PPS sample, τ = 100
[Table: the sampled activity matrix with the per-key seeds u (0.33, 0.22, 0.82, 0.16, 0.92, 0.77).]
Query: the distance between Wednesday and Thursday over pairs with source "a".
We want to estimate (55 − 43)² + (24 − 15)².
Let us look at the key (a,z) and at estimating (24 − 15)².

20 Information on f
Fix the data v. The lower u is, the more we know about v and about f(v) = (24 − 15)² = 81.
We plot the lower bound we have on f(v) as a function of the seed u.
[Plot: the lower bound is 0 for u ≥ 0.24, increases as u drops below 0.24 (where v₁ = 24 is revealed), and equals 81 for u < 0.15 (where v₂ = 15 is also revealed).]

21 This is a MEP! (Monotone Estimation Problem)
A monotone sampling scheme (V, S): data domain V ⊂ ℝ^d, here v = (v₁, v₂) with v₁, v₂ ≥ 0.
Sampling scheme S: V × [0,1]; here S((v₁, v₂), u) reveals v_i when v_i > 100·u.
A nonnegative function f ≥ 0 on V, here f(v) = (v₁ − v₂)².
Goal: estimate f(v), i.e., specify an estimator f̂(S,u) that is unbiased, nonnegative, with bounded variance, and admissible (optimal).
The solution is not unique.
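A small sketch of the lower-bound function of this particular MEP (assuming the scheme above with τ = 100 and the (a,z) data (24, 15)): minimize f over all data consistent with the outcome; an unrevealed entry is only known to lie in [0, τ·u).

```python
def lower_bound(u, v=(24.0, 15.0), tau=100.0):
    """Lower bound on f(v) = (v1 - v2)^2 given the outcome S(v, u):
    entry i is revealed iff v[i] > tau * u; an unrevealed entry is only
    known to lie in [0, tau * u)."""
    v1, v2 = v
    seen1, seen2 = v1 > tau * u, v2 > tau * u
    if seen1 and seen2:
        return (v1 - v2) ** 2            # f is fully determined
    if seen1:                            # v2 unknown, can be as close as tau*u to v1
        return max(0.0, v1 - tau * u) ** 2
    if seen2:
        return max(0.0, v2 - tau * u) ** 2
    return 0.0                           # nothing revealed: v1 = v2 is consistent

# e.g. lower_bound(0.16) ≈ 64 = (24 - 16)^2 and lower_bound(0.1) == 81.0
```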

22 The optimal (admissible) range
[Plot: the cumulative estimate ("Cum. Est") for x > u.]
We see S(v,u) and u, so we also know what S(v,x) is for every x > u.
Suppose we fixed M = ∫_u^1 f̂(S(v,x), x) dx.

23 MEP estimators
Order-optimal estimators: for an order ≺ on the data domain V, any estimator with lower variance on v must have higher variance on some z ≺ v.
The L* estimator: the unique admissible monotone estimator; order-optimal for z ≺ v ⟺ f(z) < f(v); 4-variance-competitive (defined shortly).
The U* estimator: order-optimal for z ≺ v ⟺ f(z) > f(v).
The L* estimator stays as low as possible in the optimal range (it optimizes for the consistent data with the lowest f); the U* estimator goes as high as possible.
The choice of estimator depends on the properties we want, possibly on the typical data distribution. L* is a good default (monotone and competitive).

24 Variance Competitiveness [CK13]
A worst-case (over data) theoretical indicator of estimator quality.
For each v, consider the minimum of E_{u∼U[0,1]}[f̂²(S(v,u), u)] attainable by an estimator that is unbiased and nonnegative for all other v′ as well. We use this estimator, optimal for v, as a reference point.
An estimator f̂(S,u) is c-competitive if for any data v its expected square is within a factor c of the minimum possible for v (by an unbiased, nonnegative estimator): for all unbiased nonnegative ĝ, E_{u∼U[0,1]}[f̂²(S(v,u), u)] ≤ c·E_{u∼U[0,1]}[ĝ²(S(v,u), u)].
The L* estimator is 4-competitive, and this is tight: the supremum of the ratio over all MEPs is 4. On many MEPs it is better.

25 Optimal estimator f̂^(v) for data v (unbiased and nonnegative for all data)
The optimal estimates f̂^(v)(S,u) are the negated derivative of the lower hull of the lower-bound function.
[Plot: f̂(S,u) versus u ∈ (0,1], showing the lower-bound function for v and its lower hull.]
Intuition: the lower bound tells us, on outcome S, how "high" we can go with the estimate, in order to optimize the variance for v while remaining nonnegative on all other consistent data vectors.
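A numeric sketch of this recipe (hypothetical helper names, discretized seeds) for the (24, 15), τ = 100 example: tabulate the lower-bound function on a grid, take its lower convex hull as a function of u, and read the optimal estimate off the negated slope of the hull.

```python
def lower_bound(u, v=(24.0, 15.0), tau=100.0):
    """Lower bound on f = (v1 - v2)^2 given the outcome at seed u."""
    v1, v2 = v
    if v1 > tau * u and v2 > tau * u:
        return (v1 - v2) ** 2
    if v1 > tau * u:
        return max(0.0, v1 - tau * u) ** 2
    return 0.0

def lower_hull(points):
    """Lower convex hull of points sorted by increasing u (Andrew's monotone chain)."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop hull[-1] if it lies on or above the segment from hull[-2] to p
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

grid = [i / 1000 for i in range(1, 1001)]
hull = lower_hull([(u, lower_bound(u)) for u in grid])

def optimal_estimate(u):
    """Negated slope of the hull segment containing u (the optimal-for-v estimate)."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= u <= x2:
            return -(y2 - y1) / (x2 - x1)
    return 0.0

print(optimal_estimate(0.16))   # estimate on the outcome actually observed for (a,z)
```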

26 The L* estimator
We see S(v,u) and u, so we know what S(v,x) is for all x > u. Writing f_low(S(v,x), x) for the lower bound on f given the outcome at seed x, the L* estimate is
f̂(S(v,u), u) = f_low(S(v,u), u)/u − ∫_u^1 f_low(S(v,x), x)/x² dx.
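A numeric sketch of this formula on the same (24, 15), τ = 100 example (hypothetical names; f_low is the lower_bound helper and the integral is a simple Riemann sum); averaging over many random seeds should come out close to f(v) = 81, illustrating unbiasedness.

```python
import random

def lower_bound(u, v=(24.0, 15.0), tau=100.0):
    """Lower bound on f = (v1 - v2)^2 given the outcome at seed u (entry i revealed iff v_i > tau*u)."""
    v1, v2 = v
    if v1 > tau * u and v2 > tau * u:
        return (v1 - v2) ** 2
    if v1 > tau * u:
        return max(0.0, v1 - tau * u) ** 2
    return 0.0

def l_star(u, n=2000):
    """L* estimate LB(u)/u - integral_u^1 LB(x)/x^2 dx, rearranged (using
    integral_u^1 dx/x^2 = 1/u - 1) as LB(u) + integral_u^1 (LB(u) - LB(x))/x^2 dx
    to avoid cancellation; the integral is a midpoint Riemann sum."""
    lb_u = lower_bound(u)
    step = (1.0 - u) / n
    integral = sum((lb_u - lower_bound(u + (i + 0.5) * step)) / (u + (i + 0.5) * step) ** 2
                   for i in range(n)) * step
    return lb_u + integral

# sanity check of unbiasedness: the average over random seeds should be close to f(v) = 81
print(sum(l_star(random.random()) for _ in range(4000)) / 4000)
```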

27 L₁ estimation example. Estimators for f(v₁, v₂) = |v₁ − v₂|
Scheme: v_i ≥ 0 is sampled if v_i > u.
[Plot: the "lower bound" (LB) on f at (0.6, ·) from S and u; the lower hull of LB; and the L*, U*, and optimal-for-v estimators.]
U* is optimized for the vector (0.6, 0.0), which is always consistent with S; L* is optimized locally for the vector (0.6, u), the consistent vector with the smallest f.

28 L₂² estimation example. Estimators for f(v₁, v₂) = (v₁ − v₂)²
Scheme: v_i ≥ 0 is sampled if v_i > u.
[Plot: the "lower bound" (LB) on f at (0.6, ·) from S and u; the lower hull of LB; and the L*, U*, and optimal-for-v estimators.]
U* is optimized for the vector (0.6, 0.0), which is always consistent with S; L* is optimized locally for the vector (0.6, u), the consistent vector with the smallest f.

29 Summary
Defined Monotone Estimation Problems (MEPs), motivated by coordinated sampling.
Derived Pareto-optimal (admissible) unbiased and nonnegative estimators, for any MEP for which they exist: L* (lower end of the range; the unique monotone estimator; dominates HT), U* (upper end of the range), and order-optimal estimators (optimized for certain data patterns).

30 Applications
Estimators for Euclidean and Manhattan distances from samples [C KDD '14].
Sketch-based closeness similarity in social networks [CDFGGW COSN '13] (similarity of the sets of long-range interactions).
Sketching generalized coverage functions, including graph-based influence functions [CDPW '14, C '16].

31 Future work
Tighter bounds on the universal ratio: L* is 4-competitive; a better ratio is achievable; the known lower bound is 1.44.
Instance-optimal competitiveness: give an efficient construction for any MEP.
Multi-dimensional MEPs: multiple independent seeds (independent samples of "columns"); some initial derivations for d = 2 and for coverage and distance functions [CK 12, C 14], but the full picture is missing.

32 L₁ distance [C KDD '14]: independent vs. coordinated PPS sampling
#IP flows to a destination in two time periods

33 L₂² distance [C KDD '14]: independent vs. coordinated PPS sampling
Surname occurrences in 2007, 2008 books (Google ngrams)

34 Thank you!

35 Coordination of samples
A very powerful tool for big-data analysis, with applications well beyond what [Brewer, Early, Joyce 1972] could envision:
Locality Sensitive Hashing (LSH): similar weight vectors have similar samples/sketches (Brewer et al. did not call it that).
Multi-objective (universal) samples: a single sample, as small as possible, that provides statistical guarantees for multiple sets of weights.
Statistics/domain queries that span multiple instances (Jaccard similarity, L_p distances, distinct counts, union size, …). MinHash sketches are a special case with 0/1 weights.
Faster computation of samples. Example [C '97]: sketching/sampling the reachability sets and neighborhoods of all nodes in a graph in near-linear time.
The weights can be anything that changes across instances (e.g., Pokémon changing sizes, a salary, the size of the IP flow between a source-destination pair, the trading amount of a stock, or changes in gradients).

