Monotone Estimation Framework and Applications for Scalable Analytics of Large Data Sets
Edith Cohen, Google Research and Tel Aviv University
A Monotone Sampling Scheme
Random seed u ~ U[0,1]; data domain V ⊂ R^d; data vector v ∈ V.
Outcome S(v,u): a function of the data v and the seed u. The seed value u is available together with the outcome. S(v,u) can be interpreted as the set of all data vectors consistent with the outcome and u.
What is monotone sampling / monotone estimation? Sampling: the randomization is captured by the seed u. Monotonicity: fixing v, the information in S(v,u) is non-increasing with u (a lower seed reveals at least as much about v).
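As a concrete illustration, a minimal sketch (not from the talk; the function name S, the tuple data, and τ = 100 are assumptions) of one monotone sampling scheme in Python: the outcome reveals exactly the coordinates with v_i ≥ τ·u, so a lower seed reveals weakly more, which is the monotonicity property.

```python
# A minimal sketch of one monotone sampling scheme (illustrative; tau and the data are assumptions).
# The outcome S(v, u) reveals coordinate i exactly when v[i] >= tau * u, so if u' <= u
# then every coordinate revealed at seed u is also revealed at seed u'.

def S(v, u, tau=100.0):
    """Outcome of the scheme: the coordinates of v revealed at seed u."""
    return {i: vi for i, vi in enumerate(v) if vi >= tau * u}

v = (24.0, 15.0)
print(S(v, 0.30))   # {}                   nothing revealed
print(S(v, 0.20))   # {0: 24.0}            only v[0] revealed
print(S(v, 0.10))   # {0: 24.0, 1: 15.0}   everything revealed
```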
Monotone Estimation Problem (MEP)
A monotone sampling scheme (V, S): a data domain V ⊂ R^d and a sampling scheme S: V × [0,1] → outcomes, together with a nonnegative function f: V → R_{≥0}.
Goal: estimate f(v) from the sample. Data v, sample S(v,u), question: f(v)? We specify an estimator $\hat f(S(v,u),u)$.
Desired properties of the estimator $\hat f(S)$:
Unbiased (useful with sums of MEPs): for all v, $\int_0^1 \hat f(S(v,x),x)\,dx = f(v)$.
Nonnegative: keep the estimate $\hat f$ in the same domain as f.
(Pareto) "optimal" (admissible): any estimator with smaller $\mathrm{var}_{u\sim U[0,1]}[\hat f(S(v,u))]$ on some v has, for some v', larger $\mathrm{var}_{u\sim U[0,1]}[\hat f(S(v',u))]$.
Unbiased, because we will be using sums of many independent MEPs. Nonnegative: same domain as f. Pareto optimal: each v has a different variance; we cannot improve for one v without hurting another.
Bounds on f(v) from S and u
Fix the data v. The lower the seed u is, the more we know about v and hence about f(v).
[Figure: the information we have on f(v), plotted against the seed u ∈ (0,1].]
If we do not have convergence to f(v) as u goes to 0 (for all v), we cannot hope for an unbiased estimator. If the convergence is not "fast" enough, we cannot bound the variance.
Estimators for MEPs
Unbiased, nonnegative, bounded variance; admissible: "Pareto optimal" in terms of variance.
Results preview: explicit expressions for estimators, for any MEP for which such an estimator exists.
The solution is not unique. We consider some estimators with natural properties; if we also require monotonicity (the estimate $\hat f(S,u)$ is non-increasing with u), we get uniqueness.
A notion of "competitiveness" of estimators.
We will come back to this, but first we see some applications.
MEP applications in data analysis
Scalable computation of approximate statistics and queries over large data sets: the data is sampled (a composable, distributed scheme) and the sample is used to estimate statistics/queries expressed as a sum of multiple MEPs.
Key-value pairs with multiple sets of values (instances): take coordinated samples of the instances; we get a MEP for each key.
Sketching graph-based influence and similarity functions: "distance"-sketch the utility values (the relation of a node to all others); we get a MEP for each "target" node from the sketches of the seed nodes.
Sketching generalized coverage functions: a coordinated weighted sample of the "utility" vector of each element; a MEP for each item.
We give some motivation and explain the relation to coordinated sampling.
Social/Communication data
An activity value v(x) is associated with each key x = (b,c) (e.g., the number of messages, communication from b to c).
Take a weighted sample of keys, for example bottom-k ("weighted reservoir") or PPS (Probability Proportional to Size).
For τ > 0 and i.i.d. u(x) ~ U[0,1]: x ∈ S ⟺ v(x) ≥ τ · u(x).
Monday activity: (a,b) 40, (f,g) 5, (h,c) 20, (a,z) 10, ..., (h,f) 10, (f,s) 10. Monday sample: (a,b) 40, (a,z) 10, ..., (f,s) 10.
With bottom-k, τ is set to obtain a fixed sample size k. Without-replacement sampling: x ∈ S ⟺ v(x) ≥ −τ ln u(x).
This is a fully composable sampling scheme.
The example uses social data, but it can be any weighted data set of key-value pairs.
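A small sketch (illustrative helper names and data, not the talk's code) of the two sampling rules above: PPS threshold sampling with a fixed τ, and bottom-k sampling where τ is derived from the (k+1)-th smallest rank u(x)/v(x) so that exactly k keys satisfy v(x) ≥ τ·u(x).

```python
import random

def pps_sample(values, tau, seeds):
    """PPS threshold sample: key x is sampled iff values[x] >= tau * seeds[x]."""
    return {x: v for x, v in values.items() if v >= tau * seeds[x]}

def bottom_k_sample(values, k, seeds):
    """Bottom-k sample: the k keys with smallest rank u(x)/v(x); the effective
    threshold tau is taken from the (k+1)-th smallest rank, so exactly k keys pass."""
    ranked = sorted(values, key=lambda x: seeds[x] / values[x])
    tau = values[ranked[k]] / seeds[ranked[k]]
    return {x: values[x] for x in ranked[:k]}, tau

monday = {('a','b'): 40, ('f','g'): 5, ('h','c'): 20, ('a','z'): 10,
          ('h','f'): 10, ('f','s'): 10}
seeds = {x: random.random() for x in monday}      # one seed u(x) per key
print(pps_sample(monday, tau=30, seeds=seeds))
print(bottom_k_sample(monday, k=3, seeds=seeds))
```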
Samples of multiple days
Coordinated samples: different values on different days; each key is sampled with the same seed u(x) on every day.
Monday activity: (a,b) 40, (f,g) 5, (h,c) 20, (a,z) 10, ..., (h,f) 10, (f,s) 10. Monday sample: (a,b) 40, (a,z) 10, ..., (f,s) 10.
Tuesday activity: (a,b) 3, (f,g) 5, (g,c) 10, (a,z) 50, ..., (s,f) 20, (g,h) 10. Tuesday sample: (g,c), (a,z) 50, ..., (g,h).
Wednesday activity: (a,b) 30, (g,c) 5, (h,c) 10, (a,z) 10, ..., (b,f) 20, (d,h) 10. Wednesday sample: (a,b) 30, (b,f) 20, ..., (d,h) 10.
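A sketch of the coordination itself (illustrative; the per-day tables are a subset of the example above): the same per-key seed u(x) is reused for every day, so a key with a small seed tends to appear in the sample of every day on which it is active, unlike with independent per-day seeds.

```python
import random

def pps_sample(values, tau, seeds):
    # Key x is sampled iff values[x] >= tau * seeds[x] (as in the sketch above).
    return {x: v for x, v in values.items() if v >= tau * seeds[x]}

days = {
    'Mon': {('a','b'): 40, ('h','c'): 20, ('a','z'): 10, ('f','s'): 10},
    'Tue': {('a','b'): 3,  ('g','c'): 10, ('a','z'): 50, ('g','h'): 10},
    'Wed': {('a','b'): 30, ('h','c'): 10, ('a','z'): 10, ('d','h'): 10},
}
keys = set().union(*days.values())

shared = {x: random.random() for x in keys}        # coordinated: one shared seed per key
coordinated = {d: pps_sample(v, 30, shared) for d, v in days.items()}

independent = {d: pps_sample(v, 30, {x: random.random() for x in keys})
               for d, v in days.items()}           # independent: fresh seeds each day

print('coordinated overlap :', coordinated['Mon'].keys() & coordinated['Wed'].keys())
print('independent overlap :', independent['Mon'].keys() & independent['Wed'].keys())
```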
Matrix view: keys × instances
In our example, keys x = (a,b) are user pairs and the instances are days (Su Mo Tu We Th Fr Sa); blank cells are days with no activity for that key.
(a,b): 40 30 10 43 55 20
(g,c): 5 4
(h,c): 60 3 2
(a,z): 24 15 7
(h,f): 6 8
(f,s): 100 70 50
(d,h): 13
Example Statistics
Specify a segment of the keys Y ⊂ X. Examples: one user in CA and one in NY; apple device to android.
Queries/statistics: $\sum_{x\in Y} f(v_1(x), v_2(x), \ldots, v_d(x))$.
Total communication of the segment on Wednesday: $\sum_{x\in Y} v_1(x)$.
$L_p^p$ distance / weighted Jaccard (change in activity of the segment between Friday and Saturday): $\sum_{x\in Y} |v_1(x) - v_2(x)|^p$.
$L_p^p$ increase/decrease: $\sum_{x\in Y} \max\{0, v_1(x) - v_2(x)\}^p$.
Coverage of segment Y in days D: $\sum_{x\in Y} \max_{i\in D} v_i(x)$.
Average/sum of median/max/min/top-3/concave aggregates of the activity values over days D.
We would like to compute an estimate from the sample.
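For concreteness, a sketch (not from the talk; the dictionary layout and helper names are assumptions) of these statistics computed exactly from the full key × day table. The sample-based estimators discussed next approximate exactly these sums when only a coordinated sample is available.

```python
# Exact versions of the example statistics over a segment of keys, computed from the
# full data (key -> {day: value}; a missing day means activity 0).  Illustrative sketch.

def val(data, x, day):
    return data[x].get(day, 0.0)

def total(data, segment, day):                       # total activity on one day
    return sum(val(data, x, day) for x in segment)

def lpp_distance(data, segment, day1, day2, p=1):    # L_p^p distance between two days
    return sum(abs(val(data, x, day1) - val(data, x, day2)) ** p for x in segment)

def coverage(data, segment, days):                   # max over the chosen days
    return sum(max(val(data, x, d) for d in days) for x in segment)

data = {('a','b'): {'We': 43, 'Th': 55}, ('a','z'): {'We': 24, 'Th': 15}}
segment = [('a','b'), ('a','z')]
print(total(data, segment, 'We'))                     # 43 + 24 = 67
print(lpp_distance(data, segment, 'We', 'Th', p=2))   # (55-43)^2 + (24-15)^2 = 225
print(coverage(data, segment, ['We', 'Th']))          # 55 + 24 = 79
```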
Matrix view: keys × instances
Coordinated PPS sample, τ = 100 for all entries.
(a,b): 40 30 10 43 55 20
(g,c): 5 4
(h,c): 60 3 2
(a,z): 24 15 7
(h,f): 6 8
(f,s): 100 70 50
(d,h): 13
Per-key seeds u: 0.33 0.22 0.82 0.16 0.92 0.77
One can see that sampling, say, a set of keys and then using the same set each day will not work well for some functions. τ can be different per entry in general; here each column (day) is PPS sampled.
Coordination (using the same u for the same key in all instances): a key that is sampled in one instance (has a small u) is more likely to be sampled in the others.
Estimate sum statistics, one key at a time
$\sum_{x\in Y} f(\mathbf{v}(x))$: a sum over keys x ∈ Y of f(v(x)), where v(x) = (v_1(x), v_2(x), ...).
For the $L_p^p$ distance: $f(\mathbf{v}) = |v_1 - v_2|^p$.
Estimate one key at a time: $\sum_{x\in Y} \hat f(S(\mathbf{v}(x)))$. The estimator for f(v) is applied to the sample of v.
Easy statistics: sum over entries
Estimate a single entry at a time. Example: total communication of segment Y on Monday.
Inverse-probability estimate (Horvitz-Thompson) [HT52]: sum, over sampled x ∈ Y, of $v_{\mathrm{monday}}(x)/p_{\mathrm{monday}}(x)$.
The inclusion probability $p_{\mathrm{monday}}(x)$ can be computed from v(x) and τ: since x ∈ S ⟺ v(x) ≥ τ · u(x), we have $p_i(x) = \Pr_{u\sim U[0,1]}[\,v_i(x) \ge \tau_i \cdot u(x)\,] = \min\{1, v_i(x)/\tau_i\}$.
To apply this we need to know the inclusion probability; when we know v and u, we can compute it. This estimator is unbiased. Let's see an example.
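A minimal sketch (illustrative names, not the talk's code) of the single-instance Horvitz-Thompson estimate from a PPS threshold sample: each sampled key of the segment contributes v/p with p = min(1, v/τ). Note that a sampled entry with v < τ contributes exactly τ, since v / (v/τ) = τ.

```python
def ht_total(sampled, segment, tau):
    """Horvitz-Thompson estimate of the segment's total for one instance (day).
    sampled: dict key -> value holding the sampled entries of that instance."""
    est = 0.0
    for x in segment:
        if x in sampled:
            v = sampled[x]
            p = min(1.0, v / tau)    # inclusion probability Pr[v >= tau * u]
            est += v / p
    return est

# Toy check: every sampled entry with v < tau contributes exactly tau.
print(ht_total({('a','b'): 40, ('a','z'): 10}, [('a','b'), ('a','z')], tau=100.0))  # 200.0
```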
HT estimator (single instance)
Coordinated PPS sample, τ = 100.
(a,b): 40 30 10 43 55 20
(g,c): 5 4
(h,c): 60 3 2
(a,z): 24 15 7
(h,f): 6 8
(f,s): 100 70 50
(d,h): 13
Per-key seeds u: 0.33 0.22 0.82 0.14 0.92 0.16 0.77
This is the sampled data (sampled entries marked); the per-key randomization is shown alongside.
HT estimator (single instance)
τ = 100. Day: Wednesday. Segment: CA-NY.
(a,b): 40 30 10 43 55 20
(g,c): 5 4
(h,c): 60 3 2
(a,z): 24 15 7
(h,f): 6 8
(f,s): 100 70 50
(d,h): 13
Per-key seeds u: 0.33 0.22 0.82 0.16 0.92 0.77
The highlighted (green) entries are our selection according to the query; we only see the sample, however. In the sample, we have two sampled entries in our selection.
HT estimator for a single instance
τ = 100. Day: Wednesday. Segment: CA-NY. Exact value: 43 + 60 + 20 = 123.
Wednesday column: (a,b) 43, (g,c) -, (h,c) 60, (a,z) 24, (h,f) 3, (f,s) 20, (d,h) -.
Per-key seeds u: 0.33 0.22 0.82 0.16 0.92 0.77
The HT estimate is 0 for keys that are not sampled and v/p when the key is sampled.
We have two sampled entries in our selection, with inclusion probabilities p = 0.43 and p = 0.20. HT estimate: 43/0.43 + 20/0.20 = 100 + 100 = 200.
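Plugging the example's numbers into a self-contained snippet (illustrative; the sampled entries and segment keys are read off the tables above): the two sampled segment entries contribute 43/0.43 and 20/0.20, giving 200 versus the exact 123.

```python
tau = 100.0
sample = {('a','b'): 43, ('a','z'): 24, ('f','s'): 20}   # sampled Wednesday entries
segment = [('a','b'), ('h','c'), ('f','s')]              # the selected (CA-NY) keys
est = sum(v / min(1.0, v / tau) for x, v in sample.items() if x in segment)
print(est)   # 43/0.43 + 20/0.20, about 200, versus the exact 123
```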
Inverse-Probability (HT) estimator
Unbiased: $(1-p(x))\cdot 0 + p(x)\cdot \frac{f(v(x))}{p(x)} = f(v(x))$.
Nonnegative: v(x) ≥ 0, so v(x)/p(x) ≥ 0.
Bounded variance (for all v).
Monotone: more information ⇒ higher estimate.
Optimal: UMVU, the unique minimum-variance (unbiased, nonnegative, sum) estimator.
It works when f depends on a single entry. What about a general f?
Queries involving multiple columns
$L_p^p$ distance: $f(\mathbf{v}) = |v_1 - v_2|^p$. $L_p^p$ increase: $f(\mathbf{v}) = \max\{0, v_1 - v_2\}^p$.
The HT estimate is positive only when we know $f(\mathbf{v}) = |v_1 - v_2|$ from the sample. But when $v_2 = 0$ and $v_1 > 0$, we have $f(\mathbf{v}) > 0$, yet the sample never reveals $f(\mathbf{v})$ because the second entry is never sampled. Thus HT is biased.
Even when unbiased, HT may not be optimal. For example, when $v_1$ is sampled and we can deduce from $\tau_2$ and u that $v_2 \le a < v_1$, then we know that $f(\mathbf{v}) \ge v_1 - a$. An optimal estimator will use this incomplete information.
We want estimators with the same nice properties as HT, and optimality.
Even when all inclusion probabilities are positive (say every $v_i > 0$), we still want to do something when the information is not complete.
Sampled data
Coordinated PPS sample, τ = 100.
(a,b): 40 30 10 43 55 20
(g,c): 5 4
(h,c): 60 3 2
(a,z): 24 15 7
(h,f): 6 8
(f,s): 100 70 50
(d,h): 13
Per-key seeds u: 0.33 0.22 0.82 0.16 0.92 0.77
Query: the distance between Wednesday and Thursday on pairs with source "a". We want to estimate $(55-43)^2 + (8-3)^2 + (24-15)^2$.
Let us look at key (a,z) and at estimating $(24-15)^2$.
Information on f
Fix the data v. The lower u is, the more we know about v and about $f(\mathbf{v}) = (24-15)^2 = 81$. We plot the lower bound we have on f(v) as a function of the seed u.
[Figure: the lower bound on f(v) over u ∈ (0,1]; it equals 81 for small u and has breakpoints at u = 0.15 and u = 0.24.]
This is a MEP! (Monotone Estimation Problem)
A monotone sampling scheme (V, S): a data domain V ⊂ R^d, here $(v_1, v_2) \in R_{\ge 0}^2$.
A sampling scheme S: V × [0,1] → outcomes; here $S((v_1, v_2), u)$ reveals $v_i$ when $v_i \ge 100u$.
A nonnegative function f: V → R_{≥0}; here $f(\mathbf{v}) = (v_1 - v_2)^2$.
Goal: estimate f(v). Specify an estimator $\hat f(S,u)$ that is unbiased, nonnegative, of bounded variance, and admissible (optimal). The solution is not unique.
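A sketch of the lower-bound function for this MEP (illustrative code, not from the talk): given only the outcome and the seed, the tightest lower bound on f over all data consistent with the outcome, where a hidden coordinate is only known to lie below τ·u. The breakpoints 0.15 and 0.24 from the plot above come out of exactly this computation for v = (24, 15).

```python
# Lower bound on f(v) = (v1 - v2)^2 from an outcome of the scheme that reveals
# v_i exactly when v_i >= tau * u (tau = 100 as in the example).  Illustrative sketch.

TAU = 100.0

def outcome(v, u):
    """S(v, u): the coordinates revealed at seed u."""
    return {i: vi for i, vi in enumerate(v) if vi >= TAU * u}

def lower_bound(revealed, u):
    """inf of (v1 - v2)^2 over all data consistent with the outcome at seed u."""
    v1, v2 = revealed.get(0), revealed.get(1)
    if v1 is not None and v2 is not None:    # both coordinates known exactly
        return (v1 - v2) ** 2
    if v1 is None and v2 is None:            # both only known to be < TAU*u
        return 0.0
    known = v1 if v1 is not None else v2     # the other lies somewhere in [0, TAU*u)
    return (known - TAU * u) ** 2

v = (24.0, 15.0)
for u in (0.10, 0.20, 0.30):
    print(u, lower_bound(outcome(v, u), u))
# 0.10 -> 81.0 ; 0.20 -> (24 - 20)^2 = 16.0 ; 0.30 -> 0.0
```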
The optimal (admissible) range
We see S(v,u) and u; we therefore know what S(v,x) is for all x > u.
Cumulative estimate for x > u: suppose we fixed $M = \int_u^1 \hat f(S(v,x),x)\,dx$.
MEP Estimators
Order-optimal estimators: for an order ≺ on the data domain V, any estimator with lower variance on v must have higher variance on some z ≺ v.
The L* estimator: the unique admissible monotone estimator; order optimal for z ≺ v ⟺ f(z) < f(v); 4-variance-competitive (we define this soon).
The U* estimator: order optimal for z ≺ v ⟺ f(z) > f(v).
The L* estimator stays as low as possible in the optimal range (it optimizes for the consistent data with the lowest f(v)); the U* estimator goes as high as possible. The choice of estimator depends on the properties we want, possibly depending on the typical data distribution. L* is a good default (monotone and competitive).
Variance Competitiveness [CK13]
A "worst case over data" theoretical indicator of estimator quality.
For each v, we can consider the minimum $E_{u\sim U[0,1]}[\hat f^2(S(\mathbf{v},u),u)]$ attainable by an estimator that is unbiased and nonnegative for all other v'. We use such an estimator, optimal for v, as a reference point.
An estimator $\hat f(S,u)$ is c-competitive if, for any data v, the expectation of its square is within a factor c of the minimum possible for v (by an unbiased and nonnegative estimator): for all unbiased nonnegative $\hat g$, $E_{u\sim U[0,1]}[\hat f^2(S(\mathbf{v},u))] \le c\, E_{u\sim U[0,1]}[\hat g^2(S(\mathbf{v},u))]$.
The L* estimator is 4-competitive, and this is tight: the supremum over all MEPs is 4; for some MEPs the ratio is 4. But it is often better.
Optimal estimator for data v (unbiased and nonnegative for all data)
The optimal estimates $\hat f^{(\mathbf{v})}(S,u)$ are the negated derivative of the lower hull of the lower-bound function.
[Figure: the lower-bound function for v over u ∈ (0,1], its lower hull, and the estimate $\hat f(S,u)$.]
Intuition: the lower bound tells us, on outcome S, how "high" we can go with the estimate in order to optimize the variance for v while still being nonnegative on all other consistent data vectors.
The L* estimator
We see S(v,u) and u, so we know what S(v,x) is for all x > u. With f(S(v,x),x) denoting the lower bound on f from the outcome at seed x:
$\hat f^{(L)}(S(v,u),u) \;=\; \frac{f(S(v,u),u)}{u} \;-\; \int_u^1 \frac{f(S(v,x),x)}{x^2}\,dx$
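A numerical sketch of the L* estimator for the running example (illustrative code, not from the talk; it repeats the outcome and lower-bound helpers from the sketch above and uses scipy's adaptive quadrature for the integral). Averaging the estimate over the seed recovers f(v) = 81, which is the unbiasedness property.

```python
# L* estimator for f(v1, v2) = (v1 - v2)^2 under the tau = 100 scheme that reveals
# v_i exactly when v_i >= tau * u.  Illustrative numerical sketch.
import numpy as np
from scipy.integrate import quad

TAU = 100.0

def outcome(v, u):
    return {i: vi for i, vi in enumerate(v) if vi >= TAU * u}

def lower_bound(revealed, x):
    # Lower bound from the outcome at seed x >= u; only entries with value >= TAU*x
    # are still revealed at seed x, the rest are only known to be < TAU*x.
    vis = {i: vi for i, vi in revealed.items() if vi >= TAU * x}
    if len(vis) == 2:
        return (vis[0] - vis[1]) ** 2
    if not vis:
        return 0.0
    (known,) = vis.values()
    return (known - TAU * x) ** 2

def l_star(revealed, u):
    """L* estimate: LB(u)/u - integral_u^1 LB(x)/x^2 dx."""
    integral, _ = quad(lambda x: lower_bound(revealed, x) / x**2, u, 1.0, limit=200)
    return lower_bound(revealed, u) / u - integral

v = (24.0, 15.0)
print(l_star(outcome(v, 0.16), 0.16))     # only v1 = 24 revealed; estimate is about 346

# Unbiasedness check: the estimate averaged over the seed is close to f(v) = 81.
us = np.linspace(1e-4, 1.0, 2000)
print(np.mean([l_star(outcome(v, u), u) for u in us]))
```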
$L_1$ estimation example
Estimators for $f(v_1, v_2) = |v_1 - v_2|$. Scheme: $v_i \ge 0$ is sampled if $v_i > u$.
[Figure: the "lower bound" (LB) on f(0.6, 0.2) from S and u, the lower hull of the LB, and the L*, U*, and optimal-for-v estimators.]
U* is optimized for the vector (0.6, 0.0) (always consistent with S). L* is optimized locally for the vector (0.6, u) (the consistent vector with the smallest f).
$L_2^2$ estimation example
Estimators for $f(v_1, v_2) = (v_1 - v_2)^2$. Scheme: $v_i \ge 0$ is sampled if $v_i > u$.
[Figure: the "lower bound" (LB) on f(0.6, 0.2) from S and u, the lower hull of the LB, and the L*, U*, and optimal-for-v estimators.]
U* is optimized for the vector (0.6, 0.0) (always consistent with S). L* is optimized locally for the vector (0.6, u) (the consistent vector with the smallest f).
Summary
We defined Monotone Estimation Problems (MEPs), motivated by coordinated sampling.
We derived Pareto optimal (admissible) unbiased and nonnegative estimators (for any MEP for which they exist): L* (lower end of the range: the unique monotone estimator, dominates HT), U* (upper end of the range), and order-optimal estimators (optimized for certain data patterns).
Applications
Estimators for Euclidean and Manhattan distances from samples [C KDD '14].
Sketch-based closeness similarity in social networks [CDFGGW COSN '13] (similarity of the sets of long-range interactions).
Sketching generalized coverage functions, including graph-based influence functions [CDPW '14, C '16].
Future
Tighter bounds on the universal ratio: L* is 4-competitive; we can do 3.375-competitive; the lower bound is 1.44.
Instance-optimal competitiveness: give an efficient construction for any MEP.
Multi-dimensional MEPs: multiple independent seeds (independent samples of the "columns"); some initial derivations exist for d = 2 and for coverage and distance functions [CK 12, C 14], but the full picture is missing.
$L_1$ distance [C KDD '14]
Independent vs. coordinated PPS sampling; #IP flows to a destination in two time periods.
$L_2^2$ distance [C KDD '14]
Independent vs. coordinated PPS sampling; surname occurrences in 2007 and 2008 books (Google ngrams).
Thank you!
Coordination of samples
A very powerful tool for big data analysis, with applications well beyond what [Brewer, Early, Joyce 1972] could envision:
Locality Sensitive Hashing (LSH): similar weight vectors have similar samples/sketches.
Multi-objective (universal) samples: a single sample (as small as possible) that provides statistical guarantees for multiple sets of weights.
Statistics/domain queries that span multiple "instances" (Jaccard similarity, $L_p$ distances, distinct counts, union size, ...).
MinHash sketches are a special case with 0/1 weights.
Coordination also facilitates faster computation of samples. Example [C '97]: sketching/sampling the reachability sets and neighborhoods of all nodes in a graph in near-linear time.
LSH: Brewer et al. did not call it that. Coordination is very powerful for other applications: multi-objective samples, complex functions, sketching coverage functions, Pokemon changing sizes; think of the size as a salary, the size of the IP flow between a source-destination pair, the trading amount of a stock, the changes in gradients, ...