Testing with Alternative Distances


Testing with Alternative Distances
Gautam "G" Kamath
FOCS 2017 Workshop: Frontiers in Distribution Testing
October 14, 2017
Based on joint works with Jayadev Acharya (Cornell), Constantinos Daskalakis (MIT), and John Wright (MIT)

The story so far…
Test whether p = q versus ℓ₁(p,q) ≥ ε
Domain of [n]; success probability ≥ 2/3
Goal: strongly sublinear sample complexity, O(n^(1−γ)) for some γ > 0
Identity testing (samples from p, known q): Θ(√n/ε²) samples [BFFKR'01, P'08, VV'14]
Closeness testing (samples from both p and q): Θ(max{n^(2/3)/ε^(4/3), √n/ε²}) samples [BFRSW'00, V'11, CDVV'14]

Generalize: Different Distances
p = q or ℓ₁(p,q) ≥ ε?  →  d₁(p,q) ≤ ε₁ or d₂(p,q) ≥ ε₂?

Generalize: Different Distances
d₁(p,q) ≤ ε₁ or d₂(p,q) ≥ ε₂?
That is: are p and q ε₁-close in d₁(·,·), or ε₂-far in d₂(·,·)?
Distances of interest: ℓ₁, ℓ₂, χ², KL, Hellinger
Classic identity testing is the case ε₁ = 0, d₂ = ℓ₁
Can we characterize the sample complexity for each pair of distances? Which distribution distances are sublinearly testable? [DKW'18]

Wait, but… why?
Tolerance for model misspecification
Useful as a proxy in classical testing problems: d₁ as χ² distance is useful for composite hypothesis testing (monotonicity, independence, etc. [ADK'15])
Other distances are natural in certain testing settings: d₂ as Hellinger distance is sometimes natural in multivariate settings, e.g., Bayes networks and Markov chains [DP'17, DDG'17]; see Costis' talk

Tolerance
Is p equal to q, or are they far from each other? But why do we know q exactly?
Models are inexact:
Measurement errors
Imprecisions in nature
CLT approximations
Reading a data point wrong
[Figure: ideal model vs. observed model]
p and q may be "philosophically" equal, but not literally equal
When can we test d₁(p,q) ≤ ε₁ versus ℓ₁(p,q) ≥ ε?

Tolerance
d₁(p,q) ≤ ε₁ vs. ℓ₁(p,q) ≥ ε? What should d₁ be?
How about ℓ₁ itself, i.e., ℓ₁(p,q) ≤ ε/2 vs. ℓ₁(p,q) ≥ ε? No! This requires Θ(n/log n) samples [VV'10].
Chill out, relax… use the χ²-distance: χ²(p,q) = Σ_{i∈Σ} (p_i − q_i)²/q_i
Cauchy-Schwarz gives χ²(p,q) ≥ ℓ₁²(p,q), so χ²(p,q) ≤ ε²/4 implies ℓ₁(p,q) ≤ ε/2, and the two hypotheses below are disjoint.
χ²(p,q) ≤ ε²/4 vs. ℓ₁(p,q) ≥ ε? Yes! O(√n/ε²) samples [ADK'15]
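The Cauchy-Schwarz step, written out (a standard one-line derivation, not spelled out on the slide; it uses Σ_i q_i = 1):

```latex
\ell_1(p,q)^2
  = \Big( \sum_i \frac{|p_i - q_i|}{\sqrt{q_i}} \cdot \sqrt{q_i} \Big)^2
  \le \Big( \sum_i \frac{(p_i - q_i)^2}{q_i} \Big) \Big( \sum_i q_i \Big)
  = \chi^2(p,q).
```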

Details for a χ²-Tolerant Tester
Goal: distinguish (i) χ²(p,q) ≤ ε²/2 versus (ii) ℓ₁²(p,q) ≥ ε²
Draw Poisson(m) samples from p ("Poissonization")
N_i: number of appearances of symbol i
N_i ~ Poisson(m·p_i), and the N_i's are now independent!
Statistic: Z = Σ_{i∈Σ} [(N_i − m·q_i)² − N_i] / (m·q_i)
Acharya, Daskalakis, K. Optimal Testing for Properties of Distributions. NIPS 2015.
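A minimal sketch of this statistic in Python (numpy only; the function names and the assumption that q_i > 0 for every i are mine, not from the talk):

```python
import numpy as np

def poissonized_counts(p, m, rng):
    # Drawing Poisson(m) samples from p is equivalent to drawing each
    # count N_i independently as Poisson(m * p_i).
    return rng.poisson(m * np.asarray(p))

def chi2_statistic(N, q, m):
    # Z = sum_i ((N_i - m*q_i)^2 - N_i) / (m*q_i).
    # Subtracting N_i in the numerator makes E[Z] = m * chi^2(p, q).
    mq = m * np.asarray(q, dtype=float)
    return np.sum(((N - mq) ** 2 - N) / mq)
```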

Details for a χ²-Tolerant Tester
Goal: distinguish (i) χ²(p,q) ≤ ε²/2 versus (ii) ℓ₁²(p,q) ≥ ε²
Statistic: Z = Σ_{i∈[n]} [(N_i − m·q_i)² − N_i] / (m·q_i), where N_i is the number of appearances of i and m is the number of samples
E[Z] = m·χ²(p,q), so in case (i) E[Z] ≤ m·ε²/2, while in case (ii) E[Z] ≥ m·ε²
Can bound the variance of Z with some work
Need to avoid low-probability elements of q: either ignore i such that q_i ≤ ε²/(10n), or mix lightly (O(ε²)) with the uniform distribution (also in [G'16])
Apply Chebyshev's inequality
Side note: Pearson's χ²-test uses the statistic Σ_i (N_i − m·q_i)²/(m·q_i). Subtracting N_i in the numerator gives an unbiased estimator and, importantly, may hugely decrease the variance [Zelterman'87, VV'14, CDVV'14, DKN'15]
Acharya, Daskalakis, K. Optimal Testing for Properties of Distributions. NIPS 2015.
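Putting the slide's ingredients together, a hedged sketch of the full decision rule. The 3/4·m·ε² threshold is one natural choice sitting between the two expectations m·ε²/2 and m·ε², and the filter implements the first option above (ignore symbols with q_i ≤ ε²/(10n)); constants and names are illustrative, not from the paper:

```python
import numpy as np

def chi2_tolerant_test(N, q, m, eps):
    # Distinguish (i) chi^2(p, q) <= eps^2/2 from (ii) ell_1(p, q) >= eps,
    # given Poissonized counts N_i ~ Poisson(m * p_i).
    q = np.asarray(q, dtype=float)
    n = len(q)
    heavy = q > eps**2 / (10 * n)    # ignore low-probability elements of q
    mq = m * q[heavy]
    Z = np.sum(((N[heavy] - mq) ** 2 - N[heavy]) / mq)
    # E[Z] <= m * eps^2 / 2 in case (i) and E[Z] >= m * eps^2 in case (ii);
    # with m = O(sqrt(n)/eps^2), Chebyshev separates the cases at a threshold.
    return Z <= 0.75 * m * eps**2    # True: accept "close"; False: reject
```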

Tolerant Identity Testing
[Figure: sample complexities of tolerant identity testing for different tolerance distances d₁; the problem gets harder as d₁ moves from ℓ₂ and χ² toward KL, Hellinger, and ℓ₁ (implicit in [DK'16])]
Daskalakis, K., Wright. Which Distribution Distances are Sublinearly Testable? SODA 2018.

Tolerant Testing Takeaways
Can handle ℓ₂ or χ² tolerance at no additional cost: Θ(√n/ε²) samples
KL, Hellinger, or ℓ₁ tolerance is expensive: Θ(n/log n) samples
The KL result is based on the hardness of entropy estimation
Closeness testing (q unknown): even χ² tolerance is costly! Only ℓ₂ tolerance is free
Proven via the hardness of ℓ₁-tolerant identity testing: since q is unknown, χ² is no longer a polynomial

Application: Testing for Structure
Composite hypothesis testing: test against a class of distributions!
p ∈ C versus ℓ₁(p,C) > ε, where ℓ₁(p,C) = min_{q∈C} ℓ₁(p,q)
Example: C = all monotone distributions (the p_i's are monotone non-increasing)
Others: unimodality, log-concavity, monotone hazard rate, independence
All can be tested with Θ(√n/ε²) samples [ADK'15], the same complexity as vanilla uniformity testing!

Testing by Learning
Goal: distinguish p ∈ C from ℓ₁(p,C) > ε
Learn-then-Test:
1. Learn a hypothesis q ∈ C such that:
p ∈ C ⇒ χ²(p,q) ≤ ε²/2 (needs a cheap "proper learner" in χ²)
ℓ₁(p,C) > ε ⇒ ℓ₁(p,q) > ε (automatic, since q ∈ C)
2. Perform "tolerant testing": given sample access to p and a description of q, distinguish χ²(p,q) ≤ ε²/2 from ℓ₁(p,q) > ε
Tolerant testing (step 2) takes O(√n/ε²) samples; the naïve approach, using ℓ₁ tolerance instead of χ², would require Ω(n/log n) samples. This is why different distances are important!
Proper learners in χ² (step 1)? Claim: this is cheap. A sketch of the framework follows.
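A schematic of Learn-then-Test in the same Python sketch style; `proper_learn_chi2` stands in for the cheap χ² proper learner claimed on the slide and is hypothetical, as is the split into learning samples and fresh testing samples:

```python
def learn_then_test(learn_samples, fresh_counts, proper_learn_chi2, m, eps):
    # Step 1: proper learning. If p is in C, the learned q in C satisfies
    # chi^2(p, q) <= eps^2/2; if p is eps-far from C in ell_1, it is
    # eps-far from q as well, simply because q lies in C.
    q = proper_learn_chi2(learn_samples)
    # Step 2: chi^2-tolerant identity test against the (now known) q,
    # run on fresh Poissonized counts so the two steps are independent.
    return chi2_tolerant_test(fresh_counts, q, m, eps)
```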

Hellinger Testing
Now change d₂ instead of d₁.
Hellinger distance: H²(p,q) = ½ Σ_{i∈[n]} (√p_i − √q_i)²
It sits between a linear and a quadratic relationship with ℓ₁: H²(p,q) ≤ ℓ₁(p,q) ≤ 2√2·H(p,q)
A natural distance when considering a collection of iid samples
Comes up in some multivariate testing problems (Costis @ 2:55)
Testing p = q vs. H(p,q) ≥ ε? Trivial results via ℓ₁ testing:
Identity: O(√n/ε⁴) samples
Closeness: O(max{n^(2/3)/ε^(8/3), √n/ε⁴}) samples
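For concreteness, a quick numerical check (mine, not from the talk) of the Hellinger definition and its sandwich with ℓ₁, constants included:

```python
import numpy as np

def hellinger(p, q):
    # H(p, q), with H^2(p, q) = (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(100))   # two random distributions on [100]
q = rng.dirichlet(np.ones(100))
H = hellinger(p, q)
l1 = np.sum(np.abs(p - q))
assert H**2 <= l1 <= 2 * np.sqrt(2) * H   # H^2 <= ell_1 <= 2*sqrt(2)*H
```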

Hellinger Testing
But you can do better!
Identity: Θ(√n/ε²) samples, with no extra cost for ℓ₂ or χ² tolerance either!
Closeness: Θ(min{n^(2/3)/ε^(8/3), n^(3/4)/ε²}) samples
Lower bound and previous upper bound in [DK'16]
Similar chi-squared statistics as in [ADK'15] and [CDVV'14], with some tweaks and a more careful analysis to handle Hellinger
Daskalakis, K., Wright. Which Distribution Distances are Sublinearly Testable? SODA 2018.

Miscellanea
Testing p = q vs. KL(p,q) ≥ ε? Trivially impossible, due to the ratio between p_i and q_i: take p_i = δ and q_i = 0, so KL(p,q) = ∞ for any δ > 0, yet as δ → 0 no finite number of samples can distinguish p from q.
Upper bounds for the Ω(n/log n) testing problems, e.g., KL(p,q) ≤ ε²/4 vs. ℓ₁(p,q) ≥ ε? Use the estimators mentioned in Jiantao's talk.

Thanks!