Testing with Alternative Distances



Presentation on theme: "Testing with Alternative Distances" - Presentation transcript:

1 Testing with Alternative Distances
Gautam "G" Kamath
FOCS 2017 Workshop: Frontiers in Distribution Testing, October 14, 2017
Based on joint works with Jayadev Acharya (Cornell), Constantinos Daskalakis (MIT), and John Wright (MIT)

2 The story so far…
Test whether $p = q$ versus $\ell_1(p,q) \ge \varepsilon$
[Diagram: the three regimes $p = q$, $0 < \ell_1(p,q) \le \varepsilon$, and $\ell_1(p,q) > \varepsilon$]
Domain of $[n]$; success probability $\ge 2/3$
Goal: strongly sublinear sample complexity, $O(n^{1-\gamma})$ for some $\gamma > 0$
Identity testing (samples from $p$, known $q$): $\Theta(\sqrt{n}/\varepsilon^2)$ samples [BFFKR'01, P'08, VV'14]
Closeness testing (samples from both $p$ and $q$): $\Theta(\max\{n^{2/3}/\varepsilon^{4/3}, \sqrt{n}/\varepsilon^2\})$ samples [BFRSW'00, V'11, CDVV'14]

3 Generalize: Different Distances
$p = q$ or $\ell_1(p,q) \ge \varepsilon$?
$d_1(p,q) \le \varepsilon_1$ or $d_2(p,q) \ge \varepsilon_2$?

4 Generalize: Different Distances
$d_1(p,q) \le \varepsilon_1$ or $d_2(p,q) \ge \varepsilon_2$?
Are $p$ and $q$ $\varepsilon_1$-close in $d_1(\cdot,\cdot)$, or $\varepsilon_2$-far in $d_2(\cdot,\cdot)$?
Distances of interest: $\ell_1$, $\ell_2$, $\chi^2$, KL, Hellinger
Classic identity testing: $\varepsilon_1 = 0$, $d_2 = \ell_1$
Can we characterize the sample complexity for each pair of distances? Which distribution distances are sublinearly testable? [DKW'18]
Wait, but… why?

5 Wait, but… why?
Tolerance for model misspecification
Useful as a proxy in classical testing problems: $d_1$ as $\chi^2$ distance is useful for composite hypothesis testing (monotonicity, independence, etc. [ADK'15])
Other distances are natural in certain testing settings: $d_2$ as Hellinger distance is sometimes natural in multivariate settings (Bayes networks, Markov chains [DP'17, DDG'17]; see Costis' talk)


7 Tolerance
Is $p$ equal to $q$, or are they far from each other?
But why do we know $q$ exactly? Models are inexact:
Measurement errors (e.g., reading a data point wrong)
Imprecisions in nature (e.g., CLT approximations)
$p$ and $q$ may be "philosophically" equal (the ideal model) but not literally equal (the observed model)
When can we test $d_1(p,q) \le \varepsilon_1$ versus $\ell_1(p,q) \ge \varepsilon$?

8 Tolerance
$d_1(p,q) \le \varepsilon_1$ vs. $\ell_1(p,q) \ge \varepsilon$? What $d_1$?
How about $d_1 = \ell_1$, i.e., $\ell_1(p,q) \le \varepsilon/2$ vs. $\ell_1(p,q) \ge \varepsilon$? No! This takes $\Theta(n/\log n)$ samples [VV'10]
Chill out, relax… use the $\chi^2$-distance: $\chi^2(p,q) = \sum_{i \in \Sigma} \frac{(p_i - q_i)^2}{q_i}$
Cauchy-Schwarz: $\chi^2(p,q) \ge \ell_1^2(p,q)$
$\chi^2(p,q) \le \varepsilon^2/4$ vs. $\ell_1(p,q) \ge \varepsilon$? Yes! $O(\sqrt{n}/\varepsilon^2)$ samples [ADK'15]
[Diagram: the regimes for exact, $\ell_1$-tolerant, and $\chi^2$-tolerant identity testing]
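For completeness, here is the Cauchy-Schwarz step written out (a short derivation using only the slide's definition of $\chi^2$ and the fact that $\sum_{i} q_i = 1$):

\[
\ell_1(p,q) = \sum_{i \in \Sigma} \frac{|p_i - q_i|}{\sqrt{q_i}} \cdot \sqrt{q_i}
\;\le\; \Big( \sum_{i \in \Sigma} \frac{(p_i - q_i)^2}{q_i} \Big)^{1/2} \Big( \sum_{i \in \Sigma} q_i \Big)^{1/2}
= \sqrt{\chi^2(p,q)} .
\]

So $\chi^2(p,q) \le \varepsilon^2/4$ implies $\ell_1(p,q) \le \varepsilon/2$: the $\chi^2$-tolerant promise is at least as strong as the $\ell_1$ one on the "close" side, which is what lets the [ADK'15] tester evade the $\Theta(n/\log n)$ barrier.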

9 Details for a $\chi^2$-Tolerant Tester
Goal: distinguish (i) $\chi^2(p,q) \le \varepsilon^2/2$ versus (ii) $\ell_1^2(p,q) \ge \varepsilon^2$
Draw $\mathrm{Poisson}(m)$ samples from $p$ ("Poissonization")
$N_i$: number of appearances of symbol $i$, so $N_i \sim \mathrm{Poisson}(m \cdot p_i)$, and the $N_i$'s are now independent!
Statistic: $Z = \sum_{i \in \Sigma} \frac{(N_i - m \cdot q_i)^2 - N_i}{m \cdot q_i}$
Acharya, Daskalakis, K. Optimal Testing for Properties of Distributions. NIPS 2015.

10 Details for a $\chi^2$-Tolerant Tester
Goal: distinguish (i) $\chi^2(p,q) \le \varepsilon^2/2$ versus (ii) $\ell_1^2(p,q) \ge \varepsilon^2$
Statistic: $Z = \sum_{i \in [n]} \frac{(N_i - m \cdot q_i)^2 - N_i}{m \cdot q_i}$, where $N_i$ is the number of appearances of $i$ and $m$ is the number of samples
$E[Z] = m \cdot \chi^2(p,q)$, so in case (i) $E[Z] \le m \cdot \varepsilon^2/2$, and in case (ii) $E[Z] \ge m \cdot \varepsilon^2$
Can bound the variance of $Z$ with some work
Need to avoid low-probability elements of $q$: either ignore $i$ such that $q_i \le \varepsilon^2/(10n)$, or mix lightly ($O(\varepsilon^2)$) with the uniform distribution (also in [G'16])
Apply Chebyshev's inequality
Side note: Pearson's $\chi^2$-test uses the statistic $\sum_i \frac{(N_i - m \cdot q_i)^2}{m \cdot q_i}$; subtracting $N_i$ in the numerator gives an unbiased estimator and, importantly, may hugely decrease the variance [Zelterman'87; VV'14, CDVV'14, DKN'15]
Acharya, Daskalakis, K. Optimal Testing for Properties of Distributions. NIPS 2015.
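A minimal numerical sketch of this tester (not the authors' reference code; the threshold placement and the choice of the "ignore light elements of $q$" option are illustrative, consistent with the expectation gap above but not tuned):

```python
import numpy as np

def chi2_tolerant_identity_test(q, draw, m, eps, rng):
    """Distinguish (i) chi^2(p,q) <= eps^2/2 from (ii) ell_1(p,q)^2 >= eps^2.

    q: known distribution over [n]; draw(k): k iid samples from the unknown p.
    Sketch of the [ADK'15] tester; constants are illustrative.
    """
    n = len(q)
    k = rng.poisson(m)                 # Poissonization: the N_i become independent
    N = np.bincount(draw(k), minlength=n).astype(float)
    heavy = q > eps**2 / (10 * n)      # ignore low-probability elements of q
    # Unbiased statistic: E[Z] = m * chi^2(p, q), restricted to heavy elements.
    Z = np.sum(((N[heavy] - m * q[heavy])**2 - N[heavy]) / (m * q[heavy]))
    # E[Z] <= m*eps^2/2 in case (i) and >= m*eps^2 in case (ii);
    # threshold halfway between the two bounds.
    return "close" if Z <= 0.75 * m * eps**2 else "far"

# Example usage: p = q = uniform over [n], with m = O(sqrt(n)/eps^2) samples.
rng = np.random.default_rng(0)
n, eps = 1000, 0.25
q = np.ones(n) / n
m = int(20 * np.sqrt(n) / eps**2)
draw = lambda k: rng.choice(n, size=k, p=q)          # in reality, samples from p
print(chi2_tolerant_identity_test(q, draw, m, eps, rng))   # prints "close"
```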

11 Tolerant Identity Testing
Harder (implicit in [DK'16])
Daskalakis, K., Wright. Which Distribution Distances are Sublinearly Testable? SODA 2018.

12 Tolerant Testing Takeaways
Can handle $\ell_2$ or $\chi^2$ tolerance at no additional cost: $\Theta(\sqrt{n}/\varepsilon^2)$ samples
KL, Hellinger, or $\ell_1$ tolerance is expensive: $\Theta(n/\log n)$ samples
The KL result is based on the hardness of entropy estimation
Closeness testing ($q$ unknown): even $\chi^2$ tolerance is costly! Only $\ell_2$ tolerance is free
Proven via hardness of $\ell_1$-tolerant identity testing; since $q$ is unknown, the $\chi^2$ statistic is no longer a polynomial (the unknown $q_i$ appears in the denominator)

13 Application: Testing for Structure
Composite hypothesis testing: test against a class of distributions!
$p \in C$ versus $\ell_1(p, C) > \varepsilon$, where $\ell_1(p, C) = \min_{q \in C} \ell_1(p, q)$
Example: $C$ = all monotone distributions (the $p_i$'s are monotone non-increasing)
Others: unimodality, log-concavity, monotone hazard rate, independence
All can be tested with $\Theta(\sqrt{n}/\varepsilon^2)$ samples [ADK'15], the same complexity as vanilla uniformity testing!

14 Testing by Learning
Goal: distinguish $p \in C$ from $\ell_1(p, C) > \varepsilon$
Learn-then-Test:
1. Learn a hypothesis $q \in C$ such that:
$p \in C \Rightarrow \chi^2(p,q) \le \varepsilon^2/2$ (needs a cheap "proper learner" in $\chi^2$)
$\ell_1(p, C) > \varepsilon \Rightarrow \ell_1(p,q) > \varepsilon$ (automatic, since $q \in C$)
2. Perform "tolerant testing": given sample access to $p$ and a description of $q$, distinguish $\chi^2(p,q) \le \varepsilon^2/2$ from $\ell_1(p,q) > \varepsilon$
Tolerant testing (step 2) takes $O(\sqrt{n}/\varepsilon^2)$ samples; the naïve approach of $\ell_1$-tolerant testing would require $\Omega(n/\log n)$ samples. This is why different distances are important!
Proper learners in $\chi^2$ (step 1)? Claim: this is cheap. (A schematic sketch of the reduction follows below.)
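A schematic of the Learn-then-Test reduction, reusing the $\chi^2$-tolerant tester sketched earlier; `proper_learner` is a hypothetical stand-in for the cheap $\chi^2$ proper learner that step 1 assumes, not a function from [ADK'15]:

```python
def learn_then_test(proper_learner, draw, m, eps, rng):
    """Schematic Learn-then-Test: accept iff p appears to lie in the class C."""
    # Step 1: proper learning in chi^2 from a first batch of samples;
    # proper_learner must return some q in C with chi^2(p, q) <= eps^2/2
    # whenever p is in C (a hypothetical helper, per the claim above).
    q = proper_learner(draw(m))
    # Step 2: tolerant identity testing against the learned hypothesis q:
    #   p in C            => chi^2(p, q) <= eps^2/2 => "close"
    #   ell_1(p, C) > eps => ell_1(p, q) > eps      => "far"  (since q in C)
    return chi2_tolerant_identity_test(q, draw, m, eps, rng) == "close"
```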

15 Hellinger Testing
Change $d_2$ instead of $d_1$
Hellinger distance: $H^2(p,q) = \frac{1}{2} \sum_{i \in [n]} (\sqrt{p_i} - \sqrt{q_i})^2$
Between linear and quadratic relationship with $\ell_1$: $H^2(p,q) \le \frac{1}{2}\ell_1(p,q) \le \sqrt{2} \cdot H(p,q)$
Natural distance when considering a collection of iid samples; comes up in some multivariate testing problems
Testing $p = q$ vs. $H(p,q) \ge \varepsilon$? Trivial results via $\ell_1$ testing (since $H(p,q) \ge \varepsilon \Rightarrow \ell_1(p,q) \ge 2\varepsilon^2$, run an $\ell_1$ tester with parameter $\sim \varepsilon^2$):
Identity: $O(\sqrt{n}/\varepsilon^4)$ samples
Closeness: $O(\max\{n^{2/3}/\varepsilon^{8/3}, \sqrt{n}/\varepsilon^4\})$ samples
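A quick numeric sanity check of the definition and the sandwich inequality above, with the standard constants (a sketch; the Dirichlet draws are just arbitrary example distributions):

```python
import numpy as np

def hellinger(p, q):
    # H(p,q), where H^2(p,q) = (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(100))   # two arbitrary distributions on [100]
q = rng.dirichlet(np.ones(100))
H = hellinger(p, q)
l1 = np.sum(np.abs(p - q))
# H^2 <= (1/2) * ell_1 <= sqrt(2) * H  (the linear/quadratic relationship)
assert H**2 <= 0.5 * l1 <= np.sqrt(2) * H
```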

16 Hellinger Testing
But you can do better!
Trivial results via $\ell_1$ testing: identity $O(\sqrt{n}/\varepsilon^4)$ samples, closeness $O(\max\{n^{2/3}/\varepsilon^{8/3}, \sqrt{n}/\varepsilon^4\})$ samples
Instead:
Identity: $\Theta(\sqrt{n}/\varepsilon^2)$ samples, with no extra cost for $\ell_2$ or $\chi^2$ tolerance either!
Closeness: $\Theta(\min\{n^{2/3}/\varepsilon^{8/3}, n^{3/4}/\varepsilon^2\})$ samples (lower bound and previous upper bound in [DK'16])
Similar chi-squared statistics as in [ADK'15] and [CDVV'14], with some tweaks and a more careful analysis to handle Hellinger
Daskalakis, K., Wright. Which Distribution Distances are Sublinearly Testable? SODA 2018.

17 Miscellanea
$p = q$ vs. $KL(p,q) \ge \varepsilon$?
Trivially impossible, due to the ratio between $p_i$ and $q_i$: take $p_i = \delta$ and $q_i = 0$, with $\delta \to 0$
Upper bounds for the $\Omega(n/\log n)$ testing problems? I.e., $KL(p,q) \le \varepsilon^2/4$ vs. $\ell_1(p,q) \ge \varepsilon$? Use the estimators mentioned in Jiantao's talk
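Written out, the obstruction is that a single symbol with $q_i = 0$ blows up the divergence while leaving the distributions statistically indistinguishable:

\[
KL(p, q) = \sum_{i} p_i \log \frac{p_i}{q_i} = \infty \quad \text{whenever } p_i = \delta > 0 \text{ and } q_i = 0,
\]

yet $\ell_1(p, q) \le 2\delta \to 0$, so no finite number of samples can distinguish this $p$ from $q$.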

18 Thanks!

