1
Comparison of data distributions: the power of Goodness-of-Fit Tests
B. Mascialino (1), A. Pfeiffer (2), M.G. Pia (1), A. Ribon (2), P. Viarengo (3)
(1) INFN Genova, Italy
(2) CERN, Geneva, Switzerland
(3) IST – National Institute for Cancer Research, Genova, Italy
IEEE NSS 2006, San Diego, October 29 – November 5, 2006
2
Goodness of Fit testing
Goodness-of-fit testing is the mathematical foundation for the comparison of data distributions.
Use cases in experimental physics:
- Regression testing: throughout the software life-cycle
- Online DAQ: monitoring detector behaviour w.r.t. a reference
- Simulation validation: comparison with experimental data
- Reconstruction: comparison of reconstructed vs. expected distributions
- Physics analysis: comparisons of experimental distributions; comparison with theoretical distributions
[Diagram: one-sample problem, a sample compared with a theoretical distribution; two-sample problem, sample 1 compared with sample 2]
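The difference between the one-sample and the two-sample problem can be illustrated with the Kolmogorov-Smirnov distance, one of the tests in the toolkit. The Python sketch below is our own illustration, not the Statistical Toolkit's interface; all function names are hypothetical. The one-sample version compares a sample with a theoretical CDF, the two-sample version compares the empirical CDFs of two samples.

```python
import bisect
import random


def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of the sorted sample that is <= x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_one_sample(sample, cdf):
    """One-sample problem: KS distance between a sample and a theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The ECDF steps from i/n to (i+1)/n at x: check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_two_sample(sample1, sample2):
    """Two-sample problem: KS distance between the ECDFs of two samples."""
    a, b = sorted(sample1), sorted(sample2)
    # Both ECDFs are step functions that only change at observed values,
    # so the supremum is attained at one of the pooled observations.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


if __name__ == "__main__":
    rng = random.Random(7)
    sample1 = [rng.random() for _ in range(200)]       # uniform parent
    sample2 = [rng.random() ** 2 for _ in range(200)]  # a different parent (U**2)

    def uniform_cdf(x):
        return min(max(x, 0.0), 1.0)

    print(ks_one_sample(sample1, uniform_cdf))
    print(ks_two_sample(sample1, sample2))
```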
3
“A Goodness-of-Fit Statistical Toolkit”
G.A.P. Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “A Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science, vol. 51, no. 5 (2004).
B. Mascialino, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “New developments of the Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science, vol. 53, no. 6 (2006), to be published.
4
GoF algorithms in the Statistical Toolkit
Two-sample problem
Binned distributions:
- Anderson-Darling test
- Chi-squared test
- Fisz-Cramer-von Mises test
- Tiku test (Cramer-von Mises test in chi-squared approximation)
Unbinned distributions:
- Anderson-Darling test
- Anderson-Darling approximated test
- Cramer-von Mises test
- Generalised Girone test
- Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
- Kolmogorov-Smirnov test
- Kuiper test
- Tiku test (Cramer-von Mises test in chi-squared approximation)
- Weighted Kolmogorov-Smirnov test
- Weighted Cramer-von Mises test
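Several of the unbinned statistics listed above are functionals of the difference between the two empirical CDFs. As a sketch (our own Python, not the toolkit's implementation; function names are hypothetical), the Kuiper statistic adds the largest deviations in each direction, while the two-sample Cramer-von Mises statistic, written here in Anderson's form T = nm/(n+m)^2 · Σ [F1(z) − F2(z)]^2 over the pooled observations z, averages the squared deviation. The p-value computation is omitted.

```python
import bisect


def ecdf_differences(sample1, sample2):
    """F1(z) - F2(z) at every pooled observation z (right-continuous ECDFs)."""
    a, b = sorted(sample1), sorted(sample2)
    pooled = sorted(a + b)
    return [
        bisect.bisect_right(a, z) / len(a) - bisect.bisect_right(b, z) / len(b)
        for z in pooled
    ]


def kuiper_statistic(sample1, sample2):
    """Kuiper V = D+ + D-: largest positive plus largest negative deviation."""
    diffs = ecdf_differences(sample1, sample2)
    return max(0.0, max(diffs)) + max(0.0, -min(diffs))


def cramer_von_mises_statistic(sample1, sample2):
    """Two-sample Cramer-von Mises T in Anderson's form."""
    n, m = len(sample1), len(sample2)
    diffs = ecdf_differences(sample1, sample2)
    # Mean squared ECDF difference over the pooled sample, times n*m/(n+m)
    return n * m / (n + m) ** 2 * sum(d * d for d in diffs)


if __name__ == "__main__":
    x = [0.1, 0.4, 0.5, 0.9]
    y = [0.2, 0.6, 0.7, 0.8, 1.2]
    print(kuiper_statistic(x, y), cramer_von_mises_statistic(x, y))
```

Without ties, this form agrees with the rank-based formula used, for example, by SciPy's `cramervonmises_2samp`.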
5
Power of GoF tests
Do we really need such a wide collection of GoF tests? Why?
Which is the most appropriate test to compare two distributions?
How “good” is a test at recognizing truly equivalent distributions and rejecting non-equivalent ones?
Which test to use?
No comprehensive study of the relative power of GoF tests exists in the literature: this is novel research in statistics (not only in physics data analysis!).
A systematic study of all existing GoF tests is in progress, made possible by the extensive collection of tests in the Statistical Toolkit.
6
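Method for the evaluation of power
The power of a test is the probability of correctly rejecting the null hypothesis, i.e. of rejecting it when the two parent distributions really differ.
- N = 10000 Monte Carlo replicas
- Pseudo-experiment: a random drawing of two samples of size n from two parent distributions
- Confidence level CL = 0.95, i.e. significance level 1 − CL = 0.05
[Diagram: parent distribution 1 → sample 1 (size n); parent distribution 2 → sample 2 (size n); both samples → GoF test]
$$\text{Power} = \frac{\#\,\text{pseudo-experiments with } p\text{-value} < (1-\mathrm{CL})}{\#\,\text{pseudo-experiments}}$$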
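The power estimate described above can be sketched as a small Monte Carlo loop. This Python sketch assumes the two-sample Kolmogorov-Smirnov test with its large-sample p-value (including a commonly used finite-size correction of the scaling factor) and a 0.05 threshold on the p-value; any other test from the list could replace `ks_p_value`. The function names and the Gaussian example parents are hypothetical.

```python
import bisect
import math
import random


def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov distance D between the empirical CDFs."""
    a, b = sorted(sample1), sorted(sample2)
    return max(
        abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b))
        for x in a + b
    )


def kolmogorov_q(lam):
    """Asymptotic P(K > lam) for the Kolmogorov distribution (alternating series)."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here, and Q(lam) is ~1 anyway
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)


def ks_p_value(sample1, sample2):
    """Approximate p-value of the two-sample KS test."""
    n_eff = len(sample1) * len(sample2) / (len(sample1) + len(sample2))
    root = math.sqrt(n_eff)
    lam = (root + 0.12 + 0.11 / root) * ks_statistic(sample1, sample2)
    return kolmogorov_q(lam)


def estimate_power(draw1, draw2, n, replicas=10_000, alpha=0.05, seed=1):
    """Fraction of pseudo-experiments whose p-value falls below alpha."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(replicas):
        # One pseudo-experiment: n values from each parent distribution
        sample1 = [draw1(rng) for _ in range(n)]
        sample2 = [draw2(rng) for _ in range(n)]
        if ks_p_value(sample1, sample2) < alpha:
            rejected += 1
    return rejected / replicas


def gaussian(rng):
    return rng.gauss(0.0, 1.0)


def shifted_gaussian(rng):
    return rng.gauss(1.0, 1.0)


if __name__ == "__main__":
    print(estimate_power(gaussian, shifted_gaussian, n=50, replicas=2000))
```

Drawing both samples from the same parent turns the same loop into an estimate of the rejection rate under the null hypothesis, which should stay close to the chosen significance level.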
7
Is there any recipe to identify the best test to use?
Analysis cases:
- Data samples drawn from different parent distributions
- Data samples drawn from the same parent distribution, applying a scale factor or a shift
Use cases in experimental physics: signal over background; “hot channel”, dead channel, etc.
- Power analysis on a set of reference mathematical distributions
- Power analysis on some typical physics applications
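The “same parent distribution” cases and the detector use cases can be mimicked by perturbing otherwise identical inputs. A minimal sketch, assuming that the second sample is scaled and then shifted, and that a dead (hot) channel is modelled as one histogram bin multiplied by zero (by a factor above one); both modelling choices and the function names are our assumptions, not taken from the study.

```python
import random


def same_parent_pair(draw, n, shift=0.0, scale=1.0, seed=0):
    """Two samples from one parent; the second one is scaled, then shifted."""
    rng = random.Random(seed)
    reference = [draw(rng) for _ in range(n)]
    modified = [scale * draw(rng) + shift for _ in range(n)]
    return reference, modified


def perturb_channel(counts, channel, factor):
    """Copy of a histogram with one channel multiplied by `factor`.

    factor = 0 mimics a dead channel, factor > 1 a hot channel.
    """
    perturbed = list(counts)  # leave the input histogram untouched
    perturbed[channel] = perturbed[channel] * factor
    return perturbed


if __name__ == "__main__":
    ref, mod = same_parent_pair(lambda rng: rng.gauss(0.0, 1.0), 100, shift=0.5)
    print(sum(ref) / len(ref), sum(mod) / len(mod))
    print(perturb_channel([12, 15, 9, 14], channel=2, factor=0))
```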
8
Parent reference distributions
Uniform, Gaussian, Double Exponential, Cauchy, Exponential, Contaminated Normal Distribution 2 (CN2), Contaminated Normal Distribution 1 (CN1), Exponential left tailed, Pareto (α = 1.0, 2.0, 3.0, 4.0)
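Sampling from most of these reference distributions is straightforward with Python's `random` module. The parameterisations below (location zero, unit scale, Pareto with minimum 1 via `random.paretovariate`) are assumptions, since the slides do not state them. The contaminated normal distributions and the left-tailed exponential are left out because their parameters are not given here; the dictionary name is hypothetical.

```python
import math
import random


def double_exponential(rng):
    """Laplace(0, 1): an exponential magnitude with a random sign."""
    magnitude = rng.expovariate(1.0)
    return magnitude if rng.random() < 0.5 else -magnitude


def cauchy(rng):
    """Standard Cauchy by inverse-CDF sampling: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def pareto(alpha):
    """Sampler for a Pareto distribution with shape alpha and minimum 1 (assumed)."""
    def draw(rng):
        return rng.paretovariate(alpha)
    return draw


# Each sampler takes a random.Random instance and returns one value.
REFERENCE_SAMPLERS = {
    "Uniform": lambda rng: rng.uniform(0.0, 1.0),
    "Gaussian": lambda rng: rng.gauss(0.0, 1.0),
    "Double Exponential": double_exponential,
    "Cauchy": cauchy,
    "Exponential": lambda rng: rng.expovariate(1.0),
    **{f"Pareto {a}": pareto(a) for a in (1, 2, 3, 4)},
}

if __name__ == "__main__":
    rng = random.Random(42)
    for name, draw in REFERENCE_SAMPLERS.items():
        print(name, [round(draw(rng), 3) for _ in range(3)])
```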
9
Parent distribution                 Skewness  Tailweight
f12(x)  Pareto 4                    0.037     1.647
f11(x)  Pareto 3                    0.076     1.488
f10(x)  Pareto 2                    0.151     1.351
f9(x)   Pareto 1                    0.294     1.245
f1(x)   Uniform                     1.000     1.267
f2(x)   Gaussian                    1.000     1.704
f6(x)   Contaminated Normal 1       1.000     1.991
f3(x)   Double Exponential          1.000     2.161
f4(x)   Cauchy                      1.000     5.263
f7(x)   Contaminated Normal 2       1.769     1.693
f5(x)   Exponential                 4.486     1.883
f8(x)   Exponential left tailed     6.050     1.501
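The table does not say how skewness and tailweight are measured. One quantile-based choice, assumed here, is S = (x_0.975 − x_0.5)/(x_0.5 − x_0.025) and T = (x_0.975 − x_0.025)/(x_0.875 − x_0.125), where x_p is the p-quantile of the distribution. With standard unit-scale forms (also an assumption) this reproduces the listed values for the uniform, Gaussian, double exponential, Cauchy and exponential rows. The Pareto and contaminated normal rows depend on parameterisations the slides do not give, so they are not computed.

```python
import math
from statistics import NormalDist


def skewness(q):
    """Quantile-based skewness; equals 1 for any symmetric distribution."""
    return (q(0.975) - q(0.5)) / (q(0.5) - q(0.025))


def tailweight(q):
    """Quantile-based tailweight: outer quantile range over inner quantile range."""
    return (q(0.975) - q(0.025)) / (q(0.875) - q(0.125))


# Quantile functions (inverse CDFs) of the assumed standard forms
QUANTILES = {
    "Uniform": lambda p: p,
    "Gaussian": NormalDist().inv_cdf,
    "Double Exponential": lambda p: math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p)),
    "Cauchy": lambda p: math.tan(math.pi * (p - 0.5)),
    "Exponential": lambda p: -math.log(1 - p),
}

if __name__ == "__main__":
    for name, q in QUANTILES.items():
        print(f"{name:20s} {skewness(q):.3f} {tailweight(q):.3f}")
```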
10
Unbinned distributions: comparing different distributions (Parent1 ≠ Parent2)
11
The power increases as a function of the sample size
[Figure: empirical power (%) as a function of sample size for the unbinned tests AD, CvM, K, KS, W, WCvM, WKSAD and WKSB, in four comparisons: Gaussian vs CN2, Pareto1 vs Pareto2, CN1 vs CN2, and Exponential left tailed vs Pareto1; the panels are annotated with the types of distributions compared (symmetric vs skewed, skewed vs skewed, short tailed vs medium tailed)]
12
The power varies as a function of the parent distributions’ characteristics
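[Figure: empirical power (%) vs. the skewness difference S1 − S2 (sample size = 50, Exponential vs Pareto) and vs. the tailweight difference T1 − T2 (sample size = 15, flat vs other distributions)]
Correlation coefficients of the power with S1 − S2, T1 − T2 and the sample size N: 0.409, 0.091 and 0.181 respectively (p < 0.0001)
General recipe (p < 0.0001)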
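The slide reports correlation coefficients between the power and the skewness difference, the tailweight difference and the sample size, without naming the coefficient. Assuming the Pearson coefficient, it and the usual t statistic for testing zero correlation (Student t with n − 2 degrees of freedom) can be computed as follows; the function names are hypothetical, and `t_statistic` requires |r| < 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), Student t with n - 2 dof if rho = 0."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    sample_sizes = [10, 20, 50, 100, 200]
    powers = [0.21, 0.35, 0.62, 0.80, 0.95]  # made-up illustrative values
    r = pearson_r(sample_sizes, powers)
    print(r, t_statistic(r, len(powers)))
```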
13
Quantitative evaluation of GoF tests power
We propose an alternative, quantitative method to evaluate the power of the various GoF tests: a linear multiple regression of the power that includes the characterisation (skewness and tailweight) of both parent distributions and the sample size (p < 0.0001).
Standardised coefficients analysis: [Figure: standardised regression coefficients]
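A linear multiple regression with standardised coefficients can be sketched without external libraries: convert every predictor and the response to z-scores, then solve the normal equations; no intercept is needed because all standardised variables have zero mean. The choice of predictors (for instance the skewness and tailweight of each parent and the sample size) and all names here are assumptions for illustration, not the authors' code.

```python
import math


def zscores(values):
    """Standardise a list: subtract the mean, divide by the sample std deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardized_coefficients(predictors, response):
    """predictors: one list per explanatory variable; response: list of powers."""
    cols = [zscores(col) for col in predictors]
    y = zscores(response)
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], y)) for i in range(k)]
    return solve(xtx, xty)
```

Because every variable is expressed in units of its own standard deviation, the magnitudes of the returned coefficients can be compared directly to rank the predictors.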
14
Binned distributions: comparing different distributions (Parent1 ≠ Parent2)
15
Preliminary results
Sample size = 1000; number of bins = 20
[Table: empirical power (%) of the χ2 and CvM tests for pairwise comparisons among the Gaussian, Double Exponential, Cauchy, CN1 and CN2 parent distributions; values listed in the order they appear on the slide]
χ2 = 38.91 ± 0.49, CvM = 92.9 ± 0.26
χ2 = 98.67 ± 0.12, CvM = 100.0 ± 0.0
χ2 = 50.32 ± 0.50, CvM = 99.79 ± 0.05
χ2 = 100.0 ± 0.0
χ2 = 77.72 ± 0.42, CvM = 99.98 ± 0.02
χ2 = 65.04 ± 0.48, CvM = 79.55 ± 0.40
χ2 = 33.23 ± 0.47, CvM = 88.57 ± 0.32
χ2 = 92.83 ± 0.26, CvM = 99.97 ± 0.02
χ2 = 99.95 ± 0.02
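For binned data, the two-sample chi-squared statistic compares two histograms with identical binning. The sketch below uses the textbook form that allows different totals R and S, χ² = Σ (√(S/R)·R_i − √(R/S)·S_i)² / (R_i + S_i), which is algebraically the Pearson chi-squared test of homogeneity with (non-empty bins − 1) degrees of freedom. It is an illustration, not the toolkit's implementation; the function name is hypothetical and the p-value step is omitted.

```python
import math


def chi2_two_histograms(r, s):
    """Two-sample chi-squared statistic for histograms with identical binning.

    Returns (chi2, dof); bins that are empty in both histograms are skipped.
    """
    if len(r) != len(s):
        raise ValueError("histograms must have the same number of bins")
    total_r, total_s = sum(r), sum(s)
    if total_r <= 0 or total_s <= 0:
        raise ValueError("both histograms need a positive total")
    # Scale factors that make histograms with different totals comparable
    a, b = math.sqrt(total_s / total_r), math.sqrt(total_r / total_s)
    chi2 = 0.0
    used_bins = 0
    for ri, si in zip(r, s):
        if ri + si == 0:
            continue
        used_bins += 1
        chi2 += (a * ri - b * si) ** 2 / (ri + si)
    return chi2, used_bins - 1


if __name__ == "__main__":
    print(chi2_two_histograms([12, 30, 41, 17], [25, 58, 90, 27]))
```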
16
Physics use case
17
Conclusions
No clear winner for all the considered distributions: in general, the performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared.
Practical recommendations:
- first classify the type of the distributions in terms of skewness and tailweight;
- then choose the most appropriate test for that type of distributions, evaluating the best test by means of the proposed quantitative model.
A systematic study of the power is in progress for both binned and unbinned distributions.
The topic is still the subject of research activity in the domain of statistics.
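The first practical recommendation, classifying the distributions by skewness and tailweight, has to be applied to data samples in practice. A sketch, assuming the quantile-based measures S = (x_0.975 − x_0.5)/(x_0.5 − x_0.025) and T = (x_0.975 − x_0.025)/(x_0.875 − x_0.125) estimated with linear-interpolation sample quantiles; the measures, the quantile definition and the function names are our assumptions, and no classification thresholds are implied.

```python
import math
import random


def empirical_quantile(sorted_xs, p):
    """Sample p-quantile by linear interpolation between order statistics."""
    h = (len(sorted_xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])


def skewness_tailweight(sample):
    """Estimate the quantile-based skewness S and tailweight T of a sample."""
    xs = sorted(sample)

    def q(p):
        return empirical_quantile(xs, p)

    s = (q(0.975) - q(0.5)) / (q(0.5) - q(0.025))
    t = (q(0.975) - q(0.025)) / (q(0.875) - q(0.125))
    return s, t


if __name__ == "__main__":
    rng = random.Random(3)
    data = [rng.expovariate(1.0) for _ in range(5000)]
    print(skewness_tailweight(data))
```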