Comparison of data distributions: the power of Goodness-of-Fit Tests


Comparison of data distributions: the power of Goodness-of-Fit Tests
B. Mascialino (1), A. Pfeiffer (2), M.G. Pia (1), A. Ribon (2), P. Viarengo (3)
(1) INFN Genova, Italy
(2) CERN, Geneva, Switzerland
(3) IST – National Institute for Cancer Research, Genova, Italy
IEEE NSS 2006, San Diego, October 29 - November 5, 2006

Goodness of Fit testing
Goodness-of-fit testing is the mathematical foundation for the comparison of data distributions.
Use cases in experimental physics:
- Regression testing: throughout the software life-cycle
- Online DAQ: monitoring detector behaviour w.r.t. a reference
- Simulation validation: comparison with experimental data
- Reconstruction: comparison of reconstructed vs. expected distributions
- Physics analysis: comparisons of experimental distributions; comparison with theoretical distributions
One-sample problem: a sample is compared with a theoretical distribution.
Two-sample problem: sample 1 is compared with sample 2.

“A Goodness-of-Fit Statistical Toolkit”
- G.A.P. Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “A Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science (2004), 51 (5): 2056-2063.
- B. Mascialino, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “New developments of the Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science (2006), 53 (6), to be published.
http://www.ge.infn.it/statisticaltoolkit/

GoF algorithms in the Statistical Toolkit (two-sample problem)
Binned distributions:
- Anderson-Darling test
- Chi-squared test
- Fisz-Cramer-von Mises test
- Tiku test (Cramer-von Mises test in chi-squared approximation)
Unbinned distributions:
- Anderson-Darling test
- Anderson-Darling approximated test
- Cramer-von Mises test
- Generalised Girone test
- Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
- Kolmogorov-Smirnov test
- Kuiper test
- Tiku test (Cramer-von Mises test in chi-squared approximation)
- Weighted Kolmogorov-Smirnov test
- Weighted Cramer-von Mises test

Power of GoF tests
- Do we really need such a wide collection of GoF tests? Why?
- Which is the most appropriate test to compare two distributions?
- How “good” is a test at recognizing real equivalent distributions and rejecting fake ones?
Which test to use?
- No comprehensive study of the relative power of GoF tests exists in the literature: this is novel research in statistics (not only in physics data analysis!)
- A systematic study of all existing GoF tests is in progress, made possible by the extensive collection of tests in the Statistical Toolkit

Method for the evaluation of power
The power of a test is the probability of correctly rejecting the null hypothesis.
- Pseudo-experiment: a random drawing of two samples from two parent distributions (parent distribution 1 gives sample 1 of size n, parent distribution 2 gives sample 2); the GoF test is applied to the two samples
- N = 10000 Monte Carlo replicas
- Significance level = 0.05 (confidence level CL = 0.95)
- Power = (# pseudo-experiments with p-value < 1 - CL) / (# pseudo-experiments)
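The Monte Carlo procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Statistical Toolkit's implementation: it uses only the two-sample Kolmogorov-Smirnov test with an asymptotic p-value approximation, and the names `ks_two_sample` and `empirical_power`, the Gaussian and exponential parents, and the sample sizes are assumptions chosen for the example.

```python
import math
import random


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D and an approximate p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk through the pooled values and track the largest distance
    # between the two empirical cumulative distribution functions.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution with a small-sample correction
    # (an approximation, adequate for illustration only).
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        return d, 1.0  # the series below is numerically useless here; p is ~1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))


def empirical_power(draw1, draw2, n, m, replicas=10000, alpha=0.05, seed=1):
    """Fraction of pseudo-experiments in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(replicas):
        sample1 = [draw1(rng) for _ in range(n)]
        sample2 = [draw2(rng) for _ in range(m)]
        _, p_value = ks_two_sample(sample1, sample2)
        if p_value < alpha:
            rejected += 1
    return rejected / replicas


def gaussian(rng):
    return rng.gauss(0.0, 1.0)


def exponential(rng):
    return rng.expovariate(1.0)


# Hypothetical example: Gaussian vs exponential parents, 1000 replicas.
print(empirical_power(gaussian, exponential, n=50, m=50, replicas=1000))
```

When both samples are drawn from the same parent, the same function estimates the rate of false rejections, which should stay close to the chosen significance level.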

Analysis cases
Is there any recipe to identify the best test to use?
Power analysis on a set of reference mathematical distributions:
- Data samples drawn from different parent distributions
- Data samples drawn from the same parent distribution
- Applying a scale factor
- Applying a shift
Power analysis on some typical physics applications (use cases in experimental physics):
- Signal over background
- “Hot channel”, dead channel etc.

Parent reference distributions
- Uniform
- Gaussian
- Double Exponential
- Cauchy
- Exponential
- Contaminated Normal Distribution 1
- Contaminated Normal Distribution 2
- Exponential Left Tailed
- Pareto (α = 1.0, α = 2.0, α = 3.0, α = 4.0)

Parent distribution                Skewness   Tailweight
f12(x) Pareto 4                    0.037      1.647
f11(x) Pareto 3                    0.076      1.488
f10(x) Pareto 2                    0.151      1.351
f9(x)  Pareto 1                    0.294      1.245
f1(x)  Uniform                     1.000      1.267
f2(x)  Gaussian                               1.704
f6(x)  Contaminated Normal 1                  1.991
f3(x)  Double Exponential                     2.161
f4(x)  Cauchy                                 5.263
f7(x)  Contaminated Normal 2       1.769      1.693
f5(x)  Exponential                 4.486      1.883
f8(x)  Exponential left tailed     6.050      1.501
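The slide does not say how skewness and tailweight are defined. As an assumption, the quantile-based ratios S = (x_0.975 - x_0.5) / (x_0.5 - x_0.025) and T = (x_0.975 - x_0.025) / (x_0.875 - x_0.125) reproduce the tabulated values for the Uniform, Gaussian, Double Exponential, Cauchy and Exponential parents, so they are used in the sketch below. The function names and the `QUANTILES` dictionary are illustrative, and the Pareto and Contaminated Normal rows are left out because their parametrisation is not given in the slides.

```python
import math
from statistics import NormalDist


def skewness(quantile):
    """Assumed definition: upper half-range over lower half-range."""
    return ((quantile(0.975) - quantile(0.5))
            / (quantile(0.5) - quantile(0.025)))


def tailweight(quantile):
    """Assumed definition: 95% range over 75% range."""
    return ((quantile(0.975) - quantile(0.025))
            / (quantile(0.875) - quantile(0.125)))


def laplace_quantile(p):
    # Quantile function of the standard double exponential distribution
    return math.log(2.0 * p) if p < 0.5 else -math.log(2.0 * (1.0 - p))


# Quantile functions of the standard form of each parent distribution
QUANTILES = {
    "Uniform": lambda p: p,
    "Gaussian": NormalDist().inv_cdf,
    "Double Exponential": laplace_quantile,
    "Cauchy": lambda p: math.tan(math.pi * (p - 0.5)),
    "Exponential": lambda p: -math.log(1.0 - p),
}

for name, q in QUANTILES.items():
    print(f"{name:20s} S = {skewness(q):.3f}  T = {tailweight(q):.3f}")
```

Both ratios are invariant under shifts and positive rescalings of the distribution, which is why the standard form of each parent is enough.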

Compare different distributions (Parent1 ≠ Parent2): unbinned distributions

The power increases as a function of the sample size
[Figure: empirical power (%) as a function of the sample size for the KS, K, CvM, WCvM, AD, W, WKSB and WKSAD tests. Panels: GAUSSIAN vs CN2; PARETO1 vs PARETO2; CN1 vs CN2; EXPONENTIAL LEFT TAILED vs PARETO1.]

The power varies as a function of the parent distributions’ characteristics
[Figure: empirical power (%) versus S1 – S2 and versus T1 – T2. Panels: EXPONENTIAL vs PARETO (samples size = 50); FLAT vs OTHER DISTRIBUTIONS (samples size = 15).]
General recipe: power correlation coefficients for S1 – S2, T1 – T2 and N: 0.409, 0.091, 0.181 (p < 0.0001)

Quantitative evaluation of GoF tests power
We propose an alternative quantitative method to evaluate the power of various GoF tests.
Linear multiple regression (p < 0.0001):
- includes both parent distributions’ characterisation
- includes the samples size
Standardised coefficients analysis
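A standardised-coefficient analysis of a multiple linear regression can be sketched as follows. This is only an illustration of the general technique: the slides do not give the authors' model, so the choice of predictors (differences in skewness and tailweight, and the sample size), the helper names, and every number in the example are hypothetical placeholders, not results of the study.

```python
import math


def standardise(values):
    """Return z-scores using the sample standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def standardised_coefficients(predictors, response):
    """Least-squares coefficients after z-scoring every variable.

    No intercept is needed because all standardised variables have mean zero.
    """
    z = [standardise(column) for column in predictors]
    zy = standardise(response)
    normal = [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]
    rhs = [sum(a * b for a, b in zip(zi, zy)) for zi in z]
    return solve(normal, rhs)


# Hypothetical placeholder data (NOT from the study).
delta_s = [0.0, 0.5, 1.0, 3.5, 0.0, 0.5, 1.0, 3.5]   # skewness difference
delta_t = [0.4, 0.1, 0.9, 0.2, 0.3, 0.8, 0.0, 0.6]   # tailweight difference
size = [15.0, 15.0, 15.0, 15.0, 50.0, 50.0, 50.0, 50.0]
power = [12.0, 30.0, 45.0, 90.0, 35.0, 70.0, 88.0, 100.0]

betas = standardised_coefficients([delta_s, delta_t, size], power)
for name, beta in zip(("S1 - S2", "T1 - T2", "N"), betas):
    print(f"{name:8s} standardised coefficient = {beta:.3f}")
```

Because every variable is expressed in units of its own standard deviation, the coefficients can be compared with each other to see which predictor the fitted power depends on most strongly.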

Compare different distributions (Parent1 ≠ Parent2): binned distributions

Preliminary results
Sample size = 1000, number of bins = 20
Power (%) of the chi-squared (χ2) and Cramer-von Mises (CvM) tests for pairs of parent distributions (GAUSSIAN, DOUBLE EXPONENTIAL, CAUCHY, CN1 compared with DOUBLE EXPONENTIAL, CAUCHY, CN1, CN2). Values in the order in which they appear on the slide:
- χ2 = (38.91 ± 0.49), CvM = (92.9 ± 0.26)
- χ2 = (98.67 ± 0.12), CvM = (100.0 ± 0.0)
- χ2 = (50.32 ± 0.50), CvM = (99.79 ± 0.05)
- χ2 = (100.0 ± 0.0)
- χ2 = (77.72 ± 0.42), CvM = (99.98 ± 0.02)
- χ2 = (65.04 ± 0.48), CvM = (79.55 ± 0.40)
- χ2 = (33.23 ± 0.47), CvM = (88.57 ± 0.32)
- χ2 = (92.83 ± 0.26), CvM = (99.97 ± 0.02)
- χ2 = (99.95 ± 0.02)
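The slides do not state how the ± values were obtained. One plausible reading, offered here as an assumption, is the binomial standard error of a proportion estimated from N = 10000 independent pseudo-experiments; most of the quoted uncertainties are of that size. The function name `power_std_error` is illustrative.

```python
import math


def power_std_error(power_percent, replicas=10000):
    """Binomial standard error (in %) of an empirical power given in %."""
    p = power_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / replicas)


# Three of the chi-squared powers quoted on the slide
for value in (38.91, 50.32, 65.04):
    print(f"{value:.2f} +/- {power_std_error(value):.2f}")
```

The error is largest near 50% power and shrinks towards 0% and 100%, which matches the pattern of the quoted values.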

Physics use case

Conclusions
- No clear winner for all the considered distributions: in general, the performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared
- Practical recommendations: first classify the type of the distributions in terms of skewness and tailweight, then choose the most appropriate test given the type of distributions, evaluating the best test by means of the quantitative model proposed
- Systematic study of the power in progress for both binned and unbinned distributions
- Topic still subject to research activity in the domain of statistics