

Barbara Mascialino, IEEE-NSS, October 21st, 2004

Application of statistical methods for the comparison of data distributions

Susanna Guatelli, Barbara Mascialino, Andreas Pfeiffer, Maria Grazia Pia, Alberto Ribon, Paolo Viarengo

Outline

The comparison of two data distributions is fundamental in experimental practice:
- Detector monitoring (current versus reference data)
- Simulation validation (experiment versus simulation)
- Reconstruction versus expectation
- Regression testing (two versions of the same software)
- Physics analysis (measurement versus theory, experiment A versus experiment B)

Many algorithms are available for the comparison of two data distributions (the two-sample problem):
- Parametric statistics
- Non-parametric statistics (Goodness-of-Fit testing)

Aim of this study: compare the algorithms available in the statistics literature to select the most appropriate one in every specific case.

The two-sample problem

EXAMPLE 1: binned data
EXAMPLE 2: unbinned data

Figure panels: X-ray fluorescence spectrum; dosimetric distribution from a medical LINAC

Which is the most suitable goodness-of-fit test?

Chi-squared test

- Applies to binned distributions
- It can also be useful for unbinned distributions, but the data must first be grouped into classes
- Cannot be applied if the theoretical frequency in any class is < 5
  - When this is the case, one could try to merge contiguous classes until the minimum theoretical frequency is reached
  - Otherwise one could use Yates' formula
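The class-merging rule above can be sketched as follows. This is a minimal illustration, not the GoF Toolkit's implementation: the function names `merge_small_classes` and `chi_squared`, the left-to-right merging order, and the handling of a leftover tail are all assumptions made for the example.

```python
def merge_small_classes(observed, expected, min_expected=5.0):
    """Merge contiguous classes until each merged class has an
    expected (theoretical) frequency of at least min_expected.

    Assumption: classes are scanned left to right; a leftover tail that
    stays below the threshold is folded into the last merged class.
    """
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_exp > 0:  # leftover classes at the right edge
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


def chi_squared(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, chosen only to show the merging step.
observed = [1, 3, 10, 12, 4, 2]
expected = [2.0, 4.0, 9.0, 11.0, 4.0, 2.0]
obs_m, exp_m = merge_small_classes(observed, expected)
statistic = chi_squared(obs_m, exp_m)
```

Merging changes the number of classes, so the degrees of freedom used to turn the statistic into a p-value must be counted on the merged classes.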

Tests based on the supremum statistics

Applicable to unbinned distributions:
- Kolmogorov-Smirnov test
- Goodman approximation of KS test
- Kuiper test

Scheme: original distributions -> empirical distribution function -> supremum statistics (D_mn)
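As a rough illustration of a supremum statistic, the sketch below computes the two-sample Kolmogorov-Smirnov distance (the largest absolute difference between the two empirical distribution functions) and the Kuiper statistic (the largest positive deviation plus the largest negative deviation). The function names are hypothetical and this is not the GoF Toolkit's code; it only returns the statistics, not p-values.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def supremum_statistics(sample1, sample2):
    """Return (KS distance, Kuiper statistic) for two unbinned samples."""
    s1, s2 = sorted(sample1), sorted(sample2)
    d_plus = d_minus = 0.0
    # Both EDFs are step functions, so the extreme differences are
    # attained at one of the pooled observations.
    for x in s1 + s2:
        diff = ecdf(s1, x) - ecdf(s2, x)
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    ks = max(d_plus, d_minus)
    kuiper = d_plus + d_minus
    return ks, kuiper


ks, kuiper = supremum_statistics([1, 2, 3, 4], [3, 4, 5, 6])
```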

Tests containing a weighting function

Applicable to binned/unbinned distributions:
- Fisz-Cramér-von Mises test
- k-sample Anderson-Darling test

Scheme: original distributions -> empirical distribution function -> quadratic statistics + weighting function (sum/integral of all the distances)
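A quadratic statistic sums the squared distances between the two empirical distribution functions over all pooled observations, optionally multiplied by a weighting function. The sketch below uses the textbook two-sample forms: weight 1 for Cramér-von Mises, and weight 1/[H(1-H)] for Anderson-Darling, where H is the empirical distribution function of the pooled sample. It assumes unbinned data without ties, the function names are hypothetical, and it is not the GoF Toolkit's implementation.

```python
from bisect import bisect_right


def quadratic_statistic(sample1, sample2, weight):
    """(m*n / N^2) * sum over pooled points of weight(H) * (F - G)^2.

    Assumption: no tied observations. The largest pooled point is
    skipped because F - G = 0 there (and H = 1 would make the
    Anderson-Darling weight diverge).
    """
    s1, s2 = sorted(sample1), sorted(sample2)
    m, n = len(s1), len(s2)
    pooled = sorted(s1 + s2)
    total = 0.0
    for x in pooled[:-1]:
        f = bisect_right(s1, x) / m             # EDF of sample 1
        g = bisect_right(s2, x) / n             # EDF of sample 2
        h = bisect_right(pooled, x) / (m + n)   # EDF of pooled sample
        total += weight(h) * (f - g) ** 2
    return m * n / (m + n) ** 2 * total


def cramer_von_mises(sample1, sample2):
    """Unweighted quadratic statistic."""
    return quadratic_statistic(sample1, sample2, lambda h: 1.0)


def anderson_darling(sample1, sample2):
    """Quadratic statistic with extra weight on the tails."""
    return quadratic_statistic(sample1, sample2,
                               lambda h: 1.0 / (h * (1.0 - h)))


cvm = cramer_von_mises([1, 3], [2, 4])
ad = anderson_darling([1, 3], [2, 4])
```

The Anderson-Darling weight grows as H approaches 0 or 1, which is why this family is more sensitive to differences in the tails.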

G.A.P. Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “A Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science (2004), 51 (5): October issue.

Power evaluation

The power of a test is the probability of correctly rejecting the null hypothesis, i.e. of rejecting it when it is false.

Pseudoexperiment: a random drawing of two samples (sample 1 of size n, sample 2 of size m) from two parent distributions (parent distribution 1 and parent distribution 2), followed by a GoF test on the two samples.

- N = 1000 Monte Carlo replications
- Significance level: 1 - CL = 0.05 (the slide quotes 0.05 as the confidence level)
- For each test, the p-value computed by the GoF Toolkit derives from the analytical calculation of the asymptotic distribution, often depending on the sample sizes.

Power = (# pseudoexperiments with p-value < (1 - CL)) / (# pseudoexperiments)
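The pseudoexperiment loop can be sketched as below for the Kolmogorov-Smirnov test. This is a minimal illustration under stated assumptions, not the GoF Toolkit's code: the p-value uses the standard asymptotic Kolmogorov series, and the names `estimate_power`, `draw1`, `draw2`, the seed, and the Gaussian parents in the usage lines are all hypothetical choices.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(sample1, sample2):
    """Largest absolute distance between the two empirical distribution functions."""
    s1, s2 = sorted(sample1), sorted(sample2)
    return max(abs(bisect_right(s1, x) / len(s1) - bisect_right(s2, x) / len(s2))
               for x in s1 + s2)


def ks_asymptotic_p_value(d, m, n, terms=100):
    """Asymptotic p-value: 2 * sum_j (-1)^(j-1) * exp(-2 j^2 lambda^2),
    with lambda = d * sqrt(m*n / (m+n))."""
    lam = d * math.sqrt(m * n / (m + n))
    if lam < 0.2:  # the p-value is 1 to many decimal places here
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def estimate_power(draw1, draw2, m, n, alpha=0.05, replications=1000, seed=0):
    """Fraction of pseudoexperiments whose p-value falls below alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(replications):
        sample1 = [draw1(rng) for _ in range(m)]
        sample2 = [draw2(rng) for _ in range(n)]
        d = ks_statistic(sample1, sample2)
        if ks_asymptotic_p_value(d, m, n) < alpha:
            rejections += 1
    return rejections / replications


def standard_normal(rng):
    return rng.gauss(0.0, 1.0)


def shifted_normal(rng):
    return rng.gauss(3.0, 1.0)


# Hypothetical usage: two Gaussian parents that differ only in location.
power = estimate_power(standard_normal, shifted_normal, m=30, n=30,
                       replications=200)
```

When both samples come from the same parent, the same function estimates the rate of false rejections, which should stay near `alpha` or below.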

Parent distributions

- Uniform
- Gaussian
- Double exponential
- Cauchy
- Exponential
- Contaminated Normal Distribution 2
- Contaminated Normal Distribution 1

Skewness and tailweight

Parent                        S (skewness)    T (tailweight)
f1(x)  Uniform
f2(x)  Gaussian
f3(x)  Double exponential
f4(x)  Cauchy
f5(x)  Exponential
f6(x)  Contaminated normal
f7(x)  Contaminated normal
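The transcript does not preserve how S and T are defined. The sketch below assumes quantile-based definitions, S = (x_0.975 - x_0.5) / (x_0.5 - x_0.025) and T = (x_0.975 - x_0.025) / (x_0.875 - x_0.125); this is an assumption, supported only by the fact that it gives S = 1 for symmetric distributions and T = 2.161 for the double exponential, the values quoted in the slides. The helper names are hypothetical.

```python
import math


def quantile(sorted_data, p):
    """Linearly interpolated quantile of already-sorted data, 0 <= p <= 1."""
    position = p * (len(sorted_data) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_data) - 1)
    fraction = position - lower
    return sorted_data[lower] + fraction * (sorted_data[upper] - sorted_data[lower])


def skewness_and_tailweight(sample):
    """Return (S, T) under the assumed quantile-based definitions."""
    data = sorted(sample)
    q025, q125, q500, q875, q975 = (
        quantile(data, p) for p in (0.025, 0.125, 0.5, 0.875, 0.975)
    )
    s = (q975 - q500) / (q500 - q025)   # 1 for a symmetric distribution
    t = (q975 - q025) / (q875 - q125)   # grows with heavier tails
    return s, t


def laplace_quantile(p):
    """Quantile function of the standard double exponential (Laplace)."""
    return math.log(2.0 * p) if p < 0.5 else -math.log(2.0 * (1.0 - p))


# A deterministic grid of double exponential quantiles stands in for a sample.
grid = [laplace_quantile((i + 0.5) / 20001) for i in range(20001)]
s, t = skewness_and_tailweight(grid)
```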

The “location-scale problem”

Case Parent 1 = Parent 2

Plot: power of the Kolmogorov-Smirnov test (CL = 0.05) versus sample size N, for small sized and moderate sized samples. Curves: Uniform, Normal, Exponential, Double Exponential, Contaminated Normal 1, Contaminated Normal 2, Cauchy.

Power increases as a function of the sample size (analytical calculation of the asymptotic distribution).

The “general shape problem”

Case Parent 1 ≠ Parent 2

A) Symmetric versus symmetric (S1 = S2 = 1)

Plot: power versus tailweight T2 of distribution 2, CL = 0.05, for the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling tests. Distribution 1: double exponential (T1 = 2.161).

B) Skewed versus symmetric

Distribution 1 - Distribution 2      KS        CVM       AD
CN2 - Normal                         55.6±     ±         ±1.1
CN2 - CN1                            24.9±     ±         ±1.6
CN2 - Double Exponential             37.6±     ±         ±1.6

For very long tailed distributions: KS ~ CVM < AD
For short-medium tailed distributions: KS ~ CVM ~ AD

Comparative evaluation of tests

                 Tailweight
Skewness         Short (T < 1.5)    Medium (1.5 < T < 2)    Long (T > 2)
S ~ 1            KS                 KS - CVM                CVM - AD
S > 1.5          KS - AD            AD                      CVM - AD

Legend: supremum statistics tests (KS); tests containing a weight function (CVM, AD)
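The table can be read as a simple lookup, sketched below. The slide only labels the rows "S ~ 1" and "S > 1.5" and leaves the boundary values of T unassigned, so the cut at S = 1.5 and the handling of T = 1.5 and T = 2 are assumptions of this example, as are the names `RECOMMENDED` and `recommend_tests`.

```python
# Recommended tests per (skewness class, tailweight class), as read from the table.
RECOMMENDED = {
    ("symmetric", "short"): ("KS",),
    ("symmetric", "medium"): ("KS", "CVM"),
    ("symmetric", "long"): ("CVM", "AD"),
    ("skewed", "short"): ("KS", "AD"),
    ("skewed", "medium"): ("AD",),
    ("skewed", "long"): ("CVM", "AD"),
}


def recommend_tests(s, t):
    """Map skewness S and tailweight T to the tests listed in the table.

    Assumptions: S <= 1.5 is treated as the "S ~ 1" row, and the
    boundary values T = 1.5 and T = 2 are counted as medium tailed.
    """
    skew_class = "skewed" if s > 1.5 else "symmetric"
    if t < 1.5:
        tail_class = "short"
    elif t <= 2.0:
        tail_class = "medium"
    else:
        tail_class = "long"
    return RECOMMENDED[(skew_class, tail_class)]


# Hypothetical usage: a symmetric, long tailed pair of distributions.
choice = recommend_tests(1.0, 2.161)
```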

Results for the data examples

EXAMPLE 1: binned data
Extremely skewed, medium tail
X-variable: Ŝ = 4, T̂ = 1.43
Y-variable: Ŝ = 4, T̂ = 1.50
ANDERSON-DARLING TEST: A² = 0.085, p > 0.05

EXAMPLE 2: unbinned data
Moderately skewed, medium tail
X-variable: Ŝ = 1.53, T̂ = 1.36
Y-variable: Ŝ = 1.27, T̂ = 1.34
KOLMOGOROV-SMIRNOV TEST: D = 0.27, p > 0.05

Conclusions

- Studied several goodness-of-fit tests for location-scale alternatives and general alternatives
- There is no clear winner for all the considered distributions in general
- To select one test in practice:
  1. first classify the type of the distributions in terms of skewness S and tailweight T
  2. choose the most appropriate test for the classified type of distribution
- Topic still subject to research activity in the domain of statistics