Statistical Toolkit: Power of Goodness-of-Fit tests


B. Mascialino (1), A. Pfeiffer (2), M.G. Pia (1), A. Ribon (2), P. Viarengo (3)
(1) INFN Genova, Italy
(2) CERN, Geneva, Switzerland
(3) IST – National Institute for Cancer Research, Genova, Italy
CHEP 2006, Mumbai, 13-17 February 2006
[Figure: fluorescence spectrum of Icelandic basalt, 8.3 keV beam; counts vs. energy (keV)]

Historical background…
Validation of Geant4 physics models through comparison of simulation vs. experimental data or reference databases
The test statistics computation concerns the agreement between the two samples’ empirical distribution functions
Some use cases
  Regression testing: throughout the software life-cycle
  Online DAQ: monitoring detector behaviour w.r.t. a reference
  Simulation validation: comparison with experimental data
  Reconstruction: comparison of reconstructed vs. expected distributions
  Physics analysis: comparisons of experimental distributions (ATLAS vs. CMS Higgs?); comparison with theoretical distributions (data vs. Standard Model)

“A Goodness-of-Fit Statistical Toolkit”
Releases are publicly downloadable from the web (code, documentation etc.)
Releases are also distributed with the LCG Mathematical Libraries
Also ported to Java, distributed with JAS
Reference: G.A.P. Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “A Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science 51 (5), 2056-2063 (2004).

Vision of the project
Basic vision
  General purpose tool
  Toolkit approach (choice open to users)
  Open source product
  Independent from specific analysis tools
  Easily usable in analysis and other tools
Clearly define scope, objectives
Rigorous software process
  Software quality
Flexible, extensible, maintainable system
  Build on a solid architecture

GoF algorithms (latest public release)
Algorithms for binned distributions
  Anderson-Darling test
  Chi-squared test
  Fisz-Cramer-von Mises test
Algorithms for unbinned distributions
  Cramer-von Mises test
  Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
  Kolmogorov-Smirnov test
  Kuiper test
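Several of the unbinned tests listed above measure the distance between the two samples' empirical distribution functions (EDFs) in different ways. The following is a minimal sketch of three such statistics using the textbook two-sample definitions; it is not the toolkit's own code, and all function names are hypothetical.

```python
from bisect import bisect_right


def edf_differences(sample_a, sample_b):
    """Return F_a(x) - F_b(x) at every distinct pooled observation x."""
    a, b = sorted(sample_a), sorted(sample_b)
    diffs = []
    for x in sorted(set(a) | set(b)):
        f_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        diffs.append(f_a - f_b)
    return diffs


def kolmogorov_smirnov(sample_a, sample_b):
    """Largest absolute distance between the two EDFs."""
    return max(abs(d) for d in edf_differences(sample_a, sample_b))


def kuiper(sample_a, sample_b):
    """Sum of the largest positive and the largest negative EDF deviation."""
    diffs = edf_differences(sample_a, sample_b)
    largest_positive = max(max(diffs), 0.0)
    largest_negative = max(max(-d for d in diffs), 0.0)
    return largest_positive + largest_negative


def cramer_von_mises(sample_a, sample_b):
    """Two-sample Cramer-von Mises statistic (sum of squared EDF distances)."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    total = 0.0
    for x in a + b:  # every pooled observation contributes, ties included
        d = bisect_right(a, x) / n - bisect_right(b, x) / m
        total += d * d
    return n * m / (n + m) ** 2 * total


if __name__ == "__main__":
    first, second = [1, 4], [2, 3]
    print(kolmogorov_smirnov(first, second), kuiper(first, second),
          cramer_von_mises(first, second))
```

The Kuiper statistic adds the deviations in both directions, which is why it can exceed the Kolmogorov-Smirnov distance when the EDFs cross.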

Recent extensions: algorithms
Improved tests
  Fisz-Cramer-von Mises test and Anderson-Darling test: exact asymptotic distribution (earlier: critical values)
  Tiku test: Cramer-von Mises test in a chi-squared approximation
New tests
  Weighted Kolmogorov-Smirnov, weighted Cramer-von Mises: various weighting functions available in literature
  Watson test: can be applied in case of cyclic observations, like the Kuiper test
In preparation
  Girone test
Goal: provide all 2-sample GoF algorithms existing in statistics literature
It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools
Publication in preparation to describe the new algorithms
Software release: March 2006

User Layer
Simple user layer
  Shields the user from the complexity of the underlying algorithms and design
  Only deals with the user’s analysis objects and choice of comparison algorithm
First release: user layer for AIDA analysis objects (LCG Architecture Blueprint, Geant4 requirement)
July 2005: added user layer for ROOT histograms, in response to user requirements
Other user layer implementations foreseen
  Easy to add: the sound architecture decouples the mathematical component from the user’s representation of analysis objects

Power of GoF tests
Do we really need such a wide collection of GoF tests? Why?
Which is the most appropriate test to compare two distributions?
  How “good” is a test at recognizing real equivalent distributions and rejecting fake ones?
Which test to use?
No comprehensive study of the relative power of GoF tests exists in literature
  Novel research in statistics (not only in physics data analysis!)
Systematic study of all existing GoF tests in progress
  Made possible by the extensive collection of tests in the Statistical Toolkit

Method for the evaluation of power
Pseudoexperiment: a random drawing of two samples from two parent distributions
  Parent distribution 1: Sample 1 (size n)
  Parent distribution 2: Sample 2 (size m)
  The GoF test is applied to the two samples
N = 1000 Monte Carlo replicas
Confidence level CL = 0.95 (significance level 1 - CL = 0.05)
Power = (# pseudoexperiments with p-value < 1 - CL) / (# pseudoexperiments)
For each test, the p-value computed by the GoF Toolkit derives from the analytical calculation of the asymptotic distribution, often depending on the sample sizes
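The pseudoexperiment procedure described above can be sketched as follows. This is an illustration only, not the toolkit's implementation: it assumes the Kolmogorov-Smirnov test with the standard asymptotic Kolmogorov series for the p-value, takes CL = 0.95 so that the rejection threshold is 1 - CL = 0.05, and uses hypothetical function names and an arbitrary random seed.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute distance between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)


def ks_asymptotic_p_value(d, n, m):
    """Asymptotic p-value: Q(lam) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return 1.0  # the series converges poorly here and Q is numerically 1
    q = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, 101))
    return min(1.0, max(0.0, q))


def estimate_power(draw_1, draw_2, n, m, replicas=1000, cl=0.95, seed=12345):
    """Fraction of pseudoexperiments in which the test rejects (p < 1 - CL)."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(replicas):
        sample_1 = [draw_1(rng) for _ in range(n)]
        sample_2 = [draw_2(rng) for _ in range(m)]
        p_value = ks_asymptotic_p_value(ks_statistic(sample_1, sample_2), n, m)
        if p_value < 1.0 - cl:
            rejected += 1
    return rejected / replicas


def flat(rng):
    """One draw from the uniform distribution on [0, 1)."""
    return rng.random()


def exponential(rng):
    """One draw from the unit-rate exponential distribution."""
    return rng.expovariate(1.0)


power = estimate_power(flat, exponential, n=30, m=30)
false_rejection_rate = estimate_power(flat, flat, n=30, m=30)

if __name__ == "__main__":
    print(f"empirical power, flat vs exponential: {100 * power:.1f}%")
    print(f"rejection rate, flat vs flat: {100 * false_rejection_rate:.1f}%")
```

Running the same estimator with two identical parent distributions gives the false rejection rate, which should stay close to or below 1 - CL.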

Parent distributions
  Uniform
  Gaussian
  Exponential
  Double exponential
  Cauchy
  Contaminated Normal Distribution 1
  Contaminated Normal Distribution 2
Also Breit-Wigner, other distributions being considered

Characterization of distributions
S = skewness, T = tailweight

Parent distribution                S        T
f1(x)  Uniform                     1        1.267
f2(x)  Gaussian                             1.704
f3(x)  Double exponential                   2.161
f4(x)  Cauchy                               5.263
f5(x)  Exponential                 4.486    1.883
f6(x)  Contaminated normal 1                1.991
f7(x)  Contaminated normal 2       1.769    1.693
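The slide does not state how S and T are defined. As an assumption, the sketch below uses quantile-based ratios, S = (x_0.975 - x_0.5) / (x_0.5 - x_0.025) and T = (x_0.975 - x_0.025) / (x_0.875 - x_0.125), which reproduce the tabulated values for the five distributions with closed-form quantile functions to three decimals. The dictionary and function names are hypothetical.

```python
import math
from statistics import NormalDist


def laplace_quantile(p):
    """Quantile function of the standard double exponential (Laplace) distribution."""
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))


# Quantile functions x_p of the standard form of each parent distribution.
QUANTILE_FUNCTIONS = {
    "Uniform": lambda p: p,
    "Gaussian": NormalDist().inv_cdf,
    "Double exponential": laplace_quantile,
    "Cauchy": lambda p: math.tan(math.pi * (p - 0.5)),
    "Exponential": lambda p: -math.log(1 - p),
}


def skewness(quantile):
    """Assumed definition: upper half-range over lower half-range about the median."""
    return ((quantile(0.975) - quantile(0.5))
            / (quantile(0.5) - quantile(0.025)))


def tailweight(quantile):
    """Assumed definition: 95% central range over 75% central range."""
    return ((quantile(0.975) - quantile(0.025))
            / (quantile(0.875) - quantile(0.125)))


if __name__ == "__main__":
    for name, quantile in QUANTILE_FUNCTIONS.items():
        print(f"{name:20s} S = {skewness(quantile):.3f}  T = {tailweight(quantile):.3f}")
```

Both ratios are unchanged by shifting or rescaling the distribution, so they describe shape only.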

General alternative: unbinned distributions
Compare different distributions
Parent1 ≠ Parent2

The power increases as a function of the sample size
No clear winner
[Figure: empirical power (%) vs. sample size for the KS, CvM, AD, K (Kuiper) and W (Watson) tests. Panel: FLAT (symmetric, short tailed) vs EXPONENTIAL (skewed, medium tailed). Other panel labels: DOUBLE EXPONENTIAL, CN1 (symmetric), CN2 (skewed)]

The power increases as a function of the sample size
[Figure: empirical power (%) vs. sample size for the KS, CvM, AD, K and W tests. Panels: GAUSSIAN (symmetric, medium tailed) vs DOUBLE EXPONENTIAL (symmetric, long tailed), very similar distributions; CAUCHY (symmetric, long tailed) vs EXPONENTIAL (asymmetric, medium tailed). Other panel labels: GAUSSIAN, DOUBLE EXPONENTIAL, CN1, CN2 (skewed)]

The power varies as a function of the parent distributions’ characteristics
[Figure: empirical power (%) vs. tailweight of the second distribution for the KS, CvM, AD, K and W tests, at sample sizes 5 and 15. Panels: EXPONENTIAL vs OTHER DISTRIBUTIONS (distribution 1 asymmetric); FLAT vs OTHER DISTRIBUTIONS (distribution 1 symmetric)]

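Comparative evaluation of tests (preliminary)
Tests listed by skewness (S) and tailweight (T) of the distributions
Tailweight classes: short (T < 1.5), medium (1.5 < T < 2), long (T > 2)
S ~ 1:    short: KS;  medium: KS - CvM;  long: CvM - AD
S > 1.5:  KS - AD;  CvM - AD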

Location-scale alternative
Same distribution, shifted or scaled
Parent1(x) = Parent2((x - θ)/τ)
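Reading Parent1 and Parent2 as distribution functions, a sample from Parent1 can be obtained by drawing y from Parent2 and returning θ + τ·y. The sketch below illustrates this for an exponential parent with θ = 0.5 and τ = 1.5, one of the parameter pairs that appears on the following slides; the function name, seed and sample size are arbitrary choices for illustration.

```python
import random
import statistics


def location_scale_sample(draw, theta, tau, size, rng):
    """Draw `size` values x = theta + tau * y, with y drawn from the reference parent."""
    if tau <= 0:
        raise ValueError("the scale parameter tau must be positive")
    return [theta + tau * draw(rng) for _ in range(size)]


def unit_exponential(rng):
    """One draw from the unit-rate exponential distribution."""
    return rng.expovariate(1.0)


rng = random.Random(2006)
shifted = location_scale_sample(unit_exponential, theta=0.5, tau=1.5,
                                size=20000, rng=rng)

if __name__ == "__main__":
    # For a unit exponential parent: mean = theta + tau, standard deviation = tau.
    print(f"mean  = {statistics.fmean(shifted):.3f}")
    print(f"stdev = {statistics.stdev(shifted):.3f}")
```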

Power increases as a function of sample size
No clear winner
[Figure: empirical power (%) vs. sample size for the KS, CvM, AD, K and W tests, EXPONENTIAL parent distribution. Panels: θ = 0.5, τ = 0.5; θ = 0.5, τ = 1.5; θ = 1.0, τ = 0.5; θ = 1.0, τ = 1.5]

Power increases as a function of sample size
No clear winner
[Figure: empirical power (%) vs. sample size for the KS, CvM, AD, K and W tests, CN2 parent distribution. Panels: θ = 0.5, τ = 0.5; θ = 0.5, τ = 1.5; θ = 1.0, τ = 0.5; θ = 1.0, τ = 1.5]

Power decreases as a function of tailweight
No clear winner
[Figure: empirical power (%) vs. tailweight for the KS, CvM, AD, K and W tests, N = 10. Panels: θ = 0.5, τ = 0.5; θ = 0.5, τ = 1.5; θ = 1.0, τ = 0.5; θ = 1.0, τ = 1.5]

General alternative: binned distributions
Compare different distributions
Parent1 ≠ Parent2

Chi-squared test: power (%)
Sample size = 500, number of bins = 20

         Norm    DoubleExp    Cauchy    CN1
CN2      100     99           55        86

Other values in the table (row labels lost): 20, 63, 35, 21, 31, 16
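For binned data, a two-histogram chi-squared comparison can be sketched as below. This uses the textbook statistic for two histograms with possibly different totals, with the number of non-empty bins minus one as degrees of freedom, and a series expansion of the incomplete gamma function for the p-value. It is an assumption about the general technique, not the toolkit's implementation, and the function names are hypothetical.

```python
import math


def chi2_two_histograms(counts_a, counts_b):
    """Return (chi2, degrees of freedom) for two histograms with identical binning."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    scale_a = math.sqrt(total_b / total_a)  # both scales are 1 for equal totals
    scale_b = math.sqrt(total_a / total_b)
    chi2, used_bins = 0.0, 0
    for a, b in zip(counts_a, counts_b):
        if a + b == 0:
            continue  # bins empty in both histograms carry no information
        chi2 += (scale_a * a - scale_b * b) ** 2 / (a + b)
        used_bins += 1
    return chi2, used_bins - 1


def chi2_survival(x, dof):
    """P(chi2 with `dof` degrees of freedom > x); adequate for moderate x."""
    if x <= 0:
        return 1.0
    a, half_x = dof / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x/2).
    term = 1.0 / a
    total = term
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= half_x / (a + k)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


if __name__ == "__main__":
    statistic, dof = chi2_two_histograms([10, 20, 30], [30, 20, 10])
    print(statistic, dof, chi2_survival(statistic, dof))
```

The power of this test could then be estimated by histogramming each pseudoexperiment's two samples and counting how often the p-value falls below 1 - CL.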

Preliminary results
No clear winner for all the considered distributions in general
  The performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared
Practical recommendations
  First classify the type of the distributions in terms of skewness and tailweight
  Choose the most appropriate test given the type of distributions
Systematic study of the power in progress
  For both binned and unbinned distributions
Topic still subject to research activity in the domain of statistics
Publication in preparation

Surprise…
General alternative, same distributions
CL = 95%, expect 5% inefficiency
Anderson-Darling exhibits an unexpected inefficiency at small sample sizes
  Not documented anywhere in literature!
  Limitation of applicability?
[Figure: inefficiency of the KS, CvM, AD, K and W tests. Panels: Flat, Gaussian, Exponential]

Outlook
1-sample GoF tests (comparison w.r.t. a function)
Comparison of two/multi-dimensional distributions
Systematic study of the power of GoF tests
Goal: provide an extensive set of the algorithms published so far in statistics literature, with a critical evaluation of their relative strengths and applicability
Treatment of errors, filtering
New release coming soon
New papers in preparation
Other components beyond GoF?
Suggestions are welcome…

Conclusions
A novel, complete software toolkit for statistical analysis is being developed
  Rich set of algorithms
  Sound architectural design
  Rigorous software process
A systematic study of the power of GoF tests is in progress
  Unexplored area of research
Application in various domains
  Geant4, HEP, space science, medicine…
Feedback and suggestions are very much appreciated
The project is open to developers interested in statistical methods

IEEE Transactions on Nuclear Science
http://ieeexplore. ieee
Prime journal on technology in particle/nuclear physics
Review process reorganized about one year ago
  Associate Editor dedicated to computing papers
Various papers associated with CHEP 2004 published in IEEE TNS
Papers associated with CHEP 2006 are welcome
Manuscript submission: http://tns-ieee.manuscriptcentral.com/
Papers submitted for publication will be subject to the regular review process
Publications in refereed journals are beneficial not only to authors, but to the whole community of computing-oriented physicists
  Our “hardware colleagues” have better established publication habits…
Further info: Maria.Grazia.Pia@cern.ch