A Statistical Toolkit for Data Analysis

A Statistical Toolkit for Data Analysis
G.A.P. Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo
9th Topical Seminar on Innovative Particle and Radiation Detectors, 23-26 May 2004, Siena, Italy

Data analysis in HEP
Provide tools for the statistical comparison of distributions: equivalent reference distributions; experimental measurements; data from reference sources; functions deriving from theoretical calculations or fits.
Example: detector monitoring, in order to check if the behavior is constant in more than one run.

Applications
Validation of Geant4 electromagnetic physics models: attenuation coefficients, CSDA ranges, stopping power, distributions of physics quantities. Quantitative comparisons to experimental data and recognised standard references.
Detector monitoring; simulation validation; reconstruction vs. expectation; regression testing; physics analysis.

Example of Applications I: photon mass attenuation coefficient
[Figure: photon beam (I0) incident on an absorber, transmitted photons (I); results for G4 Standard, G4 LowE and NIST]
Absorber materials: Be, Al, Si, Ge, Fe, Cs, Au, Pb, U

Example of Applications II: electron stopping power and CSDA range
Absorber materials: Be, Al, Si, Ge, Fe, Cs, Au, Pb, U

GoF statistical toolkit
A project to develop a statistical comparison system.
Comparison of distributions: qualitative evaluation and quantitative evaluation.
Goodness-of-fit testing.

Software process guidelines
Unified Software Development Process, specifically tailored to the project: practical guidance and tools from the RUP; both rigorous and lightweight; mapping onto ISO 15504.
Guidance from ISO 15504.
Incremental and iterative life-cycle model with a spiral approach.

Architectural guidelines
The project adopts a solid architectural approach: to offer the functionality and the quality needed by the users; to be maintainable over a long time scale; to be extensible, to accommodate future evolutions of the requirements.
Component-based approach, to facilitate re-use and integration in different frameworks.
AIDA: adopts a (HEP) standard; no dependence on any specific analysis tool.

The algorithms are specialised according to the kind of distribution (binned/unbinned). Every algorithm has been rigorously tested. Documentation is available at: http://www.ge.infn.it/geant4/analysis/HEPstatistics/

Chi-squared test
Applies to binned distributions. It can also be useful for unbinned distributions, but the data must first be grouped into classes.
It cannot be applied if the theoretical frequency in any class is < 5. When this happens, one can merge contiguous classes until the minimum theoretical frequency is reached; otherwise one can use Yates' formula.
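The class-merging rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the toolkit's implementation: the function names `merge_small_classes` and `chi_squared_statistic` are hypothetical, the merge is a simple left-to-right pass, and a trailing group that stays below the threshold is folded into the previous one.

```python
def merge_small_classes(observed, expected, min_expected=5.0):
    """Merge contiguous classes until each merged class has an
    expected (theoretical) frequency of at least min_expected."""
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    # A leftover group below the threshold is folded into the last class.
    if acc_exp > 0.0 or acc_obs > 0.0:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: the first two and the last two classes are too small.
obs, exp = merge_small_classes([3, 2, 10, 12, 4, 1], [2, 3, 10, 10, 4, 3])
chi2 = chi_squared_statistic(obs, exp)
dof = len(obs) - 1  # degrees of freedom when no parameter is fitted
```

Merging reduces the number of classes, so the degrees of freedom must be counted after the merge, not before.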

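More sophisticated algorithms (unbinned distributions)
Kolmogorov-Smirnov test; Goodman approximation of the KS test; Kuiper test.
Supremum statistics: the test statistic Dmn is computed from the empirical distribution functions of the original distributions.
[Figure: original distributions, their empirical distribution functions, and the distance Dmn]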
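As an illustration of a supremum statistic, the two-sample Kolmogorov-Smirnov distance Dmn is the largest absolute difference between the two empirical distribution functions. The sketch below uses only the Python standard library; the function name `ks_two_sample_statistic` is hypothetical and this is not the toolkit's code.

```python
from bisect import bisect_right


def ks_two_sample_statistic(xs, ys):
    """Return D_mn = sup_x |F_m(x) - G_n(x)| for two unbinned samples."""
    xs, ys = sorted(xs), sorted(ys)
    m, n = len(xs), len(ys)
    d = 0.0
    # Both empirical distribution functions are step functions, so the
    # supremum is reached at one of the observed values.
    for v in xs + ys:
        f = bisect_right(xs, v) / m  # fraction of xs that is <= v
        g = bisect_right(ys, v) / n  # fraction of ys that is <= v
        d = max(d, abs(f - g))
    return d


d_mn = ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

Evaluating both functions with `bisect_right` keeps ties between the two samples consistent, at the cost of an O((m+n) log(m+n)) loop instead of a single merge pass.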

More powerful algorithms
Unbinned distributions: Cramer-von Mises test (Tiku test); Anderson-Darling test. These are tests containing a weighting function.
These algorithms are so powerful that we decided to implement their equivalents for binned distributions: Fisz-Cramer-von Mises test (Tiku test); k-sample Anderson-Darling test.
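For comparison with the supremum statistics, here is a minimal sketch of the standard two-sample Cramer-von Mises statistic, T = [mn/(m+n)^2] * sum over all pooled observations of [F_m(z) - G_n(z)]^2. It accumulates the squared difference at every observation instead of keeping only the largest one. The function name is hypothetical, and the conversion of T into a p-value (for instance through the Tiku approximation named on the slide) is not shown.

```python
from bisect import bisect_right


def cramer_von_mises_two_sample(xs, ys):
    """Two-sample Cramer-von Mises statistic for unbinned samples."""
    xs, ys = sorted(xs), sorted(ys)
    m, n = len(xs), len(ys)
    total = 0.0
    # Sum the squared distance between the two empirical distribution
    # functions over every point of the pooled sample.
    for v in xs + ys:
        diff = bisect_right(xs, v) / m - bisect_right(ys, v) / n
        total += diff * diff
    return m * n / (m + n) ** 2 * total


t_stat = cramer_von_mises_two_sample([1, 2], [3, 4])
```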

How to decide the power of an algorithm?
A test is considered powerful if the probability of accepting the null hypothesis when the null hypothesis is false is low.
Expected ordering of power: χ² < supremum statistics tests < tests containing a weight function.
χ² loses information in a test for unbinned distributions by grouping the data into cells (Kac, Kiefer and Wolfowitz (1955) showed that the Kolmogorov-Smirnov test requires n^(4/5) observations compared to n observations for χ² to attain the same power).
Cramer-von Mises and Anderson-Darling statistics are expected to be superior to Kolmogorov-Smirnov's, since they compare the two distributions all along the range of x, rather than looking for a marked difference at one point.
This is now work in progress.
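The definition of power given on this slide can be made concrete with a small Monte Carlo estimate: generate many pairs of samples for which the null hypothesis is known to be false, and count how often the test rejects it. The sketch below does this for the two-sample Kolmogorov-Smirnov test only. Everything in it is an assumption made for illustration: the Gaussian samples, the sample size, the number of trials, and the use of the asymptotic 5% critical value 1.358 * sqrt((m+n)/(mn)). It is not the power study referred to on the slide.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov distance between unbinned samples."""
    xs, ys = sorted(xs), sorted(ys)
    m, n = len(xs), len(ys)
    return max(abs(bisect_right(xs, v) / m - bisect_right(ys, v) / n)
               for v in xs + ys)


def ks_rejects(xs, ys, c_alpha=1.358):
    """Reject H0 at about the 5% level using the asymptotic critical value."""
    m, n = len(xs), len(ys)
    return ks_statistic(xs, ys) > c_alpha * math.sqrt((m + n) / (m * n))


def estimate_rejection_rate(shift, size=50, trials=200, seed=1):
    """Fraction of trials in which N(0,1) and N(shift,1) samples are
    declared different. For shift != 0 this estimates the power; for
    shift == 0 it estimates the type I error rate."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(size)]
        ys = [rng.gauss(shift, 1.0) for _ in range(size)]
        rejections += ks_rejects(xs, ys)
    return rejections / trials


power = estimate_rejection_rate(shift=1.0)   # H0 false
type_one = estimate_rejection_rate(shift=0.0)  # H0 true
```

Replacing `ks_statistic` and the critical value with those of another test would allow the same loop to compare the power of different algorithms on the same simulated data.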

User's point of view
Simple user layer: the user only deals with AIDA objects and the choice of comparison algorithm, and is completely shielded from both statistical and computing complexity.
[Diagram: the user extracts the algorithm by writing one line of code; the toolkit returns the statistical result]

Results and practical applications Collaborations with:

Microscopic validation of physics
Chi-squared test of Geant4 Standard (S) and Geant4 LowE (L) against NIST (N):
χ²(N-S) = 0.267, ν = 28, p = 1; χ²(N-L) = 1.315, ν = 28, p = 1
χ²(N-S) = 0.373, ν = 28, p = 1; χ²(N-L) = 5.882, ν = 28, p = 1
χ²(N-S) = 0.532, ν = 28, p = 1; χ²(N-L) = 1.928, ν = 28, p = 1
Geant4 simulations are statistically comparable with reference data (NIST database, http://www.nist.gov).

X-ray fluorescence spectrum in Iceland basalt (E_IN = 6.5 keV)
Test beam at Bessy for the Bepi-Colombo mission.
[Figure: counts vs. energy (keV)]
Very complex distributions. Chi-squared is not appropriate (< 5 entries in some bins; physical information would be lost if rebinned).
Anderson-Darling test: Ac (95%) = 0.752. Experimental measurements are comparable with Geant4 simulations.
A. Mantero, M. Bavdaz, A. Owens, A. Peacock, M.G. Pia, Simulation of X-ray Fluorescence and Application to Planetary Astrophysics

Medical applications in hadron therapy
Kolmogorov-Smirnov test: D(EXP-GEANT4) = 0.11, p = n.s. (not significant)
Goodman approximation of the Kolmogorov-Smirnov test: χ²(EXP-GEANT4) = 3.8, ν = 2, p = n.s.
Experimental measurements are comparable with Geant4 simulations.
G.A.P. Cirrone, G. Cuttone, S. Donadio, S. Guatelli, S. Lo Nigro, B. Mascialino, M.G. Pia, L. Raffaele, G.M. Sabini, Implementation of a new Monte Carlo Simulation Tool for the Development of a proton Therapy Beam Line and Verification of the Related Dose Distributions
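The Goodman approximation maps the Kolmogorov-Smirnov distance onto a chi-squared variable with 2 degrees of freedom, χ² = 4 D² mn/(m+n), where m and n are the two sample sizes. The slide does not give m and n, so the sketch below takes them as inputs; the function names are hypothetical. For 2 degrees of freedom the chi-squared tail probability has the closed form exp(-χ²/2), which is enough to check the quoted result.

```python
import math


def goodman_chi2(d, m, n):
    """Goodman approximation: chi-squared value (2 degrees of freedom)
    from the two-sample Kolmogorov-Smirnov distance d."""
    return 4.0 * d * d * m * n / (m + n)


def chi2_p_value_2dof(chi2):
    """Upper-tail probability of a chi-squared variable with 2 degrees
    of freedom (closed form, valid for 2 degrees of freedom only)."""
    return math.exp(-chi2 / 2.0)


# Value quoted on the slide: chi-squared = 3.8 with 2 degrees of freedom.
p_value = chi2_p_value_2dof(3.8)
```

The resulting p-value is about 0.15, above the usual 0.05 threshold, which is consistent with the "n.s." reported on the slide.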

Conclusions
This is a new, up-to-date, easy-to-handle and powerful tool for statistical comparison in particle physics. It is the first tool supplying such a variety of sophisticated and powerful statistical tests in HEP.
AIDA interfaces allow its integration in any other data analysis tool.
Applications in: HEP, astrophysics, medical physics.