A Statistical Toolkit for Data Analysis


1 A Statistical Toolkit for Data Analysis
G.A.P.Cirrone, S.Donadio, S.Guatelli, A.Mantero, B.Mascialino, S.Parlati, A.Pfeiffer, M.G.Pia, A.Ribon, P.Viarengo
9th Topical Seminar on Innovative Particle and Radiation Detectors, May 2004, Siena, Italy

2 Data analysis in HEP
Provide tools for the statistical comparison of distributions in terms of:
Equivalent reference distributions
Experimental measurements
Data from reference sources
Functions deriving from theoretical calculations or fits
Detector monitoring, to check whether the behavior is constant over more than one run

3 Applications
Validation of Geant4 electromagnetic physics models: attenuation coefficients, CSDA ranges, stopping power, distributions of physics quantities
Quantitative comparisons to experimental data and recognised standard references
Detector monitoring; simulation validation; reconstruction vs. expectation; regression testing; physics analysis

4 Example of Applications I
Photon mass attenuation coefficient (G4 Standard, G4 LowE, NIST)
Setup: photon beam (I0), absorber, transmitted photons (I)
Absorber materials: Be, Al, Si, Ge, Fe, Cs, Au, Pb, U

5 Example of Applications II
Electron stopping power and CSDA range
Absorber materials: Be, Al, Si, Ge, Fe, Cs, Au, Pb, U

6 GoF statistical toolkit
A project to develop a statistical comparison system
Qualitative evaluation; quantitative evaluation
Comparison of distributions: goodness-of-fit testing

7 Software Process guidelines
Unified Software Development Process, specifically tailored to the project
Practical guidance and tools from the RUP
Both rigorous and lightweight
Mapping onto ISO 15504
Guidance from ISO 15504
Incremental and iterative life-cycle model with a spiral approach

8 Architectural guidelines
The project adopts a solid architectural approach:
to offer the functionality and the quality needed by the users
to be maintainable over a long time scale
to be extensible, to accommodate future evolutions of the requirements
Component-based approach, to facilitate re-use and integration in different frameworks
AIDA: adopt a (HEP) standard; no dependence on any specific analysis tool

9

10 The algorithms are specialised on the kind of distribution (binned/unbinned)
Every algorithm has been rigorously tested
Documentation available:

11 Chi-Squared test
Applies to binned distributions
It can also be useful for unbinned distributions, but the data must first be grouped into classes
It cannot be applied if the theoretical (expected) frequency in any class is < 5
In that case, one can merge contiguous classes until the minimum theoretical frequency is reached
Otherwise one can use the Yates formula
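The class-merging rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the toolkit's implementation: the function names, the left-to-right merging order, and the counts are assumptions made for the example, and all expected counts are assumed to be positive.

```python
def merge_sparse_bins(observed, expected, min_expected=5.0):
    """Merge contiguous bins, left to right, until every merged bin
    has an expected count of at least min_expected."""
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    # A leftover tail that never reached the threshold is folded
    # into the last merged bin (or kept alone if nothing was merged).
    if acc_exp > 0.0 or acc_obs > 0.0:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


def chi_squared(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, for illustration only.
observed = [1, 3, 10, 12, 4, 2]
expected = [2.0, 4.0, 9.0, 11.0, 3.0, 3.0]
obs_m, exp_m = merge_sparse_bins(observed, expected)
statistic = chi_squared(obs_m, exp_m)
print(obs_m, exp_m, statistic)
```

After merging, the number of degrees of freedom has to be computed from the merged classes, not from the original binning.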

12 More sophisticated algorithms
Unbinned distributions: Kolmogorov-Smirnov test; Goodman approximation of KS test; Kuiper test
Supremum statistics (figure: original distributions, their empirical distribution functions, and the distance Dmn)
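As a rough sketch of what a supremum statistic computes, the snippet below evaluates the two-sample Kolmogorov-Smirnov distance Dmn (largest absolute gap between the two empirical distribution functions) and the Kuiper statistic (largest positive gap plus largest negative gap). The function names and sample values are assumptions for illustration and do not come from the toolkit.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Empirical distribution function: fraction of the sample <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_and_kuiper(a, b):
    """Return (D, V) for two unbinned samples.

    D = sup |F_a - F_b|            (Kolmogorov-Smirnov)
    V = sup (F_a - F_b) + sup (F_b - F_a)   (Kuiper)
    Both EDFs are step functions, so it is enough to look at the data points.
    """
    a, b = sorted(a), sorted(b)
    d_plus = d_minus = 0.0
    for x in a + b:
        diff = ecdf(a, x) - ecdf(b, x)
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return max(d_plus, d_minus), d_plus + d_minus


# Hypothetical samples, for illustration only.
d_shift, v_shift = ks_and_kuiper([1, 2, 3, 4], [3, 4, 5, 6])
d_spread, v_spread = ks_and_kuiper([1, 4], [2, 3])
print(d_shift, v_shift, d_spread, v_spread)
```

In the second pair of samples the two EDFs cross, so the Kuiper statistic is larger than the Kolmogorov-Smirnov distance; for a pure shift the two coincide.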

13 More powerful algorithms
Tests containing a weighting function
Unbinned distributions: Cramer-von Mises test (Tiku test); Anderson-Darling test
These algorithms are so powerful that we decided to implement their equivalents for binned distributions
Binned distributions: Fisz-Cramer-von Mises test (Tiku test); k-sample Anderson-Darling test
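To make the role of the weighting function concrete, here is a minimal sketch of the two-sample Cramer-von Mises and Anderson-Darling statistics for unbinned data. It assumes no tied values and uses the standard textbook forms (the same squared EDF difference, summed over the pooled sample, with weight 1 for Cramer-von Mises and weight 1/(H(1-H)) for Anderson-Darling, H being the pooled EDF). The function name is hypothetical, and neither the Tiku approximation nor the binned variants are covered.

```python
def edf_quadratic_statistics(a, b):
    """Return (cramer_von_mises, anderson_darling) for two samples.

    Assumes no ties between or within the samples.
    """
    m, n = len(a), len(b)
    total = m + n
    # Label each value with its sample of origin, then sort the pooled data.
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    count_a = count_b = 0
    cvm_sum = ad_sum = 0.0
    for i, (_, label) in enumerate(pooled, start=1):
        if label == 0:
            count_a += 1
        else:
            count_b += 1
        diff_sq = (count_a / m - count_b / n) ** 2
        cvm_sum += diff_sq                      # weight 1
        if i < total:                           # the weight is undefined at H = 1
            h = i / total                       # pooled EDF at this point
            ad_sum += diff_sq / (h * (1.0 - h)) # weight 1 / (H (1 - H))
    scale = m * n / total ** 2
    return scale * cvm_sum, scale * ad_sum


# Hypothetical samples, for illustration only.
cvm, ad = edf_quadratic_statistics([1, 2], [3, 4])
print(cvm, ad)
```

Unlike a supremum statistic, both sums accumulate the discrepancy along the whole range of the data; the Anderson-Darling weight additionally emphasises the tails.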

14 How to decide the power of an algorithm?
A test is considered powerful if the probability of accepting the null hypothesis when the null hypothesis is wrong is low
χ² < supremum statistics tests < tests containing a weight function
χ² loses information in a test for unbinned distributions by grouping the data into cells (Kac, Kiefer and Wolfowitz (1955) showed that the Kolmogorov-Smirnov test requires n^(4/5) observations compared to n observations for χ² to attain the same power)
Cramer-von Mises and Anderson-Darling statistics are expected to be superior to Kolmogorov-Smirnov’s, since they make a comparison of the two distributions all along the range of x, rather than looking for a marked difference at one point
This is now work in progress
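In symbols, the definition of power used on this slide can be written as follows; the notation (β for the probability of wrongly accepting the null hypothesis H0) is the conventional one and is not taken from the slides.

```latex
\beta = P(\text{accept } H_0 \mid H_0 \text{ is false}),
\qquad
\text{power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})
```

A powerful test is one with small β, that is, power close to 1.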

15 User’s point of view
Simple user layer
Only deal with AIDA objects and choice of comparison algorithm
The user is completely shielded from both statistical and computing complexity
Diagram labels: user, toolkit, statistical result; the user extracts the algorithm writing one line of code

16 Results and practical applications
Collaborations with:

17 Microscopic validation of physics
NIST, Geant4 Standard, Geant4 LowE
Geant4 simulations are statistically comparable with reference data (NIST database)
Chi-squared test (ν = 28 and p = 1 in every comparison; values marked [missing] did not survive in the transcript):
χ²(N-S) = [missing], χ²(N-L) = [missing]
χ²(N-S) = 0.373, χ²(N-L) = [missing]
χ²(N-S) = [missing], χ²(N-L) = 1.928

18 X-ray fluorescence spectrum in Iceland basalt (E_IN = 6.5 keV)
Test beam at Bessy, Bepi-Colombo Mission
Plot: counts vs. energy (keV)
Very complex distributions
Chi2 not appropriate (< 5 entries in some bins; physical information would be lost if rebinned)
Anderson-Darling: Ac (95%) = 0.752
Experimental measurements are comparable with Geant4 simulations
A.Mantero, M.Bavdaz, A.Owens, A.Peacock, M.G.Pia, "Simulation of X-ray Fluorescence and Application to Planetary Astrophysics"

19 Medical applications in hadron therapy
Experimental measurements are comparable with Geant4 simulations
Kolmogorov-Smirnov: D(EXP-GEANT4) = 0.11, p = n.s.
Goodman approximation of Kolmogorov-Smirnov: χ²(EXP-GEANT4) = 3.8, ν = 2, p = n.s.
G.A.P.Cirrone, G.Cuttone, S.Donadio, S.Guatelli, S.Lo Nigro, B.Mascialino, M.G.Pia, L.Raffaele, G.M.Sabini, "Implementation of a new Monte Carlo Simulation Tool for the Development of a proton Therapy Beam Line and Verification of the Related Dose Distributions"
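The Goodman approximation quoted on this slide converts the Kolmogorov-Smirnov distance D into a quantity that is approximately chi-squared distributed with 2 degrees of freedom, using the standard relation χ² = 4 D² mn/(m+n). The sketch below assumes that textbook form; the function names are hypothetical and the sample sizes m and n are not given on the slide, so the sizes used here are made up.

```python
import math


def goodman_chi2(d, m, n):
    """Goodman approximation: chi2 = 4 * D^2 * m * n / (m + n), with 2 d.o.f."""
    return 4.0 * d * d * m * n / (m + n)


def chi2_sf_two_dof(x):
    """Upper-tail probability of a chi-squared variable with 2 degrees
    of freedom, which has the closed form exp(-x / 2)."""
    return math.exp(-x / 2.0)


# Hypothetical sample sizes, for illustration only.
example_chi2 = goodman_chi2(0.5, 10, 10)

# The chi-squared value quoted on the slide (3.8 with 2 d.o.f.).
p_slide = chi2_sf_two_dof(3.8)
print(example_chi2, p_slide)
```

For the value on the slide this gives a p-value of about 0.15, which is consistent with the quoted "n.s." at the usual 5% level.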

20 Conclusions
This is a new, up-to-date, easy-to-handle and powerful tool for statistical comparison in particle physics.
It is the first tool supplying such a variety of sophisticated and powerful statistical tests in HEP.
AIDA interfaces allow its integration in any other data analysis tool.
Applications in: HEP, astrophysics, medical physics

