Statistical Toolkit: Recent updates. Maria Grazia Pia, INFN Genova, with B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo


Statistical Toolkit: Recent updates
Maria Grazia Pia, INFN Genova
B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo
CMS Software Meeting, 25 July 2005

Goodness-of-Fit component
  – B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo (CERN, INFN Genova, IST Genova)
  – G.A.P. Cirrone et al., "A Goodness-of-Fit Statistical Toolkit", IEEE Transactions on Nuclear Science, Vol. 51, Issue 5, Oct. 2004, Pages:
Component for modeling multi-parametric fit problems
  – F. Fabozzi, L. Lista (INFN Napoli)
  – F. Fabozzi and L. Lista, "A generic toolkit for multivariate fitting designed with template metaprogramming", to be published in IEEE Transactions on Nuclear Science, 2005

Vision: the basics
Have a vision for the project
  – General purpose tool for statistical analysis
  – Toolkit approach (choice open to users)
  – Open source product
Clearly define scope, objectives
Build on a solid architecture
Rigorous software process
Flexible, extensible, maintainable system
Software quality

Architectural guidelines
The project adopts a solid architectural approach
  – to offer the functionality and the quality needed by the users
  – to be maintainable over a long time scale
  – to be extensible, to accommodate future evolutions of the requirements
Component-based architecture
  – to facilitate re-use and integration in diverse frameworks
Dependencies
  – no dependence on any specific analysis tool
  – the core mathematical component is independent from the user's representation of his/her analysis objects
  – the user layer bridges the core statistical component and the user's analysis
  – multiple implementations of the user layer (AIDA, ROOT, FITS, GSL etc.)
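The separation between the core mathematical component and the user layer can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: `BinnedData`, `maxCumulativeDistance`, `ToyHistogram`, `toBinnedData` and `compare` are not names from the toolkit, and `ToyHistogram` merely stands in for an AIDA or ROOT histogram. The point is that only the adapter function knows the user's histogram type.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Core component (hypothetical): sees only plain bin contents and
// has no dependence on any analysis tool.
struct BinnedData {
    std::vector<double> contents;
};

// Hypothetical core algorithm: largest absolute difference between the
// normalised cumulative distributions of two binned data sets.
double maxCumulativeDistance(const BinnedData& a, const BinnedData& b) {
    assert(a.contents.size() == b.contents.size());
    double totalA = 0.0, totalB = 0.0;
    for (double v : a.contents) totalA += v;
    for (double v : b.contents) totalB += v;
    assert(totalA > 0.0 && totalB > 0.0);
    double cumA = 0.0, cumB = 0.0, dMax = 0.0;
    for (std::size_t i = 0; i < a.contents.size(); ++i) {
        cumA += a.contents[i] / totalA;
        cumB += b.contents[i] / totalB;
        dMax = std::max(dMax, std::fabs(cumA - cumB));
    }
    return dMax;
}

// Hypothetical user-side histogram (stands in for AIDA, ROOT, ...).
class ToyHistogram {
public:
    explicit ToyHistogram(std::size_t nBins) : bins_(nBins, 0.0) {}
    void fill(std::size_t bin, double weight = 1.0) { bins_.at(bin) += weight; }
    std::size_t size() const { return bins_.size(); }
    double binHeight(std::size_t i) const { return bins_.at(i); }
private:
    std::vector<double> bins_;
};

// User layer: the only code that knows about ToyHistogram.
BinnedData toBinnedData(const ToyHistogram& h) {
    BinnedData d;
    for (std::size_t i = 0; i < h.size(); ++i) d.contents.push_back(h.binHeight(i));
    return d;
}

// What the user calls: analysis objects in, a number out.
double compare(const ToyHistogram& a, const ToyHistogram& b) {
    return maxCumulativeDistance(toBinnedData(a), toBinnedData(b));
}
```

Supporting another histogram type in this sketch would mean writing one more `toBinnedData` overload, leaving the core function untouched.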

Simple user layer
Shields the user from the complexity of the underlying algorithms and design
Users only deal with their analysis objects and the choice of comparison algorithm

Software process
Unified Software Development Process, specifically tailored to the project
  – practical guidance and tools from the RUP
  – both rigorous and lightweight
  – mapping onto ISO
  – significant experience gained in the group from other projects
Incremental and iterative life-cycle model

GoF algorithms (currently implemented)
Algorithms for binned distributions
  – Anderson-Darling test
  – Chi-squared test
  – Fisz-Cramer-von Mises test
  – Tiku test (Cramer-von Mises test in chi-squared approximation)
Algorithms for unbinned distributions
  – Anderson-Darling test
  – Cramer-von Mises test
  – Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
  – Kolmogorov-Smirnov test
  – Kuiper test
  – Tiku test (Cramer-von Mises test in chi-squared approximation)
The most complete statistics software for the comparison of two distributions (also among commercial/professional statistics tools)
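As an illustration of what an unbinned two-sample test computes, here is a minimal sketch of the textbook Kolmogorov-Smirnov distance: the largest absolute difference between the two empirical cumulative distribution functions. The function name `ksDistance` is an assumption made for this example and is not taken from the toolkit, whose implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-sample Kolmogorov-Smirnov distance (textbook definition):
// D = sup_x |F_a(x) - F_b(x)| over the empirical CDFs of both samples.
// Samples are taken by value because they are sorted internally.
double ksDistance(std::vector<double> a, std::vector<double> b) {
    assert(!a.empty() && !b.empty());
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    const double na = static_cast<double>(a.size());
    const double nb = static_cast<double>(b.size());
    std::size_t i = 0, j = 0;
    double d = 0.0;
    // Sweep through the pooled values in increasing order. After each
    // distinct value x, i/na and j/nb are the two empirical CDFs at x.
    while (i < a.size() && j < b.size()) {
        const double x = std::min(a[i], b[j]);
        while (i < a.size() && a[i] <= x) ++i;  // consume ties in a
        while (j < b.size() && b[j] <= x) ++j;  // consume ties in b
        d = std::max(d, std::fabs(i / na - j / nb));
    }
    // Once one sample is exhausted its CDF is 1 and the other CDF only
    // moves closer to 1, so the maximum has already been recorded.
    return d;
}
```

For two fully separated samples such as {1, 2} and {3, 4} the distance is 1, and for identical samples it is 0.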

Recent extensions: algorithms
Fisz-Cramer-von Mises test
  – exact asymptotic distribution (earlier: critical values)
Anderson-Darling test
  – exact formulation (exists for unbinned distributions only; earlier: approximated formulation for both binned and unbinned distributions)
Tiku test
  – Cramer-von Mises test in a chi-squared approximation
New tests: weighted Kolmogorov-Smirnov, weighted Cramer-von Mises
  – various weighting functions available in literature
In preparation: Watson test
  – can be applied in case of cyclic observations (like Kuiper test)
It is already the most complete software for the comparison of two distributions (even among commercial/professional statistics tools)
  – goal: provide all 2-sample GoF algorithms currently existing in statistics literature
Publication in preparation to describe the new algorithms
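The Kuiper test mentioned above for cyclic observations differs from Kolmogorov-Smirnov only in the statistic: it adds the largest positive and the largest negative deviation between the two empirical CDFs instead of taking the larger of the two. The sketch below follows that textbook definition; the name `kuiperDistance` is an assumption for this example, not the toolkit's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-sample Kuiper statistic (textbook definition):
// V = max_x (F_a(x) - F_b(x)) + max_x (F_b(x) - F_a(x)).
double kuiperDistance(std::vector<double> a, std::vector<double> b) {
    assert(!a.empty() && !b.empty());
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    const double na = static_cast<double>(a.size());
    const double nb = static_cast<double>(b.size());
    std::size_t i = 0, j = 0;
    double dPlus = 0.0;   // largest value of F_a - F_b
    double dMinus = 0.0;  // largest value of F_b - F_a
    while (i < a.size() && j < b.size()) {
        const double x = std::min(a[i], b[j]);
        while (i < a.size() && a[i] <= x) ++i;  // consume ties in a
        while (j < b.size() && b[j] <= x) ++j;  // consume ties in b
        const double diff = i / na - j / nb;
        dPlus = std::max(dPlus, diff);
        dMinus = std::max(dMinus, -diff);
    }
    return dPlus + dMinus;
}
```

For the samples {1, 4} and {2, 3} the deviations are +0.5 and -0.5, so the Kuiper statistic is 1.0 while the Kolmogorov-Smirnov distance would be 0.5.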

Recent extensions: user layer
First release: user layer for AIDA analysis objects
  – LCG Architecture Blueprint, Geant4 requirement
July 2005: added user layer for ROOT histograms
  – CMS requirement (requested by Pedro Arce)
Other user layer implementations foreseen
  – easy to add
  – sound architecture decouples the mathematical component and the user's representation of analysis objects
  – different requirements from various user communities: satisfy all of them without introducing dependencies on any analysis tools
More in Andreas' talk

Release
Releases are publicly downloadable from the web
  – code, documentation etc.
For the convenience of LCG users, releases are also distributed with LCG AA software as "external contributions"
New release with algorithm and user layer extension in the next days
  – thanks to Pedro Arce for β-testing!
  – new naming of packages for additional user layers
Another new release with new algorithms planned in autumn
  – + publication on recent extensions
Releases include extensive user documentation
  – statistics algorithms
  – how to use the software
The project is systematically accompanied by publications in refereed journals to document the recognition of its scientific value

Power of GoF tests
Do we really need such a wide collection of GoF tests? Why?
Which is the most appropriate test to compare two distributions?
How "good" is it to recognize real equivalent distributions and to reject fake ones?
Which test to use?
No comprehensive study of the relative power of GoF tests exists in literature
  – novel research in statistics (not only in physics data analysis!)
Compare the various GoF tests w.r.t. a set of "typical" functional distributions
Provide guidance to the users based on sound quantitative arguments
Preliminary results available, publication in preparation
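The power of a test, as discussed above, is the fraction of cases in which it rejects two samples that really come from different distributions. A minimal Monte Carlo sketch of such an estimate follows. It is an assumption-laden illustration, not the study's method: it uses only the Kolmogorov-Smirnov test with the standard asymptotic p-value series, and the names `ksDistance`, `ksPValue` and `estimatePower` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Two-sample Kolmogorov-Smirnov distance between empirical CDFs.
double ksDistance(std::vector<double> a, std::vector<double> b) {
    assert(!a.empty() && !b.empty());
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    const double na = static_cast<double>(a.size());
    const double nb = static_cast<double>(b.size());
    std::size_t i = 0, j = 0;
    double d = 0.0;
    while (i < a.size() && j < b.size()) {
        const double x = std::min(a[i], b[j]);
        while (i < a.size() && a[i] <= x) ++i;
        while (j < b.size() && b[j] <= x) ++j;
        d = std::max(d, std::fabs(i / na - j / nb));
    }
    return d;
}

// Asymptotic p-value: Q(lambda) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lambda^2),
// with lambda = sqrt(na*nb/(na+nb)) * D. Only approximate for small samples.
double ksPValue(double d, std::size_t na, std::size_t nb) {
    const double ne = static_cast<double>(na) * static_cast<double>(nb) /
                      static_cast<double>(na + nb);
    const double lambda = std::sqrt(ne) * d;
    if (lambda < 0.3) return 1.0;  // Q is indistinguishable from 1 here
    double sum = 0.0, sign = 1.0;
    for (int j = 1; j <= 100; ++j) {
        sum += sign * std::exp(-2.0 * j * j * lambda * lambda);
        sign = -sign;
    }
    return std::min(1.0, std::max(0.0, 2.0 * sum));
}

// Fraction of pseudo-experiments in which the test rejects at level alpha.
// If distA and distB differ this estimates the power; if they are equal
// it estimates the false-rejection rate.
template <typename DistA, typename DistB>
double estimatePower(DistA distA, DistB distB, std::size_t sampleSize,
                     int trials, double alpha, unsigned seed) {
    std::mt19937 rng(seed);
    int rejected = 0;
    for (int t = 0; t < trials; ++t) {
        std::vector<double> a(sampleSize), b(sampleSize);
        for (double& v : a) v = distA(rng);
        for (double& v : b) v = distB(rng);
        if (ksPValue(ksDistance(a, b), sampleSize, sampleSize) < alpha) ++rejected;
    }
    return static_cast<double>(rejected) / trials;
}
```

Calling `estimatePower` with two identical normal distributions should return a value close to `alpha`, while shifting one of them by a standard deviation should give a value close to 1 for samples of about a hundred entries. Repeating the exercise with other statistics and other parent distributions is the kind of comparison a power study involves.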

Usage
Geant4 physics validation
  – rigorous approach: quantitative evaluation of Geant4 physics models with respect to established reference data
  – see for instance: K. Amako et al., "Comparison of Geant4 electromagnetic physics models against the NIST reference data", to be published in IEEE Transactions on Nuclear Science
LCG Simulation Validation project
  – see for instance: A. Ribon, "Testing Geant4 with a simplified calorimeter setup", talk at Geant4 Physics Validation Workshop, Genova, July 2005
CMS
  – validation of "new" histograms w.r.t. "reference" ones in OSCAR Validation Suite
Geant4 regression testing
  – prototype developed at Gran Sasso Lab, inclusion in Geant4 regular testing in preparation (discussed at Geant4 Physics Validation Workshop last week)
Usage also in space science, medicine etc.

Outlook
Treatment of errors
1-sample GoF tests (comparison w.r.t. a function)
Comparison of two/multi-dimensional distributions
In all cases: goal to provide an extensive set of the algorithms so far published in statistics literature, with a critical evaluation of their relative strengths and applicability
Your feedback is very much appreciated
Please let us know your requirements
  – we do our best to satisfy them, compatible with available resources
The project is open to developers interested in statistical methods
  – it is advanced R&D in statistics, not only a useful tool for physics analysis
New release coming soon
New publications in preparation

Will be moved to

Acknowledgments
Work supported and partially funded by the European Space Agency (ESA) under Contract No. 16339/02/NL/FM
Fred James (CERN) and Louis Lyons (Oxford)
  – many useful suggestions, discussions, encouragement...
Thanks to our statistician (Paolo Viarengo, IST – National Institute for Cancer Research) for his help with the mathematical calculations and the guidance through statistics literature

Comparing two ROOT histograms

    #include <cstdlib>    // drand48
    #include <iostream>
    #include "TH1.h"
    #include "StatisticsTesting/StatisticsComparator.h"
    #include "ComparisonResult.h"
    #include "Chi2ComparisonAlgorithm.h"

    using namespace StatisticsTesting;

    int main (int, char **)
    {
      // Create and fill two 1-dimensional histograms with random numbers.
      // TH1 cannot be instantiated directly, so the concrete TH1D is used;
      // its constructor takes a name and a title before the binning.
      TH1D hA("A", "A", 10, 0.0, 1.0);
      TH1D hB("B", "B", 10, 0.0, 1.0);
      for (int i = 0; i < 1000; i++) {
        hA.Fill( drand48() );
        hB.Fill( drand48() );
      }

      // Do the (binned) statistical test between the two histograms.
      // The comparison algorithm is selected through the template argument.
      StatisticsComparator< Chi2ComparisonAlgorithm > comparator;
      ComparisonResult result = comparator.compare(hA, hB);

      std::cout << "Result of the Chi2 Statistical Test: " << std::endl
                << "distance = " << result.distance() << std::endl
                << "NDF = " << result.ndf() << std::endl
                << "p-value = " << result.quality() << std::endl;
    }

Make and run:

    $ source setup.sh
    $ make
    $ .objs/exampleChi2
    Result of the Chi2 Statistical Test:
    distance =
    ndf (number of degree of freedom) = 9
    p-value =
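To make the quantities in the output above more concrete, here is a minimal sketch of a textbook chi-squared comparison of two histograms with identical binning. It is an assumption that the toolkit's Chi2 algorithm computes the same thing; `Chi2Result` and `chi2Distance` are names invented for this example. With 10 non-empty bins the sketch yields 9 degrees of freedom, which matches the ndf shown in the run.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: test statistic and degrees of freedom.
struct Chi2Result {
    double distance;
    int ndf;
};

// Textbook two-histogram chi-squared (homogeneity test):
//   chi2 = sum_i (sqrt(M/N) * n_i - sqrt(N/M) * m_i)^2 / (n_i + m_i)
// where N and M are the total contents of the two histograms.
// Bins that are empty in both histograms are skipped, and
// ndf = (number of bins used) - 1.
Chi2Result chi2Distance(const std::vector<double>& a,
                        const std::vector<double>& b) {
    assert(a.size() == b.size());
    double totalA = 0.0, totalB = 0.0;
    for (double v : a) totalA += v;
    for (double v : b) totalB += v;
    assert(totalA > 0.0 && totalB > 0.0);
    const double scaleA = std::sqrt(totalB / totalA);
    const double scaleB = std::sqrt(totalA / totalB);
    double chi2 = 0.0;
    int usedBins = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double sum = a[i] + b[i];
        if (sum <= 0.0) continue;  // no information in an empty bin
        const double diff = scaleA * a[i] - scaleB * b[i];
        chi2 += diff * diff / sum;
        ++usedBins;
    }
    return {chi2, usedBins - 1};
}
```

Turning the distance into a p-value additionally needs the chi-squared cumulative distribution for the given ndf, which is left out here because the C++ standard library does not provide it.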