Statistical Testing Project
Alberto Ribon, CERN, on behalf of the Statistical Testing Team
CLHEP Workshop, CERN, 28 January 2003

What is the Statistical Testing Project?
A project to develop a general purpose statistical analysis system, providing tools for the statistical comparison of distributions:
– simulation data
– experimental measurements
– data from reference sources
– functions deriving from theoretical calculations or from fits
Main application areas in Geant4: physics validation, regression testing, system testing.

The team
Development team (mostly part time!):
– Pablo Cirrone, INFN Southern National Lab
– Stefania Donadio, Univ. and INFN Genova
– Susanna Guatelli, CERN/IT/API Technical Student and INFN Genova
– Alberto Lemut, Univ. and INFN Genova
– Barbara Mascialino, Univ. and INFN Genova
– Sandra Parlati, INFN Gran Sasso National Lab
– Andreas Pfeiffer, CERN/IT/API
– Maria Grazia Pia, INFN Genova
– Alberto Ribon, CERN/IT/API
Statistical consultancy:
– Paolo Viarengo, Univ. Genova, Statistician
– Fred James, CERN
Geant4 system integration team:
– Gabriele Cosmo, CERN/IT/API, Geant4 Release Manager
– Sergei Sadilov, CERN/IT/API, Geant4 System Testing Coordinator
Interested collaborators are welcome!

Scope of the project
The project will provide tools for statistical testing:
– physics comparisons and regression testing
– multiple comparison algorithms
Generality should be pursued (for application also in other areas):
– facilitated by a component-based architecture
The statistical tools should be used in Geant4 (and in other frameworks):
– a tool to be used in testing frameworks
– not a testing framework itself
Re-use existing tools whenever possible:
– no attempt to re-invent the wheel
– but a critical, scientific evaluation of candidate tools

So far, only ad hoc solutions
The comparison of distributions is an old and common problem. The only general “tool” was HDIFF (which performs the Kolmogorov-Smirnov test): although very useful and widely used, it was never sufficient for any realistic physics analysis. Each experiment (or even each analysis group) has created its own ad hoc “tool” for statistical tests, usually based on legacy code that was modified and adapted to its particular needs.
Example: CDF Coll., PRL 77 (1996) 438, “Inclusive jet cross section in p-pbar collisions at Tevatron”

Architectural guidelines
The project adopts a solid architectural approach:
– to offer the functionality and the quality needed by the users
– to be maintainable over a long time scale
– to be extensible, to accommodate future evolutions of the requirements
Component-based approach:
– Geant4-specific components + general components
– to facilitate re-use and integration in diverse frameworks
AIDA:
– adopt a (HEP) standard
– no dependence on any specific analysis tool
Python
The approach adopted is compatible with the recommendations of the CERN LCG Architecture Blueprint RTAG.

Some use cases
Regression testing
– throughout the software life-cycle
Online DAQ
– monitoring detector behaviour w.r.t. a reference
Simulation validation
– comparison with experimental data
Reconstruction
– comparison of reconstructed vs. expected distributions
Physics analysis
– comparison of experimental distributions (signal sample vs. background sample)
– comparison with theoretical distributions (data vs. Standard Model)

Goodness-of-fit tests
– Pearson's $\chi^2$ test
– Kolmogorov test
– Kolmogorov-Smirnov test
– Lilliefors test
– Cramér-von Mises test
– Anderson-Darling test
– Kuiper test
– …
The system is open to extension and evolution. Suggestions welcome!

Pearson's $\chi^2$ test
Applies to discrete (binned) distributions. It can also be useful for continuous (unbinned) distributions, but the data must first be grouped into classes. It should not be applied if the theoretical (expected) frequency in any class is below 5; when this happens, one can merge contiguous classes until the minimum theoretical frequency is reached.
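A minimal Python sketch of the binned $\chi^2 = \sum_k (O_k - E_k)^2 / E_k$ statistic, including the merging of contiguous classes described above. The function names and the merging strategy (left to right, with any leftover tail folded into the last class) are assumptions for illustration; the degrees of freedom are the returned number of classes minus 1, minus any parameters estimated from the data.

```python
from typing import List, Sequence, Tuple


def merge_classes(observed: Sequence[float], expected: Sequence[float],
                  min_expected: float = 5.0) -> Tuple[List[float], List[float]]:
    """Merge contiguous classes, left to right, until each expected count >= min_expected.

    Assumes sum(expected) > 0. Any leftover tail is folded into the last merged class.
    """
    obs_out: List[float] = []
    exp_out: List[float] = []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc = e_acc = 0.0
    if e_acc > 0.0 or o_acc > 0.0:
        if exp_out:
            obs_out[-1] += o_acc
            exp_out[-1] += e_acc
        else:  # total expected count below the threshold: keep a single class
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out


def pearson_chi2(observed: Sequence[float], expected: Sequence[float],
                 min_expected: float = 5.0) -> Tuple[float, int]:
    """Return (chi2, number of classes after merging)."""
    obs, exp = merge_classes(observed, expected, min_expected)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return chi2, len(obs)


chi2, k = pearson_chi2([2, 3, 10, 12, 4, 1], [1.5, 3.5, 11, 11, 3, 2])
print(chi2, k)  # 4 classes are left after merging
```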

Kolmogorov test
The simplest of the non-parametric tests. It verifies the agreement between a sample drawn from a continuous random variable and a theoretical distribution, based on the maximum distance between the empirical distribution function $F_O(x)$ and the theoretical distribution function $F_T(x)$.
Test statistic: $D = \sup_x |F_O(x) - F_T(x)|$
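A minimal Python sketch of the one-sample statistic $D$, assuming a continuous theoretical CDF passed in as a function. Since $F_O$ is a step function, the supremum is found by checking both sides of each jump. The p-value uses the asymptotic Kolmogorov distribution, $P(\sqrt{n}\,D > \lambda) \approx 2\sum_{j \ge 1}(-1)^{j-1}e^{-2j^2\lambda^2}$, which is only valid for large $n$; the function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def kolmogorov_d(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D = sup_x |F_O(x) - F_T(x)| for a continuous theoretical CDF F_T."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_O jumps from (i-1)/n to i/n at x: check the distance on both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def kolmogorov_pvalue(d: float, n: int, terms: int = 100) -> float:
    """Asymptotic (large n) p-value: 2 * sum_j (-1)^(j-1) exp(-2 j^2 lam^2), lam = sqrt(n) D."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here, and the p-value is ~1 anyway
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
            for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


print(kolmogorov_d([0.1, 0.4, 0.7], lambda x: x))  # uniform(0, 1) reference
print(kolmogorov_pvalue(0.136, 100))  # lam = 1.36, the classic 5% point: about 0.05
```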

Kolmogorov-Smirnov test
The two-sample problem, mathematically similar to Kolmogorov's. Instead of comparing an empirical distribution with a theoretical one, it looks for the maximum difference between the empirical distribution functions $F_n$ and $G_m$ of the two samples:
$D_{mn} = \sup_x |F_n(x) - G_m(x)|$
It can be applied only to continuous random variables; Conover (1971) and Gibbons and Chakraborti (1992) tried to extend it to discrete random variables.
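A minimal Python sketch of $D_{mn}$ for two unbinned samples of sizes $n$ and $m$. Both empirical distribution functions are step functions, so the supremum is reached at one of the pooled sample points. The function name is hypothetical; for large samples, $\sqrt{nm/(n+m)}\,D_{mn}$ can be referred to the same asymptotic Kolmogorov distribution as the one-sample test.

```python
import bisect
from typing import Sequence


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> float:
    """D_mn = sup_x |F_n(x) - G_m(x)| between the empirical CDFs of two samples."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both ECDFs are right-continuous step functions, so the sup is reached
    # at one of the pooled sample points (ties are handled by bisect_right).
    for x in xs + ys:
        f_n = bisect.bisect_right(xs, x) / n
        g_m = bisect.bisect_right(ys, x) / m
        d = max(d, abs(f_n - g_m))
    return d


print(ks_two_sample([0.1, 0.5, 0.9], [0.2, 0.6, 1.0]))  # approximately 0.3333
```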

Lilliefors test
Similar to the Kolmogorov test, but based on the null hypothesis that the continuous random variable is normally distributed, $N(m, \sigma^2)$, with $m$ and $\sigma^2$ unknown (estimated from the sample). It is performed by comparing the empirical distribution function $F_O(z)$ of the standardized data $z_1, z_2, \ldots, z_n$ with the standard normal distribution function $\Phi(z)$:
$D^* = \sup_z |F_O(z) - \Phi(z)|$
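A minimal Python sketch of $D^*$, assuming $m$ and $\sigma$ are estimated by the sample mean and the sample standard deviation. Because the parameters are estimated from the same data, $D^*$ must not be compared with the Kolmogorov critical values: the Lilliefors tables, or a Monte Carlo simulation, are needed. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def std_normal_cdf(z: float) -> float:
    """Phi(z), the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lilliefors_d(sample: Sequence[float]) -> float:
    """D* = sup_z |F_O(z) - Phi(z)| with m and sigma estimated from the sample."""
    m = statistics.fmean(sample)
    s = statistics.stdev(sample)  # n - 1 denominator; needs at least 2 distinct values
    zs = sorted((x - m) / s for x in sample)
    n = len(zs)
    d = 0.0
    for i, z in enumerate(zs, start=1):
        p = std_normal_cdf(z)
        d = max(d, i / n - p, p - (i - 1) / n)
    return d


# D* does not depend on location and scale: both calls give the same value
print(lilliefors_d([-1.0, 0.0, 1.0]), lilliefors_d([10.0, 20.0, 30.0]))
```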

Cramér-von Mises test
Based on the test statistic:
$\omega^2 = \int [F_O(x) - F_T(x)]^2 \, dF_T(x)$
Can be performed on both continuous and discrete variables. Satisfactory for symmetric and right-skewed distributions.
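A minimal Python sketch for an unbinned sample and a continuous $F_T$ (the discrete case is not covered here). It uses the usual computational formula $T = n\omega^2 = \frac{1}{12n} + \sum_{i=1}^{n}\left[\frac{2i-1}{2n} - F_T(x_{(i)})\right]^2$, where $x_{(i)}$ are the ordered observations; note the factor $n$ relative to $\omega^2$ above. The function name is hypothetical.

```python
from typing import Callable, Sequence


def cramer_von_mises(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Return T = n * omega^2 = 1/(12n) + sum_i ((2i-1)/(2n) - F_T(x_(i)))^2."""
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum(((2 * i - 1) / (2 * n) - cdf(x)) ** 2
                                for i, x in enumerate(xs, start=1))


# Points at the mid-ranks of a uniform(0, 1) reference give the minimum, 1/(12n)
print(cramer_von_mises([0.125, 0.375, 0.625, 0.875], lambda x: x))  # 1/48, about 0.0208
```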

Anderson-Darling test
Performed on the test statistic:
$A^2 = \int \frac{[F_O(x) - F_T(x)]^2}{F_T(x)\,[1 - F_T(x)]} \, dF_T(x)$
Can be performed on both continuous and discrete variables. It seems to be suitable for any data set (Aksenov and Savageau), with any skewness (symmetric, left-skewed or right-skewed distributions), and seems to be sensitive to the fat tails of distributions.
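A minimal Python sketch, again for an unbinned sample and a continuous $F_T$, using the standard computational form $nA^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F_T(x_{(i)}) + \ln\left(1 - F_T(x_{(n+1-i)})\right)\right]$, which carries a factor $n$ relative to the integral above. The weight $1/\{F_T(x)[1 - F_T(x)]\}$ is what makes the statistic sensitive to the tails. The function name is hypothetical.

```python
import math
from typing import Callable, Sequence


def anderson_darling(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Return n*A^2 = -n - (1/n) sum_i (2i-1) [ln F_T(x_(i)) + ln(1 - F_T(x_(n+1-i)))]."""
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]
    if any(p <= 0.0 or p >= 1.0 for p in u):
        raise ValueError("F_T must lie strictly inside (0, 1) at every observation")
    # With 0-based lists, x_(i) is u[i - 1] and x_(n+1-i) is u[n - i]
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


print(anderson_darling([0.25, 0.75], lambda x: x))  # about 0.249
```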

Kuiper test
Based on a quantity that remains invariant under any shift or re-parameterization. Does not work well on tails.
$D^* = \max_x [F_O(x) - F_T(x)] + \max_x [F_T(x) - F_O(x)]$
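A minimal Python sketch of the Kuiper statistic for an unbinned sample and a continuous $F_T$, computed as the largest positive deviation plus the largest negative deviation between the two distribution functions. The function name is hypothetical. The usage lines illustrate the invariance on the unit circle: with a uniform reference, a cyclic shift of the sample leaves the statistic unchanged, which is why Kuiper's test is often used for circular data.

```python
from typing import Callable, Sequence


def kuiper(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D* = max(F_O - F_T) + max(F_T - F_O) for a continuous theoretical CDF F_T."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return d_plus + d_minus


sample = [0.1, 0.4, 0.7]
shifted = [(x + 0.5) % 1.0 for x in sample]  # cyclic shift on the unit circle
print(kuiper(sample, lambda x: x), kuiper(shifted, lambda x: x))  # both about 0.4
```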

OOAD
– Collection of user requirements
– First analysis and design of the statistical component
– Validation of the class design through use cases
– Some open issues identified, to be addressed in the next design iterations

+ more algorithms

Work in progress
Implementation and test of the preliminary design
What can be re-used?
– almost nothing available either in GSL or NAG
Studies in progress:
– transformation between binned and unbinned distributions
– strategies to use Kolmogorov-Smirnov with binned distributions (E. Dagum + original ideas)
– how to deal with experimental errors (not only statistical!)
– multi-dimensional distributions
– Bayesian approach
In the to-do list:
– conversion from AIDA objects to distributions
– “Pythonisation”

Work in progress: user-specific
Geant4 testing framework:
– development of general physics tests in the e.m. domain: collection of relevant observables, and respective reference data/distributions
– integration in the system testing framework
CMS transition from Geant3 to Geant4:
– an automatic regression testing procedure is needed
– similar needs also for future Geant4 versions

Where?
Core statistical component:
– developed in an independent CVS repository
– code, documentation, software process deliverables
– where will it go? CLHEP or LCG?
Geant4-specific stuff:
– kept separate in Geant4
Web site
Contact persons

Time scale driven by user needs
Aggressive time scale driven by user needs:
– CMS and Geant4
OOAD + implementation under way:
– a first prototype should be ready in a few weeks
– advanced functional system: summer 2003
Open to the needs/suggestions of anyone:
– compatible with the available resources
– possible integration in GSL

Conclusions
Core statistical components of general interest:
– LHC experiments, Geant4, etc.
Project compatible with the LCG Architecture Blueprint:
– component-based approach, AIDA, Python…
Open to scientific collaboration
Urgent user needs:
– CMS and Geant4
First prototype expected in a few weeks