A Toolkit for Multi-variate Fitting Designed with Template Metaprogramming
Luca Lista (1), Francesco Fabozzi (1,2)
(1) INFN Napoli, (2) Università della Basilicata
IEEE NSS-MIC 2003, Portland

Introduction
The toolkit provides:
- a language to describe and model parametric fit problems in C++
- utilities to study the frequentist properties of a fit
It is not intended to provide new mathematical algorithms:
- the underlying minimization engine is Minuit
Motivated by analyses in the BaBar experiment that require complex fit modeling and toy Monte Carlo studies.

Main functionalities
Description of Probability Distribution Functions (PDF):
- most common PDFs provided (Gaussian, Poisson, etc.)
- random number generators for each provided PDF
- utilities to combine PDFs
Manipulation of symbolic expressions:
- simplifies the definition of PDF models and fit functions
Fitter tools:
- different Unbinned Maximum Likelihood (UML) fitters and a chi-square fitter are supported
Toy Monte Carlo:
- utilities to generate random data samples to validate the fit results (pull distributions, fit bias estimates, etc.)
User-defined components can easily be plugged in.

Design choices
The code is optimized for speed:
- toy Monte Carlo studies of complex fits are very CPU intensive
This can be achieved without losing good OO design:
- avoid virtual functions where they are not necessary
- use template generic programming
- the Boost C++ library provides powerful tools
Metaprogramming permits type manipulations at compile time; users don't "see" these technical details in the interface (see the sketch below).
External package dependencies are well isolated:
- random number generator engines (ROOT, CLHEP, ...)
- Minuit wrapper (ROOT, ...); other minimizers may be adopted (NAG, ...)
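As an illustration of the idea (a minimal sketch, not the toolkit's actual code): a typelist whose length is computed entirely by the compiler, so that, for example, the variable set of a PDF can be described as a type rather than stored at run time.

  #include <iostream>

  // End-of-list marker.
  struct Null {};

  // A typelist: the whole sequence of types lives in the type itself.
  template <typename Head, typename Tail = Null>
  struct TypeList {
    typedef Head head;
    typedef Tail tail;
  };

  // Length of a typelist, computed at compile time by template recursion.
  template <typename List> struct Length;
  template <> struct Length<Null> { enum { value = 0 }; };
  template <typename H, typename T>
  struct Length< TypeList<H, T> > { enum { value = 1 + Length<T>::value }; };

  int main() {
    // e.g. a PDF of one continuous and one discrete variable
    typedef TypeList<double, TypeList<int> > Variables;
    std::cout << Length<Variables>::value << std::endl;  // prints 2
    return 0;
  }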

PDF interface
A PDF implements the "()" operator: P = f( x, y, ... )
- the variable set may be a sequence of any variable types
- a continuous PDF returns dP(x)/dx, a discrete PDF returns P(n)
Users can define new PDFs respecting this interface, for example:

  struct Flat {
    Flat( double a, double b ) : min( a ), max( b ) { }
    // returns dP(x)/dx
    double operator()( double x ) const {
      return ( x < min || x > max ? 0 : 1 / ( max - min ) );
    }
    double min, max;
  };

  struct Poissonian {
    Poissonian( double m ) : mean( m ) { }
    // returns P(n)
    double operator()( int n ) const {
      return exp( - mean ) * pow( mean, n ) / factorial( n );
    }
    double mean;
  };
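For illustration (using the names from the sketch above), such PDFs are ordinary function objects and can be evaluated directly:

  Flat flat( -5, 5 );
  double p = flat( 1.2 );    // 1 / ( 5 - (-5) ) = 0.1
  Poissonian pois( 4.2 );
  double pn = pois( 3 );     // Poisson probability of n = 3 for mean 4.2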

Random number generators
A random number generator implements the "generate" method: r.generate( x, y, ... )
A generator for a given PDF is obtained by partial specialization; the underlying random engine (CLHEP, ROOT, ...) is a template parameter. A generator specialized for the Flat PDF may look like this:

  template <typename Generator>
  struct RandomGenerator< Flat, Generator > {
    RandomGenerator( const Flat& pdf ) : _min( pdf.min ), _max( pdf.max ) { }
    void generate( double& x ) const { x = Generator::shootFlat( _min, _max ); }
  private:
    const double &_min, &_max;
  };

Users can define new generators with their preferred method. Numerical implementations are also provided:
- trapezoidal PDF sampling:  RANDOM_GENERATOR_SAMPLE( MyPdf, Bins, Min, Max )
- "hit or miss" technique:   RANDOM_GENERATOR_HITORMISS( MyPdf, Min, Max, fMax )
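The "hit or miss" technique mentioned above is standard accept-reject sampling; a minimal stand-alone sketch (illustrative only, not the toolkit's implementation, and using std::rand purely for brevity) could look like this:

  #include <cstdlib>

  // Draw x in [min, max] according to pdf, assuming pdf(x) <= fMax on that range.
  template <typename Pdf>
  double hitOrMiss( const Pdf& pdf, double min, double max, double fMax ) {
    while ( true ) {
      double u1 = std::rand() / ( RAND_MAX + 1.0 );  // uniform in [0,1)
      double u2 = std::rand() / ( RAND_MAX + 1.0 );
      double x = min + ( max - min ) * u1;           // candidate point
      if ( fMax * u2 < pdf( x ) ) return x;          // accept with probability pdf(x)/fMax
    }
  }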

Combining PDFs
Argus shoulder plus a 10% Gaussian peaking component; the random generator for the combined PDF is defined automatically:

  Argus shoulder( 5.20, 5.28, -0.1 );
  Gaussian peak( 5.28, 0.05 );
  typedef Mixture< Gaussian, Argus > Mix;
  Mix pdf( peak, shoulder, 0.1 );  // 10% peaking component
  RandomGenerator< Mix, Generator > rnd( pdf );  // Generator: the chosen random engine (CLHEP, ROOT, ...)
  double x;
  rnd.generate( x );

2D Gaussian peaking component:

  Gaussian sigX( 5.28, 0.05 );
  Gaussian sigY( 0, /* width missing in the transcript */ );
  typedef Independent< Gaussian, Gaussian > SigXY;
  SigXY sigXY( sigX, sigY );
  RandomGenerator< SigXY, Generator > rndXY( sigXY );
  double x, y;
  rndXY.generate( x, y );

Transformation of variables is also supported: random variables are generated in the original coordinate system and then transformed.
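For reference (standard definitions, not quoted from the slides): a two-component mixture with peaking fraction \alpha and a product of independent PDFs read

  f_{\mathrm{mix}}(x) = \alpha\, f_{\mathrm{peak}}(x) + (1-\alpha)\, f_{\mathrm{shoulder}}(x), \qquad f_{\mathrm{2D}}(x,y) = f_X(x)\, f_Y(y)

so the Mixture and Independent combinations above correspond to these expressions with \alpha = 0.1.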

Fit PDF parameters and run Toy MC
Definition of the fit model and fitter; the parameters are "linked" to the fitter, and the variable type list is deduced from the Likelihood type. A Poisson PDF fluctuates the number of generated events (a Constant can be used as an alternative):

  const int sig = 100;
  double mean = 0, sigma = 1;
  Gaussian pdf( mean, sigma );
  Likelihood< Gaussian > like( pdf );
  UMLParameterFitter< Likelihood< Gaussian > > fitter( like );
  fitter.addParameter( "mean", & pdf.mean );
  fitter.addParameter( "sigma", & pdf.sigma );

  Poissonian num( sig );  // alternative: Constant
  Gaussian pdfExp( mean, sigma );
  Experiment< Poissonian, Gaussian > experiment( num, pdfExp );
  for ( int i = 0; i < 50000; i++ ) {
    Sample sample;
    experiment.generate( sample );
    double par[ 2 ] = { mean, sigma }, err[ 2 ] = { 1, 1 }, logLike;
    logLike = fitter.fit( par, err, sample );
    double pullm = ( par[ 0 ] - mean ) / err[ 0 ];
    double pulls = ( par[ 1 ] - sigma ) / err[ 1 ];
  }
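The pulls computed in the loop follow the usual convention (a standard definition, not spelled out on the slide):

  \mathrm{pull} = \frac{\hat\theta - \theta_{\mathrm{true}}}{\hat\sigma_{\hat\theta}}

For an unbiased fit with correctly estimated errors, the pull distribution is a standard Gaussian of mean 0 and width 1.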

Parameter fit results (pulls)
There is a bias (as expected):

  \hat\sigma^2 = \frac{1}{n} \sum_i ( x_i - \hat\mu )^2 \;\ne\; \frac{1}{n-1} \sum_i ( x_i - \hat\mu )^2

UML yield fit
Extended likelihood with two samples; the yield fitter extracts the yields of the two components. In two dimensions: a Gaussian signal over a flat background in a signal box.

  const int sig = 10, bkg = 5;
  typedef Independent< Gaussian, Gaussian > PdfSig;
  typedef Independent< Flat, Flat > PdfBkg;
  PdfSig pdfSig( Gaussian( 0, 1 ), Gaussian( 0, 0.5 ) );
  PdfBkg pdfBkg( Flat( -5, 5 ), Flat( -5, 5 ) );
  typedef ExtendedLikelihood2< PdfSig, PdfBkg > Likelihood;
  Likelihood like( pdfSig, pdfBkg );
  UMLYieldFitter< Likelihood > fitter( like );

  typedef Poissonian Fluctuation;  // alternative: Constant
  Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg );
  typedef Experiment< Fluctuation, PdfSig > ToySig;
  typedef Experiment< Fluctuation, PdfBkg > ToyBkg;
  ToySig toySig( fluctuationSig, pdfSig );
  ToyBkg toyBkg( fluctuationBkg, pdfBkg );
  Experiment2< ToySig, ToyBkg > toy( toySig, toyBkg );
  for ( int i = 0; i < 50000; i++ ) {
    Sample sample;
    toy.generate( sample );
    double s[] = { sig, bkg }, err[] = { 1, 1 };
    double logLike = fitter.fit( s, err, sample );
    double pull1 = ( s[ 0 ] - sig ) / err[ 0 ];
    double pull2 = ( s[ 1 ] - bkg ) / err[ 1 ];
  }
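The quantity maximized by such a yield fit is the standard two-component extended likelihood (textbook form, stated here for reference):

  L(s,b) = \frac{e^{-(s+b)} (s+b)^{N}}{N!} \prod_{i=1}^{N} \frac{s\, P_{\mathrm{sig}}(x_i) + b\, P_{\mathrm{bkg}}(x_i)}{s+b}

where N is the observed number of events and s, b are the fitted signal and background yields.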

Yield fit results (pulls)
The pull distributions show a discrete structure because of the low-statistics Poisson fluctuations (mean signal yield = 10, mean background yield = 5).

Combined yield and parameter fit
2D Gaussian signal over a 2D flat background: simultaneous fit of the two yields and of the Gaussian mean.

  const int sig = 10, bkg = 5;
  typedef Poissonian Fluctuation;
  Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg );
  typedef Independent< Gaussian, Gaussian > PdfSig;
  typedef Independent< Flat, Flat > PdfBkg;
  Gaussian g1( 0, 1 ), g2( 0, 0.5 );
  Flat f1( -5, 5 ), f2( -5, 5 );
  PdfSig pdfSig( g1, g2 );
  PdfBkg pdfBkg( f1, f2 );
  typedef Experiment< Fluctuation, PdfSig > ToySig;
  typedef Experiment< Fluctuation, PdfBkg > ToyBkg;
  ToySig toySig( fluctuationSig, pdfSig );
  ToyBkg toyBkg( fluctuationBkg, pdfBkg );
  Experiment2< ToySig, ToyBkg > toy( toySig, toyBkg );

  typedef ExtendedLikelihood2< PdfSig, PdfBkg > Likelihood;
  Gaussian G1( 0, 1 );
  PdfSig pdfSig1( G1, g2 );
  Likelihood like( pdfSig1, pdfBkg );
  UMLYieldAndParameterFitter< Likelihood > fitter( like );
  fitter.addParameter( "mean", & G1.mean );

  double pull1, pull2, pull3;
  for ( int i = 0; i < 50000; i++ ) {
    Sample sample;
    toy.generate( sample );
    double s[] = { sig, bkg, 0 };
    double err[] = { 1, 1, 1 };
    double logLike = fitter.fit( s, err, sample );
    pull1 = ( s[ 0 ] - sig ) / err[ 0 ];
    pull2 = ( s[ 1 ] - bkg ) / err[ 1 ];
    pull3 = ( s[ 2 ] - 0 ) / err[ 2 ];
  }

Symbolic function package
Symbolic expressions make the definition of PDFs easier. Users can specify different ways of performing normalization and integration; the normalization can be computed as an analytic integral performed by the compiler (a sketch of the underlying idea follows below).

  {
    X x;  // declare the variable x
    // normalize using the symbolic integration at construction time
    PdfNonParametric f1( sqr( sin(x) + cos(x) ), 0, 4 * M_PI );
    // recompute the normalization at every call, since
    // the parameter tau may change from call to call
    Parameter tau( /* initial value missing in the transcript */ );
    PdfParametric f2( x * exp( - tau * x ), 0, 10 );
  }
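A minimal sketch of the expression-template technique behind such symbolic expressions (illustrative only; the toolkit overloads the usual arithmetic operators and functions, while this sketch uses named helpers to stay self-contained):

  #include <cmath>
  #include <iostream>

  // The independent variable: evaluating it just returns its argument.
  struct X {
    double operator()( double x ) const { return x; }
  };

  // Product of two sub-expressions; the whole expression lives in the type.
  template <typename L, typename R>
  struct Prod {
    Prod( const L& l, const R& r ) : l_( l ), r_( r ) { }
    double operator()( double x ) const { return l_( x ) * r_( x ); }
    L l_; R r_;
  };

  // Exponential of a sub-expression.
  template <typename A>
  struct Exponential {
    Exponential( const A& a ) : a_( a ) { }
    double operator()( double x ) const { return std::exp( a_( x ) ); }
    A a_;
  };

  template <typename L, typename R>
  Prod<L, R> prod( const L& l, const R& r ) { return Prod<L, R>( l, r ); }

  template <typename A>
  Exponential<A> exponential( const A& a ) { return Exponential<A>( a ); }

  int main() {
    X x;
    // the type of f encodes the expression x * exp( x ) at compile time
    Prod< X, Exponential<X> > f = prod( x, exponential( x ) );
    std::cout << f( 1.0 ) << std::endl;  // prints e
    return 0;
  }

Because the full expression is encoded in a type, the compiler can traverse it to generate evaluation (and, as on this slide, normalization) code with no virtual calls.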

Example of a χ² fit

  {
    X x;
    Parameter a( 0 ), b( 1 ), c( 0 );
    Function parabola( c + x*( b + x*a ) );
    UniformPartition partition( 100, -1.0, 1.0 );
    Chi2< Function > chi2( parabola, partition );
    Chi2Fitter< Chi2< Function > > fitter( chi2 );
    fitter.addParameter( "a", a.ptr() );
    fitter.addParameter( "b", b.ptr() );
    fitter.addParameter( "c", c.ptr() );
    SampleErr sample( partition.bins() );
    // fill the sample ...
    double par[] = { a, b, c }, err[] = { 1, 1, 1 };
    fitter.fit( par, err, sample );
  }
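For reference, the quantity minimized is the usual chi-square over the bins of the partition (standard definition, not quoted from the slides; the symbols x_i, y_i, sigma_i are introduced here for illustration):

  \chi^2(a,b,c) = \sum_{i=1}^{N_{\mathrm{bins}}} \frac{\left( y_i - f(x_i; a, b, c) \right)^2}{\sigma_i^2}

where x_i are the bin positions of the UniformPartition, y_i the measured values in the sample, and \sigma_i their errors.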

Possible future improvements
- Upper limit extraction based on toy Monte Carlo; could be based on existing code from a BaBar B-decay analysis
- Support for χ² fits with correlated errors and a covariance matrix
- More "standard" PDFs: Crystal Ball, Chebyshev polynomials, ...
- Handling of singular PDFs (Dirac-delta components)
- Handling of (un)folding
- ...

Conclusion
We designed a new tool to model fit problems. Using template generic programming we obtained:
- Generality: users can plug in new components (PDFs, transformations, random generators, etc.); external contributions are easy to incorporate into the tool
- Light weight: most of the code is contained in header (#include) files; mild external dependencies
- Ease of use: very concise and expressive code
- CPU speed: virtual function calls are extremely limited and most methods are inlined
Interest has been expressed by:
- the Geant4 statistical testing toolkit
- LCG/PI (LHC Computing Grid - Physics Interfaces)
We will focus on a release version shortly.