Luca Lista, IEEE NSS-MIC 2003, Portland

A Toolkit for Multi-variate Fitting Designed with Template Metaprogramming

Luca Lista (1), Francesco Fabozzi (1,2)
(1) INFN Napoli
(2) Università della Basilicata

Introduction

The toolkit provides:
- a language to describe and model parametric fit problems in C++
- utilities to study the frequentist properties of the fit

It is not intended to provide new mathematical algorithms: the underlying minimization engine is Minuit.

It was motivated by analyses in the BaBar experiment that require complex fit modeling and Toy Monte Carlo studies.

Main functionalities

- Description of Probability Distribution Functions (PDFs)
  - most common PDFs provided (Gaussian, Poisson, etc.)
  - random number generators for each provided PDF
  - utilities to combine PDFs
- Manipulation of symbolic expressions
  - simplifies the definition of PDF models and fit functions
- Fitter tools
  - different Unbinned Maximum Likelihood (UML) fitters and a Chi-square fitter are supported
- Toy Monte Carlo
  - utility to generate random data samples to validate the fit results (pull distribution, fit bias estimate, etc.)
- User-defined components can be easily plugged in

Design choices

- The code is optimized for speed
  - Toy Monte Carlo studies of complex fits are very CPU intensive
- Speed can be achieved without losing good OO design
  - avoid virtual functions where not necessary
  - use template generic programming
  - the Boost C++ library provides powerful tools
  - metaprogramming permits type manipulations at compile time
- Users don't "see" these technical details in the interface
- External package dependencies are well isolated
  - random number generator engines (ROOT, CLHEP, …)
  - Minuit wrapper (ROOT, …)
  - other minimizers may be adopted (NAG, …)

PDF interface

- A PDF implements the "()" operator: P = f( x, y, … )
- The arguments are the variable set; a sequence of any variable type is supported
- Users can define new PDFs respecting the above interface

    #include <cmath>

    struct Flat {
      Flat( double a, double b ) : min( a ), max( b ) { }
      // returns dP(x) / dx
      double operator()( double x ) const {
        return ( x < min || x > max ? 0 : 1 / ( max - min ) );
      }
      double min, max;
    };

    struct Poissonian {
      Poissonian( double m ) : mean( m ) { }
      // returns P(n)
      double operator()( int n ) const {
        return ( std::exp( - mean ) * std::pow( mean, n ) / factorial( n ) );
      }
      double mean;
    };
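
As an illustration of the interface above, here is a minimal sketch of a user-defined PDF and of generic code that consumes it. The names Exponential and integrate are hypothetical and are not part of the toolkit; the point is only that any type with a suitable "()" operator can be used, and that the call is resolved at compile time without virtual functions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical user-defined PDF (not from the toolkit):
// exponential decay, normalized on [0, +inf).
struct Exponential {
  explicit Exponential( double t ) : tau( t ) { assert( t > 0 ); }
  // returns dP(x) / dx
  double operator()( double x ) const {
    return x < 0 ? 0 : std::exp( - x / tau ) / tau;
  }
  double tau;
};

// Hypothetical generic helper: midpoint-rule integral of any
// one-dimensional PDF. The call pdf( x ) is bound statically,
// so the compiler is free to inline it.
template< typename Pdf >
double integrate( const Pdf & pdf, double a, double b, int steps = 10000 ) {
  assert( steps > 0 && a < b );
  const double h = ( b - a ) / steps;
  double sum = 0;
  for ( int i = 0; i < steps; ++i )
    sum += pdf( a + ( i + 0.5 ) * h );
  return sum * h;
}
```

Integrating Exponential( 2.0 ) over a range much wider than tau should give a value close to 1, which is a quick check that a new PDF is normalized.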

Random number generators

- A generator implements the "generate" method: r.generate( x, y, … )
- Generators for specific PDFs are defined by partial specialization:

    template< typename Generator >
    struct RandomGenerator< Flat, Generator > {
      RandomGenerator( const Flat & pdf ) : _min( pdf.min ), _max( pdf.max ) { }
      void generate( double & x ) const {
        x = Generator::shootFlat( _min, _max );
      }
    private:
      const double & _min, & _max;
    };

- Random engine: CLHEP, ROOT, …
- Users can define new generators with their preferred method
- Numerical implementations are provided:
  - trapezoidal PDF sampling: RANDOM_GENERATOR_SAMPLE(MyPdf, Bins, Min, Max)
  - "hit or miss" technique: RANDOM_GENERATOR_HITORMISS(MyPdf, Min, Max, fMax)
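
The "hit or miss" technique mentioned above can be sketched as follows. This is not the toolkit's macro expansion: the class name HitOrMiss, the Triangle example PDF, and the use of the C++ standard library engine instead of CLHEP or ROOT are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical stand-in for a hit-or-miss generator: draw x uniformly in
// [min, max] and y uniformly in [0, fMax]; accept x when y < pdf( x ).
// fMax must be an upper bound of the PDF on [min, max].
template< typename Pdf, typename Engine = std::mt19937 >
class HitOrMiss {
public:
  HitOrMiss( const Pdf & pdf, double min, double max, double fMax,
             unsigned seed = 12345 )
    : pdf_( pdf ), flatX_( min, max ), flatY_( 0.0, fMax ), engine_( seed ) {
    assert( min < max && fMax > 0 );
  }
  void generate( double & x ) {
    for ( ;; ) {
      const double xTry = flatX_( engine_ );
      if ( flatY_( engine_ ) < pdf_( xTry ) ) { x = xTry; return; }
    }
  }
private:
  Pdf pdf_;
  std::uniform_real_distribution<double> flatX_, flatY_;
  Engine engine_;
};

// Example PDF for the sketch: f(x) = 2x on [0, 1], with maximum value 2.
struct Triangle {
  double operator()( double x ) const {
    return ( x < 0 || x > 1 ) ? 0 : 2 * x;
  }
};
```

The acceptance efficiency is the area under the PDF divided by ( max - min ) * fMax, so a tight fMax matters for CPU-intensive Toy Monte Carlo studies.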

Combining PDFs

Argus + Gaussian peaking, with a 10% peaking component:

    Argus shoulder( 5.20, 5.28, -0.1 );
    Gaussian peak( 5.28, 0.05 );
    typedef Mixture< Gaussian, Argus > Mix;
    Mix pdf( peak, shoulder, 0.1 );
    RandomGenerator< Mix > rnd;
    double x;
    rnd.generate( x );

2D Gaussian peaking:

    Gaussian sigX( 5.28, 0.05 );
    Gaussian sigY( 0, );
    typedef Independent< Gaussian, Gaussian > SigXY;
    RandomGenerator< SigXY > rndXY;
    double x, y;
    rndXY.generate( x, y );

- Random generators are defined automatically
- Transformation of variables is also supported: random variables are generated in the original coordinate system, then transformed
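
To make the idea of combining PDFs concrete, here is a minimal sketch of a two-component mixture built with templates. It assumes that the fraction applies to the first component (as the "10% peaking component" note suggests) and that the components are stored by value; the toolkit's actual Mixture may differ on both points. The Gaussian and Flat types are redefined here only to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cmath>

struct Gaussian {
  Gaussian( double m, double s ) : mean( m ), sigma( s ) { assert( s > 0 ); }
  double operator()( double x ) const {
    const double z = ( x - mean ) / sigma;
    const double pi = std::acos( -1.0 );
    return std::exp( -0.5 * z * z ) / ( sigma * std::sqrt( 2 * pi ) );
  }
  double mean, sigma;
};

struct Flat {
  Flat( double a, double b ) : min( a ), max( b ) { assert( a < b ); }
  double operator()( double x ) const {
    return ( x < min || x > max ) ? 0 : 1 / ( max - min );
  }
  double min, max;
};

// Sketch of a mixture: fraction * P1( x ) + ( 1 - fraction ) * P2( x ).
// The result is normalized whenever P1 and P2 are.
template< typename Pdf1, typename Pdf2 >
struct Mixture {
  Mixture( const Pdf1 & p1, const Pdf2 & p2, double f )
    : pdf1( p1 ), pdf2( p2 ), fraction( f ) { assert( f >= 0 && f <= 1 ); }
  double operator()( double x ) const {
    return fraction * pdf1( x ) + ( 1 - fraction ) * pdf2( x );
  }
  Pdf1 pdf1;
  Pdf2 pdf2;
  double fraction;
};
```

Because the component types are template parameters, a Mixture of two Mixtures is again a valid PDF, so more complex models can be composed without extra code.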

Fit PDF parameters and run Toy MC

- Definition of fit model and fitter
- Parameters are "linked" to the fitter
- Type list deduced from Likelihood type
- Poisson PDF for MC generation

    const int sig = 100;
    double mean = 0, sigma = 1;
    Gaussian pdf( mean, sigma );
    Likelihood< Gaussian > like( pdf );
    UMLParameterFitter< Likelihood< Gaussian > > fitter( like );
    fitter.addParameter( "mean", & pdf.mean );
    fitter.addParameter( "sigma", & pdf.sigma );

    Poissonian num( sig ); // alternative: Constant
    Gaussian pdfExp( mean, sigma );
    Experiment< Poissonian, Gaussian > experiment( num, pdfExp );

    for ( int i = 0; i < 50000; i++ ) {
      Sample sample;
      experiment.generate( sample );
      double par[ 2 ] = { mean, sigma }, err[ 2 ] = { 1, 1 }, logLike;
      logLike = par, err, sample );
      double pullm = ( par[ 0 ] - mean ) / err[ 0 ];
      double pulls = ( par[ 1 ] - sigma ) / err[ 1 ];
    }
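
The pull study above can be mimicked without the toolkit. The sketch below makes several simplifying assumptions: it uses the closed-form maximum-likelihood estimates for a Gaussian instead of a Minuit minimization, a fixed number of events per toy (the "Constant" alternative) instead of a Poisson fluctuation, and the standard library random engine. The names PullSummary and toyPulls are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct PullSummary { double mean, rms; };

// Runs nToys pseudo-experiments with nEvents Gaussian events each and
// summarizes the pull ( fitted - true ) / error of the fitted mean.
PullSummary toyPulls( int nToys, int nEvents, double trueMean,
                      double trueSigma, unsigned seed = 2003 ) {
  assert( nToys > 0 && nEvents > 1 && trueSigma > 0 );
  std::mt19937 engine( seed );
  std::normal_distribution<double> gauss( trueMean, trueSigma );
  std::vector<double> sample( nEvents );
  double sumPull = 0, sumPull2 = 0;
  for ( int t = 0; t < nToys; ++t ) {
    double sum = 0;
    for ( double & x : sample ) { x = gauss( engine ); sum += x; }
    const double fitMean = sum / nEvents;              // ML estimate of the mean
    double ss = 0;
    for ( double x : sample ) ss += ( x - fitMean ) * ( x - fitMean );
    const double fitSigma = std::sqrt( ss / nEvents ); // ML estimate (biased)
    const double errMean = fitSigma / std::sqrt( double( nEvents ) );
    const double pull = ( fitMean - trueMean ) / errMean;
    sumPull += pull;
    sumPull2 += pull * pull;
  }
  const double mean = sumPull / nToys;
  return { mean, std::sqrt( sumPull2 / nToys - mean * mean ) };
}
```

For an unbiased fit with correctly estimated errors, the pull distribution is expected to have mean close to 0 and RMS close to 1; for example, toyPulls( 2000, 100, 0.0, 1.0 ) can be used to check this.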
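
Parameter fit results (pulls)

There is a bias in the fitted variance (as expected): the maximum-likelihood estimate is $\hat{\sigma}^2 = \frac{1}{n} \sum_i (x_i - \hat{\mu})^2$, whereas the unbiased estimate is $\frac{1}{n-1} \sum_i (x_i - \hat{\mu})^2$.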
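
The size of this bias follows from a short standard derivation, sketched here for a fixed sample size $n$ (the toy study lets $n$ fluctuate, so the result is only approximate in that case). Here $\mu$ and $\sigma^2$ denote the true mean and variance, and $\hat{\mu}$ the fitted mean.

```latex
\begin{align}
\hat{\mu} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})^2, \\
\sum_{i=1}^{n} (x_i - \hat{\mu})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n\,(\hat{\mu} - \mu)^2, \\
E\big[\hat{\sigma}^2\big]
  &= \frac{1}{n}\Big( n\,\sigma^2 - n\,\frac{\sigma^2}{n} \Big)
   = \frac{n-1}{n}\,\sigma^2 .
\end{align}
```

With about 100 events per sample, the fitted variance is therefore expected to be low by about 1% on average.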

UML Yield fit

In 2 dimensions: Gaussian signal, flat background in a signal box.

    const int sig = 10, bkg = 5;
    typedef Independent< Gaussian, Gaussian > PdfSig;
    typedef Independent< Flat, Flat > PdfBkg;
    PdfSig pdfSig( Gaussian( 0, 1 ), Gaussian( 0, 0.5 ) );
    PdfBkg pdfBkg( Flat( -5, 5 ), Flat( -5, 5 ) );

    // extended likelihood with two samples
    typedef ExtendedLikelihood2< PdfSig, PdfBkg > Likelihood;
    Likelihood like( pdfSig, pdfBkg );
    // the yield fitter extracts the yields of the two components
    UMLYieldFitter fitter( like );

    typedef Poissonian Fluctuation; // alternative: Constant
    Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg );
    typedef Experiment< Fluctuation, PdfSig > ToySig;
    typedef Experiment< Fluctuation, PdfBkg > ToyBkg;
    ToySig toySig( fluctuationSig, pdfSig );
    ToyBkg toyBkg( fluctuationBkg, pdfBkg );
    Experiment2 toy( toySig, toyBkg );

    for ( int i = 0; i < 50000; i++ ) {
      Sample sample;
      toy.generate( sample );
      double s[] = { sig, bkg }, err[] = { 1, 1 };
      double logLike = s, err, sample );
      double pull1 = ( s[0] - sig ) / err[0],
             pull2 = ( s[1] - bkg ) / err[1];
    }
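
For reference, the standard form of an extended likelihood with two components is written out below. It is given as an assumption about what ExtendedLikelihood2 evaluates, not as a statement about its implementation. Here $s$ and $b$ are the signal and background yields, $P_s$ and $P_b$ the normalized signal and background PDFs, and $n$ the number of observed events.

```latex
\begin{align}
L(s, b) &= \frac{e^{-(s+b)}}{n!}
  \prod_{i=1}^{n} \big[\, s\,P_s(x_i, y_i) + b\,P_b(x_i, y_i) \,\big], \\
-\ln L(s, b) &= s + b
  - \sum_{i=1}^{n} \ln \big[\, s\,P_s(x_i, y_i) + b\,P_b(x_i, y_i) \,\big]
  + \ln n! .
\end{align}
```

The term $\ln n!$ does not depend on the yields and can be dropped in the minimization.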

Yield fit results (pulls)

- Discrete structure because of low statistics
- Poisson fluctuation with mean yields of 10 (signal) and 5 (background)

Combined Yield and parameter fit

2D Gaussian signal over a 2D flat background: simultaneous fit of yields and Gaussian mean.

    const int sig = 10, bkg = 5;
    typedef Poissonian Fluctuation;
    Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg );
    typedef Independent< Gaussian, Gaussian > PdfSig;
    typedef Independent< Flat, Flat > PdfBkg;
    Gaussian g1( 0, 1 ), g2( 0, 0.5 );
    Flat f1( -5, 5 ), f2( -5, 5 );
    PdfSig pdfSig( g1, g2 );
    PdfBkg pdfBkg( f1, f2 );
    typedef Experiment< Fluctuation, PdfSig > ToySig;
    typedef Experiment< Fluctuation, PdfBkg > ToyBkg;
    ToySig toySig( fluctuationSig, pdfSig );
    ToyBkg toyBkg( fluctuationBkg, pdfBkg );
    Experiment2 toy( toySig, toyBkg );

    typedef ExtendedLikelihood2< PdfSig, PdfBkg > Likelihood;
    Gaussian G1( 0, 1 );
    PdfSig pdfSig1( G1, g2 );
    Likelihood like( pdfSig1, pdfBkg );
    UMLYieldAndParameterFitter fitter( like );
    fitter.addParameter( "mean", & G1.mean );

    double pull1, pull2, pull3;
    for ( int i = 0; i < 50000; i++ ) {
      Sample sample;
      toy.generate( sample );
      double s[] = { sig, bkg, 0 };
      double err[] = { 1, 1, 1 };
      double logLike = s, err, sample );
      pull1 = ( s[ 0 ] - sig ) / err[ 0 ];
      pull2 = ( s[ 1 ] - bkg ) / err[ 1 ];
      pull3 = ( s[ 2 ] - 0 ) / err[ 2 ];
    }

Symbolic function package

- Symbolic expressions make the definition of PDFs easier
- Users can specify different ways of performing normalization and integration
- Normalization: analytic integral performed by the compiler

    {
      X x; // declare the variable x

      // normalize using the symbolic integration at c-tor
      PdfNonParametric f1( sqr( sin(x) + cos(x) ), 0, 4 * M_PI );

      // recompute the normalization every time, since
      // the parameter tau may change from call to call
      Parameter tau( );
      PdfParametric f2( x * exp( - tau * x ), 0, 10 );
    }
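
One common way to obtain such symbolic expressions in C++ is the expression-template technique, sketched below in a very reduced form. Everything in the sym namespace is hypothetical and much simpler than the toolkit's package: it only supports sums and products of the variable and constants, and performs no symbolic integration.

```cpp
#include <cassert>

namespace sym {

// The independent variable: evaluates to its argument.
struct X {
  double operator()( double x ) const { return x; }
};

struct Constant {
  explicit Constant( double v ) : value( v ) { }
  double operator()( double ) const { return value; }
  double value;
};

// Each operation is a type holding its operands, so the full expression
// tree is encoded in the type and is known at compile time.
template< typename A, typename B >
struct Sum {
  Sum( const A & a, const B & b ) : lhs( a ), rhs( b ) { }
  double operator()( double x ) const { return lhs( x ) + rhs( x ); }
  A lhs; B rhs;
};

template< typename A, typename B >
struct Product {
  Product( const A & a, const B & b ) : lhs( a ), rhs( b ) { }
  double operator()( double x ) const { return lhs( x ) * rhs( x ); }
  A lhs; B rhs;
};

// Operators build the expression types; they are found by
// argument-dependent lookup only for types declared in this namespace.
template< typename A, typename B >
Sum< A, B > operator+( const A & a, const B & b ) { return Sum< A, B >( a, b ); }

template< typename A, typename B >
Product< A, B > operator*( const A & a, const B & b ) { return Product< A, B >( a, b ); }

}  // namespace sym
```

With these definitions, `sym::X x; auto f = x * x + sym::Constant( 3.0 ) * x;` yields an object whose type is Sum< Product< X, X >, Product< Constant, X > > and which can be called as f( 2.0 ). Having the expression structure available as a type is what makes compile-time manipulations, such as analytic integration, possible in principle.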

Example of χ² fit

    {
      X x;
      Parameter a( 0 ), b( 1 ), c( 0 );
      Function parabola( c + x*( b + x*a ) );
      UniformPartition partition( 100, -1.0, 1.0 );
      Chi2 > chi2( parabola, partition );
      Chi2Fitter > > fitter( chi2 );
      fitter.addParameter( "a", a.ptr() );
      fitter.addParameter( "b", b.ptr() );
      fitter.addParameter( "c", c.ptr() );
      SampleErr sample( partition.bins() );
      // fill the sample...
      double par[] = { a, b, c }, err[] = { 1, 1, 1 };
      par, err, sample );
    }
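
The quantity minimized in such a fit can be sketched as a plain function. The code below is an assumption-laden illustration: it evaluates the model at the bin centers of a uniform partition (the toolkit might instead integrate over each bin), and the names Parabola and chi2 are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Same parametrization as the slide: c + x * ( b + x * a ).
struct Parabola {
  double a, b, c;
  double operator()( double x ) const { return c + x * ( b + x * a ); }
};

// chi2 = sum over bins of ( ( y_i - f( x_i ) ) / err_i )^2,
// where x_i is the center of bin i in a uniform partition of [min, max].
template< typename F >
double chi2( const F & f, double min, double max,
             const std::vector<double> & y,
             const std::vector<double> & err ) {
  assert( !y.empty() && y.size() == err.size() && min < max );
  const double width = ( max - min ) / y.size();
  double sum = 0;
  for ( std::size_t i = 0; i < y.size(); ++i ) {
    assert( err[ i ] > 0 );
    const double x = min + ( i + 0.5 ) * width;
    const double r = ( y[ i ] - f( x ) ) / err[ i ];
    sum += r * r;
  }
  return sum;
}
```

A minimizer such as Minuit would vary a, b and c to minimize this value; at the minimum, a chi2 close to the number of bins minus the number of fitted parameters indicates a reasonable fit.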

Possible future improvements

- Upper limit extraction based on Toy Monte Carlo
  - could be based on existing code from BaBar B analysis
- Support for χ² fits with correlated errors and covariance matrix
- Provide more "standard" PDFs
  - Crystal Ball, Chebyshev polynomials, …
- Managing singular PDFs
  - Dirac delta components
- Managing (un)folding
- …

Conclusion

- We designed a new tool to model fit problems
- Using template generic programming we obtained:
  - Generality
    - users can plug in new components (PDFs, transformations, random generators, etc.)
    - external contributions are easy to incorporate in the tool
  - Light weight
    - most of the code is contained in header ( #include ) files
    - mild external dependencies
  - Ease of use
    - very "synthetic" and "expressive" code
  - CPU speed
    - virtual function calls are extremely limited
    - most of the methods are inlined
- Interest has been expressed by:
  - Geant4 Statistical testing toolkit
  - LCG/PI (LHC Computing Grid - Physics Interfaces)
- We will focus on a release version shortly