A Toolkit for the modeling of Multi-parametric fit problems

Presentation transcript:

A Toolkit for the modeling of Multi-parametric fit problems
Luca Lista, INFN Napoli

Motivation
Initially developed while rewriting a Fortran fitter for a BaBar analysis
Simultaneous estimate of:
    B(B → J/ψ π) / B(B → J/ψ K)
    direct CP asymmetry
More control over the code was needed to understand a bias that appeared in the original fitter
As much of the code as possible has to be under control and testable separately

Requirements
Provide tools for modeling parametric fit problems
Unbinned Maximum Likelihood (UML[*]) fit of:
    PDF parameters
    Yields of different sub-samples
    Both, mixed
χ² fits
Toy Monte Carlo to study the fit properties:
    Fitted parameter distributions
    Pulls, bias, confidence level of fit results
[*] not Unified Modeling Language …

Design issues
Optimize the PDF code as much as possible
    It gets called a large number of times
Yes, it can be done in C++:
    Addressed with template metaprogramming
    No need to use virtual functions
The underlying minimization engine is Minuit, as always
    Wrapped in different flavours (ROOT, …)
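The point about avoiding virtual functions can be illustrated with a minimal sketch. It is not the toolkit's code: the names Flat and negLogLikelihood are hypothetical, and only the convention of a PDF exposing type, variables and operator() is taken from the slides. Because the PDF type is a template parameter, the call pdf(...) is resolved at compile time and can be inlined in the hot loop.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical flat PDF on [lo, hi], following the slides' interface style.
struct Flat {
    typedef double type;
    enum { variables = 1 };
    Flat(double a, double b) : lo(a), hi(b) {}
    double operator()(const type* v) const {
        return (*v < lo || *v > hi) ? 0.0 : 1.0 / (hi - lo);
    }
    double lo, hi;
};

// Negative log-likelihood over n events stored contiguously.
// Pdf is a template parameter: no base class, no virtual call.
template <class Pdf>
double negLogLikelihood(const Pdf& pdf, const typename Pdf::type* data,
                        std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum -= std::log(pdf(data + i * Pdf::variables));
    return sum;
}
```

Any class with the same three members can be passed to negLogLikelihood without inheriting from a common base.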

PDF interface
Users can define their own PDFs with the interface below

    #include <cmath>
    #include "TMath.h"   // ROOT, for TMath::Factorial

    class PdfFlat {
    public:
      typedef double type;                      // variable type
      enum { variables = 1 };
      PdfFlat( double a, double b ) : min( a ), max( b ) { }
      double operator()( type * v ) const {     // v: variable set; returns dP(x)/dx
        type x = *v;
        return ( x < min || x > max ? 0 : 1 / ( max - min ) );
      }
      double min, max;
    };

    class PdfPoissonian {
    public:
      typedef int type;                         // variable type
      enum { variables = 1 };
      PdfPoissonian( double m ) : mean( m ) { }
      double operator()( type * v ) const {     // v: variable set; returns P(n)
        type n = *v;
        return ( exp( - mean ) * pow( mean, n ) / TMath::Factorial( n ) );
      }
      double mean;
    };
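Later slides use a PdfGaussian with public mean and sigma members but never show it. A user-defined PDF of that kind might look like the sketch below; the member names come from the client code on the slides, while the body is an assumption.

```cpp
#include <cmath>

// Hypothetical sketch of a Gaussian PDF following the slides' interface:
// a `type` typedef, a `variables` count and operator() returning dP(x)/dx.
class PdfGaussian {
public:
    typedef double type;
    enum { variables = 1 };
    PdfGaussian(double m, double s) : mean(m), sigma(s) {}
    double operator()(type* v) const {
        const double pi = 3.14159265358979323846;
        const double z = (*v - mean) / sigma;
        return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
    }
    double mean, sigma;  // public so that a fitter can hold pointers to them
};
```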

Random number generators
Generic random engine: ROOT, CLHEP, …
Users can define their own generators with their preferred method

    // Primary template
    template< class Pdf, class Generator = RootRandom >
    class RandomGenerator {
    public:
      typedef typename Pdf::type type;
      RandomGenerator( const Pdf& pdf );
      void generate( type * v ) const;
    };

    // Partial specialization for PdfFlat; the engine stays generic
    template< class Generator >
    class RandomGenerator< PdfFlat, Generator > {
    public:
      typedef PdfFlat Pdf;
      typedef typename Pdf::type type;
      RandomGenerator( const Pdf& pdf ) : _min( pdf.min ), _max( pdf.max ) { }
      void generate( double * v ) const {
        v[ 0 ] = Generator::shootFlat( _min, _max );
      }
    private:
      const double& _min, &_max;
    };
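As an illustration of how a user might add a generator by partial specialization, the sketch below samples an exponential with the inverse-transform method. PdfExponential and StdRandom are hypothetical stand-ins (the slides name RootRandom and PdfFlat only); the static shootFlat call mirrors the one used on the slide.

```cpp
#include <cmath>
#include <random>

// Hypothetical engine exposing the static shootFlat() used on the slide.
struct StdRandom {
    static double shootFlat(double lo, double hi) {
        static std::mt19937 engine(42u);
        return std::uniform_real_distribution<double>(lo, hi)(engine);
    }
};

// Hypothetical PDF: f(x) = exp(-x / tau) / tau for x >= 0.
struct PdfExponential {
    typedef double type;
    enum { variables = 1 };
    explicit PdfExponential(double t) : tau(t) {}
    double operator()(type* v) const {
        return *v < 0 ? 0.0 : std::exp(-*v / tau) / tau;
    }
    double tau;
};

// Primary template: declared only, to be specialized per PDF.
template <class Pdf, class Generator = StdRandom>
class RandomGenerator;

// Partial specialization: the PDF is fixed, the engine stays generic.
template <class Generator>
class RandomGenerator<PdfExponential, Generator> {
public:
    typedef PdfExponential Pdf;
    typedef Pdf::type type;
    RandomGenerator(const Pdf& pdf) : _tau(pdf.tau) {}
    void generate(type* v) const {
        // Inverse transform: x = -tau * log(1 - u), with u uniform in [0, 1).
        v[0] = -_tau * std::log(1.0 - Generator::shootFlat(0.0, 1.0));
    }
private:
    const double& _tau;  // follows the PDF's parameter, as on the slide
};
```

Holding a reference to the PDF's parameter means the generator follows later changes to tau, but the PDF object must outlive the generator.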

Combining PDFs
Product of pdfs, with n1 + n2 variables

    template<class Pdf1, class Pdf2>
    class PdfIndependent2 {
    public:
      typedef typename Pdf1::type type;
      enum { variables = Pdf1::variables + Pdf2::variables };   // n1 + n2 variables
      PdfIndependent2( const Pdf1& pdf1, const Pdf2& pdf2 ) :
        _pdf1( pdf1 ), _pdf2( pdf2 ) { }
      double operator()( double * val ) const {                 // product of pdfs
        return _pdf1( val ) * _pdf2( val + Pdf1::variables );
      }
    private:
      const Pdf1 &_pdf1;
      const Pdf2 &_pdf2;
    };

    template<class Pdf1, class Pdf2, class Pdf3>
    class PdfIndependent3 { ... };

    template<class Pdf1, class Pdf2, class Pdf3, class Pdf4>
    class PdfIndependent4 { ... };

Transformation of variables
The Jacobian must be 1!

    #include <algorithm>   // std::copy

    template<class Pdf, class Transformation>
    class PdfTransformed {
    public:
      typedef typename Pdf::type type;
      enum { variables = Pdf::variables };
      PdfTransformed( const Pdf& pdf, const Transformation& trans ) :
        _pdf( pdf ), _trans( trans ) { }
      double operator()( double * val ) const {
        double x[ variables ];
        std::copy( val, val + variables, x );   // work on a copy of the input
        _trans( x );
        return _pdf( x );
      }
    private:
      const Pdf &_pdf;
      Transformation _trans;
    };

Client code example

    typedef PdfIndependent2<PdfGaussian, PdfGaussian> PdfBasic;
    typedef PdfTransformed<PdfBasic, Rotation2D> Pdf;
    PdfGaussian g1( 0, 0.1 ), g2( 0, 1 );
    // PdfTransformed stores a reference to the PDF, so the PDF must be a
    // named object that outlives it, not a temporary.
    PdfBasic basic( g1, g2 );
    Pdf pdf( basic, Rotation2D( M_PI / 4 ) );
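The client code above can be turned into a self-contained sketch. PdfGaussian and Rotation2D are not shown on the slides, so their bodies here are assumptions; PdfIndependent2 and PdfTransformed follow the slide code. A plane rotation is a natural example because its Jacobian determinant is exactly 1, so the transformed function stays normalized.

```cpp
#include <algorithm>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Hypothetical one-dimensional Gaussian.
class PdfGaussian {
public:
    typedef double type;
    enum { variables = 1 };
    PdfGaussian(double m, double s) : mean(m), sigma(s) {}
    double operator()(double* v) const {
        const double z = (*v - mean) / sigma;
        return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
    }
    double mean, sigma;
};

// Product of two independent PDFs, as on the "Combining PDFs" slide.
template <class Pdf1, class Pdf2>
class PdfIndependent2 {
public:
    typedef typename Pdf1::type type;
    enum { variables = int(Pdf1::variables) + int(Pdf2::variables) };
    PdfIndependent2(const Pdf1& p1, const Pdf2& p2) : _pdf1(p1), _pdf2(p2) {}
    double operator()(double* val) const {
        return _pdf1(val) * _pdf2(val + Pdf1::variables);
    }
private:
    const Pdf1& _pdf1;
    const Pdf2& _pdf2;
};

// Hypothetical in-place rotation of a point in the plane (Jacobian = 1).
class Rotation2D {
public:
    explicit Rotation2D(double angle)
        : _c(std::cos(angle)), _s(std::sin(angle)) {}
    void operator()(double* x) const {
        const double x0 = _c * x[0] - _s * x[1];
        const double x1 = _s * x[0] + _c * x[1];
        x[0] = x0;
        x[1] = x1;
    }
private:
    double _c, _s;
};

// Evaluates the PDF at the transformed point, leaving the input untouched.
template <class Pdf, class Transformation>
class PdfTransformed {
public:
    typedef typename Pdf::type type;
    enum { variables = Pdf::variables };
    PdfTransformed(const Pdf& pdf, const Transformation& trans)
        : _pdf(pdf), _trans(trans) {}
    double operator()(double* val) const {
        double x[variables];
        std::copy(val, val + variables, x);
        _trans(x);
        return _pdf(x);
    }
private:
    const Pdf& _pdf;
    Transformation _trans;
};
```

Because the combined classes store references, each component PDF should be a named object that outlives the combination built from it.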

A data sample to be fitted
Fixed number of variables per event; basically, a vector of double*

    #include <vector>

    template< int n, class type = double >
    class Sample {
    public:
      typedef std::vector<type *> container;
      typedef typename container::size_type size_type;
      typedef typename container::iterator iterator;
      typedef typename container::const_iterator const_iterator;
      ~Sample() {
        for( iterator i = begin(); i != end(); i ++ )
          delete [] *i;
      }
      size_type size() const { return _v.size(); }
      const type* operator[]( int i ) const { return _v[ i ]; }
      iterator begin() { return _v.begin(); }
      iterator end() { return _v.end(); }
      const_iterator begin() const { return _v.begin(); }
      const_iterator end() const { return _v.end(); }
      // Appends storage for one event and returns a pointer to its n variables
      type * extend() {
        _v.push_back( new type[ n ] );
        return _v.back();
      }
    private:
      container _v;
    };

UML PDF Parameter fit

    // Fixed PDF for MC generation
    const int sig = 100;
    double mean = 0, sigma = 1;
    PdfConstant p( sig ); // alternative: PdfPoissonian
    PdfGaussian q( mean, sigma );
    Experiment<PdfConstant, PdfGaussian> experiment( p, q );

    // Variable PDF for fitting
    PdfGaussian pdf( mean, sigma );
    Likelihood<PdfGaussian> like( pdf );
    UMLParameterFitter<Likelihood<PdfGaussian> > fitter( like );
    // Parameters linked to the fitter
    fitter.addParameter( "mean", & pdf.mean );
    fitter.addParameter( "sigma", & pdf.sigma );

    for ( int i = 0; i < 5000; i++ ) {
      Sample< 1 > sample;
      experiment.generate( sample );
      double par[ 2 ] = { mean, sigma }, err[ 2 ] = { 1, 1 }, logLike;
      logLike = fitter.fit( sample.begin(), sample.end(), par, err );
      double pullm = ( par[ 0 ] - mean ) / err[ 0 ];
      double pulls = ( par[ 1 ] - sigma ) / err[ 1 ];
    }
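The pull study can be sketched without the toolkit or Minuit in the one case where the maximum-likelihood estimate is known in closed form: the mean of a Gaussian with known sigma. Everything below (fitMean, toyPulls, the use of std::mt19937) is a hypothetical stand-in for Experiment and UMLParameterFitter, meant only to show what a pull is and what distribution to expect.

```cpp
#include <cmath>
#include <random>
#include <vector>

struct GaussianFit {
    double mean;       // fitted mean
    double meanError;  // its uncertainty
};

// Closed-form ML estimate of a Gaussian mean when sigma is known.
GaussianFit fitMean(const std::vector<double>& x, double sigma) {
    double sum = 0.0;
    for (double xi : x) sum += xi;
    GaussianFit r;
    r.mean = sum / x.size();
    r.meanError = sigma / std::sqrt(static_cast<double>(x.size()));
    return r;
}

// Runs nToys toy experiments of nEvents each and returns the pulls
// (fitted - true) / error of the fitted mean.
std::vector<double> toyPulls(int nToys, int nEvents, double mean,
                             double sigma, unsigned seed) {
    std::mt19937 engine(seed);
    std::normal_distribution<double> gauss(mean, sigma);
    std::vector<double> pulls;
    for (int t = 0; t < nToys; ++t) {
        std::vector<double> sample(nEvents);
        for (double& xi : sample) xi = gauss(engine);
        const GaussianFit fit = fitMean(sample, sigma);
        pulls.push_back((fit.mean - mean) / fit.meanError);
    }
    return pulls;
}
```

For an unbiased estimator with correctly estimated errors, the pulls should have mean close to 0 and width close to 1; deviations from that are what the toy study is designed to reveal.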

Parameter fit Results (Pulls)
There is a bias (as expected): $\sigma^2 = \frac{1}{n}\sum_i (x_i - \mu)^2 \neq \frac{1}{n-1}\sum_i (x_i - \mu)^2$
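The size of the expected bias follows from a standard result, sketched here under the assumption that the mean is fitted together with the width, so that $\mu$ in the formula above is the sample mean $\bar{x}$:

```latex
\begin{align}
  \hat{\sigma}^2_{\mathrm{ML}} &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2,
  \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \\
  E\!\left[\hat{\sigma}^2_{\mathrm{ML}}\right] &= \frac{n-1}{n}\,\sigma^2 \\
  s^2 &= \frac{n}{n-1}\,\hat{\sigma}^2_{\mathrm{ML}}
       = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,
  \qquad E\!\left[s^2\right] = \sigma^2
\end{align}
```

With the 100 events per toy experiment used in the parameter fit example, the maximum-likelihood variance is on average 1% below the true value, and the effect shrinks as $1/n$.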

UML Yield fit
In 2 dimensions: Gaussian signal, flat background

    const int sig = 10, bkg = 5;
    typedef PdfPoissonian Fluctuation; // alternative: PdfConstant
    Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg );
    typedef PdfIndependent2< PdfGaussian, PdfGaussian > PdfSig;
    typedef PdfIndependent2< PdfFlat, PdfFlat > PdfBkg;
    // Named components: PdfIndependent2 stores references, so temporaries
    // passed to its constructor would dangle.
    PdfGaussian g1( 0, 1 ), g2( 0, 0.5 );
    PdfFlat f1( -5, 5 ), f2( -5, 5 );
    PdfSig pdfSig( g1, g2 );
    PdfBkg pdfBkg( f1, f2 );
    typedef Experiment< Fluctuation, PdfSig > ToySig;
    typedef Experiment< Fluctuation, PdfBkg > ToyBkg;
    ToySig toySig( fluctuationSig, pdfSig );
    ToyBkg toyBkg( fluctuationBkg, pdfBkg );
    Experiment2< ToySig, ToyBkg > toy( toySig, toyBkg );

    typedef ExtendedLikelihood2< PdfSig, PdfBkg > Likelihood;
    Likelihood like( pdfSig, pdfBkg );
    UMLYieldFitter< Likelihood > fitter( like );

    for ( int i = 0; i < 5000; i++ ) {
      Sample< 2 > sample;
      toy.generate( sample );
      double s[] = { sig, bkg }, err[] = { 1, 1 };
      double logLike = fitter.fit( sample.begin(), sample.end(), s, err );
      double pull1 = ( s[0] - sig ) / err[0],
             pull2 = ( s[1] - bkg ) / err[1];
    }
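The slides do not show what ExtendedLikelihood2 computes. A common form of the extended negative log-likelihood for a signal yield s and background yield b is -ln L = (s + b) - Σ ln( s·f_s(x_i) + b·f_b(x_i) ), dropping terms that do not depend on the yields. The sketch below implements that formula as an assumption about the technique, not as the toolkit's code; Flat1D and extendedNegLogLikelihood are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical flat PDF used only to exercise the function below.
struct Flat1D {
    typedef double type;
    enum { variables = 1 };
    Flat1D(double a, double b) : lo(a), hi(b) {}
    double operator()(double* v) const {
        return (*v < lo || *v > hi) ? 0.0 : 1.0 / (hi - lo);
    }
    double lo, hi;
};

// Extended negative log-likelihood for yields s (signal) and b (background):
//   -ln L = (s + b) - sum_i ln( s * f_s(x_i) + b * f_b(x_i) )
// Events are stored contiguously, PdfSig::variables values per event.
template <class PdfSig, class PdfBkg>
double extendedNegLogLikelihood(const PdfSig& sigPdf, const PdfBkg& bkgPdf,
                                double s, double b,
                                double* data, std::size_t nEvents) {
    double nll = s + b;
    for (std::size_t i = 0; i < nEvents; ++i) {
        double* x = data + i * PdfSig::variables;
        nll -= std::log(s * sigPdf(x) + b * bkgPdf(x));
    }
    return nll;
}
```

Minimizing this function with respect to s and b gives the fitted yields; the (s + b) term is what ties the yields to the observed number of events.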

Yield fit Results (Pulls)
<b> = 5
Discrete structure because of low statistics (Poisson fluctuation)

Combined Yield and parameter fit

    const int sig = 10, bkg = 5;
    typedef PdfPoissonian Fluctuation;
    Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg );
    typedef PdfIndependent2< PdfGaussian, PdfGaussian > PdfSig;
    typedef PdfIndependent2< PdfFlat, PdfFlat > PdfBkg;
    PdfGaussian g1( 0, 1 ), g2( 0, 0.5 );
    PdfFlat f1( -5, 5 ), f2( -5, 5 );
    PdfSig pdfSig( g1, g2 );
    PdfBkg pdfBkg( f1, f2 );
    typedef Experiment<Fluctuation, PdfSig> ToySig;
    typedef Experiment<Fluctuation, PdfBkg> ToyBkg;
    ToySig toySig( fluctuationSig, pdfSig );
    ToyBkg toyBkg( fluctuationBkg, pdfBkg );
    Experiment2<ToySig, ToyBkg> toy( toySig, toyBkg );

    typedef ExtendedLikelihood2<PdfSig, PdfBkg> Likelihood;
    // Separate Gaussian for the fit, so its mean can float
    PdfGaussian G1( 0, 1 );
    PdfSig pdfSig1( G1, g2 );
    Likelihood like( pdfSig1, pdfBkg );
    UMLYieldAndParameterFitter<Likelihood> fitter( like );
    fitter.addParameter( "mean", & G1.mean );

    double pull1, pull2, pull3;
    for ( int i = 0; i < 5000; i++ ) {
      Sample< 2 > sample;
      toy.generate( sample );
      double s[] = { sig, bkg, 0 };
      double err[] = { 1, 1, 1 };
      double logLike = fitter.fit( sample.begin(), sample.end(), s, err );
      pull1 = ( s[ 0 ] - sig ) / err[ 0 ];
      pull2 = ( s[ 1 ] - bkg ) / err[ 1 ];
      pull3 = ( s[ 2 ] - 0 ) / err[ 2 ];
    }

Combined fit Results (Pulls)

Support for 2 fit Still no support for Correlated errors! class Line { public: Line( double A, double B ) : a( A ), b( B ) { } double operator()( double v ) const { return a + b * v; } double a, b; }; Line line( 0, 1 ); Chi2<Line> chi2line( line, partition ); Chi2Fitter<Chi2<Line> > fitter1( chi2line ); fitter1.addParameter( "a", &line.a ); fitter1.addParameter( "b", &line.b ); Parabola para( 0, 1, 0 ); Chi2<Parabola> chi2para( para, partition ); Chi2Fitter<Chi2<Parabola> > fitter2( chi2para ); fitter2.addParameter( "a", &para.a ); fitter2.addParameter( "b", &para.b ); fitter2.addParameter( "c", &para.c ); // ... } Still no support for Correlated errors!

Application to B(B → J/ψ π) / B(B → J/ψ K)
Four variables:
    B reconstructed mass
    Beam − B energy in the π mass hypothesis
    Beam − B energy in the K mass hypothesis
    B meson charge
Two samples: J/ψ → μμ, J/ψ → ee
Simultaneous fit of:
    Total yield of B → J/ψ π, B → J/ψ K and background
    Charge asymmetry
    Resolution and energy shifts separately for J/ψ → μμ, J/ψ → ee

B(B → J/ψ π) / B(B → J/ψ K), cont.
“Peculiar” distribution of kinematical variables
    Non-trivial variable transformation to factorize the pdfs
    Different samples factorize w.r.t. different variable combinations
Real experience:
    Code much more manageable and under control
    Different components are testable separately
        Pdf, random generation
        Different approaches to the fits
    Gaining confidence in the fit results requires massive testing

Model for independent PDFs
[Figure: model for independent PDFs in the (ΔE_π, ΔE_K) plane]

Dealing with kinematical pre-selection -120 MeV < E, EK < 120MeV A B D C A B D C The area is preserved after the trasformation

Extracting the signal
[Figure: likelihood projections for J/ψ → ee events and J/ψ → μμ events, showing the B → J/ψ K, B → J/ψ π and background components]

Possible improvements
Tools for upper limit extraction based on Toy Monte Carlo
    Adaptable code available (from the B … analysis)
Support for χ² fit with full covariance matrix
Provide more “standard” pdfs and random generators
    Exponential, Argus, Crystal Ball, …
    Generic hit-or-miss generator, etc.
Managing singular PDFs
    Dirac delta components
Factorize out dependence on ROOT, CLHEP, etc.
    Done for random generators
    Can be improved for the Minuit interface
More thoughts about the interface
    Is passing a double* as the set of PDF variables suitable?
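The generic hit-or-miss generator listed among the improvements could look roughly like the sketch below. It is an assumption about one possible design, not toolkit code: HitOrMissGenerator and PdfTriangle are hypothetical, the random engine is std::mt19937 rather than ROOT or CLHEP, and the caller must supply an upper bound pdfMax on the PDF over [lo, hi].

```cpp
#include <random>

// Hypothetical test PDF: f(x) = 2x on [0, 1], with maximum 2 and mean 2/3.
struct PdfTriangle {
    typedef double type;
    enum { variables = 1 };
    double operator()(double* v) const {
        return (*v < 0.0 || *v > 1.0) ? 0.0 : 2.0 * *v;
    }
};

// Hit-or-miss (accept-reject) sampling for any one-variable PDF that is
// bounded by pdfMax on [lo, hi].
template <class Pdf>
class HitOrMissGenerator {
public:
    HitOrMissGenerator(const Pdf& pdf, double lo, double hi, double pdfMax,
                       unsigned seed = 1u)
        : _pdf(pdf), _x(lo, hi), _y(0.0, pdfMax), _engine(seed) {}

    void generate(double* v) {
        for (;;) {
            double x = _x(_engine);
            // Accept x with probability pdf(x) / pdfMax.
            if (_y(_engine) < _pdf(&x)) {
                v[0] = x;
                return;
            }
        }
    }

private:
    const Pdf& _pdf;
    std::uniform_real_distribution<double> _x, _y;
    std::mt19937 _engine;
};
```

The method works for any bounded PDF but wastes draws when the PDF is sharply peaked, and it silently produces a distorted sample if pdfMax underestimates the true maximum.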