Wouter Verkerke, UCSB RooFitTools A general purpose tool kit for data modeling, developed in BaBar Wouter Verkerke (UC Santa Barbara) David Kirkby (Stanford.

Slides:



Advertisements
Similar presentations
Statistical Methods for Data Analysis Modeling PDF’s with RooFit
Advertisements

Chapter 7 Constructors and Other Tools. Copyright © 2006 Pearson Addison-Wesley. All rights reserved. 7-2 Learning Objectives Constructors Definitions.
Introduction to C Programming
Introduction to C Programming
EViews Student Version. Today’s Workshop Basic grasp of how EViews manages data Creating Workfiles Importing data Running regressions Performing basic.
Intermediate Code Generation
Modular Programming With Functions
1/1/ / faculty of Electrical Engineering eindhoven university of technology Introduction Part 2: Data types and addressing modes dr.ir. A.C. Verschueren.
Programming Languages and Paradigms The C Programming Language.
Chapter 7 User-Defined Methods. Chapter Objectives  Understand how methods are used in Java programming  Learn about standard (predefined) methods and.
C++ How to Program, 7/e © by Pearson Education, Inc. All Rights Reserved.
Practical Object-Oriented Design with UML 2e Slide 1/1 ©The McGraw-Hill Companies, 2004 PRACTICAL OBJECT-ORIENTED DESIGN WITH UML 2e Chapter 5: Restaurant.
Inline Assembly Section 1: Recitation 7. In the early days of computing, most programs were written in assembly code. –Unmanageable because No type checking,
 2006 Pearson Education, Inc. All rights reserved Introduction to Classes and Objects.
K. Kinoshita University of Cincinnati Belle Collaboration Statistical distribution of  - zero free parameters impact of free parameters some speculations.
© Copyright 1992–2004 by Deitel & Associates, Inc. and Pearson Education Inc. All Rights Reserved. Chapter 5 - Functions Outline 5.1Introduction 5.2Program.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 8 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
RooFit A tool kit for data modeling in ROOT
Templates. Class templates – why? Writing programs we often use abstract data types such as stack, queue or tree. Implementations of these types may be.
 2008 Pearson Education, Inc. All rights reserved Introduction to Classes and Objects.
 2006 Pearson Education, Inc. All rights reserved Introduction to Classes and Objects.
Lesson 6 Functions Also called Methods CS 1 Lesson 6 -- John Cole1.
FunctionsFunctions Systems Programming Concepts. Functions   Simple Function Example   Function Prototype and Declaration   Math Library Functions.
Design Synopsys System Verilog API Donations to Accellera João Geada.
Review of C++ Programming Part II Sheng-Fang Huang.
What is Sure BDCs? BDC stands for Batch Data Communication and is also known as Batch Input. It is a technique for mass input of data into SAP by simulating.
© 2008, Renesas Technology America, Inc., All Rights Reserved 1 Purpose  This training course describes how to configure the the C/C++ compiler options.
C++ How to Program, 8/e © by Pearson Education, Inc. All Rights Reserved. Note: C How to Program, Chapter 22 is a copy of C++ How to Program Chapter.
Statistical Methods for Data Analysis Parameter estimates with RooFit Luca Lista INFN Napoli.
RooFit – Open issues W. Verkerke. Datasets Current class structure Data representation –RooAbsData (abstract base class) –RooDataSet (unbinned [weighted]
C++ for Engineers and Scientists Second Edition Chapter 6 Modularity Using Functions.
Programming in Java Unit 2. Class and variable declaration A class is best thought of as a template from which objects are created. You can create many.
RooFit A tool kit for data modeling in ROOT
© Copyright 1992–2004 by Deitel & Associates, Inc. and Pearson Education Inc. All Rights Reserved. C How To Program - 4th edition Deitels Class 05 University.
Chapter 6: Functions Starting Out with C++ Early Objects
Chapter 06 (Part I) Functions and an Introduction to Recursion.
Hello.java Program Output 1 public class Hello { 2 public static void main( String [] args ) 3 { 4 System.out.println( “Hello!" ); 5 } // end method main.
Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)
RooFit – tools for ML fits Authors: Wouter Verkerke and David Kirkby.
FLEX Fast Lexical Analyzer EECS Introduction Flex is a lexical analysis (scanner) generator. Flex is provided with a user input file or Standard.
1 SystemVerilog Enhancement Requests Daniel Schostak Principal Engineer February 26 th 2010.
10/31/2015PHYS 3446 DØ Data Analysis with ROOT Venkat (for Dr.Yu)
Creating Graphical User Interfaces (GUI’s) with MATLAB By Jeffrey A. Webb OSU Gateway Coalition Member.
Parallelization of likelihood functions for data analysis Alfio Lazzaro CERN openlab Forum on Concurrent Programming Models and Frameworks.
Introduction to c++ programming - object oriented programming concepts - Structured Vs OOP. Classes and objects - class definition - Objects - class scope.
Alternate Version of STARTING OUT WITH C++ 4 th Edition Chapter 6 Functions.
Starting Out with C++ Early Objects ~~ 7 th Edition by Tony Gaddis, Judy Walters, Godfrey Muganda Modified for CMPS 1044 Midwestern State University 6-1.
Slides created by: Professor Ian G. Harris Hello World #include main() { printf(“Hello, world.\n”); }  #include is a compiler directive to include (concatenate)
1 Classes II Chapter 7 2 Introduction Continued study of –classes –data abstraction Prepare for operator overloading in next chapter Work with strings.
More about Java Classes Writing your own Java Classes More about constructors and creating objects.
Chapter 7 Constructors and Other Tools Copyright © 2010 Pearson Addison-Wesley. All rights reserved.
Conditional Observables Joe Tuggle BaBar ROOT/RooFit Workshop December 2007.
RooFit Tutorial – Topical Lectures June 2007
Hands-on exercises *. Getting started – ROOT 5.25/02 setup Start a ROOT 5.25/02 session –On your local laptop installation, or –On lxplus (SLC4) or lx64slc5.
Defining Data Types in C++ Part 2: classes. Quick review of OOP Object: combination of: –data structures (describe object attributes) –functions (describe.
Copyright © 2002 Pearson Education, Inc. Slide 1.
Copyright © 2002 Pearson Education, Inc. Slide 1.
Wouter Verkerke, UCSB Data Analysis Exercises - Day 2 Wouter Verkerke (NIKHEF)
Chapter 5 Introduction to Defining Classes Fundamentals of Java.
HYDRA Framework. Setup of software environment Setup of software environment Using the documentation Using the documentation How to compile a program.
Friend Class Friend Class A friend class can access private and protected members of other class in which it is declared as friend. It is sometimes useful.
Java Primer 1: Types, Classes and Operators
CMS RooStats Higgs Combination Package
About the Presentations
RooFit A general purpose tool kit for data modeling
Chapter 5 - Functions Outline 5.1 Introduction
Chapter 9 :: Subroutines and Control Abstraction
Constructors and Other Tools
Corresponds with Chapter 5
Presentation transcript:

Wouter Verkerke, UCSB RooFitTools A general purpose tool kit for data modeling, developed in BaBar Wouter Verkerke (UC Santa Barbara) David Kirkby (Stanford University)

Wouter Verkerke, UCSB What is RooFitTools? A data modeling toolkit for –(Un)binned maximum likelihood fits –Toy Monte Carlo generation –Generating plots/tables RooFitTools is a class library built on top of the ROOT interactive C++ environment –Key concepts Datasets Variables and generic functions Probability density functions –A fit/toyMC is setup in a ROOT C++ macro using the building blocks of the RooFitTools class library –TFitter/TMinuit used for actual fitting

Wouter Verkerke, UCSB Key concepts: a simple fitting example: Gauss+Exp void intro() { RooRealVar //define the data variables and fit model parameters m(“m”,”Reconstructed Mass”, 0.5,2.5, ”GeV”), rmass(“rmass”,”Resonance Mass”, 1.5,1.4, 1.6, “GeV”), width(“width”,”Resonance Width”, 0.15,0.1, 0.2, “GeV”), bgshape(“bgshape”,”Background shape”,-1.0,-2.0, 0.0), frac(“frac”,”Signal fraction”, 0.5,0.0, 1.0) ; // Create the fit model components: Gaussian and exponential PDFs RooGaussian signal(“signal”,”Signal Distribution”,m,rmass,width); RooExponential bg(“bg”,”Background distribution”,m,bgshape) ; // Combine them using addition (with relative fraction parameter) RooAddPdf model(“model”,”Signal + Background”,signal,bg,frac) ; // Read the values of ‘m’ from a text file RooDataSet* data = RooDataSet::read(“mvalues.dat”,m) ; // fit the data to the model with an UML fit model.fitTo(data) ; Variables Probability density functions Dataset Description, unit, fit and plot ranges, constant/floating status stored in object Derived from Ttree Maps a TTree row onto a set of RFT variable objects Explicitly self-normalized

Wouter Verkerke, UCSB Plotting and Generating ToyMC Plotting –A RooPlot frame collects multiple histograms, curves, text boxes. Persistable object, I.e. can save complex multi-layer plots in batch fit/generation –Automatic adaptive binning for function curves: always smooth functions regardless of data histogram binning –Poisson/binomial errors on histograms (automatically selected) Generating –Works for real and discrete variables // create empty 1-D plot frame for “m” RooPlot* frame = m.frame() ; // plot distribution of m in data on frame data->plotOn(frame) ; // plot model as function of m model->plotOn(frame) ; // Draw the plot on a canvas frame->Draw() ; RooDataSet* toyMCData = model.generate(varList,numEvents) ; RooDataSet* toyMCData = model.generate(varList,protoData) ;

Wouter Verkerke, UCSB Object structure of example PDF RooAddPdf model vars: m,rmass,width,bgshape,frac inputs: signal,bg,frac RooGaussian signal vars: m,rmass,width inputs: m,rmass,width RooExponential bg vars: m,bgshape inputs: m,bgshape RooRealVar m RooRealVar rmass RooRealVar bgshape RooRealVar width RooRealVar frac

Wouter Verkerke, UCSB Generic functions and composition A more complex PDF: –Replace gaussian(m,mean,width) -> gaussian(m,mean,w off +w slope *alpha) –Need object to represent function w off +w slope *alpha Class RooFormulaVar implements expression based functions –Based on Tformula, very practical for such transformations Example of composition: –PDF signal now has 3 extra variables: offset,slope,alpha –Every variable of a PDF can be can be a function of other variables RooRealVar alpha(“alpha”, ”Mystery parameter”, 1.5,1.4, 1.6, “GeV”), slope(“slope”, ”Slope of resonance width”, 0.3,0.1,0.5), offset(“offset”,”Offset of resonance width”, 0.0,0.0,0.5, “GeV”) ; // Construct width object as function of alpha RooFormulaVar rmassF(“rmassF”,”offset+slope*rmass”, RooArgSet(slope,offset,rmass)) // Plug rmassF function in place of rmass variable RooGaussian signal(“signal”,”Signal distribution”, m,rmassF,width) ;

Wouter Verkerke, UCSB Extend PDF: rmass  rmass(alpha) = offset+slope*alpha RooAddPdf model vars: m,alpha,slope,offset,width,bgshape,frac inputs: signal,bg,frac RooGaussian signal vars: m,alpha,slope,offset,width inputs: m,rmassF,width RooExponential bg vars: m,bgshape inputs: m,bgshape RooRealVar m RooFormulaVar rmassF vars: slope,offset,rmass inputs: slope,offset,rmass RooRealVar bgshape RooRealVar width RooRealVar frac RooRealVar alpha RooRealVar slope RooRealVar offset

Wouter Verkerke, UCSB Discrete variables Often a data model includes discrete variables such as particle ID, decay mode, CP eigenvalue etc. Can be represented by RooCategory: –Finite set of labeled states, numeric code optional Various transformation classes available, e.g. –RooMappedCategory: Pattern matching based category-to-category mapping RooMappedCategory decayCP(“decayCP”,decay,”CPunknown”) ; // A derived category decayCP.map(“*KS*”,”CPminus”) ; // Wildcard mapping “*KS*” -> “CPminus” decayCP.map(“*KL*”,”CPplus”) ; RooCategory decay(“decay”,”Decay Mode”) ; // A category variable decay.defineType(“B0 -> J/psi KS”,0) ; // Type definition with explicit index decay.defineType(“B0 -> J/psi KL”) ; // Type definition with automatic index decay.defineType(“B0 -> psi(2s) KS”) ; // Assignment to other RooCategory, string or integer decay = 0 ; decay = “B0 -> J/psi KS” ; decay = otherDecay

Wouter Verkerke, UCSB Under the hood: Integration & Optimization PDF evaluation/normalization speed critical for complex unbinned likelihood fits. RooFitTools implements several strategies to maximize performance Caching & lazy evaluation –The output value of all function objects is cached –Function value only recalculated if any of the input objects change. –Push/pull model: All RFT objects have links to their client and server objects If an object changes value, it pushes a ‘dirty’ flag to all its registered clients. Clients postpone recalculation to next getVal() call, checking the dirty flag at that point. Precalculation of ‘constant’ functions. –If a PDF exclusively depends on variables in the fitted data set constant non-dataset parameters –it is precalculated for each dataset row. (Limits calculation to 1 fit iteration) Hybrid analytical/numerical integration. –PDFs advertises which (partial) analytical integration it can perform –Dedicated RooRealIntegral object, owned by each PDF coordinates maximum possible analytical integration/summation for given configuration. –PDF normalization stored/cached separately from PDF value Dependencies of PDF integral and value are different

Wouter Verkerke, UCSB Lazy evaluation: dirty state propagation RooAddPdf model vars: m,alpha,slope,offset,width,bgshape,frac inputs: signal,bg,frac RooGaussian signal vars: m,alpha,slope,offset,width inputs: m,rmassF,width RooExponential bg vars: m,bgshape inputs: m,bgshape RooRealVar m RooFormulaVar rmassF vars: slope,offset,rmass inputs: slope,offset,rmass RooRealVar bgshape RooRealVar width RooRealVar frac RooRealVar alpha RooRealVar slope RooRealVar offset Example: bgshape is modified Dirty flag set

Wouter Verkerke, UCSB Constant node precalculation example: fit m for signal fraction only RooAddPdf model vars: m,alpha,slope,offset,width,bgshape,frac inputs: signal,bg,frac RooGaussian signal vars: m,alpha,slope,offset,width inputs: m,rmassF,width RooExponential bg vars: m,bgshape inputs: m,bgshape RooRealVar m RooFormulaVar rmassF vars: slope,offset,rmass inputs: slope,offset,rmass RooRealVar bgshape RooRealVar width RooRealVar frac RooRealVar alpha RooRealVar slope RooRealVar offset width.setConstant(kTRUE) ; alpha.setConstant(kTRUE) ; slope.setConstant(kTRUE) ; offset.setConstant(kTRUE) ; bgshape.setConstant(kTRUE) ;

Wouter Verkerke, UCSB Writing your own PDF Easiest way: use RooGenericPdf –No programming required –Like RooFormulaVar, based on TFormula. –Automatic normalization via full numerical integration. Write your own RooAbsPdf derived class –Faster execution, allows (partial) analytical normalization. –Optimization technology requires PDFs to be ‘good objects’, i.e. well defined copy/clone behaviour. Gory details of link management well hidden in RooAbsPdf and proxy classes. –Minimum implementation consists of 3 functions Constructor/Copy constructor evaluate() function – Returns function value –Extended implementation with analytical integration needs also getAnalyticalIntegral() – Indicates which (partial) integrals can be performed analyticalIntegral() - Implements advertised (partial) integrals RooGenericPdfmyPDF(“myPDF”,”exp(abs(x)/sigma)*sqrt(scale)”, RooArgSet(x,sigma,scale)) ;

Wouter Verkerke, UCSB // Constructor RooGaussian::RooGaussian(const char *name, const char *title, RooAbsReal& _x, RooAbsReal& _mean, RooAbsReal& _sigma) : RooAbsPdf(name,title), x("x","Dependent",this,_x), mean("mean","Mean",this,_mean), sigma("sigma","Width",this,_sigma) { } // Copy constructor RooGaussian::RooGaussian(const RooGaussian& other, const char* name) : RooAbsPdf(other,name), x("x",this,other.x), mean("mean",this,other.mean), sigma("sigma",this,other.sigma) { } // Implementation of value calculation Double_t RooGaussian::evaluate( const RooDataSet* dset) const { Double_t arg= x - mean; return exp(-0.5*arg*arg/(sigma*sigma)) ; } // Advertise which partial analytical integrals are supported Int_t RooGaussian::getAnalyticalIntegral( RooArgSet& allV, RooArgSet& anaV) const { if (matchArgs(allV,anaV,x)) return 1 ; return 0 ; } // Implement advertised analytical integrals Double_t RooGaussian::analyticalIntegral(Int_t code) { switch(code) { case 0: return getVal() ; case 1: // integral over x { static Double_t root2 = sqrt(2) ; static Double_t rootPiBy2 = sqrt(atan2(0.0,-1.0)/2.0); Double_t xscale = root2*sigma; return rootPiBy2*sigma* (erf((x.max()-mean)/xscale)- erf((x.min()-mean)/xscale)); } default: assert(0) ; } RooGaussian: minimum impl. Optional integration support Special proxy class holds object references, implement client/server link management Behaves like ‘Double_t’ to user

Wouter Verkerke, UCSB Present use of RooFitTools in BaBar Most analyses are using RooFitTools for their unbinned maximum likelihood fits, including complex fits like –CP analysis (sin2) –Hadronic, semileptonic and dilepton lifetime & mixing analysis (,m d ) –Charmless 2-body decay ( sin2) Example of fit complexity: –the composite PDF of the CP fit has 280 PDF components and 35 free parameters A major redesign of RooFitTools has just been completed, based on experiences of 1 year of intensive use. RooFitTools is a ‘package’ in the BaBar software structure, but has no dependency on any other BaBar code. –It should be straightforward to decouple it completely from BaBar for outside use, or to package it as a ROOT add-on. Documentation –THtml format from source code. –Users guide –Technical design note (in preparation)

Wouter Verkerke, UCSB ROOT problems/limitations ROOT is a great enabling technology, good value. –We are only exercising a small subset of the functionality Const correctness in ROOT version 3 real improvement CINT problems/limitations: –Empirical observation: Function with >10 arguments of the same time fails without proper error message –Zero pointer casting results in non-zero pointer –#include doesn’t execute all code in global file scope Inconvenient, because different behaviour if same code is compiled (ACLIC) ROOT collection classes: –Container classes cannot hold non-TObjects Inconvenient, can e.g. not collect TIterators, Int_t, Double_t etc… –Will STL containers at some point replace TCollection classes?