Statistical Methods for Data Analysis: Modeling PDFs with RooFit


Statistical Methods for Data Analysis: Modeling PDFs with RooFit
Luca Lista, INFN Napoli

Credits
RooFit slides and examples are extracted from and/or inspired by original presentations by Wouter Verkerke, with the author's permission.

Prerequisites
RooFit is a toolkit designed to work within the ROOT framework.
Recent ROOT versions ship RooFit as part of the distribution; install the full ROOT release to get RooFit as well.
From the CINT prompt, load the RooFit shared library:
    gSystem->Load("libRooFit.so");

Variables/parameters definition
RooFit makes no distinction between variables and parameters: both are RooRealVar objects.
    RooRealVar x("x", "x coordinate", -1, 1);     // name, description, range
    RooRealVar mu("mu", "average", 0, -5, 5);     // name, description, initial value, range
    RooRealVar sigma("sigma", "r.m.s.", 1, 0, 5);
    x = 0.12345;
    x.Print();
Assignments beyond the limits are brought back to the nearest limit:
    x = 3;
    [#0] WARNING:InputArguments -- RooAbsRealLValue::inFitRange(x): value 3 rounded down to max limit 1
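
The clamping of out-of-range assignments can be pictured with a small standalone sketch. BoundedVar is a hypothetical stand-in written for illustration, not a RooFit class; it only mimics how a value outside the range is pulled back to the nearest limit.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a bounded variable (NOT part of RooFit).
struct BoundedVar {
    double min;
    double max;
    double value;

    BoundedVar(double lo, double hi, double initial)
        : min(lo), max(hi), value(std::max(lo, std::min(initial, hi))) {
        assert(lo <= hi);
    }

    // Assign a new value; anything outside [min, max] moves to the nearest limit.
    void set(double v) { value = std::max(min, std::min(v, max)); }
};
```

With a range of [-1, 1], calling set(3) leaves the value at 1, which matches the warning shown in the slide.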

PDF definition and plotting
    // Build Gaussian PDF
    RooRealVar x("x", "x", -10, 10);
    RooRealVar mean("mean", "mean of gaussian", 0, -10, 10);
    RooRealVar sigma("sigma", "width of gaussian", 3);
    RooGaussian gauss("gauss", "gaussian PDF", x, mean, sigma);
    // Plot PDF
    RooPlot* xframe = x.frame();
    gauss.plotOn(xframe);
    xframe->Draw();
A RooPlot is an empty frame capable of holding anything plotted versus its variable.
The plot range is taken from the limits of x.
The axis label is taken from the title of gauss.
The PDF is drawn with unit normalization.
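
What unit normalization over the variable range means can be sketched in plain C++, independent of RooFit. The function name gaussInRange is an assumption made for this illustration; it divides the Gaussian shape by its integral over [lo, hi] only, so the density integrates to one inside the range rather than over the whole real axis.

```cpp
#include <cassert>
#include <cmath>

// Gaussian density normalized to unit integral over [lo, hi] only.
// Illustrative helper, not a RooFit function.
double gaussInRange(double x, double mean, double sigma, double lo, double hi) {
    assert(sigma > 0.0 && lo < hi);
    if (x < lo || x > hi) return 0.0;
    const double kPi = 3.14159265358979323846;
    const double z = (x - mean) / sigma;
    const double shape = std::exp(-0.5 * z * z);
    // Integral of the unnormalized shape over [lo, hi], via the error function.
    const double norm = sigma * std::sqrt(kPi / 2.0) *
        (std::erf((hi - mean) / (sigma * std::sqrt(2.0))) -
         std::erf((lo - mean) / (sigma * std::sqrt(2.0))));
    return shape / norm;
}
```

For a narrow range such as [-1, 1] with sigma = 3, the normalized density is much larger than the textbook Gaussian value, because all of the probability is forced into the range.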

Plotting in more dimensions
There is no equivalent of RooPlot for more than one dimension; plots in more than one dimension are usually not overlaid anyway.
Easy-to-use createHistogram() methods are provided in both RooAbsData and RooAbsPdf to fill ROOT 2D and 3D histograms:
    // createHistogram() returns a TH1*; with YVar() the histogram is two-dimensional
    TH1* ph2 = pdf.createHistogram("ph2", x, YVar(y));
    TH1* dh2 = data.createHistogram("dh2", x, Binning(10), YVar(y, Binning(10)));
    ph2->Draw("SURF");
    dh2->Draw("LEGO");

Pre-defined PDFs
RooFit provides a variety of pre-defined PDFs and normalizes them automatically over the variable range:
Roo2DKeysPdf, RooArgusBG, RooBCPEffDecay, RooBCPGenDecay, RooBDecay, RooBMixDecay,
RooBifurGauss, RooBlindTools, RooBreitWigner, RooBukinPdf, RooCBShape, RooChebychev,
RooDecay, RooDstD0BG, RooExponential, RooGExpModel, RooGaussModel, RooGaussian,
RooKeysPdf, RooLandau, RooNonCPEigenDecay, RooNovosibirsk, RooParametricStepFunction,
RooPolynomial, RooUnblindCPAsymVar, RooUnblindOffset, RooUnblindPrecision,
RooUnblindUniform, RooVoigtian, ...

PDF inferred from histogram
Two types of non-parametric p.d.f.s are highlighted.
Class RooHistPdf: a p.d.f. described by a histogram.
It is not so great at low statistics, which is especially problematic in more than one dimension.
    // Histogram-based p.d.f. with N-th order interpolation
    RooHistPdf ph("ph", "ph", x, *dataHist, N);
[Figure: dataHist compared with RooHistPdf(N=0) and RooHistPdf(N=4)]
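
The idea of a histogram-described p.d.f. can be sketched without RooFit. The helper below, histPdf (a name chosen for this illustration), covers only the case without interpolation (N = 0) and assumes equal-width bins: the density in a bin is its content divided by the total content times the bin width.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Piecewise-constant density built from equal-width bin contents on [lo, hi).
// Illustrative helper for the N = 0 (no interpolation) case only.
double histPdf(const std::vector<double>& bins, double lo, double hi, double x) {
    assert(!bins.empty() && lo < hi);
    if (x < lo || x >= hi) return 0.0;
    double total = 0.0;
    for (double c : bins) total += c;
    assert(total > 0.0);
    const double width = (hi - lo) / bins.size();
    std::size_t i = static_cast<std::size_t>((x - lo) / width);
    if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the upper edge
    return bins[i] / (total * width);
}
```

With few entries per bin, the bin contents fluctuate strongly and so does this density, which is the low-statistics weakness mentioned in the slide.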

Kernel estimated PDF
Class RooKeysPdf: a kernel estimation p.d.f. that uses unbinned data.
Idea: represent each event of your MC sample as a Gaussian probability distribution, then add the probability distributions from all events in the sample.
[Figure: a sample of events, the Gaussian probability distribution for each event, and the summed probability distribution for all events in the sample]
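
The kernel idea can be written down in a few lines of standalone C++. This is a simplified sketch, not the RooKeysPdf implementation: it assumes one fixed bandwidth for every event, whereas RooKeysPdf chooses the kernel widths itself. The name kernelDensity is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Fixed-bandwidth Gaussian kernel estimate:
// the average of one unit-area Gaussian centred on each event.
double kernelDensity(const std::vector<double>& events, double bandwidth, double x) {
    assert(!events.empty() && bandwidth > 0.0);
    const double kPi = 3.14159265358979323846;
    const double norm = 1.0 / (bandwidth * std::sqrt(2.0 * kPi));
    double sum = 0.0;
    for (double e : events) {
        const double z = (x - e) / bandwidth;
        sum += norm * std::exp(-0.5 * z * z);
    }
    return sum / events.size();
}
```

Because every kernel has unit area and the sum is divided by the number of events, the result is normalized over the real axis without a separate integration step.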

Custom PDFs
String-based description (RooGenericPdf):
    RooRealVar x("x", "x", -10, 10);
    RooRealVar y("y", "y", 0, 5);
    RooRealVar a("a", "a", 3.0);
    RooRealVar b("b", "b", -2.0);
    RooGenericPdf pdf("pdf", "my pdf", "exp(x*y+a)-b*x", RooArgSet(x, y, a, b));
Which of these are variables and which are parameters is determined by the data set one wants to analyze.
Note that plotting requires x.frame()!

Writing PDFs in C++
Generate a class skeleton directly from the ROOT prompt:
    gSystem->Load("libRooFit.so");
    RooClassFactory::makePdf("RooMyPdf", "x,alpha");
ROOT will create two files defining a subclass of RooAbsPdf: RooMyPdf.cxx and RooMyPdf.h.
Edit the skeleton .cxx file and implement the method:
    Double_t RooMyPdf::evaluate() const
    {
        return exp(-alpha*x*x);
    }
Use your new class as a PDF model in RooFit.
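
The evaluate() method returns an unnormalized shape; the normalization over the variable range is supplied separately. A rough standalone picture of that division of labour, using a simple midpoint rule rather than RooFit's own integrator, could look as follows. All three function names are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Unnormalized shape, mirroring the body of evaluate() in the slide.
double shape(double x, double alpha) { return std::exp(-alpha * x * x); }

// Midpoint-rule integral of the shape over [lo, hi] (illustrative, not RooFit's integrator).
double integrateShape(double alpha, double lo, double hi, int steps = 10000) {
    assert(lo < hi && steps > 0);
    const double h = (hi - lo) / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; ++i) sum += shape(lo + (i + 0.5) * h, alpha);
    return sum * h;
}

// Normalized density: the shape divided by its integral over the range.
double normalizedPdf(double x, double alpha, double lo, double hi) {
    return shape(x, alpha) / integrateShape(alpha, lo, hi);
}
```

Recomputing the integral on every call is wasteful; it is written this way only to keep the relation between shape and normalization visible.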

Overload PDF defaults
Overriding the default numerical integration:
    Int_t getAnalyticalIntegral(const RooArgSet& integSet, RooArgSet& anaIntSet);
    Double_t analyticalIntegral(Int_t code);
integSet is the set of dependents for which integration is requested. getAnalyticalIntegral() copies the subset of dependents it can integrate analytically to anaIntSet and returns a non-zero code for each supported integral; analyticalIntegral() performs the analytical integration for the given code.
Overriding the default hit-or-miss generator:
    Int_t getGenerator(const RooArgSet& generateVars, RooArgSet& directVars);
    void generateEvent(Int_t code);
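
For readers unfamiliar with the hit-or-miss method that serves as the default generator, here is a minimal standalone sketch of the technique. It is not RooFit code; hitOrMiss and its parameters are hypothetical, and the caller is assumed to supply an upper bound fMax of the shape on the range.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Draw one value from an unnormalized shape f on [lo, hi] by hit-or-miss.
// fMax must be an upper bound of f on the range (assumption of this sketch).
template <typename F>
double hitOrMiss(F f, double lo, double hi, double fMax, std::mt19937& rng) {
    assert(lo < hi && fMax > 0.0);
    std::uniform_real_distribution<double> ux(lo, hi);
    std::uniform_real_distribution<double> uy(0.0, fMax);
    while (true) {
        const double x = ux(rng);
        if (uy(rng) < f(x)) return x;  // accept the point if it falls under the curve
    }
}
```

The method needs no normalization, but it rejects many points when the shape is sharply peaked, which is one reason a PDF may prefer to provide its own generator.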

Combining PDFs
Multiplication
Addition
Composition
Convolution

Adding PDFs
Add several PDFs with different fractions. For n PDFs, n - 1 fractions are provided; the last fraction is $1 - \sum_i f_i$.
    RooRealVar x("x", "x", -10, 10);
    RooRealVar mu("mu", "average", 0, -1, 1);
    RooRealVar sigma("sigma", "r.m.s", 1, 0, 5);
    RooGaussian gauss("gauss", "gaussian PDF", x, mu, sigma);
    RooRealVar lambda("lambda", "exponential slope", -0.1);
    RooExponential expo("expo", "exponential PDF", x, lambda);
    RooRealVar f("f", "gaussian fraction", 0.5, 0, 1);
    RooAddPdf sum("sum", "g+e", RooArgList(gauss, expo), RooArgList(f));
The different components can be plotted separately:
    RooPlot* xFrame = x.frame();
    sum.plotOn(xFrame, RooFit::LineColor(kRed));
    sum.plotOn(xFrame, RooFit::Components(expo), RooFit::LineColor(kBlue));
    xFrame->Draw();
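
The rule that n components take n - 1 fractions, with the last one fixed by the remainder, can be made concrete in a few lines. The helper addPdfValue is a hypothetical name for this sketch; it combines component densities that have already been evaluated at one point and assumes each of them is normalized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Combine component densities evaluated at one point.
// values: n component densities; fractions: the first n - 1 coefficients.
// The last component receives the remainder 1 - sum(fractions).
double addPdfValue(const std::vector<double>& values, const std::vector<double>& fractions) {
    assert(!values.empty() && fractions.size() + 1 == values.size());
    double total = 0.0;
    double used = 0.0;
    for (std::size_t i = 0; i < fractions.size(); ++i) {
        total += fractions[i] * values[i];
        used += fractions[i];
    }
    assert(used <= 1.0);
    return total + (1.0 - used) * values.back();
}
```

Since the coefficients sum to one by construction, the combination of normalized components is itself normalized.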

Multiplying PDFs
Produces the product of PDFs in more dimensions:
    RooRealVar x("x", "x", -10, 10);
    RooRealVar y("y", "y", -10, 10);
    RooRealVar mux("mux", "average-x'", 0, -1, 1);
    RooRealVar sigmax("sigmax", "sigma-x'", 0.5, 0, 5);
    RooGaussian gaussx("gaussx", "gaussian PDF x'", x, mux, sigmax);
    RooRealVar muy("muy", "average-y'", 0, -1, 1);
    RooRealVar sigmay("sigmay", "sigma-y'", 1.5, 0, 5);
    RooGaussian gaussy("gaussy", "gaussian PDF y'", y, muy, sigmay);
    RooProdPdf gaussxy("gaussxy", "gaussxy", RooArgSet(gaussx, gaussy));
The PDFs being multiplied can't share dependent variables.
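
Why the factors should depend on different variables can be seen from the normalization. A short derivation, using $G_x$ and $G_y$ as shorthand (not in the slides) for the one-dimensional PDFs gaussx and gaussy, each normalized in its own variable:

```latex
\begin{aligned}
p(x, y) &= G_x(x;\, \mu_x, \sigma_x)\; G_y(y;\, \mu_y, \sigma_y), \\
\iint p(x, y)\, dx\, dy &= \left( \int G_x\, dx \right) \left( \int G_y\, dy \right) = 1 \cdot 1 = 1 .
\end{aligned}
```

The double integral factorizes only because each factor depends on a single, distinct variable; with a shared dependent the product would in general need a new normalization integral.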

Composition of functions
Some PDF parameters can be defined as a RooFormulaVar, i.e. as a function of other variables and parameters:
    RooRealVar x("x", "x", -10, 10);
    RooRealVar y("y", "y", 0, 3);
    RooRealVar a("a", "a", 3.0);
    RooRealVar b("b", "b", -2.0);
    RooFormulaVar mean("mean", "a+b*y", RooArgList(a, b, y));
    RooRealVar sigma("sigma", "r.m.s", 1, 0, 5);
    RooGaussian gauss("gauss", "gaussian PDF", x, mean, sigma);
The formula is given as a string and interpreted at run time.

Convolution
RooResolutionModel is the base class for all PDFs that can model a resolution; it is a specialization of an ordinary PDF.
RooFit provides special cases for fast analytical convolution, e.g. Exp ⊗ Gaussian.
A generic numerical convolution is built with RooNumConvPdf:
    RooRealVar x("x", "x", -10, 10);
    RooRealVar meanl("meanl", "mean of Landau", 2);
    RooRealVar sigmal("sigmal", "sigma of Landau", 1);
    RooLandau landau("landau", "landau", x, meanl, sigmal);
    RooRealVar meang("meang", "mean of Gaussian", 0);
    RooRealVar sigmag("sigmag", "sigma of Gaussian", 2);
    RooGaussian gauss("gauss", "gauss", x, meang, sigmag);
    RooNumConvPdf model("model", "model", x, landau, gauss);
This may be slow! The integration range may be specified:
    model.setConvolutionWindow(meang, sigmag, 5);
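
What a numerical convolution computes, and why it can be slow, is easier to see in a standalone sketch. The code below is not how RooNumConvPdf is implemented; it is a plain midpoint-rule estimate with hypothetical names, and it uses two Gaussians instead of the Landau and Gaussian of the slide because their convolution has a known closed form to check against.

```cpp
#include <cassert>
#include <cmath>

// Midpoint-rule estimate of the convolution integral
// (f conv g)(x) = integral of f(t) * g(x - t) dt, for t in [lo, hi].
template <typename F, typename G>
double convolveAt(F f, G g, double x, double lo, double hi, int steps = 2000) {
    assert(lo < hi && steps > 0);
    const double h = (hi - lo) / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; ++i) {
        const double t = lo + (i + 0.5) * h;
        sum += f(t) * g(x - t);
    }
    return sum * h;
}

// Unit-area Gaussian, used here for both the shape and the resolution.
double gaussian(double x, double mean, double sigma) {
    const double kPi = 3.14159265358979323846;
    const double z = (x - mean) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
}
```

Every evaluation at a single x costs a full integral, which is why restricting the integration window to a few resolution widths helps. For two Gaussians of widths 1 and 2 the result should agree with a single Gaussian of width sqrt(5).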

References
RooFit home: http://roofit.sourceforge.net/
RooFit online tutorial: http://roofit.sourceforge.net/docs/tutorial/index.html