

Hands-on exercises *

Getting started – ROOT 5.25/02 setup
Start a ROOT 5.25/02 session
–On your local laptop installation, or
–On lxplus (SLC4) or lx64slc5 (SLC5): choose the appropriate line below
    lxplus> source ~verkerke/public/setup_slc4.csh
    lxplus> source ~verkerke/public/setup_slc4.sh
    lxplus> source ~verkerke/public/setup_slc5.csh
    lxplus> source ~verkerke/public/setup_slc5.sh
Now move to your personal working area
Load the RooFit and RooStats libraries
    root> gSystem->Load("libRooStats") ;
If you see a message that RooFit v3.10 is loaded you are (almost) ready to go.
Import the namespace RooFit in CINT
    root> using namespace RooFit ;
Recommendation: put the last two lines in your ROOT login script to automate the loading
–At least for the duration of the tutorial

Getting started – Online reference material
RooFit class documentation (from code)
RooFit home page at ROOT web site
–Has links to manual and tutorial macros
Input files are
–WEB:
–CERN: ~verkerke/public

Overview of exercises
Stars rate the amount of work, not the difficulty. Demos require no work.
Exercise 1 ***
–Factory basics, composite models, extended ML fitting, working with ranges, error propagation
Demo 1 – FFT convolution
Exercise 2 ** (do this one last if your field is not B physics)
–Analytical convolution of decay functions with resolution models, visualizing the correlation matrix, visualizing uncertainties on model projections
Demo 2 – Simultaneous fitting
Exercise 3 *
–Workspace persistence
Exercise 4 **
–Creating the likelihood function, using interactive MINUIT, plotting likelihood ratios
Demo 3 – Likelihood ratio plots
Exercise 5 **
–Constructing a profile likelihood, estimating intervals from profile likelihood
Exercise 6 *
–Multi-core likelihood parallelization

Exercise 1 – Composite models
Take input file ex1.C, look at it and run it.
Step 1 – Using the factory
–Modify the code so that it uses the factory to create the pdf.
–Remove the code that creates the pdf directly, and the import() call.
–Run again to verify that you get the same result.
Step 2 – Adding background
–Rename the Gaussian pdf from "model" to "signal".
–Add an ArgusBG model named bkg to the workspace with m0=5.291 (fixed) and a slope of -40 with a range of [-100,0]. Look in $ROOTSYS/include for the constructor syntax and map that to the corresponding factory call.
–Create a sum of the signal and background with a signal fraction of 20% (with range 0,1).
–Rerun the macro.
–Add a plotOn() call that draws the background component of model using a Components() argument, and give it a dashed line style (add LineStyle(kDashed)).
–Call Print() on the workspace to see the contents. Also call Print("t") to see the same contents shown as a tree structure.

Exercise 1 – Composite models
Step 3 – Making an extended ML fit
–Rewrite the SUM() string so that it constructs a pdf suitable for extended ML fitting: multiply the signal pdf by Nsig (200 events, range 0,10000) and the background pdf by Nbkg (800 events, range 0,10000).
Step 4 – Simple use of ranges
–Define a 'signal range' in observable mes:
    w.var("mes")->setRange("sigrange",5.27,5.29) ;
–Create an integral object that represents the fraction of background events in the signal range:
    w.factory("int::bkg_frac_sigRange(bkg,mes|sigrange,mes)") ;
 The first mes indicates which observable to integrate over and in which range; the second mes indicates which observables to normalize over. (Without a range specification this would result in 1 by construction.)
–Retrieve the value of the fraction by calling
    w.function("bkg_frac_sigRange")->getVal() ;
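As a reference for Steps 3 and 4, the two quantities being built can be written out explicitly. This is the standard textbook form of an extended likelihood and of a range fraction, not something taken from the slides. The symbols S and B (normalized signal and background pdfs in mes), θ (shape parameters), n (number of events in the dataset) and [m_lo, m_hi] (full range of mes) are notation assumed here.

```latex
\begin{align*}
-\ln L(N_\mathrm{sig}, N_\mathrm{bkg}, \theta)
  &= (N_\mathrm{sig} + N_\mathrm{bkg})
   - \sum_{i=1}^{n} \ln\!\left[ N_\mathrm{sig}\, S(m_i;\theta) + N_\mathrm{bkg}\, B(m_i;\theta) \right]
   + \mathrm{const} \\[4pt]
f_\mathrm{bkg}^\mathrm{sigrange}
  &= \frac{\int_{5.27}^{5.29} B(m;\theta)\, dm}{\int_{m_\mathrm{lo}}^{m_\mathrm{hi}} B(m;\theta)\, dm}
\end{align*}
```

The denominator of the second line is what the second mes in the factory string requests: without it the integral over the signal range would not be a fraction of the full background pdf.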

Exercise 1 – Composite models
–Now construct a formula named Nbkg_SigRange that expresses the number of background events in the signal range, using the product operator:
    w.factory("prod::Nbkg_SigRange(Nbkg,bkg_frac_sigRange)")
–Evaluate the Nbkg_SigRange function in the workspace to count the number of background events in the range [5.27,5.29].
Step 5 – Linear error propagation
–Now we calculate the error on Nbkg_SigRange. To that end we first need to save a RooFitResult object from the fitTo() operation: save the RooFitResult* pointer returned by fitTo() in an object named fr, and add a Save() argument to fitTo() to make sure a fit result is returned.
–Calculate the error on the number of background events in the signal range by calling
    w.function("Nbkg_SigRange")->getPropagatedError(*fr) ;
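To make the idea behind Step 5 concrete, here is a minimal standalone sketch of linear error propagation, sigma_f^2 = g^T V g, in plain C++ without ROOT. It is not RooFit's implementation: the function names propagatedError and yieldInRange, the finite-difference gradient and the step size are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Linear error propagation: sigma_f^2 = g^T V g, where g is the gradient of f
// with respect to the fit parameters and V is their covariance matrix.
// The gradient is estimated with central finite differences.
double propagatedError(const std::function<double(const std::vector<double>&)>& f,
                       const std::vector<double>& p,
                       const std::vector<std::vector<double>>& cov)
{
    const std::size_t n = p.size();
    std::vector<double> grad(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        // Step size scaled to the parameter's own uncertainty
        const double h = 1e-3 * std::sqrt(cov[i][i]);
        if (h == 0.0) continue;  // a fixed parameter contributes nothing
        std::vector<double> up = p, down = p;
        up[i] += h;
        down[i] -= h;
        grad[i] = (f(up) - f(down)) / (2.0 * h);
    }
    double var = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            var += grad[i] * cov[i][j] * grad[j];
    return std::sqrt(var);
}

// Hypothetical stand-in for Nbkg_SigRange: p[0] is a yield, p[1] a fraction
double yieldInRange(const std::vector<double>& p)
{
    return p[0] * p[1];
}
```

Calling propagatedError(yieldInRange, {800.0, 0.1}, cov) with a 2x2 covariance matrix returns the propagated uncertainty on the product. In the exercise the fraction itself depends on the fitted slope, so the real gradient runs over all floating parameters in the fit result.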

Demo 1 – FFT convolution of arbitrary pdfs
NB: This demo runs at CERN only, because it requires ROOT to be configured with FFTW support (it's easy and free to install FFTW on your laptop if you want it).
Copy ~verkerke/public/fftdemo.C and run it.
This macro demonstrates how the FCONV Fourier convolution operator is used to convolve a Landau pdf with a Gaussian resolution model.
A binned likelihood fit of the numerically convolved pdf with three floating parameters takes ~1 second.
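For readers without FFTW, the quantity that the convolution operator evaluates can be sketched with a direct numerical integral. The code below is plain C++ and deliberately swaps the FFT for a simple midpoint-rule sum, which is slower but easy to check. The names gauss and convolveAt, the integration limits and the number of steps are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Normalized Gaussian density
double gauss(double x, double mean, double sigma)
{
    const double kPi = 3.14159265358979323846;
    const double z = (x - mean) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
}

// (f * g)(x) = integral of f(t) g(x - t) dt, approximated with the
// midpoint rule on [lo, hi]. This is a direct sum, not an FFT.
double convolveAt(const std::function<double(double)>& f,
                  const std::function<double(double)>& g,
                  double x, double lo, double hi, int nSteps)
{
    const double dt = (hi - lo) / nSteps;
    double sum = 0.0;
    for (int i = 0; i < nSteps; ++i) {
        const double t = lo + (i + 0.5) * dt;
        sum += f(t) * g(x - t);
    }
    return sum * dt;
}
```

A convenient cross-check is that two Gaussians of widths 1 and 2 convolve to a Gaussian of width sqrt(5). In the demo the first function would be a Landau instead, for which no such closed form exists, which is why a numerical method is needed.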

Exercise 2 – B physics decay with resolution
Take input file ex2.C, look at it and run it.
–The input macro constructs a B decay distribution with mixing and without resolution effects (convolution with a delta function). It then generates some data and plots the decay distribution of mixed and unmixed events separately, as well as the mixing asymmetry.
Step 1 – Adding a resolution
–Using the factory, construct a Gaussian resolution model (class RooGaussModel) with mean 0 (fixed) and width 2 (floating, range ) and change the decay pdf to use that resolution model. Rerun the macro and observe the effect on the decay distributions and the asymmetry plot.
–Now construct a composite resolution model consisting of two Gaussians: 80% (fixed) of a narrow Gaussian (mean 0, width 1 (floating)) and the remainder a wide Gaussian (mean 0, width 5 (floating)). Rerun the macro and observe the effect on the decay distributions and the asymmetry plot.

Exercise 2 – B physics decay with resolution
Step 2 – Visualize the correlation matrix
–Look at the correlation matrix of the fit. To make a visual presentation of the correlation matrix, save the RooFitResult object from the fitTo() command (don't forget to add Save() as well) and add the following code:
    gStyle->SetPalette(1) ;
    fr->correlationHist()->Draw("colz") ;
–What are the largest correlations?
If correlations are very strong (>>0.9) the model may become unstable and it may be worthwhile to fix one of the parameters in the fit.
This works best if the correlation is between two nuisance parameters (i.e. non-physics parameters such as the mistag rate).
If a correlation is between a parameter of interest (=physics, e.g. tau, Δm) and a nuisance parameter (=others, e.g. mistag rate), fixing the nuisance parameter will strongly underestimate the uncertainty on the physics parameter, and you'll need another strategy to control the error on the nuisance parameter.
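The correlation matrix shown in the histogram is derived from the covariance matrix of the fit as rho_ij = V_ij / sqrt(V_ii V_jj). The following plain C++ sketch, independent of ROOT, computes it and lists the parameter pairs whose correlation exceeds a threshold. The names correlationFromCovariance and strongPairs are hypothetical helpers, not RooFit functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// rho_ij = V_ij / sqrt(V_ii * V_jj); assumes all diagonal elements are positive
Matrix correlationFromCovariance(const Matrix& cov)
{
    const std::size_t n = cov.size();
    Matrix corr(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            corr[i][j] = cov[i][j] / std::sqrt(cov[i][i] * cov[j][j]);
    return corr;
}

// Index pairs (i < j) whose absolute correlation exceeds the threshold
std::vector<std::pair<std::size_t, std::size_t>>
strongPairs(const Matrix& corr, double threshold)
{
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < corr.size(); ++i)
        for (std::size_t j = i + 1; j < corr.size(); ++j)
            if (std::fabs(corr[i][j]) > threshold)
                pairs.emplace_back(i, j);
    return pairs;
}
```

Running strongPairs(corr, 0.9) on the fit's correlation matrix gives a quick list of candidates for the "fix one of the parameters" decision discussed above.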

Exercise 2 – B physics decay with resolution
Step 3 – Visualize the uncertainty on the asymmetry
–You can also visualize the uncertainty on the asymmetry curve through linear propagation of the covariance matrix of the fit parameters. To do so, duplicate the plotOn() call for the asymmetry curve in the macro and add the following arguments to the first call:
    VisualizeError(*fr), FillColor(kOrange)

Demo 2 – simultaneous fitting
Copy ~verkerke/public/simfitdemo.C and run it.
This macro demonstrates techniques to make simultaneous fits to 'signal' and 'control' samples in multiple ways:
1. Plain fit to signal sample with sigPdf+BkgPdf
2. Plain fit to control sample with sigPdf+BkgPdfCtrl
3. Simultaneous fit to signal and control samples
4. Construct a pdf on sigPdf parameters from fit 2), multiplied with pdf for signal sample.
–Equivalent to 3) in the approximation of a parabolic likelihood for the control sample
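The relation between methods 3 and 4 can be made explicit. The notation below is assumed for illustration and does not come from the macro: θ_s are the sigPdf parameters shared by both samples, θ_b and θ_c are the parameters of BkgPdf and BkgPdfCtrl, and θ̂_s and V are the fitted values and covariance matrix of θ_s obtained in fit 2.

```latex
\begin{align*}
-\ln L_\mathrm{sim}(\theta_s, \theta_b, \theta_c)
  &= -\ln L_\mathrm{sig}(\theta_s, \theta_b) \;-\; \ln L_\mathrm{ctrl}(\theta_s, \theta_c) \\[4pt]
-\ln L_\mathrm{ctrl}(\theta_s)
  &\approx \tfrac{1}{2}\, (\theta_s - \hat\theta_s)^{T}\, V^{-1}\, (\theta_s - \hat\theta_s) + \mathrm{const}
\end{align*}
```

The first line is the simultaneous fit of method 3. Substituting the second line, a multivariate Gaussian in θ_s, for the control-sample term gives method 4, which is exact only when the control-sample likelihood is parabolic in θ_s.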

Exercise 3 – Persisting your model
Copy ~verkerke/public/ex3a.C, look at it and run it.
At the end of the macro, import the toy data that is generated into the workspace as follows:
–w.import(data,Rename("data")) ;
Write your workspace to file
–using the method w.writeToFile("model.root").
Now quit your ROOT session.
Copy ~verkerke/public/ex3b.C.
–This macro will read in your model.root file and plot the pdf and dataset contained in it.
Look at the macro and run it.

Exercise 4 – Working with the likelihood
Copy ex3b.C to ex4.C.
Remove the plotting code and add a line to create a function object that represents the -log(likelihood).
–Use method RooAbsPdf::createNLL(RooAbsData&); the returned object is of type RooAbsReal*.
–See page 41 in the presentation for help.
Minimize the likelihood function 'by hand' by passing it to a RooMinuit object and calling its methods migrad() and hesse().
–See page 42 in the presentation for help (also for below).
–Now call the minos() function only for parameter Nsig.
–Call w::Nsig.Print() afterwards to see that the asymmetric error has been propagated.
–Fix the width of the Gaussian, run minos again and observe the effect (use w::sigma.setConstant(kTRUE)).
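What "minimize, then estimate the error from the curvature" means can be shown on a toy problem in plain C++. This sketch does not use MINUIT: a golden-section search stands in for migrad() and a numerical second derivative stands in for hesse(), both far simpler than the real algorithms. The model (a Gaussian with known width and unknown mean) and the names nll, minimizeNll and hesseError are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// -log L for a Gaussian with known width sigma and unknown mean mu,
// dropping terms that do not depend on mu
double nll(const std::vector<double>& data, double mu, double sigma)
{
    double sum = 0.0;
    for (double x : data) {
        const double z = (x - mu) / sigma;
        sum += 0.5 * z * z;
    }
    return sum;
}

// Golden-section search for the value of mu in [lo, hi] that minimizes the NLL
double minimizeNll(const std::vector<double>& data, double sigma, double lo, double hi)
{
    const double invPhi = (std::sqrt(5.0) - 1.0) / 2.0;
    for (int i = 0; i < 100; ++i) {
        const double c = hi - invPhi * (hi - lo);
        const double d = lo + invPhi * (hi - lo);
        if (nll(data, c, sigma) < nll(data, d, sigma)) hi = d; else lo = c;
    }
    return 0.5 * (lo + hi);
}

// Parabolic error: 1 / sqrt(d^2 NLL / d mu^2), with the second derivative
// taken numerically at the minimum
double hesseError(const std::vector<double>& data, double muHat, double sigma)
{
    const double h = 1e-3;
    const double d2 = (nll(data, muHat + h, sigma) - 2.0 * nll(data, muHat, sigma)
                       + nll(data, muHat - h, sigma)) / (h * h);
    return 1.0 / std::sqrt(d2);
}
```

For this model the answers are known analytically (the sample mean, and sigma divided by the square root of the sample size), which makes the sketch easy to verify. MINOS differs from this parabolic estimate in that it follows the actual likelihood shape, which is why it can return asymmetric errors.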

Exercise 4 – Working with the likelihood
Make a plot of -log(L) vs Nsig.
–First create a plot frame in the parameter using
    RooPlot* frame = w::Nsig.frame() ;
–Now plot the likelihood function on the frame, using plotOn() as usual.
–If you like you can add a ShiftToZero() argument to the plotOn() call and see what that does.
–You can adjust the vertical range of the plot frame with SetMinimum() and SetMaximum().

Demo 3 – n-Dim models and likelihood ratio plot
Copy ~verkerke/public/llrplot.C and run it.
This macro builds a 3-dimensional model
–Flat background in (x,y,z)
–Gaussian signal in (x,y,z) with correlations
It plots three 2D projections (x,y), (x,z) and (y,z).
Then it makes three varieties of 1D plots of model and data
–Plain projection on x (shows lots of background)
–Projection on x in a 'signal box' in (y,z)
–Projection on x with a cut on the LR(y,z)>68%, where LR(y,z) is the signal probability according to the model using the (y,z) observables only
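The formula for LR(y,z) did not survive in this transcript. One common way to write a signal probability in (y,z) for a two-component model is given below as an assumption, and it may differ in detail from what llrplot.C computes. Here S and B are the normalized signal and background pdfs and f is the signal fraction; all three symbols are notation chosen for this sketch.

```latex
\begin{align*}
S(y,z) &= \int S(x,y,z)\, dx, \qquad B(y,z) = \int B(x,y,z)\, dx \\[4pt]
\mathrm{LR}(y,z) &= \frac{f\, S(y,z)}{f\, S(y,z) + (1-f)\, B(y,z)}
\end{align*}
```

Integrating out x is what makes the cut usable for a plot in x: the selection depends only on the observables that are not being displayed.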

Exercise 5 – Profile likelihood
Copy ~verkerke/public/ex4.C (standard solution to ex4) to ex5.C.
Adjust the horizontal plot range of the likelihood plot so that it just covers the interval ΔLL=+25 units.
–Make a new plot frame that zooms in on that range and plot the likelihood again (you can use myparam.frame(pmin,pmax) to control the plot range).
Create the profile likelihood function in Nsig.
–Call createProfile(w::Nsig) on the likelihood and save the returned pointer to the profile likelihood function (of type RooAbsReal*).
–Plot the profile likelihood ratio on the Nsig frame too (make it red by adding a LineColor(kRed)).
Find the profile likelihood ratio interval of Nsig: find the points at which the PLR rises by +0.5 units.
–Compare the interval to that of the MINOS error of exercise 4.
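The two operations in this exercise, profiling out a nuisance parameter and finding where the profile rises by 0.5, can be sketched on a toy counting model in plain C++. Nothing here is RooFit code. The "on/off" model, the struct OnOff and the function names are assumptions for illustration, and the search brackets assume that the signal estimate is positive and well separated from zero.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <utility>

// Toy model (assumed for illustration):
//   nOn  ~ Poisson(s + b)     counts in the signal region
//   nOff ~ Poisson(tau * b)   counts in a control region
struct OnOff { double nOn, nOff, tau; };

// -log L up to constants; s is the parameter of interest, b the nuisance
double nll(const OnOff& d, double s, double b)
{
    return (s + b) - d.nOn * std::log(s + b)
         + d.tau * b - d.nOff * std::log(d.tau * b);
}

// Golden-section minimum of a convex 1D function on [lo, hi]
double goldenMin(const std::function<double(double)>& f, double lo, double hi)
{
    const double invPhi = (std::sqrt(5.0) - 1.0) / 2.0;
    for (int i = 0; i < 100; ++i) {
        const double c = hi - invPhi * (hi - lo);
        const double d = lo + invPhi * (hi - lo);
        if (f(c) < f(d)) hi = d; else lo = c;
    }
    return 0.5 * (lo + hi);
}

// Profile NLL: for fixed s, minimize over the nuisance parameter b
double profileNll(const OnOff& d, double s)
{
    auto f = [&](double b) { return nll(d, s, b); };
    return f(goldenMin(f, 1e-6, d.nOn + d.nOff + 10.0));
}

// Bisection for the point in [lo, hi] where the profile is 0.5 above its minimum
double crossing(const OnOff& d, double nllMin, double lo, double hi)
{
    auto g = [&](double s) { return profileNll(d, s) - nllMin - 0.5; };
    const bool loAbove = g(lo) > 0.0;
    for (int i = 0; i < 60; ++i) {
        const double mid = 0.5 * (lo + hi);
        if ((g(mid) > 0.0) == loAbove) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// Interval in s where the profile NLL is within 0.5 of its minimum
std::pair<double, double> profileInterval(const OnOff& d)
{
    const double sHat = d.nOn - d.nOff / d.tau;  // analytic best-fit signal
    const double nllMin = profileNll(d, sHat);
    const double upperBracket = sHat + 10.0 * std::sqrt(d.nOn) + 10.0;
    return { crossing(d, nllMin, 0.0, sHat), crossing(d, nllMin, sHat, upperBracket) };
}
```

With OnOff{100, 100, 2} the best-fit signal is 50 and profileInterval returns the two values of s at which the profile has risen by 0.5. The interval is wider than it would be with b fixed, which is the effect the comparison with the plain likelihood curve in the exercise is meant to show.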

Exercise 6 – Parallelizing the likelihood calculation
NB: Likelihood parallelization is only supported on UNIX-style platforms (Linux, Mac = yes; Windows = no).
Check the number of CPU cores available on the current host ('cat /proc/cpuinfo').
Modify the createNLL() call of ex5 to take an extra NumCPU(N) argument.
–The likelihood calculation will now be parallelized over N cores.
Rerun ex5 and observe the difference in wall-time execution speed.
–The speedup is best demonstrated on an empty worker node (your best bet is lx64slc5).
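The reason a likelihood parallelizes well is that -log(L) is a sum over events, so each core can sum its own slice of the data. The plain C++ sketch below illustrates that with std::thread. It is not how RooFit implements NumCPU(N), whose mechanism differs in detail. The unit-width Gaussian model and the names eventNll, serialNll and parallelNll are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Per-event contribution to -log L for a unit-width Gaussian with mean mu
// (constant term dropped)
double eventNll(double x, double mu)
{
    const double z = x - mu;
    return 0.5 * z * z;
}

// Reference implementation: one loop over all events
double serialNll(const std::vector<double>& data, double mu)
{
    double sum = 0.0;
    for (double x : data) sum += eventNll(x, mu);
    return sum;
}

// Split the event loop into nCpu contiguous slices, one thread per slice,
// then add up the partial sums
double parallelNll(const std::vector<double>& data, double mu, unsigned nCpu)
{
    if (nCpu == 0) nCpu = 1;
    std::vector<double> partial(nCpu, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + nCpu - 1) / nCpu;
    for (unsigned k = 0; k < nCpu; ++k) {
        const std::size_t begin = std::min(data.size(), k * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, mu, k, begin, end] {
            double s = 0.0;
            for (std::size_t i = begin; i < end; ++i) s += eventNll(data[i], mu);
            partial[k] = s;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

The parallel and serial results agree up to floating-point summation order. As in the exercise, the gain in wall time only shows up when the per-event work is large compared with the cost of starting the workers and when the cores are otherwise idle.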