Hands-on Session RooStats and TMVA Exercises

Getting Started
A few words of advice:
- Install ROOT version 5.25.02 locally and set it up with:
    . root_Installation_directory/bin/thisroot.sh (or .csh)
- Compile your macros and run them with:
    .x macro.C+;
  or
    .L macro.C+
  followed by
    macro();
- Include in your macros:
    using namespace RooFit;
    using namespace RooStats;
- Avoid running a macro multiple times in the same ROOT session.
If you are not working locally, set up ROOT 5.25.02 from the CERN AFS installation. On SLC4:
    /afs/cern.ch/sw/lcg/app/releases/ROOT/5.25.02/slc4_amd64_gcc34/root/bin/thisroot.csh (or .sh)

RooStats Exercises (1)
- Generate a Poisson model (use the RooFit factory)
- Use the RooStats calculator classes to find limits and significances and to produce plots

    // Use a RooWorkspace to store the PDF models, prior
    // information, list of parameters, ...
    RooWorkspace myWS("myWS");
    // Observable (dummy observable used)
    myWS.factory("x[0,1]"); // arbitrary range [0,1]
    myWS.var("x")->setBins(1);
    // Signal and background distribution of the observable
    myWS.factory("Uniform::sigPdf(x)");
    myWS.factory("Uniform::bkgPdf(x)");
    // Number of signal and background events
    myWS.factory("S[2,0,10]"); // default value 2 and range [0,10]
    myWS.factory("B[1]"); // value fixed to 1
    // S+B and B-only models (both extended PDFs)
    myWS.factory("SUM::model(S*sigPdf,B*bkgPdf)");
    myWS.factory("ExtendPdf::modelBkg(bkgPdf,B)");
    // generate binned data with fixed number of events
    RooAbsData* data = myWS.pdf("model")->generateBinned(*myWS.var("x"),myWS.var("S")->getVal()+myWS.var("B")->getVal(),Name("data"));
    // plot the data

ProfileLikelihood Exercise
ProfileLikelihoodCalculator

get interval:
    RooRealVar * S = myWS.var("S");
    RooArgSet POI(*S);
    RooAbsPdf * model = myWS.pdf("model");
    ProfileLikelihoodCalculator plc(*data,*model,POI);
    // set the test size
    plc.SetTestSize(0.10);
    model->fitTo(*data,SumW2Error(kFALSE));
    LikelihoodInterval * interval = plc.GetInterval();
    const double lowerLimit = interval->LowerLimit(*S);
    const double upperLimit = interval->UpperLimit(*S);
    LikelihoodIntervalPlot lplot(interval);
    lplot.Draw();

get significance:
    // Create a copy of the POI parameters to set the values to zero
    RooArgSet nullparams;
    nullparams.addClone(*myWS.var("S"));
    ((RooRealVar *) (nullparams.first()))->setVal(0);
    plc.SetNullParameters(nullparams);
    HypoTestResult* plcResult = plc.GetHypoTest();
    const double significance = plcResult->Significance(); // get significance

Result
For S=2 you should get: Significance = 1.60987
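The quoted significance can be cross-checked by hand. The following is a minimal sketch, assuming the generated dataset contains exactly n = S + B = 3 events in its single bin and that B = 1 is known exactly. In that case the likelihood ratio between the best fit and the S = 0 hypothesis gives Z = sqrt(2 (n ln(n/B) - (n - B))). The function name poissonSignificance is hypothetical; the code uses only the C++ standard library and does not call RooStats.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of RooStats): significance of n observed
// events over a known background b in a single-bin Poisson counting model,
// from the likelihood ratio between the best-fit signal and zero signal:
//   Z = sqrt(2 * (n * ln(n / b) - (n - b)))
double poissonSignificance(double n, double b) {
    assert(b > 0.0 && n >= 0.0);
    if (n <= b) return 0.0;  // no excess: the one-sided significance is zero
    const double q0 = 2.0 * (n * std::log(n / b) - (n - b));
    return std::sqrt(q0);
}
```

Calling poissonSignificance(3.0, 1.0) reproduces the value quoted on the slide to the printed precision, which suggests the calculator's result is driven by the same likelihood ratio.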

Gaussian Model
- Generate a Gaussian signal over a flat background
- Systematics in the sigma of the mass and in the background

    RooWorkspace myWS("myWS");
    // Observable
    myWS.factory("mass[0,500]"); // range [0,500]
    // Signal and background distribution of the observable
    myWS.factory("Gaussian::sigPdf(mass,200,sigSigma[0,100])");
    myWS.factory("Uniform::bkgPdf(mass)");
    myWS.factory("SUM::model(S[5,0,30]*sigPdf,B[10,0,100]*bkgPdf)");
    // Background only pdf
    myWS.factory("ExtendPdf::modelBkg(bkgPdf,B)");
    // Prior for signal
    myWS.factory("Uniform::priorPOI(S)");
    // Priors for nuisance parameters (signal + backg)
    myWS.factory("Gaussian::prior_sigSigma(sigSigma,50,5)");
    myWS.factory("Gaussian::prior_B(B,10,3)");
    myWS.factory("PROD::priorNuisance(prior_sigSigma,prior_B)");
    // define the set of observables before using it to generate
    myWS.defineSet("observables","mass");
    // generate unbinned data
    RooAbsData * data = myWS.pdf("model")->generate(*myWS.set("observables"),Extended(),Name("data"));

RooStats Exercise
Compute:
- 68% CL 2-sided confidence interval and significance from the (profiled-) likelihood ratio; plot the profile log-likelihood ratio
- Frequentist p-value in the S+B and B-only hypotheses, signal significance, CL_S ratio (HybridCalculator)
- Repeat including systematic uncertainty in the background and signal width
- Use also the Bayesian calculator, MCMCCalculator and Neyman construction for finding limits (with the Poisson or Gaussian example)
More examples and code are available at https://twiki.cern.ch/twiki/bin/view/RooStats/TutorialsOctober2009

HybridCalculator
    // modelSB and modelB: pointers to the S+B and B-only pdfs
    HybridCalculator hc("hc","HybridCalculator",*data,*modelSB,*modelB);
    hc.SetNumberOfToys(5000);
    hc.SetTestStatistics(1);
    // Run and retrieve the results
    HybridResult* hcResult = hc.GetHypoTest();
    double p_value_sb = hcResult->AlternatePValue();
    double p_value_b = hcResult->NullPValue();
    double cl_s = hcResult->CLs();
    double significance = hcResult->Significance();
    // Making a plot of the results
    HybridPlot* hcPlot = hcResult->GetPlot("hcPlot","p-Values plot",100);
    hcPlot->Draw();
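To make the CL_S ratio concrete, here is a minimal sketch for a single-bin counting experiment without systematic uncertainties. It is not what HybridCalculator does internally: the calculator throws toy experiments and uses a likelihood-based test statistic, whereas this sketch swaps in the observed count itself as the test statistic so that the probabilities have closed forms. The names poissonCdf and countingCLs are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Poisson cumulative probability P(N <= n | mean), summed term by term.
double poissonCdf(int n, double mean) {
    assert(n >= 0 && mean >= 0.0);
    double term = std::exp(-mean);  // P(N = 0)
    double sum = term;
    for (int k = 1; k <= n; ++k) {
        term *= mean / k;           // P(N = k) obtained from P(N = k - 1)
        sum += term;
    }
    return sum;
}

// Hypothetical helper: CLs = CL_{s+b} / CL_b for a counting experiment,
// where CL_{s+b} = P(N <= nObs | s + b) and CL_b = P(N <= nObs | b).
double countingCLs(int nObs, double s, double b) {
    const double clsb = poissonCdf(nObs, s + b);
    const double clb = poissonCdf(nObs, b);  // always > 0 because exp(-b) > 0
    return clsb / clb;
}
```

A signal hypothesis is usually excluded at 95% CL when the ratio falls below 0.05. For zero observed events the ratio reduces to exp(-s), independent of the background, which is a convenient sanity check.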

Bayesian Calculator
    // data is a pointer in the workspace examples above, so it is dereferenced here
    BayesianCalculator bc(*data,*model,RooArgSet(*POI),*priorPOI,&nuisanceParameters);
    // Compute the credibility interval
    // Set the confidence level of the credibility interval and compute it. Returns a SimpleInterval.
    bc.SetTestSize(0.05);
    SimpleInterval* interval = bc.GetInterval();
    double lowerLimit = interval->LowerLimit();
    double upperLimit = interval->UpperLimit();
    // The code below produces a plot:
    RooAbsPdf* fPosteriorPdf = bc.GetPosteriorPdf();
    RooPlot* plot = POI->frame();
    plot->SetTitle(TString("Posterior probability of parameter \"")+TString(POI->GetName())+TString("\""));
    fPosteriorPdf->plotOn(plot,RooFit::Range(interval->LowerLimit(),interval->UpperLimit(),kFALSE),RooFit::VLines(),RooFit::DrawOption("F"),RooFit::MoveToBack(),RooFit::FillColor(kGray));
    fPosteriorPdf->plotOn(plot);
    plot->GetYaxis()->SetTitle("posterior probability");
    plot->Draw();
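The idea behind the credibility interval can be illustrated without RooStats. The sketch below assumes a single-bin Poisson counting model with known background b and a flat prior on the signal s, as in the Uniform::priorPOI(S) choice above, and integrates the posterior numerically to find a one-sided upper limit. This is not necessarily the interval type that BayesianCalculator returns, and it ignores nuisance parameters. The names posteriorShape and bayesianUpperLimit, the integration range sMax, and the step count are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Unnormalised posterior for the signal s with a flat prior: the Poisson
// likelihood (s + b)^n * exp(-(s + b)). The factor 1/n! cancels when the
// posterior is normalised.
double posteriorShape(double s, int n, double b) {
    return std::pow(s + b, n) * std::exp(-(s + b));
}

// Hypothetical helper: upper limit sUp such that a fraction cl of the
// posterior on [0, sMax] lies below sUp, using midpoint-rule integration.
double bayesianUpperLimit(int n, double b, double cl, double sMax,
                          int steps = 100000) {
    assert(n >= 0 && b >= 0.0 && cl > 0.0 && cl < 1.0);
    assert(sMax > 0.0 && steps > 0);
    const double h = sMax / steps;
    std::vector<double> cumulative(steps);
    double total = 0.0;
    for (int i = 0; i < steps; ++i) {
        total += posteriorShape((i + 0.5) * h, n, b) * h;
        cumulative[i] = total;  // integral of the posterior shape up to (i + 1) * h
    }
    for (int i = 0; i < steps; ++i) {
        if (cumulative[i] >= cl * total) return (i + 1) * h;
    }
    return sMax;
}
```

For example, bayesianUpperLimit(3, 1.0, 0.95, 10.0) mirrors the Poisson exercise (3 observed events, B = 1, S restricted to [0,10]). With zero observed events the posterior is proportional to exp(-s), so the 95% limit approaches -ln(0.05) when sMax is large.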

RooStats Exercises (2)
- Generate a multi-dimensional model: have both S and mass as parameters of interest
- Compare the results of the Profile likelihood, Neyman construction and MCMC calculator
- The examples are to be downloaded from: http://www.cern.ch/moneta/temp/roostats2.tar
- To generate the data, run: rs500e_PrepareWorkspace_GaussOverFlat_withSystematics_floatingMass.C
- To run the exercise: rs501_ThreeTypesOfLimits.C

TMVA exercises
- Download the macro code from http://www.cern.ch/moneta/temp/tmva_exercises.tar
- Try first generating different data sets (use RooFit to do it)
    macro makesample.C
    usage: makesample(int sampleID), with ID=0,1,2,3,4 depending on the type of data to generate
- Try and play with the various methods
    macro driver.C
    driver(Int_t sampleID = 0, TString obsList = "x,y,z", TString myMethodList = "Fisher,Likelihood,MLP,BDT")
- Obtain the TMVA GUI from the macro
    produce plots of the signal and background distributions
    look at the results, classifier outputs and performances (ROC curve)
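As a reminder of what the ROC curve in the last step represents, here is a minimal sketch that scans a cut on a classifier output and records signal efficiency against background rejection. It does not use TMVA; the struct RocPoint and the functions passFraction and rocCurve are hypothetical names, and the score vectors stand in for the classifier outputs evaluated on signal and background test samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct RocPoint {
    double signalEfficiency;     // fraction of signal events passing the cut
    double backgroundRejection;  // fraction of background events failing the cut
};

// Fraction of classifier scores strictly above the cut value.
double passFraction(const std::vector<double>& scores, double cut) {
    assert(!scores.empty());
    std::size_t passed = 0;
    for (double s : scores) {
        if (s > cut) ++passed;
    }
    return static_cast<double>(passed) / scores.size();
}

// Hypothetical helper: scan nCuts equally spaced cut values in [lo, hi] and
// return one (signal efficiency, background rejection) point per cut.
std::vector<RocPoint> rocCurve(const std::vector<double>& sig,
                               const std::vector<double>& bkg,
                               double lo, double hi, int nCuts) {
    assert(nCuts >= 2 && hi > lo);
    std::vector<RocPoint> curve;
    curve.reserve(nCuts);
    for (int i = 0; i < nCuts; ++i) {
        const double cut = lo + (hi - lo) * i / (nCuts - 1);
        curve.push_back({passFraction(sig, cut), 1.0 - passFraction(bkg, cut)});
    }
    return curve;
}
```

A classifier that separates the samples well produces points close to (1, 1), that is, high signal efficiency together with high background rejection; comparing the curves of the different methods in myMethodList is the point of the exercise.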