RooFit – status & plans
Wouter Verkerke (NIKHEF)

What is RooFit? RooFit is a language for formulating models that describe your data. It relates to the end-game of (nearly) all HEP physics: statistical analysis. The original focus was on complex models; the new focus is also on low-statistics problems. Complex model, high statistics: measurement of CP violation at BaBar. Simple model, low statistics: discovery of the Higgs boson at the LHC.

How does it work – code structure. Key concept: represent individual elements of a mathematical model by separate C++ objects.

Mathematical concept → RooFit class
– variable → RooRealVar
– function → RooAbsReal
– PDF → RooAbsPdf
– space point → RooArgSet
– list of space points → RooAbsData
– integral → RooRealIntegral

Coding a model in RooFit. Construct each ingredient with a single line of code. Target model: Gauss f(x, a*y+b, 1), Gauss g(y, 0, 3), F(x,y) = f(x|y)*g(y).

// Observables and parameters
RooRealVar x("x","x",-10,10) ;
RooRealVar y("y","y",-10,10) ;
RooRealVar a("a","a",0) ;
RooRealVar b("b","b",-1.5) ;

// Mean of f as a function of y
RooFormulaVar m("m","a*y+b",RooArgList(a,y,b)) ;

// Component pdfs
RooGaussian f("f","f",x,m,RooFit::RooConst(1)) ;
RooGaussian g("g","g",y,RooFit::RooConst(0),RooFit::RooConst(3)) ;

// F(x,y) = f(x|y)*g(y)
RooProdPdf F("F","F",g,RooFit::Conditional(f,x)) ;
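As a usage sketch (assuming the objects above are in scope; the sample size and plotting choices are illustrative, not from the slides), the composed model can be used directly to generate toy data, fit it back, and plot a projection:

// Generate a toy dataset from F, fit the model to it, and plot the x projection
RooDataSet* data = F.generate(RooArgSet(x,y),10000) ;
F.fitTo(*data) ;
RooPlot* frame = x.frame() ;
data->plotOn(frame) ;
F.plotOn(frame) ;
frame->Draw() ;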

Goals of RooFit
Allow users to cleanly & simply express the physics problem (the probability model for your data)
– Computational optimization of calculations makes code ugly & inflexible (unless you put a lot of effort into it)
– In RooFit users do not need to worry about performance optimization of calculations: automated analysis of the expression tree for optimization opportunities is applied automatically prior to use of the likelihood
Modularity & flexibility: provide an as-small-as-possible set of powerful building blocks from which models can be built, keeping the language as simple as possible
– No arbitrary restrictions: RooProdPdf can multiply any set of pdfs, RooAddPdf can sum any set of pdfs, etc.
Well-defined scope – statistical model building (only)
– Very little mission creep over the years

Why do/don't people use RooFit? My experience from interacting with users over the past 15(!) years (not a formal survey).
Don't use RooFit because:
– The user's problem is too simple
– They would like to write their own from scratch
Do use RooFit because:
– They don't want to start from scratch
– You can still write your own analysis code (it's a toolkit, not a framework)
– Because it is fun to use (really!)
– Recommendations by other users
– Demonstrated scalability: it has been shown to scale to very complex projects (e.g. the Higgs combination)
– It's easy to combine results with other analysis groups

Who's using RooFit? No survey – based on direct user communication and/or mentioned use in journal papers.
– The LHC experiments: very widespread use in ATLAS & CMS for Higgs, SUSY and Exotics (in ATLAS nearly 100% of all Higgs & SUSY results); LHCb uses it for various complex unbinned ML fits
– Tevatron: limited use at CDF and D0
– B-factories: very widespread use in BaBar [it originated here]; also used in Belle
– Other: also (limited) use in non-collider experiments (e.g. XENON)

RooFit development focus has been the LHC in recent years. RooFit was originally developed for (unbinned) maximum likelihood fits with analytical models at the B-factories. Hadron physics at the LHC is messy: signal and background are not described by analytical shapes, but rather by histogram templates from MC simulation. Analytical form: Gaussian + polynomial. Template form: histogram (discrete).
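As an illustrative sketch of the template form (the histogram name hmc and the observable range are assumptions, not taken from the slides), a histogram from simulation can be wrapped as a pdf with RooDataHist and RooHistPdf:

// Wrap a TH1 from MC simulation as a template pdf in observable x
// (hmc is an assumed TH1* filled from simulated events)
RooRealVar x("x","x",80,100) ;
RooDataHist dh("dh","MC template",RooArgList(x),hmc) ;
RooHistPdf bkgTemplate("bkgTemplate","template pdf",x,dh) ;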

From empirical shapes to template morphing. Along with the shift to MC-based templates comes a new approach to the degrees of freedom that probability models should have. Should the background be described by a 3rd or 4th order polynomial? It is not clear how to answer this question rigorously… What are the uncertainties in the prediction from MC simulation? We do know how to answer this question (in principle)!

Expected distributions are obtained from the simulation chain: simulation of the high-energy physics process → simulation of the 'soft physics' process → simulation of the ATLAS detector → reconstruction → analysis / event selection. LHC data passes through the same reconstruction and analysis / event selection steps.

The same simulation chain, annotated with the sources of uncertainty: theory uncertainties enter in the simulation of the high-energy physics and 'soft physics' processes; detector uncertainties enter in the simulation of the ATLAS detector and in the reconstruction.

Example uncertainties. Every "systematic uncertainty" maps to the existence of one or more parameters with unknown values.
– Theory: QCD factorization and renormalization scale → unknown value μ; top production cross-section uncertainty → unknown value σ(tt)
– Detector: b-tagging → unknown true b-tagging efficiency ε_b; jet calibration → unknown true jet energy scale α_JES
– MC statistics: unknown true MC prediction in a bin of a distribution, given e.g. 3 simulated events passing all cuts
Profile likelihood approach → construct a probability model of the observed distribution that explicitly parametrizes the dependence on the unknown quantities: F(N | μ, σ_tt, ε_b, α_JES, …)
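A minimal sketch of this idea (a hypothetical single-bin counting model, not from the slides): the expected event count is written as an explicit function of a nuisance parameter alphaJES, which is constrained by a unit-Gaussian subsidiary measurement.

// Hypothetical counting model: Nexp = mu*s*(1+0.05*alphaJES) + b,
// with a unit Gaussian constraint on alphaJES
RooWorkspace w("w") ;
w.factory("expr::nexp('mu*s*(1+0.05*alphaJES)+b',mu[1,0,5],s[50],alphaJES[-5,5],b[100])") ;
w.factory("Poisson::counting(N[0,1000],nexp)") ;
w.factory("Gaussian::constraint(alphaJES,0,1)") ;
w.factory("PROD::model(counting,constraint)") ;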

Parametrizing histograms → template morphing. For each known uncertainty from simulation, evaluate the predicted distribution at three or more settings of the corresponding parameter. The input histograms from simulation are interpolated into a parametric model f(x|α). Repeat for each known parameter.

Code example – template morphing. Example of a template morphing systematic in a binned likelihood:

// Construct template models from histograms
w.factory("HistFunc::s_0(x[80,100],hs_0)") ;
w.factory("HistFunc::s_p(x,hs_p)") ;
w.factory("HistFunc::s_m(x,hs_m)") ;

// Construct morphing model
w.factory("PiecewiseInterpolation::sig(s_0,s_m,s_p,alpha[-5,5])") ;

// Construct full model
w.factory("PROD::model(ASUM(sig,bkg,f[0,1]),Gaussian(0,alpha,1))") ;

PiecewiseInterpolation is a class from the HistFactory project (K. Cranmer, A. Shibata, G. Lewis, L. Moneta, W. Verkerke).
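A possible usage sketch (assuming a binned dataset named data for the observable x is available; this is not part of the slide): retrieve the pdf from the workspace and fit it, which profiles the morphing parameter alpha.

// Fit the morphing model to a binned dataset; alpha is profiled in the fit
RooAbsPdf* model = w.pdf("model") ;
RooFitResult* r = model->fitTo(data,RooFit::Save()) ;
r->Print() ;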

Advanced model building – describing the MC statistical uncertainty. Histogram-based models have an intrinsic uncertainty due to finite MC statistics… How to express the corresponding shape uncertainty with model parameters? Assign a parameter to each histogram bin and introduce a Poisson 'constraint' on each bin: the 'Beeston-Barlow' technique. Mathematically accurate, but it results in complex models with many parameters. The slide equations show: the binned likelihood with a rigid template; the response function with per-bin s, b as parameters; the subsidiary measurements of s, b from s~, b~; and the normalized nuisance-parameter model (nominal value of all γ is 1).
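As a sketch of the idea in the Beeston-Barlow-lite form (standard notation assumed here rather than transcribed from the slide: $\tilde s_i$, $\tilde b_i$ are the nominal template contents, $\tilde m_i$ the effective MC count, and $\gamma_i$ the per-bin nuisance parameters):

$$
L(\vec N,\vec{\tilde m}\,|\,\vec\gamma)
= \prod_{i\,\in\,\mathrm{bins}} \mathrm{Poisson}\!\left(N_i \,\big|\, \gamma_i\,(\tilde s_i + \tilde b_i)\right)
\;\times\;
\prod_{i\,\in\,\mathrm{bins}} \mathrm{Poisson}\!\left(\tilde m_i \,\big|\, \gamma_i\,\tilde m_i\right)
$$

The second product is the subsidiary measurement per bin; the nominal value of every $\gamma_i$ is 1.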

Code example – Beeston-Barlow. Beeston-Barlow(-lite) modeling of MC statistical uncertainties:

// Import template histogram in workspace
w.import(hs) ;

// Construct parametric template model from histogram
// (implicitly creates a vector of gamma parameters)
w.factory("ParamHistFunc::s(hs)") ;

// Product of subsidiary measurements
w.factory("HistConstraint::subs(s)") ;

// Construct full model
w.factory("PROD::model(s,subs)") ;

Code example: BB + morphing. Template morphing model with Beeston-Barlow-lite MC statistical uncertainties:

// Construct parametric template morphing signal model
w.factory("ParamHistFunc::s_p(hs_p)") ;
w.factory("HistFunc::s_m(x,hs_m)") ;
w.factory("HistFunc::s_0(x[80,100],hs_0)") ;
w.factory("PiecewiseInterpolation::sig(s_0,s_m,s_p,alpha[-5,5])") ;

// Construct parametric background model (sharing gamma's with s_p)
w.factory("ParamHistFunc::bkg(hb,s_p)") ;

// Construct full model with BB-lite MC stats modeling
w.factory("PROD::model(ASUM(sig,bkg,f[0,1]),HistConstraint({s_0,bkg}),Gaussian(0,alpha,1))") ;

Morphing algorithms are an active area of development. Example: 2D morphing with 2 parameters.

RooFit for LHC high-pT physics. The profile likelihood paradigm is now dominant at the LHC. RooFit provides very powerful modular building blocks that allow one to implement profile likelihood models (morphing interpolation functions). This is an area under very active development (new algorithms, higher dimensions, performance tuning). Higher-level tools exist to simplify the bookkeeping process of building very complex models (HistFactory – in ROOT; HistFitter – not in ROOT (yet)).
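As an illustrative sketch of the HistFactory-style higher-level interface (the file names, histogram names and the 'jes' systematic are assumptions for illustration, not the slides' example):

#include "RooStats/HistFactory/Measurement.h"
#include "RooStats/HistFactory/MakeModelAndMeasurementsFast.h"

// Minimal HistFactory measurement: one channel, one signal sample with
// a normalisation factor and a histogram-based (morphing) systematic
RooStats::HistFactory::Measurement meas("meas","meas") ;
meas.SetPOI("mu") ;
meas.SetLumi(1.0) ;

RooStats::HistFactory::Channel chan("SR") ;
chan.SetData("data_hist","input.root") ;

RooStats::HistFactory::Sample sig("signal","sig_nominal","input.root") ;
sig.AddNormFactor("mu",1,0,5) ;
sig.AddHistoSys("jes","sig_jes_down","input.root","","sig_jes_up","input.root","") ;
chan.AddSample(sig) ;
meas.AddChannel(chan) ;

// Build the RooFit workspace containing the full probability model
RooWorkspace* w = RooStats::HistFactory::MakeModelAndMeasurementFast(meas) ;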

RooFit for LHC high-pT physics – combining & reinterpreting. The Higgs boson discovery critically relied on the combination of many individual standalone analyses. RooFit has greatly simplified this combination effort through the concept of workspaces: persistence of the complete (final) probability models that interpret the data of individual analyses. With workspaces, anyone with a ROOT release can redo the statistical analysis of another analysis team with just 5 lines of code – independent of the complexity of the model. You just need the ROOT file with the workspace.

The workspace. The workspace concept has revolutionized the way people share and combine analyses: you can give somebody the analytical likelihood of a (potentially very complex) physics analysis in a way that is easy to use, provides introspection, and is easy to modify.

// Persist the model in a workspace and write it to a ROOT file
RooWorkspace w("w") ;
w.import(sum) ;
w.writeToFile("model.root") ;

Using a workspace.

// Resurrect model and data
TFile f("model.root") ;
RooWorkspace* w = (RooWorkspace*) f.Get("w") ;
RooAbsPdf* model = w->pdf("sum") ;
RooAbsData* data = w->data("xxx") ;

// Use model and data
model->fitTo(*data) ;
RooPlot* frame = w->var("dt")->frame() ;
data->plotOn(frame) ;
model->plotOn(frame) ;
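A small follow-up sketch of the introspection and modification mentioned above (the parameter name "mu" is an assumption for illustration, not from the slides):

// Inspect the workspace contents and modify a parameter before re-fitting
w->Print() ;                      // list all pdfs, functions, variables and datasets
w->var("mu")->setVal(0) ;         // e.g. set the signal strength to zero...
w->var("mu")->setConstant(true) ; // ...and hold it constant in the next fit
model->fitTo(*data) ;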

How well does it scale? Graph of the ATLAS Higgs combination discovery model. The model has ~ function objects and ~1600 parameters. Reading/writing of the full model takes ~4 seconds. The ROOT file with the workspace is ~6 Mb.

Workspaces make the technical aspect of combining analyses trivial. The technical process of combining analyses has been straightforward, even when combining across experiments. Example high-profile results:
– LHCb + CMS: B_s → μμ
– ATLAS + CMS: Higgs boson mass and couplings
Even the most complex combination ever built, the ATLAS+CMS Higgs coupling combination (578 distributions modeled with 4200 parameters using function objects), can be reassembled by one person from scratch in ~1 day. This technical ease allows physicists to focus on the content – the correlations of systematic uncertainties between channels and experiments.
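A schematic sketch of such a technical combination (the workspace pointers wAtlas/wCms, pdf names and category labels are assumptions; a real combination also requires careful renaming and correlation of shared nuisance parameters):

// Combine two per-experiment pdfs into one simultaneous model over a channel category
RooCategory chan("chan","channel") ;
chan.defineType("atlas") ;
chan.defineType("cms") ;

RooSimultaneous comb("comb","combined model",chan) ;
comb.addPdf(*wAtlas->pdf("model_atlas"),"atlas") ;
comb.addPdf(*wCms->pdf("model_cms"),"cms") ;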

The ATLAS+CMS Higgs combination.

Pushing the boundary on RooFit model complexity. MINUIT minimization (still) works well with 4200 parameters.
– Had to disable the default MINUIT2 feature of saving the intermediate covariance matrix at every VariableMetric step (each covariance matrix takes ~70 Mb; 100 steps = 7 Gb…)
Some tuning of the memory model and code optimization was needed. The ATLAS/CMS model consumes ~6 Gb and minimizes w.r.t. 4200 parameters in ~5 hours.
– Profiling with callgrind, memcheck, massif
– 40% of memory is used by objects representing functions, 30% by links between objects, 30% by caches of various types
– The majority of CPU time is spent in the probability functions doing the 'actual work' (morphing transformations)
Work on scalability improvements is ongoing.
– Most scaling issues are in model manipulation (the setup phase of the fit) – usually fixed with lookup tables etc.

Further development plans
– Documentation (yes I know…)
– Holy grail project: develop a guide to statistical analysis with hands-on RooFit implementation [big project!]
– Improved internal optimization of the likelihood calculation & parallelization (many ideas – not so much time yet)
– Keep working on scalability and performance – so far it has never been a showstopper
– Incorporate new tools and concepts that emerge from collaborations (a posteriori trimming of model complexity – 'pruning')
– Replace old core code with modern STL implementations (big help here so far from Manuel Schiller!)