Presentation is loading. Please wait.

Presentation is loading. Please wait.

Software - RooFit/RooStats W. Verkerke Wouter Verkerke, NIKHEF What is it Where is it used Experience, Lessons and Issues.

Similar presentations


Presentation on theme: "Software - RooFit/RooStats W. Verkerke Wouter Verkerke, NIKHEF What is it Where is it used Experience, Lessons and Issues."— Presentation transcript:

1 Software - RooFit/RooStats W. Verkerke Wouter Verkerke, NIKHEF What is it Where is it used Experience, Lessons and Issues

2 RooFit: Focus: coding a probability density function Focus on one practical aspect of many data analysis in HEP: How do you formulate your p.d.f. in ROOT –For ‘simple’ problems (gauss, polynomial) this is easy –But if you want to do unbinned ML fits, use non-trivial functions, or work with multidimensional functions you quickly find that you need some tools to help you 1

3 Introduction – Why RooFit was developed BaBar experiment at SLAC: Extract sin(2) from time dependent CP violation of B decay: e + e -  Y(4s)  BB –Reconstruct both Bs, measure decay time difference –Physics of interest is in decay time dependent oscillation Many issues arise –Standard ROOT function framework clearly insufficient to handle such complicated functions  must develop new framework –Normalization of p.d.f. not always trivial to calculate  may need numeric integration techniques –Unbinned fit, >2 dimensions, many events  computation performance important  must try optimize code for acceptable performance –Simultaneous fit to control samples to account for detector performance 2

4 Wouter Verkerke, UCSB Implementation – Add-on package to ROOT C++ command line interface & macros Data management & histogramming Graphics interface I/O support MINUIT ToyMC data Generation Data/Model Fitting Data Modeling Model Visualization libRooFitCore.so libRooFitModels.so

5 Wouter Verkerke, UCSB Data modeling – OO representation Mathematical objects are represented as C++ objects variable RooRealVar function RooAbsReal PDF RooAbsPdf space point RooArgSet list of space points RooAbsData integral RooRealIntegral RooFit classMathematical concept

6 Wouter Verkerke, UCSB Data modeling – Constructing composite objects Straightforward correlation between mathematical representation of formula and RooFit code RooRealVar x RooRealVar s RooFormulaVar sqrts RooGaussian g  RooRealVar x(“x”,”x”,-10,10) ;  RooRealVar m(“m”,”mean”,0) ;  RooRealVar s(“s”,”sigma”,2,0,10) ;  RooFormulaVar sqrts(“sqrts”,”sqrt(s)”,s) ;  RooGaussian g(“g”,”gauss”,x,m,sqrts) ; Math RooFit diagram RooFit code RooRealVar m     

7 Wouter Verkerke, UCSB Model building – (Re)using standard components RooFit provides a collection of compiled standard PDF classes RooArgusBG RooPolynomial RooBMixDecay RooHistPdf RooGaussian Basic Gaussian, Exponential, Polynomial,… Physics inspired ARGUS,Crystal Ball, Breit-Wigner, Voigtian, B/D-Decay,…. Non-parametric Histogram, KEYS (Probability Density Estimate) PDF Normalization By default RooFit uses numeric integration to achieve normalization Classes can optionally provide (partial) analytical integrals Final normalization can be hybrid numeric/analytic form

8 Wouter Verkerke, UCSB RooBMixDecay RooPolynomial RooHistPdf RooArgusBG Model building – (Re)using standard components Most physics models can be composed from ‘basic’ shapes RooAddPdf + RooGaussian

9 RooBMixDecay RooPolynomial RooHistPdf RooGaussian Model building – (Re)using standard components Most physics models can be composed from ‘basic’ shapes RooFFTConvPdf ⊗ RooLandau

10 Wouter Verkerke, UCSB RooBMixDecay RooPolynomial RooHistPdf RooArgusBG RooGaussian Model building – (Re)using standard components Most physics models can be composed from ‘basic’ shapes RooProdPdf *

11 Wouter Verkerke, UCSB Model building – (Re)using standard components Building blocks are flexible –Function variables can be functions themselves –Just plug in anything you like –Universally supported by core code (PDF classes don’t need to implement special handling) g(x;m,s) m(y;a 0,a 1 ) g(x,y;a 0,a 1,s) RooPolyVar m(“m”,y,RooArgList(a0,a1)) ; RooGaussian g(“g”,”gauss”,x,m,s) ;

12 Wouter Verkerke, UCSB Using models - Overview All RooFit models provide universal and complete fitting and Toy Monte Carlo generating functionality –Model complexity only limited by available memory and CPU power models with >16000 components, >1000 fixed parameters and>80 floating parameters have been used (published physics result) –Very easy to use – Most operations are one-liners RooAbsPdf RooDataSet RooAbsData gauss.fitTo(data) data = gauss.generate(x,1000) FittingGenerating

13 Using both model & p.d.f from file Persisting models in the workspace Wouter Verkerke, NIKHEF TFile f(“myresults.root”) ; RooWorkspace* w = f.Get(“w”) ; RooPlot* xframe = w->var(“x”)->frame() ; w->data(“d”)->plotOn(xframe) ; w->pdf(“g”)->plotOn(xframe) ; RooNLLVar nll(“nll”,”nll”,*w->pdf(“g”),*w->data(“d”)) ; RooProfileLL pll(“pll”,”pll”, nll,*w->var(“m”)) ; RooPlot* mframe = w->var(“m”)->frame(-1,1) ; pll.plotOn(mframe) ; mframe->Draw() Make plot of data and p.d.f Construct likelihood & profile LH Draw profile LH

14 What is it used for / has it been used for In many published results from B factories –(Time dependent) fits for CP asymmetries –Search for rare decays (N-dimensional likelihoods) (in at least 50 papers – I’ve stopped counting...) Various LHC experiments –In LHCb (fits) –In ATLAS – extraction of top cross section, B-physics, SUSY, Higgs, Exotics –In CMS (in various places) Basis of RooStats project (a joint ATLAS/CMS project to develop common statistical tools) RooFit/RooStats may play fundamental role in LHC Higgs combination (lot’s of work is ongoing) –RooWorkspace concept facilitate sharing (and publishing) of exact multi-dimensional likelihoods Wouter Verkerke, NIKHEF

15 NIKHEF Annual report 2010 Annual Report - ATLAS Annual Report - LHCb tt cross-section extracted using RooFit

16 ATLAS/CMS joint stat. forum – Higgs combination Wouter Verkerke, NIKHEF

17 Project timeline 1999 : Project started –First application: ‘sin2b’ measurement of BaBar (model with 5 observables, 37 floating parameters, simultaneous fit to multiple CP and control channels) 2000 : Complete overhaul of design based on experience with sin2b fit –Very useful exercise: new design is still current design 2003 : Public release of RooFit with ROOT – code maintained in SourceForge 2004 : Over 50 BaBar physics publications using RooFit 2007 : Integration of RooFit in ROOT CVS source 2008 : Upgrade in functionality as part of RooStats project –Improved analytical and numeric integration handling, improved toy MC generation, addition of workspace 2009 : Now ~100K lines of code –(For comparison RooStats proper is ~5000 lines of code) last modification before date lines of code 4

18 Experience – Distribution channel Distribution of SW within experiment no issue – almost always well organized. To distribute outside collaboration: For analysis tools and packages for experimental physics ROOT is very much preferred –Everybody has it –Well run core team that can help with integration issues –Allows for distribution of users manual, tutorials –Provides communication platform with end users Alternative (for general software) – SourceForge –Free, excellent infrastructure (Web page, SVN, forum tools, etc) –Requires license that allows free distribution –Not HEP specific (both advantaged and disadvantage) For physics software other channels exist (HepForge) Wouter Verkerke, NIKHEF

19 Experience – Gaining acceptance How do you get everyone to use your software? –Make sure it is easily available [ROOT, or experiment] –Provide good user support –Make sure you have a ‘killer feature’ – i.e. something that can only be done (easily) with your software E.g. convolution of a Landau and a Gaussian (this surprised me) To get started – provide intensive support for one or two highly visible projects – a succesful and published analysis is the best sales pitch RooFit – two classes of users –A few very smart and ambitious people that want to do something really complicated –Normal users that want to do something close to routine on a short notice –When you get started it is better to focus on the 1 st group Wouter Verkerke, NIKHEF

20 Experience – Maintenance and bug-fixing If project is very small (i.e. a single C++ class) usually not an issue –Also ROOT team can ‘adopt’ your single class and take over maintenance For larger projects – this presents continuous work –Users keep finding bugs –New features usually result in new bugs Modern software tools simplify debugging –Use valgrind for tracing of memory leaks & corruption –Use cachegrind for performance profiling –Use Coverity for static code analysis (not free – but CERN has access) Useful to have ‘validation suite’ –Series of standard tests that exercise most of your software –Allows to verify quickly and frequently that everything is still working (useful close to release) –Can serve as vehicle for valgrind/cachegrind performance testting Wouter Verkerke, NIKHEF

21 Experience – User support ‘Formal’ user support is important for the long-term viabilility of your project –You should make sure that people that don’t know you personally and are outside your experiment can ask you questions Load of user support depends very much on size and nature of applications –Should be prepared to spend some fraction of your time on this every week (if only 1 hour) –If you project is succesful,this will never end... No user support = slow death of project Incorporation in ROOT very beneficial here –Allows users to ask questions in familiar place Wouter Verkerke, NIKHEF

22 Experience - Documentation Documentation is very important – people tend to not use poorly documented software Good documentation also simplifies user support  Can simply refer to manual or tutorial macro instead of writing long e-mails every time RooFit provide 4 forms of documentation –Quick-start guide (20 pages - allows users to get started easily) –Users manual (>100 page document). Pedagical document –Class documentation (online on ROOT Web site) – Answer practical issues concerns use of software –Tutorial macros – Fully functional examples that can serve as starting point for user problems Tutorial macros especially useful –Also recyclable for validation suite Maintenance –Updating of class documentation, tutorials relatively easy. –Update of Users Manual time consuming Wouter Verkerke, NIKHEF

23 Experience - Copyright How to deal with copyright RooFit issued under a BSD license –BSD Retains copyright (by your employer!) but allows free distribution provided copyright notice is retained. Employer should (in principle) consent to free distribution Other options –GPL – You relinquish copyright (your employer should consent) and allow free distribution. Also projects that include your code should be GPL as well (GPL is ‘contagious’) –LGPL – As GPL except that it doesn’t require projects to adopt GPL (usually better). Example: ROOT is LGPL –No statement – Copyright is still implicit (I think – but I am not a lawyer) but distribution and reuse conditions are not spelled out. You should really avoid this. Can create complicated situation and conflicts later on –Custom - Not common. Could e.g. require that use requires acknowledgements, but this is tricky issue... Wouter Verkerke, NIKHEF

24 Experience - Acknowledgements Do you want people to reference you in papers etc? –Depends very much on nature of software For MC generators very common (close to physics) –For general software much less so (ROOT,TMVA,RooFit) RooFit generally not referenced – I don’t feel strongly that this is needed given the general nature of the package –If you feel strongly about this – recommend to spell your wishes out in the distribution notice In any case, be sure there is a document that can be used as reference in papers. Wouter Verkerke, NIKHEF

25 Experience – conferences & workshops Several interesting conference series exist for (physics) software CHEP – very large, more ‘industrial focus’. Will accept presentations on e.g. RooFit but not main focus PHYSTAT – For analysis software. Small conference, focus on statistical inference, but interesting and populated by ‘important people’ ACAT – Somewhere in between the above two Try to generate at least one conference proceeding early that can serve as reference Wouter Verkerke, NIKHEF

26 Summary Wouter Verkerke, NIKHEF


Download ppt "Software - RooFit/RooStats W. Verkerke Wouter Verkerke, NIKHEF What is it Where is it used Experience, Lessons and Issues."

Similar presentations


Ads by Google