Unfolding algorithms and tests using RooUnfold

Unfolding algorithms and tests using RooUnfold
Tim Adye, Rutherford Appleton Laboratory
PHYSTAT 2011 Workshop on Unfolding and Deconvolution, 20th January 2011

Outline
- RooUnfold aims, design, and features
- Example of unfolding
- Summary of the algorithms
- Comparison of algorithms
- Unfolding errors
- Status and plans
- References

Tim Adye - RAL RooUnfold

RooUnfold package aims
- Provide a framework for different algorithms
  - Can compare performance directly, with common user code
  - RooUnfold takes care of different binning, normalisation, and efficiency conventions
  - Can use common RooUnfold utilities: write once, use for all algorithms
- Currently implements or interfaces to the iterative Bayes, SVD, TUnfold, unregularised matrix inversion, and bin-by-bin correction factor algorithms
- Simple OO design
  - The "response matrix" object can be filled separately from the training sample, in a different routine or even a different program (ROOT I/O support)
- Simple interface for the user
  - From a program, a ROOT/CINT script, or the interactive ROOT prompt
  - Fill with histograms, vectors/matrices, ... or use the direct methods: response->Fill(xmeasured, xtrue) and Miss(xtrue); the Miss() method takes care of the normalisation
  - Results returned as a histogram with errors, or as a vector and covariance matrix
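To make the Fill()/Miss() bookkeeping concrete, here is a minimal toy sketch in Python (this is not the RooUnfold C++ API; the class and method names are invented for illustration). Fill() records an event in both its true and measured bins, while Miss() records a true event that was not reconstructed, so the resulting response matrix automatically encodes the efficiency:

```python
# Hypothetical minimal reimplementation of Fill()/Miss() bookkeeping,
# illustrating how the response matrix encodes efficiency.
class ToyResponse:
    def __init__(self, n_meas, n_true):
        # counts[j][i] = events generated in true bin i, reconstructed in measured bin j
        self.counts = [[0.0] * n_true for _ in range(n_meas)]
        self.truth_totals = [0.0] * n_true  # all generated events, found or not

    def fill(self, j_meas, i_true):
        """Event generated in true bin i_true and reconstructed in measured bin j_meas."""
        self.counts[j_meas][i_true] += 1
        self.truth_totals[i_true] += 1

    def miss(self, i_true):
        """Event generated in true bin i_true but not reconstructed (inefficiency)."""
        self.truth_totals[i_true] += 1

    def matrix(self):
        """Response matrix R[j][i] = P(measured in j | true in i)."""
        return [[self.counts[j][i] / self.truth_totals[i] if self.truth_totals[i] else 0.0
                 for i in range(len(self.truth_totals))]
                for j in range(len(self.counts))]

    def efficiency(self, i_true):
        """Probability that an event in true bin i_true is reconstructed at all."""
        found = sum(self.counts[j][i_true] for j in range(len(self.counts)))
        return found / self.truth_totals[i_true] if self.truth_totals[i_true] else 0.0

resp = ToyResponse(2, 2)
for _ in range(8):
    resp.fill(0, 0)   # 8 events in true bin 0, reconstructed
for _ in range(2):
    resp.miss(0)      # 2 events in true bin 0, lost to inefficiency
for _ in range(10):
    resp.fill(1, 1)   # true bin 1 fully efficient
```

Because Miss() contributes to the truth totals, the columns of the response matrix sum to the per-bin efficiency (0.8 for bin 0 here) rather than to 1.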

RooUnfold features
- Supports different binning scenarios
  - multi-dimensional distributions (1D, 2D, and 3D)
  - different binning (or even dimensionality) for measured and truth
  - option to include or exclude histogram under/overflow bins in the unfolding
- Supports different methods of error computation (simple switch). In order of increasing CPU time:
  - no error calculation (uses √N)
  - bin-by-bin errors (no correlations)
  - full covariance matrix from the propagation of measurement errors through the unfolding, or
  - covariance matrix from MC toys: useful to test the error propagation, and to replace it when it is inaccurate
- These details are handled by the framework, so they don't need to be implemented for each algorithm

Testing
- Calculates resolutions, pulls, and χ²
- Includes a toy MC test framework, allowing selection of different
  - PDFs and PDF parameters
  - binning (1D, 2D, 3D)
  - unfolding methods and parameters
- Test procedures for the regularisation parameter and errors, with plotting of results, from a single command
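The per-bin figures of merit mentioned above are simple to state. A short sketch (function names are illustrative, not RooUnfold's): the residual is the difference between the unfolded and true values, and the pull divides that by the estimated error, so a well-behaved unfolding gives pulls of unit width:

```python
# Residuals and pulls for a toy 2-bin unfolding result
# (illustrative helper, not part of RooUnfold).
def residuals_and_pulls(unfolded, errors, truth):
    residuals = [u - t for u, t in zip(unfolded, truth)]
    # pull = residual / error; NaN where the error is not defined
    pulls = [r / e if e > 0 else float("nan") for r, e in zip(residuals, errors)]
    return residuals, pulls

res, pulls = residuals_and_pulls([104.0, 48.0], [4.0, 2.0], [100.0, 50.0])
# res = [4.0, -2.0], pulls = [1.0, -1.0]
```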

RooUnfold classes
- RooUnfoldResponse holds the response matrix, filled from the training truth and training measured distributions (TH1D, or TH2D/TH3D), e.g.

    for (i = 0; i < N; i++)
      if (measured[i]) R->Fill (measured[i], truth[i]);
      else             R->Miss (truth[i]);

  The response object can be saved to and read from disk.
- Subclasses of RooUnfold, one per algorithm: RooUnfoldBayes, RooUnfoldSvd, RooUnfoldTUnfold, RooUnfoldInvert, RooUnfoldBinByBin
- Measured data passed in as a TH1D (or TVector); unfolded distribution and errors returned as a TH1D (or TVector and TMatrix)
- Test programs: RooUnfoldExample, RooUnfoldTest, RooUnfoldTest2D, RooUnfoldTest3D

[Slide shows a class diagram connecting these components]

Unfolding example (Bayes)
- Training sample: single Gaussian + flat background
  - Gaussian smearing, systematic translation, and variable inefficiency
- Unfolded sample: double Breit-Wigner + flat background
[Plots show the training, measured, and unfolded distributions]

2D unfolding
- 2D smearing, bias, variable efficiency, and variable rotation
[Plots show truth, measured/reco, and the unfolded x- and y-projections]

Algorithms – Iterative Bayes' theorem
- Uses the method of Giulio D'Agostini (1995), implemented by Fergus Wilson and myself
  - see the talk by Katharina Bierwagen for details of the method
- Uses repeated application of Bayes' theorem to invert the response matrix
- Regularisation by stopping the iterations before reaching the "true" (but wildly fluctuating) inverse
  - The regularisation parameter is the number of iterations, which in principle has to be tuned according to the statistics, number of bins, etc. In practice, the results are fairly insensitive to the precise setting.
- Implementation details:
  - The initial prior is taken from the training truth, rather than a flat distribution
    - Does not bias the result once we have iterated, but perhaps reaches the optimum faster
  - Takes account of multinomial errors on the data sample but not, by default, uncertainties in the response matrix (finite MC statistics), which is very slow
  - Does not normally do smoothing (can be enabled with an option)
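The iteration can be sketched in a few lines. This is a toy Python illustration of the D'Agostini scheme, not the RooUnfoldBayes implementation (for simplicity it starts from a flat prior and ignores errors); each pass applies Bayes' theorem with the current prior, and the unfolded result becomes the prior for the next pass:

```python
# Toy sketch of D'Agostini-style iterative unfolding (illustrative only).
def bayes_unfold(response, measured, n_iter):
    n_meas, n_true = len(response), len(response[0])
    # Efficiency of each true bin: probability of being reconstructed anywhere
    eff = [sum(response[j][i] for j in range(n_meas)) for i in range(n_true)]
    prior = [1.0 / n_true] * n_true  # flat initial prior
    unfolded = prior[:]
    for _ in range(n_iter):
        # Fold the current prior through the response
        folded = [sum(response[j][i] * prior[i] for i in range(n_true))
                  for j in range(n_meas)]
        # Bayes' theorem: distribute each measured bin back over true bins
        unfolded = [
            sum(response[j][i] * prior[i] / folded[j] * measured[j]
                for j in range(n_meas) if folded[j] > 0) / eff[i]
            for i in range(n_true)
        ]
        total = sum(unfolded)
        prior = [u / total for u in unfolded]  # updated prior for the next pass
    return unfolded

# 2-bin toy: 20% migration between bins, full efficiency
R = [[0.8, 0.2],
     [0.2, 0.8]]                     # R[j][i] = P(measured j | true i)
measured = [0.8 * 100 + 0.2 * 50,    # truth [100, 50] folded through R
            0.2 * 100 + 0.8 * 50]    # = [90, 60]
result = bayes_unfold(R, measured, 50)
# With exact (noise-free) data the iterations converge towards [100, 50]
```

With noisy data one would instead stop after a few iterations, before the high-frequency fluctuations of the exact inverse build up; that early stopping is the regularisation.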

Algorithms – SVD
- Uses the method of Andreas Höcker and Vato Kartvelishvili (GURU Fortran program)
  - see Vato's talk for details of the method
- Implemented in C++/ROOT by Kerstin Tackmann and Heiko Lacker, and included in the most recent version of ROOT (5.28) as TSVDUnfold
  - see Kerstin's talk for details of the implementation
  - RooUnfold includes an interface to this class
- Obtains the inverse of the response matrix using singular value decomposition
  - Uses the number-of-events matrix to keep track of MC uncertainties
- Regularisation with a smooth cut-off on small singular value contributions (these correspond to high-frequency fluctuations):
    sᵢ² → sᵢ² / (sᵢ² + sₖ²)
  where k determines the relative contributions of MC truth and data
  - k too small → result dominated by MC truth
  - k too large → result dominated by statistical fluctuations
  - k needs to be tuned for the particular type of distribution, number of bins, and approximate sample size
- The unfolded error matrix includes the effect of finite MC training statistics (usually small)
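The damping factors above are easy to visualise numerically. A small sketch (illustrative, not the TSVDUnfold code): singular values well above sₖ pass almost unchanged, the k-th is halved, and smaller ones, which carry the high-frequency fluctuations, are strongly suppressed:

```python
# SVD regularisation damping factors s_i^2 / (s_i^2 + s_k^2):
# a smooth cut-off rather than a hard truncation of small singular values.
def damping_factors(singular_values, k):
    s_k = singular_values[k]  # regularisation scale set by the k-th singular value
    return [s * s / (s * s + s_k * s_k) for s in singular_values]

# Singular values of a response matrix typically fall off steeply
s = [10.0, 5.0, 1.0, 0.1]
f = damping_factors(s, k=2)   # s_k = 1.0
# f[0] ≈ 0.990 (kept), f[2] = 0.5 (half-suppressed), f[3] ≈ 0.0099 (removed)
```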

Algorithms – TUnfold
- Uses the TUnfold method implemented by Stefan Schmitt and included in ROOT
  - RooUnfold includes an interface to this class
- Performs a matrix inversion with 0th-, 1st-, or 2nd-order polynomial regularisation of neighbouring bins
- RooUnfold automatically takes care of packing 2D and 3D distributions and creating the appropriate regularisation matrix required by TUnfold
- TUnfold can determine an optimal regularisation parameter (τ) by scanning the "L-curve" of log₁₀(χ²) vs log₁₀(τ)
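As an illustration of what "2nd-order polynomial regularisation of neighbouring bins" means in matrix form, here is a sketch of a curvature matrix like the one RooUnfold constructs for TUnfold (a common [1, −2, 1] discretisation of the second derivative; TUnfold's exact conventions may differ). The penalty term ‖Lx‖² then disfavours bin-to-bin curvature in the unfolded result:

```python
# Build an (n-2) x n second-derivative (curvature) regularisation matrix:
# each row applies the [1, -2, 1] stencil to three neighbouring bins.
def curvature_matrix(n_bins):
    L = [[0.0] * n_bins for _ in range(n_bins - 2)]
    for r in range(n_bins - 2):
        L[r][r], L[r][r + 1], L[r][r + 2] = 1.0, -2.0, 1.0
    return L

L = curvature_matrix(5)
# row 0 -> [1, -2, 1, 0, 0]; row 2 -> [0, 0, 1, -2, 1]
```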

Unregularised algorithms
- Very simple algorithms
  - using bin-by-bin correction factors, with no inter-bin migration
  - using unregularised matrix inversion with singular value removal (TDecompSVD)
- are included for comparison – and to demonstrate why they should not be used in most cases!

Comparison of algorithm features
- TUnfold and unregularised matrix inversion require the numbers of bins to satisfy Nmeasured ≥ Ntrue
  - TUnfold claims best results if Nmeasured > Ntrue, e.g. Nmeasured = 2Ntrue
    - This is a common general recommendation from unfolding experts, but perhaps it is most relevant to these types of algorithms with explicit regularisation
    - It is an implicit additional regularisation, since we are "smoothing" two bins into one
- The SVD implementation and the bin-by-bin method only support Nmeasured = Ntrue
  - The SVD implementation also only works well for 1D distributions
- The choice of the SVD regularisation parameter has to be made by the user
  - TUnfold can often do this automatically; can we do something similar for the SVD method?
- The performance of the Bayes method is relatively insensitive to its regularisation parameter (the number of iterations)

RooUnfold with the Bayes algorithm
- 3 iterations; Gaussian smearing, systematic translation, and variable inefficiency
- χ²bin = 47, χ²cov = 9178
[Plots show the unfolded distribution, residuals, and correlation matrix]

RooUnfold with the SVD algorithm
- k = 30; Gaussian smearing, systematic translation, and variable inefficiency
- χ²bin = 42, χ²cov = 535
[Plots show the unfolded distribution, residuals, and correlation matrix]

RooUnfold with the TUnfold algorithm
- τ = 0.004 (the L-curve scan gave 0.0014); Gaussian smearing, systematic translation, and variable inefficiency
- Two measured bins per truth bin
- χ²bin = 37, χ²cov = 37
[Plots show the unfolded distribution, residuals, and correlation matrix]

Unregularised matrix inversion
- Gaussian smearing, variable inefficiency, and no systematic translation
- χ²bin = 30, χ²cov = 34
[Plots show the unfolded distribution, residuals, and correlation matrix]
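Why unregularised inversion is fragile can be seen in a two-bin toy (a Python sketch, not RooUnfoldInvert): when the response matrix has heavy migration it is nearly singular, and a tiny fluctuation in the measured spectrum is amplified by the inverse:

```python
# Toy demonstration of noise amplification in unregularised matrix inversion.
def invert2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c                 # small det = nearly singular response
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

R = [[0.55, 0.45],
     [0.45, 0.55]]                      # heavy bin-to-bin migration, det = 0.1
truth = [100.0, 100.0]
measured = apply(R, truth)              # = [100, 100]
fluctuated = [measured[0] + 1.0, measured[1] - 1.0]  # a +/-1 event wobble

unfolded = apply(invert2x2(R), fluctuated)
# The +/-1 fluctuation grows to +/-10 in the unfolded result: [110, 90]
```

Regularisation trades some of this variance for a (hopefully small) bias, which is the central compromise of all the methods above.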

Simple correction factors
- Gaussian smearing, systematic translation, and variable inefficiency
- Variable inefficiency only: χ² = 47
[Plots show the unfolded distributions]

Unfolding errors
- All methods return a full covariance matrix of the errors on the unfolded histogram due to uncertainties on the measured distribution
- This is often calculated by propagation of errors, but that is not always possible if there are non-linearities or other problems; e.g. the iterations in the Bayes method are not handled in D'Agostini's formalism
- RooUnfold allows the covariance matrix to be calculated from toy MC instead
  - provides a cross-check of the error propagation, or replaces it if there are problems
[Plots compare unfolding errors from propagation with errors from toy MC]
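The toy-MC approach can be sketched in a few lines (an illustrative Python toy, not RooUnfold's code; it uses plain matrix inversion as the "unfolder" and a Gaussian approximation to Poisson fluctuations): fluctuate the measured spectrum many times, unfold each toy, and take the spread of the results as the covariance:

```python
# Toy-MC estimate of the unfolded covariance matrix (illustrative sketch).
import random

def invert2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def toy_covariance(unfold, measured, n_toys, rng):
    toys = []
    for _ in range(n_toys):
        # Gaussian approximation to Poisson fluctuations, sigma = sqrt(N)
        fluct = [rng.gauss(m, m ** 0.5) for m in measured]
        toys.append(unfold(fluct))
    n = len(measured)
    mean = [sum(t[i] for t in toys) / n_toys for i in range(n)]
    return [[sum((t[i] - mean[i]) * (t[j] - mean[j]) for t in toys) / (n_toys - 1)
             for j in range(n)] for i in range(n)]

rng = random.Random(42)
R = [[0.8, 0.2], [0.2, 0.8]]
Rinv = invert2x2(R)
cov = toy_covariance(lambda m: apply(Rinv, m), [90.0, 60.0], 2000, rng)
# Diagonal: unfolded variances; off-diagonal terms come out negative,
# since the inversion anti-correlates neighbouring bins.
```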

Bin-to-bin correlations
- Regularisation introduces inevitable correlations between bins in the unfolded distribution
- To calculate a correct χ², one has to invert the covariance matrix:
    χ² = (x_m − x_t)ᵀ V⁻¹ (x_m − x_t)
- However, in many cases the covariance matrix is poorly conditioned, which makes calculating the inverse problematic
  - Inverting a poorly conditioned matrix involves subtracting large but very similar numbers, leading to significant effects from the machine precision
- In any case, χ² may not be the best figure of merit
  - one could improve the χ² by relaxing the regularisation → larger errors, but also larger residuals
  - Is there a better figure of merit?
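For a two-bin case the covariance-based χ² above can be written out explicitly (toy numbers, with the 2×2 inverse done by the cofactor formula; in a poorly conditioned case the determinant in that formula becomes a difference of nearly equal large numbers, which is exactly where machine precision bites):

```python
# chi2 = r^T V^-1 r for two correlated bins, r = x_measured - x_truth.
def chi2_cov(residual, cov):
    (a, b), (c, d) = cov
    det = a * d - b * c          # near-zero for a poorly conditioned V
    vinv = [[d / det, -b / det], [-c / det, a / det]]
    r0, r1 = residual
    return (r0 * (vinv[0][0] * r0 + vinv[0][1] * r1)
            + r1 * (vinv[1][0] * r0 + vinv[1][1] * r1))

residual = [2.0, -1.0]
V = [[4.0, 3.0],
     [3.0, 4.0]]                 # strongly correlated bins (rho = 0.75)
c2 = chi2_cov(residual, V)       # = 32/7 ≈ 4.57
```

Note that ignoring the off-diagonal terms would instead give 4/4 + 1/4 = 1.25, illustrating how much the correlations matter for the χ².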

RooUnfold status
- RooUnfold was first released stand-alone (outside the BaBar framework) in 2007
- Have received many questions, suggestions, and even a few bug reports
  - ~60 people working on many different experiments in PP, PA, and NP
  - Prompted new versions with fixes and improvements
- Last year I started working with a small group hosted by the Helmholtz Alliance, the Unfolding Framework Project
  - The project is developing unfolding experience, software, algorithms, and performance tests
  - It is also developing and running a "Banff challenge"-esque example
  - It has adopted RooUnfold as a framework for development

Plans and possible improvements
- Incorporate into the next version of ROOT
  - Encouraging discussions with Lorenzo Moneta from the ROOT team
- More common tools, useful for all algorithms
  - Further automate the validation and selection of the regularisation parameter
  - Allow propagation of errors from the response matrix equally in all algorithms, and with toy MC
- More algorithms
  - Interface to TRUEE (Natalie Milke), a C++ implementation of Volker Blobel's RUN method
    - Requires unbinned MC responses
  - Interface to the Iterative Dynamically Stabilized method of Bogdan Malaescu
  - ...?

References
- RooUnfold code, documentation, and references to unfolding reviews and techniques can be found on this web page:
  http://hepunx.rl.ac.uk/~adye/software/unfold/RooUnfold.html