Comment on Event Quality Variables for Multivariate Analyses

Slides:



Advertisements
Similar presentations
Using the Profile Likelihood in Searches for New Physics / PHYSTAT 2011 G. Cowan 1 Using the Profile Likelihood in Searches for New Physics arXiv:
Advertisements

G. Cowan RHUL Physics Profile likelihood for systematic uncertainties page 1 Use of profile likelihood to determine systematic uncertainties ATLAS Top.
8. Statistical tests 8.1 Hypotheses K. Desch – Statistical methods of data analysis SS10 Frequent problem: Decision making based on statistical information.
G. Cowan RHUL Physics Comment on use of LR for limits page 1 Comment on definition of likelihood ratio for limits ATLAS Statistics Forum CERN, 2 September,
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
G. Cowan Lectures on Statistical Data Analysis Lecture 12 page 1 Statistical Data Analysis: Lecture 12 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan 2011 CERN Summer Student Lectures on Statistics / Lecture 41 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability.
G. Cowan Statistics for HEP / NIKHEF, December 2011 / Lecture 2 1 Statistical Methods for Particle Physics Lecture 2: Tests based on likelihood ratios.
G. Cowan Discovery and limits / DESY, 4-7 October 2011 / Lecture 2 1 Statistical Methods for Discovery and Limits Lecture 2: Tests based on likelihood.
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
G. Cowan RHUL Physics Bayesian Higgs combination page 1 Bayesian Higgs combination using shapes ATLAS Statistics Meeting CERN, 19 December, 2007 Glen Cowan.
G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan 2009 CERN Summer Student Lectures on Statistics1 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability densities,
G. Cowan Statistical Methods in Particle Physics1 Statistical Methods in Particle Physics Day 3: Multivariate Methods (II) 清华大学高能物理研究中心 2010 年 4 月 12—16.
Statistics In HEP Helge VossHadron Collider Physics Summer School June 8-17, 2011― Statistics in HEP 1 How do we understand/interpret our measurements.
G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 21 Topics in Statistical Data Analysis for HEP Lecture 2: Statistical Tests CERN.
G. Cowan RHUL Physics page 1 Status of search procedures for ATLAS ATLAS-CMS Joint Statistics Meeting CERN, 15 October, 2009 Glen Cowan Physics Department.
G. Cowan, RHUL Physics Discussion on significance page 1 Discussion on significance ATLAS Statistics Forum CERN/Phone, 2 December, 2009 Glen Cowan Physics.
G. Cowan RHUL Physics LR test to determine number of parameters page 1 Likelihood ratio test to determine best number of parameters ATLAS Statistics Forum.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
G. Cowan Lectures on Statistical Data Analysis Lecture 8 page 1 Statistical Data Analysis: Lecture 8 1Probability, Bayes’ theorem 2Random variables and.
1 Introduction to Statistics − Day 3 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
G. Cowan Terascale Statistics School 2015 / Statistics for Run II1 Statistics for LHC Run II Terascale Statistics School DESY, Hamburg March 23-27, 2015.
G. Cowan RHUL Physics Status of Higgs combination page 1 Status of Higgs Combination ATLAS Higgs Meeting CERN/phone, 7 November, 2008 Glen Cowan, RHUL.
G. Cowan Lectures on Statistical Data Analysis Lecture 6 page 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Statistical methods for HEP / Freiburg June 2011 / Lecture 2 1 Statistical Methods for Discovery and Limits in HEP Experiments Day 2: Discovery.
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 21 Statistics for the LHC Lecture 2: Discovery Academic Training Lectures CERN,
G. Cowan SLAC Statistics Meeting / 4-6 June 2012 / Two Developments 1 Two developments in discovery tests: use of weighted Monte Carlo events and an improved.
G. Cowan CERN Academic Training 2012 / Statistics for HEP / Lecture 21 Statistics for HEP Lecture 2: Discovery and Limits Academic Training Lectures CERN,
Statistics for HEP Lecture 1: Introduction and basic formalism
Discussion on significance
Status of the Higgs to tau tau
Computing and Statistical Data Analysis / Stat 11
iSTEP 2016 Tsinghua University, Beijing July 10-20, 2016
Tutorial on Statistics TRISEP School 27, 28 June 2016 Glen Cowan
Estimating Statistical Significance
Statistics for the LHC Lecture 3: Setting limits
Some Statistical Tools for Particle Physics
LAL Orsay, 2016 / Lectures on Statistics
Tutorial on Multivariate Methods (TMVA)
ISTEP 2016 Final Project— Project on
TAE 2017 Centro de ciencias Pedro Pascual Benasque, Spain
School on Data Science in (Astro)particle Physics
Statistical Methods for Particle Physics Lecture 2: further topics
Computing and Statistical Data Analysis / Stat 8
Statistical Methods for the LHC Discovering New Physics
Lectures on Statistics TRISEP School 27, 28 June 2016 Glen Cowan
Project on H →ττ and multivariate methods
Statistical Treatment of Data in HEP
TAE 2018 / Statistics Lecture 3
Statistics for Particle Physics Lecture 2: Statistical Tests
Computing and Statistical Data Analysis Stat 5: Multivariate Methods
Statistical Methods for the LHC
TAE 2018 Benasque, Spain 3-15 Sept 2018 Glen Cowan Physics Department
Computing and Statistical Data Analysis / Stat 6
Some Prospects for Statistical Data Analysis in High Energy Physics
Computing and Statistical Data Analysis / Stat 7
TRISEP 2016 / Statistics Lecture 1
TAE 2017 / Statistics Lecture 2
TRISEP 2016 / Statistics Lecture 2
Statistical Methods for HEP Lecture 3: Discovery and Limits
Statistical Analysis Tools for Particle Physics
Application of CLs Method to ATLAS Higgs Searches
iSTEP 2016 Tsinghua University, Beijing July 10-20, 2016
Decomposition of Stat/Sys Errors
TAE 2018 Centro de ciencias Pedro Pascual Benasque, Spain
Introduction to Statistics − Day 4
Computing and Statistical Data Analysis / Stat 10
Presentation transcript:

Comment on Event Quality Variables for Multivariate Analyses ATLAS Machine Learning Workshop CERN, 29-31 March 2016 indico.cern.ch/event/483999/ Glen Cowan Physics Department Royal Holloway, University of London www.pp.rhul.ac.uk/~cowan g.cowan@rhul.ac.uk G. Cowan ATLAS Machine Learning / Event Quality Variables

Outline First comment is to emphasize that variables related to the quality of an event’s reconstruction can help in a multivariate analysis, even if that variable by itself gives no discrimination between event types. See also Ben Sowden, 10 Sep 2015 Stat. Forum talk Second comment (if there is time) is on use of the distribution of a classifier output in a search. G. Cowan ATLAS Machine Learning / Event Quality Variables

A simple example (2D) Consider two variables, x1 and x2, and suppose we have formulas for the joint pdfs for both signal (s) and background (b) events (in real problems the formulas are usually notavailable). f(x1|x2) ~ Gaussian, different means for s/b, Gaussians have same σ, which depends on x2, f(x2) ~ exponential, same for both s and b, f(x1, x2) = f(x1|x2) f(x2): G. Cowan ATLAS Machine Learning / Event Quality Variables

Joint and marginal distributions of x1, x2 background signal Distribution f(x2) same for s, b. So does x2 help discriminate between the two event types? G. Cowan ATLAS Machine Learning / Event Quality Variables

Likelihood ratio for 2D example Neyman-Pearson lemma says best critical region for classification is determined by the likelihood ratio: Equivalently we can use any monotonic function of this as a test statistic, e.g., Boundary of optimal critical region will be curve of constant ln t, and this depends on x2! G. Cowan ATLAS Machine Learning / Event Quality Variables

Contours of constant MVA output Exact likelihood ratio Fisher discriminant G. Cowan ATLAS Machine Learning / Event Quality Variables

Contours of constant MVA output Multilayer Perceptron 1 hidden layer with 2 nodes Boosted Decision Tree 200 iterations (AdaBoost) Training samples: 105 signal and 105 background events G. Cowan ATLAS Machine Learning / Event Quality Variables

ROC curve ROC = “receiver operating characteristic” (term from signal processing). Shows (usually) background rejection (1-εb) versus signal efficiency εs. Higher curve is better; usually analysis focused on a small part of the curve. G. Cowan ATLAS Machine Learning / Event Quality Variables

2D Example: discussion Even though the distribution of x2 is same for signal and background, x1 and x2 are not independent, so using x2 as an input variable helps. Here we can understand why: high values of x2 correspond to a smaller σ for the Gaussian of x1. So high x2 means that the value of x1 was well measured. If we don’t consider x2, then all of the x1 measurements are lumped together. Those with large σ (low x2) “pollute” the well measured events with low σ (high x2). Often in HEP there may be variables that are characteristic of how well measured an event is (region of detector, number of pile-up vertices,...). Including these variables in a multivariate analysis preserves the information carried by the well-measured events, leading to improved performance. In this example we can understand why x2 is useful, even though both signal and background have same pdf for x2. G. Cowan ATLAS Machine Learning / Event Quality Variables

Comment on use of classifier output Often start with input variables x, construct a classifier y(x) using MC samples of (s/b) training data and then construct a statistic based on the distribution of y to test values of a parameter μ proportional to rate of signal process (e.g., μ =1 is background only, μ =1 is nominal signal model). E.g., from a histogram of y with N bins, entries (n1,..., nN), construct a Poisson likelihood L(μ) = P(n1,..., nN| μ) with mean value in bin i, E[ni|μ] = μsi + bi . If there are nuisance parameters ν corresponding to systematic uncertainties, then s → s(ν), b →b(ν), the likelihood becomes L(μ,ν) and one uses e.g. profiling or Bayesian marginalization. If one wants to exploit only the shape of the distribution of y and not the absolute numbers of events found, use Multinomial model (examples follow). G. Cowan ATLAS Machine Learning / Event Quality Variables

Example with ttH (Run I) Background: inclusive tt from Powheg + Pythia8 Signal: ttH Pythia8, mH = 125 GeV After initial selection (4 b jets, 2 leptons), for 20 fb-1: stot = 4.24 btot = 124.4 So without any further cuts, naively one has discovery sensitivity: s/√b= 4.24/√124.4 = 0.38 Then define10 variables for input variables to train classifier (shown on next slides; details not important here). G. Cowan ATLAS Machine Learning / Event Quality Variables

Distributions of input variables G. Cowan ATLAS Machine Learning / Event Quality Variables

Distributions of input variables G. Cowan ATLAS Machine Learning / Event Quality Variables

Distribution of MVA output (BDT) G. Cowan ATLAS Machine Learning / Event Quality Variables

Likelihood ratio statistic for discovery test In bin i of test statistic t, expected numbers of signal/background: Likelihood function for strength parameter μ with data n1,..., nN Statistic for test of μ = 0: (Asimov Paper: CCGV EPJC 71 (2011) 1554; arXiv:1007.1727) G. Cowan ATLAS Machine Learning / Event Quality Variables

Background-only distribution of q0 For background-only (μ = 0) toy MC, generate ni ~ Poisson(bi). Large-sample asymptotic formula is “half-chi-square”. G. Cowan ATLAS Machine Learning / Event Quality Variables

Discovery sensitivity Good agreement between toy MC and large-sample formulae, so OK to use asymptotic formula for significance Z, Median significance of test of background-only hypothesis under assumption of signal+background from “Asimov data set”: gives med[Z|s+b] = 0.59 (BDT) med[Z|s+b] = 0.46 (Fisher) (recall s/√b = 0.38) G. Cowan ATLAS Machine Learning / Event Quality Variables

Multinomial model Nominal ATLAS analysis uses the information from the distribution of the MVA ouput (“shape information”). No info is taken from the total observed number of events (presumably because the systematic uncertainty on b is large). This corresponds to using a multinomial model for the observed histogram of MVA output values: G. Cowan ATLAS Machine Learning / Event Quality Variables

Multinomial model results The multinomial model gives discovery sensitivities: med[Z|s+b] = 0.45 (BDT) med[Z|s+b] = 0.27 (Fisher) Check (recall s/√b = 0.38): √(0.382 + 0.452) = 0.59 √(0.382 + 0.272) = 0.47 I.e. the s/√b = 0.38 represents the contribution from event counting, the second term from the shape information. G. Cowan ATLAS Machine Learning / Event Quality Variables

Include variables related to event quality If the number of pile-up-vertices is included as a variable, then med[Z|s+b] = 0.447 → 0.458 (multinomial, BDT) i.e., not much help in this case. Including the Higgs mass resolution for individual events med[Z|s+b] = 0.458 → 0.530 (multinomial, BDT) So there does seem to be some scope for improvement. Perhaps even better if jet resolution includes more info, e.g, EM fraction, presence of leptons, ... G. Cowan ATLAS Machine Learning / Event Quality Variables

Extra slides G. Cowan ATLAS Machine Learning / Event Quality Variables

Comment on systematics If the training data are imperfect, the statistic based on y(x) will not give an optimal test (not maximum power wrt μ > 1 for test of μ = 0). Once the function y(x) is fixed, the question of systematics boils down to the uncertainty in distribution of y under the different hypotheses, e.g., p(y|μ=0) for a test of background-only hyp. If e.g. p(y|μ=0) is well determined using a data control sample, imperfections in the MC used to design y(x) do not lead to any further systematic error (only sub-optimality of test). G. Cowan ATLAS Machine Learning / Event Quality Variables

Distribution of MVA output (Fisher) G. Cowan ATLAS Machine Learning / Event Quality Variables

Background-only cumulative distribution of q0 p-value is probability, assuming μ = 0, to find q0 even higher than the one observed (one minus cumulative distribution). From p-value, equivalent significance: Z = Φ-1(1 – p) Φ-1 = standard normal quantile G. Cowan ATLAS Machine Learning / Event Quality Variables