Statistical Data Analysis: Lecture 6 (G. Cowan)

1. Probability, Bayes’ theorem
2. Random variables and probability densities
3. Expectation values, error propagation
4. Catalogue of pdfs
5. The Monte Carlo method
6. Statistical tests: general concepts
7. Test statistics, multivariate methods
8. Goodness-of-fit tests
9. Parameter estimation, maximum likelihood
10. More maximum likelihood
11. Method of least squares
12. Interval estimation, setting limits
13. Nuisance parameters, systematic uncertainties
14. Examples of Bayesian approach

Example setting for statistical tests: the Large Hadron Collider

Counter-rotating proton beams in a 27 km circumference ring; pp centre-of-mass energy 14 TeV. Detectors at 4 pp collision points:
ATLAS (general purpose)
CMS (general purpose)
LHCb (b physics)
ALICE (heavy ion physics)

The ATLAS detector

2100 physicists, 37 countries, 167 universities/labs
25 m diameter, 46 m length, 7000 tonnes
~10^8 electronic channels

A simulated SUSY event in ATLAS

The event display shows high-pT muons, high-pT jets of hadrons, and missing transverse energy.

Background events

This event from Standard Model ttbar production also has high-pT jets and muons, and some missing transverse energy → it can easily mimic a SUSY event.

Statistical tests (in a particle physics context)

Suppose the result of a measurement for an individual event is a collection of numbers
x1 = number of muons,
x2 = mean pT of jets,
x3 = missing energy, ...
The vector x = (x1, ..., xn) follows some n-dimensional joint pdf, which depends on the type of event produced, i.e., was it a SUSY signal event, a ttbar background event, etc. For each reaction we consider we will have a hypothesis for the pdf of x, e.g., f(x|H0), f(x|H1), etc. We often call H0 the signal hypothesis (the event type we want); H1, H2, ... are background hypotheses.

Selecting events

Suppose we have a data sample with two kinds of events, corresponding to hypotheses H0 and H1, and we want to select those of type H0. Each event is a point in x-space. What ‘decision boundary’ should we use to accept/reject events as belonging to event type H0? Perhaps select events with ‘cuts’ on the individual variables, e.g., xi < ci and xj < cj.

Other ways to select events

Or maybe use some other sort of decision boundary, linear or nonlinear. How can we do this in an ‘optimal’ way? What are the difficulties in a high-dimensional space?

Test statistics

Construct a ‘test statistic’ of lower dimension (e.g. a scalar), t(x1, ..., xn). We can then work out the pdfs g(t|H0), g(t|H1), ... The goal is to compactify the data without losing the ability to discriminate between hypotheses. The decision boundary is now a single ‘cut’ on t. This effectively divides the sample space into two regions, where we accept or reject H0.
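For illustration, a minimal sketch (not from the lecture) of this reduction: toy Gaussian samples stand in for the two event classes, the simple choice t(x) = x1 + x2 stands in for a real test statistic, and the pdfs g(t|H0), g(t|H1) are estimated by histogramming simulated events.

```python
# Toy sketch: reduce each event x = (x1, x2) to a scalar t(x) and
# estimate g(t|H0), g(t|H1) from simulated events (all numbers invented).
import numpy as np

rng = np.random.default_rng(0)
x_h0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(10_000, 2))  # events under H0
x_h1 = rng.normal(loc=[2.0, 1.0], scale=1.0, size=(10_000, 2))  # events under H1

def t(x):
    """A simple (not optimal) scalar test statistic."""
    return x.sum(axis=1)

# Normalized histograms approximate the pdfs g(t|H0) and g(t|H1).
g0, edges = np.histogram(t(x_h0), bins=50, range=(-6.0, 10.0), density=True)
g1, _ = np.histogram(t(x_h1), bins=edges, density=True)
```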

Significance level and power of a test

Probability to reject H0 if it is true (error of the first kind), called the significance level:

α = ∫ g(t|H0) dt, integrated over the rejection region t ≥ t_cut.

Probability to accept H0 if H1 is true (error of the second kind):

β = ∫ g(t|H1) dt, integrated over the acceptance region t < t_cut.

The quantity 1 − β is called the power of the test with respect to H1.
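As a numerical illustration (not from the lecture), assume Gaussian shapes for g(t|H0) and g(t|H1) with invented means and widths; α and β then follow from the tail areas around the cut:

```python
# Illustrative alpha/beta computation for an assumed pair of Gaussian pdfs.
from scipy.stats import norm

t_cut = 1.0                    # accept H0 for t < t_cut
g0 = norm(loc=0.0, scale=1.0)  # g(t|H0), invented shape
g1 = norm(loc=3.0, scale=1.0)  # g(t|H1), invented shape

alpha = g0.sf(t_cut)           # P(t >= t_cut | H0): error of the first kind
beta = g1.cdf(t_cut)           # P(t < t_cut | H1): error of the second kind
print(f"alpha = {alpha:.3f}, power = 1 - beta = {1 - beta:.3f}")
```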

Efficiency of event selection

Probability to accept an event which is signal (signal efficiency):

ε_s = ∫ g(t|s) dt, integrated over the acceptance region t < t_cut.

Probability to accept an event which is background (background efficiency):

ε_b = ∫ g(t|b) dt, integrated over the same region.

Purity of event selection

Suppose there is only one background type b, and the overall fractions of signal and background events are π_s and π_b (prior probabilities). Suppose we select events with t < t_cut. What is the ‘purity’ of our selected sample? Here purity means the probability to be signal given that the event was accepted. Using Bayes’ theorem we find:

P(s | t < t_cut) = π_s ε_s / (π_s ε_s + π_b ε_b).

So the purity depends on the prior probabilities as well as on the signal and background efficiencies.
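A one-line numerical check of this formula (the priors and efficiencies below are made-up numbers) shows how a small prior signal fraction limits the purity even when the efficiencies look favourable:

```python
# Purity of a selected sample via Bayes' theorem (illustrative numbers).
pi_s, pi_b = 0.01, 0.99     # prior fractions of signal and background
eps_s, eps_b = 0.90, 0.05   # signal and background efficiencies

purity = pi_s * eps_s / (pi_s * eps_s + pi_b * eps_b)
print(f"P(signal | accepted) = {purity:.3f}")  # ~0.154 despite eps_s >> eps_b
```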

Constructing a test statistic

How can we select events in an ‘optimal’ way? The Neyman-Pearson lemma (proof in Brandt Ch. 8) states: to get the lowest ε_b for a given ε_s (highest power for a given significance level), choose the acceptance region such that

f(x|H0) / f(x|H1) > c,

where c is a constant which determines ε_s. Equivalently, the optimal scalar test statistic is the likelihood ratio

t(x) = f(x|H0) / f(x|H1).

N.B. any monotonic function of this is just as good.
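As a sketch, if f(x|H0) and f(x|H1) were known Gaussian densities (the parameters below are invented), the Neyman-Pearson statistic would just be their ratio evaluated event by event:

```python
# Likelihood-ratio test statistic t(x) = f(x|H0) / f(x|H1) for two
# hypothetical multivariate Gaussian densities (parameters invented).
import numpy as np
from scipy.stats import multivariate_normal

f0 = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))  # f(x|H0)
f1 = multivariate_normal(mean=[2.0, 1.0], cov=np.eye(2))  # f(x|H1)

def t(x):
    """Optimal scalar statistic (any monotonic function works equally well)."""
    return f0.pdf(x) / f1.pdf(x)

print(t([0.5, 0.5]))  # accept H0 when t(x) > c, with c fixing the efficiency
```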

Purity vs. efficiency: optimal trade-off

Consider selecting n events, with expected numbers s from signal and b from background, so n follows a Poisson distribution with mean s + b. Suppose b is known and the goal is to estimate s with minimum relative statistical error. Take as estimator ŝ = n − b. The variance of a Poisson variable equals its mean, therefore

V[ŝ] = V[n] = s + b → σ_ŝ / s = √(s + b) / s.

So we should maximize s^2 / (s + b), which is equivalent to maximizing the product of signal efficiency × purity.
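A short sketch of this optimization (the yields and pdf shapes are invented): scan the cut value and maximize s/√(s+b), whose square is the s^2/(s+b) above.

```python
# Scan t_cut to maximize s / sqrt(s + b), i.e. minimize the relative
# statistical error on s_hat = n - b.  All numbers are illustrative.
import numpy as np
from scipy.stats import norm

N_s, N_b = 100.0, 10_000.0  # expected signal/background totals before cuts
g_s = norm(0.0, 1.0)        # g(t|s): signal concentrated at low t
g_b = norm(3.0, 1.0)        # g(t|b): background at higher t

t_cut = np.linspace(-2.0, 4.0, 301)
s = N_s * g_s.cdf(t_cut)    # expected signal passing t < t_cut
b = N_b * g_b.cdf(t_cut)    # expected background passing t < t_cut
fom = s / np.sqrt(s + b)    # fom^2 = s^2/(s+b) = N_s x efficiency x purity

print(f"optimal cut near t_cut = {t_cut[np.argmax(fom)]:.2f}")
```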

Why Neyman-Pearson doesn’t always help

The problem is that we usually don’t have explicit formulae for the pdfs f(x|H0), f(x|H1). Instead we may have Monte Carlo models for the signal and background processes, so we can produce simulated data and enter each event into an n-dimensional histogram. Using e.g. M bins for each of the n dimensions gives a total of M^n cells. But n is potentially large, so the number of cells to populate with Monte Carlo data becomes prohibitive (e.g. M = 10 bins and n = 20 variables already give 10^20 cells). Compromise: make an Ansatz for the form of the test statistic with fewer parameters, and determine them (e.g. using MC) to give the best discrimination between signal and background.

Multivariate methods

Many new (and some old) methods:
Fisher discriminant
Neural networks
Kernel density methods
Support Vector Machines
Decision trees
Boosting
Bagging

New software for HEP, e.g.:
TMVA: Höcker, Stelzer, Tegenfeldt, Voss, Voss, physics/
StatPatternRecognition: I. Narsky, physics/

Linear test statistic

Ansatz: t(x) = a1 x1 + ... + an xn = a^T x. Choose the parameters a1, ..., an so that the pdfs g(t|H0), g(t|H1) have maximum ‘separation’: a large distance between the mean values τ0, τ1 and small widths Σ0, Σ1.

→ Fisher: maximize J(a) = (τ0 − τ1)^2 / (Σ0^2 + Σ1^2).

Determining coefficients for maximum separation

We have τ_k = E[t|H_k] = a^T μ_k and Σ_k^2 = V[t|H_k] = a^T V_k a (k = 0, 1), where (μ_k)_i = E[x_i|H_k] and (V_k)_ij = cov[x_i, x_j|H_k] are the mean vector and covariance matrix of x under hypothesis H_k. In terms of the mean and variance of t this becomes

J(a) = (τ0 − τ1)^2 / (Σ0^2 + Σ1^2) = (a^T (μ0 − μ1))^2 / (a^T (V0 + V1) a).

Determining the coefficients (2)

The numerator of J(a) is

(τ0 − τ1)^2 = a^T B a, where B = (μ0 − μ1)(μ0 − μ1)^T is the ‘between classes’ matrix,

and the denominator is

Σ0^2 + Σ1^2 = a^T W a, where W = V0 + V1 is the ‘within classes’ matrix.

→ maximize J(a) = a^T B a / a^T W a.

Fisher discriminant

Setting ∂J/∂a_i = 0 gives Fisher’s linear discriminant function, t(x) = a^T x with

a ∝ W^-1 (μ0 − μ1).

This corresponds to a linear decision boundary between the regions where we accept H0 and H1.
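For illustration, a short sketch that estimates the Fisher coefficients from training samples (toy Gaussian events here, standing in for Monte Carlo data) and evaluates the separation J(a):

```python
# Fisher coefficients a ~ W^-1 (mu0 - mu1) estimated from toy samples.
import numpy as np

rng = np.random.default_rng(1)
cov = [[1.0, 0.5], [0.5, 1.0]]
x0 = rng.multivariate_normal([0.0, 0.0], cov, size=5_000)  # H0 training events
x1 = rng.multivariate_normal([2.0, 1.0], cov, size=5_000)  # H1 training events

mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
W = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)  # 'within classes'
a = np.linalg.solve(W, mu0 - mu1)                        # Fisher coefficients

t0, t1 = x0 @ a, x1 @ a                                  # discriminant values
J = (t0.mean() - t1.mean())**2 / (t0.var() + t1.var())   # separation J(a)
print(f"J(a) = {J:.2f}")
```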

Fisher discriminant: comment on least squares

We obtain equivalent separation between the hypotheses if we multiply the a_i by a common scale factor and add an arbitrary offset a0: t(x) = a0 + a^T x. Thus we can fix the mean values τ0 and τ1 under the null and alternative hypotheses to arbitrary values, e.g., 0 and 1. Then maximizing J(a) is equivalent to minimizing the expected squared deviation of t from these target values, i.e., maximizing Fisher’s J(a) → ‘least squares’. In practice, expectation values are replaced by averages using samples of training data, e.g., from Monte Carlo models.

Fisher discriminant for Gaussian data

Suppose f(x|H_k) is multivariate Gaussian with mean values μ0, μ1 and the same covariance matrix V0 = V1 = V for both hypotheses. We can then write the Fisher discriminant (with an offset) as

t(x) = a0 + (μ0 − μ1)^T V^-1 x.

Then the likelihood ratio becomes

f(x|H0) / f(x|H1) = exp[t(x)] (for a suitable choice of the offset a0).

Fisher discriminant for Gaussian data (2)

That is, the likelihood ratio is a monotonic function of t, so for this case the Fisher discriminant is equivalent to using the likelihood ratio, and thus gives maximum purity for a given efficiency. For non-Gaussian data this no longer holds, but the linear discriminant function may be the simplest practical solution. One often tries to transform the data so as to better approximate a Gaussian before constructing the Fisher discriminant.
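The equivalence is easy to verify numerically. In the sketch below (toy means and covariance), the offset a0 = -1/2 (μ0 + μ1)^T a makes the linear statistic equal the log of the likelihood ratio exactly; this form of a0 is a standard result for equal-covariance Gaussians, not spelled out on the slide.

```python
# Check: for equal covariance matrices, ln[f(x|H0)/f(x|H1)] = a0 + a.x exactly.
import numpy as np
from scipy.stats import multivariate_normal

V = np.array([[1.0, 0.5], [0.5, 2.0]])
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
f0 = multivariate_normal(mu0, V)  # f(x|H0)
f1 = multivariate_normal(mu1, V)  # f(x|H1)

a = np.linalg.solve(V, mu0 - mu1)  # Fisher coefficients V^-1 (mu0 - mu1)
a0 = -0.5 * (mu0 + mu1) @ a        # offset making t(x) = ln of likelihood ratio

x = np.array([0.7, -0.3])
print(np.isclose(a0 + a @ x, f0.logpdf(x) - f1.logpdf(x)))  # True
```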

Fisher discriminant and Gaussian data (3)

Multivariate Gaussian data with equal covariance matrices also gives a simple expression for posterior probabilities, e.g.,

P(H0 | x) = π0 f(x|H0) / (π0 f(x|H0) + π1 f(x|H1)).

For a particular choice of the offset a0 this can be written

P(H0 | x) = 1 / (1 + exp[−t(x)]) = s(t(x)),

which is the logistic sigmoid function. (We will use this later in connection with Neural Networks.)
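A small numerical check of the sigmoid relation (toy Gaussian densities, equal priors assumed; unequal priors would shift the offset a0 by ln(π0/π1)):

```python
# For equal priors, P(H0|x) = f0/(f0 + f1) equals the logistic sigmoid of
# the log likelihood ratio t(x).  Densities below are invented examples.
import numpy as np
from scipy.stats import multivariate_normal

V = [[1.0, 0.5], [0.5, 2.0]]
f0 = multivariate_normal([0.0, 0.0], V)  # f(x|H0)
f1 = multivariate_normal([2.0, 1.0], V)  # f(x|H1)

x = np.array([0.7, -0.3])
t = f0.logpdf(x) - f1.logpdf(x)                  # = a0 + a.x for Gaussian data
posterior = f0.pdf(x) / (f0.pdf(x) + f1.pdf(x))  # Bayes' theorem, pi0 = pi1
print(np.isclose(posterior, 1.0 / (1.0 + np.exp(-t))))  # True
```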

Wrapping up lecture 6

We looked at statistical tests and related issues: how to discriminate between event types (hypotheses) and determine selection efficiency, sample purity, etc. We discussed a method to construct a test statistic using a linear function of the data: the Fisher discriminant. Next we will discuss nonlinear test variables such as neural networks.