G. Cowan, Lectures on Statistical Data Analysis
Statistical Data Analysis: Lecture 10


Statistical Data Analysis: Lecture 10
1. Probability, Bayes' theorem
2. Random variables and probability densities
3. Expectation values, error propagation
4. Catalogue of pdfs
5. The Monte Carlo method
6. Statistical tests: general concepts
7. Test statistics, multivariate methods
8. Goodness-of-fit tests
9. Parameter estimation, maximum likelihood
10. More maximum likelihood
11. Method of least squares
12. Interval estimation, setting limits
13. Nuisance parameters, systematic uncertainties
14. Examples of Bayesian approach

Information inequality for n parameters

Suppose we have estimated n parameters $\theta = (\theta_1, \ldots, \theta_n)$. The (inverse) minimum variance bound is given by the Fisher information matrix:

$$I_{ij}(\theta) = -E\!\left[ \frac{\partial^2 \ln L}{\partial \theta_i \,\partial \theta_j} \right].$$

The information inequality then states that $V - I^{-1}$ is a positive semi-definite matrix, where $V_{ij} = \mathrm{cov}[\hat{\theta}_i, \hat{\theta}_j]$ is the covariance matrix of the estimators. In particular, for the diagonal elements,

$$V[\hat{\theta}_i] \ge (I^{-1})_{ii}.$$

One often uses $I^{-1}$ as an approximation for the covariance matrix, estimated using e.g. the matrix of second derivatives of $\ln L$ at its maximum.
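As an illustration (a sketch, not part of the original slides), the approximation just mentioned — taking the inverse of the matrix of second derivatives of $-\ln L$ at its maximum as the covariance matrix — can be checked in Python for a Gaussian sample, where the exact large-sample answers are $V[\hat{\mu}] = \sigma^2/n$ and $V[\hat{\sigma}] \approx \sigma^2/(2n)$. The sample size, seed, and finite-difference step are arbitrary choices:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=5000)   # toy Gaussian data, mu=2, sigma=1.5

def nll(p):
    # negative log-likelihood for Gaussian data (additive constants dropped)
    mu, sigma = p
    return 0.5 * np.sum(((x - mu) / sigma) ** 2) + len(x) * np.log(sigma)

# start at the sample mean/std (the analytic ML estimates) and polish numerically
res = minimize(nll, x0=[x.mean(), x.std()], method="Nelder-Mead")

def hessian(f, p, eps=1e-4):
    """Numerical matrix of second derivatives of f at p (central differences)."""
    p = np.asarray(p, float)
    m = len(p)
    H = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            pp = p.copy(); pp[i] += eps; pp[j] += eps
            pm = p.copy(); pm[i] += eps; pm[j] -= eps
            mp = p.copy(); mp[i] -= eps; mp[j] += eps
            mm = p.copy(); mm[i] -= eps; mm[j] -= eps
            H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4 * eps**2)
    return H

# approximate covariance matrix of the estimators: V ≈ (Fisher information)^-1
V = np.linalg.inv(hessian(nll, res.x))
```

The diagonal elements of `V` come out close to the analytic values $\sigma^2/n$ and $\sigma^2/(2n)$.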

Example of ML with 2 parameters

Consider a scattering angle distribution with $x = \cos\theta$:

$$f(x; \alpha, \beta) = \frac{1 + \alpha x + \beta x^2}{2 + 2\beta/3},$$

or, if $x_{\min} < x < x_{\max}$, we always need to normalize so that

$$\int_{x_{\min}}^{x_{\max}} f(x; \alpha, \beta)\, dx = 1.$$

Example: $\alpha = 0.5$, $\beta = 0.5$, $x_{\min} = -0.95$, $x_{\max} = 0.95$; generate $n = 2000$ events with Monte Carlo.
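A minimal Monte Carlo generator for this pdf can be sketched with the accept-reject method (an assumed implementation choice; the slides do not specify one). The normalization cancels, so only the shape $1 + \alpha x + \beta x^2$ is needed:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta = 0.5, 0.5
xmin, xmax = -0.95, 0.95

def shape(x):
    # unnormalized pdf 1 + alpha*x + beta*x^2 (normalization cancels in accept-reject)
    return 1.0 + alpha * x + beta * x**2

fmax = shape(xmax)   # for beta > 0 the shape is largest at an endpoint; here at x = xmax
events = []
while len(events) < 2000:
    x = rng.uniform(xmin, xmax)
    if rng.uniform(0.0, fmax) < shape(x):   # accept with probability shape(x)/fmax
        events.append(x)
events = np.array(events)
```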

Example of ML with 2 parameters: fit result

Finding the maximum of $\ln L(\alpha, \beta)$ numerically (with MINUIT) gives the estimates $\hat{\alpha}$, $\hat{\beta}$, with (co)variances from the MINUIT routine HESSE. N.B. there is no binning of the data for the fit, but one can compare to a histogram for goodness of fit (e.g. 'visual' or $\chi^2$).
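The whole fit can be sketched in Python, using scipy.optimize as a stand-in for MINUIT (seed, starting values, and convergence settings are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
xmin, xmax = -0.95, 0.95
alpha_true, beta_true = 0.5, 0.5

# generate n = 2000 events from f(x) ∝ 1 + alpha*x + beta*x^2 via accept-reject
fmax = max(1 + alpha_true*t + beta_true*t**2 for t in (xmin, xmax))
events = []
while len(events) < 2000:
    t = rng.uniform(xmin, xmax)
    if rng.uniform(0.0, fmax) < 1 + alpha_true*t + beta_true*t**2:
        events.append(t)
x = np.array(events)

def nll(p):
    # unbinned -ln L, with the pdf normalized on (xmin, xmax)
    a, b = p
    norm = (xmax - xmin) + a*(xmax**2 - xmin**2)/2 + b*(xmax**3 - xmin**3)/3
    f = 1 + a*x + b*x**2
    if norm <= 0 or np.any(f <= 0):
        return np.inf        # reject parameter points where the pdf is invalid
    return -np.sum(np.log(f / norm))

res = minimize(nll, x0=[0.0, 0.0], method="Nelder-Mead")
alpha_hat, beta_hat = res.x
```

The estimates come out near the true values 0.5, 0.5, with $\hat{\beta}$ carrying the larger statistical error.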

Two-parameter fit: MC study

Repeat the ML fit with 500 experiments, all with n = 2000 events: the estimates average to approximately the true values, the (co)variances are close to the previous estimates, and the marginal pdfs of the estimators are approximately Gaussian.

The $\ln L_{\max} - 1/2$ contour

For large n, $\ln L$ takes on a quadratic form near its maximum:

$$\ln L(\alpha, \beta) \approx \ln L_{\max} - \frac{1}{2(1-\rho^2)} \left[ \left(\frac{\alpha - \hat{\alpha}}{\sigma_{\hat{\alpha}}}\right)^{\!2} + \left(\frac{\beta - \hat{\beta}}{\sigma_{\hat{\beta}}}\right)^{\!2} - 2\rho \left(\frac{\alpha - \hat{\alpha}}{\sigma_{\hat{\alpha}}}\right)\!\left(\frac{\beta - \hat{\beta}}{\sigma_{\hat{\beta}}}\right) \right],$$

where $\rho$ is the correlation coefficient of the estimators. The contour $\ln L = \ln L_{\max} - 1/2$ is therefore an ellipse.

(Co)variances from the $\ln L$ contour

In the $(\alpha, \beta)$ plane for the first MC data set: tangent lines to the contour give the standard deviations $\sigma_{\hat{\alpha}}$, $\sigma_{\hat{\beta}}$, and the angle of the ellipse $\phi$ is related to the correlation. Correlations between estimators result in an increase in their standard deviations (statistical errors).
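The claim that the tangent lines give the standard deviations can be verified numerically from the quadratic form of $\ln L$: the full extent of the $\Delta\ln L = -1/2$ contour along each axis equals that parameter's standard deviation. The numbers below ($\sigma_{\hat\alpha}$, $\sigma_{\hat\beta}$, $\rho$) are made-up illustrative values, not the fit results:

```python
import numpy as np

sa, sb, rho = 0.05, 0.11, -0.40     # assumed standard deviations and correlation
ahat, bhat = 0.5, 0.5

def Q(da, db):
    # quadratic form: 2*(ln L_max - ln L) for a Gaussian-shaped likelihood
    u, v = da / sa, db / sb
    return (u*u + v*v - 2*rho*u*v) / (1 - rho**2)

# points on the ln L = ln L_max - 1/2 contour (Q = 1), obtained by scaling rays:
# Q is a homogeneous quadratic, so Q(t*d) = t^2 * Q(d) and t = 1/sqrt(Q(d))
phi = np.linspace(0, 2*np.pi, 2000)
d = np.column_stack([np.cos(phi), np.sin(phi)])
t = 1.0 / np.sqrt(Q(d[:, 0], d[:, 1]))
contour = np.array([ahat, bhat]) + t[:, None] * d

# tangent lines: the contour's full extent in each parameter is ± its std. dev.
da_max = np.max(np.abs(contour[:, 0] - ahat))
db_max = np.max(np.abs(contour[:, 1] - bhat))
```

`da_max` and `db_max` reproduce `sa` and `sb` to within the angular sampling resolution.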

Extended ML

Sometimes we regard n not as fixed, but as a Poisson random variable with mean $\nu$. The result of the experiment is then defined as: $n, x_1, \ldots, x_n$. The (extended) likelihood function is:

$$L(\nu, \theta) = \frac{\nu^n}{n!} e^{-\nu} \prod_{i=1}^{n} f(x_i; \theta).$$

Suppose the theory gives $\nu = \nu(\theta)$; then the log-likelihood is

$$\ln L(\theta) = -\nu(\theta) + \sum_{i=1}^{n} \ln\left[ \nu(\theta)\, f(x_i; \theta) \right] + C,$$

where C represents terms not depending on $\theta$.

Extended ML (2)

Extended ML uses more information → smaller errors for $\hat{\theta}$. Example: the expected number of events $\nu$ is proportional to the total cross section $\sigma(\theta)$, which is predicted as a function of the parameters of a theory, as is the distribution of the variable x. If $\nu$ does not depend on $\theta$ but remains a free parameter, extended ML gives $\hat{\nu} = n$ and the same $\hat{\theta}$ as ordinary ML. Important e.g. for anomalous couplings in $e^+e^- \to W^+W^-$.
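When $\nu$ is left as a free parameter, maximizing the extended likelihood returns $\hat{\nu} = n$ (the Poisson factor decouples from the shape parameters). A numerical check of this, with an exponential pdf as an assumed stand-in model and arbitrary toy data:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.exponential(2.0, size=150)   # toy data, exponential with true mean tau = 2
n = len(x)

def enll(p):
    # extended -ln L = nu - n*ln(nu) - sum_i ln f(x_i; tau),  f = (1/tau) exp(-x/tau)
    nu, tau = p
    if nu <= 0 or tau <= 0:
        return np.inf
    return nu - n*np.log(nu) + n*np.log(tau) + np.sum(x)/tau

res = minimize(enll, x0=[100.0, 1.0], method="Nelder-Mead",
               options={"maxiter": 2000})
nu_hat, tau_hat = res.x
```

The fit reproduces $\hat{\nu} = n$ and $\hat{\tau}$ equal to the ordinary (unextended) ML estimate, the sample mean.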

Extended ML example

Consider two types of events (e.g., signal and background), each of which predicts a given pdf for the variable x: $f_s(x)$ and $f_b(x)$. We observe a mixture of the two event types, with signal fraction $\theta$, expected total number $\nu$, observed total number n. Let

$$\mu_s = \theta\nu, \qquad \mu_b = (1-\theta)\nu;$$

the goal is to estimate $\mu_s$, $\mu_b$. The extended log-likelihood is then

$$\ln L(\mu_s, \mu_b) = -(\mu_s + \mu_b) + \sum_{i=1}^{n} \ln\left[ \mu_s f_s(x_i) + \mu_b f_b(x_i) \right] + C.$$

Extended ML example (2)

Maximize the log-likelihood in terms of $\mu_s$ and $\mu_b$. Monte Carlo example with a combination of an exponential (background) and a Gaussian (signal): here the errors reflect the total Poisson fluctuation as well as that in the proportion of signal/background.
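A Python sketch of such a fit (Gaussian signal on an exponential background; all numbers are illustrative assumptions, not the values used in the lecture's Monte Carlo). A useful property to check: at an interior maximum, $\hat{\mu}_s + \hat{\mu}_b = n$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, expon

rng = np.random.default_rng(5)
# toy data: Gaussian signal peak at x = 5 on an exponential background
x = np.concatenate([rng.normal(5.0, 0.5, size=rng.poisson(60)),
                    rng.exponential(3.0, size=rng.poisson(300))])

def f_s(v): return norm.pdf(v, loc=5.0, scale=0.5)   # signal pdf (fixed shape)
def f_b(v): return expon.pdf(v, scale=3.0)           # background pdf (fixed shape)

def enll(p):
    # extended -ln L = (mu_s + mu_b) - sum_i ln[mu_s f_s(x_i) + mu_b f_b(x_i)]
    mu_s, mu_b = p
    dens = mu_s * f_s(x) + mu_b * f_b(x)
    if mu_b <= 0 or np.any(dens <= 0):
        return np.inf    # mu_s may go negative only while the total pdf stays positive
    return (mu_s + mu_b) - np.sum(np.log(dens))

res = minimize(enll, x0=[30.0, 250.0], method="Nelder-Mead",
               options={"maxiter": 2000})
mu_s_hat, mu_b_hat = res.x
```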

Extended ML example: an unphysical estimate

A downwards fluctuation of the data in the peak region can lead to even fewer events than would be expected from background alone. The estimate for $\mu_s$ is here pushed negative (unphysical). We can let this happen as long as the (total) pdf stays positive everywhere.

Unphysical estimators (2)

Repeat the entire MC experiment many times, allowing unphysical estimates: here the unphysical estimator is unbiased and should nevertheless be reported, since the average of a large number of unbiased estimates converges to the true value (cf. PDG).

ML with binned data

Often we put the data into a histogram: $\mathbf{n} = (n_1, \ldots, n_N)$, with $n_{\mathrm{tot}} = \sum_{i=1}^{N} n_i$. The hypothesis is $E[n_i] = \nu_i(\theta)$, where

$$\nu_i(\theta) = n_{\mathrm{tot}} \int_{\Delta x_i} f(x; \theta)\, dx$$

is the expected content of bin i. If we model the data as multinomial ($n_{\mathrm{tot}}$ constant), then the log-likelihood function is:

$$\ln L(\theta) = \sum_{i=1}^{N} n_i \ln \nu_i(\theta) + C,$$

where C does not depend on $\theta$.
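A sketch of a binned multinomial fit for the mean $\tau$ of an exponential pdf (bin edges, sample size, and seed are arbitrary illustrative choices; the pdf is normalized on the histogram range so events beyond it are simply not counted):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(11)
data = rng.exponential(2.0, size=1000)       # toy data, true tau = 2
edges = np.linspace(0.0, 10.0, 21)           # 20 bins on [0, 10]
n_i, _ = np.histogram(data, bins=edges)
n_tot = n_i.sum()

def nu(tau):
    # expected bin contents for an exponential pdf truncated to [0, 10]
    cdf = 1.0 - np.exp(-edges / tau)
    p = np.diff(cdf) / cdf[-1]               # bin probabilities, normalized on [0, 10]
    return n_tot * p

def nll(tau):
    # multinomial log-likelihood up to a constant: -sum_i n_i ln nu_i(tau)
    return -np.sum(n_i * np.log(nu(tau)))

res = minimize_scalar(nll, bounds=(0.5, 5.0), method="bounded")
tau_hat = res.x
```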

ML example with binned data

Previous example with the exponential, now with the data in a histogram. In the limit of zero bin width this reproduces the usual unbinned ML. If the $n_i$ are treated as independent Poisson variables, we get the extended log-likelihood:

$$\ln L(\nu_{\mathrm{tot}}, \theta) = -\nu_{\mathrm{tot}} + \sum_{i=1}^{N} n_i \ln \nu_i(\nu_{\mathrm{tot}}, \theta) + C.$$
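The Poisson-bins (extended) version can be sketched the same way; as in the unbinned extended case, the fitted total comes out equal to the observed total, $\hat{\nu}_{\mathrm{tot}} = \sum_i n_i$ (all numerical choices below are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(13)
data = rng.exponential(2.0, size=rng.poisson(1000))   # Poisson-fluctuating sample size
edges = np.linspace(0.0, 10.0, 21)
n_i, _ = np.histogram(data, bins=edges)

def enll(p):
    # Poisson-bins (extended) log-likelihood: -ln L = nu_tot - sum_i n_i ln nu_i
    nu_tot, tau = p
    if nu_tot <= 0 or tau <= 0:
        return np.inf
    cdf = 1.0 - np.exp(-edges / tau)
    nu_i = nu_tot * np.diff(cdf) / cdf[-1]    # expected contents, shape normalized on [0, 10]
    return nu_tot - np.sum(n_i * np.log(nu_i))

res = minimize(enll, x0=[800.0, 1.0], method="Nelder-Mead",
               options={"maxiter": 2000})
nu_tot_hat, tau_hat = res.x
```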

Relationship between ML and Bayesian estimators

In Bayesian statistics, both $\theta$ and x are random variables. Recall the Bayesian method: use subjective probability for hypotheses ($\theta$); before the experiment, knowledge is summarized by the prior pdf $\pi(\theta)$; use Bayes' theorem to update the prior in light of the data:

$$p(\theta \,|\, x) = \frac{L(x \,|\, \theta)\, \pi(\theta)}{\int L(x \,|\, \theta')\, \pi(\theta')\, d\theta'},$$

where $p(\theta\,|\,x)$ is the posterior pdf (the conditional pdf for $\theta$ given x).
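This updating step can be computed directly on a grid. A sketch for the mean $\tau$ of an exponential pdf with a flat prior (data, grid, and seed are arbitrary choices); with the flat prior, the posterior mode coincides with the ML estimate (the sample mean):

```python
import numpy as np

rng = np.random.default_rng(21)
x = rng.exponential(2.0, size=50)            # toy data, true tau = 2

tau = np.linspace(0.5, 6.0, 2001)            # grid of parameter values
log_like = -len(x)*np.log(tau) - x.sum()/tau # ln L(x|tau), constants dropped
prior = np.ones_like(tau)                    # flat prior pi(tau) = const
post = np.exp(log_like - log_like.max()) * prior   # numerator of Bayes' theorem
post /= post.sum() * (tau[1] - tau[0])             # normalize (the denominator)
tau_mode = tau[np.argmax(post)]                    # posterior mode as estimator
```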

ML and Bayesian estimators (2)

Purist Bayesian: $p(\theta\,|\,x)$ contains all knowledge about $\theta$. Pragmatist Bayesian: $p(\theta\,|\,x)$ could be a complicated function, so summarize it using an estimator $\hat{\theta}$: take the mode of $p(\theta\,|\,x)$ (one could also use e.g. the expectation value). What do we use for $\pi(\theta)$? There is no golden rule (it is subjective!); one often represents 'prior ignorance' by $\pi(\theta) = $ constant, in which case $p(\theta\,|\,x) \propto L(x\,|\,\theta)$ and the posterior mode equals the ML estimator. But we could have used a different parameter, e.g., $\lambda = 1/\theta$, and if the prior $\pi_\theta(\theta)$ is constant, then $\pi_\lambda(\lambda)$ is not! 'Complete prior ignorance' is not well defined.
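The non-invariance of a flat prior can be seen numerically. For the exponential mean $\tau$, a prior flat in $\lambda = 1/\tau$ transforms to $\pi(\tau) \propto 1/\tau^2$, which shifts the posterior mode from $s/n$ to $s/(n+2)$, where $s = \sum_i x_i$ (an illustrative sketch with arbitrary toy data):

```python
import numpy as np

rng = np.random.default_rng(22)
x = rng.exponential(2.0, size=20)
n, s = len(x), x.sum()

tau = np.linspace(0.2, 8.0, 4001)
log_like = -n*np.log(tau) - s/tau                   # exponential likelihood in tau

post_flat_tau = np.exp(log_like - log_like.max())   # prior flat in tau
post_flat_lam = post_flat_tau / tau**2              # prior flat in lambda = 1/tau

mode_flat_tau = tau[np.argmax(post_flat_tau)]       # ≈ s/n (the sample mean)
mode_flat_lam = tau[np.argmax(post_flat_lam)]       # ≈ s/(n+2), noticeably smaller
```

The same data thus give different 'ignorance' answers depending on the chosen parameterization.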

Wrapping up lecture 10

We have now seen several examples of the method of maximum likelihood:
- the multiparameter case;
- variable sample size (extended ML);
- histogram-based data;

and we have seen the connection between ML and Bayesian parameter estimation. Next we will consider a special case of ML with Gaussian data and show how this leads to the method of least squares.