Statistical Methods for Particle Physics Lecture 3: parameter estimation www.pp.rhul.ac.uk/~cowan/stat_aachen.html Graduierten-Kolleg RWTH Aachen 10-14 February 2014 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Outline
1 Probability: definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo
2 Statistical tests: general concepts, test statistics, multivariate methods, goodness-of-fit tests
3 Parameter estimation: general concepts, maximum likelihood, variance of estimators, least squares
4 Hypothesis tests for discovery and exclusion: discovery significance, sensitivity, setting limits
5 Further topics: systematic errors, Bayesian methods, MCMC
G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Frequentist parameter estimation Suppose we have a pdf f(x; θ) characterized by one or more parameters, where x is the random variable and θ the parameter. Suppose we have a sample of observed values: x1, ..., xn. We want to find some function of the data to estimate the parameter(s): θ̂(x1, ..., xn) ← estimator, written with a hat. Sometimes we say ‘estimator’ for the function of x1, ..., xn and ‘estimate’ for the value of the estimator with a particular data set. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Properties of estimators If we were to repeat the entire measurement, the estimates from each would follow a pdf (figure: sampling distributions of a ‘best’ estimator, one with large variance, and a biased one). We want small (or zero) bias (systematic error): → average of repeated measurements should tend to true value. And we want a small variance (statistical error). → Small bias and small variance are in general conflicting criteria. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Distribution, likelihood, model Suppose the outcome of a measurement is x. (e.g., a number of events, a histogram, or some larger set of numbers). The probability density (or mass) function or ‘distribution’ of x, which may depend on parameters θ, is: P(x|θ) (Independent variable is x; θ is a constant.) If we evaluate P(x|θ) with the observed data and regard it as a function of the parameter(s), then this is the likelihood: L(θ) = P(x|θ) (Data x fixed; treat L as function of θ.) We will use the term ‘model’ to refer to the full function P(x|θ) that contains the dependence both on x and θ. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Bayesian use of the term ‘likelihood’ We can write Bayes’ theorem as p(θ|x) = L(x|θ) p(θ) / ∫ L(x|θ′) p(θ′) dθ′, where L(x|θ) is the likelihood. It is the probability for x given θ, evaluated with the observed x, and viewed as a function of θ. Bayes’ theorem only needs L(x|θ) evaluated with a given data set (the ‘likelihood principle’). For frequentist methods, in general one needs the full model. For some approximate frequentist methods, the likelihood is enough. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

The likelihood function for i.i.d.* data (* i.i.d. = independent and identically distributed) Consider n independent observations of x: x1, ..., xn, where x follows f(x; θ). The joint pdf for the whole data sample is f(x1, ..., xn; θ) = ∏i f(xi; θ). In this case the likelihood function is (xi constant) L(θ) = ∏i f(xi; θ). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Maximum likelihood The most important frequentist method for constructing estimators is to take the value of the parameter(s) that maximizes the likelihood: θ̂ = argmaxθ L(θ). The resulting estimators are functions of the data and thus characterized by a sampling distribution with a given (co)variance V[θ̂]. In general they may have a nonzero bias: b = E[θ̂] − θ. Under conditions usually satisfied in practice, the bias of ML estimators is zero in the large-sample limit, and the variance is as small as possible for unbiased estimators. In some cases, however, the ML estimator may not be regarded as the optimal trade-off between these criteria (cf. regularized unfolding). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

ML example: parameter of exponential pdf Consider the exponential pdf f(t; τ) = (1/τ) exp(−t/τ), and suppose we have i.i.d. data t1, ..., tn. The likelihood function is L(τ) = ∏i (1/τ) exp(−ti/τ). The value of τ for which L(τ) is maximum also gives the maximum value of its logarithm (the log-likelihood function): ln L(τ) = Σi (−ln τ − ti/τ). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

ML example: parameter of exponential pdf (2) Find its maximum by setting ∂ ln L/∂τ = 0, which gives τ̂ = (1/n) Σi ti. → Monte Carlo test: generate 50 values using τ = 1; from these we find the ML estimate. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
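
As an illustration (not part of the original slides), here is a minimal NumPy sketch of this Monte Carlo test; the random seed and sample size are choices for the example, and the ML estimate uses the analytic result τ̂ = (1/n) Σ ti:

# Minimal sketch: exponential ML example with NumPy.
# For f(t; tau) = (1/tau) exp(-t/tau), setting d ln L / d tau = 0
# gives the analytic ML estimator tau_hat = sample mean of the t_i.
import numpy as np

rng = np.random.default_rng(seed=1)
tau_true = 1.0
t = rng.exponential(scale=tau_true, size=50)   # 50 simulated decay times

tau_hat = t.mean()                             # ML estimate = sample mean
print(f"ML estimate tau_hat = {tau_hat:.3f}")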

Variance of estimators: Monte Carlo method Having estimated our parameter we now need to report its ‘statistical error’, i.e., how widely distributed would estimates be if we were to repeat the entire measurement many times. One way to do this would be to simulate the entire experiment many times with a Monte Carlo program (use ML estimate for MC). For exponential example, from sample variance of estimates we find: Note distribution of estimates is roughly Gaussian − (almost) always true for ML in large sample limit. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
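
A minimal sketch of this Monte Carlo method (the values of τ̂ and n are illustrative, not the slide's numbers): simulate the experiment many times with τ set to the ML estimate and take the sample standard deviation of the resulting estimates.

# Monte Carlo estimate of the standard deviation of tau_hat.
import numpy as np

rng = np.random.default_rng(seed=2)
tau_hat, n = 1.06, 50            # illustrative values; use your own fit result
estimates = [rng.exponential(tau_hat, n).mean() for _ in range(10000)]
print("MC estimate of sigma(tau_hat):", np.std(estimates, ddof=1))
# For the exponential this should come out close to tau_hat / sqrt(n).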

Variance of estimators from information inequality The information inequality (RCF) sets a lower bound on the variance of any estimator (not only ML): V[θ̂] ≥ (1 + ∂b/∂θ)² / E[−∂² ln L/∂θ²] (Minimum Variance Bound, MVB). Often the bias b is small, and equality either holds exactly or is a good approximation (e.g. large data sample limit). Then V[θ̂] ≈ 1 / E[−∂² ln L/∂θ²]. Estimate this using the 2nd derivative of ln L at its maximum: V̂[θ̂] = ( −∂² ln L/∂θ² )⁻¹ evaluated at θ = θ̂. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
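
A small numerical sketch (my own illustration) of estimating the variance from the curvature of ln L at its maximum, again for the exponential example:

# V_hat[tau_hat] = ( -d^2 ln L / d tau^2 )^{-1}, evaluated numerically at tau_hat.
import numpy as np

rng = np.random.default_rng(seed=3)
t = rng.exponential(1.0, 50)

def lnL(tau):
    return np.sum(-np.log(tau) - t / tau)

tau_hat = t.mean()
h = 1e-4
d2 = (lnL(tau_hat + h) - 2 * lnL(tau_hat) + lnL(tau_hat - h)) / h**2
print("sigma(tau_hat) from curvature:", np.sqrt(-1.0 / d2))
# For the exponential this reproduces the analytic result tau_hat / sqrt(n).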

Variance of estimators: graphical method Expand ln L(θ) about its maximum: ln L(θ) = ln L(θ̂) + [∂ ln L/∂θ]θ̂ (θ − θ̂) + ½ [∂² ln L/∂θ²]θ̂ (θ − θ̂)² + ... The first term is ln Lmax, the second term is zero, and for the third term use the information inequality (assume equality): ln L(θ) ≈ ln Lmax − (θ − θ̂)²/(2σ²θ̂), i.e., ln L(θ̂ ± σθ̂) ≈ ln Lmax − 1/2. → To get σθ̂, change θ away from θ̂ until ln L decreases by 1/2. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
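
A sketch of this graphical recipe for the same exponential example (illustrative data, not the slide's): scan ln L(τ) and find the interval where it stays within 1/2 of its maximum.

# Graphical method: interval where ln L >= ln Lmax - 1/2.
import numpy as np

rng = np.random.default_rng(seed=4)
t = rng.exponential(1.0, 50)

def lnL(tau):
    return np.sum(-np.log(tau) - t / tau)

tau_hat = t.mean()
taus = np.linspace(0.5 * tau_hat, 2.0 * tau_hat, 2001)
dlnL = np.array([lnL(tau) for tau in taus]) - lnL(tau_hat)
inside = taus[dlnL >= -0.5]          # region with ln L >= ln Lmax - 1/2
print("tau_hat = %.3f  -%.3f / +%.3f"
      % (tau_hat, tau_hat - inside.min(), inside.max() - tau_hat))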

Example of variance by graphical method ML example with exponential: Not quite parabolic ln L since finite sample size (n = 50). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Functions of ML estimators Suppose we had written the exponential pdf as f(t; λ) = λ exp(−λt), i.e., we use λ = 1/τ. What is the ML estimator for λ? Rewrite the likelihood replacing τ by 1/λ. The λ that maximizes L(λ) is the λ that corresponds to the τ that maximizes L(τ), i.e., λ̂ = 1/τ̂. So for the decay constant we have λ̂ = n / Σi ti. Caveat: λ̂ is biased, even though τ̂ is unbiased. Can show E[λ̂] = λ n/(n − 1) (bias → 0 for n → ∞). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Information inequality for n parameters Suppose we have estimated n parameters θ = (θ1, ..., θn). The (inverse) minimum variance bound is given by the Fisher information matrix: Iij = E[ −∂² ln L/∂θi ∂θj ]. The information inequality then states that V − I⁻¹ is a positive semi-definite matrix, where Vij = cov[θ̂i, θ̂j]. Therefore V[θ̂i] ≥ (I⁻¹)ii. Often one uses I⁻¹ as an approximation for the covariance matrix, estimated using e.g. the matrix of 2nd derivatives of ln L at its maximum. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Two-parameter example of ML Consider a scattering angle distribution with x = cos θ, f(x; α, β) = (1 + αx + βx²) / (2 + 2β/3), −1 ≤ x ≤ 1. Data: x1, ..., xn, n = 2000 events. As a test, generate with MC using α = 0.5, β = 0.5. From the data compute the log-likelihood ln L(α, β) = Σi ln f(xi; α, β). Maximize numerically (e.g., program MINUIT). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
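
The original exercise uses MINUIT; as a hedged substitute, here is a SciPy sketch of the same fit. The pdf form f(x; α, β) = (1 + αx + βx²)/(2 + 2β/3), the accept-reject generation, and the starting values are my assumptions for illustration.

# Two-parameter ML fit with scipy.optimize (a, b play the role of alpha, beta).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=5)
a_true, b_true = 0.5, 0.5

def pdf(x, a, b):
    return (1.0 + a * x + b * x**2) / (2.0 + 2.0 * b / 3.0)

# accept-reject generation of n = 2000 events on [-1, 1]
n_events, xs = 2000, []
fmax = pdf(1.0, a_true, b_true)            # pdf is largest at x = 1 here
while len(xs) < n_events:
    xi, u = rng.uniform(-1.0, 1.0), rng.uniform(0.0, fmax)
    if u < pdf(xi, a_true, b_true):
        xs.append(xi)
x = np.array(xs)

def neg_lnL(p):
    vals = pdf(x, p[0], p[1])
    if np.any(vals <= 0.0):                # keep the pdf positive on [-1, 1]
        return 1e10
    return -np.sum(np.log(vals))

res = minimize(neg_lnL, x0=[0.0, 0.0], method="BFGS")
print("alpha_hat, beta_hat =", res.x)
print("covariance (approx., inverse Hessian):\n", res.hess_inv)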

Example of ML: fit result Finding the maximum of ln L(α, β) numerically (MINUIT) gives the estimates α̂ and β̂ (values shown on the slide). N.B. Here there is no binning of the data for the fit, but one can compare to a histogram for goodness-of-fit (e.g. ‘visual’ or χ²). (Co)variances from MINUIT routine HESSE. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Variance of ML estimators: graphical method Often (e.g., in the large-sample case) one can approximate the covariances using only the likelihood L(θ): the contour ln L(θ) = ln Lmax − 1/2 is approximately an ellipse. This translates into a simple graphical recipe: ML fit result → tangent lines to the contour give the standard deviations; → the angle φ of the ellipse is related to the correlation coefficient. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Variance of ML estimators: MC To find the ML estimate itself one only needs the likelihood L(θ). In principle, to find the covariance of the estimators one requires the full model L(x|θ). E.g., simulate independent data sets many times and look at the distribution of the resulting estimates: G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Extended ML Sometimes we regard n not as fixed, but as a Poisson r.v. with mean ν. The result of the experiment is defined as: n, x1, ..., xn. The (extended) likelihood function is L(ν, θ) = (νⁿ e^−ν / n!) ∏i f(xi; θ). Suppose theory gives ν = ν(θ); then the log-likelihood is ln L(θ) = −ν(θ) + Σi ln[ν(θ) f(xi; θ)] + C, where C represents terms not depending on θ. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Extended ML (2) Example: expected number of events ν = σ(θ) × (integrated luminosity), where the total cross section σ(θ) is predicted as a function of the parameters of a theory, as is the distribution of a variable x. Extended ML uses more information → smaller errors for θ̂. Important e.g. for anomalous couplings in e+e− → W+W−. If ν does not depend on θ but remains a free parameter, extended ML gives ν̂ = n. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Extended ML example Consider two types of events (e.g., signal and background), each of which predicts a given pdf for the variable x: fs(x) and fb(x). We observe a mixture of the two event types, with signal fraction θ, expected total number ν, and observed total number n. Let μs = θν and μb = (1 − θ)ν be the expected numbers of signal and background events; the goal is to estimate μs, μb. → ln L(μs, μb) = −(μs + μb) + Σi ln[μs fs(xi) + μb fb(xi)] + C. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Extended ML example (2) Monte Carlo example with a combination of exponential and Gaussian: maximize the log-likelihood in terms of μs and μb. Here the errors reflect the total Poisson fluctuation as well as that in the proportion of signal/background. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
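
A minimal sketch of such an extended ML fit; the peak position, widths, yields, and the use of SciPy in place of MINUIT are all assumptions for illustration:

# Extended ML: Gaussian signal on an exponential background.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, expon

rng = np.random.default_rng(seed=6)
mu_s_true, mu_b_true = 25.0, 75.0                  # illustrative expected yields
n_s, n_b = rng.poisson(mu_s_true), rng.poisson(mu_b_true)
x = np.concatenate([rng.normal(5.0, 0.5, n_s),     # signal: Gaussian peak
                    rng.exponential(3.0, n_b)])    # background: exponential

fs = lambda v: norm.pdf(v, loc=5.0, scale=0.5)
fb = lambda v: expon.pdf(v, scale=3.0)

def neg_lnL(p):
    mu_s, mu_b = p
    if mu_s < 0.0 or mu_b <= 0.0:                  # keep the total pdf positive
        return 1e10
    return (mu_s + mu_b) - np.sum(np.log(mu_s * fs(x) + mu_b * fb(x)))

res = minimize(neg_lnL, x0=[10.0, 90.0], method="Nelder-Mead")
print("mu_s_hat, mu_b_hat =", res.x)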

ML with binned data Often we put the data into a histogram: n = (n1, ..., nN), with ntot = Σi ni. The hypothesis is E[ni] = νi(θ), where νi(θ) = ntot ∫bin i f(x; θ) dx. If we model the data as multinomial (ntot constant), then the log-likelihood function is ln L(θ) = Σi ni ln νi(θ) + C. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

ML example with binned data Previous example with the exponential, now putting the data into a histogram. In the limit of zero bin width this becomes the usual unbinned ML. If the ni are treated as Poisson, we get the extended log-likelihood: ln L(τ) = Σi [ni ln νi(τ) − νi(τ)] + C. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
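
A sketch of the binned Poisson ML fit for the exponential example (binning, range, and the treatment of the expected total are my choices for illustration):

# Binned ML: maximize sum_i [ n_i ln(nu_i) - nu_i ] for the exponential pdf.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(seed=7)
t = rng.exponential(1.0, 50)
edges = np.linspace(0.0, 5.0, 11)                 # 10 bins; overflow ignored here
n, _ = np.histogram(t, bins=edges)

def nu(tau):
    cdf = 1.0 - np.exp(-edges / tau)              # exponential CDF at bin edges
    return len(t) * np.diff(cdf)

def neg_lnL(tau):
    v = nu(tau)
    return -np.sum(n * np.log(v) - v)

res = minimize_scalar(neg_lnL, bounds=(0.1, 5.0), method="bounded")
print("binned ML estimate of tau:", res.x)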

Relationship between ML and Bayesian estimators In Bayesian statistics, both θ and x are random variables. Recall the Bayesian method: use subjective probability for hypotheses (θ); before the experiment, knowledge is summarized by the prior pdf p(θ); use Bayes’ theorem to update the prior in light of the data: p(θ|x) ∝ L(x|θ) p(θ), the posterior pdf (conditional pdf for θ given x). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

ML and Bayesian estimators (2) Purist Bayesian: p(θ|x) contains all knowledge about θ. Pragmatist Bayesian: p(θ|x) could be a complicated function, → summarize it using an estimator: take the mode of p(θ|x) (could also use e.g. the expectation value). What do we use for p(θ)? There is no golden rule (it is subjective!); often one represents ‘prior ignorance’ by p(θ) = constant, in which case the mode of the posterior coincides with the ML estimate. But we could have used a different parameter, e.g., λ = 1/θ, and if the prior pθ(θ) is constant, then pλ(λ) is not! ‘Complete prior ignorance’ is not well defined. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

The method of least squares Suppose we measure N values, y1, ..., yN, assumed to be independent Gaussian r.v.s with means E[yi] = λ(xi; θ). Assume known values of the control variable x1, ..., xN and known variances σi² = V[yi]. We want to estimate θ, i.e., fit the curve λ(x; θ) to the data points. The likelihood function is L(θ) = ∏i (2πσi²)^(−1/2) exp[−(yi − λ(xi; θ))²/(2σi²)]. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

The method of least squares (2) The log-likelihood function is therefore ln L(θ) = −½ Σi (yi − λ(xi; θ))²/σi² + terms not depending on θ. So maximizing the likelihood is equivalent to minimizing χ²(θ) = Σi (yi − λ(xi; θ))²/σi². The minimum defines the least squares (LS) estimator θ̂. Very often the measurement errors are ~Gaussian, and so ML and LS are essentially the same. Often one minimizes χ² numerically (e.g. program MINUIT). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
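
A small illustration (invented data points, SciPy in place of MINUIT) of minimizing this χ² for a straight-line model:

# Least-squares fit of y = a + b*x with known Gaussian errors.
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])           # illustrative data
y = np.array([1.7, 2.1, 2.9, 3.2, 4.1])
sigma = np.full_like(y, 0.3)                      # known standard deviations

line = lambda x, a, b: a + b * x
theta_hat, cov = curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)
chi2_min = np.sum(((y - line(x, *theta_hat)) / sigma) ** 2)
print("theta_hat =", theta_hat)
print("covariance:\n", cov)
print("chi2_min =", chi2_min)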

LS with correlated measurements If the yi follow a multivariate Gaussian with covariance matrix V, then maximizing the likelihood is equivalent to minimizing χ²(θ) = Σi,j (yi − λ(xi; θ)) (V⁻¹)ij (yj − λ(xj; θ)). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
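
A sketch of the correlated case (the covariance matrix and data are invented for illustration): build V, invert it, and minimize the quadratic form numerically.

# LS with correlated measurements: chi2 = r^T V^{-1} r.
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.3, 2.8, 4.2])                            # illustrative data
V = 0.09 * (np.eye(4) + 0.5 * (np.ones((4, 4)) - np.eye(4)))  # common rho = 0.5
Vinv = np.linalg.inv(V)

def chi2(theta):
    r = y - (theta[0] + theta[1] * x)             # residuals for a straight line
    return r @ Vinv @ r

res = minimize(chi2, x0=[0.0, 1.0])
print("theta_hat =", res.x, " chi2_min =", chi2(res.x))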

Example of least squares fit Fit a polynomial of order p: G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Variance of LS estimators In most cases of interest we obtain the variance in a manner similar to ML. E.g. for data ~ Gaussian we have χ² = −2 ln L + const, and so σ²θ̂ = 2 ( ∂²χ²/∂θ² )⁻¹ evaluated at θ̂; or for the graphical method we take the values of θ where χ²(θ) = χ²min + 1. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Two-parameter LS fit G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Goodness-of-fit with least squares The value of the χ2 at its minimum is a measure of the level of agreement between the data and fitted curve: It can therefore be employed as a goodness-of-fit statistic to test the hypothesized functional form λ(x; θ). We can show that if the hypothesis is correct, then the statistic t = χ2min follows the chi-square pdf, where the number of degrees of freedom is nd = number of data points - number of fitted parameters G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Goodness-of-fit with least squares (2) The chi-square pdf has an expectation value equal to the number of degrees of freedom, so if χ²min ≈ nd the fit is ‘good’. More generally, find the p-value: p = ∫ from χ²min to ∞ of f(t; nd) dt. This is the probability of obtaining a χ²min as high as the one we got, or higher, if the hypothesis is correct. E.g. for the previous example, p-values are found for the 1st order polynomial (line) and for the 0th order polynomial (horizontal line) (numerical values shown on the slides). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
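
In practice the p-value is obtained from the chi-square survival function; a short sketch with illustrative numbers:

# p-value from chi2_min with n_d = (data points) - (fitted parameters).
from scipy.stats import chi2

chi2_min, n_points, n_par = 3.9, 5, 2            # illustrative numbers
n_d = n_points - n_par
print("p-value =", chi2.sf(chi2_min, df=n_d))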

Goodness-of-fit vs. statistical errors G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Goodness-of-fit vs. stat. errors (2) G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

LS with binned data G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

LS with binned data (2) G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

LS with binned data — normalization G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

LS normalization example G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Goodness of fit from the likelihood ratio Suppose we model data using a likelihood L(μ) that depends on N parameters μ = (μ1, ..., μN). Define the statistic tμ = −2 ln [L(μ)/L(μ̂)], where μ̂ is the ML estimator. The value of tμ reflects the agreement between the hypothesized μ and the data: good agreement means μ̂ ≈ μ, so tμ is small; larger tμ means less compatibility between the data and μ. Quantify “goodness of fit” with the p-value: p = P(tμ ≥ tμ,obs | μ). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Likelihood ratio (2) Now suppose the parameters μ = (μ1, ..., μN) can be determined by another set of parameters θ = (θ1, ..., θM), with M < N. E.g. in an LS fit, use μi = μ(xi; θ) where x is a control variable. Define the statistic qμ = −2 ln [L(μ(θ̂))/L(μ̂)], where the numerator corresponds to fitting the M parameters θ and the denominator to fitting all N parameters μ. Use qμ to test the hypothesized functional form of μ(x; θ). To get the p-value, we need the pdf f(qμ|μ). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Wilks’ Theorem (1938) Wilks’ theorem: if the hypothesized parameters μ = (μ1, ..., μN) are true, then in the large sample limit (and provided certain conditions are satisfied) tμ and qμ follow chi-square distributions. For the case with μ = (μ1, ..., μN) fixed in the numerator, tμ follows a chi-square distribution with N degrees of freedom; or if M parameters are adjusted in the numerator, the number of degrees of freedom is N − M. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Goodness of fit with Gaussian data Suppose the data are N independent Gaussian distributed values, yi ~ Gaussian(μi, σi²), where we want to estimate the μi and the σi are known. Likelihood: L(μ) = ∏i (2πσi²)^(−1/2) exp[−(yi − μi)²/(2σi²)]. Log-likelihood: ln L(μ) = −½ Σi (yi − μi)²/σi² + C. ML estimators: μ̂i = yi. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Likelihood ratios for Gaussian data The goodness-of-fit statistics become tμ = Σi (yi − μi)²/σi² and qμ = Σi (yi − μ(xi; θ̂))²/σi² = χ²min. So Wilks’ theorem formally states the well-known property of the minimized chi-squared from an LS fit. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Likelihood ratio for Poisson data Suppose the data are a set of values n = (n1, ..., nN), e.g., the numbers of events in a histogram with N bins. Assume ni ~ Poisson(νi), i = 1, ..., N, all independent. The goal is to estimate ν = (ν1, ..., νN). Likelihood: L(ν) = ∏i νi^ni e^−νi / ni!. Log-likelihood: ln L(ν) = Σi (ni ln νi − νi) + C. ML estimators: ν̂i = ni. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Goodness of fit with Poisson data The likelihood ratio statistic (all parameters fixed in the numerator) is tν = −2 ln [L(ν)/L(ν̂)] = 2 Σi [ni ln(ni/νi) + νi − ni]. Wilks’ theorem: for ν true, tν follows a chi-square distribution with N degrees of freedom. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
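
A short sketch of computing this statistic and its Wilks p-value for a toy histogram (observed contents and hypothesized means are invented; the convention ni ln(ni/νi) = 0 for ni = 0 is used):

# Poisson likelihood-ratio statistic t = 2 sum_i [ n_i ln(n_i/nu_i) + nu_i - n_i ].
import numpy as np
from scipy.stats import chi2

n = np.array([3, 7, 0, 5, 9, 4])                  # observed bin contents
nu = np.array([4.2, 6.1, 1.0, 5.5, 8.0, 3.3])     # hypothesized means

safe_n = np.where(n > 0, n, 1)                    # makes the n_i = 0 terms vanish
t = 2.0 * np.sum(n * np.log(safe_n / nu) + nu - n)
print("t =", t, " p-value =", chi2.sf(t, df=len(n)))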

Goodness of fit with Poisson data (2) Or with M fitted parameters in the numerator: qμ = 2 Σi [ni ln(ni/νi(θ̂)) + νi(θ̂) − ni], which by Wilks’ theorem follows a chi-square distribution with N − M degrees of freedom. Use tμ, qμ to quantify goodness of fit (p-value), with the sampling distribution from Wilks’ theorem (chi-square). This is exact in the large sample limit; in practice it is a good approximation for surprisingly small ni (~several). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Goodness of fit with multinomial data Similar if the data n = (n1, ..., nN) follow a multinomial distribution, e.g., a histogram with N bins but with ntot = Σi ni fixed. Log-likelihood: ln L(ν) = Σi ni ln νi + C. ML estimators: ν̂i = ni. (Only N − 1 are independent; one is ntot minus the sum of the rest.) G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Goodness of fit with multinomial data (2) The likelihood ratio statistics become tν = 2 Σi ni ln(ni/νi), and similarly for qμ with fitted parameters in the numerator. There is one less degree of freedom than in the Poisson case, because effectively only N − 1 parameters are fitted in the denominator. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Estimators and g.o.f. all at once Evaluate the numerators with θ (not its estimator): qθ = 2 Σi [ni ln(ni/νi(θ)) + νi(θ) − ni] (Poisson), qθ = 2 Σi ni ln(ni/νi(θ)) (Multinomial). These are equal to the corresponding −2 ln L(θ) plus terms not depending on θ, so minimizing them gives the usual ML estimators for θ. The minimized value gives the statistic qμ, so we get the goodness-of-fit for free. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Using LS to combine measurements G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Combining correlated measurements with LS G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Example: averaging two correlated measurements G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Negative weights in LS average G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
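
The formulas for these slides are not reproduced in the transcript; as a hedged illustration, here is the standard LS average of two correlated measurements, λ̂ = Σ wi yi with weights w = V⁻¹1 / (1ᵀV⁻¹1), using numbers chosen so that one weight comes out negative:

# LS average of two correlated measurements and its variance.
import numpy as np

y = np.array([1.0, 1.5])                          # illustrative measurements
sig = np.array([0.1, 0.2])
rho = 0.8                                         # correlation coefficient
V = np.array([[sig[0]**2, rho * sig[0] * sig[1]],
              [rho * sig[0] * sig[1], sig[1]**2]])

Vinv = np.linalg.inv(V)
ones = np.ones(2)
w = Vinv @ ones / (ones @ Vinv @ ones)            # LS weights (can be negative)
lam_hat = w @ y
sigma = np.sqrt(1.0 / (ones @ Vinv @ ones))
print("weights =", w, " average =", lam_hat, " sigma =", sigma)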

Extra slides G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Example of ML: parameters of Gaussian pdf Consider independent x1, ..., xn, with xi ~ Gaussian(μ, σ²). The log-likelihood function is ln L(μ, σ²) = Σi [ −½ ln(2πσ²) − (xi − μ)²/(2σ²) ]. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Example of ML: parameters of Gaussian pdf (2) Set the derivatives with respect to μ and σ² to zero and solve: μ̂ = (1/n) Σi xi, σ̂² = (1/n) Σi (xi − μ̂)². We already know that the estimator for μ is unbiased. We find, however, E[σ̂²] = ((n − 1)/n) σ², so the ML estimator for σ² has a bias, but b → 0 for n → ∞. Recall, however, that s² = (1/(n − 1)) Σi (xi − x̄)² is an unbiased estimator for σ². G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3
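
A quick Monte Carlo check of this bias (my own illustration; the sample size and number of pseudo-experiments are arbitrary):

# The ML variance estimator (1/n)*sum(x_i - mu_hat)^2 is biased by (n-1)/n;
# s^2 with 1/(n-1) is unbiased.
import numpy as np

rng = np.random.default_rng(seed=8)
n, n_exp = 5, 200000
x = rng.normal(0.0, 1.0, size=(n_exp, n))         # many pseudo-experiments

ml_var = x.var(axis=1, ddof=0)                    # ML estimator (divide by n)
s2 = x.var(axis=1, ddof=1)                        # unbiased (divide by n-1)
print("mean of ML estimator:", ml_var.mean(), "(expect", (n - 1) / n, ")")
print("mean of s^2:         ", s2.mean(), "(expect 1.0)")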

Extended ML example: an unphysical estimate A downwards fluctuation of data in the peak region can lead to even fewer events than what would be obtained from background alone. The estimate for μs here is pushed negative (unphysical). We can let this happen as long as the (total) pdf stays positive everywhere. G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3

Unphysical estimators (2) Repeat the entire MC experiment many times, allowing unphysical estimates. Here the unphysical estimator is unbiased and should nevertheless be reported, since the average of a large number of unbiased estimates converges to the true value (cf. PDG). G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 3