880.P20 Winter 2006, Richard Kass
Maximum Likelihood Method (MLM)

Presentation transcript:
Slide 1: Maximum Likelihood Method (MLM)

Suppose we have n measurements x_1, ..., x_n drawn from a pdf f(x; θ) that depends on an unknown parameter θ. The ML procedure is to take as the estimate of θ the value that makes the observed sample most probable. Does this procedure make sense? The MLM answers this question and provides a method for estimating parameters from existing data. The probability of obtaining the observed sample is $\prod_{i=1}^{n} f(x_i;\theta)\,dx^n$, so we define the likelihood function

$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)$

We drop the $dx^n$ since it is just a proportionality constant.

Slide 2: Maximum Likelihood Method (MLM)

[Equation-only slide: the worked derivation did not survive transcription. The surviving annotation ("Average!") flags its punchline: the ML estimate of the mean of a gaussian pdf is just the sample average.]

Slide 3: Maximum Likelihood Method (MLM)

[Equation-only slide: again the ML estimate works out to the sample average ("Average!"), and the slide cites the Cramer-Rao bound on the variance of an estimator.]

Slide 4: Errors & Maximum Likelihood Method (MLM)

How do we calculate errors (σ's) using the MLM? Start by looking at the case where we have a gaussian pdf. The likelihood function is

$L = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_i-\mu)^2/(2\sigma^2)}$

It is easier to work with lnL:

$\ln L = -n\ln(\sigma\sqrt{2\pi}) - \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}$

If we take two derivatives of lnL with respect to μ we get

$\frac{\partial^2 \ln L}{\partial\mu^2} = -\frac{n}{\sigma^2}, \qquad \sigma_\mu^2 = \left(-\frac{\partial^2 \ln L}{\partial\mu^2}\right)^{-1}$

For the case of a gaussian pdf we get the familiar result $\sigma_\mu^2 = \sigma^2/n$. The big news here is that the variance of the parameter of interest is related to the 2nd derivative of lnL. Since our example uses a gaussian pdf, the result is exact. More importantly, the result is asymptotically true for ALL pdfs, since for large samples (n → ∞) all likelihood functions become "gaussian".
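A minimal Mathematica sketch of this result (the sample size and true parameters below are invented for illustration): build lnL for gaussian data with known σ and read the variance of the mean off the curvature.

  (* gaussian data with known sigma; mu is the parameter to estimate *)
  sigma = 2.0; n = 400;
  x = RandomVariate[NormalDistribution[10, sigma], n];
  lnL[mu_] := -n Log[sigma Sqrt[2 Pi]] - Total[(x - mu)^2]/(2 sigma^2);
  muHat = Mean[x];                              (* ML estimate of the mean *)
  var = -1/D[lnL[mu], {mu, 2}] /. mu -> muHat   (* = sigma^2/n, the variance of muHat *)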

Slide 5: Errors & MLM

The previous example was for one variable. We can generalize the result to the case where we determine several parameters (θ_1, θ_2, ..., θ_n) from the likelihood function:

$(V^{-1})_{ij} = -\frac{\partial^2 \ln L}{\partial\theta_i\,\partial\theta_j}$

Here V_ij is a matrix (the "covariance matrix" or "error matrix"), evaluated at the values of (θ_1, θ_2, ..., θ_n) that maximize the likelihood function. In practice it is often very difficult or impossible to evaluate the 2nd derivatives analytically. The procedure most often used to determine the variances of the parameters relies on the property that the likelihood function becomes gaussian (i.e. lnL becomes parabolic) asymptotically. We expand lnL about the ML estimate of the parameters. For the one-parameter case we have:

$\ln L(\theta) = \ln L(\theta^*) + (\theta-\theta^*)\left.\frac{\partial \ln L}{\partial\theta}\right|_{\theta^*} + \frac{(\theta-\theta^*)^2}{2}\left.\frac{\partial^2 \ln L}{\partial\theta^2}\right|_{\theta^*} + \cdots$

Since we are evaluating lnL at the value of θ (= θ*) that maximizes L, the term with the 1st derivative is zero. Using the expression for the variance of θ on the previous page and neglecting higher-order terms we find:

$\ln L(\theta^* \pm k\sigma_\theta) = \ln L_{max} - \frac{k^2}{2}$

Thus we can determine the ±kσ limits on the parameters by finding the values where lnL decreases by k²/2 from its maximum value. This is what MINUIT does!

Slide 6: Example: Log-Likelihood Errors & MLM

Example: exponential decay. Generate events according to an exponential distribution with τ = 100: generate decay times using $\tau_i = -\tau_0 \ln r_i$, with r_i uniform on (0, 1). Calculate lnL vs τ, find the maximum of lnL, and find the points where lnL = lnL_max − 1/2 (the "1σ points"). Compare the errors from the "exact" formula with the log-likelihood points; the variance of an exponential pdf with mean lifetime τ is $\sigma^2 = \tau^2/n$.

Log-likelihood function for 10 events: lnL_max at τ = 189; 1σ points (140, 265) vs exact (129, 245). L is not gaussian.

Log-likelihood function for 10^4 events: 1σ points (99.8, 101.8), in agreement with the exact formula; here L is well fit by a gaussian.

[The slide's lnL-vs-τ plots, the fitted-gaussian parameters, and the list of the ten generated decay times did not survive transcription.]
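A small Mathematica sketch of this exercise (random data, so the numbers will differ run to run and from the slide's):

  (* generate n decay times with true lifetime tau0, per the slide's recipe *)
  tau0 = 100; n = 10;
  times = -tau0 Log[RandomReal[{0, 1}, n]];
  (* lnL for the exponential pdf f(t; tau) = Exp[-t/tau]/tau *)
  lnL[tau_?NumericQ] := -n Log[tau] - Total[times]/tau;
  tauHat = Mean[times];   (* the analytic maximum of lnL *)
  (* "1 sigma" points: where lnL falls by 1/2 from its maximum *)
  low  = tau /. FindRoot[lnL[tau] == lnL[tauHat] - 1/2, {tau, 0.7 tauHat}];
  high = tau /. FindRoot[lnL[tau] == lnL[tauHat] - 1/2, {tau, 1.5 tauHat}];
  {low, tauHat, high}   (* compare the interval with the "exact" tauHat/Sqrt[n] *)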

Slide 7: Determining the Slope and Intercept with MLM

Example: using the MLM to determine the slope and intercept of a line. Assume we have a set of measurements (x_1, y_1, σ_1), (x_2, y_2, σ_2), ..., (x_n, y_n, σ_n), that the points are thought to come from a straight line y = α + βx, and that the measurements come from a gaussian pdf. The likelihood function is

$L = \prod_{i=1}^{n} \frac{1}{\sigma_i\sqrt{2\pi}}\, e^{-(y_i-\alpha-\beta x_i)^2/(2\sigma_i^2)}$

We wish to find the α and β that maximize the likelihood function L. Thus we need to take some derivatives:

$\frac{\partial \ln L}{\partial\alpha} = \sum_{i=1}^{n} \frac{y_i-\alpha-\beta x_i}{\sigma_i^2} = 0, \qquad \frac{\partial \ln L}{\partial\beta} = \sum_{i=1}^{n} \frac{x_i(y_i-\alpha-\beta x_i)}{\sigma_i^2} = 0$

We have to solve these two equations for the two unknowns, α and β. We can get an exact solution since the equations are linear in α and β: we just have to invert a 2×2 matrix.
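A Mathematica sketch of this solution, using invented data (the x, y, and σ values below are not from the slides): the two derivative conditions become the weighted normal equations.

  (* illustrative data: x, y, and per-point gaussian errors *)
  xs  = {1., 2., 3., 4., 5.};
  ys  = {2.1, 3.9, 6.2, 8.1, 9.8};
  sig = {0.2, 0.2, 0.3, 0.3, 0.4};
  w = 1/sig^2;
  (* normal equations m.{alpha, beta} = b, from setting the derivatives to zero *)
  m = {{Total[w], Total[w xs]}, {Total[w xs], Total[w xs^2]}};
  b = {Total[w ys], Total[w xs ys]};
  {alphaHat, betaHat} = LinearSolve[m, b]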

Slide 8: Determining the Errors on the Slope and Intercept with MLM

Let's calculate the error (covariance) matrix for α and β. Taking the second derivatives of lnL gives

$V^{-1} = \begin{pmatrix} \sum 1/\sigma_i^2 & \sum x_i/\sigma_i^2 \\ \sum x_i/\sigma_i^2 & \sum x_i^2/\sigma_i^2 \end{pmatrix}$

Inverting this 2×2 matrix gives the variances of α and β on the diagonal and their covariance off the diagonal. Note: we could also derive the variances of α and β just using propagation of errors on the formulas for α and β.
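Continuing the sketch above: the matrix m built for the normal equations is exactly V⁻¹, so the error matrix falls out for free.

  cov = Inverse[m];                          (* covariance (error) matrix V for {alpha, beta} *)
  {sigAlpha, sigBeta} = Sqrt[Diagonal[cov]]  (* errors on the intercept and slope *)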

Slide 9: Chi-Square (χ²) Distribution

Assume that our measurements (x_i's with errors σ_i) come from a gaussian pdf with mean μ. Define a statistic called chi-square:

$\chi^2 = \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{\sigma_i^2}$

It can be shown that the pdf for χ² is

$p(\chi^2; n) = \frac{(\chi^2)^{n/2-1}\, e^{-\chi^2/2}}{2^{n/2}\,\Gamma(n/2)}$

This is a continuous pdf. It is a function of two variables: χ² and n, the number of degrees of freedom (Γ = "gamma function"). [Figure: χ² distribution for different numbers of degrees of freedom.]

A few words about the number of degrees of freedom n:

n = (# data points) − (# parameters calculated from the data points)

Reminder: if you collected N events in an experiment and you histogram your data in n bins before performing the fit, then you have n data points!

EXAMPLE: You count cosmic-ray events in 15-second intervals and sort the data into 5 bins:
number of intervals with 0 cosmic rays: 2
number of intervals with 1 cosmic ray: 7
number of intervals with 2 cosmic rays: 6
number of intervals with 3 cosmic rays: 3
number of intervals with 4 cosmic rays: 2
Although there were 36 cosmic rays in your sample, you have only 5 data points.

EXAMPLE: We have 10 data points, with μ and σ the mean and standard deviation of the data set.
If we calculate μ and σ from the 10 data points, then n = 8.
If we know μ and calculate σ, OR if we know σ and calculate μ, then n = 9.
If we know μ and σ, then n = 10.

RULE of THUMB: a good fit has χ²/DOF ≈ 1.

For n ≥ 20, P(χ² > y) can be approximated using a gaussian pdf by treating $z = \sqrt{2\chi^2} - \sqrt{2n-1}$ as a standard gaussian variate.

A common approximation (useful for the poisson case) is "Pearson's χ²":

$\chi^2 = \sum_i \frac{(\mathrm{observed}_i - \mathrm{expected}_i)^2}{\mathrm{expected}_i}$

which is approximately χ²-distributed with n − 1 DOF.
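A quick Mathematica check of the large-n gaussian approximation (the values of n and χ² are chosen arbitrarily):

  n = 30; chisq = 40.;
  exact  = 1 - CDF[ChiSquareDistribution[n], chisq];
  approx = 1 - CDF[NormalDistribution[0, 1], Sqrt[2 chisq] - Sqrt[2 n - 1]];
  {exact, approx}   (* the two tail probabilities agree to a few percent *)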

Slide 10: MLM, Chi-Square, and Least Squares Fitting

Assume we have n data points of the form (y_i, σ_i) and we believe a functional relationship exists between the points: y = f(x; a, b, ...). In addition, assume we know (exactly) the x_i that goes with each y_i. We wish to determine the parameters a, b, .... A common procedure is to minimize the following χ² with respect to the parameters:

$\chi^2 = \sum_{i=1}^{n} \frac{\left(y_i - f(x_i; a, b, \ldots)\right)^2}{\sigma_i^2}$

If the y_i's are from a gaussian pdf, then minimizing this χ² is equivalent to the MLM. However, often the y_i's are NOT from a gaussian pdf. In these instances we call the technique "χ² fitting" or "least squares fitting". Strictly speaking, we can only use a χ² probability table when y is from a gaussian pdf. However, there are many instances where, even for non-gaussian pdfs, the above sum approximates a χ² pdf. And from a common-sense point of view, minimizing the above sum makes sense regardless of the underlying pdf.
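As a sketch, the straight-line data from the earlier example can also be fit by minimizing this χ² numerically; for gaussian errors it reproduces the matrix solution.

  (* reuses xs, ys, sig from the straight-line sketch above *)
  chisq[a_?NumericQ, b_?NumericQ] := Total[((ys - (a + b xs))/sig)^2];
  NMinimize[chisq[a, b], {a, b}]   (* same alpha, beta as LinearSolve gave *)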

Slide 11: Least Squares Fitting Example

Example (Leo, example 4.8, p. 107): the following data from a radioactive source were taken at 15 s intervals. Determine the lifetime (τ) of the source. [The slide's table of counts did not survive transcription.]

The pdf that describes radioactivity (or the decay of a charmed particle) is

$N(t) = N(0)\, e^{-t/\tau}$

(Technically the pdf is $|dN(t)/(N(0)\,dt)| = N(t)/(N(0)\tau)$; Leo has a "1" here.)

As written, the above pdf is not linear in τ. We can turn this into a linear problem by taking the natural log of both sides:

$\ln N(t) = \ln A + Dt, \qquad D = -1/\tau$

We can now use the methods of linear least squares to find D and then τ. In doing the LSQ fit, what do we use to weight the data points? The fluctuations in each bin are governed by poisson statistics: $\sigma_i^2 = N_i$. However, in this problem the fitting variable is lnN, so we must use propagation of errors to transform the variances of N into the variances of lnN:

$\sigma^2_{\ln N_i} = \left(\frac{\partial \ln N_i}{\partial N_i}\right)^2 \sigma^2_{N_i} = \frac{1}{N_i}$

Slide 12: Least Squares Fitting: Exponential Example

The slope of the line is given by the weighted least-squares formula

$D = \frac{\sum w_i \sum w_i t_i \ln N_i - \sum w_i t_i \sum w_i \ln N_i}{\sum w_i \sum w_i t_i^2 - \left(\sum w_i t_i\right)^2}, \qquad w_i = N_i$

Thus the lifetime is τ = −1/D [the numerical value did not survive transcription], and the error in the lifetime is στ = ±12.3 s.

Caution: Leo has a factor of ½ in his error matrix, $(V^{-1})_{ij} = \frac{1}{2}\frac{\partial^2\chi^2}{\partial\theta_i\,\partial\theta_j}$ [his equation number was lost]: he minimizes χ², while using the MLM we minimized −lnL = χ²/2, so the ½ makes the two error definitions agree.

Note: fitting without weighting yields τ = 96.8 s.

[Figure: the data with the line of "best fit".]
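A Mathematica sketch of the whole linearized fit. The counts below are illustrative placeholders, since the slide's data table was lost; substituting Leo's table should reproduce his numbers.

  t = Range[0, 135, 15];                               (* ten 15 s bins *)
  counts = {106, 80, 98, 75, 74, 73, 49, 38, 37, 22};  (* illustrative counts *)
  lnN = Log[N[counts]];  w = counts;                   (* weights w_i = N_i = 1/var(lnN_i) *)
  m = {{Total[w], Total[w t]}, {Total[w t], Total[w t^2]}};
  b = {Total[w lnN], Total[w t lnN]};
  {lnA, dd} = LinearSolve[m, b];                       (* intercept lnA and slope D *)
  tauFit = -1/dd                                       (* lifetime estimate *)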

Slide 13: Least Squares Fitting: Exponential Example

We can calculate the χ² to see how well the data fit an exponential decay distribution, using the poisson approximation $\sigma_i^2 = A e^{-t_i/\tau}$ (the expected counts):

$\chi^2 = \sum_{i=1}^{10} \frac{\left(N_i - A e^{-t_i/\tau}\right)^2}{A e^{-t_i/\tau}}$

For this problem: lnA = 4.725, so A = e^{4.725} ≈ 112.7, and τ = [value lost] s.

Mathematica calculation (csq must start at 0, with cnt[i], x[i], a, and tau already defined from the fit):

  csq = 0;
  Do[csq = csq + (cnt[i] - a*Exp[-x[i]/tau])^2/(a*Exp[-x[i]/tau]), {i, 1, 10}];
  Print["The chi sq per dof is ", csq/8]
  xvt = 1 - CDF[ChiSquareDistribution[8], csq];
  Print["The chi sq prob. is ", 100*xvt, "%"]

Output: the χ² per DOF is 1.96; the χ² probability is 4.9%. This is not such a good fit, since the probability is only ~4.9%.

Slide 14: Extended MLM

Often we want to do an MLM fit to determine the numbers of signal and background events. Let's assume we know the pdfs that describe the signal (p_s) and background (p_b), and that the pdfs depend on some measured quantity x (e.g. energy, momentum, Cherenkov angle, ...). We can write the likelihood for a single event i as

$L_i = f_s\, p_s(x_i) + (1-f_s)\, p_b(x_i)$

with f_s the fraction of signal events in the sample; the number of signal events is N_s = f_s N. The likelihood function to maximize (with respect to f_s) is

$L = \prod_{i=1}^{N} \left[ f_s\, p_s(x_i) + (1-f_s)\, p_b(x_i) \right]$

Usually there is no closed-form solution for f_s. There are several drawbacks to this solution: 1) the numbers of signal and background events are 100% correlated; 2) the (poisson) fluctuations in the total number of events N are not taken into account. Another solution, which explicitly takes 2) into account, is the EXTENDED MLM:

$L = \frac{e^{-\nu}\nu^N}{N!} \prod_{i=1}^{N} \frac{N_s\, p_s(x_i) + N_b\, p_b(x_i)}{\nu}$

Here ν = N_s + N_b, so we can re-write the likelihood function as

$L = \frac{e^{-(N_s+N_b)}}{N!} \prod_{i=1}^{N} \left[ N_s\, p_s(x_i) + N_b\, p_b(x_i) \right]$

The N! term drops out when we take derivatives to maximize L. We maximize L in terms of N_s and N_b. (If N_s and N_b are poisson, then so is their sum ν.)
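A toy Mathematica sketch of an extended ML fit. The pdfs and yields are invented for illustration: a gaussian signal on a flat background over x in [0, 10].

  ps[x_] := PDF[NormalDistribution[5, 0.5], x];   (* signal pdf *)
  pb[x_] := 1/10.;                                (* flat background pdf on [0, 10] *)
  data = Join[RandomVariate[NormalDistribution[5, 0.5], 200], RandomReal[{0, 10}, 800]];
  (* extended log-likelihood: -(Ns+Nb) + Sum of log(Ns ps + Nb pb); the N! is dropped *)
  lnL[ns_?NumericQ, nb_?NumericQ] := -(ns + nb) + Total[Log[ns ps[#] + nb pb[#] & /@ data]];
  NMaximize[{lnL[ns, nb], 0 < ns < 2000, 0 < nb < 2000}, {ns, nb}]  (* yields near 200 and 800 *)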

Slide 15: Extended MLM

Example: BF(B → D⁰K*). Event yields are determined from an unbinned extended ML fit in the region 5.2 < m_ES < 5.3 GeV/c². Choose simple pdfs to fit the m_ES distributions: A = Argus function, G = gaussian. Perform the ML fits simultaneously in 3 regions; in each region fit the Kπ, Kππ⁰, and K3π m_ES distributions (k = 1, 2, 3), for 9 pdfs in all.

I) ΔE sideband (−100 < ΔE < −60 MeV and 60 < ΔE < 200 MeV): pdf A_k.
II) D⁰ sideband (|m_D − m_D,PDG| > [cut value lost]): take into account "doubly peaking" (DP) backgrounds; pdf (N_noP A + N_DP G)_k.
III) Signal region (|ΔE| < 25 MeV): pdf (N_qq̄ A + κ N_DP G + N_sig G)_k [the scale-factor symbol was lost; κ assumed], where κ scales the N_DP found in the D⁰-sideband fit.

The fit finds ~520 signal events. [Figure: m_ES distributions in the signal region, the D⁰ sideband ("fake" D⁰'s give fake B's), and the ΔE sideband (there should be no B's in this region).]