7. What to Optimize? In this session:
1. Can one do better by optimizing something else?
2. Likelihood, not least squares?
3. Using a handful of likelihood functions.


SPF workshop February 2014, UBCO
Course outline: CH1. What is what · CH2. A simple SPF · CH3. EDA · CH4. Curve fitting · CH5. A first SPF · CH6. Which fit is fitter · CH7. Choosing the objective function · CH8. Theoretical stuff · CH9. Adding variables · CH10. Choosing a model equation

Perhaps the fit was bad because:
- The function is not good (later)
- Important traits are missing (later)
- The objective function is not appropriate (now)

The two common methods:
- Least Squares (Carl Friedrich Gauss)
- Maximum Likelihood (Sir Ronald Fisher)

Introducing 'Likelihood'

"In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters." (From Wikipedia.)

Popular in SPF modeling.

Example 1: Get the ML estimate of μ

Year:    1  2  3  4
Crashes: 1  7  4  0

What is the probability of observing 1, 7, 4, 0 if μ = 2.0 crashes/year? Open #9 'Likelihood functions' on the 'Poisson' workpage.
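The probability asked for here can be checked outside the workpage. A minimal sketch, using Python in place of the workshop's Excel sheet (the data and μ = 2.0 come from the slide):

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """Poisson probability of observing k crashes when the mean is mu."""
    return mu**k * exp(-mu) / factorial(k)

counts = [1, 7, 4, 0]   # crashes observed in years 1..4

# The likelihood of mu = 2.0 is the probability of observing 1, 7, 4 and 0:
# the product of the four Poisson probabilities.
likelihood = 1.0
for k in counts:
    likelihood *= poisson_pmf(k, 2.0)

print(likelihood)
```

The result is a very small number, which is why the later slides switch to the log-likelihood.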

The 'Likelihood Function'

ℒ(.) will be used to denote a likelihood function; the dot in the parentheses is a placeholder for parameters. Thus, e.g., ℒ(μ) is the likelihood function of μ. Compare, say, the likelihood of μ = 2.0 with the likelihood of μ = 4.0.

Computing the likelihood at very many μ's we would see a smooth curve, the 'likelihood function': the probability to observe 1, 7, 4 and 0 accidents as a function of μ. Its peak marks the μ at which observing 1 & 7 & 4 & 0 is most probable.

The parameter value at which the likelihood function has its peak is the 'Maximum Likelihood' (ML) estimate of that parameter. It is not the most probable value of the parameter; it is the parameter value at which the observations are most probable.

Return to #9 on the 'Poisson ML' workpage. With the 1, 7, 4, 0 crash record, which μ is most likely? Use Solver: set the 'Target' cell to the log-likelihood and the 'By Changing' cell to μ. Show that the ML estimate of μ is 3.00.
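Solver's answer can be reproduced by evaluating the log-likelihood over a grid of candidate μ's. A sketch, with Python standing in for the Excel Solver:

```python
from math import log, factorial

counts = [1, 7, 4, 0]

def log_likelihood(mu):
    # Sum of the log Poisson probabilities of the observed counts.
    return sum(k * log(mu) - mu - log(factorial(k)) for k in counts)

# Scan a fine grid of candidate mu's and keep the one with the highest
# log-likelihood.  For Poisson data the peak is at the sample mean.
candidates = [m / 100 for m in range(1, 1001)]   # 0.01 .. 10.00
mle = max(candidates, key=log_likelihood)
print(mle)
```

The peak sits at the sample mean of the four counts, (1+7+4+0)/4 = 3.00, agreeing with the slide.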

Example 2: What distribution fits the data?

The data: the number of drivers n(k) who had k accidents during the observation period. If all drivers had the same μ, then one would expect n(k) to be consistent with the Poisson distribution. Is it? Continue now to the 'Does the Poisson fit' workpage.

[Figure: observed n(k) vs. the n(k) expected if the Poisson fit; here the Poisson predicts too few crashes.]
Answer: the Poisson does not fit.

What distribution does fit the data?

- Poisson applies to a population of units that all have the same μ.
- NB applies to a population of units where each unit may have a different μ and the μ's are Gamma distributed.

Continue to the 'NegBin ML Empty' workpage in #9.

Will the NB fit the data? Parameters 'a' and 'b' are to be estimated.

Question 1: What are the ML estimates of 'a' and 'b'? (Both must be positive.)
Question 2: With these 'a' and 'b', how good is the correspondence between the observed and fitted n(k)?

Preparing the likelihood function for Solver: the likelihood function is the product of many small probabilities. To avoid computational difficulties we use the log-likelihood; 'product' is replaced by 'sum'. Enter initial guesses for the parameters.

The probability that n(k) units have k accidents is P(K=k)^n(k). The log-likelihood is therefore the sum over all k of n(k)·ln[P(K=k)]. Now we are ready to estimate 'a' and 'b'.
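This grouped log-likelihood can be sketched in code. The driver counts below are hypothetical (the workshop's actual data are not reproduced here), and the parameterization assumes μ ~ Gamma with shape b and rate a, consistent with the E{μ} = b/a, VAR{μ} = (E{μ})²/b relations given on the Gamma slide; a crude grid search stands in for Solver:

```python
from math import lgamma, log

def nb_log_pmf(k, a, b):
    """log P(K=k) for a Poisson-Gamma mixture with mu ~ Gamma(shape=b, rate=a):
    P(K=k) = Gamma(b+k)/(Gamma(b) k!) * (a/(a+1))**b * (1/(a+1))**k."""
    return (lgamma(b + k) - lgamma(b) - lgamma(k + 1)
            + b * log(a / (a + 1.0)) - k * log(a + 1.0))

# Hypothetical grouped data: n[k] drivers had k accidents (illustration only).
n = {0: 750, 1: 180, 2: 50, 3: 15, 4: 5}

def grouped_log_likelihood(a, b):
    # Sum over k of n(k) * ln P(K=k), as on the slide.
    return sum(nk * nb_log_pmf(k, a, b) for k, nk in n.items())

# Crude grid search over positive a and b.
best = max(((a / 10, b / 10) for a in range(1, 80) for b in range(1, 80)),
           key=lambda ab: grouped_log_likelihood(*ab))
print(best)
```

Solver (or any proper optimizer) would refine this continuous maximum; the grid merely shows the mechanics.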

'a' and 'b' must be non-negative. Solver returns the ML estimates. Does this NB fit the data?

Was the NB assumption for this population reasonable? Compare the observed n(k) to the numbers expected if the NB and its estimated parameters were true. (By the method of moments in 1.4 we got 3.55 and 0.85.)

Now the ground is ready: the Poisson likelihood function for SPF curve-fitting (Chapter 8).

For counts k_1, k_2, ..., the Poisson log-likelihood is Σ_i [k_i·ln(μ_i) − μ_i − ln(k_i!)]. The ln(k_i!) term does not depend on the parameters, so it does not matter for maximization. Replace μ_i by the model equation μ̂_i(β_0, β_1, ...) and you have a function of the β's. Now you can find the values of the β's which make the log-likelihood largest.

The C-F spreadsheet for the Poisson likelihood function. Go to #10 'Poisson fit (Full).xlsx' on the 'Poisson' workpage. Our model equation (for now) gives the fitted μ̂ in column E; the observed count is in column D. The log-likelihood term for each segment is computed by the formula `=-E8+D8*LN(E8)` (copy down), and the objective is the sum of log-likelihoods.
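That spreadsheet column can be mirrored in a few lines of code. A sketch under stated assumptions: the segment data below are made up for illustration, and μ̂ = β0·L^β1 is used as a placeholder model equation; a coarse grid search stands in for Solver:

```python
from math import log

# Hypothetical segments: (length in miles, observed crash count).
segments = [(0.5, 1), (1.2, 3), (0.8, 0), (2.0, 5), (1.5, 2)]

def poisson_ll(beta0, beta1):
    """Sum over segments of -mu_hat + k*ln(mu_hat): the spreadsheet's
    =-E8+D8*LN(E8) column, with the constant ln(k!) term dropped."""
    total = 0.0
    for length, k in segments:
        mu_hat = beta0 * length ** beta1   # placeholder model equation
        total += -mu_hat + k * log(mu_hat)
    return total

# Coarse grid search over the two parameters, standing in for Solver.
grid = [(b0 / 10, b1 / 10) for b0 in range(1, 60) for b1 in range(0, 30)]
beta0, beta1 = max(grid, key=lambda p: poisson_ll(*p))
print(beta0, beta1)
```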

SOLVER solution: very similar to OLS; no point in a CURE plot.

The Poisson likelihood function solves the 'equal variances' problem. However, it has a problem of its own: no overdispersion. If Poisson, Variance = Mean; if Variance > Mean there is overdispersion and the counts are not Poisson. To illustrate, 91 of the 5,323 segments are 0.01 miles long. For these, the sample variance of the accident counts (0.114) exceeds the sample mean. Hence the Negative Binomial likelihood function.

[Figure: (Sample variance)/(Sample mean) vs. segment length in miles, using 50 segment-length bins; if Poisson, the ratio would be 1.]

The NB likelihood function, continued. Assumptions, common and different:

- Common to both: crash counts for each unit are Poisson distributed.
- Poisson: units with the same traits in the model equation have the same μ.
- Negative Binomial: units with the same traits in the model equation have μ's that come from a Gamma distribution.

The Gamma pdf can take on a variety of shapes; E{μ} = b/a and VAR{μ} = (E{μ})²/b. It has limitations, but for many populations the NB fits. Ergo: the Gamma is often OK.

Implementing the NB on a C-F spreadsheet: go to #11 'NB fit.xlsx'. Our model equation (for now) is unchanged; compute the log-likelihood for each segment, and the objective is the sum of log-likelihoods.

Modifying the Poisson C-F spreadsheet to NB (details in text): add a cell ($G$2) for the new parameter and replace the log-likelihood formula by:

`=IF(OR(B8<=0,C8<=0,E8<=0),0,GAMMALN(D8+$G$2*B8)-GAMMALN($G$2*B8)+$G$2*B8*LN($G$2*B8)+D8*LN(E8)-($G$2*B8+D8)*LN($G$2*B8+E8))`
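A line-by-line translation of that Excel formula may make it easier to read. This sketch assumes the cell roles implied by the earlier slides (B8 = segment length L, D8 = crash count k, E8 = fitted μ̂, $G$2 = the new parameter b, so the NB shape is b·L); GAMMALN becomes `math.lgamma`:

```python
from math import lgamma, log

def nb_segment_ll(L, k, mu_hat, b):
    """NB log-likelihood for one segment with shape parameter b*L and mean
    mu_hat, mirroring the spreadsheet formula (ln k! omitted, as before)."""
    if L <= 0 or mu_hat <= 0:
        return 0.0                      # the IF(...) guard in the formula
    bL = b * L
    return (lgamma(k + bL) - lgamma(bL) + bL * log(bL)
            + k * log(mu_hat) - (bL + k) * log(bL + mu_hat))
```

As a sanity check, when b·L is very large the Gamma mixing distribution degenerates and this expression approaches the Poisson term −μ̂ + k·ln(μ̂) used earlier.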

SOLVER solution: very similar to OLS and to the Poisson fit.

Example of use: What are the estimates of E{μ} and VAR{μ} for a 0.7-mile-long segment (Colorado, two-lane, ...)?
Answer: the estimate of E{μ} is 1.636 × ... = 1.20 I&F crashes in 5 years. With V{μ_i} = (E{μ_i})²/b_i and b_i = 0.531 × L_i, the estimate of VAR{μ} is 1.20²/(0.531 × 0.7) = 3.87 (I&F crashes in 5 years)². That is a standard deviation of ±1.97; we must reduce uncertainty!
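The arithmetic of this answer can be checked in a few lines (values taken from the slide, with VAR{μ} = (E{μ})²/b and b = 0.531·L):

```python
E_mu = 1.20                    # estimated E{mu}, I&F crashes in 5 years
b = 0.531 * 0.7                # shape parameter for a 0.7-mile segment
var_mu = E_mu ** 2 / b         # VAR{mu} = (E{mu})^2 / b
sd_mu = var_mu ** 0.5          # standard deviation of mu
print(round(var_mu, 2), round(sd_mu, 2))   # 3.87 and 1.97
```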

In this session we asked: what to optimize? Traditionally:
1. Minimize (weighted) SSD;
2. Maximize likelihood.
Both are motivated by a focus on parameters. When the focus is on 'how to predict well', other criteria emerge:
1. Minimize absolute deviations;
2. Minimize total absolute bias;
3. Minimize ..., etc. (In lecture notes.)

Summary for Chapter 7:
1. Instead of minimizing SSD (which gave poor fits), we asked whether the fit is improved by maximizing likelihood;
2. Likelihood was explained and illustrated;
3. To write a likelihood function one must make assumptions. The assumptions behind the Poisson and NB likelihoods were discussed;
4. We used the Poisson likelihood function. The fit was not improved;
5. One of the assumptions behind the Poisson likelihood is not realistic. The NB likelihood function removes this blemish.

6. The estimate of the shape parameter 'b' is needed for tasks such as blackspot identification and EB safety estimation;
7. The fit is still not very good. Can it be improved by using a better model equation?