Maximum likelihood estimates: What are they and why do we care? Relationship to AIC and other model selection criteria.


Maximum Likelihood Estimates (MLE)
Given a model with parameter(s) θ, the MLE is the value (or values) of the parameter(s) most likely to have produced the data. That is, the MLE maximizes the likelihood – the probability of the observed data given the model. For independent observations, the likelihood of a model is the product of the probabilities of the observations.

Maximum Likelihood Estimation
For linear models (e.g., ANOVA and regression), MLEs are usually obtained from the linear equations that minimize the sum of squared residuals – closed form.
For nonlinear models and some distributions, we find MLEs by setting the first derivative of the likelihood equal to zero and confirming the solution is a maximum by checking that the second derivative is negative – also closed form.
Or we can search for the values that maximize the probabilities of all of the observations – numerical estimation. The search stops when certain criteria are met:
Precision of the estimate
Change in the likelihood
Solution seems unlikely (stops after n iterations)

Binomial probability
Some theory and math
An example
Assumptions
Adding a link function
Additional assumptions about the βs

Binomial Sampling
Characterized by two mutually exclusive outcomes:
Heads or tails
On or off
Dead or alive
Used or not used, or
Occupied or not occupied
Often referred to as Bernoulli trials.

Models
Trials have an associated parameter p:
p = probability of success
1 − p = probability of failure (= q), so p + q = 1
p also represents a model: a single parameter p that is equal for every trial.

Binomial Sampling
p is a continuous parameter between 0 and 1 (0 < p < 1)
y is the number of successful outcomes
n is the number of trials
The estimator $\hat{p} = y/n$ is unbiased.

Binomial Probability Function
The probability of observing y successes in n trials with underlying probability p is
$f(y \mid n, p) = \binom{n}{y} p^y (1-p)^{n-y}$
Example: 10 flips of a fair coin (p = 0.5), 7 of which turn up heads, is written
$f(7 \mid 10, 0.5) = \binom{10}{7} 0.5^7 (1-0.5)^{10-7}$

Binomial Probability Function (2)
Evaluated numerically: $\binom{10}{7} 0.5^7 (0.5)^3 = 120 \times 0.5^{10} \approx 0.117$
In Excel: =BINOMDIST( y, n, p, FALSE)
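For readers who prefer code to Excel, a minimal Python sketch of the same calculation (the helper name binom_pmf is ours, not from the slides):

```python
from math import comb

def binom_pmf(y, n, p):
    """Probability of y successes in n trials, each with success probability p."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

print(binom_pmf(7, 10, 0.5))  # 0.1171875, matching =BINOMDIST(7, 10, 0.5, FALSE)
```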

Binomial Probability Function (3)
With n = 10 and p = 0.5, the probability of each possible y:

y    BINPROB
0    0.000977
1    0.009766
2    0.043945
3    0.117188
4    0.205078
5    0.246094
6    0.205078
7    0.117188
8    0.043945
9    0.009766
10   0.000977

Likelihood Function of Binomial Probability
Reality: we have data (n and y) but don't know the model (p). This leads us to the likelihood function
$L(p \mid n, y) = \binom{n}{y} p^y (1-p)^{n-y}$
read "the likelihood of p given n and y is...". It is not a probability function, but it is a positive function of p (0 < p < 1).

Likelihood Function of Binomial Probability (2)
Alternatively, the likelihood of the data given the model can be thought of as the product of the probabilities of the individual observations. The probability of observation i is
$p^{f_i} (1-p)^{1-f_i}$, where $f_i = 1$ for a success and $f_i = 0$ for a failure.
Therefore
$L(p \mid \text{data}) = \prod_{i=1}^{n} p^{f_i} (1-p)^{1-f_i}$

Binomial Probability Function and its likelihood
[Figure: the binomial likelihood plotted against p, with its maximum marked at the MLE]

Log likelihood
Although the likelihood function is useful, the log-likelihood has some desirable properties: the terms are additive, and the binomial coefficient does not include p:
$\ln L(p \mid n, y) = \ln \binom{n}{y} + y \ln p + (n-y) \ln(1-p)$

Log likelihood
Using the alternative form:
$\ln L(p \mid \text{data}) = \sum_{i=1}^{n} \left[ f_i \ln p + (1 - f_i) \ln(1-p) \right]$
The estimate of p that maximizes the value of ln(L) is the MLE.
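A small illustrative sketch of the numerical search in Python, using SciPy's bounded scalar minimizer on the negative log-likelihood (the slides do not prescribe a particular optimizer; this is one reasonable choice):

```python
import numpy as np
from scipy.optimize import minimize_scalar

n, y = 10, 7  # the coin-flip example

def neg_log_lik(p):
    # Negative log-likelihood; the binomial coefficient is dropped
    # because it does not involve p.
    return -(y * np.log(p) + (n - y) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # ~0.7 = y/n, the MLE
```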

Precision
[Figure: likelihood curves for L(p | 10, 7) and L(p | 100, 70); the curve for n = 100 is much narrower]
As n increases, precision increases and variance decreases.

Properties of MLEs
Asymptotically normally distributed
Asymptotically minimum variance
Asymptotically unbiased as n → ∞
One-to-one transformations of MLEs are also MLEs. For example, mean lifespan estimated by transforming the survival MLE is also an MLE.

Assumptions:
The n trials must be identical – i.e., the population is well defined (e.g., 20 coin flips, 50 Kirtland's warbler nests, 75 radio-marked black bears in the Pisgah Bear Sanctuary).
Each trial results in one of two mutually exclusive outcomes (e.g., heads or tails, survived or died, successful or failed, etc.).
The probability of success on each trial remains constant (homogeneous).
Trials are independent events (the outcome of one does not depend on the outcome of another).
y, the number of successes, is the random variable after n trials.

Example – use/non-use survey
Selected 50 sites (n) at random (or systematically) within a study area.
Visited each site once and surveyed for species x.
Species was detected at 10 sites (y).
Meets binomial assumptions:
Sites selected without bias
Surveys conducted using the same methods
Sites could only be used or not used (occupied)
No knowledge of habitat differences or species preferences
Sites are independent
Additional assumption – perfect detection.

Example – calculating the likelihood
[Table: likelihood, variance, and SE evaluated over a grid of candidate values of p; the likelihood reaches its maximum at p = 0.2]

Example – results
MLE = 20% ± 6%: an estimated 20% of the area is occupied, with a standard error of about 6%.
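As a check on these numbers, a short Python sketch using the closed-form binomial MLE and its asymptotic standard error:

```python
from math import sqrt

n, y = 50, 10
p_hat = y / n                        # MLE: 0.2
se = sqrt(p_hat * (1 - p_hat) / n)   # asymptotic SE: ~0.057, i.e. about 6%
print(p_hat, se)
```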

Link functions - adding covariates
"Link" the covariates, the data (X), with the response variable (i.e., use or occupancy). Usually done with the logit link:
$\text{logit}(p_i) = \ln\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_i$
Nice properties:
Constrains the result to 0 < p_i < 1
The βs can take any value, −∞ < β < +∞
Additional assumption – the βs are normally distributed.

Link function
Binomial likelihood:
$L(p \mid n, y) = \binom{n}{y} p^y (1-p)^{n-y}$
Substitute the link for p:
$p_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$
Voila! – logistic regression.
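A hedged Python sketch of the idea: the covariate values below are invented purely for illustration, and the optimizer choice is ours. The point is only that maximizing the binomial log-likelihood with a logit-linked p is logistic regression:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: x is a site covariate,
# y = 1 if the species was detected (site used), 0 otherwise.
x = np.array([0.2, 1.5, 0.3, 2.2, 1.1, 0.5, 1.9, 2.5])
y = np.array([0,   1,   0,   1,   1,   0,   0,   1])

def neg_log_lik(beta):
    eta = beta[0] + beta[1] * x        # linear predictor (logit scale)
    p = 1.0 / (1.0 + np.exp(-eta))     # inverse logit: constrains 0 < p < 1
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_lik, x0=np.zeros(2))  # numerical MLE search
print(res.x)  # estimated beta0, beta1
```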

Link function
More than one covariate can be included by extending the logit (a linear equation):
$\text{logit}(p_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_j x_{ij}$
The βs are the estimated parameters (effects); they can be estimated for each period or group, or constrained to be equal, using the data (x_ij).

Link function
The use rates or real parameters of interest are calculated from the βs by back-transforming the logit:
$p_i = \frac{e^{\beta_0 + \beta_1 x_{i1} + \dots + \beta_j x_{ij}}}{1 + e^{\beta_0 + \beta_1 x_{i1} + \dots + \beta_j x_{ij}}}$
This is a HUGE concept and applicable to EVERY estimator we examine. Occupancy and detection probabilities are replaced by the link-function submodel of the covariate(s). Conceivably every site has a different probability of use that is related to the values of the covariates.
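A tiny sketch of the back-transformation (the β values and covariate below are hypothetical, chosen only to show the mechanics):

```python
import numpy as np

def inv_logit(eta):
    """Back-transform a logit-scale linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical estimates: beta0 = -2.0, beta1 = 1.5, covariate x = 1.0
print(inv_logit(-2.0 + 1.5 * 1.0))  # ~0.38
```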

Multinomial probability
An example
Adding a link function

Multinomial Distribution and Likelihoods
An extension of the binomial to more than two possible mutually exclusive outcomes. Nearly always introduced by way of die tossing.
Another example: multiple presence/absence surveys at multiple sites.

Binomial Coefficient
The binomial coefficient is the number of ways y successes can be obtained from n trials:
$\binom{n}{y} = \frac{n!}{y!(n-y)!}$
Example: 7 successes in 10 trials:
$\binom{10}{7} = \frac{10!}{7! \, 3!} = 120$

Multinomial coefficient
The multinomial coefficient gives the number of possible outcomes, e.g., for die tossing (6 possibilities):
$\frac{n!}{y_1! \, y_2! \cdots y_6!}$
Example: rolling each die face once in 6 trials:
$\frac{6!}{1! \, 1! \, 1! \, 1! \, 1! \, 1!} = 720$

Properties of multinomials
Dependency among the counts. For example, if a die is thrown and it is not a 1, 2, 3, 4, or 5, then it must be a 6.

Face   Number  Variable
1      10      y1
2      11      y2
3      13      y3
4      9       y4
5      8       y5
6      9       y6
TOTAL  60      n

Multinomial pdf
The probability of an outcome or series of outcomes:
$f(y_1, \dots, y_m \mid n, p_1, \dots, p_m) = \frac{n!}{y_1! \cdots y_m!} \prod_{i=1}^{m} p_i^{y_i}$

Die example 1
The probability of rolling a fair die (p_i = 1/6) six times (n = 6) and turning up each face only once (y_i = 1) is:
$\frac{6!}{1!^6} \left(\frac{1}{6}\right)^6 = \frac{720}{46656} \approx 0.0154$

Die example 1 - dependency
Because the counts must sum to n, once five of the six face counts are known, the sixth is determined.

Example 2
Another example: the probability of rolling two 2s, three 3s, and one 4 in six rolls is:
$\frac{6!}{2! \, 3! \, 1!} \left(\frac{1}{6}\right)^6 = \frac{60}{46656} \approx 0.0013$
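Both die examples can be verified with a short Python sketch (the helper multinom_pmf is our own, not from the slides):

```python
from math import factorial, prod

def multinom_pmf(counts, probs):
    """Multinomial probability of the given per-category counts."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(c) for c in counts)
    return coef * prod(p**c for c, p in zip(counts, probs))

# Example 1: each face exactly once in six rolls of a fair die
print(multinom_pmf([1, 1, 1, 1, 1, 1], [1/6] * 6))   # 720/6**6 ~ 0.0154
# Example 2: two 2s, three 3s, one 4 (counts listed for faces 1..6)
print(multinom_pmf([0, 2, 3, 1, 0, 0], [1/6] * 6))   # 60/6**6 ~ 0.0013
```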

Likelihood
As you might have expected, the likelihood of the multinomial is of greater interest to us. We frequently have data (n, y_1...y_m) and are seeking to determine the model (p_1...p_m). The likelihood for our example with the die is:
$L(p_1, \dots, p_6 \mid n, y_1, \dots, y_6) = \frac{n!}{y_1! \cdots y_6!} \prod_{i=1}^{6} p_i^{y_i}$

Log-likelihood
This likelihood has all of the same properties we discussed for the binomial case. We usually solve by maximizing ln(L):
$\ln L = \ln\left(\frac{n!}{y_1! \cdots y_6!}\right) + \sum_{i} y_i \ln p_i$

Log-likelihood
Ignoring the multinomial coefficient (a constant with respect to the p_i):
$\ln L \propto \sum_{i} y_i \ln p_i$
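A brief sketch evaluating this log-likelihood for the die-toss counts from the earlier table; it also checks the standard closed-form MLE, the observed proportions y_i/n:

```python
import numpy as np

# Die-toss counts from the table above (n = 60)
y = np.array([10, 11, 13, 9, 8, 9])

def log_lik(p):
    # Log-likelihood up to the multinomial coefficient (constant in p)
    return np.sum(y * np.log(p))

p_fair = np.full(6, 1 / 6)   # fair-die model
p_mle = y / y.sum()          # closed-form MLE: the observed proportions
print(log_lik(p_fair))       # ~ -107.5
print(log_lik(p_mle))        # ~ -106.7, necessarily at least as large
```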

Presence-absence surveys & multinomials
Procedure:
Select a sample of sites.
Conduct repeated presence-absence surveys at each site (usually temporal replication, sometimes spatial replication).
Record presence or absence of the species during each survey.

Encounter histories for each site & species
Encounter history matrix:
Each row represents a site.
Each column represents a sampling occasion.
On each occasion each species is scored '1' if encountered (captured), '0' if not encountered.
[Table: example encounter history matrix, sites by occasions]

Encounter history - example
For sites sampled on 3 occasions there are 8 (= 2^m = 2^3) possible encounter histories.
10 sites were sampled 3 times (not enough for a good estimate).
1 – detected during survey; 0 – not detected during survey.
A separate encounter history is kept for each species.
[Table: the 8 possible encounter histories and the number of sites (y_i) with each]

Encounter history - example
Each encounter history is a possible outcome, analogous to one face of the die. The data consist of the number of times each encounter history appears (y_i).

Encounter history - example
Each encounter history has an associated probability (P_i), and each p_ij (the detection probability at site i on occasion j) can be different.

Log-likelihood example
To compute the log-likelihood:
Calculate the log of the probability of each encounter history, ln(P_i).
Multiply ln(P_i) by the number of times that history was observed (y_i).
Sum the products:
$\ln L = \sum_i y_i \ln P_i$
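A minimal sketch of this three-step recipe; the history counts and probabilities below are hypothetical, chosen only so that the P_i sum to 1:

```python
import numpy as np

# Hypothetical counts of each 3-occasion encounter history (y_i)
# and model probabilities (P_i) for each history.
histories = ["111", "110", "101", "011", "100", "010", "001", "000"]
y = np.array([2, 1, 1, 1, 1, 1, 1, 2])   # 10 sites in all
P = np.array([0.06, 0.08, 0.08, 0.08, 0.12, 0.12, 0.12, 0.34])

assert np.isclose(P.sum(), 1.0)   # history probabilities must sum to one
log_lik = np.sum(y * np.log(P))   # ln L = sum_i y_i ln(P_i)
print(log_lik)
```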

Link function in binomial (recap)
As before, substituting the logit link for p in the binomial likelihood yields logistic regression:
$p_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$

Multinomial with link function
Substitute the logit link for the p_i in the multinomial likelihood, just as in the binomial case.

But wait a minute! Is Pr(Occupancy) = Pr(Encounter)?

Probability of encounter includes both detection and use (occupancy). Occupancy analysis estimates each, thus providing conditional estimates of use of sites. Sites with at least one detection are known to be used; sites with all-zero encounter histories were either unoccupied (absent) or occupied but not detected.