Likelihood Methods in Ecology
Lecture 1 – Likelihood Estimation
C. D. Canham
Likelihood Methods in Ecology
May 29 – June 2, 2017, Fort Collins, Colorado
Instructors: Charles Canham and Patrick Martin
Daily Schedule
Morning:
8:30 – 9:30 Lecture
9:30 – 10:30 Case Study and Discussion
10:30 – 12:00 Lab
Afternoon:
1:00 – 2:00 Lecture
2:00 – 5:00 Lab
Please coordinate rides from your dorm with others…
Course Outline: Statistical Inference using Likelihood
Principles and practice of maximum likelihood estimation
Know your data – choosing appropriate likelihood functions
Formulate statistical models as alternate hypotheses
Find the ML estimates of the parameters of your models
Compare alternate models and choose the most parsimonious
Evaluate individual models
Advanced topics
Likelihood is much more than a set of statistical methods... (it can completely change the way you ask and answer questions…)
Lecture 1: An Introduction to Likelihood Estimation
Probability and probability density functions
Maximum likelihood estimates (versus traditional "method of moments" estimates)
Statistical inference
Classical "frequentist" statistics: limitations and mental gyrations...
The "likelihood" alternative: basic principles and definitions
Model comparison as a generalization of hypothesis testing
A simple definition of probability for discrete events...
"...the ratio of the number of events of type A to the total number of all possible events (outcomes)..."
The enumeration of all possible outcomes is called the sample space (S).
If there are n possible outcomes in a sample space S, and m of those are favorable for event A, then the probability of event A is given as P{A} = m/n.
Probability defined more generally...
Consider an outcome X from some process that has a set of possible outcomes S:
If X and S are discrete, then P{X} = X/S
If X is continuous, then the probability has to be defined in the limit:
P{x ≤ X ≤ x + Δx} / Δx → g(x) as Δx → 0
where g(x) is a probability density function (PDF).
Lecture 2 will go into much more detail about PDFs…
The Normal Probability Density Function (PDF)
g(x) = (1 / (σ√(2π))) · exp( −(x − μ)² / (2σ²) )
where μ = mean and σ² = variance

Properties of a PDF: (1) 0 < prob(x) < 1; (2) ∫ prob(x) dx = 1
Just because a function is commonly used as a PDF does not mean that it satisfies the critical properties of a proper PDF in all cases. In the labs we will go over examples of situations where the function fails the test of a true PDF.
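As a quick numerical check of those properties (a sketch using R's built-in functions, not part of the course labs): dnorm evaluates the normal density, and integrate can confirm property (2).

# Normal density (mean 0, sd 1) evaluated at a few points
dnorm(c(-1, 0, 1))

# Property (2): the density should integrate to 1 over its entire support
integrate(dnorm, lower = -Inf, upper = Inf)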
Common PDFs...
For continuous data: Normal, Lognormal, Gamma
For discrete data: Poisson, Binomial, Multinomial, Negative Binomial
As an example, the Poisson PDF (shown as a graph on the original slide) typically applies to count data (small integer values).
See McLaughlin (1993) "A compendium of common probability distributions" in the reading list.
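As a stand-in for that graph (a minimal sketch with an arbitrary mean of 3, not the slide's actual figure):

# Poisson probabilities for counts 0..10, with an arbitrary mean (lambda) of 3
counts <- 0:10
plot(counts, dpois(counts, lambda = 3), type = "h",
     xlab = "count", ylab = "probability")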
Why are PDFs important? Answer: because they are used to calculate likelihood… (And in that case, they are called “likelihood functions”)
Statistical “Estimators”
A statistical estimator is a function applied to a sample of data and used to estimate an unknown population parameter (an "estimate" is just the result of applying an "estimator" to a sample).
Properties of Estimators
Some desirable properties of "point estimators" (functions to estimate a fixed parameter):
Bias: if the average error is zero, the estimate is unbiased
Efficiency: an estimate with the minimum variance is the most efficient (note: the most efficient estimator is often biased)
Consistency: as sample size increases, the probability of the estimate being close to the parameter increases
Asymptotically normal: a consistent estimator whose distribution around the true parameter θ approaches a normal distribution with standard deviation shrinking in proportion to 1/√n as the sample size n grows
Maximum likelihood (ML) estimates versus method of moments (MOM) estimates
Bottom line: MOM was born in the time before computers and was OK; ML needs computing power, but has more desirable properties…
Doing it MOM's way: Central Moments
The sample mean is the first moment, x̄ = (1/n) Σ xᵢ, and the k-th central moment is (1/n) Σ (xᵢ − x̄)ᵏ; the second central moment, (1/n) Σ (xᵢ − x̄)², is the MOM estimate of the variance.
What’s wrong with MOM’s way?
Nothing, if all you are interested in is calculating properties of your sample…
But MOM's formulas are generally not the best way¹ to infer estimates of the statistical properties of the population from which the sample was drawn…
For example: the population variance (because the second central moment is a biased underestimate of the population variance)
¹ …in the formal terms of bias, efficiency, consistency, and asymptotic normality
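A small illustration of that bias, using simulated data rather than anything from the course materials: the second central moment divides by n, whereas R's var() uses the unbiased divisor n − 1.

set.seed(1)                        # for reproducibility
x <- rnorm(10, mean = 5, sd = 2)   # a small simulated sample
n <- length(x)

sum((x - mean(x))^2) / n   # second central moment = MOM estimate of the variance
var(x)                     # unbiased sample variance (divides by n - 1)

# The ratio is exactly (n - 1)/n, so the MOM estimate is systematically smaller
(sum((x - mean(x))^2) / n) / var(x)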
The Maximum Likelihood alternative…
Going back to PDFs: in plain language, a PDF allows you to calculate the probability that an observation will take on a value (x), given the underlying (true?) parameters of the population.
But there's a problem…
The PDF defines the probability of observing an outcome (x), given that you already know the true population parameter (θ).
But we want to generate an estimate of θ, given our data (x).
And, unfortunately, the two are not identical:
P(x | θ) ≠ P(θ | x)
Fisher and the concept of “Likelihood”...
The "Likelihood Principle"
Emphasize the difference between likelihood (of the model, given the data) and probability (of the data, given the model).
In plain English: "The likelihood (L) of the parameter estimates (θ), given a sample (x), is proportional to the probability of observing the data, given the parameters...":
L(θ | x) ∝ P(x | θ)
{and this probability is something we can calculate, using the appropriate underlying probability model (i.e. a PDF)}
R. A. Fisher (1890–1962)
"Likelihood and Probability in R. A. Fisher's Statistical Methods for Research Workers" (John Aldrich) is a good summary of the evolution of Fisher's ideas on probability, likelihood, and inference…
Contains links to PDFs of Fisher's early papers…
A second page shows the evolution of his ideas through changes in successive editions of Fisher's books…
Fisher revolutionized both statistics and genetics in the 20th century. His statistical contributions began while still an undergraduate at Cambridge. (The slide's photo shows Fisher at age 22.)
Calculating Likelihood and Log-Likelihood for Datasets
From basic probability theory: if two events (A and B) are independent, then P(A,B) = P(A)·P(B).
More generally, for i = 1..n independent observations and a vector X of observations (xᵢ), the likelihood is
L(θ | X) = Πᵢ g(xᵢ | θ)
where g(xᵢ | θ) is the appropriate PDF, specifying the probability of observing any given data point given the parameters (θ) of the scientific model and the probability model. Thus g(x) is the "probability model", and its form depends on the nature of the data (i.e. you need to use the appropriate PDF).
But logarithms are easier to work with, so...
ln L(θ | X) = Σᵢ ln[ g(xᵢ | θ) ]
A simple example…
A sample of 10 observations… Assume they are normally distributed, with an unknown population mean and standard deviation. What is the (log) likelihood that the mean is 4.5 and the standard deviation is 1.2?
Do this in R… using dnorm
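The 10 observations themselves are not reproduced in this transcript, so the following is a sketch with hypothetical values; the calculation is just the sum of the log densities returned by dnorm:

# Hypothetical sample of 10 observations (stand-ins for the slide's data)
obs <- c(3.2, 4.8, 5.1, 4.0, 4.6, 5.9, 3.7, 4.3, 5.4, 4.9)

# Log-likelihood that the sample comes from a Normal with mean 4.5 and sd 1.2
sum(dnorm(obs, mean = 4.5, sd = 1.2, log = TRUE))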
Likelihood “Surfaces”
The variation in likelihood for any given set of parameter values defines a likelihood "surface"...
Consider the simpler regression model y = ax (+ error). If you calculate the probability of observing the actual values recorded in your sample for each of a range of possible parameter estimates of a (and an estimate of the other "shape" parameters of the PDF, such as the variance of the normal PDF), the set of probabilities for the different parameter values will form a "likelihood surface"… i.e. the likelihood of any given parameter value, given the data…
For a model with just 1 parameter, the surface is simply a curve (aka a "likelihood profile").
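A sketch of that idea with made-up data (not the course's example): for y = ax with normally distributed error, evaluate the summed log-likelihood over a grid of candidate values of a and of the residual standard deviation; the result is a likelihood surface.

# Made-up data for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.3, 8.2, 9.7, 12.4, 13.8, 16.1)

# Log-likelihood of slope a and residual sd, assuming y ~ Normal(a * x, sd)
loglik <- function(a, sd) sum(dnorm(y, mean = a * x, sd = sd, log = TRUE))

a_grid  <- seq(1, 3, 0.01)
sd_grid <- seq(0.1, 2, 0.01)
surface <- outer(a_grid, sd_grid, Vectorize(loglik))

# Contour plot of the likelihood surface; fixing sd at its ML value and
# plotting along a alone gives the one-parameter curve (a likelihood profile)
contour(a_grid, sd_grid, surface, xlab = "a", ylab = "residual sd")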
“Support” and “Support Limits”
Log-likelihood = "Support" (Edwards 1992)
Note that in contrast to a traditional confidence interval, support intervals are not explicit functions of sample size (i.e. they reflect variance in the data alone, not the sample size).
The steepness of the peak of the likelihood surface determines the breadth of the support interval...
Note that the steepness of the surface away from the MLE is also a measure of the variance (Fisher's information).
Another (still somewhat trivial) example…
MOM vs ML estimates of the probability of survival for a population:
Data: a quadrat in which 16 of 20 seedlings survived during a census interval. (Note that in this case, the quadrat is the unit of observation…, so sample size = 1.)
In this simple example, both the MOM and ML estimates are identical (0.8). But what would you use as the estimate of the probability of survival for a set of quadrats, each with different numbers of seedlings initially alive, and in which different numbers died during the census interval?
i.e. Given N = 20, x = 16, what is p?

p <- seq(0, 1, 0.005)     # candidate values of the survival probability p
lh <- dbinom(16, 20, p)   # binomial likelihood of observing 16 survivors out of 20
plot(p, lh)               # plot the likelihood profile for p
p[which.max(lh)]          # ML estimate of p (0.8)
A more interesting example
# Create some data (5 quadrats)
N <- c(11, 14, 8, 22, 50)   # seedlings initially alive in each quadrat
x <- c(8, 7, 5, 17, 35)     # seedlings surviving the census interval
x/N                         # observed proportion surviving in each quadrat
# [1] 0.7272727 0.5000000 0.6250000 0.7727273 0.7000000

# Calculate the log-likelihood for each candidate probability of survival
p <- seq(0, 1, 0.005)
log_likelihood <- rep(0, length(p))
for (i in 1:length(p)) {
  log_likelihood[i] <- sum(log(dbinom(x, N, p[i])))
}

# Plot the likelihood profile
plot(p, log_likelihood)

# What probability of survival maximizes the log-likelihood?
p[which.max(log_likelihood)]
# [1] 0.685

# How does this compare to the average of the proportions across the 5 quadrats?
mean(x/N)
# [1] 0.665
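As a check on that result (a standard property of the binomial model, not additional course code): when all quadrats share a single survival probability, the ML estimate is the pooled proportion of survivors, which the 0.005 grid rounds to 0.685.

# Pooled proportion of survivors, using the same x and N as above
sum(x) / sum(N)   # 72/105 ≈ 0.686; the nearest grid value is 0.685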
Focus in on the MLE…

# what is the log-likelihood of the MLE?
max(log_likelihood)

Things to note about log-likelihoods:
They should always be negative! (if not, you have a problem with your likelihood function)
The absolute magnitude of the log-likelihood increases as sample size increases
An example with continuous data…
The normal PDF:
g(x) = (1 / (σ√(2π))) · exp( −(x − μ)² / (2σ²) )
where x = the observed value, μ = mean, and σ² = variance

In R: dnorm(x, mean = 0, sd = 1, log = FALSE)

> dnorm(2, 2.5, 1)
[1] 0.3520653
> dnorm(2, 2.5, 1, log = TRUE)
[1] -1.043939

Problem: now there are TWO unknowns needed to calculate the likelihood (the mean and the variance)!
Solution: treat the variance just like another parameter in the model, and find the ML estimate of the variance just like you would any other parameter… (this is exactly what you'll do in the lab this morning…)
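A minimal sketch of that solution (with hypothetical data, not the lab's dataset): maximize the summed log-likelihood over both parameters at once, here with optim(), estimating log(sd) so the standard deviation stays positive.

# Hypothetical data
obs <- c(3.2, 4.8, 5.1, 4.0, 4.6, 5.9, 3.7, 4.3, 5.4, 4.9)

# Negative log-likelihood as a function of both parameters (optim minimizes)
negloglik <- function(par) {
  mu <- par[1]
  sd <- exp(par[2])   # optimize log(sd) so that sd is always positive
  -sum(dnorm(obs, mean = mu, sd = sd, log = TRUE))
}

fit <- optim(c(mean(obs), log(sd(obs))), negloglik)
c(mean = fit$par[1], sd = exp(fit$par[2]))   # joint ML estimates of both parameters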