
1 Elements of Statistical Inference
Theme of the workshop (and book): analyzing HMs using both classical and Bayesian methods, the "dual inference paradigm".
Topics covered here:
– Classical inference (likelihood, frequentist)
– Bayesian inference (posterior distribution)
– Implementation in R (both MLE and MCMC)
– Case study: logistic regression (not a HM)
– Case study: occupancy model (a HM)

2 Inference for statistical models
Parametric inference: explicit probability assumptions about the data. Inference proceeds assuming the model is truth (not an approximation to truth, but actual truth).
Two flavors:
– Classical inference
– Bayesian inference

3 Bayesian vs. Classical/Frequentist
Classical inference
– Likelihood estimation ("method of maximum likelihood")
– Frequentists use a relative frequency interpretation in which procedures are evaluated with respect to repeated realizations of the data. Probability is used to characterize how well procedures do, but not uncertainty about model parameters.
Bayesian inference
– Posterior inference: requires specification of a prior distribution
– Bayesians make probability statements directly about model parameters, conditional on the single data set that you have

4 Notation

5 Classical inference
You probably know this, but we review the basic ideas and show some technical elements in R to demystify what unmarked is doing.

6 What is the likelihood?
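The slide body (presumably an equation) is not transcribed. A minimal statement, consistent with the example on the next slide:

$$
L(\theta \mid y) \;=\; f(y \mid \theta),
$$

i.e., the joint probability (density) of the observed data, read as a function of the parameter $\theta$ with $y$ held fixed. For the two independent binomial counts on the next slide, $L(p \mid y_1, y_2) = \prod_{i=1}^{2} \binom{K}{y_i} p^{y_i} (1-p)^{K-y_i}$ with $K = 10$.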

7 Example 1: Two independent binomial counts
# 2 binomial observations
y <- rbinom(2, size=10, prob=0.5)

# The joint distribution function. As a function of y it gives
# the probability of any two values of y1 and y2.
jointdis <- function(data, K, p){
  prod(dbinom(data, size=K, prob=p))
}

(jointdis(y, K=10, p=0.5))   # also is the likelihood of p = 0.5 for the
                             # given data, but it is NOT a probability for p

# Evaluate the likelihood for a grid of values of "p"
p.grid <- seq(0.01, 0.99, , 200)
likelihood <- rep(NA, 200)
for(i in 1:200){
  likelihood[i] <- jointdis(y, K=10, p=p.grid[i])
}

# Plot the likelihood
plot(p.grid, likelihood, xlab="p", ylab="likelihood")

8

9 Numerical maximization of the likelihood
Numerical maximization of the likelihood was a HUGE change in applied statistics; its importance cannot be overstated.
– No need for formulas (explicit estimators)
– Variances are evaluated numerically
– "Marginal likelihood" can be computed by integrating random effects out
– No need for a statistician to do things for you

10 Properties of MLEs
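The slide body is not transcribed. The property relied on later (slide 18's asymptotic SEs) is, in a hedged sketch: under standard regularity conditions the MLE is consistent and asymptotically normal,

$$
\hat{\theta}_{\text{MLE}} \;\dot\sim\; \text{Normal}\!\left(\theta,\; I(\theta)^{-1}\right),
$$

where $I(\theta)$ is the Fisher information. This is why the inverse of the Hessian of the negative log-likelihood, evaluated at the MLE, serves as the asymptotic variance-covariance matrix.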

11 Other elements of classical inference

12 Parametric bootstrapping
1. Obtain MLEs for a model
2. Simulate data using those MLEs
3. Obtain MLEs for the simulated data
4. Repeat many times
5. Use the distribution of the MLEs from the simulated data sets as an empirical estimate of the sampling distribution

13 Example 2: logistic regression

14 Ordinary logistic regression in R
# -------------------------- Simulate data ----------------------------
# Create a covariate called vegHt
nSites <- 100
set.seed(2014)                 # so that we all get the same values of vegHt
vegHt <- runif(nSites, 1, 3)   # uniform from 1 to 3

# Suppose that occupancy probability increases with vegHt.
# The relationship is described by an intercept of -3 and
# a slope parameter of 2 on the logit scale.
# plogis is the inverse-logit (constrains us back to the [0,1] scale).
psi <- plogis(-3 + 2*vegHt)

# Now we go to 100 sites and observe presence or absence.
# Actually, let's just simulate the data.
z <- rbinom(nSites, 1, psi)

15 General strategy for likelihood estimation
Express the negative log-likelihood as an R function and then use the standard function optim (or nlm) to minimize it:

# Definition of negative log-likelihood
negLogLike <- function(beta, y, x) {
  beta0 <- beta[1]
  beta1 <- beta[2]
  psi <- plogis(beta0 + beta1*x)        # inverse-logit
  likelihood <- psi^y * (1-psi)^(1-y)   # same as:
  # likelihood <- dbinom(y, 1, psi)
  return(-sum(log(likelihood)))
}

# Look at the (negative) log-likelihood for 2 parameter sets
negLogLike(c(0,0), y=z, x=vegHt)
negLogLike(c(-3,2), y=z, x=vegHt)   # Lower is better!

16
# Let's minimize it formally by function minimisation
starting.values <- c(beta0=0, beta1=0)
opt.out <- optim(starting.values, negLogLike, y=z, x=vegHt, hessian=TRUE)
(mles <- opt.out$par)   # MLEs are pretty close to truth
    beta0     beta1
-2.539793  1.617025

# Alternative 1: Brute-force grid search for MLEs
mat <- as.matrix(expand.grid(seq(-10,10,0.1), seq(-10,10,0.1)))   # can vary resolution
nll <- array(NA, dim = nrow(mat))
for (i in 1:nrow(mat)){
  nll[i] <- negLogLike(mat[i,], y = z, x = vegHt)
}
which(nll == min(nll))
mat[which(nll == min(nll)),]

# Produce a likelihood surface, shown in Fig. 2-2
library(raster)
r <- rasterFromXYZ(data.frame(x = mat[,1], y = mat[,2], z = nll))
mapPalette <- colorRampPalette(rev(c("grey", "yellow", "red")))
plot(r, col = mapPalette(100), main = "Negative log-likelihood",
     xlab = "Intercept (beta0)", ylab = "Slope (beta1)")
contour(r, add = TRUE, levels = seq(50, 2000, 100))

17
# Alternative 2: Use the canned R function glm as a shortcut
(fm <- glm(z ~ vegHt, family = binomial)$coef)

# Add 3 sets of MLEs into the plot
# 1. Add MLE from function minimisation
points(mles[1], mles[2], pch = 1, lwd = 2)
abline(mles[2], 0)                        # Put a line through the Slope value
lines(c(mles[1], mles[1]), c(-10, 10))

# 2. Add MLE from grid search
points(mat[which(nll == min(nll)),1], mat[which(nll == min(nll)),2],
       pch = 1, lwd = 2)

# 3. Add MLE from glm function
points(fm[1], fm[2], pch = 1, lwd = 2)
# Note they are essentially all the same

18 Asymptotic variance/SE
The hessian=TRUE option in the call to optim produces the Hessian matrix in the returned list opt.out, and so we can obtain the asymptotic standard errors (ASE) for the two parameters by doing this:

Vc <- solve(opt.out$hessian)   # Get variance-covariance matrix
ASE <- sqrt(diag(Vc))          # Extract asymptotic SEs
print(ASE)
    beta0     beta1
0.8687444 0.4436064

19 Summary
# Make a table with estimates, SEs, and 95% CIs
mle.table <- data.frame(Est = mles,
                        ASE = sqrt(diag(solve(opt.out$hessian))))
mle.table$lower <- mle.table$Est - 1.96*mle.table$ASE
mle.table$upper <- mle.table$Est + 1.96*mle.table$ASE
mle.table
            Est       ASE      lower      upper
beta0 -2.539793 0.8687444 -4.2425320 -0.8370538
beta1  1.617025 0.4436064  0.7475564  2.4864933

# Plot the actual and estimated response curves
plot(vegHt, z, xlab="Vegetation height", ylab="Occurrence probability")
plot(function(x) plogis(-3 + 2*x), 1.1, 3, add=TRUE, lwd=2)   # true values used to simulate
plot(function(x) plogis(mles[1] + mles[2]*x), 1.1, 3, add=TRUE, lwd=2, col="blue")
legend(1.1, 0.9, c("Actual", "Estimate"), col=c("black", "blue"), lty=1, lwd=2)

20

21 Work session
– Different ways of obtaining MLEs: grid search, optim(), glm()
– Get the asymptotic SE (ASE)
– Plot a fitted response curve
– Bootstrap

22 Bootstrapping
nboot <- 1000   # Obtain 1000 bootstrap samples
boot.out <- matrix(NA, nrow=nboot, ncol=3)
dimnames(boot.out) <- list(NULL, c("beta0", "beta1", "psi.bar"))

for(i in 1:nboot){
  # Simulate data
  psi <- plogis(mles[1] + mles[2]*vegHt)
  z <- rbinom(nSites, 1, psi)
  # Fit model
  tmp <- optim(mles, negLogLike, y=z, x=vegHt, hessian=TRUE)$par
  psi.mean <- plogis(tmp[1] + tmp[2]*mean(vegHt))
  boot.out[i,] <- c(tmp, psi.mean)
}

23 Bootstrapping
SE.boot <- sqrt(apply(boot.out, 2, var))   # Get bootstrap SEs
names(SE.boot) <- c("beta0", "beta1", "psi.bar")

# 95% bootstrapped confidence intervals
apply(boot.out, 2, quantile, c(0.025, 0.975))
          beta0     beta1   psi.bar
2.5%  -4.490565 0.8379983 0.5728077
97.5% -0.978901 2.5974839 0.7828377

# Bootstrap SEs
SE.boot
     beta0      beta1    psi.bar
0.89156946 0.45943765 0.05428008

# Compare these with the ASEs for the regression parameters
mle.table
            Est       ASE      lower      upper
beta0 -2.539793 0.8687444 -4.2425320 -0.8370538
beta1  1.617025 0.4436064  0.7475564  2.4864933

24 Part II: Hierarchical Models

25 Modeling species occurrence: Occupancy models

26 Modeling species abundance from counts: The N-mixture model
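The model itself is not transcribed on this slide. A hedged sketch of the standard N-mixture formulation (site index $i$, visit index $j$):

$$
N_i \sim \text{Poisson}(\lambda_i), \qquad y_{ij} \mid N_i \sim \text{Binomial}(N_i, p),
$$

so abundance $N_i$ plays the role of the latent state, analogous to $z_i$ in the occupancy model.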

27 Likelihood inference for hierarchical models

28 Example: Occupancy model
– Observation model: $y_i \sim \text{Binomial}(J, p \cdot z_i)$
– State model: $z_i \sim \text{Bernoulli}(\psi_i)$, with $\text{logit}(\psi_i) = \beta_0 + \beta_1 x_i$
– What is the marginal likelihood for $y$?

29 Example: Occupancy model
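The derivation on this slide is not transcribed. The marginal likelihood implied by the R function negLogLikeocc on the next slide is

$$
\Pr(y_i) \;=\; \text{Binomial}(y_i \mid J, p)\,\psi_i \;+\; I(y_i = 0)\,(1 - \psi_i),
$$

i.e., the latent state $z_i$ is summed out of the joint distribution, and the negative log-likelihood is $-\sum_i \log \Pr(y_i)$.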

30 Doing it in R
nSites <- 100
vegHt <- runif(nSites, 1, 3)   # uniform from 1 to 3
psi <- plogis(-3 + 2*vegHt)

# Now we simulate true presence/absence for 100 sites
z <- rbinom(nSites, 1, psi)

# Now generate observations
p <- 0.6
J <- 3   # sample each site 3 times
y <- rbinom(nSites, J, p*z)

# This is the negative log-likelihood
negLogLikeocc <- function(beta, y, x, J) {
  beta0 <- beta[1]
  beta1 <- beta[2]
  p <- plogis(beta[3])
  psi <- plogis(beta0 + beta1*x)
  marg.likelihood <- dbinom(y, J, p)*psi + ifelse(y==0, 1, 0)*(1-psi)
  return(-sum(log(marg.likelihood)))
}

starting.values <- c(beta0=0, beta1=0, logitp=0)
opt.out <- optim(starting.values, negLogLikeocc, y=y, x=vegHt, J=J, hessian=TRUE)

31 N-mixture model

32 Continuous case: numerical integration
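The slide body is not transcribed. The integral evaluated numerically in the snowshoe hare code on the next slide has the form (a reconstruction from that code):

$$
\Pr(y = k) \;=\; \int_{-\infty}^{\infty} \text{Binomial}\!\left(k \mid J, \text{logit}^{-1}(x)\right) \text{Normal}(x \mid \mu, \sigma)\, dx,
$$

which R's integrate() approximates for each $k = 0, 1, \ldots, J$.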

33 Snowshoe hare data, see Royle and Dorazio (2008, chapter 6)
# Frequencies of individuals captured 0, 1, ..., J = 14 times
nx <- c(14, 34, 16, 10, 4, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0)
nind <- sum(nx)
J <- 14

Mhlik <- function(parms){
  mu <- parms[1]
  sigma <- exp(parms[2])
  il <- rep(NA, J+1)
  # Integrate the binomial likelihood over the normal random effect for each k
  for(k in 0:J){
    il[k+1] <- integrate(
      function(x){ dbinom(k, J, plogis(x)) * dnorm(x, mu, sigma) },
      lower=-Inf, upper=Inf)$value
  }
  -1*sum(nx*log(il))
}

tmp <- nlm(Mhlik, c(-1, -1), hessian=TRUE)
sqrt(diag(solve(tmp$hessian)))

34 Part III: Bayesian inference

35 Bayes’ rule
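The equation itself is not transcribed. The standard statement, in the bracket notation common in this literature:

$$
[\theta \mid y] \;=\; \frac{[y \mid \theta]\,[\theta]}{[y]}, \qquad [y] = \int [y \mid \theta]\,[\theta]\, d\theta,
$$

i.e., posterior $\propto$ likelihood $\times$ prior.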

36 Bayes’ rule, continued

37

38 Bayesian inference

39 The Posterior distribution

40 Computing the posterior distribution
1. Do the math: recognize the mathematical form of the posterior as a standard named distribution whose moments we can compute.
2. Monte Carlo approximation: draw samples from the posterior distribution and quantify posterior features by summarizing the samples. Markov chain Monte Carlo (MCMC).
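For option 1, a worked conjugate example consistent with the binomial data and the beta(1,1) prior used in the MCMC illustration below: with $y_i \sim \text{Binomial}(K, p)$ for $i = 1, 2$ and prior $p \sim \text{Beta}(a, b)$,

$$
[p \mid y] \;\propto\; p^{y_1 + y_2} (1-p)^{2K - y_1 - y_2} \cdot p^{a-1} (1-p)^{b-1},
$$

which is recognized as a $\text{Beta}(y_1 + y_2 + a,\; 2K - y_1 - y_2 + b)$ distribution, so no simulation is needed in this case.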

41 Computing the posterior distribution

42 Computing the posterior distribution: MCMC

43 How to do Bayesian analysis: MCMC

44 The Metropolis Algorithm
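The algorithm steps are not transcribed on this slide; a sketch consistent with the R implementation on slide 52:
1. From the current value $p$, draw a candidate $p'$ from a proposal distribution (the code on slide 52 uses an independent Uniform(0,1) draw, for which the proposal ratio is 1).
2. Compute the ratio of unnormalized posterior values, $r = \dfrac{[y \mid p']\,[p']}{[y \mid p]\,[p]}$.
3. Accept $p'$ with probability $\min(1, r)$; otherwise keep the current $p$.
4. Record the current value and repeat; the recorded values form a Markov chain whose stationary distribution is the posterior.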

45

46

47 Illustration

48 Illustration of MCMC using Metropolis Algorithm
# 2 binomial observations
y <- rbinom(2, size=10, prob=0.5)

# The joint distribution function. As a function of the data it gives
# the probability of any two values of data = c(y1, y2).
jointdis <- function(data, K, p){
  prod(dbinom(data, size=K, prob=p))
}

(jointdis(y, K=10, p=0.5))   # also happens to be the likelihood of the value p = 0.5
                             # for the given data, but it is NOT a probability for p

# Evaluate the likelihood for a grid of values of "p"
p.grid <- seq(0.1, 0.9, , 200)
likelihood <- rep(NA, 200)
for(i in 1:200){
  likelihood[i] <- jointdis(y, K=10, p=p.grid[i])
}

# Plot the likelihood
plot(p.grid, likelihood, xlab="p", ylab="likelihood")

49 Illustration of MCMC using Metropolis Algorithm

50

51
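The posterior() function called in the code on the next slide is defined on the preceding slides, which are not transcribed here. A minimal sketch consistent with the beta(a, b) prior and the jointdis() function from slide 48:

# Unnormalized posterior: joint binomial distribution of the data times the beta(a, b) prior
# (a = b = 1 gives the uniform prior used on the next slide)
posterior <- function(y, K, p, a, b){
  prod(dbinom(y, size=K, prob=p)) * dbeta(p, a, b)
}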

52
## Do 100,000 MCMC iterations using the Metropolis algorithm
## Assume a uniform prior, which is beta(1,1)
mcmc.iters <- 100000
out <- rep(NA, mcmc.iters)

# starting value
p <- 0.2

for(i in 1:mcmc.iters){
  # Use a uniform candidate generator. This is not efficient.
  p.cand <- runif(1, 0, 1)
  r <- posterior(y, K=10, p=p.cand, a=1, b=1) / posterior(y, K=10, p=p, a=1, b=1)
  # Generate a uniform r.v. and compare with "r"; this imposes the
  # correct probability of acceptance.
  if(runif(1) < r){   # This is how you "do something" with probability r
    p <- p.cand
  }
  out[i] <- p
}
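Typical summaries of the resulting chain (a sketch; the burn-in length here is arbitrary):

# Summarize the posterior samples after discarding a burn-in
post <- out[-(1:1000)]
mean(post)                        # posterior mean
quantile(post, c(0.025, 0.975))   # 95% credible interval
hist(post, xlab="p", main="Posterior of p")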

53 Likelihood vs. posterior

54 The posterior of a function of a model parameter
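The slide body is not transcribed; the usual point is that the posterior of any derived quantity is obtained by simply transforming the MCMC samples. For example, for the odds p/(1-p), using the samples in out from slide 52 (a sketch):

# Posterior of a derived quantity: apply the function to each posterior sample
odds <- out/(1 - out)
mean(odds)
quantile(odds, c(0.025, 0.975))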

55 Remarks on the Metropolis Algorithm
Heuristic: the algorithm has us simulate candidate values somehow, even arbitrarily, and then accept values that have higher posterior probability. The long-run frequency of "accepted" values is that of the target posterior density!
Note: if the prior is constant, this MCMC calculation is based on repeated evaluations of the likelihood only. So, if you can write a function to do MLE, you can also do MCMC.

56 Summary thoughts on Bayesian/classical inference

57 Idealized Structure of Workshop/Book
– Introduction to a class of models
– Likelihood analysis of the models in unmarked, stressing a consistent work flow and the ease of doing standard things like prediction and model selection
– Bayesian analysis in BUGS
– Illustration of a type of model that can't be done (easily, or in unmarked) using likelihood methods

