Lecture 4: Model Selection and Multimodel Inference

Presentation transcript:

Lecture 4: Model Selection and Multimodel Inference (C. D. Canham)

Topics
Model selection: how do you choose between (and compare) alternate models?
Multimodel inference: how do you combine information from more than one model?

Comparing alternative models
We don’t ask “Is the model right or wrong?” We ask “Do the data support one model more than a competing model?” Strength of evidence (support) for a model is relative: relative to other models (as models improve, support may change) and relative to the data at hand (as the data improve, support may change).

But is likelihood enough? The challenge of parsimony
The importance of seeking simple answers... “It will not be sufficient, when faced with a mass of observations, to plead special creation, even though, as we shall see, such a hypothesis commands a higher numerical likelihood than any other.” (Edwards, 1992, pg. 1, in explaining the need for a rigorous basis for scientific inference, given uncertainty in nature...) Edwards delves into likelihood precisely because he feels that the concept of probability is too limited to serve as the sole basis for scientific inference. His comment on special creation commanding the highest numerical likelihood can be put in the context of a “full” model, in which every observation is described by a model with n parameters for n observations: everything just is exactly as the creator desired... It is exactly this line of reasoning that led Occam to propose his razor...

Models, Truth, and “Full Reality” (the Burnham and Anderson view...)
“We believe that “truth” (full reality) in the biological sciences has essentially infinite dimension, and hence ... cannot be revealed with only ... finite data and a “model” of those data... We can only hope to identify a model that provides a good approximation to the data available.” (Burnham and Anderson 2002, pg. 20)

The “full” model
Everything is the way it is because it is... In statistical terms, this is simply a model with as many parameters as observations, i.e. xi = θi. This will always be the model with the highest likelihood (but it won’t be the most parsimonious)...

Parsimony and Ockham’s razor...
William of Ockham (1285-1349): “Pluralitas non est ponenda sine neccesitate” (“entities should not be multiplied unnecessarily”). In our context: if the likelihood of two different models, given the observed data, is similar, Ockham’s razor would favor the model that has the fewest necessary parts... “Parsimony: ... 2 : economy in the use of means to an end; especially : economy of explanation in conformity with Occam's razor” (Merriam-Webster Online Dictionary)

A more general framework for model comparison: Information theory
“Reality” = “Truth” = unknowable (or at least too much trouble to find...). Models are approximations of reality, and we’d like to know how “close” they are to reality... The “distance” between a model and reality is defined by the Kullback-Leibler information (K-L distance). Unfortunately, K-L distance can only be directly computed in hypothetical cases where reality is known. See Chapter 2 of Burnham and Anderson for discussion and details...

How much information is lost when using a simple model to approximate reality? The answer is the Kullback-Leibler distance, which is a measure of this information loss. The K-L distance itself is generally unknowable, but Akaike provided a way to identify the model that minimizes it: Akaike’s Information Criterion (AIC). The best model is then the one that loses the smallest amount of information while using the smallest number of parameters.
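As a purely hypothetical illustration of the point above, the short Python sketch below declares a known “truth” and measures the information lost by two simpler approximating models. The distributions, parameter values, and integration limits are invented for illustration, and SciPy is assumed to be available.

```python
from scipy import stats
from scipy.integrate import quad

# Hypothetical case where "reality" is known: truth is declared to be a gamma
# distribution, and we ask how much information is lost by two approximations.
truth = stats.gamma(a=4, scale=2)

approximations = {
    "exponential (1 parameter)": stats.expon(scale=truth.mean()),
    "normal (2 parameters)":     stats.norm(truth.mean(), truth.std()),
}

def kl_distance(f, g, lower=1e-8, upper=80):
    """K-L distance I(f, g) = integral of f(x) * [log f(x) - log g(x)] dx."""
    integrand = lambda x: f.pdf(x) * (f.logpdf(x) - g.logpdf(x))
    value, _ = quad(integrand, lower, upper)
    return value

for name, g in approximations.items():
    print(f"Information lost using the {name} model: {kl_distance(truth, g):.3f}")
```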

Akaike’s contribution (1973)
Akaike (1973) proposed “an information criterion” (AIC, now usually called Akaike’s Information Criterion) that relates likelihood to K-L distance and includes an explicit term for model complexity:

AIC = -2 ln(L) + 2K

where L is the likelihood of the model evaluated at the maximum likelihood estimates of its parameters and K is the number of estimated parameters. AIC is an estimate of the expected, relative distance between the fitted model and the unknown true mechanism that generated the observed data.
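As a minimal sketch (in Python, assuming the maximized log-likelihood of a fitted model is already in hand), the criterion on this slide can be computed as:

```python
# A sketch of the AIC calculation, assuming `log_likelihood` is the maximized
# log-likelihood of a fitted model and `K` counts every estimated parameter
# (including the error variance in normal-error models).
def aic(log_likelihood, K):
    """Akaike's Information Criterion: AIC = -2 ln(L) + 2K."""
    return -2.0 * log_likelihood + 2.0 * K
```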

General guidelines for use of AIC
We select the model with the smallest value of AIC (i.e. the one estimated to be closest to “truth”). AIC will identify the best model in the set, even if all the models are poor! It is the researcher’s (your) responsibility to ensure that the set of candidate models includes well-founded, realistic models.

AIC for small samples
Unless the sample size (n) is large with respect to the number of estimated parameters (K), use of AICc is recommended. Generally, you should use AICc when the ratio n/K is small (less than ~40), based on K from the global (most complicated) model. Use AIC or AICc consistently in an analysis rather than mixing the two criteria.
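A corresponding sketch of the small-sample form, again assuming the maximized log-likelihood, the parameter count K, and the sample size n are already available:

```python
# A sketch of the small-sample corrected criterion (AICc), assuming
# `log_likelihood` is the maximized log-likelihood, `K` counts all estimated
# parameters (including the error variance), and `n` is the sample size.
def aicc(log_likelihood, K, n):
    """AICc = -2 ln(L) + 2K + 2K(K + 1) / (n - K - 1); use when n/K < ~40."""
    return -2.0 * log_likelihood + 2.0 * K + (2.0 * K * (K + 1)) / (n - K - 1)
```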

Penalty for adding parameters (small sample sizes)
How much does the likelihood have to improve to justify adding one more parameter, as a function of sample size and model complexity? [Figure: required improvement in log-likelihood per added parameter under AICc; note that it converges on 1 as sample size grows well beyond ~100.]

But what about cases with very large sample sizes? Does it make intuitive sense that a model that is only one unit of log-likelihood better, for a sample with 10,000 observations, should be selected? The Schwarz criterion (also known as the Bayesian Information Criterion, BIC) addresses this:

BIC = -2 ln(L) + K ln(n)

Under BIC the penalty is simply a function of sample size, and the likelihood improvement required to justify an additional parameter increases as the natural log of n divided by 2.
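A minimal sketch of the Schwarz criterion, with the same hedged inputs as the AIC sketch above:

```python
import math

# Schwarz criterion / BIC: the per-parameter penalty grows with ln(n) rather
# than staying fixed at 2, so large samples demand a larger likelihood
# improvement before an extra parameter is justified.
def bic(log_likelihood, K, n):
    """BIC = -2 ln(L) + K ln(n)."""
    return -2.0 * log_likelihood + K * math.log(n)
```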

Penalty for adding parameters (large sample sizes)
The penalty is simply a function of sample size (regardless of base model complexity), and increases as a function of the natural log of n (divided by 2).

Some Rough Rules of Thumb
Differences in AIC (the Δi values) can be used to interpret the strength of evidence for one model versus another. A model with a Δ value within about 2 of the best model has substantial support in the data and should be considered along with the best model. A Δ value of roughly 4-7 indicates considerably less support. A Δ value > 10 indicates that the worse model has essentially no support and can be omitted from further consideration.
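These guidelines are qualitative, but a small helper (a sketch only; the handling of intermediate Δ values is a judgment call, not part of the original rule) makes them explicit:

```python
def support_from_delta(delta):
    """Rough interpretation of delta_i = AIC_i - min(AIC), following the rules of thumb above."""
    if delta <= 2:
        return "substantial support"
    if delta <= 7:
        return "considerably less support"
    if delta > 10:
        return "essentially no support"
    return "little support"   # delta between 7 and 10: weak, borderline support
```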

Comparing models with different PDFs
LRTs and AIC can be used as one basis for selecting the “best” PDF for a given dataset and model, but more generally an examination of the distribution of the residuals should guide the choice of the appropriate PDF. There will be cases where different PDFs are appropriate for different models applied to the same dataset. Example: neighborhood competition models where residuals shift from lognormally to normally distributed as the models are improved by additional terms.

Strength of evidence for alternate models: Akaike weights
The Akaike weight wi is the weight of evidence in favor of model i being the actual best model for the situation at hand, given that one of the N models in the set must be the best model for that set:

wi = exp(-Δi / 2) / Σr exp(-Δr / 2)

where Δi = AICi - min(AIC) and the sum in the denominator runs over all N models. The Akaike weights for all models combined add up to 1.
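A short sketch of this calculation, assuming the AIC (or AICc) values of the candidate models are collected in a single array and NumPy is available:

```python
import numpy as np

def akaike_weights(aic_values):
    """w_i = exp(-delta_i/2) / sum_r exp(-delta_r/2), with delta_i = AIC_i - min(AIC)."""
    delta = np.asarray(aic_values, dtype=float) - np.min(aic_values)
    rel_likelihood = np.exp(-0.5 * delta)   # relative likelihood of each model, given the data and the set
    return rel_likelihood / rel_likelihood.sum()
```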

Uses of Akaike weights
“Probability” that the candidate model is the best model. Relative strength of evidence (evidence ratios). Variable selection: which independent variable has the greatest influence? Model averaging.

An example...
The data: xi = measurements of DBH on 50 trees; yi = measurements of crown radius on those trees.
The scientific models:
yi = b xi + e [1 parameter (b)]
yi = a + b xi + e [2 parameters (a, b)]
yi = a + b xi + γ xi^2 + e [3 parameters (a, b, γ)]
The probability model: e is normally distributed, with mean 0 and variance estimated from the observed variance of the residuals...
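The original measurements are not reproduced in the slides, so the sketch below simulates a stand-in dataset (the coefficients and noise level are invented) and then fits the three scientific models by least squares, computing AICc and Akaike weights for each:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in for the example data (the real DBH and crown radius
# measurements are not reproduced in the slides).
n = 50
dbh = rng.uniform(5, 80, n)                        # x_i: DBH of 50 trees
crown = 0.5 + 0.11 * dbh + rng.normal(0, 1.0, n)   # y_i: crown radius

# Design matrices for the three scientific models.
designs = {
    "y = b*x":             np.column_stack([dbh]),
    "y = a + b*x":         np.column_stack([np.ones(n), dbh]),
    "y = a + b*x + g*x^2": np.column_stack([np.ones(n), dbh, dbh ** 2]),
}

names, aicc_values = [], []
for name, X in designs.items():
    beta, *_ = np.linalg.lstsq(X, crown, rcond=None)   # least squares = MLE for normal errors
    resid = crown - X @ beta
    sigma2 = np.mean(resid ** 2)                       # MLE of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    K = X.shape[1] + 1                                 # regression terms + error variance
    aicc = -2 * loglik + 2 * K + 2 * K * (K + 1) / (n - K - 1)
    names.append(name)
    aicc_values.append(aicc)

delta = np.array(aicc_values) - min(aicc_values)
weights = np.exp(-0.5 * delta) / np.exp(-0.5 * delta).sum()
for name, value, w in zip(names, aicc_values, weights):
    print(f"{name:22s}  AICc = {value:8.2f}  Akaike weight = {w:.3f}")
```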

Back to the example...
Akaike weights can be interpreted as the estimated probability that model i is the best model for the data at hand, given the set of models considered. Weights > 0.90 indicate that robust inferences can be made using just that model.

Akaike weights and the relative importance of variables
For nested models, the relative importance of predictor variables can be estimated by summing the Akaike weights across all of the models in which each variable occurs. Variables can then be ranked using these sums: the larger the sum of weights, the more important the variable.
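A sketch of the bookkeeping, with hypothetical model names, weights, and term sets standing in for a real analysis:

```python
# Hypothetical models, their Akaike weights, and the predictors each contains.
weights_by_model = {"x1 + x2": 0.55, "x1": 0.30, "x2": 0.10, "intercept only": 0.05}
model_terms     = {"x1 + x2": {"x1", "x2"}, "x1": {"x1"}, "x2": {"x2"}, "intercept only": set()}

# Sum the weights of every model in which each predictor occurs.
importance = {}
for model, w in weights_by_model.items():
    for term in model_terms[model]:
        importance[term] = importance.get(term, 0.0) + w

# Larger summed weight = more important variable.
for term, total in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {total:.2f}")
```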

Example: detecting density dependence
Source: Brook, B.W. and C.J.A. Bradshaw. 2006. Strength of evidence for density dependence in abundance time series of 1198 species. Ecology 87:1445-1451.

Ambivalence about selecting a best model to use for inference...
The inability to identify a single best model is not a defect of the AIC method. It is an indication that the data are not adequate to reach strong inference. What is to be done? Multimodel inference and model averaging.

Multimodel Inference
If one model is clearly the best (wi > 0.90), then inference can be made based on this best model alone. Weak strength of evidence in favor of one model suggests that a different dataset might well support one of the alternate models. Designation of a single best model is often unsatisfactory because which model is “best” is highly variable. Instead, we can compute weighted estimates of parameters and predicted values using the Akaike weights.

Akaike Weights and Multimodel Inference
Estimate parameter values for the models with at least some measurable support, then estimate a weighted average of the parameters across those models. Direct averaging of parameters is only applicable to linear models. For non-linear models, we can instead generate weighted averages of the predicted response values for given values of the predictor variables.
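A sketch of the prediction-averaging step, with hypothetical predictions and weights:

```python
import numpy as np

# Each row holds one supported model's predicted response at the same set of
# predictor values; `w` holds the corresponding Akaike weights (all values
# here are hypothetical).
preds = np.array([
    [4.1, 5.0, 6.2],   # model 1
    [4.3, 5.1, 6.0],   # model 2
    [3.9, 4.8, 6.4],   # model 3
])
w = np.array([0.55, 0.30, 0.15])   # Akaike weights (sum to 1)

averaged_prediction = w @ preds    # weighted average, prediction by prediction
print(averaged_prediction)
```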

Multimodel Inference: An example
Neighborhood models of tree growth: Can we use MMI to improve parameter estimates for individual terms in the model? (Not easily, given the non-linearities in this model.) Can we use MMI to improve predictions of growth using a weighted suite of alternate models? (Yes, but is it worth the effort?) See: Papaik, M.J. and C.D. Canham. 2006. Multi-model analysis of tree competition along environmental gradients in southern New England forests. Ecological Applications 16:1880-1892.

Summary: Steps in Model Selection
Develop candidate models based on biological knowledge.
Take observations (data) relevant to the predictions of the models.
Use the data to obtain maximum likelihood estimates of the parameters of the alternate models.
Evaluate the strength of evidence for the alternate models using AIC and Akaike weights.
...Multimodel inference? Do you agree with Burnham and Anderson that MMI is generally preferable to “best-model inference”?

Bias and Uncertainty in Model Selection
Model selection bias: chance inclusion of meaningless variables in a model will produce a biased underestimate of the variance, and a corresponding exaggeration of the precision of the model (the problem with “fishing expeditions”).
Model selection uncertainty: the fact that we are using data (with uncertainty) both to estimate parameters and to select the best model necessarily introduces uncertainty into the model selection process.
See the discussion on pages 43-47 of Burnham and Anderson.

The brave new world...
Science is the development of simplified models as explanations (approximations) of reality... The “quality” of the explanation (the model) will be a balance of many factors, both quantitative and qualitative.