Model Comparison

Assessing alternative models We don’t ask “Is the model right or wrong?” We ask “Do the data support a model more than a competing model?” Strength of evidence (support) for a model is relative: Relative to other models: As models improve, support may change. Relative to data at hand: As the data improve, support may change.

Assessing alternative models Likelihood ratio tests. Akaike’s Information Criterion (AIC).

Recall the Likelihood Axiom "Within the framework of a statistical model, a set of data supports one statistical hypothesis better than another if the likelihood of the first hypothesis, on the data, exceeds the likelihood of the second hypothesis" (Edwards 1972).

Likelihood ratio tests Statistical evidence is only relative; it applies to one model (hypothesis) only in comparison with another. The likelihood ratio LA(x)/LB(x) measures the strength of evidence favoring hypothesis A over hypothesis B. Likelihood ratio tests tell us something about the strength of evidence for one model vs. another. If the ratio is very large, hypothesis A did a much better job than B of predicting which value X would take, and the observation X = x is very strong evidence for A over B. Likelihood ratio tests apply to pairs of hypotheses tested using the same dataset.

Likelihood ratio tests Twice the difference in maximized log-likelihoods, R = 2[ln L(A) - ln L(B)], follows approximately a chi-square distribution with degrees of freedom equal to the difference in the number of parameters between nested models A and B.
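As a minimal sketch of this test (the log-likelihood values below are hypothetical; scipy is assumed available), the statistic is twice the difference in maximized log-likelihoods, compared against the chi-square distribution:

```python
from scipy.stats import chi2

# Hypothetical maximized log-likelihoods for two nested models.
logL_simple = -118.2   # e.g., the 1-parameter model
logL_richer = -109.6   # e.g., the 2-parameter model
df = 2 - 1             # difference in number of estimated parameters

lr_stat = 2.0 * (logL_richer - logL_simple)   # likelihood ratio test statistic
critical = chi2.ppf(0.95, df)                 # 3.84 for df = 1
p_value = chi2.sf(lr_stat, df)                # P(chi-square > lr_stat)

print(f"LR statistic = {lr_stat:.2f}, critical value = {critical:.2f}, p = {p_value:.4g}")
```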

An example The Data: xi = measurements of DBH on 50 adult trees; yi = measurements of crown radius on those trees. The Scientific Models: yi = b xi + e, a linear relationship through the origin, with 1 parameter (b) and an error term (e); yi = a + b xi + e, a linear relationship with 2 parameters (a, b) and an error term (e); yi = a + b xi + λ xi^2 + e, a non-linear (quadratic) relationship with three parameters and an error term (e). The Probability Model: e is normally distributed, with mean = 0 and variance estimated from the observed variance of the residuals.

Procedure Initialize the parameter estimates. Using a parameter estimation routine, find the parameter values that maximize the likelihood given the model and a normal error structure. Calculate the difference in log-likelihood between models. Conduct likelihood ratio tests. Choose the best of the three candidate models.
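A minimal sketch of this procedure for the two-parameter model, assuming numpy and scipy are available; the DBH and crown-radius data here are simulated placeholders, not the course data:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
dbh = rng.uniform(10, 80, 50)                      # hypothetical DBH measurements
crown = 0.5 + 0.08 * dbh + rng.normal(0, 0.6, 50)  # hypothetical crown radii

def neg_log_lik(params, x, y):
    """Negative log-likelihood for y = a + b*x + e, with e ~ Normal(0, sigma)."""
    a, b, log_sigma = params
    sigma = np.exp(log_sigma)                      # keep sigma positive
    mu = a + b * x
    return -np.sum(norm.logpdf(y, loc=mu, scale=sigma))

# Step 1: initialize parameter estimates; Step 2: maximize the likelihood
# (here by minimizing the negative log-likelihood).
fit = minimize(neg_log_lik, x0=[0.0, 0.1, 0.0], args=(dbh, crown), method="Nelder-Mead")
a_hat, b_hat, log_sigma_hat = fit.x
logL_model2 = -fit.fun
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, sigma = {np.exp(log_sigma_hat):.3f}, logL = {logL_model2:.2f}")
```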

Remember Parsimony The question is: Is the more complicated model BETTER than the simpler model?

Results For model 2 to be better than model 1, twice the difference in log-likelihoods between the models must be greater than the critical value of the chi-square distribution with 1 degree of freedom at p = 0.05: χ2(df 1) = 3.84; χ2(df 2) = 5.99.

Results Model 2 > Model 3 > Model 1

This is from day 2, the exercise that you did

Model comparison Is the two-population model better?

Model selection when probability distributions differ by model The model selection framework allows the error structure to vary over the models included in the candidate set, but: no component part of the likelihood function can be dropped; the scientific model must remain constant over the models being compared; and we must adjust for the different number of parameters in each probability model.

An example The Data: xi = measurements of DBH on 50 trees; yi = measurements of crown radius on those trees. The Scientific Model: yi = a + b xi + e [2 parameters (a, b)]. The Probability Models: e is normally distributed, with E[y] predicted by the model and variance estimated from the observed variance of the residuals; or e is lognormally distributed, with E[y] predicted by the model and variance estimated from the observed variance of the residuals.

Back to the example The normal and lognormal models have an equal number of parameters, so we can compare the maximized likelihoods directly. In this case, the normal probability model is better supported by the data.
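A hedged sketch of this comparison, with hypothetical data and one assumed lognormal parameterization (median on the predicted line); because the two probability models estimate the same number of parameters, the maximized log-likelihoods are compared directly:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import lognorm, norm

rng = np.random.default_rng(1)
dbh = rng.uniform(10, 80, 50)                      # hypothetical DBH measurements
crown = 0.5 + 0.08 * dbh + rng.normal(0, 0.3, 50)  # hypothetical crown radii

def nll_normal(params, x, y):
    a, b, log_sigma = params
    mu = a + b * x
    return -np.sum(norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

def nll_lognormal(params, x, y):
    a, b, log_sigma = params
    mu = a + b * x
    if np.any(mu <= 0):
        return np.inf                              # lognormal needs a positive scale
    # Assumed parameterization: the median of the lognormal lies on the predicted line.
    return -np.sum(lognorm.logpdf(y, s=np.exp(log_sigma), scale=mu))

for name, nll in [("normal", nll_normal), ("lognormal", nll_lognormal)]:
    fit = minimize(nll, x0=[0.1, 0.1, 0.0], args=(dbh, crown), method="Nelder-Mead")
    print(f"{name:9s} logL = {-fit.fun:.2f}")
# Same scientific model and same K: compare the maximized log-likelihoods directly.
```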

A second example The Data: xi = measurements of DBH on 50 trees; yi = counts of seedlings produced by those trees. The Scientific Model: yi = STR*(DBH/30)^b + e, a power-function relationship with 1 estimated parameter (b) and an error term (e). The Probability Models: the data follow a Poisson distribution, with mean and variance = λ; or the data follow a negative binomial distribution with mean = m and variance = m + m^2/k, where k is the clumping parameter.

Back to the example The negative binomial requires estimation of one extra parameter, k, generally known as the clumping parameter. Thus, twice the difference in log-likelihoods between the two models must be greater than χ2(df 1) = 3.84.
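A sketch of the Poisson vs. negative binomial comparison under these assumptions; the count data are simulated and deliberately overdispersed, and STR is treated as a known constant, as in the scientific model above:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson, nbinom, chi2

rng = np.random.default_rng(2)
dbh = rng.uniform(15, 90, 50)                         # hypothetical DBH measurements
mu_true = 20 * (dbh / 30) ** 1.8
seedlings = rng.negative_binomial(2, 2 / (2 + mu_true))  # hypothetical overdispersed counts
STR = 20.0                                            # assumed fixed in the scientific model

def nll_poisson(params, x, y):
    b = params[0]
    mu = STR * (x / 30.0) ** b
    return -np.sum(poisson.logpmf(y, mu))

def nll_negbin(params, x, y):
    b, log_k = params
    mu = STR * (x / 30.0) ** b
    k = np.exp(log_k)                                 # clumping parameter, kept positive
    p = k / (k + mu)                                  # scipy's (n, p) parameterization
    return -np.sum(nbinom.logpmf(y, k, p))

fit_p = minimize(nll_poisson, x0=[1.0], args=(dbh, seedlings), method="Nelder-Mead")
fit_nb = minimize(nll_negbin, x0=[1.0, 0.0], args=(dbh, seedlings), method="Nelder-Mead")

lr_stat = 2.0 * (-fit_nb.fun - (-fit_p.fun))          # one extra parameter: k
print(f"LR statistic = {lr_stat:.2f}, critical (df=1, p=0.05) = {chi2.ppf(0.95, 1):.2f}")
```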

Information theory Information theory connects probability, communication, statistics, economics, mathematics, computer science, and physics.

Kullback-Leibler Information (a.k.a. the distance between 2 models) If f(x) denotes reality, we can calculate the information lost when we use g(x) to approximate reality as I(f, g) = ∫ f(x) ln[f(x)/g(x)] dx. This number is the distance between reality and the model.

Interpretation of Kullback-Leibler Information (a.k.a. the distance between 2 models) [Figure: a gamma distribution represents truth, f(x); Weibull and lognormal distributions, g1(x) and g2(x), are approximations to truth.] K-L information measures the (asymmetric) distance between two models. Minimizing the information lost when using g(x) to approximate f(x) is the same as maximizing the likelihood.
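A numeric sketch of this kind of comparison; the gamma "truth" and the Weibull and lognormal approximations below use illustrative parameter values, not values from the figure:

```python
import numpy as np
from scipy.stats import gamma, weibull_min, lognorm
from scipy.integrate import quad

truth = gamma(a=4, scale=2)                 # "truth" f(x): illustrative gamma distribution
candidates = {
    "weibull":   weibull_min(c=2.2, scale=9),   # g1(x), illustrative parameters
    "lognormal": lognorm(s=0.5, scale=7),       # g2(x), illustrative parameters
}

def kl_divergence(f, g, lower=1e-6, upper=60):
    """I(f, g) = integral of f(x) * ln[f(x)/g(x)] dx, evaluated numerically."""
    integrand = lambda x: f.pdf(x) * (f.logpdf(x) - g.logpdf(x))
    value, _ = quad(integrand, lower, upper)
    return value

for name, g in candidates.items():
    print(f"I(truth, {name}) = {kl_divergence(truth, g):.4f}")
# The smaller I(f, g), the less information is lost by approximating f with g.
```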

Kullback-Leibler Information and Truth TRUTH IS A CONSTANT: the term ∫ f(x) ln f(x) dx is the same for every candidate model. Then the relative directed distance between truth and model g is I(f, g) = Constant - E_f[ln g(x | θ)], and only the second term needs to be estimated.

Interpretation of Kullback-Leibler Information (a.k.a. the distance between 2 models) Minimizing K-L information is the same as maximizing entropy. We want a model that does not respond to randomness but does respond to information. We maximize entropy subject to the constraints of the model used to capture the information in the data. By maximizing entropy subject to a constraint, we leave only the information supported by the data; the model does not respond to noise.

Akaike's Information Criterion Akaike defined "an information criterion" that relates K-L distance and the maximized log-likelihood as follows: AIC = -2 ln(L) + 2K, where ln(L) is the maximized log-likelihood and K is the number of estimable parameters. This is an estimate of the expected, relative distance between the fitted model and the unknown true mechanism that generated the observed data.

Information and entropy (noise)

A refresher on Shannon's diversity index You have already been exposed to entropy when you looked at Shannon's diversity index, H = -Σ pi ln(pi). Diversity (entropy) is maximized when an individual has an equal probability of belonging to each species.
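A quick numeric illustration of that point (a minimal sketch):

```python
import numpy as np

def shannon_H(p):
    """Shannon's index H = -sum(p_i * ln p_i), ignoring zero proportions."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print(shannon_H([0.25, 0.25, 0.25, 0.25]))  # equal proportions: H = ln(4) ~ 1.39 (maximum)
print(shannon_H([0.85, 0.05, 0.05, 0.05]))  # one dominant species: H ~ 0.59 (lower)
```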

AIC and statistical entropy

Akaike's Information Criterion AIC has a built-in penalty for models with larger numbers of parameters. It provides an implicit tradeoff between bias and variance.

Akaike's Information Criterion We select the model with the smallest value of AIC. This is the model "closest" to full reality from the set of models considered. Models not in the set are not considered. AIC will select the best model in the set, even if all the models are poor! It is the researcher's (your) responsibility to ensure that the set of candidate models includes well-founded, realistic models.

Akaike's Information Criterion AIC estimates the expected, relative distance between the fitted model and the unknown true mechanism that generated the observed data. The best model is the one with the lowest AIC. K = number of estimable parameters; the 2K term is the built-in penalty for a greater number of parameters.

AIC and small samples Unless the sample size (n) is large with respect to the number of estimated parameters (K), use of the corrected criterion AICc = AIC + 2K(K+1)/(n - K - 1) is recommended. Generally, you should use AICc when the ratio n/K is small (less than about 40), judged using the value of K for the global (most complicated) model. Use AIC or AICc consistently within an analysis rather than mixing the two criteria.
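A minimal sketch of both criteria, using the same hypothetical log-likelihoods as in the earlier sketches (the parameter counts here include the residual variance):

```python
def aic(logL, K):
    """AIC = -2 * ln(L) + 2K."""
    return -2.0 * logL + 2.0 * K

def aicc(logL, K, n):
    """Small-sample correction; approaches AIC as n grows large relative to K."""
    return aic(logL, K) + 2.0 * K * (K + 1) / (n - K - 1)

n = 50
# Hypothetical maximized log-likelihoods and parameter counts for the three models.
models = {"model 1": (-118.2, 2), "model 2": (-109.6, 3), "model 3": (-108.9, 4)}

for name, (logL, K) in models.items():
    print(f"{name}: AIC = {aic(logL, K):.1f}, AICc = {aicc(logL, K, n):.1f}")
# Select the model with the smallest AIC (or AICc) among the candidates considered.
```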

Some Rough Rules of Thumb Differences in AIC (Δi) can be used to interpret the strength of evidence for one model vs. another. A model with a Δ value within 1-2 of the best model has substantial support and should be considered along with the best model. A model with a Δ value within 4-7 of the best model has considerably less support. A model with a Δ value > 10 relative to the best model has essentially no support and can be omitted from further consideration.

Akaike weights Akaike weights (wi) are the weight of evidence in favor of model i being the actual best model for the situation at hand, given that one of the R candidate models must be the best model for that set: wi = exp(-Δi/2) / Σr exp(-Δr/2). The Akaike weights across the full set of models considered add up to 1.
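A sketch of the weight calculation, using the hypothetical AIC values from the earlier sketch; the last line anticipates the evidence ratios discussed below:

```python
import numpy as np

aic_values = np.array([240.4, 225.2, 225.8])   # hypothetical AICs for models 1-3
delta = aic_values - aic_values.min()          # delta_i = AIC_i - AIC_min
weights = np.exp(-delta / 2.0)
weights /= weights.sum()                       # weights sum to 1 over the candidate set

for i, (d, w) in enumerate(zip(delta, weights), start=1):
    print(f"model {i}: delta = {d:.1f}, weight = {w:.3f}")

# Evidence ratio of the best model vs. another model: w_best / w_other.
best = weights.argmax()
print("evidence ratio (best vs. model 1):", weights[best] / weights[0])
```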

Uses of Akaike weights “Probability” that the candidate model is the best model. Relative strength of evidence (evidence ratios). Variable selection—which independent variable has the greatest influence? Model averaging.

An example The Data: xi = measurements of DBH on 50 trees; yi = measurements of crown radius on those trees. The Scientific Models: yi = b xi + e [1 parameter (b)]; yi = a + b xi + e [2 parameters (a, b)]; yi = a + b xi + γ xi^2 + e [3 parameters (a, b, γ)]. The Probability Model: e is normally distributed, with mean = 0 and variance estimated from the observed variance of the residuals.

Back to the example Akaike weights can be interpreted as the estimated probability that model i is the best model for the data at hand, given the set of models considered. Weights > 0.90 indicate strong inferences can be made using just that model.

Evidence ratios Evidence ratios represent the evidence about fitted models as to which is better in an information sense: the evidence ratio for model i over model j is wi / wj. These ratios do not depend on the full set of models.

Strength of evidence: AIC ratios There is very strong evidence that models 2 and 3 are better models than model 1, but the evidence ratio of model 2 to model 3 is low, suggesting the data do not support strong inference between them.

Akaike weights and relative variable importance Estimates of the relative importance of predictor variables can be made by summing the Akaike weights (wi) across all the models in which each variable occurs. Variables can then be ranked using these sums: the larger the sum of weights, the more important the variable.

Ambivalence The inability to identify a single best model is not a defect of the AIC method. It is an indication that the data are not adequate to reach strong inference. What is to be done? MULTIMODEL INFERENCE AND MODEL AVERAGING.

Strength of evidence: AIC ratios It is hard to choose between model 2 and model 3 because of the low value of the evidence ratio.

Multimodel Inference If one model is clearly the best (wi>0.90) then inference can be made based on this best model. Weak strength of evidence in favor of one model suggests that a different dataset may support one of the alternate models. Designation of a single best model is often unsatisfactory because the “best” model is highly variable. We can compute a weighted estimate of the parameter using Akaike weights.

Akaike Weights and Multimodel Inference Estimate parameter values for the two most likely models. Estimate a weighted average of the parameters across the supported models, using the Akaike weights. Averaging parameters this way is strictly applicable only to linear models (because of Jensen's inequality); for non-linear models, we can instead average the predicted response values for given values of the predictor variables.

Akaike Weights and Multimodel Inference Estimate of parameter A = (0.73*1.04) + (0.27*1.31) = 1.11. Estimate of parameter B = (0.73*2.1) + (0.27*1.2) = 1.86. Estimate of parameter C = (0.73*0) + (0.27*3) = 0.81.
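The same arithmetic written as a small sketch; the weights (0.73, 0.27) and per-model estimates are the ones shown on this slide, with a parameter absent from a model entering as 0:

```python
import numpy as np

weights = np.array([0.73, 0.27])          # Akaike weights of the two supported models
estimates = {
    "A": np.array([1.04, 1.31]),
    "B": np.array([2.10, 1.20]),
    "C": np.array([0.00, 3.00]),          # parameter absent from model 1 enters as 0
}

for name, values in estimates.items():
    # Model-averaged estimate = weighted average across the supported models.
    print(f"parameter {name}: {np.dot(weights, values):.2f}")
# A = 1.11, B = 1.86, C = 0.81
```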

Model uncertainty Different datasets are likely to yield different parameter estimates. Variance around parameter estimates is calculated using the dataset at hand and is an underestimate of the true variance because it does not consider model uncertainty. Ideally, inferences should not be limited to one particular dataset. Can we make inferences that are applicable to a larger number of datasets?

Techniques to deal with model uncertainty Theoretical: Monte Carlo simulations. Empirical: bootstrapping. Use Akaike weights to calculate a variance for the parameter estimates that is unconditional on any single model, i.e., one that incorporates model-selection uncertainty.
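A hedged sketch of one commonly cited way (following Burnham and Anderson) to combine Akaike weights with each model's conditional variance into such a model-averaged, unconditional variance; all numbers below are hypothetical:

```python
import numpy as np

# Hypothetical per-model estimates of one parameter, their conditional variances,
# and Akaike weights (illustrative values, not results from the exercise).
theta = np.array([1.04, 1.31])
var_conditional = np.array([0.02, 0.05])
w = np.array([0.73, 0.27])

theta_bar = np.dot(w, theta)                 # model-averaged estimate
# Unconditional variance: each model contributes its conditional variance plus the
# squared deviation of its estimate from the model-averaged estimate.
var_unconditional = np.dot(w, np.sqrt(var_conditional + (theta - theta_bar) ** 2)) ** 2

print(f"theta_bar = {theta_bar:.3f}, unconditional SE = {np.sqrt(var_unconditional):.3f}")
```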

Summary: Steps in Model Selection Develop candidate models based on biological knowledge. Take observations (data) relevant to predictions of the model. Use data to obtain MLE of parameters. Evaluate evidence using AIC. Evaluate estimates of parameters relative to direct measurements. Are they reasonable and realistic?