Quiz 3
Model selection
Overview
– Objectives determine the “choice” of model
– Modeling for forecasting
– Likelihood ratio test
– Akaike Information Criterion (AIC)
– Bayes Information Criterion (BIC)
– Multi-model weighting
Readings
– Ecological Detective: pp
– Hobbs NT & Hilborn R (2006) Alternatives to statistical hypothesis testing in ecology: a guide to self teaching. Ecological Applications 16:5-19
Objectives of modelling and implications for model choice
– If we want to explore hypotheses: really complex models may be totally appropriate
– If we want to do forecasting: very simple models are usually best
– If we want to estimate uncertainty and do decision analysis: the answer is less clear
Models for forecasting
– Too few parameters: the model cannot capture the underlying truth very well (error due to approximation)
– Too many parameters: the model fits minute vagaries in the data and the parameters cannot be estimated well (error due to estimation)
Approximating the normal
– Sample a series of values from a normal distribution to create some simulated “data”
– Divide the data into equal intervals (“bins”) and use the mean within each bin to recreate the underlying distribution
– As we increase the number of bins (the number of model parameters), there is a trade-off between approximation error and estimation error
– This is a contrived example: the best-fitting model would be a normal distribution whose parameters are the mean and standard deviation of the data
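The trade-off on this slide can be sketched with a small simulation. This is a minimal illustration under assumptions that are not in the slides: the binned model is taken to be a histogram density on the fixed range [-4, 4], error is measured as integrated squared error against the true standard normal, and the sample size, bin counts, and seed are arbitrary choices.

```python
import math
import random

LO, HI = -4.0, 4.0            # assumed range covered by the bins
GRID_N = 800                  # fine grid used to integrate the error
DX = (HI - LO) / GRID_N
GRID = [LO + (i + 0.5) * DX for i in range(GRID_N)]


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


TRUE_PDF = [normal_pdf(x) for x in GRID]


def histogram_density(data, n_bins):
    """One parameter (bar height) per bin, estimated from the data."""
    width = (HI - LO) / n_bins
    counts = [0] * n_bins
    for x in data:
        if LO <= x < HI:
            counts[min(int((x - LO) / width), n_bins - 1)] += 1
    return [c / (len(data) * width) for c in counts]


def integrated_squared_error(heights):
    """Total error of the binned model against the true normal density."""
    n_bins = len(heights)
    width = (HI - LO) / n_bins
    total = 0.0
    for x, truth in zip(GRID, TRUE_PDF):
        b = min(int((x - LO) / width), n_bins - 1)
        total += (heights[b] - truth) ** 2 * DX
    return total


def mean_error_by_bins(bin_counts, n_data=100, n_reps=200, seed=1):
    """Average total error over repeated simulated data sets."""
    rng = random.Random(seed)
    errors = {k: 0.0 for k in bin_counts}
    for _ in range(n_reps):
        data = [rng.gauss(0.0, 1.0) for _ in range(n_data)]
        for k in bin_counts:
            errors[k] += integrated_squared_error(histogram_density(data, k)) / n_reps
    return errors


errors = mean_error_by_bins([2, 8, 200])
for k in sorted(errors):
    print(k, "bins: mean total error", round(errors[k], 4))
```

With very few bins the error is dominated by approximation, with very many bins by estimation, and an intermediate number of bins gives the lowest total.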
Approximation error: too few parameters to get a good fit to the truth.
Estimation error: too many parameters, so the model attempts to fit every small deviation in the data.
[Figure: approximation error, estimation error, and total error]
For more see Zucchini W (2000) An introduction to model selection. Journal of Mathematical Psychology 44:
Likelihood ratio test: nested models
[Figure panels: 4 parameters, 3 parameters, 2 parameters]
Likelihood ratio test
– For nested models, the model with more parameters will fit at least as well and usually better, so its log-likelihood (lnL) will usually be bigger.
– The likelihood ratio statistic R is twice the difference in log-likelihood between model A and model B. If the simpler model is adequate, R approximately follows a chi-square distribution with degrees of freedom equal to the difference in the number of parameters between the two models.
Likelihood ratio test example
Models: B_1965 estimated (r, K, q, σ estimated); B_1965 = K (r, K, q, σ estimated)
Table columns: Model, n pars, lnL
Rows: B_1965 = K; B_1965 free; Difference: 1 parameter, 8.22 in lnL
Likelihood ratio: 2(lnL_1 - lnL_2) = 2(8.22) = 16.44
Chi-squared p-value: in Excel, =CHISQ.DIST.RT(16.44,1)
Source: 9 Model selection.xlsx, sheet “Likelihood ratio test”
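The Excel calculation above can be reproduced outside a spreadsheet. This is a minimal sketch using only the Python standard library; it relies on the identity that, for one degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x/2)). The function names are illustrative, not from the course materials.

```python
import math


def lr_statistic(lnl_difference):
    """Likelihood ratio statistic: twice the difference in log-likelihood."""
    return 2.0 * lnl_difference


def chi2_sf_df1(x):
    """Upper-tail chi-square probability for 1 degree of freedom.

    Only valid when the nested models differ by exactly one parameter.
    """
    return math.erfc(math.sqrt(x / 2.0))


ratio = lr_statistic(8.22)      # difference in lnL from the slide
p_value = chi2_sf_df1(ratio)    # same quantity as =CHISQ.DIST.RT(16.44,1)
print(ratio, p_value)
```

For models that differ by more than one parameter, a general chi-square tail function is needed, for example `scipy.stats.chi2.sf(ratio, df)` if SciPy is available.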
Problems with likelihood ratio
– Only valid for nested models, so you cannot compare structurally different models
– Assumes the likelihoods are correct
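Akaike Information Criterion (AIC)
$\mathrm{AIC} = 2(-\ln L) + 2p$, where $-\ln L$ is the negative log-likelihood at the maximum likelihood estimate and $p$ is the number of model parameters
Origin of AIC: Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In International Symposium on Information Theory. 2nd Edn. Edited by B.N. Petrov and F. Csáki. Akadémiai Kiadó, Budapest, Hungary. pp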
AIC corrected for small samples: AICc
$\mathrm{AIC_c} = 2(-\ln L) + 2p + \frac{2p(p+1)}{n-p-1}$
– $n$: number of observations (data points)
– $p$: number of model parameters
– $-\ln L$: negative log-likelihood
– $\frac{2p(p+1)}{n-p-1}$: correction term
Since the correction term approaches 0 as n increases, I advise always using AICc
Origin of AICc: Hurvich CM & Tsai C (1989) Regression and time series model selection in small samples. Biometrika 76:
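As a small illustration, AIC and AICc can be computed from a negative log-likelihood, a parameter count, and a sample size. The function names and the numbers in the usage lines are hypothetical, chosen only to show that the correction term shrinks as n grows.

```python
def aic(nll, n_params):
    """AIC from the negative log-likelihood at the MLE."""
    return 2.0 * nll + 2.0 * n_params


def aicc(nll, n_params, n_obs):
    """AIC with the small-sample correction term."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc needs n_obs > n_params + 1")
    correction = 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic(nll, n_params) + correction


# Hypothetical values: NLL = 10 with 4 parameters.
small_sample = aicc(10.0, 4, 25)      # correction term is 2*4*5/20 = 2
large_sample = aicc(10.0, 4, 10005)   # correction term is 2*4*5/10000 = 0.004
print(aic(10.0, 4), small_sample, large_sample)
```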
Why use AICc?
– Can be used to compare structurally different (non-nested) models
– Used to find weights to put on different models (evidence for models)
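AICc weights
– $\Delta_i = \mathrm{AIC}_{c,i} - \min_j \mathrm{AIC}_{c,j}$: how different the AICc of model i is from that of the best model
– $w_i = \exp(-0.5\Delta_i) / \sum_j \exp(-0.5\Delta_j)$: the weight assigned to each alternative model by AICc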
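The weight calculation can be sketched in a few lines. The AICc values below are hypothetical and are not the values from the course spreadsheet; the function name is illustrative.

```python
import math


def akaike_weights(criterion_values):
    """Return (deltas, weights) for a list of AIC or AICc values."""
    best = min(criterion_values)
    deltas = [value - best for value in criterion_values]
    relative = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative)
    return deltas, [r / total for r in relative]


# Hypothetical AICc values for three competing models.
deltas, weights = akaike_weights([100.0, 102.0, 110.0])
for d, w in zip(deltas, weights):
    print("delta =", d, "weight =", round(w, 3))
```

Only the differences between criterion values matter: adding the same constant to every AICc leaves the weights unchanged.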
Comparing models using AIC
The model with the lower AIC (or AICc) is the better model
Rule of thumb from Burnham and Anderson (2002):
– ΔAIC ≤ 2: substantial support (evidence)
– 4 ≤ ΔAIC ≤ 7: considerably less support
– ΔAIC ≥ 10: essentially no support
Classic book: Burnham KP & Anderson DR (2002) Model selection and multi-model inference. Springer, Berlin, Heidelberg, New York.
AIC review
– For model M_i: p_i is the number of model parameters
– Data: the same data for all models
– n: number of data points (same for all models)
– The very best model has the lowest AIC
– w_i: the relative weight to give to competing models
– Can replace AIC with AICc
Note about AIC
For AIC and all related information criteria, -lnL is the minimum value of the negative log-likelihood, evaluated at the best choice of parameters for that particular model (i.e. at the maximum likelihood estimate, MLE).
AIC example
Models: B_1965 estimated (r, K, q, σ estimated); B_1965 = K (r, K, q, σ estimated)
AIC table columns: Model M_i, Params p_i, NLL, AIC, ΔAIC, exp(-0.5Δ_i), Weight w_i
AICc table columns: Model M_i, Params p_i, Data n, NLL, AICc, ΔAICc, exp(-0.5Δ_i), Weight w_i
Rows in both tables: B_1965 free; B_1965 = K
Usually do not report these columns since only relative values are meaningful
Source: 9 Model selection.xlsx
The LRSG model (lagged recruitment, survival, and growth model)
– Lagged Beverton-Holt recruitment depends on biomass L years ago; L = years from egg deposition until available to fishing
– Starting conditions: unfished
– Catches in year t
– Recruitment assumed to be Beverton-Holt with steepness h
– Combined survival and somatic growth
Hilborn & Mangel (2002) Ecological Detective, pp
LRSG model
– Find the biomass where surplus production is maximized (B_MSY)
– Substitute that into the model to get MSY
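The model equations did not survive in this copy of the slides, so the following is only a sketch of one way to write a lagged recruitment, survival, and growth model. Everything specific here is an assumption: the update B[t+1] = s*B[t] + R(B[t+1-L]) - C[t], the steepness form of the Beverton-Holt curve, the unfished biomass B0 = R0/(1 - s), the grid search for B_MSY, and all parameter values.

```python
def beverton_holt(biomass, b0, r0, h):
    """Beverton-Holt recruitment in a common steepness form (assumed)."""
    return 4.0 * h * r0 * biomass / (b0 * (1.0 - h) + biomass * (5.0 * h - 1.0))


def simulate_lrsg(catches, r0, s, h, lag):
    """Project biomass forward from an unfished start (assumed update rule)."""
    b0 = r0 / (1.0 - s)              # unfished equilibrium: B0 = s*B0 + R0
    biomass = [b0]
    for t, catch in enumerate(catches):
        source_year = t + 1 - lag    # year whose biomass produced these recruits
        spawners = biomass[source_year] if source_year >= 0 else b0
        recruits = beverton_holt(spawners, b0, r0, h)
        biomass.append(max(s * biomass[t] + recruits - catch, 0.0))
    return biomass


def find_msy(r0, s, h, n_grid=10000):
    """Grid search for the biomass that maximizes equilibrium surplus production."""
    b0 = r0 / (1.0 - s)
    best_b, best_yield = 0.0, 0.0
    for i in range(1, n_grid + 1):
        b = b0 * i / n_grid
        # At equilibrium, catch = recruitment - biomass lost to (1 - s).
        surplus = beverton_holt(b, b0, r0, h) - (1.0 - s) * b
        if surplus > best_yield:
            best_b, best_yield = b, surplus
    return best_b, best_yield


# Hypothetical parameter values, for illustration only.
unfished = simulate_lrsg([0.0] * 10, r0=200.0, s=0.8, h=0.7, lag=4)
b_msy, msy = find_msy(r0=200.0, s=0.8, h=0.7)
print(unfished[-1], b_msy, msy)
```

With no catch the simulated stock stays at its unfished level, which is a quick check that the starting condition and the recruitment curve are consistent with each other.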
Table columns: Model M_i, Params, NLL, AICc, ΔAICc, Weight
Rows: Logistic, B_1965 free; Logistic, B_1965 = K; LRSG, B_1965 free; LRSG, B_1965 = K
[Figure panels: Logistic, LRSG]
Conclusion: the best model is the logistic with initial stock size as a free parameter; it has the lowest AICc
Source: 9 Model selection.xlsx
AIC and different likelihoods
AIC can be used to compare models with different likelihood functions, if all the constants are included in the likelihoods (Ken Burnham)
Terms of the likelihood:
– Always constant
– Constant if σ is known, not estimated
Burnham KP & DR Anderson (2002) Model selection and multi-model inference. Springer, New York. Section 6.7, pp
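To make the point about constants concrete, here is a sketch of normal and lognormal negative log-likelihoods with every term written out. The choice of these two distributions and the function names are illustrative assumptions; the slide's own equation is not reproduced here.

```python
import math


def normal_nll(data, mu, sigma, include_constants=True):
    """Negative log-likelihood of independent normal observations."""
    n = len(data)
    nll = n * math.log(sigma) + sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)
    if include_constants:
        nll += 0.5 * n * math.log(2.0 * math.pi)   # the term that is always constant
    return nll


def lognormal_nll(data, mu, sigma):
    """Negative log-likelihood of independent lognormal observations.

    The sum of ln(x) comes from the 1/x factor in the lognormal density and
    must be kept when comparing against a normal likelihood.
    """
    logs = [math.log(x) for x in data]
    return sum(logs) + normal_nll(logs, mu, sigma)


data = [1.2, 0.8, 1.5, 1.1]   # hypothetical observations
full = normal_nll(data, 1.0, 0.3)
kernel_only = normal_nll(data, 1.0, 0.3, include_constants=False)
print(full - kernel_only)     # n * 0.5 * ln(2*pi), lost if constants are dropped
```

Dropping the constant from one likelihood but not the other shifts that model's AIC by a fixed amount, which is enough to reverse a comparison between models with different likelihood functions.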
Key points on model selection
Appropriate model depends on use
Methods
– Likelihood ratio for nested models
– AICc for different models
– AICc for models with different likelihoods, provided all the constants like sqrt(2π) are included
Issues
– Are the likelihoods correct?
Likelihood ratio and AICc are developed from theory about model fit, not uncertainty
Bayes Information Criterion, BIC
$\mathrm{BIC} = 2(-\ln L) + p\ln(n)$
– $p$: number of model parameters
– $n$: number of data points
Table columns: Model M_i, Params, NLL, BIC, ΔBIC
Rows: Logistic, B_1965 free; Logistic, B_1965 = K; LRSG, B_1965 free; LRSG, B_1965 = K
Burnham KP & DR Anderson (2004) Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods & Research 33:
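A short sketch of the BIC calculation, with hypothetical inputs, shows how its penalty compares with the 2p penalty of AIC. The function name and values are illustrative.

```python
import math


def bic(nll, n_params, n_obs):
    """BIC from the negative log-likelihood at the MLE."""
    return 2.0 * nll + n_params * math.log(n_obs)


# Hypothetical values: NLL = 10 with 4 parameters.
aic_value = 2.0 * 10.0 + 2.0 * 4      # AIC penalty is 2 per parameter
bic_small = bic(10.0, 4, 7)           # ln(7) < 2, so BIC penalizes less than AIC
bic_large = bic(10.0, 4, 25)          # ln(25) > 2, so BIC penalizes more than AIC
print(aic_value, bic_small, bic_large)
```

Because ln(n) exceeds 2 once n is 8 or more, BIC penalizes extra parameters more heavily than AIC for all but the smallest data sets.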
Choosing a Criterion
Options: R², LRT, other approaches
Criteria: AIC, AICc, BIC, CAIC, CIC, DIC, EIC, FIC, GIC, HIC, ICOMP, JIC, KIC, NIC, OIC, PIC, QIC, QAIC, RIC, SIC, TIC, TAIC, WIC, YIC, ZIC, Bayes factor, Bayesian p-vals, Cross-validation
Slide courtesy of Eric Ward, NOAA
Robustness and contradictory data
Readings
– Robustness: Numerical Recipes: chapter on “robust estimation”
– Contradictory data: Schnute JT & R Hilborn (1993) Analysis of contradictory data sources in fish stock assessment. CJFAS 50:
Robustness
– In the real world, assumptions are not always met
– For instance, data may be mis-recorded, the wrong animal may be measured, the instrument may have failed, or some major assumption may have been wrong
– Outliers exist
[Figure: Data. Source: 9 Robustness.xlsx]
[Figure: Data + contamination. Source: 9 Robustness.xlsx]
Robust likelihoods
What is c?
[Figure: Robust fit. Source: 9 Robustness.xlsx]
Robust normal [figure panels: p = 1.00, p = 0.95, p = 0.80]
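The robust likelihood formula itself is missing from this copy of the slides, so the sketch below uses one assumed form: a normal density given weight p plus a constant c given weight 1 - p, which reduces to the ordinary normal likelihood when p = 1.00. The data, the fixed σ, the value of c, and the grid search are all hypothetical choices, and this may not match the definition of p or c used in the lecture.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def robust_nll(data, mu, sigma, p, c):
    """Negative log-likelihood under the assumed form p*Normal + (1 - p)*c."""
    return -sum(math.log(p * normal_pdf(x, mu, sigma) + (1.0 - p) * c) for x in data)


def grid_mle_mu(data, sigma, p, c, lo=5.0, hi=30.0, step=0.01):
    """Estimate mu by brute-force search over a grid, with sigma held fixed."""
    n_steps = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return min(candidates, key=lambda mu: robust_nll(data, mu, sigma, p, c))


# Hypothetical data: five consistent observations and one outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]
mu_normal = grid_mle_mu(data, sigma=1.0, p=1.0, c=0.01)    # ordinary normal fit
mu_robust = grid_mle_mu(data, sigma=1.0, p=0.95, c=0.01)   # outlier is down-weighted
print(mu_normal, mu_robust)
```

Under the ordinary normal likelihood the outlier drags the estimate toward the sample mean of all six points, while the constant term limits how much any single observation can penalize the fit.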