Practical Model Selection and Multi-model Inference using R Presented by: Eric Stolen and Dan Hunt.

1 Practical Model Selection and Multi-model Inference using R Presented by: Eric Stolen and Dan Hunt

2 Foundation: Theory, hypotheses, and models

3 Theory This is the link with science, which is about understanding how the world works

4 Theory “A set of propositions set out as an explanation.” “Theories are generalizations.” “Theories contain questions.” “Theories continually change…” (Ford, E. D. 2000. Scientific Method for Ecological Research. Cambridge University Press.)

5 Theory Example 1 – Wading bird foraging: –Ideal Free Distribution –Marginal Value Theorem –Scramble Competition

6 Theory Example 2 – Indigo Snake Habitat selection –Animal perception –Evolutionary Biology –Population Demography

7 Hypotheses Many views – confusing! A hypothesis is a statement derived from scientific theory that postulates something about how the world works. A testable hypothesis is one that can be falsified by a contradiction between a prediction derived from the hypothesis and data measured in the appropriate way.

8 Hypotheses To use the Information-theoretic toolbox, we must be able to state a hypothesis as a statistical model (or more precisely an equation which allows us to calculate the maximum likelihood of the hypothesis)

9 Multiple Working Hypotheses We operate with a set of multiple alternative hypotheses (models). The many advantages include safeguarding objectivity and allowing rigorous inference. Chamberlin (1890); Strong Inference – Platt (1964); Bold Conjectures – Karl Popper (ca. 1960)

10 Deriving the model set This is the tough part (but also the creative part). Much thought is needed, so don’t rush. Collaborate, seek outside advice, read the literature, go to meetings… “How” and “When” hypotheses are better than “What” hypotheses (strive to predict rather than describe).

11 Models – Indigo Snake example Study of indigo snake habitat use. Response variable: home range size, ln(ha). Predictors: SEX; land cover, 2-3 levels (lc2); weeks = effort/exposure. Science question: “Is there a seasonal difference in habitat use between sexes?”

12 Models – Indigo Snake example
–SEX
–land cover type (lc2)
–weeks
–SEX + lc2
–SEX + weeks
–lc2 + weeks
–SEX + lc2 + weeks
–SEX + lc2 + SEX * lc2
–SEX + lc2 + weeks + SEX * lc2

14 Models – Indigo Snake example
–SEX
–land cover
–weeks
–SEX + land cover
–SEX + weeks
–land cover + weeks
–SEX + land cover + weeks
–SEX + land cover + SEX * land cover
–SEX + land cover + weeks + SEX * land cover

15 Models – fish habitat use example Study of fish habitat use in salt marsh. Response variable: density, ln(fish m⁻² + 1). Habitat – vegetated or unvegetated; Site – 7 impoundments; Season – 4 seasons. Science questions: –“Is there evidence for a difference in density between habitats?” –“Is there a seasonal difference in habitat use by resident marsh fish?”

16 Models – fish habitat use example
–Site + Season + Habitat + Site*Habitat + Season*Habitat + Site*Season
–Site + Season + Habitat + Site*Habitat + Season*Habitat
–Site + Season + Habitat + Site*Season + Site*Habitat
–Site + Season + Habitat + Site*Season + Season*Habitat
–Site + Season + Habitat + Site*Habitat
–Site + Habitat + Site*Habitat
–Site + Season + Habitat + Season*Habitat
–Season + Habitat + Season*Habitat
–Site + Season + Habitat + Site*Season
–Site + Season + Site*Season
–Site + Season + Habitat
–Site + Season
–Site + Habitat
–Season + Habitat
–Site
–Season
–Habitat

19 The importance of a priori thinking… You can’t go back home!

20 Modeling Trade-off between precision and bias Trying to derive knowledge / advance learning; not “fit the data” Relationship between data (quantity and quality) and sophistication of the model

21 Precision-Bias Trade-off [Figure: Bias² plotted against model complexity (increasing number of parameters)]

22 Precision-Bias Trade-off [Figure: Bias² and variance plotted against model complexity (increasing number of parameters)]

24 Kullback-Leibler Information Basic concept from Information theory The information lost when a model is used to represent full reality Can also think of it as the distance between a model and full reality

25 Kullback-Leibler Information [Figure: truth / reality compared with models G1 (best model in set), G2, and G3]

28 Kullback-Leibler Information [Figure: truth / reality compared with models G1 (best model in set), G2, and G3] The relative difference between models is constant.

29 Akaike’s Contributions Figured out how to estimate the relative Kullback-Leibler (K-L) distance between models in a set of models. Figured out how to link maximum likelihood estimation theory with expected K-L information. An Information Criterion (Akaike’s Information Criterion): $\mathrm{AIC} = -2\log_e(\mathcal{L}(\text{model}_i \mid \text{data})) + 2K$

32 I-T mechanics $\mathrm{AICc}_i = -2\log_e(\mathcal{L}(\text{model}_i \mid \text{data})) + 2K\left(\frac{n}{n-K-1}\right)$, or equivalently $\mathrm{AICc}_i = \mathrm{AIC}_i + \frac{2K(K+1)}{n-K-1}$ (where $K$ = the number of parameters estimated and $n$ = the sample size)
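The two forms of AICc on this slide are algebraically identical, which can be checked numerically. The following is a minimal sketch, assuming a Gaussian linear model fitted to R's built-in mtcars data; the helper name aicc() is hypothetical and not taken from the presentation or from any package.

```r
# aicc() is a hypothetical helper written for this sketch.
# Form 1: -2 log-likelihood + 2K * n / (n - K - 1)
aicc <- function(fit) {
  ll <- logLik(fit)
  K  <- attr(ll, "df")   # number of estimated parameters
  n  <- nobs(fit)        # sample size
  as.numeric(-2 * ll) + 2 * K * (n / (n - K - 1))
}

# Example model on built-in data (an assumption, not the authors' data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
K <- attr(logLik(fit), "df")
n <- nobs(fit)

form1 <- aicc(fit)
# Form 2: AIC plus the small-sample correction term
form2 <- AIC(fit) + 2 * K * (K + 1) / (n - K - 1)

all.equal(form1, form2)
```

As n grows relative to K the correction term shrinks toward zero, so AICc converges to AIC.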

33 I-T mechanics $\mathrm{AICc}_{\min}$ = AICc for the model with the lowest AICc value; $\Delta_i = \mathrm{AICc}_i - \mathrm{AICc}_{\min}$

34 I-T mechanics $w_i = \mathrm{Prob}\{g_i \mid \text{data}\}$ (model probabilities). Evidence ratio of model $i$ to model $j$ $= w_i / w_j$
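The model probabilities are usually computed from the AICc differences as exp(-Δi/2) divided by the sum of exp(-Δr/2) over all models in the set. A minimal sketch of that arithmetic follows; the AICc values are made up for illustration and the model names m1 to m3 are hypothetical.

```r
# Made-up AICc values for three hypothetical models
aicc_values <- c(m1 = 102.3, m2 = 100.1, m3 = 107.8)

# Difference from the best (lowest-AICc) model
delta <- aicc_values - min(aicc_values)

# Relative likelihood of each model, then normalise to weights
rel_lik <- exp(-0.5 * delta)
w <- rel_lik / sum(rel_lik)

# Evidence ratio of the best model (m2) to m1
evidence_ratio <- unname(w["m2"] / w["m1"])

round(w, 3)
evidence_ratio
```

Because the weights share a common denominator, an evidence ratio depends only on the difference in AICc between the two models being compared.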

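35 I-T mechanics Least Squares Regression: $\mathrm{AICc} = n\log_e(\hat{\sigma}^2) + 2K\left(\frac{n}{n-K-1}\right)$, where $\hat{\sigma}^2 = \mathrm{RSS}/n$ (explain offset: this form drops the constant part of the log-likelihood, which is the same for all models fitted to the same data)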
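The "offset for the constant part" can be made concrete. For a Gaussian model, -2 times the maximized log-likelihood equals n log(RSS/n) plus the constant n log(2π) + n, so the least-squares form differs from the likelihood form only by a term that cancels when models are compared. A minimal sketch, assuming the built-in mtcars data as a stand-in:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # example model (an assumption)
n   <- nobs(fit)

rss        <- sum(residuals(fit)^2)
sigma2_hat <- rss / n                     # ML estimate: divide by n, not n - K

ls_term   <- n * log(sigma2_hat)          # least-squares version of -2 log L
full_term <- as.numeric(-2 * logLik(fit)) # likelihood version

# The difference is a constant that depends only on n
offset <- full_term - ls_term
all.equal(offset, n * log(2 * pi) + n)
```

The two forms therefore give the same Δ values and weights, but values from the two forms should not be mixed within one model set.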

36 I-T mechanics Counting Parameters: K = number of parameters estimated. Least Squares Regression: K = number of parameters + 2 (for intercept & σ²)

37 I-T mechanics Counting Parameters: K = number of parameters estimated. Logistic Regression: K = number of parameters + 1 (for intercept)
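R reports the parameter count it uses for AIC in the "df" attribute of logLik(), which gives a quick check on K. The sketch below assumes the built-in mtcars data and two predictors; these models are illustrations, not the authors' examples.

```r
# Least squares: intercept + 2 slopes + residual variance
fit_ls <- lm(mpg ~ wt + hp, data = mtcars)
n_coef_ls <- length(coef(fit_ls))          # intercept and slopes
K_ls      <- attr(logLik(fit_ls), "df")    # also counts sigma^2

# Logistic regression: intercept + 2 slopes, no variance parameter
fit_logit <- glm(am ~ wt + hp, family = binomial, data = mtcars)
n_coef_logit <- length(coef(fit_logit))
K_logit      <- attr(logLik(fit_logit), "df")

c(K_ls = K_ls, K_logit = K_logit)
```

With two predictors this gives K = 2 + 2 for least squares and K = 2 + 1 for logistic regression, matching the counting rules on the two slides.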

38 I-T mechanics Counting Parameters: Non-identifiable parameters

39 Comparing Models

40 Combined model weight = 0.995

41 Comparing Models Evidence Ratio = 4.52

42 Comparing Models

43 Evidence Ratio = 3.03

44 Comparing Models Evidence Ratio =4.28 (.34+.22+.14+.08) / (.11+.04+.02+.01)
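The combined evidence ratio on this slide is a ratio of summed model weights. A minimal sketch of the arithmetic using the rounded weights printed on the slide; the vector names are hypothetical. With these rounded values the ratio comes out slightly above the 4.28 shown, possibly because the slide's figure was computed from unrounded weights.

```r
# Rounded weights as printed on the slide
w_numerator   <- c(0.34, 0.22, 0.14, 0.08)
w_denominator <- c(0.11, 0.04, 0.02, 0.01)

# Ratio of the summed weights of the two groups of models
evidence_ratio <- sum(w_numerator) / sum(w_denominator)
round(evidence_ratio, 2)
```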

45 Generalized Linear Models

46 Mathematical details Three parts to a GLM –Link function –linear equation –error distribution

47 Mathematical details General Linear Models – linear regression and ANOVA –Link function – Identity link –linear equation –error distribution – Normal Distribution (Gaussian) $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

48 Mathematical details Logistic Regression –Link function – Logit link: $\ln(\pi / (1 - \pi))$ –linear equation –error distribution – Binomial Distribution $\mathrm{logit}(\pi) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots$
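The three parts of a GLM map directly onto arguments of glm(): the formula is the linear equation, and the family object carries the error distribution and the link. A minimal sketch on the built-in mtcars data (an assumption, not the authors' data):

```r
# Gaussian error + identity link reproduces ordinary linear regression
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"),
               data = mtcars)
same_coefs <- all.equal(coef(fit_lm), coef(fit_glm))

# Binomial error + logit link is logistic regression
fit_logit <- glm(am ~ wt + hp, family = binomial(link = "logit"),
                 data = mtcars)
eta <- predict(fit_logit, type = "link")      # linear predictor (logit scale)
p   <- predict(fit_logit, type = "response")  # fitted probabilities

# Applying the inverse logit to the linear predictor gives the probabilities
link_ok <- all.equal(plogis(eta), p)

c(same_coefs = isTRUE(same_coefs), link_ok = isTRUE(link_ok))
```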

49 Mathematical details What types of models can be compared within a single I-T analysis? –Data must be fixed (including response) –Must be able to calculate maximum likelihood –(ways to deal with quasi-likelihood) –Models do not need to be nested –In some cases AIC is additive

50 Model Fitting Preliminaries Understanding the data/variables Avoid data dredging! safe data screening practices Detect outliers, scale issues, collinearity Tools in R

51 –Generalized linear models: lm, glm –Packages: Design package (F. E. Harrell. 2001. Regression Modeling Strategies with Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer.); car package (Fox, J. 2002. An R and S-plus Companion to Applied Regression. Sage Publications.)

52 Tools in R –Model formula, e.g. model4 <- glm(help ~ age2 + sex + mom_dad + suburb + brdeapp + matepp + density + I(density^2), family = binomial, data = choices) –Output: summary(model4), model4$aic, model4$coefficients

53 Tools in R Fitting the model set – –R program does the work Trouble-shooting Export results
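One way to let R "do the work" is to hold the candidate set as a named list of formulas, fit them in a loop, and assemble the AICc table from the fitted objects. This is a minimal sketch under stated assumptions: the four candidate models, the mtcars data, and the helper aicc() are illustrative and are not the authors' program.

```r
# Hypothetical candidate set, stated before looking at the fits
candidate_formulas <- list(
  wt      = mpg ~ wt,
  hp      = mpg ~ hp,
  wt_hp   = mpg ~ wt + hp,
  wt_x_hp = mpg ~ wt * hp
)

# Fit every model in the set to the same data
fits <- lapply(candidate_formulas, function(f) lm(f, data = mtcars))

# Hypothetical helper: AIC plus the small-sample correction
aicc <- function(fit) {
  K <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * K * (K + 1) / (n - K - 1)
}

tab <- data.frame(
  model = names(fits),
  K     = sapply(fits, function(f) attr(logLik(f), "df")),
  AICc  = sapply(fits, aicc)
)
tab$delta  <- tab$AICc - min(tab$AICc)
tab$weight <- exp(-0.5 * tab$delta) / sum(exp(-0.5 * tab$delta))
tab <- tab[order(tab$delta), ]   # sort the models by evidence

print(tab, row.names = FALSE)
# To export: write.csv(tab, "aicc_table.csv", row.names = FALSE)
```

Keeping the formulas in one list makes it easy to confirm that every model uses the same response and the same rows of data, which the comparison requires.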

54 Fish Example

55 Model Checking –Global model must fit –Models used for inference must meet assumptions, –Look for numerical problems Tools in R

56 Fish Example

57 Interpretation of I-T results

58 Interpretation of models for inference Case 1: One or a few best models. Examining model parameters and predictions –Effects –Prediction: graphing results –nomograms –Presenting Results: Anderson, D. R., W. A. Link, D. H. Johnson, and K. P. Burnham. 2001. Suggestions for presenting the results of data analysis. Journal of Wildlife Management 65:373-378.

59 Tools Calculations in Excel AICc, Model weights, model likelihood, evidence ratios Sorting the models by evidence (exciting concept) Model weights, evidence ratios, relative variable importance

60 Fish Example

61 Model selection uncertainty Model-average prediction Model-average parameter estimates Multi-model Inference

62 Model Averaging Predictions

63 Model-averaged prediction Model Averaging Predictions

64 Prediction from model i Model Averaging Predictions

65 Weight model i Model Averaging Predictions
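The labels on these slides (model-averaged prediction, prediction from model i, weight of model i) describe a weighted sum: each model's prediction is multiplied by its weight and the products are added. A minimal sketch, assuming three illustrative linear models on the built-in mtcars data and a hypothetical new observation:

```r
# Three illustrative candidate models (not the authors' model set)
fits <- list(
  lm(mpg ~ wt, data = mtcars),
  lm(mpg ~ wt + hp, data = mtcars),
  lm(mpg ~ wt * hp, data = mtcars)
)

# AICc for each model, then model weights
aicc <- sapply(fits, function(f) {
  K <- attr(logLik(f), "df")
  n <- nobs(f)
  AIC(f) + 2 * K * (K + 1) / (n - K - 1)
})
delta <- aicc - min(aicc)
w <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

# Prediction from each model for a hypothetical new observation
new_car <- data.frame(wt = 3, hp = 150)
preds <- sapply(fits, function(f) unname(predict(f, newdata = new_car)))

# Model-averaged prediction: sum over models of weight * prediction
avg_pred <- sum(w * preds)
avg_pred
```

Because the weights are non-negative and sum to one, the averaged prediction always lies between the smallest and largest single-model predictions.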

66 Model-averaged parameter estimate Model Averaging Parameters
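The formula on this slide did not survive in the transcript. A commonly used form of the model-averaged parameter estimate is sketched below; the symbols are assumptions chosen to match the notation used earlier in the deck, with $w_i$ the weight of model $i$, $\hat{\theta}_i$ the estimate of the parameter from model $i$, and $R$ the number of models in which the parameter appears (weights rescaled to sum to one over those models).

```latex
\hat{\bar{\theta}} = \sum_{i=1}^{R} w_i \, \hat{\theta}_i ,
\qquad \sum_{i=1}^{R} w_i = 1
```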

67 Unconditional Variance Estimator
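The estimator itself is missing from the transcript. The form usually given for the unconditional variance of a model-averaged estimate is sketched below as an assumption about what the slide showed. Here $\widehat{\mathrm{var}}(\hat{\theta}_i \mid g_i)$ is the usual sampling variance conditional on model $g_i$, and the squared term adds the spread of the estimates across models, which is the model selection uncertainty.

```latex
\widehat{\mathrm{var}}\big(\hat{\bar{\theta}}\big)
  = \left[ \sum_{i=1}^{R} w_i
      \sqrt{ \widehat{\mathrm{var}}\big(\hat{\theta}_i \mid g_i\big)
             + \big(\hat{\theta}_i - \hat{\bar{\theta}}\big)^2 } \right]^2
```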

68

69 Snake Example

70 Multi-model Inference

71

