Practical Model Selection and Multi-model Inference using R
Presented by: Eric Stolen and Dan Hunt


Foundation: Theory, hypotheses, and models

Theory
This is the link with science, which is about understanding how the world works.

Theory
“A set of propositions set out as an explanation.”
“Theories are generalizations.”
“Theories contain questions.”
“Theories continually change…”
(Ford, E. D. Scientific Method for Ecological Research. Cambridge University Press.)

Theory
Example 1 – Wading bird foraging:
– Ideal Free Distribution
– Marginal Value Theorem
– Scramble Competition

Theory
Example 2 – Indigo snake habitat selection:
– Animal perception
– Evolutionary biology
– Population demography

Hypotheses
Many views – confusing!
A hypothesis is a statement derived from scientific theory that postulates something about how the world works.
A testable hypothesis is one that can be falsified by a contradiction between a prediction derived from the hypothesis and data measured in the appropriate way.

Hypotheses
To use the information-theoretic toolbox, we must be able to state a hypothesis as a statistical model (or, more precisely, as an equation that allows us to calculate the maximum likelihood of the hypothesis).

Multiple Working Hypotheses
We operate with a set of multiple alternative hypotheses (models).
The many advantages include safeguarding objectivity and allowing rigorous inference.
– Chamberlin (1890)
– Strong Inference: Platt (1964)
– Bold Conjectures: Karl Popper (ca. 1960)

Deriving the model set
This is the tough part (but also the creative part).
– Much thought is needed, so don’t rush.
– Collaborate, seek outside advice, read the literature, go to meetings…
– “How” and “when” hypotheses are better than “what” hypotheses (strive to predict rather than describe).

Models – Indigo Snake example
Study of indigo snake habitat use
– Response variable: home range size, ln(ha)
– SEX
– Land cover: 2-3 levels (lc2)
– weeks = effort/exposure
Science question: “Is there a seasonal difference in habitat use between sexes?”

Models – Indigo Snake example
– SEX
– land cover type (lc2)
– weeks
– SEX + lc2
– SEX + weeks
– lc2 + weeks
– SEX + lc2 + weeks
– SEX + lc2 + SEX * lc2
– SEX + lc2 + weeks + SEX * lc2
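The candidate set above can be written down as R formulas and fitted in one pass. The following is only a sketch: the data frame `snakes`, the response name `ln_ha`, and the simulated values are stand-ins invented for illustration, since the original data are not part of this transcript.

```r
# Simulated stand-in data; the real indigo snake data are not reproduced here.
set.seed(1)
n <- 60
snakes <- data.frame(
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  lc2   = factor(sample(c("natural", "altered"), n, replace = TRUE)),
  weeks = round(runif(n, 4, 52))
)
# Hypothetical response: ln(home range size in ha)
snakes$ln_ha <- 3 + 0.8 * (snakes$SEX == "M") + 0.02 * snakes$weeks +
  rnorm(n, sd = 0.5)

# The nine candidate models from the slide, written as R formulas
candidates <- list(
  sex          = ln_ha ~ SEX,
  lc           = ln_ha ~ lc2,
  weeks        = ln_ha ~ weeks,
  sex_lc       = ln_ha ~ SEX + lc2,
  sex_weeks    = ln_ha ~ SEX + weeks,
  lc_weeks     = ln_ha ~ lc2 + weeks,
  sex_lc_weeks = ln_ha ~ SEX + lc2 + weeks,
  sex_x_lc     = ln_ha ~ SEX * lc2,          # SEX + lc2 + SEX:lc2
  sex_x_lc_wk  = ln_ha ~ SEX * lc2 + weeks
)

# Fit every candidate to the same data and collect AIC values
fits <- lapply(candidates, function(f) lm(f, data = snakes))
sapply(fits, AIC)
```

Keeping the formulas in a named list means the model set is stated once, before looking at results, and every model is guaranteed to be fitted to the same data.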


Models – fish habitat use example
Study of fish habitat use in salt marsh
– Response variable: density, $\ln(\text{fish m}^{-2} + 1)$
– Habitat: vegetated or unvegetated
– Site: 7 impoundments
– Season: 4 seasons
Science questions:
– “Is there evidence for a difference in density between habitats?”
– “Is there a seasonal difference in habitat use by resident marsh fish?”

Models – fish habitat use example
– Site + Season + Habitat + Site*Habitat + Season*Habitat + Site*Season
– Site + Season + Habitat + Site*Habitat + Season*Habitat
– Site + Season + Habitat + Site*Season + Site*Habitat
– Site + Season + Habitat + Site*Season + Season*Habitat
– Site + Season + Habitat + Site*Habitat
– Site + Habitat + Site*Habitat
– Site + Season + Habitat + Season*Habitat
– Season + Habitat + Season*Habitat
– Site + Season + Habitat + Site*Season
– Site + Season + Site*Season
– Site + Season + Habitat
– Site + Season
– Site + Habitat
– Season + Habitat
– Site
– Season
– Habitat


The importance of a priori thinking… You can’t go back home!

Modeling
– Trade-off between precision and bias
– Trying to derive knowledge / advance learning, not “fit the data”
– Relationship between data (quantity and quality) and sophistication of the model

Precision-Bias Trade-off
[Figure: squared bias and variance plotted against model complexity (increasing number of parameters)]

Kullback-Leibler Information
– Basic concept from information theory
– The information lost when a model is used to represent full reality
– Can also be thought of as the distance between a model and full reality
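For discrete distributions, the information lost when model g stands in for truth f is $I(f, g) = \sum_x f(x) \log\big(f(x) / g(x)\big)$. The sketch below computes it for a made-up "truth" and two approximating models; the function name `kl_information` and the Poisson distributions are assumptions chosen for illustration, since in real problems the truth is unknown.

```r
# K-L information I(f, g) = sum over x of f(x) * log(f(x) / g(x))
kl_information <- function(f, g) {
  stopifnot(length(f) == length(g), all(f >= 0), all(g > 0))
  keep <- f > 0            # terms with f(x) = 0 contribute nothing
  sum(f[keep] * log(f[keep] / g[keep]))
}

# Hypothetical "truth": Poisson(3) counts, truncated to 0..20 and renormalized
x     <- 0:20
truth <- dpois(x, 3)
truth <- truth / sum(truth)

# Two approximating models
g1 <- dpois(x, 3.2); g1 <- g1 / sum(g1)
g2 <- dpois(x, 5);   g2 <- g2 / sum(g2)

kl1 <- kl_information(truth, g1)
kl2 <- kl_information(truth, g2)
c(g1 = kl1, g2 = kl2)    # the smaller value marks the model closer to truth
```

Only the difference between `kl1` and `kl2` matters for ranking the models, which is why a relative measure such as AIC is enough for model selection.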

Kullback-Leibler Information
[Figure: truth / reality and the models G1 (best model in set), G2, and G3]
The relative difference between models is constant.

Akaike’s Contributions
– Figured out how to estimate the relative Kullback-Leibler distance between models in a set of models
– Figured out how to link maximum likelihood estimation theory with expected K-L information
– An (Akaike’s) Information Criterion:
$\mathrm{AIC} = -2 \log_e\big(\mathcal{L}(\text{model}_i \mid \text{data})\big) + 2K$


I-T mechanics
$\mathrm{AICc}_i = -2 \log_e\big(\mathcal{L}(\text{model}_i \mid \text{data})\big) + 2K \frac{n}{n-K-1}$
or, equivalently,
$\mathrm{AICc}_i = \mathrm{AIC}_i + \frac{2K(K+1)}{n-K-1}$
where K = the number of parameters estimated and n = the sample size.
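A small helper makes the two forms of AICc concrete. This is a sketch only: the function names `aicc` and `aicc_alt` are invented here, and the built-in `cars` data set is used purely as a convenient example.

```r
# AICc as AIC plus the small-sample correction term
aicc <- function(loglik, K, n) {
  stopifnot(n - K - 1 > 0)
  aic <- -2 * loglik + 2 * K
  aic + 2 * K * (K + 1) / (n - K - 1)
}

# The same quantity in the other form: -2 logL + 2K * n / (n - K - 1)
aicc_alt <- function(loglik, K, n) -2 * loglik + 2 * K * n / (n - K - 1)

# Example with a fitted model (built-in 'cars' data, for illustration only)
fit <- lm(dist ~ speed, data = cars)
ll  <- logLik(fit)
K   <- attr(ll, "df")    # parameters counted by R: intercept, slope, sigma^2
n   <- nobs(fit)
aicc(as.numeric(ll), K, n)
```

Because the correction term is always positive, AICc is larger than AIC for the same model, and the gap shrinks as n grows relative to K.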

I-T mechanics
AICc_min = AICc for the model with the lowest AICc value
$\Delta_i = \mathrm{AICc}_i - \mathrm{AICc}_{\min}$

I-T mechanics
$w_i = \mathrm{Prob}\{g_i \mid \text{data}\}$ (model probability)
Evidence ratio of model i to model j = $w_i / w_j$
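The slide text does not show how the weights are computed. The standard Akaike weight is $w_i = \exp(-\Delta_i/2) / \sum_r \exp(-\Delta_r/2)$, and the sketch below applies it to three hypothetical AICc values; the function name `akaike_weights` and the numbers are made up for illustration.

```r
akaike_weights <- function(ic) {
  delta   <- ic - min(ic)          # Delta_i
  rel_lik <- exp(-0.5 * delta)     # relative likelihood of each model
  rel_lik / sum(rel_lik)           # w_i, which sum to 1
}

# Hypothetical AICc values for three models
aicc_vals <- c(m1 = 100.0, m2 = 102.0, m3 = 110.0)
w <- akaike_weights(aicc_vals)
w

# Evidence ratio of m1 to m2; with Delta = 2 this equals exp(1)
w["m1"] / w["m2"]
```

The evidence ratio depends only on the difference in AICc between the two models, so it does not change when other models are added to or dropped from the set.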

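I-T mechanics
Least Squares Regression
$\mathrm{AICc} = n \log_e(\hat{\sigma}^2) + 2K \frac{n}{n-K-1}$
where $\hat{\sigma}^2 = \mathrm{RSS} / n$.
This form omits the additive constant in the log-likelihood, which is the same for every model fitted to the same data.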
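To see why the dropped constant does not matter, the sketch below compares the least-squares form (without the small-sample correction, to match R's `AIC()`) against the full-likelihood value. The helper name `ls_aic` is an assumption, and `cars` is used only as a convenient built-in data set.

```r
# Least-squares AIC: n * log(RSS / n) + 2K, with the likelihood constant dropped
ls_aic <- function(fit) {
  n   <- nobs(fit)
  rss <- sum(residuals(fit)^2)
  K   <- length(coef(fit)) + 1      # regression coefficients plus sigma^2
  n * log(rss / n) + 2 * K
}

fit1 <- lm(dist ~ speed, data = cars)
fit2 <- lm(dist ~ speed + I(speed^2), data = cars)

# Differences between models agree with R's full-likelihood AIC()
d_ls   <- ls_aic(fit2) - ls_aic(fit1)
d_full <- AIC(fit2) - AIC(fit1)
c(least_squares = d_ls, full_likelihood = d_full)

# The per-model offset is the dropped constant n * log(2 * pi) + n
offset <- AIC(fit1) - ls_aic(fit1)
```

Since every model in the set carries the same offset, the Δ values, weights, and evidence ratios are unaffected by which form is used, as long as one form is used consistently.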

I-T mechanics
Counting Parameters: K = number of parameters estimated
Least Squares Regression: K = number of predictor coefficients + 2 (for the intercept and $\sigma^2$)

I-T mechanics
Counting Parameters: K = number of parameters estimated
Logistic Regression: K = number of predictor coefficients + 1 (for the intercept)
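R reports the parameter count it used for a fitted model as the `df` attribute of `logLik()`, which gives a quick check on K. The models below use built-in data sets (`cars`, `mtcars`) chosen only for illustration.

```r
# Least squares: intercept + one slope + residual variance
fit_ls <- lm(dist ~ speed, data = cars)
K_ls <- attr(logLik(fit_ls), "df")

# Logistic regression: intercept + one slope, no variance parameter
# (am is the 0/1 transmission indicator in mtcars)
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)
K_logit <- attr(logLik(fit_logit), "df")

c(least_squares = K_ls, logistic = K_logit)
```

Checking K this way guards against the common mistake of forgetting the variance parameter when AICc is computed by hand for least-squares models.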

I-T mechanics Counting Parameters: Non-identifiable parameters

Comparing Models
[Model comparison tables not recovered. Slide annotations: combined model weight = 0.995; evidence ratios = 4.52, 3.03, and 4.28]

Generalized Linear Models

Mathematical details
Three parts to a GLM:
– Link function
– Linear equation
– Error distribution

Mathematical details
General Linear Models – linear regression and ANOVA
– Link function: identity link
– Linear equation
– Error distribution: Normal distribution (Gaussian)
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

Mathematical details
Logistic Regression
– Link function: logit link, $\ln\big(\pi / (1 - \pi)\big)$
– Linear equation
– Error distribution: Binomial distribution
$\mathrm{logit}(\pi) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
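The role of the link function can be checked directly on a fitted model: predictions on the link scale are the linear equation, and the inverse logit turns them into probabilities. This sketch uses the built-in `mtcars` data (am = 0/1 transmission type, wt = weight) only as an example; `new_car` is a hypothetical observation.

```r
fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

new_car <- data.frame(wt = 3)
eta <- predict(fit, newdata = new_car, type = "link")       # linear predictor (logit scale)
p   <- predict(fit, newdata = new_car, type = "response")   # probability scale

# The inverse logit maps the linear predictor back to a probability
all.equal(unname(plogis(eta)), unname(p))

# The logit of the probability recovers the linear predictor
all.equal(unname(log(p / (1 - p))), unname(eta))
```

Swapping `family = binomial` for `family = gaussian` in the same call gives the identity-link general linear model, which is why both kinds of model fit into one framework.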

Mathematical details
What types of models can be compared within a single I-T analysis?
– Data must be fixed (including the response)
– Must be able to calculate the maximum likelihood
– (There are ways to deal with quasi-likelihood)
– Models do not need to be nested
– In some cases AIC is additive

Model Fitting Preliminaries
– Understanding the data/variables
– Avoid data dredging! Use safe data screening practices
– Detect outliers, scale issues, collinearity

Tools in R
– Generalized linear models: lm, glm
– Packages
– Design package: Harrell, F. E. Regression Modeling Strategies with Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer.
– car package: Fox, J. An R and S-Plus Companion to Applied Regression. Sage Publications.

Tools in R
– Model formula, e.g.
    model4 <- glm(help ~ age2 + sex + mom_dad + suburb + brdeapp + matepp + density + I(density^2), family = binomial, data = choices)
– Output
    summary(model4)
    model4$aic
    model4$coefficients

Tools in R
– Fitting the model set: the R program does the work
– Troubleshooting
– Exporting results

Fish Example

Tools in R
Model Checking
– The global model must fit
– Models used for inference must meet assumptions
– Look for numerical problems

Fish Example

Interpretation of I-T results

Interpretation of models for inference
Case 1: One or a few best models
Examining model parameters and predictions
– Effects
– Prediction graphing results
– Nomograms
– Presenting results
Anderson, D. R., W. A. Link, D. H. Johnson, and K. P. Burnham. Suggestions for presenting the results of data analysis. Journal of Wildlife Management 65:

Tools
Calculations in Excel
– AICc, model weights, model likelihood, evidence ratios
Sorting the models by evidence (exciting concept)
– Model weights, evidence ratios, relative variable importance
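The same spreadsheet-style calculations can be sketched in R. The table below is entirely hypothetical (model names borrowed from the fish example, AICc values invented), and relative variable importance is computed as the sum of weights over the models that contain each variable.

```r
# Hypothetical model table: AICc values and which variables each model contains
models <- data.frame(
  model   = c("Site + Habitat", "Site", "Habitat", "Site + Season + Habitat"),
  AICc    = c(210.4, 214.1, 225.0, 211.9),
  Site    = c(TRUE, TRUE, FALSE, TRUE),
  Season  = c(FALSE, FALSE, FALSE, TRUE),
  Habitat = c(TRUE, FALSE, TRUE, TRUE)
)

models$delta  <- models$AICc - min(models$AICc)
models$weight <- exp(-0.5 * models$delta) / sum(exp(-0.5 * models$delta))
models <- models[order(models$delta), ]     # best-supported model first

# Relative importance: sum of weights over models that contain the variable
importance <- sapply(c("Site", "Season", "Habitat"),
                     function(v) sum(models$weight[models[[v]]]))
importance
```

Importance values are easiest to compare when each variable appears in the same number of candidate models; this toy set is not balanced in that way, so its numbers are illustrative only.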

Fish Example

Multi-model Inference
– Model selection uncertainty
– Model-averaged prediction
– Model-averaged parameter estimates

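Model Averaging Predictions
[Equation image not recovered. From the slide annotations, the model-averaged prediction is the weighted sum of the predictions from each model:]
$\hat{\bar{Y}} = \sum_{i=1}^{R} w_i \hat{Y}_i$
where $\hat{\bar{Y}}$ is the model-averaged prediction, $\hat{Y}_i$ is the prediction from model i, and $w_i$ is the weight of model i.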
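A model-averaged prediction can be sketched in a few lines of base R. The three candidate models and the `cars` data are chosen only for illustration, and plain AIC is used for the weights here instead of AICc to keep the example short.

```r
# A small, hypothetical candidate set fitted to the same data
fits <- list(
  linear    = lm(dist ~ speed, data = cars),
  quadratic = lm(dist ~ speed + I(speed^2), data = cars),
  logspeed  = lm(dist ~ log(speed), data = cars)
)

# Akaike weights from AIC
aic   <- sapply(fits, AIC)
delta <- aic - min(aic)
w     <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

# Prediction from each model at a new value of the predictor
new_data <- data.frame(speed = 15)
preds <- sapply(fits, function(f) unname(predict(f, newdata = new_data)))

# Model-averaged prediction: weighted sum of the per-model predictions
avg_pred <- sum(w * preds)
rbind(weight = w, prediction = preds)
avg_pred
```

Because the weights are non-negative and sum to 1, the averaged prediction always lies between the smallest and largest single-model predictions.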

Model Averaging Parameters
Model-averaged parameter estimate [equation image not recovered]

Unconditional Variance Estimator
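The estimator itself did not survive in the transcript. A commonly cited form, from Burnham and Anderson's treatment of multi-model inference, is $\widehat{\mathrm{var}}(\hat{\bar{\theta}}) = \Big[\sum_i w_i \sqrt{\widehat{\mathrm{var}}(\hat{\theta}_i \mid g_i) + (\hat{\theta}_i - \hat{\bar{\theta}})^2}\Big]^2$, with $\hat{\bar{\theta}} = \sum_i w_i \hat{\theta}_i$. The sketch below assumes that form; the function name `unconditional_se` and all numbers are hypothetical.

```r
unconditional_se <- function(estimates, ses, weights) {
  stopifnot(length(estimates) == length(ses),
            length(ses) == length(weights))
  theta_bar <- sum(weights * estimates)     # model-averaged estimate
  # Each model contributes its own sampling variance plus its squared
  # deviation from the averaged estimate (model selection uncertainty)
  se <- sum(weights * sqrt(ses^2 + (estimates - theta_bar)^2))
  c(estimate = theta_bar, se = se, variance = se^2)
}

# Hypothetical estimates of one parameter from three models,
# their conditional standard errors, and the Akaike weights
est <- c(0.80, 0.95, 0.60)
se  <- c(0.10, 0.12, 0.15)
w   <- c(0.60, 0.30, 0.10)

res <- unconditional_se(est, se, w)
res
```

When the models agree on the estimate, the result collapses to the weighted average of the conditional standard errors; disagreement between models inflates it.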

Snake Example

Multi-model Inference