Generalized Linear Models II Distributions, link functions, diagnostics (linearity, homoscedasticity, leverage)



Dichotomous key: picking a distribution for your data

Discrete or continuous?

Discrete
  Possible values 0/1: Binomial (logistic regression)
  Possible values 0,1,2,…: Poisson or Binomial, then check for overdispersion
    Resid. deviance ~= Resid. df (~ $\chi^2_{n-p}$): Poisson ok
    Resid. deviance >> Resid. df (~ $\chi^2_{n-p}$): compare fit w/ quasi-Poisson, quasi-binomial, or negative binomial

Continuous
  Range of data -∞ to +∞: Normal
  Range of data >0 to +∞: Gamma or Inverse-Gaussian

Checks after fitting
  Check residuals for normality
  Check s.dev. residuals for normality
  Check Resid. deviance = Resid. df (~ $\chi^2_{n-p}$) again and compare s.dev. resids to normality

If distributional checks fail, examine the data/residuals and try to determine the source of deviance! Bimodality? Linearity? Fat tails? Excess zeros?

Common distributions (but see next slide for others and additional details)
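The overdispersion branch of the key can be sketched in R as follows. This is only an illustration on simulated data, not the course dataset: the variable names (`x`, `y`, `fit_pois`, `fit_quasi`) and the negative-binomial simulation are assumptions made for the example.

```r
# Simulated overdispersed counts (negative binomial); purely illustrative.
set.seed(1)
n  <- 200
x  <- runif(n)
mu <- exp(0.5 + 1.2 * x)
y  <- rnbinom(n, size = 1, mu = mu)   # variance = mu + mu^2, so overdispersed

fit_pois <- glm(y ~ x, family = poisson)

# Rule of thumb from the key: residual deviance should be close to residual df.
dev_ratio <- deviance(fit_pois) / df.residual(fit_pois)

# More formal version: compare the residual deviance with chi-squared(n - p).
p_value <- pchisq(deviance(fit_pois), df.residual(fit_pois), lower.tail = FALSE)

# If the ratio is well above 1, refit with a quasi-Poisson family, which
# estimates a dispersion parameter and inflates the standard errors.
fit_quasi  <- glm(y ~ x, family = quasipoisson)
dispersion <- summary(fit_quasi)$dispersion

print(c(dev_ratio = dev_ratio, p_value = p_value, dispersion = dispersion))
```

The quasi-Poisson fit has the same coefficients as the Poisson fit; only the standard errors change. A negative binomial fit (for example with `MASS::glm.nb`) is the other alternative named in the key.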

Possible values:

Discrete
  0/1: Bernoulli (success/failure; logistic regression)
  0,1,2,… N (known): Binomial (# successes in fixed # trials); Multinomial (more than 2 categories, fixed # trials)
  0,1,2,… infinity: Geometric (# trials to 1st success); Poisson (# successes in large # trials); Negative Binomial (# trials to nth success, or over-dispersed Poisson)

Continuous
  -∞ to +∞: Normal
  >0 to +∞: Exponential (time to 1st success); Gamma (time to nth success); Inverse-Gaussian (first-passage time of Brownian motion with drift; despite the name, it is not the distribution of 1/x for normal x)
  0 to 1: Beta (fraction of total, proportions)

Check out Wikipedia pages for each distribution for more info!
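One way to get a feel for these supports is to draw from each distribution with base R and look at the observed ranges. The parameter values below are arbitrary choices for illustration. Note that R's `rgeom` and `rnbinom` count failures before the 1st (or nth) success, so they start at 0 rather than counting trials.

```r
set.seed(42)
n <- 1000
draws <- list(
  bernoulli   = rbinom(n, size = 1, prob = 0.3),    # 0/1
  binomial    = rbinom(n, size = 10, prob = 0.3),   # 0..N with N = 10 known
  poisson     = rpois(n, lambda = 4),               # 0, 1, 2, ...
  geometric   = rgeom(n, prob = 0.3),               # failures before 1st success
  negbin      = rnbinom(n, size = 3, prob = 0.3),   # failures before 3rd success
  exponential = rexp(n, rate = 2),                  # > 0
  gamma       = rgamma(n, shape = 3, rate = 2),     # > 0
  beta        = rbeta(n, shape1 = 2, shape2 = 5),   # between 0 and 1
  normal      = rnorm(n)                            # -Inf to +Inf
)

# Observed minimum and maximum of each sample, one row per distribution.
ranges <- t(sapply(draws, range))
colnames(ranges) <- c("min", "max")
print(round(ranges, 3))
```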

As sample sizes get large, many distributions converge on the normal distribution (for example, a binomial with many trials or a Poisson with a large mean). See, e.g. stribution stribution
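A small numerical check of this convergence, using the Poisson as an example: the sketch below measures the largest gap between the Poisson CDF and a normal CDF with the same mean and variance (with a continuity correction). The helper `max_cdf_gap` and the chosen means are assumptions for illustration; here the Poisson mean plays the role of "sample size".

```r
# Largest absolute difference between the Poisson(lambda) CDF and the
# matching normal CDF, evaluated on a grid of counts around the mean.
max_cdf_gap <- function(lambda) {
  k <- 0:ceiling(lambda + 10 * sqrt(lambda))
  max(abs(ppois(k, lambda) -
          pnorm(k + 0.5, mean = lambda, sd = sqrt(lambda))))
}

lambdas <- c(1, 10, 100, 1000)
gaps <- sapply(lambdas, max_cdf_gap)
names(gaps) <- lambdas

# The gap shrinks as the mean grows.
print(signif(gaps, 3))
```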

Group exercise
  Get a partner
  Describe a real dataset to your partner
  Partner picks a potentially appropriate distribution
  Switch roles
  Repeat!

Link Functions

Enforce appropriate range for expected response (e.g. (0,1) for 'probability of success', >0 for counts, etc.)
Linearize relationship between expected response and predictors: $G(E(y)) = b_0 + b_1 x_1 + b_2 x_2 + \dots$
Be careful to interpret coefficients properly given a link function! $E(y) = G^{-1}(b_0 + b_1 x_1 + b_2 x_2 + \dots)$

E.g., writing $\eta = b_0 + b_1 x_1 + b_2 x_2 + \dots$ for the linear predictor:

Link     Constraint        Inverse
Log      E(y) > 0          $E(y) = e^{\eta}$
Logit    E(y) in (0,1)     $E(y) = e^{\eta} / (1 + e^{\eta})$

See Table 15.1 in GLM chapter for lots more!
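The link and its inverse can be inspected directly in R. The sketch below uses `make.link` for the logit, then fits a Poisson GLM with a log link to simulated data (the data and names are assumptions for illustration) to show that coefficients live on the link scale and must be back-transformed before interpretation.

```r
# The logit link maps (0, 1) to the whole real line; its inverse maps back.
logit <- make.link("logit")
eta   <- c(-5, 0, 5)
mu    <- logit$linkinv(eta)        # always strictly between 0 and 1
eta_back <- logit$linkfun(mu)      # recovers eta

# Log link: coefficients act multiplicatively on E(y).
set.seed(2)
x <- runif(300)
y <- rpois(300, lambda = exp(1 + 0.7 * x))
fit <- glm(y ~ x, family = poisson(link = "log"))

# exp(coefficient) is the multiplicative change in E(y) per unit of x.
rate_ratio <- exp(coef(fit))

# Predictions on the link scale vs. the response scale.
new_x     <- data.frame(x = 0.5)
pred_link <- predict(fit, newdata = new_x, type = "link")
pred_resp <- predict(fit, newdata = new_x, type = "response")

print(rate_ratio)
print(c(link = unname(pred_link), response = unname(pred_resp)))
```

Applying the inverse link (`exp` here) to the link-scale prediction gives the response-scale prediction.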

Canonical link functions
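In R, the default link of each standard `glm` family is that family's canonical link, so the canonical links can be listed straight from the family objects. This is a minimal sketch using base R only.

```r
# Each family object stores the name of its link in $link.
families <- list(
  gaussian         = gaussian(),
  binomial         = binomial(),
  poisson          = poisson(),
  Gamma            = Gamma(),
  inverse.gaussian = inverse.gaussian()
)

links <- sapply(families, function(f) f$link)
print(links)
```

A non-canonical link can still be requested explicitly, e.g. `Gamma(link = "log")`, which is often easier to interpret than the canonical inverse link.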

Sample problems for count data: Binomial vs. Poisson (202/poiss_bin.html)
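One way to see when the two distributions are interchangeable: with many trials and a small success probability, the binomial is very close to a Poisson with the same mean, but with few trials and a large probability it is not. The helper `compare` and the parameter values are assumptions chosen for illustration.

```r
# Largest difference between binomial and Poisson probabilities with the
# same mean n * p, over counts 0..20.
k <- 0:20
compare <- function(n, p) max(abs(dbinom(k, n, p) - dpois(k, n * p)))

gap_rare   <- compare(n = 1000, p = 0.005)  # rare events, mean 5
gap_common <- compare(n = 10,   p = 0.5)    # same mean 5, but bounded at 10

print(c(rare = gap_rare, common = gap_common))
```

When the count has a known, small upper bound (N trials), the binomial is the natural choice; when the bound is unknown or very large relative to the counts, the Poisson is usually adequate.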

Leverage (see diagnostic plots & websites on next slide) Xxx et al 2006 PLoS Biology
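Leverage can be computed for a fitted GLM with base R's `hatvalues`, and influence with `cooks.distance`. The sketch below uses simulated data with one deliberately extreme predictor value; the data, the names, and the 2p/n cutoff (a common rule of thumb, not something stated on the slide) are assumptions for illustration.

```r
set.seed(3)
x <- c(runif(30), 5)                 # the last point lies far from the rest in x
y <- rpois(31, exp(0.3 + 0.4 * x))
fit <- glm(y ~ x, family = poisson)

h  <- hatvalues(fit)                 # leverage of each observation
cd <- cooks.distance(fit)            # influence on the fitted coefficients

p <- length(coef(fit))
high_leverage <- which(h > 2 * p / length(h))   # rule of thumb: h > 2p/n

print(round(h[high_leverage], 3))
print(round(cd[high_leverage], 3))
```

`plot(fit, which = 5)` draws the residuals-vs-leverage diagnostic plot for the same model.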

R: example GLM with data

# Read in data
bd = read.csv("c:/marm/teaching/293qe/bat_lambda.csv")
str(bd); head(bd)

# What not to do: run models blindly!
b1 = glm(Lambda ~ PreWNS_Pop, family = Gamma, data = bd); summary(b1)

# What to do: plot data
plot(Lambda ~ PreWNS_Pop, data = bd)

# What does it suggest would be a good idea? Log-transform the predictor.
bd$Lpop = log(bd$PreWNS_Pop)
plot(Lambda ~ Lpop, data = bd)

# Fit a sequence of nested models
b1 = glm(Lambda ~ Lpop, family = Gamma, data = bd); summary(b1)
b2 = glm(Lambda ~ Lpop + Species, family = Gamma, data = bd); summary(b2)
b3 = glm(Lambda ~ Lpop * Species, family = Gamma, data = bd); summary(b3)

# Compare models, then look at diagnostic plots for the largest one
anova(b1, b2, b3, test = "Chisq")
AIC(b1, b2, b3)
plot(b3)
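Without access to `bat_lambda.csv`, the same workflow can be tried on simulated data. The sketch below is a stand-in only: the data frame `sim`, its values, and the two species labels are invented for illustration and share only the column names with the real file.

```r
set.seed(4)
n <- 120
sim <- data.frame(
  PreWNS_Pop = round(exp(runif(n, log(10), log(10000)))),
  Species    = factor(sample(c("A", "B"), n, replace = TRUE))
)
sim$Lpop <- log(sim$PreWNS_Pop)

# Gamma response whose inverse mean is linear in Lpop (the default Gamma link).
mu <- 1 / (0.5 + 0.1 * sim$Lpop)
sim$Lambda <- rgamma(n, shape = 20, rate = 20 / mu)

m1 <- glm(Lambda ~ Lpop,           family = Gamma, data = sim)
m2 <- glm(Lambda ~ Lpop + Species, family = Gamma, data = sim)

# Nested-model test and AIC comparison, as in the slide.
print(anova(m1, m2, test = "Chisq"))
aic_table <- AIC(m1, m2)
print(aic_table)

# Predictions back on the response scale for a few population sizes.
newd <- data.frame(Lpop = log(c(10, 100, 1000)))
pred <- predict(m1, newdata = newd, type = "response")
print(pred)
```

With the inverse link, a positive coefficient on `Lpop` means the expected response decreases as `Lpop` increases, which is why interpreting coefficients on the link scale needs care.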