Count Data
Harry R. Erwin, PhD
School of Computing and Technology, University of Sunderland

Resources
– Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley.
– Freund, RJ, and WJ Wilson (1998) Regression Analysis. Academic Press.
– Gentle, JE (2002) Elements of Computational Statistics. Springer.
– Gonick, L, and W Smith (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

Introduction
These four demonstration sessions address special types of data:
– Counts
– Proportions
– Survival analysis
– Binary responses

Frequencies and Proportions
With frequency data, we know how often something happened, but not how often it didn’t happen.
With proportion data (next week), we know both how often it happened and how often it didn’t.

Count Data
Linear regression assumes constant variance and normal errors. These assumptions are not appropriate for count data:
1. Counts are non-negative, but a linear model can predict negative values.
2. The variance of the response usually increases with the mean.
3. The errors are not normally distributed.
4. Zeros are hard to transform (for example, log(0) is undefined).
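The first two points can be checked with a short simulation. This is only a sketch: the means, the sample size, and the small vector of declining counts are made-up values, not data from the book.

```r
set.seed(1)                       # fixed seed so the sketch is repeatable
means <- c(1, 5, 20)              # hypothetical mean counts
sims  <- lapply(means, function(m) rpois(10000, lambda = m))

# For Poisson counts the sample variance tracks the sample mean
summary_tab <- data.frame(
  true_mean   = means,
  sample_mean = sapply(sims, mean),
  sample_var  = sapply(sims, var)
)
print(summary_tab)

# A straight-line fit to counts that decline towards zero
# can predict negative counts (made-up data)
x <- 0:10
y <- c(8, 5, 3, 2, 1, 1, 0, 0, 0, 0, 0)
fit_lm <- lm(y ~ x)
min_pred <- min(predict(fit_lm))  # smallest fitted value from the linear model
min_pred
```

The smallest fitted value from lm() falls below zero here, which no count can do.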

Handling Count Data in R
Use a glm with family=poisson.
– This sets the errors to Poisson, so the variance equals the mean.
– This sets the link to log, so fitted values are positive.
Book example
If you have overdispersion (residual deviance substantially greater than the residual degrees of freedom), use family=quasipoisson.
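A minimal sketch of this workflow, using simulated data rather than the book example. The variable names and the negative binomial generator (chosen only to produce overdispersed counts) are assumptions for illustration.

```r
set.seed(42)
n  <- 200
x  <- runif(n, 0, 2)                     # hypothetical explanatory variable
mu <- exp(0.5 + 0.8 * x)                 # log-linear mean
y  <- rnbinom(n, size = 2, mu = mu)      # counts with variance > mean
dat <- data.frame(x = x, y = y)

# Poisson errors with the default log link
fit_pois <- glm(y ~ x, family = poisson, data = dat)

# Rough overdispersion check: residual deviance / residual df
ratio <- deviance(fit_pois) / df.residual(fit_pois)
ratio

# Refit with quasipoisson, which estimates a dispersion parameter
fit_quasi <- glm(y ~ x, family = quasipoisson, data = dat)
summary(fit_quasi)$dispersion

# Same coefficients; only the standard errors change
cbind(poisson = coef(fit_pois), quasipoisson = coef(fit_quasi))
se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
cbind(se_pois, se_quasi)
```

The quasipoisson fit leaves the estimates unchanged and scales the standard errors by the square root of the estimated dispersion, so tests become more conservative when the data are overdispersed.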

Analysis of Count Data
Book example (230ff)
– Use of table()
– Use of tapply()
– Fitting the glm with family=poisson
– Refitting with family=quasipoisson
– Three- and four-way interactions
– Model simplification
– Documentation
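The steps in this list might look roughly as follows on a small simulated data set. The factor names, levels, and means are hypothetical and stand in for the book's data, and only one two-way interaction is shown.

```r
set.seed(7)
# Hypothetical design: two two-level factors, 25 replicates per cell
dat <- expand.grid(treatment = c("control", "treated"),
                   site      = c("A", "B"),
                   rep       = 1:25)
dat$count <- rpois(nrow(dat),
                   lambda = ifelse(dat$treatment == "treated", 6, 3))

# table(): how often each count value occurs
freq <- table(dat$count)
freq

# tapply(): mean count in each treatment-by-site cell
cell_means <- tapply(dat$count, list(dat$treatment, dat$site), mean)
cell_means

# Fit the full model, then simplify by deleting the interaction
full    <- glm(count ~ treatment * site, family = poisson, data = dat)
reduced <- update(full, . ~ . - treatment:site)
anova(reduced, full, test = "Chisq")     # deletion test for the interaction
```

With a quasipoisson fit, the same deletion test would normally use test = "F" instead of test = "Chisq".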

Contingency Tables
Risk: aggregating the data over important explanatory variables (nuisance variables) can give misleading results.
Book example (234ff)
– Fit the saturated model.
– Remove the N-way interaction and test whether it was significant.
– If the N-way interaction is significant, go no further.
– Otherwise, remove the scientifically interesting interaction and test whether it is significant.
– You have to check the nuisance variables first!
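A sketch of this sequence for a three-way table. The counts and the variable names (exposure, outcome, and site as the nuisance variable) are invented for illustration and are not the book's data.

```r
# Hypothetical 2 x 2 x 2 contingency table in long format
tab <- expand.grid(exposure = c("no", "yes"),
                   outcome  = c("no", "yes"),
                   site     = c("A", "B"))
tab$n <- c(40, 20, 10, 30, 35, 25, 15, 25)

# Saturated model: one parameter per cell, so no residual df
saturated <- glm(n ~ exposure * outcome * site, family = poisson, data = tab)
df.residual(saturated)

# Step 1: remove the three-way interaction and test it
no3way <- update(saturated, . ~ . - exposure:outcome:site)
anova(no3way, saturated, test = "Chisq")

# Step 2 (only if step 1 is not significant): remove the
# interaction of scientific interest and test it
no_eo <- update(no3way, . ~ . - exposure:outcome)
anova(no_eo, no3way, test = "Chisq")
```

Keeping site in the model throughout is what guards against aggregating over the nuisance variable.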

ANCOVA with Counts
Book example (237ff)
– Plotting and use of split() to gain insight
– Analysis: testing for the need for different slopes
– Use of predict() to draw lines through the plot
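The following sketch walks through the same three steps on simulated data. The covariate x, the two-level factor group, and the coefficients used to generate the counts are assumptions, not values from the book.

```r
set.seed(3)
n <- 60
dat <- data.frame(x     = runif(2 * n, 0, 10),
                  group = factor(rep(c("a", "b"), each = n)))
dat$count <- rpois(2 * n,
                   lambda = exp(0.5 + 0.15 * dat$x +
                                ifelse(dat$group == "b", 0.6, 0)))

# split(): look at each group separately
by_group <- split(dat, dat$group)
sapply(by_group, function(d) mean(d$count))

# Test whether the groups need different slopes
full         <- glm(count ~ x * group, family = poisson, data = dat)
common_slope <- update(full, . ~ . - x:group)
anova(common_slope, full, test = "Chisq")

# predict() on the response scale gives curves to draw through the plot
plot(dat$x, dat$count, col = as.integer(dat$group), pch = 16,
     xlab = "x", ylab = "count")
xs <- seq(0, 10, length.out = 100)
preds <- lapply(levels(dat$group), function(g) {
  newd <- data.frame(x = xs, group = factor(g, levels = levels(dat$group)))
  predict(common_slope, newdata = newd, type = "response")
})
for (i in seq_along(preds)) lines(xs, preds[[i]], col = i)
```

Using type = "response" back-transforms from the log link, so the fitted lines appear as curves on the count scale and never drop below zero.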

Frequency Distributions
Book example (240ff)
– Testing for independence
– Use of table()
– Use of dpois()
– Plotting and interpretation
– Use the negative binomial distribution for data with variance much greater than the mean.
– Use the binomial distribution for data with variance less than the mean.
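As a rough illustration of comparing observed frequencies with Poisson expectations, the sketch below uses simulated counts. The sample size and mean are made up, and pooling the upper tail into the last class is one possible choice, not necessarily the book's.

```r
set.seed(11)
counts <- rpois(200, lambda = 1.5)       # hypothetical counts per sampling unit

# Observed frequency of each count value, including empty classes
k        <- 0:max(counts)
observed <- table(factor(counts, levels = k))

# Expected frequencies from dpois() at the sample mean;
# the last class absorbs the upper tail so the total matches n
lambda_hat <- mean(counts)
probs <- dpois(k, lambda_hat)
probs[length(probs)] <- 1 - sum(probs[-length(probs)])
expected <- length(counts) * probs

rbind(observed = as.vector(observed), expected = round(expected, 1))

# Side-by-side bars for visual comparison
barplot(rbind(as.vector(observed), expected), beside = TRUE,
        names.arg = k, xlab = "count", ylab = "frequency")

# Variance-mean ratio: near 1 suggests Poisson, much greater than 1
# suggests negative binomial, less than 1 suggests binomial
vm_ratio <- var(counts) / mean(counts)
vm_ratio
```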