Workshop in R and GLMs: #4 Diane Srivastava University of British Columbia

Exercise
1. Fit the binomial glm survival = size*treat.
2. Fit the binomial glm parasitism = size*treat.
3. Predict what size has 50% parasitism in treatment “0”.

Predicting size for p = 0.5, treat = 0
Output from logistic regression with logit link: predicted $\log_e\big(p/(1-p)\big) = a + bx$.
So when p = 0.5, the odds p/(1-p) = 1, and we solve $\log_e(1) = 0 = a + bx$, which gives $x = -a/b$.
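The calculation above can be sketched in R. Rlecture.csv is not available here, so the data below are simulated and hypothetical: `size`, a 0/1 `treat` column, and `parasitism` taken as the number of hosts parasitised out of 100. The block fits the logit-link binomial GLM to treatment "0" only and solves 0 = a + bx.

```r
# Hypothetical simulated data standing in for Rlecture.csv:
# size, treat (0/1) and parasitism = number parasitised out of 100 hosts
set.seed(1)
n <- 40
ds <- data.frame(size = runif(n, 1, 10), treat = rep(0:1, each = n / 2))
ds$parasitism <- rbinom(n, size = 100, prob = plogis(-3 + 0.6 * ds$size + ds$treat))

# Logit-link binomial GLM for treatment "0" only
m0 <- glm(cbind(parasitism, 100 - parasitism) ~ size,
          family = binomial, data = subset(ds, treat == 0))
a <- coef(m0)[["(Intercept)"]]
b <- coef(m0)[["size"]]

# p = 0.5 means p/(1-p) = 1 and log(1) = 0, so 0 = a + b*x, i.e. x = -a/b
x50 <- -a / b
x50

# Sanity check: the fitted probability at x50 should be 0.5
predict(m0, newdata = data.frame(size = x50), type = "response")
```

If the MASS package is installed, `MASS::dose.p(m0, p = 0.5)` should return the same estimate together with a standard error.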

Coefficients (columns: Estimate, Std. Error, z value, Pr(>|z|)):
(Intercept)   Pr(>|z|) <2e-16 ***
size          Pr(>|z|) <2e-16 ***
treat
size:treat
What is the equation for treat 0? For treat 1?
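One way to read the two equations off the fitted coefficients is sketched below, again on simulated, hypothetical data. It assumes `treat` is stored as a numeric 0/1 column, so the coefficient names are `treat` and `size:treat`; if `treat` were a factor they would instead be `treat1` and `size:treat1`.

```r
# Hypothetical simulated data standing in for Rlecture.csv
set.seed(2)
n <- 40
ds <- data.frame(size = runif(n, 1, 10), treat = rep(0:1, each = n / 2))
ds$parasitism <- rbinom(n, 100,
  plogis(-3 + 0.6 * ds$size + 1 * ds$treat - 0.2 * ds$size * ds$treat))

m1 <- glm(cbind(parasitism, 100 - parasitism) ~ size * treat,
          family = binomial, data = ds)
cf <- coef(m1)

# treat = 0: the treat and size:treat terms drop out
int0   <- cf[["(Intercept)"]]
slope0 <- cf[["size"]]
# treat = 1: treat shifts the intercept, size:treat shifts the slope
int1   <- cf[["(Intercept)"]] + cf[["treat"]]
slope1 <- cf[["size"]] + cf[["size:treat"]]

cat(sprintf("treat 0: logit(p) = %.3f + %.3f * size\n", int0, slope0))
cat(sprintf("treat 1: logit(p) = %.3f + %.3f * size\n", int1, slope1))
```

Because the interaction model gives each treatment its own intercept and slope, these lines should match separate fits to each treatment.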

Rlecture.csv 3.12

Model simplification
1. Parsimonious/logical sequence (e.g. highest-order interactions first)
2. Stepwise sequence
3. Bayesian comparison of candidate models (not covered)

[Figure: two ANCOVA panels of logit parasitism vs. plant size. Left, size*treat ns: the difference between categories is constant and doesn’t depend on size. Right, size*treat sig: the difference between categories depends on size.]

Deletion tests
How to change your model quickly: model2 <- update(model1, ~ . - size:treat)
How to do a deletion test: anova(reduced model, full model, test="Chi")
1. Test the interaction in the logit parasitism ANCOVA. If not significant, remove it and continue. If significant, STOP!
2. Test the covariate. If not significant, remove it and continue. If significant, put it back and continue.
3. Test the main effect.

Code for “parasitism” analysis
> ds <- read.table(file.choose(), sep=",", header=TRUE); ds   # choose Rlecture.csv
> # response: parasitised vs. not parasitised, treating parasitism as a count out of 100
> m1 <- glm(cbind(parasitism, 100-parasitism) ~ size*treat, data=ds, family=binomial)
> summary(m1)
> m2 <- update(m1, ~ . - size:treat)
> summary(m2)
> anova(m2, m1, test="Chi")   # 1. test the interaction
> m3 <- update(m2, ~ . - size)
> anova(m3, m2, test="Chi")   # 2. test the covariate
> m4 <- update(m2, ~ . - treat)
> anova(m4, m2, test="Chi")   # 3. test the treat main effect
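The deletion-test sequence can also be run end to end with the decision rule written out. This is a sketch on simulated, hypothetical data in which the true model has no size:treat interaction. Like the code above, it tests size and treat each against the additive model m2. `test = "Chisq"` is the full name of the option that `"Chi"` abbreviates (anova.glm partially matches it).

```r
# Hypothetical simulated data: parallel logit lines (no size:treat interaction)
set.seed(3)
n <- 40
ds <- data.frame(size = runif(n, 1, 10), treat = rep(0:1, each = n / 2))
ds$parasitism <- rbinom(n, 100, plogis(-3 + 0.6 * ds$size + 1 * ds$treat))

m1 <- glm(cbind(parasitism, 100 - parasitism) ~ size * treat,
          family = binomial, data = ds)

# Step 1: delete the interaction and compare the nested models
m2 <- update(m1, ~ . - size:treat)
p_int <- anova(m2, m1, test = "Chisq")[2, "Pr(>Chi)"]

if (p_int < 0.05) {
  message("Interaction significant: stop and interpret m1")
} else {
  # Step 2: test the covariate in the additive model
  m3 <- update(m2, ~ . - size)
  print(anova(m3, m2, test = "Chisq"))
  # Step 3: test the treatment main effect
  m4 <- update(m2, ~ . - treat)
  print(anova(m4, m2, test = "Chisq"))
}
```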

Context (often) matters! What is the p-value for treat in the model size+treat? In the model with treat alone?
Stepwise regression: step(model)
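A minimal sketch of why context matters, on simulated and hypothetical data: treated plants are assumed to be larger, and only size truly affects parasitism. On its own, treat then absorbs the size effect; after adjusting for size, its p-value should be much larger. The last lines apply `step()`, which simplifies the model by AIC.

```r
set.seed(4)
n <- 40
treat <- rep(0:1, each = n / 2)
# Hypothetical: treated plants happen to be larger, so size and treat are correlated
size <- runif(n, 1, 6) + 3 * treat
ds <- data.frame(size, treat)
# Hypothetical truth: only size affects parasitism
ds$parasitism <- rbinom(n, 100, plogis(-3 + 0.6 * ds$size))

m_treat <- glm(cbind(parasitism, 100 - parasitism) ~ treat,
               family = binomial, data = ds)
m_both  <- glm(cbind(parasitism, 100 - parasitism) ~ size + treat,
               family = binomial, data = ds)

# p-value for treat on its own vs. after adjusting for size
p_alone    <- coef(summary(m_treat))["treat", "Pr(>|z|)"]
p_adjusted <- coef(summary(m_both))["treat", "Pr(>|z|)"]
c(alone = p_alone, adjusted = p_adjusted)

# Stepwise simplification by AIC, starting from the full model
m_full <- glm(cbind(parasitism, 100 - parasitism) ~ size * treat,
              family = binomial, data = ds)
m_step <- step(m_full, trace = 0)
formula(m_step)
```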

Jump height (how high a ball can be raised off the ground), measured in feet off the ground. Total SS = 11.11

Jump height regressed on one X variable (columns: parameter, SS, F(1,13), p):
Height of player   p <

Jump height regressed on one X variable (columns: parameter, SS, F(1,13), p):
Weight of player   p <

Why do you think weight is positively correlated with jump height?

An idea Perhaps if we took two people of identical height, the lighter one might actually jump higher? Excess weight may reduce ability to jump high…

Both X variables in one model (figure labels: lighter, heavier; columns: parameter, SS, F, p):
Height   p <
Weight   p < 0.0001
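The sign flip can be reproduced on simulated, hypothetical data with ordinary least squares (`lm`). The assumed truth is that heavier people tend to be taller, height raises jump height, and extra weight at a given height lowers it. Weight alone should then get a positive slope, while weight adjusted for height should get a negative one. The sample size, units and coefficients are all invented for illustration.

```r
set.seed(7)
n <- 200
height <- rnorm(n, mean = 175, sd = 10)                       # cm (hypothetical)
weight <- height - 100 + rnorm(n, sd = 3)                     # kg: heavier people tend to be taller
jump   <- 0.03 * height - 0.02 * weight + rnorm(n, sd = 0.05) # feet off the ground

# Weight alone: its slope also carries the height effect, so it comes out positive
m_w  <- lm(jump ~ weight)
# Height and weight together: the weight slope is now the effect at fixed height
m_hw <- lm(jump ~ height + weight)

coef(m_w)["weight"]
coef(m_hw)["weight"]
anova(m_hw)   # sequential SS: height first, then weight given height
```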

Heavy people are often tall (and tall people are often heavy).
Tall people can jump higher.
People who are light for their height can jump a bit higher still.
Diagram (signs of the links): Weight with Height +, Height with Jump +, Weight with Jump −.

Species.txt: data from the Rothamsted Park Grass experiment, started in 1856

Exercise (species.txt)
diane <- read.table(file.choose(), header=TRUE, stringsAsFactors=TRUE); diane; attach(diane)   # pH read as a factor
Univariate trends:
plot(Species ~ Biomass)
plot(Species ~ pH)
Combined trends (one plotting symbol per pH class):
plot(Species ~ Biomass, type="n")
points(Species[pH=="high"] ~ Biomass[pH=="high"])
points(Species[pH=="mid"] ~ Biomass[pH=="mid"], pch=16)
points(Species[pH=="low"] ~ Biomass[pH=="low"], pch=0)

Exercise (species.txt)
1. With a normal distribution, fit pH*Biomass, check the model diagnostics, and test the interaction for significance.
2. With a Poisson distribution, fit pH*Biomass, check the model diagnostics, and test the interaction for significance.
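Both fits can be sketched on simulated, hypothetical data standing in for species.txt. The assumed truth is a Poisson count whose log mean declines with biomass and is lower at low pH, i.e. multiplicative effects with no interaction on the log scale. On the raw scale those lines are not parallel, so the gaussian fit may report an interaction that the Poisson (log-link) fit does not. The gaussian comparison uses an F test because its dispersion is estimated.

```r
set.seed(5)
n_per <- 30
diane <- data.frame(
  pH = factor(rep(c("high", "mid", "low"), each = n_per),
              levels = c("high", "mid", "low")),
  Biomass = runif(3 * n_per, 0, 10)
)
# Hypothetical truth: log-linear effects of biomass and pH, no interaction on the log scale
eta <- 3 - 0.2 * diane$Biomass +
  c(high = 0, mid = -0.3, low = -0.8)[as.character(diane$pH)]
diane$Species <- rpois(nrow(diane), exp(eta))

# 1. Normal errors, identity link
g1 <- glm(Species ~ pH * Biomass, family = gaussian, data = diane)
g2 <- update(g1, ~ . - pH:Biomass)
anova(g2, g1, test = "F")       # dispersion estimated, so an F test

# 2. Poisson errors, log link
p1 <- glm(Species ~ pH * Biomass, family = poisson, data = diane)
p2 <- update(p1, ~ . - pH:Biomass)
anova(p2, p1, test = "Chisq")

# Diagnostic plots for each full model
par(mfrow = c(2, 2)); plot(g1)
par(mfrow = c(2, 2)); plot(p1)
```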

Moral of the story: Make sure you KNOW what you are modelling!

Exercise (species.txt)
1. Fit the glm Species ~ pH, family=gaussian.
2. Test whether low and mid pH have the same effect (this is a planned comparison).
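A planned comparison of this kind can be run as a deletion test: fit the full model with a separate mean for each pH class, then a reduced model in which low and mid are pooled into one class, and compare them. The data below are simulated and hypothetical, and `pH2` is an illustrative name. Pooling through a recoded factor is equivalent to using a 0/1 indicator that is 1 for low or mid pH.

```r
set.seed(6)
# Hypothetical stand-in for species.txt, pH as a three-level factor
diane <- data.frame(pH = factor(rep(c("high", "mid", "low"), each = 30)))
diane$Species <- rnorm(90,
  mean = c(high = 25, mid = 20, low = 19)[as.character(diane$pH)], sd = 4)

# Full model: a separate mean for each pH class
m5 <- glm(Species ~ pH, family = gaussian, data = diane)

# Reduced model: low and mid pooled into one class, high kept separate
diane$pH2 <- factor(ifelse(diane$pH == "high", "high", "low_mid"))
m6 <- glm(Species ~ pH2, family = gaussian, data = diane)

# Does pooling low and mid significantly worsen the fit? (1 df test)
anova(m6, m5, test = "F")
```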

Further reading
Crawley, M.J. Statistics: An Introduction using R. Wiley.
Faraway, J.J. Extending the Linear Model with R. Chapman & Hall/CRC.

Code for “Species” analysis
> m1 <- glm(Species ~ pH*Biomass, family=gaussian, data=diane)
> summary(m1)
> m2 <- update(m1, ~ . - pH:Biomass)
> anova(m2, m1, test="F")        # gaussian: dispersion is estimated, so use an F test
> par(mfrow=c(2,2)); plot(m1)    # model diagnostics
> m3 <- glm(Species ~ pH*Biomass, family=poisson, data=diane)
> m4 <- update(m3, ~ . - pH:Biomass)
> anova(m4, m3, test="Chi")
> par(mfrow=c(2,2)); plot(m3)
> diane$PH <- as.numeric(diane$pH != "high")   # 1 = low or mid pH, 0 = high pH
> m5 <- glm(Species ~ pH, family=gaussian, data=diane)
> m6 <- update(m5, ~ . - pH + PH)   # forces low and mid pH to share one effect
> anova(m6, m5, test="F")