(& Generalized Linear Models)
Nathaniel MacHardy
Fall 2014

Why care about data management?
Everything in R is technically data:
- "classic" data: tables
- model results
- functions

Cooking with Data
Simple recipes we'll learn today:
- Subset according to some rule
- Code a variable into simpler categories
- Merge data on a key value (ID)
- Recode particular values of a variable

Recipe 1: Subset
General form: object.to.subset[subset instructions]
- By index: vector[c(1, 2, 3)]; data[2, 4]; data[3, ]; data[, 2]
- By name: cars[c("speed", "dist")]
- By logical: (1:3)[c(TRUE, FALSE, TRUE)]
- By expression: cars$dist[cars$speed == 7]
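
A minimal runnable sketch of the four forms above, using R's built-in cars data frame (columns speed and dist), so nothing extra is assumed:

cars[1:3, ]                    # by index: first three rows, all columns
cars[2, 1]                     # by index: row 2, column 1
cars[c("speed", "dist")]       # by name: columns selected by their names
(1:3)[c(TRUE, FALSE, TRUE)]    # by logical: keeps elements 1 and 3
cars$dist[cars$speed == 7]     # by expression: stopping distances where speed is 7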

Operators: Generate TRUE/FALSE
Feed these into square brackets!
- foods %in% c("pizza", "pie")
- age > 18
- eyecolor == "red"
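
A short sketch of how these operators produce logical vectors that slot straight into square brackets; the foods, age, and eyecolor vectors are invented for illustration:

foods    <- c("pizza", "salad", "pie")
age      <- c(12, 25, 40)
eyecolor <- c("red", "brown", "blue")

foods %in% c("pizza", "pie")   # TRUE FALSE TRUE
age > 18                       # FALSE TRUE TRUE
eyecolor == "red"              # TRUE FALSE FALSE
age[age > 18]                  # 25 40: the logical vector does the subsetting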

Basic Functions: Combine/Transform
- c(): combine values into a vector
- data.frame(): bind vectors into a data frame
- rbind(): stack data row-wise
- merge(): merge data by key
- t(): swap rows and columns (transpose)
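
A compact sketch of each function on throwaway objects (all names here are made up):

x   <- c(1, 2, 3)                                   # c(): combine values into a vector
df  <- data.frame(id = c("a", "b"), y = c(10, 20))  # data.frame(): bind vectors into a data frame
df2 <- data.frame(id = "c", y = 30)
rbind(df, df2)                                      # rbind(): stack rows a, b, c
keys <- data.frame(id = c("a", "b"), z = c(5, 6))
merge(df, keys, by = "id")                          # merge(): join on the id key
t(matrix(1:6, nrow = 2))                            # t(): transpose a 2x3 matrix to 3x2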

Functions II: Shortcuts
# the long way to classify
cat <- rep(NA, length(values))    # empty vector, one slot per value
cat[values < 30]                <- "low"
cat[values >= 30 & values < 70] <- "medium"
cat[values >= 70]               <- "high"
table(cat)
  high    low medium
    90     90    120

Recipe 2: Classifying Data with cut()
cat <- cut(values, c(0, 30, 70, 100))
table(cat)
  (0,30]  (30,70] (70,100]
      90      120       90

cat <- cut(values, c(0, 30, 70, 100), labels = c("low", "med", "high"))   # if labels are desired
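
A self-contained sketch of the same recipe; the values vector here is simulated, so the counts will differ from the slide:

set.seed(1)
values <- runif(300, min = 0, max = 100)           # 300 made-up values between 0 and 100
cat <- cut(values, breaks = c(0, 30, 70, 100),
           labels = c("low", "med", "high"))
table(cat)                                         # count of values falling in each interval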

Recipe 3: Merge Data
health <- data.frame(id = c("a", "b", "c"), age = c(33, 56, 13))
out    <- data.frame(id = c("b", "c", "a"), outcome = c(0, 0, 1))
merged <- merge(health, out, by = "id")
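
For reference, printing the merged object built from the tiny data frames above gives one row per matching id; by default merge() keeps only ids found in both data sets, while all.x = TRUE keeps unmatched rows from the first:

merged
#   id age outcome
# 1  a  33       1
# 2  b  56       0
# 3  c  13       0
merge(health, out, by = "id", all.x = TRUE)   # left join: keep every row of health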

Recipe 4: Replacing Data
What if you want to recode part of an object? Easy: subset the "destination" of <-

x <- c(1, NA, 0, NA, 3, NA, 4, 6)
x[is.na(x)] <- 0         # replace NA with 0

y <- c(1, 1, 4, 10, 99, 100)
y[y > 10] <- 10          # clip values above 10 down to 10
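
The same pattern covers a common cleaning task, turning a numeric missing-data code into a true NA; the 99 code below is just an illustrative convention, not from the original slides:

age <- c(33, 56, 99, 13, 99)   # 99 used here as a made-up "missing" code
age[age == 99] <- NA           # recode it to R's NA
mean(age, na.rm = TRUE)        # downstream functions can then skip the missing values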

Part II: Generalized Linear Models

Why care about Generalized Linear Models (GLMs)?
Almost all epidemiologic models are GLMs:
- Odds ratios (logistic regression)
- Risk ratios (log-binomial regression)
"Fancy" models are usually just extensions of GLMs:
- GWR (geographically weighted regression) adds a spatial weights matrix
- MCMC (Markov chain Monte Carlo) fits Bayesian GLMs with priors

Basic Parts of GLMs
- Y: outcome, dependent variable
- X: predictor(s), covariates, exposure(s), independent variable(s)
Y = mX + b
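
For context (not on the original slide): the "generalized" part replaces the straight line with a link function g applied to the mean of Y, so one framework covers linear, logistic, and log-binomial models. In LaTeX notation:

\[ g\!\left(\mathrm{E}[Y \mid X]\right) = \beta_0 + \beta_1 X \]

where g is the identity link for linear regression, the logit link \(\log\frac{p}{1-p}\) for logistic regression, and the log link for log-binomial (risk ratio) models.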

Specifying a model in R
A "formula" with the outcome on the left and the exposures on the right:
Y ~ X
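
A few examples of formula syntax (the variable names are placeholders, not from the slides):

y ~ x                 # one predictor
y ~ x1 + x2           # two predictors
y ~ x1 + x2 + x1:x2   # main effects plus an interaction (equivalently, y ~ x1 * x2)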

Example: Using glm()
data(CO2); attach(CO2)
glm(uptake ~ Type)

Call:  glm(formula = uptake ~ Type)

Coefficients:
    (Intercept)  TypeMississippi
          33.54           -12.66

Degrees of Freedom: 83 Total (i.e. Null);  82 Residual
Null Deviance:      9707
Residual Deviance:  6341    AIC: 607.6

Example: Using glm()
data(CO2); attach(CO2)
glm(uptake ~ Type + conc)

Call:  glm(formula = uptake ~ Type + conc)

Coefficients:
    (Intercept)  TypeMississippi             conc
       25.83005        -12.65952          0.01773

Degrees of Freedom: 83 Total (i.e. Null);  81 Residual
Null Deviance:      9707
Residual Deviance:  4056    AIC: 572.1

More info: glm() object
m1 <- glm(uptake ~ Type + conc)
summary(m1)

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      25.830052   1.579918  16.349  < 2e-16 ***
TypeMississippi -12.659524   1.544261  -8.198 3.06e-12 ***
conc              0.017731   0.002625   6.755 2.00e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

More info: glm() object
> confint(m1)
Waiting for profiling to be done...
                      2.5 %       97.5 %
(Intercept)      22.7334700  28.92663347
TypeMississippi -15.6862201  -9.63282749
conc              0.0125859   0.02287528
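
A small sketch of pulling these numbers out of the fitted object directly, rather than reading them off the printout; coef(), summary(), and confint() are the standard extractors:

coef(m1)                              # named vector of estimates
coef(summary(m1))                     # matrix of estimates, SEs, t values, p values
cbind(est = coef(m1), confint(m1))    # estimates alongside their profile confidence intervals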

Getting Epidemiological Measures
Specify the family argument:
glm(y ~ x)                               # default gaussian (linear): risk difference
glm(y ~ x, family = binomial("logit"))   # logistic: odds ratio
For odds ratios and risk ratios, remember to exponentiate the output with exp(): R gives you the beta coefficient on the log scale.
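
A minimal sketch with simulated data, assuming a binary outcome y and exposure x (both invented here), showing the exponentiation step and a log-binomial fit for the risk ratio:

set.seed(2)
x <- rbinom(200, 1, 0.5)                          # made-up exposure
y <- rbinom(200, 1, plogis(-1 + 0.8 * x))         # made-up binary outcome

fit_or <- glm(y ~ x, family = binomial("logit"))  # logistic model
exp(coef(fit_or))                                 # odds ratios
exp(confint(fit_or))                              # CIs on the odds-ratio scale

fit_rr <- glm(y ~ x, family = binomial("log"))    # log-binomial model (can fail to converge; may need start values)
exp(coef(fit_rr))                                 # risk ratios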

Lab: Practice with GLMs
You'll learn more about model-building in later courses:
- What goes in the model (confounders)
- Choosing a "best" model
Practice data management:
- Preparing data for analyses
- Formatting/extracting results