Download presentation
Presentation is loading. Please wait.
1
(& Generalized Linear Models)
Nathaniel MacHardy Fall 2014
2
Why care about data management?
Everything in R is technically data “classic” data: tables Model Results Functions
3
Cooking with Data Simple recipes we’ll learn today
Subset according to some rule Code a variable into simpler categories Merge data on a key value (ID) Recode particular values of a variable
4
Recipe 1: Subset object.to.subset[subset instructions]
By index vector[ c(1,2,3) ] data [ 2, 4 ] data [ 3, ] data [ , 2 ] By name cars[ c(“speed”,”dist”) ] By logical (1:3)[ c(TRUE,FALSE,TRUE) ] By expression cars$dist[ cars$speed== 7]
5
Operators: Generate TRUE/FALSE
Feed these into square brackets! foods %in% c(“pizza”,”pie”) age > 18 eyecolor == “red”
6
Basic Functions: Combine/Transform
c() combine data.frame() bind into data frame rbind() stack data merge() merge data by key t() swap rows/columns
7
Functions II: Shortcuts
# the long way to classify cat <- rep(NA,length(cat)) # empty vector cat[values<30] <- “high” cat[values>=30 & values<70] <- “medium” cat[values>=70] <- “high” table(cat) high low medium
8
Recipe 2: Classifying Data with Cut
cat <- cut(values,c(0,30,70,100)) table(cat) (0,30] (30,70] (70,100] cat <- cut(values,c(0,30,70,100), labels=c(“low”,”med”,”high”)) # if desired
9
Recipe 3: Merge Data health <- data.frame(
id=c(“a”,”b”,”c”), age=c(33,56,13) ) out <- data.frame( id=c(“b”,”c”,”a”), outcome=c(0,0,1)) merged <- merge(health,out,by=“id”)
10
Recipe 4: Replacing Data
What if you want to recode part of an object? Easy: subset the “destination” of <- x <- c(1,NA,0,NA,3,NA,4,6) # replace NA x[is.na(x)] <- 0 y <- c(1,1,4,10,99,100) # clip range y[y>10] <- 10
11
Part II: Generalized Linear Models
12
Why care about Generalized Linear Models (GLMs)?
Almost all Epi models are GLMs Odds ratios (logistic) Risk ratios (log-binomial) “Fancy” models are usually just extensions of GLMs GWR (geographically weighted regression) adds a “weights matrix” MCMC (Markov Chain Monte Carlo) solves GLMs with priors
13
Basic Parts of GLMs Y: Outcome, Dependent Variable
X: Predictor(s), Covariates, Exposure(s), Independent Variable(s) Y = mX + b
14
Specifying a model in R A “formula” with Outcome on left
Exposures on right Y ~ X
15
Example: Using glm() glm(uptake ~ Type) data(CO2); attach(CO2)
Call: glm(formula = uptake ~ Type) Coefficients: (Intercept) TypeMississippi Degrees of Freedom: 83 Total (i.e. Null); 82 Residual Null Deviance: Residual Deviance: AIC: 607.6
16
Example: Using glm() glm(uptake ~ type + conc) data(CO2); attach(CO2)
Call: glm(formula = uptake ~ Type + conc) Coefficients: (Intercept) TypeMississippi conc Degrees of Freedom: 83 Total (i.e. Null); 81 Residual Null Deviance: Residual Deviance: AIC: 572.1
17
More info: glm() object
m1 <- glm(uptake ~ type + conc) summary(m1) Coefficients: Estimate Std. Error t value Pr(>|t|) (intercept) < 2e-16 *** Mississippi e-12 *** conc e-09 *** Signif. codes: 0 ‘***’ ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
18
More info: glm() object
> confint(m1) Waiting for profiling to be done... 2.5 % % (Intercept) TypeMississippi conc
19
Getting Epidemiological Measures
Specify the family argument glm(y~x) # default linear, risk difference glm(y~x, family=binomial(“logit”) # odds ratio For odds ratio and risk ratio, make sure to exponentiate the output with exp() first R gives you the beta coefficient
20
Lab: Practice with GLMs
You'll learn more about model-building in courses What goes in the model (confounders) Choosing a “best” model Practice data management Preparing data for analyses Formatting/extracting results
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.