Jefferson Davis Research Analytics


1 Jefferson Davis Research Analytics
R Bootcamp Day 3 Jefferson Davis Research Analytics

2 Day 2 stuff
From yesterday and the day before:
R values have types/classes such as numeric, character, logical, data frame, and matrix.
Much of R's functionality is in libraries.
For help on a function, run ?t.test from the R console.
The plot() function will usually do something useful.
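A quick sketch of those points, using only functions and datasets that ship with base R:

```r
# R values carry a class
class(42)       # "numeric"
class("a")      # "character"
class(TRUE)     # "logical"
class(cars)     # "data.frame"
class(diag(2))  # "matrix" (plus "array" in R >= 4.0)

# Much of R's functionality lives in libraries
library(stats)  # loaded by default; shown here for illustration

# For help on a function:
# ?t.test

# plot() usually does something useful with a data frame
plot(cars)
```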

3 R: Common stats functions
Common statistical tests are very straightforward in R. Let's try one on yesterday's dataset cars, car speeds and stopping distances from the 1920s.
head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

4 R: Common stats functions
Here's a t-test that the mean of the speeds in cars is not 12.
t.test(cars$speed, mu=12)

        One Sample t-test

data:  cars$speed
t = 4.5467, df = 49, p-value = 3.588e-05
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
 13.89726 16.90274
sample estimates:
mean of x 
     15.4

5 R: Common stats functions
We can change the parameters of t.test().
t.test(cars$speed, mu=12, alternative="less", conf.level=.99)

        One Sample t-test

data:  cars$speed
t = 4.5467, df = 49, p-value = 1
alternative hypothesis: true mean is less than 12
99 percent confidence interval:
     -Inf 17.19825
sample estimates:
mean of x 
     15.4
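t.test() also returns an object (class "htest") whose pieces can be pulled out directly rather than read off the printed output; a quick sketch:

```r
# Store the test result instead of just printing it
tt <- t.test(cars$speed, mu = 12)

tt$p.value   # just the p-value
tt$conf.int  # the confidence interval
tt$estimate  # the sample mean
names(tt)    # everything the object carries
```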

6 R: Common stats functions
Anything you would see in a year-long stats sequence will have an implementation in R.
chisq.test()  # Chi-squared test
prop.test()   # Proportions test
binom.test()  # Exact binomial test
ks.test()     # Kolmogorov–Smirnov test
sd()          # Standard deviation
cor()         # Correlation
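As a sketch, a couple of these run on the cars data (the contingency table below is made up for illustration):

```r
# Correlation between speed and stopping distance
cor(cars$speed, cars$dist)  # about 0.81

# Standard deviation of speed
sd(cars$speed)              # about 5.29

# Chi-squared test on a small made-up 2x2 contingency table
chisq.test(matrix(c(20, 30, 25, 25), nrow = 2))

# Exact binomial test: 60 successes in 100 trials against p = 0.5
binom.test(60, 100, p = 0.5)
```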

7 R: Linear regression
Regression analysis is one of the most popular and important tools in statistics. If R goofed here, it would be worthless. R uses the function lm() for linear models. The regression formula is given in Wilkinson-Rogers notation.

8 R: Linear regression
Regression analysis is one of the most important tools in statistics. R uses Wilkinson-Rogers notation to specify linear models. So a model such as

yi = β0 + β1 xi1 + εi

shows up in R syntax as

y ~ x1

Let's review this syntax. (Tables from

9 R: Linear regression

Predictor terms            Wilkinson notation
Intercept                  1 (default)
No intercept               -1
x1                         x1
x1, x2                     x1 + x2
x1, x2, x1*x2              x1*x2  (or x1 + x2 + x1:x2)
x1*x2                      x1:x2
x1^2, x1                   x1^2
x1 + x2 (as one term)      I(x1 + x2)  (the letter I)
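One way to see how a formula expands is model.matrix(); a sketch using a small made-up data frame (d, x1, and x2 are hypothetical names):

```r
# Hypothetical two-predictor data frame
d <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(5, 3, 8, 1))

colnames(model.matrix(~ x1 + x2, d))     # "(Intercept)" "x1" "x2"
colnames(model.matrix(~ x1 * x2, d))     # adds the "x1:x2" column
colnames(model.matrix(~ x1 - 1, d))      # "x1" only: no intercept
colnames(model.matrix(~ I(x1 + x2), d))  # "(Intercept)" "I(x1 + x2)"
```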

10 R: Linear regression

Model                                                        Wilkinson notation
yi = β0 + β1 xi1 + β2 xi2 + εi
  (two predictors)                                           y ~ x1 + x2
yi = β1 xi1 + β2 xi2 + εi
  (two predictors, no intercept)                             y ~ x1 + x2 - 1
yi = β0 + β1 xi1 + β2 xi2 + β3 xi1 xi2 + εi
  (two predictors with the interaction term)                 y ~ x1*x2  (or y ~ x1 + x2 + x1:x2)
yi = β0 + β1 (xi1 + xi2) + εi
  (regressing on the sum of predictors)                      y ~ I(x1 + x2)
yi = β0 + β1 xi1 + β2 xi2 + β3 xi3 + β4 xi1 xi2 + εi
  (three predictors with one interaction)                    y ~ x1*x2 + x3

11 R: Linear regression
Model terms (what's the Wilkinson notation?)
yi = β1 xi1 + β2 xi2 + β3 xi1 xi2 + εi
  (two predictors, no intercept)
yi = β0 + β1 xi1 + β2 xi2 + β3 xi3 + β4 xi1 xi2 + β5 xi1 xi3 + β6 xi2 xi3 + β7 xi1 xi2 xi3 + εi
  (three predictors, all interaction terms)
yi = β0 + β1 xi1 + β2 xi2 + β3 xi3 + β4 xi1 xi2 + β5 xi1 xi3 + β6 xi2 xi3 + εi
  (three predictors, all two-way interaction terms)

12 R: Linear regression

Model                                                        Wilkinson notation
yi = β1 xi1 + β2 xi2 + β3 xi1 xi2 + εi
  (two predictors, no intercept)                             y ~ x1*x2 - 1
yi = β0 + β1 xi1 + β2 xi2 + β3 xi3 + β4 xi1 xi2 + β5 xi1 xi3 + β6 xi2 xi3 + β7 xi1 xi2 xi3 + εi
  (three predictors, all interaction terms)                  y ~ x1*x2*x3
yi = β0 + β1 xi1 + β2 xi2 + β3 xi3 + β4 xi1 xi2 + β5 xi1 xi3 + β6 xi2 xi3 + εi
  (three predictors, all two-way interaction terms)          y ~ x1*x2*x3 - x1:x2:x3

13 R: Linear regression
R uses the function lm() for linear models.
Generic syntax:
lm(DV ~ IV1, NAME_OF_DATAFRAME)
The above tells R to regress the dependent variable (DV) onto the independent variable IV1. We can include other variables and interaction effects:
lm(DV ~ IV1 + IV2 + IV1:IV2, NAME_OF_DATAFRAME)

14 R: Linear regression
Let's do an example using the cars data set. How about regressing stopping distance on speed?
lm(dist ~ speed, cars)

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed  
    -17.579        3.932  

To work with it further, let's store this in a variable:
car.fit <- lm(dist ~ speed, cars)

15 R: Linear regression
summary(car.fit)

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-29.069  -9.525  -2.272   9.215  43.201 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -17.5791     6.7584  -2.601   0.0123 *  
speed         3.9324     0.4155   9.464 1.49e-12 ***

16 R: Linear regression
We can also look at individual fields of the lm object.
car.fit$coefficients
(Intercept)       speed 
 -17.579095    3.932409 

car.fit$residuals[1:3]
        1         2         3 
 3.849460 11.849460 -5.947766 

car.fit$fitted.values[1:3]
        1         2         3 
-1.849460 -1.849460  9.947766
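R also provides extractor functions, which are generally preferred over reaching into the object by field name; a quick sketch:

```r
car.fit <- lm(dist ~ speed, cars)

coef(car.fit)         # same as car.fit$coefficients
resid(car.fit)[1:3]   # same as car.fit$residuals[1:3]
fitted(car.fit)[1:3]  # same as car.fit$fitted.values[1:3]
confint(car.fit)      # confidence intervals for the coefficients
```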

17 R: Linear regression
Plot the fit:
plot(cars$speed, cars$dist, xlab = "speed", ylab = "distance")
abline(car.fit, col = "red")

18 R: Linear regression
Objects of class lm have their own overloaded plot() function.
plot(car.fit)
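plot() on an lm object draws its diagnostic plots one at a time; a common sketch is to put all four on one screen:

```r
car.fit <- lm(dist ~ speed, cars)

# Residuals vs. fitted, normal Q-Q, scale-location, and
# residuals vs. leverage, arranged in a 2x2 grid
par(mfrow = c(2, 2))
plot(car.fit)
par(mfrow = c(1, 1))  # restore the default single-plot layout
```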


21 R: Mixed models
It doesn't seem crazy to fit a common slope but use a random effect for the intercept.
fmOrthF <- lme(distance ~ age, data = OrthoFem, random = ~ 1 | Subject)


23 R: Mixed models
Let's take a look at a mixed model. We need a more complex dataset. We use a subset of the Orthodont data set from the Nonlinear Mixed-Effects Models (nlme) library.
library(nlme)
head(Orthodont)
Grouped Data: distance ~ age | Subject
  distance age Subject  Sex
1     26.0   8     M01 Male
2     25.0  10     M01 Male
3     29.0  12     M01 Male
4     31.0  14     M01 Male

24 R: Mixed models OrthoFem <- Orthodont[Orthodont$Sex == "Female", ] plot(OrthoFem)

25 R: Mixed models
In fact, it isn't crazy.
summary(fmOrthF)
Linear mixed-effects model fit by REML
 Data: OrthoFem 
       AIC      BIC    logLik
  149.2183 156.1690 -70.60912

Random effects:
 Formula: ~1 | Subject
        (Intercept)  Residual
StdDev:     2.06847 0.7800331

Fixed effects: distance ~ age 
                Value Std.Error DF  t-value p-value
(Intercept) 17.372727 0.8587419 32 20.23046       0
age          0.479545 0.0525898 32  9.11858       0
 Correlation: 
    (Intr)
age -0.674
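fixef() and ranef() from nlme pull the estimated fixed effects and the per-subject random intercepts out of the fitted lme object; a sketch:

```r
library(nlme)  # ships with R as a recommended package

OrthoFem <- Orthodont[Orthodont$Sex == "Female", ]
fmOrthF  <- lme(distance ~ age, data = OrthoFem, random = ~ 1 | Subject)

fixef(fmOrthF)      # the common intercept and age slope
ranef(fmOrthF)      # one estimated intercept shift per subject
intervals(fmOrthF)  # confidence intervals for all the estimates
```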

26 R: Conditional trees At this point, I tag Olga in.

