Jefferson Davis Research Analytics

R values have types/classes such as numeric, character, logical, data frames, and matrices. Much of R's functionality is in libraries. For help on a function, run ?t.test from the R console. The plot() function will usually do something useful.
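A small sketch of checking these types from the console. The values below are made up for illustration; class() and the is.*() predicates are base R.

```r
x  <- c(1.5, 2, 3)                                # numeric vector
s  <- "hello"                                     # character
b  <- TRUE                                        # logical
df <- data.frame(a = 1:3, b = c("u", "v", "w"))   # data frame
m  <- matrix(1:6, nrow = 2)                       # 2 x 3 matrix

class(x)          # "numeric"
class(s)          # "character"
class(b)          # "logical"
is.data.frame(df) # TRUE
is.matrix(m)      # TRUE
dim(m)            # 2 3
```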

Common statistical tests are very straightforward in R. Let's try one on yesterday's dataset cars, which records car speeds and stopping distances from the 1920s.

    head(cars)

The output has two columns, speed and dist.

Here's a t-test of whether the mean of the speeds in cars differs from 12.

    t.test(cars$speed, mu=12)

The output is a One Sample t-test on cars$speed with df = 49 and p-value = 3.588e-05. The alternative hypothesis is that the true mean is not equal to 12, and the output also gives a 95 percent confidence interval and the sample mean of x.

We can change the parameters of the t-test.

    t.test(cars$speed, mu=12, alternative="less", conf.level=.99)

Now df = 49 and p-value = 1. The alternative hypothesis is that the true mean is less than 12, and the 99 percent confidence interval is one-sided, with lower bound -Inf.
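The printed output is only a view of the result. As a sketch, the object returned by t.test() can be stored and its pieces pulled out by name; res is just an illustrative variable name.

```r
res <- t.test(cars$speed, mu = 12)

class(res)      # "htest", the class shared by R's classical tests
res$statistic   # the t statistic
res$parameter   # degrees of freedom
res$p.value     # two-sided p-value
res$conf.int    # confidence interval, with its level as an attribute
res$estimate    # sample mean of cars$speed
```

Storing the result this way makes it easy to reuse a p-value or interval in later code instead of copying it from the console.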

Anything you would see in a year-long stats sequence will have an implementation in R.

    chisq.test()   # Chi-squared test
    prop.test()    # Proportions test
    binom.test()   # Exact binomial test
    ks.test()      # Kolmogorov–Smirnov test
    sd()           # Standard deviation
    cor()          # Correlation
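A few of these in action, as a rough sketch. The cars columns come from the dataset above; the counts passed to binom.test() and prop.test() are invented purely for illustration.

```r
sd(cars$speed)                 # standard deviation of the speeds
r <- cor(cars$speed, cars$dist)
r                              # correlation of speed and stopping distance

# Made-up example: 7 successes in 10 trials, null probability 0.5
bt <- binom.test(7, 10, p = 0.5)
bt$p.value

# Made-up example: 45 successes out of 100, null proportion 0.5
pt <- prop.test(x = 45, n = 100, p = 0.5)
pt$p.value
```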

R: Linear regression

Regression analysis is one of the most popular and important tools in statistics. If R goofed here, it would be worthless. R uses the function lm() for linear models. The regression formula is given in Wilkinson-Rogers notation.

    Predictor terms      Wilkinson notation
    Intercept            1 (default)
    No intercept         -1
    x1                   x1
    x1, x2               x1 + x2
    x1, x2, x1x2         x1*x2 (or x1 + x2 + x1:x2)
    x1x2                 x1:x2
    x1^2, x1             x1 + I(x1^2)
    x1 + x2              I(x1 + x2) (the letter I)

In an R formula, ^ means crossing rather than arithmetic power, so x1^2 on its own reduces to x1. Wrap arithmetic in I().
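One way to check how R reads a formula is to ask terms() for the expanded term labels. This is a sketch using placeholder names y, x1, x2 and x3; no data is needed. Note that in R's formula language ^ denotes crossing, so x1^2 collapses to x1 unless it is wrapped in I().

```r
# Expanded terms for an interaction formula
attr(terms(y ~ x1 * x2), "term.labels")         # "x1" "x2" "x1:x2"

# The intercept flag is 0 when the intercept is removed
attr(terms(y ~ x1 + x2 - 1), "intercept")       # 0

# ^ is crossing, not power: x1^2 is the same as x1
attr(terms(y ~ x1^2), "term.labels")            # "x1"

# I() protects arithmetic, giving a genuine squared term
attr(terms(y ~ x1 + I(x1^2)), "term.labels")    # "x1" "I(x1^2)"

# (a + b + c)^2 gives all main effects and two-way interactions
two_way <- attr(terms(y ~ (x1 + x2 + x3)^2), "term.labels")
two_way
```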

R uses Wilkinson-Rogers notation to specify linear models. So a model such as

    y_i = β0 + β1 x_i1 + ε_i

shows up in the R syntax as

    y ~ x1

Let's review this syntax. (Tables from


Model and Wilkinson notation

Two predictors
    Model:     y_i = β0 + β1 x_i1 + β2 x_i2 + ε_i
    Notation:  y ~ x1 + x2

Two predictors and no intercept
    Model:     y_i = β1 x_i1 + β2 x_i2 + ε_i
    Notation:  y ~ x1 + x2 - 1

Two predictors with the interaction term
    Model:     y_i = β0 + β1 x_i1 + β2 x_i2 + β3 x_i1 x_i2 + ε_i
    Notation:  y ~ x1 * x2   (or y ~ x1 + x2 + x1:x2)

Regressing on the sum of predictors
    Model:     y_i = β0 + β1 (x_i1 + x_i2) + ε_i
    Notation:  y ~ I(x1 + x2)

Three predictors with one interaction
    Model:     y_i = β0 + β1 x_i1 + β2 x_i2 + β3 x_i3 + β4 x_i1 x_i2 + ε_i
    Notation:  y ~ x1 * x2 + x3

Model terms and Wilkinson notation

Two predictors with their interaction, no intercept
    Model:     y_i = β1 x_i1 + β2 x_i2 + β3 x_i1 x_i2 + ε_i
    Notation:  y ~ x1*x2 - 1

Three predictors, all interaction terms
    Model:     y_i = β0 + β1 x_i1 + β2 x_i2 + β3 x_i3 + β4 x_i1 x_i2 + β5 x_i1 x_i3 + β6 x_i2 x_i3 + β7 x_i1 x_i2 x_i3 + ε_i
    Notation:  y ~ x1 * x2 * x3

Three predictors, all two-way interaction terms
    Model:     y_i = β0 + β1 x_i1 + β2 x_i2 + β3 x_i3 + β4 x_i1 x_i2 + β5 x_i1 x_i3 + β6 x_i2 x_i3 + ε_i
    Notation:  y ~ x1 * x2 * x3 - x1:x2:x3
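To see which columns a formula actually produces, model.matrix() can be applied to a small data frame. The sketch below uses simulated predictors; the data frame d and its values are made up and only the column names matter.

```r
set.seed(1)
d <- data.frame(x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5))

# All two-way interactions, three-way term removed
cols_two_way <- colnames(model.matrix(~ x1 * x2 * x3 - x1:x2:x3, d))
cols_two_way
# "(Intercept)" "x1" "x2" "x3" "x1:x2" "x1:x3" "x2:x3"

# Interaction model without an intercept
cols_no_int <- colnames(model.matrix(~ x1 * x2 - 1, d))
cols_no_int
# "x1" "x2" "x1:x2"
```

Each column corresponds to one β in the model equation, which makes this a quick check that a formula matches the model you meant to write.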

Generic syntax

    lm(DV ~ IV1, NAME_OF_DATAFRAME)

The above tells R to regress the dependent variable (DV) onto the independent variable IV1. We can include other variables and interaction effects.

    lm(DV ~ IV1 + IV2 + IV1:IV2, NAME_OF_DATAFRAME)
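As a concrete instance of the generic syntax, here is a sketch on the built-in mtcars data. That dataset and the choice of mpg, wt and hp are assumptions made for illustration, not part of the original slides. It also shows that the * shorthand and the spelled-out interaction give the same fit.

```r
# DV = mpg, IV1 = wt, IV2 = hp, data frame = mtcars
a <- lm(mpg ~ wt * hp, mtcars)            # shorthand
b <- lm(mpg ~ wt + hp + wt:hp, mtcars)    # spelled out

names(coef(a))                 # "(Intercept)" "wt" "hp" "wt:hp"
all.equal(coef(a), coef(b))    # TRUE
```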

Let's do an example using the cars data set. How about regressing stopping distance on speed?

    lm(dist ~ speed, cars)

The output echoes the call, lm(formula = dist ~ speed, data = cars), and lists the coefficients for (Intercept) and speed. To work with the model further, let's store it in a variable. The variable name did not survive on this slide, so fit stands in for it here.

    fit <- lm(dist ~ speed, cars)

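    summary(fit)   # fit: the stored lm(dist ~ speed, cars) object; the name is a stand-in

The summary repeats the call, lm(formula = dist ~ speed, data = cars), then gives the residual quantiles (Min, 1Q, Median, 3Q, Max) and a coefficient table with columns Estimate, Std. Error, t value and Pr(>|t|). The (Intercept) row is flagged * and the speed row is flagged ***, with a p-value on the order of e-12.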
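The summary is itself an object, so its pieces can be extracted rather than read off the screen. A minimal sketch, where fit and s are illustrative names:

```r
fit <- lm(dist ~ speed, cars)   # fit is a stand-in name for the stored model
s   <- summary(fit)

coef(s)        # matrix with Estimate, Std. Error, t value, Pr(>|t|)
s$r.squared    # multiple R-squared
s$sigma        # residual standard error
s$df           # degrees of freedom
```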

We can also look at individual fields of the lm object (called fit here; the name is a stand-in).

    fit$coefficients         # (Intercept) and speed
    fit$residuals[1:3]
    fit$fitted.values[1:3]
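The same fields are also available through accessor functions, which is usually the more portable way to get at them. A sketch, again with fit as a stand-in name, including a check that fitted values plus residuals reproduce the observed distances:

```r
fit <- lm(dist ~ speed, cars)

coef(fit)                 # same as fit$coefficients
head(residuals(fit), 3)   # same as fit$residuals[1:3]
head(fitted(fit), 3)      # same as fit$fitted.values[1:3]

# Observed = fitted + residual, up to floating point error
ok <- all.equal(unname(fitted(fit) + residuals(fit)), cars$dist)
ok
```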

Plot the fit.

    plot(cars$speed, cars$dist, xlab = "speed", ylab = "distance")
    abline(fit, col="red")   # fit: the stored lm(dist ~ speed, cars) object (stand-in name)

R: Mixed models

It doesn't seem crazy to fit a slope but use a random effect for the intercept.

    fmOrthF <- lme(distance ~ age, data = OrthoFem, random = ~ 1 | Subject)

Let's take a look at a mixed model. We need a more complex dataset. We use a subset of the Orthodont data set from the nlme library (linear and nonlinear mixed-effects models).

    library(nlme)
    head(Orthodont)

Orthodont is grouped data with formula distance ~ age | Subject and columns distance, age, Subject and Sex. The first rows all belong to subject M01, a male.

    OrthoFem <- Orthodont[Orthodont$Sex == "Female", ]
    plot(OrthoFem)

In fact, it isn't crazy.

    summary(fmOrthF)

The summary describes a linear mixed-effects model fit by REML to OrthoFem. It reports AIC, BIC and logLik; the random effects (formula ~1 | Subject) with standard deviations for (Intercept) and Residual; the fixed effects distance ~ age with Value, Std.Error, DF, t-value and p-value for (Intercept) and age; and the correlation between the intercept and age estimates.
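To pull the estimates out of the fitted model rather than reading the summary, nlme provides fixef(), ranef() and VarCorr(). The sketch below refits the model so that it runs on its own, assuming the nlme package is installed, as it is in standard R distributions.

```r
library(nlme)

# Female subset of Orthodont and the random-intercept model
OrthoFem <- Orthodont[Orthodont$Sex == "Female", ]
fmOrthF  <- lme(distance ~ age, data = OrthoFem, random = ~ 1 | Subject)

fixef(fmOrthF)     # fixed effects: common intercept and age slope
ranef(fmOrthF)     # one intercept deviation per subject
VarCorr(fmOrthF)   # between-subject and residual variance components
```

Adding a row of ranef(fmOrthF) to the fixed intercept gives that subject's own intercept, while the age slope is shared by everyone.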

R: Conditional trees

At this point, I tag Olga in.

