R Bootcamp, Day 3
Jefferson Davis, Research Analytics
Recap from Days 1 and 2
R values have types/classes such as numeric, character, logical, data frame, and matrix.
Much of R's functionality lives in libraries.
For help on a function, run ?t.test from the R console.
The plot() function will usually do something useful.
R: Common stats functions
Common statistical tests are very straightforward in R. Let's try one on yesterday's dataset, cars, which records car speeds and stopping distances from the 1920s.
head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
R: Common stats functions
Here's a one-sample t-test of whether the mean speed in cars differs from 12.
t.test(cars$speed, mu = 12)

        One Sample t-test

data:  cars$speed
t = 4.5468, df = 49, p-value = 3.588e-05
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
 13.89727 16.90273
sample estimates:
mean of x
     15.4
R: Common stats functions
We can change the parameters of t.test(). With alternative = "less" the p-value is now essentially 1, since the sample mean is well above 12.
t.test(cars$speed, mu = 12, alternative = "less", conf.level = 0.99)

        One Sample t-test

data:  cars$speed
t = 4.5468, df = 49, p-value = 1
alternative hypothesis: true mean is less than 12
99 percent confidence interval:
     -Inf 17.19834
sample estimates:
mean of x
     15.4
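As an extra sketch (not in the original deck), a two-sample t-test also works out of the box; here we split cars at the arbitrary speed 15 and compare stopping distances:
slow <- cars$dist[cars$speed < 15]   # stopping distances of slower cars
fast <- cars$dist[cars$speed >= 15]  # stopping distances of faster cars
t.test(slow, fast)                   # Welch two-sample t-test by default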
R: Common stats functions
Anything you would see in a year-long stats sequence has an implementation in R.
chisq.test()  # Chi-squared test
prop.test()   # Proportions test
binom.test()  # Exact binomial test
ks.test()     # Kolmogorov-Smirnov test
sd()          # Standard deviation
cor()         # Correlation
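For instance, a quick chi-squared test of independence (an illustrative aside using the built-in mtcars data, not from the original deck):
tbl <- table(mtcars$am, mtcars$vs)  # cross-tabulate transmission type and engine shape
chisq.test(tbl)                     # may warn if expected cell counts are small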
R: Linear regression
Regression analysis is one of the most popular and important tools in statistics; if R goofed here, it would be worthless. R uses the function lm() for linear models, and regression formulas are written in Wilkinson-Rogers notation.
R: Linear regression
In Wilkinson-Rogers notation, a model such as
y_i = β_0 + β_1 x_i1 + ε_i
shows up in R syntax as
y ~ x1
Let's review this syntax. (Tables from https://www.mathworks.com/help/stats/wilkinson-notation.html)
R: Linear regression
Predictor terms                  Wilkinson notation
Intercept                        1 (default)
No intercept                     -1
x1                               x1
x1, x2                           x1 + x2
x1, x2, x1·x2                    x1*x2  (or x1 + x2 + x1:x2)
x1·x2                            x1:x2
x1^2, x1                         x1^2  (note: in an R formula, use I(x1^2) for a squared term)
The sum x1 + x2 as one term      I(x1 + x2)  (the letter I)
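The last two rows deserve a sketch: inside a formula, +, *, and ^ are model operators, so ordinary arithmetic must be protected with I(). For example, a quadratic fit on the cars data (not in the original deck):
lm(dist ~ speed + I(speed^2), cars)  # I(speed^2) is the actual squared term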
R: Linear regression
Two predictors:
  y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ε_i
  y ~ x1 + x2
Two predictors, no intercept:
  y_i = β_1 x_i1 + β_2 x_i2 + ε_i
  y ~ x1 + x2 - 1
Two predictors with the interaction term:
  y_i = β_0 + β_1 x_i1 + β_2 x_i2 + β_3 x_i1 x_i2 + ε_i
  y ~ x1*x2  (or y ~ x1 + x2 + x1:x2)
Regressing on the sum of predictors:
  y_i = β_0 + β_1 (x_i1 + x_i2) + ε_i
  y ~ I(x1 + x2)
Three predictors with one interaction:
  y_i = β_0 + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + β_4 x_i1 x_i2 + ε_i
  y ~ x1*x2 + x3
R: Linear regression
Two predictors with interaction, no intercept:
  y_i = β_1 x_i1 + β_2 x_i2 + β_3 x_i1 x_i2 + ε_i
  y ~ x1*x2 - 1
Three predictors, all interaction terms:
  y_i = β_0 + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + β_4 x_i1 x_i2 + β_5 x_i1 x_i3 + β_6 x_i2 x_i3 + β_7 x_i1 x_i2 x_i3 + ε_i
  y ~ x1*x2*x3
Three predictors, all two-way interaction terms:
  y_i = β_0 + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + β_4 x_i1 x_i2 + β_5 x_i1 x_i3 + β_6 x_i2 x_i3 + ε_i
  y ~ x1*x2*x3 - x1:x2:x3
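One way to check how a formula expands (an aside using the built-in mtcars data, not from the original deck) is to look at the columns of the design matrix:
colnames(model.matrix(mpg ~ wt * hp, mtcars))
# "(Intercept)" "wt" "hp" "wt:hp"
colnames(model.matrix(mpg ~ wt * hp * disp - wt:hp:disp, mtcars))
# same idea: all two-way interactions appear, but the three-way term is dropped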
R: Linear regression
R uses the function lm() for linear models. Generic syntax:
lm(DV ~ IV1, NAME_OF_DATAFRAME)
The above tells R to regress the dependent variable (DV) onto the independent variable IV1. We can include other variables and interaction effects:
lm(DV ~ IV1 + IV2 + IV1:IV2, NAME_OF_DATAFRAME)  # equivalently, DV ~ IV1*IV2
R: Linear regression
Let's do an example using the cars data set: regress stopping distance on speed.
lm(dist ~ speed, cars)

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed
    -17.579        3.932

To work with the fit further, let's store it in a variable.
car.fit <- lm(dist ~ speed, cars)
R: Linear regression
summary(car.fit)

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max
-29.069  -9.525  -2.272   9.215  43.201

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791     6.7584  -2.601   0.0123 *
speed         3.9324     0.4155   9.464 1.49e-12 ***
R: Linear regression
We can also look at individual fields of the lm object.
car.fit$coefficients
(Intercept)       speed
 -17.579095    3.932409
car.fit$residuals[1:3]
        1         2         3
 3.849460 11.849460 -5.947766
car.fit$fitted.values[1:3]
        1         2         3
-1.849460 -1.849460  9.947766
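As an aside not in the original deck, the extractor functions coef(), resid(), and predict() are the more idiomatic route to the same information:
coef(car.fit)            # same as car.fit$coefficients
head(resid(car.fit), 3)  # same as car.fit$residuals[1:3]
predict(car.fit, newdata = data.frame(speed = c(10, 20)))  # fitted values at new speeds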
R: Linear regression
Plot the data and overlay the fitted line in red.
plot(cars$speed, cars$dist, xlab = "speed", ylab = "distance")
abline(car.fit, col = "red")
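We could also add a confidence band around the line (a sketch, not in the original slides), using predict() with interval = "confidence":
sp <- seq(min(cars$speed), max(cars$speed), length.out = 100)
ci <- predict(car.fit, newdata = data.frame(speed = sp), interval = "confidence")
lines(sp, ci[, "lwr"], col = "red", lty = 2)  # lower 95% band
lines(sp, ci[, "upr"], col = "red", lty = 2)  # upper 95% band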
R: Linear regression
Objects of class lm have their own overloaded plot() method, which steps through four diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage.
plot(car.fit)
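A common idiom (not shown in the deck) is to view all four diagnostics at once:
par(mfrow = c(2, 2))  # 2-by-2 plotting grid
plot(car.fit)
par(mfrow = c(1, 1))  # reset the grid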
R: Mixed models
Let's take a look at a mixed model. We need a more complex dataset, so we use a subset of the Orthodont data set from the nlme (Nonlinear Mixed-Effects Models) library.
library(nlme)
head(Orthodont)
Grouped Data: distance ~ age | Subject
  distance age Subject  Sex
1     26.0   8     M01 Male
2     25.0  10     M01 Male
3     29.0  12     M01 Male
4     31.0  14     M01 Male
R: Mixed models
We keep only the female subjects; plotting the resulting groupedData object shows distance against age for each subject.
OrthoFem <- Orthodont[Orthodont$Sex == "Female", ]
plot(OrthoFem)
R: Mixed models
It doesn't seem crazy to fit a fixed slope for age but use a random effect for the intercept.
fmOrthF <- lme(distance ~ age, data = OrthoFem, random = ~ 1 | Subject)
R: Mixed models
In fact, it isn't crazy.
summary(fmOrthF)
Linear mixed-effects model fit by REML
 Data: OrthoFem
       AIC     BIC    logLik
  149.2183 156.169 -70.60916

Random effects:
 Formula: ~1 | Subject
        (Intercept)  Residual
StdDev:     2.06847 0.7800331

Fixed effects: distance ~ age
                Value Std.Error DF  t-value p-value
(Intercept) 17.372727 0.8587419 32 20.230440       0
age          0.479545 0.0525898 32  9.118598       0
 Correlation:
    (Intr)
age -0.674
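One could go further and let the slope vary by subject too; a sketch not covered in the deck, with anova() comparing the two random-effects structures:
fmOrthF2 <- lme(distance ~ age, data = OrthoFem, random = ~ age | Subject)
anova(fmOrthF, fmOrthF2)  # both fits use REML with the same fixed effects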
R: Conditional trees
At this point, I tag Olga in.