Outcome, Dependent Variable (Y-Axis): Child's Height (continuous; histogram)
Predictor Variable (X-Axis): Parents' Height (continuous; scatter plot), Gender (categorical; boxplot)
Regression Model: Linear Regression
Correlation
Correlation Matrix
Analytics & History: The First "Regression Line"
http://galton.org/cgi-bin/searchImages/search/pearson/vol3a/pages/vol3a_0019.htm
Describing a Straight Line
Y = b0 + b1*X + error
b1: Regression coefficient for the predictor. Gradient (slope) of the regression line. Direction/strength of relationship.
b0: Intercept (value of Y when X = 0). Point at which the regression line crosses the Y-axis (ordinate).
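As a rough illustration of what the slope and intercept are, the sketch below computes both from their least-squares formulas and compares them with lm(). The vectors x and y are made-up toy values, not the heights data used in these slides.

```r
# Hypothetical toy data (not the heights data from the slides)
x <- c(65, 67, 68, 70, 72, 74)   # predictor
y <- c(64, 66, 69, 68, 71, 72)   # outcome

# Least-squares slope and intercept from their closed-form definitions
b1 <- cov(x, y) / var(x)          # gradient of the regression line
b0 <- mean(y) - b1 * mean(x)      # value of Y when X = 0

# lm() should recover the same two numbers
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))
```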
Which line fits the best?
Sum of Squares
Total sum of squares, model sum of squares, residual sum of squares. F and R^2 are computed from these.
Sum of Squares
SST: Total variability (variability between scores and the mean).
SSR: Residual/error variability (variability between the regression model and the actual data).
SSM: Model variability (difference in variability between the model and the mean).
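The three sums of squares can be computed directly from a fitted model. The sketch below is a minimal example on a made-up data frame named toy (an assumption for illustration, not the slides' heights data) and checks that the total splits into model plus residual.

```r
# Hypothetical toy data (not the heights data from the slides)
toy <- data.frame(
  father      = c(65, 67, 68, 70, 72, 74),
  childHeight = c(64, 66, 69, 68, 71, 72)
)
fit <- lm(childHeight ~ father, data = toy)

y     <- toy$childHeight
y_hat <- fitted(fit)

sst <- sum((y - mean(y))^2)       # SST: scores vs. the mean
ssr <- sum((y - y_hat)^2)         # SSR: scores vs. the regression model
ssm <- sum((y_hat - mean(y))^2)   # SSM: regression model vs. the mean

# For a least-squares fit with an intercept, SST = SSM + SSR
all.equal(sst, ssm + ssr)
```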
Testing the Model: ANOVA
SST: Total variance in the data
SSM: Improvement due to the model
SSR: Error in the model
If the model results in better prediction than using the mean, then we expect SSM to be much greater than SSR.
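The comparison of SSM with SSR is usually expressed as the F-ratio. In the standard form below, k (number of predictors) and n (number of observations) are symbols introduced here for illustration; they do not appear on the slide.

```latex
\[
SS_T = SS_M + SS_R, \qquad
F = \frac{MS_M}{MS_R} = \frac{SS_M / k}{SS_R / (n - k - 1)}
\]
```

A large F means the improvement due to the model is large relative to the error that remains.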
Linear Model - Regression
lm() function: lm stands for 'linear model'.
    Model <- lm(outcome ~ predictor(s), data = dataFrame, na.action = an action)
    model.1 <- lm(childHeight ~ father, data = heights)
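To make the template concrete, here is a small sketch of an lm() call with an explicit na.action. The data frame toy and its values are assumptions made up for this example; na.omit is one of the standard choices for na.action and drops incomplete rows before fitting.

```r
# Hypothetical toy data with one missing father height
toy <- data.frame(
  father      = c(65, 67, NA, 70, 72, 74),
  childHeight = c(64, 66, 69, 68, 71, 72)
)

# na.omit removes rows with missing values before the model is fitted
fit <- lm(childHeight ~ father, data = toy, na.action = na.omit)

coef(fit)   # named vector: (Intercept) and father
nobs(fit)   # number of rows actually used in the fit
```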
Correlation
Model 1
Testing the Model: R^2
R^2: The proportion of variance accounted for by the regression model. With a single predictor, it is the Pearson correlation coefficient squared.
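The link between the correlation and R^2 can be checked numerically for a one-predictor model. The sketch below uses a made-up data frame toy (an assumption, not the slides' data).

```r
# Hypothetical toy data (not the heights data from the slides)
toy <- data.frame(
  father      = c(65, 67, 68, 70, 72, 74),
  childHeight = c(64, 66, 69, 68, 71, 72)
)
fit <- lm(childHeight ~ father, data = toy)

r        <- cor(toy$father, toy$childHeight)  # Pearson correlation
r2_model <- summary(fit)$r.squared            # R^2 reported by the model

# With one predictor, the squared correlation equals the model R^2
all.equal(r^2, r2_model)
```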
Residuals
Prediction
    predict(model.1)
    heights$model1 <- predict(model.1)
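predict() can also be given new predictor values through its newdata argument. The sketch below, on a made-up data frame toy with hypothetical new father heights, compares predict() with the same calculation done by hand from the coefficients.

```r
# Hypothetical toy data (not the heights data from the slides)
toy <- data.frame(
  father      = c(65, 67, 68, 70, 72, 74),
  childHeight = c(64, 66, 69, 68, 71, 72)
)
fit <- lm(childHeight ~ father, data = toy)

# Fitted values for the rows used in the fit, stored as a new column
toy$model1 <- predict(fit)

# Predictions for new, hypothetical father heights
new_fathers <- data.frame(father = c(66, 71))
pred <- predict(fit, newdata = new_fathers)

# The same predictions by hand: intercept + slope * father
b <- coef(fit)
by_hand <- b[["(Intercept)"]] + b[["father"]] * new_fathers$father
all.equal(unname(pred), by_hand)
```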
Compare Models
Model             1       2       12      3       4
Intercept         40.1    46.6    22.6    22.63   22.64
Father            0.385   -       0.36    -       0.01
Mom               -       0.314   0.29    -       NA
midparentHeight   -       -       -       0.637   0.538
R-squared         0.070   0.0395  0.105   0.102   0.1033
r                 0.27    0.2     -       0.32    -
r^2               0.073   0.04    -       -       -
(- = not shown on the slide)
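A comparison like this can be assembled by fitting each model and collecting its R-squared. The sketch below assumes a made-up data frame toy with father and mother columns; the numbers it produces are unrelated to the values in the table.

```r
# Hypothetical toy data (not the heights data from the slides)
toy <- data.frame(
  father      = c(65, 67, 68, 70, 72, 74),
  mother      = c(62, 60, 64, 61, 65, 63),
  childHeight = c(64, 66, 69, 68, 71, 72)
)

# One model per predictor, plus one with both predictors
models <- list(
  m1  = lm(childHeight ~ father, data = toy),
  m2  = lm(childHeight ~ mother, data = toy),
  m12 = lm(childHeight ~ father + mother, data = toy)
)

# Collect R-squared for each model into a named vector
r2 <- sapply(models, function(m) summary(m)$r.squared)
r2
```

Adding a predictor never lowers R-squared, so m12 is at least as high as m1 and m2; that alone does not show the extra predictor is worth keeping.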
Box Plot http://web.anglia.ac.uk/numbers/graphsCharts.html
Descriptive Stats: Box Plot
Regression: Children's Heights ~ Gender
    model.5 <- lm(childHeight ~ gender, data = h)
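With a two-level categorical predictor, the intercept is the mean of the reference group and the coefficient is the difference between the group means. The sketch below shows this on a made-up data frame toy (an assumption; the values are chosen so the arithmetic is easy to follow).

```r
# Hypothetical toy data (not the heights data from the slides)
toy <- data.frame(
  gender      = factor(c("female", "female", "female", "male", "male", "male")),
  childHeight = c(63, 64, 65, 68, 69, 70)
)
fit <- lm(childHeight ~ gender, data = toy)

# Group means: female = 64, male = 69
means <- tapply(toy$childHeight, toy$gender, mean)

# "female" is the reference level (first alphabetically), so:
#   (Intercept) = mean of females                  = 64
#   gendermale  = mean of males - mean of females  = 5
coef(fit)
```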
Linear Regression Comparison
Model             1       2       12      3       4       5       6       7
Intercept         40.1    46.6    22.6    -       -       64.1    -       16.5
Father            0.385   -       0.36    -       x       -       -       0.39
Mom               -       0.314   0.29    -       -       -       -       0.31
midparentHeight   -       -       -       0.637   0.538   -       0.687   -
Gender            -       -       -       -       -       5.13    -       5.21
R-squared         0.070   0.0395  0.105   0.102   0.1033  0.5137  0.632   0.634
r                 0.27    0.2     -       0.32    -       0.717   -       -
r^2               0.073   0.04    -       -       -       -       -       -
(- = not shown on the slide)
Model Specification & Prediction
Outcome = (Model) + Error
Height = 16.5 + 0.39*father + 0.31*mother + 5.21*gender + error
Gender: Male = 1, Female = 0
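As a worked example, the equation can be wrapped in a small function. predict_height is a hypothetical helper, and the parent heights are made-up inputs. The mother coefficient is taken as 0.31, the Mom value listed for Model 7 in the comparison table.

```r
# Hypothetical helper: plugs values into the fitted equation (error term omitted)
# male is 1 for a son and 0 for a daughter
predict_height <- function(father, mother, male) {
  16.5 + 0.39 * father + 0.31 * mother + 5.21 * male
}

# Made-up parents: father 70 inches, mother 65 inches
son      <- predict_height(father = 70, mother = 65, male = 1)
daughter <- predict_height(father = 70, mother = 65, male = 0)

son             # 16.5 + 27.3 + 20.15 + 5.21 = 69.16
daughter        # 16.5 + 27.3 + 20.15        = 63.95
son - daughter  # the gender coefficient, 5.21
```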