Lecture 6: Linear Regression and Correlation


1 Lecture 6: Linear Regression and Correlation
Nigel Rozario, MS; Jie Zhou, MS; H. James Norton, PhD; 10/17/2013

2 Introduction

3 Example: Points on this line: (0, 2), (1, 7), (2, 12). The line is y = 2 + 5x: the intercept is 2, and y rises by 5 for each 1-unit increase in x.
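Because the three points lie exactly on a line, the slope and intercept can be read off any two of them. This short Python sketch (an illustration added here, not part of the original slides) does that and then checks the third point:

```python
# Points listed on the slide
points = [(0, 2), (1, 7), (2, 12)]

# Slope = rise / run between the first two points
(x0, y0), (x1, y1) = points[0], points[1]
slope = (y1 - y0) / (x1 - x0)
# Intercept = value of y where the line crosses x = 0
intercept = y0 - slope * x0

print(f"y = {intercept:g} + {slope:g}x")

# Every point satisfies the equation exactly: there is no error term here,
# which is what distinguishes this line from a fitted regression line
assert all(intercept + slope * x == y for x, y in points)
```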

4 Linear Regression: Linear regression is an approach to modeling the relationship between a scalar variable y and one or more explanatory variables denoted X. Linear regression has many practical uses:
- Prediction and forecasting
- Quantifying the strength of the relationship between y and each Xj

5 Assumptions
L: Linear (in the parameters)
I: Independent errors
N: Normally distributed errors
E: Equal variances: the errors have the same variance at every value of x
X: The regressors xi are assumed to be error-free; that is, they are not contaminated with measurement error.
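The LINE(X) assumptions describe how the data are assumed to be generated. The Python sketch below simulates data that satisfy all five under a hypothetical true model y = 3 + 2x + error (the values 3, 2 and the error SD of 1 are chosen for illustration, not taken from the lecture) and then checks that least squares recovers the intercept and slope:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical "true" model, chosen for illustration only
beta0, beta1, sigma = 3.0, 2.0, 1.0

# X: regressors are fixed numbers, measured without error
xs = [i / 100 for i in range(1000)]  # 0.00, 0.01, ..., 9.99
# L: the mean of y is linear in the parameters beta0 and beta1
# I: each error is drawn independently of the others
# N: each error is normally distributed
# E: every error has the same SD (sigma), whatever the value of x
ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]

# Least-squares estimates should land close to the true values
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
print(f"estimated intercept {b0:.2f}, slope {b1:.2f}")
```

Violating an assumption (for example, making sigma grow with x) would leave the estimates unbiased but make the usual standard errors and p-values unreliable.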

6 Correlation: Pearson's product-moment correlation coefficient
Assumptions: the x and y values follow a bivariate normal distribution. For ordinal data, or data that are not normally distributed, use Spearman's rank correlation.
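Pearson's r is the covariance of x and y scaled by both standard deviations, and Spearman's rho is Pearson's r applied to the ranks. A minimal Python sketch of both (tied values get the average of their ranks, the usual convention), applied to the sbp/age data from the blood-pressure example later in the lecture:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation = Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

age = [18, 33, 27, 58, 20, 30]
sbp = [120, 130, 134, 148, 110, 137]
print(f"Pearson r = {pearson_r(age, sbp):.4f}, Spearman rho = {spearman_rho(age, sbp):.4f}")
```

Because Spearman's rho only uses the ordering of the values, it is unaffected by any monotone transformation of x or y.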


10 Caution!

11 Method of Least Squares
The method of least squares chooses the intercept and slope that minimize the sum of squared residuals. Let's use an example: a revised SAT ranking. UNC-Chapel Hill journalism professor Phil Meyer used statistical techniques (least-squares regression) to adjust for the different SAT participation rates of the 50 states and the District of Columbia. In essence, the technique adjusts the data to reflect what the SAT scores would likely be if the same percentage of students in every state took the test.
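For one predictor, least squares has a closed form: slope = Sxy / Sxx and intercept = ȳ - slope × x̄. The Python sketch below implements it on a small hypothetical dataset (made up for illustration, not the SAT data) and confirms that nudging either coefficient away from the estimate only increases the sum of squared errors:

```python
def least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope

def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
b0, b1 = least_squares(xs, ys)
best = sse(xs, ys, b0, b1)

# Moving either coefficient away from the estimate can only increase the SSE
for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(xs, ys, b0 + d0, b1 + d1) > best
print(f"intercept={b0:.2f}, slope={b1:.2f}, SSE={best:.2f}")
```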

12 Data spreadsheet (Taking_Test = proportion of students who took the SAT; ? = value not recoverable from this transcript)
State           Raw_Score  Taking_Test  Orig_Rank  Adjusted_Rank  Adjusted_Score
New Hampshire   921        0.75         28         1              993
Iowa            1093       0.05         ?          2              990
North Dakota    1073       0.06         ?          3              981
Kansas          1039       0.1          ?          4              ?
Illinois        1006       0.16         10         5              978
Minnesota       1023       0.12         7          6              976
Montana         982        0.22         19         7              974
Connecticut     897        0.81         33         8              ?
North Carolina  844        0.57         48         49             898
South Carolina  832        0.58         51         50             887
Oregon          922        0.54         27         9              972
Massachusetts   896        0.79         35         10             971
Wisconsin       ?          0.11         ?          11             ?
Colorado        859        0.29         23         12             969
Tennessee       1015       ?            ?          13             968
Nebraska        1024       ?            ?          14             966
Maryland        904        0.64         32         15             965
Washington      913        0.49         31         16             957
New Jersey      886        0.74         39         17             ?
Vermont         890        0.68         37         18             955

13 R² shows the proportion of the variance of Y explained by X.
Annotations on the regression output:
- Outcome variable (Y): Raw_Score. Predictor variable (X): Taking_Test.
- Model p-value: tests whether R² is different from 0.
- Root MSE (root mean squared error): the estimated SD of the residuals around the regression line. The closer it is to zero, the better the fit.
- Two-tailed p-values: for each coefficient, test the null hypothesis that the coefficient equals 0.
- Fitted equation: Expected Score = 1020.61 - 220.51 × Taking_Test
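Both summary statistics can be computed directly from observed and fitted values: R² = 1 - SSE/SST, and root MSE = √(SSE / (n - p)), where p is the number of estimated coefficients. The n - p divisor is the usual convention in regression output, but it is an assumption here about the software behind the slide. This Python sketch reuses the six sbp/age observations from slide 15 rather than the SAT data:

```python
import math

def r_squared_and_root_mse(ys, fitted, n_params):
    """R^2 = 1 - SSE/SST; root MSE = sqrt(SSE / (n - n_params))."""
    n = len(ys)
    y_bar = sum(ys) / n
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # variation left unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)              # total variation around the mean
    return 1 - sse / sst, math.sqrt(sse / (n - n_params))

# sbp and age from slide 15, fitted by simple least squares
age = [18, 33, 27, 58, 20, 30]
sbp = [120, 130, 134, 148, 110, 137]
n = len(age)
a_bar, s_bar = sum(age) / n, sum(sbp) / n
sxy = sum((a - a_bar) * (s - s_bar) for a, s in zip(age, sbp))
sxx = sum((a - a_bar) ** 2 for a in age)
slope = sxy / sxx
intercept = s_bar - slope * a_bar
fitted = [intercept + slope * a for a in age]

# Two estimated coefficients: intercept and slope
r2, root_mse = r_squared_and_root_mse(sbp, fitted, n_params=2)
print(f"R^2 = {r2:.3f}, root MSE = {root_mse:.2f}")
```

Root MSE is in the units of Y (mmHg here), so "close to zero" should be judged relative to the typical size and spread of Y.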

14 Expected Score (North Carolina) = 1020.61 - 220.51 × (Taking_Test) = 1020.61 - 220.51 × 0.57 ≈ 894.92
Residual (or error) = Raw Score - Expected Score = 844 - 894.92 ≈ -50.92
The percentage of students who took the test only partly explains each state's SAT score.
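The same two steps (plug Taking_Test into the fitted equation, then subtract the expected score from the raw score) apply to every state. This Python sketch uses the slide's coefficients and a few rows from the data table. It stops at the residual; it does not try to reproduce the table's Adjusted_Score column, whose exact formula the slides do not give.

```python
# Fitted equation reported on the slide
INTERCEPT, SLOPE = 1020.61, -220.51

def expected_score(taking_test):
    """Expected SAT score given the proportion of students taking the test."""
    return INTERCEPT + SLOPE * taking_test

# (raw score, proportion taking the test), copied from the data table
states = {
    "North Carolina": (844, 0.57),
    "South Carolina": (832, 0.58),
    "Iowa": (1093, 0.05),
    "New Hampshire": (921, 0.75),
}

for name, (raw, taking) in states.items():
    expected = expected_score(taking)
    # Positive residual: the state scored higher than its participation rate predicts
    residual = raw - expected
    print(f"{name:15s} expected {expected:7.2f}  residual {residual:+7.2f}")
```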

15 Another Example
sbp  age
120  18
130  33
134  27
148  58
110  20
137  30

16 Expected sbp = intercept + slope × (age). When age = 30, expected sbp = intercept + slope × (30) = 129. Residual = observed sbp - expected sbp = 137 - 129 = 8.
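Assuming the slide's line was fit by ordinary least squares to exactly these six observations, this Python sketch recomputes it. It gives an intercept of about 105.60 and a slope of about 0.78, so the expected sbp at age 30 is about 129.05 and the residual is about 7.95, which round to the slide's 129 and 8.

```python
# sbp and age for the six subjects on slide 15
age = [18, 33, 27, 58, 20, 30]
sbp = [120, 130, 134, 148, 110, 137]

n = len(age)
age_bar, sbp_bar = sum(age) / n, sum(sbp) / n
sxy = sum((a - age_bar) * (s - sbp_bar) for a, s in zip(age, sbp))
sxx = sum((a - age_bar) ** 2 for a in age)
slope = sxy / sxx
intercept = sbp_bar - slope * age_bar
print(f"Expected sbp = {intercept:.2f} + {slope:.4f} x (age)")

expected_30 = intercept + slope * 30
residual_30 = 137 - expected_30  # the subject aged 30 has observed sbp 137
print(f"age 30: expected {expected_30:.2f}, residual {residual_30:.2f}")
```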

17 Multiple Linear Regression
Data (first 10 observations; ? = value not recoverable from this transcript):
age  bmi    sbp
28   24.33  111
26   25.09  101
31   26.61  120
18   32.26  158
50   22.71  125
42   36.48  166
20   25.18  114
29   21.91  143
35   29.41  ?
47   27.28  133
R² shows the proportion of the variance of SBP explained by age and BMI together. The two-tailed p-value for each coefficient tests the null hypothesis that the coefficient equals 0.
Reference: Forthofer, Lee, Hernandez. Biostatistics: A Guide to Design, Analysis, and Discovery, 2nd ed.
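With two predictors, least squares solves the normal equations (X'X)b = X'y, where X has a column of 1s for the intercept. This pure-Python sketch fits SBP on age and BMI using only the nine complete rows shown above, so its coefficients will not match the slide's model, which was fit to all 50 observations; it only illustrates the mechanics. For real work, a library routine such as NumPy's `numpy.linalg.lstsq` (QR/SVD based, numerically safer than forming X'X) would be the usual choice.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Least squares via the normal equations; prepends a column of 1s for the intercept."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)

# First 10 observations from the slide, minus the one whose sbp is missing
age = [28, 26, 31, 18, 50, 42, 20, 29, 47]
bmi = [24.33, 25.09, 26.61, 32.26, 22.71, 36.48, 25.18, 21.91, 27.28]
sbp = [111, 101, 120, 158, 125, 166, 114, 143, 133]

b0, b_age, b_bmi = ols(list(zip(age, bmi)), sbp)
print(f"Predicted SBP = {b0:.2f} + {b_age:.3f} x Age + {b_bmi:.3f} x BMI")
```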

18 Pearson correlation coefficients among bmi, age, and sbp, N = 50 (Prob > |r| under H0: Rho = 0)
Predicted SBP = intercept + (age coefficient) × Age + 1.3 × BMI
When Age = 28 and BMI = 24.33: Predicted SBP = intercept + (age coefficient) × 28 + 1.3 × 24.33 = 116.95
Residual = Observed SBP - Predicted SBP = 111 - 116.95 = -5.95
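The "Prob > |r| under H0: Rho=0" values come from the standard test of zero correlation, which converts r into a t statistic, t = r·√(n - 2) / √(1 - r²), and compares it with a t distribution on n - 2 degrees of freedom. This Python sketch computes the statistic; the r = 0.5 input is hypothetical, since the slide's correlation values are not in the transcript. With SciPy available, the two-sided p-value would be `2 * scipy.stats.t.sf(abs(t), n - 2)`.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, to be compared with a t distribution on n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical r = 0.5 with the slide's N = 50
t = correlation_t_stat(0.5, 50)
print(f"t = {t:.3f} on {50 - 2} degrees of freedom")
```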

19 Conclusion
Simple linear regression: one covariate x.
Multiple linear regression: multiple covariates X. For the first example, other factors might also influence the SAT scores:
- Percentage of parents with a college education
- Education spending per student in each state
Adding covariates never decreases R² (it almost always goes up), even when the added covariates are unrelated to Y. This brings up another statistics topic: goodness-of-fit (GOF) tests.
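Least squares with an extra covariate can always set that covariate's coefficient to 0 and reproduce the smaller model's fit, so its SSE can only stay the same or fall, and R² can only stay the same or rise. This Python sketch checks that on the slide 15 sbp/age data by adding a hypothetical covariate of arbitrary numbers with no real connection to sbp:

```python
def centered(v):
    """Subtract the mean so the intercept drops out of the normal equations."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def r2_one(x, y):
    """R^2 for y ~ x with an intercept (equals the squared Pearson r)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) ** 2 / (dot(xc, xc) * dot(yc, yc))

def r2_two(x1, x2, y):
    """R^2 for y ~ x1 + x2 with an intercept, via Cramer's rule on the 2x2 system."""
    a, b, c = centered(x1), centered(x2), centered(y)
    s11, s12, s22 = dot(a, a), dot(a, b), dot(b, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    sse = sum((ci - b1 * ai - b2 * bi) ** 2 for ai, bi, ci in zip(a, b, c))
    return 1 - sse / dot(c, c)

age = [18, 33, 27, 58, 20, 30]
sbp = [120, 130, 134, 148, 110, 137]
# Hypothetical covariate: arbitrary numbers, made up for this sketch
noise = [7, 3, 9, 4, 8, 5]

r2_age = r2_one(age, sbp)
r2_with_extra = r2_two(age, noise, sbp)
print(f"R^2 with age only: {r2_age:.4f}; with the extra covariate: {r2_with_extra:.4f}")
```

This is why R² alone cannot justify adding covariates; a fit statistic that penalizes extra parameters, or a formal goodness-of-fit test, is needed.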

20 Questions or Comments?

