Published by Cassandra Walton. Modified over 9 years ago.
1
Correlation & Regression
2
Correlation
Measures the strength of the linear relation between two random variables (X and Y).
$\rho = \mathrm{Corr}(X,Y) = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y} = \dfrac{E[(X-\mu_x)(Y-\mu_y)]}{\left\{E[(X-\mu_x)^2]\,E[(Y-\mu_y)^2]\right\}^{1/2}}$
It is a standardized Cov(X,Y), so $-1 \le \rho \le 1$.
3
Strength of $\rho$
$\rho = -1$: perfect negative linear relation
$\rho = 1$: perfect positive linear relation
$\rho = 0$: no linear relation
As $|\rho|$ increases, so does the strength of the relationship.
4
Sample
$\mathrm{Cov}(X,Y) = \dfrac{1}{n-1}\sum_i (x_i-\bar{x})(y_i-\bar{y})$
$\mathrm{Corr}(X,Y) = r = \dfrac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\left[\sum_i (x_i-\bar{x})^2 \sum_i (y_i-\bar{y})^2\right]^{1/2}}$
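The sample formulas on this slide can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the small data set are made up for the example and are not part of the presentation.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Sample correlation r: cross-deviations over the root of the
    product of the two sums of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative data only (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

print("Cov(X,Y) =", sample_cov(xs, ys))
print("r        =", sample_corr(xs, ys))
```

Because the (n-1) factors cancel, r can be computed directly from the sums of deviations, which is what `sample_corr` does.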
5
Hypothesis Test
Null: $H_0: \rho = 0$
Test statistic: $t = r\sqrt{n-2}\,/\sqrt{1-r^2}$, compared with Student's t critical values on $n-2$ degrees of freedom
Alternative: $H_A: \rho \ne 0$; reject $H_0$ if $|t| > t_{n-2,\alpha/2}$
Alternative: $H_A: \rho > 0$; reject $H_0$ if $t > t_{n-2,\alpha}$
Alternative: $H_A: \rho < 0$; reject $H_0$ if $t < -t_{n-2,\alpha}$
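As a rough sketch of the two-sided test, the snippet below computes the usual statistic t = r·sqrt(n-2)/sqrt(1-r²) and compares it with a critical value. The values of r and n are illustrative, and the critical value 3.182 is the tabulated t value for 3 degrees of freedom at α/2 = 0.025; for other n or α it has to be looked up separately, since the Python standard library has no t distribution.

```python
import math

def corr_t_statistic(r, n):
    """t statistic for H0: rho = 0, on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Illustrative inputs (assumptions, not from the slides).
r = 0.99865          # sample correlation
n = 5                # number of (x, y) pairs
t_crit = 3.182       # tabulated t for df = 3, alpha/2 = 0.025

t_stat = corr_t_statistic(r, n)
reject_two_sided = abs(t_stat) > t_crit

print("t =", round(t_stat, 2), "reject H0:", reject_two_sided)
```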
6
Rank Correlation (Spearman's)
The sample correlation (r) can be affected by extreme observations.
Spearman's rank correlation:
– First rank the $x_i$ and the $y_i$, then calculate the sample correlation of these ranks
– $r_s = 1 - \dfrac{6\sum_i d_i^2}{n(n^2-1)}$
– where $d_i$ is the difference between the ranks of the i-th pair
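A minimal sketch of the rank-difference formula follows. It assumes there are no tied values (with ties, the shortcut formula no longer equals the correlation of the ranks); the helper names and the data, which include one extreme pair, are invented for illustration.

```python
def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(xs, ys):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Illustrative data with one extreme observation (100, 50).
xs = [1, 2, 3, 4, 100]
ys = [2, 1, 4, 3, 50]

r_s = spearman(xs, ys)
print("r_s =", r_s)
```

Only the ordering of the values enters the calculation, so the extreme pair counts as rank 5 and nothing more.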
7
Linear Regression
Find/define the relationship between the dependent variable and the independent variable.
Use the independent variable to explain the behavior of the dependent variable.
Separate the variation in the data into explained variation and unexplained variation (noise).
Predict the value of the dependent variable given a value for the independent variable.
8
Linear Regression Model
Predict Y given X
$E(Y \mid X = x) = \beta_0 + \beta_1 x$
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Assumptions:
– the $\varepsilon_i$ are random variables
– $E[\varepsilon_i] = 0$
– $E[\varepsilon_i^2] = \sigma^2$
– $E[\varepsilon_i \varepsilon_k] = 0$ for $i \ne k$; they are uncorrelated
9
Sum of Squares
Total sum of squares = Regression sum of squares + Error sum of squares
SST = SSR + SSE
$\sum_i (y_i-\bar{y})^2 = \sum_i (\hat{y}_i-\bar{y})^2 + \sum_i e_i^2$, where $e_i = y_i - \hat{y}_i$
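The decomposition can be checked numerically. The sketch below fits the line with the standard least-squares estimates (slope = S_xy/S_xx, intercept = ȳ − slope·x̄, which the slides do not spell out) and then computes the three sums of squares. The data and function names are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line."""
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained
    return sst, ssr, sse

# Illustrative data only (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
sst, ssr, sse = sums_of_squares(xs, ys)
print("b0 =", b0, "b1 =", b1)
print("SST =", sst, "SSR + SSE =", ssr + sse)
```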
10
Coefficient of Determination ($R^2$)
Measures how well x explains the variation in Y.
$R^2 = \mathrm{SSR}/\mathrm{SST} = 1 - \mathrm{SSE}/\mathrm{SST} = r^2$
$R^2$ is the proportion of the variation in the data that is explained by the regression.
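The identity R² = r² for a regression with a single independent variable can be verified with a short sketch. R² is computed as 1 − SSE/SST from a least-squares fit, and r from the sample-correlation formula; the data and names are illustrative assumptions.

```python
import math

def deviations(xs, ys):
    """Return S_xx, S_yy, S_xy (sums of squared and cross deviations)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy

def r_squared(xs, ys):
    """R^2 = 1 - SSE/SST for the least-squares line."""
    s_xx, s_yy, s_xy = deviations(xs, ys)
    b1 = s_xy / s_xx
    b0 = sum(ys) / len(ys) - b1 * sum(xs) / len(xs)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / s_yy          # S_yy is SST

def pearson_r(xs, ys):
    s_xx, s_yy, s_xy = deviations(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative data only (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

big_r2 = r_squared(xs, ys)
small_r = pearson_r(xs, ys)
print("R^2 =", big_r2, " r^2 =", small_r ** 2)
```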
11
Confidence Interval
Error variance: $S_e^2 = \dfrac{\sum_i e_i^2}{n-2} = \dfrac{\mathrm{SSE}}{n-2}$
Unbiased estimate of $\sigma_b^2$: $S_b^2 = \dfrac{S_e^2}{\sum_i (x_i-\bar{x})^2}$
$t = (b-\beta)/S_b$
C.I. for the regression slope: $b - t_{n-2,\alpha/2}\,S_b < \beta < b + t_{n-2,\alpha/2}\,S_b$
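A minimal sketch of the slope interval, assuming the least-squares slope b = S_xy/S_xx and a critical value taken from a t table. The 3.182 used here is the tabulated value for 3 degrees of freedom at α/2 = 0.025 and only fits this five-point illustrative data set; the function name is made up for the example.

```python
import math

def slope_confidence_interval(xs, ys, t_crit):
    """Return (b, S_b, lower, upper) for the regression slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx                        # least-squares slope
    a = y_bar - b * x_bar                  # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e2 = sse / (n - 2)                   # error variance S_e^2
    s_b = math.sqrt(s_e2 / s_xx)           # standard error of the slope
    return b, s_b, b - t_crit * s_b, b + t_crit * s_b

# Illustrative data only (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
t_crit = 3.182   # tabulated t for df = n - 2 = 3, alpha/2 = 0.025

b, s_b, lower, upper = slope_confidence_interval(xs, ys, t_crit)
print("b =", b, "S_b =", s_b)
print("95% C.I.:", lower, "< beta <", upper)
```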
12
Regression Slope Tests
$H_0: \beta = \beta_0$ or $H_0: \beta \le \beta_0$ vs. $H_1: \beta > \beta_0$
Reject $H_0$ if $(b-\beta_0)/S_b > t_{n-2,\alpha}$
$H_0: \beta = \beta_0$ or $H_0: \beta \ge \beta_0$ vs. $H_1: \beta < \beta_0$
Reject $H_0$ if $(b-\beta_0)/S_b < -t_{n-2,\alpha}$
$H_0: \beta = \beta_0$ vs. $H_1: \beta \ne \beta_0$
Reject $H_0$ if $(b-\beta_0)/S_b > t_{n-2,\alpha/2}$ or $(b-\beta_0)/S_b < -t_{n-2,\alpha/2}$
Here $\beta_0$ denotes the hypothesized value of the slope $\beta$ (not the intercept); $\beta_0 = 0$ tests for no linear relation.
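The three decision rules can be sketched as one small function. The slope estimate, its standard error, and the critical values below are illustrative assumptions (the critical values are tabulated t values for 3 degrees of freedom); `beta_null` is a hypothetical name for the hypothesized slope value.

```python
def slope_test(b, s_b, beta_null, t_crit, alternative):
    """Return (t, reject) for a test on the regression slope.

    alternative is 'greater', 'less', or 'two-sided'. t_crit must be the
    matching critical value: t_{n-2,alpha} for the one-sided tests and
    t_{n-2,alpha/2} for the two-sided test.
    """
    t = (b - beta_null) / s_b
    if alternative == "greater":
        return t, t > t_crit
    if alternative == "less":
        return t, t < -t_crit
    if alternative == "two-sided":
        return t, abs(t) > t_crit
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Illustrative values (assumptions, not from the slides).
b, s_b = 1.99, 0.0597          # slope estimate and its standard error
t_one_sided = 2.353            # tabulated t for df = 3, alpha = 0.05
t_two_sided = 3.182            # tabulated t for df = 3, alpha/2 = 0.025

t_pos, reject_pos = slope_test(b, s_b, 0.0, t_one_sided, "greater")
t_two, reject_two = slope_test(b, s_b, 2.0, t_two_sided, "two-sided")
print("H1: beta > 0  ->", round(t_pos, 2), reject_pos)
print("H1: beta != 2 ->", round(t_two, 3), reject_two)
```

With these numbers the slope is clearly positive, while a slope of exactly 2 cannot be ruled out.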
13
SAS: Inches-Centimeter
/* Read the paired measurements: inches, then centimeters */
Data Height;
  Input inches centimeter;
  Datalines;
1 2.54
2 5.08
24 60.96
4 10.16
5 12.7
16 40.64
7 17.78
8 20.32
19 48.26
10 25.4
20 50.8
25 63.5
;
/* Scatter plot of the raw data */
Proc Plot Data=Height;
  Plot inches*centimeter;
/* Sample correlation between the two variables */
Proc Corr Data=Height;
  Title 'Correlation Matrix of Inches vs. Centimeter';
  Var inches centimeter;
/* Regress inches on centimeter; plot fitted values, 95% limits for the mean, and residuals */
Proc Reg Data=Height;
  Title 'Regression Line for Inches-Centimeter Data';
  Model inches=centimeter;
  Plot Predicted.*centimeter = 'P'
       U95M.*centimeter = '-' L95M.*centimeter = '_'
       inches*centimeter = '*' / overlay;
  Plot Residual.*centimeter = 'o';
Quit;
14
SAS: GRE – GPA
/* Read the paired observations: GRE score, then GPA */
Data GRE_GPA;
  Input GRE GPA;
  Datalines;
2100 4
1920 3.8
2290 3.8
1580 3.9
1400 3.77
1300 3.95
2020 3.8
1060 3.54
1500 3
1900 4
1900 3.7
1800 3.5
2200 4
1990 3.51
2000 4
1650 3.8
1640 3.75
1800 3.9
2300 3.91
2000 3.75
2000 3.9
;
/* Scatter plot of the raw data */
Proc Plot Data=GRE_GPA;
  Plot GRE*GPA;
/* Sample correlation between GRE and GPA */
Proc Corr Data=GRE_GPA;
  Title 'Correlation Matrix of GRE vs. GPA';
  Var GRE GPA;
/* Regress GPA on GRE; plot fitted values, 95% limits for the mean, and residuals */
Proc Reg Data=GRE_GPA;
  Title 'Regression Line for GRE-GPA Data';
  Model GPA=GRE;
  Plot Predicted.*GRE = 'P'
       U95M.*GRE = '-' L95M.*GRE = '_'
       GPA*GRE = '*' / overlay;
  Plot Residual.*GRE = 'o';
Quit;