Published by Osborn Osborne. Modified over 9 years ago.
2
Xuhua Xia
Correlation and Regression
Introduction to linear correlation and regression
Numerical illustrations
SAS and linear correlation/regression
– CORR
– REG
– GLM
Assumptions of linear correlation/regression
Model II regression
3
Introduction
Correlation
– Bivariate correlation
– Multiple correlation
– Partial correlation
– Canonical correlation
Regression
– Simple regression
– Multiple regression
– Nonlinear regression
[Portraits: Karl Pearson (1857-1936), Francis Galton (1822-1911), R. A. Fisher (1890-1962)]
4
Regression Coefficient

       X     Y    (X-Xbar)^2    (Y-Ybar)^2    (X-Xbar)(Y-Ybar)
       1     1        4             4                4
       2     2        1             1                1
       3     3        0             0                0
       4     4        1             1                1
       5     5        4             4                4
Sum   15    15       10            10               10

Exercise for students: change Y to 3, 4, 5, 6, 7 and recompute a and b.
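As a worked check of the table above, the column sums can be plugged into the usual least-squares formulas. This is a sketch that assumes the fitted line is written as ŷ = a + bX; the bar notation for the means is introduced here for illustration.

```latex
\begin{align*}
\bar{X} &= \frac{15}{5} = 3, \qquad \bar{Y} = \frac{15}{5} = 3 \\
b &= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
   = \frac{10}{10} = 1 \\
a &= \bar{Y} - b\,\bar{X} = 3 - 1 \cdot 3 = 0
\end{align*}
```

Adding a constant to every Y shifts the mean of Y but leaves every deviation from the mean unchanged, which is the point of the recomputation exercise.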
5
Least-squares method
Least-squares estimate of the sample mean
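The equations on this slide did not survive extraction. A minimal sketch of the standard argument, assuming the slide follows the textbook route: the sample mean is the value a that minimizes the sum of squared deviations Q.

```latex
\begin{align*}
Q(a) &= \sum_{i=1}^{n} (y_i - a)^2 \\
\frac{dQ}{da} &= -2 \sum_{i=1}^{n} (y_i - a) = 0 \\
\sum_{i=1}^{n} y_i - n a &= 0
\quad\Longrightarrow\quad
a = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}
\end{align*}
```

The second derivative is 2n > 0, so this stationary point is a minimum.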
6
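Least-Squares Estimation of the Regression Coefficient
A trick to simplify the estimation: center x on its mean.
$$\hat{y}_i = a + b(x_i - \bar{x})$$
$$Q = \sum (y_i - \hat{y}_i)^2 = \sum [y_i - a - b(x_i - \bar{x})]^2$$
$$\frac{\partial Q}{\partial a} = -2 \sum [y_i - a - b(x_i - \bar{x})] = 0$$
$$\frac{\partial Q}{\partial b} = -2 \sum [y_i - a - b(x_i - \bar{x})](x_i - \bar{x}) = 0$$
Because $\sum (x_i - \bar{x}) = 0$, the first equation gives
$$\sum y_i - na = 0; \qquad a = \frac{\sum y_i}{n} = \bar{y}$$
Substituting $a = \bar{y}$ into the second equation:
$$\sum [(y_i - \bar{y}) - b(x_i - \bar{x})](x_i - \bar{x}) = 0$$
$$\sum (y_i - \bar{y})(x_i - \bar{x}) - b \sum (x_i - \bar{x})^2 = 0$$
$$b = \frac{\sum (y_i - \bar{y})(x_i - \bar{x})}{\sum (x_i - \bar{x})^2}$$
In the uncentered form $\hat{y}_i = a' + b x_i$, the intercept is $a' = \bar{y} - b\bar{x}$.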
7
Maximum Likelihood Method (R. A. Fisher)
Estimation of the proportion of males (p) of a fish species in a pond: two samples are taken, one of 10 fish with 5 males and the other of 12 fish with only 3 males.
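The likelihood calculation itself is missing from the extracted slide. A minimal sketch, assuming the two samples are independent binomial draws sharing the same p (both assumptions are made here, not stated in the source):

```latex
\begin{align*}
L(p) &= \binom{10}{5} p^{5} (1-p)^{5} \cdot \binom{12}{3} p^{3} (1-p)^{9}
      \;\propto\; p^{8} (1-p)^{14} \\
\ln L(p) &= \text{const} + 8 \ln p + 14 \ln (1-p) \\
\frac{d \ln L}{dp} &= \frac{8}{p} - \frac{14}{1-p} = 0
\quad\Longrightarrow\quad
\hat{p} = \frac{8}{22} \approx 0.364
\end{align*}
```

Under these assumptions the estimate equals the pooled proportion of males, 8 out of 22 fish.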
8
Correlation & Regression Coefficients

       X     Y    (X-Xbar)^2    (Y-Ybar)^2    (X-Xbar)(Y-Ybar)
       1     5        4             4               -4
       2     4        1             1               -1
       3     3        0             0                0
       4     2        1             1               -1
       5     1        4             4               -4
Sum   15    15       10            10              -10
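The column sums of this table give both coefficients directly. The sketch below uses the standard formulas for Pearson's r and for the slope and intercept of ŷ = a + bX; the means of X and Y are both 15/5 = 3.

```latex
\begin{align*}
r &= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}
          {\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}
   = \frac{-10}{\sqrt{10 \cdot 10}} = -1 \\
b &= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
   = \frac{-10}{10} = -1 \\
a &= \bar{Y} - b\,\bar{X} = 3 - (-1)(3) = 6
\end{align*}
```

The points lie exactly on the line Y = 6 - X, which is why r reaches -1.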
9
Regression Coefficient

       X     Y    (X-Xbar)^2    (Y-Ybar)^2    (X-Xbar)(Y-Ybar)
       1     1        4             4                4
       2     1        1             4                2
       3     3        0             0                0
       4     5        1             4                2
       5     5        4             4                4
Sum   15    15       10            16               12
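The same five points can be passed to the CORR and REG procedures listed in the outline. This is a minimal sketch; the data set name `toy` is a hypothetical name chosen for illustration.

```sas
/* The five (X, Y) pairs from the table above; "toy" is a hypothetical name */
data toy;
  input X Y @@;
  datalines;
1 1  2 1  3 3  4 5  5 5
;
run;

/* Pearson correlation between X and Y */
proc corr data=toy pearson;
  var X Y;
run;

/* Simple linear regression of Y on X */
proc reg data=toy;
  model Y = X;
run;
quit;
```

From the column sums, the hand calculation gives b = 12/10 = 1.2, a = 3 - 1.2(3) = -0.6, and r = 12/sqrt(10 × 16) ≈ 0.949, so the procedure output can be checked against those values.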
10
The Beetle Experiment
11
Regression Coefficient
12
Partition of variance
[Figure: plot of Y against X illustrating total deviation, explained deviation, and unexplained deviation]
13
ANOVA test in regression
Partition of SS in regression
Perform an ANOVA significance test.
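The partition itself was an image on the slide. A sketch of the standard identity for simple linear regression with n observations follows; the subscripts T, M and E (total, model, error) are labels chosen here.

```latex
\begin{align*}
\underbrace{\sum (Y_i - \bar{Y})^2}_{SS_T,\; df = n-1}
  &= \underbrace{\sum (\hat{Y}_i - \bar{Y})^2}_{SS_M,\; df = 1}
   + \underbrace{\sum (Y_i - \hat{Y}_i)^2}_{SS_E,\; df = n-2} \\
F &= \frac{MS_M}{MS_E} = \frac{SS_M / 1}{SS_E / (n-2)}
\end{align*}
```

Under the null hypothesis of zero slope, F follows an F distribution with 1 and n - 2 degrees of freedom.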
14
SAS Program Listing
/* Weight loss (in mg) of 9 batches of 25 Tribolium beetles after six days
   of starvation at nine different humidities */
data beetle;
  input Humidity WtLoss @@;
  cards;
0 8.98 12 8.14 29.5 6.67 43 6.08 53 5.9 62.5 5.83 75.5 4.68 85 4.2 93 3.72
;
proc reg;
  title 'Simple linear regression of WtLoss on Humidity';
  model WtLoss=Humidity / R CLM alpha=0.01 CLI;
  plot WtLoss*Humidity / conf;
  plot WtLoss*Humidity / pred;
  plot residual.*Humidity;
run;
proc glm;
  model WtLoss=Humidity;
  title 'Simple linear regression of WtLoss on Humidity';
run;
15
SAS Output

Dependent Variable: WTLOSS

                      Sum of         Mean
Source      DF       Squares       Square     F Value     Prob>F
Model        1      23.51449     23.51449     267.183     0.0001
Error        7       0.61606      0.08801
C Total      8      24.13056

Root MSE     0.29666     R-square     0.9745
Dep Mean     6.02222     Adj R-sq     0.9708
C.V.         4.92614     (=100*Root MSE / Mean)

Parameter Estimates

                   Parameter       Standard     T for H0:
Variable    DF      Estimate          Error     Parameter=0     Prob > |T|
INTERCEP     1      8.704027     0.19156450        45.437         0.0001
HUMIDITY     1     -0.053222     0.00325603       -16.346         0.0001
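The summary statistics in this output can be recomputed from the sums of squares, which is a useful check on how the pieces relate. The arithmetic below uses only numbers printed in the output; small differences in the last digit are rounding.

```latex
\begin{align*}
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
     = \frac{23.51449}{24.13056} \approx 0.9745 \\
MS_{\text{Error}} &= \frac{0.61606}{7} \approx 0.08801,
  \qquad \text{Root MSE} = \sqrt{0.08801} \approx 0.29666 \\
F &= \frac{23.51449}{0.08801} \approx 267.18 \\
t_{\text{HUMIDITY}} &= \frac{-0.053222}{0.00325603} \approx -16.346,
  \qquad t^2 \approx 267.19 \approx F \\
\text{C.V.} &= 100 \times \frac{0.29666}{6.02222} \approx 4.926
\end{align*}
```

In simple linear regression the squared t statistic for the slope equals the model F statistic, so the two tests are equivalent.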
16
Confidence limits for the regression coefficient
[Figure: plot of Y against X]
$$s_b = \sqrt{\frac{MSE}{SS_X}}$$
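As a worked example, 99% confidence limits for the slope of the beetle regression can be computed from the estimate and standard error printed in the SAS output. The critical value t(0.005, 7) ≈ 3.499 is taken from standard t tables and does not appear in the source.

```latex
\begin{align*}
b \pm t_{0.005,\,7}\; s_b
  &= -0.053222 \pm 3.499 \times 0.00325603 \\
  &= -0.053222 \pm 0.011393 \\
  &\approx (-0.0646,\; -0.0418)
\end{align*}
```

The interval excludes zero, consistent with the small p-value reported for HUMIDITY.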
17
Confidence Limits for Y
[Figure: plot of Y against X]
$$s_{\hat{Y}} = \sqrt{MSE \left[ \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SS_X} \right]}$$
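A short worked case shows where the confidence band is narrowest. The sketch assumes the standard formula for the standard error of a predicted mean and evaluates it at the mean of X for the beetle data (n = 9, Root MSE = 0.29666, mean WtLoss = 6.02222). The critical value t(0.005, 7) ≈ 3.499 comes from standard t tables.

```latex
\begin{align*}
s_{\hat{Y}} &= \sqrt{MSE \left[ \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SS_X} \right]}
  \;\xrightarrow{\;X_i = \bar{X}\;}\;
  \sqrt{\frac{MSE}{n}} = \frac{0.29666}{3} \approx 0.09889 \\
\hat{Y} \pm t_{0.005,\,7}\; s_{\hat{Y}}
  &= 6.02222 \pm 3.499 \times 0.09889
   \approx 6.022 \pm 0.346
   \approx (5.676,\; 6.368)
\end{align*}
```

Away from the mean of X the second term grows with the squared distance, so the band widens toward both ends of the humidity range.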
18
WtLoss = 8.704 - 0.0532 Humidity
[Figure: WtLoss plotted against Humidity with the fitted line and the 99% confidence limits of the predicted means]
/* 99% CL of predicted means, equivalent to Predicted ± t(α,df) × SE (see Eq) */
plot WtLoss*Humidity / conf;
19
[Figure: WtLoss plotted against Humidity with the fitted line and the 99% prediction limits]
/* 99% prediction limits, equivalent to Predicted ± t(α,df) × STD (with n = 1 in Eq) */
plot WtLoss*Humidity / pred;
20
Regression summary
21
Assumptions
The regression model: $Y_i = \alpha + \beta X_i + \varepsilon_i$
Assumptions
– The error term has a mean of 0, is independent and normally distributed at each value of X, and has the same variance at each value of X (homoscedasticity).
– Y is linearly related to X.
– There is negligible error (e.g., measurement error) in X (see Model II regression).
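One common way to examine these assumptions is to save the residuals and inspect them. The sketch below is an illustration rather than part of the original deck: the output data set `diag` and the variables `Pred` and `Resid` are hypothetical names, and PROC SGPLOT assumes a SAS release that includes it.

```sas
/* Beetle data from the program listing */
data beetle;
  input Humidity WtLoss @@;
  datalines;
0 8.98 12 8.14 29.5 6.67 43 6.08 53 5.9 62.5 5.83 75.5 4.68 85 4.2 93 3.72
;
run;

/* Fit the regression and save fitted values and residuals */
proc reg data=beetle;
  model WtLoss = Humidity;
  output out=diag p=Pred r=Resid;
run;
quit;

/* Normality of the error term: the NORMAL option requests normality tests */
proc univariate data=diag normal;
  var Resid;
run;

/* Linearity and equal variance: residuals should scatter evenly around zero */
proc sgplot data=diag;
  scatter x=Pred y=Resid;
  refline 0 / axis=y;
run;
```

With only nine observations these checks have little power, so the residual plot is usually more informative than the formal tests.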
22
More plot functions
data WtLoss;
  input Humidity WtLoss;
  cards;
0.00 8.98
12.00 8.14
29.50 6.67
43.00 6.08
53.00 5.90
62.50 5.83
75.50 4.68
85.00 4.20
93.00 3.72
;
proc reg;
  model WtLoss=Humidity / alpha=0.01;
  plot WtLoss*Humidity / pred;
  plot residual.*predicted. / symbol='.';
  title 'Simple linear regression of WtLoss on Humidity';
run;
23
3D Scatter plot
data My3D;
  input X Y Z;
  datalines;
25.71428 35 490.25
26.47058 34 1117.0667
27.27272 33 2564.3333
27.77777 36 122.5
28.57142 35 1579.9
29.41176 34 2258.2424
30.30303 33 3814.5185
31.25 32 12411.4167
31.42857 35 57.5833
32.35294 34 4679
33.33333 33 2690.8125
34.28571 35 22243.1667
34.375 32 2103.2255
35.29411 34 7455.1
35.48387 31 2639.0833
36.36363 33 905.9688
37.5 32 7211.1458
38.23529 34 11885.5
38.70967 31 2685.4815
39.39393 33 457.75
40 30 885
40.625 32 10263.5313
41.93548 31 4492.141
42.42424 33 1594
43.33333 30 10838.6333
;
proc g3d;
  scatter X*Y=Z;
run;
24
Spurious Correlation

Liquor Cons     N. Church    City Size
 10041.7887         1          10000
 20096.1752         3          20000
 10041.7887         2          10000
 30083.8478         3          30000
 20096.1752         1          20000
 40014.8096         5          40000
 50060.0323         4          50000
 60043.2171         6          60000
 20096.1752         3          20000
 50060.0323         4          50000
 10041.7887         2          10000
 10041.7887         1          10000
 70096.1250         8          70000
 50060.0323         2          50000
 80064.3763         9          80000
 90094.3248         9          90000
100034.3940        10         100000
110066.0155        10         110000
25
Spurious Correlation
data Liquor;
  input Liquor Church PopSize @@;
  datalines;
10041.7887 1 10000    20096.1752 3 20000
10041.7887 2 10000    30083.8478 3 30000
20096.1752 1 20000    40014.8096 5 40000
50060.0323 4 50000    60043.2171 6 60000
20096.1752 3 20000    50060.0323 4 50000
10041.7887 2 10000    10041.7887 1 10000
70096.1250 8 70000    50060.0323 2 50000
80064.3763 9 80000    90094.3248 9 90000
100034.3940 10 100000 110066.0155 10 110000
;
proc reg;
  model Liquor = PopSize;
run;
proc reg;
  model Liquor = PopSize / NoInt;
run;
Forcing the regression line through the origin changes how $SS_m$ and $SS_t$ are computed: they become uncorrected sums of squares (sumsq) instead of sums of squared deviations from the mean (devsq), i.e., $SS_t = \sum Y_i^2$ rather than $\sum (Y_i - \bar{Y})^2$. One can use the adjusted $R^2$ to choose the model.
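The slide's program regresses liquor consumption on city size only. To see the spurious association directly, one possible sketch (not taken from the deck) compares the raw correlation between liquor consumption and number of churches with the partial correlation after controlling for city size, using the PARTIAL statement of PROC CORR.

```sas
/* Same data as in the slide's program */
data Liquor;
  input Liquor Church PopSize @@;
  datalines;
10041.7887 1 10000    20096.1752 3 20000
10041.7887 2 10000    30083.8478 3 30000
20096.1752 1 20000    40014.8096 5 40000
50060.0323 4 50000    60043.2171 6 60000
20096.1752 3 20000    50060.0323 4 50000
10041.7887 2 10000    10041.7887 1 10000
70096.1250 8 70000    50060.0323 2 50000
80064.3763 9 80000    90094.3248 9 90000
100034.3940 10 100000 110066.0155 10 110000
;
run;

/* Raw pairwise correlations among the three variables */
proc corr data=Liquor;
  var Liquor Church PopSize;
run;

/* Partial correlation of Liquor and Church, holding PopSize constant */
proc corr data=Liquor;
  var Liquor Church;
  partial PopSize;
run;
```

If the association between liquor consumption and churches is driven by city size, the partial correlation should be much weaker than the raw one.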