Xuhua Xia
Correlation and Regression
Introduction to linear correlation and regression
Numerical illustrations
SAS and linear correlation/regression
–CORR
–REG
–GLM
Assumptions of linear correlation/regression
Model II regression
Introduction
Correlation
–Bivariate correlation
–Multiple correlation
–Partial correlation
–Canonical correlation
Regression
–Simple regression
–Multiple regression
–Nonlinear regression
Regression Coefficient
[Table: X and Y values; Sum row: 15 (X), 10 (Y)]
Exercise: change Y to 3, 4, 5, 6, 7 and have students recompute a and b.
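A hedged worked version of the exercise: the individual X values are not listed here, so this sketch assumes X = 1, 2, 3, 4, 5 (consistent with a column sum of 15, but an assumption) and uses the new Y = 3, 4, 5, 6, 7. For the fitted line ŷ = a + bx, the least-squares slope and intercept would be:

```latex
\begin{align*}
\bar{x} &= 15/5 = 3, \qquad \bar{y} = 25/5 = 5 \\
\sum (x_i - \bar{x})^2 &= 4 + 1 + 0 + 1 + 4 = 10 \\
\sum (x_i - \bar{x})(y_i - \bar{y}) &= (-2)(-2) + (-1)(-1) + (0)(0) + (1)(1) + (2)(2) = 10 \\
b &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{10}{10} = 1 \\
a &= \bar{y} - b\,\bar{x} = 5 - (1)(3) = 2
\end{align*}
```

Under that assumption every point lies exactly on ŷ = 2 + x, so all residuals are zero.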
Least-squares method
Least-squares estimate of the sample mean
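A minimal sketch of the idea, assuming the simplest model in which every observation y_i is predicted by a single constant a: the least-squares criterion picks the a that minimizes the sum of squared deviations Q, and that a turns out to be the sample mean.

```latex
\begin{align*}
Q &= \sum_{i=1}^{n} (y_i - a)^2 \\
\frac{dQ}{da} &= -2 \sum_{i=1}^{n} (y_i - a) = 0 \\
\sum_{i=1}^{n} y_i &= n a \\
a &= \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}
\end{align*}
```

Since the second derivative is 2n > 0, this stationary point is a minimum.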
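Least-Squares Estimation of Regression Coefficient
A trick to simplify the estimation: center X on its mean, so that
$$\hat{y}_i = a + b(x_i - \bar{x})$$
$$Q = \sum (y_i - \hat{y}_i)^2 = \sum [y_i - a - b(x_i - \bar{x})]^2$$
$$\frac{\partial Q}{\partial a} = -2 \sum [y_i - a - b(x_i - \bar{x})] = 0$$
$$\frac{\partial Q}{\partial b} = -2 \sum [y_i - a - b(x_i - \bar{x})](x_i - \bar{x}) = 0$$
Because $\sum (x_i - \bar{x}) = 0$, the first equation gives
$$\sum y_i = na; \quad a = \frac{\sum y_i}{n} = \bar{y}$$
Substituting $a = \bar{y}$ into the second equation:
$$\sum [(y_i - \bar{y}) - b(x_i - \bar{x})](x_i - \bar{x}) = 0$$
$$\sum (y_i - \bar{y})(x_i - \bar{x}) = b \sum (x_i - \bar{x})^2$$
$$b = \frac{\sum (y_i - \bar{y})(x_i - \bar{x})}{\sum (x_i - \bar{x})^2}$$
In this centered form $a = \bar{y}$; expanding it gives the intercept of the uncentered line as $\bar{y} - b\bar{x}$.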
Maximum Likelihood Method
R. A. Fisher
Estimation of the proportion of males (p) of a fish species in a pond: two samples are taken, one of 10 fish with 5 males and the other of 12 fish with only 3 males.
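The fish example can be worked through as follows. This is a sketch that assumes the two samples are independent and that each fish is male with the same probability p (a binomial model); those assumptions are not stated on the slide.

```latex
\begin{align*}
L(p) &= \binom{10}{5} p^{5} (1-p)^{5} \times \binom{12}{3} p^{3} (1-p)^{9}
      \;\propto\; p^{8} (1-p)^{14} \\
\ln L(p) &= \text{const} + 8 \ln p + 14 \ln (1-p) \\
\frac{d \ln L}{dp} &= \frac{8}{p} - \frac{14}{1-p} = 0 \\
8(1 - p) &= 14 p \\
\hat{p} &= \frac{8}{22} = \frac{4}{11} \approx 0.364
\end{align*}
```

The estimate equals the pooled proportion of males, 8 out of 22 fish, rather than the average of the two sample proportions (0.5 and 0.25).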
Correlation & Regression Coefficients
[Table: X and Y values with a Sum row]
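For reference, the two coefficients named in this slide title can be written side by side. The notation SS_X, SS_Y and SP_XY is shorthand introduced here for the sums of squares and the sum of cross-products. It is not taken from the slide.

```latex
\begin{align*}
SS_X &= \sum (X_i - \bar{X})^2, \qquad
SS_Y = \sum (Y_i - \bar{Y})^2, \qquad
SP_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) \\
r &= \frac{SP_{XY}}{\sqrt{SS_X \, SS_Y}} \\
b &= \frac{SP_{XY}}{SS_X} = r \sqrt{\frac{SS_Y}{SS_X}}
\end{align*}
```

The slope b and the correlation r always share the same sign. r is symmetric in X and Y, whereas b changes when the roles of the two variables are swapped.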
Regression Coefficient
[Table: X and Y values with a Sum row]
The Beetle Experiment
Regression Coefficient
Partition of variance
[Figure: Y plotted against X, with the total deviation of a point split into an explained deviation and an unexplained deviation]
Partition of SS in Regression
ANOVA test in regression
Perform an ANOVA significance test.
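The partition and the test can be summarized as a sketch for simple linear regression with n observations. The subscripts T, R and E (total, regression, error) are labels chosen here.

```latex
\begin{align*}
(Y_i - \bar{Y}) &= (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) \\
\underbrace{\sum (Y_i - \bar{Y})^2}_{SS_T}
  &= \underbrace{\sum (\hat{Y}_i - \bar{Y})^2}_{SS_R}
   + \underbrace{\sum (Y_i - \hat{Y}_i)^2}_{SS_E} \\
df_T = n - 1 &= df_R + df_E = 1 + (n - 2) \\
F &= \frac{MS_R}{MS_E} = \frac{SS_R / 1}{SS_E / (n - 2)}
\end{align*}
```

The F ratio is compared with the F distribution with 1 and n - 2 degrees of freedom. The proportion of variation explained is R^2 = SS_R / SS_T.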
SAS Program Listing
/* Weight loss (in mg) of 9 batches of 25 Tribolium beetles after six days of starvation at nine different humidities */
data beetle;
  input Humidity WtLoss;
  /* the nine data lines (Humidity WtLoss) go between the cards statement and the closing semicolon */
  cards;
;
proc reg;
  title 'Simple linear regression of WtLoss on Humidity';
  model WtLoss=Humidity / R CLM alpha=0.01 CLI;
  plot WtLoss*Humidity / conf;
  plot WtLoss*Humidity / pred;
  plot residual.*Humidity;
run;
proc glm;
  model WtLoss=Humidity;
  title 'Simple linear regression of WtLoss on Humidity';
run;
SAS Output
Dependent Variable: WTLOSS
Analysis of variance, columns: Source | DF | Sum of Squares | Mean Square | F Value | Prob>F
Analysis of variance, rows: Model, Error, C Total
Summary statistics: Root MSE, R-square, Dep Mean, Adj R-sq, C.V. (= 100 * Root MSE / Mean)
Parameter Estimates, columns: Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > |T|
Parameter Estimates, rows: INTERCEP, HUMIDITY
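As a reading aid, the summary quantities in this output are related to the ANOVA table as sketched below for a simple regression with an intercept and n observations. The formulas are the standard definitions and are not copied from the slide.

```latex
\begin{align*}
\text{Root MSE} &= \sqrt{MS_{\text{Error}}} \\
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{C Total}}} \\
\text{Adj } R^2 &= 1 - (1 - R^2)\,\frac{n - 1}{n - 2} \\
\text{C.V.} &= 100 \times \frac{\text{Root MSE}}{\text{Dep Mean}} \\
T &= \frac{\text{Parameter Estimate}}{\text{Standard Error}}
\end{align*}
```

With a single predictor, the F value in the ANOVA table equals the square of the T statistic for the slope (here HUMIDITY), so the two tests give the same probability.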
Confidence Limits for β
$$s_b = \sqrt{\frac{MSE}{SS_X}}$$
[Figure: Y plotted against X]
Confidence Limits for Y
$$s_{\hat{Y}} = \sqrt{MSE \left[ \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SS_X} \right]}$$
[Figure: Y plotted against X]
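A hedged sketch of how these standard errors become confidence limits, assuming two-sided limits based on the t distribution with the error degrees of freedom (n - 2 in simple regression):

```latex
\begin{align*}
s_b &= \sqrt{\frac{MSE}{SS_X}}, \qquad
s_{\hat{Y}} = \sqrt{MSE \left[ \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SS_X} \right]} \\
\text{limits for the slope:} \quad & b \pm t_{\alpha/2,\; n-2} \; s_b \\
\text{limits for the mean of } Y \text{ at } X_i\text{:} \quad & \hat{Y}_i \pm t_{\alpha/2,\; n-2} \; s_{\hat{Y}}
\end{align*}
```

For the beetle data there are n = 9 batches, so df = 7. With alpha = 0.01, as in the SAS listing, the critical value is t(0.005, 7) ≈ 3.499. The limits for the mean are narrowest at the mean of X and widen as X_i moves away from it.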
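[Figure: WtLoss plotted against Humidity, with the fitted regression line and the 99% confidence limits of the predicted means]
/* 99% CL of predicted means, equivalent to Predicted ± t(α, df) × SE (see Eq.) */
plot WtLoss*Humidity / conf;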
[Figure: WtLoss plotted against Humidity, with the 99% prediction intervals]
/* 99% CL of prediction intervals, equivalent to Predicted ± t(α, df) × STD (with n = 1 in Eq.) */
plot WtLoss*Humidity / pred;
Regression summary
Assumptions
The regression model
$$Y_i = \alpha + \beta X_i + \varepsilon_i$$
Assumptions
–The error term ε has mean 0, is independent and normally distributed at each value of X, and has the same variance at each value of X (homoscedasticity).
–Y is linearly related to X.
–There is negligible error (e.g., measurement error) in X. (When X is also subject to error, use Model II regression.)
More plot functions
data WtLoss;
  input Humidity WtLoss;
  cards;
;
proc reg;
  model WtLoss=Humidity / alpha=0.01;
  plot WtLoss*Humidity / pred;
  plot residual.*predicted. / symbol='.';
  title 'Simple linear regression of WtLoss on Humidity';
run;
3D Scatter plot
data My3D;
  input X Y Z;
  datalines;
;
proc g3d;
  scatter X*Y=Z;
run;
Spurious Correlation
[Table columns: liquor consumption (Liquor Cons), number of churches (N. Church), city size (City Size)]
Spurious Correlation
data Liquor;
  input Liquor Church PopSize;
  datalines;
;
proc reg;
  model Liquor = PopSize;
run;
proc reg;
  model Liquor = PopSize / NoInt;
run;
Forcing the regression line through the origin changes how SS_m and SS_t are computed: they become sums of squares (sumsq) instead of sums of squared deviations from the mean (devsq), i.e.,
$$SS_t = \sum Y_i^2, \qquad SS_m = \sum \hat{Y}_i^2$$
One can use the adjusted R² to choose the model.
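To make the sumsq/devsq distinction concrete, the two sets of sums of squares can be compared as below. This is a sketch using the standard definitions. The subscript 0 for the no-intercept model is a label chosen here.

```latex
\begin{align*}
\text{With intercept (devsq):} \quad
  & SS_t = \sum (Y_i - \bar{Y})^2, \qquad SS_m = \sum (\hat{Y}_i - \bar{Y})^2 \\
\text{With NoInt (sumsq):} \quad
  & SS_{t,0} = \sum Y_i^2, \qquad SS_{m,0} = \sum \hat{Y}_i^2 \\
\text{Link between the two totals:} \quad
  & \sum Y_i^2 = \sum (Y_i - \bar{Y})^2 + n \bar{Y}^2
\end{align*}
```

Because the no-intercept total includes the extra term n Ȳ², an R² computed as SS_m / SS_t is measured against a different baseline in the two models. It is worth keeping this in mind when comparing R² or adjusted R² between a model with an intercept and one without.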