Partial Regression Plots. Life Insurance Example: (nknw364.sas) Y = the amount of life insurance for the 18 managers (in $1000) X 1 = average annual income.

Slides:



Advertisements
Similar presentations
A. The Basic Principle We consider the multivariate extension of multiple linear regression – modeling the relationship between m responses Y 1,…,Y m and.
Advertisements

Introduction to Predictive Modeling with Examples Nationwide Insurance Company, November 2 D. A. Dickey.
Topic 15: General Linear Tests and Extra Sum of Squares.
CS Example: General Linear Test (cs2.sas)
EPI 809/Spring Probability Distribution of Random Error.
Topic 3: Simple Linear Regression. Outline Simple linear regression model –Model parameters –Distribution of error terms Estimation of regression parameters.
Creating Graphs on Saturn GOPTIONS DEVICE = png HTITLE=2 HTEXT=1.5 GSFMODE = replace; PROC REG DATA=agebp; MODEL sbp = age; PLOT sbp*age; RUN; This will.
Some Terms Y =  o +  1 X Regression of Y on X Regress Y on X X called independent variable or predictor variable or covariate or factor Which factors.
Multiple regression analysis
Matrix A matrix is a rectangular array of elements arranged in rows and columns Dimension of a matrix is r x c  r = c  square matrix  r = 1  (row)
Reading – Linear Regression Le (Chapter 8 through 8.1.6) C &S (Chapter 5:F,G,H)
Example: Price Analysis for Diamond Rings in Singapore Variables Response Variable: Price in Singapore dollars (Y) Explanatory Variable: Weight of diamond.
EPI809/Spring Testing Individual Coefficients.
This Week Continue with linear regression Begin multiple regression –Le 8.2 –C & S 9:A-E Handout: Class examples and assignment 3.
Distribution of X: (nknw096) data toluca; infile 'H:\CH01TA01.DAT'; input lotsize workhrs; seq=_n_; proc print data=toluca; run; Obslotsizeworkhrsseq
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation.
Topic 18: Model Selection and Diagnostics
1 MULTI VARIATE VARIABLE n-th OBJECT m-th VARIABLE.
Topic 16: Multicollinearity and Polynomial Regression.
Topic 28: Unequal Replication in Two-Way ANOVA. Outline Two-way ANOVA with unequal numbers of observations in the cells –Data and model –Regression approach.
© 2002 Prentice-Hall, Inc.Chap 14-1 Introduction to Multiple Regression Model.
Topic 2: An Example. Leaning Tower of Pisa Construction began in 1173 and by 1178 (2 nd floor), it began to sink Construction resumed in To compensate.
1 Experimental Statistics - week 10 Chapter 11: Linear Regression and Correlation.
Topic 7: Analysis of Variance. Outline Partitioning sums of squares Breakdown degrees of freedom Expected mean squares (EMS) F test ANOVA table General.
Lecture 4 SIMPLE LINEAR REGRESSION.
12a - 1 © 2000 Prentice-Hall, Inc. Statistics Multiple Regression and Model Building Chapter 12 part I.
Topic 14: Inference in Multiple Regression. Outline Review multiple linear regression Inference of regression coefficients –Application to book example.
1 Experimental Statistics - week 10 Chapter 11: Linear Regression and Correlation Note: Homework Due Thursday.
6-3 Multiple Regression Estimation of Parameters in Multiple Regression.
Regression Examples. Gas Mileage 1993 SOURCES: Consumer Reports: The 1993 Cars - Annual Auto Issue (April 1993), Yonkers, NY: Consumers Union. PACE New.
Regression For the purposes of this class: –Does Y depend on X? –Does a change in X cause a change in Y? –Can Y be predicted from X? Y= mX + b Predicted.
2014. Engineers often: Regress data  Analysis  Fit to theory  Data reduction Use the regression of others  Antoine Equation  DIPPR We need to be.
Nonlinear Models. Learning Example: knnl533.sas Y = relative efficiency of production of a new product (1/expected cost) X 1 : Location A : X 1 = 1, B:
Topic 17: Interaction Models. Interaction Models With several explanatory variables, we need to consider the possibility that the effect of one variable.
6-1 Introduction To Empirical Models Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is.
Anaregweek11 Regression diagnostics. Regression Diagnostics Partial regression plots Studentized deleted residuals Hat matrix diagonals Dffits, Cook’s.
1 Fourier Series (Spectral) Analysis. 2 Fourier Series Analysis Suitable for modelling seasonality and/or cyclicalness Identifying peaks and troughs.
Topic 6: Estimation and Prediction of Y h. Outline Estimation and inference of E(Y h ) Prediction of a new observation Construction of a confidence band.
6-3 Multiple Regression Estimation of Parameters in Multiple Regression.
Topic 13: Multiple Linear Regression Example. Outline Description of example Descriptive summaries Investigation of various models Conclusions.
Hormone Example: nknw892.sas Y = change in growth rate after treatment Factor A = gender (male, female) Factor B = bone development level (severely depressed,
Topic 23: Diagnostics and Remedies. Outline Diagnostics –residual checks ANOVA remedial measures.
Bread Example: nknw817.sas Y = number of cases of bread sold (sales) Factor A = height of shelf display (bottom, middle, top) Factor B = width of shelf.
ANOVA: Graphical. Cereal Example: nknw677.sas Y = number of cases of cereal sold (CASES) X = design of the cereal package (PKGDES) r = 4 (there were 4.
Simple Linear Regression. Data available : (X,Y) Goal : To predict the response Y. (i.e. to obtain the fitted response function f(X)) Least Squares Fitting.
1 Experimental Statistics - week 12 Chapter 12: Multiple Regression Chapter 13: Variable Selection Model Checking.
Sigmoidal Response (knnl558.sas). Programming Example: knnl565.sas Y = completion of a programming task (1 = yes, 0 = no) X 2 = amount of programming.
Example 1 (knnl917.sas) Y = months survival Treatment (3 levels)
EXTRA SUMS OF SQUARES  Extra sum squares  Decomposition of SSR  Usage of Extra sum of Squares  Coefficient of partial determination.
Statistical Data Analysis 2010/2011 M. de Gunst Lecture 9.
1 Experimental Statistics - week 13 Multiple Regression Miscellaneous Topics.
Multiple Regression Example A hospital administrator wished to study the relation between patient satisfaction (Y) and the patient’s age (X 1 ), severity.
Experimental Statistics - week 9
Topic 29: Three-Way ANOVA. Outline Three-way ANOVA –Data –Model –Inference.
Interview Example: nknw964.sas Y = rating of a job applicant Factor A represents 5 different personnel officers (interviewers) n = 4.
Venn diagram shows (R 2 ) the amount of variance in Y that is explained by X. Unexplained Variance in Y. (1-R 2 ) =.36, 36% R 2 =.64 (64%)
1 Experimental Statistics - week 12 Chapter 11: Linear Regression and Correlation Chapter 12: Multiple Regression.
1 Experimental Statistics - week 11 Chapter 11: Linear Regression and Correlation.
Stats Methods at IC Lecture 3: Regression.
Multiple Linear Regression.
بحث في التحليل الاحصائي SPSS بعنوان :
Regression.
Regression.
Multiple Linear Regression
Regression.
Regression Chapter 8.
Regression.
Regression.
REGRESSION DIAGNOSTICS
Presentation transcript:

Partial Regression Plots

Life Insurance Example: (nknw364.sas) Y = the amount of life insurance for the 18 managers (in $1000) X 1 = average annual income (in $1000) X 2 = risk aversion score (0 – 10)

Life Insurance: Input, diagnostics title1 h=3 'Insurance'; data insurance; infile 'I:\My Documents\Stat 512\CH10TA01.DAT'; input income risk amount; run; proc print data=insurance; run; *diagnostics; title2 h=2 'residual plots'; symbol1 v=circle c=black; proc reg data=insurance; model amount = income risk/r p; plot r.*(p. income risk); run;

Life Insurance: output, diagnostics

Life Insurance: Initial Regression Analysis of Variance SourceDF Sum of Squares Mean Square F ValuePr > F Model <.0001 Error Corrected Total Root MSE R-Square Dependent Mean Adj R-Sq Coeff Var Parameter Estimates VariableDF Parameter Estimate Standard Error t ValuePr > |t| Intercept <.0001 income <.0001 risk

Life Insurance: Scatter plot title2 h=2 'Scatterplot'; proc sgscatter data=insurance; matrix income risk amount; run;

Life Insurance – Residual Plots

Life Insurance – Residual Plots (cont)

Life Insurance: Partial Regression Plots (1) proc reg data=insurance; model amount=income risk/partial; run;

Life Insurance: Partial Regression Plots (1)

Life Insurance: Partial Regression Plots (2) risk title1 h=3 'Partial residual plot'; title2 h=2 'for risk'; symbol1 v=circle i=rl; axis1 label=(h=2 'Risk Aversion Score'); axis2 label=(h=2 angle=90 'Amount of Insurance'); proc reg data=insurance; model amount risk = income; output out=partialrisk r=resamt resrisk; proc gplot data=partialrisk; plot resamt*resrisk / haxis=axis1 vaxis=axis2 vref = 0; run;

Life Insurance: Partial Regression Plots (2) risk (cont)

Life Insurance: Partial Regression Plots (2) income axis3 label=(h=2 'Income'); title2 h=2 'for income'; proc reg data=insurance; model amount income = risk; output out=partialincome r=resamt resinc; proc gplot data=partialincome; plot resamt*resinc / haxis=axis3 vaxis=axis2 vref = 0; run;

Life Insurance: Partial Regression Plots (2) income (cont)

Life Insurance: Quadratic title1 'Quadratic model'; title2 ''; data quad; set insurance; sinc = income; proc standard data=quad out=quad mean=0; var sinc; data quad; set quad; incomesq = sinc*sinc; proc corr data=quad; var amount risk income incomesq; run; proc reg data=quad; model amount = income risk incomesq; run;

Life Insurance: Quadratic (regression) Analysis of Variance SourceDF Sum of Squares Mean Square F ValuePr > F Model <.0001 Error Corrected Total Root MSE R-Square Dependent Mean Adj R-Sq Coeff Var Parameter Estimates VariableDF Parameter Estimate Standard Error t ValuePr > |t| Intercept <.0001 income <.0001 risk <.0001 incomesq <.0001

Life Insurance: Quadratic (residual plots)

Life Insurance: normality Original ModelWith Quadratic Term

Types of Outliers

Life Insurance: Studentized Residuals (nknw364.sas) proc reg data=quad; model amount=income risk incomesq/r; output out = diag r=resid student=student; run; proc print data=diag; run;

Life Insurance: Studentized Residuals (cont) Output Statistics Obs Dependent Variable Predicted Value Std Error Mean Predict Residual Std Error Residual Student Residual |******| | | |* | | | | | | | | |* | | |* | | | | | | | | |* | | |* | | | | | *| | | | | | *| | | |** | | *| | | | | | | |

Life Insurance: Studentized Residuals (cont) Obsincomeriskamountsincincomesqresidstudent

Life Insurance: Studentized Deleted Residuals proc reg data=quad; model amount=income risk incomesq/r influence; output out = diag1 r=resid rstudent=rstudent; run; proc print data=diag1; run;

Studentized Deleted Residuals (cont) Obs Dependent Variable Predicted Value RStudent Hat Diag H

Studentized Deleted Residuals (cont) Sum of Residuals0 Sum of Squared Residuals Predicted Residual SS (PRESS)

Studentized Deleted Residuals (cont) Obsincomeriskamountsincincomesqresidrstudent

Studentized Deleted Residuals: w/o square Obs Dependent Variable Predicted Value RStudent Hat Diag H

/r vs. /influence /r keyword /influence keyword Obs Dependent Variable Predicted Value Std Error Mean Predict Residual Std Error Residual Student Residual bar graph Cook's D ObsResidualRStudent Hat Diag H Cov Ratio DFFITS DFBETAS all parameters

Hat Matrix Diagnosis, DFFITS ObsResidualRStudentHat Diag HCov RatioDFFITS

Cook’s Distance, DFBetas, Cov Ratio Obs Cook's D Cov Ratio DFFITS DFBETAS Interceptincomeriskincomesq

Life Insurance: Multicollinearity proc reg data=quad; model amount=income risk incomesq/tol vif; run; Parameter Estimates VariableDF Parameter Estimate Standard Error t ValuePr > |t|Tolerance Variance Inflation Intercept < income < risk < incomesq <

Body Fat: Multicollinearity (nknw260b.sas) data bodyfat; infile 'I:\My Documents\Stat 512\CH07TA01.DAT'; input skinfold thigh midarm fat; proc print data=bodyfat; run; proc reg data=bodyfat; model fat=skinfold thigh midarm/vif tol; run; Parameter Estimates VariableDFParameter Estimate Standard Error t ValuePr > |t|ToleranceVariance Inflation Intercept skinfold thigh midarm

Blood Pressure Example: Background (nknw406.sas) Researching the relationship between blood pressure in healthy women ages 20 – 60. Y = diastolic blood pressure (diast) X = age n = 54

Blood Pressure: input data pressure; infile ‘H:\My Documents\Stat 512\CH11TA01.DAT'; input age diast; proc print data=pressure; run; title1 h=3 'Blood Pressure'; title2 h=2 'Scatter plot'; symbol1 v=circle i=sm70 c=purple; axis1 label=(h=2); axis2 label=(h=2 angle=90); proc sort data=pressure; by age; proc gplot data=pressure; plot diast*age; run;

Blood Pressure: Scatterplot

Blood Pressure: regression (unweighted) proc reg data=pressure; model diast=age / clb; output out=diag r=resid; run; Analysis of Variance SourceDF Sum of Squares Mean Square F ValuePr > F Model <.0001 Error Corrected Total Root MSE R-Square Dependent Mean Adj R-Sq Parameter Estimates VariableDF Parameter Estimate Standard Error t ValuePr > |t|95% Confidence Limits Intercept < age <

Blood Pressure: Residual Plots data diag; set diag; absr=abs(resid); sqrr=resid*resid; title2 h=2 'residual abs(resid) squared residual plots vs. age'; proc gplot data=diag; plot (resid absr sqrr)*age/haxis=axis1 vaxis=axis2; run;

Blood Pressure: Residual Plots (cont)

Blood Pressure: computing weights proc reg data=diag; model absr=age; output out=findweights p=shat; data findweights; set findweights; wt=1/(shat*shat);

Blood Pressure: computing weights if using resid 2 proc reg data=diag; model sqrr=age; output out=findweights p=shat2; data findweights; set findweights; wt=1/shat2;

Blood Pressure: weighted regression proc reg data=findweights; model diast=age / clb p; weight wt; output out = weighted r = resid p = predict; run; Analysis of Variance SourceDF Sum of Squares Mean Square F ValuePr > F Model <.0001 Error Corrected Total Root MSE R-Square Dependent Mean Adj R-Sq Parameter Estimates VariableDF Parameter Estimate Standard Error t ValuePr > |t|95% Confidence Limits Intercept < age <

Blood pressure: Comparison Normal Regression Weighted Regression Parameter Estimates VariableDF Parameter Estimate Standard Error t ValuePr > |t|95% Confidence Limits Intercept < age < Parameter Estimates VariableDF Parameter Estimate Standard Error t ValuePr > |t|95% Confidence Limits Intercept < age <

Blood Pressure: new residuals data graphtest; set weighted; resid1 = sqrt(wt)*resid; title2 h=2 'Weighted data - residual plot'; symbol1 v=circle i=none color=red; proc gplot data=graphtest; plot resid1*predict/vref=0 haxis=axis1 vaxis=axis2; run;

Blood Pressure: new residuals

Biased vs. Unbiased Estimators

Body Fat Example (ridge.sas) n = 20 healthy female subjects ages of 25 – 34 Y = body fat (fat) X 1 = triceps skinfold thickness (skinfold) X 2 = thigh circumference (thigh) X 3 = midarm circumference (midarm) Previous Conclusion: Problem with multicollinearity Good model with a) thigh only or with b) midarm and skinfold only

Body Fat Example: Regression (input) data bodyfat; infile 'I:\My Documents\Stat 512\CH07TA01.DAT'; input skinfold thigh midarm fat; proc print data=bodyfat; run; proc reg data=bodyfat; model fat=skinfold thigh midarm; run;

Body Fat Example: Regression (output) Analysis of Variance SourceDFSum of Squares Mean Square F ValuePr > F Model <.0001 Error Corrected Total Root MSE R-Square Dependent Mean Adj R-Sq Coeff Var Parameter Estimates VariableDFParameter Estimate Standard Error t ValuePr > |t| Intercept skinfold thigh midarm

Body Fat Example: Scatter plot

Body Fat Example: Correlation proc corr data=bodyfat noprob;run; Pearson Correlation Coefficients, N = 20 skinfoldthighmidarmfat skinfold thigh midarm fat

Body Fat Example: Ridge trace title1 h=3 'Ridge Trace'; title2 h=2 'Body Fat Example'; axis1 label=(h=2); axis2 label= (h=2 angle=90); symbol1 v = S i = none c = black; symbol2 v = T i = none c = red; symbol3 v = M i = none c = green; proc reg data = bodyfat outvif outest = bfout ridge = 0 to.1 by 0.002; model fat = skinfold thigh midarm / noprint; plot / ridgeplot nomodel nostat; run;

Body Fat Example: Ridge trace (cont)

Body Fat Example: VIF factors title2 h=2 'Variance Inflation Factors'; proc gplot data = bfout; plot (skinfold thigh midarm)* _RIDGE_ / overlay haxis=axis1 vaxis=axis2; where _TYPE_ = 'RIDGEVIF'; run;

Body Fat Example: VIF factors (cont) proc print data = bfout; var _RIDGE_ skinfold thigh midarm; where _TYPE_ = 'RIDGEVIF'; Obs_RIDGE_skinfoldthighmidarm

Body Fat Example: Parameters title2 'Parameter Estimates'; proc print data = bfout; var _RIDGE_ _RMSE_ Intercept skinfold thigh midarm; where _TYPE_ = 'RIDGE'; run; Obs_RIDGE__RMSE_Interceptskinfoldthighmidarm