Prediction, Correlation, and Lack of Fit in Regression (§11.4, 11

Outline: confidence intervals and prediction intervals; regression assumptions; checking assumptions (model adequacy); correlation; influential observations.

Prediction Our regression model is yi = b0 + b1 xi + ei, so that the average value of the response at X = x is E(Y | X = x) = b0 + b1 x. The data (repair time yi versus number of components xi):

i:   1   2   3   4   5   6   7    8    9   10   11   12   13   14
xi:  1   2   4   4   4   5   6    6    8    8    9    9   10   10
yi: 23  29  64  72  80  87  96  105  127  119  145  149  165  154
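As a sketch (not part of the original slides), the least-squares fit for this data can be computed directly from the standard formulas b1 = Sxy/Sxx and b0 = ybar - b1*xbar:

```python
import numpy as np

# Repair-time data from the slide: x = number of components, y = repair time.
x = np.array([1, 2, 4, 4, 4, 5, 6, 6, 8, 8, 9, 9, 10, 10], dtype=float)
y = np.array([23, 29, 64, 72, 80, 87, 96, 105, 127, 119, 145, 149, 165, 154], dtype=float)

# Least-squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
print(f"fitted line: yhat = {b0:.2f} + {b1:.2f} x")
# -> fitted line: yhat = 7.71 + 15.20 x
```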

The estimated average response at X = x is therefore yhat_x = b0 + b1 x, the estimated expected value. This quantity is a statistic, a random variable, hence it has a sampling distribution. Under the regression assumption of normally distributed errors e, the sample estimate has associated variance s^2 (1/n + (x - xbar)^2 / Sxx), where s^2 = MSE. A (1 - a)100% CI for the average response at X = x is therefore yhat_x ± t(a/2, n-2) s sqrt(1/n + (x - xbar)^2 / Sxx).

Prediction and Predictor Confidence The best predictor of an individual response y at X = x, yhat_{x,pred}, is simply the estimated average response at X = x. Both are random variables: they vary from sample to sample. The variance associated with an individual prediction is larger than that for the mean value. Why? A new observation carries its own error e in addition to the uncertainty in the estimated mean, so the predicted value is also a random variable. A (1 - a)100% prediction interval for an individual response at X = x is yhat_x ± t(a/2, n-2) s sqrt(1 + 1/n + (x - xbar)^2 / Sxx).
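A sketch of both intervals for the repair-time data, using the formulas above; the evaluation point x0 = 7 and the 95% level are illustrative assumptions, not from the slides:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 4, 4, 4, 5, 6, 6, 8, 8, 9, 9, 10, 10], dtype=float)
y = np.array([23, 29, 64, 72, 80, 87, 96, 105, 127, 119, 145, 149, 165, 154], dtype=float)
n = len(x)
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar

# Residual mean square s^2 = MSE on n - 2 degrees of freedom.
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

x0 = 7.0                                # point at which to predict (illustrative)
fit = b0 + b1 * x0
t = stats.t.ppf(0.975, n - 2)           # t(a/2, n-2) for a 95% interval
se_mean = np.sqrt(mse * (1 / n + (x0 - xbar) ** 2 / Sxx))      # CI for the mean
se_pred = np.sqrt(mse * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx))  # PI: extra "1"
ci = (fit - t * se_mean, fit + t * se_mean)
pi = (fit - t * se_pred, fit + t * se_pred)
print(ci, pi)  # the prediction interval is strictly wider than the CI
```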

Prediction band: what we would expect for one new observation at X = x. Confidence band: what we would expect for the mean of many observations taken at the value X = x.

Regression Assumptions and Lack of Fit Regression model assumptions: effect additivity (multiple regression); normality of the residuals; homoscedasticity of the residuals; independence of the residuals.

Additivity The additivity assumption: "The expected value of an observation is a weighted linear combination of a number of factors." Which factors? (model uncertainty) The number of factors in the model, interactions of factors, and powers or transformations of factors.

Homoscedasticity and Normality Observations never equal their expected values, but there should be no systematic biases. Homoscedasticity assumption: the unexplained component has a common variance for all observations i. Normality assumption: the unexplained component has a normal distribution.

Independence Independence assumption. Responses in one experimental unit are not correlated with, affected by, or related to, responses for other experimental units.

Correlation Coefficient A measure of the strength of the linear relationship between two variables. The product-moment correlation coefficient is r = Sxy / sqrt(Sxx Syy). In SLR, r is related to the slope of the fitted regression equation: b1 = r sqrt(Syy / Sxx). r^2 (or R^2) represents the proportion of the total variability of the Y-values that is accounted for by the linear regression on the independent variable X: the proportion of variability in Y explained by X.

Properties of r 1. r lies between -1 and +1. r > 0 indicates a positive linear relationship; r < 0 indicates a negative linear relationship; r = 0 indicates no linear relationship; r = ±1 indicates a perfect linear relationship. 2. The larger the absolute value of r, the stronger the linear relationship. 3. r^2 lies between 0 and 1.
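For the repair-time data, r and r^2 can be checked numerically; a sketch using NumPy's product-moment correlation routine:

```python
import numpy as np

x = np.array([1, 2, 4, 4, 4, 5, 6, 6, 8, 8, 9, 9, 10, 10], dtype=float)
y = np.array([23, 29, 64, 72, 80, 87, 96, 105, 127, 119, 145, 149, 165, 154], dtype=float)

r = np.corrcoef(x, y)[0, 1]  # product-moment correlation, r = Sxy / sqrt(Sxx * Syy)
r2 = r ** 2                  # proportion of variability in y explained by x
print(r, r2)                 # r close to +1: strong positive linear relationship
```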

Checking Assumptions How well does the model fit? Do predicted values fall in the middle of the observed values? Do the residuals satisfy the regression assumptions? (Problems seen in a plot of X vs. Y will be reflected in the residual plot.) Ask of the residual plot: Is the variance constant? Are there regularities suggestive of lack of independence or of a more complex model? Are there poorly fit observations?

Model Adequacy: Studentized Residuals The studentized residual ei* = ei / sqrt(MSE(i) (1 - hi)) allows us to gauge whether a residual is too large. It should have approximately a standard normal distribution, hence it is very unlikely that any studentized residual lies outside the range [-3, 3]. MSE(i) is the MSE calculated leaving observation i out of the computations, and hi is the ith diagonal element of the projection matrix for the predictor space (the ith hat diagonal element).
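A sketch of the computation for simple linear regression, where hi = 1/n + (xi - xbar)^2/Sxx and MSE(i) comes from the standard leave-one-out identity (these formulas are standard results, not from the slides):

```python
import numpy as np

x = np.array([1, 2, 4, 4, 4, 5, 6, 6, 8, 8, 9, 9, 10, 10], dtype=float)
y = np.array([23, 29, 64, 72, 80, 87, 96, 105, 127, 119, 145, 149, 165, 154], dtype=float)
n, p = len(x), 2
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar

e = y - (b0 + b1 * x)                # simple residuals
mse = np.sum(e ** 2) / (n - p)
h = 1 / n + (x - xbar) ** 2 / Sxx    # hat diagonal elements

# MSE(i) via the leave-one-out identity:
# (n - p) MSE = (n - p - 1) MSE(i) + e_i^2 / (1 - h_i)
mse_i = ((n - p) * mse - e ** 2 / (1 - h)) / (n - p - 1)
t = e / np.sqrt(mse_i * (1 - h))     # studentized (deleted) residuals
print(np.round(t, 2))                # all should lie in [-3, 3]
```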

Normality of Residuals Formal goodness-of-fit tests: Kolmogorov-Smirnov test, Shapiro-Wilk test (n < 50), D'Agostino's test (n >= 50). All are quite conservative: they fail to reject the hypothesis of normality more often than they should. Graphical approach: the quantile-quantile plot (QQ-plot). 1. Compute and sort the simple residuals e[1], e[2], ..., e[n]. 2. Associate to each residual a standard normal quantile, z[i] = normsinv((i - 0.5)/n). 3. Plot z[i] versus e[i] and compare to the 45° line.
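The three-step QQ recipe above can be sketched numerically; here the plot itself is replaced by the correlation between the sorted residuals and the normal quantiles, which should be close to 1 when the residuals are approximately normal (normsinv corresponds to scipy's stats.norm.ppf):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 4, 4, 4, 5, 6, 6, 8, 8, 9, 9, 10, 10], dtype=float)
y = np.array([23, 29, 64, 72, 80, 87, 96, 105, 127, 119, 145, 149, 165, 154], dtype=float)
n = len(x)
xbar = x.mean()
b1 = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)
b0 = y.mean() - b1 * xbar

# Steps 1-3 of the slide's recipe.
e_sorted = np.sort(y - (b0 + b1 * x))                  # 1. sort the residuals
z = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)    # 2. normal quantiles
qq_corr = np.corrcoef(z, e_sorted)[0, 1]               # 3. straightness of the plot
print(qq_corr)  # close to 1 suggests approximately normal residuals
```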

Influence Diagnostics (ways to detect influential observations) Does a particular observation, consisting of a pair of (X, Y) values (a case), have undue influence on the fit of the regression model? That is, which cases greatly affect the estimates of the p regression parameters in the model? (For simple linear regression, p = 2.) Standardized/studentized residuals: the ei* are used to detect cases that are outlying with respect to their Y values; check cases with |ei*| > 2 or 3. Hat diagonal elements: the hi are used to detect cases that are outlying with respect to their X values; check cases with hi > 2p/n.

Dffits measures the influence that the ith case has on the ith fitted value, comparing the ith fitted value with the ith fitted value obtained by omitting the ith case; check cases for which |Dffits| > 2 sqrt(p/n). Cook's distance is similar to Dffits, but considers instead the influence of the ith case on all n fitted values; check when Cook's distance > F(p, n-p, 0.50). The covariance ratio is the change in the determinant of the covariance matrix that occurs when the ith case is deleted; check cases with |CovRatio - 1| >= 3p/n. Dfbetas is a measure of the influence of the ith case on each estimated regression parameter; for each regression parameter, check cases with |Dfbeta| > 2/sqrt(n).
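A sketch computing two of these diagnostics and the rule-of-thumb cutoffs for the repair-time data (n = 14, p = 2), using the standard formulas: Dffits from the studentized deleted residual, Cook's distance from the internally standardized one:

```python
import numpy as np

x = np.array([1, 2, 4, 4, 4, 5, 6, 6, 8, 8, 9, 9, 10, 10], dtype=float)
y = np.array([23, 29, 64, 72, 80, 87, 96, 105, 127, 119, 145, 149, 165, 154], dtype=float)
n, p = len(x), 2
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar
e = y - (b0 + b1 * x)
mse = np.sum(e ** 2) / (n - p)
h = 1 / n + (x - xbar) ** 2 / Sxx             # hat diagonals

r = e / np.sqrt(mse * (1 - h))                # internally standardized residuals
mse_i = ((n - p) * mse - e ** 2 / (1 - h)) / (n - p - 1)
t = e / np.sqrt(mse_i * (1 - h))              # studentized deleted residuals
dffits = t * np.sqrt(h / (1 - h))             # influence on the ith fitted value
cooks_d = r ** 2 * h / (p * (1 - h))          # influence on all n fitted values

# Rule-of-thumb cutoffs from the slide.
cut_hat, cut_dffits = 2 * p / n, 2 * np.sqrt(p / n)
cut_dfbetas, cut_covratio = 2 / np.sqrt(n), 3 * p / n
print(np.round([cut_hat, cut_covratio, cut_dffits, cut_dfbetas], 2))
# -> [0.29 0.43 0.76 0.53], matching the cutoffs quoted on the next slide
```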

Cutoffs for the repair-time data (n = 14, p = 2): hat = 0.29, CovRatio = 0.43, Dffits = 0.76, Dfbetas = 0.53.

(Figure: diagnostic plots with observations 1, 2, and 5 labeled.)