Chapter 4, Regression Diagnostics: Detection of Model Violation
Lecture 13. In Chapter 2 we studied SLR models and in Chapter 3 we studied MLR models, and we obtained many useful results about estimation and statistical inference. However, all these results are VALID only when some required assumptions are satisfied. Questions: 1. What are these required assumptions? 2. How can we detect whether these assumptions are violated? 3. What happens if these assumptions are violated?
Answer to Question 3: when some required assumptions are violated,
1) the theory is no longer valid, and
2) applying it will lead to misleading results.
We will see some examples later. Thus we need to answer Questions 1 and 2.
Aims of Chapter 4:
1) State the standard regression assumptions.
2) Study the methods used to detect model violations.
Standard Regression Assumptions:
1. The Linearity Assumption (about the form of the model)
2. The Measurement Error Assumption (about the measurement errors)
3. The Predictor Assumption (about the predictor variables)
4. The Observation Assumption (about the observations)
1. The Form (Linearity) Assumption: the i-th observation can be written as
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, for i = 1, ..., n.
Detection Method: For SLR (p = 1), use the scatter plot of Y against X to check the linearity; a roughly linear scatter plot supports the linearity assumption. For MLR (p > 1), this is a more difficult task; we may be able to use the matrix of scatter plots of Y against X1, X2, …, Xp. A minimal sketch of both plots follows below.
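As a quick, hedged illustration of the graphical checks above (not part of the original slides), here is a minimal Python sketch using simulated, hypothetical data: a scatter plot of Y against X for the SLR case and a scatter-plot matrix for the MLR case.

# Minimal sketch of the graphical linearity check (hypothetical simulated data).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.normal(0, 2, n)        # a linear relationship plus noise

# SLR (p = 1): scatter plot of Y against X.
plt.scatter(x, y)
plt.xlabel("X"); plt.ylabel("Y")
plt.title("Scatter plot of Y against X (linearity check)")
plt.show()

# MLR (p > 1): scatter-plot matrix of Y and the predictors.
df = pd.DataFrame({"Y": y, "X1": x, "X2": rng.uniform(0, 5, n)})
pd.plotting.scatter_matrix(df, figsize=(6, 6))
plt.show()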
2. The Measurement Error Assumption: the errors \varepsilon_1, …, \varepsilon_n are assumed to be iid N(0, \sigma^2), where iid means Independently and Identically Distributed. This assumption implies 4 sub-assumptions:
a) Normality: the errors are normally distributed. Detection Method: normal probability (Q-Q) plot of the residuals (a sketch is given below).
b) Zero mean: the errors have mean 0. Detection Method: plot of the residuals against the fitted values.
c) Constant variance: the errors have the same, but unknown, variance \sigma^2. When this assumption is violated, the problem is called the heterogeneity or heteroscedasticity problem. Detection Method: see Chapter 7.
d) Independence: the errors are independent of each other. When this assumption is violated, the problem is called the autocorrelation problem. Detection Method: see Chapter 8.
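A hedged sketch (not from the slides) of the normality check mentioned in sub-assumption a): a normal probability (Q-Q) plot of the residuals of a fitted model. The data and the fitted SLR below are simulated stand-ins.

# Sketch: normal Q-Q plot of residuals for a fitted SLR (simulated data).
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 60)
y = 1 + 2 * x + rng.normal(0, 1, 60)

X = sm.add_constant(x)                  # design matrix with an intercept column
fit = sm.OLS(y, X).fit()

# Points lying close to a straight line support the normality sub-assumption.
sm.qqplot(fit.resid, line="s")
plt.title("Normal Q-Q plot of residuals")
plt.show()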
3. The Predictor Assumption contains 3 sub-assumptions:
a) The Non-random Assumption: X1, X2, …, Xp are assumed to be non-random, i.e., fixed or selected in advance (design data), as opposed to non-design or observational data. When this assumption is violated, all inferences remain valid, but only conditional on the observed data. Detection Method: beyond our consideration in this course; in this course we assume this condition is always satisfied.
b) The No-Measurement-Error Assumption: X1, X2, …, Xp can be observed accurately, i.e., measured without error. This assumption is hardly ever satisfied exactly; when it is violated, the measurement error affects the residual variance, the coefficient estimates, and the fitted values. Detection Method: beyond our consideration in this course; in this course we assume this condition is always satisfied.
c) The Linear Independence Assumption: X1, X2, …, Xp are assumed to be linearly independent of each other. This assumption guarantees the uniqueness of the Least Squares estimates of the regression coefficients; when it is violated, the least squares estimates of the regression coefficients have multiple solutions. Detection Method: check whether the design matrix is of full rank (a numerical sketch is given below).
4. The Observation Assumption: all observations are equally reliable and play an approximately equal role in determining the regression results and in influencing the conclusions.
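The full-rank check in the detection method above can be carried out numerically. A minimal sketch with NumPy, using a small hypothetical design matrix in which X3 = X1 + X2, so the columns are exactly linearly dependent:

# Sketch: checking whether the design matrix has full column rank.
import numpy as np

X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
X3 = X1 + X2                             # exactly linearly dependent on X1 and X2

X = np.column_stack([np.ones_like(X1), X1, X2, X3])   # intercept column + predictors

rank = np.linalg.matrix_rank(X)
print(f"rank = {rank}, number of columns = {X.shape[1]}")
if rank < X.shape[1]:
    print("Design matrix is NOT of full rank: the least squares estimates are not unique.")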
Consequences of the Violations of the Assumptions:
In general, minor violations of the assumptions do not invalidate the inferences or conclusions too much. However, gross violations will distort the conclusions. Thus, we should study how to detect these violations. Let us look at an example: Anscombe's Quartet, four data sets (X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4). [Data table omitted.] A numerical sketch comparing the four fits is given below.
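To see the point numerically, here is a hedged sketch (not from the slides) that fits a simple linear regression to each of the four Anscombe data sets and prints their summary statistics; it loads the quartet from seaborn's bundled example data (fetched over the network the first time) rather than re-typing the table.

# Sketch: the four Anscombe data sets give nearly identical summary statistics.
import seaborn as sns
from scipy import stats

anscombe = sns.load_dataset("anscombe")      # columns: dataset, x, y

for name, grp in anscombe.groupby("dataset"):
    res = stats.linregress(grp["x"], grp["y"])
    print(f"Data set {name}: slope = {res.slope:.2f}, intercept = {res.intercept:.2f}, "
          f"r = {res.rvalue:.2f}, mean(y) = {grp['y'].mean():.2f}")

# All four data sets give roughly slope 0.5, intercept 3.0, and r 0.82, yet their
# scatter plots look completely different -- which is why residual plots are needed.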
Methods of Detecting Violations:
a) Using graphical methods (plots).
b) Using some statistical measures (we will learn some of them soon).
c) Combining a) and b).
Most of the above methods for detecting assumption violations are residual-based methods or use residual plots (a sketch of a basic residual plot follows below). Residual plots can reveal many features of the data that might be missed or overlooked when using only summary statistics; for example, many widely used statistics, such as the correlation coefficients and the regression coefficients, are essentially the same across all 4 data sets of the Anscombe Quartet. Thus, we need to study the residuals.
The various types of residuals are:
a) Ordinary residuals
b) Standardized residuals
c) Studentized residuals (internal and external)
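A minimal, hedged residual-plot sketch (simulated data, not the lecture's example): plotting the residuals against the fitted values, the basic residual-based check referred to above.

# Sketch: plot of residuals against fitted values (simulated data).
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 80)
y = 1 + 2 * x + rng.normal(0, 1, 80)

fit = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values"); plt.ylabel("Residuals")
plt.title("Residuals vs fitted values")
plt.show()

# A patternless horizontal band around 0 is consistent with the assumptions;
# curvature or a funnel shape suggests nonlinearity or non-constant variance.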
1. Ordinary Residuals
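The formula on this slide did not survive the transcription. As a reconstruction of the usual textbook definition (not necessarily the slide's exact notation), in LaTeX:

% Ordinary residual: observed value minus fitted value.
e_i = y_i - \hat{y}_i, \qquad i = 1, \dots, n,
\quad \text{where } \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip}.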
2. The Standardized Residuals
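The formula here is also missing from the transcript. One common textbook convention defines the standardized residual by scaling each ordinary residual by the estimated error standard deviation; this reconstruction may differ from the slide's exact definition:

% Standardized residual: ordinary residual scaled by the estimated error SD.
z_i = \frac{e_i}{\hat{\sigma}}, \qquad
\hat{\sigma}^2 = \frac{\mathrm{SSE}}{n - p - 1} = \frac{\sum_{i=1}^{n} e_i^2}{n - p - 1}.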
3. The Studentized Residuals
a) Internally Studentized Residuals
b) Externally Studentized Residuals
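The formulas for the two studentized residuals are missing from the transcript as well. A reconstruction of the standard definitions (hedged as the usual convention, not necessarily the slide's notation), where p_{ii} is the i-th diagonal element (leverage) of the hat matrix H = X(X'X)^{-1}X':

% Internally studentized residual: uses sigma-hat from the full fit.
r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - p_{ii}}}.

% Externally studentized residual: uses sigma-hat_(i) from the fit with the
% i-th observation deleted.
r_i^{*} = \frac{e_i}{\hat{\sigma}_{(i)}\sqrt{1 - p_{ii}}}.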
Summary
I. Standard Regression Assumptions:
a) about the form of the model
b) about the measurement errors
c) about the predictor variables
d) about the observations
II. The examples of Anscombe's Quartet show that:
a) gross violations of the assumptions will lead to serious problems;
b) summary statistics may miss or overlook features of the data.
III. Types of Residuals:
a) Ordinary
b) Standardized
c) Studentized
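A hedged wrap-up sketch (simulated data, not the lecture's example) showing how the three types of residuals listed above can be computed with statsmodels; the studentized residuals come from its influence diagnostics.

# Sketch: ordinary, standardized, and studentized residuals with statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(3)
X = sm.add_constant(rng.uniform(0, 10, size=(40, 2)))    # intercept + 2 predictors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 1, 40)

fit = sm.OLS(y, X).fit()
infl = OLSInfluence(fit)

ordinary     = fit.resid                                 # e_i = y_i - yhat_i
standardized = fit.resid / np.sqrt(fit.mse_resid)        # e_i / sigma_hat
internal     = infl.resid_studentized_internal           # e_i / (sigma_hat * sqrt(1 - p_ii))
external     = infl.resid_studentized_external           # uses sigma_hat_(i) (i-th case deleted)

print(ordinary[:3], standardized[:3], internal[:3], external[:3], sep="\n")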