SW388R6 Data Analysis and Computers I Slide 1 Testing Assumptions of Linear Regression Detecting Outliers Transforming Variables Logic for testing assumptions.

Presentation transcript:

SW388R6 Data Analysis and Computers I Slide 1 Testing Assumptions of Linear Regression Detecting Outliers Transforming Variables Logic for testing assumptions

SW388R6 Data Analysis and Computers I Slide 2 Assumptions of regression

Based on information from the data set 2001WorldFactbook.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha in the regression analysis and .01 for the diagnostic tests.

A simple linear regression between "population growth rate" [pgrowth] and "birth rate" [birthrat] will satisfy the regression assumptions if we choose to interpret which of the following models.

1. The original variables including all cases
2. The original variables excluding extreme outliers
3. The transformed variables including all cases
4. The transformed variables excluding extreme outliers ***
5. The quadratic model including all cases
6. The quadratic model excluding extreme outliers
7. None of the proposed models satisfies the assumptions

The transformed variables excluding extreme outliers is the correct answer. [Feedback: 4743 characters]

TESTING MODEL: ORIGINAL VARIABLES, USING ALL CASES

The linear regression of "birth rate" [birthrat] by "population growth rate" [pgrowth] satisfied one of the regression assumptions (independence of errors). The Durbin-Watson statistic (1.93) fell within the acceptable range from 1.50 to 2.50, indicating that the assumption of independence of errors was satisfied.

However, three assumptions were violated (linearity, homogeneity of error variance, and normality of the residuals). The lack of fit test (F(157, 59) = 1.78, p = .006) indicated that the assumption of linearity was violated. The Breusch-Pagan test (Breusch-Pagan(1) = , p < .001) indicated that the assumption of homogeneity of error variance was violated. The Shapiro-Wilk test of studentized residuals (Shapiro-Wilk(218) = 0.81, p < .001) indicated that the assumption of normality of errors was violated.

TESTING MODEL: ORIGINAL VARIABLES, OMITTING EXTREME OUTLIERS

One extreme outlier was found in the data. Montserrat was an extreme outlier (the Cook's distance ( ) was larger than the cutoff value of , the leverage ( ) was larger than the cutoff value of , and the studentized residual (-9.173) was smaller than the cutoff value of -4.0).

The linear regression of "birth rate" [birthrat] by "population growth rate" [pgrowth] satisfied two of the regression assumptions (linearity and independence of errors). The lack of fit test (F(156, 59) = 0.94, p = .617) indicated that the assumption of linearity was satisfied. The Durbin-Watson statistic (2.01) fell within the acceptable range from 1.50 to 2.50, indicating that the assumption of independence of errors was satisfied.

However, two assumptions were violated (homogeneity of error variance and normality of the residuals). The Breusch-Pagan test (Breusch-Pagan(1) = 29.24, p < .001) indicated that the assumption of homogeneity of error variance was violated. The Shapiro-Wilk test of studentized residuals (Shapiro-Wilk(217) = 0.97, p < .001) indicated that the assumption of normality of errors was violated.

SELECTING A TRANSFORMATION

The logarithm of "birth rate" [LG_birthrat] with a value of for the Shapiro-Wilk statistic was the transformation that was most normal for the dependent variable "birth rate" [birthrat].

The logarithm of "population growth rate" [LG_pgrowth] with a value of for the Shapiro-Wilk statistic was the transformation that best approximated a normal distribution for the independent variable "population growth rate" [pgrowth].

TESTING MODEL: TRANSFORMED VARIABLES, INCLUDING ALL CASES

The linear regression of logarithm of "birth rate" [LG_birthrat] by logarithm of "population growth rate" [LG_pgrowth] satisfied two of the regression assumptions (linearity and independence of errors). The lack of fit test (F(157, 59) = 1.38, p = .080) indicated that the assumption of linearity was satisfied. The Durbin-Watson statistic (1.94) fell within the acceptable range from 1.50 to 2.50, indicating that the assumption of independence of errors was satisfied.

However, two assumptions were violated (homogeneity of error variance and normality of the residuals). The Breusch-Pagan test (Breusch-Pagan(1) = 29.02, p < .001) indicated that the assumption of homogeneity of error variance was violated. The Shapiro-Wilk test of studentized residuals (Shapiro-Wilk(218) = 0.96, p < .001) indicated that the assumption of normality of errors was violated.

TESTING MODEL: TRANSFORMED VARIABLES, EXCLUDING EXTREME OUTLIERS

One extreme outlier was found in the data. Montserrat was an extreme outlier (the Cook's distance ( ) was larger than the cutoff value of , the leverage ( ) was larger than the cutoff value of , and the studentized residual (-9.173) was smaller than the cutoff value of -4.0).

The linear regression of logarithm of "birth rate" [LG_birthrat] by logarithm of "population growth rate" [LG_pgrowth] satisfied all of the regression assumptions (linearity, homogeneity of error variance, normality of the residuals, and independence of errors).

The lack of fit test (F(156, 59) = 1.14, p = .288) indicated that the assumption of linearity was satisfied. The Breusch-Pagan test (Breusch-Pagan(1) = 0.82, p = .367) indicated that the assumption of homogeneity of error variance was satisfied. The Shapiro-Wilk test of studentized residuals (Shapiro-Wilk(217) = 0.99, p = .357) indicated that the assumption of normality of errors was satisfied. The Durbin-Watson statistic (1.96) fell within the acceptable range from 1.50 to 2.50, indicating that the assumption of independence of errors was satisfied.

SW388R6 Data Analysis and Computers I Slide 3 Run the script - 1 Select Run Script from the Utilities menu.

SW388R6 Data Analysis and Computers I Slide 4 Run the script - 2 Navigate to the folder where you downloaded the script. Highlight the script (.SBS) file to run, then click on the Run button to run the script.

SW388R6 Data Analysis and Computers I Slide 5 Assumption of linearity - 1 Highlight the dependent variable in the list of variables. Click on the arrow button to move the variable to the text box for the dependent variable.

SW388R6 Data Analysis and Computers I Slide 6 Assumption of linearity - 1 Highlight the independent variable in the list of variables. Click on the arrow button to move the variable to the list box for the independent variable.

SW388R6 Data Analysis and Computers I Slide 7 Initial test of conformity to assumptions Run the regression with all cases to test the initial conformity to the assumptions.

SW388R6 Data Analysis and Computers I Slide 8 The Durbin-Watson statistic (1.93) fell within the acceptable range from 1.50 to 2.50, indicating that the assumption of independence of errors was satisfied.
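As a rough illustration of what this check involves, the sketch below computes the Durbin-Watson statistic from a list of residuals and applies the 1.50 to 2.50 range quoted on the slide. The function names and the example residuals are hypothetical; the SPSS script itself is not shown in the slides, so this is only a minimal sketch of the standard formula.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences of the residuals,
    divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def independence_satisfied(dw, low=1.50, high=2.50):
    """Apply the acceptable range quoted on the slide."""
    return low <= dw <= high


# Hypothetical residuals in case order, for illustration only.
example_residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
dw = durbin_watson(example_residuals)
print(round(dw, 2), independence_satisfied(dw))
```

Values near 2 suggest uncorrelated successive errors, which is why both 1.93 and 2.01 in the slides pass the check.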

SW388R6 Data Analysis and Computers I Slide 9 The lack of fit test (F(157, 59) = 1.78, p = .006) indicated that the assumption of linearity was violated.
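The reported degrees of freedom are consistent with the classical pure-error lack of fit test: with 218 cases and c distinct values of the independent variable, c - 2 = 157 and 218 - c = 59 both give c = 159. Assuming that is the test the script runs (the slides do not say), the F statistic can be sketched as follows. The function name and the example data are hypothetical, and the p-value is omitted because the F distribution is not in the Python standard library.

```python
from collections import defaultdict


def lack_of_fit_f(x, y):
    """Return (F, df_lack_of_fit, df_pure_error) for a straight-line fit.

    Requires more than two distinct x values and at least one
    replicated x value with non-identical responses."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x

    # Group the responses by distinct predictor value (the replicates).
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    c = len(groups)

    ss_pure = 0.0  # variation of y around its own group mean
    ss_lof = 0.0   # variation of the group means around the fitted line
    for xi, ys in groups.items():
        group_mean = sum(ys) / len(ys)
        ss_pure += sum((yi - group_mean) ** 2 for yi in ys)
        ss_lof += len(ys) * (group_mean - (intercept + slope * xi)) ** 2

    df_lof = c - 2
    df_pure = n - c
    f_stat = (ss_lof / df_lof) / (ss_pure / df_pure)
    return f_stat, df_lof, df_pure


# Hypothetical curved data with two replicates at each x value.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.1, 0.9, 4.2, 3.8, 8.9, 9.1, 16.2, 15.8]
print(lack_of_fit_f(x, y))
```

A large F means the group means stray from the fitted line by more than the replicate-to-replicate noise would explain, which is the sense in which linearity is "violated".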

SW388R6 Data Analysis and Computers I Slide 10 The Breusch-Pagan test (Breusch-Pagan(1) = , p < .001) indicated that the assumption of homogeneity of error variance was violated.
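The slides do not say which variant of the Breusch-Pagan test the script computes. The sketch below assumes the studentized (Koenker) form, in which the statistic is n times the R-squared from regressing the squared residuals on the independent variable, referred to a chi-square distribution with 1 degree of freedom. All function names and the example data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """Coefficient of determination for the straight-line fit of y on x."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2))


def breusch_pagan(x, y):
    """Koenker form (an assumption): n * R^2 of squared residuals on x."""
    intercept, slope = fit_line(x, y)
    squared_resid = [(yi - (intercept + slope * xi)) ** 2
                     for xi, yi in zip(x, y)]
    lm = max(len(x) * r_squared(x, squared_resid), 0.0)
    return lm, chi2_1df_pvalue(lm)


# Hypothetical data whose scatter around the line grows with x.
x = list(range(1, 21))
y = [xi * (1.2 if xi % 2 == 0 else 0.8) for xi in x]
print(breusch_pagan(x, y))
```

With the diagnostic alpha of .01 used in this problem, the assumption of homogeneity of error variance is treated as violated when the p-value falls below .01.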

SW388R6 Data Analysis and Computers I Slide 11 The Shapiro-Wilk test of studentized residuals (Shapiro-Wilk(218) = 0.81, p < .001) indicated that the assumption of normality of errors was violated.

SW388R6 Data Analysis and Computers I Slide 12 One extreme outlier was found in the data. Montserrat was an extreme outlier (the Cook's distance ( ) was larger than the cutoff value of , the leverage ( ) was larger than the cutoff value of , and the studentized residual (-9.173) was smaller than the cutoff value of -4.0).
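The three influence measures named on this slide can be sketched for a simple linear regression as below. This is an illustration under stated assumptions: the function names and data are hypothetical, the studentized residual is taken to be the internally studentized form, and only the studentized-residual cutoff of 4.0 is applied, because the Cook's distance and leverage cutoffs are not recoverable from the transcript.

```python
import math


def influence_measures(x, y):
    """Return (leverage, studentized residual, Cook's distance) per case."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)

    rows = []
    for xi, ei in zip(x, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        # Internally studentized residual (an assumption about the script).
        student = ei / math.sqrt(mse * (1 - leverage))
        cooks_d = student ** 2 * leverage / (p * (1 - leverage))
        rows.append((leverage, student, cooks_d))
    return rows


def extreme_by_residual(rows, cutoff=4.0):
    """Indices of cases whose studentized residual lies beyond +/- cutoff."""
    return [i for i, (_, student, _) in enumerate(rows)
            if abs(student) > cutoff]


# Hypothetical data: a clean line with one grossly shifted case at index 14.
x = [float(i) for i in range(1, 31)]
y = [2 * xi + 0.1 * ((i % 3) - 1) for i, xi in enumerate(x)]
y[14] -= 20
rows = influence_measures(x, y)
print(extreme_by_residual(rows))
```

A case like Montserrat, with a studentized residual of -9.173, would be flagged by the residual rule alone; the slide reports that it also exceeded the other two cutoffs.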

SW388R6 Data Analysis and Computers I Slide 13 We could exclude the cases one at a time by selecting the case in the list of cases included and clicking on the arrow button, or we can use the script. Clicking on the Exclude extreme outliers button tells the script to remove the extreme outliers.

SW388R6 Data Analysis and Computers I Slide 14 Case number 136, Montserrat, is added to the list of cases to exclude.

SW388R6 Data Analysis and Computers I Slide 15 To see whether or not removing the outlier resolves the violation of assumptions, run the regression again.

SW388R6 Data Analysis and Computers I Slide 16 This is an example of a strong linear relationship. The red lowess (loess in SPSS) smoother is almost completely straight throughout the range of the data. The rate of change in the dependent variable is the same for all values of the independent variable.

SW388R6 Data Analysis and Computers I Slide 17 Removing the one extreme outlier solved the violation of the assumption of linearity. The lack of fit test (F(156, 59) = 0.94, p = .617) indicated that the assumption of linearity was satisfied.

SW388R6 Data Analysis and Computers I Slide 18 The Durbin-Watson statistic (2.01) fell within the acceptable range from 1.50 to 2.50, indicating that the assumption of independence of errors was satisfied.

SW388R6 Data Analysis and Computers I Slide 19 The Breusch-Pagan test (Breusch-Pagan(1) = 29.24, p < .001) indicated that the assumption of homogeneity of error variance was violated.

SW388R6 Data Analysis and Computers I Slide 20 The Shapiro-Wilk test of studentized residuals (Shapiro-Wilk(217) = 0.97, p < .001) indicated that the assumption of normality of errors was violated.

SW388R6 Data Analysis and Computers I Slide 21 Since removing outliers did not solve all of our violations, we will try transformations of the variables. We restore all of the cases to the analysis by clicking on the Include all cases button.

SW388R6 Data Analysis and Computers I Slide 22 First, click on the dependent variable to select it. Click on the Test normality button.

SW388R6 Data Analysis and Computers I Slide 23 There is a statistical procedure named the Box-Cox transformation which SPSS does not compute and which I have not added to the script. However, we can use the test of normality as a surrogate. A larger value of the Shapiro-Wilk statistic is associated with a higher probability (p-value) for the test of normality. We will select the transformation with the largest Shapiro-Wilk statistic as the transformation which best “normalizes” the variable, provided its statistic is at least 0.01 larger than the statistical value for the untransformed variable. For this variable, we would choose the logarithmic transformation. Choosing one transformation does not mean that it is particularly effective, only that it is better than the others.
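The selection rule described on this slide can be written as a small function. The function name, the transformation names other than the logarithm, and the Shapiro-Wilk values in the example are all hypothetical (the actual statistics are missing from the transcript), so this is only a sketch of the rule itself.

```python
def choose_transformation(w_untransformed, w_by_transformation, min_gain=0.01):
    """Pick the transformation with the largest Shapiro-Wilk statistic,
    provided it beats the untransformed variable by at least min_gain.
    Returns the transformation name, or None to keep the original."""
    best_name = max(w_by_transformation, key=w_by_transformation.get)
    if w_by_transformation[best_name] - w_untransformed >= min_gain:
        return best_name
    return None


# Hypothetical Shapiro-Wilk statistics, for illustration only.
candidates = {"logarithm": 0.97, "square root": 0.95, "inverse": 0.88}
print(choose_transformation(0.90, candidates))
```

Returning None when no candidate improves the statistic by 0.01 reflects the slide's caveat that the winning transformation is only better than the alternatives, not necessarily effective.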

SW388R6 Data Analysis and Computers I Slide 24 First, click on the dependent variable to select it. Click on the Test normality button.

SW388R6 Data Analysis and Computers I Slide 25 First, click on the dependent variable to select it. Click on the Test normality button.

SW388R6 Data Analysis and Computers I Slide 26 First, click on the dependent variable to select it. Click on the Test normality button.

SW388R6 Data Analysis and Computers I Slide 27 First, click on the dependent variable to select it. Click on the Test normality button.

SW388R6 Data Analysis and Computers I Slide 28 First, click on the dependent variable to select it. Click on the Test normality button.

SW388R6 Data Analysis and Computers I Slide 29 First, click on the dependent variable to select it. Click on the Test normality button.

SW388R6 Data Analysis and Computers I Slide 30 Since removing outliers did not solve all of our violations, we will try transformations of the variables. We restore all of the cases to the analysis by clicking on the Include all cases button.