
8/7/2015 Slide 1 Simple linear regression is an appropriate model of the relationship between two quantitative variables provided that: the data satisfy the assumption of linearity in both the scatterplot of the raw data and the residual plot; the spread of the residuals is equal for all of the predicted values in the residual plot; and no outliers are distorting the linear model. When the relationship we are analyzing does not meet these criteria, regression analysis can still be justified if re-expressing one or both variables reduces the non-linear pattern in the scatterplot, equalizes the variance in the residual plot, and reduces the distance of outliers from the other cases in the distributions.
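Every check in this lesson starts from the same two ingredients: the least-squares line and its residuals. The sketch below computes both in plain Python so the residual plot's coordinates (predicted value, residual) are explicit. The function names and the data are illustrative assumptions, not part of the lesson's script.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for the model y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired x and y values for at least 3 cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def residual_plot_points(x, y):
    """(predicted, residual) pairs: the coordinates plotted in a residual plot."""
    b0, b1 = fit_line(x, y)
    points = []
    for xi, yi in zip(x, y):
        predicted = b0 + b1 * xi
        points.append((predicted, yi - predicted))
    return points


# Illustrative data only (not from the lesson).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = fit_line(x, y)
points = residual_plot_points(x, y)
print(f"intercept={b0:.2f}, slope={b1:.2f}")
```

Least-squares residuals always sum to (approximately) zero, so a residual plot is judged by its pattern and spread, not by its average.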

Slide 2 Clues that re-expression might be effective in linearizing the relationship are: the identification of influential cases, severe skewness in one or both variables (a skewness statistic outside the range from -1.0 to +1.0), and a Spearman's rho that is greater than Pearson's r. There is no guarantee that re-expression will produce a scatterplot that satisfies the assumptions of linear regression. When it does not, we must choose between concluding that the violations are not serious enough to matter and adopting an alternative strategy for modeling the relationship. To solve these problems, we will first assess how well the relationship conforms to the regression assumptions. Second, we will examine the criteria that suggest re-expression might be effective. Third, we will fit the model with the re-expressed variables and assess its conformity to the regression assumptions.
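The rho-versus-r clue works because Spearman's rho is Pearson's r computed on ranks: a relationship that is monotonic but curved keeps a high rank correlation while its linear correlation drops. The sketch below is a minimal plain-Python version, assuming average ranks for ties; comparing absolute values (so the clue also works for negative relationships) is our assumption, not something the slide states.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of each variable."""
    return pearson_r(ranks(x), ranks(y))


# Illustrative data: perfectly monotonic but curved.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 2 for v in x]
r, rho = pearson_r(x, y), spearman_rho(x, y)
print(f"r = {r:.3f}, rho = {rho:.3f}, re-expression clue: {abs(rho) > abs(r)}")
```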

Slide 3 We will use a new strategy for identifying outliers that we may consider omitting from the analysis: Cook's distance. Cook's distance combines information about standardized residuals and leverage for the independent variables, so we can use one measure instead of three. It measures the influence a case has on the regression solution, i.e., how different the solution would be if that case were omitted. Larger values of Cook's distance indicate a greater effect on the regression analysis. There are different criteria for what constitutes an outlier on Cook's distance. Cook's original criterion was 1.0; Fox proposed 4 / (number of cases - number of IVs - 1); we will use 0.5, which is about halfway between the other two.
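For one predictor, Cook's distance can be computed directly from each case's residual and leverage: D_i = e_i² / (p · MSE) · h_i / (1 - h_i)², where p = 2 coefficients and h_i = 1/n + (x_i - x̄)² / Sxx. The sketch below applies the three cut-offs from the slide. It is a plain-Python illustration with made-up data, not the lesson's script; the helper names are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's distance for every case in a one-predictor least-squares regression."""
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]


def cutoffs(n, k=1):
    """The three criteria from the slide; k is the number of independent variables."""
    return {"cook": 1.0, "fox": 4 / (n - k - 1), "lesson": 0.5}


def flag_influential(distances, cutoff):
    """Indices of cases whose Cook's distance exceeds the cut-off."""
    return [i for i, d in enumerate(distances) if d > cutoff]


# Illustrative data: nine cases on a line plus one distant, high-leverage case.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 5]
d = cooks_distance(x, y)
limits = cutoffs(len(x))
for name, cutoff in limits.items():
    print(name, round(cutoff, 3), flag_influential(d, cutoff))
```

Note that Fox's cut-off shrinks as the number of cases grows, so how close 0.5 sits to "halfway" depends on the sample size.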

Slide 4 We will use an updated version of the script for simple linear regression to analyze relationships and test re-expressions. The script computes the transformations for both the dependent and the independent variables. The defaults are marked. My preferences are a scatterplot with boxplots for each variable, and the residual plot. We will use a combination of fit lines to evaluate linearity. There are options for setting the criterion for Cook's distance and for excluding influential cases.
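The lesson's script is not reproduced here, so the sketch below only illustrates the idea of computing re-expressions for either variable. The set of transformations (square root, base-10 log, and a negated inverse that keeps the cases in the same order) is an assumption drawn from the common "ladder of powers", and `re_express` is a hypothetical helper.

```python
import math

# Hypothetical menu of re-expressions; the actual script's options may differ.
TRANSFORMS = {
    "raw": lambda v: v,
    "square root": math.sqrt,
    "log10": math.log10,
    "inverse": lambda v: -1 / v,  # negated so larger values stay larger
}


def re_express(values, kind):
    """Apply one re-expression to every value of a variable."""
    if kind == "square root" and any(v < 0 for v in values):
        raise ValueError("square root needs non-negative values")
    if kind in ("log10", "inverse") and any(v <= 0 for v in values):
        raise ValueError(f"{kind} needs strictly positive values")
    f = TRANSFORMS[kind]
    return [f(v) for v in values]


# Illustrative, made-up GDP-like values spanning several orders of magnitude.
gdp = [500, 1200, 3000, 8000, 25000]
print(re_express(gdp, "log10"))
```

Logs and inverses are undefined at zero, which is one reason variables bounded at zero (such as percentages) sometimes need a different re-expression.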

Slide 5 This scatterplot shows that the blue loess fit line fluctuates slightly around the regression line and stays within the confidence interval.
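The loess check compares a flexible local fit with the straight regression line: if the data are linear, the two should nearly coincide. The sketch below is a simplified loess (a local linear fit with tricube weights and no robustness iterations) and reports the largest gap between the curves. It does not reproduce the software's loess settings or the confidence band shown in the scatterplot; the span `frac=2/3` is an assumed default.

```python
def ols_fitted(x, y):
    """Predicted values from the least-squares regression line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [my + b1 * (a - mx) for a in x]


def loess_fitted(x, y, frac=2 / 3):
    """Local linear fit at each x using tricube weights (no robustness iterations)."""
    n = len(x)
    k = min(n, max(3, round(frac * n)))  # neighbours in each local window
    fitted = []
    for x0 in x:
        dist = [abs(a - x0) for a in x]
        h = sorted(dist)[k - 1]  # distance to the k-th nearest neighbour
        if h > 0:
            w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
        else:
            w = [1.0 if d == 0 else 0.0 for d in dist]
        sw = sum(w)
        mx = sum(wi * a for wi, a in zip(w, x)) / sw
        my = sum(wi * b for wi, b in zip(w, y)) / sw
        sxx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, x))
        sxy = sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def max_gap(x, y, frac=2 / 3):
    """Largest vertical distance between the loess curve and the regression line."""
    pairs = zip(loess_fitted(x, y, frac), ols_fitted(x, y))
    return max(abs(fit_loess - fit_ols) for fit_loess, fit_ols in pairs)


# Illustrative data: one straight relationship and one curved one.
x = list(range(1, 11))
straight = [3 + 2 * a for a in x]
curved = [a ** 2 for a in x]
print(max_gap(x, straight), max_gap(x, curved))
```

A local linear fit reproduces a straight line exactly, so any sizable gap comes from curvature in the data rather than from the smoother itself.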

Slide 6 The residual plot shows that the vertical spread of the residuals is approximately the same height from left to right across the predicted values. There is no pattern or shape that would suggest non-linearity.
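Judging "the same height from left to right" can be made slightly more concrete by splitting the cases into bands of predicted values and comparing the residual standard deviation in each band. This is a rough numeric companion to the visual check, not a formal test; the three-band split and the helper name are assumptions for illustration.

```python
import statistics


def spread_by_group(predicted, residuals, groups=3):
    """Standard deviation of the residuals within each band of predicted values."""
    pairs = sorted(zip(predicted, residuals))
    size = len(pairs) // groups
    if size < 2:
        raise ValueError("need at least 2 cases per group")
    sds = []
    for g in range(groups):
        start = g * size
        end = len(pairs) if g == groups - 1 else start + size  # last band takes any remainder
        sds.append(statistics.stdev([r for _, r in pairs[start:end]]))
    return sds


# Illustrative residuals whose spread widens from left to right.
predicted = [1, 2, 3, 4, 5, 6]
residuals = [1, -1, 2, -2, 3, -3]
sds = spread_by_group(predicted, residuals)
ratio = max(sds) / min(sds)
print([round(s, 3) for s in sds], round(ratio, 2))
```

A ratio near 1 matches an even band of residuals; a large ratio corresponds to the fan or funnel shapes described in later slides.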

Slide 7 Influential cases are green instead of blue. There are no cases with undue influence in this plot. This relationship satisfies the criteria for a linear relationship, so there is no reason to re-express the data.

Slide 8 The next problem examines the relationship between poverty and per capita GDP.

Slide 9 The loess fit line clearly curves outside the confidence interval. The boxplot for GDP suggests that we should re-express GDP on a logarithmic scale, and its large positive skewness supports the use of logarithms.
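The skewness rule from Slide 2 (outside -1.0 to +1.0) can be checked before and after taking logs. The sketch below uses the adjusted Fisher-Pearson coefficient (G1), which we assume matches the skewness statistic the lesson's software reports; the data are made up to mimic a long right tail like GDP's, not the lesson's values.

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def suggests_log(values, limit=1.0):
    """True when a variable is strongly right-skewed and all values are positive."""
    return skewness(values) > limit and min(values) > 0


# Illustrative right-skewed values (not the lesson's data).
raw = [1, 1, 2, 2, 3, 4, 8, 20, 50, 100]
logged = [math.log10(v) for v in raw]
print(round(skewness(raw), 2), round(skewness(logged), 2), suggests_log(raw))
```

The log compresses the long right tail far more than the bulk of the data, which is why it pulls a large positive skewness back toward zero.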

Slide 10 The limited spread on the left side of the plot suggests a problem with homogeneity of variance as well as linearity. The pattern of the points is U-shaped, which supports the conclusion that the relationship is non-linear.

Slide 11 The limited spread on the left side of the plot suggests a problem with homogeneity of variance as well as linearity.

Slide 12 To re-express GDP as logarithms, mark the option button for scale.

Slide 13 The log transformation improved the linearity of the scatterplot. The loess fit line moves slightly outside the confidence interval, but this is more a fluctuation than a well-defined curve.

Slide 14 The residual plot shows that the vertical spread is somewhat reduced on the left side of the plot, but the reduction is not pronounced enough to treat the relationship as non-linear. I would interpret the relationship between poverty and the log of GDP as linear.

Slide 15 The skewness of poverty (0.563) was not as pronounced as the skewness of GDP, but we can still re-express poverty to see its impact on the relationship.

Slide 16 Including the log of poverty increased the non-linearity shown in the middle of the loess line, although R² increased.

Slide 17 I think I see evidence of a curve emerging in the residual plot rather than up-and-down fluctuations. A case could be made for using the log of poverty based on the higher R², as well as a case for using the raw data for poverty since that relationship is more linear.
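The comparison in Slides 16 and 17 amounts to fitting the same relationship under each combination of raw and logged variables and comparing R². The sketch below tabulates all four combinations; the data are illustrative, and, as the slide points out, a higher R² alone does not establish linearity, so the scatterplot and residual plot still have to be inspected for the chosen combination.

```python
import math
from itertools import product


def r_squared(x, y):
    """R² of the least-squares line, i.e. the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def compare_re_expressions(x, y):
    """R² for (x form, y form) in {raw, log} x {raw, log}; all values must be positive."""
    forms = {"raw": lambda v: v, "log": math.log10}
    results = {}
    for fx, fy in product(forms, repeat=2):
        tx = [forms[fx](v) for v in x]
        ty = [forms[fy](v) for v in y]
        results[(fx, fy)] = r_squared(tx, ty)
    return results


# Illustrative data: y grows exponentially with x.
x = [1, 2, 3, 4, 5, 6]
y = [10 ** v for v in x]
results = compare_re_expressions(x, y)
for combo, r2 in results.items():
    print(combo, round(r2, 3))
```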

Slide 18

Slide 19 The loess line clearly curves outside the confidence interval.

Slide 20 Both non-linearity and unequal variance are evident in the residual plot.

Slide 21 We will first re-express deathrat as logarithms.

Slide 22 The curve looks more evident after the transformation.

Slide 23 The curve looks more evident after the transformation.

Slide 24 We will re-express poverty as logarithms as well, but I am not optimistic that it will help.

Slide 25 The second transformation did not help either. We can try the log of poverty with the raw data for deathrat.

Slide 26 Nor does it help to use the logarithm of poverty with the raw data for deathrat. We should be very cautious about reporting this relationship as linear.

Slide 27

Slide 28 In addition to being non-linear, this relationship shows one influential case.

Slide 29 In addition to being non-linear, this relationship shows one influential case.

Slide 30 Since the skewness of both variables is greater than 1.0, we will try a log transformation for both.

Slide 31 The loess line has a very shallow curve, though without the loess line I would judge this relationship to be linear. The influential case is not as distant from the other cases in the scatterplot and is no longer colored green as an influential case.

Slide 32 The lower left-hand corner looks suspicious for equality of variance, but this may be the result of lower bounds on the variables, i.e., values stop at zero and cannot be negative.