Correlation, Bivariate Regression, and Multiple Regression


Chapter 7: Correlation, Bivariate Regression, and Multiple Regression

Pearson’s Product Moment Correlation Correlation measures the association between two variables: it quantifies the extent to which the mean, variation, and direction of one variable are related to another variable. r ranges from +1 to −1. Correlation can be used for prediction, but correlation does not indicate the cause of a relationship.
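As an illustration (not part of the original slides), a minimal Python sketch of computing Pearson's r; the X/Y values are invented:

```python
# Minimal sketch: computing Pearson's r (hypothetical data, not from the slides).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

r, p_value = stats.pearsonr(x, y)  # r in [-1, +1]; p-value tests H0: r = 0
print(f"r = {r:.3f}, R^2 = {r**2:.3f}, p = {p_value:.4f}")
```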

Scatter Plot A scatter plot gives a visual description of the relationship between two variables. The line of best fit is defined as the line that minimizes the squared deviations from each data point up to or down to the line.

Line of Best Fit Minimizes Squared Deviations from a Data Point to the Line

Always do a Scatter Plot to Check the Shape of the Relationship

Will a Linear Fit Work?

Will a Linear Fit Work? y = 0.5246x − 2.2473, R² = 0.4259

2nd Order Fit? y = 0.0844x² + 0.1057x − 1.9492, R² = 0.4666

6th Order Fit? y = 0.0341x⁶ − 0.6358x⁵ + 4.3835x⁴ − 13.609x³ + 18.224x² − 7.3526x − 2.0039, R² = 0.9337
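The rising R² values above illustrate why the scatter plot matters: a higher-order polynomial can always raise R² on the sample, as the 6th-order fit shows, without capturing a real relationship. A minimal sketch of producing such fits with NumPy (the data below are invented, not the slides' chart data):

```python
# Sketch: comparing linear, 2nd-, and 6th-order fits (invented data).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 0.5 * x - 2 + rng.normal(0, 1.5, x.size)  # roughly linear with noise

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

for order in (1, 2, 6):
    coeffs = np.polyfit(x, y, order)   # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    print(f"order {order}: R^2 = {r_squared(y, y_hat):.4f}")
```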

Will a Linear Fit Work?

Linear Fit y = 0.0012x − 1.0767, R² = 0.0035

Correlation Formulas
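The formulas themselves appeared as an image on the original slide and are not reproduced in the transcript; the standard definitional and computational forms of Pearson's r are:

```latex
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
         {\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}}
  = \frac{N\sum XY - \sum X \sum Y}
         {\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}
```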

Evaluating the Strength of a Correlation For predictions, an absolute value of r below .7 may produce unacceptably large errors, especially if the SDs of either or both X and Y are large. As a general rule: |r| ≥ .9 is good; |r| = .7–.8 is moderate; |r| = .5–.7 is low; and values of r below .5 (R² < .25, i.e., less than 25% of variance explained) are poor and thus not useful for prediction.

Significant Correlation? If N is large (e.g., N = 90), then even a correlation of .205 is statistically significant. ALWAYS THINK ABOUT R²: how much variance in Y is X accounting for? With r = .205, R² = .042, so X accounts for only 4.2% of the variance in Y. This will lead to poor predictions, and a 95% confidence interval will also show how poor the prediction is.

The Venn diagram shows (R²) the amount of variance in Y that is explained by X. R² = .64 (64%): variance in Y that is explained by X. (1 − R²) = .36 (36%): unexplained variance in Y.

The vertical distance (up or down) from a data point to the line of best fit is a RESIDUAL. Y = mX + b; here, Y = .72X + 13.

Calculation of Regression Coefficients (b, C) If r < .7, prediction will be poor. Large SDs adversely affect the accuracy of the prediction.
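The slide's formula image is not in the transcript; for simple linear regression, the standard least-squares coefficients in the slide's notation (slope b, constant C) are:

```latex
b = r\,\frac{SD_Y}{SD_X}, \qquad C = \bar{Y} - b\,\bar{X}, \qquad \hat{Y} = bX + C
```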

Standard Deviation of Residuals

Standard Error of Estimate (SEE): the SD of Y Prediction Errors The SEE is the SD of the prediction errors (residuals) when predicting Y from X. The SEE is used to construct a confidence interval for the prediction equation.
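A common computational form is SEE = √(Σ(Y − Ŷ)² / (N − 2)). Below is a minimal Python sketch with invented data (the X/Y values are illustrative, not the lecture's):

```python
# Sketch: fitting Y from X and computing the SEE (invented data).
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([14.5, 16.2, 16.9, 18.4, 18.8, 20.6])

b, c = np.polyfit(x, y, 1)   # slope and intercept: Y-hat = b*X + c
y_hat = b * x + c
residuals = y - y_hat        # vertical distances from points to the line
see = np.sqrt(np.sum(residuals**2) / (len(y) - 2))  # SD of prediction errors
print(f"Y-hat = {b:.3f}X + {c:.3f}, SEE = {see:.3f}")
```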

The SEE is used to compute confidence intervals for the prediction equation.

Example of a 95% confidence interval. Both r and SDY are critical to the accuracy of prediction. If SDY is small and r is large, prediction errors will be small; if SDY is large and r is small, prediction errors will be large. We are 95% sure the mean falls between 45.1 and 67.3.
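The interval in the slide's example appears to be built from the SEE; a hedged sketch of the standard form (the worked numbers 45.1 and 67.3 are the slide's own):

```latex
\hat{Y} \pm t_{(.975,\;N-2)} \cdot SEE \;\approx\; \hat{Y} \pm 1.96 \cdot SEE \quad \text{for large } N
```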

Multiple Regression Multiple regression is used to predict one Y (dependent) variable from two or more X (independent) variables. The advantages of multiple regression over bivariate regression: it provides a lower standard error of estimate, and it determines which variables contribute to the prediction and which do not.

Multiple Regression The prediction equation is Y′ = b1X1 + b2X2 + b3X3 + … + bnXn + C. b1, b2, b3, … bn are coefficients that give weight to the independent variables according to their relative contribution to the prediction of Y. X1, X2, X3, … Xn are the predictors (independent variables). C is a constant, similar to the Y intercept. Example: Body Fat predicted from Abdominal, Tricep, and Thigh skinfolds.
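As an illustrative sketch of fitting such an equation with ordinary least squares (the skinfold and body-fat numbers below are invented, not the slide's data set):

```python
# Sketch: multiple regression predicting body fat from three skinfolds
# (invented data for illustration only).
import numpy as np

abdominal = np.array([20.0, 25.0, 30.0, 22.0, 28.0, 35.0, 18.0, 32.0])
tricep    = np.array([12.0, 15.0, 18.0, 13.0, 16.0, 20.0, 10.0, 19.0])
thigh     = np.array([22.0, 26.0, 30.0, 24.0, 28.0, 33.0, 20.0, 31.0])
body_fat  = np.array([15.1, 18.9, 22.4, 16.5, 20.8, 26.0, 13.2, 24.1])

# Design matrix with a column of ones for the constant C.
X = np.column_stack([abdominal, tricep, thigh, np.ones_like(body_fat)])
coeffs, *_ = np.linalg.lstsq(X, body_fat, rcond=None)  # least squares
b1, b2, b3, c = coeffs
print(f"BodyFat = {b1:.3f}*Abd + {b2:.3f}*Tri + {b3:.3f}*Thigh + {c:.3f}")
```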

List the variables in the order they enter the equation. X2 has the biggest area of overlap with Y (area C), so it enters first. X1 enters next because its unique area (A) is bigger than area (E); both A and E are unique, not common to C. X3 enters next, uniquely adding area (E). X4 is not related to Y, so it is NOT in the equation.

Ideal Relationship Between Predictors and Y Each variable accounts for unique variance in Y, with very little overlap among the predictors. Order to enter? X1, X3, X4, X2, X5.

Regression Methods Enter: forces all predictors (independent variables) into the equation in one step. Forward: each step adds a new predictor; predictors enter based on the unique variance in Y they explain. Backward: starts with the full equation (all predictors) and removes them one at a time, beginning with the predictor that adds the least. Stepwise: each step adds a new predictor; on any step, a predictor can be added and another removed if it has a high partial correlation with the newly added predictor.
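The slides describe SPSS's method options; scikit-learn offers an analogous forward/backward selection. A hedged sketch, with invented predictors (this approximates, rather than reproduces, the SPSS criteria):

```python
# Sketch: forward and backward predictor selection, analogous to the
# SPSS "Forward"/"Backward" methods (invented data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))  # five candidate predictors
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=100)  # only two matter

for direction in ("forward", "backward"):
    selector = SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=2, direction=direction)
    selector.fit(X, y)
    print(direction, "->", selector.get_support())  # mask of chosen predictors
```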

Regression Methods in SPSS Choose the desired Regression Method.

Regression Assumptions Homoscedasticity: equal variance of Y (the residuals) at any value of X. The residuals are normally distributed around the line of best fit. X and Y are linearly related.
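A minimal sketch of checking these assumptions outside SPSS (data invented for illustration): plot the residuals against the predicted values to inspect homoscedasticity and linearity, and test the residuals for normality.

```python
# Sketch: basic regression-assumption checks (invented data).
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 5 + rng.normal(0, 2, x.size)

b, c = np.polyfit(x, y, 1)
residuals = y - (b * x + c)

# Homoscedasticity/linearity: residuals should form an even band around 0.
plt.scatter(b * x + c, residuals)
plt.axhline(0, color="gray")
plt.xlabel("Predicted Y")
plt.ylabel("Residual")
plt.show()

# Normality of residuals: p > .05 means no evidence against normality.
stat, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")
```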

Tests for Normality Use SPSS Descriptives > Explore. [The slide shows three example data columns, Set 1, Set 2, and Set 3, used in the normality checks that follow; the column structure is lost in this transcript.]

Tests for Normality

Tests for Normality

Tests for Normality

Tests for Normality The significance value is not less than .05, so we fail to reject normality: the data are normal.

Tests for Normality: Normal Probability Plot or Q-Q Plot If the data are normal, the points cluster around a straight line.
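Outside SPSS, a comparable Q-Q plot can be drawn with SciPy; a minimal sketch with an invented sample:

```python
# Sketch: normal Q-Q (probability) plot; normal data hug the reference line.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
data = rng.normal(loc=150, scale=15, size=30)  # invented sample

stats.probplot(data, dist="norm", plot=plt)    # points plus fitted line
plt.title("Normal Q-Q Plot")
plt.show()
```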

Tests for Normality: Boxplots The bar is the median; the box extends from the 25th to the 75th percentile; the whiskers extend to the largest and smallest values within 1.5 box lengths. Outliers are labeled with O; extreme values are labeled with a star.

Tests for Normality: Normal Probability Plot or Q-Q Plot

Cntry15.Sav Example of Regression Assumptions

Cntry15.Sav Example of Regression Assumptions

Cntry15.Sav – Regression Statistics Settings

Cntry15.Sav – Regression Plot Settings

Cntry15.Sav – Regression Save Settings

Cntry15.Sav Example of Regression Assumptions

Standardized Residual Stem-and-Leaf Plot

Frequency   Stem &  Leaf
 3.00        -1 .  019
 4.00        -0 .  0148
 7.00         0 .  0466669
 1.00         1 .  7

Stem width: 1.00000
Each leaf:  1 case(s)

Cntry15.Sav Example of Regression Assumptions The distribution is normal; two scores fall somewhat outside the others.

Cntry15.Sav Example of Regression Assumptions No outliers (labeled O); no extreme scores (labeled with a star).

Cntry15.Sav Example of Regression Assumptions If the distribution is normal, the points should fall randomly in a band around 0. In this distribution there is one extreme score.

Cntry15.Sav Example of Regression Assumptions The data are normal.

Regression Violations

Regression Violations

Regression Violations