CORRELATION & REGRESSION

Correlation and regression are concerned with investigating relationships between two or more variables.

We consider just two associated variables. We might want to know:
- whether a relationship exists between those variables
- if so, how strong that relationship is
- what form that relationship takes
- whether we can make use of that relationship for predictive purposes, i.e. forecasting

General method for investigating the relationship between two variables:
- Correlation is used to find the strength of the relationship.
- Regression describes the relationship itself in the form of an equation which best fits the data.

For an initial insight into the relationship between two variables:
- Plot a scatter diagram.
- If there appears to be a linear relationship, quantify it by calculating the correlation coefficient. This is a measure of the strength of the linear relationship. Its symbol is 'r' and its value lies between -1 and +1.

- If the relationship is found to be significantly strong, find the equation of the 'line of best fit' through the data, using linear regression.
- The 'goodness of fit' statistic can be calculated to see how useful the regression equation is likely to be.
- Once defined by an equation, the relationship can be used for predictive purposes.

Example
The data represent a sample of advertising expenditure and sales for ten randomly selected months. See slide 12 for the complete data.

Month   Advertising expenditure x (£10,000s)   Sales y (£10,000s)
1       1.2                                     101
2       0.8                                     92
3       1.0                                     110
etc.

Plot a scatter diagram of the data.

Note that the scales do not start at zero. The graph suggests a linear relationship between sales and advertising expenditure: in general, the larger the amount spent on advertising, the higher the sales.

If there is a relationship, we need to be able to measure its strength, i.e. calculate the value of the correlation coefficient.

Pearson's Product Moment Correlation Coefficient (r)
- is a measure of how close the relationship between x and y is to a straight line
- can be produced directly from a calculator in LR (linear regression) mode
- always takes a value between -1 and +1
For the sales and advertising data, the correlation coefficient is r = 0.875.

r = -1: perfect negative correlation
r = 0: no correlation
r = +0.8: strong positive correlation
r = +1: perfect positive correlation

Formula for correlation coefficient, r
r = Sxy / √(Sxx × Syy)
where
Sxx = Σx² - (Σx)(Σx) / n
Syy = Σy² - (Σy)(Σy) / n
Sxy = Σxy - (Σx)(Σy) / n
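As a minimal sketch of the formula above, the following Python function computes r from paired lists of x and y values. The function name `pearson_r` and the small data set are illustrative assumptions, not the advertising data (only three of its ten months are listed in this transcript).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from paired observations, via Sxx, Syy and Sxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x * sum_x / n
    syy = sum(y * y for y in ys) - sum_y * sum_y / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return sxy / sqrt(sxx * syy)

# Hypothetical, perfectly linear data (y = 2x + 1), so r should be +1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
r_demo = pearson_r(xs, ys)
print(round(r_demo, 3))
```

Reversing the sign of every y value in the toy data should give r = -1, which is a quick way to check the sign handling.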

Longhand calculations for correlation coefficient r. Step 1

Step 2
Sxx = Σx² - (Σx)(Σx) / n = 9.28 - (9.4 × 9.4) / 10 = 0.444
Syy = Σy² - (Σy)(Σy) / n = 93569 - (959 × 959) / 10 = 1600.9
Sxy = Σxy - (Σx)(Σy) / n = 924.8 - (9.4 × 959) / 10 = 23.34

Step 3
Therefore: r = Sxy / √(Sxx × Syy) = 23.34 / √(0.444 × 1600.9) = 0.875
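The arithmetic in Steps 2 and 3 can be checked with a short Python sketch. It uses only the sums quoted on the slide; the variable names are illustrative.

```python
from math import sqrt

# Sums quoted in Step 2 for the ten months of data.
n = 10
sum_x, sum_y = 9.4, 959
sum_x2, sum_y2, sum_xy = 9.28, 93569, 924.8

sxx = sum_x2 - sum_x * sum_x / n
syy = sum_y2 - sum_y * sum_y / n
sxy = sum_xy - sum_x * sum_y / n
r = sxy / sqrt(sxx * syy)

print(round(sxx, 3), round(syy, 1), round(sxy, 2), round(r, 3))
```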

Hypothesis test for the value of r
We shall not go into the details here!
Null hypothesis (H0): A linear relationship does not exist between sales and advertising.
Alternative hypothesis (H1): A linear relationship does exist between sales and advertising.
If we calculate a test statistic and a critical value, we discover that test statistic > critical value, so we reject H0.
Conclude that a linear relationship exists between sales and the amount spent on advertising.
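The slide does not say which test is used. One common choice, assumed here, is the t-test for a correlation coefficient, t = r√(n - 2) / √(1 - r²) with n - 2 degrees of freedom. The sketch below applies it to the example, taking a two-tailed 5% significance level as a further assumption; the critical value 2.306 is the standard t-table entry for 8 degrees of freedom.

```python
from math import sqrt

r, n = 0.875, 10  # values from the sales and advertising example

# Assumed test: t-statistic for a correlation coefficient, n - 2 degrees of freedom.
t_stat = r * sqrt(n - 2) / sqrt(1 - r * r)

# Assumed significance level: two-tailed 5%, 8 degrees of freedom (from t-tables).
t_crit = 2.306

reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 2), reject_h0)
```

Under these assumptions the test statistic is well above the critical value, which agrees with the slide's conclusion that H0 is rejected.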

The Goodness of Fit Statistic (R²)
This also measures the closeness of the relationship between x and y.
R² = 100r²
R² tells us what percentage of the total variation in y (here sales) is explained by the variation in x (here advertising expenditure).

Interpretation:
If r = +1 or -1, then R² = 100%, so 100% of the variation in y is explained by the variation in x.
If r = 0, then R² = 0%, so none of the variation in y is explained by the variation in x.
For the data above, the goodness of fit statistic is R² = 100r² = 100 × 0.875² = 76.6%.

76.6% of the variation in sales is explained by the variation in the amount spent on advertising. The remaining 23.4% of the variation is explained by other factors, e.g. price, competitors' prices, etc.
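The goodness of fit calculation can be sketched in a few lines of Python. The function name `goodness_of_fit` is illustrative; it returns R² as a percentage, following the slide's definition R² = 100r².

```python
def goodness_of_fit(r):
    """Return R^2 as a percentage: 100 * r^2."""
    return 100 * r ** 2

r = 0.875  # correlation coefficient for the sales and advertising data
explained = goodness_of_fit(r)
unexplained = 100 - explained
print(round(explained, 1), round(unexplained, 1))
```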

Regression equation
Since we know, for the sample data, that there is a significant relationship between the two variables, the next obvious step is to find its equation. We can then add the regression line to the scatter diagram and use it to predict future sales, given the advertising expenditure for a particular month. The regression equation can be produced directly from a calculator in LR mode.

The regression line has the equation y = a + bx, where:
- x is the independent variable
- y is the dependent variable
- a is the intercept on the y-axis
- b is the gradient or slope of the line

For the sales and advertising data, the values of a and b are 46.5 and 52.6, so the regression equation is:
y = 46.5 + 52.6x
Sales = 46.5 + 52.6 × advertising
(a and b can be found using LR mode on your calculator or by calculation.)

Formula for a and b
The line of best fit is found by squaring the differences between the actual and expected values of y. We choose a and b so that the total of these squared differences is minimised:
b = Sxy / Sxx
a = ȳ - b x̄
where x̄ and ȳ are the means of the x and y data and Sxy and Sxx are defined as previously. The point (x̄, ȳ) is called the centroid.

Calculations for the regression equation
In the regression equation y = a + bx:
b = Sxy / Sxx = 23.34 / 0.444 = 52.6
a = ȳ - b x̄ = 95.9 - 52.6 × 0.94 = 46.5
(as ȳ = Σy / n = 959 / 10 = 95.9 and x̄ = Σx / n = 9.4 / 10 = 0.94)
Therefore the regression equation is y = 46.5 + 52.6x.
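The same calculation can be reproduced in Python from the quantities already worked out on the slides (Sxy, Sxx, Σx, Σy and n). This is a sketch with illustrative variable names.

```python
# Quantities taken from the earlier slides.
n = 10
sum_x, sum_y = 9.4, 959
sxx, sxy = 0.444, 23.34

b = sxy / sxx                         # slope
x_bar, y_bar = sum_x / n, sum_y / n   # centroid coordinates
a = y_bar - b * x_bar                 # intercept

print(round(a, 1), round(b, 1))
```

Here b is kept unrounded when computing a; the result still rounds to the slide's values of 46.5 and 52.6.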

Plotting the regression equation on the scatter diagram
The line y = a + bx can be plotted on the scatter diagram by plotting three points: the centroid (x̄, ȳ) and any other two points which satisfy the regression equation.
From the data, (x̄, ȳ) = (0.94, 95.9)
When x = 0.6, y = 46.5 + (52.6 × 0.6) = 78.06
When x = 1.2, y = 46.5 + (52.6 × 1.2) = 109.6
Plot (0.94, 95.9)
Plot (0.6, 78.06)
Plot (1.2, 109.6)
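The three plotting points can be generated with a small helper. This is a sketch: the function name `line_y` is an assumption, and the coefficients are the rounded values a = 46.5 and b = 52.6 from the slides.

```python
def line_y(x, a=46.5, b=52.6):
    """Value of y on the regression line y = a + bx."""
    return a + b * x

# The centroid's x value plus the two other x values used on the slide.
points = [(x, round(line_y(x), 2)) for x in (0.6, 0.94, 1.2)]
for x, y in points:
    print(x, y)
```

Because a and b are rounded, the line passes through the centroid only approximately: at x = 0.94 the rounded equation gives a value within 0.05 of 95.9.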

Note:
- The regression equation y = a + bx can only be used to calculate an estimate for y given the value of x.
- The linear relationship y = a + bx can only be assumed to exist between y and x for the range of values within the sample.

Interpreting the coefficients in the regression equation: first, the a value
The intercept (a) is the estimate of y when x = 0, but care is needed when using it. Why?
y = 46.5 + 52.6x
Sales = 46.5 + 52.6 × advertising
When x = 0, y = 46.5, i.e. when nothing is spent on advertising, sales would be expected on average to be 46.5 units = 46.5 × £10,000 = £465,000.

Interpreting the coefficients in the regression equation: the b value
y = 46.5 + 52.6x
If x = 0, y = 46.5, but care is needed here!
If x = 0.6, y = 46.5 + (52.6)(0.6) = 78.06
If x = 0.8, y = 46.5 + (52.6)(0.8) = 88.58
If x = 1, y = 46.5 + 52.6 = 99.1
If x = 1.2, y = 46.5 + (52.6)(1.2) = 109.62
If x = 2, y = 46.5 + 52.6 × 2, but care is needed here also!
etc.
So if advertising expenditure is increased by 1 unit, sales will increase by 52.6 units on average.

For each additional £10,000 spent on advertising, sales will increase by 52.6 × £10,000 = £526,000 on average. But we cannot estimate sales outside the range of the sample data, e.g. we should not try to estimate sales for x = 5 using this method.
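One way to enforce this restriction in code is to have the prediction function refuse values of x outside the sampled range. The sketch below is illustrative: the name `predict_sales` is an assumption, and so is the range 0.6 to 1.3, since the full data table is not reproduced in this transcript.

```python
def predict_sales(x, x_min, x_max, a=46.5, b=52.6):
    """Estimate sales (in £10,000s) from advertising spend x (in £10,000s).

    Raises ValueError when x lies outside the observed range [x_min, x_max].
    """
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the sampled range [{x_min}, {x_max}]")
    return a + b * x

# Assumed sample range, for illustration only.
X_MIN, X_MAX = 0.6, 1.3

estimate = predict_sales(1.0, X_MIN, X_MAX)
print(round(estimate, 1))  # estimated sales in £10,000s for x = 1.0
```

Calling `predict_sales(5, X_MIN, X_MAX)` raises a `ValueError` rather than returning an extrapolated figure.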