Correlations and simple regression analysis Data analysis and information management EUZC405 M.Bazarov m.bazarov@wiut.uz
Today’s Agenda Measuring association between the variables (covariance and coefficient of correlation) Simple regression analysis Summary in Excel
Learning Objectives After completion of this lecture you will be able to: Define and calculate correlation coefficient; Find the regression line and use it for regression analysis; Define and calculate coefficient of determination (R-squared); Understand and interpret regression output from Excel
Measuring association between the variables

Use of the term correlation implies:
- that there are two or more entities under consideration;
- that there is some common link which makes them related to a greater or lesser degree.

Consider: CA1 assessment scores and final exam results; height and weight; price of goods and wages paid to the producers.
Measuring association between the variables

Consider an example: Tim Newton is the sales manager of a firm which manufactures meat products and sells a large part of them directly to retail food stores via a large force of sales representatives. Recently, as the recession has begun to affect the business, Mr. Newton has become aware of the need to monitor representatives' performance more closely, but the trouble is that he does not have much idea what factors may influence that performance.
Measuring association between the variables

Rep. no. | Value of last quarter's sales ($000s) | Number of retail outlets visited regularly | Area covered (square miles)
1  | 50 | —  | 450
2  | 25 | 12 | 500
3  | 29 | 17 | 350
4  | 31 | 21 | 250
5  | 31 | 26 | 150
6  | 42 | 34 | 420
7  | 44 | 30 | 275
8  | 45 | 38 | 200
9  | 47 | 45 | 400
10 | 57 | 61 | 300

(Representative 1's outlet count did not survive extraction; representative 1 is excluded from the calculations that follow.)
Measuring association between the variables

[Scatter plot: last quarter's sales ($000s) plotted against number of outlets visited regularly.]
Measuring association between the variables

What can we say about this relationship? Notice the outlier in the scatter plot!
Measuring association between the variables

In general, one could observe that when the number of outlets visited (variable x) is above its mean, then sales (variable y) are also above their mean.

[Scatter plot with the mean of x and the mean of y marked.]
Measuring association between the variables

The covariance measures linear dependence between two variables:

Cov(x, y) = (1/n) Σ (xᵢ − x̄)(yᵢ − ȳ)

Cov > 0 indicates that the two variables move in the same direction (when x is above its mean, so is y).
Cov < 0 indicates that the two variables move in opposite directions (when x is above its mean, y is below its mean).
Measuring association between the variables

To standardize the covariance we divide it by the product of the two standard deviations:

r = Cov(x, y) / (s_x · s_y)

where r (also written R) is known as Pearson's product-moment correlation coefficient and always lies between −1 and +1. A computationally convenient form of the covariance is:

Cov(x, y) = Σxy/n − x̄·ȳ
The sales data revisited

Rep No | Sales y ($000s) | Outlets x | y²    | x²    | xy
2      | 25              | 12        | 625   | 144   | 300
3      | 29              | 17        | 841   | 289   | 493
4      | 31              | 21        | 961   | 441   | 651
5      | 31              | 26        | 961   | 676   | 806
6      | 42              | 34        | 1764  | 1156  | 1428
7      | 44              | 30        | 1936  | 900   | 1320
8      | 45              | 38        | 2025  | 1444  | 1710
9      | 47              | 45        | 2209  | 2025  | 2115
10     | 57              | 61        | 3249  | 3721  | 3477
Total  | 351             | 284       | 14571 | 10796 | 12300
Finding the coefficient of correlation

ȳ = 351/9 = 39, x̄ = 284/9 ≈ 31.56

Covariance = Σxy/n − x̄·ȳ = 12300/9 − 31.56 × 39 = 136

s_x = √(10796/9 − 31.56²) ≈ 14.28, s_y = √(14571/9 − 39²) ≈ 9.90

r = 136 / (14.28 × 9.90) ≈ 0.96, a strong positive linear relationship.
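The hand calculation above can be checked with a short script. This is an illustrative sketch, not part of the lecture; the variable names (mean_x, cov_xy, and so on) are my own.

```python
# Sales data for the 9 representatives used in the lecture's calculation:
# x = number of outlets visited, y = last quarter's sales ($000s).
x = [12, 17, 21, 26, 34, 30, 38, 45, 61]
y = [25, 29, 31, 31, 42, 44, 45, 47, 57]
n = len(x)

mean_x = sum(x) / n   # 284/9 ≈ 31.56
mean_y = sum(y) / n   # 351/9 = 39.0

# Covariance with the 1/n divisor, as in the lecture's formula
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

# Standard deviations (also with 1/n) and Pearson's r
sd_x = (sum((xi - mean_x) ** 2 for xi in x) / n) ** 0.5
sd_y = (sum((yi - mean_y) ** 2 for yi in y) / n) ** 0.5
r = cov_xy / (sd_x * sd_y)

print(round(cov_xy, 1), round(r, 3))  # 136.0 0.962
```

This reproduces the covariance of 136 found above, and gives the correlation coefficient (Multiple R in the Excel output discussed later) of about 0.96.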
Simple regression analysis

Hence, if a relationship between the variables exists (as we can see from the correlation coefficient), we would be interested in predicting the behaviour of one variable, say y, from the behaviour of the other, say x:
- the predictor or independent variable, denoted x;
- the dependent variable, denoted y.
Simple regression analysis

For example, the relationship between sales and the number of outlets visited could be well approximated by the line:

Sales = a + b × number of outlets visited

(where a is the level of sales when no outlets are visited, i.e. x = 0, and b is the extra sales per additional outlet visited). Or:

y = a + bx
Simple regression analysis The problem is we could draw many possible lines. Which one to choose?
Simple regression analysis

Well, we choose the line that minimizes the sum of squared vertical distances between the data points and the line (see the graph!) to ensure the best fit. This is the least-squares line.
Simple regression analysis

For example, let's estimate the regression line for our data on sales by minimizing the sum of squared differences between the data and the line:

Sales = a + b × number of outlets visited

The coefficient b of such a line is found using the formula:

b = (Σxy − n·x̄·ȳ) / (Σx² − n·x̄²)

The coefficient a of such a line is found using the formula:

a = ȳ − b·x̄
Simple regression analysis

Hence,

b = (12300 − 9 × 31.56 × 39) / (10796 − 9 × 31.56²) = 1224 / 1834.2 ≈ 0.6673

a = 39 − 0.6673 × 31.56 ≈ 17.94
Simple regression analysis

sales = 17.94 + 0.6673x

Wow, we can now predict the sales by looking at the number of outlets visited by a sales representative! In our case, if the number of outlets visited by a representative increases by one, sales increase by 0.6673 thousand dollars, or $667.30.
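As a quick check on the arithmetic, the least-squares coefficients can be recomputed directly from the column totals in the table above (an illustrative sketch, not part of the lecture; the variable names are my own):

```python
# Column totals from the sales table (n = 9 representatives)
n, sum_x, sum_y, sum_x2, sum_xy = 9, 284, 351, 10796, 12300

mean_x, mean_y = sum_x / n, sum_y / n

# Least-squares formulas from the lecture:
# b = (Σxy − n·x̄·ȳ) / (Σx² − n·x̄²),  a = ȳ − b·x̄
b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
a = mean_y - b * mean_x

print(round(b, 4), round(a, 2))  # 0.6673 17.94
```

The full-precision slope is about 0.66731; the slides round it to 0.6673.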
Simple regression analysis

After we have derived the regression line, you have to ask yourself how well the line actually fits the data: the "goodness of fit" of the regression. Consider an example: the average sales are 351/9 = 39. Take any one value, say representative #8, who visited x = 38 outlets and achieved sales of y = 45. The regression predicts:

ŷ = 17.94 + 0.6673 × 38 = 43.29
Simple regression analysis

Look at the graph for representative #8 (x = 38, y = 45):

du = 45 − 43.29 = 1.71 (unexplained: from the point down to the line)
de = 43.29 − 39 = 4.29 (explained: from the line down to the mean)
dt = y − ȳ = 45 − 39 = 6 (total: from the point down to the mean)

[Graph: sales against number of outlets visited, with the fitted line (a = 17.94, b = 0.6673) and the mean line ȳ = 39 marked.]
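The decomposition for representative #8 can be verified numerically (a sketch with my own variable names; note that with full-precision coefficients the fitted value comes out as about 43.30, while the slides use coefficients rounded to 17.94 and 0.6673 and so show 43.29):

```python
x = [12, 17, 21, 26, 34, 30, 38, 45, 61]
y = [25, 29, 31, 31, 42, 44, 45, 47, 57]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares coefficients at full precision
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x

# Representative #8: x = 38, y = 45
y_hat = a + b * 38
du = 45 - y_hat        # unexplained deviation (point to line)
de = y_hat - mean_y    # explained deviation (line to mean)
dt = 45 - mean_y       # total deviation (point to mean)

# The three deviations satisfy dt = de + du exactly
print(round(y_hat, 2), round(du, 2), round(de, 2), round(dt, 2))
```

Whatever the rounding, the identity dt = de + du holds exactly, since de and du are defined by splitting dt at the fitted line.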
Simple regression analysis

Hence, we could say that on average we generate 39 thousand dollars in sales. When representative #8 visits 38 outlets, the regression predicts sales of 43.29 thousand dollars. So the regression explains part of the deviation from the mean, de (the explained deviation), while du (the unexplained deviation) is the part left unexplained. The total deviation dt is simply the sum of both: dt = de + du. Summing such deviations across all observations gives us:

Σ(yᵢ − ȳ) = Σ(ŷᵢ − ȳ) + Σ(yᵢ − ŷᵢ)

But, as you probably remember from our previous lectures, deviations from the mean sum to zero, so this sum is not useful on its own.
Simple regression analysis

Hence, we use the sums of squared deviations to see how well our regression fits the data, and we denote:

Σ(yᵢ − ȳ)² — Total Sum of Squares (TSS)
Σ(ŷᵢ − ȳ)² — Regression (Explained) Sum of Squares (ESS)
Σ(yᵢ − ŷᵢ)² — Residual (Unexplained) Sum of Squares (RSS)

with TSS = ESS + RSS. The coefficient of determination (R-squared) is:

R² = ESS / TSS = 1 − RSS / TSS

R² lies between 0 and 1 and gives the proportion of the variation in y explained by the regression.
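The sums of squares and R² for the sales regression can be computed as follows (an illustrative sketch, not the lecture's own code; tss, ess, rss are my names for the three quantities defined above):

```python
x = [12, 17, 21, 26, 34, 30, 38, 45, 61]
y = [25, 29, 31, 31, 42, 44, 45, 47, 57]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Fit the least-squares line and compute fitted values
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

tss = sum((yi - mean_y) ** 2 for yi in y)              # total SS
ess = sum((yh - mean_y) ** 2 for yh in y_hat)          # explained SS
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual SS

r_squared = ess / tss
print(round(tss, 1), round(r_squared, 3))  # 882.0 0.926
```

Note that 0.926 ≈ 0.962², illustrating that in simple regression R² is the square of the correlation coefficient computed earlier.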
Simple regression analysis

Now, look at the regression output (from Excel) below:

[Excel regression output: Multiple R ≈ 0.96, R Square ≈ 0.93, Intercept = 17.94, slope = 0.6673, t Stat for the slope = 9.363761.]
Simple regression analysis

As you have probably noticed, the good thing is that we do not need to do all these calculations manually; Excel reports them to us! And you can easily identify all the components we looked at today: the correlation coefficient (Multiple R), R-squared, and the regression coefficients (a = 17.94 and b = 0.66). The only part left to explain, to finalize our discussion today, is what the reported t-statistic means.
Simple regression analysis

As you have probably noticed, the estimated coefficients (a = 17.94 and b = 0.66), or estimates, are obtained from a sample! The t-statistic tests the hypothesis that the population regression coefficient β is 0, that is, H0: β = 0, against the alternative hypothesis H1: β ≠ 0. So the t-statistic shows us whether the estimate is significantly different from zero. In our example, the t-statistic for the slope is equal to 9.363761.
Simple regression analysis

The t-statistic is computed as t = b / SE(b), where SE(b) is the standard error of the sample coefficient. Please note, the only difference from the z-statistic we used in our previous example is that we use the estimated standard error of the sample coefficient in the formula above! Should we reject H0 at the 5% level of significance? To decide, you can look either at the p-value or at the confidence interval reported in the Excel regression output. Using p-values: p = 2 × P(T > |t|). In other words, a p-value less than the significance level leads us to reject the null hypothesis H0: β = 0.
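The t-statistic Excel reports for the slope can be reproduced from first principles (a sketch; se_b and the other names are mine). With 9 observations the test has n − 2 = 7 degrees of freedom, for which the two-tailed 5% critical value is about 2.365, so H0: β = 0 is firmly rejected:

```python
x = [12, 17, 21, 26, 34, 30, 38, 45, 61]
y = [25, 29, 31, 31, 42, 44, 45, 47, 57]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Fit the least-squares line
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
a = mean_y - b * mean_x

# Residual sum of squares and the standard error of the slope:
# SE(b) = sqrt( RSS / (n - 2) / Σ(x − x̄)² )
rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = (rss / (n - 2) / sxx) ** 0.5

t = b / se_b
print(round(t, 3))  # 9.364, matching Excel's 9.363761
```

Since 9.36 is far beyond the critical value 2.365, the p-value is tiny and we conclude that the number of outlets visited is a significant predictor of sales.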
Further Reading and Reference

Chapter S3, Swift & Piff (2001, 2nd edition) Quantitative Methods for Business, Management and Finance, Palgrave.
Chapters 15 & 16, Curwin, J. & Slater, R. (2002, 5th edition) Quantitative Methods for Business Decisions, Thomson.
Chapter 3, Burton, G., Carrol, G. & Wall, S. (2002, 2nd edition) Quantitative Methods for Business & Economics, Financial Times / Prentice Hall.
Chapter 11, Bancroft & O'Sullivan (2000) Foundations in Quantitative Business Techniques, McGraw-Hill Publishing.