Chapter 12 Correlation & Regression

Examine the relationship among two or more random variables:
- Visual display
- Numerical analysis: correlation analysis and regression analysis
04/05/06 BUS304 – Chapter 12-13 Multivariate Analysis

Visual Display
How can we display the relationship between two variables, e.g. the relationship between a car's mileage and the car's value? With a scatter plot!
Exercise: create a scatter plot from the data file.

Typical Scatter Plots
(Figure: four example scatter plots showing a positive relation, a negative relation, no correlation, and a non-linear relation.)

Numerical Measures for the Relationship
Numerical measures capture the relationship formally, so that we can conduct higher-level analysis. Commonly used measures:
- Covariance: can be any real number (positive, negative, or 0). It captures the co-movement of the two variables, and its sign indicates the direction of the trend line. Its magnitude depends on the units in which x and y are measured.
- Correlation: a standardized measure derived from the covariance. Its value lies between -1 and 1, and it measures the degree of linearity.
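
A minimal Python sketch of the sample covariance, cov(x, y) = Σ(x_i - x̄)(y_i - ȳ) / (n - 1). The (n - 1) denominator is an assumption (some texts divide by n), and the mileage/value numbers are made up for illustration. It shows why covariance alone is hard to interpret: changing the units of y rescales it even though the relationship is unchanged.

```python
def sample_covariance(x, y):
    """Sample covariance with an (n - 1) denominator (assumed convention)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Average cross-product of deviations from the two means
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical data: mileage (thousands of miles) and value (thousands of dollars)
mileage = [10, 30, 50, 70, 90]
value = [28, 24, 19, 15, 10]

cov_thousands = sample_covariance(mileage, value)
# Same cars, value recorded in dollars instead of thousands of dollars
cov_dollars = sample_covariance(mileage, [v * 1000 for v in value])

print(cov_thousands)  # negative: value tends to fall as mileage rises
print(cov_dollars)    # 1000 times larger, although the relationship is the same
```

Both results share the same sign (the direction of the trend), but their magnitudes cannot be compared across units, which is what the standardized correlation fixes.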

Correlation Coefficient
Formula:
$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} $$
Use Excel to compute the correlation:
- the worksheet function =CORREL()
- the Data Analysis tool → Correlation
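
For readers who want to see the formula at work outside Excel, here is a small Python sketch that computes r term by term on hypothetical mileage/value data. On the same two columns, Excel's =CORREL() (e.g. =CORREL(A2:A6, B2:B6), with hypothetical cell ranges) should give the same value. Unlike covariance, r does not change when y is rescaled.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r, computed directly from the formula."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: mileage (thousands of miles) and value (thousands of dollars)
mileage = [10, 30, 50, 70, 90]
value = [28, 24, 19, 15, 10]

r = pearson_r(mileage, value)
r_dollars = pearson_r(mileage, [v * 1000 for v in value])
print(round(r, 4))         # close to -1: a strong, nearly linear negative relation
print(abs(r - r_dollars))  # essentially 0: r does not depend on the units of y
```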

Correlation Estimation and Typical Scatter Plots
(Figure: typical scatter plots with their estimated correlations.)

Values of Correlation
- If the scatter plot lies exactly on an upward-sloping line, the correlation is +1; if it lies exactly on a downward-sloping line, the correlation is -1.
- The correlation of a random variable with itself is +1.
- If the value of x has no impact on y (the two are independent), the correlation is 0. Example: the payoff of the first round of a coin-flip game and the payoff of the second round.
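
A quick Python check of these cases, reusing the correlation formula. The lines and the coin-flip payoffs ($1 won on heads, $1 lost on tails) are hypothetical. Note that with a finite sample, two independent games give an r close to 0 but not exactly 0.

```python
import math
import random


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * xi + 1 for xi in x])     # points exactly on an upward line
r_down = pearson_r(x, [10 - 3 * xi for xi in x])  # points exactly on a downward line
r_self = pearson_r(x, x)                          # a variable with itself

# Two independent rounds of a coin-flip game (hypothetical payoffs of +1 / -1)
rng = random.Random(2006)  # fixed seed so the run is repeatable
round1 = [rng.choice((-1, 1)) for _ in range(10_000)]
round2 = [rng.choice((-1, 1)) for _ in range(10_000)]
r_coins = pearson_r(round1, round2)  # near 0, but not exactly 0 in a finite sample

print(r_up, r_down, r_self, round(r_coins, 3))
```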

Testing the Population Correlation
Population correlation coefficient: ρ. Sample correlation coefficient: r.
Goal: determine whether ρ ≥ 0, ρ ≤ 0, or ρ = 0 based on the sample coefficient r.
Theorem: the t-value for r is
$$ t = \frac{r}{\sqrt{\dfrac{1-r^2}{n-2}}} $$
When ρ = 0, this t-value follows Student's t-distribution with n - 2 degrees of freedom.
- When r > 0, the t-value is positive.
- When r < 0, the t-value is negative.
- When r = 0, the t-value is 0.
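
A minimal Python sketch of the t-value formula. As a sanity check it uses r and n from the Midwest regression report later in this chapter (Multiple R = 0.832534056, Observations = 12; the slope there is positive, so r equals Multiple R).

```python
import math


def t_for_r(r, n):
    """t-value for testing rho = 0; compare with a t table at n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# r and n from the Midwest regression report
t = t_for_r(0.832534056, 12)
print(round(t, 4))
```

The result agrees with the t Stat of the slope in that report (4.752397): in simple regression, testing ρ = 0 and testing β1 = 0 give the same t-value.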

Hypothesis Test
Take the example of problem 12.6 (p. 478).
1. Write down the hypotheses: H0: ρ ≥ 0, HA: ρ < 0.
2. Write down the decision rule (lower-tail test, where t_α is the one-tailed critical value): if t < -t_α, reject H0; if t ≥ -t_α, do not reject H0.
3. Make the decision: compute r, then the t-value of r; find t_α (with n - 2 degrees of freedom) in the t table; compare t with -t_α to make the decision.
We reject H0 when the t-value of the sample r is too low.
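
The three steps can be sketched in Python as follows. The two sample correlations are hypothetical (they are not the data of problem 12.6); the critical value 1.812 is the t table entry for a one-tailed α = 0.05 with 10 degrees of freedom, matching the assumed sample size n = 12.

```python
import math


def lower_tail_corr_test(r, n, t_alpha):
    """Test H0: rho >= 0 against HA: rho < 0.

    t_alpha is the positive one-tailed critical value for n - 2 degrees of
    freedom, taken from a t table; H0 is rejected when t < -t_alpha.
    """
    t = r / math.sqrt((1 - r ** 2) / (n - 2))
    return t, t < -t_alpha


# Hypothetical samples of size n = 12; t table: alpha = 0.05, df = 10 -> 1.812
t1, reject1 = lower_tail_corr_test(-0.6, 12, 1.812)
t2, reject2 = lower_tail_corr_test(-0.3, 12, 1.812)
print(round(t1, 3), reject1)  # t below -1.812: reject H0
print(round(t2, 3), reject2)  # t above -1.812: do not reject H0
```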

Exercise
Problem 12.7

Practice on Correlation Models
Type 1: start with a conjecture.
E.g. there is a negative correlation between the amount of money a person spends on grocery shopping and the amount spent on dining out. Justification: a person tends to do less grocery shopping when he/she eats in restaurants more. Collect data and conduct the test to verify the conjecture.
Type 2: start without a clear conjecture.
Based on the available data, check every pair of variables for a strong correlation. If there is one, treat it as a "warning": observe and study why. You may find a surprising answer. This is the idea behind data mining.
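
A minimal Python sketch of the Type 2 idea: compute r for every pair of variables and flag the strong ones. The data set and the cut-off |r| ≥ 0.7 are hypothetical choices for illustration; a flagged pair is only a starting point for the "observe and study why" step, not a conclusion.

```python
import itertools
import math


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(data, threshold=0.7):
    """Return (name_a, name_b, r) for every pair of variables with |r| >= threshold."""
    flagged = []
    for a, b in itertools.combinations(sorted(data), 2):
        r = pearson_r(data[a], data[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical data for five people: monthly grocery and dining-out spending, gym visits
data = {
    "grocery": [100, 90, 80, 70, 60],
    "dining_out": [20, 35, 40, 55, 70],
    "gym_visits": [3, 5, 1, 5, 3],
}
flagged = strong_pairs(data)
for a, b, r in flagged:
    print(f"warning: {a} vs {b}, r = {r:.3f}")
```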

Comments on Correlation Analysis
- It can only identify co-movement; it cannot indicate causality.
- Sometimes a third variable (factor) explains the co-movement. Correlation analysis cannot help you find that underlying factor.
- Sometimes multiple factors affect the co-movement, and the interaction among them makes the co-movement unpredictable.
We need higher-level analysis to get a better understanding.
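
To make the "third variable" point concrete, here is a small Python simulation under hypothetical assumptions: a lurking variable z drives both x and y, and neither x nor y affects the other. With these noise levels the theoretical correlation of x and y is 1 / 1.25 = 0.8. Once z's contribution is removed (possible here only because the simulation generated z), the correlation essentially disappears.

```python
import math
import random


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(304)  # fixed seed so the run is repeatable
n = 5000
# Hypothetical lurking variable z (think: daily temperature)
z = [rng.gauss(0, 1) for _ in range(n)]
# x and y each depend on z plus independent noise; neither affects the other
x = [zi + rng.gauss(0, 0.5) for zi in z]  # think: ice cream sales
y = [zi + rng.gauss(0, 0.5) for zi in z]  # think: sunburn cases
r_xy = pearson_r(x, y)  # strong, roughly 0.8

# Remove z's contribution; only the independent noise terms remain
r_after = pearson_r([xi - zi for xi, zi in zip(x, z)],
                    [yi - zi for yi, zi in zip(y, z)])
print(round(r_xy, 2), round(r_after, 2))
```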

Simple Regression Analysis
- Also called "bivariate regression": it analyzes the relationship between two variables.
- It is regarded as a higher level of analysis than correlation analysis.
- It specifies one dependent variable (the response) and one independent variable (the predictor, the cause).
- It assumes a linear relationship between the dependent and the independent variable.
- The output of the analysis is a linear regression model, which is generally used to predict the dependent variable.

The Regression Model
$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i $$
The model assumes a linear relationship between two variables:
- x: the independent variable (the reason)
- y: the dependent variable (the result)
For example, x can represent the number of customers dining in a restaurant, and y the amount of tips collected by the waiter.
Parameters:
- β0: the intercept, which represents the expected value of y when x = 0.
- β1: the slope (also called the coefficient of x), which represents the expected change in y when x increases by 1.
- ε: the error term, the uncontrolled part.

Graphical Explanation of the Parameters
(Figure: assume this is a scatter plot of the population; the intercept β0 and the slope β1 are marked.)

Building the Model
The regression model is used to:
- predict the value of y;
- explain the impact of x on y.
Typical scenarios: x is easily observable, but y is not; or x is easily controllable, but y is not; or x affects y, but y cannot affect x.
The causality should be carefully justified before building the model. When assigning x and y, make sure which is the reason and which is the result; otherwise, the model is wrong!
Example from information systems research: "ease of use" vs. "usefulness". There may always be second thoughts about the direction of causality.

Example: Build the Regression Models
- At State University, a study was done to establish whether a relationship existed between a student's GPA at graduation and the student's SAT score when entering the university.
- The Skeleton Manufacturing Company recently did a study of its customers. A random sample of 50 customer accounts was pulled from the computer records. Two variables were observed: the total dollar volume of business this year, and the distance (in miles) between the customer and corporate headquarters.

Estimating the Coefficients
Regression model:
$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i $$
Given β0 = 2 and β1 = 3, if we know x = 4, we can expect y = 2 + 3 × 4 = 14.
But how do we know that β0 = 2 and β1 = 3? To know β0 and β1 exactly, we would need the population data for all x and y. Normally, we only have a sample. The trend line determined by a sample is an estimate of the population trend line.
The fitted model:
$$ \hat{y}_i = b_0 + b_1 x_i $$
b0 and b1 are estimates of β0 and β1; they are sample statistics. The hat indicates a predicted value.

Estimating the Coefficients from a Sample
Based on the sample collected, run a simple regression analysis to find the "best fitted line":
- the intercept of the line: b0
- the slope of the line: b1
They are estimates of β0 and β1. We can use b0 and b1 to predict y when we know x, with the prediction model ŷ = b0 + b1x.

How to Determine the Trend Line?
The trend line is also called the "best fitted line". How do we define "best fitted"? There could be many criteria. The most commonly used one is Ordinary Least Squares (OLS) regression: find the line with the smallest total squared residual.
Residual: for each sample data point i, the observed value y_i is unlikely to equal the predicted value ŷ_i exactly. The residual is
$$ e_i = y_i - \hat{y}_i $$
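
A minimal Python sketch of the least-squares criterion: compute the residuals and their sum of squares for a few candidate lines on a hypothetical sample, and keep the line with the smallest total. The first candidate, y = 1.8 + 1.2x, happens to be the least-squares line for these five points; OLS picks the best line among all possible lines, not just among a short list like this one.

```python
def residuals(x, y, b0, b1):
    """e_i = y_i - y_hat_i for the candidate line y_hat = b0 + b1 * x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def sse(x, y, b0, b1):
    """Sum of squared residuals for a candidate line."""
    return sum(e ** 2 for e in residuals(x, y, b0, b1))


# Hypothetical sample
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 7, 8]

candidates = {
    "y = 1.8 + 1.2x": (1.8, 1.2),
    "y = 1.0 + 1.5x": (1.0, 1.5),
    "y = 2.0 + 1.0x": (2.0, 1.0),
}
scores = {label: sse(x, y, b0, b1) for label, (b0, b1) in candidates.items()}
for label, score in scores.items():
    print(label, round(score, 2))

best = min(scores, key=scores.get)  # the candidate with the least squared residual
print("best:", best)
```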

Solution for OLS Regression
The objective: find the b0 and b1 that minimize the sum of squared residuals,
$$ \min_{b_0,\,b_1} \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2 $$
Solution using Excel:
- add a trend line to the scatter plot, or
- run a regression analysis (Data Analysis ToolPak).
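
Excel does the minimization for you. For reference, this minimization has a well-known closed-form solution: b1 = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)² and b0 = ȳ - b1·x̄. The Python sketch below implements those two formulas on a hypothetical sample. Excel's SLOPE() and INTERCEPT() worksheet functions should return the same numbers for the same data.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary; otherwise the slope is undefined")
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = ols_fit([1, 2, 3, 4, 5], [3, 5, 4, 7, 8])  # hypothetical sample
print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 1.80 + 1.20 x
```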

Exercise
Open "Midwest.xls", create a scatter plot, and add a trend line. Provide your estimate of y when x = 10, x = 0, and x = 4.
Residual: e_i, for each sample data point. In regression analysis, we assume that the residuals are normally distributed with mean 0. The smaller the variance of the residuals, the stronger the linear relationship.
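
A small Python illustration of the residual statements, on two hypothetical samples. One caveat worth knowing: when the fitted line has an intercept, OLS residuals always average exactly zero (up to rounding), so a zero mean is guaranteed by the fit and does not by itself confirm the normality assumption. The spread is measured by the residual standard deviation sqrt(SSE / (n - 2)), the quantity Excel reports as "Standard Error".

```python
import math


def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


x = [1, 2, 3, 4, 5]
samples = {
    "tight": [3.1, 4.9, 7.0, 9.1, 10.9],  # hypothetical, close to a line
    "loose": [2.0, 7.5, 4.0, 11.0, 9.0],  # hypothetical, widely scattered
}
spread = {}
for label, y in samples.items():
    b0, b1 = ols_fit(x, y)
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]          # residuals
    mean_e = sum(e) / len(e)                                   # 0 up to rounding
    s_e = math.sqrt(sum(ei ** 2 for ei in e) / (len(x) - 2))   # residual std. deviation
    spread[label] = (mean_e, s_e)
    print(label, round(mean_e, 10), round(s_e, 3))
```

The tight sample has a much smaller residual standard deviation, matching its stronger linear relationship.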

Add a Trend Line
Step 1: in your scatter plot, right-click one data point and choose "Add Trendline".
Step 2: on the "Options" tab, check "Display equation on chart", then click "OK".
Result: y = 175.8 + 49.91x
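
The estimates asked for in the Midwest exercise follow directly from the trend-line equation. A minimal Python sketch using the rounded coefficients shown on the chart (the full report values, 175.8288191 and 49.91007584, give slightly different numbers):

```python
def predict(x, b0=175.8, b1=49.91):
    """Point prediction from the trend line y = 175.8 + 49.91x (rounded chart coefficients)."""
    return b0 + b1 * x


predictions = {x: predict(x) for x in (10, 0, 4)}
for x, y_hat in predictions.items():
    print(f"x = {x:>2}: y_hat = {y_hat:.2f}")
```

This gives about 674.90, 175.80, and 375.44. The prediction at x = 0 is simply the intercept; it is only trustworthy if x = 0 lies within (or near) the range of x values in the sample.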

The "Fitness"
Sometimes it is just not a good idea to use a line to represent the relationship. Check how well the sample data form a line, that is, how well the model predicts.
(Figure: three scatter plots labeled "Not good!", "Kinda good", and "Better".)

Measuring the Fitness
The Sum of Squared Errors (SSE):
$$ \text{SSE} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 $$
- The smaller the SSE, the better the fit. In the extreme case, if every point lies on the line, there is no residual at all and SSE = 0 (every prediction is accurate).
- SSE also increases when the sample size gets larger (there are more terms to sum up), but this does not indicate a worse fit.
Other associated terms:
- SST, the total sum of squares (total variation of y)
- SSR, the regression sum of squares (variation of y explained by the model)
$$ \text{SST} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2, \qquad \text{SSR} = \sum_{i=1}^{n} \left(\hat{y}_i - \bar{y}\right)^2 $$
SST, SSR, and SSE have the following relationship:
$$ \text{SST} = \text{SSR} + \text{SSE} $$
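
A Python sketch that computes the three sums of squares for an OLS fit on a hypothetical sample and checks SST = SSR + SSE. It also duplicates the sample: SSE doubles, yet the fitted line and the share of explained variation stay the same, illustrating why a larger SSE from a larger sample is not a worse fit.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the OLS line fitted to (x, y)."""
    b0, b1 = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return sst, ssr, sse


x = [1, 2, 3, 4, 5]  # hypothetical sample
y = [3, 5, 4, 7, 8]
sst, ssr, sse = sums_of_squares(x, y)
print(sst, ssr + sse)  # equal up to rounding

# Same points listed twice: twice the SSE, same proportion explained
sst2, ssr2, sse2 = sums_of_squares(x * 2, y * 2)
print(sse, sse2, ssr / sst, ssr2 / sst2)
```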

A Standardized Measure of Fitness: R²
$$ R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}} $$
Interpretation: the proportion of the total variation in the dependent variable (y) that is explained by the regression model; in other words, the proportion that is not left in the residuals. The larger the R², the better the fit.
In the simple linear regression model, R² = r². Compute the correlation and verify.
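
The suggested verification, sketched in Python: compute R² from the sums of squares and r from the correlation formula on a hypothetical sample, then compare R² with r². The last line repeats the check on the Midwest report below, where Multiple R squared should reproduce R Square.

```python
import math


def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(x, y):
    """R^2 = 1 - SSE / SST for the OLS line."""
    b0, b1 = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]  # hypothetical sample
y = [3, 5, 4, 7, 8]
r2 = r_squared(x, y)
r = pearson_r(x, y)
print(round(r2, 4), round(r ** 2, 4))  # the two agree

# Midwest report: Multiple R = 0.832534056, R Square = 0.693112955
print(0.832534056 ** 2)
```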

Reading the Regression Report
Step 1: check the fitness, i.e. whether the model is appropriate.
Step 2: check the coefficients: is the slope of x too small?

Regression Statistics
Multiple R           0.832534056
R Square             0.693112955   (better if greater than 0.3; the greater, the better)
Adjusted R Square    0.662424251
Standard Error       92.10553441
Observations         12

                     Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept            175.8288191    54.98988674      3.197476   0.009532   53.30372    298.3539
Years with Midwest   49.91007584    10.50208428      4.752397   0.000777   26.50997    73.31018

Fitted line: y = 175.8 + 49.91x.
The P-value in the Intercept row is for testing H0: β0 = 0; the P-value in the Years with Midwest row is for testing H0: β1 = 0.
Interval estimates of β0 and β1 (confidence level 95%):
- β0: 53.30 to 298.35
- β1: 26.51 to 73.31
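
The report's derived columns can be reproduced from the coefficients and standard errors alone, which is a useful way to read them. In this Python sketch, t Stat = coefficient / standard error, the 95% limits are coefficient ± t × standard error, and Adjusted R Square = 1 - (1 - R²)(n - 1)/(n - 2). The critical value 2.228139 is the two-tailed 95% t value for n - 2 = 10 degrees of freedom, taken from a t table.

```python
# Values copied from the Midwest regression report
n = 12
r_square = 0.693112955
coef = {
    "Intercept": (175.8288191, 54.98988674),           # (coefficient, standard error)
    "Years with Midwest": (49.91007584, 10.50208428),
}
t_crit = 2.228139  # two-tailed 95% t value, df = n - 2 = 10

results = {}
for name, (b, se) in coef.items():
    t_stat = b / se                                    # "t Stat" column
    lower, upper = b - t_crit * se, b + t_crit * se    # "Lower 95%" / "Upper 95%" columns
    results[name] = (t_stat, lower, upper)
    print(f"{name}: t = {t_stat:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")

adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)
print(f"Adjusted R Square = {adj_r_square:.6f}")
```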

Confidence Interval Estimation
Input the required confidence level in the regression dialog (here 90%); the report then lists intervals at both 95% and the chosen level.

               Coefficients   Standard Error   t Stat      P-value    Lower 95%   Upper 95%   Lower 90.0%   Upper 90.0%
Intercept      32.64209       2.60924          12.51019    1.56E-06   26.62517    38.659      27.7900       37.494
X Variable 1   -0.64049       0.126544         -5.06142    0.000975   -0.9323     -0.34868    -0.8758       -0.4051
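
A Python sketch showing that a different confidence level only changes the t multiplier. The slide does not give the sample size; the 95% half-width divided by the standard error is about 2.306, the t table value for 8 degrees of freedom, so this sketch assumes df = 8 (an inference, not stated on the slide) and uses 1.859548 for the 90% level.

```python
# Rows copied from the report: (coefficient, standard error)
rows = {
    "Intercept": (32.64209, 2.60924),
    "X Variable 1": (-0.64049, 0.126544),
}

# Two-tailed critical t values for df = 8 (df inferred from the 95% interval widths)
t_crit = {95: 2.306004, 90: 1.859548}

intervals = {}
for name, (b, se) in rows.items():
    # interval = coefficient +/- t * standard error, for each confidence level
    intervals[name] = {level: (b - t * se, b + t * se) for level, t in t_crit.items()}
    for level, (lo, hi) in intervals[name].items():
        print(f"{name} {level}%: ({lo:.4f}, {hi:.4f})")
```

The 90% interval is narrower than the 95% one: accepting less confidence buys a tighter range.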

Hypothesis Test
People are normally interested in whether β1 is 0 or not, in other words, whether x has an impact on y. Based on the Excel report, such a test is very convenient: simply check whether the p-value of the coefficient is smaller than α.
Hypotheses: H0: β1 = 0, HA: β1 ≠ 0.
Decision rule: if p < α, reject the null hypothesis; if p ≥ α, do not reject the null hypothesis.
Compare p and α, and make the decision.
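
The decision rule is a one-line comparison. A minimal Python sketch applied to the P-values in the Midwest report; Excel's regression output gives two-tailed p-values, which matches the two-sided alternative HA: β1 ≠ 0. The α values other than 0.05 are chosen only to show how the decision depends on α.

```python
def p_value_decision(p_value, alpha=0.05):
    """Two-sided test of H0: coefficient = 0 using the report's P-value column."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return "reject H0" if p_value < alpha else "do not reject H0"


# P-values from the Midwest report
slope_decision = p_value_decision(0.000777)                      # slope, alpha = 0.05
intercept_at_01 = p_value_decision(0.009532, alpha=0.01)         # intercept, alpha = 0.01
intercept_at_005 = p_value_decision(0.009532, alpha=0.005)       # intercept, alpha = 0.005
print(slope_decision, intercept_at_01, intercept_at_005, sep="\n")
```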

When You Don't Have a Good Fit
A poor fit means the correlation between x and y is not strong enough. It is always a good idea to check the scatter plot first.
Case 1: maybe there are outliers (explain the outliers).
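
A small Python illustration of why the scatter plot check matters: eight hypothetical points lie exactly on a line, and adding a single hypothetical outlier drags the correlation from 1 down to roughly 0.35.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = list(range(1, 9))
y = [2 * xi + 1 for xi in x]  # hypothetical points exactly on a line
r_clean = pearson_r(x, y)

x_out = x + [9]
y_out = y + [0]               # one hypothetical outlier at (9, 0)
r_outlier = pearson_r(x_out, y_out)
print(round(r_clean, 3), round(r_outlier, 3))
```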

Not a Good Fit? (continued)
Case 2: check the variation of x. To have a good prediction model, the independent variable should cover a certain range. Collect more data while guaranteeing enough variation in x.
Case 3: inherently non-linear relationship.
- Non-linear regression (not required)
- Segmented regression: separate your data into groups and run a regression on each group separately.
(Figure: scatter plot of Y against X.)
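
A minimal Python sketch of segmented regression on hypothetical data that rises and then falls. One straight line through all points fits poorly (R² below 0.1), while separate OLS lines on each group fit well. The split at x = 5 is a hypothetical choice made by looking at the scatter plot; choosing the split point is itself a judgment call.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(x, y, b0, b1):
    """R^2 = 1 - SSE / SST for the line y_hat = b0 + b1 * x."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


x = list(range(1, 11))
y = [2, 4, 6, 8, 10, 8, 6, 4, 2, 0]  # hypothetical: rises, then falls

b0_all, b1_all = ols_fit(x, y)
r2_all = r_squared(x, y, b0_all, b1_all)  # one line for everything: poor fit

# Hypothetical split point chosen from the scatter plot: x <= 5 vs. x > 5
low = [(xi, yi) for xi, yi in zip(x, y) if xi <= 5]
high = [(xi, yi) for xi, yi in zip(x, y) if xi > 5]
b0_low, b1_low = ols_fit(*zip(*low))
b0_high, b1_high = ols_fit(*zip(*high))

print(f"single line: y_hat = {b0_all:.2f} + {b1_all:.2f}x, R^2 = {r2_all:.3f}")
print(f"x <= 5:      y_hat = {b0_low:.2f} + {b1_low:.2f}x")
print(f"x > 5:       y_hat = {b0_high:.2f} + {b1_high:.2f}x")
```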

Exercise
Problem 12.14 (page 498), Problem 12.15, Problem 12.19