Ch14: Linear Correlation and Regression
13 Mar 2012
Dr. Sean Ho, busi275.seanho.com
Please download: 09-Regression.xls
HW6 this week; projects
Outline for today
- Correlation
  - r² as fraction of variability
  - t-test on correlation
- Prediction using Simple Linear Regression
  - Linear regression model
  - Regression in Excel
- Analysis of variance: Model vs. Residual
  - The global F-test
  - t-test on slope b₁
  - Confidence intervals on predicted values

Linear correlation
- Correlation measures the strength of a linear relationship between two variables
  - Does not determine direction of causation
  - Does not imply a direct relationship: there might be a mediating variable (e.g., between ice cream and drownings)
  - Does not account for non-linear relationships
- The Pearson product-moment correlation coefficient (r) lies between −1 and +1
  - Close to −1: inverse relationship
  - Close to 0: no linear relationship
  - Close to +1: positive relationship

Correlation on scatterplots
[Five scatterplots of y vs. x, illustrating r = −1, r = −0.6, r = 0, r = −0.2, and r = +0.3]

Correlation is an effect size
- We often want to understand the variance in our outcome variable (e.g., sales: why are they high or low?)
- What fraction of the variance in one variable is explained by a linear relationship with the other?
  - e.g., 50% of the variability in sales is explained by the size of the advertising budget
- The effect size is r²: a fraction from 0% to 100%
  - Also called the coefficient of determination
[Venn diagram: Sales and Advert circles overlapping in their shared variability]

Correlation: t-test
- r is the sample correlation (from the data); ρ is the population correlation (what we want to estimate)
- Hypothesis: H_A: ρ ≠ 0 (is there a relationship?)
- Standard error: SE = √((1 − r²) / (n − 2))
  - 1 − r² is the variability not explained by the linear relationship
  - df = n − 2, because we have two sample means
- Test statistic: t = r / SE
- Use TDIST() to get the p-value
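As a sketch of this computation (Python standing in for the Excel formulas; the function and variable names here are mine, not from the slides):

```python
import math

def correlation_t_test(r, n):
    """t-test of H0: rho = 0 for a sample correlation r from n observations.

    SE uses the unexplained variability (1 - r^2) with df = n - 2.
    """
    df = n - 2
    se = math.sqrt((1 - r**2) / df)
    t = r / se
    return t, df

# Numbers from the Angry Birds example on the next slide: r = 0.72, n = 8
t, df = correlation_t_test(0.72, 8)
print(round(t, 2), df)  # 2.54 6
```

The two-tailed p-value then comes from Excel's TDIST(t, df, 2), as the slide says.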

Correlation: example
- e.g., is there a linear relationship between caffeine intake and time spent in Angry Birds?
- H_A: ρ ≠ 0 (i.e., there is a relationship)
- Data: 8 participants, r = 0.72
- Effect size: r² = 0.72² = 51.84%
  - About half of the variability in AB time is explained by caffeine intake
- Standard error: SE = √((1 − 0.72²) / 6) ≈ 0.283
- Test statistic: t = 0.72 / 0.283 ≈ 2.54
- p-value: TDIST(2.54, 6, 2) → 4.41%
- At α = 0.05, there is a significant relationship

Correlation: Excel
- Example: “Trucks” in 09-Regression.xls
- Scatterplot: POE Gross (G:G), WIM Gross (H:H)
- Correlation: CORREL(dataX, dataY)
- Coefficient of determination: r²
- t-test: sample r → SE → t-score → p-value

Correlation and χ² test of independence
- Pearson correlation is for two quantitative (continuous) variables
- For ordinal variables, there exists a non-parametric version by Spearman (r_s)
- What about two categorical variables?
  - χ² test of goodness-of-fit (ch13)
  - 2-way contingency tables (pivot tables)
  - Essentially a hypothesis test on independence

Outline for today
- Correlation
  - r² as fraction of variability
  - t-test on correlation
- Prediction using Simple Linear Regression
  - Linear regression model
  - Regression in Excel
- Analysis of variance: Model vs. Residual
  - The global F-test
  - t-test on slope b₁
  - Confidence intervals on predicted values

Regression: the concept
- Regression is about using one or more IVs to predict values of the DV (outcome variable)
  - e.g., if we increase the advertising budget, will our sales increase?
- The model describes how to predict the DV
  - Input: values for the IV(s). Output: a DV value
- Linear regression uses linear functions (lines, planes, etc.) for the models
  - e.g., Sales = 0.5*AdvBudget + 2000:
    - Every $1k increase in advertising budget yields 500 additional sales, and
    - With $0 spending, we'll still sell 2000 units

Regression: the model
- The linear model has the form Y = β₀ + β₁X + ε
  - X is the predictor, Y is the outcome
  - β₀ (intercept) and β₁ (slope) describe the line of best fit (trend line)
  - ε represents the residuals: where the trend line doesn't fit the observed data
    - ε = (actual Y) − (predicted Y)
- The residuals average out to 0, and if the model fits the data well, they should be small overall
- Least-squares fit: minimize the SD of the residuals
[Scatterplot of y vs. x with the trend line and a residual ε marked]
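The least-squares fit can be sketched directly; this minimal Python version mirrors Excel's SLOPE() and INTERCEPT() (the data points are made up for illustration):

```python
def least_squares_fit(xs, ys):
    """Slope b1 and intercept b0 of the least-squares line y = b0 + b1*x,
    equivalent to Excel's SLOPE(dataY, dataX) and INTERCEPT(dataY, dataX)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)               # Σ(x - x̄)²
    sp_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sp_xy / ss_x
    b0 = my - b1 * mx                                    # line passes through (x̄, ȳ)
    return b0, b1

# Toy data lying exactly on y = 2 + 0.5x
b0, b1 = least_squares_fit([1, 2, 3, 4], [2.5, 3.0, 3.5, 4.0])
print(b0, b1)  # 2.0 0.5
```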

Regression: Trucks example
- Scatterplot: X: POE Gross (G:G), Y: WIM Gross (H:H)
  - Layout → Trendline → Linear, R²
- Regression model:
  - Slope b₁: SLOPE(dataY, dataX)
  - Intercept b₀: INTERCEPT(dataY, dataX)
  - SD of the residuals: STEYX(dataY, dataX)

Regression: assumptions
- Both IV and DV must be quantitative (extensions exist for other levels of measurement)
- Independent observations
  - Not repeated-measures or hierarchical
- Normality of residuals
  - The DV need not be normal, but the residuals do
- Homoscedasticity
  - The SD of the residuals is constant along the line
- These 4 together are called parametricity
- The t-test had similar assumptions

Regression: Banks example
- “Banks” in 09-Regression.xls
- Scatterplot: X: Employees (D:D), Y: Profit (C:C)
  - Layout → Trendline
- Correlation r: CORREL(dataY, dataX)
- Regression model:
  - Intercept b₀: INTERCEPT(dataY, dataX)
  - Slope b₁: SLOPE(dataY, dataX)
  - SD of residuals (s_ε): STEYX(dataY, dataX)

Using regression for prediction
- Assuming that our linear model is correct, we can then predict profits for new companies, given their size (number of employees):
  - Profit ($mil) = 0.039*Employees − 36.442
  - e.g., for a company with 1000 employees, our model predicts a profit of $2.558 million
  - This is a point estimate; s_ε adds uncertainty
- Predicted Ŷ values, using X values from the data:
  - Citicorp: Ŷ = 0.039*93700 − 36.442 ≈ 3618
- Residuals: (actual Y) − (predicted Y):
  - Y − Ŷ = 3591 − 3618 ≈ −27.73 ($mil)
  - We overestimated Citicorp's profit by $27.73 mil
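Prediction and residuals from the fitted line can be sketched as follows. Note the intercept below is back-solved from the slide's 1000-employee example (the slide itself only shows the slope), so treat it as approximate:

```python
def predict(b0, b1, x):
    """Point estimate from the fitted line: y-hat = b0 + b1*x."""
    return b0 + b1 * x

# Banks model: slope 0.039 from the slide; intercept -36.442 reconstructed
# from 0.039*1000 + b0 = 2.558 (the 1000-employee example), so approximate.
b0, b1 = -36.442, 0.039

y_hat = predict(b0, b1, 1000)
print(round(y_hat, 3))  # 2.558 ($mil)

# Citicorp residual: actual minus predicted (values in $mil); about -27,
# matching the slide's figure up to rounding of the coefficients
residual = 3591 - predict(b0, b1, 93700)
```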

Outline for today
- Correlation
  - r² as fraction of variability
  - t-test on correlation
- Prediction using Simple Linear Regression
  - Linear regression model
  - Regression in Excel
- Analysis of variance: Model vs. Residual
  - The global F-test
  - t-test on slope b₁
  - Confidence intervals on predicted values

Analysis of variance
- In regression, R² indicates the fraction of variability in the DV explained by the model
  - If only 1 IV, then R² = r² from correlation
- Total variability in DV: SS_tot = Σ(yᵢ − ȳ)² = VAR(dataY) * (COUNT(dataY) − 1)
- Explained by model: SS_mod = SS_tot * R²
- Unexplained (residual): SS_res = SS_tot − SS_mod
  - Can also get from Σ(yᵢ − ŷᵢ)²
- Hence the total variability is decomposed into: SS_tot = SS_mod + SS_res (book: SST = SSR + SSE)

Regression: global F-test
Follow the pattern from the regular SD:

              | Total (on DV)        | Model                | Residual
  SS          | SS_tot = Σ(y − ȳ)²   | SS_mod = Σ(ŷ − ȳ)²   | SS_res = Σ(y − ŷ)²
  df          | n − 1                | #vars − 1            | n − #vars
  MS = SS/df  | SS_tot / (n − 1)     | SS_mod / 1           | SS_res / (n − 2)
  SD = √(MS)  | s_Y                  | (none)               | s_ε (= STEYX)

- The test statistic is F = MS_mod / MS_res
- Get the p-value from FDIST(F, df_mod, df_res)

Calculating the F test
- Key components are SS_mod and SS_res
- If we already have R², the easiest way is:
  - Find SS_tot = VAR(dataY) * (n − 1); e.g., Banks: ≈ 39e6
  - Find SS_mod = SS_tot * R²; e.g., 39e6 * 88.53% ≈ 34e6
  - Find SS_res = SS_tot − SS_mod; e.g., 39e6 − 34e6 ≈ 5e6
- Otherwise, find SS_res using the predicted ŷ and the residuals
  - Or work backwards from s_ε = STEYX(dataY, dataX); e.g., SS_res = (301)² * (n − 2)
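The recipe above can be sketched numerically; this uses toy numbers (not the Banks data), and the function name is mine:

```python
def global_f(ss_tot, r2, n):
    """Global F-test for simple regression (one predictor):
    decompose SS_tot via R^2, then F = MS_mod / MS_res
    with df_mod = 1 and df_res = n - 2."""
    ss_mod = ss_tot * r2       # explained by the model
    ss_res = ss_tot - ss_mod   # unexplained (residual)
    ms_mod = ss_mod / 1
    ms_res = ss_res / (n - 2)
    return ms_mod / ms_res

# Toy numbers: SS_tot = 100, R^2 = 0.75, n = 12
f = global_f(100, 0.75, 12)
print(f)  # 30.0  (p-value would come from FDIST(F, 1, 10) in Excel)
```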

F-test on R² vs. t-test on r
- If there is only one predictor, the tests are equivalent: F = t²
  - e.g., Banks: F ≈ 378, t ≈ 19.4
  - The F-distribution with df_mod = 1 is the same as the t-distribution (using the same df_res)
- If there are multiple IVs, then there are multiple r's
  - Correlation only works on pairs of variables
  - The F-test is for the overall model with all predictors
  - R² indicates the fraction of variability in the DV explained by the complete model, including all predictors

Outline for today
- Correlation
  - r² as fraction of variability
  - t-test on correlation
- Prediction using Simple Linear Regression
  - Linear regression model
  - Regression in Excel
- Analysis of variance: Model vs. Residual
  - The global F-test
  - t-test on slope b₁
  - Confidence intervals on predicted values

t-test on slopes b_i
- In a model with multiple predictors, there will be multiple slopes (b₁, b₂, …)
- A t-test can be run on each b_i to test if that predictor is significantly correlated with the DV
- Let SS_X = Σ(x − x̄)² be for the predictor X; then the standard error for its slope b₁ is SE(b₁) = s_ε / √SS_X
- Obtain the t-score and apply a t-distribution with df_res: =TDIST(b₁ / SE(b₁), df_res, tails)
- If there is only 1 IV, the t-score is the same as for r
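A minimal sketch of the slope test with toy numbers (in practice s_ε comes from STEYX and SS_X from the predictor's data):

```python
import math

def slope_t(b1, s_eps, ss_x):
    """t-score for H0: beta1 = 0, using SE(b1) = s_eps / sqrt(SS_X)."""
    se_b1 = s_eps / math.sqrt(ss_x)
    return b1 / se_b1

# Toy numbers: b1 = 0.5, residual SD s_eps = 2.0, SS_X = 64
t = slope_t(0.5, 2.0, 64)
print(t)  # 2.0  (p-value via TDIST(t, df_res, tails) in Excel)
```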

Summary of hypothesis tests
Regression with only 1 IV is the same as correlation; all of these tests are then equivalent:

                  | Correlation        | Regression            | Slope on X₁
  Effect size     | r                  | R²                    | b₁
  SE              | √((1 − r²) / df)   | (none)                | s_ε / √SS_X
  df              | n − 2              | df₁ = #var − 1,       | n − #var
                  |                    | df₂ = n − #var        |
  Test statistic  | t = r / SE(r)      | F = MS_mod / MS_res   | t = b₁ / SE(b₁)

Confidence intervals on predictions
- Given a value x for the IV, our model predicts a point estimate ŷ for the (single) outcome: ŷ = b₀ + b₁x
- The standard error for this estimate is SE(ŷ) = s_ε √(1 + 1/n + (x − x̄)² / SS_X)
  - Recall that SS_X = Σ(x − x̄)²
- Confidence interval: ŷ ± t * SE(ŷ)
- When estimating the average outcome, use SE(ŷ) = s_ε √(1/n + (x − x̄)² / SS_X)
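These are the standard simple-regression interval formulas; a small Python sketch with toy numbers (function names are mine):

```python
import math

def se_single(s_eps, n, x, x_bar, ss_x):
    """SE for predicting a single new outcome at x (prediction interval)."""
    return s_eps * math.sqrt(1 + 1/n + (x - x_bar)**2 / ss_x)

def se_mean(s_eps, n, x, x_bar, ss_x):
    """SE for the average outcome at x (narrower than se_single)."""
    return s_eps * math.sqrt(1/n + (x - x_bar)**2 / ss_x)

# At x = x_bar the (x - x_bar)^2 term vanishes; toy numbers: s_eps = 2, n = 4
print(se_mean(2.0, 4, 5.0, 5.0, 10.0))    # 1.0
print(se_single(2.0, 4, 5.0, 5.0, 10.0))  # about 2.236
```

The interval is then ŷ ± t * SE, with t from the t-distribution with n − 2 df (TINV in Excel).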

TODO
- HW6 due Thu
- Projects: be pro-active and self-led
  - If waiting on REB approval: generate fake (but reasonable) data and move forward on analysis and presentation
  - Remember your potential clients: what questions would they like answered?
  - Tell a story/narrative in your presentation