BA 555 Practical Business Analysis


BA 555 Practical Business Analysis: Agenda
- Linear Regression Analysis
- Case Study: Cost of Manufacturing Computers
- Multiple Regression Analysis
- Dummy Variables

Regression Analysis
A technique to examine the relationship between an outcome variable (dependent variable, Y) and a group of explanatory variables (independent variables, X1, X2, …, Xk). The model allows us to understand (quantify) the effect of each X on Y. It also allows us to predict Y based on X1, X2, …, Xk.

Types of Relationship
Linear relationships:
- Simple linear: Y = b0 + b1 X + e
- Multiple linear: Y = b0 + b1 X1 + b2 X2 + … + bk Xk + e
Nonlinear relationships:
- Y = a0 exp(b1 X + e)
- Y = b0 + b1 X1 + b2 X1² + e, etc.
We will focus only on linear relationships.

Simple Linear Regression Model
The population carries the true effect of X on Y; the sample gives the estimated effect of X on Y.
Key questions:
1. Does X have any effect on Y?
2. If yes, how large is the effect?
3. Given X, what is the estimated Y?
ASSOCIATION ≠ CAUSALITY

Least Squares Method
The least squares line comes from a statistical procedure for finding the "best-fitting" straight line: it minimizes the sum of squared deviations of the observed values of Y from the predicted values.
[Plots: a badly fitting line vs. the least squares line, whose deviations are minimized]
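To make the procedure concrete, here is a minimal numpy sketch that computes the least squares line by hand. The numbers are made-up stand-ins, not the case-study data:

```python
import numpy as np

# Hypothetical weekly data (Labor in $10K, Cost in $millions).
x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9])
y = np.array([1.10, 1.14, 1.08, 1.16, 1.13, 1.12])

# Least squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

y_hat = b0 + b1 * x             # predicted values
sse = np.sum((y - y_hat) ** 2)  # the quantity being minimized
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SSE = {sse:.6f}")
```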

Case: Cost of Manufacturing Computers (pp. 13 – 45)
A manufacturer produces computers. The goal is to quantify cost drivers and to understand the variation in production costs from week to week. The following production variables were recorded:
- COST: the total weekly production cost (in $ millions)
- UNITS: the total number of units (in 000s) produced during the week
- LABOR: the total weekly direct labor cost (in $10K)
- SWITCH: the total number of times that the production process was re-configured for different types of computers
- FACTA: = 1 if the observation is from factory A; = 0 if from factory B

Raw Data (p. 14)
How many possible regression models can we build? Counting only main-effects models over the four recorded predictors, each variable is either in or out, giving 2^4 - 1 = 15 candidate models; interactions and transformations add many more. See the sketch below.
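One way to see the count, restricting attention to main-effects models over the four recorded predictors (the variable names follow the case description above):

```python
from itertools import combinations

predictors = ["UNITS", "LABOR", "SWITCH", "FACTA"]

# Every non-empty subset of predictors defines a candidate model.
models = [combo for r in range(1, len(predictors) + 1)
          for combo in combinations(predictors, r)]
print(len(models))          # 2**4 - 1 = 15
for m in models:
    print("COST ~ " + " + ".join(m))
```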

Simple Linear Regression Model (pp. 17 – 26)
Research questions: Is Labor a significant cost driver? How accurately can Labor predict Cost?

Initial Analysis (pp. 15 – 16)
Summary statistics + plots (e.g., histograms + scatter plots) + correlations.
Things to look for:
- Features of the data (e.g., data range, outliers), from summary statistics and graphs. We do not want to extrapolate outside the data range because the relationship there is unknown (or un-established).
- Is the assumption of linearity appropriate? Is there inter-dependence among the variables? Any potential problems? Check scatter plots and correlations.

Correlation (p. 15)
Is the assumption of linearity appropriate?
- ρ (rho): population correlation (its value is most likely unknown).
- r: sample correlation (its value can be calculated from the sample).
Correlation is a measure of the strength of a linear relationship and falls between –1 and 1. There is no linear relationship if the correlation is close to 0; but a correlation near 0 does not rule out a strong nonlinear relationship.
[Scatterplots illustrating r = –1, –1 < r < 0, r = 0, 0 < r < 1, r = 1]
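For reference, the sample correlation is one call in Python; the numbers are illustrative, not the case values:

```python
import numpy as np

x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9])
y = np.array([1.10, 1.14, 1.08, 1.16, 1.13, 1.12])

r = np.corrcoef(x, y)[0, 1]   # sample correlation, always in [-1, 1]
print(f"r = {r:.4f}")
```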

Scatterplot (p. 16) and Correlation (p. 15)
Checking the linearity assumption. The output reports the sample size and the p-value for H0: ρ = 0 vs. Ha: ρ ≠ 0.
Is 0.9297 ρ or r? (It is r: the value is calculated from the sample.)

Hypothesis Testing for β (pp. 18 – 19)
Key Q1: Does X have any effect on Y?
The output reports the estimates b0 and b1 (of the population parameters β0 and β1) along with their standard errors Sb0 and Sb1. The test is H0: β1 = 0 vs. Ha: β1 ≠ 0, with degrees of freedom = n – k – 1, where n = sample size and k = # of Xs.
** Divide the reported p-value by 2 for a one-sided test. Make sure there is at least weak evidence before doing this step.
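A sketch of the slope test done by hand, assuming b1 and its standard error have already been read off the regression output; the numeric values here are placeholders:

```python
from scipy import stats

b1, s_b1 = 0.0081, 0.0024   # placeholder estimate and standard error
n, k = 52, 1                # placeholder sample size; k = # of Xs

t_stat = b1 / s_b1          # test statistic for H0: beta1 = 0
df = n - k - 1
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```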

Confidence Interval Estimation for β (pp. 19 – 20)
Key Q2: How large is the effect?
Q1: Does Labor have any impact on Cost? → Hypothesis testing.
Q2: If so, how large is the impact? → Confidence interval estimation.
The interval for β1 is b1 ± t* × Sb1, where t* is the critical value with degrees of freedom = n – k – 1 and k = # of independent variables.
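The same ingredients give the interval; again the numbers are placeholders, not the case-study output:

```python
from scipy import stats

b1, s_b1 = 0.0081, 0.0024               # placeholder estimate and SE
n, k = 52, 1
t_crit = stats.t.ppf(0.975, n - k - 1)  # 95% two-sided critical value
lo, hi = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(f"95% CI for beta1: ({lo:.4f}, {hi:.4f})")
```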

Prediction (pp. 25 – 26)
Key Q3: What is the Y-prediction?
What is the predicted production cost of a given week, say, Week 21 of the year, with Labor = 5 (i.e., $50,000)? Point estimate: predicted cost = b0 + b1 (5) = 1.0867 + 0.0081 (5) = 1.12724 (million dollars). Margin of error? → Prediction interval.
What is the average production cost of a typical week with Labor = 5? Point estimate: estimated cost = b0 + b1 (5) = 1.0867 + 0.0081 (5) = 1.12724 (million dollars). Margin of error? → Confidence interval.
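A statsmodels sketch that produces both intervals at once, with hypothetical data standing in for the case file; `summary_frame` reports the confidence interval for the mean response (`mean_ci_*`) and the wider prediction interval for an individual week (`obs_ci_*`):

```python
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({"Labor": [4.2, 5.1, 3.8, 6.0, 5.5, 4.9],
                   "Cost":  [1.10, 1.14, 1.08, 1.16, 1.13, 1.12]})
fit = sm.OLS(df["Cost"], sm.add_constant(df["Labor"])).fit()

# Predict at Labor = 5; has_constant="add" forces the intercept column.
new = sm.add_constant(pd.DataFrame({"Labor": [5.0]}), has_constant="add")
print(fit.get_prediction(new).summary_frame(alpha=0.05))
```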

Prediction vs. Confidence Intervals (pp. 25 – 26)
[Plot: confidence and prediction bands around the least squares line]
The variation (margin of error) on both ends seems larger. Implication? Both intervals are narrowest near the mean of X and widen as X moves toward the extremes of the data, which is another reason not to extrapolate.

Analysis of Variance (p. 21)
Not very useful in simple regression; useful in multiple regression.

Sum of Squares (p. 22)
- SSE = remaining variation that cannot be explained by the model.
- Syy = total variation in Y.
- SSR = Syy – SSE = variation in Y that has been explained by the model.

Fit Statistics (pp. 23 – 24)
In simple regression, R-sq equals the squared sample correlation: 0.45199 × 0.45199 = 0.204295, i.e., R-sq ≈ 20.43%.
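The identity can be checked numerically; this sketch reuses the hypothetical data from the sketches above:

```python
import numpy as np

x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9])
y = np.array([1.10, 1.14, 1.08, 1.16, 1.13, 1.12])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

syy = np.sum((y - y.mean()) ** 2)   # total variation in Y
sse = np.sum((y - y_hat) ** 2)      # unexplained variation
r2 = (syy - sse) / syy              # R-sq = SSR / Syy
r = np.corrcoef(x, y)[0, 1]
print(f"R-sq = {r2:.4f}, r^2 = {r ** 2:.4f}")   # equal in simple regression
```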

Another Simple Regression Model: Cost = b0 + b1 Units + e (p. 27) A better model? Why?

Multiple Regression Model: Cost = b0 + b1 Units + b2 Labor + e (p. 29)
- Test of global fit (p. 29)
- Marginal effect (p. 30)
- Adjusted R-sq (p. 30)

R-sq vs. Adjusted R-sq

Independent variables      R-sq     Adjusted R-sq
Labor                      20.43%   18.84%
Units                      86.44%   86.17%
Switch                      0.05%   -1.95%
Labor, Units               86.51%   85.96%
Units, Switch              88.20%   87.72%
Labor, Switch              21.32%   18.11%
Labor, Units, Switch       88.21%   87.48%

Remember! There are still many more models to try.
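The adjusted values follow from Adjusted R-sq = 1 – (1 – R-sq)(n – 1)/(n – k – 1). Assuming n = 52 weekly observations (an inference, but one consistent with every row of the table), a quick check:

```python
def adj_r2(r2, n, k):
    # Adjusted R-sq penalizes R-sq for each extra independent variable.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 52  # assumed sample size
print(f"{adj_r2(0.2043, n, 1):.4f}")  # Labor                -> 0.1884
print(f"{adj_r2(0.8644, n, 1):.4f}")  # Units                -> 0.8617
print(f"{adj_r2(0.8821, n, 3):.4f}")  # Labor, Units, Switch -> 0.8747
```

The small discrepancy in the last line (0.8747 vs. the table's 87.48%) comes from the table's rounding of R-sq.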

Test of Global Fit (p. 29)
H0: the model is useless. Ha: the model is not completely useless.
SSR is the variation explained by the model that consists of the 2 Xs; MSR = SSR / k is the variation explained, on the average, by each independent variable. The F-ratio compares MSR with the unexplained variation per degree of freedom, MSE = SSE / (n – k – 1).
If the F-ratio is large → reject H0 in favor of Ha. If the F-ratio is small → do not reject H0. (Please read pp. 39–41 and 47 for finding the cutoff.)
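The F-ratio can be recovered from R-sq alone via F = (R-sq / k) / ((1 – R-sq) / (n – k – 1)); a sketch for the two-X model, again assuming n = 52:

```python
from scipy import stats

r2, n, k = 0.8651, 52, 2          # Labor + Units model; n assumed
F = (r2 / k) / ((1 - r2) / (n - k - 1))
p = stats.f.sf(F, k, n - k - 1)   # large F -> small p -> reject H0
print(f"F = {F:.1f}, p = {p:.2e}")
```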

Residual Analysis (pp. 33 – 34)
The three conditions required for the validity of the regression analysis are:
- The error variable is normally distributed with mean = 0.
- The error variance is constant for all values of X.
- The errors are independent of each other.
How can we identify any violation?

Residual Analysis (pp. 33 – 34)
We do not have the random errors e, but we can calculate residuals from the sample: residual = actual Y – estimated Y. Examining the residuals (or the standardized residuals) helps detect violations of the required conditions.

Residuals, Standardized Residuals, and Studentized Residuals (p.33)
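A sketch of computing the three flavors with statsmodels, using the same hypothetical data as before; `OLSInfluence` exposes the standardized (internally studentized) and leave-one-out (externally studentized) residuals:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

df = pd.DataFrame({"Labor": [4.2, 5.1, 3.8, 6.0, 5.5, 4.9],
                   "Cost":  [1.10, 1.14, 1.08, 1.16, 1.13, 1.12]})
fit = sm.OLS(df["Cost"], sm.add_constant(df["Labor"])).fit()

infl = OLSInfluence(fit)
print(fit.resid.values)                 # residual = actual Y - estimated Y
print(infl.resid_studentized_internal)  # standardized residuals
print(infl.resid_studentized_external)  # studentized residuals
```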

The random error e is normally distributed with mean = 0 (p.34)

The error variance σe is constant for all values of X and estimated Y (p. 34)
[Residual plot showing constant spread]

Constant Variance
When the requirement of a constant variance is violated, we have a condition of heteroscedasticity. Diagnose heteroscedasticity by plotting the residuals against the predicted ŷ, the actual y, and each independent variable X.
[Plot: residuals vs. ŷ in which the spread increases with ŷ]

The errors are independent of each other (p.34) Do NOT want to see any pattern.

Non-Independence of Error Variables
[Two residual-vs-time plots]
- Note the runs of positive residuals, replaced by runs of negative residuals (a sign of positive autocorrelation).
- Note the oscillating behavior of the residuals around zero (a sign of negative autocorrelation).

Residual Plots with FACTA (p.34) Which factory is more efficient?

Dummy/Indicator Variables (p. 36)
Qualitative variables are handled in a regression analysis by the use of 0–1 variables, also referred to as "dummy" variables. They indicate which category the corresponding observation belongs to. Use k – 1 dummy variables for a qualitative variable with k categories (see the sketch below).
- Gender = "M" or "F" → needs one dummy variable.
- Training Level = "A", "B", or "C" → needs 2 dummy variables.
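A minimal pandas sketch of the k – 1 rule, using a hypothetical Training Level column; `drop_first=True` makes "A" the base level:

```python
import pandas as pd

df = pd.DataFrame({"Training": ["A", "B", "C", "A", "B"]})
# k = 3 categories -> k - 1 = 2 dummy variables.
dummies = pd.get_dummies(df["Training"], prefix="Train", drop_first=True)
print(pd.concat([df, dummies], axis=1))
```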

Dummy Variables (pp. 36 – 38)
A parallel lines model: Cost = b0 + b1 Units + b2 FactA + e.
Least squares line: Estimated Cost = 0.86 + 0.27 Units – 0.0068 FactA.
Two lines? Setting FactA = 0 gives the factory B line (the base level), 0.86 + 0.27 Units; setting FactA = 1 gives the factory A line, (0.86 – 0.0068) + 0.27 Units. The two lines share the same slope and differ only in intercept.

Dummy Variables (pp. 36 – 38)
An interaction model: Cost = b0 + b1 Units + b2 FactA + b3 Units_FactA + e, where Units_FactA = Units × FactA.
Least squares line: Estimated Cost = 0.87 + 0.26 Units – 0.023 FactA + 0.016 Units_FactA.
The interaction term lets the two factories have different slopes: factory B (FactA = 0) has the line 0.87 + 0.26 Units, while factory A has the line (0.87 – 0.023) + (0.26 + 0.016) Units.
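A sketch of fitting the interaction model with the statsmodels formula API, where `Units * FactA` expands to both main effects plus the Units:FactA product; the rows are hypothetical, not the case data:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"Cost":  [1.10, 1.14, 1.08, 1.16, 1.13, 1.12],
                   "Units": [0.9, 1.1, 0.8, 1.3, 1.2, 1.0],
                   "FactA": [1, 0, 1, 0, 1, 0]})

fit = smf.ols("Cost ~ Units * FactA", data=df).fit()
print(fit.params)   # Intercept, Units, FactA, Units:FactA
```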

Models that I have tried (p. 41)

Statgraphics: Prediction/Confidence Intervals for Y
Simple regression analysis:
- Relate / Simple Regression. X = independent variable, Y = dependent variable.
- For prediction, click on the Tabular option icon and check Forecasts. Right-click to change the X values.
Multiple regression analysis:
- Relate / Multiple Regression.
- For prediction, enter the values of the Xs in the Data Window and leave the corresponding Y blank. Click on the Tabular option icon and check Reports.
Saving intermediate results (e.g., studentized residuals): click the icon and check the results to save.
Removing outliers: highlight the point to remove on the plot and click the Exclude icon.

Regression Analysis Summary (pp. 43 – 44)