Keller: Stats for Mgmt & Econ, 7th Ed Linear Regression Analysis

Presentation transcript:

Linear Regression Analysis (Chapter 13). Course chapters: 1. Introduction; 2. Graphs; 3. Descriptive statistics; 4. Basic probability; 5. Discrete distributions; 6. Continuous distributions; 7. Central limit theorem; 8. Estimation; 9. Hypothesis testing; 10. Two-sample tests; 13. Linear regression; 14. Multivariate regression. Copyright © 2006 Brooks/Cole, a division of Thomson Learning, Inc.

Regression Analysis… Our objective is to analyze the relationship between interval variables. Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables). Dependent variable: denoted Y. Independent variables: denoted X1, X2, …, Xk.

Regression Analysis… If we are interested only in determining whether a relationship exists, we may employ correlation analysis. Regression analysis with two variables is sometimes called simple linear regression. Mathematical equations describing such relationships are called models, and they come in two types: deterministic and probabilistic.

Model Types… Deterministic Model: an equation or set of equations that allows us to fully determine the value of the dependent variable from the values of the independent variables. Contrast this with… Probabilistic Model: a method used to capture the randomness that is part of a real-life process. E.g., do all houses of the same size (measured in square feet) sell for exactly the same price?

A Model… To create a probabilistic model, we start with a deterministic model that approximates the relationship we want to model, and add a random term that measures the error of the deterministic component. Deterministic Model: The cost of building a new house is about $75 per square foot and most lots sell for about $25,000. Hence the approximate selling price (y) would be: y = 25,000 + 75x (where x is the size of the house in square feet).

A Model… A deterministic model of the relationship between house size (independent variable, X) and house price (dependent variable, Y): House Price = 25,000 + 75 × Size. (Building a house costs about $75 per square foot; most lots sell for $25,000.) In this model, the price of the house is completely determined by its size.

A Model… In real life, however, the cost of houses will vary, even among houses of the same size: House Price = 25,000 + 75 × Size + ε. Houses with the same square footage sell at different price points (e.g., décor options, cabinet upgrades, lot location…), and the variability around the line may be lower or higher.

Random Term… We now represent the price of a house as a function of its size in this Probabilistic Model: y = 25,000 + 75x + ε, where ε (the Greek letter epsilon) is the random term (a.k.a. error variable). It is the difference between the actual selling price and the estimated price based on the size of the house (the model-based price). Its value will vary from house sale to house sale, even if the square footage (i.e. x) remains the same.
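The probabilistic model above can be sketched in a few lines of code. This is only an illustration: the function name and the standard deviation of ε ($5,000) are my assumptions, not values from the slides.

```python
import random

def simulated_price(size_sqft, seed=None):
    """Probabilistic model: price = 25,000 + 75*size + epsilon.
    Epsilon is drawn from a normal distribution; its sd ($5,000) is
    an illustrative assumption, not a number from the slides."""
    rng = random.Random(seed)
    epsilon = rng.gauss(0, 5000)
    return 25000 + 75 * size_sqft + epsilon

# Deterministic component for a 2,000 sq ft house: 25,000 + 75*2,000 = 175,000
deterministic = 25000 + 75 * 2000

# Two simulated sales of same-size houses differ only through epsilon:
sale1 = simulated_price(2000, seed=1)
sale2 = simulated_price(2000, seed=2)
```

Running this twice with different seeds gives two different prices for identical square footage, which is exactly the role of the random term.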

Simple Linear Regression Model… Generally, a straight-line model with one independent variable is called a simple linear regression model: y = β0 + β1x + ε, where y is the dependent variable, x is the independent variable, β0 is the y-intercept, β1 is the slope of the line, and ε is the error term.

Simple Linear Regression Model… Note that both β0 and β1 are population parameters, which are usually unknown and hence estimated from the data. In the graph of the line, β1 is the slope (= rise/run) and β0 is the y-intercept.

Which line has the best “fit” to the data? (Several candidate lines are drawn through the same scatter of points.)

Estimating the Coefficients… In much the same way we based estimates of μ on x̄, we estimate β0 with b0 and β1 with b1, the y-intercept and slope (respectively) of the least squares or regression line given by: ŷ = b0 + b1x. (Recall: this is an application of the least squares method, and it produces the straight line that minimizes the sum of the squared differences between the points and the line.)

Least Squares Line… This line minimizes the sum of the squared differences between the points and the line; those differences are called residuals. But where did the line equation come from? How did we get 0.934 for the y-intercept and 2.114 for the slope?

Mathematical Optimization Routine (Optional)


Least Squares Line… The coefficients b1 and b0 for the least squares line are calculated as:
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = s_xy / s_x²
b0 = ȳ − b1·x̄
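The coefficient formulas can be checked with a short sketch. The function name is mine; the six data points are those of the deck's worked example, (1,6), (2,1), (3,9), (4,5), (5,17), (6,12), consistent with the fitted line ŷ = 0.934 + 2.114x shown on the next slide.

```python
def least_squares(x, y):
    """b1 = sum((xi-xbar)(yi-ybar)) / sum((xi-xbar)^2); b0 = ybar - b1*xbar."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1, 2, 3, 4, 5, 6]
y = [6, 1, 9, 5, 17, 12]
b0, b1 = least_squares(x, y)   # b0 ~ 0.934, b1 ~ 2.114
```

The returned intercept and slope match the 0.934 and 2.114 quoted on the slides.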

Least Squares Line… Recall: statistics turn data into information. Data points:
x: 1  2  3  4  5  6
y: 6  1  9  5  17  12
The least squares line is ŷ = 0.934 + 2.114x.

Excel: Insert > Scatter. Right-click on a data point > Add Trendline (Linear). Tick the options to display the equation and R² on the chart.
x: 1  2  3  4  5  6
y: 6  1  9  5  17  12

Least Squares Line… Exercise (optional): Decide which of these equations fits the given data best. MJT = mean January temperature; LAT = latitude.
A: MJT′ = 100 − 1.8·LAT
B: MJT′ = 108 − 2.1·LAT
C: MJT′ = 115 − 2.5·LAT
City       MJT  LAT
Boise      22   43.7
Philly     24   40.9
Milwaukee  13   43.3

Model Comparison. Model predictions:
LAT (x)  A (y′)  B (y′)  C (y′)  MJT (y)
43.7     21.34   16.23    5.75   22
40.9     26.38   22.11   12.75   24
43.3     22.06   17.07    6.75   13
Residuals (MJT − prediction):
A:  0.66  −2.38  −9.06   (sum: −10.78)
B:  5.77   1.89  −4.07   (sum: 3.59)
C: 16.25  11.25   6.25   (sum: 33.75)
Squared residuals:
A: 0.4356   5.6644   82.0836   (sum of squares: 88.1836)
B: 33.2929  3.5721   16.5649   (sum of squares: 53.4299)
C: 264.0625 126.5625 39.0625   (sum of squares: 429.6875)
Model B has the smallest sum of squared residuals, so it fits the data best.
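The whole comparison table can be reproduced in a few lines (variable names are mine; the intercepts, slopes, and data come from the exercise):

```python
# Candidate lines MJT' = a + b*LAT from the exercise
models = {"A": (100, -1.8), "B": (108, -2.1), "C": (115, -2.5)}
lat = [43.7, 40.9, 43.3]   # Boise, Philly, Milwaukee
mjt = [22, 24, 13]

# Sum of squared residuals for each candidate line
sse = {}
for name, (a, b) in models.items():
    predictions = [a + b * x for x in lat]
    sse[name] = sum((yi - pi) ** 2 for yi, pi in zip(mjt, predictions))

best = min(sse, key=sse.get)   # the line with the smallest SSE
```

This reproduces the sums of squares 88.1836, 53.4299, and 429.6875 from the table and confirms that model B fits best.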

A Simple Linear Regression Model. It models the relationship between X and Y as a straight line; we have seen this earlier as a least squares line. Values of X are assumed fixed, i.e. not random. The only randomness in the model comes from ε, the error term. Assume: ε ~ N(0, σε).

A Simple Linear Regression Model.
Population Model: Y = β0 + β1X + ε
  Y is the value of the dependent variable
  β0 is the population (true) Y-intercept
  β1 is the population (true) slope coefficient
  X is any selected value of the independent variable
  ε is the “error” term
The Estimated Model: Y′ = b0 + b1X
  Y′ is the predicted value of Y given X
  b0 is the predicted Y-intercept
  b1 is the predicted slope of the line

How Well Does the Model Fit? We would like Y and Y′ to be as close to one another as possible, i.e. the difference between Y and Y′ to be as small as possible. In fact, we want ALL the squared differences, added up, to be as small as possible; this total is called the Sum of Squared Errors, or SSE.
a. (Y − Y′) = e, the estimated error, or residual.
b. SSE = Σe² = Σ(Y − Y′)².
c. SSE is an aggregate measure of how much the predicted Y′s differ from the actual Ys.
d. The b0 and b1 that minimize SSE define the least squares line.
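Computing the residuals and SSE for the deck's six-point example (using the fitted line ŷ = 0.934 + 2.114x; variable names are mine):

```python
# Six data points from the earlier worked example
x = [1, 2, 3, 4, 5, 6]
y = [6, 1, 9, 5, 17, 12]
b0, b1 = 0.9333333, 2.1142857   # least squares coefficients from the slides

# Residuals e = y - y' and their sum of squares
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
```

SSE comes out to about 81.1 for this data; note also that the residuals of a least squares fit sum to (essentially) zero.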

How Well Does the Model Fit? SST = SSR + SSE. The total variation (Total Sum of Squares: SST) of the dependent variable can be divided into: the variation explained by the model (Sum of Squares for Regression: SSR), and the variation unexplained (Sum of Squared Error: SSE). We measure the variation around the mean, Ȳ.

SST = SSR + SSE. (Diagram: the deviation of each y from ȳ is split into an explained part, from ȳ to the line, and an unexplained part, from the line to y.)

Coefficient of Determination (R²)
R² = SSR/SST = 1 − SSE/SST
SSE: Sum of Squared Error; SSR: Sum of Squares for Regression; SST: Total Sum of Squares.
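The decomposition and R² can be verified on the six-point example (variable names are mine); the result should match the “R Square” value in the Excel output shown later:

```python
x = [1, 2, 3, 4, 5, 6]
y = [6, 1, 9, 5, 17, 12]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Least squares coefficients
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

sst = sum((yi - ybar) ** 2 for yi in y)                        # total variation
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
ssr = sst - sse                                                # explained
r_squared = ssr / sst   # equivalently 1 - sse/sst
```

For this data R² is about 0.491: roughly 49% of the variation in y is explained by x.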

Coefficient of Determination (R²). The proportion of the variation of Y around its mean in the sample that is explained by the regression. It is always between 0 and 1; the closer to 1, the better the fit. It also happens to be the square of the correlation: R² = r², where r is the correlation coefficient. It is the most commonly used measure of the “fit” of a regression.

Standard Error of Estimate. The sum of squares for error is calculated as SSE = Σ(yi − ŷi)², and is used in the calculation of the standard error of estimate:
s_ε = sqrt( SSE / (n − 2) )
If s_ε is zero, all the points fall on the regression line.
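Applying the formula to the six-point example (variable names are mine) should reproduce the “Standard Error” figure in the Excel output shown later:

```python
import math

x = [1, 2, 3, 4, 5, 6]
y = [6, 1, 9, 5, 17, 12]
b0, b1 = 0.9333333, 2.1142857   # least squares coefficients from the slides
n = len(x)

# Standard error of estimate: s_eps = sqrt(SSE / (n - 2))
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))
```

Here s_eps is about 4.503, matching Excel's "Standard Error 4.502909113" for this data.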

Example. A used car dealer recorded the price (in $1,000s) and odometer reading (also in 1,000s) of 100 three-year-old Ford Taurus cars in similar condition with the same options. Can we use these data to find a regression line?

Example: COMPUTE. Tools > Data Analysis… > Regression. Y range: price; X range: odometer; OK. (Check the appropriate box if you want a scatter plot of the data.)

Example. Lots of good statistics are calculated for us, but for now all we’re interested in is the estimated regression equation.

Example. As you might expect with used cars: The slope coefficient, b1, is −0.0669; that is, each additional mile on the odometer decreases the price by $0.0669, or 6.69¢. The intercept, b0, is 17.250 (price is measured in $1,000s), i.e. $17,250. One interpretation would be that when x = 0 (no miles on the car) the selling price is $17,250. However, we have no data for cars with fewer than 19,100 miles on them, so this isn’t a correct assessment.

Example. Selecting “Line Fit Plots” in the Regression dialog box will produce a scatter plot of the data and the regression line.

Required Conditions for OLS. For these regression methods to be valid, the following conditions on the error variable ε must be met:
1. The mean of the distribution is 0, so that E(ε|X) = 0.
2. The standard deviation of ε, σε, is constant regardless of the value of x.
3. The value of ε associated with any particular value of y is independent of the ε associated with any other value of y.
4. The regressors in X must all be linearly independent.
In addition: if the distribution of ε is normal, the OLS estimates are efficient, i.e. the procedure works really well.

Coefficient of Determination. R² has a value of 0.6483. This means 64.83% of the variation in the auction selling prices (y) is explained by the variation in the odometer readings (x); the remaining 35.17% is unexplained, i.e. due to error. In general, the higher the value of R², the better the model fits the data. R² = 1: perfect match between the line and the data points. R² = 0: there is no linear relationship between x and y.

Example Data
x: 1  2  3  4  5  6
y: 6  1  9  5  17  12

Example. Tools > Data Analysis… > Regression; Y range; X range.
Regression Statistics:
  Multiple R          0.700695581
  R Square            0.490974298
  Adjusted R Square   0.363717872
  Standard Error      4.502909113
  Observations        6
             Coefficients  Standard Error  t Stat       P-value
  Intercept  0.933333333   4.19198025      0.222647359  0.834716871
  x          2.114285714   1.076401159     1.96421724   0.120968388

Testing the significance of the slope coefficients… We can test to determine whether there is enough evidence of a linear relationship between variable X and the dependent variable Y for the entire population:
H0: βi = 0 (no effect, or insignificant)
H1: βi ≠ 0 (significant)
for i = 1, using t = bi / s_bi as our test statistic (with n − i − 1 degrees of freedom).
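The slope's standard error and t statistic can be computed by hand for the six-point example (variable names are mine; s_b1 = s_ε / sqrt(Σ(xi − x̄)²) is the standard formula for a simple regression):

```python
import math

x = [1, 2, 3, 4, 5, 6]
y = [6, 1, 9, 5, 17, 12]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Least squares fit
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# Standard error of estimate, then standard error of the slope
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))
s_b1 = s_eps / math.sqrt(sxx)

# t statistic for H0: beta1 = 0, with n - 2 = 4 degrees of freedom
t_stat = b1 / s_b1
```

This reproduces the Excel output for x: standard error about 1.076 and t Stat about 1.964.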

P-value of the slope coefficient. Since the p-value is larger than α, we fail to reject H0: β1 = 0. We conclude that the slope coefficient is not significantly different from zero.
p-value = 2·P(T > 1.964) = 2·TDIST(1.964, 4, 1) ≈ 0.12 (p-value/2 ≈ 0.06 in each tail)
Degrees of freedom: 6 − 1 − 1 = 4; α/2 = 0.025 in each tail.
Critical value: t.025 = ABS(T.INV(0.025, 4)) ≈ 2.78; since t = 1.964 < 2.78, we fail to reject H0.