Copyright © 2006 The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin. Chapter Six: Basic Ideas of Linear Regression: The Two-Variable Model
6-2 Regression Analysis: The study of the relationship between a dependent variable (Y) and one or more independent, or explanatory, variables (\(X_1, X_2, \ldots\)). Regression does not necessarily imply causation; causation must be inferred from the theory underlying the phenomenon that is tested empirically.
6-3 Objectives of Regression Analysis: Estimate the mean of Y given the X values, E(Y|X). Test hypotheses about the nature of the dependence (for example, is the price elasticity of demand equal to -1.0?). Predict or forecast the mean value of Y given values of X beyond the sample range. Pursue two or more of these objectives together.
6-4 Example: How much money do people at different income levels spend on the New York State Lotto each week? Let Y represent weekly expenditure on Lotto and X represent weekly personal disposable income (PDI). Assume a population of 100 Lotto players divided into 10 income classes, with 10 players in each class. See Table 6-1 and the scatter diagram in Fig. 6-1.
6-5 Table 6-1 Weekly lotto expenditure in relation to weekly personal disposable income.
6-6 Figure 6-1 Weekly expenditure on Lotto ($) and weekly personal disposable income ($).
6-7 Population Regression Line: The circled values in Fig. 6-1 are the mean values of Y for each X, called conditional mean values or conditional expected values. Connecting the conditional mean values of Y gives the population regression line (PRL). The PRL gives the mean value of the dependent variable for each value of the independent variable.
6-8 Population Regression Function: Since the PRL is approximately linear, it can be expressed mathematically as \(E(Y \mid X_i) = B_1 + B_2 X_i\). This is the population regression function (PRF). See the conditional mean values in the last row of Table 6-1. The regression of Y on X is the mean of the distribution of Y values corresponding to the given X, so the PRL is the line that passes through the conditional means of Y.
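The idea of a conditional mean can be sketched in a few lines of Python. The observations below are hypothetical and are not the values in Table 6-1; the function name `conditional_means` is also only illustrative. The sketch groups the Y values by X and averages each group, which gives the points a PRL would pass through.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (X, Y) observations, NOT the values from Table 6-1.
observations = [
    (150, 12), (150, 20), (150, 28),
    (200, 18), (200, 24), (200, 30),
    (250, 22), (250, 28), (250, 34),
]

def conditional_means(pairs):
    """Return {x: mean of all Y values observed at that x}."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}

# Each entry is one point E(Y | X = x) on the population regression line.
prl_points = conditional_means(observations)
for x, ey in prl_points.items():
    print(f"E(Y | X = {x}) = {ey}")
```

In this made-up data the conditional means rise by the same amount for each step in X, so the points lie exactly on a straight line; real data would only do so approximately.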
6-9 Conditional Regression Analysis: In the PRF above, \(B_1\) and \(B_2\) are the parameters, or regression coefficients. \(B_1\) is the intercept (coefficient). \(B_2\) is the slope coefficient and measures the rate of change in the conditional mean of Y per unit change in X. This is conditional regression analysis, the behavior of Y conditional on given values of X, commonly called just regression analysis. In this context, E(Y) means E(Y|X).
6-10 Statistical or Stochastic Specification: Note in Table 6-1 that although the mean Lotto expenditure at a PDI of $150 is $20.90, individual players' expenditures range from $12 to $33. An individual's expenditure may be expressed as the group average plus or minus some quantity: \(Y_i = B_1 + B_2 X_i + u_i\), where \(u_i\) is the stochastic, or random, error term, itself a random variable.
6-11 Figure 6-2
6-12 Stochastic PRF: \(B_1 + B_2 X_i\) is the systematic, or deterministic, component. \(u_i\) is the nonsystematic, or random, component, sometimes called the noise component. The error term reflects: the influence of left-out variables; inherent randomness in human behavior; errors in measurement; and Ockham's razor, that is, variables intentionally left out because their effects are too small or too unsystematic, so that those effects are absorbed into the error term.
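A minimal simulation sketch of the two components follows. The parameter values `B1`, `B2`, and `SIGMA`, and the choice of a normally distributed error, are assumptions made only for illustration; the slides do not specify them. The point is that individual Y values scatter around the systematic part, while their average stays close to E(Y|X).

```python
import random

# Hypothetical population parameters, chosen only for illustration.
B1, B2 = 7.0, 0.08
SIGMA = 4.0  # assumed standard deviation of the error term u_i

def systematic(x):
    """Deterministic component B1 + B2*X, i.e. E(Y|X)."""
    return B1 + B2 * x

def draw_y(x, rng):
    """One individual's Y: systematic part plus a random error u."""
    u = rng.gauss(0.0, SIGMA)  # assumed normal error with mean zero
    return systematic(x) + u

rng = random.Random(0)  # fixed seed so the run is repeatable
x = 150
ys = [draw_y(x, rng) for _ in range(10_000)]
sample_mean = sum(ys) / len(ys)

print(f"systematic part at X={x}: {systematic(x):.2f}")
print(f"mean of simulated Y:      {sample_mean:.2f}")
print(f"range of simulated Y:     {min(ys):.2f} to {max(ys):.2f}")
```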
6-13 The Sample Regression Function: How do we estimate the PRF with sample data? Table 6-2 is a random sample from Table 6-1; notice that we have only one Y value for each X value. Sampling fluctuations, or sampling error, undermine our ability to estimate the PRF exactly. Suppose we draw another random sample (Table 6-3) and plot the data from both samples (Figure 6-3).
6-14 Table 6-2 A random sample from Table 6-1.
6-15 Table 6-3 Another random sample from Table 6-1.
6-16 Figure 6-3 Sample regression lines based on two independent samples
6-17 Sample Regression Function: The sample regression lines (SRLs) plotted in Fig. 6-3 differ from each other, and neither is likely to coincide with the PRL. More samples give us more SRLs, all different. Analogous to the PRF is the sample regression function (SRF).
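6-18 Sample Regression Functions: SRF: \(\hat{Y}_i = b_1 + b_2 X_i\), where \(\hat{Y}_i\) ("Y-hat") is the estimator of \(E(Y \mid X_i)\), \(b_1\) is the estimator of \(B_1\), and \(b_2\) is the estimator of \(B_2\). Stochastic SRF: \(Y_i = b_1 + b_2 X_i + e_i\), where \(e_i\), called the residual, is the estimator of \(u_i\), the random error.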
6-19 Objective: Estimate the PRF on the basis of the SRF. What procedure or method will make the approximation (the SRF) as close as possible to the PRF? Remember that we do not observe \(B_1\), \(B_2\), and \(u_i\) (see Figure 6-4).
6-20 Figure 6-4 The population and sample regression lines.
6-21 “Linear” Regression: Linearity in the variables means that the conditional mean of the dependent variable is a linear function of the independent variables. A function Y = f(X) is linear if X appears with a power of 1 only (no \(X^2\) or \(\sqrt{X}\)) and X is not multiplied or divided by another variable. For regression models, this means that the rate of change in the dependent variable for a unit change in the explanatory variable remains constant; that is, the slope of Y with respect to X is constant (Fig. 6-5).
6-22 Figure 6-5 (a) Linear demand curve; (b) nonlinear demand curve.
6-23 “Linear” Regression: Linearity in the parameters means that the conditional mean of the dependent variable is a linear function of the parameters. A function is linear in the parameter \(B_2\) if \(B_2\) appears with a power of 1 only. For our purposes, linear regression means a regression that is linear in the parameters, but not necessarily linear in the explanatory variables.
6-24 Multiple Regression The dependent variable is a function of more than one explanatory variable
6-25 Estimation of Parameters: Method of Ordinary Least Squares (OLS). To estimate the PRF from the SRF, choose \(b_1\) and \(b_2\) so that the residuals \(e_i\) are as small as possible. In OLS, choose \(b_1\) and \(b_2\) to minimize the residual sum of squares (RSS), \(\sum e_i^2\).
6-26 OLS Estimators
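The formulas for this slide did not survive in the text. The standard two-variable OLS estimators, which may differ in notation from the original slide, are b2 = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)² and b1 = Ȳ − b2·X̄. The sketch below computes them directly; the function name `ols_two_variable` and the data are hypothetical and are not the Lotto data of Table 6-4.

```python
def ols_two_variable(xs, ys):
    """Return (b1, b2) for the fitted line Y-hat = b1 + b2*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares in deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b2 = s_xy / s_xx          # slope estimator
    b1 = y_bar - b2 * x_bar   # intercept estimator
    return b1, b2

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b1, b2 = ols_two_variable(xs, ys)
print(f"Y-hat = {b1:.2f} + {b2:.2f} X")
```

For these numbers the means are X̄ = 3 and Ȳ = 4, the cross-product sum is 6, and the sum of squares is 10, so b2 = 0.6 and b1 = 4 − 0.6·3 = 2.2.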
6-27 Properties of OLS Estimators: The SRF passes through the sample mean values of X and Y. The mean of the residuals is zero, \(\sum e_i / n = 0\). X and e are uncorrelated, \(\sum e_i X_i = 0\). Similarly, \(\hat{Y}\) and e are uncorrelated, \(\sum e_i \hat{Y}_i = 0\).
6-28 Calculate \(b_1\) and \(b_2\): See Table 6-4 for the computations; this yields the estimated SRF (see Fig. 6-6). Slope: if PDI goes up by $1, mean Lotto expenditure goes up by \(b_2\) dollars. Intercept: if PDI = 0, is mean Lotto expenditure ≈ $7.62?
6-29 Table 6-4 Raw data (from Table 6-2) for lotto.
6-30 Figure 6-6 Regression line based on data from Table 6-4.
6-31 Table 6-5 Average hourly wage by education.
6-32 Figure 6-7 S&P 500 composite index and three-month Treasury bill rate.
6-33 Table 6-6 Median home price (MHP) and mortgage interest rate (INT) in metropolitan New York area,
6-34 Figure 6-8 Median home prices and interest rates.
6-35 Table 6-7 Hypothetical data on weekly consumption expenditure and weekly income.
6-36 Table 6-8 Consumer price index (CPI) and S&P 500 index (S&P), United States,
6-37 Table 6-9 Nominal interest rate (Y) and inflation (X) in nine industrial countries for the year 1988.
6-38 Table 6-10 Consumer price index (CPI) and S&P 500 index (S&P), United States,
6-39 Table 6-11 Selected data on top business schools in the United States.
6-40 Table 6-12 Real gross domestic product and civilian unemployment rate, United States,
6-41 Table 6-13 S&P 500 index (S&P) and three-month Treasury bill rate (3-M T Bill)
6-42 Table 6-14 Auction data on price, age of clock and number of bidders.
6-43 Table 6-15 Mean scholastic aptitude test (S.A.T.) verbal and math scores for college-bound seniors,