Chapter 4 Basic Estimation Techniques
Learning Objectives
- Set up and interpret simple linear regression equations
- Estimate intercept and slope parameters of a regression line using the method of least squares
- Determine statistical significance using either t-tests or p-values associated with parameter estimates
- Evaluate the "fit" of a regression equation to the data using the R² statistic, and test for statistical significance of the whole regression equation using an F-test
- Set up and interpret multiple regression models
- Use linear regression techniques to estimate the parameters of two common nonlinear models: quadratic and log-linear regression models
http://lectures.mhhe.com/media/0073523224/videos/Demonstration_Problem_3_3.mp4
Basic Estimation
- Parameters: the coefficients in an equation that determine the exact mathematical relation among the variables
- Parameter estimation: the process of finding estimates of the numerical values of the parameters of an equation
Regression Analysis
- Regression analysis: a statistical technique for estimating the parameters of an equation and testing for statistical significance
- Dependent variable: the variable whose variation is to be explained
- Explanatory variables: variables that are thought to cause the dependent variable to take on different values
Simple Linear Regression
- The true regression line relates the dependent variable Y to one explanatory (or independent) variable X: Y = a + bX
- Intercept parameter (a): the value of Y where the regression line crosses the Y-axis (the value of Y when X is zero)
- Slope parameter (b): the change in Y associated with a one-unit change in X: b = ΔY/ΔX
Simple Linear Regression
- The regression line shows the average or expected value of Y for each level of X
- The true (or actual) underlying relation between Y and X is unknown to the researcher but is to be discovered by analyzing the sample data
- Random error term: an unobservable term added to a regression model to capture the effects of all the minor, unpredictable factors that affect Y but cannot reasonably be included as explanatory variables
Fitting a Regression Line
- Time-series data set: data for the dependent and explanatory variables collected over time for a single firm
- Cross-sectional data set: data for the dependent and explanatory variables collected from many different firms or industries at a given point in time
Fitting a Regression Line Method of least squares A method of estimating the parameters of a linear regression equation by finding the line that minimizes the sum of the squared distances from each sample data point to the sample regression line
Fitting a Regression Line
- Parameter estimates are obtained by choosing values of a and b that minimize the sum of squared residuals
- The residual is the difference between the actual and fitted values of Y: ei = Yi − Ŷi
- This is equivalent to fitting a line through a scatter diagram of the sample data points
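The least-squares formulas can be sketched in a few lines of Python. The data below are hypothetical (they are not the textbook's advertising sample), but the formulas are the standard ones: b̂ = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and â = Ȳ − b̂X̄.

```python
# Hypothetical data: X = advertising, Y = sales (both in thousands of $)
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [21.0, 32.0, 38.0, 52.0, 59.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: b_hat = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))

# Intercept: a_hat = y_bar - b_hat * x_bar
a_hat = y_bar - b_hat * x_bar

# Residuals e_i = Y_i - Y_hat_i; least squares minimizes sum(e_i^2)
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
```

A handy sanity check: the residuals from a least-squares fit that includes an intercept always sum to zero.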
Fitting a Regression Line
- The sample regression line, Ŷ = â + b̂X, is an estimate of the true (or population) regression line
- â and b̂ are the least-squares estimates of the true (population) parameters a and b
Sample Regression Line (Figure 4.2): scatter diagram of sales (dollars, vertical axis) against advertising expenditures (dollars, horizontal axis), with the sample regression line Ŝi = 11,573 + 4.9719A. The residual ei is the vertical distance between an observed point (e.g., Si = 60,000) and its fitted value on the line (Ŝi = 46,376).
Unbiased Estimators
- The estimates â and b̂ do not generally equal the true values of a and b
- â and b̂ are random variables computed using data from a random sample
- The distribution of values the estimates might take is centered around the true value of the parameter
- An estimator is unbiased if its average value (or expected value) equals the true value of the parameter
Relative Frequency Distribution (Figure 4.3): the distribution of values an estimate might take, centered around the true value of the parameter. Also called a probability density function (pdf).
Statistical Significance
- Statistical significance: there is sufficient evidence from the sample to indicate that the true value of the coefficient is not zero
- Hypothesis testing: a statistical technique for making a probabilistic statement about the true value of a parameter
Statistical Significance
- Must determine whether there is sufficient statistical evidence to indicate that Y is truly related to X (i.e., b ≠ 0)
- Even if b = 0, it is possible that the sample will produce an estimate b̂ that is different from zero
- Test for statistical significance using t-tests or p-values
Statistical Significance
- First determine the level of significance: the probability of finding a parameter estimate to be statistically different from zero when, in fact, it is zero (the probability of a Type I error)
- 1 − level of significance = level of confidence
- The level of confidence is the probability of correctly failing to reject the true hypothesis that b = 0
Performing a t-Test
- The t-ratio is computed as t = b̂ / S(b̂), where S(b̂) is the standard error of the estimate b̂
- Use a t-table to choose the critical t-value with n − k degrees of freedom for the chosen level of significance
- n = number of observations; k = number of parameters estimated
Performing a t-Test
- t-statistic: the numerical value of the t-ratio
- If the absolute value of the t-statistic is greater than the critical t-value, the parameter estimate is statistically significant at the given level of significance
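The t-test procedure can be sketched as follows. The data are hypothetical, and the critical value 3.182 is the standard two-tailed 5% t-value for n − k = 3 degrees of freedom from a t-table.

```python
import math

# Hypothetical sample: n = 5 observations, k = 2 estimated parameters
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [21.0, 32.0, 38.0, 52.0, 59.0]
n, k = len(x), 2

# Least-squares estimates
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
a_hat = y_bar - b_hat * x_bar

# Standard error of b_hat: S(b_hat) = sqrt( (SSE / (n - k)) / Sxx )
sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
s_b = math.sqrt((sse / (n - k)) / sxx)

# t-ratio: the estimate divided by its standard error
t_stat = b_hat / s_b

# Critical t for a 5% significance level with 3 degrees of freedom
# (value taken from a standard t-table)
t_critical = 3.182
significant = abs(t_stat) > t_critical
```

Here |t| far exceeds the critical value, so b̂ is statistically significant at the 5% level.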
Using p-Values
- The p-value gives the exact level of significance of a parameter estimate: the probability of finding significance when none exists
- Treat as statistically significant only those parameter estimates with p-values smaller than the maximum acceptable significance level
Coefficient of Determination
- R² measures the fraction of total variation in the dependent variable (Y) that is explained by the variation in X
- R² ranges from 0 to 1
- A high R² indicates that Y and X are highly correlated; it does not prove that Y and X are causally related
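R² can be computed as 1 − SSE/SST, the share of total variation in Y not left unexplained by the regression. A minimal sketch on hypothetical data:

```python
# Hypothetical data fitted by least squares
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [21.0, 32.0, 38.0, 52.0, 59.0]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_hat = y_bar - b_hat * x_bar

# SSE: unexplained (residual) variation; SST: total variation in Y
sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

# R^2 = 1 - SSE/SST, always between 0 and 1
r_squared = 1 - sse / sst
```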
F-Test
- Used to test the significance of the overall regression equation
- Compare the F-statistic to the critical F-value from an F-table, using two degrees of freedom (k − 1 and n − k) and the chosen level of significance
- If the F-statistic exceeds the critical F-value, the regression equation overall is statistically significant at the specified level of significance
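One standard way to compute the F-statistic is directly from R²: F = [R²/(k − 1)] / [(1 − R²)/(n − k)]. The sketch below uses hypothetical data; the critical value 10.13 is the 5% F-value with (1, 3) degrees of freedom from an F-table.

```python
# Hypothetical simple regression: n = 5 observations, k = 2 parameters
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [21.0, 32.0, 38.0, 52.0, 59.0]
n, k = len(x), 2

x_bar, y_bar = sum(x) / n, sum(y) / n
b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_hat = y_bar - b_hat * x_bar
sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

# F = explained variation per numerator df / unexplained per denominator df
f_stat = (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))

# Critical F at the 5% level with (k - 1, n - k) = (1, 3) degrees of freedom
f_critical = 10.13
significant = f_stat > f_critical
```

For a simple regression (one explanatory variable), F equals the square of the slope's t-statistic, which provides a useful consistency check.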
Multiple Regression
- Multiple regression models use more than one explanatory variable
- The coefficient for each explanatory variable measures the change in the dependent variable associated with a one-unit change in that explanatory variable, all else constant
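Estimating a multiple regression is a matter of solving the least-squares problem over a design matrix. A minimal sketch with NumPy, using hypothetical noise-free data generated from Y = 10 + 2·X1 − 3·X2 so the estimates are easy to check (real data would include a random error term):

```python
import numpy as np

# Hypothetical data for two explanatory variables
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 10 + 2 * x1 - 3 * x2   # no error term, so estimates recover exactly

# Design matrix: a column of ones (intercept), then X1 and X2
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of (a, b1, b2)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat, b1_hat, b2_hat = coef
```

Each estimated slope (b1_hat, b2_hat) is interpreted holding the other explanatory variable constant.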
Quadratic Regression Models
- Use when the curve fitting the scatter plot is U-shaped or ∩-shaped: Y = a + bX + cX²
- For estimation by linear regression, compute the new variable Z = X² and estimate Y = a + bX + cZ
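The transformation step can be sketched as follows: define Z = X² and run an ordinary multiple regression of Y on X and Z. The data are hypothetical, generated from a U-shaped relation Y = 50 − 12X + 2X² so the estimates are easy to verify:

```python
import numpy as np

# Hypothetical U-shaped data from Y = 50 - 12X + 2X^2 (no error term)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 50 - 12 * x + 2 * x ** 2

# Linear transformation: compute Z = X^2, then estimate Y = a + bX + cZ
z = x ** 2
X = np.column_stack([np.ones_like(x), x, z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat, b_hat, c_hat = coef
```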
Log-Linear Regression Models
- Use when the relation takes the multiplicative exponential form: Y = aX^b Z^c
- Transform by taking natural logarithms: ln Y = ln a + b ln X + c ln Z
- b = percentage change in Y / percentage change in X
- c = percentage change in Y / percentage change in Z
- b and c are elasticities
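The log transformation makes the model linear in its parameters, so ordinary least squares applies. A sketch on hypothetical data generated from Y = 5·X^0.8·Z^(−1.2), so the recovered slopes are the elasticities b = 0.8 and c = −1.2:

```python
import math
import numpy as np

# Hypothetical multiplicative model Y = a * X^b * Z^c
# with a = 5, b = 0.8, c = -1.2 (b and c are elasticities)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z = np.array([2.0, 3.0, 1.0, 5.0, 4.0, 6.0])
y = 5.0 * x ** 0.8 * z ** (-1.2)

# Take natural logs: ln Y = ln a + b ln X + c ln Z -- linear in a, b, c
X_design = np.column_stack([np.ones_like(x), np.log(x), np.log(z)])
coef, *_ = np.linalg.lstsq(X_design, np.log(y), rcond=None)
ln_a_hat, b_hat, c_hat = coef

# Recover the multiplicative constant a from its log
a_hat = math.exp(ln_a_hat)
```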
Summary
- A simple linear regression model relates a dependent variable Y to a single explanatory variable X
- The regression equation is correctly interpreted as providing the average (expected) value of Y for a given value of X
- Parameter estimates are obtained by choosing values of a and b that create the best-fitting line through the scatter diagram of the sample data points
- If the absolute value of the t-ratio is greater (less) than the critical t-value, then b̂ is (is not) statistically significant
- The exact level of significance associated with a t-statistic is its p-value
- A high R² indicates that Y and X are highly correlated and that the data fit the sample regression line tightly
Summary
- If the F-statistic exceeds the critical F-value, the regression equation is statistically significant
- In multiple regression, each coefficient measures the change in Y associated with a one-unit change in that explanatory variable, all else constant
- Quadratic regression models are appropriate when the curve fitting the scatter plot is U-shaped or ∩-shaped (Y = a + bX + cX²)
- Log-linear regression models are appropriate when the relation is in multiplicative exponential form (Y = aX^b Z^c); the equation is transformed by taking natural logarithms