Simple Regression
Major Questions: Given an economic model involving a relationship between two economic variables, how do we go about specifying the corresponding statistical model? Given the statistical model and a sample of data on the two variables, how do we use this information to estimate that relationship?
Main Points: Identify relationships between economic variables. Answer questions such as: if one variable changes in some way, by how much does another variable change? Move from studying one economic variable to studying two.
Specific Example: Extend the household food expenditure example. Let the population of interest be all households with three members, whatever their income level. We can now look at what happens to food expenditure as income rises or falls.
Economic Model: y = household expenditure on food; x = household income. Economic model (general form): y = f(x), which specifies that household expenditure on food is some function of household income.
Relationship between y and x: We need to quantify the change in food expenditure that occurs when income changes, so we must be more precise about the nature of the relationship between x and y. Many forms are possible, and sometimes theory provides some guide. Simplest form: y = a + bx (linear).
Statistical Model: Error Terms. The economic model is an approximation. We need to account for other factors that affect the relationship between the economic variables, so we add an unobservable error term e: y = a + bx + e. The error term captures:
1. The combined effect of other influences on y
2. Approximation error from the functional form
3. Elements of random behavior by individuals
Adding Data: Suppose we have observations of y and x from i = 1, 2, …, n households: $y_i = a + b x_i + e_i$, where y is the dependent (or response) variable and x is the explanatory variable. The level of household expenditure on food is related to the level of household income.
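To make the statistical model concrete, here is a minimal sketch that simulates household data from y_i = a + b x_i + e_i. The parameter values, the income range, and the choice of normally distributed errors are illustrative assumptions, not part of the slides; `simulate_households` is a hypothetical helper name.

```python
import random


def simulate_households(n, a, b, sigma, seed=0):
    """Draw n (x_i, y_i, e_i) triples from y_i = a + b*x_i + e_i.

    Assumption (not from the slides): incomes are uniform on [5, 30]
    and errors are normal with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(5.0, 30.0)   # hypothetical household income
        e = rng.gauss(0.0, sigma)    # unobservable error term
        y = a + b * x + e            # food expenditure from the model
        data.append((x, y, e))
    return data


# Hypothetical parameter values, chosen only for illustration.
sample = simulate_households(n=40, a=7.0, b=0.25, sigma=1.5)
```

In real data only x_i and y_i are observed; the error e_i is kept here only so the model equation can be checked.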
Method of Least Squares: The parameters a and b tell us about the relationship between y and x. We need a rule that tells us how to use the sample data to estimate them. We use the least squares method: find the line for which the sum of the squares of the vertical distances from each point to the line is as small as possible.
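Residual Errors: [Figure: scatter of the observed points against x with the fitted line y = a + bx. At each x_i the residual e_i is the vertical distance between the observed value y_i (obs) and the value on the line, y_i (exp).]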
There will be n of these residual errors, one per observation. They depend on the fitted line as defined by the specific values of a and b. Their squares can be summed: $\sum e_i^2 = \sum (y_i - a - b x_i)^2$.
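The dependence of the sum of squares on the chosen line can be sketched as follows. The five data points are made up for illustration, and `sum_of_squares` is a hypothetical helper name.

```python
def sum_of_squares(xs, ys, a, b):
    """Sum of squared residuals e_i = y_i - a - b*x_i for the line y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# Made-up observations (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_flat = sum_of_squares(xs, ys, a=4.0, b=0.0)   # horizontal line at the mean of y
s_steep = sum_of_squares(xs, ys, a=1.0, b=1.0)  # a steeper candidate line
s_best = sum_of_squares(xs, ys, a=2.2, b=0.6)   # the least squares line for these data
```

Each candidate line gives a different total (6.0, 4.0 and about 2.4 here); least squares picks the a and b that make it smallest.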
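Normal equations: Minimizing the sum of squares with respect to a and b gives two equations in two unknowns: $\sum y = na + b \sum x$ and $\sum xy = a \sum x + b \sum x^2$. They can be solved for a unique solution, provided the x values are not all equal.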
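As a sketch of how the two normal equations can be solved, the snippet below applies Cramer's rule to the 2×2 system. The data are made up for illustration, and `solve_normal_equations` is a hypothetical helper name.

```python
def solve_normal_equations(xs, ys):
    """Solve  sum(y)  = n*a      + b*sum(x)
              sum(xy) = a*sum(x) + b*sum(x^2)
    for (a, b) using Cramer's rule."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx  # zero exactly when all x values are equal
    if det == 0:
        raise ValueError("all x values are equal; no unique solution")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Made-up observations (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = solve_normal_equations(xs, ys)  # a is about 2.2, b is about 0.6
```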
Formulae for the regression coefficients
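The slide's own formulae are not reproduced here. As a reconstruction rather than the original slide content, solving the two normal equations for a and b gives the standard closed forms below, where $\bar{x}$ and $\bar{y}$ denote the sample means:

```latex
b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2},
\qquad
a = \bar{y} - b\,\bar{x} = \frac{\sum y - b \sum x}{n}
```

The expression for a follows from dividing the first normal equation by n; substituting it into the second gives b.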
Correlation
The correlation between two random variables X and Y measures the strength of the linear relationship between them.
Coefficient of determination: The coefficient of determination is a statistic that measures the extent to which the variation in Y is explained by the regression line of Y on X. It is denoted by $r^2$.
[Figure: scatter plot of y against x showing the regression line y = a + bx and the horizontal line $y = \bar{y}$.]
The quantity $\sum e_i^2 = \sum (y_i - a - b x_i)^2$ is known as the unexplained variation, since the deviations of the observations from the regression line are treated as completely random and thus unpredictable.
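The coefficient of determination is given by $r^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$, where $\hat{y}_i = a + b x_i$ is the value on the regression line, numerator = explained variation, and denominator = total variation.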
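The coefficient of determination can also be written as $r^2 = 1 - \frac{\sum (y_i - a - b x_i)^2}{\sum (y_i - \bar{y})^2}$, that is, one minus the ratio of unexplained variation to total variation, because for the least squares line total variation = explained variation + unexplained variation.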
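The ratio of explained to total variation can be sketched as below. The data are made up, the values a = 2.2 and b = 0.6 are the least squares coefficients for those data (worked out by hand from the normal equations), and `coefficient_of_determination` is a hypothetical helper name.

```python
def coefficient_of_determination(xs, ys, a, b):
    """Explained variation divided by total variation for the line y = a + b*x."""
    y_bar = sum(ys) / len(ys)
    fitted = [a + b * x for x in xs]
    explained = sum((f - y_bar) ** 2 for f in fitted)
    total = sum((y - y_bar) ** 2 for y in ys)
    if total == 0:
        raise ValueError("y does not vary; r^2 is undefined")
    return explained / total


# Made-up observations and their least squares line (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

r_squared = coefficient_of_determination(xs, ys, a, b)  # about 0.6

# Cross-check: one minus unexplained variation over total variation.
y_bar = sum(ys) / len(ys)
unexplained = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
total = sum((y - y_bar) ** 2 for y in ys)
r_squared_alt = 1 - unexplained / total
```

The two versions agree only when a and b are the least squares values; for an arbitrary line the decomposition of total variation does not hold.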
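Correlation coefficient: The linear product-moment correlation coefficient is given by $r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}}$.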
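A minimal sketch of the product-moment correlation coefficient, computed from the raw sums, is shown below. The data are made up for illustration, and `correlation` is a hypothetical helper name.

```python
import math


def correlation(xs, ys):
    """Product-moment correlation coefficient computed from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if denominator == 0:
        raise ValueError("x or y does not vary; r is undefined")
    return numerator / denominator


# Made-up observations (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)  # about 0.775
```

For simple linear regression the square of r equals the coefficient of determination, which is about 0.6 for these data.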