Regression
Remembering the basics of linear regression
Hypothesis: a variable(s) x (x1, x2, x3, …) cause(s) another variable y
Corollary: x can be used to partially predict y
Mathematical implication: y = a + b*x + ε
ε: this is minimal and random
a + b*x: this is linear
Example
Example: Regression approach
Error in prediction is minimal and random
The way these assumptions look
Child’s IQ = 20.99 + 0.78*Mother’s IQ
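Prediction
Child’s IQ = 20.99 + 0.78*Mother’s IQ
Predicted Case 3 IQ = 20.99 + 0.78*110
Predicted Case 3 IQ = 106.83 (the rounded coefficients shown give 106.79; the gap presumably comes from rounding the coefficients for display)
Actual Case 3 IQ = 102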
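The prediction step can be sketched in a few lines of Python. The function name `predict_child_iq` is hypothetical, and the coefficients are the rounded values printed in the equation, so this is an illustration of the arithmetic rather than the original analysis.

```python
# Rounded coefficients as printed in the fitted equation.
INTERCEPT = 20.99
SLOPE = 0.78


def predict_child_iq(mother_iq, intercept=INTERCEPT, slope=SLOPE):
    """Return the predicted child's IQ for a given mother's IQ."""
    return intercept + slope * mother_iq


# Case 3: mother's IQ is 110, the child's actual IQ is 102.
predicted = predict_child_iq(110)
residual = 102 - predicted  # actual minus predicted

print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

The residual (actual minus predicted) is the prediction error for this case, the quantity that regression tries to keep small across all cases.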
How are the regression coefficients computed?
Minimize the sum of squared deviations between actual and predicted values
ε is “minimal” and random
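A minimal sketch of the least-squares idea for one predictor, assuming the model y = a + b*x + ε. The data values and the helper names `fit_simple_ols` and `sum_squared_errors` are made up for illustration. The closed-form slope and intercept used here are the standard ones that minimize the sum of squared deviations.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Return (a, b) minimizing the sum of squared errors of y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    # Slope: cross-product of deviations divided by squared x deviations.
    b = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    # Intercept: the fitted line passes through the point of means.
    a = y.mean() - b * x.mean()
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared deviations between actual and predicted values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - (a + b * x)) ** 2))


# Made-up illustrative data (not from the slides).
x = [90, 100, 110, 120, 130]
y = [95, 98, 102, 118, 121]

a, b = fit_simple_ols(x, y)
sse_best = sum_squared_errors(x, y, a, b)
# Any other line, for example one with a slightly different slope, fits worse.
sse_other = sum_squared_errors(x, y, a, b + 0.1)

print(a, b, sse_best, sse_other)
```

Changing either coefficient away from the fitted values can only increase the sum of squared errors, which is what "minimize squared deviations" means in practice.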
Interpreting coefficients
Error in estimation of b
The estimate of b will differ from sample to sample.
There is sampling error in the estimate of b.
In general, b is not exactly equal to the population value of the slope (B).
If we take many many simple random samples and estimate b many many times ….
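The repeated-sampling thought experiment can be simulated. The sketch below assumes a hypothetical population with slope B = 0.78, normally distributed x, and normally distributed errors. All of these numbers are invented for illustration and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population values (assumptions for this illustration).
TRUE_INTERCEPT = 20.0
TRUE_SLOPE = 0.78  # plays the role of B


def draw_sample_slope(n):
    """Draw one simple random sample of size n and return its estimate b."""
    x = rng.normal(100, 15, size=n)
    y = TRUE_INTERCEPT + TRUE_SLOPE * x + rng.normal(0, 10, size=n)
    x_dev = x - x.mean()
    return np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)


# Estimate b many times, each time from a fresh sample of 50 cases.
slopes = np.array([draw_sample_slope(50) for _ in range(2000)])

print("mean of b across samples:", slopes.mean())
print("standard deviation of b across samples:", slopes.std(ddof=1))
```

Individual estimates scatter around B, and their average is close to it. The standard deviation of the estimates across samples is the quantity that the standard error of b approximates from a single sample.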
Standard error of b
R2
Total variance = Variance due to regression + Error variance
1 = Proportion of variance due to regression + Proportion of variance due to error
R2 = Proportion of variance due to regression = 1 - (Sum of squared errors of regression / Sum of squared deviations from the mean only)
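A short numerical check of the variance decomposition behind R2, using made-up data. The variable names are hypothetical, and `np.polyfit` with degree 1 stands in for whatever software produced the original estimates.

```python
import numpy as np

# Made-up illustrative data (not from the slides).
x = np.array([90.0, 100.0, 110.0, 120.0, 130.0, 105.0, 95.0])
y = np.array([95.0, 98.0, 102.0, 118.0, 121.0, 108.0, 90.0])

# Fit y = intercept + slope*x by least squares.
slope, intercept = np.polyfit(x, y, 1)
predicted = intercept + slope * x

sse = np.sum((y - predicted) ** 2)         # squared errors of regression
sst = np.sum((y - y.mean()) ** 2)          # squared deviations from the mean only
ssr = np.sum((predicted - y.mean()) ** 2)  # variation due to regression

r_squared = 1 - sse / sst

print("SST:", sst, "SSR + SSE:", ssr + sse)
print("R2:", r_squared, "SSR/SST:", ssr / sst)
```

With an intercept in the model, the total sum of squares splits exactly into the regression part and the error part, so both ways of computing R2 agree.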
Multiple Regression
Multiple regression
Purpose:
Include “relevant” predictors
Reduce error in prediction
Test hypotheses about related independent variables
Matrices
Questions about matrices
If A is a 3*4 matrix, what are the dimensions of A’ (A transposed)?
If A is a 4*5 matrix and B is a column vector with 5 elements (5*1), what are the dimensions of C=A*B?
If A is a 3*2 matrix, what are the dimensions of A^-1 (the inverse of A)?
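One way to check answers to these dimension questions is to build placeholder matrices of the stated sizes in NumPy and inspect their shapes. The variable names below are hypothetical and the matrix entries are arbitrary, since only the shapes matter here.

```python
import numpy as np

# Question 1: transpose of a 3*4 matrix.
A = np.zeros((3, 4))
transpose_shape = A.T.shape

# Question 2: product of a 4*5 matrix and a 5*1 column vector.
A2 = np.zeros((4, 5))
B = np.zeros((5, 1))
product_shape = (A2 @ B).shape

# Question 3: inverse of a 3*2 matrix.
# Only square matrices can have an inverse, so NumPy raises an error here.
non_square = np.ones((3, 2))
try:
    np.linalg.inv(non_square)
    inverse_exists = True
except np.linalg.LinAlgError:
    inverse_exists = False

print(transpose_shape, product_shape, inverse_exists)
```

The general rules are that transposing swaps rows and columns, an (r*c) times (c*m) product is r*m, and an inverse is defined only for square matrices.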
Question
Answer
Vectors and Matrices
Multiple regression – OLS estimation
Y = X*b + e
Y is n*1, X is n*k, b is k*1, e is n*1
b = (X’X)^-1 * X’Y
(k*1) = {(k*n)*(n*k)}^-1 * {(k*n)*(n*1)}
(k*1) = {(k*k)} * {(k*1)}
Information matrix (X’X): each element is a cross-product term
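A minimal sketch of OLS estimation in matrix form, assuming the standard estimator b = (X’X)^-1 X’Y, which matches the dimensions listed above. The data are simulated, and the names `true_b`, `XtX`, and `Xty` are hypothetical. The explicit inverse mirrors the formula, while `np.linalg.lstsq` serves as an independent check.

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 200, 3  # n observations, k coefficients (including the intercept)

# Design matrix X (n*k): a column of ones plus two simulated predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Hypothetical population coefficients (k*1) and simulated outcome Y (n*1).
true_b = np.array([[2.0], [0.5], [-1.0]])
Y = X @ true_b + rng.normal(scale=0.1, size=(n, 1))

XtX = X.T @ X  # (k*n)*(n*k) -> k*k matrix of cross-products
Xty = X.T @ Y  # (k*n)*(n*1) -> k*1

# OLS estimate: (k*k) inverse times (k*1) -> k*1.
b_hat = np.linalg.inv(XtX) @ Xty

# Independent check with NumPy's least-squares solver.
b_check, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(b_hat.ravel())
print(b_check.ravel())
```

In practice `np.linalg.solve(XtX, Xty)` or `np.linalg.lstsq` is usually preferred over forming the inverse explicitly, because it is numerically more stable.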
OLS estimation: important points
The information matrix is inverted.
The information matrix cannot be “singular.”
The information matrix cannot have any two rows or columns that are (almost) identical.
Every regression coefficient depends on the entire information matrix.
Every regression coefficient depends on the covariances of all Xs with Y.
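The points about singularity can be illustrated with simulated predictors, assuming the information matrix is the cross-product matrix X’X. The design matrices below are made up: one has two identical columns and one has two almost identical columns. The rank is computed on X itself, which has the same rank as X’X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Well-behaved design: intercept plus two unrelated predictors.
X_ok = np.column_stack([np.ones(n), x1, x2])
# Two identical predictor columns: X'X is singular.
X_dup = np.column_stack([np.ones(n), x1, x1])
# Two almost identical columns: X'X is invertible but badly conditioned.
X_near = np.column_stack([np.ones(n), x1, x1 + 1e-6 * rng.normal(size=n)])


def info_matrix(X):
    """Cross-product matrix X'X."""
    return X.T @ X


# X'X has the same rank as X, so a rank below 3 means X'X cannot be inverted.
rank_ok = np.linalg.matrix_rank(X_ok)
rank_dup = np.linalg.matrix_rank(X_dup)

# A very large condition number signals a nearly singular matrix.
cond_ok = np.linalg.cond(info_matrix(X_ok))
cond_near = np.linalg.cond(info_matrix(X_near))

print("rank:", rank_ok, rank_dup)
print("condition number:", cond_ok, cond_near)
```

When the condition number is very large, the inverse exists but small changes in the data can produce large changes in the estimated coefficients.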
Assumptions of multiple regression
Equal probability of selection (SRS)
Linearity
Independence of observations: errors are uncorrelated
The mean of the error term is ALWAYS zero: the mean does not depend on x.
Normality
Homoskedasticity: variance does not depend on x.
No multicollinearity
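A rough sketch of how two of these assumptions might be eyeballed from residuals: that the error mean and the error variance do not depend on x. The data are simulated so that the assumptions hold by construction, and splitting at the median of x is an arbitrary choice made for illustration, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Simulated data that satisfy the assumptions (hypothetical coefficients).
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)

# Fit by least squares and compute residuals.
X = np.column_stack([np.ones(n), x])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b_hat

# Compare residuals for low and high values of x.
low = x <= np.median(x)
high = ~low

mean_low = residuals[low].mean()
mean_high = residuals[high].mean()
var_low = residuals[low].var(ddof=1)
var_high = residuals[high].var(ddof=1)
variance_ratio = var_high / var_low

print("residual means (low x, high x):", mean_low, mean_high)
print("residual variance ratio (high/low):", variance_ratio)
```

Residual means far from zero in one region of x would suggest nonlinearity, and a variance ratio far from 1 would suggest heteroskedasticity. With real data, a residual plot is the more common first check.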