Econometric methods of analysis and forecasting of financial markets Lecture 2. Linear regression model assumptions and their violations
This lecture helps to understand: What the main assumptions of linear regression are What happens if they are violated and how to detect it How to solve these problems
Contents Linear regression model assumptions Multicollinearity Endogeneity Heteroskedasticity Autocorrelation
Linear regression model assumptions Once again from the previous lecture: E(u_t) = 0 (zero mean of the errors) Var(u_t) = σ² < ∞ (constant and finite variance of the errors) Cov(u_i, u_j) = 0 for i ≠ j (linearly independent errors) Cov(u_i, x_i) = 0 (no relationship between the error and the explanatory variable) u_i ~ N(0, σ²) (normally distributed errors)
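A minimal sketch (not from the slides) of what these assumptions look like in practice: simulated data that satisfies them, fitted by OLS with NumPy. All names and numeric values are illustrative assumptions.

```python
import numpy as np

# Simulate data satisfying the classical assumptions:
# u_t ~ N(0, sigma^2), independent of x and of each other.
rng = np.random.default_rng(42)
n = 1000
x = rng.uniform(0, 10, n)
u = rng.normal(0, 2.0, n)            # E(u)=0, Var(u)=4, normal
y = 1.0 + 0.5 * x + u                # true intercept 1.0, slope 0.5

# OLS: beta_hat = argmin ||y - X beta||^2
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

print(beta_hat)                      # close to [1.0, 0.5]
print(abs(resid.mean()))             # ~0: residuals sum to zero when a constant is included
```

When the assumptions hold, the OLS estimates are unbiased and their standard errors shrink as the sample grows; the later slides show what breaks when each assumption fails.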
Multicollinearity Multicollinearity arises when explanatory variables are highly correlated with each other. It leads to: R² will be high, but the individual coefficients will have high standard errors; the regression becomes very sensitive to small changes in the specification; confidence intervals for the parameters become very wide, and significance tests may give incorrect results.
Multicollinearity Solutions to the problem: Ignore it, if the model is otherwise adequate Drop one of the collinear variables Transform the highly correlated variables into a ratio and include only the ratio Look at your data and collect more
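A common way to detect multicollinearity (not named on the slides, added here as a sketch) is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing regressor j on the others. Values well above 10 are usually taken as a warning sign. Data and thresholds below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                    # unrelated to the others

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing
    column j on the remaining columns plus a constant."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])
# VIFs for x1 and x2 are very large; x3 stays near 1
```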
Endogeneity If one of the explanatory variables is correlated with the error term, that variable is endogenous. Formally: y = β₀ + β₁x₁ + β₂x₂ + … + β_k x_k + ε; x_j is endogenous if corr(x_j, ε) ≠ 0 for at least one j In this case the estimators are biased and inconsistent!
Endogeneity Potential reasons for endogeneity: omitted variable functional form misspecification measurement error (in dependent variable or in explanatory variable) simultaneity
Endogeneity How to solve the problem of endogeneity? Proxy variables; IV and 2SLS. The main steps: use tests for functional form misspecification and try to find the most appropriate model (including quadratic terms and logs); if you have data for the omitted variable, include it; if not, use a proxy variable (if you don't know which proxy to use, take the lagged dependent variable as a proxy); use instrumental variables and 2SLS estimation
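The 2SLS step above can be sketched by hand with NumPy on simulated data: x is built to correlate with the error u, while z is a valid instrument (correlated with x, not with u). All coefficients and variable names are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.6 * u + rng.normal(size=n)    # endogenous: Cov(x, u) != 0
y = 2.0 + 1.0 * x + u                         # true slope = 1

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

X = np.column_stack([np.ones(n), x])
b_ols = ols(X, y)                 # slope biased upward (plim = 1.3 here)

# 2SLS: first stage regresses x on z; second stage uses the fitted x
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ ols(Z, x)
X_hat = np.column_stack([np.ones(n), x_hat])
b_2sls = ols(X_hat, y)            # consistent: slope close to 1

print(b_ols[1], b_2sls[1])
```

Note that the second-stage standard errors printed by a naive OLS routine would be wrong; dedicated IV software corrects them.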
Heteroskedasticity Assumption 2 (Var(u_t) = σ² < ∞, constant and finite variance of the errors) is the assumption of homoskedasticity. If it is violated, we have the case of heteroskedasticity.
Heteroskedasticity Consequence? Wrong standard errors! In this case the usual tests are invalid How to detect heteroskedasticity? Breusch-Pagan test y = β₀ + β₁x₁ + β₂x₂ + … + β_k x_k + u (1) û² = δ₀ + δ₁x₁ + δ₂x₂ + … + δ_k x_k + v (2) H₀: δ₁ = δ₂ = … = δ_k = 0 (homoskedasticity) Estimate (1) by OLS, obtain the squared residuals û² Estimate (2) using û², get the R² Test statistic nR² ~ χ²_k If nR² exceeds the χ²_k critical value, reject H₀
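The Breusch-Pagan steps can be sketched with NumPy on simulated data whose error variance grows with x (an illustrative assumption). With one regressor, k = 1 and the 5% χ²₁ critical value is 3.84.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = rng.uniform(1, 5, n)
u = rng.normal(size=n) * x          # error sd proportional to x: heteroskedastic
y = 1.0 + 2.0 * x + u

def ols_resid_r2(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - resid.var() / y.var()
    return resid, r2

# Step 1: estimate the main regression (1), keep squared residuals
X = np.column_stack([np.ones(n), x])
resid, _ = ols_resid_r2(X, y)

# Step 2: auxiliary regression (2) of u_hat^2 on the regressors, get R^2
_, r2_aux = ols_resid_r2(X, resid**2)

# Step 3: LM statistic nR^2 ~ chi2(k); chi2(1) 5% critical value is 3.84
lm = n * r2_aux
print(lm, lm > 3.84)                # H0 of homoskedasticity is rejected
```

In practice `statsmodels.stats.diagnostic.het_breuschpagan` performs the same computation and also returns p-values.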
Heteroskedasticity How to detect heteroskedasticity? White test (simplified form) Estimate (1) by OLS, get ŷ and û, compute ŷ² and û² Run the regression û² = δ₀ + δ₁ŷ + δ₂ŷ² + ε, get the R² Test statistic nR² ~ χ²₂ (two restrictions) If nR² exceeds the χ²₂ critical value, reject H₀
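The same simulated setup illustrates the simplified White test; the 5% χ²₂ critical value is 5.99. Data and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(1, 5, n)
u = rng.normal(size=n) * x          # heteroskedastic errors
y = 1.0 + 2.0 * x + u

# Estimate the main regression, keep fitted values and squared residuals
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef
resid2 = (y - y_hat) ** 2

# Auxiliary regression of u_hat^2 on y_hat and y_hat^2
Z = np.column_stack([np.ones(n), y_hat, y_hat**2])
gamma, *_ = np.linalg.lstsq(Z, resid2, rcond=None)
r2 = 1 - ((resid2 - Z @ gamma).var() / resid2.var())

lm = n * r2
print(lm, lm > 5.99)                # chi2(2) 5% critical value is 5.99
```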
Heteroskedasticity Solutions for heteroskedasticity: Transforming the variables into logs or deflating by some other measure of 'size'. Using heteroskedasticity-consistent ('robust') standard error estimates.
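A sketch of the robust-standard-error idea, written by hand with NumPy: the classical formula σ²(X′X)⁻¹ is replaced by White's sandwich estimator (X′X)⁻¹ X′ diag(û²) X (X′X)⁻¹ (the HC0 variant). Simulated data below is an illustrative assumption; in practice one would use, e.g., `fit(cov_type='HC3')` in statsmodels.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
x = rng.uniform(1, 5, n)
u = rng.normal(size=n) * x          # heteroskedastic errors
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical (homoskedasticity-assuming) standard errors
sigma2 = resid @ resid / (n - 2)
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# White (HC0) heteroskedasticity-consistent standard errors
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(se_classical, se_robust)      # the two sets of SEs differ under heteroskedasticity
```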
Autocorrelation The violation of the assumption of uncorrelated errors (Cov(ε_i, ε_j) = 0 for i ≠ j) results in the autocorrelation problem. Reasons for autocorrelation: Omitting relevant variables that then end up correlated with the error term (lags of included variables, trend, seasonal effects); this is more likely when the frequency of the data is higher. Wrong specification of the model. Why is this problem important? OLS estimates are inefficient (and inconsistent if a lagged dependent variable is among the regressors): the estimates β̂_i do not have the lowest variance, so they are not the most accurate. The standard errors of the estimates are incorrect, so we cannot use them in t and F tests or to construct confidence intervals. How to detect: t test for AR(1) serial correlation with strictly exogenous regressors Durbin-Watson test t test for AR(1) serial correlation without strictly exogenous regressors tests for higher-order serial correlation
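The Durbin-Watson test from the list above can be sketched with NumPy: DW = Σ(ê_t − ê_{t−1})² / Σê_t² ≈ 2(1 − ρ̂), so values near 2 indicate no first-order autocorrelation and values well below 2 indicate positive autocorrelation. The AR(1) error process below, with ρ = 0.7, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
x = rng.normal(size=n)
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + e[t]     # AR(1) errors with rho = 0.7
y = 1.0 + 2.0 * x + u

# Fit OLS and compute the Durbin-Watson statistic from the residuals
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(round(dw, 2))                  # well below 2, since rho = 0.7
```

With ρ = 0.7 the statistic lands near 2(1 − 0.7) = 0.6, far below the no-autocorrelation benchmark of 2.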
Autocorrelation Correcting autocorrelation: If we detect autocorrelation in the model, a correction is necessary. For this purpose we can use the following approaches: Include the omitted variables (lags, trend, seasonal controls). Use methods more appropriate for high-frequency data. Change the model specification.
Conclusions We covered The main violations of the linear regression assumptions What they cause and how to solve these problems
References Brooks C. Introductory Econometrics for Finance. Cambridge University Press, 2008. Cuthbertson K., Nitzsche D. Quantitative Financial Economics. Wiley, 2004. Tsay R.S. Analysis of Financial Time Series. Wiley, 2005. Ait-Sahalia Y., Hansen L.P. Handbook of Financial Econometrics: Tools and Techniques. Vol. 1, 1st Edition. 2010. Alexander C. Market Models: A Guide to Financial Data Analysis. Wiley, 2001. Cameron A., Trivedi P. Microeconometrics: Methods and Applications. 2005. Lai T.L., Xing H. Statistical Models and Methods for Financial Markets. Springer, 2008. Poon S.-H. A Practical Guide for Forecasting Financial Market Volatility. Wiley, 2005. Rachev S.T. et al. Financial Econometrics: From Basics to Advanced Modeling Techniques. Wiley, 2007.