
Shrinkage Estimation of Vector Autoregressive Models Pawin Siriprapanukul 11 January 2010

Introduction (1) We want to forecast:
– the rate of growth of employment,
– the change in annual inflation,
– the change in the federal funds rate.
A standard and simple system approach in economics is the VAR.

Introduction (2) OLS provides the efficient estimator for the VAR. However, there is a lot of evidence that Bayesian VARs outperform unrestricted OLS VARs in out-of-sample forecasting:
– Litterman (1986), and Robertson and Tallman (1999).

Introduction (3) Banbura et al. (2008) also show that it is possible and satisfactory to employ many endogenous variables with long lags in the Bayesian VAR (131 variables, 13 lags). Several studies have followed this direction.

Introduction (4) There is another related literature on forecasting with a large number of predictors in the model. A popular method is the "Approximate Factor Model", proposed by Stock and Watson (2002).

Introduction (5) This literature has shown that using a larger number of predictors (independent variables) does not always improve forecasting performance. Bai and Ng (2008) show that selecting variables with the LASSO or the elastic net before applying the approximate factor model methodology can outperform bigger models.

Introduction (6) Even though they interpret their results differently, we see this as evidence of redundancy in models with many predictors. Considering a VAR with many endogenous variables and long lags, we think redundancy should be present there as well.

Introduction (7) We have not yet gone to VARs with many endogenous variables, but we are working with 13 lags in the VAR.

Bias-Variance Tradeoff (1) Suppose the OLS estimate is unbiased. Gauss-Markov Theorem:
– the OLS estimate has the smallest variance among all linear unbiased estimates.
However, we know that there are biased estimates with smaller variances than the OLS estimate.
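For a scalar parameter this tradeoff can be made precise. A standard decomposition (added here for reference; it is not on the original slide):

```latex
% MSE of an estimator \hat{\theta} of a parameter \theta:
\mathrm{MSE}(\hat\theta)
  = \mathbb{E}\big[(\hat\theta - \theta)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat\theta] - \theta\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}(\hat\theta)}_{\text{variance}}
```

A shrinkage estimator accepts a small bias in exchange for a large drop in variance, which can lower the overall MSE.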

Bias-Variance Tradeoff (2) [Diagram: the OLS estimate is unbiased but has high variance around the true model; the shrinkage estimate is biased but has small variance.]

VAR (1) We consider a VAR relationship. Note that we cannot write the bias-variance tradeoff for the VAR:
– the OLS estimate is biased in finite samples.
We still think similar logic applies. However, the direction of shrinkage may be important.

VAR (2) With T observations, we can stack the model as
$$Y = XB + E,$$
where $Y = (y_1, \dots, y_T)'$ is $T \times n$, the rows of $X$ are $x_t' = (y_{t-1}', \dots, y_{t-p}')$, $B$ collects the lag coefficient matrices, and $E = (\varepsilon_1, \dots, \varepsilon_T)'$. We assume $\varepsilon_t \sim \text{i.i.d. } N(0, \Sigma)$.

VAR (3) The unrestricted OLS estimator is
$$\hat{B}_{OLS} = (X'X)^{-1} X'Y.$$
This estimator may not be defined if we have too many endogenous variables or too many lags (i.e., if $X'X$ is singular).
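To fix ideas, here is a minimal Python sketch of the stacked matrices and the OLS estimator (a toy illustration with our own variable names, not the paper's code):

```python
import numpy as np

def build_var_matrices(data, p):
    """Stack a VAR(p): rows of X are (y_{t-1}', ..., y_{t-p}'), rows of Y are y_t'.

    data : (T_total, n) array of observations; p : lag order.
    """
    T_total, n = data.shape
    Y = data[p:]                                                         # (T, n)
    X = np.hstack([data[p - l : T_total - l] for l in range(1, p + 1)])  # (T, n*p)
    return Y, X

rng = np.random.default_rng(0)
data = rng.standard_normal((120, 3))      # toy data: 120 months, 3 variables
Y, X = build_var_matrices(data, p=13)

# Unrestricted OLS: B_ols = (X'X)^{-1} X'Y; lstsq also handles the
# rank-deficient case that arises when n*p is large relative to T.
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
```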

Bayesian VAR (1) This is a shrinkage regression. We follow Kadiyala and Karlsson (1997) and Banbura et al. (2008) in using the Normal-(Inverted)-Wishart as our prior distribution. We work with stationary, demeaned variables; hence we set the mean of the prior distribution to zero.

Bayesian VAR (2) We can write the (point) estimator of our Bayesian VAR as
$$\hat{B}_{BVAR} = (X'X + \Omega^{-1})^{-1} X'Y,$$
where $\Omega$ is the (diagonal) prior variance matrix of the coefficients.

Ridge Regression (1) Well-known in the statistical literature. It can be defined as
$$\hat{\beta}_{ridge} = \arg\min_{\beta} \; \|y - X\beta\|^2 + \lambda \sum_j \beta_j^2.$$
This is a regression that imposes a penalty on the size of the estimated coefficients.

Ridge Regression (2) The solution of the previous problem is
$$\hat{\beta}_{ridge} = (X'X + \lambda I)^{-1} X'y.$$
Observe the similarity with
$$\hat{B}_{BVAR} = (X'X + \Omega^{-1})^{-1} X'Y.$$
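A short numerical check of this similarity (our sketch; `Omega` stands for the prior variance matrix, assumed diagonal): with a zero prior mean, the BVAR point estimator is a generalized ridge estimator and collapses to ordinary ridge when $\Omega^{-1} = \lambda I$.

```python
import numpy as np

def ridge(X, Y, lam):
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ Y)

def bvar_point(X, Y, Omega):
    # Posterior mean under a Normal-(Inverted)-Wishart prior with zero
    # prior mean and prior variance matrix Omega.
    return np.linalg.solve(X.T @ X + np.linalg.inv(Omega), X.T @ Y)

rng = np.random.default_rng(1)
X = rng.standard_normal((120, 9))
Y = rng.standard_normal((120, 3))
lam = 2.5
# With Omega^{-1} = lam * I the two estimators coincide.
assert np.allclose(ridge(X, Y, lam), bvar_point(X, Y, np.eye(9) / lam))
```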

BVAR v RR (1) Proposition 1:
– The BVAR estimator can be seen as the solution of the optimization problem
$$\hat{B}_{BVAR} = \arg\min_B \; \|Y - XB\|^2 + \sum_j \omega_{jj}^{-1} \|B_{j\cdot}\|^2,$$
– where $\omega_{jj}$ is the (j,j)-th element of the matrix $\Omega$.

BVAR v RR (2) Proposition 2:
– After rescaling each regressor by its prior standard deviation, the BVAR estimator coincides with a ridge regression estimator with a common penalty.
– Note: if the prior variances are all equal, the rescaling just standardizes the regressors.

LASSO (1) Least Absolute Shrinkage and Selection Operator. The LASSO estimate can be defined as
$$\hat{\beta}_{lasso} = \arg\min_{\beta} \; \|y - X\beta\|^2 + \lambda \sum_j |\beta_j|.$$

LASSO (2) The LASSO was proposed because:
– ridge regression is not parsimonious;
– ridge regression may generate huge prediction errors when the true (unknown) coefficient vector is sparse.
The LASSO can outperform ridge regression if:
– the true (unknown) coefficients contain a lot of zeros.

LASSO (3) If there are many irrelevant variables in the model, setting their coefficients to zero can reduce variance without disturbing the bias much. A VAR with 13 lags may contain many irrelevant variables.

The Elastic Net (1) Zou and Hastie (2005) propose another estimator that can further improve on the LASSO. It is called the elastic net, and its naïve version can be defined as
$$\hat{\beta}_{en} = \arg\min_{\beta} \; \|y - X\beta\|^2 + \lambda_1 \sum_j |\beta_j| + \lambda_2 \sum_j \beta_j^2.$$

The Elastic Net (2) We modify the elastic net to allow treating different lagged variables differently. Our modified naïve elastic net is
$$\hat{\beta} = \arg\min_{\beta} \; \|y - X\beta\|^2 + \lambda_1 \sum_j w_j |\beta_j| + \lambda_2 \sum_j w_j \beta_j^2,$$
where the weight $w_j$ depends on the lag of regressor $j$.
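Assuming the lag weight multiplies both penalty terms (our reading of the modification; the choice $w_j = l_j^{\pi}$, with $l_j$ the lag of regressor $j$, is a guess consistent with the values of $\pi$ used later), the estimator can be computed with off-the-shelf LASSO code via Zou and Hastie's data-augmentation trick plus a column rescaling:

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_naive_enet(X, y, lam1, lam2, w):
    """Minimize ||y - Xb||^2 + lam1 * sum_j w_j|b_j| + lam2 * sum_j w_j b_j^2.

    w : hypothetical lag weights (e.g. w_j = lag_j ** pi); not the paper's code.
    """
    T, k = X.shape
    # Augmentation: extra rows turn the weighted quadratic penalty into
    # ordinary squared error (Zou and Hastie, 2005).
    X_aug = np.vstack([X, np.sqrt(lam2) * np.diag(np.sqrt(w))])
    y_aug = np.concatenate([y, np.zeros(k)])
    # Column rescaling turns the weighted L1 penalty into a plain one:
    # solve in b_tilde_j = w_j * b_j, i.e. regress on columns x_j / w_j.
    X_tilde = X_aug / w
    # sklearn's Lasso minimizes ||y - Xb||^2 / (2n) + alpha * ||b||_1,
    # so alpha = lam1 / (2 * n_rows) matches our scaling.
    model = Lasso(alpha=lam1 / (2 * len(y_aug)), fit_intercept=False)
    model.fit(X_tilde, y_aug)
    return model.coef_ / w          # undo the rescaling

rng = np.random.default_rng(2)
X = rng.standard_normal((120, 6))
y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.standard_normal(120)
lags = np.array([1, 1, 2, 2, 3, 3], dtype=float)   # lag of each column
b = weighted_naive_enet(X, y, lam1=0.5, lam2=1.0, w=lags ** 1)  # pi = 1
```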

Implementation We can use the LARS algorithm, proposed by Efron, Hastie, Johnstone, and Tibshirani (2004), to compute both the LASSO and the elastic net efficiently. It can be applied to our modified version as well.
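For reference, scikit-learn exposes LARS directly; a toy call (our example, unrelated to the paper's data) that traces the full LASSO path in one run:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(3)
X = rng.standard_normal((120, 10))
y = X @ np.array([1.5, 0, 0, -1.0] + [0] * 6) + 0.1 * rng.standard_normal(120)

# method='lasso' makes LARS return the exact LASSO path: one set of
# coefficients for every breakpoint of the penalty parameter.
alphas, active, coefs = lars_path(X, y, method="lasso")
print(coefs.shape)   # (n_features, n_breakpoints)
```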

Empirical Study (1) I use the US data set from Stock and Watson (2005).
– Monthly data cover Jan 1959 – Dec 2003.
– There are 132 variables, but I use only 7.
I transformed the data as in De Mol, Giannone, and Reichlin (2008) to obtain stationarity.
– Their replication file can be downloaded.
– Their transformations make every variable an annual growth rate or the change in an annual growth rate.

Empirical Study (2) Out-of-sample performance.
– In each month from Jan 1981 to Dec 2003 (276 forecasts), we estimate each model on the most recent 120 observations to make one forecast.
– Performance is measured by the Relative Mean Squared Forecast Error (RMSFE), with OLS as the benchmark regression.
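A sketch of the rolling evaluation loop (our own Python; `make_forecast` stands in for any of the estimators above):

```python
import numpy as np

def rmsfe(data, make_forecast, benchmark_forecast, window=120, h=1):
    """Relative MSFE of one method against a benchmark, rolling window.

    data : (T, n) array; make_forecast / benchmark_forecast map a
    (window, n) history to an h-step-ahead forecast of shape (n,).
    """
    errs, bench_errs = [], []
    for t in range(window, len(data) - h + 1):
        history = data[t - window : t]
        target = data[t + h - 1]           # h steps beyond the window
        errs.append((make_forecast(history, h) - target) ** 2)
        bench_errs.append((benchmark_forecast(history, h) - target) ** 2)
    return np.mean(errs, axis=0) / np.mean(bench_errs, axis=0)
```

Values below one mean the candidate estimator beats the OLS benchmark for that variable.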

Empirical Study (3) There are 3 variables that we want to forecast:
– employment (EMPL),
– annual inflation (INF),
– the federal funds rate (FFR).
The order of the VAR is p = 13. There are 4 forecast horizons (1, 3, 6, 12) and 3 values of π (0, 1, 2).

Empirical Study (4) The most time-consuming part is finding suitable penalty parameters for each regression. We use grid searches over out-of-sample performance during the test period Jan 1971 – Dec 1980 (120 forecasts).
– Bayesian VAR: we employ the procedure from the previous chapter.
– LASSO: a grid of 90 values.
– Modified elastic net: a grid of 420 pairs of values.
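The tuning step amounts to a one-dimensional (or, for the elastic net, two-dimensional) grid search; a minimal sketch with a hypothetical `score` hook:

```python
import numpy as np

def select_penalty(grid, score):
    """Return the grid point minimizing a validation score.

    score : maps a penalty value to its average RMSFE on the tuning
    window (Jan 1971 - Dec 1980 in the paper); hypothetical hook that
    would wrap the rolling evaluation above.
    """
    return min(grid, key=score)

# Toy usage with a made-up score curve whose minimum is near lam = 1:
grid = np.geomspace(1e-3, 1e2, 90)       # cf. the 90-value LASSO grid
best = select_penalty(grid, score=lambda lam: np.log10(lam) ** 2 + 0.9)
```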

Empirical Study (5) We also employ a combination of the LASSO and the Bayesian VAR:
– the LASSO discards variables that tend to correspond to zero true coefficients;
– the Bayesian VAR, which is similar to ridge regression, then assigns a better amount of shrinkage to the retained coefficients.
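A sketch of the two-step idea (ours, not the paper's exact procedure): the LASSO screens the support, then a ridge/BVAR-style formula re-estimates the retained coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_then_ridge(X, y, lam1, lam2):
    # Step 1: LASSO screens out columns whose coefficients go to zero
    # (lam1 is sklearn's scaled L1 penalty here).
    sel = Lasso(alpha=lam1, fit_intercept=False).fit(X, y)
    keep = np.flatnonzero(sel.coef_)
    # Step 2: ridge (the BVAR-style estimator with Omega^{-1} = lam2 * I)
    # re-shrinks only the retained coefficients.
    b = np.zeros(X.shape[1])
    Xk = X[:, keep]
    b[keep] = np.linalg.solve(Xk.T @ Xk + lam2 * np.eye(len(keep)), Xk.T @ y)
    return b

rng = np.random.default_rng(4)
X = rng.standard_normal((120, 8))
y = X @ np.array([2.0, 0, 0, 0, -1.0, 0, 0, 0]) + 0.1 * rng.standard_normal(120)
b = lasso_then_ridge(X, y, lam1=0.1, lam2=1.0)
```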

Empirical Study (6) For the smallest model, we use the 3 variables to forecast themselves.

Empirical Study (7) [results table]

Empirical Study (8) [results table]

Empirical Study (9) Comparing different regressions, π = 0. [results table]

Empirical Study (10) Comparing different regressions, π = 0. [results table]

Empirical Study (11) When we change to the 7-variable VAR. [results table]

Conclusion Even though the empirical results are not impressive, we still think this is a promising way to improve the performance of Bayesian VARs. When the model becomes bigger, e.g. models with 131 endogenous variables, this should be more relevant. We can think of cautions like those of Boivin and Ng (2006) for the VAR as well.

Thank you very much.