Nowcasting BERD at MS Level: a Penalised Linear Model Approach
L. Isella, A. Karvounaraki (JRC), D. Karlis (AUEB)

Definition
Nowcasting: "Exploitation of information published early and/or at higher frequencies than the target variable of interest to obtain an early estimate before the official figure" (M. Bańbura et al., Now-casting and the real-time data flow, ECB, No 1564 / July 2013). It is used to improve the timeliness of the data.
BERD: Business enterprise expenditure on R&D covers R&D activities carried out in the business sector by performing firms and institutes, regardless of the origin of funding, and is arguably most closely linked to the creation of new products and production techniques.

Nowcasting BERD: Rationale
Timeliness matters: official data are two years old
BERD accounts for more than 60% of the total R&D expenditure at EU level
BERD is used in a number of reports and indicators, including the EIS
Time series analysis and clustering methods have not performed sufficiently well
Previous attempts:
Merit: Timeliness of Innovation Union Scoreboard data. Options for forecasting data. Brussels, October 2015
M. Mouchart, J. Rombouts: Clustered panel data models: An efficient approach for nowcasting from poor data. International Journal of Forecasting 21 (2005) 577–594
Statistics Austria: Nowcasting procedures in Austria for estimating Research & Development. OECD Statistics Newsletter, July 2012
Reference to Eurostat's update of R&D statistics
Merit: Expert Workshop on Opportunities and Challenges of Improving the Timeliness of Innovation Data

Model Requirements
Open, sustainable data
Easy to update/maintain, hence:
No manual selection of the input data for each MS
No hand tuning of the parameters for each MS
Not computationally intensive
Model as a prediction tool: validation as a part of model development
Keep things as simple as possible, but not any simpler: need to go beyond plain OLS

Methodology overview
Selection of the best predictors in terms of relevance and data availability: we consider 27 predictors
Selection of an appropriate model based on its prediction capability: GLMNET in R
Test the predictions: cross validation; leave the last available year out for testing
Calculate prediction errors and their CI: bootstrap

Model Input
Same predictors used as model input for each MS
Feature selection performed algorithmically

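OLS approach
OLS fits the model by solving $\min_{\beta_0,\beta} \sum_{i=1}^{N} \big(y_i - \beta_0 - x_i^{T}\beta\big)^2$
But it cannot handle p > N (more predictors than observations): overfitting!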

Ridge and Lasso Regressions
When p > N or there are many correlated predictors, it is best to use:
Ridge regression: it shrinks the coefficients of correlated predictors towards each other
Lasso regression: it picks one of the correlated predictors and ignores the rest

GLMNET
Elastic net models combine ridge and lasso ideas
α = 0: ridge penalty; α = 1: lasso penalty
Biased {β}, but reduced model variance
Handles many correlated predictors well
How to tune the model, i.e. how to choose α and λ (hyperparameters)? By cross validation
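The role of α and λ can be illustrated with a small sketch. The deck uses GLMNET in R; the code below is not that implementation but a minimal pure-Python coordinate descent for the same kind of objective, $\frac{1}{2N}\sum_i (y_i - \beta_0 - x_i^{T}\beta)^2 + \lambda\big[\tfrac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1\big]$. The function names, the synthetic data and the value λ = 0.5 are assumptions made for illustration, and the columns are centred but not standardised, which simplifies what a production solver would do.

```python
import random

def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the lasso part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for the elastic net objective quoted above.
    Returns (intercept, beta). Illustrative sketch, not the glmnet code."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    resid = [yi - y_mean for yi in y]  # residuals of the centred problem at beta = 0
    for _ in range(n_iter):
        for j in range(p):
            denom = col_ss[j] + lam * (1.0 - alpha)
            if denom == 0.0:
                continue  # constant column and no ridge term: leave coefficient at zero
            # covariance of column j with the partial residual (beta_j removed)
            z = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(z, lam * alpha) / denom
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

# Synthetic data (hypothetical): two almost identical predictors.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(30)]
X = [[a, a + 0.01 * rng.gauss(0, 1)] for a in x1]
y = [3.0 * a + 0.1 * rng.gauss(0, 1) for a in x1]

for alpha in (0.0, 0.5, 1.0):
    b0, beta = elastic_net(X, y, lam=0.5, alpha=alpha)
    print(f"alpha={alpha}: intercept={b0:.3f}, beta={[round(b, 3) for b in beta]}")
```

Comparing the printed coefficients for α = 0 and α = 1 shows how the two penalties treat the correlated pair differently, in line with the ridge and lasso behaviour described earlier in the deck.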

Cross Validation
Randomly split the data into K = 3 groups and take a pair of values {λ, α}
Each time, use 1 group as the test set and the other 2 as the train set
Evaluate model performance on the test set
Repeat many times with different {λ, α} and select {λ_opt, α_opt}, the pair with the best performance on the test sets
[Figure: each of the 3 groups serves once as test set while the other two form the train set, giving prediction errors 1, 2 and 3]
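The procedure above can be sketched in a few lines. This is a minimal illustration in pure Python rather than the R workflow used in the deck: the compact `elastic_net` solver, the helper names (`make_folds`, `cv_error`, `grid_search`), the synthetic data and the grid of {λ, α} values are all assumptions introduced here. The same random split is reused for every {λ, α} pair so that the pairs are compared on identical folds.

```python
import random

def soft_threshold(z, g):
    return z - g if z > g else z + g if z < -g else 0.0

def elastic_net(X, y, lam, alpha, n_iter=200):
    """Compact coordinate-descent elastic net (illustrative, not glmnet)."""
    n, p = len(X), len(X[0])
    xm = [sum(r[j] for r in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[r[j] - xm[j] for j in range(p)] for r in X]
    ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta, resid = [0.0] * p, [v - ym for v in y]
    for _ in range(n_iter):
        for j in range(p):
            denom = ss[j] + lam * (1.0 - alpha)
            if denom == 0.0:
                continue
            z = sum(Xc[i][j] * resid[i] for i in range(n)) / n + ss[j] * beta[j]
            new = soft_threshold(z, lam * alpha) / denom
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    return ym - sum(b * m for b, m in zip(beta, xm)), beta

def predict(model, X):
    b0, beta = model
    return [b0 + sum(b * v for b, v in zip(beta, row)) for row in X]

def make_folds(n, k=3, seed=0):
    """Randomly split the indices 0..n-1 into k groups."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_error(X, y, lam, alpha, k=3, seed=0):
    """Mean squared prediction error averaged over the k held-out groups."""
    errors = []
    for test_idx in make_folds(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = elastic_net([X[i] for i in train_idx],
                            [y[i] for i in train_idx], lam, alpha)
        preds = predict(model, [X[i] for i in test_idx])
        errors.append(sum((p - y[i]) ** 2
                          for p, i in zip(preds, test_idx)) / len(test_idx))
    return sum(errors) / k

def grid_search(X, y, lams, alphas, k=3, seed=0):
    """Return (cv_error, lam_opt, alpha_opt) for the best pair on the grid."""
    return min((cv_error(X, y, lam, a, k, seed), lam, a)
               for lam in lams for a in alphas)

# Synthetic data (hypothetical): 30 observations, 3 predictors, one irrelevant.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(30)]
y = [1.0 + 2.0 * r[0] - 1.0 * r[2] + 0.3 * rng.gauss(0, 1) for r in X]

err, lam_opt, alpha_opt = grid_search(X, y, [0.0, 0.01, 0.1, 1.0], [0.0, 0.5, 1.0])
print(f"lam_opt={lam_opt}, alpha_opt={alpha_opt}, CV error={err:.4f}")
```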

CI for prediction
We used a bootstrap approach
Example on a simple linear regression
By resampling the residuals, create many realisations of the dependent variable, retrain the model on each of them (fixed-x bootstrap) and get new predictions
It is a non-parametric technique, so no assumptions are made on the distribution of the residuals
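The simple linear regression example can be sketched as follows. This is a hedged illustration of a fixed-x (residual) bootstrap with a percentile interval; the function names, the synthetic series and the choice of 2000 resamples are assumptions, not values taken from the deck. The interval below reflects the variability of the refitted prediction only; a common extension adds a resampled residual to each bootstrap prediction to cover a new observation.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def bootstrap_interval(x, y, x_new, n_boot=2000, level=0.95, seed=0):
    """Fixed-x bootstrap: keep x, resample residuals, refit, predict at x_new.
    Returns (point prediction, lower bound, upper bound)."""
    rng = random.Random(seed)
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    preds = []
    for _ in range(n_boot):
        # new realisation of the dependent variable at the same x values
        y_star = [fi + rng.choice(resid) for fi in fitted]
        a_s, b_s = fit_line(x, y_star)
        preds.append(a_s + b_s * x_new)
    preds.sort()
    lo = preds[int((1 - level) / 2 * n_boot)]
    hi = preds[int((1 + level) / 2 * n_boot) - 1]
    return a + b * x_new, lo, hi

# Synthetic yearly series (hypothetical numbers), predicting the next period.
rng = random.Random(7)
years = [float(t) for t in range(12)]
values = [2.0 + 0.5 * t + 0.2 * rng.gauss(0, 1) for t in years]

point, lo, hi = bootstrap_interval(years, values, x_new=12.0)
print(f"prediction={point:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```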

After GLMNET and cross validation
Train model only on data up to 2013 (withhold 2014)
Nowcast BERD 2014 (not used in model training)
[Figure labels: "Overfitting and poor nowcast"; "GLMNET+CV+CI"]

EU28 nowcast
Prediction error less than 1%.

Overall Assessment
11 MS have prediction error < 5%. Their BERD accounts for 70% of the total EU28 BERD
10 MS have prediction error > 10%. Their BERD accounts for less than 5% of the total EU28 BERD
This is the reason why the method gives such good predictions for the EU28
At MS level, the problem lies with small economies, where even small changes have visible impacts
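The link between MS-level and EU28-level accuracy can be made explicit with a short derivation. It is only a sketch and assumes, which the slides do not state, that the EU28 nowcast is the sum of the MS nowcasts. The symbols are introduced here for illustration: $Y_i$ is the official BERD of MS $i$, $\hat{Y}_i$ its nowcast, $w_i$ its share of EU28 BERD and $e_i$ its relative prediction error.

```latex
\[
\frac{\hat{Y}_{EU} - Y_{EU}}{Y_{EU}}
= \frac{\sum_i \big(\hat{Y}_i - Y_i\big)}{\sum_j Y_j}
= \sum_i w_i \, e_i ,
\qquad
w_i = \frac{Y_i}{\sum_j Y_j},
\quad
e_i = \frac{\hat{Y}_i - Y_i}{Y_i}.
\]
```

Under that assumption the EU28 relative error is a share-weighted average of the MS errors: the 10 MS with errors above 10% enter with a combined weight below 5%, and errors of opposite sign partly cancel.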

Way forward
Improvement of the methodology (e.g. introduction of weights)
Investigate poor performance of the model in certain countries
Apply the method to other R&D indicators (e.g. public R&D expenditure)

Last but not least… Thank you for your attention!
21 November 2018
