State Space Modelling For UK LFS Unemployment


State Space Modelling For UK LFS Unemployment

Gary Brown, Ping Zong (Time Series Analysis Branch, Office for National Statistics)
Jan Angenendt (Knowledge, Analysis and Intelligence, HM Revenue and Customs)
Moshe Feder (Southampton Statistical Sciences Research Institute (S3RI), University of Southampton)

Overview
- Introduction
- LFS Rolling Quarterly Data
- State Space Model
- The General State Space Model
- The Specific Model Proposed for UK LFS
- Results
- Further Work

Introduction
A State Space Model (SSM) represents a structural time series approach to capturing the characteristics of a time series. As with X-12-ARIMA, a time series can be decomposed into trend, seasonal and irregular components using an SSM. A key advantage of using an SSM is that it allows explicit modelling of unobservable components.
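As a rough illustration of this kind of structural decomposition (not from the slides), the sketch below fits a basic structural model to a simulated monthly series, assuming the statsmodels UnobservedComponents API; the series y and all settings are placeholders, not the LFS data or the authors' code.

```python
# Minimal sketch (simulated data, not the LFS series): decompose a monthly
# series into trend, seasonal and irregular components with a basic
# structural model, assuming the statsmodels UnobservedComponents API.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
t = np.arange(120)
y = 5 + 0.02 * t + 0.5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.2, size=120)

model = sm.tsa.UnobservedComponents(
    y,
    level="local linear trend",  # trend = level + slope
    seasonal=12,                 # monthly dummy seasonality
)
res = model.fit(disp=False)

trend = res.level.smoothed        # smoothed level (trend) component
seasonal = res.seasonal.smoothed  # smoothed seasonal component
irregular = y - trend - seasonal  # implied irregular component
```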

Aims of the SSM Project
Currently the UK LFS publishes a single estimate for each rolling quarter, based on a rotating panel design with five waves of interviews. The aim of the SSM project is to model the complex LFS structure by fitting wave-specific rolling quarter data, to better account for:
- sampling error
- autocorrelation (between wave-specific estimates)
- rotation group bias (systematic differences between wave-specific estimates)

LFS Sample Design The LFS sample size, around 120,000 people in 40,000 households, is split into 190 Interviewer Areas (IAs). Each IA is split into 13 weekly ‘stints’ – in this way a representative sample is achieved every 13 weeks. To weight the sample, the 13 weeks are allocated to ‘months’ in a 4-4-5 pattern.

LFS Sample Design (cont)
Interviews are (approximately) split by mode:
- First interview: face-to-face
- Second interview (13 weeks later): telephone
- Third, fourth and fifth interviews (each 13 weeks after the previous): telephone
After the fifth interview (wave 5), households drop out of the survey and are replaced with a new set of households (wave 1).

Data Structure
Table 1: Rolling quarterly estimates

Rolling quarterly estimate
Each three months yields a representative sample, and each month is in three of these. For example, survey responses from June are included in three rolling quarterly estimates: (Apr, May, Jun), (May, Jun, Jul) and (Jun, Jul, Aug). Given this structure, the overall quarterly unemployment estimate at time t is a combination of three months:

Y_t = (ILO_{t-1} + ILO_t + ILO_{t+1}) / (EA_{t-1} + EA_t + EA_{t+1})

where Y = unemployment rate, ILO = ILO unemployed, and EA = economically active.
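To make the rolling-quarter construction concrete, here is a small sketch (not from the slides) that computes a centred three-month ratio estimate with pandas, under the assumption above that the quarterly rate is the ratio of three-month ILO and EA totals; all figures are invented placeholders.

```python
# Sketch: rolling quarterly unemployment rate as the ratio of centred
# three-month totals (assumed form; all numbers are invented placeholders).
import pandas as pd

monthly = pd.DataFrame(
    {
        "ILO": [1.30, 1.32, 1.29, 1.35, 1.31, 1.28],  # ILO unemployed (millions)
        "EA": [33.1, 33.2, 33.2, 33.3, 33.3, 33.4],   # economically active (millions)
    },
    index=pd.period_range("2011-04", periods=6, freq="M"),
)

# Centred three-month window (t-1, t, t+1), e.g. (Apr, May, Jun) centred on May
quarterly = monthly.rolling(window=3, center=True).sum()
quarterly["Y"] = quarterly["ILO"] / quarterly["EA"]  # rolling quarterly rate
print(quarterly["Y"])
```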

Table 2: Sample rotation in wave-specific data

The sample rotation means:
- The same wave does not include the same households (samples) in each rolling quarter.
- The same households (cohort) appear in a different wave after one quarter.
- Different data collection methods are used across waves.
These characteristics need to be accounted for; using the SSM approach for UK LFS unemployment enables this to happen.

The General State Space Model (GSSM) is:

y_t = Z a_t + e_t,    e_t ~ N(0, H)    (1)
a_t = T a_{t-1} + h_t,    h_t ~ N(0, Q)    (2)

where:
- y_t is the observation in the measurement equation (1)
- a_t is the state vector in the transition equation (2)
- Z, T, H and Q are system matrices
- e_t and h_t are error terms

Compare the General Linear Regression Model (GLRM):

y_t = Z a + e_t    (3)

Comparing SSM and GLRM
In (1) and (3), y is a function of time, but:
- GLRM: the coefficient is a, which is fixed
- SSM: the coefficient is a_t, which varies over time
Hence the GLRM is a static regression model and the SSM is a dynamic regression model. In the SSM, each coefficient a_t varies according to a random walk, a_t = a_{t-1} + h_t, which gives a state vector of the form a_t = T a_{t-1} + h_t as in equation (2). So equation (1) expresses the dynamic regression process, and equation (2) expresses the dynamic change condition.
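The Kalman filter referred to later operates directly on equations (1) and (2). The sketch below (not the authors' code) is a minimal NumPy implementation for time-invariant Z, T, H and Q, shown only to make the prediction/update recursions explicit.

```python
# Minimal Kalman filter sketch for y_t = Z a_t + e_t, a_t = T a_{t-1} + h_t,
# assuming time-invariant system matrices. Illustrative only.
import numpy as np

def kalman_filter(y, Z, T, H, Q, a0, P0):
    """Filtered state means/covariances for observations y of shape (n, p)."""
    n, m = y.shape[0], a0.shape[0]
    a, P = a0.copy(), P0.copy()
    a_filt = np.zeros((n, m))
    P_filt = np.zeros((n, m, m))
    for t in range(n):
        # Prediction: a_{t|t-1} = T a_{t-1}, P_{t|t-1} = T P T' + Q
        a_pred, P_pred = T @ a, T @ P @ T.T + Q
        # Update with observation y_t
        v = y[t] - Z @ a_pred                 # prediction error
        F = Z @ P_pred @ Z.T + H              # prediction error variance
        K = P_pred @ Z.T @ np.linalg.inv(F)   # Kalman gain
        a = a_pred + K @ v
        P = P_pred - K @ Z @ P_pred
        a_filt[t], P_filt[t] = a, P
    return a_filt, P_filt

# Example: a univariate local level model (Z = T = [1])
y = np.cumsum(np.random.default_rng(0).normal(size=(50, 1)), axis=0)
a_filt, P_filt = kalman_filter(
    y, Z=np.eye(1), T=np.eye(1), H=np.eye(1) * 0.5, Q=np.eye(1) * 0.1,
    a0=np.zeros(1), P0=np.eye(1) * 1e4,
)
```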

SSM with Signal and Noise
The SSM can be expressed as two parts: signal θ_t and noise e_t:

y_t = θ_t + e_t    (4)

where:
- y_t is the design-unbiased survey estimate
- θ_t is the signal, the unknown population quantity
- e_t is the noise, the survey errors

Signal θ_t - Basic Structural Model (BSM)

θ_t = L_t + S_t
L_t = L_{t-1} + R_{t-1} + h_{L,t}
R_t = R_{t-1} + h_{R,t}
S_t = -(S_{t-1} + S_{t-2} + ... + S_{t-11}) + h_{S,t}    (4d) if using dummy seasonality

where:
- L_t is the level
- R_t is the slope
- S_t is the seasonal
- h_{L,t}, h_{R,t}, h_{S,t} are white noise terms
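As one way to picture (4d), the sketch below (not from the slides) builds the transition and observation matrices of a textbook BSM signal block in NumPy, assuming monthly data (s = 12), which gives 13 states (level, slope and 11 seasonal lags). Note that the LFS signal block on the slides has 15 states because the level is expanded over (t-1, t, t+1).

```python
# Sketch: system matrices for a BSM signal block with a local linear trend and
# monthly dummy seasonality, assuming s = 12 (13 states: level, slope, 11 lags).
import numpy as np

s = 12                       # assumed monthly seasonality
m = 2 + (s - 1)              # level + slope + (s-1) seasonal states = 13

T_bsm = np.zeros((m, m))
# Level and slope: L_t = L_{t-1} + R_{t-1} + h_L,t ; R_t = R_{t-1} + h_R,t
T_bsm[0, 0] = T_bsm[0, 1] = T_bsm[1, 1] = 1.0
# Dummy seasonality: S_t = -(S_{t-1} + ... + S_{t-11}) + h_S,t
T_bsm[2, 2:] = -1.0
T_bsm[3:, 2:-1] = np.eye(s - 2)   # shift the remaining seasonal lags down

# Observation (design) row: the series picks up level + current seasonal
Z_bsm = np.zeros((1, m))
Z_bsm[0, 0] = 1.0   # level
Z_bsm[0, 2] = 1.0   # seasonal
```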

Noise e_t - the Extended SSM model

e_t = φ e_{t-j} + h_{e,t}    (5)

where:
- φ is the coefficient of the AR process
- e_{t-j} is the lagged sampling error
- h_{e,t} is white noise
The standard assumption is independence of errors.

BSM + Extended SSM
Both signal and noise have their own measurement equation (Z_BSM,t and Z_e,t) and their own state vector (a_BSM,t and a_e,t) in the transition equation, ie:
- a measurement equation for the signal (mapping a_BSM,t to the signal θ_t)
- a state vector for the signal (with transition matrix T_BSM)
- a measurement equation for the noise (mapping a_e,t to the survey error e_t)
- a state vector for the noise (with transition matrix T_e), if AR(4)
These two parts, signal and noise, are brought together to form a complete SSM.

The Specific SSM Proposed for UK LFS
State vector (L_t, R_t, S_t, e_t)
As survey responses from month t are included in estimates based on three representative samples centred at t-1, t and t+1, the state vector will not only include parameters at time t but will take all three time periods (t-1, t, t+1) into account. Each of the original components (level L_t, slope R_t, seasonal S_t and sample error e_t) is therefore expanded to cover these three time periods.

State vector (L_t, R_t, S_t, e_t) - cont
The slope (R) will be the same across the three time periods, so only the t+1 value is kept. Also, because there are five waves and each wave includes three time periods (t-1, t, t+1) for the sample error, there are in total 15 sample error state variables in the model. Overall this gives 30 state variables in the state vector.

Survey error structure (e_t)
Survey errors do not overlap within the wave-structured data, but they are correlated between quarters across waves (cohorts): someone interviewed in wave i at time t will be interviewed in wave i+1 at time t+3, so the corresponding sample errors will be correlated and are defined as in equation (6).

Building the SSM for UK LFS
Matrices are used as the basic building blocks of the SSM. The main SSM matrices/vectors for the UK LFS are:
- observation matrices (Z)
- transition matrices (T)
- covariance matrices (Q)
- state vectors (a_t)
- disturbance vectors

Observation matrices (Z)
- Z_BSM (signal) is a 5x15 matrix for the SSM with dummy seasonality
- Z_e (noise) is a 5x15 matrix

State vectors (a_t, a_{t-1}) and transition matrix (T): BSM
State vector (a_BSM,t), transition matrix (T_BSM) + disturbance

State vectors (a_t, a_{t-1}) and transition matrix (T): e
State vector (a_e,t), transition matrix (T_e) + disturbance

Join Signal and Noise - block all matrices together
- Observation matrices: Z = [Z_BSM  Z_e]
- State vectors: a_t = (a_BSM,t, a_e,t)
- Disturbance vectors: h_t = (h_BSM,t, h_e,t)
- Transition matrices: T = blockdiag(T_BSM, T_e)
where T_BSM is a 15x15 matrix and T_e is a 15x15 matrix.
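The block structure above can be assembled mechanically. The sketch below (not the authors' code) joins placeholder signal and noise blocks with NumPy/SciPy, assuming 15 states in each block as on the slide.

```python
# Sketch: joining the signal (BSM) and noise (survey error) blocks into one
# system. Z_bsm, Z_e, T_bsm, T_e, Q_bsm and Q_e are placeholders for the
# actual 5x15 and 15x15 matrices described on the slides.
import numpy as np
from scipy.linalg import block_diag

n_waves, m_bsm, m_e = 5, 15, 15
Z_bsm = np.zeros((n_waves, m_bsm))   # placeholder 5x15 signal observation matrix
Z_e   = np.zeros((n_waves, m_e))     # placeholder 5x15 noise observation matrix
T_bsm = np.eye(m_bsm)                # placeholder 15x15 signal transition matrix
T_e   = np.eye(m_e)                  # placeholder 15x15 noise transition matrix
Q_bsm = np.eye(m_bsm) * 0.1          # placeholder signal disturbance covariance
Q_e   = np.eye(m_e) * 0.1            # placeholder noise disturbance covariance

Z = np.hstack([Z_bsm, Z_e])          # joint observation matrix: 5 x 30
T = block_diag(T_bsm, T_e)           # joint transition matrix: 30 x 30
Q = block_diag(Q_bsm, Q_e)           # joint disturbance covariance: 30 x 30
```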

Covariance matrices (Q)

Covariance matrices (Q) – cont.

The Model Estimate Setting
As long as all SSM matrices are set appropriately and all parameters in the model are known, the state vector can be predicted, filtered and smoothed using the Kalman filter. In practice all these parameters are unknown, so we need to initialise all of:
- (a_{t-1}) in the state vector
- the disturbance variances in the disturbance matrix
- the AR parameters (ρ and ρ*) in the transition matrix

Initialisation for (a_{t-1}) in the state vector
For non-stationary components:
- initialise the non-stationary component means to zero
- initialise the associated non-stationary component variances with a very large value (ie 10,000)
For stationary components:
- initialise the stationary component (sampling error e_t) mean with its unconditional mean
- initialise the stationary component variance with its own pseudo-error variance
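A minimal sketch of this initialisation in NumPy (not from the slides), assuming the 30-state layout described earlier (15 non-stationary signal states followed by 15 stationary AR sampling-error states); phi and sigma2_e are placeholder values, and the stationary variances are set here to the AR(1) unconditional variance as one standard choice.

```python
# Sketch of state initialisation (assumptions: 30 states = 15 non-stationary
# signal states followed by 15 stationary AR(1) sampling-error states;
# phi and sigma2_e are placeholders).
import numpy as np

m_signal, m_error = 15, 15
phi, sigma2_e = 0.5, 1.0            # placeholder AR coefficient and innovation variance

a0 = np.zeros(m_signal + m_error)   # non-stationary means set to zero;
                                    # zero-mean stationary AR errors also start at 0
P0 = np.zeros((m_signal + m_error, m_signal + m_error))
P0[:m_signal, :m_signal] = np.eye(m_signal) * 1e4   # diffuse: very large variance
P0[m_signal:, m_signal:] = np.eye(m_error) * sigma2_e / (1 - phi**2)  # AR(1) unconditional variance
```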

Initialisation for (h_t) in the disturbance vector
All parameters in the disturbance vector can be estimated by Maximum Likelihood within the model. There are two approaches to estimating these parameters:
- The hyper-parameter approach assumes that all parameters are unknown and estimates them simultaneously within the SSM.
- The pseudo-error approach is different ...
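Not from the slides: a minimal, self-contained illustration of the hyper-parameter approach, estimating the disturbance variances of a toy local level model by maximising the Kalman-filter prediction-error log-likelihood. Everything here is an illustrative stand-in for the full 30-state LFS model.

```python
# Sketch of the hyper-parameter approach on a toy local level model:
# y_t = L_t + e_t, L_t = L_{t-1} + h_t. The two variances are estimated jointly
# by maximising the Kalman-filter prediction-error log-likelihood.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
level = np.cumsum(rng.normal(0, 0.3, 200))
y = level + rng.normal(0, 1.0, 200)

def negloglik(log_params, y):
    sigma2_h, sigma2_e = np.exp(log_params)      # variances kept positive via log scale
    a, P, ll = 0.0, 1e4, 0.0                     # diffuse initialisation
    for obs in y:
        a_pred, P_pred = a, P + sigma2_h         # prediction step (T = 1)
        F = P_pred + sigma2_e                    # prediction error variance
        v = obs - a_pred                         # prediction error
        ll += -0.5 * (np.log(2 * np.pi * F) + v**2 / F)
        K = P_pred / F                           # Kalman gain
        a, P = a_pred + K * v, P_pred * (1 - K)  # update step
    return -ll

fit = minimize(negloglik, x0=np.log([0.1, 1.0]), args=(y,), method="L-BFGS-B")
print("estimated variances:", np.exp(fit.x))
```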

Initialisation in the pseudo-error approach
Different approaches are used for different parameters:
- h_{L,t}, h_{R,t}, h_{S,t} are treated as unknown parameters, and h_{e,t} is treated as a known parameter (estimated in a separate process)
- Initial variance values for the unknown parameter vector (in the Q matrices) are set based on a separate estimation process ('Proc ucm' in SAS, 'StructTS' in DLM/R), obtained from calculation of the autocorrelation through the pseudo-error process
- The AR parameters, ρ and ρ*, are estimated using the Yule-Walker equations and substituted into the SSM
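Not from the slides: a minimal example of the Yule-Walker step, assuming an AR(1) fit to a simulated stand-in for the pseudo-error series; statsmodels' yule_walker is used as one way of solving the Yule-Walker equations.

```python
# Sketch of the Yule-Walker step: fit an AR(1) coefficient to a simulated,
# placeholder pseudo-error series.
import numpy as np
from statsmodels.regression.linear_model import yule_walker

rng = np.random.default_rng(7)
e = np.zeros(300)
for t in range(1, 300):                  # simulate a stand-in pseudo-error series
    e[t] = 0.6 * e[t - 1] + rng.normal()

rho, sigma = yule_walker(e, order=1)     # Yule-Walker estimate of the AR coefficient
print("AR(1) coefficient:", rho[0], "innovation s.d.:", sigma)
```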

Simulation results 1. Trend prediction:

Seasonality prediction

Sample error prediction

Further work
The project is not complete. Work remaining:
- test whether including (t-1, t, t+1) in the model is necessary (through comparison analysis)
- test whether the proposed method for sample error estimation is correct
- test an approach consistent with the one used in the SSM to estimate the AR(1) coefficients of the pseudo-error
- consider MA models
- consider including rotation group bias and the claimant count

Any questions? ping.zong@ons.gsi.gov.uk

Appendix
The trigonometric seasonality model uses sines and cosines:

S_t = Σ_{j=1}^{6} S_{j,t}
S_{j,t} = S_{j,t-1} cos(λ_j) + S*_{j,t-1} sin(λ_j) + ω_{j,t}
S*_{j,t} = -S_{j,t-1} sin(λ_j) + S*_{j,t-1} cos(λ_j) + ω*_{j,t}

where E[ω_{j,t}] = 0, E[ω*_{j,t}] = 0, Var[ω_{j,t}] = Var[ω*_{j,t}] = σ²_ω, and λ_j = 2πj/12 for j = 1,...,6.
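Not from the slides: a short NumPy sketch of the trigonometric seasonal transition block, assuming monthly data (s = 12) with one 2x2 rotation block per harmonic; the j = 6 harmonic keeps both states here for simplicity, although it can be reduced to a single state since sin(π) = 0.

```python
# Sketch: transition block for trigonometric seasonality with monthly data
# (s = 12), one 2x2 rotation block per harmonic j = 1..6.
import numpy as np
from scipy.linalg import block_diag

s = 12
blocks = []
for j in range(1, s // 2 + 1):
    lam = 2 * np.pi * j / s
    blocks.append(np.array([[np.cos(lam),  np.sin(lam)],
                            [-np.sin(lam), np.cos(lam)]]))
T_seasonal = block_diag(*blocks)          # 12 x 12 rotation structure

# Observation row: the seasonal effect at time t is the sum of the S_{j,t} states
Z_seasonal = np.tile([1.0, 0.0], s // 2)
```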

The observation matrix (Z_BSM) (5x17 matrix)
Total state vector (a_t) - 32 variables:

The transition matrix (T_BSM) (17x17 matrix)