Integration of inconsistent data sources using Hidden Markov Models (HMMs) Paulina Pankowska, Bart Bakker, Daniel Oberski & Dimitris Pavlopoulos.



Background: measurement error and HMMs
Measurement error: a threat to official statistics
NSIs deal with the problem by:
- Using only the "superior" data source (timely statistics)
- Applying macro-integration (definite statistics)
An alternative solution: applying latent variable modelling
- Latent class modelling (LCM)
- Hidden Markov Models (HMMs)
HMMs consist of:
- A structural part (latent/true)
- A measurement part (observed)
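To make the two parts concrete, a generic HMM likelihood for one person can be written as below. This is the standard textbook form under the assumption that the J indicators are conditionally independent given the latent state; it is not necessarily the exact specification used in this study, and the symbols x (latent state), y (observed indicator), T and J are introduced here for illustration.

```latex
P(\mathbf{y}) = \sum_{x_1,\dots,x_T}
  \underbrace{P(x_1)\prod_{t=2}^{T} P(x_t \mid x_{t-1})}_{\text{structural part}}
  \;\underbrace{\prod_{t=1}^{T}\prod_{j=1}^{J} P(y_{jt} \mid x_t)}_{\text{measurement part}}
```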

HMMs producing official statistics?
HMMs are a potentially attractive tool to correct for measurement error
But...
- They are complex, time consuming and expensive
- In most cases they require record linkage
- They might be (too) sensitive to changes in data collection processes
Our research focus: can HMMs be used to correct for measurement error in official statistics?
- Feasibility of parameter re-use
- Sensitivity to linkage error
- Effects of data collection processes (independent vs dependent interviewing)

Data
Linked dataset:
- Labour Force Survey (LFS)
- Employment Register (ER)
8,886 individuals aged 25 to 55
15 time points (months) per individual

The Hidden Markov Model
An extended two-indicator HMM
- Two indicators per time point
- Autocorrelation of error in register
- (Un)observed heterogeneity in latent initial probabilities and transitions
- Heterogeneous latent transitions
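As a rough illustration of what a two-indicator HMM computes, the sketch below evaluates the likelihood of one person's LFS and ER sequences with the forward algorithm. It is a deliberately simplified version: the two indicators are assumed conditionally independent given the latent contract type, and the register error autocorrelation and heterogeneity terms of the extended model are left out. The function name `forward_loglik` and every probability in it are made up for illustration; none are estimates from the presentation.

```python
import numpy as np

# Latent and observed categories: 0 = permanent, 1 = temporary, 2 = other.
# All numbers below are illustrative assumptions, not estimates.
pi = np.array([0.60, 0.14, 0.26])          # initial latent distribution
A = np.array([[0.98, 0.01, 0.01],          # latent transition matrix
              [0.02, 0.95, 0.03],
              [0.02, 0.03, 0.95]])
B_lfs = np.array([[0.95, 0.04, 0.01],      # P(LFS report | latent state)
                  [0.08, 0.90, 0.02],
                  [0.02, 0.02, 0.96]])
B_er = np.array([[0.93, 0.05, 0.02],       # P(ER record | latent state)
                 [0.10, 0.88, 0.02],
                 [0.03, 0.02, 0.95]])


def forward_loglik(pi, A, B_lfs, B_er, y_lfs, y_er):
    """Log-likelihood of two observed sequences under a basic two-indicator HMM."""
    # alpha[s] is proportional to P(observations up to t, latent state s at t)
    alpha = pi * B_lfs[:, y_lfs[0]] * B_er[:, y_er[0]]
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha = alpha / scale                  # rescale to avoid numerical underflow
    for t in range(1, len(y_lfs)):
        # structural part: move one step along the latent chain
        # measurement part: weight by both indicators at time t
        alpha = (alpha @ A) * B_lfs[:, y_lfs[t]] * B_er[:, y_er[t]]
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale
    return loglik


# Example: the survey and the register disagree in the second month.
y_lfs = [1, 1, 0, 0]
y_er = [1, 0, 0, 0]
print(forward_loglik(pi, A, B_lfs, B_er, y_lfs, y_er))
```

In an actual analysis the parameters would be estimated (for example by maximum likelihood) rather than fixed by hand; the sketch only shows how the structural and measurement parts combine.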

Feasibility of parameter re-use
Analysing 2009 data 'from scratch' compared to re-using error parameters based on 2007 model estimates from Pavlopoulos and Vermunt (2015)

3-monthly transition rates from temporary to permanent employment:
- LFS: 0.058
- ER: 0.073
- Original analysis of 2009: 0.017
- Re-using parameters from 2007: 0.016
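The observed LFS and ER rates are several times larger than the HMM-based ones. A well-known mechanism that can produce such a gap is that classification error creates spurious transitions. The sketch below shows the effect in a two-state toy setting, assuming errors are independent across time points (unlike the register error in the extended model). The function name `observed_transition_rate` and all numbers are hypothetical and are not taken from the study.

```python
import numpy as np


def observed_transition_rate(p, A, E, src, dst):
    """P(observed dst at t | observed src at t-1) when each time point is
    misclassified independently with error matrix E (rows: latent state)."""
    latent_joint = p[:, None] * A              # P(latent a at t-1, latent b at t)
    observed_joint = E.T @ latent_joint @ E    # P(observed i at t-1, observed j at t)
    return observed_joint[src, dst] / observed_joint[src].sum()


# States: 0 = permanent, 1 = temporary. Illustrative values only.
p = np.array([0.8, 0.2])                       # latent distribution at t-1
A = np.array([[0.99, 0.01],
              [0.02, 0.98]])                   # latent transition matrix
E = np.array([[0.97, 0.03],
              [0.10, 0.90]])                   # P(observed | latent)

latent_rate = A[1, 0]
observed_rate = observed_transition_rate(p, A, E, src=1, dst=0)
print(f"latent temp-to-perm rate:   {latent_rate:.3f}")
print(f"observed temp-to-perm rate: {observed_rate:.3f}")
```

With an identity error matrix the observed rate equals the latent one, so any difference in the toy example comes from misclassification alone.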

Relative bias from linkage error
Simulations with various degrees of false-negative (FN) and false-positive (FP) linkage errors:
- 5, 10 and 20% of the individuals are excluded or mislinked
- Probabilities are (i) random or (ii) dependent on covariates correlated with the outcome variable
- Individuals selected are mislinked (i) at random or (ii) based on common characteristics (only for FP)
HMM estimates for each scenario are compared to the original ('linkage error free') results
[Figure panels: False-positive; False-negative]
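The simulation design above could be sketched roughly as follows. This covers only the purely random variants: false negatives drop a share of linked individuals, and false positives give a share of individuals someone else's register record. The function names, the array layout (one row per person, one column per month) and the swapping scheme are assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def drop_false_negatives(lfs, er, rate, rng):
    """Exclude a random share of individuals, mimicking missed links."""
    n = lfs.shape[0]
    drop = rng.choice(n, size=int(round(rate * n)), replace=False)
    keep = np.setdiff1d(np.arange(n), drop)
    return lfs[keep], er[keep]


def mislink_false_positives(lfs, er, rate, rng):
    """Give a random share of individuals another person's register record."""
    n = er.shape[0]
    idx = rng.choice(n, size=int(round(rate * n)), replace=False)
    er_new = er.copy()
    # Rotate the register records among the selected individuals so that
    # each of them receives the record of a different selected person.
    er_new[idx] = er[np.roll(idx, 1)]
    return lfs, er_new


def relative_bias(estimate, reference):
    """Relative deviation of a scenario estimate from the reference result."""
    return (estimate - reference) / reference


# Toy data: 100 people, 15 months, categories 0/1/2 (illustrative only).
rng = np.random.default_rng(0)
lfs = rng.integers(0, 3, size=(100, 15))
er = rng.integers(0, 3, size=(100, 15))

lfs_fn, er_fn = drop_false_negatives(lfs, er, 0.10, rng)
lfs_fp, er_fp = mislink_false_positives(lfs, er, 0.20, rng)
print(lfs_fn.shape, er_fp.shape)
```

Each perturbed dataset would then be passed to the HMM, and `relative_bias` applied to the resulting estimate against the result from the unperturbed linked data.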

The effect of dependent interviewing
Temporary contracts
DI (dependent interviewing):
- In place until end of 2009
- PDI (proactive dependent interviewing): "remind, still"
- Only if no job change occurred
- "Last time you had a temporary contract. Is this still the case?"
INDI (independent interviewing):
- In place since 2010
- Also in 2009 if a job change occurred
- "Do you currently have a permanent contract?"

The effect of dependent interviewing
- Misclassification rates for those who had or would have had DI
- Contract distribution and transition rates (overall and by wave)
- DI reduces random measurement error
- DI increases systematic error (overall and by wave)

Scenario (columns: Overall, Wave 1, 2, 3, 4, 5):
- No job change & 2009 (DI): 0.301, -, 0.290, 0.307, 0.306
- No job change & 2010 (INDI): 0.297, 0.291, 0.295, 0.300, 0.305

Transition rate (columns: 2009, 2010; Observed, Latent):
- Temp to perm: 0.07, 0.05, 0.11

Contract type (columns: Autocorrelation not incl., Autocorrelation incl.):
- Permanent: 0.59, 0.60
- Temporary: 0.14
- Other: 0.27, 0.26
- Temp to perm transition rate: 0.044, 0.051

Misclassification table (labels: Observed at t-1, Latent at t-1, Latent at t, Observed at t; Perm, Temp, Other; Temporary, Permanent):
- Values: 0.10, 0.90, 0.00, 0.87, 0.13

Conclusions: what we know and what we want to know
Can we use HMMs to correct for measurement error (ME) when:
- There are more than 3 contract categories
- There are more covariates affecting the structural and/or measurement part of the HMM
How robust are HMMs to changes in data collection processes?
How can we combine HMMs and MI?
- Allows correcting for error in microdata
HMMs can potentially be used in the production of official statistics?
- Inexpensive: possible to re-use (error) parameters
- Linkage error largely not a problem
- Different data collection methods do not necessarily affect ME

Thank you!
Paulina Pankowska: p.k.p.pankowska@vu.nl
Bart Bakker: bfm.bakker@cbs.nl
Daniel Oberski: d.l.oberski@uu.nl
Dimitris Pavlopoulos: d.pavlopoulos@vu.nl