Incomplete data: Indirect estimation of migration flows
Modelling approaches
Aim: a synthetic database built by effectively combining data from different sources
Requirements
Data representation: a mathematical model of the ‘complete’ or desired migration data
Data types: the different ways of measuring migration
–Data on events [relocations] (‘movement data’): migrations
–Data on changes in status [place of residence]: migrants
Requirements
Typology of missing or incomplete data, related to data types: what is missing?
Typology of available data, related to data types: what is available?
–Primary data
–Auxiliary data (e.g. historical migration matrix)
Measure of reliability of available data
Method to infer missing data from available statistical data and ‘soft’ information on migration
Existing approaches
Net migration: residual method
Gross migration flows: spatial interaction models
–Gravity model
–Entropy maximisation
–Information-theoretic approaches
–Iterative proportional fitting (bi- and multiproportional adjustment [RAS])
Age profile: model migration schedules
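As an illustration of the residual method listed above, the following minimal sketch derives net migration from the demographic balancing equation. The function name, argument names, and figures are hypothetical and chosen only for illustration.

```python
def net_migration_residual(pop_start, pop_end, births, deaths):
    """Net migration as the residual of the demographic balancing equation.

    Population change that is not explained by natural increase
    (births minus deaths) is attributed to net migration.
    """
    natural_increase = births - deaths
    return (pop_end - pop_start) - natural_increase


# Hypothetical region: the population grows by 1,500, natural increase is 300.
net = net_migration_residual(pop_start=100_000, pop_end=101_500,
                             births=1_200, deaths=900)
print(net)  # 1200
```

The residual only yields a net figure per region; it says nothing about the gross origin-destination flows that the spatial interaction models address.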
The approach
Migration is a manifestation of behavioural processes and random processes (choice and chance)
Describe the processes and obtain plausible, accurate parameter estimates from the (incomplete) data and additional information
Apply the model to predict migration flows
Data types
Micro-data
–Migration data (event data)
  Occurrence of migration in observation period
  Time at migration
–Migrant data (status data; transition data)
  Current status
  Status at two or more points in time (panel)
  –Equal interval
  –Unequal interval (e.g. place of birth and place of current residence)
Grouped data
Data types
Micro-data
Grouped data (aggregate data; tabulations)
–Migrations (events)
–Migrants (transitions)
Observation in continuous time (e.g. population register)
Observation in discrete time
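The distinction between migrations (events) and migrants (transitions) can be made concrete with a small sketch. The residence histories below are invented, and the representation (a time-sorted list of `(spell_start, region)` pairs per person) is an assumption made for this example only.

```python
def region_at(history, t):
    """Region of residence at time t: the last spell that started at or before t."""
    current = None
    for spell_start, region in history:
        if spell_start <= t:
            current = region
    return current


def count_migrations(history, start, end):
    """Event data: number of relocations with start < time <= end."""
    moves = 0
    for (_, prev_region), (t, region) in zip(history, history[1:]):
        if start < t <= end and region != prev_region:
            moves += 1
    return moves


def is_migrant(history, start, end):
    """Status data: residence at `end` differs from residence at `start`."""
    return region_at(history, start) != region_at(history, end)


# Hypothetical residence histories over one observation year.
histories = {
    "p1": [(0.0, "A"), (0.3, "B"), (0.8, "A")],  # two moves, back in A at the end
    "p2": [(0.0, "A"), (0.5, "C")],              # one move
    "p3": [(0.0, "B")],                          # no move
}
migrations = sum(count_migrations(h, 0.0, 1.0) for h in histories.values())
migrants = sum(is_migrant(h, 0.0, 1.0) for h in histories.values())
print(migrations, migrants)  # 3 1
```

Person p1 contributes two migrations but is not counted as a migrant, which is why the two data types cannot be substituted for each other without a model.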
Types of incompleteness
Non-response
Net migration vs gross flows
Migrants vs migrations (events)
Single migration recorded instead of sequence of migrations (e.g. last migration)
Partially missing data
–e.g. origin by age or covariates
–Some information missing for some persons
Solutions to incomplete data
Collect missing information
Use ancillary data and/or information on a comparable population
Live with it and minimise distortions caused by missing data
Infer missing data from all the information you can get (combine sources)
Probability models of migration
Migration is a realisation of a Poisson process
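The formula on this slide did not survive extraction. A standard way to write the Poisson assumption is sketched below; the notation ($N_{rs}$ for the number of migrations from origin $r$ to destination $s$, $\lambda_{rs}$ for its expected value) is an assumption, not necessarily the notation of the original slide.

```latex
\Pr(N_{rs} = n_{rs}) = \frac{\lambda_{rs}^{\,n_{rs}} \exp(-\lambda_{rs})}{n_{rs}!},
\qquad n_{rs} = 0, 1, 2, \ldots,
\qquad E[N_{rs}] = \lambda_{rs}
```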
Log-rate model: rate = events/exposure
Gravity model
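A minimal sketch of the two quantities named on this slide. The function names, the parameter names (`k`, `beta`), and all numbers are illustrative assumptions; the gravity form used here is the common power-of-distance variant.

```python
import math


def occurrence_rate(events, exposure):
    """Occurrence-exposure rate: events per unit of exposure (e.g. person-years)."""
    return events / exposure


def gravity_flow(k, origin_mass, dest_mass, distance, beta):
    """Gravity model: flow proportional to both masses, decaying with distance."""
    return k * origin_mass * dest_mass / distance ** beta


# 50 migrations observed over 10,000 person-years of exposure.
rate = occurrence_rate(events=50, exposure=10_000)

# In the log-rate model, log exposure enters as a fixed offset.
log_rate = math.log(50) - math.log(10_000)

# Hypothetical gravity prediction for one origin-destination pair.
flow = gravity_flow(k=0.001, origin_mass=1_000, dest_mass=2_000,
                    distance=10, beta=2)
print(rate, flow)
```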
RAS, Biproportional adjustment, etc.
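Biproportional adjustment can be sketched in a few lines: a seed matrix (for example a historical migration matrix) is alternately rescaled by rows and by columns until it matches the known marginal totals. The function name `ipf`, the seed matrix, and the target totals are all hypothetical.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting (RAS) of a seed matrix to given margins.

    Structural zeros in the seed (e.g. the diagonal) stay zero.
    """
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [x * target / total for x in table[i]]
        # Column step: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        gap = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if gap < tol:
            break
    return table


# Hypothetical historical matrix (origins in rows, destinations in columns).
seed = [[0.0, 10.0, 20.0],
        [15.0, 0.0, 5.0],
        [10.0, 30.0, 0.0]]
fitted = ipf(seed, row_targets=[40.0, 30.0, 50.0], col_targets=[35.0, 50.0, 35.0])
for row in fitted:
    print([round(x, 2) for x in row])
```

The fitted matrix keeps the interaction pattern (the odds ratios) of the seed and takes its margins from the targets, which is what makes the method a log-linear model with the seed as offset.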
Likelihood equations: the fitted marginal totals equal the observed marginal totals
Marginal totals are sufficient statistics
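The equations on this slide were lost in extraction. As a hedged reconstruction, assume independent Poisson counts $N_{rs}$ with a main-effects log-linear model; the parameter names $u$, $u_r$, $u_s$ are assumptions introduced here. The likelihood equations then involve the data only through the row and column totals:

```latex
\log \lambda_{rs} = u + u_r + u_s
\qquad
\ell = \sum_{r,s} \left( N_{rs} \log \lambda_{rs} - \lambda_{rs} \right) - \sum_{r,s} \log N_{rs}!

\frac{\partial \ell}{\partial u_r} = 0 \;\Rightarrow\; \sum_{s} \hat{\lambda}_{rs} = \sum_{s} N_{rs} = N_{r+}
\qquad
\frac{\partial \ell}{\partial u_s} = 0 \;\Rightarrow\; \sum_{r} \hat{\lambda}_{rs} = \sum_{r} N_{rs} = N_{+s}
```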
A different way of writing the spatial interaction model
Link between the Poisson and the multinomial distribution
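The link referred to here is the standard result that independent Poisson counts, conditioned on their total, follow a multinomial distribution with probabilities proportional to the Poisson means. The sketch below checks this numerically for one arbitrary set of counts; the counts and means are invented.

```python
import math


def poisson_pmf(n, lam):
    """Probability of observing n events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def multinomial_pmf(counts, probs):
    """Probability of the given counts under a multinomial distribution."""
    coef = math.factorial(sum(counts))
    for n in counts:
        coef //= math.factorial(n)
    p = float(coef)
    for n, q in zip(counts, probs):
        p *= q ** n
    return p


counts = [3, 1, 2]        # hypothetical flows to three destinations
lams = [2.0, 0.5, 1.5]    # hypothetical Poisson means
total_lam = sum(lams)

# Joint Poisson probability divided by the probability of the observed total.
joint = math.prod(poisson_pmf(n, lam) for n, lam in zip(counts, lams))
conditional = joint / poisson_pmf(sum(counts), total_lam)

# Multinomial probability with shares lam / total.
multinomial = multinomial_pmf(counts, [lam / total_lam for lam in lams])
print(conditional, multinomial)
```

The two printed values agree up to floating-point error, so a model for expected flows can equivalently be read as a model for destination shares given the total.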
The gravity model is a log-linear model
The entropy model is a log-linear model
The RAS model is a log-linear (log-rate) model
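To make the claim concrete, the three models can be written in multiplicative form and then in logs. The symbols are assumptions introduced for this sketch: $O_r$ and $D_s$ are origin and destination masses or totals, $d_{rs}$ distance, $c_{rs}$ travel cost, $A_r$ and $B_s$ balancing factors, $a_r$ and $b_s$ row and column multipliers, and $m^{0}_{rs}$ the initial (e.g. historical) matrix.

```latex
\text{Gravity:}\quad \lambda_{rs} = k\, O_r D_s\, d_{rs}^{-\beta}
\;\Rightarrow\; \log \lambda_{rs} = \log k + \log O_r + \log D_s - \beta \log d_{rs}

\text{Entropy:}\quad \lambda_{rs} = A_r O_r\, B_s D_s \exp(-\beta c_{rs})
\;\Rightarrow\; \log \lambda_{rs} = \log (A_r O_r) + \log (B_s D_s) - \beta c_{rs}

\text{RAS:}\quad \lambda_{rs} = a_r b_s\, m^{0}_{rs}
\;\Rightarrow\; \log \lambda_{rs} = \log a_r + \log b_s + \log m^{0}_{rs}
```

In the RAS case $\log m^{0}_{rs}$ acts as a fixed offset, which is why the model can also be read as a log-rate model.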
Parameter estimation
Maximise the (log-)likelihood function: the probability that the model predicts the data
Expectation (E-step): predict $E[N_{rs}] = \lambda_{rs}$ given the model and initial parameter estimates
Maximisation (M-step): maximise the ‘complete-data’ log-likelihood
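The expectation and maximisation steps can be sketched for a small incomplete migration table. The setting is an assumption made for illustration: some cells are observed directly, others only as a combined total, and expected flows follow an independence (main-effects) model. The function name and the data are hypothetical.

```python
def em_independence(observed, aggregates, n_rows, n_cols, n_iter=200):
    """EM for an origin-destination table with partially aggregated cells.

    observed:   dict {(r, s): count} for cells that are fully observed
    aggregates: list of (cells, total), where only the sum over `cells` is known
    Returns the fitted expected flows under the independence model.
    """
    lam = [[1.0] * n_cols for _ in range(n_rows)]  # flat starting values
    for _ in range(n_iter):
        # E-step: complete the table; split each aggregate in proportion
        # to the current expected flows.
        completed = [[0.0] * n_cols for _ in range(n_rows)]
        for (r, s), n in observed.items():
            completed[r][s] = float(n)
        for cells, total in aggregates:
            denom = sum(lam[r][s] for r, s in cells)
            for r, s in cells:
                completed[r][s] = total * lam[r][s] / denom
        # M-step: maximise the complete-data log-likelihood; for the
        # independence model this is row total * column total / grand total.
        row = [sum(completed[r]) for r in range(n_rows)]
        col = [sum(completed[r][s] for r in range(n_rows)) for s in range(n_cols)]
        grand = sum(row)
        lam = [[row[r] * col[s] / grand for s in range(n_cols)]
               for r in range(n_rows)]
    return lam


# Origin 0 is fully observed; for origin 1 the flows to destinations 1 and 2
# are only known as a combined total of 60.
observed = {(0, 0): 30, (0, 1): 20, (0, 2): 10, (1, 0): 60}
aggregates = [([(1, 1), (1, 2)], 60)]
fitted = em_independence(observed, aggregates, n_rows=2, n_cols=3)
print([round(x, 3) for x in fitted[1]])  # [60.0, 40.0, 20.0]
```

In this example the combined total of 60 is split 40:20, mirroring the 20:10 split observed for origin 0, because the independence model assumes the same destination shares for both origins.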
$Z_{ki}$: indicator that individual $k$ is a member of group $i$
When k and 2 are known, then
Conclusion
A unified approach to the prediction of migration from different types of data and different data sources
Approach based on probability theory and the theory of statistical inference (not ad hoc)
The EM algorithm has been studied extensively, and much experience has been gathered
‘Soft’ data (e.g. expert opinions) can be added