ISTAT - Italian National Institute of Statistics
Labour Force Survey Division, Unit “Methods for LFS data treatment”
5th Workshop on LFS Methodology, Paris, April 2010
Weighting Issues in LFS Longitudinal Data
Antonio R. Discenza, Francesca Fiori, Carlo Lucarelli
Longitudinal data from LFS
Given the rotational pattern, it is possible to match the records of individuals participating in two or more quarters. For example, in a given quarter we have:
– cross-sectional LFS data;
– 12-month longitudinal LFS data (Quarter – Quarter);
– 3-month longitudinal LFS data (Quarter – Quarter).
It is possible to “weight” the matched individual records in order to obtain longitudinal estimates that are coherent with the usual quarterly data from the cross-sectional LFS.
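As a minimal sketch of the matching step, the code below assumes that each record carries a household identifier and a person number that stay stable across waves. The field names `hh_id`, `person_no` and `status` are hypothetical; the linkage keys actually used in the Italian LFS are not specified here.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: household id, person number within the
# household, and labour status (E/U/I). These field names are assumptions.
Record = Dict[str, str]


def person_key(rec: Record) -> Tuple[str, str]:
    """Key assumed to identify the same individual across waves."""
    return (rec["hh_id"], rec["person_no"])


def match_quarters(q_start: List[Record], q_end: List[Record]) -> List[Tuple[Record, Record]]:
    """Return (start, end) record pairs for individuals interviewed in both quarters."""
    end_index = {person_key(r): r for r in q_end}
    pairs = []
    for rec in q_start:
        other = end_index.get(person_key(rec))
        if other is not None:
            pairs.append((rec, other))
    return pairs


if __name__ == "__main__":
    q1 = [
        {"hh_id": "H1", "person_no": "1", "status": "U"},
        {"hh_id": "H1", "person_no": "2", "status": "E"},
        {"hh_id": "H2", "person_no": "1", "status": "I"},  # not re-interviewed
    ]
    q5 = [
        {"hh_id": "H1", "person_no": "1", "status": "E"},
        {"hh_id": "H1", "person_no": "2", "status": "E"},
        {"hh_id": "H3", "person_no": "1", "status": "U"},  # rotation group not in q1
    ]
    for a, b in match_quarters(q1, q5):
        print(person_key(a), a["status"], "->", b["status"])
```

Only individuals present in both quarters are returned; unmatched records are either non-respondents or individuals from rotation groups that do not overlap.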
Outline of the presentation
– Issues related to the production of gross flow estimates consistent with the quarterly estimates already disseminated.
– A specific focus on the weighting procedure, which both accounts for the reference population and compensates for total non-response at subsequent waves.
– The most relevant methodological problems addressed are:
  – definition of a suitable reference population for the longitudinal sample;
  – longitudinal non-response and eligibility;
  – coherence between cross-sectional and longitudinal estimates.
Net changes in quarterly levels are the final result of a large number of gross flows of different nature and size:
– Demographic flows:
  – children aged 15 entering working age;
  – deaths;
  – internal and international migration.
– Labour status transitions:
  – flows between the three main activity states (employment, unemployment and inactivity).
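The decomposition above can be written as a stylised accounting identity for the employment stock. The notation is introduced here for illustration only, and assumes a working-age population aged 15 and over with no upper age limit, so that nobody ages out between $t_0$ and $t_1$:

```latex
E_{t_1} - E_{t_0}
  = \underbrace{(F_{UE} + F_{IE}) - (F_{EU} + F_{EI})}_{\text{labour status transitions}}
  + \underbrace{\left(M^{\mathrm{in}}_{E} - M^{\mathrm{out}}_{E}\right)}_{\text{migratory flows}}
  + \underbrace{\left(C_{E} - D_{E}\right)}_{\text{demographic flows}}
```

Here $E_t$ is the employment stock at time $t$; $F_{XY}$ is the flow from state $X$ at $t_0$ to state $Y$ at $t_1$ among people resident in the same municipality throughout; $M^{\mathrm{in}}_{E}$ counts people entering a municipality who are employed at $t_1$, and $M^{\mathrm{out}}_{E}$ people employed at $t_0$ who leave it; $C_{E}$ counts children who turned 15 and are employed at $t_1$, and $D_{E}$ people employed at $t_0$ who died. The identity follows from writing $E_{t_1} = F_{EE} + F_{UE} + F_{IE} + M^{\mathrm{in}}_{E} + C_{E}$ and $E_{t_0} = F_{EE} + F_{EU} + F_{EI} + M^{\mathrm{out}}_{E} + D_{E}$ and subtracting.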
Figure: Gross labour market flows between employment, unemployment and inactivity, with outflows due to deaths and people leaving municipalities and inflows of children aged 15 and people entering municipalities.
Definition of the Reference Population
The longitudinal component (sub-sample) of the Italian LFS requires the specification of a suitable reference population. In principle there are several ways to define the reference population for the longitudinal LFS sample; the choice depends on the sample design, on the survey design and on the availability of population totals for weighting. In any case, the choice is constrained by the fact that the LFS longitudinal component is not a real panel, and this has a direct effect on the transition matrix (matrix of stocks and flows) that can be built.
Choice 1: the reference population is equal to the population of the initial quarter
Ideally, longitudinal data from the LFS should represent the whole initial population. However, the initial population actually changes during the observation period because of deaths and internal and international migration. Thus longitudinal data could represent the whole initial population only if the LFS were designed as a “proper” panel, in which all the individuals in the initial sample are “followed” for a new interview at a later stage. This means that information must also be collected on people moving to another municipality or to another country, and that individuals who die during the period can be identified.
Scheme 1. Classification of individuals from the initial sample and eligibility in the case of a real panel and absence of longitudinal non-response.
Table 1: Complete Matrix with net and gross flows in the case of a real panel and absence of longitudinal non-response. Quarter – Quarter (Thousands)
In this “theoretical” situation we would be able to obtain nearly unbiased flow estimates for the whole initial population, but in practice we are quite far from it.
Choice 2: the reference population is a specific longitudinal population
Usually, in the LFS, people moving out of the household are not “followed” for re-interview. Moreover, information on the deaths of individuals in the initial sample is not available. In this situation we should ask whether it is still correct to use the initial population as the reference population. In fact, we only have longitudinal information on those individuals still resident in the same municipality at the end of the period. If we weight the longitudinal sample to the initial population we make a very strong assumption: that the behaviour of individuals who move out of the municipality from one wave to the next is similar to that of those who do not move.
Scheme 2. Classification of individuals from the initial sample and eligibility in the Italian LFS (assuming no longitudinal non-response).
Choice 2: the reference population is a specific longitudinal population
Weighting the longitudinal sample to the initial population relies on the strong assumption that individuals who move out of the municipality from one wave to the next behave like those who do not move. This raises at least two problems:
– at least in Italy, these two groups are actually very different;
– if we use the longitudinal microdata to produce flow estimates, there are no records of individuals moving to other regions or countries, or dying, whereas such individuals do exist in the population.
Let’s take an example. At the beginning of the period we have two unemployed persons with the same characteristics, resident in a region where most of the population is employed in agriculture. If one person stays in the same region and gets a job, at the end of the period it will most probably be in agriculture. If the other person moves to another region (where most of the labour demand is in construction) and gets a job, at the end of the period it will most probably be in construction. This second person cannot be interviewed and, when correcting for non-response, will be represented by the first person. Is that correct?
Definition of longitudinal population
In Italy, the Longitudinal Population is defined as the population resident in the same municipality for the entire 12-month period, excluding:
– deaths;
– those who have moved to other Italian municipalities (change of residence);
– migrants to other countries.
It is computed from population register data on the resident population and is classified by broad age group, geographical area (NUTS III) and nationality (Italian, EU, non-EU). Full consistency with the reference population of EU-LFS quarterly data was also ensured.
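A minimal sketch of how cell totals for the longitudinal population could be derived from register data, assuming that the register supplies, for each cell, the residents at the start of the period and the numbers of deaths, internal changes of residence and emigrations among them. The `RegisterCounts` structure, the cell layout and the area code `area-1` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical cell layout: (broad age group, NUTS III area, nationality).
Cell = Tuple[str, str, str]


@dataclass
class RegisterCounts:
    """Assumed register counts for one cell over the 12-month period."""
    initial: int          # residents at the start of the period
    deaths: int           # of those, deceased during the period
    moved_internal: int   # changed residence to another Italian municipality
    emigrated: int        # moved to another country


def longitudinal_population(cells: Dict[Cell, RegisterCounts]) -> Dict[Cell, int]:
    """Initial residents minus deaths, internal movers and emigrants, by cell."""
    totals: Dict[Cell, int] = {}
    for cell, c in cells.items():
        stayers = c.initial - c.deaths - c.moved_internal - c.emigrated
        if stayers < 0:
            raise ValueError(f"inconsistent register counts for cell {cell}")
        totals[cell] = stayers
    return totals


if __name__ == "__main__":
    counts = {("15-24", "area-1", "Italian"): RegisterCounts(1000, 2, 30, 5)}
    print(longitudinal_population(counts))
```

The resulting totals are the known margins that the final calibration step needs; newcomers (people entering the municipality, children turning 15) are excluded by construction because only initial residents are counted.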
Advantages of using the longitudinal population
The advantages of this choice are the following:
– the assumption that individuals who move out of the municipality from one wave to the next behave like those who do not move is not required;
– estimates of stocks and flows from longitudinal data will probably have a much smaller bias.
Unfortunately, even the situation represented by Scheme 2 is “theoretical” and unrealistic. In fact, it assumes that we have longitudinal information on all the individuals in the initial sample who are still resident in the same municipality at the end of the period.
Longitudinal non-response and eligibility
A very important aspect of the longitudinal component of the LFS is that it is usually affected by unit non-response in subsequent waves, such as:
– Municipality non-response: some (very small) municipalities are replaced in July, at the beginning of a new annual survey cycle, and some others may, for different reasons, fail to provide interviews in subsequent waves;
– Household non-response: none of the household members fills in the questionnaire because they refuse to respond;
– Individual non-response: some members of the household do not fill in the questionnaire because they refuse to respond, cannot be contacted, or have left the household to form a new household in the same municipality.
Unit non-response may produce bias if non-respondents have significantly different labour market characteristics from respondents.
Eligibility
All individuals can be classified into two groups:
– Eligible:
  – they are part of the longitudinal population (because they still live in the same municipality);
  – they should be re-interviewed at the subsequent wave;
  – some of them are non-respondents in the final quarter, so they must be included in the model for non-response treatment (they must be represented by individuals with similar characteristics).
– Not eligible:
  – they left the initial population during the observed period (deaths and migrations);
  – they are not part of the longitudinal population;
  – they must be excluded from the model for non-response treatment.
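The classification above can be sketched as a small decision function. The `Fate` coding is hypothetical; an unknown fate is represented by `None`, which is the typical situation for non-respondents and the reason why eligible and non-eligible non-respondents cannot be told apart in practice.

```python
from enum import Enum
from typing import Optional


class Fate(Enum):
    """What happened to an initial-sample individual by the final wave (hypothetical coding)."""
    STAYED = "still resident in the same municipality"
    DIED = "died"
    MOVED_MUNICIPALITY = "moved to another Italian municipality"
    EMIGRATED = "moved to another country"


def classify(responded: bool, fate: Optional[Fate]) -> str:
    """Classify an individual for non-response treatment (fate None = unknown)."""
    if responded:
        # Re-interviewed in the same municipality: part of the longitudinal population.
        return "eligible respondent"
    if fate is None:
        # Cannot tell an eligible non-respondent from a death or a migrant.
        return "non-respondent, eligibility unknown"
    if fate is Fate.STAYED:
        # Must be represented by respondents with similar characteristics.
        return "eligible non-respondent"
    # Deaths and migrations left the population: exclude from the model.
    return "not eligible"


if __name__ == "__main__":
    print(classify(True, None))
    print(classify(False, Fate.STAYED))
    print(classify(False, Fate.EMIGRATED))
    print(classify(False, None))
```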
Scheme 3: Classification of individuals from the initial sample and eligibility in the Italian LFS (in the presence of longitudinal non-response).
Limitations in the use of models for non-response treatment
When using models or methods for the treatment of longitudinal non-response we face another important issue: usually, we do not have enough information to distinguish eligible from non-eligible individuals. More precisely, we cannot distinguish non-respondents who are not eligible from non-respondents who are eligible. An immediate consequence is that it is not possible to use methods based on logistic regression models to compensate for longitudinal non-response in the Italian LFS. This kind of model, in fact, requires that the last subgroup (eligible non-respondents) be perfectly identified in order to adjust the weights of similar matched individuals.
Coherence between cross-sectional and longitudinal estimates
The last problem is that the longitudinal component produces both cross-sectional and longitudinal estimates referring to the longitudinal population. The cross-sectional estimates obtained from the longitudinal data have to be consistent with (i.e. not higher than) the “official” estimates provided by the cross-sectional samples (the full sample) at the beginning and at the end of the observed period. Since longitudinal estimates have higher variability than quarterly estimates, it is not possible to control their consistency completely. However, the risk of obtaining inconsistent results can be reduced by using specific constraints in the calibration procedure used to weight the sample data.
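The “not higher than” condition can be checked cell by cell once longitudinal weights are available. This is a minimal sketch; the cell labels and the official figures in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def weighted_totals(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum longitudinal weights by cell (e.g. labour status at the start of the period)."""
    totals: Dict[str, float] = defaultdict(float)
    for cell, weight in records:
        totals[cell] += weight
    return dict(totals)


def inconsistent_cells(longitudinal: Dict[str, float],
                       official: Dict[str, float],
                       tol: float = 1e-9) -> List[str]:
    """Cells where the longitudinal estimate exceeds the official cross-sectional one."""
    return sorted(cell for cell, value in longitudinal.items()
                  if value > official.get(cell, 0.0) + tol)


if __name__ == "__main__":
    linked = [("E", 600.0), ("E", 500.0), ("U", 90.0)]  # (status, longitudinal weight)
    official = {"E": 1050.0, "U": 100.0}                 # hypothetical official estimates
    print(inconsistent_cells(weighted_totals(linked), official))
```

In the weighting procedure itself such cells are prevented, as far as possible, by calibration constraints rather than checked after the fact; a check like this is only a diagnostic.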
Table 2: Complete Matrix with net and gross flows. Quarter – Quarter (Thousands)
These two vectors are obtained by difference.
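Assuming that the two vectors are the flows leaving and entering the longitudinal population by labour status, one way to obtain them by difference is to subtract the row and column totals of the longitudinal transition matrix from the official cross-sectional totals at the start and at the end of the period. This is a hypothetical sketch with made-up figures, not necessarily the exact computation behind Table 2.

```python
from typing import List, Sequence, Tuple

STATES = ("Employed", "Unemployed", "Inactive")


def leaving_and_entering(flows: Sequence[Sequence[float]],
                         cross_start: Sequence[float],
                         cross_end: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Residual flows by status, obtained by difference.

    flows[i][j]: longitudinal population in state i at the start and j at the end.
    cross_start / cross_end: official cross-sectional totals by state.
    Negative results would signal inconsistency between the two sources.
    """
    n = len(STATES)
    row_totals = [sum(flows[i]) for i in range(n)]
    col_totals = [sum(flows[i][j] for i in range(n)) for j in range(n)]
    leaving = [cross_start[i] - row_totals[i] for i in range(n)]   # deaths + people leaving
    entering = [cross_end[j] - col_totals[j] for j in range(n)]    # children aged 15 + people entering
    return leaving, entering


if __name__ == "__main__":
    flows = [[90, 3, 7], [4, 5, 3], [6, 2, 80]]  # hypothetical, thousands
    print(leaving_and_entering(flows, [104, 13, 95], [106, 11, 93]))
```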
Weighting procedure for longitudinal data
The final longitudinal weights have been obtained in three steps, involving two calibration stages:
– the first stage:
  – accounts for bias due to municipality non-response;
  – accounts for the differences between the rotation groups that overlap and those that do not;
– the last stage:
  – adjusts for bias due to individual non-response;
  – makes the weighted longitudinal-sample totals conform to the longitudinal population;
  – ensures consistency with quarterly estimates (for the most relevant figures at national, NUTS2 and NUTS3 level, by gender and age group).
Step 1 of the weighting procedure
Step 1: all the individuals who are linkable at the beginning of the period are selected. These are all the individuals of the two rotation groups that overlap at 12 months and are resident in municipalities that provided interviews for both waves; they can be considered a random sub-sample of the whole cross-sectional sample. Their base longitudinal weights are obtained from the cross-sectional weights by applying the following correction:
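The correction factor itself is not reproduced in this text. Purely as an illustration, the sketch below assumes a simple ratio adjustment within hypothetical strata: the cross-sectional weights of linkable individuals are scaled so that, in each stratum, they add up to the total cross-sectional weight of the stratum. This is an assumption, not necessarily the ISTAT formula.

```python
from collections import defaultdict
from typing import Dict, List


def base_longitudinal_weights(units: List[dict]) -> Dict[str, float]:
    """Ratio-adjust cross-sectional weights of linkable units within strata (assumed method).

    Each unit is a dict with hypothetical keys: 'id', 'stratum', 'w'
    (cross-sectional weight) and 'linkable' (bool).
    """
    total: Dict[str, float] = defaultdict(float)
    linkable: Dict[str, float] = defaultdict(float)
    for u in units:
        total[u["stratum"]] += u["w"]
        if u["linkable"]:
            linkable[u["stratum"]] += u["w"]
    weights = {}
    for u in units:
        if u["linkable"]:
            s = u["stratum"]
            # Factor > 1 compensates for non-overlapping rotation groups
            # and municipalities without interviews in both waves.
            weights[u["id"]] = u["w"] * total[s] / linkable[s]
    return weights


if __name__ == "__main__":
    sample = [
        {"id": "A", "stratum": "s1", "w": 10.0, "linkable": True},
        {"id": "B", "stratum": "s1", "w": 10.0, "linkable": False},
        {"id": "C", "stratum": "s1", "w": 20.0, "linkable": True},
    ]
    print(base_longitudinal_weights(sample))
```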
Steps of the weighting procedure
Step 2: in order to ensure consistency between longitudinal and cross-sectional “official” estimates, the first calibration procedure makes the linkable individuals at the beginning of the period:
– represent exactly the same cross-sectional population as the whole cross-sectional sample;
– provide exactly the same cross-sectional “official” estimates for a number of relevant figures (cross-classification of sex, region, age group, labour activity status, education, etc.).
Step 2 of the weighting procedure
Step 2 (continued): starting from the base longitudinal weights, the intermediate longitudinal weights of all linkable individuals are obtained as the solution of a constrained minimization problem, as follows:
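The objective function is not reproduced here. A common choice for this kind of constrained minimization, assumed in the sketch below, is linear calibration with the chi-square distance, which has a closed-form solution: minimise the sum of (w_k - d_k)^2 / d_k subject to the weighted auxiliary totals matching the known totals. The distance function actually used by ISTAT may differ (for example a bounded version that avoids negative weights).

```python
import numpy as np


def linear_calibration(d: np.ndarray, X: np.ndarray, totals: np.ndarray) -> np.ndarray:
    """Chi-square-distance (GREG-type) calibration; assumed, not the slides' stated method.

    d: base weights, shape (n,); X: auxiliary variables, shape (n, p);
    totals: known population totals for the columns of X, shape (p,).
    Returns w minimising sum((w - d)**2 / d) subject to X.T @ w == totals.
    """
    # Setting the Lagrangian derivative to zero gives w = d * (1 + X @ lam),
    # where lam solves (X' D X) lam = totals - X' d, with D = diag(d).
    T = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(T, totals - X.T @ d)
    return d * (1.0 + X @ lam)


if __name__ == "__main__":
    d = np.array([10.0, 10.0, 10.0, 10.0])                         # base longitudinal weights
    X = np.array([[1, 1], [1, 1], [1, 0], [1, 0]], dtype=float)    # intercept, employed indicator
    totals = np.array([50.0, 30.0])                                # hypothetical official totals
    w = linear_calibration(d, X, totals)
    print(w, X.T @ w)
```

In practice the auxiliary matrix would contain indicators for the cross-classifications listed in Step 2 (sex, region, age group, labour status, education), and the official totals would come from the full cross-sectional sample.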
Step 3 of the weighting procedure
Step 3: for linked individuals only, the final longitudinal weights are computed by applying a new calibration stage that makes the weighted totals of the longitudinal component conform to the longitudinal population, under the following constraints:
The underlying hypothesis is that non-response is random within the cells obtained by nesting the population by gender, age group and NUTS1, NUTS2 and NUTS3 domains.
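Under the hypothesis that non-response is random within cells, the simplest adjustment consistent with it is post-stratification: within each cell, the intermediate weights of linked individuals are scaled so that they sum to the longitudinal population total of the cell. This sketch covers only that part of Step 3; the actual calibration also includes the consistency constraints with quarterly estimates. The cell labels and figures are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def poststratify(linked: List[Tuple[str, Hashable, float]],
                 pop_totals: Dict[Hashable, float]) -> Dict[str, float]:
    """Scale intermediate weights so that they match longitudinal population totals by cell.

    linked: (person id, cell, intermediate weight) for linked individuals only.
    pop_totals: longitudinal population total per cell (e.g. gender x age x area).
    A linked record whose cell is missing from pop_totals raises KeyError.
    """
    cell_sum: Dict[Hashable, float] = defaultdict(float)
    for _, cell, w in linked:
        cell_sum[cell] += w
    empty = [c for c in pop_totals if cell_sum.get(c, 0.0) <= 0.0]
    if empty:
        # Cells without respondents would need collapsing with neighbouring cells.
        raise ValueError(f"cells without linked respondents: {empty}")
    return {pid: w * pop_totals[cell] / cell_sum[cell] for pid, cell, w in linked}


if __name__ == "__main__":
    linked = [("a", ("F", "15-24"), 10.0), ("b", ("F", "15-24"), 30.0), ("c", ("M", "15-24"), 20.0)]
    totals = {("F", "15-24"): 60.0, ("M", "15-24"): 25.0}
    print(poststratify(linked, totals))
```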
Figure: Flow chart of the weighting procedure (intermediate weights and final weights).
Complete Matrix with net and gross flows. Quarter – Quarter (Thousands)
Table: labour status at 2007Q1 by labour status at 2008Q1 (Employed, Unemployed, Inactive, Total) for the longitudinal population, together with deaths, people leaving the municipalities, children aged 15 and people entering the municipalities. The table reports the net change in cross-sectional employment (+324) and the net changes due to longitudinal population flows, to migratory flows and to demographic flows (-49).
Transition Matrix for the longitudinal population. Quarter – Quarter (Thousands)
Table: labour status at 2007Q1 by labour status at 2008Q1 (Employed, Unemployed, Inactive, Total), highlighting persistence in employment, entering employment and leaving employment; net change: +105.
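The quantities highlighted in the transition matrix, and the persistence and transition probabilities used in the analysis that follows, can be derived from a weighted 3×3 flow matrix. This sketch uses hypothetical figures; probabilities are row shares, i.e. the distribution of end-of-period states for people in a given state at the start.

```python
from typing import Dict, List

STATES = ["E", "U", "I"]  # employed, unemployed, inactive


def transition_probabilities(flows: List[List[float]]) -> List[List[float]]:
    """P[i][j]: share of those in state i at the start who are in state j at the end."""
    probs = []
    for row in flows:
        total = sum(row)  # a zero row total would raise ZeroDivisionError
        probs.append([f / total for f in row])
    return probs


def employment_summary(flows: List[List[float]]) -> Dict[str, float]:
    """Persistence, entries, exits and net change in employment."""
    e = STATES.index("E")
    others = [k for k in range(len(STATES)) if k != e]
    entering = sum(flows[i][e] for i in others)
    leaving = sum(flows[e][j] for j in others)
    return {
        "persistence": flows[e][e],
        "entering": entering,
        "leaving": leaving,
        "net_change": entering - leaving,
    }


if __name__ == "__main__":
    flows = [[90.0, 4.0, 6.0], [5.0, 5.0, 2.0], [8.0, 2.0, 78.0]]  # hypothetical, thousands
    print(employment_summary(flows))
    print(transition_probabilities(flows)[0])
```

The net change computed this way is the balance of entries into and exits from employment within the longitudinal population only; it excludes the migratory and demographic flows that also contribute to the cross-sectional net change.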
Main Findings
Potential:
– longitudinal data provide extremely useful insights into labour market dynamics;
– they are obtained at no additional cost, although with a large investment in methodology;
– they can be produced regularly on a quarterly basis.
Constraints:
– the EU-LFS is not a panel survey, so longitudinal estimates can refer only to a specific longitudinal reference population;
– known totals for this longitudinal reference population must be available for weighting;
– methods for non-response treatment must be used to reduce bias;
– methods to ensure consistency with cross-sectional estimates must be used.
Analysis of employment, Quarter – Quarter, using 12-month longitudinal data (provisional estimates)
Employment: persistence and transition probabilities by gender and region. 2007Q1 – 2008Q1
– Women have a lower persistence probability and a higher transition probability to inactivity.
– The South has a lower persistence probability and a much higher transition probability.
Employment: persistence and transition probabilities by job characteristics. 2007Q1 – 2008Q1
– High segmentation in persistence and transition.
Unemployment: transition probability to employment by duration of job search at the starting point
– The transition probability is inversely correlated with the duration of the search for employment.
– Opportunities for the long-term unemployed to get a job are stable over the period.
Unemployment: persistence and transition probabilities by sex and NUTS1 region. 2007Q1 – 2008Q1
– Huge differences in persistence and transition probabilities between North and South.
– Higher probability of getting a job for men.
– Higher probability of leaving the labour force for women.
THANK YOU FOR YOUR ATTENTION… AND FOR YOUR GREAT PATIENCE