Sampling issues related to the implementation of EDSIM/ESIHSI


Sampling issues related to the implementation of EDSIM/ESIHSI Task Force on Disability Statistics, Eurostat, 28-29 June 2010

Outline The current model and its weaknesses; sampling issues (choice of the sampling units, allocation of the sample across age and gender groups, sampling of institutions, use of administrative data); minimum sample sizes per country.

The current model (1/7)

The current model (2/7) Two different purposes in surveying the disabled population need to be distinguished: (1) estimating the prevalence of a disability in the population aged 15 or more or, more generally, the share of persons who declare that barriers hinder their full and effective social integration; (2) estimating means of certain variables for persons in the disabled population, or proportions of the disabled population with certain characteristics.

The current model (3/7) When we estimate the prevalence rate P of a disability in the population, or the share of persons who declare that barriers hinder their full and effective social integration, the sampling variance is of order 1/n (so the standard error is of order 1/√n), where n is the sample size over the whole population aged 15 or more. Thus, with samples of a few thousand individuals, one can achieve a good level of absolute accuracy, especially when P is low.
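As a rough illustration of the orders of magnitude on this slide, the sketch below computes the standard error of an estimated prevalence rate. It assumes simple random sampling and ignores the finite population correction; the function names and the example values of P are illustrative and do not come from the presentation.

```python
import math


def prevalence_standard_error(p: float, n: int) -> float:
    """Standard error of an estimated proportion under simple random sampling.

    Assumption: the finite population correction is ignored.
    """
    return math.sqrt(p * (1.0 - p) / n)


def half_width_95(p: float, n: int) -> float:
    """Approximate half-width of a 95% confidence interval (normal approximation)."""
    return 1.96 * prevalence_standard_error(p, n)


# Illustrative values of P; n = 8000 is the sample size used later in the slides.
for p in (0.05, 0.20, 0.50):
    se = prevalence_standard_error(p, 8000)
    print(f"P = {p:.2f}, n = 8000: SE = {se:.4f}, 95% half-width = {half_width_95(p, 8000):.4f}")
```

The absolute standard error shrinks as P moves away from 0.5, which is the sense in which a low P helps; the relative error (standard error divided by P) moves in the opposite direction.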

The current model (4/7) Division 1 (Food and non-alcoholic beverages) comprises 62 different items. Each division is made up of a large number of items or elementary aggregates.

The current model (5/7) The EDSIM also aims at drawing conclusions about the individuals having health problems and activity limitations which hinder their full and effective social integration. In this case, the level of accuracy one can hope for does not depend on the sample size n over the whole population aged 15 or more, but on the size of the sub-sample of individuals who are actually disabled. For a prevalence rate of P, one can expect an average of nP disabled persons in the sample. For instance, with n = 8000 individuals and P = 5%, we have on average nP = 400 disabled persons.
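The arithmetic on this slide can be sketched as follows. The second function is an added illustration: it approximates the standard error of a proportion q estimated among disabled respondents, assuming simple random sampling and treating the sub-sample size as fixed at its expected value nP. The names and the value q = 0.5 are hypothetical.

```python
import math


def expected_disabled(n: int, prevalence: float) -> float:
    """Expected number of disabled persons in a sample of n individuals."""
    return n * prevalence


def subpopulation_standard_error(q: float, n: int, prevalence: float) -> float:
    """Approximate SE of a proportion q estimated among the disabled only.

    Assumption: the disabled sub-sample size is replaced by its expectation n * P.
    """
    return math.sqrt(q * (1.0 - q) / expected_disabled(n, prevalence))


n, P = 8000, 0.05
print("expected disabled persons:", expected_disabled(n, P))
print("SE of a 50% characteristic among the disabled:",
      round(subpopulation_standard_error(0.5, n, P), 4))
```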

The current model (6/7) The problem of accuracy becomes more acute when we consider breakdown estimates. The EDSIM reference population would be disaggregated by age groups (15-24, 25-59, 60-74 and 75+) and gender. When we estimate the prevalence rate of a disability over a given domain, the accuracy depends on the sample size of individuals aged 15 or more who fall into the domain.

The current model (7/7) However, when we consider the disabled persons who fall in a given age-gender category, the sample is generally too small to draw any reliable conclusions. For instance, consider a total sample size of n = 8000 individuals aged 15+ and a disability with 3% prevalence among the female population aged 15-24. Let us assume that the female population aged 15-24 accounts for 7% of the total population aged 15+. The expected number of disabled persons sampled among women aged 15-24 is then 8000 × 0.07 × 0.03 ≈ 17, that is, fewer than 20.
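The example above can be reproduced, and inverted, with a few lines of code. This is a minimal sketch assuming the domain's share of the sample equals its share of the population; the target of 100 disabled respondents is a hypothetical figure chosen only for illustration.

```python
import math


def expected_domain_cases(n: int, domain_share: float, domain_prevalence: float) -> float:
    """Expected number of disabled persons sampled in one age-gender domain."""
    return n * domain_share * domain_prevalence


def required_total_sample(target_cases: int, domain_share: float, domain_prevalence: float) -> int:
    """Total sample size needed to expect at least `target_cases` disabled persons in the domain."""
    return math.ceil(target_cases / (domain_share * domain_prevalence))


# Values from the slide: n = 8000, domain share 7%, prevalence 3% within the domain.
print(expected_domain_cases(8000, 0.07, 0.03))
# Hypothetical target of 100 disabled respondents in the domain.
print(required_total_sample(100, 0.07, 0.03))
```

Without over-sampling the domain, reaching even a modest number of disabled respondents requires a total sample several times larger than 8000, which motivates the allocation rules discussed later.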

Sampling issues: choice of the sampling units (1/4) Observation units: individuals aged 15 or more. Sampling units: these depend on the sampling frames available at national level. Frame of individuals (population register) → sampling units = individuals. Frame of households (list of fixed phone numbers, master sample or household register) → sampling units = households or individuals.

Sampling issues: choice of the sampling units (2/4) On the other hand, the draft EDSIM Regulation requires that information be collected at household level. Thus, a two-stage cluster sampling of individuals can be envisaged: selection of geographical zones at the first stage to reduce the travel costs, selection of households at the second stage and then collection of information for all members aged 15+ in the selected households.

Sampling issues: choice of the sampling units (3/4) The selection of households should not create extra problems for countries that already have access to a frame of households (household register or master sample). Conversely, if there is only a frame of individuals, one solution is to draw a sample of individuals (aged 15 or more) and take the households to which those individuals belong. This amounts to selecting a sample of households with unequal probabilities, proportional to household size measured in individuals aged 15 or more.
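To make the "proportional to household size" statement concrete, the sketch below computes the probability that a household enters the sample when individuals are drawn first. It assumes simple random sampling without replacement of n individuals from a register of N eligible individuals, which the slide does not specify; the function names and the values of N and n are hypothetical.

```python
from math import comb


def household_inclusion_probability(N: int, n: int, a: int) -> float:
    """Probability that at least one of the household's `a` eligible members is drawn.

    Assumption: simple random sampling without replacement of n individuals out of N.
    """
    return 1.0 - comb(N - a, n) / comb(N, n)


def approximate_inclusion_probability(N: int, n: int, a: int) -> float:
    """First-order approximation, valid when the sampling fraction n / N is small."""
    return n * a / N


N, n = 100_000, 1_000  # hypothetical register size and sample size
for a in (1, 2, 3, 4):
    exact = household_inclusion_probability(N, n, a)
    approx = approximate_inclusion_probability(N, n, a)
    print(f"a = {a}: exact = {exact:.5f}, approx = {approx:.5f}, household weight = {1 / exact:.1f}")
```

The household weight is the inverse of the inclusion probability, so larger households receive smaller weights to compensate for their higher chance of selection.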

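Sampling issues: choice of the sampling units (4/4) In terms of statistical precision, assuming a simple random sample of m households and a simple random sample of n individuals within each of the m selected households, the variance of a proportion P (disability prevalence) is approximately given by: $V(\hat{P}) \approx \frac{P(1-P)}{mn}\,[1 + \rho\,(n-1)]$, where $\rho$ is the intra-household correlation coefficient. When $\rho > 0$, the additional term $\rho\,(n-1)$ is positive, so clustering individuals within households increases the variance relative to a simple random sample of mn individuals.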
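The effect of intra-household correlation can be sketched numerically with the usual cluster-sampling approximation, in which the simple-random-sampling variance P(1 - P)/(mn) is multiplied by the design effect 1 + ρ(n - 1). This approximation ignores finite population corrections and is stated here as an assumption; the values of ρ, m and n are hypothetical.

```python
def design_effect(n_per_household: int, rho: float) -> float:
    """Approximate design effect for n individuals sampled in each household."""
    return 1.0 + rho * (n_per_household - 1)


def two_stage_variance(p: float, m: int, n_per_household: int, rho: float) -> float:
    """Approximate variance of an estimated proportion under two-stage sampling."""
    srs_variance = p * (1.0 - p) / (m * n_per_household)
    return srs_variance * design_effect(n_per_household, rho)


def effective_sample_size(m: int, n_per_household: int, rho: float) -> float:
    """Size of the simple random sample that would give the same variance."""
    return m * n_per_household / design_effect(n_per_household, rho)


# Hypothetical design: 4000 households, 2 individuals each, rho = 0.3.
print(design_effect(2, 0.3))
print(two_stage_variance(0.05, 4000, 2, 0.3))
print(effective_sample_size(4000, 2, 0.3))
```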

Sampling issues: allocation across age-gender categories (1/3) Breakdowns by age and gender are required: gender (M/F) and age groups (15-24, 25-59, 60-74, 75+). Under equal allocation, it would be difficult to get reliable data on the disabled for some breakdowns, particularly those in which disabilities are expected to be the least prevalent (e.g., 15-24 year-olds).

Sampling issues: allocation across age-gender categories (2/3) The sample size $n_h$ for age-gender category h should be determined so as to ensure, on average, the same number of disabled persons in each category, that is, $n_h P_h$ constant across categories with $\sum_h n_h = n$. This gives: $n_h = \frac{n}{H} \cdot \frac{\bar{P}}{P_h}$.

Sampling issues: allocation across age-gender categories (3/3) $P_h$ is the prevalence rate of the disability in category h, and $\bar{P} = H / \sum_{h=1}^{H} (1/P_h)$ is the harmonic mean of the $P_h$ over all the H age-gender categories. $P_h$ can be estimated from another survey, for instance EU-SILC (variable PH030, "Limitation in activities because of health problems").
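A minimal sketch of this allocation rule follows: each category receives a sample size inversely proportional to its prevalence rate, so that the expected number of disabled respondents is the same everywhere. The prevalence figures below are invented for illustration; in practice they would come from a source such as EU-SILC.

```python
def equal_expected_cases_allocation(n: int, prevalence: dict) -> dict:
    """Allocate n so that n_h * P_h is the same in every category h."""
    inverse_sum = sum(1.0 / p for p in prevalence.values())
    return {h: n * (1.0 / p) / inverse_sum for h, p in prevalence.items()}


# Hypothetical prevalence rates by (gender, age group); not taken from any survey.
prevalence = {
    ("M", "15-24"): 0.03, ("F", "15-24"): 0.03,
    ("M", "25-59"): 0.08, ("F", "25-59"): 0.09,
    ("M", "60-74"): 0.20, ("F", "60-74"): 0.22,
    ("M", "75+"): 0.40, ("F", "75+"): 0.45,
}

allocation = equal_expected_cases_allocation(8000, prevalence)
for category, size in allocation.items():
    print(category, round(size), "expected disabled:", round(size * prevalence[category], 1))
```

The youngest categories absorb most of the sample, which shows the cost of this rule: precision for prevalence estimates in the older groups is reduced in exchange for usable sub-samples of disabled persons in the younger groups.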

Sampling issues: sampling of institutions (1/4) Resources should be invested in selecting a sample of institutions for the EDSIM in addition to a sample of individuals living in private households. Although persons living in institutions represent a minor part of a country's population, they are likely to convey a great deal of information about the population with disabilities, particularly in medical institutions such as chronic care hospitals or psychiatric institutions.

Sampling issues: sampling of institutions (2/4) Taking an additional sample of institutions is beneficial for statistical precision in that it increases the sample size, and especially the sample size of disabled people. On the other hand, one has to keep in mind that including institutionalised people would affect the estimates, because these people are likely to have different characteristics in terms of disability from those living in private households. This would pose a serious threat to comparability between countries if some of them decided to use a sample of institutions while the others did not.

Sampling issues: sampling of institutions (3/4) Certain countries have access to a Central Population Register which records all the individuals residing in the country, both in private households and in institutions. A representative sample of the population can be selected from the register using any probability sampling design. On the other hand, some countries cannot benefit from such a statistical infrastructure. Those countries usually conduct a census on a regular basis and use the data for sampling purposes. In that case, census data can be used as the basis for the selection of institutions.

Sampling issues: sampling of institutions (4/4) Institutions can be grouped into size categories based on the number of each institution's permanent residents. Then, a sample of institutions is selected based on type and size. All large institutions should be included in the survey, while samples of institutions are taken from the smaller subgroups. The administrator of each institution is asked to provide a list of all permanent residents aged 15 and over. Then, a sample of residents is selected from the list. For operational reasons, a fixed number of residents should be interviewed in all selected institutions.
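The selection procedure described on this slide can be sketched as follows. This is a simplified illustration: it uses only two size classes and ignores institution type, and the threshold, sampling fraction and institution data are all hypothetical.

```python
import math
import random


def sample_institutions(institutions: dict, large_threshold: int, small_fraction: float,
                        residents_per_institution: int, seed: int = 0) -> dict:
    """Take all large institutions, a random subset of the small ones,
    then a fixed number of residents in each selected institution."""
    rng = random.Random(seed)
    large = [name for name, residents in institutions.items() if len(residents) >= large_threshold]
    small = [name for name, residents in institutions.items() if len(residents) < large_threshold]
    n_small = min(len(small), math.ceil(small_fraction * len(small)))
    selected = large + rng.sample(small, n_small)
    return {
        name: rng.sample(institutions[name], min(residents_per_institution, len(institutions[name])))
        for name in selected
    }


# Hypothetical lists of permanent residents aged 15 and over, as supplied by administrators.
institutions = {
    "A": [f"A-{i}" for i in range(300)],
    "B": [f"B-{i}" for i in range(40)],
    "C": [f"C-{i}" for i in range(25)],
    "D": [f"D-{i}" for i in range(60)],
    "E": [f"E-{i}" for i in range(500)],
}
selection = sample_institutions(institutions, large_threshold=200, small_fraction=0.5,
                                residents_per_institution=10)
for name, residents in selection.items():
    print(name, len(residents))
```

Because the number of residents interviewed is fixed while institution sizes vary, residents of different institutions have different selection probabilities, and the estimation weights must reflect this.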

Sampling issues: use of administrative data (1/4) Administrative data can be used in three ways to improve the sampling design: (1) They help identify individuals who are at risk of disability by bringing in auxiliary variables (e.g., number of medical visits, level of health expenditure); the at-risk population can then be over-sampled. (2) Multiple frames can be used to select extra samples of individuals who have specific disabilities (e.g., people who are entitled to a disability allowance). (3) Calibration of the sample: the weights are adjusted so that they reproduce certain known population totals.
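The third use, calibration, can be illustrated in its simplest form, post-stratification, where the weights in each category are rescaled so that they sum to a known population total. The slide does not say which calibration method is intended, so this is only one possible instance; the weights, categories and totals below are made up.

```python
def poststratify(weights: list, groups: list, population_totals: dict) -> list:
    """Rescale weights within each group so they sum to the known population total."""
    weighted_counts = {}
    for w, g in zip(weights, groups):
        weighted_counts[g] = weighted_counts.get(g, 0.0) + w
    return [w * population_totals[g] / weighted_counts[g] for w, g in zip(weights, groups)]


# Hypothetical design weights and gender of five respondents.
design_weights = [100.0, 120.0, 80.0, 150.0, 90.0]
gender = ["M", "F", "M", "F", "F"]
# Hypothetical totals, for example from an administrative register.
totals = {"M": 200.0, "F": 330.0}

calibrated = poststratify(design_weights, gender, totals)
print([round(w, 2) for w in calibrated])
```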

Sampling issues: use of administrative data (2/4)

Sampling issues: use of administrative data (3/4) Consider a set of H frames, denoted U_1, U_2, ..., U_H. From each frame U_i (i = 1, ..., H), a sample s_i is selected using a probability sampling design, which may differ from one frame to another. Let p_{i,k} be the corresponding estimation weight of unit k in U_i (p_{i,k} = 0 if k is not in s_i). The resulting sample s for EDSIM is the union of the samples s_i (i = 1, ..., H), with duplicates counted once.

Sampling issues: use of administrative data (4/4) Consider now a unit k in the sample s. Let L_k be the total number of frames (U_1, U_2, ..., U_H) to which unit k belongs: L_k is the multiplicity of k, that is, the number of frames through which k can be selected. According to the Generalized Weight Share Method (GWSM), the estimation weight of k is given by $w_k = \frac{1}{L_k} \sum_{i=1}^{H} \tilde{p}_{i,k}$, where $\tilde{p}_{i,k} = p_{i,k}$ if k is in U_i, and 0 otherwise.
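A minimal sketch of the weight-sharing idea follows, assuming the multiple-frame form of the GWSM in which a unit's weights from all frames are summed and divided by its multiplicity L_k. The frames, samples and weights are invented, and the function name is hypothetical.

```python
def gwsm_weights(frames: list, sample_weights: list) -> dict:
    """Combine per-frame estimation weights into one weight per sampled unit.

    frames: list of sets of unit identifiers (U_1, ..., U_H).
    sample_weights: list of dicts {unit: weight} for the units sampled from each frame.
    Assumption: every sampled unit belongs to the frame it was drawn from.
    """
    sampled_units = set().union(*(w.keys() for w in sample_weights))
    combined = {}
    for k in sampled_units:
        multiplicity = sum(1 for frame in frames if k in frame)  # L_k
        total_weight = sum(w.get(k, 0.0) for w in sample_weights)
        combined[k] = total_weight / multiplicity
    return combined


# Hypothetical frames: a population register and a list of allowance recipients.
frames = [{"a", "b", "c", "d"}, {"a", "c", "e"}]
# Units sampled from each frame with their estimation weights.
sample_weights = [{"a": 10.0, "b": 10.0, "c": 10.0}, {"c": 4.0}]

weights = gwsm_weights(frames, sample_weights)
for unit in sorted(weights):
    print(unit, weights[unit])
```

Unit "a" belongs to both frames but was drawn only from the first, so its weight is halved; unit "c" was drawn from both, so its two weights are averaged. Dividing by the multiplicity prevents units covered by several frames from being over-represented in the estimates.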

Minimum sample sizes: Proportionate allocation Proportionate allocation distributes a sample of size n among the countries in proportion to their population aged 15 or more, so that the sampling fraction is the same in every country. Let n_h be the sample size in country h and N_h the country size (in number of individuals aged 15 or more). Proportionate allocation implies: $n_h = n \, N_h / \sum_{k} N_k$.
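Proportionate allocation can be sketched in a few lines. The country labels and population sizes below are hypothetical, and the result is left unrounded; a real allocation would need a rounding rule.

```python
def proportionate_allocation(n: int, population: dict) -> dict:
    """Allocate n across countries in proportion to population size N_h."""
    total = sum(population.values())
    return {country: n * size / total for country, size in population.items()}


# Hypothetical populations aged 15 or more, in persons.
population = {"X": 60_000_000, "Y": 8_000_000, "Z": 400_000}

proportional = proportionate_allocation(100_000, population)
for country, size in proportional.items():
    print(country, round(size), "sampling fraction:", size / population[country])
```

With these figures the smallest country receives well under a thousand interviews, which illustrates why proportionate allocation alone cannot guarantee usable national estimates.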

Minimum sample sizes: Optimal allocation The sample is allocated so as to achieve the smallest possible standard error of a given estimate for a fixed total sample size n. Let n_h denote the sample size in country h, N_h the country size (in number of individuals aged 15 or more) and S_h the within-country standard deviation of the target variable. Optimal (Neyman) allocation implies: $n_h = n \, N_h S_h / \sum_{k} N_k S_k$.
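Neyman allocation can be sketched as below, assuming equal survey costs across countries. The population sizes and standard deviations are hypothetical; for a binary target variable with prevalence P_h, S_h would be approximately sqrt(P_h (1 - P_h)).

```python
def neyman_allocation(n: int, population: dict, std_dev: dict) -> dict:
    """Allocate n across countries in proportion to N_h * S_h."""
    total = sum(population[c] * std_dev[c] for c in population)
    return {c: n * population[c] * std_dev[c] / total for c in population}


# Hypothetical populations and within-country standard deviations.
population = {"X": 60_000_000, "Y": 8_000_000, "Z": 400_000}
std_dev = {"X": 0.20, "Y": 0.30, "Z": 0.25}

optimal = neyman_allocation(100_000, population, std_dev)
for country, size in optimal.items():
    print(country, round(size))
```

When S_h is the same in every country, the formula reduces to proportionate allocation; countries with more variability receive a larger share of the sample.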

Minimum sample sizes: Compromise allocation (1/2) We seek a balance between country optimality (same sample size per country) and EU optimality (Neyman allocation). Different criteria can be used:

Minimum sample sizes: Compromise allocation (2/2) Another way of compromising between country and EU accuracies is to allocate a given number of interviews $n^*_h$ to country h. Let $n^* = \sum_h n^*_h$. The n − n* remaining interviews can then be allocated using one of the methods presented previously (proportionate, optimal or compromise).
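This two-step rule can be sketched as follows, here with the remaining interviews distributed proportionately to population size. The guaranteed numbers of interviews and the population figures are hypothetical, and the choice of proportionate allocation for the remainder is only one of the options the slide mentions.

```python
def compromise_allocation(n: int, guaranteed: dict, population: dict) -> dict:
    """Give each country its guaranteed interviews, then share the rest proportionately."""
    n_star = sum(guaranteed.values())
    if n_star > n:
        raise ValueError("guaranteed interviews exceed the total sample size")
    total_population = sum(population.values())
    remaining = n - n_star
    return {
        c: guaranteed[c] + remaining * population[c] / total_population
        for c in population
    }


# Hypothetical populations and guaranteed interviews per country.
population = {"X": 60_000_000, "Y": 8_000_000, "Z": 400_000}
guaranteed = {"X": 4000, "Y": 4000, "Z": 4000}

compromise = compromise_allocation(100_000, guaranteed, population)
for country, size in compromise.items():
    print(country, round(size))
```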

Nonresponse and design effects The precision requirements are based on the assumption that the sampling units are selected with simple random sampling and that there is no nonresponse. Such an assumption is never valid in practice and one has to take account of the effects of nonresponse and sampling design on statistical accuracy in order to modify the sample size accordingly. If not, the sample will lose precision and we may actually miss our targets in terms of statistical accuracy.
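One common way to make this adjustment, which the slide itself does not spell out, is to multiply the sample size required under simple random sampling by the anticipated design effect and divide by the anticipated response rate. The sketch below applies that rule of thumb; the design effect and response rate are hypothetical planning values.

```python
import math


def adjusted_sample_size(n_srs: int, design_effect: float, response_rate: float) -> int:
    """Gross sample size needed to match the precision of n_srs respondents under SRS.

    Assumptions: nonresponse is treated as a pure loss of sample size, and the
    design effect is known in advance (for example, from a previous survey).
    """
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(n_srs * design_effect / response_rate)


# Hypothetical planning values: design effect 1.5, response rate 75%.
print(adjusted_sample_size(1000, 1.5, 0.75))
```

This treats nonresponse only as a reduction in the number of respondents; it does not address the bias that arises when respondents differ systematically from nonrespondents.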

Thank you for your attention