Sampling issues related to the implementation of EDSIM/ESIHSI

1 Sampling issues related to the implementation of EDSIM/ESIHSI
Task Force on Disability Statistics, Eurostat, June 2010

2 Outline: The current model and its weaknesses. Sampling issues: choice of the sampling units; allocation of the sample across age and gender groups; sampling of institutions; use of administrative data; minimum sample sizes per country.

3 The current model (1/7)

4 The current model (2/7) Two different purposes in surveying the disabled population need to be distinguished: The estimation of the prevalence of a disability in the population aged 15 or more or, more generally, the share of persons who declare that barriers hinder their full and effective social integration. The estimation of means of certain variables for persons in the disabled population, or proportions of the disabled population with certain characteristics

5 The current model (3/7) When we estimate the prevalence rate P of a disability in the population, or the share of persons who declare that barriers hinder their full and effective social integration, the variance of the estimate is of order 1/n (so the standard error is of order 1/√n), where n is the sample size over the whole population aged 15 or more. Thus, by using samples of a few thousand individuals, one can achieve a good level of absolute accuracy, especially when P is low.

6 The current model (4/7) Division 1: Food and non-alcoholic beverages comprises 62 different items. Each division is made up of a large number of items or elementary aggregates:

7 The current model (5/7) The EDSIM also aims at drawing conclusions about the individuals having health problems and activity limitations which hinder their full and effective social integration. In this case, the level of accuracy one can hope for does not depend on the sample size n over the whole population aged 15 or more, but on the size of the sub-sample of the individuals who are effectively disabled. For a prevalence rate of P, one could expect an average number of nP disabled persons in the sample. For instance, with n = 8000 individuals and P = 5%, we have on average nP = 400 disabled persons.

8 The current model (6/7) The problem of accuracy becomes more acute when we consider breakdown estimates. The EDSIM reference population would be disaggregated by age group (15-24, 25-59, 60-74 and 75+) and gender. When we estimate the prevalence rate of a disability over a given domain, the accuracy depends on the number of sampled individuals aged 15 or more who fall into that domain.

9 The current model (7/7) However, when we consider the disabled persons who fall in a given age-gender category, the sample is generally too small to draw any reliable conclusions. For instance, consider a whole sample of n = 8000 individuals aged 15+ and a disability having 3% prevalence among the female population in a given age group. Let's assume that this female age group accounts for 7% of the total population aged 15+. Then the expected number of disabled persons sampled in that group is 8000 × 0.07 × 0.03 ≈ 17, that is, fewer than 20.
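The arithmetic behind the two examples (slides 7 and 9) can be sketched as follows. This is only an illustration under simple random sampling; the function name `expected_disabled` is hypothetical and the figures are those quoted on the slides.

```python
def expected_disabled(n, domain_share, prevalence):
    """Expected number of disabled respondents falling in a domain.

    n            -- total sample size (persons aged 15+)
    domain_share -- share of the domain in the population aged 15+
    prevalence   -- prevalence of the disability within the domain
    Assumes simple random sampling (an assumption of this sketch).
    """
    return n * domain_share * prevalence


# Whole population aged 15+, P = 5% (slide 7)
whole_sample = expected_disabled(8000, 1.0, 0.05)

# One age-gender domain: 7% of the population, 3% prevalence (slide 9)
one_domain = expected_disabled(8000, 0.07, 0.03)

print(round(whole_sample), round(one_domain, 1))
```

The second figure is what drives the concern on slide 9: the number of disabled respondents per domain shrinks with both the domain share and the prevalence.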

10 Sampling issues: choice of the sampling units (1/4)
Observation units: individuals aged 15 or more. Sampling units: depend on the sampling frames available at national level. Frame of individuals: Population Register → sampling units = individuals. Frame of households: list of fixed phone numbers / master sample / household register → sampling units = households or individuals.

11 Sampling issues: choice of the sampling units (2/4)
On the other hand, the draft EDSIM Regulation requires that information be collected at household level. Thus, a two-stage cluster sampling of individuals can be envisaged: selection of geographical zones at the first stage to reduce the travel costs, selection of households at the second stage and then collection of information for all members aged 15+ in the selected households.

12 Sampling issues: choice of the sampling units (3/4)
The selection of households should not create extra problems for countries that already have access to a frame of households (household register or master sample). By contrast, if there is only a frame of individuals, a solution is to draw a sample of individuals (aged 15 or more) and take the households to which those individuals belong. This solution amounts to selecting a sample of households with unequal probabilities, proportional to the number of household members aged 15 or more.

13 Sampling issues: choice of the sampling units (4/4)
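In terms of statistical precision, assuming a simple random sample of m households and a simple random sample of n individuals within each of the m selected households, the variance of the estimated proportion P (disability prevalence) is approximately given by: $V(\hat{P}) \approx \frac{P(1-P)}{mn}\,[1 + \rho\,(n-1)]$, where $\rho$ is the intra-household correlation coefficient. When $\rho > 0$, the variance is larger than under a simple random sample of mn individuals.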
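A rough numerical sketch of the effect of intra-household correlation is given below. It assumes the usual design-effect approximation for equal-sized clusters, deff = 1 + ρ(n − 1), and ignores finite population corrections; the function name and the values of P, m, n and ρ are hypothetical.

```python
def clustered_proportion_variance(p, m, n, rho):
    """Approximate variance of an estimated proportion under two-stage sampling.

    p   -- true proportion (disability prevalence)
    m   -- number of sampled households
    n   -- number of individuals interviewed per household
    rho -- intra-household correlation coefficient
    Returns (variance, design effect). The design-effect formula
    1 + rho * (n - 1) is an assumption of this sketch.
    """
    srs_variance = p * (1 - p) / (m * n)   # variance under SRS of m*n persons
    deff = 1 + rho * (n - 1)               # inflation due to clustering
    return srs_variance * deff, deff


# Hypothetical figures: 4000 households, 2 persons each, rho = 0.2
variance, deff = clustered_proportion_variance(p=0.05, m=4000, n=2, rho=0.2)
srs_only, _ = clustered_proportion_variance(p=0.05, m=4000, n=2, rho=0.0)
print(deff, variance / srs_only)
```

With one person per household (n = 1) the design effect is 1 whatever the value of ρ, which is why interviewing all household members costs precision per interview when ρ > 0.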

14 Sampling issues: allocation across age-gender categories (1/3)
Breakdowns by age and gender are required: gender (M/F) and age group (15-24, 25-59, 60-74, 75+). Under equal allocation, it would be difficult to get reliable data on the disabled for some breakdowns, particularly those in which disabilities are expected to be least prevalent (e.g. the younger age groups).

15 Sampling issues: allocation across age-gender categories (2/3)
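The sample size $n_h$ for age-gender category h should be determined so as to ensure, on average, the same number of disabled persons in each category: $n_h = \frac{n}{H} \cdot \frac{\bar{P}}{P_h}$, so that $n_h P_h = n \bar{P} / H$ for every h.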

16 Sampling issues: allocation across age-gender categories (3/3)
$P_h$ is the prevalence rate of the disability in category h, and $\bar{P}$ is the harmonic mean of the $P_h$ over all the H age-gender categories. $P_h$ can be calculated from another survey, for instance EU-SILC (variable PH030, "Limitation in activities because of health problems").
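As an illustration, the allocation that equalises the expected number of disabled persons per category can be sketched as below, assuming n_h = (n / H) × (harmonic mean of the P_h) / P_h. The category labels and prevalence rates are made up for the example; they are not EU-SILC figures.

```python
def equal_disabled_allocation(n, prevalence):
    """Allocate n so that n_h * P_h is the same in every category.

    prevalence -- dict mapping category label to its prevalence rate P_h
    Returns a dict of (non-integer) sample sizes n_h summing to n.
    """
    H = len(prevalence)
    harmonic_mean = H / sum(1 / p for p in prevalence.values())
    return {h: n / H * harmonic_mean / p for h, p in prevalence.items()}


# Hypothetical prevalence rates for four categories
rates = {"F 15-24": 0.03, "F 25-59": 0.06, "F 60-74": 0.12, "F 75+": 0.24}
allocation = equal_disabled_allocation(8000, rates)

for h, n_h in allocation.items():
    # expected number of disabled respondents is identical across categories
    print(h, round(n_h), round(n_h * rates[h], 1))
```

Categories with low prevalence receive the largest samples, which is the intended over-sampling of groups where disabled respondents are rare.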

17

18 Sampling issues: sampling of institutions (1/4)
Resources should be invested in selecting a sample of institutions for the EDSIM, in addition to a sample of individuals living in private households. Although persons living in institutions represent a minor part of a country's population, they convey a great deal of information about the population having disabilities, particularly when we consider medical institutions such as chronic care hospitals or psychiatric institutions.

19 Sampling issues: sampling of institutions (2/4)
Taking an additional sample of institutions is beneficial for statistical precision in that it increases the sample size, and especially the sample size of disabled people. On the other hand, one has to keep in mind that including institutionalised people would affect the estimates, because these people are likely to have different characteristics in terms of disability from those living in private households. This would pose a serious threat to comparability between countries if some of them decided to use a sample of institutions while others did not.

20 Sampling issues: sampling of institutions (3/4)
Certain countries have access to a Central Population Register which records all the individuals residing in the country, both in private households and in institutions. A representative sample of the population can be selected from the register using any probability sampling design. On the other hand, some countries cannot benefit from such a statistical infrastructure. Those countries usually conduct a census on a regular basis and use the data for sampling purposes. In that case, census data can be used as the basis for the selection of institutions.

21 Sampling issues: sampling of institutions (4/4)
Institutions can be grouped into size categories based on the number of each institution's permanent residents. Then, a sample of institutions is selected based on type and size. All large institutions should be included in the survey, while samples of institutions are taken from the smaller subgroups. The administrator of each institution is asked to provide a list of all permanent residents aged 15 and over. Then, a sample of residents is selected from the list. For operational reasons, a fixed number of residents should be interviewed in all selected institutions.
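The selection steps described on this slide could be sketched as follows. This is a simplified illustration with only two size strata (a take-all stratum of large institutions and one stratum of smaller ones) and simple random sampling inside each; the threshold, the sample sizes and all function names are hypothetical.

```python
import random


def select_institutions(institutions, large_threshold, n_small, rng):
    """Take all large institutions, plus a random sample of the smaller ones."""
    large = [i for i in institutions if i["residents"] >= large_threshold]
    small = [i for i in institutions if i["residents"] < large_threshold]
    sampled_small = rng.sample(small, min(n_small, len(small)))
    return large + sampled_small


def select_residents(resident_list, k, rng):
    """Draw a fixed number k of residents from the list supplied by the administrator."""
    return rng.sample(resident_list, min(k, len(resident_list)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical frame: institution id and number of permanent residents aged 15+
frame = [{"id": j, "residents": size}
         for j, size in enumerate([500, 320, 45, 60, 30, 80, 25, 90])]

chosen = select_institutions(frame, large_threshold=300, n_small=3, rng=rng)
interviewees = select_residents(list(range(60)), k=10, rng=rng)
print([i["id"] for i in chosen], len(interviewees))
```

In a real design the smaller institutions would also be stratified by type, and the selection probabilities would be carried into the estimation weights.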

22 Sampling issues: use of administrative data (1/4)
Administrative data can be used in three ways to improve the sampling design: (1) They help identify individuals who are at risk of disability by bringing in auxiliary variables (e.g. number of medical visits, level of health expenditure); the at-risk population can then be over-sampled. (2) Multiple frames can be used to select extra samples of individuals who have specific disabilities (e.g. people who are entitled to a disability allowance). (3) Calibration of the sample: the weights are adjusted so that the weighted sample conforms to known population totals.

23 Sampling issues: use of administrative data (2/4)

24 Sampling issues: use of administrative data (3/4)
Consider a set of H frames, denoted U1, U2 … UH. From each frame Ui (i = 1…H), a sample si is taken using a probability sampling design (which may differ from one frame to another). Let pi,k be the corresponding estimation weight of unit k in Ui (pi,k = 0 if k is not in si). The resulting sample s for EDSIM is then the union of the samples si (i = 1…H), with duplicates counted once.

25 Sampling issues: use of administrative data (4/4)
Consider now a unit k in the sample s. Let Lk be the total number of frames (U1, U2 … UH) the unit k belongs to: Lk represents the multiplicity of k, that is, the number of times k can be selected. According to the Generalized Weight Share Method (GWSM), the estimation weight of k is given by: $w_k = \frac{1}{L_k} \sum_{i=1}^{H} \tilde{p}_{i,k}$, where $\tilde{p}_{i,k} = p_{i,k}$ if k is in $U_i$, and 0 otherwise.
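A minimal sketch of this multiplicity adjustment is shown below. It assumes the weight of unit k is the sum of its frame-specific estimation weights (zero in frames where it was not sampled) divided by its multiplicity Lk. The frames, the units and the weights are invented for the example.

```python
def gwsm_weights(frames, frame_weights):
    """Combine frame-specific weights into one weight per sampled unit.

    frames        -- list of sets; frames[i] is the population of frame U_i
    frame_weights -- list of dicts; frame_weights[i][k] is p_{i,k} for k in s_i
    Returns a dict mapping each unit of the combined sample to its weight.
    """
    combined_sample = set()
    for weights in frame_weights:
        combined_sample.update(weights)          # union of the s_i, duplicates once

    result = {}
    for k in combined_sample:
        multiplicity = sum(1 for frame in frames if k in frame)       # L_k
        total = sum(weights.get(k, 0.0) for weights in frame_weights)  # sum of p_{i,k}
        result[k] = total / multiplicity
    return result


# Hypothetical example: a general frame and a disability-allowance register
U1 = {"a", "b", "c", "d"}
U2 = {"c", "d", "e"}
s1_weights = {"a": 2.0, "c": 2.0}   # 2 of 4 units sampled from U1
s2_weights = {"c": 1.5, "e": 1.5}   # 2 of 3 units sampled from U2

weights = gwsm_weights([U1, U2], [s1_weights, s2_weights])
print(sorted(weights.items()))
```

Unit "c" belongs to both frames, so its two weights are summed and divided by 2; units that belong to a single frame keep their original weight.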

26 Minimum sample sizes: Proportionate allocation
Proportionate allocation distributes a sample of size n across the countries in proportion to their population aged 15 or more, so that the sampling fraction is the same in every country. Let $n_h$ be the sample size in country h and $N_h$ the country size (in number of individuals aged 15 or more). Proportionate allocation implies: $n_h = n \, \frac{N_h}{\sum_k N_k}$.

27 Minimum sample sizes: Optimal allocation
The sample is allocated so as to achieve the smallest possible standard error of a given estimate. Let $n_h$ denote the sample size in country h, $N_h$ the country size (in number of individuals aged 15 or more) and $S_h$ the within-country standard deviation of the target variable. Optimal (Neyman) allocation implies: $n_h = n \, \frac{N_h S_h}{\sum_k N_k S_k}$.
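The two allocations can be compared with a short sketch, assuming the textbook forms n_h = n N_h / Σ N_k (proportionate) and n_h = n N_h S_h / Σ N_k S_k (Neyman). Country labels, population sizes and standard deviations below are hypothetical.

```python
def proportionate_allocation(n, sizes):
    """n_h proportional to the country size N_h."""
    total = sum(sizes.values())
    return {h: n * N_h / total for h, N_h in sizes.items()}


def neyman_allocation(n, sizes, std_devs):
    """n_h proportional to N_h * S_h (Neyman allocation)."""
    total = sum(sizes[h] * std_devs[h] for h in sizes)
    return {h: n * sizes[h] * std_devs[h] / total for h in sizes}


# Hypothetical populations aged 15+ (millions) and standard deviations
sizes = {"A": 60.0, "B": 10.0, "C": 0.4}
std_devs = {"A": 0.20, "B": 0.30, "C": 0.25}

prop = proportionate_allocation(100_000, sizes)
neyman = neyman_allocation(100_000, sizes, std_devs)
print({h: round(v) for h, v in prop.items()})
print({h: round(v) for h, v in neyman.items()})
```

Both rules leave very small countries with very small samples, which is the motivation for the compromise allocations that follow.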

28 Minimum sample sizes: Compromise allocation (1/2)
We seek a balance between country optimality (same sample size per country) and EU optimality (Neyman allocation). Different criteria can be used:

29 Minimum sample sizes: Compromise allocation (2/2)
Another way of compromising between country and EU accuracy is to allocate a given number of interviews $n^*_h$ to country h. Let $n^* = \sum_h n^*_h$. The n − n* remaining interviews can then be allocated using one of the methods presented previously (proportionate/optimal/compromise).
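A possible sketch of this two-step allocation is given below: each country first receives its guaranteed number of interviews, and the remainder is distributed proportionately to population size. The choice of proportionate allocation for the remainder, the function name and all figures are assumptions made for the illustration.

```python
def floor_plus_proportionate(n, sizes, minimum):
    """Guarantee minimum[h] interviews per country, then share the rest by size.

    n       -- total number of interviews
    sizes   -- dict of country sizes N_h
    minimum -- dict of guaranteed interviews n*_h
    """
    n_star = sum(minimum.values())
    if n_star > n:
        raise ValueError("guaranteed interviews exceed the total sample size")
    remaining = n - n_star
    total_size = sum(sizes.values())
    return {h: minimum[h] + remaining * sizes[h] / total_size for h in sizes}


# Hypothetical figures: three countries, 2000 guaranteed interviews each
sizes = {"A": 60.0, "B": 10.0, "C": 0.4}
minimum = {"A": 2000, "B": 2000, "C": 2000}

compromise = floor_plus_proportionate(20_000, sizes, minimum)
print({h: round(v) for h, v in compromise.items()})
```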

30 Nonresponse and design effects
The precision requirements are based on the assumption that the sampling units are selected with simple random sampling and that there is no nonresponse. Such an assumption is never valid in practice and one has to take account of the effects of nonresponse and sampling design on statistical accuracy in order to modify the sample size accordingly. If not, the sample will lose precision and we may actually miss our targets in terms of statistical accuracy.
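One common way of making this adjustment is to inflate the sample size required under simple random sampling by the design effect and divide by the expected response rate. The sketch below assumes that rule; the function name and the example values of the design effect and response rate are hypothetical.

```python
import math


def adjusted_sample_size(n_srs, design_effect, response_rate):
    """Gross sample size needed to reach the precision of n_srs SRS respondents.

    n_srs         -- sample size required under SRS with full response
    design_effect -- variance inflation due to the actual sampling design
    response_rate -- expected share of sampled units that respond (0 < r <= 1)
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(n_srs * design_effect / response_rate)


# Hypothetical figures: deff = 1.5 and a 75% response rate
gross = adjusted_sample_size(8000, design_effect=1.5, response_rate=0.75)
print(gross)
```

Under these assumed values the number of units to select is twice the nominal requirement, which shows how quickly clustering and nonresponse erode precision if they are ignored.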

31 Thank you for your attention

