Sampling and estimation

1 Sampling and estimation
ESTP course on SBS, 17 – 19 October 2012 Photis Stavropoulos

2 Probability sample surveys
Sample survey: the collection of information in an organised and methodical manner from a fraction (the sample) of the units of a population.
Probability sample survey: the members of the sample are selected at random, with known probabilities of selection.
Material borrowed from Statistics Canada (2003) Survey methods and practices. Ottawa: Statistics Canada.

3 Population and sample
- Target population. Example: all enterprises with principal activity ‘manufacture of leather and related products’
- Frame population
- Survey population
- Sample

4 Stratified random sampling (StrRS)
Stratification variables:
- Activity
- Size class, by employment or turnover
- Additional: region
Diagram: the frame population is stratified by activity and by size class; within each stratum h the sample is drawn by Simple Random Sampling (SRS).

5 Applicability of StrRS
- The variables of interest (e.g. purchases) vary widely across the elements of the population
- The enterprises can be grouped (e.g. by number of employees) so that within groups (strata) their variables of interest (e.g. purchases) are close in value
- Stratification variables (e.g. number of employees) are available and of good quality in the frame

6 Pros and cons of StrRS
Pros:
- Lower sampling errors compared to SRS
- All strata are represented in the sample
- Frame population is already divided into sub-populations
- Well suited to business surveys
Cons:
- Stratification criteria have to be adequate
- Highly reliant on register data
- Some strata may be too small

7 Sampling and estimation
Population in stratum h = $N_h$
Sample in stratum h = $n_h$ = gross sample
Inclusion probability of units of stratum h: $\pi_h = n_h / N_h$
Sampling weight in stratum h: $w_h = N_h / n_h = 1 / \pi_h$

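8 Estimation
Estimation of totals (e.g. total value of purchases) with the Horvitz-Thompson estimator: $\hat{Y} = \sum_h \frac{N_h}{n_h} \sum_{i \in s_h} y_{hi}$, where $s_h$ is the sample from stratum h.
One term per stratum.
Example for stratum 1 (h = 1): $N_1 = 10$; $n_1 = 5$ (small numbers only to simplify the example); y (value of purchases) for the 5 sampled enterprises = 4; 6; 10; 20; 10. The stratum term is $(10/5) \times (4 + 6 + 10 + 20 + 10) = 100$.
Interpretation: each sampled enterprise represents $N_h / n_h$ enterprises within its stratum.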
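The stratum-by-stratum estimation of a total can be sketched in a few lines of Python. This is only an illustration: the function name and the `(N_h, sample_values)` input layout are assumptions made here, and the only data used are the stratum 1 figures from the slide.

```python
def stratified_total(strata):
    """Estimate a population total from a stratified simple random sample.

    `strata` is a list of (N_h, sample_values) pairs; this layout is an
    assumption of the sketch, not something prescribed by the slides.
    """
    total = 0.0
    for N_h, sample in strata:
        n_h = len(sample)
        weight = N_h / n_h              # sampling weight w_h = N_h / n_h
        total += weight * sum(sample)   # one term per stratum
    return total


# Stratum 1 from the slide: N1 = 10, sampled purchases 4, 6, 10, 20, 10.
stratum_1 = (10, [4, 6, 10, 20, 10])
estimate_stratum_1 = stratified_total([stratum_1])
print(estimate_stratum_1)  # 100.0
```

With a weight of 10 / 5 = 2, each of the five sampled enterprises stands for two enterprises of the stratum, which is why the sample sum of 50 is expanded to 100.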

9 The importance of good stratification for estimation
Example: variable of interest: value of purchases; N = 4; n = 2.
Table columns: Case | Unit code | Unknown true value of purchases (for N) | Value of purchases known for sample (n) | Estimated value of purchases, for N, from sample (n)
Surviving table entries: unit codes 1, 2, 3, 4; values 8, 9, 10; total row: 36, 34 (close to 36), 17, 20 (not close to 36).

10 The importance of a good stratification for estimation (2)
Example: value of purchases for 14 enterprises: 2, 2, 2, 3, 5, 6, 7, 8, 9, 9, 10, 11, 12, 13
Stratification alternatives:
a) {2, 2, 2, 3}, {5, 6, 7}, {8, 9, 9, 10}, {11, 12, 13}
b) {2, 9, 10}, {3, 9, 13}, {2, 6, 7, 12}, {2, 5, 8, 11}
Case a: the strata are very homogeneous (lower within-strata variance).
Case b: the strata are heterogeneous and similar to each other (lower between-strata variance).
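The contrast between the two alternatives can be checked numerically. The sketch below splits the total sum of squares of the 14 values into a within-strata part and a between-strata part; the function name is hypothetical, and sums of squares are used here instead of variances to keep the arithmetic simple.

```python
from statistics import mean


def variance_decomposition(strata):
    """Return (within, between) sums of squares for a list of strata."""
    all_values = [y for stratum in strata for y in stratum]
    grand_mean = mean(all_values)
    # Spread of the values around their own stratum mean.
    within = sum((y - mean(s)) ** 2 for s in strata for y in s)
    # Spread of the stratum means around the overall mean.
    between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in strata)
    return within, between


alt_a = [[2, 2, 2, 3], [5, 6, 7], [8, 9, 9, 10], [11, 12, 13]]
alt_b = [[2, 9, 10], [3, 9, 13], [2, 6, 7, 12], [2, 5, 8, 11]]

within_a, between_a = variance_decomposition(alt_a)
within_b, between_b = variance_decomposition(alt_b)
print(within_a, within_b)    # within part: small for a, large for b
print(between_a, between_b)  # between part: large for a, small for b
```

Both alternatives partition the same 14 values, so within plus between is identical for a and b; only the split differs. Since the sampling variance under StrRS depends on the within-strata part only, alternative a is the better stratification.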

11 Precision and bias of estimates
- Precise and not biased
- Precise but biased
- Not precise and biased
- Not biased but not precise

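12 Mean squared error
Total survey error of an estimator ~ measure of accuracy
Mean squared error = sampling variance + squared bias: $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2$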

13 Precision of estimates
Sampling error = the random error = the distance between a sample estimate and the expected value of the estimator.
The expected value of the estimator = the average of all sample estimates generated in the long run over repeated sampling.
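For a very small population, "repeated sampling" can be made concrete by listing every possible sample. The sketch below does this for a hypothetical population of N = 5 purchase values (the numbers are invented for illustration) and simple random samples of size n = 2, using the expansion estimator (N / n) × sample sum.

```python
from itertools import combinations

population = [4, 6, 10, 20, 10]   # hypothetical purchases of N = 5 enterprises
N, n = len(population), 2

# Estimate of the total from every possible sample of size n.
estimates = [(N / n) * sum(sample) for sample in combinations(population, n)]

# Expected value of the estimator: average over all equally likely samples.
expected_value = sum(estimates) / len(estimates)

# Sampling error of each sample: its estimate minus the expected value.
sampling_errors = [est - expected_value for est in estimates]

print(len(estimates))    # 10 possible samples
print(expected_value)    # 50.0, equal to the true total
```

Individual estimates differ from 50, sometimes by a lot, but their average equals the true total: the estimator is unbiased, and all of its error is sampling error.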

14 Bias of estimates
Bias = the systematic error = the expected value of the estimator - the true population parameter
Examples:
- Deficiencies of the frame population
- Bad combination of the sample design and the estimator
- Characteristics of non-respondents and respondents differ
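A frame deficiency is the easiest of these examples to illustrate. In the sketch below, one unit of a hypothetical five-unit population is missing from the frame (an assumption made purely for illustration), samples of size 2 are drawn from the frame, and bias, variance and mean squared error are computed over all possible samples.

```python
from itertools import combinations

population = [4, 6, 10, 20, 10]   # hypothetical true population
frame = population[:4]            # assumption: the last unit is missing from the frame
true_total = sum(population)
n = 2

# Each sample is expanded to the frame size, so the missing unit is never covered.
estimates = [(len(frame) / n) * sum(s) for s in combinations(frame, n)]

expected_value = sum(estimates) / len(estimates)
bias = expected_value - true_total
variance = sum((e - expected_value) ** 2 for e in estimates) / len(estimates)
mse = sum((e - true_total) ** 2 for e in estimates) / len(estimates)

print(bias)  # -10.0: the purchases of the uncovered unit
```

No increase in sample size removes this error, because every possible sample misses the same unit. The computed values also satisfy the decomposition mean squared error = variance + bias squared.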

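15 Sampling variance
Variance of the estimator of a total: $V(\hat{Y}) = \sum_h N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^2}{n_h}$
$S_h^2$: population variance of stratum h.
Estimate of the variance: $\hat{V}(\hat{Y}) = \sum_h N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h}$
$s_h^2$: variance of the sample from stratum h.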
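A minimal sketch of the variance estimate follows, assuming simple random sampling without replacement within each stratum, so that each stratum contributes N_h² (1 − n_h/N_h) s_h²/n_h. The function name and input layout are hypothetical; the data reuse the stratum 1 example (N1 = 10 and sampled purchases 4, 6, 10, 20, 10).

```python
from statistics import variance


def estimated_variance_of_total(strata):
    """Estimate the sampling variance of a stratified total.

    `strata` is a list of (N_h, sample_values) pairs (assumed layout).
    """
    v = 0.0
    for N_h, sample in strata:
        n_h = len(sample)
        s2_h = variance(sample)   # sample variance, divisor n_h - 1
        fpc = 1 - n_h / N_h       # finite population correction
        v += N_h ** 2 * fpc * s2_h / n_h
    return v


v_hat = estimated_variance_of_total([(10, [4, 6, 10, 20, 10])])
print(v_hat)  # 380.0
```

The sample variance here is 38 and half of the stratum is sampled, so the stratum term is 100 × 0.5 × 38 / 5 = 380. The square root of this value is the estimated standard error of the stratum total.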

16 Strategies for calculation of sample size
- Bottom-up approach. Aim: a certain reliability at stratum level
- Top-down approach. Aim: a certain reliability at overall population level
- Optimization approach. Aim: a certain reliability at both stratum and population level
Inflate sample sizes taking into account the anticipated response rates!
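The inflation step is simple arithmetic: divide the number of responses needed by the anticipated response rate and round up. The sketch below is one way to write it; the function name and the figures (200 responses needed, 75 % response rate) are hypothetical.

```python
import math


def gross_sample_size(net_size, response_rate):
    """Sample size to draw so that about `net_size` units respond."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(net_size / response_rate)


gross = gross_sample_size(200, 0.75)
print(gross)  # 267
```

Rounding up rather than to the nearest integer keeps the expected number of respondents at or above the target.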

17 Top-down approach
Proportional allocation of the overall sample: $n_h = n \frac{N_h}{N}$
Equal inclusion probabilities in all strata: $\pi_h = \frac{n_h}{N_h} = \frac{n}{N}$
Gains in precision accrue to all survey measures
Risk: high sampling errors for small strata which are heterogeneous
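Proportional allocation gives each stratum the same share of the sample as it has of the population, n_h = n × N_h / N. A short sketch with hypothetical stratum sizes:

```python
def proportional_allocation(stratum_sizes, n):
    """Split an overall sample of size n in proportion to stratum sizes."""
    N = sum(stratum_sizes)
    return [n * N_h / N for N_h in stratum_sizes]


stratum_sizes = [500, 300, 200]   # hypothetical N_h
allocation = proportional_allocation(stratum_sizes, 100)
inclusion_probabilities = [n_h / N_h for n_h, N_h in zip(allocation, stratum_sizes)]

print(allocation)  # [50.0, 30.0, 20.0]
```

Every stratum ends up with the same inclusion probability n / N (0.1 in this example). In practice the allocated sizes are rarely whole numbers and have to be rounded, which the sketch leaves out.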

18 Top-down approach (2)
Optimal (Neyman) allocation of the overall sample: $n_h = n \frac{N_h S_h}{\sum_k N_k S_k}$
The smallest possible sampling errors at overall level for a given variable
Risks:
- High sampling errors for some small strata
- Higher sampling errors at overall level for other variables
$S_h$ = measure of heterogeneity within stratum h (standard deviation of the stratum population), not the standard deviation of the sample from h
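Neyman allocation makes the stratum sample size proportional to N_h × S_h, so large and heterogeneous strata receive more of the sample. The sketch below uses hypothetical stratum sizes and standard deviations; in a real survey S_h is unknown and has to be approximated, for example from a previous survey or from frame data.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Allocate n across strata in proportion to N_h * S_h."""
    products = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * p / total for p in products]


sizes = [500, 300, 200]     # hypothetical N_h
sds = [2.0, 5.0, 20.0]      # hypothetical S_h for one variable of interest
neyman = neyman_allocation(sizes, sds, 130)
print(neyman)
```

Here the smallest stratum is by far the most heterogeneous and receives 80 of the 130 units, against 20 for the largest stratum. The allocation is optimal only for the variable whose S_h values were used, which is the second risk listed on the slide.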

19 Estimation with non-response
Population in stratum h = $N_h$
Sample in stratum h = $n_h$ = gross sample
Respondents in stratum h = $m_h$ = net sample
Sampling weight: $w_h = N_h / n_h$
Adjustment of weight for unit non-response: $w_h^{*} = \frac{N_h}{n_h} \cdot \frac{n_h}{m_h} = \frac{N_h}{m_h}$
Assumption: the characteristics of respondents are the same as the characteristics of non-respondents. If not: bias!
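The usual within-stratum adjustment multiplies the sampling weight N_h / n_h by the inverse response rate n_h / m_h, which gives N_h / m_h. The sketch below assumes this simple adjustment; the function name and the figures (10 enterprises in the stratum, 5 sampled, 4 responding) are hypothetical.

```python
def nonresponse_adjusted_weight(N_h, n_h, m_h):
    """Weight for respondents of stratum h after unit non-response."""
    sampling_weight = N_h / n_h   # weight before non-response
    adjustment = n_h / m_h        # inverse of the stratum response rate
    return sampling_weight * adjustment


# Hypothetical stratum: 10 enterprises, 5 sampled, 4 responded.
adjusted_weight = nonresponse_adjusted_weight(10, 5, 4)
respondent_values = [4, 6, 10, 20]
estimated_total = adjusted_weight * sum(respondent_values)
print(adjusted_weight, estimated_total)  # 2.5 100.0
```

Each respondent now stands for 2.5 enterprises instead of 2. The estimate is only as good as the assumption that non-respondents resemble respondents within the stratum.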

20 Improving estimation
Calibration: match the estimated totals of auxiliary variables with reliable statistics from other sources:
- adjust the sampling weights so that the estimated total of each auxiliary variable matches the total of the same auxiliary variable from a reliable source
- minimize the distance between the sampling weights and the final calibration weights
- software: e.g. CALMAR, a SAS macro
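For a single auxiliary variable and a chi-square distance, calibration has a closed form: the final weights are w_i = d_i (1 + λ x_i), with λ chosen so that the weighted total of x hits the known total. The sketch below implements only this special case and is not how CALMAR works internally; the function name, the weights and the auxiliary values are hypothetical.

```python
def calibrate_linear(design_weights, x, x_total):
    """Linear calibration on one auxiliary variable x.

    Minimizes sum((w_i - d_i)**2 / d_i) subject to sum(w_i * x_i) == x_total.
    """
    estimated = sum(d * xi for d, xi in zip(design_weights, x))
    denom = sum(d * xi * xi for d, xi in zip(design_weights, x))
    lam = (x_total - estimated) / denom
    return [d * (1 + lam * xi) for d, xi in zip(design_weights, x)]


design_weights = [2.0] * 5     # hypothetical sampling weights
employees = [1, 2, 3, 4, 5]    # hypothetical auxiliary variable
known_total = 33               # hypothetical total from a reliable source

final_weights = calibrate_linear(design_weights, employees, known_total)
calibrated_total = sum(w * xi for w, xi in zip(final_weights, employees))
print(calibrated_total)
```

The sampling weights estimate 30 employees, the reliable source says 33, and the calibrated weights reproduce 33 up to rounding while staying close to the original weights. With several auxiliary variables the same idea leads to a small linear system instead of a single λ.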

21 Estimation formulae Equivalent formulae, with final weights replacing sampling weights

22 Treatment of enterprises according to their size
- Take-all strata containing large enterprises
- Inclusion probabilities proportional to size
- Cut-off sampling

23 ESS methodological documents
More in the ESS methodological documents:
- Handbook on the design and implementation of business surveys (1997)
- Survey sampling reference guidelines – Introduction to sample design and estimation techniques (2008)
- Monographs of official statistics - Variance estimation methods in the European Union (2002)
- Handbook of Recommended Practices for Questionnaire Development and Testing in the ESS (2004)
Download from Eurostat website: path: Eurostat home page -> About Eurostat -> Research and Methodology -> ESS Methodological Documents or directly: dology/ess_methodological_documents

24 Thank you for your attention!

