Chapter 1: Basic concepts of surveys


1 Chapter 1: Basic concepts of surveys
Handbook: chapters 1 and 2. Some history of data collection. Basic principles of sampling. Some sampling designs. Some estimators. Errors in surveys.

2 Data collection through the ages
Even the old empires needed statistical overviews. Initially always complete enumeration. No sampling. China and Egypt (1000 B.C.): overviews for taxation and military affairs. Roman Empire: counts of people and their possessions. Census in Bethlehem (Pieter Bruegel, 1566).

3 Data collection through the ages
The Domesday Book. Commissioned by William the Conqueror (1086). Compiled by royal commissioners. Data about 13,000 places, 10,000 facts per county. Data about landowners, slaves, free people, woodland, pasture, mills, fish ponds, estimated value.

4 Data collection through the ages
The Quipucamayoc. Statistician in the Inca Empire ( A.C.). Quipucamayoc in each district. Count of people, young men, houses, llamas. Recorded on quipus. Knots in coloured ropes, decimal system. RAPI = Rope Assisted Personal Interviewing.

5 Data collection through the ages
The first modern censuses New France (Canada): 1666, Jean Talon, N=3215. Sweden: 1748, Denmark: 1769. Netherlands: 1795, new system of electoral constituencies. Standardized questionnaire. Legal obligation to participate.

6 The rise of sample surveys
The period until 1895: No sampling. It is not proper to replace people by computations (discrimination). No reliable conclusions based on sample data. The dawn of a new era: Industrialisation. Urbanisation. Population growth. Central government.

7 The rise of sample surveys
Developments. 1895: Anders Kiaer proposes his ‘Representative Method’. Accuracy of estimates cannot be computed. 1906: Arthur Bowley shows the importance of random sampling. Probability theory can be applied. 1934: Jerzy Neyman introduces the confidence interval. He also shows that purposive sampling does not work.

8 The rise of sample surveys
The fundamental principles of sampling: Samples must be selected by means of probability sampling. Every element must have a positive probability of selection. All selection probabilities must be known. Consequences: It is always possible to construct an unbiased estimator. Estimators often have an (approximately) normal distribution. Accuracy of estimators can be computed (confidence intervals). Warning: For other forms of sampling (e.g. quota sampling), it is not clear how reliable and accurate the outcomes are.

9 Survey What is a survey? Making inference about a population using data on only a small part of it. The population typically consists of people, households, farms, companies, schools, etc. Data is collected using a questionnaire. Why a sample? Complete enumeration (census) is time-consuming. Complete enumeration (census) is expensive. Response burden is decreased. More attention can be paid to quality.

10 Target population Definition Population to be investigated
Conclusions refer to this population. Practical definition. Example: Labour Force Survey. Include people working outside the country? Include foreign workers with a temporary job? Include illegal workers? Include employees of foreign embassies? Notation: Population size: N. Population: U = {1, 2, …, N}.

11 Variables Target variables The variables we want to investigate.
Values: Y1, Y2, …, YN. Auxiliary variables: Used for differentiating survey results and improving estimates. Values: X1, X2, …, XN. Example: demographic variables. Variable types: Quantitative variable: measures amount, size, value, or duration. Qualitative variable: divides the population into groups. Indicator variable: measures presence (1) or absence (0) of a certain property.

12 Population parameters
Population parameters for a quantitative variable: population total, population mean, adjusted population variance. Population parameter for an indicator variable: percentage.
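
For reference, the parameters named on this slide are usually defined as follows. This is a sketch in common textbook notation: the symbols $Y_T$, $\bar{Y}$, $S^2$ and $P$ are assumptions, chosen to match the values $Y_1, \dots, Y_N$ introduced on the previous slide.

```latex
% Population total, population mean and adjusted population variance
Y_T = \sum_{k=1}^{N} Y_k, \qquad
\bar{Y} = \frac{1}{N} \sum_{k=1}^{N} Y_k, \qquad
S^2 = \frac{1}{N-1} \sum_{k=1}^{N} \left( Y_k - \bar{Y} \right)^2

% Percentage for an indicator variable (every Y_k is 0 or 1)
P = 100 \, \bar{Y} = \frac{100}{N} \sum_{k=1}^{N} Y_k
```

The variance is called "adjusted" because it divides by N - 1 rather than N.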

13 Sampling Sampling design
Random selection procedure, with known probabilities. Sampling without replacement. Sample: series of indicators a1, a2, …, aN, with ak = 1 if element k is selected and ak = 0 if element k is not selected. Sample size. First-order inclusion probability. Second-order inclusion probability.
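
The indicator notation can be made concrete with a small enumeration. The sketch below assumes simple random sampling without replacement (the slide does not fix a design) and uses the usual definitions: first-order inclusion probability πk = E(ak), second-order inclusion probability πkl = E(ak al). The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def inclusion_probabilities(N, n):
    """Enumerate every sample of size n from U = {1, ..., N}.

    Assumption: simple random sampling without replacement, so every
    possible sample has the same selection probability.
    """
    samples = list(combinations(range(1, N + 1), n))
    p = Fraction(1, len(samples))              # probability of one sample
    first = {k: Fraction(0) for k in range(1, N + 1)}
    second = {}
    for s in samples:
        for k in s:
            first[k] += p                      # pi_k = E(a_k)
        for k, l in combinations(s, 2):
            # pi_kl = E(a_k * a_l), stored for k < l
            second[(k, l)] = second.get((k, l), Fraction(0)) + p
    return first, second

first, second = inclusion_probabilities(N=5, n=2)
print(first[1])         # first-order inclusion probability of element 1
print(second[(1, 2)])   # second-order inclusion probability of elements 1 and 2
```

The first-order probabilities add up to the sample size n, which is a quick sanity check for any fixed-size design.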

14 Estimation Estimator Recipe / algorithm to compute an estimate
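Properties. Unbiased: on average, the estimates must be equal to the value of the population parameter. Precise: variation of possible outcomes must be small. Simple: linear combination of observed values. Horvitz-Thompson estimator (for the population mean): each observed value is weighted by the inverse of its inclusion probability. The estimator is unbiased, with a variance that depends on the first-order and second-order inclusion probabilities.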
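
A minimal sketch of the Horvitz-Thompson estimator of the population mean, (1/N) Σ ak Yk / πk, follows. The population values and the unequal-probability design are made up for illustration; the check enumerates all possible samples and confirms that the expected value of the estimator equals the population mean.

```python
from fractions import Fraction

def horvitz_thompson_mean(sample, y, pi, N):
    """HT estimate of the population mean: (1/N) * sum of Y_k / pi_k over the sample."""
    return sum(Fraction(y[k]) / pi[k] for k in sample) / N

# Hypothetical population of N = 4 elements.
y = {1: 10, 2: 20, 3: 30, 4: 60}
N = len(y)

# Hypothetical design: every sample of size 2 with its selection probability.
design = {
    (1, 2): Fraction(1, 12), (1, 3): Fraction(1, 12), (1, 4): Fraction(2, 12),
    (2, 3): Fraction(1, 12), (2, 4): Fraction(3, 12), (3, 4): Fraction(4, 12),
}

# First-order inclusion probabilities: total probability of samples containing k.
pi = {k: sum(p for s, p in design.items() if k in s) for k in y}

# Expected value of the estimator over all possible samples.
expected = sum(p * horvitz_thompson_mean(s, y, pi, N) for s, p in design.items())
true_mean = Fraction(sum(y.values()), N)
print(expected == true_mean)
```

Exact fractions are used so that the equality is not blurred by rounding.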

15 Precision of estimates
Variance of the estimator. Standard error of the estimator: the square root of the variance. Confidence interval: contains the true value with a high probability 1 - α. Confidence level = 1 - α. Usually α = 0.05 or α = 0.01. 95% confidence interval for the population mean (α = 0.05): estimate ± 1.96 × standard error. The standard error must be estimated using sample data.
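
As an illustration, the sketch below computes a 95% confidence interval for a population mean. It assumes a simple random sample without replacement and the normal approximation; the function name and the data are hypothetical, and a sample this small is only for showing the arithmetic.

```python
import math

def srs_confidence_interval(sample, N, z=1.96):
    """Approximate confidence interval for the population mean.

    Assumptions: simple random sampling without replacement from a
    population of size N, normal approximation (z = 1.96 for alpha = 0.05).
    """
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((v - mean) ** 2 for v in sample) / (n - 1)   # sample variance
    se = math.sqrt((1 - n / N) * s2 / n)                  # estimated standard error
    return mean - z * se, mean + z * se

lower, upper = srs_confidence_interval([4, 8, 6, 5, 7], N=100)
print(round(lower, 3), round(upper, 3))
```

For α = 0.01 the multiplier would be about 2.58 instead of 1.96.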

16 Sampling designs: simple random sample
All first-order inclusion probabilities are equal: πk = n / N, for all k. πkl = n(n-1) / (N(N-1)), for all k ≠ l. The Horvitz-Thompson estimator reduces to the sample mean. Precision: increases with sample size; nearly independent of population size as long as the sampling fraction n / N is small.
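
For simple random sampling without replacement, the standard textbook variance of the sample mean is (1 - f) S² / n, with sampling fraction f = n / N and adjusted population variance S². The sketch below checks this formula by enumerating all samples from a made-up population; the data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

y = [3, 7, 8, 12, 20]            # hypothetical population values
N, n = len(y), 2

mean = Fraction(sum(y), N)
# Adjusted population variance (divides by N - 1).
S2 = sum((Fraction(v) - mean) ** 2 for v in y) / (N - 1)

# Sample mean of every possible sample of size n (all equally likely).
means = [Fraction(sum(s), n) for s in combinations(y, n)]

# Exact variance of the sample mean over all samples.
exact_var = sum((m - mean) ** 2 for m in means) / len(means)

# Textbook formula with sampling fraction f = n / N.
formula_var = (1 - Fraction(n, N)) * S2 / n
print(exact_var == formula_var)
```

The population size enters only through the factor 1 - n / N, which is close to 1 whenever the sample is a small part of the population.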

17 Sampling designs: stratified sample
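Stratification: Population is divided into L strata (sub-populations). A sample is selected in each stratum. Unbiased stratum estimates are combined. Estimator: weighted mean of the stratum estimates, with weights Nh / N. Variance (simple random samples): weighted sum of the variances of the stratum estimates. Precision: the estimator is precise if strata are homogeneous.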
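
A minimal sketch of the stratified estimator and its estimated variance, assuming a simple random sample without replacement in each stratum. The function names, stratum sizes and sample values are hypothetical.

```python
def stratified_mean(strata):
    """strata: list of (N_h, sample values) pairs, one per stratum."""
    N = sum(N_h for N_h, _ in strata)
    # Weighted mean of the stratum sample means, weights N_h / N.
    return sum(N_h * (sum(s) / len(s)) for N_h, s in strata) / N

def stratified_variance_estimate(strata):
    """Estimated variance, assuming simple random sampling within each stratum."""
    N = sum(N_h for N_h, _ in strata)
    total = 0.0
    for N_h, s in strata:
        n_h = len(s)
        m = sum(s) / n_h
        s2 = sum((v - m) ** 2 for v in s) / (n_h - 1)   # within-stratum sample variance
        total += N_h ** 2 * (1 - n_h / N_h) * s2 / n_h
    return total / N ** 2

# Two hypothetical strata of sizes 60 and 40.
strata = [(60, [10, 12, 11]), (40, [30, 34, 32])]
estimate = stratified_mean(strata)
variance = stratified_variance_estimate(strata)
print(estimate, variance)
```

Only the variation within strata contributes to the variance, which is why homogeneous strata give precise estimates.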

18 Sampling designs: sampling with unequal probabilities
Reason: Variance of the Horvitz-Thompson estimator is small when Yk / πk is approximately constant. Sampling: Try to find an auxiliary variable X that is strongly correlated with Y. Take inclusion probabilities πk proportional to Xk. All Xk must be positive, and known for every element. Example: Survey on shoplifting. Y = value of stolen goods. X = floor size of shop.
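
The idea can be sketched as follows: inclusion probabilities are set to πk = n Xk / ΣX for a fixed sample size n, which requires every πk to be at most 1. The function name and the floor sizes and values below are hypothetical; Y is made exactly proportional to X to show the ideal case.

```python
def proportional_inclusion_probabilities(x, n):
    """Inclusion probabilities proportional to the auxiliary variable X.

    Assumes a fixed sample size n, so the probabilities must add up to n.
    """
    if any(v <= 0 for v in x):
        raise ValueError("all X values must be positive")
    total = sum(x)
    pi = [n * v / total for v in x]
    if max(pi) > 1:
        raise ValueError("an inclusion probability exceeds 1; lower n or cap large X values")
    return pi

# Hypothetical floor sizes of four shops and values of stolen goods.
x = [50, 100, 150, 200]
y = [5, 10, 15, 20]            # exactly proportional to x in this toy example

pi = proportional_inclusion_probabilities(x, n=2)
ratios = [y_k / pi_k for y_k, pi_k in zip(y, pi)]
print(pi)
print(ratios)                  # Y_k / pi_k is the same for every shop
```

When Yk / πk is constant, every possible sample gives the same Horvitz-Thompson estimate, so the variance is zero. In practice Y is only roughly proportional to X, and the gain is smaller.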

19 Sampling designs: cluster sampling
Practical reasons: There is no sampling frame of elements, but there is a sampling frame for clusters of elements. For example: addresses. Cost reduction in face-to-face surveys. Sampling: Clusters can be selected with equal or unequal probabilities. For example: select clusters with probabilities proportional to size. All elements in selected clusters are included in the survey. Disadvantages: Variances can be large (cluster effect). No control over sample size.

20 Sampling designs: two-stage sampling
Practical reasons: There is no sampling frame of elements, but there is a sampling frame for clusters of elements. For example: municipalities. Cost reduction in face-to-face surveys. Sampling: Sample of clusters (with equal or unequal probabilities). Samples of elements within selected clusters (with equal or unequal probabilities). Properties: More control over sample size. Variances can be large (cluster effect). Example: First stage: municipalities. Second stage: persons.

21 Estimators Improving the precision of estimates
Using auxiliary variables They are measured in the sample. Population distribution is available. Use of auxiliary variables in the sampling design Quantitative variable: Sampling with unequal probabilities Qualitative variable: Stratified sampling Use of auxiliary variables in the estimator Quantitative variable: Ratio estimator Quantitative variable: Regression estimator Qualitative variable: Post-stratification estimator

22 Estimators: ratio estimator
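Assumed model: Y is approximately proportional to X, with proportionality constant B. Consequence: the population means of Y and X are related by the same constant. Parameter B is estimated by the ratio of the sample means of Y and X. Required: population mean of auxiliary variable X. Estimator: the estimate of B multiplied by the population mean of X. Variance: small when the model fits well.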
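
A minimal sketch of the ratio estimator, assuming a simple random sample and the proportional model Yk ≈ B Xk. The function name and the data are hypothetical.

```python
def ratio_estimate(y_sample, x_sample, x_pop_mean):
    """Ratio estimator of the population mean of Y.

    b is the ratio of the sample means of Y and X (the sample sizes cancel,
    so the ratio of the sample totals gives the same value).
    """
    b = sum(y_sample) / sum(x_sample)
    return b * x_pop_mean

# Hypothetical sample in which Y is exactly twice X, and a known population mean of X.
y_sample = [20, 30, 50]
x_sample = [10, 15, 25]
estimate = ratio_estimate(y_sample, x_sample, x_pop_mean=18)
print(estimate)
```

The sample mean of Y alone would be about 33.3 here; the ratio estimator corrects it downward because the sample happens to contain elements with above-average X.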

23 Estimators: regression estimator
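Assumed model: Y is approximately a linear function of X, with intercept A and slope B. Consequence: the population means of Y and X satisfy the same linear relation. Parameter B is estimated by the least-squares slope in the sample. Required: population mean of auxiliary variable X. Estimator: the sample mean of Y, corrected by the estimated slope times the difference between the population mean and the sample mean of X. Variance: small when the model fits well.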
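
A minimal sketch of the regression estimator, assuming a simple random sample and the linear model Yk ≈ A + B Xk. The function name and the data are hypothetical.

```python
def regression_estimate(y_sample, x_sample, x_pop_mean):
    """Regression estimator of the population mean of Y."""
    n = len(y_sample)
    y_bar = sum(y_sample) / n
    x_bar = sum(x_sample) / n
    # Least-squares slope estimated from the sample.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_sample, y_sample))
    s_xx = sum((x - x_bar) ** 2 for x in x_sample)
    b = s_xy / s_xx
    # Correct the sample mean for the gap between sample and population mean of X.
    return y_bar - b * (x_bar - x_pop_mean)

# Hypothetical sample lying exactly on the line Y = 1 + 2 X.
x_sample = [1, 2, 3, 4]
y_sample = [3, 5, 7, 9]
estimate = regression_estimate(y_sample, x_sample, x_pop_mean=3)
print(estimate)
```

Because the sample mean of X (2.5) is below the population mean (3), the estimate is shifted upward from the sample mean of Y (6).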

24 Estimators: general regression estimator
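Assumed model: Y is approximately a linear combination of several auxiliary variables, with coefficient vector B. Consequence: the population mean of Y is approximately the same linear combination of the population means of the auxiliary variables. Parameter B is estimated by least squares in the sample. Required: population means of all auxiliary variables. Estimator: the estimated mean of Y, corrected for the differences between the population means and the estimated means of the auxiliary variables. Variance: small when the model fits well.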
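
In formulas, one common way to write the general regression estimator is sketched below. The notation is an assumption rather than a copy of the slide: $X_k$ is the vector of auxiliary values for element k, $\bar{X}$ is the vector of population means, and $\bar{y}_{HT}$ and $\bar{x}_{HT}$ are Horvitz-Thompson estimators of the means.

```latex
% Assumed model and its consequence for the population means
Y_k \approx X_k' B, \qquad \bar{Y} \approx \bar{X}' B

% Weighted least-squares estimate of B from the sample
b = \left( \sum_{k=1}^{N} a_k \frac{X_k X_k'}{\pi_k} \right)^{-1}
    \sum_{k=1}^{N} a_k \frac{X_k Y_k}{\pi_k}

% General regression estimator of the population mean
\bar{y}_{GR} = \bar{y}_{HT} + \left( \bar{X} - \bar{x}_{HT} \right)' b
```

The variance is approximately that of the Horvitz-Thompson estimator applied to the residuals $E_k = Y_k - X_k' B$, so it is small when the auxiliary variables explain Y well.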

25 Estimators: post-stratification estimator
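Assumed model: Population divided into L sub-populations (strata). Little variation of target variable Y within strata: homogeneity. Required: numbers of elements in strata: N1, N2, …, NL. Estimator: weighted mean of the sample means within strata, with weights Nh / N. Variance: small if the strata are homogeneous. Special case of the general regression estimator: introduce L dummy variables, with Xkh = 1 if element k is in stratum h, otherwise Xkh = 0. Model: Y is approximately a linear combination of the L dummy variables.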
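
A minimal sketch of the post-stratification estimator follows. The sample is grouped by stratum after selection, and the stratum sample means are weighted by the known population counts Nh. The function name, stratum labels and data are hypothetical.

```python
from collections import defaultdict

def post_stratified_mean(observations, stratum_sizes):
    """Post-stratification estimator of the population mean.

    observations: list of (stratum label, y value) pairs from the sample.
    stratum_sizes: dict mapping each stratum label to its population count N_h.
    """
    groups = defaultdict(list)
    for h, v in observations:
        groups[h].append(v)
    missing = set(stratum_sizes) - set(groups)
    if missing:
        raise ValueError(f"no observations in strata: {sorted(missing)}")
    N = sum(stratum_sizes.values())
    # Weight each stratum sample mean by N_h / N.
    return sum(N_h * sum(groups[h]) / len(groups[h])
               for h, N_h in stratum_sizes.items()) / N

# Hypothetical sample in which the "old" stratum is over-represented.
observations = [("young", 2), ("young", 4), ("old", 10), ("old", 12), ("old", 14)]
stratum_sizes = {"young": 70, "old": 30}
estimate = post_stratified_mean(observations, stratum_sizes)
sample_mean = sum(v for _, v in observations) / len(observations)
print(estimate, sample_mean)
```

The unweighted sample mean is 8.4, while the post-stratified estimate is lower because it restores the population shares of the two strata.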

26 Errors in surveys In theory
Sample-based estimates differ from true population values. Estimators are unbiased. Margin of error can be computed. Everything is under control. In practice: Many other phenomena have an impact on estimators. They may decrease precision (increase standard error). They may also introduce a bias. The bias is often independent of the sample size. Problems do not disappear by increasing the sample size.

27 Errors in surveys
Taxonomy of errors. Total error: sampling error and non-sampling error. Sampling error: estimation error, specification error. Non-sampling error: observation error, non-observation error. Observation error: over-coverage error, measurement error, processing error. Non-observation error: under-coverage error, nonresponse error.

28 Errors in surveys Sampling errors
Estimation error: consequence of randomization. Specification error: sampling frame. Non-sampling error Under-coverage: mixed-mode data collection. Measurement error: questionnaire design, interviewer training, editing techniques, imputation techniques. Processing error: editing techniques, imputation techniques. Nonresponse: reduction techniques (contact strategy, interviewer training, refusal conversion), correction techniques (weighting, adjustment).

29 The nonresponse problem
The problem: Nonresponse occurs in every survey. Nonresponse causes estimates to be biased. Nonresponse problems seem to increase. It is not easy to reduce nonresponse. It is not easy to correct for nonresponse. The solution: Attempt to reduce nonresponse in the fieldwork. Attempt to correct for nonresponse after the fieldwork. Vital role for auxiliary variables. Think about auxiliary variables in the design stage of the survey.

