Published by Griffin Carson. Modified over 6 years ago.
1
Representativeness
The aim of any sample is to represent the characteristics of the sample frame. There are a number of different methods used to generate a sample. As a researcher you will have to select the method that best meets the requirements of your research.
2
Sampling techniques
Sampling techniques are either random or non-random.
- Random sampling (probability sampling): the probability of a member of the population being included in the sample can be determined, although the selection probabilities may or may not be equal.
- Non-random sampling (non-probability sampling): the probability of selecting a member of the population cannot be determined, largely because there is no sampling frame.
3
Random sampling techniques
Probability Samples
Probability samples give each respondent a known, non-zero probability of being included in the sample; in simple random sampling these probabilities are equal. They are considered to be:
- Objective
- Empirical
- Scientific
- Quantitative
- Representative
The techniques are:
- Simple random sampling
- Systematic random sampling
- Stratified random sampling
- Cluster random sampling
- Multistage sampling
4
Why sampling?
- Reduces costs for an acceptable level of accuracy (money, manpower, processing time...)
- May free up resources to reduce nonsampling error and collect more information from each person in the sample. Example: 400 interviews at $5 per interview give lower sampling error; 200 interviews at $10 per interview give lower nonsampling error
- Much quicker results
5
When is a sample representative?
- Balance on gender and age: the proportion of women in the sample matches the proportion in the population, and the proportions of age groups in the sample match the proportions in the population
- An ideal representative sample is a miniature version of the population, implying that every unit in the sample represents the characteristics of a known number of units in the population
- Appropriate probability sampling ensures a representative sample “on the average”
6
Alternative approaches for statistical inference based on survey sampling
- Design-based: no modeling; the only stochastic element is the sample s, with known distribution
- Model-based: the values yi are assumed to be values of random variables Yi
  - Two stochastic elements: Y = (Y1, …, YN) and s
  - Assumes a parametric distribution for Y
  - Example: suppose we have an auxiliary variable x, which could be age, gender or education. A typical model is a regression of Yi on xi
7
- Statistical principles of inference imply that the model-based approach is the most sound and valid approach
- Start with learning the design-based approach, since it is the approach most often applied to survey sampling, used by national statistical institutes and most research institutes for social sciences
- It is the easy way out: there is no need to model
- All statisticians working with survey sampling in practice need to know this approach
8
Design-based statistical inference
- Can also be viewed as a distribution-free nonparametric approach
- The only stochastic element: the sample s, with distribution p(s) for all subsets s of the population U = {1, ..., N}
- No explicit statistical modeling is done for the variable y. All yi’s are considered fixed but unknown
- Focus on sampling error
- Sets sample survey theory apart from usual statistical analysis
- The traditional approach, started by Neyman in 1934
9
Outstanding issues in design-based inference
- Estimation for subpopulations (domains)
- Choice of sampling design: discuss several different sampling designs and appropriate estimators
- More on the use of auxiliary information to improve estimates
- More on variance estimation
10
Model-based inference
- Assumes a model for the y vector
- Conditioning on the actual sample
- Use modeling to combine information
- Problem: dependence on the model
  - Introduces a subjective element
  - Almost impossible to model all variables in a survey
- The design approach is “objective” in a perfect world of no nonsampling errors
11
III. Model-based inference in survey sampling
- Model-based approach, also called the prediction approach
  - Assumes a model for the y vector
  - Use modeling to construct the estimator
  - Example: the ratio estimator
- Model-based inference
  - Inference is based on the assumed model
  - Treat the sample s as fixed, conditioning on the actual sample
- Best linear unbiased predictors
- Variance estimation for different variance measures
12
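Model-based approach. Two stochastic elements: Y = (Y1, …, YN) and s
- Treat the sample s as fixed
- [Model-assisted approach: use the distributional assumption on Y to construct the estimator, and evaluate it according to the distribution of s, given the realized vector y]
- The total t can be decomposed into the sum of the observed yi (units in the sample) and the sum z of the unobserved yi (units outside the sample)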
13
The unobserved z is a realized value of the random variable Z, so the problem is actually to predict the value z of Z. This can be done by predicting each unobserved yi, which gives the prediction approach and the prediction-based estimator.
14
Remarks:
1. Any estimator can be expressed in the “prediction form”.
2. This form can then be used to see whether the estimator makes any sense.
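In standard notation, the decomposition of the total, the prediction-based estimator and the prediction form can be sketched as follows. The symbols $\hat{y}_i$ (predicted value for an unobserved unit), $\hat{z}$ and $\hat{t}_{pred}$ are assumed here for illustration and may differ from the notation on the original slides.

```latex
\begin{align*}
t &= \sum_{i=1}^{N} y_i
   = \sum_{i \in s} y_i + \sum_{i \notin s} y_i
   = \sum_{i \in s} y_i + z
   && \text{(decomposition of the total)} \\
\hat{t}_{pred} &= \sum_{i \in s} y_i + \sum_{i \notin s} \hat{y}_i
   && \text{(prediction-based estimator)} \\
\hat{t} &= \sum_{i \in s} y_i + \hat{z},
   \qquad \hat{z} = \hat{t} - \sum_{i \in s} y_i
   && \text{(prediction form of any estimator } \hat{t}\text{)}
\end{align*}
```

Reading an estimator in the last form shows what it implicitly predicts for the unobserved part z, which is the check suggested in remark 2.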
15
Ex 1. Ex.2 Reasonable sampling design when y and x are positively correlated
16
Three common models
I. A model for business surveys, the ratio model:
Assume the existence of an auxiliary variable x for all units in the population.
17
II. A model for social surveys, simple linear regression:
Ex: xi is a measure of the “size” of unit i, and yi tends to increase with increasing xi. In business surveys, the regression goes through the origin in many cases.
III. Common mean model:
18
Simple random sampling
Each member of the target population has an equal chance (probability) of being selected each time a unit is drawn for inclusion in the sample. This involves selecting anybody from the sample frame entirely at random: each person within the sample frame has an equal chance of being selected. To be random, a full list of everyone within the sample frame is required. Random number tables or a computer are then used to select respondents at random from the list.
Examples of simple random sampling procedures are:
- hat method
- computer generated random numbers
- random number tables
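The "computer generated random numbers" procedure can be sketched in a few lines. This is a minimal illustration, assuming the sample frame is available as a Python list; the function name `simple_random_sample` and the frame of 100 numbered units are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame without replacement.

    random.sample gives every subset of size n the same chance of selection.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sample frame: units numbered 1..100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=1)
```

A full list of the frame is required before drawing, which mirrors the requirement stated above.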
19
Assumptions: Sampling Design I: Simple Random Sampling
Every possible combination of sampling units has an equal and independent chance of being selected. The selection of a particular unit to be sampled is not influenced by the other units that have been selected or will be selected. Samples are either chosen with replacement or without replacement.
20
Sampling Design I: Simple Random Sampling
We use the familiar equations for estimating common statistics for the population:
- Estimate the population mean
- Estimate the variance of individual values
- Compute the coefficient of variation
FOR 220 Aerial Photo Interpretation and Forest Measurements
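The three statistics can be computed as in the sketch below, assuming the usual textbook definitions (sample mean, sample variance with divisor n - 1, and CV as the standard deviation in percent of the mean). The function name `summarize` and the plot values are made up for illustration.

```python
from statistics import mean, variance

def summarize(values):
    """Return (mean, variance, CV in percent) for a simple random sample."""
    y_bar = mean(values)           # estimate of the population mean
    s2 = variance(values)          # sample variance, divisor n - 1
    cv = 100 * s2 ** 0.5 / y_bar   # coefficient of variation, in percent
    return y_bar, s2, cv

# Made-up measurements from five sample plots
plot_values = [12.0, 15.0, 9.0, 14.0, 10.0]
y_bar, s2, cv = summarize(plot_values)
```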
21
Sampling Design I: Simple Random Sampling
Determine sample size (without replacement, or finite population): Knowing the coefficient of variation, and that the error is expected to be within x % of the value of the mean, we use the following formula:
n = (t² × CV²) / (A² + (t² × CV²) / N)
where n = required sample size, A = allowable error percent, t = t-value, CV = coefficient of variation, and N = population size.
22
Sampling Design I: Simple Random Sampling
Determine sample size (without replacement, or finite population): example using 0.10 ac fixed-radius plots in a ten-acre stand.
Assume:
- We have calculated the CV and found it was 30%
- Our allowable error is +/- 10% of the mean
- We are using a t-value of 2
- N is the total number of potential plots that could be placed in the stand (i.e., the population). In this case, N is stand size / plot size, or 10 / 0.10 = 100
n = (2² × 30²) / (10² + (2² × 30²) / 100) = 3600 / 136 ≈ 26
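The worked example can be checked numerically. The sketch below assumes the common finite-population sample size formula, written as the infinite-population size n0 = (t × CV / A)² shrunk by the factor 1 / (1 + n0 / N); it reproduces the value of 26 plots. The function name `required_sample_size` is hypothetical.

```python
def required_sample_size(cv, allowable_error, t, population_size=None):
    """Sample size for a target allowable error (cv and error both in percent)."""
    n0 = (t * cv / allowable_error) ** 2   # infinite-population sample size
    if population_size is None:
        return n0
    return n0 / (1 + n0 / population_size)  # finite-population adjustment

n0 = required_sample_size(cv=30, allowable_error=10, t=2)
n = required_sample_size(cv=30, allowable_error=10, t=2, population_size=100)
```

Here `n0` is 36 and `n` is about 26.5, which the slide reports as 26; rounding up to 27 would be the more cautious choice in practice.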
23
Estimation theory-simple random sample
- SRS of size n: each sample s of size n has the same probability of being selected, p(s) = 1 / (N choose n)
- Can be performed in principle by drawing one unit at a time at random without replacement
- Estimation of the population mean of a variable y: a natural estimator is the sample mean, the average of the yi over the units i in s
- Desirable properties:
24
The uncertainty of an unbiased estimator is measured by its estimated sampling variance or standard error (SE): Some results for SRS:
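For reference, the standard textbook results for the sample mean under SRS without replacement are sketched below. The notation ($\bar{Y}$ for the population mean, $S^2$ for the population variance, $f = n/N$ for the sampling fraction) is assumed here and may differ from the notation on the original slides.

```latex
\begin{align*}
E(\bar{y}_s) &= \bar{Y}
  && \text{(the sample mean is unbiased)} \\
\mathrm{Var}(\bar{y}_s) &= \frac{S^2}{n}\,(1 - f),
  \qquad S^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})^2,
  \qquad f = \frac{n}{N} \\
\hat{V}(\bar{y}_s) &= \frac{s^2}{n}\,(1 - f),
  \qquad s^2 = \frac{1}{n-1} \sum_{i \in s} (y_i - \bar{y}_s)^2 \\
SE(\bar{y}_s) &= \sqrt{\hat{V}(\bar{y}_s)}
\end{align*}
```

The factor $1 - f$ is the finite population correction.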
25
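The finite population correction 1 - f (with sampling fraction f = n/N) is usually unimportant in social surveys:
- n = 10,000 and N = 5,000,000: 1 - f = 0.998
- n = 1000 and N = 400,000: 1 - f = 0.9975
- n = 1000 and N = 5,000,000: 1 - f = 0.9998
- The effect of changing n is much more important than the effect of changing n/N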
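A short sketch makes the comparison concrete. It assumes the standard SRS result that the standard error of the mean is proportional to sqrt((1 - f) / n); the names `fpc` and `relative_se` are illustrative only.

```python
import math

def fpc(n, N):
    """Finite population correction 1 - f, with f = n / N."""
    return 1 - n / N

cases = [(10_000, 5_000_000), (1_000, 400_000), (1_000, 5_000_000)]
factors = [fpc(n, N) for n, N in cases]

# Standard error relative to the population standard deviation
relative_se = [math.sqrt(fpc(n, N) / n) for n, N in cases]
```

The two cases with n = 1000 give almost the same relative standard error even though N differs by a factor of 12.5, while moving from n = 1000 to n = 10,000 cuts it by roughly a factor of three.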
26
The estimated variance
Usually we report the standard error of the estimate, the square root of the estimated variance. The confidence interval for the population mean μ is based on the Central Limit Theorem:
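A minimal sketch of the resulting interval, assuming the usual normal-approximation form (estimate ± z × SE, with z = 1.96 for 95%) and the SRS variance estimate with finite population correction. The function name `srs_mean_ci` and the data are hypothetical, and a sample this small is far too small for the Central Limit Theorem to apply; it only shows the arithmetic.

```python
import math
from statistics import mean, variance

def srs_mean_ci(sample, N, z=1.96):
    """Estimate, standard error and normal-approximation CI for the mean."""
    n = len(sample)
    y_bar = mean(sample)
    var_hat = variance(sample) / n * (1 - n / N)  # estimated variance of the mean
    se = math.sqrt(var_hat)                       # standard error
    return y_bar, se, (y_bar - z * se, y_bar + z * se)

# Made-up sample of 5 units from a population of N = 100
y_bar, se, ci = srs_mean_ci([12.0, 15.0, 9.0, 14.0, 10.0], N=100)
```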
27
Example
N = 341 residential blocks in Ames, Iowa; yi = number of dwellings in block i. 1000 independent SRS were drawn for different values of n. Proportion of samples with |Z| < 1.64 and with |Z| < 1.96:
- n = 30: 0.88 and 0.93
- n = 50: (values missing)
- n = 70: (value missing) and 0.94
- n = 90: 0.90 and 0.95
28
For one SRS with n = 90:
29
Absolute value of sampling error is not informative when not related to value of the estimate
For example, SE = 2 is small if the estimate is 1000, but very large if the estimate is 3. The coefficient of variation for the estimate, CV = SE / estimate:
- A measure of the relative variability of an estimate
- It does not depend on the unit of measurement
- More stable over repeated surveys; can be used for planning, for example determining sample size
- More meaningful when estimating proportions
30
Estimation of a population proportion p with a certain characteristic A
p = (number of units in the population with A)/N. Let yi = 1 if unit i has characteristic A, 0 otherwise. Then p is the population mean of the yi’s. Let X be the number of units in the sample with characteristic A. Then the sample mean can be expressed as X/n.
31
So the unbiased estimate of the variance of the estimator:
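The variance estimate follows from the general SRS formula applied to the 0/1 variable. A sketch of the derivation, assuming the notation $\hat{p} = X/n$ for the sample proportion and $f = n/N$ for the sampling fraction (symbols chosen here, not necessarily those of the original slides):

```latex
\begin{align*}
\sum_{i \in s} (y_i - \hat{p})^2
  &= \sum_{i \in s} y_i^2 - n\hat{p}^2
   = X - n\hat{p}^2
   = n\hat{p}(1 - \hat{p})
  && \text{since } y_i^2 = y_i \\
s^2 &= \frac{n}{n-1}\,\hat{p}(1 - \hat{p}) \\
\hat{V}(\hat{p}) &= \frac{s^2}{n}\,(1 - f)
   = \frac{\hat{p}(1 - \hat{p})}{n - 1}\,(1 - f)
\end{align*}
```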
32
Examples
A political poll: suppose we have a random sample of 1000 eligible voters in Norway, with 280 saying they will vote for the Labor party. Then the estimated proportion of Labor votes in Norway is 280/1000 = 0.28. A confidence interval requires the normal approximation. We can use the guideline from the binomial distribution when N - n is large:
33
In this example : n = 1000 and N = 4,000,000
Ex: Psychiatric Morbidity Survey 1993 from Great Britain. p = proportion with psychiatric problems; n = 9792 (partial nonresponse on this question: 316); N ≈ 40,000,000
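For the political poll (280 of n = 1000 voters, N = 4,000,000), the estimate and an approximate 95% interval can be computed as in this sketch. It assumes the standard SRS variance estimate p̂(1 - p̂)/(n - 1) × (1 - n/N) and z = 1.96; the function name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(x, n, N, z=1.96):
    """Estimated proportion, standard error and normal-approximation CI."""
    p_hat = x / n
    var_hat = p_hat * (1 - p_hat) / (n - 1) * (1 - n / N)
    se = math.sqrt(var_hat)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

p_hat, se, ci = proportion_ci(x=280, n=1000, N=4_000_000)
```

The standard error comes out at about 0.014, so the interval runs from roughly 0.25 to 0.31.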
34
Systematic sampling
Resembles simple random sampling because all the units in the sampling frame initially have an equal chance of being selected. It differs from simple random sampling because not every possible combination of units has an equal chance of forming the sample: once the first unit is chosen, the rest of the sample is determined.
- The first step is to determine the sample size
- Then the sampling interval is determined by dividing the total population by the sample size. E.g. for a population of 20,000 students and a sample size of 1000, our sampling interval is 1 in 20
- A number between 1 and 20 is randomly chosen, and then every 20th student is selected until 1000 students are selected
Question: what assumptions must hold for a systematic sample to approximate a simple random sample?
35
Assumptions: Sampling Design II Systematic Sampling
The initial sampling unit is randomly selected or established on the ground. All other sample units are spaced at uniform intervals throughout the area sampled. Sampling units are easy to locate. Sampling units appear to be representative of an area.
36
Sample Selection Procedure
- List all the units in the population from 1, 2, …, N (the sampling frame)
- Select a random number g in the interval 1 ≤ g ≤ k, using a random mechanism, e.g. random number tables, where k = N/n
- k is called the sampling interval; N is the population size; n is the sample size
- The random number g is called the random start and constitutes the first unit of the sample
37
Sample Selection Procedure
- Take every kth unit after the random start
- The selected units will be g, g+k, g+2k, g+3k, g+4k, …, g+(n-1)k, until we have n units
- Example: N = 10000, n = 100, so k = 10000/100 = 100
- Suppose g = 87
38
Sample Selection Procedure
We select the following units 87, 187, 287, 387,…, 9987 NB: This procedure is however only valid if k is an integer (whole number) If k is not an integer (whole number) there are a number of methods we can use. We will consider just two of them
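The integer-interval procedure can be sketched as follows. The function name `systematic_sample` and its parameters are hypothetical; the sketch only handles the case where N is a multiple of n, so that k is a whole number.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Select units g, g+k, ..., g+(n-1)k from a frame numbered 1..N."""
    if N % n != 0:
        raise ValueError("this simple version needs k = N / n to be an integer")
    k = N // n                                    # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start g in 1..k
    return [start + j * k for j in range(n)]

# Example from the slides: N = 10000, n = 100, random start g = 87
units = systematic_sample(10_000, 100, start=87)
```

With g = 87 the list begins 87, 187, 287, 387 and ends at 9987, as in the example.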
39
Sample Selection Procedure
Method 1: Use Circular Sampling
- Treat the list as circular so that the last unit is followed by the first
- Select a random start g between 1 and N, using a random mechanism
- Add the interval k repeatedly until n units are selected
- Any convenient interval k will result in a random sample
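A minimal sketch of circular systematic sampling, assuming units are numbered 1..N and that the interval defaults to the integer nearest N/n. The function name `circular_systematic_sample`, the duplicate check and the small example (N = 23, n = 5) are illustrative additions, not taken from the slides.

```python
import random

def circular_systematic_sample(N, n, k=None, start=None, seed=None):
    """Select n units from a circular list numbered 1..N, stepping by k."""
    if k is None:
        k = max(1, round(N / n))                  # assumed default interval
    if start is None:
        start = random.Random(seed).randint(1, N)  # random start g in 1..N
    # Wrap around: unit N is followed by unit 1
    units = [(start - 1 + j * k) % N + 1 for j in range(n)]
    if len(set(units)) < n:
        # Guard added in this sketch: some k values revisit a unit
        raise ValueError("interval k revisits a unit; choose a different k")
    return units

units = circular_systematic_sample(N=23, n=5, start=20)
```

Starting at unit 20 with k = 5, the selection wraps past the end of the list and continues from the beginning.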
40
Sample Selection Procedure
One suitable suggestion is to choose the integer k closest to the ratio N/n.
Method 2: Use Fractional Intervals
- Suppose we want to select a sample of 100 units from a population of 21,156
- Calculate k = 21156/100 = 211.56
- Select a random start g between 1 and 21156 using a random mechanism
41
Sample Selection Procedure
- Suppose g = 582
- Add the interval 21156 successively, obtaining exactly 100 numbers
- The numbers will be 582, 21738, 42894, …
- Divide each number by 100 and round to the nearest whole number to get the selected sample, i.e. 6, 217, 429, etc.
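The fractional-interval method can be sketched with integer arithmetic, working in hundredths of a unit as in the example. The function name `fractional_interval_sample` and the `scale` parameter are hypothetical, and the clamping to the range 1..N is a guard added here (a start below 50 would otherwise round to unit 0), not part of the slides.

```python
import random

def fractional_interval_sample(N, n, start=None, seed=None, scale=100):
    """Systematic sample with a non-integer interval k = N / n."""
    interval = (N * scale + n // 2) // n   # k in units of 1/scale, rounded
    if start is None:
        start = random.Random(seed).randint(1, interval)
    scaled = [start + j * interval for j in range(n)]
    # Divide by scale and round to the nearest whole unit; clamp to 1..N
    return [min(N, max(1, (v + scale // 2) // scale)) for v in scaled]

# Example from the slides: N = 21156, n = 100, random start g = 582
units = fractional_interval_sample(21_156, 100, start=582)
```

The first three selected units are 6, 217 and 429, matching the example.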
42
Advantages and Disadvantages of Systematic sampling
The major advantage is that it is easy, almost foolproof and flexible to implement It is especially easy to give instructions to fieldworkers If we order our list prior to taking the sample, the sample will reflect the ordering and as such can easily give a proportionate sample
43
Advantages and Disadvantages of Systematic sampling
The main disadvantage is that if there is an ordering (monotonic trend or periodicity) in the list which is unknown to the researcher, this may bias the resulting estimates. There is also a problem with estimating the variance from a systematic sample: the variance estimate is biased.
44
Sampling Design II: Systematic Sampling. Arguments
- For: regular spacing of sample units may yield efficient estimates of populations under certain conditions
- Against: accuracy of population estimates can be low if there is periodic or cyclic variation inherent in the population
45
Sampling Design II: Systematic Sampling. Arguments
- For: there is no practical alternative to assuming that populations are distributed in a random order across the landscape
- Against: simple random sampling statistical techniques can’t logically be applied to a systematic design unless populations are assumed to be randomly distributed across the landscape
46
Systematic sampling as Implicit Stratification
In practice, very often when using systematic sampling (a common design in national statistical institutes), the population is ordered such that the first k units constitute a homogeneous “stratum”, the second k units another “stratum”, etc.
Implicit stratum: units
- 1: 1, 2, …, k
- 2: k+1, …, 2k
- ⋮
- n (n = N/k assumed): (n-1)k+1, …, nk
Systematic sampling selects 1 unit from each stratum at random.
47
Systematic sampling vs SRS
- Systematic sampling is more efficient if the study variable is homogeneous within the implicit strata. Ex: households ordered according to house numbers within neighbourhoods and a study variable related to income. Households in the same neighbourhood are usually homogeneous with respect to socio-economic variables
- If the population is in random order (all N! permutations are equally likely), systematic sampling is similar to SRS
- Systematic sampling can be very bad if y has periodic variation relative to k. Approximately: y1 = yk+1, y2 = yk+2, etc.
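Both effects can be checked exactly on small artificial populations, since a systematic design with interval k has only k possible samples. The sketch below assumes N = nk, uses the standard SRS variance formula (1 - f) S²/n for comparison, and relies on two made-up populations: one with a linear trend and one whose values repeat with period k.

```python
from statistics import mean, pvariance, variance

def systematic_variance(y, k):
    """Exact design variance of the sample mean over the k possible starts."""
    means = [mean(y[g::k]) for g in range(k)]
    return pvariance(means)   # each start is equally likely

def srs_variance(y, n):
    """Design variance of the sample mean under SRS without replacement."""
    N = len(y)
    return variance(y) / n * (1 - n / N)

N, k = 100, 10
n = N // k
trend = [float(i) for i in range(1, N + 1)]   # ordered list, linear trend
periodic = [float(i % k) for i in range(N)]   # period equal to the interval k

trend_sys, trend_srs = systematic_variance(trend, k), srs_variance(trend, n)
per_sys, per_srs = systematic_variance(periodic, k), srs_variance(periodic, n)
```

For the trending population the systematic design is far more precise (8.25 against 75.75), while for the periodic population it is far worse (8.25 against 0.75), because every systematic sample then contains n copies of the same value.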
48
Variance estimation
- No direct estimate; it is impossible to obtain an unbiased estimate
- If the population is in random order: can use the variance estimate from SRS as an approximation
- Develop a conservative variance estimator by collapsing the “implicit strata”, which overestimates the variance
- The most promising approach may be: under a statistical model, estimate the expected value of the design variance
- Typically, systematic sampling is used in the second stage of two-stage sampling (to be discussed later); it may not be necessary to estimate this variance then
49
Summary: Sampling Design II: Systematic Sampling
We can (and often do) use systematic sampling to obtain estimates about the mean of populations. When an objective, numerical statement of precision is required (e.g. 95% confidence intervals), however, it should be viewed as an approximation of the precision of the sampling effort. Use the formulas presented for simple random sampling and, where appropriate (if sampling from a small population), the “without replacement” variations of those equations; otherwise use the normal SRS statistical techniques.