Ratio estimation with stratified samples

Presentation transcript:

Ratio estimation with stratified samples Consider the agriculture stratified sample. In addition to the data of 1992, we also have data of 1987. Suppose that the population data of 1987 are available. How can we combine the two techniques? 1

Method 1: combined ratio estimator Step 1: combine strata to estimate $t_x$ and $t_y$ Step 2: use ratio estimation 2

Method 2: separate ratio estimators Step 1: use ratio estimation in each stratum Step 2: combine strata 3
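The two methods can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own computation: the stratum dictionaries, their keys (`N`, `tx`, `x`, `y`) and all numbers are made up, with `x` standing for the 1987 values, `y` for the 1992 values, and `tx` for the known 1987 stratum total.

```python
from statistics import mean

def combined_ratio_total(strata):
    """Method 1: combine strata first, then apply one ratio."""
    t_y_hat = sum(s["N"] * mean(s["y"]) for s in strata)  # stratified estimate of t_y
    t_x_hat = sum(s["N"] * mean(s["x"]) for s in strata)  # stratified estimate of t_x
    t_x = sum(s["tx"] for s in strata)                    # known population total of x
    return t_x * t_y_hat / t_x_hat

def separate_ratio_total(strata):
    """Method 2: apply a ratio estimator in each stratum, then add up."""
    return sum(s["tx"] * mean(s["y"]) / mean(s["x"]) for s in strata)

# Hypothetical data: the ratio y/x is 2 in the first stratum and 1 in the second.
strata = [
    {"N": 100, "tx": 1100.0, "x": [8.0, 10.0, 12.0], "y": [16.0, 20.0, 24.0]},
    {"N": 200, "tx": 3900.0, "x": [18.0, 20.0, 22.0], "y": [18.0, 20.0, 22.0]},
]

combined = combined_ratio_total(strata)
separate = separate_ratio_total(strata)
print(combined, separate)
```

Because the ratios differ between the two made-up strata, the two estimates do not coincide; with a common ratio across strata they would be much closer.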

Method 1 vs Method 2 If the ratios vary from stratum to stratum, use method 2 If sample sizes are small, use method 1 Poststratification is a special case of method 2 4

Cluster Sampling 5

A new sampling method Motivating example Want to study the average amount of water used per person How would you design a survey? 6

A new sampling method Consider the two strategies – Sample person by person – Sample household by household Which one do you prefer and why? 7

A new sampling method In the water usage example, I would sample households, in other words, I would use household as the sampling unit. I do this for convenience. I am interested in average monthly usage per person, but I sample households 8

A new sampling method The example of water usage is an example of cluster sampling – Households are the primary sampling units (PSUs) or clusters – Persons are the secondary sampling units (SSUs). They are the elements in the population 9

Definition of Cluster Sampling Take an SRS on clusters Individual elements of the population are allowed in the sample only if they belong to a cluster (primary sampling unit) that is included in the sample The sampling unit (psu) is not the same as the observation unit (ssu), and the two sizes of experimental units must be considered when calculating standard errors from cluster samples 10

Stratified sampling vs Cluster sampling The two sampling methods look similar – A cluster is also a grouping of elements of the population But the sampling schemes are different – Stratified: SRS from each stratum – Cluster: SRS of the clusters. For each selected cluster, we select all its elements – See the following two slides 11

Stratified sampling 12

Cluster sampling 13

Stratified sampling vs Cluster sampling Stratified sampling – Variance of the estimate of the population mean depends on the variability of values within strata – For greater precision, individual elements within each stratum should have similar values, but stratum means should differ from each other as much as possible – Stratified sampling usually improves precision relative to SRS 14

Stratified sampling vs Cluster sampling Cluster sampling – The cluster is the sampling unit – The more clusters we sample, the smaller the variance – The variance of the estimate of the population mean depends primarily on the variability between cluster means – For greater precision, individual elements within each cluster should be heterogeneous and cluster means should be similar to one another – Cluster sampling usually ??? the precision of SRS 15

Why does cluster sampling tend to reduce precision? Elements of the same cluster tend to be more similar than elements selected at random from the whole population. E.g., – Elements of the same household tend to have similar political views – Fish in the same lake tend to have similar concentrations of mercury – Residents of the same nursing home tend to have similar opinions of the quality of care The similarities arise because of some underlying factors that may or may not be measurable – Residents of the same nursing home may have similar opinions because the care is poor – The concentration of mercury in the fish will reflect the concentration of mercury in the lake 16

Why does cluster sampling tend to reduce precision? Because of the similarities of elements within clusters, we do not obtain as much information By sampling everyone in the cluster, we partially repeat the same information instead of obtaining new information As a result, cluster sampling leads to less precision for estimates of population quantities 17

Motivation for using cluster sampling Constructing a sampling frame list of observation units may be difficult, expensive, or impossible – Cannot list all honeybees in a region The population may be widely distributed geographically or may occur in natural clusters – Nursing home residents cluster in nursing homes Cluster sampling leads to convenience and reduced cost Cluster sampling may result in more information per dollar spent 18

Versions of cluster sampling: one-stage vs two-stage cluster sampling We will consider one-stage and two-stage sampling – One-stage sampling: every element within a sampled cluster is included in the sample – Two-stage sampling: we subsample only some of the elements of selected clusters 19

One-stage cluster sampling (1) (2)(3) 20

Two-stage cluster sampling (1) (2)(3) 21

Notation for cluster sampling 22

Notation for cluster sampling 23

Notation for cluster sampling 24

Notation for cluster sampling 25

One-stage cluster sampling (1) (2)(3) 26

One-stage cluster sampling Every element within a cluster (PSU) is included in the sample Either “all” or “none” of the elements that compose a cluster (PSU) are in the sample 27

Clusters of equal sizes – Most naturally occurring clusters do not fit into this framework – Can occur in agricultural and industrial sampling – Estimating population means or totals is simple We treat the cluster means or totals as the observations and simply ignore the individual elements We have an SRS of n observations, where $t_i$ is the total for all the elements in PSU i. 28
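Treating the cluster totals as an SRS of n observations can be sketched as follows. This is a minimal illustration under assumptions: the function name `one_stage_equal` and the numbers are hypothetical, N denotes the number of clusters in the population, M the common cluster size, and the standard SRS formulas (with finite population correction) are applied to the totals.

```python
from math import sqrt
from statistics import mean, variance

def one_stage_equal(cluster_totals, N, M):
    """Estimate the population total and mean from an SRS of cluster totals."""
    n = len(cluster_totals)
    t_hat = N * mean(cluster_totals)             # estimated population total
    s2_t = variance(cluster_totals)              # sample variance of the cluster totals
    se_t = N * sqrt((1 - n / N) * s2_t / n)      # SRS standard error with fpc
    # Dividing by the N*M elements in the population gives the mean per element.
    return t_hat, se_t, t_hat / (N * M), se_t / (N * M)

# Hypothetical sample: 4 clusters of size 4 drawn from 100 clusters.
t_hat, se_t, ybar_hat, se_ybar = one_stage_equal([12.0, 10.0, 14.0, 12.0], N=100, M=4)
print(t_hat, se_t, ybar_hat, se_ybar)
```

Note that the individual element values never enter the calculation, only the n cluster totals.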

Clusters of equal sizes 29

Clusters of equal sizes Nothing is new here 30

Clusters of equal sizes: an example 31

Clusters of equal sizes: an example 32

Clusters of equal sizes: sampling weights 33

Theory of Cluster sampling with equal sizes 34

Theory of Cluster sampling with equal sizes In one-stage cluster sampling, the variability of the unbiased estimator of t depends entirely on the between-cluster part of the variability For cluster sampling 35

Theory of Cluster sampling with equal sizes When MSB/MSW is large – MSB is relatively large: elements in different clusters vary more than elements in the same cluster – cluster sampling is less precise than SRS If MSB>S^2, cluster sampling is less precise 36

37

Measurements of correlation ICC (or ρ): Intraclass (or intracluster) Correlation Coefficient – Describes how similar elements in the same cluster are – Provides a measure of homogeneity within the clusters Definition: It can be shown that 38

Measurements of correlation 39 If SSB=0, then
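The formulas on the two slides above did not survive in the transcript. For reference, the standard textbook identity for clusters of equal size M is given below; it is supplied here as background rather than recovered from the slides, with SSW, SSB and SSTO denoting the within-cluster, between-cluster and total sums of squares of the population ANOVA table.

```latex
\mathrm{ICC} \;=\; 1 - \frac{M}{M-1}\,\frac{\mathrm{SSW}}{\mathrm{SSTO}},
\qquad
-\frac{1}{M-1} \;\le\; \mathrm{ICC} \;\le\; 1 .

% If SSB = 0, all cluster means are equal, so SSW = SSTO and
\mathrm{ICC} \;=\; 1 - \frac{M}{M-1} \;=\; -\frac{1}{M-1}.
```

The lower bound is attained exactly in the SSB = 0 case, which is why the ICC can be only slightly negative when clusters are large.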

One-stage cluster sampling with equal sizes vs SRS If N is large, 1+(M-1)ICC SSUs taken in a one-stage cluster sample give the same amount of information as one SSU from an SRS. E.g., ICC=1/2, M=5, then 1+(M-1)ICC=3 → 300 SSUs in the cluster sample = 100 SSUs in an SRS If ICC<0, cluster sampling is more efficient than SRS ICC is rarely negative in naturally occurring clusters 40
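The arithmetic on this slide is easy to check in code. The sketch below uses hypothetical function names (`design_effect`, `effective_srs_size`) and assumes N is large, as the slide does.

```python
def design_effect(M, icc):
    """Number of cluster-sample SSUs worth one SRS SSU: 1 + (M - 1) * ICC."""
    return 1 + (M - 1) * icc

def effective_srs_size(n_ssu, M, icc):
    """Size of the SRS that carries the same information as n_ssu clustered SSUs."""
    return n_ssu / design_effect(M, icc)

deff = design_effect(M=5, icc=0.5)            # the slide's example
n_eff = effective_srs_size(300, M=5, icc=0.5)
deff_negative = design_effect(M=5, icc=-0.1)  # hypothetical negative ICC
print(deff, n_eff, deff_negative)
```

A value below 1, as in the negative-ICC case, means the cluster sample is more efficient than an SRS of the same number of SSUs.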

The GPA example 41 The population ANOVA table (estimated)

The GPA example The population ANOVA table (estimated) The sample mean square total should not be used to estimate $S^2$ when n is small The data were collected as a cluster sample, so they do not reflect enough of the cluster-to-cluster variability. Multiply the unbiased estimates of MSB and MSW by the df from the population ANOVA table to estimate the population sums of squares 42
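The recipe on this slide can be sketched as below. The GPA data themselves are not in the transcript, so the clusters here are made up, and the function name `estimated_population_anova` is hypothetical. The sketch assumes a one-stage sample of n clusters of equal size M from N clusters, with population degrees of freedom N-1 (between) and N(M-1) (within).

```python
from statistics import mean

def estimated_population_anova(clusters, N):
    """Estimate the population ANOVA quantities from a one-stage cluster sample."""
    n = len(clusters)
    M = len(clusters[0])                        # common cluster size
    cluster_means = [mean(c) for c in clusters]
    grand_mean = mean(cluster_means)
    # Unbiased sample estimates of the population mean squares.
    msb = M * sum((m - grand_mean) ** 2 for m in cluster_means) / (n - 1)
    msw = sum((y - m) ** 2
              for c, m in zip(clusters, cluster_means) for y in c) / (n * (M - 1))
    # Multiply by the population df to estimate the population sums of squares.
    ssb = (N - 1) * msb
    ssw = N * (M - 1) * msw
    ssto = ssb + ssw
    s2 = ssto / (N * M - 1)                     # estimated population variance S^2
    icc = 1 - (M / (M - 1)) * ssw / ssto        # estimated intraclass correlation
    ra2 = 1 - msw / s2                          # estimated adjusted R^2
    return {"MSB": msb, "MSW": msw, "SSB": ssb, "SSW": ssw,
            "SSTO": ssto, "S2": s2, "ICC": icc, "Ra2": ra2}

# Hypothetical sample: 2 clusters of size 2 out of N = 10 clusters.
table = estimated_population_anova([[1.0, 3.0], [5.0, 7.0]], N=10)
print(table)
```

Estimating $S^2$ this way, rather than from the pooled sample variance, keeps the between-cluster component at its population weight.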

The GPA example 43 The population ANOVA table (estimated)

The GPA example 44

Clusters of unequal sizes The adjusted $R^2$ measures the relative amount of variability in the population explained by the cluster means, adjusted for the number of degrees of freedom If the clusters are homogeneous, then the cluster means are highly variable relative to the variation within clusters, and the adjusted $R^2$ will be high. 45

An example 46

An example 47

The GPA example 48

The GPA example 49 The population ANOVA table (estimated)

Clusters of unequal sizes In social surveys, clusters are usually of unequal sizes In a one-stage sample, we will introduce two methods to estimate the population total/mean – Unbiased estimation – Ratio estimation 50

Unbiased estimation for cluster sampling with unequal sizes 51

Unbiased estimation for cluster sampling with unequal sizes Nothing is different from cluster sampling with equal sizes The problem is that the between-cluster variance is large when cluster sizes differ substantially, since we expect large totals from large clusters Therefore, we consider another estimator 52

Ratio estimation for cluster sampling with unequal sizes 53

Ratio estimation for cluster sampling with unequal sizes 54 where

Ratio estimation for cluster sampling with unequal sizes 55 Note, it is not difficult to find that The variance of the ratio estimator depends on the variability of the means per element in the clusters It can be much smaller than that of the unbiased estimator The ratio estimator requires the total number of elements in the population, K. The unbiased estimator does not require K.
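The contrast between the two estimators can be illustrated with a small sketch. The formulas on the preceding slides were lost in extraction, so the code uses the standard textbook forms as an assumption: the unbiased total is N times the mean cluster total, and the ratio estimator of the mean is the sum of cluster totals divided by the sum of cluster sizes. The function names and data are hypothetical, and the sample average cluster size stands in for the population average when the latter is not supplied.

```python
from math import sqrt
from statistics import mean, variance

def unbiased_total(totals, N):
    """Unbiased estimator of the population total and its standard error."""
    n = len(totals)
    t_hat = N * mean(totals)
    se = N * sqrt((1 - n / N) * variance(totals) / n)
    return t_hat, se

def ratio_mean(totals, sizes, N, M_bar=None):
    """Ratio estimator of the mean per element and its standard error."""
    n = len(totals)
    y_r = sum(totals) / sum(sizes)
    if M_bar is None:
        M_bar = mean(sizes)  # assumption: sample average size replaces the population one
    s2_r = sum((t - y_r * m) ** 2 for t, m in zip(totals, sizes)) / (n - 1)
    return y_r, sqrt((1 - n / N) * s2_r / (n * M_bar ** 2))

# Hypothetical clusters whose mean per element is exactly 2 but whose sizes differ.
sizes = [2, 4, 6]
totals = [4.0, 8.0, 12.0]
N, K = 30, 120                        # K: total number of elements, needed by the ratio estimator

t_unb, se_unb = unbiased_total(totals, N)
y_r, se_r = ratio_mean(totals, sizes, N)
t_ratio = K * y_r
print(t_unb, se_unb, t_ratio, se_r)
```

In this constructed case the cluster totals vary only because the sizes vary, so the ratio estimator has zero estimated standard error while the unbiased estimator's is large.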

Two-stage cluster sampling In one-stage cluster sampling, we – Examine all the SSU’s within the selected PSU’s – Obtain redundant information because SSU’s in a PSU tend to be similar – Expensive An alternative: taking a subsample within each selected PSU – two stage cluster sampling 56

Two-stage cluster sampling with equal probability 57

Two-stage cluster sampling with equal probability Compared with the one-stage cluster sampling, the two-stage uses one extra stage. The extra stage complicates the notation and estimators, as one needs to consider variability arising from both stages of data collection The point estimates are similar to those in one-stage, but variances are much more complicated 58

Two-stage cluster sampling with equal probability: an unbiased estimator Since we do not observe every SSU in the sampled PSU’s, we need to estimate the totals for the sampled PSU’s An unbiased estimator of the population total is 59

Two-stage cluster sampling with equal probability: an unbiased estimator The estimator is unbiased 60

Two-stage cluster sampling with equal probability: an unbiased estimator Because the estimated PSU totals $\hat{t}_i$ are random variables, the variance of the estimator has two components – The variability between PSU’s – The variability within PSU’s 61 Recall that $\mathrm{Var}[Y] = \mathrm{Var}[E[Y|X]] + E[\mathrm{Var}[Y|X]]$ Here

Two-stage cluster sampling with equal probability: an unbiased estimator 62

Two-stage cluster sampling with equal probability 63

Two-stage cluster sampling with equal probability: an unbiased estimator It can be shown that an unbiased estimator of the variance is For the population mean 64
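The estimator and its variance estimator did not survive in the transcript. The sketch below uses the standard textbook forms as an assumption: each sampled PSU total is estimated by $M_i \bar{y}_i$, the population total by N times the average estimated PSU total, and the variance estimate adds a between-PSU term and a within-PSU term. The function name and data are hypothetical, and at least two SSUs per sampled PSU are needed for the within-PSU variances.

```python
from math import sqrt
from statistics import mean, variance

def two_stage_unbiased_total(samples, sizes, N):
    """samples[i]: y values observed in sampled PSU i; sizes[i]: its size M_i."""
    n = len(samples)
    t_i_hat = [M * mean(y) for y, M in zip(samples, sizes)]  # estimated PSU totals
    t_hat = N * mean(t_i_hat)
    # Between-PSU component, based on the variability of the estimated PSU totals.
    between = N ** 2 * (1 - n / N) * variance(t_i_hat) / n
    # Within-PSU component, from subsampling m_i of the M_i SSUs in each PSU.
    within = (N / n) * sum((1 - len(y) / M) * M ** 2 * variance(y) / len(y)
                           for y, M in zip(samples, sizes))
    return t_hat, sqrt(between + within)

# Hypothetical data: 2 PSUs sampled out of N = 10, 2 of 4 SSUs measured in each.
t_hat, se = two_stage_unbiased_total([[1.0, 3.0], [2.0, 4.0]], sizes=[4, 4], N=10)
print(t_hat, se)
```

If every SSU in each sampled PSU were measured, the within term would vanish and the result would reduce to the one-stage case.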

Two-stage cluster sampling with equal probability: a ratio estimator 65 As in one-stage cluster sampling with unequal sizes, the between-PSU variance can be very large since it is affected both by variations in the cluster sizes and by variation in y.

66 where

The egg volume example A study (Arnold 1991) on egg volume of American coot eggs in Minnesota. We looked at volumes of a subsample of eggs in clutches (nests of eggs) with at least two eggs. For each sampled clutch, two eggs were measured 67

The egg volume example 68

The egg volume example 69

The egg volume example 70

The egg volume example 71 N is unknown but presumed to be large.

Using weights in cluster samples For estimating overall means and totals in cluster samples, most survey statisticians use sampling weights. Weights can be used to find a point estimate of almost any quantity of interest For cluster sampling: 72

Using weights in cluster samples 73
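A weighted calculation can be sketched as follows. The weight formula on the slide was lost in extraction, so this assumes the usual inverse-inclusion-probability weight for two-stage sampling with equal probabilities at each stage, $w_{ij} = (N/n)(M_i/m_i)$. The function names and data are hypothetical.

```python
def ssu_weights(samples, sizes, N):
    """One weight per sampled SSU: (N / n) * (M_i / m_i)."""
    n = len(samples)
    return [[(N / n) * (M / len(y))] * len(y) for y, M in zip(samples, sizes)]

def weighted_total_and_mean(samples, weights):
    """Total: sum of w*y. Mean: sum of w*y divided by sum of w."""
    wy = sum(w * y for ys, ws in zip(samples, weights) for y, w in zip(ys, ws))
    w_sum = sum(w for ws in weights for w in ws)
    return wy, wy / w_sum

# Hypothetical data: 2 PSUs sampled out of N = 10, 2 of 4 SSUs measured in each.
samples = [[1.0, 3.0], [2.0, 4.0]]
weights = ssu_weights(samples, sizes=[4, 4], N=10)
total_hat, mean_hat = weighted_total_and_mean(samples, weights)
print(weights, total_hat, mean_hat)
```

The weights give point estimates only; standard errors still have to account for the clustering and cannot be read off the weights alone.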

SRS : one-stage cluster: two-stage cluster For simplicity, we only consider one estimator from each of the three sampling methods 74

SRS : one-stage cluster: two-stage cluster Assume (nm) SSUs are sampled 75

SRS : one-stage cluster: two-stage cluster Recall that Therefore, 76

SRS : one-stage cluster: two-stage cluster We have defined ICC (ρ) 77

SRS : one-stage cluster: two-stage cluster 78

SRS : one-stage cluster: two-stage cluster If we use nm SSU’s in a one-stage cluster sampling, #PSU’s=n’=nm/M 79

SRS : one-stage cluster: two-stage cluster If we use nm SSU’s in an SRS 80

SRS : one-stage cluster: two-stage cluster 81

Design a cluster survey It is worth spending a great deal of effort on designing the survey for an expensive and large-scale survey It can take several years to design and pre-test For designing a cluster sample – What overall precision is needed? – What size should the PSU’s be? – How many SSU’s should be sampled in each sampled PSU? – How many PSU’s should be sampled? 82

Choosing the PSU size In many situations, the PSU size exists naturally. E.g., a clutch of eggs, a household In some situations, one needs to choose PSU sizes. E.g., area of a region, 1 km², 2 km², … Many ways to “try out” different PSU sizes Pilot study, perform an experiment The goal is to get the most information for the least cost and inconvenience 83

Two-stage cluster design with equal cluster size and equal variance 84

Two-stage cluster design with equal cluster size and equal variance 85

Two-stage cluster design with equal cluster size and equal variance Graphing variance of varying m and n gives more information It is useful to examine – What if the costs or the cost function are slightly different? – What if changes slightly? 86
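The design formulas on the preceding slides were lost, so the following sketch rests on stated assumptions rather than on the slides: a linear cost function C = c1·n + c2·n·m (c1 per sampled PSU, c2 per sampled SSU), and the standard variance of the estimated mean for equal cluster sizes, (1 - n/N)·MSB/(nM) + (1 - m/M)·MSW/(nm). Minimizing that variance for a fixed budget gives m = sqrt(c1·M·MSW / (c2·(MSB - MSW))). All names and numbers are hypothetical.

```python
from math import sqrt

def var_mean(n, m, N, M, msb, msw):
    """Variance of the estimated mean in a two-stage design with equal cluster sizes."""
    return (1 - n / N) * msb / (n * M) + (1 - m / M) * msw / (n * m)

def optimal_m(c1, c2, M, msb, msw):
    """Subsample size minimizing the variance under the assumed linear cost function."""
    if msb <= msw:
        raise ValueError("formula requires MSB > MSW; otherwise take all M SSUs")
    return sqrt(c1 * M * msw / (c2 * (msb - msw)))

def n_for_budget(C, c1, c2, m):
    """Number of PSUs affordable when m SSUs are taken in each."""
    return C / (c1 + c2 * m)

# Hypothetical inputs: budget 420, c1 = 36 per PSU, c2 = 1 per SSU.
c1, c2, C, N, M, msb, msw = 36, 1, 420, 1000, 10, 11.0, 1.0
m_opt = optimal_m(c1, c2, M, msb, msw)
n_opt = n_for_budget(C, c1, c2, m_opt)

# Tabulate the variance for neighbouring subsample sizes under the same budget.
variances = {m: var_mean(n_for_budget(C, c1, c2, m), m, N, M, msb, msw) for m in (5, 6, 7)}
print(m_opt, n_opt, variances)
```

Tabulating or graphing the variance over a grid of m values, as done for three values here, shows how flat the minimum is and how sensitive the design is to the assumed costs.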

The GPA example 87

The GPA example 88

Summary of two-stage cluster Cluster sampling is widely used in large surveys Variances from cluster samples are usually greater than those from SRSs with the same number of SSUs Less expensive – the per-dollar information from cluster sampling might be greater than that of SRS 89

Summary of two-stage cluster 90