Lecture 9 Sampling Procedures and Testing Independence

Outline of Today: Sampling Procedures; Testing Independence
1/17/2019 SA3202, Lecture 9

Procedure 1: Simple Random Sampling

A random sample of size n, say, is drawn from the whole population, and each individual is cross-classified with respect to both variables R and C.

Table (Xi. denotes the i-th row total, X.j the j-th column total):

            1      2     ...    c      Total
    1       X11    X12   ...    X1c    X1.
    2       X21    X22   ...    X2c    X2.
    ...     ...    ...   ...    ...    ...
    r       Xr1    Xr2   ...    Xrc    Xr.
    Total   X.1    X.2   ...    X.c    n

Feature: The grand total is fixed at n, but the row totals and the column totals are not fixed.

Procedure 2: Stratified Random Sampling

Step 1. The population is stratified with respect to the stratification variable C (the classification variable). The sample sizes for the strata, n1, n2, ..., nc, are thus fixed.
Step 2. Within each stratum, a random sample is drawn and each individual is classified with respect to the response variable R.

Table:

            1      2     ...    c      Total
    1       X11    X12   ...    X1c    X1.
    2       X21    X22   ...    X2c    X2.
    ...     ...    ...   ...    ...    ...
    r       Xr1    Xr2   ...    Xrc    Xr.
    Total   n1     n2    ...    nc     n

Feature: The grand total is fixed and the column totals are fixed, but the row totals are not fixed.

Example 1

Consider the population of the example used in the last lecture. Recall that R=Smoking (Yes, No) and C=Sex (Male, Female). The population distribution is:

                  Male    Female   Total
    Smoker        30%     10%      40%
    Non-smoker    20%     40%      60%
    Total         50%     50%      100%

There are three methods for collecting data on the two variables.

Method 1

Procedure: We draw a random sample of size 200, say, from the whole population and classify each individual with respect to both Smoking and Sex.

Feature: The numbers of males and females in the sample are not fixed (they are random); only the total number of individuals in the sample, 200, is fixed.

                  Male    Female   Total
    Smoker        ?       ?        ?
    Non-smoker    ?       ?        ?
    Total         ?       ?        200

Distribution: The distribution within the whole table is multinomial:
(X11, X12, X21, X22) ~ Mult(200; .3, .1, .2, .4)
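As a rough illustration of Method 1, the sketch below simulates one simple random sample of 200 individuals using the joint probabilities (.3, .1, .2, .4) from Example 1. The function name `draw_simple_random_sample` and the seed are hypothetical choices made for this illustration; they are not part of the lecture.

```python
import random

# Joint cell probabilities from Example 1, in the order
# (Smoker, Male), (Smoker, Female), (Non-smoker, Male), (Non-smoker, Female).
cells = [("Smoker", "Male"), ("Smoker", "Female"),
         ("Non-smoker", "Male"), ("Non-smoker", "Female")]
probs = [0.3, 0.1, 0.2, 0.4]

def draw_simple_random_sample(n, seed=None):
    """Draw n individuals from the whole population and cross-classify them.

    Returns a dict mapping each (smoking, sex) cell to its count.
    Hypothetical helper, written only for this illustration.
    """
    rng = random.Random(seed)
    counts = {cell: 0 for cell in cells}
    for cell in rng.choices(cells, weights=probs, k=n):
        counts[cell] += 1
    return counts

table = draw_simple_random_sample(200, seed=1)  # seed is arbitrary
grand_total = sum(table.values())
n_male = table[("Smoker", "Male")] + table[("Non-smoker", "Male")]

print(table)
print("grand total:", grand_total, "| males in sample:", n_male)
```

Rerunning with different seeds should change the number of males from sample to sample, while the grand total stays at 200, which is the feature noted for this method.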

Method 2

Procedure: We draw a random sample of 100 males, say, and another random sample of 100 females, say, and classify each individual with respect to smoking habit.

Feature: The column totals are fixed (not random).

                  Male    Female   Total
    Smoker        ?       ?        ?
    Non-smoker    ?       ?        ?
    Total         100     100      200

Distribution: The distribution within each column is multinomial (here binomial), with the probabilities given by the conditional distribution of Smoking given Sex:
(X11, X21) ~ Mult(100; .6, .4) (the distribution of smoking habit among males)
(X12, X22) ~ Mult(100; .2, .8) (the distribution of smoking habit among females)
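A minimal sketch of Method 2, assuming the conditional probabilities of smoking given sex stated above (.6 for males, .2 for females). The helper `draw_stratified_sample` and the seed are hypothetical, introduced only for illustration.

```python
import random

# Pr(Smoker | Sex) from Example 1 and the column sample sizes fixed by design.
p_smoker_given_sex = {"Male": 0.6, "Female": 0.2}
column_sizes = {"Male": 100, "Female": 100}

def draw_stratified_sample(sizes, p_smoker, seed=None):
    """Sample each stratum (sex) separately and classify by smoking habit.

    Hypothetical helper: each column is an independent binomial draw.
    """
    rng = random.Random(seed)
    table = {}
    for sex, n_j in sizes.items():
        smokers = sum(rng.random() < p_smoker[sex] for _ in range(n_j))
        table[sex] = {"Smoker": smokers, "Non-smoker": n_j - smokers}
    return table

stratified = draw_stratified_sample(column_sizes, p_smoker_given_sex, seed=1)
column_totals = {sex: sum(col.values()) for sex, col in stratified.items()}
row_total_smokers = sum(col["Smoker"] for col in stratified.values())

print(stratified)
print("column totals:", column_totals, "| smokers in sample:", row_total_smokers)
```

Here the column totals are 100 and 100 by construction whatever the seed, while the number of smokers (a row total) is random.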

Method 3

Procedure: We draw a random sample of 100 smokers, say, and another random sample of 100 non-smokers, say, and classify each individual with respect to Sex.

Feature: The row totals are fixed (not random).

                  Male    Female   Total
    Smoker        ?       ?        100
    Non-smoker    ?       ?        100
    Total         ?       ?        200

Distribution: The distribution within each row is multinomial (here binomial), with the probabilities given by the conditional distribution of Sex given the smoking habit:
(X11, X12) ~ Mult(100; .75, .25) (the distribution of Sex among smokers)
(X21, X22) ~ Mult(100; .33, .67) (the distribution of Sex among non-smokers)
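The conditional probabilities used in Methods 2 and 3 follow from the joint probabilities of Method 1 by dividing each cell by its column or row total. The short sketch below recomputes them; the variable names are illustrative assumptions, not notation from the lecture.

```python
# Joint probabilities Pr(R=i, C=j) from Example 1.
joint = [[0.3, 0.1],   # Smoker:     Male, Female
         [0.2, 0.4]]   # Non-smoker: Male, Female

row_marg = [sum(row) for row in joint]         # Pr(R=i): smoker, non-smoker
col_marg = [sum(col) for col in zip(*joint)]   # Pr(C=j): male, female

# Pr(R=i | C=j): divide each cell by its column total (used in Method 2).
r_given_c = [[joint[i][j] / col_marg[j] for j in range(2)] for i in range(2)]

# Pr(C=j | R=i): divide each cell by its row total (used in Method 3).
c_given_r = [[joint[i][j] / row_marg[i] for j in range(2)] for i in range(2)]

print("Smoking given Sex:", [[round(p, 2) for p in row] for row in r_given_c])
print("Sex given Smoking:", [[round(p, 2) for p in row] for row in c_given_r])
```

Up to rounding, the columns of the first result should match (.6, .4) and (.2, .8), and the rows of the second should match (.75, .25) and (.33, .67).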

Summary

Method 1: Simple Random Sampling. The distribution within the whole table is multinomial, with probabilities given by the joint distribution of R and C.

Method 2: Stratified Sampling with C as the stratification variable and sample sizes n1, n2, …, nc. The distribution within each column is multinomial, with probabilities given by the conditional distribution of R given C.

Method 3: Stratified Sampling with R as the stratification variable and sample sizes n1, n2, …, nr. The distribution within each row is multinomial, with probabilities given by the conditional distribution of C given R.

Remarks: The distinction between the two sampling procedures is important when making inferences about the parameters. With simple random sampling, we have information about the joint, marginal, and conditional probabilities. With stratified sampling (i.e., with either the row or the column totals fixed), we do not have information about the joint or marginal probabilities; we have information only about the corresponding conditional probabilities (probabilities within rows or within columns).

Example 2

Consider the following (hypothetical) data concerning the distribution of 100 individuals with respect to Sex and Smoking:

                  Male    Female   Total
    Smoker        40      10       50
    Non-smoker    20      30       50
    Total         60      40       100

Under the Simple Sampling Procedure, we can estimate the marginal probabilities:
Pr(Male)=60%, Pr(Female)=40%, Pr(Smoker)=50%, Pr(Non-smoker)=50%

Under the Stratified Sampling Procedure with Sex as the stratification variable, we can estimate only
Pr(Smoker|Male)=40/60=67%, Pr(Smoker|Female)=10/40=25%

Under the Stratified Sampling Procedure with Smoking as the stratification variable, we can estimate only
Pr(Male|Smoker)=40/50=80%, Pr(Male|Non-smoker)=20/50=40%

Remark: We cannot tell the sampling procedure just by looking at the data table. We need to know how the data were actually collected.
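The estimates in Example 2 can be checked with exact fractions. The sketch below uses the cell counts implied by the ratios quoted in the example (40, 10, 20, 30, summing to 100); the helper `total` and the variable names are hypothetical.

```python
from fractions import Fraction

# Cell counts implied by the ratios quoted in Example 2.
counts = {("Smoker", "Male"): 40, ("Smoker", "Female"): 10,
          ("Non-smoker", "Male"): 20, ("Non-smoker", "Female"): 30}
n = sum(counts.values())

def total(row=None, col=None):
    """Sum the counts over all cells matching the given row and/or column label."""
    return sum(x for (r, c), x in counts.items()
               if (row is None or r == row) and (col is None or c == col))

# Marginal estimates: meaningful only under simple random sampling.
pr_male = Fraction(total(col="Male"), n)
pr_smoker = Fraction(total(row="Smoker"), n)

# Column-conditional estimates: meaningful when Sex is the stratification variable.
pr_smoker_given_male = Fraction(counts[("Smoker", "Male")], total(col="Male"))
pr_smoker_given_female = Fraction(counts[("Smoker", "Female")], total(col="Female"))

# Row-conditional estimates: meaningful when Smoking is the stratification variable.
pr_male_given_smoker = Fraction(counts[("Smoker", "Male")], total(row="Smoker"))
pr_male_given_nonsmoker = Fraction(counts[("Non-smoker", "Male")], total(row="Non-smoker"))

print(pr_male, pr_smoker)
print(pr_smoker_given_male, pr_smoker_given_female)
print(pr_male_given_smoker, pr_male_given_nonsmoker)
```

The arithmetic is identical whichever way the data were collected; the sampling procedure only determines which of these ratios estimate a population quantity.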

Testing Independence

Problem of Interest: Whether the column variable C and the row variable R are independent:
H0: Pr(R=i, C=j) = Pr(R=i) Pr(C=j), i=1,2,…,r; j=1,2,…,c

Testing Procedure
Step 1. Find the expected frequencies under H0 and under the given sampling procedure.
Step 2. Apply Pearson's Goodness of Fit Test or Wilks' Likelihood Ratio Test.

Feature: The estimated expected frequencies (and the associated d.f.) are the same under the different sampling procedures, and are given by

Estimated Expected Frequency = (Row Total × Column Total) / Grand Total,

that is, $\hat m_{ij} = X_{i\cdot} X_{\cdot j} / n$, where $X_{i\cdot}$ is the i-th row total, $X_{\cdot j}$ is the j-th column total, and n is the grand total.

Test Statistics, with d.f. = (r-1)(c-1):

Pearson's Goodness of Fit Test:
$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(X_{ij} - \hat m_{ij})^2}{\hat m_{ij}}$$

Wilks' Likelihood Ratio Test:
$$G^2 = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} X_{ij} \log \frac{X_{ij}}{\hat m_{ij}}$$
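To make the testing procedure concrete, here is a minimal sketch that computes the estimated expected frequencies, the Pearson statistic, the Wilks likelihood ratio statistic, and the degrees of freedom for an r × c table. The function name `independence_test` is hypothetical, and the data are the counts implied by Example 2 (40, 10, 20, 30), used here purely as an illustration.

```python
import math

def independence_test(table):
    """Return (expected, pearson, wilks, df) for an r x c table of counts.

    Expected frequencies are row total * column total / grand total.
    Cells with a zero count contribute nothing to the Wilks statistic.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

    pearson = sum((x - m) ** 2 / m
                  for obs_row, exp_row in zip(table, expected)
                  for x, m in zip(obs_row, exp_row))
    wilks = 2 * sum(x * math.log(x / m)
                    for obs_row, exp_row in zip(table, expected)
                    for x, m in zip(obs_row, exp_row) if x > 0)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, pearson, wilks, df

# Rows: Smoker, Non-smoker; columns: Male, Female.
expected, x2, g2, df = independence_test([[40, 10], [20, 30]])
print(expected)
print(round(x2, 3), round(g2, 3), df)
```

Each statistic would then be compared with a chi-square critical value on `df` degrees of freedom; for 1 d.f. at the 5% level (a level chosen here only for illustration) that value is 3.841.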

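Proof

We shall prove the statements about the estimated expected frequencies and about the degrees of freedom under the different sampling procedures.

Under Simple Sampling Procedure

Under H0, the expected frequencies are
$$m_{ij} = E(X_{ij}) = n\,p_{ij} = n\,p_{i\cdot}\,p_{\cdot j},$$
where $p_{i\cdot} = \Pr(R=i)$ and $p_{\cdot j} = \Pr(C=j)$. Estimating the marginal probabilities by $\hat p_{i\cdot} = X_{i\cdot}/n$ and $\hat p_{\cdot j} = X_{\cdot j}/n$, with $X_{i\cdot}$ and $X_{\cdot j}$ the row and column totals, the estimated expected frequencies are
$$\hat m_{ij} = n\,\hat p_{i\cdot}\,\hat p_{\cdot j} = \frac{X_{i\cdot}\,X_{\cdot j}}{n}.$$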

Proof (continued)

The df is obtained by applying the general rule:

df = the total number of cells - 1 - the number of free parameters estimated under H0
   = rc - 1 - ((r-1) + (c-1)) = (r-1)(c-1)

by noting that
(1). The total number of cells is rc.
(2). We lose (r-1) df by estimating the r row marginal probabilities.
(3). We lose (c-1) df by estimating the c column marginal probabilities.

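Proof (continued)

Under Stratified Sampling Procedure (Fixed Row or Column Totals)

As an example, consider stratified sampling with C as the stratification variable (fixed column totals). Note that n = n1 + n2 + … + nc, where nj is the number of sampling units in the j-th column (stratum) and n is the grand total.

Under H0, the expected frequencies are
$$E(X_{ij}) = n_j \Pr(R=i \mid C=j) = n_j \Pr(R=i).$$
Estimating $\Pr(R=i)$ by the pooled proportion $X_{i\cdot}/n$, where $X_{i\cdot}$ is the i-th row total, the estimated expected frequencies are then
$$\hat m_{ij} = n_j\,\frac{X_{i\cdot}}{n} = \frac{X_{i\cdot}\,n_j}{n},$$
which is again (row total × column total) / grand total, since nj is the j-th column total.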

Proof (continued)

As for the df, keep in mind that
(1). We are dealing with c multinomial distributions.
(2). The df associated with each multinomial distribution is r-1, so the total df is c(r-1).
(3). We lose r-1 df by estimating the r row marginal probabilities.

Therefore df = c(r-1) - (r-1) = (c-1)(r-1).
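As a quick sanity check on the two degrees-of-freedom arguments, the sketch below evaluates both counting rules for a range of table sizes and confirms they agree with (r-1)(c-1). The function names are hypothetical and the range of sizes is arbitrary.

```python
def df_simple(r, c):
    """Simple random sampling: rc cells, minus 1, minus the (r-1)+(c-1) free parameters."""
    return r * c - 1 - ((r - 1) + (c - 1))

def df_stratified(r, c):
    """Fixed column totals: c multinomials with r-1 df each, minus r-1 estimated parameters."""
    return c * (r - 1) - (r - 1)

# Check agreement for all table sizes from 2x2 up to 6x6 (an arbitrary range).
checked = 0
for r in range(2, 7):
    for c in range(2, 7):
        assert df_simple(r, c) == df_stratified(r, c) == (r - 1) * (c - 1)
        checked += 1

print("table sizes checked:", checked)
```

This is only a numerical spot check; the algebraic identities given in the proof hold for all r and c.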
