Lecture 9 Sampling Procedures and Testing Independence

Outline of Today
  Sampling Procedures
  Testing Independence

1/17/2019 SA3202, Lecture 9

Procedure 1: Simple Random Sampling

A random sample of size n is drawn from the whole population, and each individual is cross-classified with respect to both variables R and C.

Table
           1     2     ...   c     Total
  1        X11   X12   ...   X1c
  2        X21   X22   ...   X2c
  ...
  r        Xr1   Xr2   ...   Xrc
  Total                            n

Feature: The grand total is fixed at n, but the row totals and the column totals are not fixed.

Procedure 2: Stratified Random Sampling

Step 1. The population is stratified with respect to the stratification variable C (the classification variable). The sample sizes n1, n2, ..., nc for the strata are thus fixed.
Step 2. Within each stratum, a random sample is drawn and each individual is classified with respect to the response variable R.

Table
           1     2     ...   c     Total
  1        X11   X12   ...   X1c
  2        X21   X22   ...   X2c
  ...
  r        Xr1   Xr2   ...   Xrc
  Total    n1    n2    ...   nc    n

Feature: The grand total and the column totals are fixed, but the row totals are not fixed.

Example 1

Consider the population of the example used in the last lecture. Recall that R=Smoking (Yes, No) and C=Sex (Male, Female). There are three methods for collecting data on the two variables.

Population distribution
               Male    Female   Total
  Smoker       30%     10%      40%
  Nonsmoker    20%     40%      60%
  Total        50%     50%      100%

Method 1

Procedure: We draw a random sample of size 200, say, from the whole population and classify each individual with respect to both Smoking and Sex.

Feature: The numbers of males and females in the sample are not fixed (they are random); only the total number of individuals in the sample, 200, is fixed.

               Male    Female   Total
  Smoker       ?       ?        ?
  Non-smoker   ?       ?        ?
  Total        ?       ?        200

Distribution: The distribution within the whole table is multinomial:
  (X11, X12, X21, X22) ~ Mult(200; .3, .1, .2, .4)

Method 2

Procedure: We draw a random sample of 100 males, say, and another sample of 100 females, and classify each individual with respect to smoking habit.

Feature: The column totals are fixed (not random).

               Male    Female   Total
  Smoker       ?       ?        ?
  Non-smoker   ?       ?        ?
  Total        100     100      200

Distribution: The distribution within each column is multinomial (here binomial), with probabilities given by the conditional distribution of Smoking given Sex:
  (X11, X21) ~ Mult(100; .6, .4)   (the distribution of smoking habit among males)
  (X12, X22) ~ Mult(100; .2, .8)   (the distribution of smoking habit among females)

Method 3

Procedure: We draw a random sample of 100 smokers, say, and another sample of 100 non-smokers, and classify each individual with respect to Sex.

Feature: The row totals are fixed (not random).

               Male    Female   Total
  Smoker       ?       ?        100
  Non-smoker   ?       ?        100
  Total        ?       ?        200

Distribution: The distribution within each row is multinomial (here binomial), with probabilities given by the conditional distribution of Sex given smoking habit:
  (X11, X12) ~ Mult(100; .75, .25)   (the distribution of Sex among smokers)
  (X21, X22) ~ Mult(100; .33, .67)   (the distribution of Sex among non-smokers)
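The conditional probabilities used in Methods 2 and 3 follow from the population table of Example 1 by dividing each joint probability by the relevant marginal. The short Python sketch below is an illustration only; the variable names are assumptions of this sketch and are not part of the lecture.

```python
from fractions import Fraction

# Joint distribution of (Smoking, Sex) from the population table of Example 1.
rows = ["Smoker", "Nonsmoker"]
cols = ["Male", "Female"]
joint = {
    ("Smoker", "Male"): Fraction(30, 100),
    ("Smoker", "Female"): Fraction(10, 100),
    ("Nonsmoker", "Male"): Fraction(20, 100),
    ("Nonsmoker", "Female"): Fraction(40, 100),
}

# Marginal distributions of Smoking (rows) and Sex (columns).
row_marg = {r: sum(joint[(r, c)] for c in cols) for r in rows}
col_marg = {c: sum(joint[(r, c)] for r in rows) for c in cols}

# Method 2 samples within each column: Smoking given Sex.
smoking_given_sex = {c: [joint[(r, c)] / col_marg[c] for r in rows] for c in cols}

# Method 3 samples within each row: Sex given Smoking.
sex_given_smoking = {r: [joint[(r, c)] / row_marg[r] for c in cols] for r in rows}

for c in cols:
    print("Smoking given", c, [str(p) for p in smoking_given_sex[c]])
for r in rows:
    print("Sex given", r, [str(p) for p in sex_given_smoking[r]])
```

Exact fractions are used so that the value 1/3 for males among non-smokers is not hidden by rounding to .33.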

Summary

Method 1: Simple Random Sampling. The distribution within the whole table is multinomial, with probabilities given by the joint distribution of R and C.

Method 2: Stratified Sampling with C as the stratification variable and sample sizes n1, n2, ..., nc. The distribution within each column is multinomial, with probabilities given by the conditional distribution of R given C.

Method 3: Stratified Sampling with R as the stratification variable and sample sizes n1, n2, ..., nr. The distribution within each row is multinomial, with probabilities given by the conditional distribution of C given R.

Remarks: The distinction between simple random sampling and stratified sampling is important when making inferences about the parameters. With simple random sampling, we have information about the joint, marginal, and conditional probabilities. With stratified sampling (i.e., with either the row or the column totals fixed), we do not have information about the joint or marginal probabilities; we have information only about the corresponding conditional probabilities (probabilities within rows or within columns).
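To make the difference between the three methods concrete, the following Python sketch simulates each of them from the population of Example 1 and reports which totals come out fixed. It is a minimal illustration, not part of the lecture: the function names (`simple_random_sample`, `stratified_sample`, `margin`) and the seed are assumptions chosen for this sketch.

```python
import random
from collections import Counter

# Population joint distribution from Example 1; a cell is (Smoking, Sex).
cells = [("Smoker", "Male"), ("Smoker", "Female"),
         ("Nonsmoker", "Male"), ("Nonsmoker", "Female")]
joint = [0.3, 0.1, 0.2, 0.4]


def simple_random_sample(n, rng):
    """Method 1: draw n individuals from the whole population."""
    return Counter(rng.choices(cells, weights=joint, k=n))


def stratified_sample(axis, sizes, rng):
    """Methods 2 and 3: fix the sample size for each level of one variable.

    axis=1 stratifies on Sex (columns), axis=0 on Smoking (rows).
    sizes maps each level of the stratification variable to its sample size.
    """
    table = Counter()
    for level, size in sizes.items():
        stratum = [(cell, p) for cell, p in zip(cells, joint) if cell[axis] == level]
        # random.choices accepts relative weights, so the joint probabilities
        # restricted to the stratum act as the conditional distribution.
        table.update(rng.choices([cell for cell, _ in stratum],
                                 weights=[p for _, p in stratum], k=size))
    return table


def margin(table, axis):
    """Totals over one variable: axis=0 gives row totals, axis=1 column totals."""
    totals = Counter()
    for cell, count in table.items():
        totals[cell[axis]] += count
    return totals


rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
t1 = simple_random_sample(200, rng)
t2 = stratified_sample(1, {"Male": 100, "Female": 100}, rng)
t3 = stratified_sample(0, {"Smoker": 100, "Nonsmoker": 100}, rng)

print("Method 1 column totals:", dict(margin(t1, 1)))  # random
print("Method 2 column totals:", dict(margin(t2, 1)))  # fixed by design
print("Method 3 row totals:   ", dict(margin(t3, 0)))  # fixed by design
```

Only the grand total is guaranteed under Method 1, while Methods 2 and 3 reproduce the chosen stratum sizes exactly on every run.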

Example 2

Consider the following (hypothetical) data concerning the distribution of 100 individuals with respect to Sex and Smoking.

               Male    Female   Total
  Smoker       40      10       50
  Non-smoker   20      30       50
  Total        60      40       100

Under the Simple Random Sampling Procedure, we can estimate the marginal probabilities:
  Pr(Male)=60%, Pr(Female)=40%, Pr(Smoker)=50%, Pr(Non-smoker)=50%

Under the Stratified Sampling Procedure with Sex as the stratification variable, we can estimate only
  Pr(Smoker|Male)=40/60=67%, Pr(Smoker|Female)=10/40=25%

Under the Stratified Sampling Procedure with Smoking as the stratification variable, we can estimate only
  Pr(Male|Smoker)=40/50=80%, Pr(Male|Non-smoker)=20/50=40%

Remark: We cannot tell the sampling procedure just by looking at the data table. We need to know how the data were actually collected.
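The estimates in Example 2 are simple ratios of cell counts to the appropriate totals. The Python sketch below recomputes them from the data table; the variable names are assumptions of this sketch, and which set of estimates is meaningful still depends on how the data were collected.

```python
# Counts from Example 2: rows are Smoker / Non-smoker, columns are Male / Female.
counts = [[40, 10],
          [20, 30]]

row_totals = [sum(row) for row in counts]        # Smoker, Non-smoker
col_totals = [sum(col) for col in zip(*counts)]  # Male, Female
n = sum(row_totals)

# Simple random sampling: the marginal proportions are estimable.
pr_sex = [t / n for t in col_totals]       # Pr(Male), Pr(Female)
pr_smoking = [t / n for t in row_totals]   # Pr(Smoker), Pr(Non-smoker)

# Stratified on Sex (column totals fixed): only Smoking given Sex is estimable.
pr_smoker_given_sex = [counts[0][j] / col_totals[j] for j in range(2)]

# Stratified on Smoking (row totals fixed): only Sex given Smoking is estimable.
pr_male_given_smoking = [counts[i][0] / row_totals[i] for i in range(2)]

print("Pr(Male), Pr(Female):", pr_sex)
print("Pr(Smoker), Pr(Non-smoker):", pr_smoking)
print("Pr(Smoker|Male), Pr(Smoker|Female):", pr_smoker_given_sex)
print("Pr(Male|Smoker), Pr(Male|Non-smoker):", pr_male_given_smoking)
```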

Testing Independence

Problem of Interest: Are the column variable C and the row variable R independent?

  H0: Pr(R=i, C=j) = Pr(R=i) Pr(C=j),  i=1,2,...,r;  j=1,2,...,c

Testing Procedure
Step 1. Find the expected frequencies under H0 and under the given sampling procedure.
Step 2. Apply Pearson's Goodness of Fit Test or Wilks' Likelihood Ratio Test.

Feature: The estimated expected frequencies (and the associated d.f.) are the same under the different sampling procedures, and are given by

                                   Row Total x Column Total
  Estimated Expected Frequency = ----------------------------
                                         Grand Total

Test Statistics, each with d.f. = (r-1)(c-1), where $\hat{m}_{ij}$ denotes the estimated expected frequency of cell (i,j):

Pearson's Goodness of Fit Test:
$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(X_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}}$$

Wilks' Likelihood Ratio Test:
$$G^2 = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} X_{ij} \log \frac{X_{ij}}{\hat{m}_{ij}}$$
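As a worked illustration of the two steps, the Python sketch below applies them to the hypothetical data of Example 2 (40, 10, 20, 30). It uses the standard forms of the Pearson and Wilks statistics; the variable names are assumptions of this sketch, and the 5% critical value 3.841 for 1 d.f. is quoted from standard chi-square tables rather than from the lecture.

```python
import math

# Counts from Example 2: rows are Smoker / Non-smoker, columns are Male / Female.
counts = [[40, 10],
          [20, 30]]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)
r, c = len(row_totals), len(col_totals)

# Step 1: estimated expected frequency = row total x column total / grand total.
expected = [[row_totals[i] * col_totals[j] / n for j in range(c)]
            for i in range(r)]

# Step 2a: Pearson's statistic, the sum of (observed - expected)^2 / expected.
pearson = sum((counts[i][j] - expected[i][j]) ** 2 / expected[i][j]
              for i in range(r) for j in range(c))

# Step 2b: Wilks' statistic, 2 * sum of observed * log(observed / expected).
# Cells with a zero count contribute nothing and are skipped.
wilks = 2 * sum(counts[i][j] * math.log(counts[i][j] / expected[i][j])
                for i in range(r) for j in range(c) if counts[i][j] > 0)

df = (r - 1) * (c - 1)

print("expected frequencies:", expected)
print("Pearson statistic:", round(pearson, 3))
print("Wilks statistic:  ", round(wilks, 3))
print("d.f.:", df)
# Both statistics are far above 3.841, so H0 would be rejected at the 5% level.
```

The expected frequencies here are 30 and 20 in each row, and the same numbers result whichever of the three sampling procedures produced the table.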

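Proof

We shall prove the statements about the estimated expected frequencies and the degrees of freedom under the different sampling procedures.

Under the Simple Random Sampling Procedure

Write $p_{ij} = \Pr(R=i, C=j)$, $p_{i+} = \Pr(R=i)$ and $p_{+j} = \Pr(C=j)$, and let $X_{i+}$ and $X_{+j}$ denote the i-th row total and the j-th column total. Under H0, the expected frequencies are

$$m_{ij} = E(X_{ij}) = n p_{ij} = n\, p_{i+}\, p_{+j}.$$

Estimating the marginal probabilities by the sample proportions $\hat{p}_{i+} = X_{i+}/n$ and $\hat{p}_{+j} = X_{+j}/n$, the estimated expected frequencies are

$$\hat{m}_{ij} = n\, \hat{p}_{i+}\, \hat{p}_{+j} = \frac{X_{i+} X_{+j}}{n},$$

that is, row total times column total divided by the grand total.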

Proof (continued)

The df is obtained by applying the general rule:

  df = the total number of cells - 1 - the number of free parameters estimated under H0
     = rc - 1 - ((r-1) + (c-1))
     = (r-1)(c-1)

by noting that
(1). The total number of cells is rc.
(2). We lose (r-1) df by estimating the r row marginal probabilities, which sum to 1.
(3). We lose (c-1) df by estimating the c column marginal probabilities, which sum to 1.

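Proof (continued)

Under the Stratified Sampling Procedure (Fixed Row or Column Totals)

As an example, consider stratified sampling with C as the stratification variable (the column totals are fixed). Let $n_j$ be the sample size of the j-th stratum (the j-th column total) and $n = n_1 + n_2 + \dots + n_c$ the grand total. Under H0, $\Pr(R=i \mid C=j) = \Pr(R=i)$, so the expected frequencies are

$$E(X_{ij}) = n_j \Pr(R=i \mid C=j) = n_j \Pr(R=i).$$

Estimating $\Pr(R=i)$ by the pooled proportion $X_{i+}/n$, where $X_{i+}$ is the i-th row total, the estimated expected frequencies are then

$$\hat{m}_{ij} = \frac{n_j X_{i+}}{n},$$

which is again row total times column total divided by the grand total.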

Proof (continued)

As for the df, keep in mind that
(1). We are dealing with c multinomial distributions.
(2). The df associated with each multinomial distribution is r-1, so the total df is c(r-1).
(3). We lose r-1 df by estimating the r row marginal probabilities.

Therefore df = c(r-1) - (r-1) = (c-1)(r-1).
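The two degrees-of-freedom counts in the proof can be checked numerically. The Python sketch below is a sanity check over a range of table sizes rather than a proof, and its function names are assumptions of this sketch.

```python
def df_simple_random(r, c):
    """Cells minus 1, minus the (r-1) + (c-1) free marginal probabilities."""
    return r * c - 1 - ((r - 1) + (c - 1))


def df_stratified_columns(r, c):
    """c multinomials with r-1 df each, minus the r-1 estimated row probabilities."""
    return c * (r - 1) - (r - 1)


# Both counts agree with (r-1)(c-1) for every table size tried here.
for r in range(2, 8):
    for c in range(2, 8):
        target = (r - 1) * (c - 1)
        assert df_simple_random(r, c) == target
        assert df_stratified_columns(r, c) == target

print(df_simple_random(2, 2), df_stratified_columns(3, 4))
```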