Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong.


Outline
Introduction
– Kansas Farmer Data
– Notation
Modified Pearson Based Statistic
– Nonparametric Bootstrap
– Bootstrap p-Value Methods
Simulation Study
Conclusion

Introduction
Terminology: "pick any" (or pick any/c) variables, also called multiple-response categorical variables. Survey data arising from multiple-response categorical variable questions present a unique challenge for analysis because of the dependence among responses provided by individual subjects. Testing for independence between two categorical variables is often of interest. When at least one of the categorical variables can have multiple responses, traditional Pearson chi-square tests for independence should not be used because of the within-subject dependence among responses.

Introduction (contd.)
A special kind of independence, called marginal independence, becomes of interest in the presence of multiple-response categorical variables. The purpose of this article is to develop new approaches to testing marginal independence between two multiple-response categorical variables. Agresti and Liu (1999) call this a test for simultaneous pairwise marginal independence (SPMI). The proposed tests are extensions of the traditional Pearson chi-square tests for independence between single-response categorical variables.

Kansas Farmer Data
The data come from Loughin (1998) and Agresti and Liu (1999). The survey was conducted by the Department of Animal Sciences at Kansas State University. Two questions in the survey asked Kansas farmers about their sources of veterinary information and their swine waste storage methods. Farmers were permitted to select as many responses as applied from a list of items.

Kansas Farmer Data (contd.)
Interest lies in determining whether sources of veterinary information are independent of waste storage methods, in a similar manner as would be done in a traditional Pearson chi-square test applied to a contingency table with single-response categorical variables. A test for SPMI can be performed to determine whether each source of veterinary information is simultaneously independent of each swine waste storage method.

Kansas Farmer Data (contd.)
4 × 5 = 20 different 2 × 2 tables can be formed to marginally summarize all possible responses to item pairs. Independence is tested in each of the twenty 2 × 2 tables simultaneously for a test of SPMI.
[Example 2 × 2 table for the item pair (Lagoon, Professional consultant); only one cell value, 10, survives in this transcript.]

Kansas Farmer Data (contd.)
The test is marginal because responses are summed over the other item choices for each of the multiple-response categorical variables. If SPMI is rejected, examination of the individual 2 × 2 tables can follow to determine why the rejection occurs.

Notation
Let W and Y denote the multiple-response categorical variables for an r × c table's row and column variables, respectively. Sources of veterinary information are denoted by Y and waste storage methods are denoted by W. The categories for each multiple-response categorical variable are called items (Agresti and Liu, 1999); for example, lagoon is one of the items for waste storage method. Suppose W has r items and Y has c items. Also, suppose n subjects are sampled at random.

Notation (contd.)
Let $W_{si} = 1$ if a positive response is given for item $i$ by subject $s$, for $i = 1, \ldots, r$ and $s = 1, \ldots, n$; $W_{si} = 0$ for a negative response. Let $Y_{sj}$, for $j = 1, \ldots, c$ and $s = 1, \ldots, n$, be similarly defined. The abbreviated notation $W_i$ and $Y_j$ refers generally to the binary response random variables for items $i$ and $j$, respectively. The sets of correlated binary item responses for subject $s$ are $Y_s = (Y_{s1}, Y_{s2}, \ldots, Y_{sc})$ and $W_s = (W_{s1}, W_{s2}, \ldots, W_{sr})$.

Notation (contd.)
Cell counts in the joint table are denoted by $n_{gh}$ for the $g$th possible $(W_1, \ldots, W_r)$ and $h$th possible $(Y_1, \ldots, Y_c)$. The corresponding probability is denoted by $\tau_{gh}$. Multinomial sampling is assumed to occur within the entire joint table; thus, $\sum_{g,h} \tau_{gh} = 1$. Let $m_{ij}$ denote the number of observed positive responses to both $W_i$ and $Y_j$. The marginal probability of a positive response to both $W_i$ and $Y_j$ is denoted by $\pi_{ij}$, and its maximum likelihood estimate (MLE) is $m_{ij}/n$.
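
As a minimal sketch of this notation, the pairwise counts $m_{ij}$ and the MLEs $m_{ij}/n$ can be computed directly from the binary item responses. The function name `marginal_counts` and the toy data are hypothetical, chosen only for illustration; they are not from the presentation or the Kansas farmer data.

```python
def marginal_counts(W, Y):
    """Return (m, pi_hat) for binary response lists W (n x r) and Y (n x c).

    m[i][j] counts subjects with a positive response to both W_i and Y_j;
    pi_hat[i][j] = m[i][j] / n is the MLE of pi_ij.
    """
    n = len(W)
    r, c = len(W[0]), len(Y[0])
    m = [[sum(W[s][i] * Y[s][j] for s in range(n)) for j in range(c)]
         for i in range(r)]
    pi_hat = [[m[i][j] / n for j in range(c)] for i in range(r)]
    return m, pi_hat


# Hypothetical toy data: n = 4 subjects, r = 2 items for W, c = 3 items for Y.
W = [[1, 0], [1, 1], [0, 1], [1, 0]]
Y = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
m, pi_hat = marginal_counts(W, Y)
print(m)       # counts of joint positive responses per item pair
print(pi_hat)  # estimated pi_ij
```

Each subject contributes to several $m_{ij}$ at once, which is the within-subject dependence that rules out a standard Pearson test on the table of $m_{ij}$.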

Joint Table

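SPMI Defined in Hypothesis
$H_0$: $\pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$ for $i = 1, \ldots, r$ and $j = 1, \ldots, c$
$H_a$: At least one equality does not hold
where $\pi_{ij} = P(W_i = 1, Y_j = 1)$, $\pi_{i\cdot} = P(W_i = 1)$, and $\pi_{\cdot j} = P(Y_j = 1)$. This specifies marginal independence between each $W_i$ and $Y_j$ pair. The four cell probabilities for a pair are:
$P(W_i = 1, Y_j = 1) = \pi_{ij}$
$P(W_i = 1, Y_j = 0) = \pi_{i\cdot} - \pi_{ij}$
$P(W_i = 0, Y_j = 1) = \pi_{\cdot j} - \pi_{ij}$
$P(W_i = 0, Y_j = 0) = 1 - \pi_{i\cdot} - \pi_{\cdot j} + \pi_{ij}$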

Hypothesis
SPMI can be written as $OR_{WY,ij} = 1$ for $i = 1, \ldots, r$ and $j = 1, \ldots, c$, where OR is the abbreviation for odds ratio and
– $OR_{WY,ij} = \pi_{ij}(1 - \pi_{i\cdot} - \pi_{\cdot j} + \pi_{ij}) / [(\pi_{i\cdot} - \pi_{ij})(\pi_{\cdot j} - \pi_{ij})]$
Therefore, SPMI represents simultaneous independence in the $rc$ 2 × 2 pairwise item response tables formed for each $W_i$ and $Y_j$ pair. Joint independence implies SPMI, but the reverse is not true.
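
The equivalence between an odds ratio of 1 and marginal independence can be checked numerically: the numerator of the odds ratio minus its denominator simplifies to $\pi_{ij} - \pi_{i\cdot}\pi_{\cdot j}$. The sketch below uses a hypothetical helper `odds_ratio` and made-up probabilities purely for illustration.

```python
def odds_ratio(pi_ij, pi_i, pi_j):
    """Odds ratio of the 2 x 2 table for one (W_i, Y_j) pair.

    pi_ij = P(W_i = 1, Y_j = 1), pi_i = P(W_i = 1), pi_j = P(Y_j = 1).
    """
    numerator = pi_ij * (1 - pi_i - pi_j + pi_ij)
    denominator = (pi_i - pi_ij) * (pi_j - pi_ij)
    return numerator / denominator


# Illustrative (made-up) probabilities.
or_indep = odds_ratio(0.3 * 0.4, 0.3, 0.4)  # pi_ij equals the product of marginals
or_assoc = odds_ratio(0.2, 0.3, 0.4)        # pi_ij exceeds the product of marginals
print(or_indep, or_assoc)
```

Under marginal independence the result is 1 up to floating-point rounding, while a joint probability above the product of the marginals gives an odds ratio above 1.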

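Modified Pearson Statistic Under the Null
Response combinations $(W_i, Y_j)$: (1,1), (1,0), (0,1), (0,0)

             $Y_j = 1$                      $Y_j = 0$                                           Total
$W_i = 1$    $\pi_{ij}$                     $\pi_{i\cdot} - \pi_{ij}$                           $\pi_{i\cdot}$
$W_i = 0$    $\pi_{\cdot j} - \pi_{ij}$     $1 - \pi_{i\cdot} - \pi_{\cdot j} + \pi_{ij}$       $1 - \pi_{i\cdot}$
Total        $\pi_{\cdot j}$                $1 - \pi_{\cdot j}$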

The Statistic
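
Following the description given later in the talk, $X^2_S$ is the sum of $rc$ Pearson chi-square statistics, one per 2 × 2 item-pair table: $X^2_S = \sum_{i=1}^{r} \sum_{j=1}^{c} X^2_{S,i,j}$. A minimal Python sketch under that reading is shown here. The function names, the toy data, and the handling of tables with an empty margin are assumptions made for illustration, not the presenter's implementation.

```python
def pearson_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Assumption (not from the slides): a table with an empty margin
        # contributes 0 to the overall statistic.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def modified_pearson(W, Y):
    """Return (X2_S, parts), where parts[i][j] is X2_{S,i,j}."""
    n, r, c = len(W), len(W[0]), len(Y[0])
    parts = []
    for i in range(r):
        row = []
        for j in range(c):
            m_ij = sum(W[s][i] * Y[s][j] for s in range(n))  # both positive
            w_i = sum(W[s][i] for s in range(n))              # W_i positive
            y_j = sum(Y[s][j] for s in range(n))              # Y_j positive
            row.append(pearson_2x2(m_ij, w_i - m_ij,
                                   y_j - m_ij, n - w_i - y_j + m_ij))
        parts.append(row)
    return sum(sum(row) for row in parts), parts


# Hypothetical toy data: 6 subjects, r = 2 items for W, c = 2 items for Y.
W = [[1, 0], [1, 1], [0, 1], [1, 0], [0, 0], [0, 1]]
Y = [[1, 0], [1, 0], [0, 1], [1, 1], [0, 0], [0, 1]]
x2_s, parts = modified_pearson(W, Y)
print(round(x2_s, 3))
```

Because the $rc$ component statistics are computed from the same subjects, they are correlated, so $X^2_S$ is not referred to a standard chi-square distribution; the bootstrap supplies the reference distribution instead.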

Nonparametric Bootstrap
To resample under independence of W and Y, $W_s$ and $Y_s$ are independently resampled with replacement from the data set. The test statistic calculated for the $b$th resample of size $n$ is denoted by $X^2_{S,b}$. The p-value is calculated as
– $B^{-1} \sum_b I(X^2_{S,b} \ge X^2_S)$
where $B$ is the number of resamples taken and $I(\cdot)$ is the indicator function.
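
The resampling scheme can be sketched as follows: whole response vectors $W_s$ and $Y_s$ are drawn separately, which breaks any association between W and Y while preserving the dependence among items within each variable. All function names, the seed argument, and the toy data are hypothetical, and the zero contribution for degenerate tables is an assumption.

```python
import random


def pearson_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # Assumption: a table with an empty margin contributes 0.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def x2_stat(W, Y):
    """Sum of the 2 x 2 Pearson statistics over all (W_i, Y_j) item pairs."""
    n, total = len(W), 0.0
    for i in range(len(W[0])):
        for j in range(len(Y[0])):
            m = sum(W[s][i] * Y[s][j] for s in range(n))
            w = sum(W[s][i] for s in range(n))
            y = sum(Y[s][j] for s in range(n))
            total += pearson_2x2(m, w - m, y - m, n - w - y + m)
    return total


def bootstrap_p_value(W, Y, B=1000, seed=0):
    """Proportion of resampled statistics at least as large as the observed one."""
    rng = random.Random(seed)
    n = len(W)
    observed = x2_stat(W, Y)
    exceed = 0
    for _ in range(B):
        # Resample the W and Y response vectors independently of each other.
        W_star = [W[rng.randrange(n)] for _ in range(n)]
        Y_star = [Y[rng.randrange(n)] for _ in range(n)]
        if x2_stat(W_star, Y_star) >= observed:
            exceed += 1
    return exceed / B


# Hypothetical toy data: 6 subjects, 2 items per variable.
W = [[1, 0], [1, 1], [0, 1], [1, 0], [0, 0], [0, 1]]
Y = [[1, 0], [1, 0], [0, 1], [1, 1], [0, 0], [0, 1]]
p_value = bootstrap_p_value(W, Y, B=1000, seed=0)
print(p_value)
```

Fixing the seed makes the sketch reproducible; in practice the choice of $B$ controls the Monte Carlo error of the estimated p-value.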

Bootstrap p-Value Combination Methods
Each $X^2_{S,i,j}$ gives a test for independence between each $W_i$ and $Y_j$ pair for $i = 1, \ldots, r$ and $j = 1, \ldots, c$. The p-values from each of these tests (using a $\chi^2_1$ approximation) can be combined to form a new statistic, $\tilde{p}$. The product of the $r \times c$ p-values or the minimum of the $r \times c$ p-values could be used as $\tilde{p}$. The p-value is calculated as
– $B^{-1} \sum_b I(\tilde{p}^*_b \le \tilde{p})$
where $\tilde{p}^*_b$ is the value of $\tilde{p}$ computed from the $b$th resample.
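
A minimal sketch of the combination approach follows. It relies on the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ to obtain the per-pair p-values with only the standard library. The function names, the `method` argument, the toy data, and the zero statistic assigned to degenerate tables are illustrative assumptions.

```python
import math
import random


def chi2_1_sf(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def pearson_2x2(a, b, c, d):
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # Assumption: a table with an empty margin gets statistic 0 (p-value 1).
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def combined_p(W, Y, method="product"):
    """p tilde: product or minimum of the r x c pairwise chi-square p-values."""
    n, pvals = len(W), []
    for i in range(len(W[0])):
        for j in range(len(Y[0])):
            m = sum(W[s][i] * Y[s][j] for s in range(n))
            w = sum(W[s][i] for s in range(n))
            y = sum(Y[s][j] for s in range(n))
            pvals.append(chi2_1_sf(pearson_2x2(m, w - m, y - m, n - w - y + m)))
    return math.prod(pvals) if method == "product" else min(pvals)


def bootstrap_combined_p_value(W, Y, method="product", B=1000, seed=0):
    """Proportion of resampled p tilde values at most the observed p tilde."""
    rng = random.Random(seed)
    n = len(W)
    observed = combined_p(W, Y, method)
    count = 0
    for _ in range(B):
        W_star = [W[rng.randrange(n)] for _ in range(n)]
        Y_star = [Y[rng.randrange(n)] for _ in range(n)]
        if combined_p(W_star, Y_star, method) <= observed:
            count += 1
    return count / B


# Hypothetical toy data: 6 subjects, 2 items per variable.
W = [[1, 0], [1, 1], [0, 1], [1, 0], [0, 0], [0, 1]]
Y = [[1, 0], [1, 0], [0, 1], [1, 1], [0, 0], [0, 1]]
p_product = bootstrap_combined_p_value(W, Y, method="product")
p_minimum = bootstrap_combined_p_value(W, Y, method="minimum")
print(p_product, p_minimum)
```

The inequality points the other way from the $X^2_S$ bootstrap because small values of $\tilde{p}$, not large ones, indicate evidence against SPMI.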

Results from the Farmer Data
Method                             My p-value                        Authors' p-value
Bootstrap $X^2_S$                  < [value missing in transcript]
Bootstrap product of p-values
Bootstrap minimum p-value

Interpretation and Follow-Up
The p-values show strong evidence against SPMI. Since $X^2_S$ is the sum of $rc$ different Pearson chi-square test statistics, each $X^2_{S,i,j}$ can be used to investigate why SPMI is rejected. The individual tests can be done using an asymptotic $\chi^2_1$ approximation or the estimated sampling distribution of the individual statistics calculated in the proposed bootstrap procedures. When this is done, the significant combinations are (Lagoon, Professional consultant), (Lagoon, Veterinarian), (Pit, Veterinarian), (Pit, Feed companies & representatives), (Natural drainage, Professional consultant), and (Natural drainage, Magazines).

Simulation Study
The goal is to determine which testing procedures hold the correct size under a range of different situations and have power to detect various alternative hypotheses. 500 data sets are generated for each simulation setting investigated. The SPMI testing methods are applied (B = 1000), and for each method the proportion of data sets for which SPMI is rejected at the 0.05 nominal level is recorded.

My Results
n = 100, 2 × 2 marginal table, OR = 25
Method                             My p-value    Authors' p-value
Bootstrap $X^2_S$
Bootstrap product of p-values
Bootstrap minimum p-value

Conclusion
The bootstrap methods generally hold the correct size.