EPI809/Spring 20081 Fishers Exact Test Fishers Exact Test is a test for independence in a 2 X 2 table. It is most useful when the total sample size and.

Slides:

Advertisements

Similar presentations

1 2 Test for Independence 2 Test for Independence.

Advertisements

1 2 Test for Independence 2 Test for Independence.

NIPRL Chapter 10. Discrete Data Analysis 10.1 Inferences on a Population Proportion 10.2 Comparing Two Population Proportions 10.3 Goodness of Fit Tests.

EPI809/Spring Chapter 10 Hypothesis testing: Categorical Data Analysis.

Chapter 16: Association and Fisher’s Exact Test

Contingency Table Analysis Mary Whiteside, Ph.D..

Functions of Random Variables. Method of Distribution Functions X 1,…,X n ~ f(x 1,…,x n ) U=g(X 1,…,X n ) – Want to obtain f U (u) Find values in (x 1,…,x.

Categorical Data Analysis

Statistical inference uses impersonal chance to draw conclusions about a population or process based on data drawn from a random sample or randomized.

Special random variables Chapter 5 Some discrete or continuous probability distributions.

Hypothesis Testing IV Chi Square.

1 Set #3: Discrete Probability Functions Define: Random Variable – numerical measure of the outcome of a probability experiment Value determined by chance.

Unit 5 Section : The Binomial Distribution  Situations that can be reduces to two outcomes are known as binomial experiments.  Binomial experiments.

Chapter 8 Interval Estimation Population Mean:  Known Population Mean:  Known Population Mean:  Unknown Population Mean:  Unknown n Determining the.

Data mining with the Gene Ontology Josep Lluís Mosquera April 2005 Grup de Recerca en Estadística i Bioinformàtica GOing into Biological Meaning.

Assuming normally distributed data! Naïve Bayes Classifier.

12.The Chi-square Test and the Analysis of the Contingency Tables 12.1Contingency Table 12.2A Words of Caution about Chi-Square Test.

AGC DSP AGC DSP Professor A G Constantinides© Estimation Theory We seek to determine from a set of data, a set of parameters such that their values would.

Stat 301 – Day 20 Fisher’s Exact Test. Announcements HW 4 (45 pts)  Classify variables (categorical vs. quantitative)  Round vs. truncating to produce.

Stat 301 – Day 21 Large sample methods. Announcements HW 4  Updated solutions Especially Simpson’s Paradox  Should always show your work and explain.

Less on Third Tee Moron Con, Tin Gents See Tables.

Previous Lecture: Analysis of Variance

Multivariate Probability Distributions. Multivariate Random Variables In many settings, we are interested in 2 or more characteristics observed in experiments.

Quiz 4  Probability Distributions. 1. In families of three children what is the mean number of girls (assuming P(girl)=0.500)? a) 1 b) 1.5 c) 2 d) 2.5.

HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved. Section 5.2.

Lesson 6 – 2b Hyper-Geometric Probability Distribution.

1 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. Analysis of Categorical Data Test of Independence.

Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal

Lecture 4 1 Discrete distributions Four important discrete distributions: 1.The Uniform distribution (discrete) 2.The Binomial distribution 3.The Hyper-geometric.

Chapter 16 – Categorical Data Analysis Math 22 Introductory Statistics.

Bernoulli Trials Two Possible Outcomes –Success, with probability p –Failure, with probability q = 1  p Trials are independent.

Discrete Probability Distributions. Random Variable Random variable is a variable whose value is subject to variations due to chance. A random variable.

Binomial Probability Distribution

The Binomial Distribution

Pemodelan Kualitas Proses Kode Matakuliah: I0092 – Statistik Pengendalian Kualitas Pertemuan : 2.

Data Analysis for Two-Way Tables. The Basics Two-way table of counts Organizes data about 2 categorical variables Row variables run across the table Column.

Single-Factor Studies KNNL – Chapter 16. Single-Factor Models Independent Variable can be qualitative or quantitative If Quantitative, we typically assume.

The Binomial Distribution

4.2 Binomial Distributions

Statistics 3502/6304 Prof. Eric A. Suess Chapter 4.

AP Statistics Section 11.1 B More on Significance Tests.

Session IV: Contingency Tables Tests of Association and Independence (Zar, Chapters 23, 24.10)

Section 7.5: Binomial and Geometric Distributions

Chapter 14 Chi-Square Tests.  Hypothesis testing procedures for nominal variables (whose values are categories)  Focus on the number of people in different.

MATH 4030 – 11A ANALYSIS OF R X C TABLES (GOODNESS-OF-FIT TEST) 1.

This is a discrete distribution. Situations that can be modeled with the binomial distribution must have these 4 properties: Only two possible outcomes.

Warm Up When rolling an unloaded die 10 times, the number of time you roll a 1 is the count X of successes in each independent observations. 1. Is this.

The Hypergeometric Distribution. Properties Exactly two outcomes Fixed number of trials Outcomes must be mutually exclusive Probabilities are dependent.

Chapter 31Introduction to Statistical Quality Control, 7th Edition by Douglas C. Montgomery. Copyright (c) 2012 John Wiley & Sons, Inc.

1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 12 Tests of Goodness of Fit and Independence n Goodness of Fit Test: A Multinomial.

Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Chapter 5 Discrete Random Variables.

Hypothesis Testing Steps for the Rejection Region Method State H 1 and State H 0 State the Test Statistic and its sampling distribution (normal or t) Determine.

What parameter is being tested? Categorical  proportionNumeric  mean.

+ Chapter 8 Day 3. + Warm - Up Shelly is a telemarketer selling cookies over the phone. When a customer picks up the phone, she sells cookies 25% of the.

Chap 5-1 Chapter 5 Discrete Random Variables and Probability Distributions Statistics for Business and Economics 6 th Edition.

Chapter 11 Chi Square Distribution and Its applications.

Chi Square Chi square is employed to test the difference between an actual sample and another hypothetical or previously established distribution such.

Exact probability test (Fisher’s Test) Oh Hun Kwon.

MATH 2311 Section 3.3.

Ch3.5 Hypergeometric Distribution

Hypothesis testing. Chi-square test

Data Analysis for Two-Way Tables

Review of Hypothesis Testing

Relations in Categorical Data

Exam 4/19 40 – 50 questions 1.5 hours long

Examining Two Proportions

Section 11.7 Probability.

Exact Test Fisher’s Statistics

Hypothesis Testing - Chi Square

Presentation transcript:

EPI809/Spring Fishers Exact Test Fishers Exact Test is a test for independence in a 2 X 2 table. It is most useful when the total sample size and the expected values are small. The test holds the marginal totals fixed and computes the hypergeometric probability that n 11 is at least as large as the observed value Fishers Exact Test is a test for independence in a 2 X 2 table. It is most useful when the total sample size and the expected values are small. The test holds the marginal totals fixed and computes the hypergeometric probability that n 11 is at least as large as the observed value Useful when E(cell counts) < 5. Useful when E(cell counts) < 5.

EPI809/Spring Hypergeometric distribution Example: 2x2 table with cell counts a, b, c, d. Assuming marginal totals are fixed: Example: 2x2 table with cell counts a, b, c, d. Assuming marginal totals are fixed: M1 = a+b, M2 = c+d, N1 = a+c, N2 = b+d. for convenience assume N1<N2, M1<M2. possible value of a are: 0, 1, …min(M1,N1). Probability distribution of cell count a follows a hypergeometric distribution: Probability distribution of cell count a follows a hypergeometric distribution: N = a + b + c + d = N1 + N2 = M1 + M2 Pr (x=a) = N1!N2!M1!M2! / [N!a!b!c!d!] Pr (x=a) = N1!N2!M1!M2! / [N!a!b!c!d!] Mean (x) = M1N1/ N Mean (x) = M1N1/ N Var (x) = M1M2N1N2 / [N 2 (N-1)] Var (x) = M1M2N1N2 / [N 2 (N-1)] Fisher exact test is based on this hypergeometric distr. Fisher exact test is based on this hypergeometric distr.

EPI809/Spring Fishers Exact Test Example Is HIV Infection related to Hx of STDs in Sub Saharan African Countries? Test at 5% level. Is HIV Infection related to Hx of STDs in Sub Saharan African Countries? Test at 5% level. yesnototal yes3710 no51015 total817 HIV Infection Hx of STDs

EPI809/Spring Hypergeometric prob. Probability of observing this specific table given fixed marginal totals is Probability of observing this specific table given fixed marginal totals is Pr (3,7, 5, 10) = 10!15!8!17!/[25!3!7!5!10!] = Note the above is not the p-value. Why? Note the above is not the p-value. Why? Not the accumulative probability, or not the tail probability. Not the accumulative probability, or not the tail probability. Tail prob = sum of all values (a = 3, 2, 1, 0). Tail prob = sum of all values (a = 3, 2, 1, 0).

EPI809/Spring Hypergeometric prob Pr (2, 8, 6, 9) = 10!15!8!17!/[25!2!8!6!9!] = Pr (1, 9, 7, 8) = 10!15!8!17!/[25!1!9!7!8!] = Pr (0,10, 8, 7) = 10!15!8!17!/[25!0!10!8!7!] = Tail prob = =.6068

EPI809/Spring Data dis; input STDs $ HIV $ count; cards; no no 10 No Yes 5 yes no 7 yes yes 3 ;run; proc freq data=dis order=data; weight Count; weight Count; tables STDs*HIV/chisq fisher; tables STDs*HIV/chisq fisher;run; Fishers Exact Test SAS Codes

EPI809/Spring Pearson Chi-squares test Yates correction Pearson Chi-squares test Pearson Chi-squares test χ 2 = i (O i -E i ) 2 /E i follows a chi-squares distribution with df = (r-1)(c-1) χ 2 = i (O i -E i ) 2 /E i follows a chi-squares distribution with df = (r-1)(c-1) if E i 5. Yates correction for more accurate p-value Yates correction for more accurate p-value χ 2 = i (|O i -E i | - 0.5) 2 /E i χ 2 = i (|O i -E i | - 0.5) 2 /E i when O i and E i are close to each other.

EPI809/Spring Fishers Exact Test SAS Output Statistics for Table of STDs by HIV Statistic DF Value Prob ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Chi-Square Likelihood Ratio Chi-Square Continuity Adj. Chi-Square Mantel-Haenszel Chi-Square Phi Coefficient Contingency Coefficient Cramer's V WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test. Fisher's Exact Test ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Cell (1,1) Frequency (F) 10 Left-sided Pr <= F Right-sided Pr >= F Table Probability (P) Two-sided Pr <= P

EPI809/Spring Fishers Exact Test The output consists of three p-values: Left: Use this when the alternative to independence is that there is negative association between the variables. That is, the observations tend to lie in lower left and upper right. Right: Use this when the alternative to independence is that there is positive association between the variables. That is, the observations tend to lie in upper left and lower right. 2-Tail: Use this when there is no prior alternative.

EPI809/Spring Useful Measures of Association - Nominal Data Cohens Kappa ( ) Cohens Kappa ( ) Also referred to as Cohens General Index of Agreement. It was originally developed to assess the degree of agreement between two judges or raters assessing n items on the basis of a nominal classification for 2 categories. Subsequent work by Fleiss and Light presented extensions of this statistic to more than 2 categories. Also referred to as Cohens General Index of Agreement. It was originally developed to assess the degree of agreement between two judges or raters assessing n items on the basis of a nominal classification for 2 categories. Subsequent work by Fleiss and Light presented extensions of this statistic to more than 2 categories.

EPI809/Spring Useful Measures of Association - Nominal Data Cohens Kappa ( ) Cohens Kappa ( ) Inspector A GoodBad Inspector B Bad Good.40 (p B ).60 (q B ) (p A ) (q A )

EPI809/Spring Useful Measures of Association - Nominal Data Cohens Kappa ( ) Cohens Kappa ( ) Cohens requires that we calculate two values: Cohens requires that we calculate two values: p o : the proportion of cases in which agreement occurs. In our example, this value equals p o : the proportion of cases in which agreement occurs. In our example, this value equals Pe : the proportion of cases in which agreement would have been expected due purely to chance, based upon the marginal frequencies; where Pe : the proportion of cases in which agreement would have been expected due purely to chance, based upon the marginal frequencies; where p e = p A p B + q A q B = for our data

EPI809/Spring Useful Measures of Association - Nominal Data Cohens Kappa ( ) Cohens Kappa ( ) Then, Cohens measures the agreement between two variables and is defined by Then, Cohens measures the agreement between two variables and is defined by = p o - p e 1 - p e = 0.593

EPI809/Spring Useful Measures of Association - Nominal Data Cohens Kappa ( ) Cohens Kappa ( ) To test the Null Hypothesis that the true kappa = 0, we use the Standard Error: To test the Null Hypothesis that the true kappa = 0, we use the Standard Error: then z = / ~N(0,1) then z = / ~N(0,1) = 1 (1 - p e ) n p e + p e 2 - p i. p.i (p i. + p.i ) i = 1 k where p i. & p.i refer to row and column proportions (in textbook, ai= p i. & bi=p.i )

EPI809/Spring Useful Measures of Association - Nominal Data- SAS CODES Data kap; input B $ A $ prob; n=100; count=prob*n;cards; Good Good.33 Good Bad.07 Bad Good.13 Bad Bad.47 ; run; proc freq data=kap order=data; weight Count; weight Count; tables B*A/chisq; tables B*A/chisq; test kappa; test kappa; run;

EPI809/Spring Useful Measures of Association - Nominal Data- SAS OUTPUT The FREQ Procedure Statistics for Table of B by A Simple Kappa Coefficient ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Kappa ASE % Lower Conf Limit % Upper Conf Limit Test of H0: Kappa = 0 ASE under H Z One-sided Pr > Z <.0001 Two-sided Pr > |Z| <.0001 Sample Size = 100

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions The approximate test previously presented for assessing a difference in proportions is based upon the assumption that the two samples are independent. The approximate test previously presented for assessing a difference in proportions is based upon the assumption that the two samples are independent. Suppose, however, that we are faced with a situation where this is not true. Suppose we randomly-select 100 people, and find that 20% of them have flu. Then, imagine that we apply some type of treatment to all sampled peoples; and on a post-test, we find that 20% have flu. Suppose, however, that we are faced with a situation where this is not true. Suppose we randomly-select 100 people, and find that 20% of them have flu. Then, imagine that we apply some type of treatment to all sampled peoples; and on a post-test, we find that 20% have flu. Basis / Rationale for the Test

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions We might be tempted to suppose that no hypothesis test is required under these conditions, in that the Before and After p values are identical, and would surely result in a test statistic value of We might be tempted to suppose that no hypothesis test is required under these conditions, in that the Before and After p values are identical, and would surely result in a test statistic value of The problem with this thinking, however, is that the two sample p values are dependent, in that each person was assessed twice. It is possible that the 20 people that had flu originally still had flu. It is also possible that the 20 people that had flu on the second test were a completely different set of 20 people! The problem with this thinking, however, is that the two sample p values are dependent, in that each person was assessed twice. It is possible that the 20 people that had flu originally still had flu. It is also possible that the 20 people that had flu on the second test were a completely different set of 20 people!

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions It is for precisely this type of situation that McNemars Test for Correlated (Dependent) Proportions is applicable. It is for precisely this type of situation that McNemars Test for Correlated (Dependent) Proportions is applicable. McNemars Test employs two unique features for testing the two proportions: McNemars Test employs two unique features for testing the two proportions: * a special fourfold contingency table; with a * special-purpose chi-square ( 2 ) test statistic (the approximate test).

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions Nomenclature for the Fourfold (2 x 2) Contingency Table AB DC (B + D) (A + C) (C + D) (A + B) n where (A+B) + (C+D) = (A+C) + (B+D) = n = number of units evaluated and where df = 1

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions 1. Construct a 2x2 table where the paired observations are the sampling units. 2. Each observation must represent a single joint event possibility; that is, classifiable in only one cell of the contingency table. 3. In its Exact form, this test may be conducted as a One Sample Binomial for the B & C cells Underlying Assumptions of the Test

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions 4. The expected frequency (f e ) for the B and C cells on the contingency table must be equal to or greater than 5; where 4. The expected frequency (f e ) for the B and C cells on the contingency table must be equal to or greater than 5; where f e = (B + C) / 2 from the Fourfold table Underlying Assumptions of the Test

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions Sample Problem A randomly selected group of 120 students taking a standardized test for entrance into college exhibits a failure rate of 50%. A company which specializes in coaching students on this type of test has indicated that it can significantly reduce failure rates through a four-hour seminar. The students are exposed to this coaching session, and re-take the test a few weeks later. The school board is wondering if the results justify paying this firm to coach all of the students in the high school. Should they? Test at the 5% level.

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions Sample Problem The summary data for this study appear as follows:

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions The data are then entered into the Fourfold Contingency table: Before PassFail Pass Fail After

EPI809/Spring Step I : State the Null & Research Hypotheses Step I : State the Null & Research Hypotheses H 0 : = H 0 : = H 1 : H 1 : where and relate to the proportion of observations reflecting changes in status (the B & C cells in the table) Step II : 0.05 Step II : 0.05 McNemars Test for Correlated (Dependent) Proportions

EPI809/Spring Step III : State the Associated Test Statistic Step III : State the Associated Test Statistic ABS (B - C) - 1 } 2 2 = B + C McNemars Test for Correlated (Dependent) Proportions

EPI809/Spring Step IV : State the distribution of the Test Statistic When H o is True Step IV : State the distribution of the Test Statistic When H o is True 2 = 2 with 1 df when H o is True 2 = 2 with 1 df when H o is True d McNemars Test for Correlated (Dependent) Proportions

EPI809/Spring Step V : Reject H o if ABS ( 2 ) > 3.84 McNemars Test for Correlated (Dependent) Proportions X² Distribution X² Chi-Square Curve w. df=1

EPI809/Spring Step VI : Calculate the Value of the Test Statistic Step VI : Calculate the Value of the Test Statistic McNemars Test for Correlated (Dependent) Proportions ABS (56 - 4) - 1 } 2 2 = =

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions-SAS Codes Data test; input Before $ After $ count; cards; pass pass 56 pass fail 56 fail pass 4 fail fail 4; run; proc freq data=test order=data; weight Count; weight Count; tables Before*After/agree; tables Before*After/agree; run;

EPI809/Spring McNemars Test for Correlated (Dependent) Proportions-SAS Output Statistics for Table of Before by After McNemar's Test ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Statistic (S) DF 1 Pr > S <.0001 Sample Size = 120 Without the correction

EPI809/Spring Conclusion: What we have learned 1. Comparison of binomial proportion using Z and 2 Test. 2. Explain 2 Test for Independence of 2 variables 3. Explain The Fishers test for independence 4. McNemars tests for correlated data 5. Kappa Statistic 6. Use of SAS Proc FREQ

EPI809/Spring Conclusion: Further readings Read textbook for Read textbook for 1. Power and sample size calculation 2. Tests for trends