Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to Biostatistics (ZJU 2008) Wenjiang Fu, Ph.D Associate Professor Division of Biostatistics, Department of Epidemiology Michigan State University.

Similar presentations


Presentation on theme: "Introduction to Biostatistics (ZJU 2008) Wenjiang Fu, Ph.D Associate Professor Division of Biostatistics, Department of Epidemiology Michigan State University."— Presentation transcript:

1

2 Introduction to Biostatistics (ZJU 2008) Wenjiang Fu, Ph.D Associate Professor Division of Biostatistics, Department of Epidemiology Michigan State University East Lansing, Michigan 48824, USA Email: fuw@msu.edu fuw@msu.edu www: http://www.msu.edu/~fuw http://www.msu.edu/~fuw

3 Categorical Data Analysis Examples of categorical data. Examples of categorical data. Qualitative Random Variables Yield Responses That Can Be Put In Categories. Example: Gender (Male, Female), disease (+, -). Measurement or Count Reflect # in Category Measurement or Count Reflect # in Category Nominal (no order) or Ordinal Scale (order) Nominal (no order) or Ordinal Scale (order) Data can be collected as continuous but recoded to categorical data. Example (Systolic Blood Pressure - Hypotension, Normal tension, hypertension ) Data can be collected as continuous but recoded to categorical data. Example (Systolic Blood Pressure - Hypotension, Normal tension, hypertension ) Counts of episodes of incidence. Counts of episodes of incidence.

4 Categorical Data Analysis Responses are not continuous, cannot use linear regression model or ANOVA model. Responses are not continuous, cannot use linear regression model or ANOVA model. Not even to use transformations to make response variable meaningful under new scale. Not even to use transformations to make response variable meaningful under new scale. Need to introduce new methods to analyze data. Need to introduce new methods to analyze data.

5 Test for Two Proportions

6 Z Test for Two Proportions 1.Assumptions Populations Are Independent Populations Are Independent Populations Follow Binomial Distribution Populations Follow Binomial Distribution Normal Approximation Can Be Used for large samples (All Expected Counts  5) Normal Approximation Can Be Used for large samples (All Expected Counts  5) 2. Z-Test Statistic for Two Proportions

7 Sample Distribution for Difference Between Proportions

8 Z Test for Two Proportions Thinking Challenge You’re an epidemiologist for the US Department of Health and Human Services. You’re studying the prevalence of disease X in two states (MA and CA). In MA, 74 of 1500 people surveyed were diseased and in CA, 129 of 1500 were diseased. At.05 level, does MA have a lower prevalence rate? You’re an epidemiologist for the US Department of Health and Human Services. You’re studying the prevalence of disease X in two states (MA and CA). In MA, 74 of 1500 people surveyed were diseased and in CA, 129 of 1500 were diseased. At.05 level, does MA have a lower prevalence rate? MA CA

9 Test Statistic: Decision:Conclusion: Z Test for Two Proportions Solution* H 0 : p MA - p CA = 0 H a : p MA - p CA < 0  =.05 n MA = 1500 n CA = 1500 Critical Value(s):

10 Z Test for Two Proportions Solution*

11 Z = -4.00 Z Test for Two Proportions Solution* H 0 : p MA - p CA = 0 H a : p MA - p CA < 0  =.05 n MA = 1500 n CA = 1500 Critical Value(s): Test Statistic: Decision:Conclusion:

12 Z = -4.00 Z Test for Two Proportions Solution* H 0 : p MA - p CA = 0 H a : p MA - p CA < 0  =.05 n MA = 1500 n CA = 1500 Critical Value(s): Test Statistic: Decision:Conclusion: Reject at  =.05

13 Z = -4.00 Z Test for Two Proportions Solution* H 0 : p MA - p CA = 0 H a : p MA - p CA < 0  =.05 n MA = 1500 n CA = 1500 Critical Value(s): Test Statistic: Decision:Conclusion: Reject at  =.05 There is evidence MA is less than CA

14 Chi-Square (  2 ) Test for k Proportions 1.Tests Equality (=) of Proportions Only Example: p 1 =.2, p 2 =.3, p 3 =.5 2.One Variable With Several Levels 3.Assumptions Multinomial Experiment Large Sample Size (All Expected Counts  5) 4.Uses One-Way Contingency Table

15 Multinomial Experiment 1. n Identical Trials 2. k Outcomes to Each Trial 3. Constant Outcome Probability, p 1,…, p k 4. Independent Trials 5. Random Variable is Count, n 1,…, n k 6. Example: In health services research, ask 100 workers Which of 3 health insurance plans (k=3) they prefer

16 One-Way Contingency Table 1.Shows # Observations in k Independent Groups (Outcomes or Variable Levels) Outcomes (k = 3) Number of responses

17 1.Hypotheses H 0 : p 1 = p 1,0, p 2 = p 2,0,..., p k = p k,0 H a : Not all p i are equal 2.Test Statistic 3.Degrees of Freedom: k - 1  2 Test for k Proportions Hypotheses & Statistic Observed count Expected count Number of outcomes Hypothesized probability

18  2 Test Basic Idea 1.Compares Observed Count to Expected Count under the Null Hypothesis 2.The Closer Observed Count to Expected Count, the More Likely the H 0 Is True Measured by Squared Difference Relative to Expected Count Measured by Squared Difference Relative to Expected Count Reject Large Values Reject Large Values

19 Finding Critical Value Example What is the critical  2 value if k = 3, &  =.05?  =.05  2 Table (Portion) df= k - 1 = 2 If n i = E(n i ),  2 = 0. Do not reject H 0

20 As an MD epidemiologist, you want to test the difference in prevention measures for patients with CVD diseases Hx. Of 180 patients, 63 were following strict prevention measures regularly (taking beta- blockers, aspirin, etc…), 45 were taking some prevention measures but not on a regular basis and 72 were not following any preventions. At the.05 level, is there a difference in prevention measures? As an MD epidemiologist, you want to test the difference in prevention measures for patients with CVD diseases Hx. Of 180 patients, 63 were following strict prevention measures regularly (taking beta- blockers, aspirin, etc…), 45 were taking some prevention measures but not on a regular basis and 72 were not following any preventions. At the.05 level, is there a difference in prevention measures?  2 Test for k Proportions Example

21  2 Test for k Proportions Solution H 0 : p 1 = p 2 = p 3 = 1/3 H a : At least 1 is different  =.05 n 1 = 63 n 2 = 45 n 3 = 72 Critical Value(s): Test Statistic: Decision:Conclusion:  =.05

22  2 Test for k Proportions Solution

23 H 0 : p 1 = p 2 = p 3 = 1/3 H a : At least 1 is different  =.05 n 1 = 63 n 2 = 45 n 3 = 72 Critical Value(s): Test Statistic: Decision:Conclusion: Reject at  =.05 There is evidence of a difference in proportions  =.05  2 = 6.3

24  2 Test for k Proportions SAS Codes Data CVD; input method count; datalines; 1 63 2 45 3 72 ; run; proc freq data=CVD order=data; weight Count; weight Count; tables method/nocum testp=(0.333 0.333 0.333); tables method/nocum testp=(0.333 0.333 0.333); run;

25  2 Test for k Proportions SAS Output The FREQ Procedure Test method Frequency Percent Percent --------------------------------------------------------- 1 63 35.00 33.30 2 45 25.00 33.30 3 72 40.00 33.30 Chi-Square Test for Specified Proportions -------------------------------------------- Chi-Square 6.3065 DF 2 Pr > ChiSq 0.0427 Sample Size = 180

26 R program for Chisq.test chisq.test(c(63,45,72),p=c(1/3,1/3,1/3)) chisq.test(c(63,45,72),p=c(1/3,1/3,1/3)) Chi-squared test for given probabilities data: c(63, 45, 72) X-squared = 6.3, df = 2, p-value = 0.04285 X-squared = 6.3, df = 2, p-value = 0.04285

27 Relation to the Z test for comparison of two proportions? When k=2, only two groups are to be compared, the Z test and the  2 Test for 2 Proportions are equivalent tests.(p 1 =p 2 =1/2) When k=2, only two groups are to be compared, the Z test and the  2 Test for 2 Proportions are equivalent tests.(p 1 =p 2 =1/2) Notice that when k=2, the  2 Test for 2 Proportions will have 1df Notice that when k=2, the  2 Test for 2 Proportions will have 1df In distribution,  2 = Z 2 In distribution,  2 = Z 2

28  2 Test of Homogeneity 1.Shows If a Relationship Exists Between 2 Qualitative Variables, but does Not Show Causality (no specification of dependency) 2.Assumptions Multinomial Experiment All Expected Counts  5 3.Uses Two-Way Contingency Table

29  2 Test of Homogeneity Contingency Table 1.Shows # Observations From 1 Sample Jointly in 2 Qualitative Variables Levels of variable 2 Levels of variable 1

30  2 Test of Homogeneity Hypotheses & Statistic 1.Hypotheses H 0 : Homogeneity or equal proportions H a : Heterogeneity or unequal proportions 2.Test Statistic Degrees of Freedom: (r - 1)(c - 1) Rows Columns Observed count Expected count

31  2 Test of Homogeneity Expected Counts 1.Statistical Independence Means Joint Probability Equals Product of Marginal Probabilities 2.Compute Marginal Probabilities & Multiply for Joint Probability 3.Expected Count Is Sample Size Times Joint Probability

32 Expected Count Example 112 160 Marginal probability =

33 Expected Count Example 112 160 78 160 Marginal probability =

34 Expected Count Example 112 160 78 160 Marginal probability = Joint probability = 112 160 78 160 Expected count = 160· 112 160 78 160 = 54.6

35 Expected Count Calculation 112x82 160 48x78 160 48x82 160 112x78 160

36 You randomly sample 286 sexually active individuals and collect information on their HIV status and History of STDs. At the.05 level, is there evidence of a relationship? You randomly sample 286 sexually active individuals and collect information on their HIV status and History of STDs. At the.05 level, is there evidence of a relationship?  2 Test of Homogeneity Example on HIV

37  2 Test of Homogeneity Solution H 0 : No Relationship H a : Relationship  =.05 df = (2 - 1)(2 - 1) = 1 Critical Value(s): Test Statistic: Decision:Conclusion:  =.05

38 E(n ij )  5 in all cells 170x132 286 170x154 286 116x132 286 154x132 286  2 Test of Homogeneity Solution

39

40 H 0 : No Relationship H a : Relationship  =.05 df = (2 - 1)(2 - 1) = 1 Critical Value(s): Test Statistic: Decision:Conclusion: Reject at  =.05  =.05  2 = 54.29

41  2 Test of Homogeneity SAS CODES Data dis; input STDs HIV count; cards; 1 1 84 1 2 32 2 1 48 2 2 122 ; run; Proc freq data=dis order=data; weight Count; weight Count; tables STDs*HIV/chisq; tables STDs*HIV/chisq; run;

42  2 Test of Homogeneity SAS OUTPUT Statistics for Table of STDs by HIV Statistic DF Value Prob ------------------------------------------------------- Chi-Square 1 54.1502 <.0001 Likelihood Ratio Chi-Square 1 55.7826 <.0001 Continuity Adj. Chi-Square 1 52.3871 <.0001 Mantel-Haenszel Chi-Square 1 53.9609 <.0001 Phi Coefficient 0.4351 Contingency Coefficient 0.3990 Cramer's V 0.4351 Continuity Correction ∑ i (|O i -E i |-0.5) 2 / E i

43 Fisher’s Exact Test Fisher’s Exact Test is a test for independence in a 2 X 2 table. It is most useful when the total sample size and the expected values are small. The test holds the marginal totals fixed and computes the hypergeometric probability that n 11 is at least as large as the observed value Fisher’s Exact Test is a test for independence in a 2 X 2 table. It is most useful when the total sample size and the expected values are small. The test holds the marginal totals fixed and computes the hypergeometric probability that n 11 is at least as large as the observed value

44 Fisher’s Exact Test Example Is HIV Infection related to Hx of STDs in Sub Saharan African Countries? Test at 5% level. Is HIV Infection related to Hx of STDs in Sub Saharan African Countries? Test at 5% level. yesnototal yes3710 no51015 total817 HIV Infection Hx of STDs

45 Data dis; input STDs $ HIV $ count; cards; no no 10 No Yes 5 yes no 7 yes yes 3 ;run; proc freq data=dis order=data; weight Count; weight Count; tables STDs*HIV/chisq; tables STDs*HIV/chisq;run; Fisher’s Exact Test SAS Codes

46 Fisher’s Exact Test SAS Output Statistics for Table of STDs by HIV Statistic DF Value Prob ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Chi-Square 1 0.0306 0.8611 Likelihood Ratio Chi-Square 1 0.0308 0.8608 Continuity Adj. Chi-Square 1 0.0000 1.0000 Mantel-Haenszel Chi-Square 1 0.0294 0.8638 Phi Coefficient -0.0350 Contingency Coefficient 0.0350 Cramer's V -0.0350 WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test. Fisher's Exact Test ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Cell (1,1) Frequency (F) 10 Left-sided Pr <= F 0.6069 Right-sided Pr >= F 0.7263 Table Probability (P) 0.3332 Two-sided Pr <= P 1.0000

47 Fisher’s Exact Test The output consists of three p-values and one table probability. Left: Use this when the alternative to independence is that there is negative association between the variables. That is, the observations tend to lie in lower left and upper right. Right: Use this when the alternative to independence is that there is positive association between the variables. That is, the observations tend to lie in upper left and lower right. 2-Tail: Use this when there is no prior alternative. Table probability – the probability for this table to occur – not a tail probability or not a p-value.

48 McNemar’s Test for Correlated (Dependent) Proportions The approximate test previously presented for assessing a difference in proportions is based upon the assumption that the two samples are independent. The approximate test previously presented for assessing a difference in proportions is based upon the assumption that the two samples are independent. Suppose, however, that we are faced with a situation where this is not true. Suppose we randomly-select 100 people, and find that 20% of them have flu. Then, imagine that we apply some type of treatment to all sampled peoples; and on a post- test, we find that 20% have flu. Suppose, however, that we are faced with a situation where this is not true. Suppose we randomly-select 100 people, and find that 20% of them have flu. Then, imagine that we apply some type of treatment to all sampled peoples; and on a post- test, we find that 20% have flu. Basis / Rationale for the Test

49 McNemar’s Test for Correlated (Dependent) Proportions We might be tempted to suppose that no hypothesis test is required under these conditions, in that the ‘Before’ and ‘After’ probabilities are identical, and would surely result in a test statistic value of 0.00. We might be tempted to suppose that no hypothesis test is required under these conditions, in that the ‘Before’ and ‘After’ probabilities are identical, and would surely result in a test statistic value of 0.00. The problem with this thinking, however, is that the two sample p values are dependent, in that each person was assessed twice. It is possible that the 20 people that had flu originally still had flu. It is also possible that the 20 people that had flu on the second test were a completely different set of 20 people! The problem with this thinking, however, is that the two sample p values are dependent, in that each person was assessed twice. It is possible that the 20 people that had flu originally still had flu. It is also possible that the 20 people that had flu on the second test were a completely different set of 20 people!

50 McNemar’s Test for Correlated (Dependent) Proportions It is for precisely this type of situation that McNemar’s Test for Correlated (Dependent) Proportions is applicable. It is for precisely this type of situation that McNemar’s Test for Correlated (Dependent) Proportions is applicable. McNemar’s Test employs two unique features for testing the two proportions: McNemar’s Test employs two unique features for testing the two proportions: * a special fourfold contingency table; with a * special-purpose chi-square (  2 ) test statistic (the approximate test).

51 McNemar’s Test for Correlated (Dependent) Proportions Sample Problem A randomly selected group of 120 students taking a standardized test for entrance into college exhibits a failure rate of 50%. A company which specializes in coaching students on this type of test has indicated that it can significantly reduce failure rates through a four-hour seminar. The students are exposed to this coaching session, and re-take the test a few weeks later. The school board is wondering if the results justify paying this firm to coach all of the students in the high school. Should they? Test at the 5% level.

52 McNemar’s Test for Correlated (Dependent) Proportions Sample Problem The summary data for this study appear as follows:

53 McNemar’s Test for Correlated (Dependent) Proportions Nomenclature for the Fourfold (2 x 2) Contingency Table AB DC (B + D) (A + C) (C + D) (A + B) n where (A+B) + (C+D) = (A+C) + (B+D) = n = number of units evaluated and where df = 1

54 McNemar’s Test for Correlated (Dependent) Proportions 1. Construct a 2x2 table where the paired observations are the sampling units. 2. Each observation must represent a single joint event possibility; that is, classifiable in only one cell of the contingency table. 3. In it’s Exact form, this test may be conducted as a One Sample Binomial for the B & C cells Underlying Assumptions of the Test

55 McNemar’s Test for Correlated (Dependent) Proportions 4. The expected frequency (f e ) for the B and C cells on the contingency table must be equal to or greater than 5; where 4. The expected frequency (f e ) for the B and C cells on the contingency table must be equal to or greater than 5; where f e = (B + C) / 2 from the Fourfold table Underlying Assumptions of the Test

56 McNemar’s Test for Correlated (Dependent) Proportions The data are then entered into the Fourfold Contingency table: 5656 44 6060 8 112 120 Before PassFail Pass Fail After

57 Step I : State the Null & Research Hypotheses Step I : State the Null & Research Hypotheses H 0 :   =   H 1 :      where   and    relate to the proportion of observations reflecting changes in status (the B & C cells in the table) Step II :   0.05 Step II :   0.05 McNemar’s Test for Correlated (Dependent) Proportions

58 Step III : State the Associated Test Statistic Step III : State the Associated Test Statistic   B – C| - 1 } 2  2 = B + C McNemar’s Test for Correlated (Dependent) Proportions Continuity Correction

59 Step IV : State the distribution of the Test Statistic When H o is True  2 =  2 with 1 df when H o is True  2 =  2 with 1 df when H o is True Step V : Reject Ho if test statistic  2 > 3.84 d McNemar’s Test for Correlated (Dependent) Proportions X² Distribution X²0 3.84 Chi-Square Curve w. 1 df

60 Step VI : Calculate the Value of the Test Statistic Step VI : Calculate the Value of the Test Statistic McNemar’s Test for Correlated (Dependent) Proportions   56 – 4| - 1 } 2  2 = = 43.35 56 + 4

61 McNemar’s Test for Correlated (Dependent) Proportions-SAS Codes Data test; input Before $ After $ count; cards; pass pass 56 pass fail 56 fail pass 4 fail fail 4; run; proc freq data=test order=data; weight Count; weight Count; tables Before*After/agree; tables Before*After/agree; run;

62 McNemar’s Test for Correlated (Dependent) Proportions-SAS Output Statistics for Table of Before by After McNemar's Test ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Statistic (S) 45.0667 DF 1 Pr > S <.0001 Sample Size = 120 Without the correction

63 R program for McNemar’s Test > mcnemar.test(matrix(c(56,56,4,4),nrow=2,byr=T),corr=T) > mcnemar.test(matrix(c(56,56,4,4),nrow=2,byr=T),corr=T) McNemar's Chi-squared test with continuity correction McNemar's Chi-squared test with continuity correction data: matrix(c(56, 56, 4, 4), nrow = 2, byr = T) McNemar's chi-squared = 43.35, df = 1, p-value = 4.577e-11 McNemar's chi-squared = 43.35, df = 1, p-value = 4.577e-11 > mcnemar.test(matrix(c(56,56,4,4),nrow=2,byr=T),corr=F) > mcnemar.test(matrix(c(56,56,4,4),nrow=2,byr=T),corr=F) McNemar's Chi-squared test McNemar's Chi-squared test data: matrix(c(56, 56, 4, 4), nrow = 2, byr = T) McNemar's chi-squared = 45.0667, df = 1, p-value = 1.904e-11 McNemar's chi-squared = 45.0667, df = 1, p-value = 1.904e-11


Download ppt "Introduction to Biostatistics (ZJU 2008) Wenjiang Fu, Ph.D Associate Professor Division of Biostatistics, Department of Epidemiology Michigan State University."

Similar presentations


Ads by Google