Analysis of Categorical Data

Dr Siti Azrin Binti Ab Hamid, Unit of Biostatistics and Research Methodology

Outline: types of categorical data analysis; steps of analysis

Overview of univariable analysis (dependent variable; independent variable and number of groups: parametric test / non-parametric test)
- Numerical; none (one sample): one-sample t-test / sign test
- Numerical; categorical, 2 independent groups: independent t-test / Mann-Whitney test
- Numerical; categorical, 2 dependent groups: paired t-test / signed-rank test
- Numerical; categorical, more than 2 independent groups: one-way ANOVA / Kruskal-Wallis test
- Categorical; categorical, 2 groups: chi-square test, Fisher's exact test, McNemar test

Introduction: Categorical data analysis deals with discrete data that can be organized into categories. The data are organized into a contingency table.

Types of categorical data analysis (statistical tests)
- One proportion: chi-square goodness-of-fit test
- Two proportions, independent samples: Pearson chi-square / Fisher's exact test
- Two proportions, dependent samples: McNemar test
- Stratified sampling to control a confounder: Mantel-Haenszel test

Hypothesis testing
Step 1: State the hypotheses
Step 2: Set the significance level
Step 3: Check the assumptions
Step 4: Perform the statistical analysis
Step 5: Make the interpretation
Step 6: Draw the conclusion

Contingency table: a 2 × 2 contingency table consists of two rows and two columns, with the cells labeled A through D. An extra row and column are added for labels (and totals).
- Rows: independent variable / exposure / risk factor
- Columns: dependent variable / outcome

Example of a contingency table
              CHD present   CHD absent   Total
Smoker            138           32        170
Non-smoker        263          105        368
Total             401          137        538
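The margins of a contingency table are just row and column sums. The sketch below (plain Python, with a hypothetical nested-dict layout and a hypothetical `margins` helper) recomputes the totals for the smoking/CHD example as a quick consistency check.

```python
# Smoking/CHD example: outer keys are rows (exposure), inner keys are columns (outcome).
table = {
    "Smoker": {"CHD present": 138, "CHD absent": 32},
    "Non-smoker": {"CHD present": 263, "CHD absent": 105},
}


def margins(table):
    """Return (row_totals, column_totals, grand_total) for a nested-dict table."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    column_totals = {}
    for cells in table.values():
        for column, count in cells.items():
            column_totals[column] = column_totals.get(column, 0) + count
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


rows, cols, n = margins(table)
print(rows)  # {'Smoker': 170, 'Non-smoker': 368}
print(cols)  # {'CHD present': 401, 'CHD absent': 137}
print(n)     # 538
```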

Pearson chi-square: tests the association between two categorical variables in independent samples.
- Not significant: no evidence of an association
- Significant: evidence of an association

Research question: Is estrogen receptor status associated with breast cancer status? Data: Breast cancer.sav

Step 1: State the hypotheses
H0: There is no association between estrogen receptor status and breast cancer status.
HA: There is an association between estrogen receptor status and breast cancer status.

Step 2: Set the significance level α = 0.05

Step 3: Check the assumptions
- The samples are independent
- Both variables are categorical
- If more than 20% of cells have an expected count < 5, use Fisher's exact test; otherwise use Pearson chi-square
- Expected count = (row total × column total) / grand total

              Breast Ca
Variable      Died    Alive   Total
ER -ve         310      28     338
ER +ve         508      23     531
Total          818      51     869

Step 3: Check the assumptions (observed counts with expected counts E)
Variable      Died               Alive             Total
ER -ve        310 (E = 318.2)    28 (E = 19.8)     338
ER +ve        508 (E = 499.8)    23 (E = 31.2)     531
Total         818                51                869
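The expected-count rule can be checked directly: each cell's expected count is its row total times its column total divided by the grand total. The sketch below (plain Python; the `expected_counts` name is illustrative) reproduces the E values for the estrogen receptor table.

```python
def expected_counts(observed):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Breast cancer example: rows = ER -ve, ER +ve; columns = Died, Alive.
observed = [[310, 28], [508, 23]]
for row in expected_counts(observed):
    print([round(e, 1) for e in row])
# [318.2, 19.8]
# [499.8, 31.2]
```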

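Step 4: Statistical test
Calculate the chi-square value: $\chi^2 = \sum (O - E)^2 / E = 5.897$ using the rounded expected counts (5.84 with unrounded expected counts)
$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
From the chi-square table, p lies between 0.01 and 0.02.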
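A minimal Pearson chi-square sketch in plain Python, assuming a 2 × 2 table so that df = 1. For df = 1 the upper-tail probability has the closed form $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, so no statistics library is needed. The function name is hypothetical. With unrounded expected counts the statistic is about 5.84 rather than the 5.897 obtained from the rounded E values, and the p-value rounds to the reported 0.016.

```python
import math


def pearson_chi_square(observed):
    """Pearson chi-square statistic, degrees of freedom, and p-value.

    The p-value uses a closed form that holds only for df = 1:
    P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count, unrounded
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    if df != 1:
        raise ValueError("this sketch only computes p-values for df = 1")
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, df, p


chi2, df, p = pearson_chi_square([[310, 28], [508, 23]])
print(round(chi2, 2), df, round(p, 3))  # 5.84 1 0.016
```

If SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should give the same uncorrected statistic; its default applies Yates' continuity correction to 2 × 2 tables.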

Step 4: Statistical test (software output; figure not reproduced)

Step 5: Interpretation: p = 0.016 < 0.05, so reject H0 and accept HA.

Step 6: Conclusion: There is a significant association between estrogen receptor status and breast cancer status (Pearson chi-square test, p = 0.016).

Fisher's exact test: tests the association between two categorical variables in independent samples when sample sizes are small.

Research question: Is gender associated with coronary heart disease? Data: CHD data.sav

Step 1: State the hypotheses
H0: There is no association between gender and coronary heart disease.
HA: There is an association between gender and coronary heart disease.

Step 2: Set the significance level α = 0.05

Step 3: Check the assumptions
- The samples are independent
- Both variables are categorical
- If more than 20% of cells have an expected count < 5, use Fisher's exact test; otherwise use Pearson chi-square
- Expected count = (row total × column total) / grand total

             Coronary heart disease
Variable     Present   Absent   Total
Male           15         5       20
Female         10         0       10
Total          25         5       30

Step 3: Check the assumptions (observed counts with expected counts E)
Variable     Present            Absent            Total
Male         15 (E = 16.7)      5 (E = 3.3)       20
Female       10 (E = 8.3)       0 (E = 1.7)       10
Total        25                 5                 30
2 cells (50%) have an expected count < 5, so Fisher's exact test is used.
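The "more than 20% of cells with expected count < 5" rule can be automated. This plain-Python sketch uses a hypothetical `choose_test` helper and assumes the CHD table has a Female row of 10 present and 0 absent (total 10), which is what the expected counts on this slide imply.

```python
def choose_test(observed, threshold=5.0, max_fraction=0.20):
    """Apply the rule used in these slides: Fisher's exact test if more than
    20% of cells have an expected count below 5, otherwise Pearson chi-square."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    small = sum(1 for e in expected if e < threshold)
    fraction = small / len(expected)
    test = "Fisher exact" if fraction > max_fraction else "Pearson chi-square"
    return small, fraction, test


print(choose_test([[15, 5], [10, 0]]))      # (2, 0.5, 'Fisher exact')
print(choose_test([[310, 28], [508, 23]]))  # (0, 0.0, 'Pearson chi-square')
```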

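Step 4: Statistical test
Calculate the chi-square value: $\chi^2 = \sum (O - E)^2 / E = 3.097$ using the rounded expected counts (exactly 3.0 with unrounded expected counts)
$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
From the chi-square table, p lies between 0.05 and 0.10.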

Step 4: Statistical test (software output; figure not reproduced)

Step 5: Interpretation: p = 0.140 > 0.05, so do not reject H0.

Step 6: Conclusion: There is no significant association between gender and coronary heart disease (Fisher's exact test, p = 0.140).
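Fisher's exact test conditions on the table margins and uses the hypergeometric distribution of the top-left cell. The two-sided p-value sums the probabilities of every table with the same margins that is no more likely than the observed one. This plain-Python sketch uses a hypothetical function name and assumes the CHD table is Male 15/5 and Female 10/0 (the counts implied by the expected values above). It reproduces p ≈ 0.140.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Conditions on the margins; the p-value sums the hypergeometric
    probabilities of all tables no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so ties are not lost to floating-point error.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


print(round(fisher_exact_two_sided(15, 5, 10, 0), 3))  # 0.14
```

SciPy users can cross-check with `scipy.stats.fisher_exact([[15, 5], [10, 0]])`, whose default alternative is two-sided.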

McNemar test
- Categorical data
- Dependent samples: matched samples, crossover designs, or before-and-after measurements on the same subjects
- Tests whether the row and column marginal frequencies are equal (marginal homogeneity)

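Hypotheses: The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same. Because cell A is shared by both margins, this reduces to equality of the two discordant-cell probabilities.
H0: pB = pC
HA: pB ≠ pC
Cells A and D are concordant pairs (same outcome). Cells B and C are discordant pairs (different outcomes).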

Research question: Is the type of mastectomy associated with the 5-year survival proportion in patients with breast cancer? The sample consisted of breast cancer patients matched for age (same decade of age) and for clinical condition. Data: breast ca.sav

Step 1: State the hypotheses
H0: There is no association between type of mastectomy and 5-year survival proportion in patients with breast cancer.
HA: There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer.

Step 2: Set the significance level α = 0.05

Step 3: Check the assumptions: the samples are dependent (paired), and both variables are categorical.

Step 4: Statistical test
$\chi^2 = (|b - c| - 1)^2 / (b + c) = (|0 - 8| - 1)^2 / (0 + 8) = 6.125$
$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
Calculated χ² > tabulated χ² (3.841 at df = 1, α = 0.05)
*Alternative correction: $\chi^2 = (|b - c| - 0.5)^2 / (b + c)$
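The McNemar test uses only the discordant counts b and c. The plain-Python sketch below (hypothetical function name) computes the continuity-corrected χ² shown on the slide and its df = 1 p-value. It also computes the exact binomial version, where b ~ Binomial(b + c, 0.5) under H0. With b = 0 and c = 8, the corrected χ² gives p ≈ 0.013, while the exact binomial p is 2 × 0.5^8 ≈ 0.008, which matches the p-value reported below.

```python
import math


def mcnemar(b, c):
    """McNemar test on the discordant pair counts b and c.

    Returns the continuity-corrected chi-square (|b - c| - 1)^2 / (b + c),
    its df = 1 p-value, and the exact two-sided binomial p-value.
    """
    n = b + c
    chi2 = (abs(b - c) - 1) ** 2 / n
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))  # df = 1 upper-tail probability
    # Exact version: under H0, b ~ Binomial(b + c, 0.5).
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return chi2, p_chi2, p_exact


chi2, p_chi2, p_exact = mcnemar(0, 8)
print(chi2, round(p_chi2, 3), round(p_exact, 3))  # 6.125 0.013 0.008
```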

Step 4: Statistical test (software output; figure not reproduced)

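Step 5: Interpretation: p = 0.008 < 0.05, so reject H0 and accept HA. (This p-value matches the exact binomial form of the McNemar test, 2 × 0.5^8 ≈ 0.008; the continuity-corrected χ² of 6.125 gives p ≈ 0.013.)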

Step 6: Conclusion: There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer (McNemar test, p = 0.008).

Cochran-Mantel-Haenszel test: compares the probability of an event among independent groups in stratified samples. The stratification factor can be study center, gender, race, age group, obesity status, or disease severity. The test gives a stratified analysis of the relationship between exposure and disease after controlling for a confounder (the strata variable). The data are arranged as a series of 2 × 2 contingency tables, one per stratum.

Research question: Is the type of treatment associated with response to treatment among migraine patients after controlling for gender? Confounder: gender
Female: Active, 27 patients, 16 with a better response; Placebo, 25 patients, 5 with a better response
Male: Active, 28 patients, 12 with a better response; Placebo, 26 patients, 7 with a better response

Step 1: 2 × 2 contingency tables
Stratum 1: Female
            Better   Same   Total
Active        16      11      27
Placebo        5      20      25
Stratum 2: Male
            Better   Same   Total
Active        12      16      28
Placebo        7      19      26

Step 2: Check the assumptions: random sampling; stratified sampling.

Step 3: State the hypotheses
H0: There is no association between type of treatment and response to treatment among female and male migraine patients.
HA: There is an association between type of treatment and response to treatment among female and male migraine patients.

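Step 4: Statistical test
Compute the expected frequency for each stratum: $e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$
Compute the variance for each stratum: $v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$
Compute the Mantel-Haenszel statistic: $\chi^2_{MH} = \frac{\left(\sum_i a_i - \sum_i e_i\right)^2}{\sum_i v_i}$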

Step 4: Statistical test
Compute the expected frequency for each stratum, $e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$:
$e_1 = \frac{(16 + 11)(16 + 5)}{52} = 10.9038$
$e_2 = \frac{(12 + 16)(12 + 7)}{54} = 9.8519$

Step 4: Statistical test
Compute the variance for each stratum, $v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$:
$v_1 = \frac{(16 + 11)(5 + 20)(16 + 5)(11 + 20)}{52^2 (52 - 1)} = 3.1865$
$v_2 = \frac{(12 + 16)(7 + 19)(12 + 7)(16 + 19)}{54^2 (54 - 1)} = 3.1325$

Step 4: Statistical test
Compute the Mantel-Haenszel statistic:
$\chi^2_{MH} = \frac{\left(\sum_i a_i - \sum_i e_i\right)^2}{\sum_i v_i} = \frac{\left((16 + 12) - (10.9038 + 9.8519)\right)^2}{3.1865 + 3.1325} = 8.3051 \approx 8.31$
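The stratum-by-stratum arithmetic above can be reproduced in a few lines. This plain-Python sketch uses a hypothetical `cmh_statistic` helper and the (a, b, c, d) = (Active better, Active same, Placebo better, Placebo same) layout from the migraine tables. It computes the Mantel-Haenszel χ² without a continuity correction, as on the slide, plus its df = 1 p-value.

```python
import math


def cmh_statistic(strata):
    """Mantel-Haenszel chi-square (no continuity correction) for 2 x 2 strata.

    Each stratum is (a, b, c, d) for the table [[a, b], [c, d]].
    """
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        e = (a + b) * (a + c) / n  # expected a_i under H0
        v = (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))  # variance of a_i
        sum_a += a
        sum_e += e
        sum_v += v
    return (sum_a - sum_e) ** 2 / sum_v


# Migraine example: rows = Active, Placebo; columns = Better, Same.
strata = [(16, 11, 5, 20),  # female
          (12, 16, 7, 19)]  # male
chi2_mh = cmh_statistic(strata)
p_mh = math.erfc(math.sqrt(chi2_mh / 2))  # df = 1 upper-tail probability
print(round(chi2_mh, 2), round(p_mh, 3))  # 8.31 0.004
```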

Step 4: Statistical test
Compute the odds ratio:
$OR_{MH} = \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i} = \frac{(16 \times 20 / 52) + (12 \times 19 / 54)}{(11 \times 5 / 52) + (16 \times 7 / 54)} = 3.313$
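The Mantel-Haenszel pooled odds ratio is a weighted combination of the stratum odds ratios. A minimal plain-Python sketch (hypothetical function name, same (a, b, c, d) layout as the migraine tables) reproduces the value above.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


strata = [(16, 11, 5, 20), (12, 16, 7, 19)]  # female, male
print(round(mantel_haenszel_or(strata), 3))  # 3.313
```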

Step 4: Statistical test (software output for Migraine.sav; figure not reproduced)

Step 5: Interpretation: $\chi^2_{MH} = 8.31$ exceeds the tabulated value (3.841 at df = 1, α = 0.05), so reject H0.

Step 5: Interpretation
- Homogeneity of the association across strata (Breslow-Day test, *with Tarone's adjustment): H0: OR1 = OR2
- Conditional independence: H0: OR1 = OR2 = 1
The large p-value for the Breslow-Day test (p = 0.222) indicates no significant gender difference in the odds ratios.
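The homogeneity hypothesis OR1 = OR2 concerns the stratum-specific odds ratios. This plain-Python sketch (hypothetical helper) computes them so the two strata can be compared informally. It does not implement the Breslow-Day test itself, so it does not reproduce the p = 0.222 above.

```python
def stratum_odds_ratios(strata):
    """Odds ratio a*d / (b*c) within each 2 x 2 stratum (a, b, c, d)."""
    return [a * d / (b * c) for a, b, c, d in strata]


strata = [(16, 11, 5, 20), (12, 16, 7, 19)]  # female, male
print([round(x, 2) for x in stratum_odds_ratios(strata)])  # [5.82, 2.04]
```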

Step 6: Conclusion: There is a significant association between type of treatment and response to treatment among female and male migraine patients (p = 0.004). We estimate that, after controlling for gender, patients who receive active treatment have 3.31 times the odds of better migraine symptoms compared with patients who receive placebo.