How to Carry Out Nonparametric Tests and Construct Contingency Tables 21 January 2015.

Objectives
- Non-parametric statistical procedures
  - Basic concept
  - The Wilcoxon Signed Rank Test
  - The Wilcoxon Rank Sum Test
- Contingency tables
  - Definitions and use
  - Measures of association

Nonparametric Tests
- Distribution-free tests
- Valid for most distributions (but may not be the most powerful).
- If appropriate, the parametric methods should be used, since they reject false null hypotheses with higher probability.

Nonparametric Tests, cont'd
- If we know the mean and the standard deviation of a normal distribution, we can calculate probabilities.
- Means and standard deviations are called parameters.
- Statistical tests that assume a distribution and use parameters are called parametric tests.
- Statistical tests that don't assume a distribution or use parameters are called nonparametric tests.

Use of Nonparametric Tests
- Many quantities are not normally distributed.
- Using a t-test on such data could be inappropriate and misleading.
- Nonparametric tests have fewer assumptions: the data only need to be rankable (smallest to largest).
- Example: pain rated as mild, moderate, severe.

Example: untrained vs. trained
The maximum oxygen intake adjusted for body weight (ml/kg/min). (Data table with columns Untrained and Trained; values not preserved in this transcript.)

Example: observations and ranks
If there is no difference between the two groups, the ranks in the two groups should be approximately the same. (Table with columns Observation and Ranks; values not preserved in this transcript.)

The Wilcoxon Signed Rank Test
Assumptions:
- The scale of observations is at least ordinal within each pair of observations.
- Which observation within each pair is "larger" can be determined (they can be "equal").
- Symmetry.

Basic steps in the WSRT
- Calculate the differences.
- Rank the absolute differences from smallest to largest.
- Add the signs of the differences to the ranks.
- Sum the positive ranks.
- Sum the negative ranks.
- Use the smaller of the two sums as the test statistic.
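The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides themselves use SAS, the function name `signed_rank_statistic` is made up here, zero differences are dropped and tied absolute differences share an average rank (common conventions that the slides do not spell out), and the blood-pressure values are hypothetical because the original data table is not preserved.

```python
def signed_rank_statistic(pre, post):
    """Return (sum of positive ranks, sum of negative ranks, test statistic)."""
    # Step 1: differences; zero differences are dropped (assumed convention).
    diffs = [b - a for a, b in zip(pre, post) if b != a]

    # Step 2: rank the absolute differences, averaging ranks over ties.
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(value):
        first = abs_sorted.index(value) + 1   # 1-based rank of first occurrence
        ties = abs_sorted.count(value)
        return first + (ties - 1) / 2

    # Steps 3-5: attach signs and sum the positive and negative ranks.
    w_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)

    # Step 6: the smaller sum is the test statistic.
    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical diastolic pressures (mmHg) before and one week after treatment.
pre = [90, 85, 100, 95, 88, 92]
post = [84, 86, 91, 90, 88, 85]
w_plus, w_minus, t_stat = signed_rank_statistic(pre, post)
print(w_plus, w_minus, t_stat)
```

The statistic would then be compared with a signed-rank table (or a normal approximation for larger samples) to obtain a p-value.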

Example (paired data)
- A new drug to lower blood pressure is given to 10 people. Diastolic blood pressure was measured before taking the medicine and one week later.
- Null hypothesis: no change in blood pressure (median of differences is 0).

Data
Columns: Patient | Pre | Post | Diff | Rank w/o signs | Signed rank (values not preserved in this transcript)

The Wilcoxon Rank Sum Test
- Also known as the Mann-Whitney "U" test or Rank Sum Test.
- Compares the medians of two populations.
- H0: the populations from which the two samples are taken have identical median values.
- Based on ranking the observations.
- If H0 is true, we expect approximately the same number of small observations (low ranks) and the same number of large observations (high ranks) in both samples.
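To make the ranking idea concrete, here is a small Python sketch that pools two samples, ranks them, and returns the rank sums together with the equivalent Mann-Whitney U values. The function names and the oxygen-intake values are hypothetical (the slides' own data are not preserved), and ties are given average ranks as an assumed convention.

```python
def rank_sums(sample1, sample2):
    """Rank the pooled observations and return the rank sum of each sample."""
    pooled = sorted(sample1 + sample2)

    def avg_rank(value):
        # 1-based rank, averaged over tied values.
        return pooled.index(value) + 1 + (pooled.count(value) - 1) / 2

    w1 = sum(avg_rank(v) for v in sample1)
    w2 = sum(avg_rank(v) for v in sample2)
    return w1, w2


def mann_whitney_u(sample1, sample2):
    """Convert the rank sum of sample1 into the two Mann-Whitney U values."""
    w1, _ = rank_sums(sample1, sample2)
    n1, n2 = len(sample1), len(sample2)
    u1 = w1 - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1


# Hypothetical maximum oxygen intake values (ml/kg/min).
untrained = [35.2, 38.1, 40.3, 41.0]
trained = [42.5, 45.0, 39.7, 47.8, 50.1]
w1, w2 = rank_sums(untrained, trained)
u1, u2 = mann_whitney_u(untrained, trained)
print(w1, w2, u1, u2)
```

If the two populations were alike, the rank sums would be roughly proportional to the sample sizes; a very small U for one group suggests its values sit mostly at the low end of the pooled ranking.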

Example
- Cancer of the oesophagus.
- Compare tobacco consumption between patients (N1 = 200) and controls (N2 = 778).
- H0: η1 = η2
- H1: η1 ≠ η2

Data
Columns: Case | Tobacco ……

How do we know when to use nonparametric procedures?
- Use the t-test if the sample size is large.
- Otherwise, examine whether the data are normally distributed:
  - Use proc univariate (SAS)
  - Descriptive statistics in Minitab
- Three approaches: test for normality, Q-Q plot, histogram.

Example: "tobacco" variable for cases

    proc univariate data=eso normal;
      var tobacco;
      qqplot tobacco / normal(mu=est sigma=est color=red L=1);
    run;

Contingency tables
- Used to record and analyse the relationship between two or more categorical variables.
- Has r rows and c columns (for a total of r×c cells).
- One application is in a test of independence between 2 variables.

Example: 2 × 2 Table

                     Wearing Helmet
    Head Injury     Yes     No    Total
    Yes              17    218      235
    No              130    428      558
    Total           147    646      793

(Counts as entered in the SAS data step for this example.)
Is there an association between wearing a helmet and head injury?

Example: 3×4 Contingency Table

                                        Column headings
    Row headings             Col 1             Col 2   Col 3   Col 4   Row marginal totals
    Row 1                    n11               n12     n13     n14     n11 + n12 + n13 + n14
    Row 2                    n21                                       etc.
    Row 3                    n31                               n34     etc.
    Column marginal totals   n11 + n21 + n31   etc.    etc.    etc.    Grand total

The Chi-squared Test of Independence
- Tests the null hypothesis that 2 variables are independent.
- Uses a measure of the discrepancy between the observed counts and the counts that would be expected if the hypothesis of independence were correct.
- O: observed counts
- E: expected counts

Independence hypothesis
- There is no relationship between the row category into which an individual falls and the column category into which that individual falls.
- The row variable and the column variable are independent.
- Knowing how someone is classified using the row (column) variable does not give any information about how they are classified using the column (row) variable.
- P(row r | column c) = P(row r), or, equivalently, P(column c | row r) = P(column c).

The Chi-squared Test of Independence
- If the null hypothesis is true (the variables are independent), the count we expect in any cell (E) is given by: E = (row total × column total) / grand total
- The test statistic: X² = Σ (O − E)² / E, summed over all cells.

The Chi-squared Test of Independence
- Under the null hypothesis, the test statistic has (approximately) a chi-squared distribution with (r-1)(c-1) degrees of freedom.
- The chi-squared distribution is completely indexed by its degrees of freedom.
- It is non-symmetric (skewed to the right).
- As the d.f. increase, it resembles the normal distribution more and more.

Example
- Helmet and head injuries.
- Are they independent?
- H0: head injuries are independent of wearing a helmet.

                     Wearing Helmet
    Head Injury     Yes     No    Total
    Yes              17    218      235
    No              130    428      558
    Total           147    646      793

SAS
First we input the data.

    data helmet;
      input helmet $ injury $ count;
      cards;
    yes yes 17
    yes no 130
    no yes 218
    no no 428
    ;
    run;

SAS

    proc freq data=helmet;
      tables helmet * injury / chisq;
      weight count;
    run;
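As a cross-check on what the chisq option reports, the expected counts and the test statistic for the helmet data can be worked out by hand. The following is a minimal Python sketch, not the slides' method (they use SAS): the function name `chi_squared_independence` is an illustrative assumption, no continuity correction is applied, and the p-value shortcut via `math.erfc` is valid only for 1 degree of freedom.

```python
import math


def chi_squared_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    expected = []
    for i, row in enumerate(table):
        exp_row = []
        for j, observed in enumerate(row):
            # E = (row total x column total) / grand total
            e = row_totals[i] * col_totals[j] / n
            exp_row.append(e)
            stat += (observed - e) ** 2 / e
        expected.append(exp_row)

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Rows: helmet yes / no; columns: head injury yes / no (counts from the data step).
helmet = [[17, 130], [218, 428]]
stat, df, expected = chi_squared_independence(helmet)

# With 1 d.f. the statistic is a squared standard normal, so
# P(chi-squared > x) = erfc(sqrt(x / 2)). This shortcut does not hold for df > 1.
p_value = math.erfc(math.sqrt(stat / 2)) if df == 1 else None
print(round(stat, 2), df, p_value)
```

All four expected counts here are well above 5, so the large-sample approximation is reasonable for this table.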

Important note (small cell sizes)
- The chi-squared test of independence makes an assumption that the expected cell counts in all cells are > 5.
- If this is not the case, for 2×2 tables, use Fisher's exact test.
- Fisher's exact test is based on computing the exact probabilities of observing a particular table.

Example - small cell sizes
Cardiovascular death and diet.

    data cvd;
      input diet $ cod $ count;
      cards;
    high_sal cvd 5
    high_sal cvd_non 2
    low_sal cvd 30
    low_sal cvd_non 23
    ;
    run;

Example - small cell sizes

    proc freq data=cvd;
      tables cod * diet / chisq exact;
      weight count;
    run;
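The "exact probabilities" behind Fisher's test come from the hypergeometric distribution with all margins held fixed. A rough Python sketch for the diet data follows; the function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention, assumed rather than taken from the slides.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d      # row totals
    c1 = a + c                 # first column total
    n = r1 + r2

    def prob(k):
        # Probability that the top-left cell equals k, given the margins.
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Rows: high-salt / low-salt diet; columns: CVD death / non-CVD death.
p_two_sided = fisher_exact_two_sided(5, 2, 30, 23)
print(round(p_two_sided, 3))
```

For these counts the two-sided p-value is large, so the data give no evidence of an association between diet and cause of death.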

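Paired Data
Comparing 2 treatments of cancer: patients are paired on sex, age and stage, and within each pair one patient is assigned to Tx A and the other to Tx B.

                                       Outcome of treatment B patient
    Outcome of treatment A patient     Survived    Died    Total
    Survived                                 90      16      106
    Died                                      5     510      515
    Total                                    95     526      621

(Counts as entered in the SAS data step for this example.)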

Paired Data
- For paired data (i.e. not independent), the observation is the pair, not the individual.
- Use McNemar's test.
- Only the discordant (off-diagonal) cells are used.
- H0: no association between survival and cancer Tx; equivalently, the 2 discordant cell counts are approximately equal.

Example - Paired Data

    data cancer;
      input outcome_A $ outcome_B $ count;
      cards;
    survive survive 90
    die survive 5
    survive die 16
    die die 510
    ;
    proc freq;
      tables outcome_A * outcome_B / agree;
      weight count;
    run;
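Because McNemar's test uses only the two discordant cells, the statistic is easy to verify by hand: (b − c)² / (b + c), referred to a chi-squared distribution with 1 degree of freedom. A minimal Python sketch under those assumptions is shown below; the function name `mcnemar` and the optional continuity correction flag are illustrative choices, not part of the slides.

```python
import math


def mcnemar(b, c, continuity=False):
    """McNemar statistic and large-sample p-value from the discordant counts."""
    diff = abs(b - c) - (1 if continuity else 0)
    stat = diff ** 2 / (b + c)
    # With 1 d.f., P(chi-squared > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Discordant pairs from the data step: A survived / B died = 16, A died / B survived = 5.
stat, p_value = mcnemar(16, 5)
print(round(stat, 3), round(p_value, 4))
```

The 90 and 510 concordant pairs do not enter the calculation at all, which is the point of the third bullet on the Paired Data slide.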

How strong is the association?
If we determine there is an association, we would like to know
- how strong the association is,
- the direction of the association,
- how to measure it.

Relative risk
- A measure of the effect of exposure.
- Compares the probability of disease among the exposed to the probability of disease among the non-exposed:
  RR = P(disease+ | exposure+) / P(disease+ | exposure-)

Relative risk
A relative risk of 2 means that an exposed person is 2 times as likely to have the disease as a non-exposed person.

                    Disease
    Exposure     Yes (+)   No (-)
    Yes (+)         10       40
    No (-)           5       45

Odds
- Given that a person has the exposure, the odds of getting the disease are P(disease+ | exposure+) / P(disease- | exposure+).
- Similarly, given no exposure, the odds of getting the disease are P(disease+ | exposure-) / P(disease- | exposure-).

Odds-ratio
- Another measure of association of disease and exposure.
- It turns out that if the disease is rare, OR ≈ RR.

How do we compute OR (RR)?
Observed data (from a sample):

                   Disease
    Exposure     Yes     No
    Yes          n11     n12
    No           n21     n22

n1. = n11 + n12
n.2 = n12 + n22
n.. = n11 + n12 + n21 + n22

Estimate RR and OR from data
- The estimate of RR from the sample data is given by (n11 / n1.) / (n21 / n2.), where n2. = n21 + n22.
- The estimate of OR from the sample data is given by (n11 × n22) / (n12 × n21).
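The two estimates can be checked against the small exposure-by-disease table shown earlier (10, 40 / 5, 45). The Python sketch below uses hypothetical function names and assumes the layout from the slides: rows are exposure (yes, no) and columns are disease (yes, no).

```python
def relative_risk(n11, n12, n21, n22):
    """Estimated RR: risk of disease in the exposed over risk in the non-exposed."""
    return (n11 / (n11 + n12)) / (n21 / (n21 + n22))


def odds_ratio(n11, n12, n21, n22):
    """Estimated OR: cross-product ratio of the 2 x 2 table."""
    return (n11 * n22) / (n12 * n21)


# Rows: exposure yes / no; columns: disease yes / no.
rr = relative_risk(10, 40, 5, 45)
oratio = odds_ratio(10, 40, 5, 45)
print(rr, oratio)
```

Here the disease is not rare (20% of the exposed have it), so the odds ratio is noticeably larger than the relative risk; the two move closer together as the disease proportions shrink.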

Cross-sectional studies
- A sample is taken from the population.
- Exposure and disease are measured.
- Example: perinatal mortality and maternal smoking.
(Table of maternal smoking (rows: Yes, No, Total) by perinatal mortality (columns: Yes, No, Total); counts not preserved in this transcript.)

Prospective studies
- Samples of n1. and n2. individuals with and without exposure are followed.
- Example: screening and breast cancer.
(Table of screened (rows: Yes, No) by breast cancer death (columns: Yes, No, Total); counts not preserved in this transcript.)

Retrospective studies
- Samples of n.1 and n.2 individuals with and without disease are examined for exposure.
- Example: association between herniated discs and motor vehicle jobs.

                          Herniated disc
    Motor vehicle job     Yes     No
    Yes                     8      1
    No                     47     26
    Total                  55     27

What can be estimated?

                        Totals for             Can one estimate the
    Type of study       Column     Row         RR?      OR?
    Cross-sectional     Random     Random      Yes      Yes
    Prospective         Random     Fixed       Yes      Yes
    Retrospective       Fixed      Random      No       Yes

SAS
Helmet and head injury.

    proc freq;
      tables helmet * injury / relrisk;
      weight count;
    run;

SAS

    data helmet;
      input helmet $ injury $ count;
      cards;
    yes ayes 17
    yes no 130
    no ayes 218
    no no 428
    ;
    run;

Tests of Agreement
- Paired data
- The kappa statistic measures the level of agreement.
- H0: no association, Kappa = 0
- Biased toward the null as the number of categories increases.
- The number of categories must be equal (the table must be square).
- Kappa = (po - pe) / (1 - pe), where po = observed agreement and pe = expected agreement.
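The kappa formula above can be sketched directly from a square table of counts. In this Python illustration the function name `cohen_kappa` and the 2 × 2 ratings are hypothetical (chosen so the arithmetic is easy to follow), and pe is taken as the agreement expected by chance from the row and column margins, which is the usual definition but is not spelled out on the slide.

```python
def cohen_kappa(table):
    """Kappa = (po - pe) / (1 - pe) for a square table of paired ratings."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Observed agreement: proportion of pairs on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Expected agreement: chance agreement implied by the margins (assumed definition).
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2

    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings: rows = rater 1, columns = rater 2.
ratings = [[20, 5],
           [10, 15]]
kappa = cohen_kappa(ratings)
print(round(kappa, 3))
```

In this made-up table the raters agree on 70% of subjects, half of which would be expected by chance alone, which is why kappa comes out well below the raw agreement.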

Example - kappa
Two dermatologists evaluating the same patients.
(Table of Dermatologist 1 (rows) by Dermatologist 2 (columns), each with categories terrible, poor, marginal, clear, total; counts not preserved in this transcript.)

SAS

    data skin;
      input derm1 $ derm2 $ count;
      datalines;
    terrible terrible 10
    terrible poor 4
    terrible marginal 1
    …
    clear clear 13
    ;
    proc freq data=skin order=data;
      weight count;
      tables derm1*derm2 / agree;
    run;

Kappa statistic
Measure of agreement guidelines:
- Excellent agreement: k > 0.75
- Good agreement: 0.4 <= k <= 0.75
- Marginal agreement: k < 0.4