How to Carry Out Nonparametric Tests and Construct Contingency Tables 21 January 2015.


1 How to Carry Out Nonparametric Tests and Construct Contingency Tables 21 January 2015

2 Objectives
- Non-parametric statistical procedures
  - Basic concept
  - The Wilcoxon Signed Rank Test
  - The Wilcoxon Rank Sum Test
- Contingency tables
  - Definitions and use
  - Measures of association

3 Nonparametric Tests
- Distribution-free tests: valid for most distributions (may not be the most powerful).
- If appropriate, the parametric methods should be used, since they reject false null hypotheses with higher probability.

4 Nonparametric Tests, cont'd
- If we know the mean and the standard deviation of a normal distribution, we can calculate probabilities.
- Means and standard deviations are called parameters.
- Statistical tests that assume a distribution and use parameters are called parametric tests.
- Statistical tests that don't assume a distribution or use parameters are called nonparametric tests.

5 Use of Nonparametric Tests
- When the data are not normally distributed, using a t-test could be inappropriate and misleading.
- Nonparametric tests have fewer assumptions: the data only need to be rankable (smallest to largest).
- Example: pain rated as mild, moderate, severe.

6 Example
The maximum oxygen intake adjusted for body weight (ml/kg/min).

  Untrained  Trained
  45         63
  38         55
  48         59
  51         65
  51         77

7 Example
Observations and ranks.

  Observation  38  45  48  51   51   55  59  63  65  77
  Rank         1   2   3   4.5  4.5  6   7   8   9   10

If there is no difference between the two groups, the ranks in the two groups should be approximately the same.
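The ranking step can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the helper name `midranks` and the variable names are assumptions, and tied values are given the average of the ranks they occupy, as in the table above.

```python
# Midranks (tied values share the average rank) for the oxygen-intake example.
untrained = [45, 38, 48, 51, 51]
trained = [63, 55, 59, 65, 77]


def midranks(values):
    """Map each distinct value to its average rank in the pooled sample."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j to the last position holding the same value.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # Positions i..j (0-based) occupy ranks i+1..j+1; use their mean.
        ranks[ordered[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return ranks


rank_of = midranks(untrained + trained)
rank_sum_untrained = sum(rank_of[x] for x in untrained)
rank_sum_trained = sum(rank_of[x] for x in trained)
print(rank_sum_untrained, rank_sum_trained)
```

The two rank sums are the raw ingredients of the Wilcoxon Rank Sum Test; a large gap between them suggests the groups differ.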

8 The Wilcoxon Signed Rank Test
Assumptions:
- The scale of observations is at least ordinal within each pair of observations.
- Which observation within each pair is "larger" can be determined (they can be "equal").
- The distribution of the differences is symmetric.

9 Basic steps in the WSRT
1. Calculate the differences.
2. Rank the absolute differences from smallest to largest.
3. Attach the signs of the differences to the ranks.
4. Sum the positive ranks.
5. Sum the negative ranks.
6. Use the smaller of the two sums as the test statistic.

10 Example (paired data)
- A new drug to lower blood pressure is given to 10 people. Diastolic blood pressure was measured before taking the medicine and one week later.
- Null hypothesis: no change in blood pressure (median of differences is 0).

11 Data

  Patient  Pre  Post  Diff  Rank w/o sign  Signed rank
  1        90   86    4     4.5            4.5
  2        86   82    4     4.5            4.5
  3        85   88    -3    3              -3
  4        90   85    5     7              7
  5        81   82    -1    1              -1
  6        87   82    5     7              7
  7        91   85    6     9.5            9.5
  8        80   75    5     7              7
  9        86   80    6     9.5            9.5
  10       92   90    2     2              2
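As a check on the table, the signed-rank steps can be reproduced with a short Python sketch. The names `midranks`, `w_plus` and `w_minus` are illustrative assumptions rather than anything from the slides, and zero differences (none occur here) are dropped, which is the usual convention.

```python
# Wilcoxon signed-rank sums for the blood-pressure data (pre vs. post).
pre = [90, 86, 85, 90, 81, 87, 91, 80, 86, 92]
post = [86, 82, 88, 85, 82, 82, 85, 75, 80, 90]


def midranks(values):
    """Map each distinct value to its average rank (ties share the mean rank)."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return ranks


# Step 1: differences, dropping pairs with no change.
diffs = [a - b for a, b in zip(pre, post) if a != b]
# Step 2: rank the absolute differences.
rank_of = midranks([abs(d) for d in diffs])
# Steps 3-5: sum the ranks separately by sign of the difference.
w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
# Step 6: the smaller sum is the test statistic.
statistic = min(w_plus, w_minus)
print(w_plus, w_minus, statistic)
```

The statistic is then compared with a table of critical values for n = 10 pairs (or passed to statistical software) to obtain a p-value.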

12

13 The Wilcoxon Rank Sum Test
- AKA Mann-Whitney "U" test or Rank Sum Test.
- Compares the medians of two populations.
- H0: The populations from which the two samples are taken have identical median values.
- Based on ranking the observations.
- If H0 is true, we expect approximately the same number of small observations (low rank) in both samples and the same number of high observations (high rank).

14 Example
- Cancer of the oesophagus.
- Compare tobacco consumption between patients (N1 = 200) and controls (N2 = 778).
- H0: η1 = η2
- H1: η1 ≠ η2

15 Data

  Case  Tobacco
  0     17.5
  0     8
  0     4
  …     …
  1     13
  1     19.5
  1     12.5

16

17 How do we know when to use nonparametric procedures?
- Use the t-test if the sample size is large.
- Otherwise, examine whether the data are normally distributed:
  - Use proc univariate (SAS)
  - Descriptive statistics in Minitab
- Three approaches: test for normality, Q-Q plot, histogram.

18 Example: "tobacco" variable for cases

    proc univariate data=eso normal;
      var tobacco;
      qqplot tobacco / normal(mu=est sigma=est color=red L=1);
    run;

19

20

21 Contingency tables
- Used to record and analyse the relationship between two or more categorical variables.
- Has r rows and c columns (for a total of r×c cells).
- One application is in a test of independence between 2 variables.

22 Example: 2 × 2 Table

               Wearing helmet
  Head injury  Yes   No    Total
  Yes          17    218   235
  No           130   428   558
  Total        147   646   793

Is there an association between wearing a helmet and head injury?

23 Example: 3×4 Contingency Table

                          Column headings
  Row headings            Col 1              Col 2  Col 3  Col 4  Row marginal totals
  Row 1                   n11                n12    n13    n14    n11 + n12 + n13 + n14
  Row 2                   n21                                     etc.
  Row 3                   n31                              n34    etc.
  Column marginal totals  n11 + n21 + n31    etc.   etc.   etc.   Grand total

24 The Chi-squared Test of Independence
- Tests the null hypothesis that 2 variables are independent.
- Uses a measure of the discrepancy between the observed counts and the counts that would be expected if the hypothesis of independence were correct.
- O: observed counts
- E: expected counts

25 Independence hypothesis
- There is no relationship between the row category into which an individual falls and the column category into which that individual falls.
- The row variable and the column variable are independent.
- Knowing how someone is classified using the row (column) variable does not give any information about how they are classified using the column (row) variable.
- P(row r | column c) = P(row r), or, equivalently, P(column c | row r) = P(column c).

26 The Chi-squared Test of Independence
- If the null hypothesis is true (the variables are independent), the count we expect in any cell (E) is given by:
    E = (row total × column total) / grand total
- The test statistic:
    X^2 = sum over all cells of (O - E)^2 / E

27 The Chi-squared Test of Independence
- The test statistic has a chi-squared distribution with (r-1)(c-1) degrees of freedom.
- The chi-squared distribution is completely indexed by its degrees of freedom.
- It is not symmetric.
- As the d.f. increase, it resembles the normal distribution more and more.

28

29 Example
- Helmet and head injuries: are they independent?
- H0: head injuries are independent of wearing a helmet.

               Wearing helmet
  Head injury  Yes   No    Total
  Yes          17    218   235
  No           130   428   558
  Total        147   646   793
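Before turning to SAS, the expected counts and the chi-squared statistic for this table can be worked through by hand. The Python sketch below is an illustration under stated assumptions: the variable names are invented here, and the p-value line relies on the identity P(X > x) = erfc(sqrt(x/2)), which holds only for 1 degree of freedom (a 2 × 2 table). It applies no continuity correction.

```python
import math

# Rows: head injury yes/no; columns: wearing helmet yes/no.
observed = [[17, 218],
            [130, 428]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this shortcut is valid for df == 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 2), df, p_value)
```

The statistic comes out at roughly 28.3 on 1 degree of freedom, so the hypothesis of independence is rejected at any conventional level.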

30 SAS
First we input the data.

    data helmet;
      input helmet $ injury $ count;
      cards;
    yes yes 17
    yes no 130
    no yes 218
    no no 428
    ;
    run;

31 SAS

    proc freq data=helmet;
      tables helmet * injury / chisq;
      weight count;
    run;

32

33 Important note (small cell sizes)
- The chi-squared test of independence assumes that the expected counts in all cells are > 5.
- If this is not the case, for 2×2 tables, use Fisher's exact test.
- It is based on computing the exact probabilities of observing a particular table.

34 Example - small cell sizes
Cardiovascular death and diet.

    data cvd;
      input diet $ cod $ count;
      cards;
    high_sal cvd 5
    high_sal cvd_non 2
    low_sal cvd 30
    low_sal cvd_non 23
    ;
    run;

35

36 Example - small cell sizes

    proc freq data=cvd;
      tables cod * diet / chisq exact;
      weight count;
    run;
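To make "exact probabilities of observing a particular table" concrete, here is a minimal Python sketch of Fisher's exact test for the diet data. It is not the SAS implementation. It assumes the common two-sided definition (sum the probabilities of all tables that are no more likely than the observed one, with the margins held fixed), and all names are illustrative.

```python
from math import comb

# Observed 2x2 table: rows are diet, columns are cause of death.
a, b = 5, 2      # high_sal: cvd, cvd_non
c, d = 30, 23    # low_sal:  cvd, cvd_non

row1 = a + b             # high-salt total
col1 = a + c             # cardiovascular-death total
n = a + b + c + d        # grand total


def table_prob(k):
    """Hypergeometric probability that the top-left cell equals k."""
    return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)


# All values of the top-left cell compatible with the fixed margins.
k_min = max(0, row1 - (n - col1))
k_max = min(row1, col1)
probs = {k: table_prob(k) for k in range(k_min, k_max + 1)}

p_observed = probs[a]
# Two-sided p-value: tables at most as probable as the observed one
# (small relative tolerance guards against floating-point ties).
p_two_sided = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))

print(round(p_observed, 4), round(p_two_sided, 4))
```

With only 7 people in the high-salt group, the resulting p-value is large, so these data give no evidence of an association.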

37

38 Paired Data

                          Outcome of treatment B patient
  Outcome of treatment A
  patient                 survived  died  total
  survived                90        16    106
  died                    5         510   515
  total                   95        526   621

Comparing 2 treatments of cancer: patients are paired on sex, age and stage, and assigned to Tx A or Tx B.

39 Paired Data
- For paired data (i.e. not independent) the observations are the pairs, not the individuals.
- Use McNemar's Test.
- Only the discordant (off-diagonal) cells are used.
- H0: no association between survival and cancer Tx; equivalently, the 2 discordant cells are approximately equal.

40 Example - Paired Data

    data cancer;
      input outcome_A $ outcome_B $ count;
      cards;
    survive survive 90
    die survive 5
    survive die 16
    die die 510
    ;
    proc freq;
      tables outcome_A * outcome_B / agree;
      weight count;
    run;
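McNemar's statistic uses only the two discordant counts, so it is easy to verify by hand. The Python sketch below assumes the uncorrected form (b - c)^2 / (b + c), compared with a chi-squared distribution on 1 degree of freedom. The names `b` and `c` are illustrative, and the erfc shortcut for the p-value is valid for 1 degree of freedom only.

```python
import math

# Discordant pairs from the paired cancer-treatment table.
b = 16   # A patient survived, B patient died
c = 5    # A patient died, B patient survived

# McNemar's statistic without continuity correction.
statistic = (b - c) ** 2 / (b + c)

# Upper-tail chi-squared probability for 1 degree of freedom.
p_value = math.erfc(math.sqrt(statistic / 2))

print(round(statistic, 3), round(p_value, 4))
```

The 600 concordant pairs (90 + 510) do not enter the calculation at all; only the imbalance between 16 and 5 carries information about a treatment difference.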

41

42 How strong is the association?
If we determine there is an association, we would like to know
- how strong the association is,
- the direction of the association,
- how to measure it.

43 Relative risk
- A measure of the effect of exposure.
- Compare the probability of disease among the exposed to the probability of disease among the non-exposed:
    RR = P(disease+ | exposure+) / P(disease+ | exposure-)

44 Relative risk
A relative risk of 2 means that an exposed person is 2 times as likely to have the disease as a non-exposed person.

            Disease
  Exposure  Yes (+)  No (-)
  Yes (+)   10       40
  No (-)    5        45

45 Odds
- Given that a person has the exposure, the odds of getting the disease are
    P(disease+ | exposure+) / [1 - P(disease+ | exposure+)]
- Similarly, given no exposure, the odds of getting the disease are
    P(disease+ | exposure-) / [1 - P(disease+ | exposure-)]

46 Odds-ratio
- Another measure of association of disease and exposure:
    OR = (odds of disease given exposure) / (odds of disease given no exposure)
- It turns out that if the disease is rare, OR ≈ RR.

47 How do we compute OR (RR)?
Observed data (from a sample).

            Disease
  Exposure  Yes  No
  Yes       n11  n12
  No        n21  n22

  n1. = n11 + n12
  n.2 = n12 + n22
  n.. = n11 + n12 + n21 + n22

48 Estimate RR and OR from data
- The estimate of RR from the sample data is given by
    RR = [n11 / (n11 + n12)] / [n21 / (n21 + n22)]
- The estimate of OR from the sample data is given by
    OR = (n11 × n22) / (n12 × n21)

49 Cross-sectional studies
- Sample is taken from the population.
- Exposure and disease are measured.
- Example: perinatal mortality and maternal smoking.

                    Perinatal mortality
  Maternal smoking  Yes    No      Total
  Yes               619    20443   21062
  No                634    26682   27316
  Total             1253   47125   48378
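The sample estimates of RR and OR can be computed directly from the four cell counts. The sketch below is illustrative Python and not part of the slides: the function names are assumptions, and the cells are passed in the order n11, n12, n21, n22 with exposure in rows and disease in columns.

```python
def relative_risk(n11, n12, n21, n22):
    """Risk of disease among the exposed divided by risk among the unexposed."""
    return (n11 / (n11 + n12)) / (n21 / (n21 + n22))


def odds_ratio(n11, n12, n21, n22):
    """Cross-product ratio of the 2x2 table."""
    return (n11 * n22) / (n12 * n21)


# Maternal smoking (rows) by perinatal mortality (columns).
smoking = (619, 20443, 634, 26682)
rr = relative_risk(*smoking)
or_ = odds_ratio(*smoking)
print(round(rr, 3), round(or_, 3))

# The small exposure-by-disease table with RR = 2 from the relative risk slide.
print(relative_risk(10, 40, 5, 45), odds_ratio(10, 40, 5, 45))
```

Perinatal death is rare in both groups, so the two estimates for the smoking data nearly coincide, while in the small table (disease risk of 10 to 20 percent) the odds ratio is visibly larger than the relative risk.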

50 Prospective studies
- Samples of n1. individuals with exposure and n2. individuals without exposure are followed.
- Example: screening and breast cancer.

            Breast cancer death
  Screened  Yes  No      Total
  Yes       40   30960   31000
  No        63   30937   31000

51 Retrospective studies
- Samples of n.1 individuals with disease and n.2 individuals without disease are examined for exposure.
- Example: association between herniated discs and motor vehicle jobs.

                     Herniated disc
  Motor vehicle job  Yes  No
  Yes                8    1
  No                 47   26
  Total              55   27

52 What can be estimated?

                   Totals for           Can one estimate the
  Type of study    Column    Row        RR?    OR?
  Cross-sectional  Random    Random     Yes    Yes
  Prospective      Random    Fixed      Yes    Yes
  Retrospective    Fixed     Random     No     Yes

53 SAS
Helmet and head injury.

    proc freq;
      tables helmet * injury / relrisk;
      weight count;
    run;

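54 SAS
(The injury level "yes" is entered as "ayes", presumably so that it sorts before "no" and appears in the first column of the table.)

    data helmet;
      input helmet $ injury $ count;
      cards;
    yes ayes 17
    yes no 130
    no ayes 218
    no no 428
    ;
    run;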

55

56 Tests of Agreement
- Paired data.
- The Kappa statistic measures the level of agreement.
- H0: No association, Kappa = 0.
- Biased toward the null as the number of categories increases.
- The number of categories must be equal (the table must be square).
- Kappa = (po - pe) / (1 - pe), where po = observed agreement and pe = expected agreement.

57 Example - kappa
Two dermatologists evaluating the same patients.

                   Dermatologist 2
  Dermatologist 1  terrible  poor  marginal  clear  total
  terrible         10        4     1         0      15
  poor             5         10    12        2      29
  marginal         2         4     12        5      23
  clear            0         2     6         13     21
  total            17        20    31        20     88
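Kappa for this table can be computed straight from the definition Kappa = (po - pe) / (1 - pe). The Python sketch below is an illustration with assumed variable names. It takes po as the proportion of patients on the diagonal and pe as the agreement expected by chance from the marginal totals.

```python
# Rows: dermatologist 1; columns: dermatologist 2
# (category order: terrible, poor, marginal, clear).
table = [
    [10, 4, 1, 0],
    [5, 10, 12, 2],
    [2, 4, 12, 5],
    [0, 2, 6, 13],
]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Observed agreement: share of patients rated identically by both.
p_o = sum(table[i][i] for i in range(len(table))) / n

# Expected agreement under independence of the two raters.
p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

kappa = (p_o - p_e) / (1 - p_e)
print(round(p_o, 3), round(p_e, 3), round(kappa, 3))
```

Although the two dermatologists agree on about half of the patients, roughly a quarter of that agreement would be expected by chance, which pulls kappa down to about 0.34.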

58 SAS

    data skin;
      input derm1 $ derm2 $ count;
      datalines;
    terrible terrible 10
    terrible poor 4
    terrible marginal 1
    ........
    clear clear 13
    ;
    proc freq data=skin order=data;
      weight count;
      tables derm1*derm2 / agree;
    run;

59

60 Kappa statistic
Measure of agreement guidelines:
- Excellent agreement: k > 0.75
- Good agreement: 0.4 <= k <= 0.75
- Marginal agreement: k < 0.4

