Fall 2002, Biostat 511: Nonparametric Tests

Presentation transcript:

Nonparametric Tests
Nonparametric tests are useful when normality cannot be assumed and the CLT cannot be relied on. Nonparametric tests base inference on the sign or rank of the data as opposed to the actual data values. When normality can be assumed, nonparametric tests are less efficient than the corresponding t-tests.
Sign test (binomial test on +/-)
Wilcoxon signed rank (paired t-test on ranks)
Wilcoxon rank sum (unpaired t-test on ranks)

Nonparametric Tests
In the tests we have discussed so far (for continuous data) we have assumed either that the measurements were normally distributed or that the sample size was large enough to apply the central limit theorem. What can be done when neither of these applies?
Transform the data so that normality is achieved.
Use another probability model for the measurements, e.g. exponential, Weibull, gamma, etc.
Use a nonparametric procedure.
Nonparametric methods generally make fewer assumptions about the probability model and are, therefore, applicable in a broader range of problems. BUT! No such thing as a free lunch...

Nonparametric Tests
These data are REE (resting energy expenditure, kcal/day) for patients with cystic fibrosis and healthy individuals matched on age, sex, height and weight.

Nonparametric Tests
What’s your conclusion?

Nonparametric Tests
Let’s simplify by just looking at the direction of the difference...

Nonparametric Tests
We want to test whether the mean difference is zero, $H_0: \mu_d = 0$. Can we construct a test based only on the sign of the difference (no normality assumption)? If $\mu_d = 0$ then we might expect half the differences to be positive and half to be negative. In this example we find 10 positive differences out of 13. What’s the probability of that (or more extreme)?

Sign Test
Looks like a hypothesis test (p-value method). What we really tested was that the median difference was zero. The probability model used to calculate the p-value was the binomial model with p = P[positive difference] = 0.5. Note that there is no normality assumption involved. So the hypothesis that the Sign Test addresses is:
$H_0$: median difference = 0
$H_a$: median difference > (<, ≠) 0
Q: If it is more generally applicable then why not always use it?
A: It is less efficient than the t-test when the population is normal. Using a sign test is like using only 2/3 of the data (when the “true” probability distribution is normal).
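The probability asked about in the REE example (10 or more positive differences out of 13 when p = 0.5) can be computed directly from the binomial model. The following is a minimal sketch; the function name `binomial_upper_tail` is an illustrative choice, not something from the lecture.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P[T >= k] for T ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# REE example from the slides: 10 positive differences out of 13 pairs.
p_one_sided = binomial_upper_tail(10, 13)
print(f"one-sided p-value: {p_one_sided:.4f}")  # 378/8192, about 0.0461
```

Because the null distribution is symmetric when p = 0.5, a two-sided p-value is simply twice this value, roughly 0.092.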

Sign Test
Sign Test Overview:
1. Testing for a single sample (or differences from paired data).
2. Assign + to all data points where $X_i > \theta_0$ for $H_0: \theta = \theta_0$.
3. Hypothesis is in terms of $\theta$, the median.
4. Let T = total number of +’s out of n observations.
5. Under $H_0$, T is binomial with n and p = 1/2.
6. Get the p-value from the binomial distribution or the approximating normal, $T \sim N(n/2, n/4)$.
7. This is a valid test of the median without assuming a probability model for the original measurements.
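The overview steps can be sketched as a small function that returns both the exact binomial p-value and the normal approximation with T approximately N(n/2, n/4). This is an illustration only: the data are made up, the name `sign_test` is hypothetical, and dropping observations exactly equal to the null median is an assumption not stated on the slide.

```python
from math import comb, erf, sqrt

def sign_test(values, theta0=0.0):
    """One-sided sign test of H0: median = theta0 against Ha: median > theta0.

    Returns (T, n, exact p-value, normal-approximation p-value).
    """
    # Assumption: observations equal to theta0 carry no sign and are dropped.
    diffs = [v - theta0 for v in values if v != theta0]
    n = len(diffs)
    t = sum(1 for d in diffs if d > 0)           # number of + signs
    exact = sum(comb(n, j) for j in range(t, n + 1)) / 2**n
    z = (t - n / 2) / sqrt(n / 4)                # standardize with mean n/2, variance n/4
    approx = 1 - 0.5 * (1 + erf(z / sqrt(2)))    # upper tail of the standard normal
    return t, n, exact, approx

# Hypothetical paired differences (not the REE data).
example_diffs = [12, -3, 5, 8, -1, 7, 4, 9, 2, -6, 10, 3]
t, n, p_exact, p_approx = sign_test(example_diffs)
print(t, n, round(p_exact, 4), round(p_approx, 4))
```

With small n the exact and approximate p-values can differ noticeably, which is one reason to prefer the exact binomial calculation when it is available.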

Sign Test
“I tend to use the sign test as a quick test or a screening device. If the data are clearly statistically significant and the sign test will prove this, it is a marvelous device for hurriedly getting the client out of the office. He or she will be happy because the data have received an official stamp of statistical significance, and you will be happy because you can get back to your own research. It is also useful for rapidly scanning data to acquire a feeling as to whether the data might be statistically significant. If the sign statistic… is near to being significant, then a more refined analysis may be worthwhile. If, on the other hand, (the sign statistic) is nowhere close to being significant, it is very unlikely that a significant result can be produced by more elaborate means.”
Rupert Miller, Beyond ANOVA, Basics of Applied Statistics

Nonparametric Tests
Q: Can we use some sense of the magnitude of the observations, without using the observations themselves?
A: Yes! We can consider the rank of the observations.

Nonparametric Tests
A nonparametric test that uses the ranked data is the Wilcoxon Signed-Rank Test.
1. Rank the absolute values of the differences (from the null median).
2. Let $R_+$ equal the sum of ranks of the positive differences.
3. Then, under $H_0$, $E(R_+) = n(n+1)/4$ and $V(R_+) = n(n+1)(2n+1)/24$.
4. Let $T = (R_+ - E(R_+)) / \sqrt{V(R_+)}$.
5. Use the normal approximation to the distribution of T (i.e. compute the p-value based on $\Phi(T)$).

Wilcoxon Signed Rank Test
Note:
If any $d_i = 0$ we drop them from the analysis (but we are assuming continuous data, so there shouldn’t be many).
For “large” samples (number of non-zero $d_i$ > 15), we can use a normal approximation.
If there are many “ties” then a correction to $V(R_+)$ must be made (see Rosner pg 560); the computer does this automatically.
Efficiency relative to the t-test is about 95% if the true distribution is normal.
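The signed-rank procedure and the notes above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are hypothetical, the helper names `midranks` and `signed_rank_test` are invented for the example, tied absolute differences receive average ranks, and the tie correction to the variance mentioned in the notes is not applied.

```python
from math import erf, sqrt

def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1                    # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_test(diffs):
    """Return (R+, T, one-sided upper-tail p-value) via the normal approximation."""
    d = [x for x in diffs if x != 0]             # zero differences are dropped
    n = len(d)
    ranks = midranks([abs(x) for x in d])
    r_plus = sum(r for x, r in zip(d, ranks) if x > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24         # no tie correction in this sketch
    t = (r_plus - mean) / sqrt(var)
    p_upper = 1 - 0.5 * (1 + erf(t / sqrt(2)))
    return r_plus, t, p_upper

# Hypothetical differences: 16 non-zero values, so the "n > 15" guideline is met.
example = [12, -3, 5, 8, -1, 7, 4, 9, 2, -6, 10, 11, 13, 14, 15, 16]
r_plus, t_stat, p_val = signed_rank_test(example)
print(r_plus, round(t_stat, 3), round(p_val, 5))
```

In practice a statistical package would also apply the tie correction and, for small samples, use the exact null distribution of the rank sum.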

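Wilcoxon Signed Rank Test
Example: For the REE example we find $R_+ = 71$ (the sum of the ranks of the positive differences).
Under the null hypothesis, $H_0$: median difference = 0, we have
$E(R_+) = n(n+1)/4 = 13 \cdot 14/4 = 45.5$
$V(R_+) = n(n+1)(2n+1)/24 = 13 \cdot 14 \cdot 27/24 = 204.75$
The standardized statistic is $T = (R_+ - E(R_+)) / \sqrt{V(R_+)} = (71 - 45.5)/\sqrt{204.75} \approx 1.78$.
For a one-sided α = 0.05 test, we compare T to $z_{1-\alpha} = 1.65$. Since T > 1.65, we reject $H_0$.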
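The arithmetic in this example can be checked in a few lines. The sketch below takes n = 13 and a positive rank sum of 71 from the slide and assumes the standardized statistic is computed without a continuity correction; the variable names are illustrative.

```python
from math import sqrt

n = 13          # number of non-zero paired differences (from the slide)
r_plus = 71     # sum of ranks of the positive differences (from the slide)

mean = n * (n + 1) / 4                    # E(R+) under H0
var = n * (n + 1) * (2 * n + 1) / 24      # V(R+) under H0, ignoring ties
t_stat = (r_plus - mean) / sqrt(var)      # assumed: no continuity correction

print(mean, var, round(t_stat, 2))
print("reject H0 at one-sided alpha = 0.05:", t_stat > 1.65)
```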

Nonparametric Tests: 2 samples
The same issues that motivated nonparametric procedures for the 1-sample case arise in the 2-sample case, namely, non-normality in small samples and the influence of a few observations. Consider the following data, taken from Miller (1991). These data are immune function measurements obtained on healthy volunteers. One group consisted of 16 Epstein-Barr virus (EBV) seropositive donors. The other group consisted of 10 EBV seronegative donors. The measurements represent lymphocyte blastogenesis with p3HR-1 virus as the antigen (Nikoskelainen et al. (1978), J. Immunology, 121).

Nonparametric Tests: 2 samples

Nonparametric Tests: 2 samples
Can we transform to normality?

Nonparametric Tests: 2 samples
Does the 2-sample t statistic depend heavily on the transformation selected? Does our interpretation depend on the transformation selected?

Wilcoxon Rank-Sum Test
Idea: If the distribution for group 1 is the same as the distribution for group 2 then pooling the data should result in the two samples “mixing” evenly. That is, we wouldn’t expect one group to have many large values or many small values in the pooled sample.
Procedure:
1. Pool the two samples.
2. Order and rank the pooled sample.
3. Sum the ranks for each sample: $R_1$ = rank sum for group 1, $R_2$ = rank sum for group 2.
4. The average rank is $(n_1 + n_2 + 1)/2$.
5. Under $H_0$: same distribution, we expect $R_1$ to be $E(R_1) = n_1(n_1 + n_2 + 1)/2$.
6. The variance of $R_1$ is $V(R_1) = n_1 n_2 (n_1 + n_2 + 1)/12$ (an adjustment is required in the case of ties; this is done automatically by most software packages).
7. We can base a test on the approximate normality of $T = (R_1 - E(R_1)) / \sqrt{V(R_1)}$.
This is known as the Wilcoxon Rank-Sum Test.
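The procedure above can be sketched in code. This is an illustration under stated assumptions, not the lecture's own implementation: the data are made up, the names `midranks` and `rank_sum_test` are hypothetical, ties get average ranks, and the variance uses the standard tie adjustment in which each group of t tied values contributes t^3 - t.

```python
from collections import Counter
from math import erf, sqrt

def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1    # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (R1, E(R1), V(R1), T, two-sided p-value) by normal approximation."""
    n1, n2 = len(x), len(y)
    total = n1 + n2
    pooled = list(x) + list(y)                   # step 1: pool
    ranks = midranks(pooled)                     # step 2: order and rank
    r1 = sum(ranks[:n1])                         # step 3: rank sum for group 1
    mean = n1 * (total + 1) / 2                  # step 5
    tie_sizes = Counter(pooled).values()
    correction = n1 * n2 * sum(t**3 - t for t in tie_sizes) / (12 * total * (total - 1))
    var = n1 * n2 * (total + 1) / 12 - correction  # step 6, adjusted for ties
    t_stat = (r1 - mean) / sqrt(var)             # step 7
    p_two_sided = 2 * (1 - 0.5 * (1 + erf(abs(t_stat) / sqrt(2))))
    return r1, mean, var, t_stat, p_two_sided

# Hypothetical data with one tied pair; far too small for the normal
# approximation to be trusted, used here only to make the arithmetic visible.
x = [1.2, 3.4, 2.2, 5.1, 4.8]
y = [0.9, 1.2, 2.0, 1.1]
r1, mean, var, t_stat, p_val = rank_sum_test(x, y)
print(r1, mean, round(var, 4), round(t_stat, 3), round(p_val, 4))
```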

Wilcoxon Rank-Sum Test
Order and rank the pooled sample...

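Wilcoxon Rank-Sum Test
Find the sum of the ranks for either group: $R_1 = 273$, $n_1 = 16$.
Under the null hypothesis, $H_0$: median 1 = median 2, we have:
$E(R_1) = n_1(n_1 + n_2 + 1)/2 = 16 \cdot 27/2 = 216$
$V(R_1) = n_1 n_2 (n_1 + n_2 + 1)/12 - \text{correction} = 360 - \text{correction}$
Here the “correction” is for tied values. The correction is given as $\frac{n_1 n_2 \sum_i t_i (t_i^2 - 1)}{12 (n_1 + n_2)(n_1 + n_2 - 1)}$, where $t_i$ is the number of observations in the i-th group of tied values.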

Wilcoxon Rank-Sum Test
The standardized statistic is $T = (R_1 - E(R_1)) / \sqrt{V(R_1)} = 3.01$.
Therefore, since 2*P(Z > 3.01) < 0.01, we reject $H_0$ at α = 0.05 or α = 0.01.
How does this compare to the t-tests?
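As a rough check on the example, the mean and the uncorrected variance follow from the group sizes alone (16 seropositive and 10 seronegative donors). The sketch below omits the tie correction because the raw data are not reproduced in this transcript, so it gives a statistic slightly below the reported 3.01; the variable names are illustrative.

```python
from math import erf, sqrt

n1, n2 = 16, 10     # group sizes from the slides
r1 = 273            # rank sum for group 1 from the slides

mean = n1 * (n1 + n2 + 1) / 2                # E(R1) under H0
var_no_ties = n1 * n2 * (n1 + n2 + 1) / 12   # V(R1) before any tie correction
t_stat = (r1 - mean) / sqrt(var_no_ties)

# Two-sided p-value from the standard normal distribution.
p_two_sided = 2 * (1 - 0.5 * (1 + erf(t_stat / sqrt(2))))

print(mean, var_no_ties, round(t_stat, 3), p_two_sided < 0.01)
```

Subtracting a tie correction can only shrink the variance, so the corrected statistic is a little larger than this uncorrected value, which is consistent with the 3.01 quoted above.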

Wilcoxon Rank-Sum Test
Notes:
1. The Wilcoxon test is testing for a difference in location between the two distributions, not for a difference in spread.
2. Use of the normal approximation is valid if each group has > 10 observations. Otherwise, the exact sampling distribution of $R_1$ can be used. Tables and computer routines are available in this situation (Rosner table 13).
3. The Wilcoxon rank-sum test is also known as the Mann-Whitney Test. These are equivalent tests. The only difference is that the Mann-Whitney test subtracts the minimum rank sum from $R_1$: $U = R_1 - n_1(n_1 + 1)/2$.
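The equivalence in note 3 can be illustrated numerically. In the sketch below, U is obtained two ways on hypothetical data: by subtracting the minimum possible rank sum n1(n1+1)/2 from R1, and by counting pairs in which a group 1 value exceeds a group 2 value (ties counted as one half). The function names and data are assumptions made for the example.

```python
def rank_sum(x, y):
    """Rank sum of x within the pooled sample, using average ranks for ties."""
    pooled = list(x) + list(y)
    total = 0.0
    for a in x:
        smaller = sum(1 for v in pooled if v < a)
        equal = sum(1 for v in pooled if v == a)   # includes a itself
        total += smaller + (equal + 1) / 2
    return total

def u_from_rank_sum(r1, n1):
    """Mann-Whitney U as the rank sum minus its minimum possible value."""
    return r1 - n1 * (n1 + 1) / 2

def u_by_counting(x, y):
    """Mann-Whitney U as the number of (x, y) pairs with x > y; ties count 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

# Hypothetical data with one tie across groups.
x = [1.2, 3.4, 2.2, 5.1, 4.8]
y = [0.9, 1.2, 2.0, 1.1]
print(rank_sum(x, y), u_from_rank_sum(rank_sum(x, y), len(x)), u_by_counting(x, y))

# EBV example from the slides: R1 = 273 with n1 = 16.
print(u_from_rank_sum(273, 16))
```

Since U differs from R1 only by a constant that depends on n1, both statistics lead to the same standardized value and the same p-value.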

Nonparametric Tests: Summary
Nonparametric tests are useful when normality cannot be assumed and the CLT cannot be relied on.
Nonparametric tests base inference on the sign or rank of the data as opposed to the actual data values.
When normality can be assumed, nonparametric tests are less efficient than the corresponding t-tests.