Nonparametric test based on ranks


Outline
- Concept of parametric and non-parametric testing
- Wilcoxon's signed rank test for paired design
- Wilcoxon's rank sum test for two independent samples
- Kruskal-Wallis' H test for multiple independent samples and pairwise comparison

Review of Parametric Test
Example: independent-samples t test
- Assumptions: independent observations, normal distribution, equal variances
- Objective: under these assumptions, test whether the population means (parameters of the specified distribution) are equal.

Review of Parametric Test
Key features:
- Assume a particular distribution
- Make inferences about the parameters of that distribution
Therefore they are called parametric (specific-distribution-based) tests.

Non-parametric tests (distribution-free tests)
Used when the conditions for parametric testing are not met, for instance:
- the variable does not follow a normal distribution;
- the distribution is of a type we do not know, so it cannot be specified by a finite number of parameters.
[Figure: frequency histogram of incidence number]

Non-parametric tests
Scenarios suitable for non-parametric tests:
a. Unknown distribution (the conditions for parametric methods are not met)
b. Ordinal data: the data have a ranking but no clear numerical interpretation
c. Imprecise data (e.g. values recorded only as >80)
d. A quick, brief analysis (e.g. for a pilot study)

Non-parametric tests
Key features:
- Distribution-free: free of a particular distribution specification, but not free of all assumptions (e.g. independence, symmetry)
- Do not make inferences about parameters (e.g. goodness-of-fit testing)

Justification of Parametric and Non-parametric Tests
Fewer assumptions mean wider applicability. However, when the assumptions of a parametric test hold, a non-parametric test used in its place has slightly lower power; in other words, a larger sample size may be needed to reach a significant conclusion at the same significance level. Checking the assumptions (statistical diagnostics) is therefore essential.

1 Wilcoxon's signed rank test
Used for:
- two related samples, or repeated measurements on a single sample;
- an alternative to the paired Student's t test when the paired differences cannot be assumed to be normally distributed.
Named for the Irish-born US statistician Frank Wilcoxon (1892–1965), who, in a single 1945 paper, proposed both this test and the rank-sum test for two independent samples. It was popularised by Siegel (1956) in his influential textbook on non-parametric statistics.

1 Wilcoxon's signed rank test
Example 1: To study the difference in fluoride concentration measured by the electrode method and by spectrophotometry, 11 paired samples of industrial sewage were measured. The results are listed in Table 1. (According to a normality test, the paired differences are not normally distributed.)

T+ = 43.5; T- = 11.5

Steps:
(1) Hypotheses: H0: the median of the differences is 0; H1: the median of the differences is not 0; α = 0.05.
(2) Compute the differences and rank their absolute values (omit zero differences and assign mean ranks to ties), then give the ranks back their signs. The test statistic T is either the positive rank sum or the negative rank sum.
(3) P-value and conclusion: under the null hypothesis, T should not be far from the mean rank sum n(n+1)/4 (the extremes are 0 and n(n+1)/2). From Table C9, T lies within the critical range (8–47), so P > 0.05 and H0 is not rejected. Conclusion: we cannot infer that the concentrations from the two methods differ.

The total rank sum is always n(n+1)/2, so the midpoint of the critical range is the mean rank sum: (8 + 47)/2 = n(n+1)/4 = 27.5 (here n = 10 non-zero differences).
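The ranking in step (2) can be sketched in Python as below. This is a minimal illustration, not the course's software: the paired values are hypothetical integers (Table 1 is not reproduced in this transcript), and the helper names `average_ranks` and `signed_rank_sums` are our own.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with the value at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j + 2) / 2  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_sums(x, y):
    """Return (T+, T-, n) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # omit zero differences
    ranks = average_ranks([abs(d) for d in diffs])   # rank |d|, mean rank for ties
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)   # give back the signs
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus, len(diffs)


# Hypothetical paired measurements (not the data of Table 1)
method_a = [12, 15, 20, 22, 30, 31, 40]
method_b = [12, 13, 23, 20, 27, 36, 35]
t_plus, t_minus, n = signed_rank_sums(method_a, method_b)
print(f"T+ = {t_plus}, T- = {t_minus}, n = {n}")
print(f"T+ + T- = {t_plus + t_minus}, n(n+1)/2 = {n * (n + 1) / 2}")
print(f"mean under H0 = n(n+1)/4 = {n * (n + 1) / 4}")
```

For these hypothetical data one pair has a zero difference and is dropped, leaving n = 6, T+ = 12.0 and T- = 9.0; the two sums always add up to n(n+1)/2, and either one is then compared with the critical range in a table such as C9.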

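1 Wilcoxon's signed rank test
When the sample size is beyond the range of the critical value table, a Z test (normal approximation) can be used:
- if there are no ties: $Z = \dfrac{|T - n(n+1)/4| - 0.5}{\sqrt{n(n+1)(2n+1)/24}}$
- if there are ties: $Z_c = \dfrac{|T - n(n+1)/4| - 0.5}{\sqrt{n(n+1)(2n+1)/24 - \sum_j (t_j^3 - t_j)/48}}$, where $t_j$ is the number of observations in the j-th tie.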
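The normal approximation for T, with the 0.5 continuity correction and the tie correction to the variance, can be sketched as follows. The function name `signed_rank_z`, the tie sizes and the values n = 30, T = 310 are hypothetical; the two-sided P value uses `math.erfc`, since P(|Z| ≥ z) = erfc(z/√2).

```python
import math


def signed_rank_z(t, n, tie_sizes=()):
    """Normal approximation for the Wilcoxon signed-rank statistic T.

    t: T+ or T-; n: number of non-zero differences;
    tie_sizes: number of observations in each group of tied |d| (empty if none).
    """
    mean_t = n * (n + 1) / 4
    var_t = n * (n + 1) * (2 * n + 1) / 24
    var_t -= sum(tj ** 3 - tj for tj in tie_sizes) / 48      # tie correction
    z = max(0.0, abs(t - mean_t) - 0.5) / math.sqrt(var_t)     # 0.5 = continuity correction
    p_two_sided = math.erfc(z / math.sqrt(2))                  # P(|Z| >= z)
    return z, p_two_sided


# Hypothetical large-sample case: n = 30 non-zero differences, T = 310
z, p = signed_rank_z(310, 30)
z_tied, p_tied = signed_rank_z(310, 30, tie_sizes=[2, 3])
print(f"no ties:   z = {z:.4f}, P = {p:.4f}")
print(f"with ties: z = {z_tied:.4f}, P = {p_tied:.4f}")
```

Ties shrink the variance, so the corrected z is slightly larger than the uncorrected one.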

2 Wilcoxon's rank sum test for two samples
Used for:
- independent samples when the data are not normally distributed, or
- when it is not certain whether the variable follows a normal distribution.
Named for:
- Frank Wilcoxon (1945), for equal sample sizes;
- Henry Berthold Mann, Austrian-born US mathematician and statistician, and Donald Ransom Whitney (1947), for arbitrary sample sizes.
Also called the Mann–Whitney U test or the Mann–Whitney–Wilcoxon (MWW) test.

2 Wilcoxon's rank sum test for two samples
Example 2: To study the difference in the lethal effect of two drugs, two groups of snails were each treated with one of the drugs, and the mortality rate in each group was measured (not normally distributed). The results are listed in Table 2.


Steps:
(1) Hypotheses: H0: the two population distributions are the same; H1: the two distributions are not the same; α = 0.05.
(2) Rank all observations from both samples together, handling ties the same way (mean ranks). Take the rank sum of the smaller sample as the statistic: T = T1 = 71.5.
(3) P-value and conclusion: from Table C10, the critical range T(0.02; 7, 7) is 34–71. T lies outside this range, so P < 0.02 and H0 is rejected. Conclusion: the mortality rates under the two drugs differ significantly.
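Step (2), pooling and ranking both samples and summing the ranks of one of them, can be sketched as below. The mortality rates are hypothetical (Table 2 is not reproduced here), and `rank_sums` is an illustrative helper name; ties in the pooled data receive mean ranks.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def rank_sums(sample1, sample2):
    """Rank the pooled data and return the rank sum of each sample."""
    pooled = list(sample1) + list(sample2)
    ranks = average_ranks(pooled)
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])


# Hypothetical mortality rates (%), not the data of Table 2
drug_a = [32.5, 35.5, 40.5, 46.0, 49.0]
drug_b = [16.0, 20.5, 22.5, 29.0, 36.0, 40.5]
t_a, t_b = rank_sums(drug_a, drug_b)
n1, n2 = len(drug_a), len(drug_b)
print(f"T (smaller sample) = {t_a}, other sample = {t_b}")
print(f"expected under H0: n1(N+1)/2 = {n1 * (n1 + n2 + 1) / 2}")
```

Here the shared value 40.5 occupies ranks 8 and 9, so both copies get 8.5; T = 40.5 for the smaller sample, against an expected 30.0 under H0.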


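2 Wilcoxon's rank sum test for two samples
When the sample size is beyond the range of the critical value table, a Z test can be used (n1 is the size of the sample whose rank sum is T, n2 the size of the other sample, N = n1 + n2):
- if there are no ties: $Z = \dfrac{|T - n_1(N+1)/2| - 0.5}{\sqrt{n_1 n_2 (N+1)/12}}$
- if there are ties: $Z_c = Z/\sqrt{C}$, with $C = 1 - \sum_j (t_j^3 - t_j)/(N^3 - N)$, where $t_j$ is the number of observations in the j-th tie.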
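A sketch of this Z test, assuming the no-ties and tie-corrected forms above; `rank_sum_z` is an illustrative name. As a rough cross-check only (Example 2 is small enough for Table C10, and any tie correction is ignored because the ties in Table 2 are not reproduced), it is applied to Example 2's T = 71.5 with n1 = n2 = 7.

```python
import math


def rank_sum_z(t, n1, n2, tie_sizes=()):
    """Normal approximation for the Wilcoxon rank-sum statistic T of sample 1.

    tie_sizes: number of observations in each group of tied values (empty if none).
    """
    n = n1 + n2
    mean_t = n1 * (n + 1) / 2
    sd_t = math.sqrt(n1 * n2 * (n + 1) / 12)
    z = max(0.0, abs(t - mean_t) - 0.5) / sd_t             # 0.5 = continuity correction
    c = 1 - sum(tj ** 3 - tj for tj in tie_sizes) / (n ** 3 - n)
    z_c = z / math.sqrt(c)                                  # equals z when there are no ties
    p_two_sided = math.erfc(z_c / math.sqrt(2))
    return z_c, p_two_sided


z, p = rank_sum_z(71.5, 7, 7)
print(f"z = {z:.4f}, two-sided P = {p:.4f}")
```

This gives z ≈ 2.36 and a two-sided P of about 0.018, in line with the table result P < 0.02.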

3 Kruskal-Wallis' H test for comparing more than 2 samples
Used for testing the equality of population medians among groups. It is essentially a one-way ANOVA applied to the ranks instead of the data. It does not assume normal populations, but it does assume that the distributions in all groups have the same shape and scale, apart from any difference in medians.
Named for the American mathematician and statistician William Henry Kruskal (1919–2005) and the American economist and statistician Wilson Allen Wallis, who proposed it in a 1952 paper. It extends the Mann–Whitney U test to three or more groups.

3.1 Kruskal-Wallis' H test for comparing more than 2 samples
Example 3: Patients with chronic pharyngitis were divided into 3 groups according to the treatment they received (A: treatment A; B: treatment B; C: treatment C). The efficacy results are listed in Table 3.

Rank sums: R1 = 83182, R2 = 18070, R3 = 13229 (R = mean rank × count).

(1) Hypotheses: H0: the three population distributions are all the same; H1: the three population distributions are not all the same; α = 0.05.
(2) Rank all observations from the three samples together (handling ties the same way) and compute the rank sum of each sample: R1 = 83182, R2 = 18070, R3 = 13229.

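(3) Statistic H (N is the total number of observations; n_i and R_i are the size and rank sum of group i):
- if there are no ties: $H = \dfrac{12}{N(N+1)} \sum_i \dfrac{R_i^2}{n_i} - 3(N+1)$
- if there are ties: $H_c = H/C$, with $C = 1 - \sum_j (t_j^3 - t_j)/(N^3 - N)$, where $t_j$ is the number of individuals in the j-th tie.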

(4) P-value and conclusion: if there are at most 3 groups and each group has at most 5 observations, compare H with the critical values in Table C11. Otherwise H approximately follows a χ² distribution with ν = k − 1 degrees of freedom, where k is the number of groups. In Example 3, from Table C8, Hc = 51.41 > χ²(0.05, 2) = 5.99, so P < 0.05. Conclusion: the efficacy of the three treatments is not all at the same level.
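The computation of H and of the tie-corrected Hc can be sketched as follows. The three groups are hypothetical raw scores (Table 3's frequencies are not reproduced here) with 6 observations each, so the χ² approximation applies. For k = 3 the χ² distribution has 2 degrees of freedom, whose upper tail is exactly exp(−x/2); that closed form is used only because k = 3 and does not generalise to other k.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Return (H, Hc): the Kruskal-Wallis statistic without and with tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])  # rank sum R_i of this group
        total += r_i ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # C = 1 - sum(t_j^3 - t_j) / (N^3 - N); untied values (t_j = 1) contribute 0
    c = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    return h, h / c


g_a = [12, 13, 14, 15, 16, 18]
g_b = [1, 2, 4, 5, 7, 9]
g_c = [3, 5, 6, 9, 10, 11]
h, h_c = kruskal_wallis(g_a, g_b, g_c)
p = math.exp(-h_c / 2)  # chi-square upper tail, valid only for 2 df (k = 3)
print(f"H = {h:.4f}, Hc = {h_c:.4f}, P = {p:.4f}")
```

With these data H ≈ 12.32 and Hc ≈ 12.34, both well above 5.99, so P < 0.05.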

3.2 Multiple comparison of mean ranks
When the comparison among the three groups shows a significant difference, multiple comparisons are needed to determine which pairs differ. Pairwise t tests on the mean ranks can be used.

(1) Hypotheses: H0: the two population distributions in this pair have the same location; H1: the two population distributions in this pair have different locations; α = 0.05.
(2) Calculate the t value for each pair. Similarly, t(A,C) = 0.7071, and t(B,C) is obtained in the same way.
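The slide's t formula is not reproduced in this transcript. One common form, due to Conover and Iman, compares two mean ranks with standard error √{[N(N+1)/12]·[(N − 1 − H)/(N − k)]·(1/n_A + 1/n_B)} on N − k degrees of freedom; the sketch below assumes that form (no-ties version) and is not necessarily the course's exact formula. The summary values (three groups of 6, mean ranks 15.5, 5.0, 8.0, H = 12.32) and the critical value 2.131 (two-sided 5% t with 15 df, from a t table) are illustrative.

```python
import math


def pairwise_t(mean_rank_a, n_a, mean_rank_b, n_b, n_total, k, h):
    """Conover-Iman style t for two mean ranks after a Kruskal-Wallis test.

    Assumes no ties; degrees of freedom are n_total - k.
    """
    s2 = n_total * (n_total + 1) / 12  # variance of ranks 1..N when there are no ties
    se = math.sqrt(s2 * (n_total - 1 - h) / (n_total - k) * (1 / n_a + 1 / n_b))
    return abs(mean_rank_a - mean_rank_b) / se


# Hypothetical summary: 3 groups of 6 (N = 18), H = 12.32
mean_ranks = {"A": 15.5, "B": 5.0, "C": 8.0}
T_CRIT = 2.131  # two-sided 5% critical value of t with 15 df (t table)
for g1, g2 in [("A", "B"), ("A", "C"), ("B", "C")]:
    t = pairwise_t(mean_ranks[g1], 6, mean_ranks[g2], 6, 18, 3, 12.32)
    print(f"t({g1},{g2}) = {t:.3f}, exceeds critical value: {t > T_CRIT}")
```

In this illustration group A differs from both B and C, while B and C do not differ, the same pattern as the conclusion drawn for Example 3.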


(3) P-value and conclusion: efficacy in group A differs from that in the other two groups, and the patients in group A appear to have better efficacy. Conclusion: treatment A may lead to better efficacy.

The End
Summary
1. Concepts of parametric and non-parametric testing methods
2. Merits and limitations of non-parametric methods
3. Some non-parametric methods based on ranks

Nonparametric tests I: Back to basics

Lecture Outline
- What is a nonparametric test?
- Rank tests, distribution-free tests and nonparametric tests
- Which type of test to use

MTB > dotplot 'Male' 'Female';
SUBC> same.
[Dotplot of MALE and FEMALE weights on a common scale]
MTB > desc 'Male' 'Female'
[Descriptive statistics for MALE and FEMALE: N, Mean, Median, TrMean, StDev, SEMean, Min, Max, Q1, Q3]

Lecture Outline
- What is a nonparametric test?
  - What is a parameter?
  - What are examples of non-parametric tests?
- Rank tests, distribution-free tests and nonparametric tests
- Which type of test to use

Parameters are central to inference in GLM and ANOVA and represent assumptions about the underlying processes

LET K1=4.7        # Group 1 mean minus grand mean
LET K2=-2.5       # Group 2 mean minus grand mean
LET K3=10.4       # The grand mean
LET K4=1.9        # Standard deviation of the error
RANDOM 30 'Error' # 30 standard normal values (Minitab's default distribution)
LET 'Y'=K3+K1*'DUM1'+K2*'DUM2'+K4*'Error'   # grand mean + group effect + scaled error

Fitted value = μ + α1 (group 1), μ + α2 (group 2), μ − α1 − α2 (group 3)
The error has a Normal distribution with zero mean and standard deviation σ.
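The same parametric model can be sketched in Python to make the role of each parameter explicit. Two things are assumptions not stated in the macro: 10 observations per group (the macro only generates 30 errors), and effect coding for DUM1/DUM2 with group 3 coded (−1, −1), which is what makes the group 3 fitted value μ − α1 − α2.

```python
import random

MU = 10.4      # K3: the grand mean
ALPHA1 = 4.7   # K1: group 1 mean minus grand mean
ALPHA2 = -2.5  # K2: group 2 mean minus grand mean
SIGMA = 1.9    # K4: standard deviation of the error

# Assumed effect coding for (DUM1, DUM2); group 3 gets (-1, -1)
DUMMIES = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}


def fitted(group):
    """Parametric fitted value for a group: mu + alpha1*DUM1 + alpha2*DUM2."""
    d1, d2 = DUMMIES[group]
    return MU + ALPHA1 * d1 + ALPHA2 * d2


def simulate(n_per_group=10, seed=1):
    """Return (group, Y) pairs: fitted value plus Normal(0, SIGMA) error."""
    rng = random.Random(seed)
    return [(g, fitted(g) + SIGMA * rng.gauss(0.0, 1.0))
            for g in (1, 2, 3) for _ in range(n_per_group)]


data = simulate()
for g in (1, 2, 3):
    ys = [y for grp, y in data if grp == g]
    print(f"group {g}: fitted = {fitted(g):.1f}, sample mean = {sum(ys) / len(ys):.2f}")
```

The fitted values are 15.1, 7.9 and 8.2; every inference in the parametric analysis is about these few parameters (μ, α1, α2, σ), which is exactly what a nonparametric test does without.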

Parameters are central to inference in GLM and ANOVA, but
- they represent assumptions about the underlying processes;
- inference can be done without them in some simple situations. BUT HOW?

[Table of the pooled observations ranked by weight: columns Rnk, Wt, Sex]
Remember ties.

Mean Rank
The 'Male' mean rank = 55.26
The 'Female' mean rank = 45.74

MTB > mann-whitney male female
Mann-Whitney Test and CI: MALE, FEMALE
MALE    N = 50   Median =
FEMALE  N = 50   Median =
Point estimate for ETA1-ETA2 is
Percent CI for ETA1-ETA2 is ( ,0.1200)
W = 2763
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at
The test is significant at (adjusted for ties)
Cannot reject at alpha = 0.05

A sum of ranks of 2763 corresponds to a mean rank of 2763/50 = 55.26.
The null hypothesis is better expressed as "the distributions of male and female weights are the same".
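A large-sample check of this output can be sketched from W = 2763 alone, assuming W is the rank sum of the first sample (MALE) and using the normal approximation without tie or continuity correction, so the P value will not match Minitab's printed significance exactly. The function name `mann_whitney_normal` is illustrative.

```python
import math


def mann_whitney_normal(w, n1, n2):
    """Large-sample z for the rank sum W of the first sample (no tie/continuity correction)."""
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n + 1) / 12)
    z = (w - mean_w) / sd_w
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided P value


W, N1, N2 = 2763, 50, 50
total = (N1 + N2) * (N1 + N2 + 1) / 2  # 5050: sum of ranks 1..100
male_mean_rank = W / N1
female_mean_rank = (total - W) / N2
print(f"mean ranks: MALE = {male_mean_rank:.2f}, FEMALE = {female_mean_rank:.2f}")
z, p = mann_whitney_normal(W, N1, N2)
print(f"z = {z:.2f}, two-sided P = {p:.2f}")
```

This gives mean ranks of 55.26 and 45.74, z ≈ 1.64 and P ≈ 0.10, consistent with "Cannot reject at alpha = 0.05".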


Nonparametric vs Parametric
Nonparametric          Parametric
Sign Test              One-sample t-test
Mann-Whitney Test      Two-sample t-test
Spearman Rank Test     Correlation/Regression
Kruskal-Wallis Test    One-way ANOVA
Friedman Test          One-way blocked ANOVA


A rose by any other name...
- Non-parametric tests lack parameters.
- Rank tests start by ranking the data.
- Distribution-free tests don't assume a Normal distribution (or any other).
These are mainly, but not completely, overlapping sets of tests (and some are scale-invariant too).


Fewer assumptions, but...
- still some assumptions (including independence)
- a limited range of situations:
  - no more than 2 x-variables
  - can't mix continuous and categorical x-variables
- they provide p-values, but estimation is dodgy
- loss of efficiency if the parametric assumptions are upheld
- there is a grand scheme for parametric statistics (GLM), but a lot of separate, strange names for nonparametrics

When is there a choice? When a non-parametric test exists for the problem: fewer than two or three variables altogether, and prediction is not required.

How to choose:
- If the assumptions of the parametric test are upheld, use it (on grounds of efficiency).
- If not, consider fixing the assumptions (e.g. by transforming the data, as in the practical).
- If the assumptions cannot be fixed, use a nonparametric test.
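The second option, fixing the assumptions by transforming the data, can be illustrated with a quick skewness check before and after a log transform. The values are hypothetical and chosen so that their logs are evenly spaced; `skewness` is the simple moment coefficient m3/m2^1.5, used here only as an informal diagnostic.

```python
import math


def skewness(xs):
    """Moment coefficient of skewness, m3 / m2**1.5 (population form)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


raw = [1, 2, 4, 8, 16, 32, 64]       # hypothetical right-skewed measurements
logged = [math.log(x) for x in raw]  # natural-log transform
print(f"skewness before: {skewness(raw):.3f}, after log: {skewness(logged):.3f}")
```

The raw values are strongly right-skewed, while their logs are symmetric; if a transformation like this makes the parametric assumptions plausible, the parametric test can be used on the transformed scale.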

MTB > dotplot 'LogM' 'LogF';
SUBC> same.
[Dotplot of LogM and LogF on a common scale]
MTB > desc 'LogM' 'LogF'
[Descriptive statistics for LogM and LogF: N, Mean, Median, TrMean, StDev, SEMean, Min, Max, Q1, Q3]


Last remarks
- Nonparametric tests are an opportunity to revise the basic ideas of statistical inference.
- They are sometimes useful in biology.
- They are often used in biology.

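PROC NPAR1WAY WILCOXON;  /* rank-sum test (Kruskal-Wallis for more than 2 groups); uses the most recently created data set */
   VAR c;                /* response variable */
   CLASS g;              /* grouping variable */
   FREQ n;               /* frequency (count) of each observation */
RUN;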