Biostatistics, statistical software VII. Non-parametric tests: Wilcoxon's signed rank test, Mann-Whitney U-test, Kruskal-Wallis test, Spearman's rank correlation.

Krisztina Boda PhD, Department of Medical Informatics, University of Szeged

Krisztina Boda INTERREG 2
Parametric tests
Parameter: a parameter is a number characterizing an aspect of a population (such as the mean of some variable for the population) or the shape of a theoretical distribution. Usually population parameters cannot be known exactly; in many cases we make assumptions about them.

Parameters of the normal distribution: $\mu$, $\sigma$. Parameters of the binomial distribution: $n$, $p$. Parameter of the Poisson distribution: $\lambda$.

Normal distributions $N(\mu, \sigma)$, for example N(0,1), N(1,1) and N(0,2). $\mu$ and $\sigma$ are parameters (a parameter is a number that describes the distribution).

Binomial distributions
1. Each trial results in one of two possible, mutually exclusive outcomes (success, failure).
2. The probability of a success, p, remains constant from trial to trial.
3. The trials are independent.
We are interested in computing the probability of k successes in n trials. The binomial distribution is useful for describing distributions of binomial events, such as the number of males and females in a random sample of companies, or the number of defective components in samples of 20 units taken from a production process. The binomial distribution is defined as
$$P(X = k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, \dots, n,$$
where p is the probability that the respective event will occur, q is equal to 1-p, and n is the number of independent trials.

Example: Suppose that it is known that 30% of a certain population are immune to some disease. If a random sample of size n = 10 is selected from this population, what is the probability that it will contain exactly k = 4 immune persons?
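The worked answer to this example did not survive in the transcript. As a minimal sketch, the binomial formula can be evaluated directly in Python with the standard-library `math.comb` (Python 3.8+); the function name `binomial_pmf` is our own illustration, not part of any course material.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * q**(n - k), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)

# Example: 30% immune, random sample of n = 10, exactly k = 4 immune persons
prob = binomial_pmf(4, 10, 0.3)
print(f"P(X = 4) = {prob:.4f}")  # P(X = 4) = 0.2001

# Sanity check: the probabilities over k = 0..n sum to 1
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
```

By hand: C(10,4) = 210, 0.3^4 = 0.0081 and 0.7^6 = 0.117649, giving about 0.2001.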


Poisson distribution
The Poisson distribution is also sometimes referred to as the distribution of rare events. Examples of Poisson-distributed variables are the number of accidents per person, the number of sweepstakes won per person, or the number of catastrophic defects found in a production process. If n tends to infinity while $np = \lambda$ is kept constant, the binomial distribution approaches a fixed distribution, the Poisson distribution:
$$P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \dots$$

Example: In a certain disease the average number of new occurrences in a month is 3. Assuming that the number of new occurrences follows a Poisson distribution, what is the probability that
- nobody becomes ill (0.0498);
- there are exactly 2 new occurrences (0.224)?
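The two answers above follow from the Poisson formula with $\lambda = 3$. A short Python sketch to reproduce them (the helper name `poisson_pmf` is illustrative only):

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): lam**k * exp(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0  # average number of new occurrences per month
p0 = poisson_pmf(0, lam)  # nobody becomes ill: exp(-3)
p2 = poisson_pmf(2, lam)  # exactly 2 new occurrences: 9/2 * exp(-3)
print(f"P(0) = {p0:.4f}, P(2) = {p2:.4f}")  # P(0) = 0.0498, P(2) = 0.2240
```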

Parametric tests
The null hypothesis contains a parameter of a distribution. The tests assume that the samples are drawn from normally distributed populations.
One-sample t-test: $H_0: \mu = c$.
Two-sample t-test: $H_0: \mu_1 = \mu_2$, assuming $\sigma_1 = \sigma_2$.

Nonparametric tests
We do not need to make specific assumptions about the distribution of the data. They can be used when
- the distribution is not normal;
- the shape of the distribution is not evident;
- data are measured on an ordinal scale (low – normal – high; passed – acceptable – good – very good).

Ranking data
Nonparametric tests do not use estimates of population parameters; they use ranks instead. Instead of the original sample data we use their ranks. To show the ranking procedure, suppose we have the following sample of measurements: 199, 126, 81, 68, 112, 112.
Sort the data in ascending order: 68, 81, 112, 112, 126, 199.
Give ranks from 1 to n: 1, 2, 3, 4, 5, 6.
Cases 5 and 6 are equal, so both are assigned a rank of 3.5, the average of ranks 3 and 4. We say that cases 5 and 6 are tied.
Ranks corrected for ties: 1, 2, 3.5, 3.5, 5, 6.

Result of ranking data

Case   Data   Rank   Rank corrected for ties
1      199    6      6
2      126    5      5
3      81     2      2
4      68     1      1
5      112    3      3.5
6      112    4      3.5

The sum of all ranks must be $n(n+1)/2$. Using this formula we can check our computations: here the sum of ranks is 21, and 6(7)/2 = 21.
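The ranking procedure (sort, number from 1 to n, average the ranks of tied values) can be sketched in plain Python as follows. This is an illustrative helper, `average_ranks`, not a library function; SciPy's `rankdata` offers the same "average" tie rule if that library is available.

```python
def average_ranks(values):
    """Rank values from 1 to n in ascending order; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

data = [199, 126, 81, 68, 112, 112]
ranks = average_ranks(data)
print(ranks)  # [6.0, 5.0, 2.0, 1.0, 3.5, 3.5]

n = len(data)
assert sum(ranks) == n * (n + 1) / 2  # check: 21 == 6 * 7 / 2
```

The ranks are returned in the original case order, so cases 5 and 6 both receive 3.5.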

Nonparametric tests for paired data (nonparametric alternatives to the paired t-test)
- Sign test
- Wilcoxon's matched pairs test
Null hypothesis: the paired samples are drawn from the same population.

The sign test
Example: 13 students were measured in reading speed and comprehension at the end of a course and 1 month later. Suppose we have reason to believe that the two distributions of reading scores are not normal.
Data table columns: student, score at course ending, score after 1 month, difference, sign.
Cases with no change are omitted, leaving n = 11 students.
Number of positive signs: 6
Number of negative signs: 5

Table of the sign test
The table contains the acceptance region for a given sample size and significance level $\alpha$.

Decision based on the table
If the distributions of the two variables are the same (if the null hypothesis is true), the numbers of positive and negative differences should be similar. The null hypothesis is accepted if both numbers lie in the interval given in the table for the sign test.
Number of positive signs: 6
Number of negative signs: 5
For n = 11 and $\alpha$ = 0.05, this interval is 1-10. As both 5 and 6 lie in this interval, we accept the null hypothesis at the 5% level.
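Instead of a printed table, the sign test can be carried out exactly: under the null hypothesis the number of positive signs among n nonzero differences is Binomial(n, 1/2). The sketch below (hypothetical helper names, standard library only) computes the two-sided p-value and the largest count c for which 2·P(X ≤ c) ≤ α. For n = 11 it gives c = 1, i.e. counts of 1 or less (or, by symmetry, 10 or more) lead to rejection; how the limits 1 and 10 are printed depends on the table's convention.

```python
from math import comb

def sign_test_p_value(n_pos: int, n_neg: int) -> float:
    """Exact two-sided sign test; zero differences must already be omitted."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n  # P(X <= k), X ~ Bin(n, 1/2)
    return min(1.0, 2 * tail)

def sign_test_rejection_limit(n: int, alpha: float = 0.05) -> int:
    """Largest c with 2 * P(X <= c) <= alpha (reject if min count <= c); -1 if none."""
    c, cum = -1, 0.0
    for i in range(n + 1):
        cum += comb(n, i) / 2**n
        if 2 * cum > alpha:
            break
        c = i
    return c

n_pos, n_neg = 6, 5  # reading-score example
print(sign_test_p_value(n_pos, n_neg))  # 1.0
c = sign_test_rejection_limit(n_pos + n_neg)
print(c, n_pos + n_neg - c)  # 1 10
```

With 6 and 5 signs the split is as balanced as it can be for n = 11, so the exact p-value is 1.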

The Wilcoxon signed rank test
Example: 13 students were measured in reading speed and comprehension at the end of a course and 1 month later. Suppose we have reason to believe that the two distributions of reading scores are not normal. Cases with no change are omitted, and the remaining differences are ranked ignoring their signs.
Data table columns: student, score at course ending, score after 1 month, difference, rank ignoring signs.
Sum of ranks belonging to positive signs: R+ = 40.5
Sum of ranks belonging to negative signs: R- = 25.5
Check: R+ + R- = 66 = 11(12)/2.
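The individual student scores are not available here, so the sketch below uses a small set of hypothetical differences to show how R+ and R- are obtained: drop zero differences, rank the absolute values (averaging ties), then add up the ranks separately for positive and negative differences. The function `signed_rank_sums` is an illustration, written for clarity rather than speed.

```python
def signed_rank_sums(differences):
    """Return (R_plus, R_minus); zero differences are dropped, ties get average ranks."""
    d = [x for x in differences if x != 0]
    abs_sorted = sorted(abs(x) for x in d)

    def avg_rank(v):
        first = abs_sorted.index(v) + 1         # first 1-based position of |d| = v
        last = first + abs_sorted.count(v) - 1  # last 1-based position of |d| = v
        return (first + last) / 2

    r_plus = sum(avg_rank(abs(x)) for x in d if x > 0)
    r_minus = sum(avg_rank(abs(x)) for x in d if x < 0)
    return r_plus, r_minus

# Hypothetical differences (after minus before), not the students' data
diffs = [3, -1, 2, -2, 4, 0]
r_plus, r_minus = signed_rank_sums(diffs)
print(r_plus, r_minus)  # 11.5 3.5

n = sum(1 for x in diffs if x != 0)
assert r_plus + r_minus == n * (n + 1) / 2  # same check as R+ + R- = 66 in the example
```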

Table of the Wilcoxon signed rank test
The table contains the acceptance region for a given sample size and significance level $\alpha$.

Decision based on the table
If the distributions of the two variables are the same (if the null hypothesis is true), the sums of positive and negative ranks should be similar. The null hypothesis is accepted if both rank sums lie in the interval given in the table for the test.
Sum of ranks belonging to positive signs: R+ = 40.5
Sum of ranks belonging to negative signs: R- = 25.5
For n = 11 and $\alpha$ = 0.05, both rank sums lie within the interval given in the table, so we do not reject the null hypothesis and conclude that the difference is not significant at the 5% level.

The case of large samples
When the sample size is large, we can compute the mean and standard deviation of the rank sum under the null hypothesis and use the normal distribution to get the p-value. For the signed rank statistic $R_+$ with n nonzero differences, the mean is $n(n+1)/4$ and the variance is $n(n+1)(2n+1)/24$. Computer packages often use this normal approximation even for small samples.
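Applied to the reading-score example (n = 11, R+ = 40.5), the mean is 33 and the variance 126.5. A minimal sketch of the normal approximation in Python, assuming no tie correction and no continuity correction (packages may apply either, so their p-values can differ slightly):

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def signed_rank_z(r_plus: float, n: int):
    """z and two-sided p for the Wilcoxon signed rank statistic (no tie/continuity correction)."""
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (r_plus - mean) / sd
    p_two_sided = 2 * (1 - normal_cdf(abs(z)))
    return z, p_two_sided

z, p = signed_rank_z(40.5, 11)  # values from the reading-score example
print(f"z = {z:.3f}, p = {p:.3f}")
```

The resulting z is about 0.67, far from the usual 1.96 cut-off, which agrees with the table-based decision.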

Nonparametric test for data in independent groups (nonparametric alternative to the two-sample t-test)
- Mann-Whitney U test
Null hypothesis: the samples are drawn from the same population.

Hypothetical example
Changes in body weight are compared in two groups: patients on a special diet and control patients. Null hypothesis: the diet is not effective, i.e. the data are drawn from the same population. The original data of both groups are ranked together, and the sum of ranks in each group is computed. If the null hypothesis is true, the rank sums in the two groups should be similar.


Table of the Mann-Whitney U test

Decision based on the table
If the distributions of the two variables are the same (if the null hypothesis is true), the sums of ranks in the two groups should be similar. The test statistic T is the sum of the ranks in the smaller group. The null hypothesis is accepted if T lies in the interval given in the table for the test.
Sum of ranks in the first group (n = 10): R1 = 140
Sum of ranks in the second group (n = 11): R2 = 91
The first group is the smaller one, so T = 140 (check: 140 + 91 = 231 = 21(22)/2). For n1 = 10, n2 = 11 and $\alpha$ = 0.05, T lies outside the interval given in the table, so we reject the null hypothesis and conclude that the difference is significant at the 5% level.

An alternative test statistic
The statistic U (due to Mann and Whitney) is the number of all possible pairs of observations comprising one from each sample, say $x_i$ and $y_j$, for which $x_i < y_j$. Thus, if the sample sizes are $n_1$ and $n_2$, then $U/(n_1 n_2)$ is the proportion of all such pairs, and so is also the estimated probability that a new observation from the first population will be less than a new observation sampled from the second population.
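A minimal Python sketch of this definition, counting pairs directly on hypothetical data (tied pairs counted as 1/2, a common convention we assume here), together with the equivalent route through the pooled rank sum $R_1$ of the x sample: $U = n_1 n_2 + n_1(n_1+1)/2 - R_1$. Both function names are illustrative.

```python
def mann_whitney_u(x, y):
    """Number of pairs (x_i, y_j) with x_i < y_j; tied pairs count 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

def u_from_rank_sum(r1, n1, n2):
    """The same U computed from the pooled rank sum r1 of the x sample."""
    return n1 * n2 + n1 * (n1 + 1) / 2 - r1

x = [1, 3, 5]     # hypothetical sample 1
y = [2, 4, 6, 7]  # hypothetical sample 2
u = mann_whitney_u(x, y)
print(u, u / (len(x) * len(y)))  # 9.0 0.75
print(u_from_rank_sum(1 + 3 + 5, len(x), len(y)))  # 9.0 (x has pooled ranks 1, 3, 5)

# Diet example, treating the n = 10 group as x: U = 110 + 55 - 140
print(u_from_rank_sum(140, 10, 11))  # 25.0
```

For the diet example this gives U/(n1 n2) = 25/110, about 0.23, as the estimated probability that an observation from the n = 10 group is smaller than one from the n = 11 group.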

The case of large samples
When the sample size is large, the test statistic T has an approximately normal distribution, and we can calculate the test statistic z according to the following formula:
$$z = \frac{T - n_S(n_S + n_L + 1)/2}{\sqrt{n_S n_L (n_S + n_L + 1)/12}}$$
where $n_S$ and $n_L$ are the sample sizes in the smaller and larger group, respectively. Computer packages often use this normal approximation even for small samples.
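For the diet example (T = 140, $n_S$ = 10, $n_L$ = 11) the mean of T under the null hypothesis is 110 and its variance 201.67. A short sketch of the normal approximation, assuming no tie or continuity correction:

```python
from math import erf, sqrt

def rank_sum_z(t: float, n_s: int, n_l: int):
    """z and two-sided p for T, the rank sum of the smaller group (no tie/continuity correction)."""
    mean = n_s * (n_s + n_l + 1) / 2
    sd = sqrt(n_s * n_l * (n_s + n_l + 1) / 12)
    z = (t - mean) / sd
    phi = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))  # standard normal CDF at |z|
    return z, 2 * (1 - phi)

z, p = rank_sum_z(140, 10, 11)  # diet example
print(f"z = {z:.2f}, p = {p:.3f}")
```

This gives z of about 2.11 and p of about 0.035, consistent with rejecting the null hypothesis at the 5% level.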

Comparing several independent groups: the Kruskal-Wallis test
It is also called nonparametric one-way ANOVA. It tests whether k independent samples defined by a grouping variable come from the same population. The test assumes that there is no a priori ordering of the k populations from which the samples are drawn. It gives a single p-value. If the null hypothesis is rejected, further tests are required to make pairwise comparisons. These pairwise comparisons are generally not available in standard statistical packages; they can be performed with Mann-Whitney U tests, with the p-values corrected by the Bonferroni correction.
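The slides do not show the test statistic. A commonly used form is $H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)$, where $R_i$ is the rank sum of group i in the pooled ranking. For larger samples H is compared with a chi-square distribution with k - 1 degrees of freedom. The sketch below computes H without the tie correction that packages usually apply, plus the Bonferroni adjustment for follow-up pairwise tests. The data are hypothetical, and the helper names are ours.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for a list of samples; ties get average ranks, no tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    def avg_rank(v):
        first = pooled.index(v) + 1               # first 1-based position of v
        return first + (pooled.count(v) - 1) / 2  # mean of the tied positions

    # sum over groups of (rank sum)^2 / group size
    s = sum(sum(avg_rank(v) for v in g) ** 2 / len(g) for g in groups)
    return 12 * s / (n_total * (n_total + 1)) - 3 * (n_total + 1)

def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of comparisons, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

h = kruskal_wallis_h([[1, 2], [3, 4], [5, 6]])  # hypothetical data, k = 3 groups
print(round(h, 4))  # 4.5714
print(bonferroni([0.01, 0.02, 0.30]))  # p-values of 3 hypothetical pairwise tests
```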

Comparison of several related samples: the Friedman test
The Friedman test is the nonparametric equivalent of a one-sample repeated measures design or a two-way analysis of variance with one observation per cell. It tests the null hypothesis that k related variables come from the same population. For each case, the k variables are ranked from 1 to k, and the test statistic is based on these ranks. It gives a single p-value. If the null hypothesis is rejected, further tests are required to make pairwise comparisons. These pairwise comparisons are generally not available in standard statistical packages; they can be performed with Wilcoxon signed rank tests, with the p-values corrected by the Bonferroni correction.
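The within-case ranking can be sketched as follows, using the common form of the statistic $\chi^2_F = \frac{12}{n k (k+1)} \sum_j R_j^2 - 3n(k+1)$, where n is the number of cases and $R_j$ is the rank sum of variable j. For larger samples it is compared with chi-square on k - 1 degrees of freedom. This is an illustrative helper on hypothetical data, without the tie correction that statistical packages may add.

```python
def friedman_statistic(rows):
    """Friedman chi-square; rows = cases, columns = the k related variables.
    Each row is ranked 1..k (average ranks for ties); no tie correction."""
    n, k = len(rows), len(rows[0])
    col_rank_sums = [0.0] * k
    for row in rows:
        s = sorted(row)
        for j, v in enumerate(row):
            first = s.index(v) + 1
            col_rank_sums[j] += first + (s.count(v) - 1) / 2  # average rank within the row
    return 12 * sum(r * r for r in col_rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)

# Hypothetical data: 3 cases, each measured on k = 3 related variables
rows = [[10, 20, 30], [5, 6, 7], [1, 8, 9]]
print(friedman_statistic(rows))  # 6.0
```

Here every case orders the three variables the same way, giving rank sums 3, 6 and 9, the largest value the statistic can take for n = 3 and k = 3.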

Review questions and exercises
Problems to be solved by hand calculations: ..\Handouts\Problems hand VII.doc
Solutions: ..\Handouts\Problems hand VII solutions.doc
Problems to be solved using a computer: none

Useful WEB pages: 2/index.html