Nonparametric Statistical Methods

Definition: When the data are generated from a process (model) that is known except for a finite number of unknown parameters, the model is called a parametric model. Otherwise, the model is called a non-parametric model. Statistical techniques that assume a non-parametric model are called non-parametric.

For example: If you assume that your data come from a normal distribution with mean μ and standard deviation σ (both unknown), then the data are generated from a process (model) that is known except for two parameters (μ and σ). The model is called a parametric model. Models that do not assume normality (or some other distribution with a finite number of parameters) are non-parametric.

We will consider two nonparametric tests:
1. The sign test
2. Wilcoxon's signed rank test
These are tests for the central location of a population. They are alternatives to the z-test and the t-test for the mean of a normal population.

Both the z-test and the t-test assume that the data come from a normal population. If the data do not come from a normal population, properties of the z-test and the t-test that require this assumption will no longer hold. In particular, the probability of a type I error may differ from the desired value (0.05 or 0.01).

Single-sample nonparametric tests: If the data do not come from a normal population, we should use one of the two nonparametric tests:
1. The sign test
2. Wilcoxon's signed rank test
These tests do not assume that the data come from a normal population.

The sign test A nonparametric test for the central location of a distribution

We want to test H0: median = μ0 against HA: median ≠ μ0 (or against a one-sided alternative).

The sign test. The test statistic: S = the number of observations that exceed μ0. Comment: If H0: median = μ0 is true, we would expect 50% of the observations to be above μ0 and 50% of the observations to be below μ0.

If H0 is true then S will have a binomial distribution with p = 0.50 and n = sample size, where p = the probability that an observation is greater than μ0.

If H0 is not true then S will still have a binomial distribution. However, p will not be equal to 0.50:
if μ0 > median then p < 0.50;
if μ0 < median then p > 0.50.

Summarizing: If H0 is true then S will have a binomial distribution with p = 0.50 and n = sample size.

The critical and acceptance regions (n = 10): Choose the critical region so that α is close to 0.05 or 0.01.
e.g. If the critical region is {0, 1, 9, 10} then α = p(0) + p(1) + p(9) + p(10) = 0.0216 (using binomial probabilities rounded to four decimals).
e.g. If the critical region is {0, 1, 2, 8, 9, 10} then α = p(0) + p(1) + p(2) + p(8) + p(9) + p(10) = 0.1094.
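The size of a candidate critical region can be checked directly from the binomial probabilities. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of the original slides. The exact value for {0, 1, 9, 10} is 22/1024 ≈ 0.0215, which differs slightly from the figure obtained by summing table values rounded to four decimals.

```python
from math import comb

def binom_pmf(i: int, n: int, p: float = 0.5) -> float:
    """Binomial probability P[S = i] for n trials with success probability p."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def critical_region_size(region, n: int) -> float:
    """alpha = P[S falls in the critical region] when H0 is true (p = 0.5)."""
    return sum(binom_pmf(i, n) for i in region)

alpha_small = critical_region_size({0, 1, 9, 10}, n=10)
alpha_large = critical_region_size({0, 1, 2, 8, 9, 10}, n=10)
print(f"alpha for {{0,1,9,10}}:     {alpha_small:.4f}")
print(f"alpha for {{0,1,2,8,9,10}}: {alpha_large:.4f}")
```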

Example Suppose that we are interested in determining if a new drug is effective in reducing cholesterol. Hence we administer the drug to n = 10 patients with high cholesterol and measure the reduction.

The data

Suppose we want to test H0: the drug is not effective (median reduction ≤ 0) against HA: the drug is effective (median reduction > 0). The sign test statistic is S = the number of positive observations.

The sign test: The test statistic is S = the number of positive observations = 8. We will use the p-value approach: p-value = P[S ≥ 8] = p(8) + p(9) + p(10) = 0.0547. Since the p-value > 0.05, we cannot reject H0.

Summarizing: To carry out the sign test we
1. Compute S = the number of observations greater than μ0.
2. Let s_observed = the observed value of S.
3. Compute the p-value as the binomial tail probability in the direction of the alternative: P[S ≥ s_observed] if HA: median > μ0, or P[S ≤ s_observed] if HA: median < μ0 (twice the smaller of the two tails for a two-tailed test). Use the table for the binomial distribution (p = ½, n = sample size).
4. Conclude HA (reject H0) if the p-value is less than 0.05 (or 0.01).
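The steps of the sign test can be sketched in a few lines of Python. This is an illustrative sketch, not the presenter's code: the function name, the `alternative` argument, and the choice to drop observations exactly equal to μ0 are assumptions, and the data are made up (the original data table did not survive in the transcript). The made-up sample has 8 positive values out of 10, so it reproduces the p-value P[S ≥ 8] for n = 10.

```python
from math import comb

def sign_test(data, mu0, alternative="greater"):
    """Sign test of H0: median = mu0. Returns (S, n, p_value)."""
    # Assumption: observations equal to mu0 carry no sign and are dropped.
    diffs = [x - mu0 for x in data if x != mu0]
    n = len(diffs)
    s = sum(d > 0 for d in diffs)                 # S = number of observations above mu0
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]
    upper = sum(pmf[s:])                          # P[S >= s_observed]
    lower = sum(pmf[: s + 1])                     # P[S <= s_observed]
    if alternative == "greater":
        p_value = upper
    elif alternative == "less":
        p_value = lower
    else:                                         # two-tailed
        p_value = min(1.0, 2 * min(upper, lower))
    return s, n, p_value

# Hypothetical cholesterol reductions (NOT the data from the slides).
reductions = [12, -3, 5, 8, -1, 15, 7, 4, 9, 2]
s, n, p_value = sign_test(reductions, mu0=0, alternative="greater")
print(s, n, round(p_value, 4))
```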

Sign Test for Large Samples

If n is large we can use the Normal approximation to the Binomial. Namely, S has a Binomial distribution with p = ½ and n = sample size. Hence for large n, S has approximately a Normal distribution with mean $\mu_S = n/2$ and standard deviation $\sigma_S = \sqrt{n}/2$.

Hence for large n, use $z = \dfrac{S - n/2}{\sqrt{n}/2}$ as the test statistic (in place of S). Choose the critical region for z from the standard Normal distribution, i.e. reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ for a two-tailed test (a one-tailed test can also be set up).
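A small sketch of the large-sample version, assuming the usual binomial mean n/2 and standard deviation √n/2 under H0 and no continuity correction. The function name and the numbers in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sign_test_z(s: int, n: int):
    """Large-sample sign test: returns (z, two-tailed p-value)."""
    z = (s - n / 2) / (sqrt(n) / 2)               # standardize S under H0
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical: 60 of 100 observations exceed mu0.
z, p_two_tailed = sign_test_z(s=60, n=100)
print(round(z, 3), round(p_two_tailed, 4))
```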

Nonparametric Confidence Intervals

Assume that the data x1, x2, x3, …, xn are a sample from an unknown distribution. Now arrange the data in increasing order: x(1) < x(2) < x(3) < … < x(n), where x(1) = the smallest observation, x(2) = the 2nd smallest observation, and x(n) = the largest observation.

Consider the k-th smallest observation and the k-th largest observation in the data x1, x2, x3, …, xn, namely x(k) and x(n - k + 1). Then P[x(k) < median < x(n - k + 1)] = P[at least k observations lie below the median and at least k observations lie above the median], because: if at least k observations lie below the median then x(k) < median, and if at least k observations lie above the median then median < x(n - k + 1).

Thus P[x(k) < median < x(n - k + 1)] = P[at least k observations lie below the median and at least k observations lie above the median] = P[the number of observations below the median is at least k and at most n - k] = P[k ≤ S ≤ n - k], where S = the number of observations below the median. S has a binomial distribution with n = the sample size and p = 1/2.

Hence P[x(k) < median < x(n - k + 1)] = P[k ≤ S ≤ n - k] = p(k) + p(k + 1) + … + p(n - k) = P, where the p(i)'s are binomial probabilities with n = the sample size and p = 1/2. This means that x(k) to x(n - k + 1) is a 100P% confidence interval for the median.

Summarizing: x(k) to x(n - k + 1) is a 100P% confidence interval for the median, where P = p(k) + p(k + 1) + … + p(n - k) and the p(i)'s are binomial probabilities with n = the sample size and p = 1/2.

Example: n = 10 and k = 2. From the binomial probabilities, P = p(2) + p(3) + p(4) + p(5) + p(6) + p(7) + p(8) = 0.9784. Hence x(2) to x(9) is a 97.84% confidence interval for the median.
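The interval and its confidence level can be computed together. The sketch below is illustrative: the function name is an assumption and the sample is made up. For n = 10 and k = 2 the exact level is 1002/1024 ≈ 0.9785, which matches the slide's 0.9784 up to table rounding.

```python
from math import comb

def median_ci(data, k: int):
    """Return (x_(k), x_(n-k+1), confidence level P) for the population median."""
    n = len(data)
    if not 1 <= k <= n // 2:
        raise ValueError("k must satisfy 1 <= k <= n/2")
    ordered = sorted(data)
    # P = p(k) + p(k+1) + ... + p(n-k) with binomial(n, 1/2) probabilities.
    level = sum(comb(n, i) for i in range(k, n - k + 1)) / 2**n
    # ordered[k-1] is the k-th smallest, ordered[n-k] is the k-th largest.
    return ordered[k - 1], ordered[n - k], level

# Hypothetical sample of size 10 (NOT the data from the slides).
sample = [3, -2, 7, 10, 1, 5, -4, 12, 6, 8]
lower, upper, level = median_ci(sample, k=2)
print(lower, upper, round(level, 4))
```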

Example Suppose that we are interested in determining if a new drug is effective in reducing cholesterol. Hence we administer the drug to n = 10 patients with high cholesterol and measure the reduction.

The data

The data arranged in order: x(2) = -3 to x(9) = 15 is a 97.84% confidence interval for the median.

Example: Suppose the study in the previous example is repeated with n = 20 patients with high cholesterol.

The data

The binomial distribution with n = 20, p = 0.5. Note: p(6) + p(7) + p(8) + p(9) + p(10) + p(11) + p(12) + p(13) + p(14) = 0.9586. Hence x(6) to x(15) is a 95.86% confidence interval for the median reduction in cholesterol.

The data arranged in order x (6) = -1 to x (15) = 9 is a 95.86% confidence interval for the median

For large values of n one can use the normal approximation to the Binomial to find the value of k so that x(k) to x(n - k + 1) is a 95% confidence interval for the median, i.e. we want to find k so that P[k ≤ S ≤ n - k] ≈ 0.95.
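The slide's own formula for k is not preserved in the transcript. One common way to carry out this step, shown here as a sketch under stated assumptions, is to approximate S by a Normal distribution with mean n/2 and standard deviation √n/2, apply a continuity correction, and round down so that the coverage is not overstated. The function name and the continuity correction are assumptions.

```python
from math import floor, sqrt
from statistics import NormalDist

def approx_k(n: int, level: float = 0.95) -> int:
    """Approximate k such that x_(k) to x_(n-k+1) covers the median with roughly `level` confidence."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # e.g. about 1.96 for 95%
    # Assumption: continuity-corrected bound, P[S <= k-1] ~ Phi((k - 0.5 - n/2) / (sqrt(n)/2)).
    k = floor(n / 2 + 0.5 - z * sqrt(n) / 2)
    if k < 1:
        raise ValueError("sample too small for an interval at this level")
    return k

print(approx_k(20), approx_k(100))
```

For n = 20 this gives k = 6, the same choice as in the exact binomial calculation for the cholesterol example.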

Next we will consider the Wilcoxon signed rank test. The Wilcoxon signed rank test is an alternative to the sign test, a test for the central location of a single population.

The sign test A nonparametric test for the central location of a distribution

We want to test H0: median = μ0 against HA: median ≠ μ0 (or against a one-sided alternative).

The test statistic: S = the number of observations that exceed μ0. Comment: If H0: median = μ0 is true then the distribution of S is binomial with n = sample size and p = 0.50.

To carry out the sign test:
1. Compute the test statistic: S = the number of observations that exceed μ0 = s_observed.
2. Compute the p-value of the test statistic s_observed: p-value = P[S ≥ s_observed] when large values of S support HA (use the lower tail P[S ≤ s_observed] when small values do, and twice the smaller tail for a 2-tailed test), where S is binomial with n = sample size and p = 0.50.
3. Reject H0 if the p-value is low (< 0.05).

Non-parametric confidence intervals for the median of a population: x(k) to x(n - k + 1) is a (1 - α)100% = 100P% confidence interval for the median, where P = p(k) + p(k + 1) + … + p(n - k), the p(i)'s are binomial probabilities with n = the sample size and p = 1/2, x(k) = the k-th smallest x_i, and x(n - k + 1) = the k-th largest x_i.

The Wilcoxon Signed Rank Test An Alternative to the sign test

Situation: A sample of size n, (x1, x2, …, xn), is drawn from an unknown distribution and we want to test H0: the centre of the distribution, μ = μ0, against HA: μ ≠ μ0.

For the sign test we would count S, the number of positive values of (x1 - μ0, x2 - μ0, …, xn - μ0). We would reject H0 if S was not close to n/2.

For Wilcoxon's signed-rank test we would assign ranks to the absolute values of (x1 - μ0, x2 - μ0, …, xn - μ0): a rank of 1 to the value of x_i - μ0 which is smallest in absolute value, up to a rank of n to the value of x_i - μ0 which is largest in absolute value.
W+ = the sum of the ranks associated with positive values of x_i - μ0.
W- = the sum of the ranks associated with negative values of x_i - μ0.

Note: W+ + W- = 1 + 2 + 3 + … + n = n(n + 1)/2. If H0 is true then W+ ≈ W- ≈ n(n + 1)/4. If H0 is not true then either
1. W+ will be small (W- large), or
2. W+ will be large (W- small).
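The rank sums W+ and W- can be computed as sketched below. This is an illustrative sketch: the function names, the averaging of tied ranks, and the dropping of zero differences are assumptions, and the data are made up.

```python
def average_ranks(values):
    """Ranks 1..n of the values; tied values share the average of their ranks."""
    ordered = sorted(values)
    # First 0-based position f of v, with c copies: ranks f+1..f+c, average f + (c+1)/2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def signed_rank_sums(data, mu0):
    """Return (W_plus, W_minus) for Wilcoxon's signed-rank test of centre mu0."""
    # Assumption: differences equal to zero are dropped before ranking.
    diffs = [x - mu0 for x in data if x != mu0]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical sample, testing mu0 = 0.
w_plus, w_minus = signed_rank_sums([7, -2, 5, -1, 4], mu0=0)
print(w_plus, w_minus, w_plus + w_minus)   # the sum should equal n(n + 1)/2
```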

[Figure: when μ0 > true median, W+ is small and W- is large.]

[Figure: when μ0 < true median, W+ is large and W- is small.]

Note: It is possible to work out the sampling distribution of W + ( and W - ) when H 0 is true. Note: We use the fact that if H 0 is true that there is an equal probability (1/2) that the sign attached to any rank is plus (+) or minus (-).

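Example: n = 4. [Table: the 16 equally likely assignments of signs to the ranks 1, 2, 3, 4, each with probability 1/16, and the resulting values of W+ and W-.]

The distribution of W+ and W- for n = 4: [Tables: the possible values of W+ (and of W-) with their probabilities, in multiples of 1/16.]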

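If T = W+ or W- (n = 4):

t    P[T = t]    P[T ≤ t]
0    1/16    1/16
1    1/16    2/16
2    1/16    3/16
3    2/16    5/16
4    2/16    7/16
5    2/16    9/16
6    2/16    11/16
7    2/16    13/16
8    1/16    14/16
9    1/16    15/16
10    1/16    16/16

These are the values found in table A.6 in the textbook.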
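The null distribution for small n can be obtained by brute force, using the fact that under H0 each rank receives a plus or minus sign with probability 1/2. The sketch below enumerates all 2^n sign assignments; the function name is illustrative. Note that `Fraction` prints reduced fractions, so 2/16 appears as 1/8.

```python
from fractions import Fraction
from itertools import product

def signed_rank_null_distribution(n: int):
    """Exact distribution of W+ under H0, as {value: probability}."""
    counts = {}
    for signs in product((0, 1), repeat=n):           # 1 = plus sign on that rank
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        counts[w_plus] = counts.get(w_plus, 0) + 1
    total = 2**n                                      # all sign assignments equally likely
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}

dist = signed_rank_null_distribution(4)
cumulative = Fraction(0)
for t, prob in dist.items():
    cumulative += prob
    print(t, prob, cumulative)                        # t, P[T = t], P[T <= t]
```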

Table A.6 (page A-15): distribution of the test statistic T for the Wilcoxon signed-rank test, by sample size.

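Table A.6 (page A-15) only goes up to n = sample size = 12. For sample sizes n > 12 we can use the fact that T (W+ or W-) has approximately a normal distribution with mean $\mu_T = \dfrac{n(n+1)}{4}$ and standard deviation $\sigma_T = \sqrt{\dfrac{n(n+1)(2n+1)}{24}}$.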

[Figure: exact values of the distribution of T compared with the Normal approximation.]

Example: In this example we are interested in the quantity FVC (Forced Vital Capacity) in patients with cystic fibrosis. FVC = the volume of air that a person can expel from the lungs in a 6 sec period. This will be reduced with time for cystic fibrosis patients. The research question: Will this reduction be less when a new experimental drug is administered?

The Experimental Design: The design will be a matched pair design.
- Pairs of patients are matched (using initial FVC readings).
- One member of the pair is given the new drug; the other member is given a placebo.
- We measure the reduction in FVC for each member and compute the difference x_i = Reduction in FVC (placebo) - Reduction in FVC (drug).
These values will be generally positive if the drug is effective in minimizing the deterioration in FVC. In that case W+ will be large (W- will be small).

Table: Reduction in forced vital capacity (FVC) for a matched pair sample of patients with cystic fibrosis. [Columns: Subject; Reduction in FVC (Placebo, Drug); Difference x_i; Rank; Signed Rank. The table values are not preserved in this transcript.] W+ = 86, W- = 19.

We have to judge if W+ = 86 is large (or W- = 19 is small). Since the p-value is small (< 0.05), we conclude that the drug is effective in reducing the deterioration of FVC.
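The p-value itself is not shown in the transcript. As a rough check, the Normal approximation can be applied to W- = 19, assuming n = 14 pairs (inferred from W+ + W- = 105 = 14 · 15/2) and no tied or zero differences. The function name is illustrative and no continuity correction is used.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_normal_p(t: float, n: int):
    """Normal approximation for the signed-rank statistic T: returns (z, P[T <= t])."""
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    return z, NormalDist().cdf(z)

# Assumption: n = 14, inferred from W+ + W- = 86 + 19 = 105.
z, p_one_tailed = signed_rank_normal_p(t=19, n=14)
print(round(z, 3), round(p_one_tailed, 4))
```

Under these assumptions the one-tailed p-value is below 0.05, in line with the conclusion stated on the slide.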

Summarizing: To carry out Wilcoxon's signed rank test we
1. Compute T = W+ or W- (usually it would be the smaller of the two).
2. Let t_observed = the observed value of T.
3. Compute the p-value = P[T ≤ t_observed] (2 P[T ≤ t_observed] for a two-tailed test).
   i. For n ≤ 12 use the table.
   ii. For n > 12 use the Normal approximation.
4. Conclude HA (reject H0) if the p-value is less than 0.05 (or 0.01).

Alternative tests for this example 1.The t – test 2.The sign test

Comments
1. The t-test
   i. This test requires the assumption of normality.
   ii. If the data are not normally distributed the test is invalid: the probability of a type I error may not be equal to its desired value (0.05 or 0.01).
   iii. If the data are normally distributed, the t-test commits type II errors with a smaller probability than any other test (in particular Wilcoxon's signed rank test or the sign test).
2. The sign test
   i. This test does not require the assumption of normality (true also for Wilcoxon's signed rank test).
   ii. This test ignores the magnitude of the observations completely. Wilcoxon's test takes the magnitude into account by ranking them.

The Sign test

[Figures: behaviour of S. When μ0 = true median, S ≈ n/2. When μ0 > true median, S is small (close to 0). When μ0 < true median, S is large (close to n).]

Wilcoxon’s signed-Rank test

[Figures: behaviour of W+ and W-. When μ0 = true median, W+ ≈ W- ≈ n(n + 1)/4. When μ0 > true median, W+ is small and W- is large. When μ0 < true median, W+ is large and W- is small.]

Two-sample non-parametric tests

Mann-Whitney Test A non-parametric two sample test for comparison of central location

The Mann-Whitney Test: This is a non-parametric alternative to the two-sample t-test (or z-test) for independent samples. These tests (t and z) assume the data are normal. The Mann-Whitney test does not make this assumption.
Sample of n from population 1: x1, x2, x3, …, xn
Sample of m from population 2: y1, y2, y3, …, ym

The Mann-Whitney test statistic U 1 counts the number of times an observation in sample 1 precedes an observation in sample 2. An Equivalent statistic U 2 that counts the number of times an observation in sample 2 precedes an observation in sample 1 can also be computed

Example: n = m = 4 measurements of bacteria counts per unit volume were made for two types of cultures. The n = 4 measurements for culture 1 were 27, 31, 26, 25. The m = 4 measurements for culture 2 were 32, 29, 35, 28.

To compute the Mann-Whitney test statistics U1 and U2, arrange the observations from the two samples combined in increasing order (retaining sample membership):
25 (1), 26 (1), 27 (1), 28 (2), 29 (2), 31 (1), 32 (2), 35 (2)
For each observation in sample 2 let u_i denote the number of observations in sample 1 that precede that value: u1 = 3, u2 = 3, u3 = 4, u4 = 4. Then U1 = u1 + u2 + u3 + u4 = 3 + 3 + 4 + 4 = 14.

To compute U2, repeat the process for the second sample:
25 (1), 26 (1), 27 (1), 28 (2), 29 (2), 31 (1), 32 (2), 35 (2)
For each observation in sample 1 let v_i denote the number of observations in sample 2 that precede that value: v1 = 0, v2 = 0, v3 = 0, v4 = 2. Then U2 = v1 + v2 + v3 + v4 = 0 + 0 + 0 + 2 = 2.

Note: U1 + U2 = mn = 16. This is true in general: for each pair (x_i, y_j) either x_i < y_j or x_i > y_j (assume no ties). In the first case U1 is increased by 1, while in the other case U2 is increased by 1. There are mn such pairs.
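The counting definition translates directly into code. The sketch below uses the bacteria-count data from the example and assumes no ties; the function name is illustrative.

```python
def mann_whitney_counts(sample1, sample2):
    """U1 = number of (x, y) pairs with x before y; U2 = number of pairs with y before x."""
    u1 = sum(x < y for x in sample1 for y in sample2)
    u2 = sum(y < x for x in sample1 for y in sample2)
    return u1, u2

culture1 = [27, 31, 26, 25]
culture2 = [32, 29, 35, 28]
u1, u2 = mann_whitney_counts(culture1, culture2)
print(u1, u2, u1 + u2)   # with no ties, u1 + u2 equals n * m
```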

An alternative way of computing the Mann-Whitney test statistic U: Arrange the observations from the two samples combined in increasing order (retaining sample membership) and assign ranks to the observations.
Observation (sample): 25 (1) 26 (1) 27 (1) 28 (2) 29 (2) 31 (1) 32 (2) 35 (2)
Rank: 1 2 3 4 5 6 7 8
Let W1 = the sum of the ranks for sample 1 = 1 + 2 + 3 + 6 = 12.
Let W2 = the sum of the ranks for sample 2 = 4 + 5 + 7 + 8 = 24.

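It can be shown that $U_1 = nm + \dfrac{n(n+1)}{2} - W_1$ and $U_2 = nm + \dfrac{m(m+1)}{2} - W_2$. Note: $U_1 + U_2 = nm$, since $W_1 + W_2 = \dfrac{(n+m)(n+m+1)}{2}$.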
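The rank-sum route can be checked against the worked example. The formulas on the original slide are not preserved, so the sketch below uses the relationship U1 = nm + n(n + 1)/2 - W1 (and similarly for U2), which is consistent with the example's values U1 = 14, W1 = 12 and U2 = 2, W2 = 24. The function name is illustrative and ties are assumed absent.

```python
def rank_sums(sample1, sample2):
    """Rank the combined sample from 1 to n + m and return (W1, W2). Assumes no ties."""
    combined = sorted(sample1 + sample2)
    rank = {value: i + 1 for i, value in enumerate(combined)}
    w1 = sum(rank[x] for x in sample1)
    w2 = sum(rank[y] for y in sample2)
    return w1, w2

culture1 = [27, 31, 26, 25]
culture2 = [32, 29, 35, 28]
n, m = len(culture1), len(culture2)
w1, w2 = rank_sums(culture1, culture2)
u1_from_ranks = n * m + n * (n + 1) // 2 - w1   # assumed relationship, see lead-in
u2_from_ranks = n * m + m * (m + 1) // 2 - w2
print(w1, w2, u1_from_ranks, u2_from_ranks)
```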

The distribution function of U (U1 or U2) has been tabled for various values of n and m (< n) when the two samples come from the same distribution. These tables can be used to set up critical regions for the Mann-Whitney U test.

Example: A researcher was interested in comparing the "brightness" of paper prepared by two different processes. A measure of brightness in paper was made for n = m = 9 samples drawn randomly from each of the two processes. The data are presented below:

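[Table: brightness measurements for processes A and B with their ranks in the combined sample; ranks are averaged where observations are tied.] W_A = 94, W_B = 77.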

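It can be shown that U_A = nm + n(n + 1)/2 - W_A = 81 + 45 - 94 = 32 and U_B = nm + m(m + 1)/2 - W_B = 81 + 45 - 77 = 49. From the table for n = m = 9, we will reject H0 if either U_A or U_B ≤ 18. Neither value is that small; thus H0 is accepted.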

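The Mann-Whitney test for large samples: For large samples (n > 10 and m > 10) the statistics U1 and U2 have approximately a Normal distribution with mean $\mu_U = \dfrac{nm}{2}$ and standard deviation $\sigma_U = \sqrt{\dfrac{nm(n+m+1)}{12}}$.

Thus we can convert U_i to a standard normal statistic $z = \dfrac{U_i - nm/2}{\sqrt{nm(n+m+1)/12}}$ and reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ (for a two-tailed test).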
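A sketch of the large-sample calculation, assuming the standard null mean nm/2 and standard deviation √(nm(n + m + 1)/12) with no correction for ties or continuity. The function name and the numbers in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_z(u: float, n: int, m: int):
    """Large-sample Mann-Whitney test: returns (z, two-tailed p-value)."""
    mean = n * m / 2
    sd = sqrt(n * m * (n + m + 1) / 12)
    z = (u - mean) / sd
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical values: U = 70 with samples of size n = 12 and m = 15.
z, p_two_tailed = mann_whitney_z(u=70, n=12, m=15)
print(round(z, 4), round(p_two_tailed, 3))
```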

The Kruskal-Wallis Test: Comparing the central location for k populations. A nonparametric alternative to the one-way ANOVA F-test.

Situation: Data are collected from k populations. The sample size from population i is n_i. The data from population i are: $x_{i1}, x_{i2}, \dots, x_{i n_i}$.

The computation of the Kruskal-Wallis statistic: We group the N = n1 + n2 + … + nk observations from the k populations together and rank these observations from 1 to N. Let r_ij be the rank associated with the observation x_ij. Handling of "tied" observations: if a group of observations are equal, the ranks that would have been assigned to those observations are averaged.

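The computation of the Kruskal-Wallis statistic: Let $R_i = \sum_{j=1}^{n_i} r_{ij}$ denote the sum of the ranks for the i-th sample and $\bar{r}_i = R_i / n_i$ the average rank. Note: If the k populations do not differ in central location, then the average ranks $\bar{r}_i$ should all be close to the overall average rank (N + 1)/2.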

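The Kruskal-Wallis statistic: $K = \dfrac{12}{N(N+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(N+1)$, where $R_i = \sum_{j=1}^{n_i} r_{ij}$ = the sum of the ranks for the i-th sample.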

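The Kruskal-Wallis test: Reject H0: the k populations have the same central location, if the Kruskal-Wallis statistic K is at least the upper-α critical value $\chi^2_{\alpha}$ of the chi-square distribution with k - 1 degrees of freedom.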
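The computation can be sketched as follows, using the standard form of the statistic, 12/(N(N + 1)) Σ R_i²/n_i - 3(N + 1), with averaged ranks for ties and no tie correction. The function names and the data are illustrative assumptions; the critical value 5.991 is the upper 5% point of the chi-square distribution with 2 degrees of freedom.

```python
def average_ranks(values):
    """Ranks 1..N of the values; tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def kruskal_wallis(*samples):
    """Kruskal-Wallis statistic for k samples (no tie correction)."""
    pooled = [x for sample in samples for x in sample]
    ranks = average_ranks(pooled)
    big_n = len(pooled)
    total, start = 0.0, 0
    for sample in samples:
        r_i = sum(ranks[start:start + len(sample)])   # rank sum for this sample
        total += r_i**2 / len(sample)
        start += len(sample)
    return 12 / (big_n * (big_n + 1)) * total - 3 * (big_n + 1)

# Hypothetical data for k = 3 groups (NOT the enzyme data from the slides).
k_stat = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
CHI2_CRIT_2DF_05 = 5.991                              # chi-square, 2 d.f., alpha = 0.05
print(round(k_stat, 3), k_stat >= CHI2_CRIT_2DF_05)
```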

Example: In this example we are measuring an enzyme level in three groups of patients who have received "open heart" surgery. The three groups of patients differ in age:
1. Age 30 - 45
2. Age 46 - 60
3. Age 61+

The data

Computation of the Kruskal-Wallis statistic. [Tables: the raw data; the data ranked. The table values are not preserved in this transcript.]

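The Kruskal-Wallis statistic: [the computed value and the critical value are not preserved in this transcript]. Since the statistic exceeds the critical value, H0 is rejected. There are significant differences in the central enzyme levels between the three age groups.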