Non-parametric Tests. With histograms like these, there really isn’t a need to perform the Shapiro-Wilk tests!

Data exploration and statistical analysis
1. Data checking, identifying problems and characteristics
2. Understanding chance and uncertainty
3. How will the data for one attribute behave, in a theoretical framework?
4. The theoretical framework assumes complete information; we need to address uncertainties in real data
5. Testing your beliefs: do the data support what you think is true?
6. What happens when the assumptions of the theoretical framework are not valid?

Data
- Data exploration, categorical / numerical outcomes
- Estimation of parameters, quantifying uncertainty
- Hypothesis testing
  - Parametric tests (t-tests, ANOVA, test of proportions)
  - Non-parametric tests (Wilcoxon, Kruskal-Wallis, rank test)
- Model each outcome with a theoretical distribution

Non-parametric tests
Used when:
- assumptions on the distributions of the data are clearly not valid;
- a small fraction of the data are considered outliers (in either tail) but are not removed because they are experimentally valid (this is related to the first reason above);
- occasionally, when sample sizes are so small that looking at histograms, or even using the Shapiro-Wilk test, becomes meaningless.

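Non-parametric equivalent
For most parametric tests, there are non-parametric equivalents:
- One-sample t-test: one-sample sign test, one-sample Wilcoxon signed rank test
- Two-sample t-test (independent samples): Mann-Whitney U test
- Paired t-test: sign test or Wilcoxon signed rank test on the paired differences
- ANOVA: Kruskal-Wallis test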

Conceptual difference of non-parametric tests
- Parametric tests use the actual observed values of the outcomes in calculating the test statistics.
- Non-parametric tests convert the observed values to ranks, and use these ranks for comparisons (to calculate the test statistics).
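
To make the conversion to ranks concrete, here is a minimal sketch in Python. The function name `average_ranks` and the marks are hypothetical, and giving tied values the average of their ranks is an assumed (though common) convention that the slides do not specify.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Hypothetical marks with one extreme value.
marks = [52, 87, 61, 61, 1000]
print(average_ranks(marks))
```

The extreme value 1000 simply receives the largest rank, which is why rank-based tests are less sensitive to outliers than tests that use the raw values.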

Test for one sample
The one-sample t-test compares the mean of the sample to the hypothesized mean value under the null hypothesis.
The non-parametric counterpart is the one-sample Wilcoxon signed rank test, which compares the median of the sample to the hypothesized median value under the null hypothesis.
Null hypothesis: Median = some value, m
Alternative hypothesis:
- Median ≠ m (two-sided hypothesis)
- Median > m (one-sided hypothesis)
- Median < m (one-sided hypothesis)

Sign test
Simple idea: count the number of observations > m, out of a total of N observations. If the null hypothesis is true then, on average, we expect about N / 2 observations to be greater than m and about N / 2 observations to be less than m.
Suppose x = number of observations > m. We can calculate the Binomial probability of having at least x observations out of N greater than m, when the probability that any one observation is > m is 50%:
- P(observation > m) = 0.5
- Let X ~ Binomial(N, 0.5)
- P(at least x observations out of N are > m) = P(X ≥ x)
However, this ignores the magnitude of the data, that is, the distance of each value from m.
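
The Binomial calculation can be sketched in a few lines of Python. The function name `sign_test_upper_tail` and the data are hypothetical; dropping observations exactly equal to m is an assumed convention that the slides do not discuss.

```python
from math import comb


def sign_test_upper_tail(values, m):
    """Return (x, n, p) where x = number of values above m and
    p = P(X >= x) for X ~ Binomial(n, 0.5)."""
    kept = [v for v in values if v != m]  # assumption: ties with m are dropped
    n = len(kept)
    x = sum(v > m for v in kept)
    # Sum the Binomial(n, 0.5) probabilities for x, x+1, ..., n.
    p = sum(comb(n, k) for k in range(x, n + 1)) / 2 ** n
    return x, n, p


# Hypothetical sample, testing against a hypothesized median of 12.5.
data = [12, 15, 9, 18, 14, 16, 11, 17, 13, 19]
x, n, p = sign_test_upper_tail(data, m=12.5)
print(x, n, p)
```

This gives the one-sided probability for the alternative Median > m; the other alternatives would need the lower tail or both tails.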

Wilcoxon signed rank test
1. Calculate the difference between each value and m.
2. Note the sign of each difference, whether it is positive or negative.
3. Drop the sign, and rank the unsigned differences from smallest (assigned a rank of 1) to largest (assigned a rank of N).
4. Restore the sign to the assigned ranks.
5. Sum up the positive ranks (W+), and also the negative ranks (W-, defined without the negative sign).
6. Conceptually, if the median of the data is genuinely close to m, then W+ will be similar to W-.
7. If W+ >> W-, then there is evidence that the median of the data is > m.
8. If W- >> W+, then there is evidence that the median of the data is < m.
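
The steps above can be sketched as follows. This only computes W+ and W-, not a p-value; the function name `signed_rank_sums` and the data are hypothetical, and dropping zero differences and averaging tied ranks are assumed conventions.

```python
def signed_rank_sums(values, m):
    """Return (W+, W-) for the one-sample Wilcoxon signed rank procedure."""
    # Step 1: differences from m (assumption: zero differences are dropped).
    diffs = [v - m for v in values if v != m]
    # Step 3: rank the unsigned differences, smallest first.
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(a):
        # Average of the 1-based positions that the value a occupies.
        first = abs_sorted.index(a) + 1
        return first + (abs_sorted.count(a) - 1) / 2

    # Steps 4 and 5: restore the signs and sum the ranks on each side.
    w_plus = sum(rank_of(abs(d)) for d in diffs if d > 0)
    w_minus = sum(rank_of(abs(d)) for d in diffs if d < 0)
    return w_plus, w_minus


# Hypothetical sample, testing against a hypothesized median of 12.5.
data = [12, 15, 9, 18, 14, 16, 11, 17, 13, 19]
w_plus, w_minus = signed_rank_sums(data, m=12.5)
print(w_plus, w_minus)
```

A useful sanity check is that W+ and W- always add up to N(N + 1) / 2, where N is the number of non-zero differences.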

Test for two independent samples
Mann-Whitney U test (a.k.a. Wilcoxon rank-sum test)
Again, this compares the sums of ranks from two collections of data.
Null hypothesis: Distributions of both groups are the same*
Alternative hypothesis: Distributions of both groups are different
* Formally speaking: the probability that a randomly chosen observation from one group is greater than a randomly chosen observation from the second group = 0.5.

Mann-Whitney U test
Conceptually:
- Combine all the observations from the two groups into a single collection;
- Assign ranks to these observations, from smallest as rank 1 to largest as rank N (where N represents the total number of observations when combining both groups);
- Return the observations to the two groups they were originally from;
- Sum up the ranks of the observations in each of the two groups;
- Conceptually, expect the average ranks from both groups to be similar.
The details of the test are actually more complex, and the student is strongly encouraged to find out more from the recommended textbooks (or online).
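
The pooling and ranking steps can be sketched as below. The slides do not give the formula for U; the usual definition U = R - n(n + 1) / 2 is used here. The function name `mann_whitney_u` and the marks are hypothetical, and no p-value is computed.

```python
def mann_whitney_u(group_a, group_b):
    """Return (rank sum A, rank sum B, U_A, U_B) from jointly ranked data."""
    pooled = sorted(group_a + group_b)

    def rank_of(v):
        # Average of the 1-based positions that v occupies in the pooled data.
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    r_a = sum(rank_of(v) for v in group_a)
    r_b = sum(rank_of(v) for v in group_b)
    n_a, n_b = len(group_a), len(group_b)
    # Usual definition: subtract the smallest rank sum the group could have.
    u_a = r_a - n_a * (n_a + 1) / 2
    u_b = r_b - n_b * (n_b + 1) / 2
    return r_a, r_b, u_a, u_b


# Hypothetical marks for two independent groups.
females = [55, 62, 48, 70, 66]
males = [60, 75, 68, 72, 80]
print(mann_whitney_u(females, males))
```

With no ties, U for a group equals the number of (own, other) pairs in which its observation is the larger one, so the two U values add up to the product of the group sizes.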

Test for paired samples
Sign test
Same as the sign test encountered in the 1-sample setting. However, here we are comparing the differences between the paired observations (which yields only one set of outcomes, and thus a "1-sample" test).
Null hypothesis: Probability that the difference is greater than 0 = 0.5
Alternative hypothesis:
- Probability that the difference is greater than 0 ≠ 0.5
- Probability that the difference is greater than 0 > 0.5
- Probability that the difference is greater than 0 < 0.5
Caveat: As before, the sign test only looks at the sign of the difference, not its magnitude. So there is additional information that can be used.
Remember! Statistics is about understanding and minimizing uncertainty, while trying to maximize information (that is, making use of as much of the data as possible)!

Test for paired samples
Wilcoxon signed rank test
Again, this is similar to the procedure introduced for 1-sample testing, but the focus here is on the differences between the paired observations.
Null hypothesis: Median of differences = 0
Alternative hypothesis:
- Median of differences ≠ 0
- Median of differences > 0
- Median of differences < 0
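
As a rough illustration of how paired data reduce to a single set of differences, the sketch below computes the differences and then both the sign counts and the signed rank sums. The function name `paired_summaries` and the before/after marks are hypothetical; dropping zero differences and averaging tied ranks are assumed conventions.

```python
def paired_summaries(before, after):
    """Return (n_positive, n_negative, W+, W-) for after minus before."""
    # Assumption: pairs with no change are dropped.
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(a):
        # Average of the 1-based positions that a occupies.
        return abs_sorted.index(a) + 1 + (abs_sorted.count(a) - 1) / 2

    w_plus = sum(rank_of(abs(d)) for d in diffs if d > 0)
    w_minus = sum(rank_of(abs(d)) for d in diffs if d < 0)
    return n_pos, n_neg, w_plus, w_minus


# Hypothetical marks for six students measured twice.
before = [60, 72, 55, 80, 68, 74]
after = [66, 70, 61, 85, 68, 83]
print(paired_summaries(before, after))
```

The sign test would use only the two counts, whereas the signed rank test also uses how large each difference is relative to the others.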

Test for more than 2 independent samples
Kruskal-Wallis test
Compares the medians of all the groups to see whether they are equal.
Null hypothesis: Medians of all the groups are identical
Alternative hypothesis: At least one group has a different median
Compare this with the hypotheses for ANOVA:
Null hypothesis: Means of all the groups are identical
Alternative hypothesis: At least one group has a different mean
Test statistic:
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$
which approximately follows a chi-square distribution with k - 1 degrees of freedom under the null hypothesis, where
R_i = sum of ranks in group i
n_i = number of observations in group i
k = number of groups
N = total number of observations across all groups
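
A minimal sketch of the test statistic, assuming the usual form H = 12 / (N(N + 1)) × Σ R_i² / n_i − 3(N + 1) with N the total number of observations. The function name `kruskal_wallis_h` and the marks are hypothetical; no tie correction is applied and no p-value is computed.

```python
def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of groups of observations."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    def rank_of(v):
        # Average of the 1-based positions that v occupies in the pooled data.
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    total = 0.0
    for g in groups:
        r_i = sum(rank_of(v) for v in g)  # sum of ranks in this group
        total += r_i ** 2 / len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1


# Hypothetical marks from three schools.
school_a = [55, 62, 48]
school_b = [70, 66, 60]
school_c = [75, 80, 72]
h, df = kruskal_wallis_h([school_a, school_b, school_c])
print(h, df)
```

H would then be compared against a chi-square distribution with the returned degrees of freedom; with groups this small the chi-square approximation is rough.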

Kruskal-Wallis test
As with ANOVA:
- Tests a global hypothesis of no difference between any of the groups
- Need to identify which groups are different in the event of a significant p-value (post-hoc tests of every possible pairwise comparison with the Mann-Whitney U test)
- Post-hoc tests incur the problem of multiple testing; the standard Bonferroni correction is required.
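
The Bonferroni arithmetic for the post-hoc comparisons can be sketched as below: the significance level is divided by the number of pairwise tests. The function name `bonferroni_pairs`, the school labels and the 0.05 level are illustrative assumptions.

```python
from itertools import combinations


def bonferroni_pairs(group_names, alpha=0.05):
    """Return all pairwise comparisons and the per-test significance level."""
    pairs = list(combinations(group_names, 2))
    return pairs, alpha / len(pairs)


# Three hypothetical schools give three pairwise Mann-Whitney U tests.
pairs, per_test_alpha = bonferroni_pairs(["School A", "School B", "School C"])
print(pairs)
print(per_test_alpha)
```

Each pairwise p-value would then be compared with the per-test level rather than with the overall level.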

Non-parametric versus parametric
Question: If non-parametric tests are robust to issues pertaining to outliers, sample sizes and distributional assumptions, then why are they not the default tests to use?
Answer: Parametric tests use the actual values for the comparisons, whereas non-parametric tests use only the ranks. This means the magnitude of the differences between the observations is not used, and a difference of 1 or of 100 may be reduced to just a difference in rank of 1. This reduces the power of the non-parametric test, relative to the parametric equivalent.

Non-parametric tests in SPSS

Consider the mathematics.xls dataset again.
1. It is traditionally believed that male students tend to outperform female students in mathematics. Based on the marks before the start of the trial, is there any evidence in support of this hypothesis?
2. Is there any evidence that consuming omega 3 improves the performance in the mathematics exam?
3. Is there any difference in the marks before the trial between the three schools? If there is, which school exhibited the best performance?
4. Is there any difference in the omega 3 consumption between male and female students?
Let’s approach all these questions from the non-parametric perspective!

1. It is traditionally believed that male students tend to outperform female students in mathematics. Based on the marks before the start of the trial, is there any evidence in support of this hypothesis?
Test of two independent samples

Mann-Whitney U test
H0: Distributions of both groups are the same
H1: Mean ranks for females < mean ranks for males

2. Is there any evidence that consuming omega 3 improves the performance in the mathematics exam?
Test of two related samples

Sign test
H0: Probability that the difference is greater than 0 = 0.5
H1: Probability that the difference is greater than 0 > 0.5

Wilcoxon signed rank test
H0: Median of differences = 0
H1: Median of differences > 0
Again, to derive the one-tailed p-value, we need to halve the two-tailed p-value (this is appropriate when the observed differences lie in the direction stated in H1).
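
The halving step can be written as a small helper. This is a sketch that assumes the software reports a two-tailed p-value and that the null distribution of the test statistic is symmetric; the function name `one_sided_p` and the example p-value are hypothetical.

```python
def one_sided_p(two_sided_p, effect_in_h1_direction):
    """Convert a two-tailed p-value to a one-tailed one (symmetric null assumed)."""
    half = two_sided_p / 2
    # If the data point the opposite way to H1, the one-tailed p-value is large.
    return half if effect_in_h1_direction else 1 - half


# Hypothetical two-tailed p-value of 0.08.
print(one_sided_p(0.08, effect_in_h1_direction=True))
print(one_sided_p(0.08, effect_in_h1_direction=False))
```

Halving is only meaningful once the direction of the observed difference has been checked against H1.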

3. Is there any difference in the marks before the trial between the three schools? If there is, which school exhibited the best performance?
Test of K independent samples

Kruskal-Wallis test
H0: Medians of all the groups are identical
H1: At least one group has a different median
Remember the need to perform separate 2-independent-samples tests to identify the schools that are different, should the Kruskal-Wallis test yield a significant result.

4. Is there any difference in the omega 3 consumption between male and female students?
Test of 2 independent samples

Mann-Whitney U test
Based on the outcome of this analysis, there is no evidence to suggest that there is a difference in omega 3 consumption between male and female students.

Students should be able to:
- understand the difference between a parametric and a non-parametric test
- know when a parametric test should be used and when a non-parametric test should be used instead
- know the relative advantages and disadvantages of a non-parametric test
- know which non-parametric test should be used under the specific scenario
- perform the appropriate analyses in SPSS and RExcel