Statistical Data Analysis 2011/2012 M. de Gunst Lecture 7.


Statistical Data Analysis: Introduction
Topics:
- Summarizing data
- Exploring distributions
- Bootstrap
- Robust methods
- Nonparametric tests (continued)
- Analysis of categorical data
- Multiple linear regression

Today's topic: Nonparametric methods for two-sample problems (Chapter 6: 6.3, 6.4)
6. Nonparametric methods (continued)
6.1. One sample: two nonparametric tests for location
6.2. Asymptotic efficiency
6.3. Two samples: nonparametric tests for equality of distributions
- Median test
- Wilcoxon two-sample test
- Kolmogorov-Smirnov two-sample test
- Permutation tests
- Asymptotic efficiency (read yourself)
6.4. Two samples: nonparametric tests for correlation
- Rank correlation test of Spearman
- Rank correlation test of Kendall
- Permutation tests

Nonparametric methods: Introduction (recap)
Nonparametric tests:
- No assumption of a parametric family for the underlying distribution of the data
- For problems with a large class of distributions belonging to H0
- Distribution of the test statistic is the same under every distribution that belongs to H0
Why these tests?
- Robust w.r.t. the level: the test has level α for a large class of distributions
- More efficient than tests with more assumptions when these assumptions do not hold: fewer observations necessary for the same power

Two-sample problem: equality of distributions (1)
Example.
Data: Thromboglobulin data of Raynaud patients without organ defects (x) and of patients with other auto-immune disease (z)
> mean(x) =
> mean(z) =
> median(x) =
> median(z) = 62.5
> sd(x) =
> sd(z) =
> length(x) = 32
> length(z) = 23
Is the distribution of x in some way smaller than that of z?

Two-sample problem: equality of distributions (2) (continued)
Example.
Is the distribution of x the same as that of z? How to investigate with a plot?
- Boxplots in one figure
- Better: empirical qqplot of x and z
(In)equality of the distributions is not clear (see also Chapter 3), so investigate further with test(s).
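The two plots mentioned above can be sketched in R as follows. This is only an illustration: the original data are not included in the transcript, so `x` and `z` are simulated stand-ins with the same sample sizes, and the output file name is an arbitrary choice.

```r
# Simulated stand-ins for the two samples (assumption: not the real data)
set.seed(9)
x <- rexp(32, rate = 1 / 50)
z <- rexp(23, rate = 1 / 60)

# Write both plots to a PDF in the temporary directory
out <- file.path(tempdir(), "two_sample_plots.pdf")
pdf(out, width = 8, height = 4)
par(mfrow = c(1, 2))

# Boxplots of both samples in one figure
boxplot(list(x = x, z = z), main = "Boxplots")

# Empirical QQ-plot: sample quantiles of x against sample quantiles of z
qq <- qqplot(x, z, main = "Empirical QQ-plot")
abline(0, 1, lty = 2)  # reference line y = x (equal distributions)

dev.off()
```

Points close to the dashed line suggest equal distributions; points on another straight line suggest a difference in location and/or scale.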

Two-sample problem: equality of distributions (3)
Situation:
- x_1, ..., x_m realizations of X_1, ..., X_m, independent, unknown distr. F
- y_1, ..., y_n realizations of Y_1, ..., Y_n, independent, unknown distr. G
Are F and G the same? Which aspect? Location, spread, general shape, …
Case 1. Paired observations, m = n
Case 2. Unpaired observations, and X_1, ..., X_m and Y_1, ..., Y_n two independent groups of random variables

Paired samples: equality of distributions
X_1, ..., X_n ~ F; Y_1, ..., Y_n ~ G
Case 1. Paired observations
Main interest: difference in location of F and G.
Consider the differences X_i - Y_i → one sample.
Now investigate the location of the distribution of the differences with one-sample test(s).

Paired samples: 3 one-sample tests
X_1, ..., X_n ~ F; Y_1, ..., Y_n ~ G; F = G?
Case 1. Paired observations
Test whether the location of the distribution of the differences X_i - Y_i equals 0 with one-sample test(s):
i) differences normal: t-test
ii) X_i and Y_i dependent, pairs independent: sign test, Wilcoxon's signed rank test
iii) X_i and Y_i independent, pairs independent: Wilcoxon's signed rank test, because then under H0 symmetry around 0 is automatic
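A minimal R sketch of the three one-sample tests applied to paired differences. The data are simulated for illustration (an assumption, not data from the lecture), and the sign test is carried out with `binom.test`, which is one common way to do it in base R.

```r
# Simulated paired measurements (assumption: illustrative data only)
set.seed(1)
n <- 20
x <- rnorm(n, mean = 10, sd = 2)
y <- x + rnorm(n, mean = 0.5, sd = 1)
d <- x - y                      # differences: now a one-sample problem

# i) t-test on the differences (identical to the paired t-test)
p_t        <- t.test(d, mu = 0)$p.value
p_t_paired <- t.test(x, y, paired = TRUE)$p.value

# ii) sign test: under H0 the number of positive differences is Bin(n, 1/2)
p_sign <- binom.test(sum(d > 0), sum(d != 0), p = 0.5)$p.value

# iii) Wilcoxon's signed rank test on the differences
p_wsr <- wilcox.test(d, mu = 0)$p.value

print(c(t = p_t, sign = p_sign, signed_rank = p_wsr))
```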

Unpaired samples: equality of distributions
X_1, ..., X_m ~ F; Y_1, ..., Y_n ~ G
Case 2. Unpaired observations, and X_1, ..., X_m and Y_1, ..., Y_n two independent groups of random variables; m and n may be different.

Unpaired samples: t-test
Two-sample t-test
Assumptions: F normal, mean μ; G normal, mean ν; equal variances
Test statistic: T = (mean(X) - mean(Y)) / (S_p sqrt(1/m + 1/n)) ~ t_{m+n-2} under H0: μ = ν, with S_p^2 the pooled sample variance
If the variances are not equal: adjusted denominator. Note: this unequal-variance version is the default in R (var.equal = FALSE):
?t.test
t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...)
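As a check on the statement above, the pooled-variance statistic can be computed by hand and compared with `t.test(..., var.equal = TRUE)`. The samples below are simulated purely for illustration.

```r
# Simulated normal samples (assumption: illustrative data only)
set.seed(2)
m <- 12
n <- 15
x <- rnorm(m, mean = 0,   sd = 1)
y <- rnorm(n, mean = 0.5, sd = 1)

# Pooled variance and the classical two-sample t statistic
sp2    <- ((m - 1) * var(x) + (n - 1) * var(y)) / (m + n - 2)
t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / m + 1 / n))
p_two_sided <- 2 * pt(-abs(t_stat), df = m + n - 2)

# Same test via t.test; var.equal = TRUE must be requested explicitly
fit   <- t.test(x, y, var.equal = TRUE)

# Default call: unequal variances (Welch), with adjusted degrees of freedom
welch <- t.test(x, y)

print(c(by_hand = t_stat, t.test = unname(fit$statistic)))
print(c(df_pooled = m + n - 2, df_welch = unname(welch$parameter)))
```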

Unpaired samples: median test
Median test (nonparametric)
Assumptions: F, G continuous distributions
Test statistic: T ~ Hyp(m+n, m, p) under H0, with p "half" of the total number of observations
Suited (efficient) for shift alternatives: H1: G = F(· - θ)
Does not use much information from the data
(NB. This is not Mood's median test)
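The hypergeometric calculation behind such a test can be sketched in R. The exact definition of the statistic is not recoverable from the transcript, so the convention below is an assumption: the statistic counts the x-values among the k largest pooled observations, with k = ceiling((m+n)/2) standing in for "half". The data are simulated.

```r
# Simulated continuous samples without ties (assumption: illustrative only)
set.seed(3)
m <- 10
n <- 12
x <- rnorm(m)
y <- rnorm(n, mean = 1)

N <- m + n
k <- ceiling(N / 2)                 # assumed meaning of "half" of N

pooled <- c(x, y)
is_x   <- rep(c(TRUE, FALSE), c(m, n))

# Assumed statistic: number of x-values among the k largest pooled values
T_obs <- sum(is_x[order(pooled, decreasing = TRUE)][1:k])

# Under H0 every subset of size k is equally likely to be the top k, so
# T_obs is hypergeometric: k draws from m "x" labels and n "y" labels
p_left <- phyper(T_obs, m, n, k)    # P(T <= T_obs); small if x tends to be smaller

print(c(T_obs = T_obs, p_left = p_left))
```

With ties in the data (as in the lecture example) the count near the median needs extra care, which this sketch does not handle.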

Unpaired samples: Wilcoxon two-sample test (1)
Wilcoxon two-sample test, also called Wilcoxon rank sum test or Mann-Whitney test (nonparametric)
Assumptions: F, G continuous distribution functions
Test statistic: sum of the ranks of X_1, ..., X_m in the combined sample of size N = m+n
With ties or for large n, m: normal approximation
Especially suited (efficient) for shift alternatives: H1: G = F(· - θ)
Uses more information from the data

Unpaired samples: Wilcoxon two-sample test (2)
Wilcoxon rank sum test
Assumptions: F, G continuous distributions
Test statistic: based on the ranks of X_1, ..., X_m in the combined sample of size N = m+n
Equivalent test statistics used under the same name:
- W = sum of ranks of the first sample
- the same statistic with the roles of first and second sample switched
- W - m(m+1)/2: used by R
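The equivalence of these statistics can be verified numerically. The sketch below uses simulated data without ties; the variable names are illustrative.

```r
# Simulated continuous samples without ties (assumption: illustrative only)
set.seed(4)
m <- 8
n <- 10
x <- rnorm(m)
y <- rnorm(n, mean = 0.8)

# Ranks in the combined sample; the first m entries belong to x
r <- rank(c(x, y))
W_ranksum <- sum(r[1:m])                  # sum of ranks of first sample
W_R       <- W_ranksum - m * (m + 1) / 2  # version reported by wilcox.test

# Mann-Whitney form: number of pairs (i, j) with x_i > y_j
U <- sum(outer(x, y, ">"))

fit <- wilcox.test(x, y, alternative = "less")
print(c(rank_sum = W_ranksum, W_R = W_R, U = U,
        wilcox.test = unname(fit$statistic)))
```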

Unpaired samples: Kolmogorov-Smirnov test
Kolmogorov-Smirnov two-sample test (nonparametric)
Assumptions: F, G have continuous distribution functions
Test statistic: depends on the data only through the ranks of X_1, ..., X_m in the combined sample of size N = m+n
Especially suited (efficient) for general alternatives: F and G need not have the same shape
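For reference, the statistic that R's `ks.test` reports for two samples is the maximal difference between the two empirical distribution functions. The sketch below computes it by hand on simulated data (an assumption, not the lecture data) and compares it with `ks.test`.

```r
# Simulated continuous samples without ties (assumption: illustrative only)
set.seed(5)
x <- rnorm(15)
y <- rnorm(20, mean = 0.7)

Fm  <- ecdf(x)                    # empirical distribution function of x
Gn  <- ecdf(y)                    # empirical distribution function of y
pts <- sort(c(x, y))              # the difference only changes at data points

D      <- max(abs(Fm(pts) - Gn(pts)))  # two-sided statistic
D_plus <- max(Fm(pts) - Gn(pts))       # one-sided: CDF of x above CDF of y

fit_two <- ks.test(x, y)
fit_one <- ks.test(x, y, alternative = "greater")

print(c(D = D, ks.test = unname(fit_two$statistic)))
print(c(D_plus = D_plus, ks.test = unname(fit_one$statistic)))
```

Because only the ordering of the pooled observations enters `Fm(pts) - Gn(pts)`, the statistic is a function of the ranks alone.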

Unpaired samples: permutation tests
Permutation tests for unpaired data (nonparametric)
Assumptions: F, G have continuous distribution functions
Test statistic: any T that gives info about the difference between F and G, e.g. Med(X_1, ..., X_m) - Med(Y_1, ..., Y_n), etc.
Test conditionally on the ordered combined sample (right-sided p-value)

Unpaired samples: illustration (1)
Example.
Data: Thromboglobulin data of Raynaud patients without organ defects (x) and of patients with other auto-immune disease (z)
F distribution of data x; G distribution of data z
H0: F = G
H1: F stochastically smaller than G (what does this mean??)
Note: for the different tests this H1 becomes in R:
- t.test: difference in expectations is less than 0;
  > t.test(x,z,alternative="less")
- median test: difference in location less than 0;
  # compute yourself with 1-phyper(18, 32, 23, 56/2)
  # check where the numbers come from!!
- Mann-Whitney/Wilcoxon: difference in location less than 0;
  > wilcox.test(x,z,alternative="less")
- Kolmogorov-Smirnov: CDF of x lies above that of z;
  > ks.test(x,z,alternative="greater")
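On the question of what "stochastically smaller" means, the usual definition can be written out as follows. This is the standard textbook formulation, stated here as background; the slide itself leaves the question open. The symbols X ~ F and Z ~ G denote generic random variables with those distributions.

```latex
F \text{ is stochastically smaller than } G
\;\iff\; F(t) \ge G(t) \quad \text{for all } t
\;\iff\; P(X > t) \le P(Z > t) \quad \text{for all } t,
\qquad X \sim F,\; Z \sim G .
```

Often strict inequality is required for at least one t. This matches the Kolmogorov-Smirnov call above: the distribution function of x lying above that of z means x tends to take smaller values.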

Unpaired samples: illustration (2) (continued)
Example.
F distribution of data x; G distribution of data z
H0: F = G
H1: F stochastically smaller than G
p-values:
- t.test: 0.12
- median test: 0.11
- Mann-Whitney/Wilcoxon: 0.20 (normal approximation was used, due to ties)
- Kolmogorov-Smirnov: 0.31 (R-warning)
H0 is not rejected by any of these tests.
Note: we have performed all tests, but
- t.test is not a good candidate, because based on the plots the data are not likely to be normal;
- whether the distributions have the same shape, i.e. whether a shift alternative is a good choice, is not clear: the shapes look similar, but the sd's are quite different. If it is, then the median and Mann-Whitney tests are good in terms of power;
- Kolmogorov-Smirnov is a good test also for general types of alternatives. There are ties here and R does not know how to adjust for this, so consider the p-value as an approximation.

Paired samples: correlation
X_1, ..., X_n ~ F; Y_1, ..., Y_n ~ G
Only for paired observations (X_i, Y_i): are X_i and Y_i correlated?
How to start the investigation? Make a scatter plot.
Measures of correlation?
- (Pearson's) sample correlation
- Kendall's rank correlation
- Spearman's rank correlation
All can be used for testing.

Paired samples: tests for correlation
Only for paired observations (X_i, Y_i), for all i
(Pearson's) (linear) correlation test
- Assumptions: F normal, G normal
- Test statistic: T = r sqrt(n-2) / sqrt(1 - r^2) ~ t_{n-2} under H0, with r the sample correlation
Kendall's rank correlation test, Spearman's rank correlation test
- Assumptions: F, G continuous distribution functions
- Test statistic: Kendall's tau and Spearman's rho, resp. Both based on ranks: nonparametric
R: cor.test(x, y, method = "pearson" / "kendall" / "spearman", …)
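The three correlation tests can be run with `cor.test` as sketched below. The data are simulated for illustration only; the by-hand computations show where the Pearson t statistic and Spearman's rank correlation come from.

```r
# Simulated paired data (assumption: illustrative only)
set.seed(6)
n <- 25
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# Pearson: t statistic with n - 2 degrees of freedom
r      <- cor(x, y)
t_stat <- r * sqrt((n - 2) / (1 - r^2))

pearson  <- cor.test(x, y, method = "pearson")
kendall  <- cor.test(x, y, method = "kendall")
spearman <- cor.test(x, y, method = "spearman")

# Spearman's rho is Pearson's correlation applied to the ranks
rho_by_hand <- cor(rank(x), rank(y))

print(c(pearson  = pearson$p.value,
        kendall  = kendall$p.value,
        spearman = spearman$p.value))
```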

Paired samples: permutation test for correlation
Only for paired observations (X_i, Y_i), for all i
Permutation tests (nonparametric)
Assumptions: F, G continuous distribution functions
Test statistic: any T that gives info about the dependence between X_i and Y_i, e.g. Kendall's tau, Spearman's rho
Test conditionally on the combined first sample and ordered second sample (right-sided p-value)
Conditional, so the results differ from those of the former tests with the same statistics

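Permutation tests (1)
1. Unpaired observations X_1, ..., X_m and Y_1, ..., Y_n
2. Paired observations, m = n
Bootstrap if the p-value is not computable exactly: generate a large number B of randomly chosen permutations π, and approximate the p-value by the fraction of these B permutations for which the permuted statistic is at least as extreme as the observed one. In both case 1 and case 2 the exact conditional p-value is replaced by this fraction.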

Permutation tests (2)
1. Unpaired observations X_1, ..., X_m and Y_1, ..., Y_n
2. Paired observations, m = n
How to permute in both cases? Instead of a permutation π of 1, …, m+n and of 1, …, n, resp., it is easier to permute the data:
1. Permute (X_1, ..., X_m, Y_1, …, Y_n) and make a new division into two samples of m and n observations.
2. Permute (Y_1, ..., Y_n), leave (X_1, ..., X_n) as it is, and make new pairs.
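Case 2 (paired data) can be sketched in R as follows. The helper `perm_cor_test` is a hypothetical name introduced for this illustration, the data are simulated, and "at least as large" (`>=`) is used for the right-sided fraction.

```r
# Simulated paired data (assumption: illustrative only)
set.seed(8)
n <- 30
x <- rnorm(n)
y <- 0.4 * x + rnorm(n)

# Hypothetical helper: right-sided permutation test for correlation.
# x stays fixed, y is permuted, so new pairs are formed each time.
perm_cor_test <- function(x, y, method = "spearman", B = 2000) {
  t_obs  <- cor(x, y, method = method)
  t_perm <- replicate(B, cor(x, sample(y), method = method))
  list(t_obs = t_obs, p_right = mean(t_perm >= t_obs))
}

res <- perm_cor_test(x, y)
print(res)
```

Since the permutations are drawn at random, repeated runs without a fixed seed give slightly different p-values.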

Permutation test: illustration for unpaired samples (1)
Example.
Data: Thromboglobulin data of Raynaud patients without organ defects (x) and of patients with other auto-immune disease (z)
F distribution of data x; G distribution of data z
H0: F = G
H1: F stochastically smaller than G
If interested in specific characteristics: permutation test.
Unpaired data: conditional test, given the sorted values of the combined sample.
Choose T; suppose H0 is rejected for small (large) values of T. Then B times, do:
- generate a random permutation of (x_1, ..., x_32, z_1, …, z_23) and make a new division into two samples of 32 and 23 observations with the R-function `sample`
- xperm = first 32 elements of the permuted data
- zperm = last 23 elements of the permuted data
- determine T(xperm, zperm)
Count the fraction of the B values T(xperm, zperm) that are smaller (larger) than T(x,z) of the thrombo data: this is the p-value.
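The steps above can be sketched in R. The thrombo data are not part of the transcript, so `x` and `z` below are simulated stand-ins with the same sample sizes; `perm_test` is a hypothetical helper name, and the left-sided fraction uses `<=` rather than strict inequality.

```r
# Simulated stand-ins for the 32 x-values and 23 z-values (assumption)
set.seed(7)
x <- rexp(32, rate = 1 / 50)
z <- rexp(23, rate = 1 / 60)

# Hypothetical helper: left-sided permutation test for unpaired samples.
# stat is any function T(x, z) for which small values speak against H0.
perm_test <- function(x, z, stat, B = 2000) {
  m      <- length(x)
  N      <- m + length(z)
  pooled <- c(x, z)
  t_obs  <- stat(x, z)
  t_perm <- replicate(B, {
    p <- sample(pooled)             # random permutation of combined sample
    stat(p[1:m], p[(m + 1):N])      # xperm = first m, zperm = last n
  })
  list(t_obs = t_obs, p_left = mean(t_perm <= t_obs))
}

res_mean <- perm_test(x, z, function(a, b) mean(a) - mean(b))
res_med  <- perm_test(x, z, function(a, b) median(a) - median(b))
res_sd   <- perm_test(x, z, function(a, b) sd(a) - sd(b))

print(c(mean = res_mean$p_left, median = res_med$p_left, sd = res_sd$p_left))
```

Any other statistic, for example the Wilcoxon rank sum, can be passed in through `stat` in the same way.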

Permutation test: illustration for unpaired samples (2) (continued)
Example.
F distribution of data x; G distribution of data z
H0: F = G
H1: F stochastically smaller than G
Results of permutation tests:
- Permutation test for difference in mean: T = mean(X) - mean(Y). Left p-value (bootstrap approximation), several runs: 0.107, 0.091, 0.114, …
- Permutation test for difference in median: T = median(X) - median(Y). Left p-value (bootstrap approximation), several runs: 0.163, 0.19, 0.18, …
- Permutation test for Mann-Whitney: T = U-tilde. Left p-value (bootstrap approximation), several runs: 0.195, 0.214, 0.201, … (around the value for the unconditional Mann-Whitney test)
- Permutation test for difference in sd: T = sd(X) - sd(Y). Left p-value (bootstrap approximation), several runs: 0.043, 0.053, 0.059, …

Recap
6. Nonparametric methods (continued)
6.3. Two samples: nonparametric tests for equality of distributions
- Median test
- Wilcoxon two-sample test
- Kolmogorov-Smirnov two-sample test
- Permutation tests
- Asymptotic efficiency (read yourself)
6.4. Two samples: nonparametric tests for correlation
- Rank correlation test of Spearman
- Rank correlation test of Kendall
- Permutation tests

Nonparametric methods for one sample problems
The end