Biostat 200 Lecture 7

Hypothesis tests so far
T-test of one mean: Null hypothesis µ = µ0
Test of one proportion: Null hypothesis p = p0
Paired t-test: Null hypothesis µ1 = µ2
Independent samples t-test: Null hypothesis µ1 = µ2
– Assume equal or unequal variances
Proportion test for two independent samples: Null hypothesis p1 = p2

Comparison of several means: analysis of variance
Why is it called analysis of variance?
– The test compares the between-group variability (how different the groups are from the overall mean) to the within-group variability
Why can't we just do t-tests on the pairs of groups?
– Multiple comparisons problem
– P(do not reject H0 | H0) on one test = 1 − α
– P(do not reject H0 | H0) on n independent tests = (1 − α)^n
– P(reject H0 | H0) on at least one test = 1 − (1 − α)^n
– If α = 0.05 and n = 4 then this is 1 − 0.95^4 ≈ 0.185
Pagano and Gauvreau, Chapter 12
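The multiple comparisons arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the formula assumes the n tests are independent.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(reject H0 on at least one test | H0 true), assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


# The chance of at least one false rejection grows quickly with the number of tests.
for n_tests in (1, 2, 4, 10):
    print(n_tests, round(familywise_error_rate(0.05, n_tests), 3))
```

With α = 0.05 and 4 tests the function returns about 0.185, matching the slide.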

Comparison of several means: analysis of variance
We calculate the ratio of:
– The between-group variability: the variability of the group means around the overall (or grand) mean
– to the within-group variability: a weighted average of the variances within each group
k = the number of groups being compared
n1, n2, ..., nk = the number of observations in each group

Comparison of several means: analysis of variance
The test statistic is F = s_B^2 / s_W^2, the between-group variance divided by the within-group variance.
We compare our F statistic to the F-distribution, with k−1 and n−k degrees of freedom
– k = the number of means being compared
– n = the total number of observations
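As a rough illustration of how the F statistic is assembled from the between-group and within-group variances, here is a small Python sketch. The data are invented and the function name is hypothetical. It stops at the F statistic and its degrees of freedom. The p-value would still come from the F-distribution, for example via Stata's oneway.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group variance: spread of the group means around the grand mean.
    s_b2 = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)

    # Within-group variance: weighted average of the group sample variances.
    s_w2 = sum((len(g) - 1) * variance(g) for g in groups) / (n - k)

    return s_b2 / s_w2, k - 1, n - k


# Invented example data: three groups of three observations each.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(example_groups)
print(f_stat, df_between, df_within)
```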

[Figure: the F-distribution]

ANOVA example: CD4 count by drinking category

ANOVA example

. tabstat cd4count, by(last_alc_cat) s(n mean sd min median max)

Summary for variables: cd4count
by categories of: last_alc_cat (last time took alcohol- never, past, current)

last_alc_cat    | N  mean  sd  min  p50  max
Abstainer       |
Past drinker    |
Current drinker |
Total           |

ANOVA example: CD4 count, by alcohol consumption category

Syntax: oneway var groupvar

. oneway cd4count last_alc_cat

Analysis of Variance
Source         | SS  df  MS  F  Prob > F
Between groups |
Within groups  |
Total          |

Bartlett's test for equal variances: chi2(2) =    Prob>chi2 =

Multiple comparisons
If we reject H0, we might want to know which means differed from each other.
But as noted before, if you test all combinations, you increase your chance of rejecting a true null hypothesis.
To be conservative, we reduce the level of α; that is, we reject the null hypothesis only if the p-value is smaller than a level below the original α.
The Bonferroni method divides α by the number of possible pairwise tests.
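A minimal sketch of the Bonferroni adjustment, assuming every pair of the k groups is compared, so the number of tests is k(k−1)/2. The function name is illustrative only.

```python
from math import comb


def bonferroni_alpha(alpha: float, k_groups: int):
    """Return (adjusted alpha, number of pairwise tests) for k groups."""
    n_pairs = comb(k_groups, 2)  # k choose 2 possible pairs of groups
    return alpha / n_pairs, n_pairs


# Three groups (as in the alcohol-category example) give three pairwise tests.
alpha_star, n_pairs = bonferroni_alpha(0.05, 3)
print(n_pairs, round(alpha_star, 4))
```

Each pairwise p-value would then be compared to the adjusted level rather than to the original α.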

Multiple comparisons
Use a t-test, but use the within-group variance s_W^2 that weights over all the groups (not just the 2 being examined).
The test statistic for each pair of means (groups i and j) is t = (mean_i − mean_j) / sqrt(s_W^2 (1/n_i + 1/n_j)), and the degrees of freedom is n−k, where n is the total number of observations and k is the total number of groups (another difference from the t-test for 2 means).
Reject if the p-value is < α*, the adjusted significance level.
There are lots of other methods of dealing with the multiple comparisons issue.
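The pairwise statistic described above can be sketched in Python as follows. The data and function name are invented for illustration. The sketch returns only the t statistic and its degrees of freedom, leaving the p-value to a t-distribution lookup.

```python
from math import sqrt
from statistics import mean, variance


def pairwise_t(groups, i, j):
    """t statistic for groups i and j using the variance pooled over ALL groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)

    # Within-group variance pooled across every group, not just the pair compared.
    s_w2 = sum((len(g) - 1) * variance(g) for g in groups) / (n - k)

    gi, gj = groups[i], groups[j]
    t = (mean(gi) - mean(gj)) / sqrt(s_w2 * (1 / len(gi) + 1 / len(gj)))
    return t, n - k  # degrees of freedom are n - k


# Invented example data: compare the first and third groups.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
t_stat, df = pairwise_t(example_groups, 0, 2)
print(round(t_stat, 3), df)
```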

Multiple comparisons

. oneway cd4count last_alc_cat, bonferroni

Analysis of Variance
Source         | SS  df  MS  F  Prob > F
Between groups |
Within groups  |
Total          |

Bartlett's test for equal variances: chi2(2) =    Prob>chi2 =

Comparison of cd4count by last time took alcohol- never, past, current (Bonferroni)
Row Mean-|
Col Mean | Abstaine  Past dri
Past dri |
         |
Current  |
         |

Each cell shows the difference between the 2 means and the p-value for the difference.

Nonparametric tests
The hypothesis tests that use the z-statistic (i.e. when σ, the population standard deviation, is known) assume that the underlying distribution of the parameter we are estimating (sample mean, sample proportion) is approximately normal. This will be true under the CLT if n is large enough.
However, we usually do not know σ, or whether the data originally came from a normal distribution.
If our data are very skewed, we need to be wary of this assumption.
Nonparametric techniques make fewer assumptions about the underlying distributions: they assume only that the populations being compared have the same basic shape, without assuming a particular underlying distribution.
The 3 step procedure is the same: hypothesis, test, reject or fail to reject.
We will discuss nonparametric tests that might be used instead of the "parametric" tests we previously discussed.
Pagano and Gauvreau, Chapter 13

Test assumptions
The hypothesis tests that use the z-statistic (i.e. when σ, the population standard deviation, is known) assume that the underlying distribution of the parameter we are estimating (sample mean, sample proportion) is approximately normal. This will be true under the CLT if n is large enough.
However, we usually do not know σ, so we use the sample variance s^2 and compare our test statistic to the t-distribution.
In theory the underlying distribution of the data must be normal, but in practice, if n is fairly large and there are no extreme outliers, the t-test is valid.
If the data are not normally distributed, the t-test is not the most powerful test to use.
– E.g. outliers will inflate the sample variance, decreasing the test statistic and the chances of rejecting the null.
Independence of your observations is more critical than normality.
The 3 step procedure for nonparametric testing is the same: hypothesis, test, reject or fail to reject.

Nonparametric tests for paired observations
The Sign test
– For paired or matched observations (analogous to the paired t-test)
– H0: median1 = median2
– Most useful if the sample size is small or the distribution of differences is very skewed
– The differences between the pairs are given a sign: + if a positive difference, − if a negative difference, nothing if the difference = 0
– Count the number of + signs, denoted by D

Nonparametric tests for paired observations
– Under H0, ½ the differences will be + and ½ will be −
– This is equivalent to saying that each difference is a Bernoulli random variable, that is, each is + or − with probability p = 0.5
– Then the total number of + signs (D) is a binomial random variable with p = 0.5 and n trials
– So the p-value for the hypothesis test is the probability of observing D or more + differences if the true distribution is binomial with parameters n and p = 0.5
– You could use the binomialtail function for a one-sided hypothesis: di binomialtail(n,D,.5)

Nonparametric tests for paired observations
– Under H0, ½ the differences will be + and ½ will be −
– This is equivalent to saying that each difference is a Bernoulli random variable, that is, each is + or − with probability p = 0.5
– Then the total number of + signs (D) is a binomial random variable with p = 0.5 and n trials
– Binomial mean = np; binomial SD = sqrt(np(1−p))
– So mean = 0.5n and SD = sqrt(0.25n)
– Then, using the normal approximation, D is approximately normal with mean 0.5n and SD sqrt(0.25n), and z ~ N(0,1), where z = (D − 0.5n) / sqrt(0.25n)
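A short Python sketch of the normal approximation to the sign test, using the mean 0.5n and SD sqrt(0.25n) from the slide. The counts below (30 positive differences out of 40 nonzero differences) are invented for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_normal_approx(d_plus: int, n: int):
    """Return (z, two-sided p-value) for D positive signs out of n nonzero differences."""
    z = (d_plus - 0.5 * n) / sqrt(0.25 * n)
    p_two_sided = 2 * NormalDist().cdf(-abs(z))  # both tails of the standard normal
    return z, p_two_sided


# Invented counts: 30 positive differences among 40 nonzero differences.
z_stat, p_value = sign_test_normal_approx(30, 40)
print(round(z_stat, 3), round(p_value, 4))
```

The approximation is meant for larger n. For small samples the exact binomial probability is the safer choice.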

Nonparametric tests for paired observations
Obs. no | Weight before (kg) | Weight after (kg) | Difference | Difference > 0? | Signed rank of difference
Total 7

Syntax: signtest var1=var2

. signtest wt1=wt2

Sign test

sign     | observed  expected
positive | 3         5
negative | 7         5
zero     |
all      |

One-sided tests:
Ho: median of wt1 - wt2 = 0 vs. Ha: median of wt1 - wt2 > 0
Pr(#positive >= 3) = Binomial(n = 10, x >= 3, p = 0.5) =
Ho: median of wt1 - wt2 = 0 vs. Ha: median of wt1 - wt2 < 0
Pr(#negative >= 7) = Binomial(n = 10, x >= 7, p = 0.5) =

Two-sided test:
Ho: median of wt1 - wt2 = 0 vs. Ha: median of wt1 - wt2 != 0
Pr(#positive >= 7 or #negative >= 7) = min(1, 2*Binomial(n = 10, x >= 7, p = 0.5)) =

The two-sided test uses the larger of the number of positive or negative signed pairs.
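The binomial probabilities named in the output above can be recomputed exactly from the counts shown (3 positive and 7 negative differences, n = 10). The Python sketch below is a cross-check rather than a replacement for Stata. The helper name binomial_tail is made up, and it is intended to mirror the upper-tail probability that Stata's binomialtail(n,k,p) reports.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


# Counts taken from the signtest output: 3 positive, 7 negative, n = 10.
p_positive = binomial_tail(10, 3)       # Pr(#positive >= 3)
p_negative = binomial_tail(10, 7)       # Pr(#negative >= 7)
p_two_sided = min(1.0, 2 * p_negative)  # uses the larger of the two counts

print(round(p_positive, 4), round(p_negative, 4), round(p_two_sided, 4))
```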

Nonparametric tests for paired observations
The sign test does not account for the magnitude of the differences. The Wilcoxon Signed-Rank Test does.
– Samples from paired populations; analogous to the paired t-test
– The absolute values of the paired differences are ranked, and the sums of the ranks are compared for the positive and the negative differences
– Ties are given an average rank
– Under the null hypothesis of no difference in medians, the smaller sum of the ranks approximately follows a normal distribution with mean n(n+1)/4 and standard deviation sqrt(n(n+1)(2n+1)/24)
– For n < 12, use the exact distribution, Table A.6
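A simplified Python sketch of the signed-rank calculation follows. The weights are invented, the function names are hypothetical, zero differences are simply dropped, and no tie or zero adjustment is made to the variance, so results can differ from Stata's signrank. The sample is deliberately tiny so the ranks can be checked by hand. For n this small the slide recommends the exact table instead of the normal approximation.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def signed_rank(before, after):
    """Return (sum of positive ranks, sum of negative ranks, z)."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # drop zero differences
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])  # rank the absolute differences

    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(t_plus, t_minus) - mu) / sigma  # based on the smaller rank sum
    return t_plus, t_minus, z


# Invented paired weights (kg) for six people.
wt_before = [10, 12, 9, 14, 11, 13]
wt_after = [8, 13, 6, 10, 16, 7]
t_plus, t_minus, z_stat = signed_rank(wt_before, wt_after)
print(t_plus, t_minus, round(z_stat, 3))
```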

Syntax: signrank var1 = var2

. signrank wt1=wt2

Wilcoxon signed-rank test

sign     | obs  sum ranks  expected
positive |
negative |
zero     |
all      |

unadjusted variance
adjustment for ties
adjustment for zeros
adjusted variance

Ho: wt1 = wt2
z =
Prob > |z| =

Nonparametric tests for independent samples
The Wilcoxon Rank Sum Test
– Also called the Mann-Whitney U test (slightly different calculation)
– Null hypothesis is that the distributions of the two groups are the same
– Samples from independent populations; analogous to the t-test
– The entire sample is ranked, and the sums of the ranks are compared for the two groups
– An algorithm is used to deal with ties
– The test statistic is compared to the normal distribution
– If the sample sizes are small (<10), exact distributions are needed (Table A.7)
– Can be extended to multiple groups (Kruskal-Wallis test)
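The ranking step described above can be sketched in Python as follows. The data and function names are invented. The sketch uses the standard null mean n1(n1+n2+1)/2 and variance n1·n2(n1+n2+1)/12 for the rank sum of the first group, with no tie adjustment to the variance, so it will not match Stata's ranksum exactly when ties are present. The samples are kept tiny so the ranks can be verified by hand, even though such small samples call for the exact distribution.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def rank_sum(x, y):
    """Return (rank sum of x, expected rank sum under H0, z)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))  # rank the combined sample
    w = sum(ranks[:n1])                       # rank sum for the first group

    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return w, mu, (w - mu) / sigma


# Invented data for two independent groups.
w, expected, z_stat = rank_sum([1, 2, 3], [4, 5, 6, 7])
print(w, expected, round(z_stat, 3))
```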

Syntax: ranksum var, by(byvar)

. ranksum bmi, by(sex)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

sex      | obs  rank sum  expected
male     |
female   |
combined |

unadjusted variance
adjustment for ties
adjusted variance

Ho: bmi(sex==male) = bmi(sex==female)
z =
Prob > |z| =

Nonparametric tests for independent samples

Syntax: kwallis var, by(byvar)

. kwallis bmi, by(racegrp)

Kruskal-Wallis equality-of-populations rank test

racegrp          | Obs | Rank Sum
White, Caucasian | 312 |
Asian/PI         | 158 |
Other            | 65  |

chi-squared =            with 2 d.f.   probability =
chi-squared with ties =  with 2 d.f.   probability =
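For readers who want to see where the chi-squared statistic comes from, here is a hedged Python sketch of the Kruskal-Wallis H statistic without the tie correction: H = 12/(N(N+1)) · Σ R_i²/n_i − 3(N+1), compared to a chi-squared distribution with k−1 degrees of freedom. The data and function names are invented, and the slide itself does not give this formula.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Return (H without tie correction, degrees of freedom)."""
    combined = [v for g in groups for v in g]
    ranks = average_ranks(combined)  # rank the pooled sample
    n_total = len(combined)

    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        weighted += rank_sum**2 / len(g)
        start += len(g)

    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return h, len(groups) - 1


# Invented data: three groups with no ties.
h_stat, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_stat, 3), df)
```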

Parametric vs. non-parametric (distribution free) tests
Non-parametric tests:
– No normality requirement
– Do require that the underlying distributions being compared have the same basic shape
– Ranks are less sensitive to outliers
– Can be used for ordinal data
If the underlying distributions are approximately normal, then parametric tests are more powerful.

Statistical hypothesis tests

Data and comparison type | Alternative hypotheses | Parametric test | Stata command | Non-parametric test | Stata command
Continuous; one mean | Ha: μ ≠ μ0 (two-sided), Ha: μ > μ0 (one-sided) | Z or t-test | ttest var1=hypoth value | |
Dichotomous; one proportion | Ha: p ≠ p0 (two-sided), Ha: p > p0 (one-sided) | Proportion test | prtest var1=hypoth value | |
Continuous; two means, paired data | Ha: μ1 ≠ μ2 (two-sided), Ha: μ1 > μ2 (one-sided) | Paired t-test | ttest var1=var2 | Sign test or Wilcoxon Signed-Rank | signtest var1=var2 or signrank var1=var2
Continuous; two means, independent data | Ha: μ1 ≠ μ2 (two-sided), Ha: μ1 > μ2 (one-sided) | T-test (equal or unequal variance) | ttest var1, by(byvar) unequal (as needed) | Wilcoxon rank-sum test | ranksum var1, by(byvar)
Dichotomous; two proportions | Ha: p1 ≠ p2 (two-sided), Ha: p1 > p2 (one-sided) | Proportion test (z-test) | prtest var1, by(byvar) | |
Continuous; two or more means, independent data | Ha: μ1 ≠ μ2 or μ1 ≠ μ3 or μ2 ≠ μ3 etc. | ANOVA | oneway var1 byvar | Kruskal-Wallis test | kwallis var1, by(byvar)

For next time
Read Pagano and Gauvreau
– Pagano and Gauvreau Chapters (review)
– Pagano and Gauvreau Chapter 15