Biostat 200 Lecture 7

Outline for today
– Hypothesis tests so far: one mean, one proportion, 2 means, 2 proportions
– Comparison of means of multiple independent samples (ANOVA)
– Nonparametric tests
  – For paired data
  – For 2 independent samples
  – For multiple independent samples

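Hypothesis tests so far: dichotomous data
Test of one proportion
– Null hypothesis: p = p0 (two-sided)
– Test statistic: $z = (\hat{p} - p_0) / \sqrt{p_0 (1 - p_0) / n}$
Proportion test for two independent samples
– Null hypothesis: p1 = p2 (two-sided)
– Test statistic: $z = (\hat{p}_1 - \hat{p}_2) / \sqrt{\hat{p} (1 - \hat{p}) (1/n_1 + 1/n_2)}$, where $\hat{p}$ is the pooled proportion from both samples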

Hypothesis tests so far: numerical data
T-test of one mean
– Null hypothesis: µ = µ0 (two-sided)
– Test statistic: $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$, with n-1 degrees of freedom
Paired t-test
– Null hypothesis: µ1 = µ2 (two-sided)
– Test statistic: $t = \bar{d} / (s_d / \sqrt{n})$, where $s_d = \sqrt{\sum_i (d_i - \bar{d})^2 / (n - 1)}$, with n-1 degrees of freedom (n pairs)

Hypothesis tests so far: numerical data
Independent samples t-test
– Null hypothesis: µ1 = µ2 (two-sided)
– Test statistic: $t = (\bar{x}_1 - \bar{x}_2) / SE(\text{difference between means})$
– The SE and the degrees of freedom depend on the assumption of equal or unequal variances

T-test: equal or unequal variance?
Why can’t we just do a test to see if the variances in the groups are equal, to decide which t-test to use?
– “It is generally unwise to decide whether to perform one statistical test on the basis of the outcome of another”.
– The reason has to do with Type I error (multiple comparisons, discussed later in this lecture)
– You are better off always assuming unequal variance if your data are approximately normal
Ruxton GD. Behavioral Ecology 2006

Statistical hypothesis tests

Data and comparison type | Alternative hypotheses | Test and Stata command
Numerical; one mean | Ha: μ ≠ μa (two-sided); Ha: μ > μa or μ < μa (one-sided) | Z or t-test: ttest var1=hypoth val.*
Numerical; two means, paired data | Ha: μ1 ≠ μ2 (two-sided); Ha: μ1 > μ2 or μ1 < μ2 (one-sided) | Paired t-test: ttest var1=var2*
Numerical; two means, independent data | Ha: μ1 ≠ μ2 (two-sided); Ha: μ1 > μ2 or μ1 < μ2 (one-sided) | T-test (equal or unequal variance): ttest var1, by(byvar) unequal
Numerical; two or more means, independent data | |
Dichotomous; one proportion | Ha: p ≠ pa (two-sided); Ha: p > pa or p < pa (one-sided) | Proportion test: prtest var1=hypoth value*; bitest var1=hypoth value
Dichotomous; two proportions | Ha: p1 ≠ p2 (two-sided); Ha: p1 > p2 (one-sided) | Proportion test (z-test): prtest var1, by(byvar)
Categorical by categorical (nxk) | |

Comparison of several means
The extension of the t-test to several independent groups is called analysis of variance or ANOVA
Why is it called analysis of variance?
– Even though your hypothesis is about the means, the test actually compares the variability between groups to the variability within groups

Analysis of variance
The null hypothesis is that all the means are equal, H0: μ1 = μ2 = μ3 = …
The alternative HA is that at least one of the means differs from the others

Analysis of variance
Why can’t we just do t-tests on the pairs of means?
– Multiple comparison problem
– What is the probability that you will incorrectly reject H0 at least once when you run n independent tests, when the probability of incorrectly rejecting the null on each test is 0.05?

Analysis of variance
This is P(X≥1) with p=0.05 and n = number of tests, where X = the number of times the null is incorrectly rejected
$P(X \ge 1) = 1 - P(X = 0) = 1 - (1 - 0.05)^n$
For n=4:
di 1-(1-.05)^4
Using the binomial:
di binomialtail(4,1,.05)
Both commands give approximately 0.185
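The calculation above can be cross-checked outside Stata. The following is a minimal Python sketch, assuming the tests are independent as the slide does; the function name `familywise_error` is illustrative and not part of the lecture.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false rejection across n independent tests.

    Implements 1 - (1 - alpha)^n, the same expression as the Stata
    command di 1-(1-.05)^n.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Show how quickly the error rate grows with the number of tests.
for n in (1, 3, 4, 10):
    print(n, round(familywise_error(n), 4))
```

With 4 tests the chance of at least one false rejection is already close to 19%, which is the motivation for a single overall test such as ANOVA.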

Comparison of several means: analysis of variance
We calculate the ratio of:
– The between group variability: the variability of the sample means around the overall (or grand) mean
– to the overall within group variability

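Between group variability
The between group variability is the variability of the group means around the overall (or grand) mean:
$s_B^2 = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \dots + n_k(\bar{x}_k - \bar{x})^2}{k - 1}$
– k = the number of groups being compared
– $n_1, n_2, \dots, n_k$ = the number of observations in each group
– $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$ are the group means
– $\bar{x}$ = the grand mean, that is, the mean of all the data combined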

Within group variability
The within group variability is a weighted average of the sample variances within each group:
$s_W^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2}{n_1 + n_2 + \dots + n_k - k}$
– k = the number of groups being compared
– $n_1, n_2, \dots, n_k$ = the number of observations in each group
– $s_1^2, s_2^2, \dots, s_k^2$ are the sample variances in each group

Comparison of several means: analysis of variance
The test statistic is the ratio of the between group variability $s_B^2$ to the within group variability $s_W^2$:
$F = s_B^2 / s_W^2$
We compare the F statistic to the F-distribution, with k-1 and n-k degrees of freedom
– k = the number of groups being compared
– n = the total number of observations
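As an illustration of how the F statistic is assembled from group data, here is a small Python sketch. It assumes the usual one-way ANOVA definitions (between-group mean square over within-group mean square); the function name `anova_f` and the toy data are made up for this example.

```python
from statistics import mean, variance


def anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)                         # number of groups
    n = sum(len(g) for g in groups)         # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Between-group variability: spread of the group means around the grand mean.
    s_b2 = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)

    # Within-group variability: weighted average of the group variances.
    s_w2 = sum((len(g) - 1) * variance(g) for g in groups) / (n - k)

    return s_b2 / s_w2, k - 1, n - k


f_stat, df1, df2 = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, df1, df2)
```

The sketch stops at the statistic and its degrees of freedom; the p-value still comes from the F-distribution, for example through Stata's Ftail function.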

[Figure: F-distribution]

ANOVA example
Does CD4 count at time of testing differ by drinking category?
* Using vct_baseline_biostat200_v1.dta
hist cd4count, by(lastalc_3) percent fcolor(blue)

graph box cd4count, over(lastalc_3)

ANOVA example
tabstat cd4count, by(lastalc_3) s(n mean sd min median max)

Summary for variables: cd4count
by categories of: lastalc_3 (RECODE of lastalc (E1. Last time took alcohol))

lastalc_3 | N mean sd min p50 max
Never |
>1 year ago |
Within the past year |
Total |

ANOVA example
CD4 count, by alcohol consumption category
oneway var groupvar
. oneway cd4count lastalc_3

Analysis of Variance
Source | SS df MS F Prob > F
Between groups |
Within groups |
Total |

Bartlett's test for equal variances: chi2(2) =   Prob>chi2 =

k=3 groups, n=994 total observations, so n-k=991
di Ftail(2,991,2.45)
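The upper-tail probability requested with Ftail(2,991,2.45) can be checked by hand in this particular case. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2 (as with k=3 groups); it is not a general F-distribution routine, and the name `f_tail_2df` is illustrative.

```python
def f_tail_2df(f_stat, df2):
    """P(F > f_stat) for an F distribution with 2 and df2 degrees of freedom.

    Valid only for numerator df = 2, where the tail probability reduces to
    (1 + 2*f/df2) ** (-df2/2).
    """
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


# Values taken from the slide: F = 2.45 with 2 and 991 degrees of freedom.
print(round(f_tail_2df(2.45, 991), 4))
```

The result is a p-value of roughly 0.087, which is above 0.05, so the null hypothesis of equal mean CD4 counts would not be rejected at that level.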

ANOVA
Note that if you only have two groups, you will reach the same conclusion running an ANOVA as you would with a t-test
The test statistic F will equal the square of the t statistic: $F = t^2$
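The identity F = t² for two groups can be verified numerically. This Python sketch assumes the equal-variance (pooled) t-test, which is the version that matches one-way ANOVA; the function names and data are invented for the illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


def two_group_f(a, b):
    """One-way ANOVA F statistic for exactly two groups (k - 1 = 1)."""
    n = len(a) + len(b)
    grand = mean(list(a) + list(b))
    s_b2 = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
    s_w2 = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (n - 2)
    return s_b2 / s_w2


a = [1.0, 2.0, 3.0, 4.0]
b = [3.0, 5.0, 7.0, 9.0]
print(pooled_t(a, b) ** 2, two_group_f(a, b))  # the two values agree
```

The agreement does not hold for the unequal-variance t-test, because ANOVA pools the variance across groups.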

T-test vs. F test (ANOVA) example
. oneway cd4count sex

Analysis of Variance
Source | SS df MS F Prob > F
Between groups |
Within groups |
Total |

Bartlett's test for equal variances: chi2(1) =   Prob>chi2 =

T-test vs. F test (ANOVA) example
. ttest cd4count, by(sex)

Two-sample t test with equal variances
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
combined |
diff |

diff = mean(1) - mean(2)   t =
Ho: diff = 0   degrees of freedom = 997
Ha: diff < 0   Ha: diff != 0   Ha: diff > 0
Pr(T < t) =   Pr(|T| > |t|) =   Pr(T > t) =

Squaring the t statistic with di gives the F statistic reported by oneway

Multiple comparisons
If we reject H0, we might want to know which means differed from each other
But as noted before, if you test all combinations, you increase your chance of rejecting the null incorrectly
To be conservative, we reduce the significance level α; that is, we reject the null only when the p-value is below a level smaller than the original α

Bonferroni method for multiple comparisons
The Bonferroni method divides α by the number of possible pairwise tests
Example: if you have 3 groups and you started with α = 0.05, then α* = 0.05 / (3 choose 2) = 0.05 / 3 ≈ 0.0167
This means that you will only reject if p < 0.0167
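The adjusted level is easy to compute for any number of groups. A minimal Python sketch follows, assuming every pair of groups is compared once; `bonferroni_alpha` is an illustrative name.

```python
from math import comb


def bonferroni_alpha(alpha, n_groups):
    """Per-comparison significance level when all pairs of groups are tested."""
    n_pairs = comb(n_groups, 2)   # number of pairwise comparisons
    return alpha / n_pairs


for k in (3, 4, 5):
    print(k, comb(k, 2), round(bonferroni_alpha(0.05, k), 4))
```

The number of pairs grows quickly with the number of groups (3, 6, 10, ...), so the per-comparison threshold becomes strict fast.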

Multiple comparisons with ANOVA
Use a t-test, but use the within group variance $s_W^2$ that weights over all the groups (not just the 2 being examined)
The test statistic for each pair of means is:
$t_{ij} = (\bar{x}_i - \bar{x}_j) / \sqrt{s_W^2 (1/n_i + 1/n_j)}$
and the degrees of freedom are n-k, where n is the total number of observations and k is the total number of groups (note the difference from the regular t-test)
Reject if the p-value is < α*
– (Note: This is if you are doing the test by hand; if you use the Stata option bonferroni, reject if p < α)

Multiple comparisons
. oneway cd4count lastalc_3, bonferroni

Analysis of Variance
Source | SS df MS F Prob > F
Between groups |
Within groups |
Total |

Bartlett's test for equal variances: chi2(2) =   Prob>chi2 =

Comparison of CD4Count by RECODE of lastalc (E1. Last time took alcohol)
(Bonferroni)
Row Mean- |
Col Mean | Never  >1 year
>1 year |
Within t |

Each cell shows the difference between the 2 means and the p-value for the difference. The p-value is already adjusted for the fact that you are doing multiple comparisons (so reject if p < α)

Statistical hypothesis tests

Data and comparison type | Alternative hypotheses | Test and Stata command
Numerical; one mean | Ha: μ ≠ μa (two-sided); Ha: μ > μa or μ < μa (one-sided) | Z or t-test: ttest var1=hypoth val.*
Numerical; two means, paired data | Ha: μ1 ≠ μ2 (two-sided); Ha: μ1 > μ2 or μ1 < μ2 (one-sided) | Paired t-test: ttest var1=var2*
Numerical; two means, independent data | Ha: μ1 ≠ μ2 (two-sided); Ha: μ1 > μ2 or μ1 < μ2 (one-sided) | T-test (equal or unequal variance): ttest var1, by(byvar) unequal
Numerical; two or more means, independent data | Ha: μ1 ≠ μ2 or μ1 ≠ μ3 or μ2 ≠ μ3 etc. | ANOVA: oneway var1 byvar
Dichotomous; one proportion | Ha: p ≠ pa (two-sided); Ha: p > pa or p < pa (one-sided) | Proportion test: prtest var1=hypoth value*; bitest var1=hypoth value
Dichotomous; two proportions | Ha: p1 ≠ p2 (two-sided); Ha: p1 > p2 (one-sided) | Proportion test (z-test): prtest var1, by(byvar)
Categorical by categorical (nxk) | |

Parametric hypothesis test assumptions
The hypothesis tests that use the z-statistic (i.e. when σ is known) assume that the sampling distribution of the statistic we are using (sample mean, sample proportion) is approximately normal.
– True under the CLT if n is large enough.
However, we usually do not know σ, so we use s and compare our test statistic to the t-distribution.
In theory, for this to work, the underlying distribution of the data must be normal, but in practice, if n is fairly large and there are no extreme outliers, the t-test is valid.

Test assumptions
If the data are not normally distributed, the t-test is not the most powerful test to use. (Note: less powerful does not mean invalid)
– E.g. outliers will inflate the sample variance, decreasing the test statistic, thereby decreasing the chances of rejecting the null when it is false.
Non-parametric tests do not rely on assuming a distribution for the data and therefore can help with this.
However, note that independence of your observations is more critical than normality.
– If your data points are not independent and you treat them as if they are, you will be acting like you have more data than you actually do (making you more likely to reject the null)

Differences in AUDIT-C example
* Using auditc_2studies.dta
hist auditc_diff, fcolor(blue) freq bin(5)

Nonparametric tests for paired observations
The Sign test
– For paired or matched observations (analogous to the paired t-test)
– H0: median 1 = median 2
– Most useful when
  – the sample size is small
  – OR the distribution of differences is very skewed

Nonparametric tests for paired observations
The Sign test
– The differences between the pairs are given a sign:
  – + if a positive difference
  – – if a negative difference
  – nothing if the difference = 0
– Count the number of +s, denoted by D

Nonparametric tests for paired observations
Under H0, ½ the differences will be +s and ½ will be –s
– That is, D/n = .5
This is equivalent to saying that each difference is a Bernoulli random variable, that is, each is + or – with probability p = .5
Then the total number of +s (D) is a binomial random variable with p = 0.5 and with n trials

Nonparametric tests for paired observations
So then the p-value for the hypothesis test is the probability of observing D or more + differences if the true distribution is binomial with parameters n and p = 0.5
That is, P(X ≥ D) with n trials and p = 0.5
You could use the binomialtail function
For a one-sided hypothesis: di binomialtail(n,D,.5)
For a two-sided hypothesis: di 2*binomialtail(n,D,.5)
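The binomial calculation behind the sign test can be written out directly. The Python sketch below assumes ties (zero differences) have already been excluded and uses the larger of the two sign counts for the two-sided p-value; the function names are illustrative, and `binomial_tail` is meant to mirror what Stata's binomialtail(n,k,p) returns.

```python
from math import comb


def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def sign_test_two_sided(n_pos, n_neg):
    """Two-sided sign test p-value from counts of + and - differences."""
    n = n_pos + n_neg                 # ties are not counted
    larger = max(n_pos, n_neg)        # use the more extreme count
    return min(1.0, 2.0 * binomial_tail(n, larger))


# Illustrative case: 9 positive and 0 negative differences.
print(sign_test_two_sided(9, 0))
```

Capping the result at 1 matters only when the counts are nearly balanced, where doubling the tail probability would otherwise exceed 1.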

AUDIT-C scores on 2 interviews
** Using auditc_2studies.dta ** 1st 10 observations

    | uarto_id auditc_s2 auditc_s1 auditc_diff | sign |
 1. | MBA | . |
 2. | MBA | . |
 3. | MBA | + |
 4. | MBA | . |
 5. | MBA | . |
 6. | MBA | . |
 7. | MBA | + |
 8. | MBA | . |
 9. | MBA | . |
10. | MBA |

Sign test
tab auditc_diff

auditc_diff | Freq. Percent Cum.
Total |

D = 9 positive differences
N = 9 (don’t count the 19 ties)
Using the binomial distribution:
. di 2*binomialtail(9,9,.5)

In Stata
signtest var1=var2
NOTE that there is only 1 = in the command!
. signtest auditc_s2=auditc_s1

Sign test
sign | observed expected
positive |
negative |
zero |
all |

One-sided tests:
Ho: median of auditc_s2 - auditc_s1 = 0 vs. Ha: median of auditc_s2 - auditc_s1 > 0
Pr(#positive >= 9) = Binomial(n = 9, x >= 9, p = 0.5) = 0.0020
Ho: median of auditc_s2 - auditc_s1 = 0 vs. Ha: median of auditc_s2 - auditc_s1 < 0
Pr(#negative >= 0) = Binomial(n = 9, x >= 0, p = 0.5) = 1.0000

Two-sided test:
Ho: median of auditc_s2 - auditc_s1 = 0 vs. Ha: median of auditc_s2 - auditc_s1 != 0
Pr(#positive >= 9 or #negative >= 9) = min(1, 2*Binomial(n = 9, x >= 9, p = 0.5)) = 0.0039

The two-sided test uses the larger of the number of positive or negative signed pairs

Normal approximation to the sign test
If we say the number of + differences follows a binomial distribution, then we can use the normal approximation to the binomial
Binomial mean = np; binomial SD = $\sqrt{p(1-p)n}$
So mean = .5n and SD = $\sqrt{.5(1-.5)n} = \sqrt{.25n}$
Then D ~ N(.5n, .25n) using the normal approximation, and z ~ N(0,1) where z is:
$z = (D - .5n) / \sqrt{.25n}$

Normal approximation for sign test
Do not use if n<20; we use it here for the example only
n = # of non-tied observations
z = (9-.5*9)/sqrt(.25*9)
. di (9-.5*9)/sqrt(.25*9)
3
. di 2*(1-normal(3))
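For comparison with the exact binomial result, the normal approximation can be reproduced with Python's standard library. This is a sketch under the slide's setup (D positive differences out of n non-tied pairs, no continuity correction); `sign_test_normal` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_normal(d, n):
    """Return (z, two-sided p) for the normal approximation to the sign test."""
    z = (d - 0.5 * n) / sqrt(0.25 * n)
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


z, p = sign_test_normal(9, 9)
print(z, round(p, 4))
```

With n = 9 the approximate p-value (about 0.0027) differs from the exact binomial value (about 0.0039), which illustrates why the approximation is not recommended for small n.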

Nonparametric tests for paired observations
Note that the Sign test can be used for ordinal data
The sign test does not account for the magnitude of the difference in the outcome variable
Another test, the Wilcoxon Signed-Rank Test, ranks the differences in the pairs
Null hypothesis: median 1 = median 2

Nonparametric tests for paired observations
The differences in the pairs are ranked by absolute value (ignoring sign)
Ties are given the average rank of the tied observations
Each rank is assigned a sign (+/-) depending on whether the difference is positive or negative
The absolute value of the smaller sum of the ranks is called T

Nonparametric tests for paired observations
– T approximately follows a normal distribution with mean $\mu_T = n(n+1)/4$ (the expected rank sum if both medians were equal) and standard deviation $\sigma_T = \sqrt{n(n+1)(2n+1)/24}$
The test statistic is $z_T = (T - \mu_T) / \sigma_T$
Compare to the standard normal distribution
For n<12, use the exact distribution, table A.6
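The ranking steps above can be sketched in Python. This version makes two simplifying assumptions that should be stated plainly: zero differences are dropped before ranking, and no variance adjustment is made for ties, so the z value can differ slightly from Stata's signrank output, which reports adjustments for both. The function names and data are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0      # average of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def signed_rank(diffs):
    """Return (T, mean, sd, z) for the Wilcoxon signed-rank test."""
    nonzero = [d for d in diffs if d != 0]      # assumption: zeros are dropped
    n = len(nonzero)
    ranks = average_ranks([abs(d) for d in nonzero])
    pos = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    neg = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    t = min(pos, neg)                           # smaller rank sum
    mu = n * (n + 1) / 4.0
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return t, mu, sigma, (t - mu) / sigma


print(signed_rank([1, -2, 3, 4, 5, 0]))
```

Because T is the smaller rank sum, z is never positive here; the two-sided p-value is twice the lower-tail normal probability of z.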

. egen rankdiff=rank(auditc_diff)
. list

    | uarto_id auditc~2 auditc~1 auditc~f rankdiff |
[Listing of observations 1 to 30 with their rank of auditc_diff]

signrank var1 = var2
. signrank auditc_s2=auditc_s1

Wilcoxon signed-rank test
sign | obs sum ranks expected
positive |
negative |
zero |
all |

unadjusted variance
adjustment for ties
adjustment for zeros
adjusted variance

Ho: auditc_s2 = auditc_s1
z = 2.986
Prob > |z| = 0.0028

This is a two-sided p-value arrived at using
. di 2*(1-normal(2.986))
.0028
If you wanted a one-sided test, use
. di 1-normal(2.986)

Another example (Thanks to L. Huang!)
Study question: Does Efavirenz (EFV; an HIV drug) interfere with the pharmacokinetics (PK) of artemether–lumefantrine (AL; an antimalarial drug)?
Study design (16 healthy subjects):
– Administer AL for 3 days; measure PK
– Administer AL+EFV for 3 days; measure PK
Null/alternative hypothesis?

The data (excel file)
Artemether (ARM) pharmacokinetic parameters: AUC, hr·ng/mL
Columns: subject#, AL, AL+EFV
[Table: AUC for each subject under AL and under AL+EFV]
NA: No samples available.
IS: Insufficient data due to concentration below quantification limit.

Cut and pasted into Stata
. list

    | subject al alefv |
 1. | IS |
 2. | |
 3. | |
 4. | 4 IS IS |
 5. | 5 IS IS |
 6. | |
 7. | 7 97 NA |
 8. | 8 84 NA |
 9. | IS |
10. | |
11. | |
12. | NA |
13. | |
14. | IS |
15. | NA |
16. | |

Remove observations where no PK data are available
drop if alefv=="NA"
Make string variables into numeric variables. Variables where PK data="IS" are forced to missing
destring al, gen(al_noIS) force
destring alefv, gen(alefv_noIS) force
Calculate the difference between the paired observations
gen diff_noIS = al_noIS - alefv_noIS

. list al alefv al_noIS alefv_noIS diff_noIS

    | al alefv al_noIS alefv_~S diff_n~S |
 1. | 77.8 IS |
 2. | |
 3. | |
 4. | IS IS . . . |
 5. | IS IS . . . |
 6. | |
 7. | 42.8 IS |
 8. | |
 9. | |
10. | |
11. | 32.3 IS |
12. | |

Signed rank test
. signrank al_noIS=alefv_noIS

Wilcoxon signed-rank test
sign | obs sum ranks expected
positive |
negative |
zero |
all |

unadjusted variance
adjustment for ties 0.00
adjustment for zeros
adjusted variance

Ho: al_noIS = alefv_noIS
z =
Prob > |z| =

However, when the outcome is “IS”, that is real data telling us the drug concentration was very low, and it should not be ignored
The limit of quantification was 2, so we replace with 1
gen alefv_1=alefv_noIS
replace alefv_1=1 if alefv_noIS==.
gen al_1=al_noIS
replace al_1=1 if al_noIS==.
gen diff_1 = al_1 - alefv_1

. list al alefv al_1 alefv_1 diff_1

    | al alefv al_1 alefv_1 diff_1 |
 1. | 77.8 IS |
 2. | |
 3. | |
 4. | IS IS |
 5. | IS IS |
 6. | |
 7. | 42.8 IS |
 8. | |
 9. | |
10. | |
11. | 32.3 IS |
12. | |

Signed rank test
. signrank al_1=alefv_1

Wilcoxon signed-rank test
sign | obs sum ranks expected
positive |
negative |
zero |
all |

unadjusted variance
adjustment for ties 0.00
adjustment for zeros
adjusted variance

Ho: al_1 = alefv_1
z =
Prob > |z| =

Nonparametric tests for two independent samples
The Wilcoxon Rank Sum Test
– Also called the Mann-Whitney U test
– Null hypothesis: median 1 = median 2
– Samples from independent populations (analogous to the t-test)
– Assumes that the distributions of the 2 groups have the same shape

Nonparametric tests for two independent samples
The entire sample (including the members of both groups) is ranked
Average rank is given to ties
Sum the ranks for each of the 2 samples; the smaller sum is W
The test statistic $z_W = (W - \mu_W) / \sigma_W$ is compared to the normal distribution (see P+G page 310 for the formula)
If the sample sizes are small (<10), exact distributions are needed (Table A.7)
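A compact Python sketch of the rank-sum calculation follows. It uses the standard normal-approximation mean n1(n1+n2+1)/2 and variance n1·n2(n1+n2+1)/12 for the rank sum of the first sample, which is an assumption on my part rather than a transcription of the textbook page cited above, and it applies no tie correction. Unlike the slide, it reports the rank sum of the first sample rather than the smaller of the two sums; the function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0      # average of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def rank_sum_z(sample1, sample2):
    """Return (W, z), where W is the rank sum of sample1 in the pooled ranking."""
    ranks = average_ranks(list(sample1) + list(sample2))
    n1, n2 = len(sample1), len(sample2)
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return w, (w - mean_w) / sd_w


print(rank_sum_z([1, 2, 3], [4, 5, 6, 7]))
```

The toy samples are far too small for the normal approximation to be trusted; they are used only so the arithmetic can be followed by hand.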

ranksum var, by(byvar)
** Using vct_baseline_biostat200_v1.dta **
. ranksum cd4count, by(sex)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test
sex_b | obs rank sum expected
combined |

unadjusted variance
adjustment for ties
adjusted variance

Ho: cd4count(sex_b==1) = cd4count(sex_b==2)
z = -3.145
Prob > |z| =

This is a two-sided p-value arrived at using
. di 2*normal(-3.145)
If you wanted a one-sided test, use
. di normal(-3.145)

Nonparametric tests for multiple independent samples
The Kruskal-Wallis test extends the Wilcoxon rank sum test to 2 or more independent samples
– You could use the Kruskal-Wallis with 2 independent samples and reach the same conclusion as if you had used the Wilcoxon
Analogous to one-way analysis of variance

Nonparametric tests for independent samples (Kruskal Wallis)
kwallis var, by(byvar)
. kwallis cd4count, by(lastalc_3)

Kruskal-Wallis equality-of-populations rank test
lastalc_3 | Obs | Rank Sum
Never | 373 |
>1 year ago | 180 |
Within the past year | 441 |

chi-squared = with 2 d.f.
probability =
chi-squared with ties = with 2 d.f.
probability =
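To show what the chi-squared value in this output measures, here is a Python sketch of the Kruskal-Wallis statistic in its standard form, H = 12/(N(N+1)) · Σ R_j²/n_j − 3(N+1), without the tie correction that Stata also reports. The p-value helper is exact only for 2 degrees of freedom (3 groups), as in the example above; all names and data are illustrative.

```python
from math import exp


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0      # average of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])    # rank sum for this group
        total += r * r / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def chi2_tail_2df(h):
    """P(chi-squared > h) with 2 degrees of freedom, i.e. exactly 3 groups."""
    return exp(-h / 2.0)


h = kruskal_wallis_h([[1, 2], [3, 4], [5, 6]])
print(round(h, 4), round(chi2_tail_2df(h), 4))
```

For other numbers of groups the statistic is computed the same way, but the tail probability needs a chi-squared distribution with k-1 degrees of freedom.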

Parametric vs. non-parametric (distribution free) tests
Non-parametric tests:
– No normality requirement
– Do require that the underlying distributions being compared have the same basic shape
– Ranks are less sensitive to outliers and to measurement error
If the underlying distributions are approximately normal, then the parametric tests are more powerful

Statistical hypothesis tests

Data and comparison type | Alternative hypotheses | Parametric test and Stata command | Non-parametric test and Stata command
Numerical; one mean | Ha: μ ≠ μa (two-sided); Ha: μ > μa or μ < μa (one-sided) | Z or t-test: ttest var1=hypoth val.* |
Numerical; two means, paired data | Ha: μ1 ≠ μ2 (two-sided); Ha: μ1 > μ2 or μ1 < μ2 (one-sided) | Paired t-test: ttest var1=var2* | Sign test: signtest var1=var2; Wilcoxon Signed-Rank: signrank var1=var2
Numerical; two means, independent data | Ha: μ1 ≠ μ2 (two-sided); Ha: μ1 > μ2 or μ1 < μ2 (one-sided) | T-test (equal or unequal variance): ttest var1, by(byvar) unequal | Wilcoxon rank-sum test: ranksum var1, by(byvar)
Numerical; two or more means, independent data | Ha: μ1 ≠ μ2 or μ1 ≠ μ3 or μ2 ≠ μ3 etc. | ANOVA: oneway var1 byvar | Kruskal Wallis test: kwallis var1, by(byvar)
Dichotomous; one proportion | Ha: p ≠ pa (two-sided); Ha: p > pa or p < pa (one-sided) | Proportion test: prtest var1=hypoth value*; bitest var1=hypoth value |
Dichotomous; two proportions | Ha: p1 ≠ p2 (two-sided); Ha: p1 > p2 (one-sided) | Proportion test (z-test): prtest var1, by(byvar) |
Categorical by categorical (nxk) | Ha: The rows are not independent of the columns | |

For next time
Read Pagano and Gauvreau
– Pagano and Gauvreau Chapters (review)
– Pagano and Gauvreau Chapter 15