STAT 4030 – Programming in R
STATISTICS MODULE: Bivariate Testing (t-tests and proportion tests)
Jennifer Lewis Priestley, Ph.D.
Kennesaw State University
STATISTICS MODULE
Basic Descriptive Statistics and Confidence Intervals
Basic Visualizations
  Histograms
  Pie Charts
  Bar Charts
  Scatterplots
T-tests
  One Sample
  Paired
  Independent Two Sample
Proportion Testing
ANOVA
Chi Square and Odds
Regression Basics
STATISTICS MODULE
A side note of interest from Wikipedia: The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness Brewery in Dublin, Ireland. Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness' industrial processes. Gosset devised the t-test as a way to cheaply monitor the quality of beer. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret.
STATISTICS MODULE: Bivariate Testing
T-tests take three forms:
1. One Sample T-test - compares the mean of the sample to a given number.
   e.g. Is average monthly revenue per customer > $50?
   Formal Hypothesis Statement examples:
     H0: μ ≤ $50    H1: μ > $50
     H0: μ = $50    H1: μ ≠ $50
STATISTICS MODULE: Bivariate Testing
#here, the syntax looks like this –
#One sample, two sided, confidence level at 95%, tested against a designated value:
t.test(vector, alternative = "two.sided", mu = 55, conf.level = 0.95)
#One sample, one sided, confidence level at 99%, tested against a designated value:
t.test(vector, alternative = "greater", mu = 55, conf.level = 0.99)
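For a self-contained illustration, here is a minimal sketch using simulated data. The Activity vector on the next slide comes from the course data set and is not reproduced here; the numbers below are made up:
#Sketch only: simulated data standing in for the course data set
set.seed(1)
x <- rnorm(40, mean = 59, sd = 8)   #40 made-up observations
t.test(x, mu = 55, alternative = "two.sided", conf.level = 0.95)
t.test(x, mu = 55, alternative = "greater", conf.level = 0.99)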
STATISTICS MODULE: Bivariate Testing
#here, the output looks like this –
        One Sample t-test
data:  Activity
t = 3.1755, df = 39, p-value = 0.00292
alternative hypothesis: true mean is not equal to 55
95 percent confidence interval:
 56.56107 62.03893
sample estimates:
mean of x
     59.3
STATISTICS MODULE: Bivariate Testing
#note that you can also execute t-tests by group…this is NOT a two sample test but rather two one sample tests –
t.test(Activity[Group=="NORMAL"], mu = 55, alternative = "two.sided", conf.level = 0.99)
t.test(Activity[Group=="HYPER"], mu = 55, alternative = "two.sided", conf.level = 0.99)
STATISTICS MODULE: Bivariate Testing
2. Paired Sample T-test - compares the mean of the differences in the observations to a given number.
   e.g. Is there a difference in the production output of a facility after the implementation of new procedures?
   Formal Hypothesis Statement example:
     H0: μdiff = 0    H1: μdiff ≠ 0
STATISTICS MODULE: Bivariate Testing
#here the syntax looks like this:
t.test(vector1, vector2, paired = TRUE, conf.level = 0.95)
#and the output looks like this:
        Paired t-test
data:  WidgeOne$Post_Training_Productivity and WidgeOne$Pre_Training_Productivity
t = 1.7558, df = 39, p-value = 0.08698
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.07438134  1.05301176
sample estimates:
mean of the differences
              0.4893152
STATISTICS MODULE: Bivariate Testing
Note that mathematically, the one sample t-test and the paired t-test are almost the same. Therefore, we can do this:
WidgeOne$Diff <- WidgeOne$Post_Training_Productivity - WidgeOne$Pre_Training_Productivity
mean(WidgeOne$Diff)
t.test(WidgeOne$Diff, conf.level = 0.95)
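To see the equivalence directly, here is a minimal sketch with made-up data (the WidgeOne data set is not reproduced here); the paired call and the one-sample call on the differences return identical t, df, and p-values:
#Sketch only: made-up pre/post productivity measurements
set.seed(2)
pre  <- rnorm(40, mean = 10, sd = 2)
post <- pre + rnorm(40, mean = 0.5, sd = 1.5)
t.test(post, pre, paired = TRUE, conf.level = 0.95)   #paired t-test
t.test(post - pre, mu = 0, conf.level = 0.95)         #same t, df, and p-value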
STATISTICS MODULE: Bivariate Testing
3. Two Sample T-test - compares the mean of the first sample minus the mean of the second sample to a given number.
   e.g. Is there a difference in the production output of two facilities?
   Formal Hypothesis Statement examples:
     H0: μa - μb = 0    H1: μa - μb ≠ 0
     H0: μa - μb ≤ 0    H1: μa - μb > 0
STATISTICS MODULE: Bivariate Testing
When dealing with two samples, it is important to check the following assumptions:
  The samples are independent
  The samples have approximately equal variance
  The distribution of each sample is approximately normal
Note – if the assumptions are violated and/or if the sample sizes are very small, we first try a transformation (e.g., take the log or the square root). If this does not work, then we engage in non-parametric analysis: the Wilcoxon Rank Sum test (or Mann-Whitney), as sketched below.
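As a rough guide, the variance and normality assumptions can be checked in R, and wilcox.test() provides the non-parametric fallback. This is a minimal sketch only; the data frame facdata and the columns Output and Facility are hypothetical names, not from the course data set:
#Sketch only: 'facdata', 'Output', and 'Facility' are illustrative names
var.test(Output ~ Facility, data = facdata)             #F test for equal variances
tapply(facdata$Output, facdata$Facility, shapiro.test)  #normality check within each group
wilcox.test(Output ~ Facility, data = facdata)          #Wilcoxon Rank Sum (Mann-Whitney) test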
STATISTICS MODULE: Bivariate Testing
#here the code looks like this:
t.test(Activity ~ Group, alternative = "two.sided", conf.level = 0.95)
#And the output looks like this:
        Welch Two Sample t-test
data:  Activity by Group
t = 1.9077, df = 35.532, p-value = 0.06454
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3180028 10.3180028
sample estimates:
mean in group HYPER  mean in group NORMAL
               61.8                  56.8
STATISTICS MODULE: Bivariate Testing
Proportion testing works effectively the same way as t-testing – the main difference is that you need to use the Chi Square distribution, because a proportion has no separately estimable standard deviation; its variance is determined by the proportion itself. (See the sketch below.)
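To see the connection, here is a minimal sketch showing that, for a single proportion, the X-squared statistic reported by prop.test() (without continuity correction) is simply the square of the usual z statistic. The counts below are an inference, chosen so that they reproduce the X-squared = 2.0833 value in the output shown two slides later; they are not taken directly from the course data:
#Sketch only: counts inferred to be consistent with the later output
x <- 106; n <- 192
z <- (x/n - 0.50) / sqrt(0.50 * 0.50 / n)   #z statistic for a one-sample proportion
z^2                                         #equals the X-squared value reported below
prop.test(x, n, p = 0.50, alternative = "greater", correct = FALSE)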
STATISTICS MODULE: Bivariate Testing
1. One Sample Proportion Test - compares the proportion of the sample to a given number.
   e.g. Is the proportion of students who believe in love at first sight greater than 50%?
     H0: p ≤ 0.50    H1: p > 0.50
STATISTICS MODULE: Bivariate Testing
#The code here takes a bit of work…
object1 <- table(factor)
sum(object1)
prop.test(object1[factor level], totaln, correct = FALSE, p = null hypothesis value)
Example:
loveatfirst.count <- table(PSU$atfirst)
prop.test(loveatfirst.count[3], sum(loveatfirst.count), correct = FALSE, p = 0.50, alternative = "greater")
Note that the “3” indicates the third level of the factor – which is “Yes”.
STATISTICS MODULE: Bivariate Testing
The output looks like this:
        1-sample proportions test without continuity correction
data:  grtpers.count[2] out of sum(grtpers.count), null probability 0.5
X-squared = 2.0833, df = 1, p-value = 0.07446
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval:
 0.4927361 1.0000000
sample estimates:
        p
0.5520833
STATISTICS MODULE: Bivariate Testing
2. Two Sample Proportion Test - compares the proportion of the first sample minus the proportion of the second sample to a given number. It is of common interest to test whether two population proportions are equal.
   e.g. Is the proportion of students who believe in love at first sight different by gender?
STATISTICS MODULE: Bivariate Testing
#basically, you need to create a table, and then execute the prop.test function:
sex.by.grtpers.count <- table(PSU3b$Sex, droplevels(PSU3b)$grtpers)
#note that this will compare the Female % No to the Male % No
#this is because "no" is in the first column
prop.test(sex.by.grtpers.count, correct = FALSE)
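For reference, the same two-sample test can also be run from explicit counts rather than a table. This is a minimal sketch with made-up numbers, not from the course data: x gives the number of "successes" in each group and n the group sizes.
#Sketch only: made-up counts
prop.test(x = c(45, 61), n = c(90, 102), correct = FALSE)   #2-sample test for equality of proportions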