Published by Χθόνια Ζαφειρόπουλος. Modified over 6 years ago.
Bivariate Testing (ttests and proportion tests)
HMI 7530: Programming in R
STATISTICS MODULE: Bivariate Testing (t-tests and proportion tests)
Jennifer Lewis Priestley, Ph.D.
Kennesaw State University
STATISTICS MODULE
- Basic Descriptive Statistics and Confidence Intervals
- Basic Visualizations: Histograms, Pie Charts, Bar Charts, Scatterplots
- T-tests: One Sample, Paired, Independent Two Sample
- Proportion Testing
- ANOVA
- Chi Square and Odds
- Regression Basics
STATISTICS MODULE A side note of interest from Wikipedia:
The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness Brewery in Dublin, Ireland. Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness' industrial processes. Gosset devised the t-test as a way to cheaply monitor the quality of beer. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret.
STATISTICS MODULE: Bivariate Testing
T-tests take three forms:
1. One Sample T-test - compares the mean of the sample to a given number.
e.g. Is average monthly revenue per customer > $50?
Formal Hypothesis Statement examples:
H0: μ ≤ $50   H1: μ > $50
H0: μ = $50   H1: μ ≠ $50
# Here, the syntax looks like this:
# One sample, two sided, confidence level at 95%, tested against a designated value:
t.test(vector, alternative = "two.sided", mu = 55, conf.level = 0.95)
# One sample, one sided, confidence level at 99%, tested against a designated value:
t.test(vector, alternative = "greater", mu = 55, conf.level = 0.99)
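A minimal runnable sketch of the two-sided call above. The vector `activity` is simulated here purely for illustration (it is not the course data), and the hand calculation is included only to show what t.test appears to compute under the usual textbook formula.

```r
# Simulated data for illustration only (an assumption, not the course data set)
set.seed(1)
activity <- rnorm(40, mean = 59, sd = 10)

# Two-sided one-sample t-test against mu = 55
res <- t.test(activity, alternative = "two.sided", mu = 55, conf.level = 0.95)
print(res)

# The same t statistic by hand: (sample mean - mu) / (s / sqrt(n))
n <- length(activity)
t_manual <- (mean(activity) - 55) / (sd(activity) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# Both comparisons should report TRUE
all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_manual)
```

Switching to alternative = "greater" keeps the same t statistic; only the p-value and the confidence interval become one-sided.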
# Here, the output looks like this:
One Sample t-test
data: Activity
t = , df = 39, p-value =
alternative hypothesis: true mean is not equal to 55
95 percent confidence interval:
sample estimates:
mean of x
59.3
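The printed output can also be read programmatically, since t.test returns a list of class "htest". The sketch below uses simulated data (an assumption, not the course data) and pulls out the same pieces that appear in the printout.

```r
# Simulated data for illustration only (not the course data set)
set.seed(2)
activity <- rnorm(40, mean = 59, sd = 10)

res <- t.test(activity, mu = 55)

res$statistic   # the t value
res$parameter   # degrees of freedom (n - 1)
res$p.value     # p-value for the stated alternative
res$conf.int    # lower and upper confidence limits
res$estimate    # sample mean ("mean of x")

# The confidence level is stored as an attribute of the interval
attr(res$conf.int, "conf.level")
```

A common decision rule is to reject H0 when res$p.value falls below the chosen significance level (for example 0.05 with a 95% confidence level).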
# Note that you can also execute t-tests by group. This is NOT a two sample test but rather two one sample tests:
t.test(Activity[Group == "NORMAL"], mu = 55, alternative = "two.sided", conf.level = 0.99)
t.test(Activity[Group == "HYPER"], mu = 55, alternative = "two.sided", conf.level = 0.99)
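One possible way to run the same one sample test for every group at once is to split the vector by group and apply t.test to each piece. The data frame `dat` below is simulated for illustration; only the column names Activity and Group and the level names NORMAL and HYPER come from the slide.

```r
# Simulated data for illustration only (values are an assumption)
set.seed(3)
dat <- data.frame(
  Activity = c(rnorm(20, mean = 55, sd = 8), rnorm(20, mean = 63, sd = 8)),
  Group    = rep(c("NORMAL", "HYPER"), each = 20)
)

# split() returns one numeric vector per group;
# lapply() passes the extra arguments on to t.test for each group
by_group <- lapply(split(dat$Activity, dat$Group), t.test,
                   mu = 55, alternative = "two.sided", conf.level = 0.99)

# One p-value per group
sapply(by_group, function(x) x$p.value)
```

Each element of `by_group` is still a separate one sample test against mu = 55; nothing here compares the two groups with each other.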
2. Paired Sample T-test - compares the mean of the differences in the observations to a given number.
e.g. Is there a difference in the production output of a facility after the implementation of new procedures?
Formal Hypothesis Statement example:
H0: μ_diff = 0   H1: μ_diff ≠ 0
# Here the syntax looks like this:
t.test(vector1, vector2, paired = TRUE, conf.level = 0.90)
# And the output looks like this (this output reports a 95 percent interval, so it was run with conf.level = 0.95 rather than 0.90):
Paired t-test
data: WidgeOne$Post_Training_Productivity and WidgeOne$Pre_Training_Productivity
t = , df = 39, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of the differences
Note that mathematically, the paired t-test is a one sample t-test applied to the differences between the paired observations (tested against a mean of 0). Therefore, we can do this:
WidgeOne$Diff <- WidgeOne$Post_Training_Productivity - WidgeOne$Pre_Training_Productivity
mean(WidgeOne$Diff)
t.test(WidgeOne$Diff, conf.level = 0.95)
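The equivalence can be checked directly. This sketch uses simulated `pre` and `post` vectors (hypothetical stand-ins for the WidgeOne productivity columns) and compares the paired call with the one sample call on the differences.

```r
# Simulated before/after measurements (an assumption, not the WidgeOne data)
set.seed(4)
pre  <- rnorm(40, mean = 100, sd = 15)
post <- pre + rnorm(40, mean = 3, sd = 5)

# Paired t-test on the two vectors
paired_res <- t.test(post, pre, paired = TRUE, conf.level = 0.95)

# One sample t-test on the differences, tested against 0
diff_res <- t.test(post - pre, mu = 0, conf.level = 0.95)

# The statistic, p-value and confidence limits should agree
c(paired = unname(paired_res$statistic), one_sample = unname(diff_res$statistic))
c(paired = paired_res$p.value, one_sample = diff_res$p.value)
rbind(paired = as.numeric(paired_res$conf.int),
      one_sample = as.numeric(diff_res$conf.int))
```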
3. Two Sample T-test - compares the mean of the first sample minus the mean of the second sample to a given number.
e.g. Is there a difference in the production output of two facilities?
Formal Hypothesis Statement examples:
H0: μa - μb = 0   H1: μa - μb ≠ 0
H0: μa - μb ≤ 0   H1: μa - μb > 0
When dealing with two samples, it is important to check the following assumptions:
- The samples are independent
- The samples have approximately equal variance
- The distribution of each sample is approximately normal
Note: if the assumptions are violated and/or if the sample sizes are very small, we first try a transformation (e.g., take the log or the square root). If this does not work, then we engage in non-parametric analysis: the Wilcoxon Rank Sum test (also known as the Mann-Whitney test).
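One way these checks might be carried out in base R is sketched below. The samples `a` and `b` are simulated from a skewed distribution for illustration, and the choice of shapiro.test and var.test is an assumption; the slides do not name specific diagnostic functions.

```r
# Simulated right-skewed samples (an assumption, for illustration only)
set.seed(5)
a <- rlnorm(25, meanlog = 4.0, sdlog = 0.4)
b <- rlnorm(25, meanlog = 4.2, sdlog = 0.4)

# Normality check for each sample (Shapiro-Wilk)
norm_a <- shapiro.test(a)
norm_b <- shapiro.test(b)

# Equal variance check (F test; itself sensitive to non-normality)
var_res <- var.test(a, b)

# First remedy: transform, then run the t-test on the transformed values
log_res <- t.test(log(a), log(b))

# Fallback: non-parametric Wilcoxon Rank Sum (Mann-Whitney) test
wilcox_res <- wilcox.test(a, b)

c(shapiro_a = norm_a$p.value, shapiro_b = norm_b$p.value,
  var_test = var_res$p.value, t_on_logs = log_res$p.value,
  wilcoxon = wilcox_res$p.value)
```

Histograms or normal Q-Q plots (hist, qqnorm) are a reasonable complement to the formal tests, especially with small samples.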
# Here the code looks like this:
t.test(Activity ~ Group, alternative = "two.sided", conf.level = 0.90)
# And the output looks like this (this output reports a 95 percent interval, so it was run with conf.level = 0.95 rather than 0.90):
data: Activity by Group
t = , df = , p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean in group HYPER mean in group NORMAL
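By default, t.test does not assume equal variances for two samples: it runs the Welch version, which usually gives non-integer degrees of freedom. The sketch below, on simulated data (an assumption, not the course data), contrasts that default with the pooled-variance test requested through var.equal = TRUE.

```r
# Simulated data for illustration only (values are an assumption)
set.seed(6)
dat <- data.frame(
  Activity = c(rnorm(20, mean = 63, sd = 8), rnorm(20, mean = 55, sd = 8)),
  Group    = factor(rep(c("HYPER", "NORMAL"), each = 20))
)

# Default: Welch two sample t-test (var.equal = FALSE)
welch <- t.test(Activity ~ Group, data = dat)

# Classic pooled-variance two sample t-test
pooled <- t.test(Activity ~ Group, data = dat, var.equal = TRUE)

welch$method
pooled$method

# Pooled df is n1 + n2 - 2; Welch df is typically smaller and fractional
c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter))
```

If the equal variance assumption is in doubt, leaving var.equal at its default is the more cautious choice.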
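Proportion testing works effectively the same way as t-testing. The main difference is that the standard deviation is not estimated separately from the data: for a single proportion it is fixed under the null hypothesis by the hypothesized proportion, sqrt(p0(1 - p0)/n). The test therefore relies on the normal approximation rather than the t distribution, and R's prop.test reports the squared z statistic, which follows a Chi Square distribution with 1 degree of freedom.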
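The link between the z statistic and the Chi Square value reported by prop.test can be seen with a small worked example. The counts below (58 successes out of 100) are hypothetical and chosen only to keep the arithmetic simple.

```r
# Hypothetical counts, for illustration only
x  <- 58    # number of "successes"
n  <- 100   # sample size
p0 <- 0.5   # proportion under the null hypothesis

# z statistic: standard deviation comes from p0, not from the sample
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)

# prop.test without continuity correction
res <- prop.test(x, n, p = p0, correct = FALSE)

# X-squared should equal z squared
c(z_squared = z^2, X_squared = unname(res$statistic))

# Two-sided p-value from the normal distribution should match prop.test
c(normal = 2 * pnorm(-abs(z)), prop_test = res$p.value)
```

Here z = 0.08 / 0.05 = 1.6, so the Chi Square statistic is 1.6^2 = 2.56 on 1 degree of freedom.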
1. One Sample Proportion Test - compares the proportion of the sample to a given number.
e.g. Is the proportion of students who believe in love at first sight greater than 50%?
H0: p ≤ 0.50   H1: p > 0.50
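# The code here takes a bit of work. The general pattern is:
object1 <- table(factor)
sum(object1)
prop.test(object1[factor level], totaln, correct = FALSE, p = null hypothesis)
# R is case-sensitive: the functions are table(), sum() and prop.test(), all lower case.
# Example:
loveatfirst.count <- table(PSU$atfirst)
prop.test(loveatfirst.count[3], sum(loveatfirst.count), correct = FALSE, p = 0.50)
# Note that the "3" indicates the third level of the factor, which is "Yes".
# For a one-sided hypothesis such as H1: p > 0.50, add alternative = "greater"; the default is two-sided.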
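A self-contained sketch of the same steps follows. The factor `atfirst` is simulated (it stands in for PSU$atfirst, which is not available here), and the count is selected by level name rather than by position, which is an optional variation that avoids depending on the order of the factor levels.

```r
# Simulated survey answers (an assumption, not the PSU data)
set.seed(8)
atfirst <- factor(sample(c("No", "Yes"), size = 200, replace = TRUE,
                         prob = c(0.45, 0.55)))

# Step 1: tabulate the factor
counts <- table(atfirst)

# Step 2: pull out the "Yes" count by name and the total sample size
yes_count <- as.vector(counts["Yes"])
total_n   <- sum(counts)

# Step 3: one-sided test of H0: p <= 0.50 against H1: p > 0.50
res <- prop.test(yes_count, total_n, p = 0.50,
                 alternative = "greater", correct = FALSE)
print(res)

# The reported estimate is simply the sample proportion
c(estimate = unname(res$estimate), by_hand = yes_count / total_n)
```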
The output looks like this:
data: grtpers.count[2] out of sum(grtpers.count), null probability 0.5
X-squared = , df = 1, p-value =
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval:
sample estimates:
p
2. Two Sample Proportion Test - compares the proportion of the first sample minus the proportion of the second sample to a given number. It is of common interest to test whether two population proportions are equal.
e.g. Is the proportion of students who believe in love at first sight different by gender?
# Basically, you need to create a table, and then execute the prop.test function:
sex.by.grtpers.count <- table(PSU3b$Sex, droplevels(PSU3b)$grtpers)
# Note that this will compare the Female % No to the Male % No.
# This is because "no" is in the first column.
prop.test(sex.by.grtpers.count, correct = FALSE)
# The output looks like this:
data: sex.by.grtpers.count
X-squared = , df = 1, p-value =
alternative hypothesis: two.sided
95 percent confidence interval:
sample estimates:
prop 1 prop 2
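The effect of column order can be illustrated on simulated data. The vectors `sex` and `grtpers` below are hypothetical stand-ins for the PSU3b columns; the sketch shows that prop.test treats the first column of a two-column table as the "success" count, and that reordering the columns changes the reported proportions but not the test result.

```r
# Simulated data (an assumption, not the PSU3b data)
set.seed(9)
sex     <- factor(rep(c("Female", "Male"), each = 100))
grtpers <- factor(sample(c("No", "Yes"), size = 200, replace = TRUE),
                  levels = c("No", "Yes"))

# Rows = groups, columns = outcome; "No" is the first column
tab <- table(sex, grtpers)

# Compares Female % No with Male % No
res_no <- prop.test(tab, correct = FALSE)
res_no$estimate

# Reorder the columns so that "Yes" counts as the success
res_yes <- prop.test(tab[, c("Yes", "No")], correct = FALSE)
res_yes$estimate

# Same X-squared and p-value either way
c(no_first = unname(res_no$statistic), yes_first = unname(res_yes$statistic))
c(no_first = res_no$p.value, yes_first = res_yes$p.value)
```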