Bivariate Testing (ttests and proportion tests)

Presentation transcript:

STAT 4030 – Programming in R
STATISTICS MODULE: Bivariate Testing (ttests and proportion tests)
Jennifer Lewis Priestley, Ph.D.
Kennesaw State University

STATISTICS MODULE
Basic Descriptive Statistics and Confidence Intervals
Basic Visualizations
    Histograms
    Pie Charts
    Bar Charts
    Scatterplots
Ttests
    One Sample
    Paired
    Independent Two Sample
Proportion Testing
ANOVA
Chi Square and Odds
Regression Basics

A side note of interest from Wikipedia: The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness Brewery in Dublin, Ireland. Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness' industrial processes. Gosset devised the t-test as a way to cheaply monitor the quality of beer. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret.

Ttests take three forms:

1. One Sample Ttest - compares the mean of the sample to a given number.
e.g. Is average monthly revenue per customer > $50?

Formal Hypothesis Statement examples:
    H0: μ ≤ $50    H1: μ > $50
    H0: μ = $50    H1: μ ≠ $50

Here, the syntax looks like this.

One sample, two sided, confidence level at 95%, tested against a designated value:
    t.test(vector, alternative = "two.sided", mu = 55, conf.level = 0.95)

One sample, one sided, confidence level at 99%, tested against a designated value:
    t.test(vector, alternative = "greater", mu = 55, conf.level = 0.99)

Here, the output looks like this:

        One Sample t-test

    data:  Activity
    t = 3.1755, df = 39, p-value = 0.00292
    alternative hypothesis: true mean is not equal to 55
    95 percent confidence interval:
     56.56107 62.03893
    sample estimates:
    mean of x
         59.3
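As a rough check on how the pieces of this output fit together, the sketch below recomputes the t statistic, the two-sided p-value, and the confidence interval by hand and compares them with what t.test returns. The `activity` vector is simulated, hypothetical data (the slides do not include the original Activity values), so the numbers will not match the output above.

```r
# Hypothetical data standing in for the Activity variable (assumption).
set.seed(1)
activity <- rnorm(40, mean = 59, sd = 8)

n  <- length(activity)
se <- sd(activity) / sqrt(n)            # standard error of the mean

# t statistic against the hypothesized mean of 55
t_stat <- (mean(activity) - 55) / se

# two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# 95 percent confidence interval for the mean
ci <- mean(activity) + c(-1, 1) * qt(0.975, df = n - 1) * se

# the same test via t.test
res <- t.test(activity, mu = 55, alternative = "two.sided", conf.level = 0.95)

print(c(by_hand = t_stat, t.test = unname(res$statistic)))
print(c(by_hand = p_val,  t.test = res$p.value))
print(rbind(by_hand = ci, t.test = as.numeric(res$conf.int)))
```

The individual pieces of the result (`res$statistic`, `res$parameter`, `res$p.value`, `res$conf.int`, `res$estimate`) can be pulled out this way whenever only one number is needed rather than the full printout.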

Note that you can also execute ttests by group. This is NOT a two sample test but rather two one sample tests:
    t.test(Activity[Group=="NORMAL"], mu=55, alternative = "two.sided", conf.level = 0.99)
    t.test(Activity[Group=="HYPER"], mu=55, alternative = "two.sided", conf.level = 0.99)
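When there are more than a couple of groups, repeating the call by hand gets tedious. One possible alternative, sketched here with a hypothetical data frame `dat` (simulated values, not the course data), is to split the response by group and apply the same one sample test to each piece.

```r
# Hypothetical data: 20 observations in each of two groups (assumption).
set.seed(3)
dat <- data.frame(
  Activity = c(rnorm(20, mean = 62, sd = 9), rnorm(20, mean = 57, sd = 7)),
  Group    = rep(c("HYPER", "NORMAL"), each = 20)
)

# One one-sample t-test per group, each against mu = 55 at the 99% level.
by_group <- lapply(
  split(dat$Activity, dat$Group),
  t.test, mu = 55, alternative = "two.sided", conf.level = 0.99
)

# Collect the group mean and p-value from each test into a small table.
summary_tab <- sapply(by_group, function(res) {
  c(mean = unname(res$estimate), p.value = res$p.value)
})
print(summary_tab)
```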

2. Paired Sample Ttest - compares the mean of the differences in the observations to a given number.
e.g. Is there a difference in the production output of a facility after the implementation of new procedures?

Formal Hypothesis Statement example:
    H0: μdiff = 0    H1: μdiff ≠ 0

Here the syntax looks like this:
    t.test(vector1, vector2, paired = TRUE, conf.level = 0.90)

And the output looks like this:

        Paired t-test

    data:  WidgeOne$Post_Training_Productivity and WidgeOne$Pre_Training_Productivity
    t = 1.7558, df = 39, p-value = 0.08698
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.07438134  1.05301176
    sample estimates:
    mean of the differences
                  0.4893152

Note that the output reports a 95 percent interval, so it appears to come from a run at the default conf.level = 0.95 rather than the 0.90 shown in the syntax line.

Note that mathematically, the paired ttest is a one sample ttest applied to the differences between the paired observations, tested against 0. Therefore, we can do this:
    WidgeOne$Diff <- WidgeOne$Post_Training_Productivity - WidgeOne$Pre_Training_Productivity
    mean(WidgeOne$Diff)
    t.test(WidgeOne$Diff, conf.level = 0.95)
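The equivalence can be confirmed directly. The sketch below uses simulated, hypothetical `pre` and `post` vectors in place of the WidgeOne columns (which are not available here) and shows that the two calls give the same statistic, p-value, and interval.

```r
# Hypothetical before/after measurements on the same 40 units (assumption).
set.seed(42)
pre  <- rnorm(40, mean = 50, sd = 5)
post <- pre + rnorm(40, mean = 0.5, sd = 1.8)

# Paired t-test on the two vectors
paired <- t.test(post, pre, paired = TRUE, conf.level = 0.95)

# One-sample t-test on the differences, against mu = 0
one_sample <- t.test(post - pre, mu = 0, conf.level = 0.95)

print(c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic)))
print(c(paired = paired$p.value,           one_sample = one_sample$p.value))
print(rbind(paired = as.numeric(paired$conf.int),
            one_sample = as.numeric(one_sample$conf.int)))
```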

3. Two Sample Ttest - compares the mean of the first sample minus the mean of the second sample to a given number.
e.g. Is there a difference in the production output of two facilities?

Formal Hypothesis Statement examples:
    H0: μa - μb = 0    H1: μa - μb ≠ 0
    H0: μa - μb ≤ 0    H1: μa - μb > 0

When dealing with two samples, it is important to check the following assumptions:
- The samples are independent
- The samples have approximately equal variance
- The distribution of each sample is approximately normal

Note: if the assumptions are violated and/or if the sample sizes are very small, we first try a transformation (e.g., take the log or the square root). If this does not work, then we engage in non-parametric analysis: the Wilcoxon Rank Sum test (also known as the Mann-Whitney test).
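One possible way to check these assumptions in R is sketched below. The choice of var.test (an F test for equal variances) and shapiro.test (a normality test) is an assumption on my part; the slides do not say which checks to use. The data frame `dat` is simulated and hypothetical. Independence cannot be tested this way and has to be judged from how the data were collected.

```r
# Hypothetical two-group data (assumption; not the course data set).
set.seed(7)
dat <- data.frame(
  Activity = c(rnorm(20, mean = 62, sd = 9), rnorm(20, mean = 57, sd = 7)),
  Group    = rep(c("HYPER", "NORMAL"), each = 20)
)

# Equal variance: F test comparing the two group variances.
variance_check <- var.test(Activity ~ Group, data = dat)

# Normality: Shapiro-Wilk test run separately within each group.
normality_check <- lapply(split(dat$Activity, dat$Group), shapiro.test)

# Non-parametric fallback: Wilcoxon Rank Sum (Mann-Whitney) test.
rank_sum <- wilcox.test(Activity ~ Group, data = dat)

print(variance_check$p.value)
print(sapply(normality_check, function(res) res$p.value))
print(rank_sum$p.value)
```

Small p-values from var.test or shapiro.test suggest the corresponding assumption is doubtful. A transformation such as `log(Activity)` can be tried by refitting the same checks on the transformed column.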

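Here the code looks like this:
    t.test(Activity ~ Group, alternative = "two.sided", conf.level = 0.90)

And the output looks like this:

    data:  Activity by Group
    t = 1.9077, df = 35.532, p-value = 0.06454
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.3180028 10.3180028
    sample estimates:
     mean in group HYPER mean in group NORMAL
                    61.8                 56.8

Note that the output reports a 95 percent interval, so it appears to come from a run at the default conf.level = 0.95 rather than the 0.90 shown in the code. The interval is for the mean of the first group minus the mean of the second (HYPER minus NORMAL).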
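The fractional degrees of freedom in this output (df = 35.532) arise because t.test defaults to the Welch version of the two sample test, which does not assume equal variances. The sketch below, on simulated hypothetical data, contrasts that default with the classical pooled-variance test requested through `var.equal = TRUE`.

```r
# Hypothetical two-group data (assumption; not the course data set).
set.seed(7)
dat <- data.frame(
  Activity = c(rnorm(20, mean = 62, sd = 9), rnorm(20, mean = 57, sd = 7)),
  Group    = rep(c("HYPER", "NORMAL"), each = 20)
)

# Default: Welch test, variances estimated separately, fractional df.
welch <- t.test(Activity ~ Group, data = dat)

# Pooled test: assumes equal variances, df = n1 + n2 - 2 = 38.
pooled <- t.test(Activity ~ Group, data = dat, var.equal = TRUE)

print(c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter)))
print(c(welch_p  = welch$p.value,           pooled_p  = pooled$p.value))
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ, so the p-values are usually close. The gap can widen when group sizes and variances are both unequal.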

Proportion testing works effectively the same way as ttesting. The main difference is that R's prop.test reports a Chi Square statistic (X-squared) rather than a t statistic, because the standard deviation of a proportion is not estimated separately from the data: it is determined by the proportion itself.
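For a single proportion, the X-squared statistic is the square of the familiar z statistic, so the two approaches give the same two-sided p-value. A minimal sketch, using made-up counts (60 successes out of 100 is an assumption chosen only for illustration):

```r
# Hypothetical counts (assumption): 60 "successes" out of 100 trials.
x  <- 60
n  <- 100
p0 <- 0.5                                  # null hypothesis proportion

# z statistic: the standard deviation under H0 follows from p0 alone.
p_hat <- x / n
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_z   <- 2 * pnorm(-abs(z))                # two-sided p-value from the normal

# Same test via prop.test, without the continuity correction.
res <- prop.test(x, n, p = p0, correct = FALSE)

print(c(z_squared = z^2, X_squared = unname(res$statistic)))
print(c(p_from_z = p_z,  p_from_prop.test = res$p.value))
```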

1. One Sample Proportion Test - compares the proportion of the sample to a given number.
e.g. Is the proportion of students who believe in love at first sight greater than 50%?

    H0: p ≤ 0.50    H1: p > 0.50

The code here takes a bit of work. The general pattern (with placeholders, so not runnable as written) is:
    object1 <- table(factor)
    sum(object1)
    prop.test(object1[factor level], totaln, correct = FALSE, p = null hypothesis)

R is case sensitive, so the function names must be lowercase: table, sum, prop.test.

Example:
    loveatfirst.count <- table(PSU$atfirst)
    prop.test(loveatfirst.count[3], sum(loveatfirst.count), correct = FALSE, p = 0.50)

Note that the "3" indicates the third level of the factor, which is "Yes".
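The same steps can be tried end to end on a small made-up survey. Everything in this sketch is hypothetical: the `answers` factor, its three levels, and the counts are assumptions standing in for PSU$atfirst. It indexes the table by level name rather than by position, and wraps the count in as.integer() so that prop.test receives a plain number; that wrapping is a defensive choice, since how prop.test treats a one-element table may depend on the R version.

```r
# Hypothetical survey answers (assumption): 35 No, 10 Unsure, 55 Yes.
answers <- factor(rep(c("No", "Unsure", "Yes"), times = c(35, 10, 55)))

# Step 1: count the responses at each level of the factor.
counts <- table(answers)
print(counts)

# Step 2: total number of responses.
total_n <- sum(counts)

# Step 3: pull out the "Yes" count by name (it is also the third level).
yes_n <- as.integer(counts["Yes"])

# Step 4: test H0: p <= 0.50 against H1: p > 0.50.
res <- prop.test(yes_n, total_n, p = 0.50,
                 alternative = "greater", correct = FALSE)
print(res)
```

Indexing by name avoids silently testing the wrong level if the order of the factor levels ever changes.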

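The output looks like this:

    data:  grtpers.count[2] out of sum(grtpers.count), null probability 0.5
    X-squared = 2.0833, df = 1, p-value = 0.07446
    alternative hypothesis: true p is greater than 0.5
    95 percent confidence interval:
     0.4927361 1.0000000
    sample estimates:
            p
    0.5520833

Note that this output comes from the same kind of call applied to a different variable (grtpers rather than atfirst). It also shows a one sided alternative, which requires adding alternative = "greater" to the call; the default is two sided.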
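The slides do not give the underlying counts, but 106 "successes" out of 192 responses is one pair that is arithmetically consistent with the printed estimate (106/192 = 0.5520833). Treating those counts as an assumption, the sketch below shows a one sided call of the kind that would produce output in this form.

```r
# Assumed counts (not stated in the slides): 106 out of 192.
x <- 106
n <- 192

# One-sided test of H0: p <= 0.5 against H1: p > 0.5,
# without the continuity correction.
res <- prop.test(x, n, p = 0.5, alternative = "greater", correct = FALSE)

print(res$statistic)   # X-squared
print(res$p.value)     # one-sided p-value
print(res$conf.int)    # one-sided interval: upper limit is 1
print(res$estimate)    # sample proportion
```

Because the alternative is one sided, the confidence interval is one sided as well, which is why its upper limit is 1.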

2. Two Sample Proportion Test - compares the proportion of the first sample minus the proportion of the second sample to a given number. It is of common interest to test whether two population proportions are equal.
e.g. Is the proportion of students who believe in love at first sight different by gender?

Basically, you need to create a table, and then execute the prop.test function:
    sex.by.grtpers.count <- table(PSU3b$Sex, droplevels(PSU3b)$grtpers)
    # note that this will compare the Female % No to the Male % No
    # this is because "No" is in the first column
    prop.test(sex.by.grtpers.count, correct = FALSE)
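A self-contained sketch of the same pattern follows. The `survey` data frame and its counts are hypothetical (the PSU3b data are not available here). It also shows one way to compare the "Yes" percentages instead, by putting that column first, and checks that the two sample test without continuity correction matches the Pearson Chi Square test on the same 2 x 2 table.

```r
# Hypothetical survey (assumption): 100 Female (40 No, 60 Yes),
# 92 Male (46 No, 46 Yes).
survey <- data.frame(
  Sex     = rep(c("Female", "Male"), times = c(100, 92)),
  grtpers = c(rep(c("No", "Yes"), times = c(40, 60)),
              rep(c("No", "Yes"), times = c(46, 46)))
)

# Rows are the groups; the FIRST column is treated as the "success".
tab <- table(survey$Sex, survey$grtpers)
print(tab)

# Compares Female % No with Male % No.
res_no <- prop.test(tab, correct = FALSE)
print(res_no$estimate)

# Reorder the columns to compare Female % Yes with Male % Yes instead.
res_yes <- prop.test(tab[, c("Yes", "No")], correct = FALSE)
print(res_yes$estimate)

# Same statistic as the Pearson chi-square test on the table.
chi <- chisq.test(tab, correct = FALSE)
print(c(prop.test = unname(res_no$statistic), chisq.test = unname(chi$statistic)))
```

Swapping the columns changes which proportions are reported but not the test statistic or the p-value.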