Data Analysis Module: Bivariate Testing

Programming in R

Data Analysis Module:
- Basic Descriptive Statistics and Confidence Intervals
- Basic Visualizations: Histograms, Pie Charts, Bar Charts, Scatterplots
- T-tests/Bivariate Testing: One Sample, Paired, Independent Two Sample
- ANOVA
- Chi Square and Odds
- Regression Basics

The first part of these notes addresses t-testing basics. The second part addresses z-test (or proportion testing) basics.

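The term "t-test" comes from the application of the t-distribution to evaluate a hypothesis. The t-distribution is used when the population standard deviation is unknown and must be estimated by the sample standard deviation s, so that the standard error of the mean is estimated as s/SQRT(n). The extra uncertainty from that estimate matters most for small samples (a common rule of thumb is fewer than 30 observations); for larger samples the t-distribution is very close to the normal distribution. In practice, even hypothesis tests with sample sizes greater than 30, which rely on the normal distribution, are commonly referred to as "t-tests". Note: a t-statistic and a z-score are conceptually similar. Both convert a measurement into a standardized score; the z-score follows a standard normal distribution, while the t-statistic follows a t-distribution, which has heavier tails and approaches the normal as the sample size grows.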
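To illustrate why the distinction matters mainly for small samples, the sketch below compares two-sided 95% critical values from the t-distribution (`qt()`) with the standard normal value (`qnorm()`). The degrees of freedom chosen are illustrative only.

```r
# Two-sided 95% critical values: t-distribution vs. standard normal
df <- c(5, 10, 30, 100, 1000)      # illustrative degrees of freedom
t_crit <- qt(0.975, df = df)       # t critical value for each df
z_crit <- qnorm(0.975)             # normal critical value (same for all)

# The t values shrink toward the normal value as df grows
print(round(data.frame(df = df, t_crit = t_crit, z_crit = z_crit), 3))
```

With few degrees of freedom the t critical value is noticeably larger than the normal value, which is what makes small-sample t-tests more conservative.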

A side note of interest from Wikipedia: the t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness Brewery in Dublin, Ireland. Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness's industrial processes. Gosset devised the t-test as a way to cheaply monitor the quality of beer. He published the test in Biometrika in 1908, but was forced by his employer to use a pen name ("Student"), because the company regarded its use of statistics as a trade secret.

T-tests take three forms:
One Sample t-test: compares the mean of the sample to a given number. For example: is the average monthly revenue per customer who switches greater than $50?
Formal hypothesis statement examples:
\(H_0: \mu \le 50\) vs. \(H_1: \mu > 50\) (in dollars)
\(H_0: \mu = 50\) vs. \(H_1: \mu \ne 50\)

Example: after a massive outbreak of salmonella, the CDC determined that the source was a particular manufacturer of ice cream. The CDC sampled 9 production runs of the manufacturer, with the following results (all in MPN/g): .593 .142 .329 .691 .231 .793 .519 .392 .418. Use these data to determine whether the average level of salmonella is greater than .3 MPN/g, the level considered dangerous.

First, identify the hypothesis statements, the Type I and Type II errors, and your choice of alpha. Then do the computation by hand.
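The hand computation can be mirrored step by step in R. This sketch assumes \(H_0: \mu \le 0.3\) vs. \(H_1: \mu > 0.3\) and alpha = 0.05; it computes the t-statistic from the formula t = (x̄ - μ0) / (s/SQRT(n)) and takes the upper-tail p-value from `pt()`.

```r
# Salmonella levels (MPN/g) from the 9 sampled production runs
salmonella <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3                        # null value (H0: mu <= 0.3)

n    <- length(salmonella)
xbar <- mean(salmonella)          # sample mean
s    <- sd(salmonella)            # sample SD (n - 1 in the denominator)
se   <- s / sqrt(n)               # estimated standard error of the mean

t_stat <- (xbar - mu0) / se       # standardized distance from the null value
df     <- n - 1

# Upper-tail p-value, because H1 is mu > 0.3
p_value <- pt(t_stat, df = df, lower.tail = FALSE)

# One-sided critical value for alpha = 0.05
t_crit <- qt(0.95, df = df)

cat("t =", round(t_stat, 4), " df =", df,
    " p =", round(p_value, 4), " critical t =", round(t_crit, 4), "\n")
```

Reject H0 if t_stat exceeds t_crit, or equivalently if p_value is below alpha.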

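In R, the syntax is t.test(vector to be analyzed, vector to be analyzed, alternative = ..., mu = ..., paired = ...). Passing a single vector gives a one-sample t-test, which is the default; mu sets the null value (default 0), and alternative can be "two.sided" (default), "less", or "greater". Passing a second vector together with paired = TRUE gives a paired t-test.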
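Applied to the salmonella example, a one-sample call needs only the data vector, the null value, and the direction of the alternative. This is a sketch assuming the one-sided hypothesis \(H_1: \mu > 0.3\); it should agree with the hand computation.

```r
# Salmonella levels (MPN/g) from the 9 sampled production runs
salmonella <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)

# One-sample t-test of H0: mu <= 0.3 against H1: mu > 0.3
res <- t.test(salmonella, mu = 0.3, alternative = "greater")
print(res)

# Individual pieces of the result
res$statistic   # t-statistic
res$parameter   # degrees of freedom
res$p.value     # one-sided p-value
```

For a one-sided "greater" test, the reported confidence interval has an upper bound of Inf.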

Two Sample t-test: compares the mean of the first sample minus the mean of the second sample to a given number. For example: is there a difference in the production output of two facilities?
Formal hypothesis statement example: \(H_0: \mu_a - \mu_b = 0\) vs. \(H_1: \mu_a - \mu_b \ne 0\)

When running two sample or paired t-tests, it is important to check the following assumptions:
- Two sample: the samples are independent of each other, the samples have approximately equal variances (needed for the pooled test with var.equal = TRUE), and the distribution of each sample is approximately normal.
- Paired: the pairs are independent of one another, and the differences within pairs are approximately normal.
Note: if the assumptions are violated and/or the sample sizes are very small, first try a transformation (e.g., take the log or the square root). If this does not work, use non-parametric analysis: the Wilcoxon rank sum test (independent samples) or the Wilcoxon signed rank test (paired samples).
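A sketch of these checks using functions from R's stats package, run on simulated (hypothetical) data for two facilities: `shapiro.test()` for normality, `var.test()` for equal variances, a log transformation as a first remedy, and `wilcox.test()` as the rank sum fallback. These particular tests are one common choice, not the only one.

```r
set.seed(42)
# Hypothetical simulated production outputs for two independent facilities
facility_a <- rnorm(20, mean = 100, sd = 10)
facility_b <- rnorm(20, mean = 105, sd = 10)

# Normality of each sample (Shapiro-Wilk; H0: the data are normal)
sw_a <- shapiro.test(facility_a)
sw_b <- shapiro.test(facility_b)

# Equal variances (F test; H0: ratio of the variances is 1)
vt <- var.test(facility_a, facility_b)

# First remedy: log-transform (only valid for strictly positive data)
sw_log_a <- shapiro.test(log(facility_a))

# Non-parametric fallback for independent samples: Wilcoxon rank sum
# (for paired data, use wilcox.test(x, y, paired = TRUE): signed rank)
wrs <- wilcox.test(facility_a, facility_b)

print(c(shapiro_a = sw_a$p.value, shapiro_b = sw_b$p.value,
        var_test = vt$p.value, rank_sum = wrs$p.value))
```

Small p-values from the Shapiro-Wilk or F tests suggest the corresponding assumption is doubtful.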

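In R, the syntax is t.test(vector to be tested ~ two-level factor, data = data, var.equal = FALSE). The default, var.equal = FALSE, runs Welch's t-test, which does not assume equal variances; if the variances are similar, set var.equal = TRUE for the pooled-variance t-test. The result can be plotted with plot(t.test(vector to be tested ~ two-level factor, data = data)) only if an add-on package supplies a plot method for test results; base R does not provide one.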
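A minimal sketch of the formula interface, assuming a hypothetical data frame `plants` with a numeric `output` column and a two-level `facility` factor (simulated values, not data from the slides). It runs both the Welch and the pooled-variance versions so the difference in degrees of freedom is visible.

```r
set.seed(1)
# Hypothetical data: production output for two facilities, 15 runs each
plants <- data.frame(
  output   = c(rnorm(15, mean = 50, sd = 5), rnorm(15, mean = 54, sd = 8)),
  facility = factor(rep(c("A", "B"), each = 15))
)

# Welch's t-test (default var.equal = FALSE): no equal-variance assumption
welch <- t.test(output ~ facility, data = plants)

# Pooled-variance t-test: assumes equal variances, df = n1 + n2 - 2
pooled <- t.test(output ~ facility, data = plants, var.equal = TRUE)

print(welch)
print(pooled)
```

Welch's degrees of freedom are usually fractional and never exceed the pooled value of n1 + n2 - 2.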

Paired Sample t-test: compares the mean of the differences between paired observations to a given number. For example: is there a difference in the production output of a facility after the implementation of new procedures?
Formal hypothesis statement example: \(H_0: \mu_d = 0\) vs. \(H_1: \mu_d \ne 0\), where \(\mu_d\) is the mean of the paired differences.

In R, the syntax is t.test(vector to be analyzed, vector to be analyzed, paired = TRUE, alternative = "greater"). The alternative can also be "less"; the default, "two.sided", tests for a difference in either direction (not equal).
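A paired t-test is equivalent to a one-sample t-test on the within-pair differences. The sketch below checks this with hypothetical before/after outputs for eight production lines (illustrative numbers only), testing whether output increased.

```r
# Hypothetical outputs for the same 8 lines before and after new procedures
before <- c(52, 48, 55, 50, 47, 53, 49, 51)
after  <- c(55, 50, 56, 54, 49, 55, 50, 54)

# Paired test of H1: mean(after - before) > 0
paired <- t.test(after, before, paired = TRUE, alternative = "greater")

# Same test written as a one-sample test on the differences
one_sample <- t.test(after - before, mu = 0, alternative = "greater")

print(paired)
```

Note the argument order: t.test(after, before, paired = TRUE) tests the mean of after - before.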

Z testing, or proportion-based testing

The test statistic for a one sample proportion is a simple z calculation:
Z = (sample estimate - null value) / null standard error
For a proportion, this is:
\(Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}\)
where \(\hat{p}\) is the sample proportion, \(p_0\) is the null value, and \(n\) is the sample size.

Example of a one sample proportion test: if 30% of cars on a street are found to be speeding, the city will install "traffic calming" devices. John used his radar gun to measure the speeds of 400 cars on his street. He found that 32% were speeding. Will John get "traffic calming" devices on his street?
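The z formula can be applied directly to John's numbers and compared with R's `prop.test()`. This sketch assumes the one-sided alternative \(H_1: p > 0.30\) and no continuity correction; under those settings `prop.test()` reports the square of z as its X-squared statistic and the same p-value.

```r
p0    <- 0.30          # null value: proportion that triggers devices
n     <- 400           # cars measured
p_hat <- 0.32          # observed proportion speeding
x     <- round(p_hat * n)   # number of speeding cars (128)

# z = (sample estimate - null value) / null standard error
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value <- pnorm(z, lower.tail = FALSE)   # upper tail for H1: p > 0.30

# Same test via prop.test (statistic reported as X-squared = z^2)
res <- prop.test(x, n, p = p0, alternative = "greater", correct = FALSE)

cat("z =", round(z, 4), " p =", round(p_value, 4), "\n")
print(res)
```

Compare p_value with your chosen alpha to answer the question.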

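In R, tabulate the factor, find the total count, and pass the count for the level of interest to prop.test() (R is case-sensitive, so the function names are lowercase):
    object1 <- table(factor)
    sum(object1)
    prop.test(object1[factor level], total n, correct = FALSE, p = null hypothesis value)
Example:
    loveatfirst.count <- table(PSU$atfirst)
    prop.test(loveatfirst.count[3], 227, correct = FALSE, p = 0.45)
Note that the "3" indicates the third level of the factor, which is "Yes". Setting correct = FALSE turns off the continuity correction so that the result matches the simple z calculation.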
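Indexing by position (`[3]`) depends on the order of the factor levels; indexing by level name is a slightly more robust alternative. The sketch below uses a hypothetical `atfirst` factor with made-up counts standing in for a survey column such as PSU$atfirst.

```r
# Hypothetical survey responses (counts are illustrative only)
atfirst <- factor(rep(c("No", "Not sure", "Yes"), times = c(60, 30, 70)))

counts  <- table(atfirst)   # count per level
total_n <- sum(counts)      # total number of responses

# Pick the "Yes" count by name rather than by position
yes_count <- counts[["Yes"]]

# Test H0: p = 0.45 against the default two-sided alternative
res <- prop.test(yes_count, total_n, p = 0.45, correct = FALSE)
print(res)
```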

Answer the following:
1. Identify the null and alternative hypotheses.
2. Identify the Type I and Type II errors, including their implications.
3. What is an appropriate alpha value?
4. What is the associated p-value?
5. What is your conclusion?

Two Sample Proportion Test: compares the proportion of the first sample minus the proportion of the second sample to a given number. It is of common interest to test whether two population proportions are equal. For example: is there a difference in the percentage of students who pass a standardized test between those who took a prep course and those who did not?
Formal hypothesis statement examples:
\(H_0: p_a - p_b = 0\) vs. \(H_1: p_a - p_b \ne 0\)
\(H_0: p_a - p_b \le 0\) vs. \(H_1: p_a - p_b > 0\)

Before you undertake a two sample test, a few things must be determined:
- The two samples must be independent.
- In each sample, the number of individuals with the trait of interest and the number without it must each be at least 10.
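The count condition is easy to check from a 2x2 table of groups by outcome; independence cannot be checked from the counts and has to come from the study design. A minimal sketch with a hypothetical helper `meets_min_counts()` and made-up counts:

```r
# Hypothetical 2x2 counts: rows are groups, columns are (pass, fail)
counts <- matrix(c(60, 40,
                   45, 55),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group  = c("prep", "no_prep"),
                                 result = c("pass", "fail")))

# Hypothetical helper: every cell (with and without the trait, in every
# group) must be at least min_count
meets_min_counts <- function(tab, min_count = 10) all(tab >= min_count)

print(meets_min_counts(counts))                                         # condition met
print(meets_min_counts(matrix(c(8, 92, 45, 55), nrow = 2, byrow = TRUE)))  # one cell is 8
```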

In R, the code is simple: make the 2x2 table and then apply the prop.test() function. prop.test() treats each row of the table as a group and the first column as the count of "successes":
    FactorVar1.by.FactorVar2 <- table(FactorVar1, FactorVar2)
    prop.test(FactorVar1.by.FactorVar2, correct = FALSE)
Example:
    PSU$Wt <- ifelse(PSU$WtFeel == "RightWt", "Right",
              ifelse(PSU$WtFeel %in% c("OverWt", "UnderWt"), "Wrong", ""))
    PSU <- subset(PSU, Wt != "")   # drop rows with no weight category
    sex.by.wt <- table(PSU$Sex, PSU$Wt)
    prop.test(sex.by.wt, correct = FALSE)
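To connect prop.test() with a by-hand calculation, the sketch below computes the standard pooled two-proportion z statistic, \(Z = (\hat{p}_a - \hat{p}_b)/\sqrt{\hat{p}(1-\hat{p})(1/n_a + 1/n_b)}\) with \(\hat{p}\) the pooled proportion, on hypothetical counts. Without continuity correction, prop.test() on a 2x2 table reports X-squared equal to z squared and the same two-sided p-value.

```r
# Hypothetical 2x2 counts: rows are groups, columns are (pass, fail)
counts <- matrix(c(60, 40,
                   45, 55),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group  = c("prep", "no_prep"),
                                 result = c("pass", "fail")))

x <- counts[, "pass"]          # successes per group
n <- rowSums(counts)           # group sizes
p_hat  <- x / n                # sample proportion per group
p_pool <- sum(x) / sum(n)      # pooled proportion under H0: pa = pb

# Pooled standard error and z statistic
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n[[1]] + 1 / n[[2]]))
z <- unname((p_hat[1] - p_hat[2]) / se_pool)
p_value <- 2 * pnorm(-abs(z))  # two-sided p-value

# Same test via prop.test on the table
res <- prop.test(counts, correct = FALSE)

cat("z =", round(z, 4), " p =", round(p_value, 4), "\n")
print(res)
```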

Answer the following:
1. Identify the null and alternative hypotheses.
2. Identify the Type I and Type II errors, including their implications.
3. What is an appropriate alpha value?
4. Using the formula, determine the test statistic.
5. What is the associated p-value?
6. What is your conclusion?