Bivariate Testing (ANOVA)

Presentation transcript:

HMI 7530: Programming in R. STATISTICS MODULE: Bivariate Testing (ANOVA). Jennifer Lewis Priestley, Ph.D., Kennesaw State University

STATISTICS MODULE
- Basic Descriptive Statistics and Confidence Intervals
- Basic Visualizations: Histograms, Pie Charts, Bar Charts, Scatterplots
- T-tests: One Sample, Paired, Independent Two Sample
- Proportion Testing
- ANOVA
- Chi Square and Odds
- Regression Basics

STATISTICS MODULE: ANOVA What if we have more than two categories across which we want to compare the value of some quantitative variable? For example, let's say that we wanted to compare the mean weight loss of subjects who were put on one of four diet plans. For ease of discussion, let's call these plans A, B, C and D.

STATISTICS MODULE: ANOVA The following approach would be tempting…
H0: Plan A = Plan B    H1: Plan A ≠ Plan B
H0: Plan B = Plan C    H1: Plan B ≠ Plan C
H0: Plan C = Plan D    H1: Plan C ≠ Plan D
H0: Plan A = Plan C    H1: Plan A ≠ Plan C
H0: Plan A = Plan D    H1: Plan A ≠ Plan D
H0: Plan B = Plan D    H1: Plan B ≠ Plan D

STATISTICS MODULE: ANOVA …but wrong. Apart from being very cumbersome, there is a critical problem: we are inflating our probability of making a type 1 error. Think about that. Let's use alpha = .05. If we ran 6 separate tests, the cumulative probability of at least one type 1 error could be as high as .3 (6 × .05 is the upper bound; if the tests were independent it would be 1 − .95^6 ≈ .26). "We could lower the alpha value to .05/6," I hear you saying. But this has its own problems: what happens if the number of tests increases to 8 or 10? Our alpha value would become so low that we would almost never reject the null (recall Power).
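The inflation argument above can be checked with a few lines of R. This is only a sketch: it treats the six tests as independent, which pairwise comparisons that share groups are not, and the variable names are illustrative rather than taken from the slides.

```r
# Familywise type 1 error rate for k tests, each run at level alpha.
# ASSUMPTION: the tests are treated as independent, so this is a guide only.
alpha <- 0.05
k <- choose(4, 2)                        # 6 pairwise comparisons among 4 plans

fwer_independent <- 1 - (1 - alpha)^k    # chance of at least one false rejection
fwer_upper_bound <- min(1, k * alpha)    # simple upper bound (k * alpha)
bonferroni_alpha <- alpha / k            # per-test alpha after Bonferroni correction

print(c(independent    = fwer_independent,
        upper_bound    = fwer_upper_bound,
        per_test_alpha = bonferroni_alpha))
```

Changing `k` to 8 or 10 shows how quickly the per-test alpha shrinks, which is the loss of power the slide warns about.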

STATISTICS MODULE: ANOVA Let's discuss how to use ANOVA to test a hypothesis by returning to our dieters… In this instance there is a single factor (diet plan) with four levels (plans A, B, C and D), and the response variable is weight loss. The hypothesis statements would look like this:
H0: All level means are equal. In other words, all four of the diet plans generate approximately the same amount of weight loss.
H1: Not all of the level means are equal. In other words, at least one plan's mean weight loss is significantly different from the other plans' means.

STATISTICS MODULE: ANOVA Prior to executing the test, we must check three important assumptions about our data:
1. The response is approximately normally distributed within each group.
2. All the populations sampled have approximately equal variance (you can check this by generating side-by-side boxplots). The rule of thumb is that the largest standard deviation is less than 2x the smallest standard deviation.
3. The samples of the groups are independent of each other, and subjects within the groups were randomly selected.
As with most, but not all, statistical tests, if our samples are large, we can relax the normality assumption and work around non-normal data.
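A rough sketch of how these checks might look in R, using the weight-loss values from this module's diet example. One loud assumption: the transcript shows only five values for plans A, C and D, so a sixth value for each (14, 45 and 42) is inferred here so that the group means and sums of squares match the slides. The names `diet`, `plan` and `loss` are illustrative.

```r
# Diet data. ASSUMPTION: the sixth value for plans A, C and D (14, 45, 42)
# is inferred from the slide's group means; it is not shown in the transcript.
diet <- data.frame(
  plan = factor(rep(c("A", "B", "C", "D"), each = 6)),
  loss = c(14, 14, 20, 22, 26, 27,
           15, 18, 23, 25, 28, 30,
           32, 36, 40, 42, 45, 45,
           33, 38, 42, 44, 46, 47)
)

# Equal-variance rule of thumb: largest group sd under twice the smallest.
group_sd <- tapply(diet$loss, diet$plan, sd)
sd_ratio <- max(group_sd) / min(group_sd)
print(round(group_sd, 2))
print(sd_ratio < 2)

# Side-by-side boxplots for a visual check of spread.
boxplot(loss ~ plan, data = diet, xlab = "Diet plan", ylab = "Weight loss")

# Shapiro-Wilk p-value per group as a rough normality check
# (it has little power with only 6 observations per group).
normality_p <- sapply(split(diet$loss, diet$plan),
                      function(v) shapiro.test(v)$p.value)
print(round(normality_p, 3))
```

Independence and random selection cannot be checked from the numbers; they come from how the study was designed.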

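STATISTICS MODULE: ANOVA Let's examine the hypothesis statements in more detail:
H0: µa = µb = µc = µd
H1: Not all of µa, µb, µc, µd are equal (at least one mean differs from the others). Note that this is weaker than requiring every mean to differ from every other.
Consider: what would the hypothesized distributions look like under H0 and H1?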

STATISTICS MODULE: ANOVA OK. We understand the concept, we have the hypotheses, we have the assumptions; now we need a test statistic. In ANOVA, we use the F-distribution. In the science of statistics, whenever you need to evaluate a ratio of variances you will be using an F-statistic. The ratio in question here is:
F = (the variation BETWEEN the groups) / (the variation WITHIN the groups)
Question: what kind of value would indicate difference versus no difference?

STATISTICS MODULE: ANOVA Returning to the diet plans…
PLAN      Weight loss           Mean
PLAN A    14 20 22 26 27        20.50
PLAN B    15 18 23 25 28 30     23.17
PLAN C    32 36 40 42 45        40.00
PLAN D    33 38 44 46 47        41.67
OVERALL MEAN                    31.33

STATISTICS MODULE: ANOVA Our hypothesis statements would be:
H0: The four diet plans have the same results (the mean weight loss is the same).
H1: At least one of the diet plans has a different result (the mean weight loss is different).
We will now calculate our test statistic:
F = (the variation BETWEEN the groups) / (the variation WITHIN the groups)

STATISTICS MODULE: ANOVA To calculate the F-Statistic, we use the following table:
SOURCE    SUM OF SQUARES    DEGREES OF FREEDOM   MEAN SQUARE           F-stat
BETWEEN   SSB               # levels - 1         SSB/(# levels - 1)    {SSB/(# levels - 1)} / {SSW/(n - # levels)}
WITHIN    SSW               n - # levels         SSW/(n - # levels)
TOTAL     SST (SSB + SSW)   n - 1

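STATISTICS MODULE: ANOVA For those who are interested: SST = SSW + SSB
$$\sum_{i,j} (X_{ij} - \bar{X})^2 = \sum_{i,j} (X_{ij} - \bar{X}_j)^2 + \sum_{j} n_j (\bar{X}_j - \bar{X})^2$$
Here X_ij is observation i in group j, \bar{X}_j is the mean of group j, \bar{X} is the overall mean, and n_j is the number of observations in group j.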
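The decomposition can be verified numerically on the diet data. This is a sketch under one assumption stated loudly: the transcript lists only five values for plans A, C and D, so the sixth values (14, 45 and 42) are inferred from the slide's group means rather than read from it.

```r
# ASSUMPTION: the sixth value for plans A, C and D (14, 45, 42) is inferred
# so that the group means match the slide; it is not in the transcript.
loss <- c(14, 14, 20, 22, 26, 27,    # plan A
          15, 18, 23, 25, 28, 30,    # plan B
          32, 36, 40, 42, 45, 45,    # plan C
          33, 38, 42, 44, 46, 47)    # plan D
plan <- factor(rep(c("A", "B", "C", "D"), each = 6))

grand_mean <- mean(loss)
group_mean <- ave(loss, plan)        # each observation's own group mean

sst <- sum((loss - grand_mean)^2)          # total variation
ssw <- sum((loss - group_mean)^2)          # variation within groups
ssb <- sum((group_mean - grand_mean)^2)    # between groups; each group's gap
                                           # is counted n_j times

print(round(c(SST = sst, SSW = ssw, SSB = ssb), 2))
print(all.equal(sst, ssw + ssb))
```

Summing over observations rather than over groups is what supplies the n_j weight in the between-groups term.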

STATISTICS MODULE: ANOVA For the present problem:
SOURCE        SUM OF SQUARES   DEGREES OF FREEDOM   MEAN SQUARE   F-stat
BETWEEN (1)   2195.67          3                    731.89        24.33
WITHIN (2)    601.67           20                   30.08
TOTAL         2797.34          23
(1) SSB = 6(10.83^2 + 8.17^2 + 8.67^2 + 10.33^2) = 2195.67
(2) SSW = (159.50 + 166.83 + 134 + 141.33) = 601.67

STATISTICS MODULE: ANOVA Now, what to do with an F-statistic of 24.33? This is a fairly strong statistic. Recall that a variance ratio near 1 is what we expect when the null is true; as the variance ratio grows larger than 1, we can more confidently reject the null. As with all test statistics, this result will translate into a p-value. The p-value associated with this statistic is less than .001. Based upon this result, we can confidently reject the null hypothesis and conclude that at least one of the plan means is different.
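The step from F-statistic to p-value can be sketched with base R's F-distribution functions, using the slide's values (F = 24.33 with 3 and 20 degrees of freedom). The variable names and the use of a .05 level for the critical value are illustrative choices.

```r
# Values taken from the ANOVA table on the slide.
f_stat     <- 24.33
df_between <- 3      # number of levels - 1
df_within  <- 20     # n - number of levels

# Upper-tail probability: chance of an F this large or larger under the null.
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Critical value at alpha = .05, for comparison with the observed F.
f_crit <- qf(0.95, df_between, df_within)

print(p_value)
print(f_crit)
print(f_stat > f_crit)
```

Only the upper tail is used because a large ratio of between-group to within-group variation is the evidence against the null.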

STATISTICS MODULE: ANOVA We are going to use some simple ANOVA code:
    a1 <- aov(y ~ x)
Where y is the quantitative continuous variable and x is the categorical variable (stored as a factor) with more than 2 levels.
    a1
    summary(a1)
    require(graphics)
    summary(a1 <- aov(y ~ x))
    TukeyHSD(a1, "x", ordered = TRUE)
    plot(TukeyHSD(a1, "x"))
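As a worked illustration, the same calls can be run on the diet-plan data. This is a sketch, not the course's own script: the names `diet`, `plan` and `loss` are made up here, and because the transcript shows only five values for plans A, C and D, a sixth value for each (14, 45 and 42) is inferred so that the results match the slides.

```r
# ASSUMPTION: the sixth value for plans A, C and D (14, 45, 42) is inferred
# from the slide's group means; it is not shown in the transcript.
diet <- data.frame(
  plan = factor(rep(c("A", "B", "C", "D"), each = 6)),
  loss = c(14, 14, 20, 22, 26, 27,
           15, 18, 23, 25, 28, 30,
           32, 36, 40, 42, 45, 45,
           33, 38, 42, 44, 46, 47)
)

# One-way ANOVA: weight loss explained by diet plan.
fit <- aov(loss ~ plan, data = diet)
anova_table <- summary(fit)[[1]]     # sums of squares, df, mean squares, F, p
print(anova_table)
f_value <- anova_table[["F value"]][1]

# Tukey's honest significant differences: which pairs of plans differ?
tukey <- TukeyHSD(fit, "plan", ordered = TRUE)
print(tukey)
plot(tukey)                          # intervals that exclude 0 indicate a difference
```

The ANOVA only says that some plan differs; the Tukey intervals are what identify which pairs do, while keeping the familywise error rate controlled.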