STAT 651 Lecture #12. Copyright (c) Bani K. Mallick.

Topics in Lecture #12: The ANOVA F-test, and the basics of the F-table.

Book Sections Covered in Lecture #12: Chapter

Relevant SPSS Tutorials: ANOVA-GLM; post hoc tests.

ANalysis Of VAriance: We now turn to making inferences when there are 3 or more populations. This is classically called ANOVA. It is somewhat more formula-dense than what we have been used to, and tests for normality are also somewhat more complex.

ANOVA: Suppose we form three populations on the basis of body mass index (BMI), splitting subjects into low, medium, and high BMI groups. This forms 3 populations. We want to know whether the three populations have the same mean caloric intake, or if their food composition differs.

ANOVA: If you do lots of 95% confidence intervals, you'd expect by chance that about 5% will be wrong. Thus, if you do 20 confidence intervals, you expect 1 (= 20 x 5%) will not include the true population parameter. This is a fact of life.
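As a quick check of that arithmetic, here is a minimal sketch (my own illustration, not part of the original slides, and it assumes the 20 intervals are independent):

```python
# Multiple-comparisons arithmetic for 20 independent 95% confidence intervals.
expected_misses = 20 * 0.05              # about 1 interval expected to miss the truth
p_at_least_one_miss = 1 - 0.95 ** 20     # chance that at least one interval misses (~0.64)
print(expected_misses, round(p_at_least_one_miss, 2))
```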

ANOVA: One procedure that is often followed is to do a preliminary test to see whether there are any differences among the populations. Then, once you conclude that some differences exist, you allow somewhat more informality in deciding where those differences manifest themselves. The first step is the ANOVA F-test.

ANOVA: Consider the ACS data, with 3 BMI groups, measuring the % calories from fat (first FFQ). What is your preliminary conclusion about differences in means/medians? About differences in variability? About massive outliers?

ANOVA: Consider the ACS data, with 3 BMI groups, measuring the % calories from fat (first FFQ).

ANOVA: The F-test is easy to compute, and is provided in all statistical packages. The populations are 1, 2, ..., t. The sample sizes are n_1, n_2, ..., n_t. The population means are μ_1, μ_2, ..., μ_t. The hypothesis to test is H_0: μ_1 = μ_2 = ... = μ_t.

ANOVA: The data from population i are Y_i1, Y_i2, ..., Y_in_i. The sample mean from population i is Ȳ_i. The sample mean of all the data is Ȳ. The total sample size is the total number of observations, called n_T = n_1 + n_2 + ... + n_t.

ANOVA: The ANOVA Table (demo in SPSS in class with the ACS data). "Analyze" > "Compare Means" > "One-Way ANOVA": I'll now show you what each item is.
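Outside SPSS, the same one-way ANOVA F-test can be run in a few lines; here is a hedged sketch using SciPy's f_oneway (the three lists are invented numbers standing in for the ACS groups, not the actual data):

```python
from scipy import stats

# Invented %-calories-from-fat values for three BMI groups
# (illustration only, NOT the ACS data analyzed in the lecture).
low_bmi    = [30.1, 34.5, 32.2, 36.8, 35.0, 31.7]
medium_bmi = [33.9, 36.2, 38.5, 35.4, 37.1, 34.8]
high_bmi   = [36.0, 39.4, 41.2, 38.7, 40.5, 37.9]

# One-way ANOVA F-test of H0: all three population means are equal
f_stat, p_value = stats.f_oneway(low_bmi, medium_bmi, high_bmi)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```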

ANOVA: The sample mean from population i is Ȳ_i; the sample mean of all the data is Ȳ. The idea of the F-test is based on distances. The distance of the data to the overall mean is TSS = Total Sum of Squares = Σ_i Σ_j (Y_ij − Ȳ)².

ANOVA: The distance of the data to the overall mean is TSS = Total Sum of Squares = Σ_i Σ_j (Y_ij − Ȳ)². This has n_T − 1 degrees of freedom.

ANOVA: Next comes the "Between Groups" row.

ANOVA: The sum of squares between groups is SSB = Σ_i n_i (Ȳ_i − Ȳ)². It has t − 1 degrees of freedom, so the number of populations is the degrees of freedom between groups + 1.

ANOVA: Next comes the "Within Groups" row.

ANOVA: The distance of the observations to their sample means is SSW = Σ_i Σ_j (Y_ij − Ȳ_i)². This is the Sum of Squares Within. It has n_T − t degrees of freedom.

ANOVA: Next come the "Mean Squares." These are the different sums of squares divided by their degrees of freedom: MSB = SSB/(t − 1) and MSW = SSW/(n_T − t).

ANOVA: Next comes the F-statistic. It is the ratio of the mean square between groups to the mean square within groups.

ANOVA: The F-statistic is F = MSB / MSW = [SSB/(t − 1)] / [SSW/(n_T − t)].

ANOVA: The F-statistic is F = MSB / MSW. What values (large or small) indicate differences? Clearly large ones, since if the population means are equal, the sample means will be close together, and the numerator will be near 0.
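To make the pieces above concrete, here is a small sketch (my own made-up data, not the lecture's) that builds TSS, SSB, SSW, the mean squares, and F directly from the definitions:

```python
import numpy as np

# Made-up samples from t = 3 populations (illustration only)
groups = [np.array([4.1, 5.0, 5.6, 4.8]),
          np.array([5.9, 6.4, 7.1, 6.6]),
          np.array([7.8, 8.2, 7.5, 8.9])]

t = len(groups)                              # number of populations
n_T = sum(len(g) for g in groups)            # total sample size
all_y = np.concatenate(groups)
grand_mean = all_y.mean()                    # sample mean of all the data

tss = ((all_y - grand_mean) ** 2).sum()                             # Total Sum of Squares
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)    # Between Groups
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)              # Within Groups

msb = ssb / (t - 1)          # mean square between groups
msw = ssw / (n_T - t)        # mean square within groups
F = msb / msw                # the ANOVA F-statistic

print(np.isclose(tss, ssb + ssw))   # the decomposition TSS = SSB + SSW holds
print(round(F, 3))
```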

Why do they call it ANOVA? ANOVA = ANalysis Of VAriance. I want to show you the concept in graphs, because these become important in STAT 652. I will illustrate the idea with samples from two populations: the first population will be in red, the second in green, and when pooled I will use blue.

Why do they call it ANOVA? (Graph: the data from the first population, showing the total distance of the observations to their sample mean.)

Why do they call it ANOVA? I will use a symbol (shown in the graph) to denote total distance: the total distance of the observations to their sample mean.

Why do they call it ANOVA? Consider two similar populations. Summing the two symbols gives the SSW = Sum of Squared distances Within the two samples.

Why do they call it ANOVA? Now pool the two similar populations. The blue symbol represents the sum of squared distances of the total sample to the total sample mean. This is the Total Sum of Squares, or TSS.

Why do they call it ANOVA? Now pool the two similar populations. The blue symbol represents the sum of squared distances of the total sample to the total sample mean; this is the Total Sum of Squares, or TSS. Note how the SSW and the TSS are about the same: when this happens, it indicates equal means for the populations.

Why do they call it ANOVA? Now note what happens if the population means are different. Note how the TSS has greatly increased, while the SSW is the same as before (remember, we add the squared distances separately for the two populations). It is this phenomenon that the F-test measures.

ANOVA: The F-statistic is compared to the F-distribution with t − 1 and n_T − t degrees of freedom. See Table 8, which lists the cutoff points in terms of α. If the F-statistic exceeds the cutoff, you reject the hypothesis of equality of all the means. SPSS gives you the p-value (significance level) for this test.
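The table look-up can also be done with SciPy's F distribution; this sketch uses t = 3 groups and a total sample size of 184 (quoted later in the lecture) together with the F value 5.689 reported in the SPSS output:

```python
from scipy import stats

t, n_T = 3, 184          # number of groups and total sample size (from later slides)
f_stat = 5.689           # F-statistic reported in the SPSS output
alpha = 0.05

cutoff = stats.f.ppf(1 - alpha, dfn=t - 1, dfd=n_T - t)   # Table 8 style cutoff point
p_value = stats.f.sf(f_stat, dfn=t - 1, dfd=n_T - t)      # upper-tail p-value

print(f"cutoff = {cutoff:.3f}")     # reject H0 if F exceeds this
print(f"p-value = {p_value:.4f}")   # about 0.004, matching the slide
```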

ANOVA: I have used the language of the book up to this point, which coincides with the SPSS one-way ANOVA output. For our purposes, the general linear model is more useful. It allows us to compare all the populations simultaneously, and it also allows us to check assumptions about normality. Unfortunately, it uses slightly different terminology.

ANOVA: For our purposes, the general linear model is more useful. (This slide shows the ANOVA procedure output and the General Linear Model procedure output side by side: note how the F-statistics and p-values are identical.)
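The same "two views of one test" idea can be reproduced outside SPSS; here is a hedged sketch with statsmodels (invented data, with ffq and bmigroup as placeholder names for the lecture's variables):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

# Invented baseline-FFQ values in three BMI groups (illustration only)
df = pd.DataFrame({
    "ffq": [30.1, 34.5, 32.2, 36.8, 35.0, 38.2, 33.9, 40.1, 37.5],
    "bmigroup": ["low"] * 3 + ["medium"] * 3 + ["high"] * 3,
})

# One-way ANOVA view
f_stat, p_value = stats.f_oneway(*[g["ffq"].values for _, g in df.groupby("bmigroup")])

# General linear model view: the group effect has the same F and p-value
fit = smf.ols("ffq ~ C(bmigroup)", data=df).fit()
anova_table = sm.stats.anova_lm(fit, typ=2)

print(round(f_stat, 3), round(p_value, 4))
print(anova_table)
```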

ANOVA: For our purposes, the general linear model is more useful. (Side by side: the one-way ANOVA table for Baseline FFQ, with rows Between Groups / Within Groups / Total, and the GLM Tests of Between-Subjects Effects table, with rows Corrected Model / Intercept / BMIGROUP / Error / Total / Corrected Total; both report Sum of Squares, df, Mean Square, F, and Sig., and the GLM output adds R Squared = .059, Adjusted R Squared = .049.) "Between Groups" corresponds to the "Corrected Model" row, or equivalently the row carrying the variable name (BMIGROUP).

ANOVA: For our purposes, the general linear model is more useful. (Same two tables as the previous slide.) "Within Groups" corresponds to the "Error" row.

ANOVA: For our purposes, the general linear model is more useful. (Same two tables as the previous slide.) "Total" corresponds to the "Corrected Total" row.

ANOVA: If the populations have a common variance σ², the mean squared error estimates it. For 2 populations, the mean squared error equals the square of the pooled standard deviation. Hence, the estimated common s.d. is the square root of the mean squared error.
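Here is a small check of that identity (my own two-group numbers, not the lecture data): the within-groups mean square equals the pooled variance from the two-sample t-procedure, so its square root estimates the common standard deviation:

```python
import numpy as np

y1 = np.array([4.0, 5.0, 6.0, 7.0])      # made-up sample from population 1
y2 = np.array([6.0, 8.0, 9.0, 11.0])     # made-up sample from population 2

# Pooled variance, as used by the pooled two-sample t-procedures
sp2 = ((len(y1) - 1) * y1.var(ddof=1) + (len(y2) - 1) * y2.var(ddof=1)) / (len(y1) + len(y2) - 2)

# Within-groups mean square (SSW divided by n_T - t, here t = 2)
ssw = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
msw = ssw / (len(y1) + len(y2) - 2)

print(np.isclose(sp2, msw))   # True: the mean squared error equals the squared pooled s.d.
print(np.sqrt(msw))           # estimated common standard deviation
```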

ANOVA: There appears to be some difference. 95% CI for the difference in population means between the high and low BMI groups: 2.26 to ___. 95% CI for the difference between the medium and low BMI groups: ___ to 4.31. High & medium: 1.08 to 8.65. Conclusions?

ANOVA: The ANOVA Table (demo in SPSS in class with the ACS data).

ANOVA: The ANOVA Table. The number of populations is t = 3, so the degrees of freedom for the model (BMIGROUP) is t − 1 = 2.

ANOVA: The ANOVA Table. The total sample size is 184, so the degrees of freedom for the corrected total is 184 − 1 = 183.

ANOVA: The mean square between groups and the mean square within groups appear in the SPSS output; the F-statistic is their ratio: 5.689.

ANOVA: The p-value is 0.004: what does this mean? What was the null hypothesis?

ANOVA: The p-value is 0.004: what does this mean? What was the null hypothesis? That the population means are all equal.

ANOVA: The p-value is 0.004: what does this mean? We have more than 99% confidence that the null hypothesis is false. What was the null hypothesis? That the population means are all equal.