Sociology 5811: Lecture 13: ANOVA Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

Announcements Midterm in one week Bring a Calculator Bring Pencil/Eraser Mid-term Review Sheet handed out today New topic today: ANOVA

5811 Midterm Exam Exam Topics: All class material and readings up through ANOVA Emphasis: conceptual understanding, interpretation Memorization of complex formulas not required I will provide a “formula sheet”… But, formulas won’t be labeled! Exam Format: Mix of short-answer and longer questions Mix of math problems and conceptual questions.

Review: Mean Difference Tests For any two sample means, the difference will also fall in a certain range. Example: Group 1 means range from 6.0 to 8.0 Group 2 means range from 1.0 to 2.0 Difference in means will range from 4.0 to 7.0 If it is improbable that the sampling distribution overlaps with zero, then the population means probably differ A corollary of the C.L.T. provides formulas to estimate the standard error of the difference in means
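As a rough illustration of that corollary, here is a minimal sketch (with invented sample values) that estimates the standard error of a difference in means as sqrt(s1²/n1 + s2²/n2):

```python
import numpy as np

# Hypothetical samples (the values are invented for illustration)
group1 = np.array([6.5, 7.2, 8.0, 6.9, 7.5])
group2 = np.array([1.2, 1.8, 1.5, 1.1, 1.9])

# Estimated standard error of the difference in means:
# sqrt(s1^2/n1 + s2^2/n2), using sample variances (ddof=1)
se_diff = np.sqrt(group1.var(ddof=1) / len(group1) +
                  group2.var(ddof=1) / len(group2))

print("difference in means:", group1.mean() - group2.mean())
print("estimated SE of the difference:", se_diff)
```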

Z-tests and T-tests If N is large, we can do a Z-test: Z = (Y-bar 1 − Y-bar 2) / SE(Y-bar 1 − Y-bar 2) This Z-score for the difference in means indicates how far the difference in means falls from zero (measured in “standard errors”) If Z is large, we typically reject the null hypothesis… group means probably differ.

Z-tests and T-tests If N is small, but the samples are drawn from normal populations with equal variance, we can do a t-test: t = (Y-bar 1 − Y-bar 2) / SE(Y-bar 1 − Y-bar 2), where the standard error is based on the pooled sample variance Small N requires this different formula to determine the standard error of the difference in means Again: large t = reject the null hypothesis
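A minimal sketch of such a t-test using SciPy's pooled-variance (equal_var=True) independent-samples test; the data values are made up for illustration:

```python
from scipy import stats

# Hypothetical small samples (invented values)
group1 = [6.5, 7.2, 8.0, 6.9, 7.5]
group2 = [1.2, 1.8, 1.5, 1.1, 1.9]

# equal_var=True requests the classic pooled-variance t-test
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=True)
print("t =", t_stat, "p =", p_value)  # large |t| and small p -> reject H0
```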

T-Test for Mean Difference Question: What if you wanted to compare 3 or more groups, instead of just two? Example: Test scores for students in different educational tracks: honors, regular, remedial Can you use T-tests for 3+ groups? Answer: Sort of… You can do a T-test for every combination of groups e.g., honors & regular, honors & remedial, regular & remedial But, the possibility of a Type I error proliferates… roughly 5% for each test. With 5 groups there are 10 pairwise tests, so the chance of at least one Type I error approaches 40%.
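To make that arithmetic concrete, a small sketch that treats the pairwise tests as if they were independent (an approximation, since the comparisons share data):

```python
from math import comb

alpha = 0.05
for n_groups in (2, 3, 4, 5):
    n_tests = comb(n_groups, 2)              # number of pairwise t-tests
    familywise = 1 - (1 - alpha) ** n_tests  # P(at least one Type I error)
    print(f"{n_groups} groups -> {n_tests} tests, familywise error ~ {familywise:.2f}")
```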

ANOVA ANOVA = “ANalysis Of VAriance” “Oneway ANOVA”: the simplest form ANOVA lets us test whether any group mean differs from the mean of all groups combined Answers: “Are all groups equal or not?” H0: All groups have the same population mean: μ1 = μ2 = μ3 = μ4 H1: One or more groups differ But, it doesn’t distinguish which specific group(s) differ Maybe only μ2 differs, or maybe all differ.

ANOVA and T-Tests ANOVA and T-Test are similar Many sociological research problems can be addressed by either of them But, they rely on very different mathematical approaches If you want to compare two groups, both work If there are many groups, people usually use ANOVA Also, there are more advanced forms of ANOVA that are very useful.

ANOVA: Example Suppose you suspect that a firm is engaging in wage discrimination based on ethnicity Certain groups might be getting paid more or less… The company counters: “We pay entry-level workers all about the same amount of money. No group gets preferential treatment.” Given data on a sample of employees, ANOVA lets you test this hypothesis. Are observed group differences just due to chance? Or do they reflect differences in the underlying population? (i.e., the whole company)

ANOVA: Example The company has workers of three ethnic groups: Whites, African-Americans, Asian-Americans You observe: Y-bar White = $8.78 / hour Y-bar AfAm = $8.52 / hour Y-bar AsianAm = $8.91 / hour Even if all groups had the same population mean (μWhite = μAfAm = μAsianAm), samples differ randomly Question: Are the observed differences so large it is unlikely that they are due to random error? Thus, it is unlikely that: μWhite = μAfAm = μAsianAm

ANOVA: Concepts & Definitions The grand mean is the mean of all groups ex: mean of all entry-level workers = $8.75/hour The group mean is the mean of a particular sub-group of the population As usual, we hope to make inferences about population grand and group means, even though we only have samples and observed grand and group means We know Y-bar, Y-bar White, Y-bar AfAm, Y-bar AsianAm We want to infer about: μWhite, μAfAm, μAsianAm

ANOVA: Concepts & Definitions Hourly wage is the dependent variable We are looking to see if wage “depends” upon the particular group a person is in The effect of a group is the difference between that group’s mean and the grand mean Effect is denoted by alpha (α) If the grand mean is $8.75 and Y-bar White = $8.90, then α White = $0.15 Effect of being in group j is: α j = μ j − μ Calculated for samples as: Y-bar j − Y-bar It is like a deviation, but for a group
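A small sketch of these quantities with hypothetical wage data for the three groups (the individual values are invented; only the logic of grand mean, group mean, and effect matters):

```python
import numpy as np

# Hypothetical hourly wages by group (invented values)
wages = {
    "White":   np.array([8.50, 8.90, 9.00, 8.70]),
    "AfAm":    np.array([8.40, 8.60, 8.50, 8.60]),
    "AsianAm": np.array([8.80, 9.00, 8.90, 8.95]),
}

all_wages = np.concatenate(list(wages.values()))
grand_mean = all_wages.mean()                 # the grand mean

for group, values in wages.items():
    group_mean = values.mean()                # the group mean
    effect = group_mean - grand_mean          # estimated group effect (alpha-hat)
    print(group, "mean:", round(group_mean, 2), "effect:", round(effect, 2))
```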

ANOVA: Concepts & Definitions ANOVA is based on partitioning deviation We initially calculated deviation as the distance of a point from the grand mean: Y i − Y-bar But, you can also think of deviation from a group mean (called “e”): e i = Y i − Y-bar j Or, for any case i in group j: e ij = Y ij − Y-bar j Thus, the deviation (from the group mean) of the 27th person in group 4 is: e 27,4 = Y 27,4 − Y-bar 4

ANOVA: Concepts & Definitions The location of any case is determined by: The grand mean, μ, common to all cases The group “effect”, α j, common to group members: the distance between a group and the grand mean The within-group deviation (e), called “error”: the distance from the group mean to a case’s value

The ANOVA Model This is the basis for a formal model: For any population with mean μ, comprised of J subgroups with N j cases in each group, each with a group effect α j, the location of any individual can be expressed as follows: Y ij = μ + α j + e ij Y ij refers to the value of case i in group j e ij refers to the “error” (i.e., deviation from the group mean) for case i in group j
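A sketch that simulates data from this one-way model, just to make the notation concrete; the grand mean, group effects, and error spread below are arbitrary choices, not estimates from any real firm:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 8.75                                                  # grand mean (illustrative)
alpha = {"White": 0.15, "AfAm": -0.20, "AsianAm": 0.05}    # group effects (sum to 0)
n_per_group = 50

data = {}
for group, effect in alpha.items():
    e = rng.normal(0.0, 0.5, n_per_group)  # e_ij: individual error around the group mean
    data[group] = mu + effect + e          # Y_ij = mu + alpha_j + e_ij

for group, values in data.items():
    print(group, "sample mean:", round(values.mean(), 2))
```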

Sum of Squared Deviation We are most interested in two parts of the model: The group effects: α j, the deviation of the group from the grand mean Individual case error: e ij, the deviation of the individual from the group mean Both are deviations that can be “summed up” Remember, we square deviations when summing Otherwise, they add up to zero Remember, variance is just averaged squared deviation.

Sum of Squared Deviation The total deviation can be partitioned into α j and e ij components. That is, α j + e ij = total deviation: (Y ij − Y-bar) = (Y-bar j − Y-bar) + (Y ij − Y-bar j)

Sum of Squared Deviation The total deviation can be partitioned into α j and e ij components. The total variance (SS total) is made up of: α j: between-group variance (SS between) e ij: within-group variance (SS within) SS total = SS between + SS within

Sum of Squared Deviation Given a sample with J sub-groups, the formula for the total squared deviation can be re-written as follows: SS total = Σ j Σ i (Y ij − Y-bar)² This is called the “Total Sum of Squares” (SS total)

Sum of Squared Deviation The between-group (α) variance is the distance from the grand mean to each group mean (summed for all cases): SS between = Σ j N j (Y-bar j − Y-bar)² The within-group variance (e) is the distance from each case to its group mean (summed): SS within = Σ j Σ i (Y ij − Y-bar j)²
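A sketch of the three sums of squares computed directly from hypothetical wage data (invented values), including a check that the decomposition SS total = SS between + SS within holds:

```python
import numpy as np

# Hypothetical hourly wages by group (invented values)
wages = {
    "White":   np.array([8.50, 8.90, 9.00, 8.70]),
    "AfAm":    np.array([8.40, 8.60, 8.50, 8.60]),
    "AsianAm": np.array([8.80, 9.00, 8.90, 8.95]),
}

all_wages = np.concatenate(list(wages.values()))
grand_mean = all_wages.mean()

ss_total = ((all_wages - grand_mean) ** 2).sum()
ss_between = sum(len(v) * (v.mean() - grand_mean) ** 2 for v in wages.values())
ss_within = sum(((v - v.mean()) ** 2).sum() for v in wages.values())

print("SS total   :", round(ss_total, 4))
print("SS between :", round(ss_between, 4))
print("SS within  :", round(ss_within, 4))
print("between + within:", round(ss_between + ss_within, 4))  # equals SS total
```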

Sum of Squared Variance The sum of squares grows as N gets larger. To derive a more comparable measure, we “average” it, just as with the variance: i.e., divide by N−1 It is desirable, for similar reasons, to “average” the Sums of Squares between/within The result is the “Mean Square” variance: MS between and MS within

Sum of Squared Variance Choosing the relevant denominators, we get: MS between = SS between / (J − 1) MS within = SS within / (N − J)

Mean Squares and Group Differences Question: Which suggests that group means are quite different? –MS between > MS within or MS between < MS within

Mean Squares and Group Differences [figures contrasting the two cases: MS between > MS within vs. MS between < MS within]

Mean Squares and Group Differences Question: Which suggests that group means are quite different? MS between > MS within or MS between < MS within Answer: If between-group variance is greater than within-group variance, the groups are quite distinct It is unlikely that they came from populations with the same mean But, if within is greater than between, the groups aren’t very different – they overlap a lot It is plausible that μ1 = μ2 = μ3 = μ4

The F Ratio The ratio of MS between to MS within is referred to as the F ratio: F = MS between / MS within If MS between > MS within, then F > 1 If MS between < MS within, then F < 1 A higher F indicates that the groups are more separate
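A sketch of the mean squares and the F ratio for the same kind of invented wage data; J is the number of groups and N the total number of cases:

```python
import numpy as np

groups = [
    np.array([8.50, 8.90, 9.00, 8.70]),   # hypothetical "White" wages
    np.array([8.40, 8.60, 8.50, 8.60]),   # hypothetical "AfAm" wages
    np.array([8.80, 9.00, 8.90, 8.95]),   # hypothetical "AsianAm" wages
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
N, J = len(all_values), len(groups)

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (J - 1)    # between-group mean square
ms_within = ss_within / (N - J)      # within-group mean square
F = ms_between / ms_within
print("MS between:", round(ms_between, 4), "MS within:", round(ms_within, 4), "F:", round(F, 2))
```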

The F Ratio The F ratio has a sampling distribution That is, estimates of F vary depending on exactly which sample you draw Again, this sampling distribution has known properties that can be looked up in a table The “F-distribution” –Different from z & t! Statisticians have determined how much area falls under the curve for a given value of F… So, we can test hypotheses.

The F Ratio Assumptions required for hypothesis testing using an F-statistic: 1. The J groups are drawn from normally distributed populations 2. The population variances of the groups are equal If these assumptions hold, the F statistic can be looked up in an F-distribution table Much like t distributions But, there are 2 degrees of freedom: J−1 and N−J One for the number of groups, one for N

The F Ratio Example: Looking for wage discrimination within a firm The company has workers of three ethnic groups: Whites, African-Americans, Asian-Americans You observe in a sample of 200 employees: Y-bar White = $8.78 / hour Y-bar AfAm = $8.52 / hour Y-bar AsianAm = $8.91 / hour

The F Ratio Suppose you calculate the following from your sample: F = 6.24 Recall that N = 200, J = 3 Degrees of Freedom: J−1 = 2, N−J = 197 If α = .05, the critical F value for 2, 197 degrees of freedom is 3.00 (see Knoke, p. 514) The observed F easily exceeds the critical value Thus, we can reject H0; we can conclude that the groups do not all have the same population mean
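The same lookup can be done in software instead of a printed table; a sketch with SciPy (the exact critical value for 2 and 197 degrees of freedom comes out around 3.04, consistent with the tabled value of roughly 3.0):

```python
from scipy import stats

F_observed = 6.24
df_between, df_within = 2, 197   # J - 1 and N - J

critical_F = stats.f.ppf(0.95, df_between, df_within)    # alpha = .05 critical value
p_value = stats.f.sf(F_observed, df_between, df_within)  # P(F >= 6.24) under H0

print("critical F:", round(critical_F, 3))   # ~3.04
print("p-value   :", round(p_value, 4))      # well below .05 -> reject H0
```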

Comparison with T-Test T-test strategy: Determine the sampling distribution of the mean… Use that info to assess the probability that the groups have the same mean (difference in means = 0) ANOVA strategy: Compute the F-ratio, which indicates which kind of deviation is larger: “between” vs. “within” group A high F-value indicates the groups are separate Note: For two groups, ANOVA and the T-test produce identical results.
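A quick sketch of that equivalence with two invented groups: the one-way ANOVA F statistic equals the square of the pooled-variance t statistic, and the p-values match:

```python
from scipy import stats

# Two hypothetical groups (invented scores)
boys = [72, 68, 75, 70, 66, 74]
girls = [78, 74, 80, 77, 73, 79]

t_stat, t_p = stats.ttest_ind(boys, girls, equal_var=True)
F_stat, F_p = stats.f_oneway(boys, girls)

print("t^2 =", round(t_stat ** 2, 4), " F =", round(F_stat, 4))   # identical
print("t-test p =", round(t_p, 4), " ANOVA p =", round(F_p, 4))   # identical
```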

Bivariate Analyses Up until now, we have focused on a single variable: Y Even in the T-test for difference in means and ANOVA, we just talked about Y – but for multiple groups… Alternatively, we can think of these as simple bivariate analyses, where group type is a “variable” Ex: Seeing if girls differ from boys on a test is equivalent to examining whether gender (a first variable) affects test score (a second variable).

2 Groups = Bivariate Analysis [tables: Group 1 (Boys) with columns Case, Score and Group 2 (Girls) with columns Case, Score, rewritten as a single table with columns Case, Gender, Score] 2 Groups = Bivariate analysis of Gender and Test Score
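A small sketch of the same reorganization in code: two separate score lists become a single table in which group membership (Gender) is just another variable; the scores are invented:

```python
import pandas as pd

# Hypothetical test scores (invented values)
boys = [72, 68, 75, 70]
girls = [78, 74, 80, 77]

# One table, with group membership stored as a variable
df = pd.DataFrame({
    "Gender": ["Boy"] * len(boys) + ["Girl"] * len(girls),
    "Score": boys + girls,
})

print(df)
print(df.groupby("Gender")["Score"].mean())  # group means, as in the two-group comparison
```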

T-test, ANOVA, and Regression Both the T-test and ANOVA illustrate fundamental concepts needed to understand “Regression” Relevant ANOVA concepts: the idea of a “model”, partitioning variance, a dependent variable Relevant T-test concepts: using the t-distribution for hypothesis tests Note: For many applications, regression will supersede the T-test and ANOVA, but in some cases they are still useful…