Multiple Comparisons Q560: Experimental Methods in Cognitive Science Lecture 10.


The problem with t-tests… We could compare three groups with multiple t-tests: M1 vs. M2, M1 vs. M3, M2 vs. M3. But this causes our chance of a Type I error (alpha) to compound with each test we do: experimentwise error = 1 - (1 - alpha)^n for n tests. Testwise error: probability of a Type I error on any one statistical test. Experimentwise error: probability of a Type I error over all statistical tests in an experiment. ANOVA keeps our experimentwise error = alpha.
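To make the compounding concrete, here is a minimal Python sketch (an editorial illustration, not part of the original slides) that evaluates 1 - (1 - alpha)^n for a few values of n:

alpha = 0.05
for n_tests in (1, 3, 6, 10):
    experimentwise = 1 - (1 - alpha) ** n_tests  # chance of at least one Type I error
    print(f"{n_tests:2d} tests -> experimentwise error = {experimentwise:.3f}")

Three tests (the three-group case above) already push the error rate to about .143, which is why ANOVA is used to hold the experimentwise rate at alpha.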

Pairwise Comparisons The overall significance test is called the omnibus ANOVA. When we reject H0 in an ANOVA, we just know there is a difference somewhere… we need to do some detective work to find it. Testwise error is p(Type I error) on any one test; experimentwise error is p(Type I error) over a series of separate hypothesis tests. We have to make sure we do not exceed a .05 chance of a Type I error while we “investigate”.

ANOVA CSI Detective The omnibus ANOVA tells you a variance crime has been committed: at least one pair of means is “guilty” of causing the overall omnibus variance. Your job is to investigate further and figure out where the abnormal variance is coming from.

Pairwise Comparisons Planned comparisons (a priori tests) are based on theory and are planned before the data are collected. Unplanned comparisons (post-hoc tests) are “fishing expeditions” after the data have been observed. Planned comparisons are preferred because they have a much smaller chance of making a Type I error. “Even a stopped clock is right twice a day.”

Post-Hoc Tests Exploring the data after H0 has been rejected requires specific tests to control experimentwise error. The simplest possible approach is to follow up with t-tests using a Bonferroni correction to alpha (the Dunn test): alpha is divided across the number of comparisons. Use paired t-tests (RM ANOVA) or independent t-tests (one-way ANOVA).
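For illustration (hypothetical scores, Python rather than SPSS), a Bonferroni-corrected set of independent-samples follow-up t-tests might look like this:

from itertools import combinations
from scipy import stats

# Hypothetical raw scores for three independent groups
groups = {
    "Placebo": [0, 1, 2, 1, 1],
    "Drug A":  [2, 3, 1, 2, 2],
    "Drug B":  [4, 5, 3, 4, 4],
}

pairs = list(combinations(groups, 2))
alpha_per_test = 0.05 / len(pairs)  # Bonferroni: divide alpha across comparisons

for g1, g2 in pairs:
    t, p = stats.ttest_ind(groups[g1], groups[g2])
    verdict = "significant" if p < alpha_per_test else "ns"
    print(f"{g1} vs {g2}: t = {t:.2f}, p = {p:.4f} -> {verdict} at alpha = {alpha_per_test:.4f}")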

Post-Hoc Tests Other post-hoc tests have specific methods for controlling experimentwise error. There are over a dozen unique tests; we’ll just look at three. Tukey’s HSD: HSD = q * sqrt(MS_within / n), where q is the studentized range statistic. You look up the value for q in a table using k (# of treatment groups) and df_within. HSD is the minimal mean difference for significance.

Tukey’s HSD Data for three drugs designed to act as pain relievers, plus a placebo control: Placebo, Drug A, Drug B, Drug C. [Raw-score table not reproduced in the transcript; the group means appear on the next slide.]

Tukey’s HSD For our drug experiment: k = 4, n = 5, df_W = 16, MS_W = 2.00. We look up q from the table and get q = 4.05, so HSD = 4.05 * sqrt(2.00 / 5) = 2.56: mean differences greater than 2.56 are significant. Our means from the drug study were 1, 2, 4, and 5. So: 1 < 4, 5 and 2 < 5; all other means are statistically equal.
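The same arithmetic in Python (an editorial sketch using the slide’s values; q = 4.05 is the tabled studentized range value for k = 4 and df_within = 16):

import math
from itertools import combinations

q, ms_within, n = 4.05, 2.00, 5
hsd = q * math.sqrt(ms_within / n)  # minimal mean difference for significance
print(f"HSD = {hsd:.2f}")           # 2.56

means = {"Placebo": 1, "Drug A": 2, "Drug B": 4, "Drug C": 5}
for g1, g2 in combinations(means, 2):
    diff = abs(means[g1] - means[g2])
    print(f"{g1} vs {g2}: |diff| = {diff} -> {'significant' if diff > hsd else 'ns'}")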

Scheffé’s Test Scheffé’s test is considered one of the most conservative (cautious) tests. It uses the MS between the two treatments you are comparing, but uses the MS error from the omnibus ANOVA and k - 1 as the numerator df. So the critical value for a Scheffé F-ratio is the same as it was for the omnibus ANOVA.

Even though you’re only comparing two means, they were selected from k overall means, so k is used to determine the df. From the drug experiment, let’s compare Placebo to Drug C (M = 1 vs. M = 5): the SS between these two groups is 5(1 - 3)^2 + 5(5 - 3)^2 = 40, so F = (40 / 3) / 2.00 = 6.67 > F_crit(3, 16) = 3.24, so Placebo < Drug C.
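A sketch of that computation in Python (my reconstruction of the slide’s arithmetic, not the original worksheet):

# Scheffé comparison of Placebo (M = 1) vs. Drug C (M = 5) from the drug study
m1, m2, n = 1, 5, 5      # the two group means and per-group n
k, ms_within = 4, 2.00   # k and MS_within come from the omnibus ANOVA

grand = (m1 + m2) / 2                                     # mean of the two groups
ss_between = n * ((m1 - grand) ** 2 + (m2 - grand) ** 2)  # = 40
f_scheffe = (ss_between / (k - 1)) / ms_within            # = 6.67
print(f"F = {f_scheffe:.2f} vs F_crit(3, 16) = 3.24")     # 6.67 > 3.24: Placebo < Drug C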

Planned vs. Unplanned Comparisons Consider an experiment with 5 groups: we have 10 possible pairwise comparisons (1 vs. 2, 1 vs. 3, etc.). Assume that the null is true, but by chance two of the means are far enough apart to erroneously reject (i.e., the data contain a Type I error). If you plan a single comparison beforehand, you have a 1/10 chance of selecting the one comparison that happened to have a Type I error.

Planned vs. Unplanned Comparisons If you look at the data first, you are certain to make a Type I error: you are doing all the comparisons in your head, even though you only do the math for one. If you plan the comparisons beforehand (and they are a subset of all comparisons), p(Type I) is much lower than if you snoop at the data first.
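A quick simulation of this logic (an editorial addition, not from the slides): under a true null, testing only the largest observed difference ("snooping") rejects far more often than a single preplanned comparison:

import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(1)
k, n, alpha, reps = 5, 10, 0.05, 2000
planned_hits = snoop_hits = 0

for _ in range(reps):
    data = [rng.normal(0, 1, n) for _ in range(k)]  # complete null: all means equal
    pvals = [stats.ttest_ind(data[i], data[j]).pvalue
             for i, j in combinations(range(k), 2)]
    planned_hits += pvals[0] < alpha   # one preplanned comparison (1 vs. 2)
    snoop_hits += min(pvals) < alpha   # "snoop": test the best-looking pair

print(f"planned Type I rate ~ {planned_hits / reps:.3f}")  # close to .05
print(f"snooping Type I rate ~ {snoop_hits / reps:.3f}")   # much higher, roughly .25-.30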

Some of these look redundant to me. I’m only interested in Control < 2 and 3.

“The complete null” We don’t always care about the complete null; I’m only interested in Control < 2 and 3.

Since the two comparisons I need to do are a priori and independent, I don’t inflate the familywise (FW) error rate. This really just involves picking contrast coefficients, e.g., in my active-passive category learning experiment.

Planned Comparisons Planned comparisons (a priori tests) are based on theory and are planned before the data are collected. If comparisons are planned in advance, the likelihood of making a Type I error is smaller than if the comparisons were made on a post-hoc basis (because we guess at a subset of hypotheses). If you are making all pairwise comparisons, it won’t make a difference whether the comparisons were planned in advance or not.

Orthogonal Linear Contrasts We can define a linear combination of weighted means for a particular hypothesis: L = c1*M1 + c2*M2 + … + ck*Mk. We set the condition that the weights sum to zero (c1 + c2 + … + ck = 0) for a linear contrast.

We also want our contrasts to be orthogonal, that is, not to contain overlapping amounts of information. Knowing that M1 is greater than the mean of M2 and M3 does tell us that M1 is likely greater than M3, so those two comparisons overlap in information. When members of a set of contrasts are independent of each other, they are called orthogonal. Conditions for orthogonality: a1*b1 + a2*b2 + … + ak*bk = 0, where a and b are the weights for two different contrasts, and the # of comparisons = # of df for treatments (k - 1).

Branching example with five groups (1, 2, 3, 4, 5) and contrast coefficients:
(1, 2) vs. (3, 4, 5): 3, 3, -2, -2, -2
(1) vs. (2): 1, -1, 0, 0, 0
(3) vs. (4, 5): 0, 0, 2, -1, -1
(4) vs. (5): 0, 0, 0, 1, -1
We start at the trunk; once we have formed two branches, we never compare groups on one limb to those on a different limb.

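These conditions are easy to verify in code; a small Python check using the branching coefficients above (an editorial sketch, not from the slides):

from itertools import combinations

contrasts = {
    "(1,2) vs (3,4,5)": [3, 3, -2, -2, -2],
    "(1) vs (2)":       [1, -1, 0, 0, 0],
    "(3) vs (4,5)":     [0, 0, 2, -1, -1],
    "(4) vs (5)":       [0, 0, 0, 1, -1],
}

for name, c in contrasts.items():
    assert sum(c) == 0, f"{name}: weights must sum to zero"  # contrast condition

for (name_a, a), (name_b, b) in combinations(contrasts.items(), 2):
    dot = sum(x * y for x, y in zip(a, b))
    print(f"{name_a} x {name_b}: sum(a*b) = {dot}")  # all zero -> orthogonal

Four contrasts = k - 1 = 4 df for treatments, as required.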

Trend Analysis Often we are not interested in differences between groups per se, but rather in the overall trend across groups (especially in RM ANOVA): linear, quadratic, cubic, etc. For k = 4 groups the coefficients are: Linear: -3, -1, 1, 3; Quadratic: 1, -1, -1, 1; Cubic: -1, 3, -3, 1. These come from tables of orthogonal polynomial coefficients, or (fortunately) SPSS has them built in for us.
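As a sketch (hypothetical means, Python rather than SPSS), applying these coefficients to four ordered group means:

linear    = [-3, -1, 1, 3]
quadratic = [1, -1, -1, 1]
cubic     = [-1, 3, -3, 1]

means = [2.0, 3.5, 5.1, 6.4]  # hypothetical means for 4 ordered conditions

for name, c in [("linear", linear), ("quadratic", quadratic), ("cubic", cubic)]:
    L = sum(w * m for w, m in zip(c, means))
    print(f"{name} contrast: L = {L:.2f}")

L is large for the linear contrast and near zero for the others, suggesting a linear trend; with equal n per group, SS_contrast = n * L**2 / sum(c_j**2) feeds the usual F test.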

Multiple Comparisons w/ SPSS Let’s use the category learning experiment to test our post-hoc and a priori hypotheses in SPSS. Category learning: if the information sampled is the important factor in learning, then we expect the two “intelligent” sampling conditions to outperform the “random” sampling condition.

Experiment 1 Conditions (Exploration / Exemplar Sampling): Random (Passive / Uniform), Generate (Active / “Intelligent”), Yoked (Passive / “Intelligent”). 480 training trials (with feedback), followed by 480 test trials (no feedback).
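The slides run this analysis in SPSS; as a rough Python analogue (simulated accuracy scores, an assumption rather than the course data), statsmodels’ pairwise_tukeyhsd performs the same post-hoc test:

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
# Simulated test accuracies: the two "intelligent" sampling conditions
# are set higher than random sampling, per the hypothesis on the slide.
scores = np.concatenate([
    rng.normal(0.60, 0.10, 20),  # Random
    rng.normal(0.75, 0.10, 20),  # Generate
    rng.normal(0.73, 0.10, 20),  # Yoked
])
conditions = ["Random"] * 20 + ["Generate"] * 20 + ["Yoked"] * 20

print(pairwise_tukeyhsd(scores, conditions, alpha=0.05))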