Multiple Group X² Designs & Follow-up Analyses


Multiple Group X² Designs & Follow-up Analyses
- X² for multiple condition designs
- Pairwise comparisons & RH: testing
- Alpha inflation
- Effect sizes for k-group X²
- Power analysis for k-group X²
- gof-X² & RH: testing

ANOVA vs. X²

Same distinction as before:
- ANOVA -- BG design with a quantitative DV
- X² -- BG design with a qualitative/categorical DV

While quantitative outcome variables have long been more common in psychology, there has been an increase in the use of qualitative outcome variables during the last several years, e.g.:
- improvement vs. no improvement
- diagnostic category
- preference, choice, selection, etc.

For example… I created a new treatment for social anxiety that uses a combination of group therapy (requiring clients to get used to talking with other folks) and cognitive self-appraisal (getting clients to notice when they are and are not socially anxious). Volunteer participants were randomly assigned to the treatment condition or a no-treatment control. I personally conducted all the treatment conditions to assure treatment integrity.

Here are my results, using a DV that measures whether or not each participant was "socially comfortable" in a large-group situation:

                                   Comfortable   Not comfortable
  Group therapy & self-appraisal       45              25
  Cx

  X²(1) = 9.882, p =

Which of the following statements will these results support?

"Here is evidence that the combination of group therapy & cognitive self-appraisal increases 'social comfort'."
  Yep -- a treatment comparison causal statement.

"You can see that the treatment works because of the cognitive self-appraisal; the group therapy doesn't really contribute anything."
  Nope -- an identification-of-a-causal-element statement; we can't separate the roles of group therapy & self-appraisal.

Same story... I created a new treatment for social anxiety that uses a combination of group therapy (requiring clients to get used to talking with other folks) and cognitive self-appraisal (getting clients to notice when they are and are not socially anxious). Volunteer participants were randomly assigned to the treatment condition or a no-treatment control. I personally conducted all the treatment conditions to assure treatment integrity.

What conditions would we need to add to the design to directly test the second of these causal hypotheses: "The treatment works because of the cognitive self-appraisal; the group therapy doesn't really contribute anything."?

  Group therapy & self-appraisal
  Group therapy
  Self-appraisal
  No-treatment control

Let's keep going… Here's the design we decided upon:

  Group therapy & self-appraisal
  Group therapy
  Self-appraisal
  No-treatment control

Assuming the results from the earlier study replicate, we'd expect to get the means shown below. What responses for the other two conditions would provide support for the RH:, "The treatment works because of the cognitive self-appraisal; the group therapy doesn't really contribute anything."?

Omnibus X² vs. Pairwise Comparisons

Omnibus X²
- overall test of whether there are any response pattern differences among the multiple IV conditions
- tests the H0: that all the response patterns are equal

Pairwise comparison X²
- specific tests of whether or not each pair of IV conditions has a response pattern difference

How many pairwise comparisons?
- formula, with k = # of IV conditions:

    # pairwise comparisons = [k * (k-1)] / 2

- or just remember a few of the common ones:
    3 groups = 3 pairwise comparisons
    4 groups = 6 pairwise comparisons
    5 groups = 10 pairwise comparisons
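The counting formula above, and the enumeration of which 2x2 follow-ups to run, can be sketched in a few lines of Python (the condition names Tx1, Tx2, Cx are just illustrative placeholders):

```python
from itertools import combinations

def n_pairwise(k):
    """Number of pairwise comparisons among k IV conditions: k*(k-1)/2."""
    return k * (k - 1) // 2

conditions = ["Tx1", "Tx2", "Cx"]          # hypothetical 3-group design
pairs = list(combinations(conditions, 2))   # the 2x2 follow-ups to run

print(n_pairwise(3))   # 3
print(pairs)           # [('Tx1', 'Tx2'), ('Tx1', 'Cx'), ('Tx2', 'Cx')]
```

Note how quickly the count grows: `n_pairwise(5)` already gives 10 follow-up tests, which is why alpha inflation (below) becomes a concern.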

Pairwise Comparisons for X²

Using the Effect Size Computator, just plug in the cell frequencies for any 2x2 portion of the k-group design.

There is a mini critical-value table included, to allow H0: testing and p-value estimation.

It also calculates the effect size of the pairwise comparison -- more on that later…

Example of pairwise analysis of a multiple IV condition design

Omnibus test (Tx1 vs. Tx2 vs. Cx; comfortable vs. not comfortable):

  X²(2) = 7.641, p = .034

Pairwise 2x2 follow-ups (C = comfortable, ~C = not comfortable):

  Tx1 vs. Tx2:  X²(1) = .388,  p > .05    Retain H0: Tx1 = Tx2
  Tx2 vs. Cx:   X²(1) = 4.375, p < .05    Reject H0: Tx2 > Cx
  Tx1 vs. Cx:   X²(1) = 6.549, p < .05    Reject H0: Tx1 > Cx
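Each of these 2x2 follow-ups is an ordinary Pearson chi-square on a 2x2 table, which has a convenient closed form. A minimal sketch (the cell frequencies in the usage line are hypothetical, not the slide's actual data):

```python
def chi2_2x2(a, b, c, d):
    """Pearson X² for a 2x2 frequency table laid out as:
         row 1 (condition 1):  a   b
         row 2 (condition 2):  c   d
    X² = N*(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)], with df = 1."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical comfortable / not-comfortable counts for two conditions:
x2 = chi2_2x2(45, 25, 28, 42)
print(round(x2, 2))   # 8.27 -- exceeds the X²(1, .05) critical value of 3.84
```

Running this once per pair of IV conditions reproduces the follow-up logic above; only the critical value used for H0: testing changes if an alpha correction is applied.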

What to do when you have a RH:

The RH: was, "In terms of the % who show improvement, immediate feedback (IF) is the best, with delayed feedback (DF) doing no better than the no feedback (NF) control."

Determine the pairwise comparisons and how the RH: applies to each:

  IF vs. DF:  IF > DF
  IF vs. NF:  IF > NF
  DF vs. NF:  DF = NF

Run the omnibus X² (IF vs. DF vs. NF; improve vs. not improve) -- is there a relationship?

  X²(2) = , p < .001

Perform the pairwise X² analyses (i = improve, ~i = not improve):

  IF vs. DF:  X²(1) = 22.384, p < .001   Reject H0: IF > DF
  IF vs. NF:  X²(1) = 3.324,  p > .05    Retain H0: IF = NF
  DF vs. NF:  X²(1) = 9.137,  p < .005   Reject H0: DF < NF

Determine what part(s) of the RH: were supported by the pairwise comparisons:

  RH: IF > DF   supported
  RH: IF > NF   not supported
  RH: DF = NF   not supported

We would conclude that the RH: was partially supported!

Remember that pairwise comparisons are the same thing as simple analytic comparisons. It is also possible to perform complex comparisons with X².

The RH: was, "In terms of the % who show improvement, those receiving feedback (FB = IF & DF combined) will do better than those receiving the no feedback (NF) control."

  FB vs. NF:  X²(1) = .661, p > .05   Retain H0: FB = NF

As with ANOVA, complex comparisons can be misleading if interpreted improperly -- we would not want to say that "both types of feedback are equivalent to no feedback"; that statement is false based on the pairwise comparisons.

Alpha Inflation

Alpha inflation
- the increasing chance of making a Type I error as more pairwise comparisons are conducted

Alpha correction
- adjusting the set of tests of pairwise differences to "correct for" alpha inflation
- so that the overall chance of committing a Type I error is held at 5%, no matter how many pairwise comparisons are made

There is no equivalent to HSD for X² follow-ups. We can "Bonferronize": test each comparison at p = .05 / #comps to hold the experiment-wise Type I error rate to 5%:

  2 comps  ->  X²(1, .025)  = 5.02
  3 comps  ->  X²(1, .0167) = 5.73
  4 comps  ->  X²(1, .0125) = 6.24
  5 comps  ->  X²(1, .01)   = 6.63

As with ANOVA -- when you use a more conservative approach, you can find a significant omnibus effect but not find anything to be significant when doing the follow-ups!
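The Bonferroni-adjusted critical values in the table can be reproduced without a lookup table. For df = 1, the chi-square p-value equals erfc(sqrt(X²/2)), so the critical value is found by solving that equation for X²; a simple bisection (a sketch, using only the standard library) does the job:

```python
import math

def chi2_crit_df1(alpha):
    """Critical X² value for df = 1 at significance level alpha.
    For df = 1 the chi-square p-value is erfc(sqrt(x/2)), so we
    solve erfc(sqrt(x/2)) = alpha for x by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.erfc(math.sqrt(mid / 2)) > alpha:
            lo = mid   # p-value still too large -> need a bigger X²
        else:
            hi = mid
    return (lo + hi) / 2

# Bonferroni-adjusted critical values from the slide's table:
for comps in (2, 3, 4, 5):
    print(comps, round(chi2_crit_df1(0.05 / comps), 2))
# prints: 2 5.02 / 3 5.73 / 4 6.24 / 5 6.63
```

This matches the slide's table and makes the trade-off concrete: with 5 comparisons, each pairwise X² must exceed 6.63 rather than the usual 3.84.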

k-group Effect Sizes

When you have more than 2 groups, it is possible to compute the effect size for "the whole study": include the X² and the total N, and click the button for df > 1.

However, this type of effect size is not very helpful, because:
- you don't know which pairwise comparison(s) make up the r
- it can only be compared to other designs with exactly the same combination of conditions

Pairwise Effect Sizes

Just as RH: for k-group designs involve comparing 2 groups at a time (pairwise comparisons), the most useful effect sizes for k-group designs are computed as the effect size for 2 groups (effect sizes for pairwise comparisons).

The Effect Size Computator calculates the effect size for each pairwise X² it computes.
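A common way to convert a 2x2 X² into an effect size r is the phi coefficient, r = sqrt(X² / N). The slides don't spell out the Computator's formula, so treat this as the standard textbook conversion rather than a description of that tool; the N in the usage line is hypothetical:

```python
import math

def chi2_to_r(chi2, n):
    """Standard conversion of a 2x2 X² into an effect size r (phi):
    r = sqrt(X² / N).  Assumed formula -- the lecture's Effect Size
    Computator is not specified in detail."""
    return math.sqrt(chi2 / n)

print(round(chi2_to_r(9.882, 140), 2))   # hypothetical N = 140 -> 0.27
```

The same r can then feed directly into the power analyses described next.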

k-group Power Analyses

As before, there are two kinds of power analyses:

A priori power analysis
- conducted before the study is begun
- start with r & the desired power to determine the needed N

Post hoc power analysis
- conducted after retaining H0:
- start with r & N and determine power & the Type II error probability

Power Analyses for k-group designs

Important symbols:
- S is the total # of participants in that pairwise comparison
- n = S / 2 is the # of participants in each condition of that pairwise comparison
- N = n * k is the total number of participants in the study

Example: the smallest pairwise X² effect size for a 3-BG study was r = .25; with r = .25 and 80% power, S = 120.
- for each of the 2 conditions:  n = S / 2 = 120 / 2 = 60
- for the whole study:           N = n * k = 60 * 3 = 180
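The bookkeeping from S (per-comparison total, taken from a power table) to the whole-study N is simple arithmetic, sketched here:

```python
def study_n(s_pairwise, k):
    """Sample-size bookkeeping from the slide: a power table gives S
    participants for one pairwise comparison, so n = S/2 per condition,
    and the whole k-group study needs N = n * k."""
    n_per_condition = s_pairwise // 2
    return n_per_condition * k

print(study_n(120, 3))   # n = 60 per condition -> N = 180
```

Because the design's smallest pairwise effect size drives S, powering the study for that weakest comparison guarantees adequate power for all the others.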

As X² designs get larger, the required 2x2 follow-up analyses can get out of hand pretty quickly. For example, consider this design:

  IV conditions:  Group therapy & self-appraisal / Group therapy / Self-appraisal / No-treatment control
  Outcome:        Improve / Stay same / Get worse

This would require 18 2x2 comparisons: 6 for the pairwise comparisons among the 4 IV conditions, for each of improve/same, same/worse, and improve/worse.

The maximum experiment-wise alpha would be 18 * .05, or a 90% chance of making at least one Type I error. To correct for this we'd need to use a p-value of .05 / 18 = .003 for each of the 18 comparisons, which, in turn, greatly increases the chances of making Type II errors.

Another approach to analyzing larger designs is to use gof-X² to describe the response pattern of each condition, or to test RH: that are phrased in terms of response pattern predictions.

  IV conditions:  Group therapy & self-appraisal / Group therapy / Self-appraisal / No-treatment control
  Outcome:        Improve / Stay same / Get worse

For this design we would run 4 gof-X² analyses, one per condition.

As with the 2x2 follow-ups, there is no equivalent to HSD for gof-X²:
- one approach is to use p = .01 for each comparison, reducing the alpha inflation
- another is to "Bonferronize": p = .05 / #comps, to hold the experiment-wise Type I error rate to 5%

The RH: for this study was, "The treatment works because of the cognitive self-appraisal; the group therapy doesn't really contribute anything."

  IV conditions:  Group therapy & self-appraisal / Group therapy / Self-appraisal / No-treatment control
  Outcome:        Improve / Stay same / Get worse

Based on this we would expect that both the combined and self-appraisal conditions would have more "improve" responses than "stay same" or "get worse" responses. We would also expect a "flat" response profile for both the no-treatment and group therapy conditions.

For the Group Therapy & Self-Appraisal condition, to perform the gof-X² we need the expected frequencies for the equiprobability H0:. With n = 60 and equiprobability across the 3 outcome categories, the expected frequency for each category is 60 / 3 = 20.

- Enter the expected frequencies (usually representing equiprobability)
- Enter the cell frequencies
- Be sure to click the blue compute button

With df = 2 (k - 1), X²(.01) = 9.21, and so p < .01. We'd conclude that this condition does not show equiprobability and that its response pattern matches the RH:.
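The gof-X² itself is just the sum of (observed - expected)² / expected across the outcome categories. A minimal sketch, with an equiprobability H0: as the default (the observed counts in the usage line are hypothetical, not the slide's actual data):

```python
def gof_chi2(observed, expected=None):
    """Goodness-of-fit X²: sum of (O - E)² / E across categories.
    If no expected frequencies are given, an equiprobability H0: is
    assumed, so each E = n / #categories."""
    n = sum(observed)
    if expected is None:
        expected = [n / len(observed)] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical improve / stay-same / get-worse counts with n = 60;
# each expected frequency is 20, df = 3 - 1 = 2, X²(2, .01) = 9.21.
print(round(gof_chi2([40, 12, 8]), 2))   # 30.4 -> reject equiprobability
```

A "flat" profile such as [21, 20, 19] would instead yield a tiny X², consistent with retaining the equiprobability H0:.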

Here are the results of the follow-up analyses:

  Group therapy & self-appraisal:  X²(2) = 47.5,  p < .001
  Self-appraisal:                  X²(2) = 36.11, p < .001
  Group therapy:                   X²(2) = .71,   p > .05
  No-treatment control:            X²(2) = .37,   p > .05

We would conclude that there is complete support for the RH: that "The treatment works because of the cognitive self-appraisal; the group therapy doesn't really contribute anything."

There are a couple of problems with X² follow-ups that you should consider…

The follow-up analyses -- both the 2x2 and the gof -- have substantially less power than the omnibus test.
- So, it is possible to find a "significant overall effect" that isn't anywhere in the follow-ups.
- The likelihood of this increases if you use alpha correction.

Neither the 2x2 nor the gof analyses are really "complete".
- Both analyses tell you that there is a pattern, but not what the pattern is.
- Some recommend using 2-cell gof analyses to identify the specific location of the pattern; others point out the enormous alpha inflation (or alpha correction) involved. For the example, each of the 18 2x2 follow-ups that is significant would require 2 additional 2-cell gof tests -- a great many follow-up analyses for a 3x4 design!