
1 Statistical Randomization Techniques for Single-Case Intervention Data and Introduction to ExPRT Statistical Software Joel R. Levin University of Arizona

2 Purpose of This Presentation (Déjà Vu?)
To broaden your minds by introducing you to new and exciting, scientifically credible single-case intervention design-and-analysis possibilities. To let you know that these procedures are becoming increasingly acceptable to single-case intervention researchers and are beginning to appear in the SCD research literature. Whether YOU ever choose to adopt them in your own SCD research (after this week!) is entirely up to you.

3 A Permutation-Test Primer (Based on Levin, 2007)
• Rationale and assumptions
• Samples and populations
• Individuals and groups
• Scores and ranks
• Exact probabilities and sampling approximations for scores
Levin, J. R. (2007). Randomization tests: Statistical tools for assessing the effects of educational interventions when resources are scarce. In S. Sawilowsky (Ed.), Real data analysis (pp ). Greenwich, CT: Information Age.

4 Is an Instructional Intervention Effective?
From a total of 6 elementary school classrooms, 3 are randomly assigned to receive an instructional intervention that is designed to boost students’ academic performance (intervention classrooms), while students in the other 3 classrooms continue to receive their regular instruction (control classrooms). Following the instructional phase of the experiment, students in all classrooms are administered a 50-point achievement test and the average test performance within each classroom is calculated. Of interest in this study is the mean achievement-test difference between the 3 intervention classrooms and the 3 control classrooms.

5 Data-Analysis Rationale
The obtained mean difference is examined in the context of the distribution of all possible mean differences that can be generated by assigning the 6 obtained classroom means to two instructional conditions, assuming that 3 classroom means must be assigned to each condition. A statistical test is then conducted by addressing the question: How (un)likely or (im)probable is what actually occurred (i.e., the obtained intervention-control mean difference) in relation to everything that could have occurred (i.e., the distribution of all possible intervention-control mean differences, given the study's design structure and the set of means produced)? Should the result of the foregoing test be deemed statistically improbable (e.g., p < .05), then the researcher would conclude that the two instructional methods differ with respect to students' average achievement-test performance.

6 How Do I Test These? Let Me Count the Ways
In how many different ways can 6 scores be assigned to 2 groups, if 3 scores must end up in each group? That is the same thing as asking how many different combinations of 3 objects there are if selected from a total of 6 objects. So as not to waste time attempting to express that quantity symbolically here: The answer, my friends, boils down to 6!/3!3! = 20. For example, given 6 scores, we can systematically count the specific ways that 3 of them could be assigned to Group 1. (Note: The order in which the 3 scores are listed is not important.)
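The count of 20 can be verified directly with Python's standard library; this is a quick sketch unrelated to any software mentioned in the presentation, and the score labels are placeholders.

```python
from math import comb
from itertools import combinations

# 6!/(3!3!) ways to choose which 3 of the 6 scores form Group 1
print(comb(6, 3))                                   # 20

# Enumerating them gives the same count; order within a group is
# irrelevant, which is why these are combinations, not permutations.
groups = list(combinations(["s1", "s2", "s3", "s4", "s5", "s6"], 3))
print(len(groups))                                  # 20
```

Choosing the 3 scores for Group 1 fixes the other 3 as Group 2, so 20 is also the total number of possible group assignments.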

7 (slide image not captured in the transcript)

8 Example 1: Classroom Means and All Possible Assignments of Them to the Two Conditions (N1 = N2 = 3)
Condition 1 Condition M Difference (C1-C2) , 40.1, 39.6 (122.3, 40.8) , 36.7, 36.3 (110.6, 36.9) * p1 = 1/20 = .05 , 40.1, 37.6 (120.3, 40.1) , 36.7, 36.3 (112.6, 37.5) p2 = 2/20 = .10 , 40.1, 36.7 (119.4, 39.8) , 37.6, 36.3 (113.5, 37.8) , 40.1, 36.3 (119.0, 39.7) , 37.6, 36.7 (113.9, 38.0) , 39.6, 37.6 (119.8, 39.9) , 36.7, 36.3 (113.1, 37.7) , 39.6, 36.7 (118.9, 39.6) , 37.6, 36.3 (114.0, 38.0) , 39.6, 36.3 (118.5, 39.5) , 37.6, 36.7 (114.4, 38.1) , 37.6, 36.7 (116.9, 39.0) , 39.6, 36.3 (116.0, 38.7) , 37.6, 36.3 (116.5, 38.83) 40.1, 39.6, 36.7 (116.4, 38.80) , 36.7, 36.3 (115.6, 38.5) , 39.6, 37.6 (117.3, 39.1) , 39.6, 37.6 (117.3, 39.1) , 36.7, 36.3 (115.6, 38.5) , 39.6, 36.7 (116.4, 38.80) 42.6, 37.6, 36.3 (116.5, 38.83) , 39.6, 36.3 (116.0, 38.7) , 37.6, 36.7 (116.9, 39.0) , 37.6, 36.7 (114.4, 38.1) , 39.6, 36.3 (118.5, 39.5) , 37.6, 36.3 (114.0, 38.0) , 39.6, 36.7 (118.9, 39.6) , 36.7, 36.3 (113.1, 37.7) , 39.6, 37.6 (119.8, 39.9) , 37.6, 36.7 (113.9, 38.0) , 40.1, 36.3 (119.0, 39.7) , 37.6, 36.3 (113.5, 37.8) , 40.1, 36.7 (119.4, 39.8) , 36.7, 36.3 (112.6, 37.5) , 40.1, 37.6 (120.3, 40.1) , 36.7, 36.3 (110.6, 36.9) , 40.1, 39.6 (122.3, 40.8)

9 What Summary Measure(s) Can/Should Be Analyzed by The To-Be-Presented Techniques?
• Means (Levels) of the phases
• Medians
• Truncated/Censored data based on a priori rules
• Even randomly selected observations?
• Slopes (Trends) of the phases
• Variances of the phases
• Any “Predicted Pattern” within or between phases
• Special considerations when groups are the units of intervention administration

10 Why Randomization Statistical Tests?
• Conceptually and computationally straightforward ‒ easy to explain to others; parsimonious
• Logically consistent connection to the design’s randomization components ‒ implications for Type I error control
• Underlying statistical assumptions
• Statistical power characteristics
• Versatility and adaptability relative to other single-case statistical approaches

11 Randomized Adaptations of Traditional Single-Case Designs
• Phase/Intervention Randomization (Within Cases) ‒ the order of the A and B phases/interventions is randomly determined for each case (e.g., participant, pair, group, classroom)
• Intervention Randomization (Between Cases) ‒ cases are randomly assigned to interventions
• Intervention Start-Point Randomization ‒ the transition point from one phase to the next is randomly determined for each case
• Case Randomization ‒ cases are randomly assigned to positions within the design (typically, to the “tiers” of a multiple-baseline design)

12 Background Eugene Edgington’s randomization-test contributions
Randomized phase/intervention designs
• Basic design (AB/BA) and “return to baseline” design (ABA/BAB), including when A and B consist of two different interventions
• Reversal design (ABAB/BABA) and alternating treatment design
Randomized intervention start-point designs
• Basic design (AB) and “return to baseline” design (ABA), including when A and B consist of two different interventions
• Reversal design (ABAB) and alternating treatment design
• Single-case crossover design
• Multiple-baseline design
Replications (i.e., multiple units) and extensions of these

13 Bottom-Line Conclusions Based on Statistical Simulation Studies
Tests of Change in Level
1. ABAB…AB and Alternating Treatment Designs (Levin, Ferron, & Kratochwill, 2012)
2. Variety of AB Designs (Levin, Ferron, & Gafurov, 2014)
3. Multiple-Baseline Designs (Levin, Ferron, & Gafurov, 2016, 2017)
Tests of Change in Trend or Variability
Multiple-Baseline Designs (Levin, Ferron, & Gafurov, 2018)

14 Effect-Size Alert for Single-Case Research Outcomes, or Don’t “Dis” Large Effect Sizes Here!
Marquis et al. (2000) noted in their meta-analysis of positive behavior support that “[t]he smallest [conventional effect-size measure] for outcomes was 1.5, which would be considered quite large in a group study context” (p. 165); and that their effect-size estimates “ranged from 1.5 standardized units to 3.1 units” (p. 167). Rogers and Graham (2008, p. 885) indicated that “[W]hen we have used [the conventional method of effect-size calculation in meta-analyses of] single subject design studies in writing, the effect sizes are typically 3.0 and higher.” In a single-case enuresis-treatment study conducted by Miller (1973), the conventional effect sizes calculated for the two participants were 5.98 and 6.41 (Busk & Serlin, 1992, p ).

15 References
Busk, P. L., & Serlin, R. C. (1992). Meta-analysis for single-case research. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis (pp ). Hillsdale, NJ: Erlbaum.
Marquis, J. G., Horner, R. H., Carr, E. G., Turnbull, A. P., Thompson, M., Behrens, G. A., et al. (2000). A meta-analysis of positive behavior support. In R. Gersten, E. P. Schiller, & S. Vaughn (Eds.), Contemporary special education research: Syntheses of the knowledge base on critical instructional issues (pp ). Mahwah, NJ: Erlbaum.
Miller, P. M. (1973). An experimental analysis of retention control training in the treatment of nocturnal enuresis in two institutional adolescents. Behavior Therapy, 4,
Rogers, L. A., & Graham, S. (2008). A meta-analysis of single subject design writing intervention research. Journal of Educational Psychology, 100,

16 Randomized Intervention Start-Point Designs and Analyses
1. Basic AB Design

17 Adapted from Levin, J. R., & Wampold, B. E. (1999). Generalized single-case randomization tests: Flexible analyses for a variety of situations. School Psychology Quarterly, 14, 59–93.

18 (slide image not captured in the transcript)

19 Randomized Intervention Start-Point Designs and Analyses
1. Basic AB Design 2. Replicated AB Design

20 Replicated AB Design With Three Cases (“Units”), Two Within-Series Intervention Conditions, 20 Time Periods, and 13 Potential Intervention Points for Each Case
Marascuilo, L. A., & Busk, P. L. (1988). Combining statistics for multiple-baseline AB and replicated ABAB designs across subjects. Behavioral Assessment, 10, 1-28.

21 Randomized Intervention Start-Point Designs and Analyses
1. AB Design 2. Replicated AB Design 3. Paired AB Design

22 Levin & Wampold’s (1999) Simultaneous Start-Point Model
Time Period 1–20
Pair 1X: A A A A A A A A A A A* B B B B B B B B B
Pair 1Y: A A A A A A A A A A A* B B B B B B B B B
Note: Potential intervention start points are between Time Periods 5 and 17 inclusive. *Randomly selected intervention start point for the pair of units.
Levin, J. R., & Wampold, B. E. (1999). Generalized single-case randomization tests: Flexible analyses for a variety of situations. School Psychology Quarterly, 14, 59–93.

23 From Levin, J. R., & Wampold, B. E. (1997, July). Single-case randomization tests for a variety of situations. Paper presented at the 10th European Meeting of the Psychometric Society, Santiago de Compostela, Spain.

24 Means and Mean Differences Associated With Each of the 9 Potential Intervention Start Points, by Reinforcer Type
(Table of Social and Token A, B, and B−A phase means, their Token−Social differences, and difference ranks; the numerical entries were not captured in the transcript.)

25 If it had been predicted that there would be a greater improvement in level (mean) from Phase A to Phase B with token reinforcers than with social reinforcers, then the significance probability associated with the actually obtained randomization-test outcome is given by: p = 4/9 = .444. Consequently, we would conclude that the prediction has not been supported at any reasonable α level. (Note that in this example, based on only 9 potential intervention start points (and, therefore, a total of only 9 randomization-test outcomes), the lowest p-value possible is 1/9 = .111.)
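The p-value logic of this slide can be sketched in a few lines. The nine difference scores below are hypothetical stand-ins (the slide's actual table values did not survive transcription), arranged so that the obtained outcome is the 4th largest of the 9, which reproduces p = 4/9.

```python
# Nine hypothetical token-minus-social improvement differences, one per
# potential intervention start point; these values are illustrative only.
diffs = [1.2, 0.8, 2.5, 3.1, -0.4, 1.9, 0.1, 2.8, -1.0]
observed = 1.9            # suppose this was the randomly chosen start point

# One-tailed test: how often does the randomization distribution meet or
# exceed what actually occurred?
p = sum(1 for d in diffs if d >= observed) / len(diffs)
print(round(p, 3))        # 0.444 — not significant at any conventional α
```

The same counting rule gives the floor noted on the slide: with only 9 outcomes in the distribution, the smallest attainable p is 1/9 = .111.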

26 Randomized Intervention Start-Point Designs and Analyses
1. AB Design 2. Replicated AB Design 3. Paired AB Design 4. Replicated Paired AB Design

27 Levin & Wampold’s (1999) Replicated Simultaneous Start-Point Model
Time Period 1–20
Pair 1X: A A A A A A A A A A A* B B B B B B B B B
Pair 1Y: A A A A A A A A A A A* B B B B B B B B B
Pair 2X: A A A A A A A A A* B B B B B B B B B B B
Pair 2Y: A A A A A A A A A* B B B B B B B B B B B
Note: Potential intervention start points are between Time Periods 5 and 17 inclusive. *Randomly selected intervention start point for each pair of units.
Levin, J. R., & Wampold, B. E. (1999). Generalized single-case randomization tests: Flexible analyses for a variety of situations. School Psychology Quarterly, 14, 59–93.

28 Randomized Intervention Start-Point Designs and Analyses
1. AB Design 2. Replicated AB Design 3. Paired AB Design 4. Replicated Paired AB Design 5. “Dual Order” AB Design

29 Dual-Order AB Designs
A novel approach for improving the power of single-case randomization tests
Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2014). Improved randomization tests for a class of single-case intervention designs. Journal of Modern Applied Statistical Methods, 13(2), 2-52.
Onghena, P., Vlaeyen, J. W. S., & de Jong, J. (2007). Randomized replicated single-case experiments: Treatment of pain-related fear by graded exposure in vivo. In S. Sawilowsky (Ed.), Real data analysis (pp ). Greenwich, CT: Information Age.

30 Suppose that in a 16-observation design, A and B are either a baseline phase and an intervention phase or two different interventions that a single case is to receive. The case is assigned randomly to one of the two phase orders (AB or BA). [With a true baseline-intervention design this can be accomplished by including one or more mandatory baseline/adaptation observations (A') for both phase orders.] The random assignment of phase orders is required for the subsequent AB randomization test (modified Edgington test) to be valid. In addition, the case receives a randomly selected intervention start point, with an a priori specification of 10 potential start points, from Observations 4 through 13 inclusive.

31 AB Randomized Phase-Order Design (With Mandatory Initial A' Baseline Phase)
With the original Edgington (1975) model, the study can be diagrammed as:
A' A' A' A A A B B B* B B B B B B B B B B
However, with the revised Edgington model, the opposite “pretend” ordering of As and Bs was also possible and therefore can be included in the randomization distribution:
A' A' A' B B B A A A* A A A A A A A A A A

32 AB Randomized Phase-Order Design (With Mandatory Initial A' Baseline Phase)
Single-order (AB) randomization: With 10 potential intervention start points, even if the observed outcome were the most extreme, the lowest attainable one-tailed significance probability would be p = 1/10 = .10.
Dual-order (AB and BA) randomization: Even with only 10 potential intervention start points, it would be possible to attain a one-tailed significance probability of p = 1/20 = .05.
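The single- versus dual-order comparison of slides 30–32 can be tied together with a small sketch. This is an illustrative implementation, not ExPRT's code; the function name and the B-minus-A mean-difference statistic are assumptions, and the toy series has a clear, immediate effect so that the observed outcome is the most extreme one.

```python
# Illustrative sketch (not ExPRT's code) of a start-point randomization
# test with an optional dual-order (modified Edgington) distribution.
from statistics import mean

def start_point_p(data, start_points, actual_start, dual_order=False):
    """One-tailed p-value; test statistic is mean(B) - mean(A).
    start_points are 1-indexed observations eligible to begin phase B."""
    def stat(s, reversed_order=False):
        d = mean(data[s - 1:]) - mean(data[:s - 1])
        return -d if reversed_order else d          # BA order mirrors AB

    observed = stat(actual_start)
    dist = [stat(s) for s in start_points]
    if dual_order:                                  # add the "pretend" BA ordering
        dist += [stat(s, reversed_order=True) for s in start_points]
    return sum(1 for d in dist if d >= observed - 1e-9) / len(dist)

# 16 observations, immediate abrupt effect at Observation 10; the 10
# potential start points are Observations 4-13, as on slide 30.
series = [0] * 9 + [5] * 7
print(start_point_p(series, range(4, 14), actual_start=10))                   # 0.1
print(start_point_p(series, range(4, 14), actual_start=10, dual_order=True))  # 0.05
```

With the same data, adding the mirrored BA outcomes doubles the randomization distribution from 10 to 20 entries, which is exactly why the attainable p drops from .10 to .05.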

33 Randomized AB Order Designs: Summary and Conclusions (Levin et al., 2014)
Note: The to-be-reported simulation results are typically based on a within-phase autocorrelation of ρ = .30. For all AB design variations examined: Dual-order randomization maintains satisfactory Type I error control. Dual-order randomization exhibits power that is generally about points higher (and in some situations, even more) than that of single-order randomization.

34 Selected Single- Versus Dual-Randomization Power Comparisons of Longer and Shorter Series Simulations (SL = Series Length, NSP = Number of Potential Intervention Start Points)
N   d    r    Size (SL/NSP)   Single   Dual   Difference
2   2.0  .30  Longer (15/5)    .44     .85      .41
              Shorter (9/5)    .42     .80      .38
3   1.5       Longer           .49     .90      .41
              Shorter (7/3)    .28     .73      .45
5   1.0       Longer                   .89
              Shorter (8/2)    .15     .71      .56
(Crossover Design as Special Case)

35 Randomized Intervention Start-Point Designs and Analyses
1. AB Design 2. Replicated AB Design 3. Paired AB Design 4. Replicated Paired AB Design 5. “Dual Order” AB Design (Crossover Design as a Special Case) Potential difference in design credibility between a dual-order AB design and a systematic crossover design?: Don’t lose your “balance”!

36 Randomized Intervention Start-Point Designs and Analyses
1. AB Design 2. Replicated AB Design 3. Paired AB Design 4. Replicated Paired AB Design 5. “Dual Order” AB Design (Crossover Design as a Special Case) 6. Multiple-Baseline Design, Including “Restricted” Replicated AB Design (Levin, Ferron, & Gafurov, 2014; 2016; 2017)

37 Multiple-Baseline Design
Revusky’s “between-case” procedure
 Revusky, S. H. (1967). Some statistical treatments compatible with individual organism methodology. Journal of the Experimental Analysis of Behavior, 10,
Wampold & Worsham’s and Koehler & Levin’s “within-case” procedures
 Wampold, B. E., & Worsham, N. L. (1986). Randomization tests for multiple-baseline designs. Behavioral Assessment, 8,
Type I error and power assessments of the Wampold-Worsham, Marascuilo-Busk, and Koehler-Levin procedures have been made by Ferron and Sentovich (2002)
 Ferron, J., & Sentovich, C. (2002). Statistical power of randomization tests used with multiple-baseline designs. Journal of Experimental Education, 70,

38 Multiple-Baseline Design Updates: Summary and Conclusions (Levin et al.)
Modified (and improved) randomized start-point versions of the Revusky and Marascuilo-Busk procedures have been developed. Generally speaking, the Koehler-Levin and modified Marascuilo-Busk procedures are recommended, based on their superior power to detect immediate abrupt intervention effects, along with the modified Revusky procedure in special situations. But hold on, life is not so simple. Here’s the “rest of the story”…

39 (figure not captured in the transcript)

40 Multiple-Baseline Design Updates: Summary and Conclusions (Levin et al.)
Bad news: When the intervention effects are not immediate and abrupt, power is drastically reduced for all of the procedures (especially for 2-observation delayed abrupt effects).

41 (figure not captured in the transcript)

42 Multiple-Baseline Design Updates: Summary and Conclusions (Levin et al.)
Bad news: When the intervention effects are not immediate and abrupt, power is drastically reduced for all of the procedures (especially for 2-observation delayed abrupt effects).
Good news: With careful researcher forethought and planning (and good luck!), adjusted analyses can be formulated for all of the procedures to recapture much of the lost power.
Other news: Under certain alternate-effect conditions, the original fixed-point Wampold-Worsham procedure performs as well as (or better than) the more recent randomized start-point procedures.

43 Multiple-Baseline Design Updates: Summary and Conclusions (Levin et al.)
And a few final comments on these randomization tests: Slippery slopes and variability (Levin, Ferron, & Gafurov, 2018) The “power” of increased sample size (specifically, the number of cases) Systematic power comparisons of randomization tests and HLM procedures

44 References
Levin, J. R., Ferron, J. M., & Kratochwill, T. R. (2012). Nonparametric statistical tests for single-case systematic and randomized ABAB…AB and alternating treatment designs: New developments, new directions. Journal of School Psychology, 50,
Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2014). Improved randomization tests for a class of single-case intervention designs. Journal of Modern Applied Statistical Methods, 13(2), 2-52.
Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2016). Comparison of randomization-test procedures for single-case multiple-baseline designs. Developmental Neurorehabilitation.
Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2017). Additional comparisons of randomization-test procedures for single-case multiple-baseline designs: Alternative effect types. Journal of School Psychology, 63,
Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2018). Randomization tests of trend and variability for single-case multiple-baseline designs.

45 Some Randomization-Test Software
Randomized Phase Designs
• Edgington & Onghena’s (2007) SCRT (Single-Case Randomization Tests) program, in their book
• Levin, Ferron, & Kratochwill’s (2011) SAS software for various ABAB…AB and alternating treatment designs
Randomized Intervention Start-Point Designs
• Edgington & Onghena’s (2007) SCRT program, in their book (also Bulté & Onghena, 2008)
• Koehler’s (2012) program for the Koehler-Levin (1998) multiple-baseline procedure
• Gafurov & Levin’s ExPRT (Excel Package of Randomization Tests), Version 3.1, March 2018
Other
• Borckardt et al.’s (2008) Simulation Modeling Analysis (SMA) program

46 ExPRT (Excel Package of Randomization Tests)
− Developed by Boris S. Gafurov, George Mason University, and Joel R. Levin, University of Arizona
− Statistical randomization tests for single-case intervention studies
− ExPRT is freely available to single-case intervention researchers: the current Version 3.1 (March 2018) may be downloaded from

47 Current Designs and Analyses
AB design (several different models) ABA and ABC designs ABAB design Multiple-baseline design Alternating treatment design (randomized pairs model)

48 Current Features of the ExPRT Statistical Software
Features of ExPRT’s programs include:
• user-defined α levels (one- or two-tailed tests)
• statistical decisions (reject, do not reject) and significance probabilities (p-values)
• statistical tests based on either mean (level), slope (trend), or variance (variability)
• output distribution of all possible outcomes
• graph of the outcomes for each case
• case-by-case and across-case summary measures and effect-size estimates (conventional d and rescaled NAP)
• a randomizing routine for planned studies

49 AB Design
− Basic time-series design: Baseline/Control (A) vs. Intervention (B); or Intervention A vs. Intervention B
− Intervention start-point (or phase transition-point) randomization procedures
 ● Edgington model and Marascuilo-Busk extension
 ● Levin-Wampold simultaneous start-point model for two different paired interventions (general- and comparative-effectiveness tests)
 ● Levin et al.’s (2014) dual intervention-order (AB or BA) option
 ● Levin et al.’s (2016) single-case systematic crossover design: intervention effect and time effect

50–55 (slide images not captured in the transcript)

56 Multiple-Baseline Design
Within-case tests (Wampold-Worsham and Koehler-Levin models) Between-case tests (Levin et al.’s, 2016, Modified Revusky model)

57–60 (slide images not captured in the transcript)

