Developing an evaluation of professional development, Webinar #2: Going deeper into planning the design


1 Developing an evaluation of professional development, Webinar #2: Going deeper into planning the design

2 Information and materials mentioned or shown during this presentation are provided as resources and examples for the viewer's convenience. Their inclusion is not intended as an endorsement by the Regional Educational Laboratory Southeast or its funding source, the Institute of Education Sciences (Contract ED-IES-12-C-0011). In addition, the instructional practices and assessments discussed or shown in these presentations are not intended to mandate, direct, or control a State's, local educational agency's, or school's specific instructional content, academic achievement system and assessments, curriculum, or program of instruction. State and local programs may use any instructional content, achievement system and assessments, curriculum, or program of instruction they wish.

3 Purpose & Audience
In scope:
– Evaluation designs that allow for causal inferences (RCTs and QEDs)
– Creating an evaluation plan to examine the effectiveness of professional development
Out of scope:
– Other program evaluation designs
– Identifying best practices for conducting professional development
– Identifying best practices in systems change
Target audience: LEAs, SEAs, and researchers who are interested in creating an evaluation of a specific professional development program and who have an intermediate level of understanding of effectiveness studies.

4 PLANNING THE DESIGN
Dr. Sharon Koon

5 Distinction between WWC evidence standards and additional qualities of strong studies
WWC design considerations for assessing effectiveness research:
– Two distinct groups: a treatment group (T) and a comparison group (C).
– For randomized controlled trials (RCTs), low attrition for both the T and C groups.
– For quasi-experimental designs (QEDs), baseline equivalence between the T and C groups.
– Contrast between the T and C groups measures the impact of the treatment.
– Valid and reliable outcome data used to measure the impact of the treatment.
– No known confounding factors.
– Outcome(s) not overaligned with the treatment.
– Same data collection process (same instruments, same time/year) for the T and C groups.
Source: http://www.dir-online.com/wp-content/uploads/2015/11/Designing-and-Conducting-Strong-Quasi-Experiments-in-Education-Version-2.pdf

6 Distinction between WWC evidence standards and additional qualities of strong studies (cont.)
Additional qualities of strong studies:
– Pre-specified and clear primary and secondary research questions.
– Generalizability of the study results.
– Clear criteria for research sample eligibility and matching methods.
– Sample size large enough to detect meaningful and statistically significant differences between the T and C groups, overall and for specific subgroups of interest.
– Analysis methods that reflect the research questions, design, and sample selection procedures.
– A clear plan to document the implementation experiences of the T and C conditions.
Source: http://www.dir-online.com/wp-content/uploads/2015/11/Designing-and-Conducting-Strong-Quasi-Experiments-in-Education-Version-2.pdf

7 Determinants of a What Works Clearinghouse (WWC) study rating
[Figure shown on this slide.]
Source: http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19

8 Study features that will be discussed
Randomized controlled trials (RCTs)
– Random assignment process
– Cluster-level RCT considerations
– Attrition, both overall and T-C differential
Quasi-experimental designs (QEDs) and high-attrition RCTs
– Baseline equivalence
Both RCTs and QEDs
– Confounding factors
– Outcome eligibility
Power analysis (not considered by WWC evidence standards)
Source: WWC references, http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18

9 Random assignment process
Units can be assigned at any level and at multiple levels (e.g., schools, teachers, students).
– Cluster design: when groups rather than individuals are the unit of assignment.
Make sure the units:
– Are assigned entirely by chance.
– Have a non-zero probability of being assigned to each group (but the probabilities can differ across conditions).
– Have a consistent assignment probability within each group, or use an appropriate analytic approach.
Random assignment can be useful to conduct within strata (see the sketch below).
The analysis must maintain each unit's assignment status, even if noncompliance occurs (i.e., an intent-to-treat analysis).
Source: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18
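
The following is a minimal sketch, in Python, of stratified random assignment as described above: units are shuffled entirely by chance within each stratum and assigned with a fixed probability, and the resulting status is recorded so it can be preserved for an intent-to-treat analysis. The data frame, column names, and 50/50 split are illustrative assumptions, not part of the webinar.

```python
# Minimal sketch of stratified random assignment. All names and data here
# are hypothetical; the webinar does not prescribe a specific tool.
import random

import pandas as pd

def assign_within_strata(units: pd.DataFrame, stratum_col: str,
                         p_treat: float = 0.5, seed: int = 42) -> pd.DataFrame:
    """Assign units to T or C entirely by chance within each stratum."""
    rng = random.Random(seed)  # fixed seed keeps the assignment auditable
    out = units.copy()
    out["condition"] = ""
    for _, row_labels in out.groupby(stratum_col).groups.items():
        labels = list(row_labels)
        rng.shuffle(labels)  # chance ordering within the stratum
        n_treat = round(len(labels) * p_treat)  # consistent probability within group
        out.loc[labels[:n_treat], "condition"] = "T"
        out.loc[labels[n_treat:], "condition"] = "C"
    return out

# Hypothetical school-level assignment stratified by district.
schools = pd.DataFrame({"school_id": range(1, 9),
                        "district": ["A"] * 4 + ["B"] * 4})
print(assign_within_strata(schools, "district"))
```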

10 Cluster-level RCT considerations
When cluster-level outcomes are analyzed, results provide evidence about cluster-level effects.
To meet WWC standards without reservations for analyses of subcluster effects, the sample should include subcluster units identified before the results of the random assignment were revealed.
For example, in a school-level RCT examining teacher retention, the sample:
– Should include teachers who were in the schools before the random assignment results were provided to the schools.
– Cannot meet standards without reservations if it includes any teachers who joined the schools after the random assignment results were provided (see the sketch below).
Source: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18
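
As a small illustration of the subcluster rule above, the pandas sketch below restricts a teacher-level analytic sample to teachers who joined their school before the assignment results were released. The column names and dates are hypothetical.

```python
# Hypothetical illustration: keep only teachers who were in the school
# before the random assignment results were revealed.
import pandas as pd

teachers = pd.DataFrame({
    "teacher_id": [1, 2, 3, 4],
    "school_id": [10, 10, 20, 20],
    "join_date": pd.to_datetime(
        ["2015-08-01", "2015-10-15", "2015-07-20", "2015-09-30"]),
})
results_released = pd.Timestamp("2015-09-01")  # assumed release date

analytic_sample = teachers[teachers["join_date"] < results_released]
print(analytic_sample)  # teachers 1 and 3 remain; late joiners are excluded
```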

11 Attrition
Attrition occurs when sample members initially assigned to the T or C group are not in the analysis because they are missing key data used to calculate impacts.
The WWC is concerned about overall attrition and differences in the attrition rates between the T and C groups (see the sketch below).
The WWC examines cluster and, if applicable, subcluster attrition.
Key data include outcomes and, for high-attrition RCTs, the characteristics used to assess baseline equivalence.
Source: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18
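
The arithmetic behind overall and differential attrition is straightforward; the sketch below computes both from invented counts. Whether the resulting rates qualify as low attrition depends on the WWC's attrition boundary, which is not reproduced here.

```python
# Overall and differential attrition from hypothetical sample counts.
n_randomized = {"T": 120, "C": 120}  # units originally assigned
n_analyzed = {"T": 102, "C": 114}    # units with the key outcome data

attrition = {g: 1 - n_analyzed[g] / n_randomized[g] for g in n_randomized}
overall = 1 - sum(n_analyzed.values()) / sum(n_randomized.values())
differential = abs(attrition["T"] - attrition["C"])

print(f"T attrition: {attrition['T']:.1%}, C attrition: {attrition['C']:.1%}")
print(f"Overall: {overall:.1%}, differential: {differential:.1%}")
```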

12 Ways of minimizing attrition in RCTs
Make sure study participation activities are clear to everyone involved.
– For example, this can prevent an uninformed superintendent from pulling the plug on the study.
Conduct random assignment after participants have consented to study participation.
– If units are randomized before consenting, non-consent counts as attrition.
Conduct random assignment as close to the start of the implementation period as possible.
– This can help minimize attrition due to turnover.
Source: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18

13 QEDs
In a QED, there are at least two groups (one intervention and one comparison), but the groups are created non-randomly. For example:
– Use a convenience sample: nonparticipants who are nearby and available but are not receiving the intervention.
– Use a statistical technique to match participants (e.g., propensity score matching; see the sketch below).
– Form the groups retrospectively, using administrative data.
Source: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=23
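
Below is a minimal sketch of one of the techniques named above, propensity score matching, using 1:1 nearest-neighbor matching without replacement on simulated data. This is only one of several matching variants, and all variable names are invented.

```python
# Minimal propensity score matching sketch (simulated data, one variant of
# many). Not a prescribed WWC procedure.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_t, n_c = 80, 120
df = pd.DataFrame({
    "treated": [1] * n_t + [0] * n_c,
    # treated units score a bit higher at baseline, so matching matters
    "pretest": np.concatenate([rng.normal(52, 10, n_t), rng.normal(50, 10, n_c)]),
    "frl_pct": rng.uniform(0, 1, n_t + n_c),
})

# 1. Model the probability of treatment from baseline characteristics.
X = df[["pretest", "frl_pct"]]
df["pscore"] = LogisticRegression().fit(X, df["treated"]).predict_proba(X)[:, 1]

# 2. Match each treated unit to the nearest unused comparison unit.
pool = df[df["treated"] == 0].copy()
matches = []
for _, row in df[df["treated"] == 1].iterrows():
    j = (pool["pscore"] - row["pscore"]).abs().idxmin()
    matches.append(j)
    pool = pool.drop(j)  # 1:1, without replacement

comparison = df.loc[matches]
print(len(comparison), "matched comparison units")
```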

14 Baseline equivalence
Baseline equivalence must be demonstrated for QEDs and high-attrition RCTs.
It is based on the units/individuals in the analytic sample, using baseline characteristics.
Example baseline characteristics:
– A prior measure of the outcome
– Demographic characteristics related to the outcome of interest
Sources: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18, http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19

15 Baseline equivalence (cont.)
Calculate the T-C standardized mean difference at baseline (see the sketch below).
– Differences between 0.05 and 0.25 standard deviations require statistical adjustment when calculating impacts.
– If there is a difference greater than 0.25 standard deviations for any required characteristic, then no outcomes in that domain may meet standards.
Sources: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=18, http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19
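
A minimal sketch of that calculation and decision rule follows, using the difference in group means divided by the pooled standard deviation. The WWC's exact effect-size formulas add small-sample and outcome-type adjustments not shown here, and the simulated data are illustrative.

```python
# Standardized mean difference at baseline with the 0.05 / 0.25 rule.
# The pooled-SD formulation here omits the WWC's small-sample correction.
import numpy as np

def baseline_equivalence(t: np.ndarray, c: np.ndarray) -> str:
    pooled_var = (((len(t) - 1) * t.var(ddof=1) +
                   (len(c) - 1) * c.var(ddof=1)) /
                  (len(t) + len(c) - 2))
    d = abs(t.mean() - c.mean()) / np.sqrt(pooled_var)
    if d <= 0.05:
        return f"d = {d:.3f}: equivalence satisfied"
    if d <= 0.25:
        return f"d = {d:.3f}: equivalence satisfied with statistical adjustment"
    return f"d = {d:.3f}: equivalence not satisfied; domain cannot meet standards"

rng = np.random.default_rng(1)
print(baseline_equivalence(rng.normal(52, 10, 80),   # hypothetical T pretests
                           rng.normal(50, 10, 80)))  # hypothetical C pretests
```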

16 Confounding factors
Common confounds:
– A single unit (school, classroom, teacher) in one or both conditions.
– Characteristics of the units in each group differ systematically in ways that are associated with the outcomes.
– The intervention is bundled with other services not being studied.
– The T and C conditions occur at different points in time.
Sources: http://ies.ed.gov/ncee/wwc/multimedia.aspx?sid=23, http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19

17 Outcome eligibility
Face validity and reliability. Minimum reliability standards include:
– internal consistency (such as Cronbach's alpha) of 0.50 or higher (see the sketch below);
– temporal stability/test-retest reliability of 0.40 or higher; or
– inter-rater reliability (such as percentage agreement, correlation, or kappa) of 0.50 or higher.
Not overaligned with the treatment.
– E.g., an outcome measure based on an assessment that relied on materials used in the T condition but not in the C condition (e.g., specific reading passages) would be overaligned.
Source: http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19
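
As one example of checking the first threshold above, the sketch below computes Cronbach's alpha from its standard variance formula on simulated item responses; real studies would use the instrument's published psychometrics where available.

```python
# Cronbach's alpha from the standard variance formula, on simulated items.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents-by-items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
ability = rng.normal(0, 1, (200, 1))              # simulated latent trait
responses = ability + rng.normal(0, 1, (200, 8))  # 8 correlated items
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}; meets the 0.50 threshold: {alpha >= 0.50}")
```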

18 Outcome eligibility (cont.)
Outcomes must be collected in the same manner for both the T and C groups. Issues include:
– different modes, timing, or personnel used for the two groups
– measures constructed differently for the two groups
Source: http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19

19 Power analysis
Power: the probability of finding a difference when there is a true difference in the populations (i.e., correctly rejecting a false null hypothesis).
Key variables that influence the power of a statistical test:
– The alpha level that the researcher chooses
– The magnitude of the true population effect (effect size)
– The sample size
– Any clustering of the data
– The extent to which baseline covariates predict the outcome variable
A worked sketch follows this list.
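
The sketch below (using statsmodels, one alternative to the tools mentioned on the next slide) shows how those variables feed an a priori sample-size calculation, with a simple design-effect inflation for clustered data. The effect size, ICC, and cluster size are assumed values, and a full multilevel power analysis, which would also credit predictive baseline covariates, would be more precise.

```python
# A priori power analysis for a two-group comparison, with a design-effect
# adjustment for clustering. All parameter values are assumptions.
from statsmodels.stats.power import TTestIndPower

alpha, power, effect_size = 0.05, 0.80, 0.25  # researcher's choices
icc, cluster_size = 0.15, 20                  # assumed clustering structure

# Sample size per group if individuals were randomized independently...
n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=alpha, power=power)

# ...inflated by the design effect, deff = 1 + (m - 1) * ICC, for clustering.
deff = 1 + (cluster_size - 1) * icc
print(f"n per group, simple random sample: {n_per_group:.0f}")
print(f"n per group with clustering (deff = {deff:.2f}): {n_per_group * deff:.0f}")
```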

20 Power analysis (cont.)
An a priori power analysis is conducted before the study begins. It enables you to design a study with adequate statistical power.
Several online tools are available to researchers. For example, Optimal Design can be used for individual and group RCTs.
http://sitemaker.umich.edu/group-based/optimal_design_software

21 Questions & Answers
Homework:
– Find the psychometric properties of the outcome measures you are considering.
– Bring questions to sessions 3-5.

22 Developing an evaluation of professional development
Webinar 3: Going Deeper into Identifying & Measuring Target Outcomes, 1/15/2016, 2:00 p.m.
Webinar 4: Going Deeper into Analyzing Results, 1/19/2016, 2:00 p.m.
Webinar 5: Going Deeper into Interpreting Results & Presenting Findings, 1/21/2016, 2:00 p.m.

