Power Considerations for Educational Studies with Restricted Samples that Use State Tests as Pretest and Outcome Measures. June 2010 presentation at the Institute for Education Sciences Research Conference.


Power Considerations for Educational Studies with Restricted Samples that Use State Tests as Pretest and Outcome Measures June 2010 Presentation at the Institute for Education Sciences Research Conference Russell Cole ● Josh Haimson ● Irma Perez-Johnson ● Henry May

The research reported here was supported by the National Center for Education Evaluation and Regional Assistance, U.S. Department of Education, through contract ED-04-CO-0112 to Mathematica Policy Research.

Measuring the Impact of an Education Intervention

- Randomized controlled trial (RCT)
  - Unbiased estimate of program impact
  - Increasingly prevalent in education research
- Probability of detecting a true program impact depends on n, α, and effect size (ES)
  - Use of a pretest can increase power (1 - β)
  - A higher pretest-posttest correlation shrinks the minimum detectable effect size (MDES)

MDES Increases as Pretest-Posttest Correlation Decreases
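The relationship on this slide can be sketched numerically. Below is a minimal illustration assuming the standard MDES formula for an individually randomized trial with a pretest covariate; the multiplier 2.80 corresponds roughly to a two-tailed α = .05 test at 80% power, and the function and parameter names are illustrative, not from the presentation.

```python
import math

def mdes(n, r, p=0.5, m=2.80):
    """Minimum detectable effect size when a pretest covariate
    explains r**2 of the outcome variance.
    n: total sample size; p: proportion assigned to treatment;
    m: multiplier for alpha = .05 (two-tailed) at 80% power."""
    return m * math.sqrt((1 - r**2) / (p * (1 - p) * n))

# MDES shrinks as the pretest-posttest correlation r grows
for r in (0.0, 0.37, 0.60, 0.89):
    print(f"r = {r:.2f}: MDES = {mdes(1000, r):.3f}")
```

With n = 1000, the MDES falls from about 0.18 standard deviations with no pretest (r = 0) to about 0.08 when r = .89, which is why a weaker-than-expected correlation matters for study planning.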

State Tests Prevalent, But Appropriate?

- State assessments as outcomes
  - Used to define proficiency for AYP
  - Universal in grades 3–8 (Math and ELA)
  - Minimize testing burden
  - Lower cost, and scale scores are readily available
- State tests tend to have a lower conditional standard error of measurement (CSEM) at the middle of the ability distribution
  - CSEM is largest at the tails
  - Variance (σ²) can be partitioned into explainable and unexplainable (measurement error) components
  - Given the increased CSEM at the tails, samples of students selected at the tails will have higher proportions of unexplainable variance
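The classical-test-theory logic behind this slide can be made concrete: the observed correlation equals the true correlation times the square root of the product of the two reliabilities, and reliability falls as CSEM rises. The numbers below are made up purely for illustration.

```python
import math

def observed_r(true_r, true_var, csem_pre, csem_post):
    """Pretest-posttest correlation after attenuation by measurement
    error: observed r = true r * sqrt(rel_pre * rel_post), where
    reliability = true-score variance / observed-score variance."""
    rel_pre = true_var / (true_var + csem_pre**2)
    rel_post = true_var / (true_var + csem_post**2)
    return true_r * math.sqrt(rel_pre * rel_post)

# Hypothetical values: same true correlation, but a larger CSEM for a
# sample drawn from the tail of the ability distribution
r_middle = observed_r(0.8, true_var=1.0, csem_pre=0.3, csem_post=0.3)
r_tail = observed_r(0.8, true_var=1.0, csem_pre=0.7, csem_post=0.7)
```

Under these illustrative numbers the observed correlation drops from about .73 for a mid-distribution sample to about .54 for a tail sample, even though the true correlation is .8 in both.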

General Methodology

- If there is greater measurement error for low-performing students, does this mean that pretest-posttest correlations will be attenuated?
- To capture the variability in correlation coefficients associated with measurement error, select samples with different average achievement levels and calculate r
- Compare pretest-posttest correlations across achievement levels (and across states) to inform power calculations

Research Questions

- What is the average pretest-posttest correlation coefficient for samples of students selected at different pretest achievement levels?
- Do correlation coefficients differ by state?

Population Data

- 4 complete states + 2 large districts from 2 additional states
- 3 years of population data, yielding 2 sets of pre-post correlations: (Year 1, Year 2) and (Year 2, Year 3)
- English/Language Arts and Mathematics
- Grades 3–8

Analysis Decisions

1. Sample pretest achievement level
   A. Lowest performers
   B. Proficiency threshold
   C. Average performers
2. Grade grouping (pretest year)
   A. Early elementary (grades 3 and 4)
   B. Late elementary (grade 5)
   C. Middle school (grades 6 and 7)

Analysis Procedure

For each state, year, subject, and grade group:
1. Pretest standardization
2. Selection of study samples (n = 500)
3. Calculation of pretest-posttest correlations: 6 states × 2 years of pre-post data × 2 subjects × 3 grade groups, for each achievement level
4. Cross-cutting aggregation (ANOVA)
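The sampling step can be mimicked with a toy simulation. This is not the authors' procedure or data; it is a sketch with an invented data-generating process (true ability plus heteroskedastic error whose SD grows toward the tails, per the CSEM discussion) showing why samples selected at the low end can yield attenuated correlations.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

random.seed(12345)

# Simulated population: true ability theta ~ N(0, 1); pretest and posttest
# add error whose SD (a stand-in for CSEM) grows toward the tails
population = []
for _ in range(50_000):
    theta = random.gauss(0, 1)
    csem = 0.3 + 0.4 * abs(theta)
    population.append((theta + random.gauss(0, csem),
                       theta + random.gauss(0, csem)))

def sample_r(level, n=500, width=0.5):
    """Pretest-posttest r for n students whose standardized pretest
    falls within +/- width of the chosen achievement level."""
    band = [s for s in population if abs(s[0] - level) < width][:n]
    return pearson_r([s[0] for s in band], [s[1] for s in band])

r_average = sample_r(0.0)    # average performers
r_lowest = sample_r(-2.0)    # lowest performers
```

In this toy setup the low-performing sample's correlation is well below the average-performing sample's, combining both restriction of range and the larger measurement error at the tail.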

Pretest-Posttest Correlations Attenuated for Lowest-Performing Samples

Large Variation in Pretest-Posttest Correlation Across States

Correlations Observed for Power Analysis: r = .89, r = .60, r = .37

Implications for MDES Might Be Modest: r = .60 vs. r = .65
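A quick calculation shows why the implication is modest. Since MDES scales with sqrt(1 - r²) when other design parameters are held fixed, dropping from r = .65 to r = .60 inflates the MDES only slightly:

```python
import math

# Relative MDES inflation when the pretest-posttest correlation is
# .60 instead of .65, all other design parameters held fixed
inflation = math.sqrt(1 - 0.60**2) / math.sqrt(1 - 0.65**2)
print(f"MDES larger by {inflation - 1:.1%}")
```

The MDES grows by only about 5%, so a somewhat lower correlation than planned need not derail a study's power calculations.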

Discussion/Summary

- Pretest-posttest correlations
  - Show large attenuation when a homogeneous sample is selected
  - Might be lower than anticipated for low performers on state assessments
  - Are similar for ELA/Mathematics and across grade levels
  - Are affected by other factors (e.g., ceiling/floor effects)
- Use available administrative records to gauge these correlations in advance

Thank you

May, Henry, Irma Perez-Johnson, Joshua Haimson, Samina Sattar, and Phil Gleason (2009). "Using State Tests in Education Experiments: A Discussion of the Issues" (NCEE ). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.