Validating Student Growth During an Assessment Transition
National Conference on Student Assessment June 30, 2017
Background
Georgia has implemented the Student Growth Percentile (SGP) model since 2012. SGPs are utilized by educators for instructional planning, and are also included as components in the state's school accountability and educator effectiveness systems. In 2015, Georgia transitioned from its legacy assessment system – the Criterion-Referenced Competency Tests (CRCT) and End of Course Tests (EOCT) – to a new assessment system, the Georgia Milestones Assessment System.
We first began implementation under our Race to the Top grant, selecting this model in partnership with a working committee of Georgia educators. We have three main purposes for our implementation of SGPs: 1) they are used by educators to inform instruction; 2) they are used as one component in our school accountability system; and 3) they are used as one component in our educator effectiveness system. This presentation focuses on some of the validity studies we have undertaken to monitor the growth transition across assessment systems.
Growth During an Assessment Transition
SGPs were reported during the assessment transition because the SGP methodology is robust to scale transformations. SGPs describe the amount of growth a student has demonstrated relative to academically-similar students: one can think of the prior scores as the starting point and the current assessment score as the ending point. Because SGPs are not a gain-score model, they can withstand the prior and current scores being based on different assessment systems, which is what allowed us to continue reporting growth during the transition. 12/6/2018
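As a rough illustration of why this works, here is a deliberately simplified sketch in Python with hypothetical data: the peer group is defined only by the prior score, so the current scores can live on an entirely different scale. (Real SGPs are estimated with quantile regression over multiple priors, not this exact grouping.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: prior-year scores on the old scale and current-year
# scores on a completely different (new) scale for 1,000 students.
prior = rng.integers(800, 851, size=1000)            # legacy-scale scores
current = 0.5 * prior + rng.normal(0, 5, size=1000)  # new-scale scores

def simple_sgp(prior, current, student):
    """Percentile rank of one student's current score among peers with the
    same prior score -- a coarse stand-in for the conditional quantile
    estimated by the real SGP model."""
    peers = current[prior == prior[student]]
    return 100.0 * np.mean(peers <= current[student])

sgp = simple_sgp(prior, current, student=0)
print(round(sgp, 1))
```

Because only the ordering within the peer group matters, no conversion between the old and new score scales is ever needed.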
Growth During an Assessment Transition
Growth is independent of proficiency cuts: SGPs do not decrease simply because assessment expectations increase. SGPs measure how students transition from the old assessment system to the new one relative to academically-similar students, so the full range of growth is still available during the transition. Even though students may demonstrate lower proficiency rates on the new assessment system (a longer "yard stick"), students continue to learn and grow – and SGPs capture that growth at all levels. This was a big concern for our educators: because performance expectations would increase on the new assessment, they worried that growth would be lower.
Validating Growth
Georgia has implemented a series of validity studies to monitor its student growth transition. Today I will focus on four of those studies:
1. Test the power of the CRCT/EOCT for predicting Georgia Milestones scores
2. Estimate the standard errors of the SGPs and MeanGPs
3. Check for floor or ceiling effects in the tests
4. Compare the distribution of SGPs to the uniform distribution
1. Power Study
Purpose – to determine the strength of the relationship between the new and old test scores
Study – estimate the power (R²) of the CRCT/EOCT for predicting scores from Georgia Milestones
Desired outcome – the power of the prior CRCT/EOCT scores to predict Georgia Milestones scores is at least as strong as it was for predicting CRCT/EOCT scores
1. Power Study
R-Squared (2 Priors)
Subject           2014|2013,2012   2015|2014,2013   2016|2015,2014
ELA                    0.62             0.67             0.71
Mathematics            0.68             0.72             0.73
Science                0.66             0.69             0.70
Social Studies
Here is an example of some of the data from our study. 2014 was our last year on our legacy assessment system. At that time, R-squared ranged from .62 to .68, depending on the content area. When we transitioned to our new assessment system in 2015, R-squared did not decrease; in fact, it increased slightly, to the .67 to .72 range, with an additional slight increase in 2016. This indicates that there was no decrease in the power to predict scores as we transitioned assessment systems.
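The R² in the table can be reproduced in spirit with ordinary least squares on two prior scores. This is a sketch on simulated scores (all numbers hypothetical, not Georgia's actual data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical cohort: two prior-year scores (the "2 priors" design above)
# and a current-year score that depends on both.
prior1 = rng.normal(500, 50, n)
prior2 = 0.8 * prior1 + rng.normal(0, 30, n)
current = 0.7 * prior1 + 0.4 * prior2 + rng.normal(0, 30, n)

# Ordinary least squares: current ~ intercept + prior1 + prior2.
X = np.column_stack([np.ones(n), prior1, prior2])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
pred = X @ beta

# R^2: share of variance in current scores explained by the two priors.
r2 = 1 - ((current - pred) ** 2).sum() / ((current - current.mean()) ** 2).sum()
print(round(r2, 2))
```

Running this regression for cohorts before and after the transition is the heart of the check: the prediction power should not drop when the outcome variable switches to the new test.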
2. Standard Error Study
Purpose – to determine if there is a change in the level of uncertainty associated with growth estimates
Study – compare SGP standard errors for the new and old tests
Desired outcome – standard errors for Georgia Milestones SGPs are no larger than those for the CRCT/EOCT
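One way to attach an uncertainty estimate to a percentile-rank-style growth score is a bootstrap over the peer group. This is a sketch with hypothetical numbers; the operational SGP standard errors come from the SGP model itself rather than necessarily from this procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical peer group: current-year scores of 200 academically similar
# students, plus the score of the student whose growth we are estimating.
peers = rng.normal(510, 25, size=200)
student_score = 520.0

def pct_rank(sample, score):
    return 100.0 * np.mean(sample <= score)

# Bootstrap: resample the peer group with replacement, recompute the
# percentile rank each time, and take the spread of those estimates.
boot = np.array([
    pct_rank(rng.choice(peers, size=peers.size, replace=True), student_score)
    for _ in range(2000)
])
se = boot.std()
print(round(se, 1))
```

The study's check is then a comparison of the distribution of such standard errors before and after the transition: the new test should not make growth estimates noisier.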
2. Standard Error Study
Here is the distribution of standard errors from before and after our assessment transition. As you can see, the distribution is consistent across all three years. If you saw a change in the distribution, you would want to investigate further.
3. Floor/Ceiling Effect Study
Purpose – to determine if there are floor or ceiling effects associated with the new assessment
Study – plot the distributions of scores by grade and content area
Desired outcome – there are no floor or ceiling effects associated with the new assessment
A floor or ceiling effect would exist if many students receive the lowest few or highest few score values on Georgia Milestones. Since the new assessment has higher expectations and uses new item types, the primary concern would be floor effects. We would expect achievement to have a roughly bell-shaped curve; a spike of students at the lower or upper end of the distribution would be evidence of a floor or ceiling effect. Such effects can lead to inaccuracy in SGP calculations.
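A tabular version of the same check (hypothetical bell-shaped scores; a real run would loop over every grade and content area) measures how much of the distribution sits on the extreme score points:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical scale scores, rounded to integer score points and clipped
# to the reporting range, as operational score tables usually are.
scores = np.clip(np.round(rng.normal(500, 40, size=10_000)), 350, 650)

def edge_share(scores, k=3):
    """Fraction of students sitting on the k lowest / k highest
    observed score points."""
    points = np.unique(scores)
    low = np.isin(scores, points[:k]).mean()
    high = np.isin(scores, points[-k:]).mean()
    return low, high

low, high = edge_share(scores)
print(low, high)
```

A noticeable share of students on the extreme score points (rather than a negligible fraction) would be the floor or ceiling signal described above.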
3. Floor/Ceiling Effect Study
Here is the distribution of scale scores for two of our mathematics assessments. You can see the roughly bell-shaped curve. Should you see spikes at either end of the distribution, you might have evidence of floor or ceiling effects.
4. Uniform Distribution Study
Purpose – to compare the distribution of SGPs to the uniform distribution
Study – review goodness-of-fit plots
Desired outcome – SGP distributions do not deviate from the uniform distribution
SGPs should be roughly uniform, with about 1% of students receiving each SGP from 1 to 100. However, if students cluster at the low and high points of the tests, or if there is model misfit, the distribution can deviate from uniformity. We produce goodness-of-fit plots that group SGPs into deciles; under the uniform distribution, roughly 10% of the sample should fall in each bin.
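The decile check can also be run numerically as a chi-square goodness-of-fit test against the uniform expectation. A sketch with simulated SGPs (hypothetical data, NumPy only):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical SGPs for 10,000 students: integers 1..100, roughly 1% each
# when the model fits well.
sgps = rng.integers(1, 101, size=10_000)

# Group into deciles (1-10, 11-20, ..., 91-100), as in the fit plots.
counts, _ = np.histogram(sgps, bins=np.arange(0.5, 101.5, 10))

# Chi-square statistic against the uniform expectation of 10% per bin.
expected = sgps.size / 10
chi2 = ((counts - expected) ** 2 / expected).sum()

# With 9 degrees of freedom, statistics far above ~21.7 (the 1% critical
# value) would flag a deviation from uniformity.
print(counts, round(chi2, 1))
```

This is the numeric counterpart of the goodness-of-fit plots on the next slide: near-equal decile counts correspond to a small chi-square statistic.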
4. Uniform Distribution Study
This is the goodness-of-fit plot for our grade 8 ELA test. The plot on the left breaks the growth percentile range and prior score range into deciles; as expected, roughly 10% of students fall in each cell. The Q-Q plot on the right shows the empirical SGP distribution against the theoretical distribution. You would want the line to follow a 45-degree angle, indicating that the observed SGPs follow the uniform distribution that is theoretically expected.
Additional Studies
Compare SGPs across subgroups of students
Let the score distributions stabilize before selecting cohorts for baseline SGPs
Compare the distributions of MeanGPs for schools and teachers across years
Compare MeanGPs within schools and teachers across years
Compare growth ratings and educator effectiveness ratings across years and subgroups
I have discussed four of the validity studies we are currently focusing on. There are additional studies that states could conduct to examine growth calculations during assessment transitions. These include comparing SGPs across subgroups of students to determine whether the relative position of subgroups has changed; letting score distributions on the new assessment stabilize before setting baselines, should a state implement baseline-referenced SGPs; as well as several studies comparing MeanGPs at the teacher and school level.
Contact Information
Melissa Fincher, Ph.D., Deputy Superintendent for Assessment and Accountability
Allison Timberlake, Ph.D., Director of Accountability
Qi Qin, Assessment Specialist, Growth Model