1
PARCC Data Forensics: A Multifaceted Approach
Jeffrey T. Steedle, Pearson
CCSSO National Conference on Student Assessment, June 21, 2016
2
Outline
Overview of PARCC Approach
Non-Statistical Methods
Statistical Methods
Lessons Learned
3
Overview of PARCC Approach
4
Overview of PARCC Approach
[Diagram: PARCC Data Forensics]
Non-Statistical Methods: Internet and Social Media Monitoring; Off-Hours Monitoring
Statistical Methods: Response Change Analysis; Plagiarism Analysis; Aberrant Response Pattern Detection; Longitudinal Performance Modeling
5
Non-Statistical Methods
6
Internet and Social Media Monitoring
Caveon, L.L.C. monitors Internet sites and forums for potential security breaches. This service generates regular updates and categorizes risk levels. Pearson reviews alerts and takes action with the impacted state (if known) and PARCC, Inc. PARCC States follow their internal security breach procedures when working with districts and schools. Monitoring occurs until the content has been removed.
7
Off-Hours Monitoring
Each state sets the permissible testing hours, and the test administration system does not permit off-hours testing. If a school must conduct off-hours testing, it works with the state to override the system. A report can be pulled at any time to see which schools were granted permission for off-hours testing.
8
Statistical Methods
9
Response Change Analysis
Traditional "erasure" (light-marks) analysis is not possible for students who test online. Instead, the focus is on instances of points gained: cases in which a student's item score increased from the initial response to the final response. Instances of points gained per student were aggregated to identify atypically high averages at the test administrator, school, and district levels, as sketched below.
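Conceptually, the aggregation reduces to counting, per student, the items whose score rose between the initial and final response, then averaging within each unit. The following is a minimal sketch under the assumption that the testing platform exposes a log of initial and final item scores; `item_log`, `roster`, and the function names are hypothetical, not PARCC's actual system.

```python
# Minimal sketch of the "points gained" aggregation. All names are hypothetical.
from collections import defaultdict

def points_gained_per_student(item_log):
    """item_log: iterable of (student_id, item_id, initial_score, final_score)."""
    counts = defaultdict(int)
    for student_id, _item_id, initial, final in item_log:
        if final > initial:          # a response change that increased the item score
            counts[student_id] += 1
    return counts

def average_points_gained(counts, roster):
    """roster: dict mapping unit_id (e.g., a school) -> list of student_ids."""
    return {
        unit: sum(counts.get(s, 0) for s in students) / len(students)
        for unit, students in roster.items()
        if students
    }
```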
10
Response Change Analysis: Example Results
[Figure: distributions of average instances of points gained at the school level, spring 2015.]
On this test, the typical school's average instances of "points gained" per student was about 3.
11
Plagiarism Detection
This analysis was conducted for Prose Constructed Response (PCR) tasks administered online, using Latent Semantic Analysis (LSA). LSA involves pairwise comparisons of responses and identifies similarity even when responses use synonymous words or phrases. An "exploratory" method was used to select responses for the analysis (i.e., responses from the schools with atypically high writing performance given their reading performance).
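As a rough illustration of the kind of LSA screen described above (not PARCC's actual pipeline), the sketch below builds a TF-IDF matrix, reduces it with truncated SVD (the usual LSA construction), and flags response pairs whose cosine similarity exceeds a threshold. The component count and threshold are invented for the example.

```python
# Illustrative LSA similarity screen; parameters are assumptions, not PARCC's.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def flag_similar_pairs(responses, n_components=100, threshold=0.95):
    """responses: list of PCR response strings; threshold is hypothetical."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(responses)
    k = max(1, min(n_components, tfidf.shape[1] - 1))   # SVD rank must be < n_features
    lsa = TruncatedSVD(n_components=k, random_state=0).fit_transform(tfidf)
    sims = cosine_similarity(lsa)
    return [
        (i, j, float(sims[i, j]))
        for i in range(len(responses))
        for j in range(i + 1, len(responses))
        if sims[i, j] >= threshold
    ]
```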
12
Plagiarism Detection: Example Results
Very few response pairs were flagged for similarity. Matches were most likely to occur for Narrative Writing Tasks (NWT), especially those that required students to retell a story from a different character's perspective.
[Table: flagged response pairs by course and task type (Literary Analysis Task, Research Simulation Task, Narrative Writing Task). ELA03: 1 and 2; ELA04: 6; ELA05: 37; ELA07: 127; ELA08: 3; ELA09: 4 and 5; none flagged for ELA06, ELA10, or ELA11.]
13
Aberrant Response Pattern Detection
Aberrant response pattern detection examines how unusual a student's task scores are relative to what would be expected. The proposal was to use the Modified Caution Index (MCI; Harnisch & Linn, 1981; Tendeiro & Meijer, 2013), which quantifies the extent to which higher-scoring students perform poorly on easy tasks and lower-scoring students perform well on difficult tasks. MCI had previously been researched and implemented with multiple-choice items.
14
Aberrant Response Pattern Detection: MCI
X is an examinee's observed item score vector (with raw score r)
X* is the Guttman vector in which the r easiest items are answered correctly, and X′ is the reversed Guttman vector in which the r hardest items are answered correctly
p is the vector of item p-values sorted from easiest to most difficult

\[ \mathrm{MCI} = \frac{\operatorname{cov}(\mathbf{X}^{*},\,\mathbf{p}) - \operatorname{cov}(\mathbf{X},\,\mathbf{p})}{\operatorname{cov}(\mathbf{X}^{*},\,\mathbf{p}) - \operatorname{cov}(\mathbf{X}',\,\mathbf{p})} \]

MCI = 0.0 when a student gets the r easiest items correct (expected)
MCI = 1.0 when a student gets the r hardest items correct (unexpected)
The polytomous generalization breaks the score on each polytomous item into a series of dichotomous items (e.g., 3 out of 5 points becomes 1, 1, 1, 0, 0).
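A minimal sketch of the MCI computation as defined above, assuming item p-values are known; the helper also shows the polytomous expansion. This is an illustration of the formula, not PARCC's operational code.

```python
# Hypothetical sketch of the MCI computation described above.
import numpy as np

def expand_polytomous(score, max_points):
    """3 out of 5 points becomes [1, 1, 1, 0, 0]."""
    return [1] * score + [0] * (max_points - score)

def mci(x, p):
    """x: 0/1 vector of (dichotomized) item scores; p: item p-values."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(-p)                 # easiest (highest p-value) items first
    x, p = x[order], p[order]
    r = int(x.sum())
    g_star = np.zeros_like(x); g_star[:r] = 1            # r easiest correct (Guttman)
    g_rev = np.zeros_like(x); g_rev[len(x) - r:] = 1     # r hardest correct (reversed)
    num = np.cov(g_star, p)[0, 1] - np.cov(x, p)[0, 1]
    den = np.cov(g_star, p)[0, 1] - np.cov(g_rev, p)[0, 1]
    return num / den if den else 0.0       # degenerate case: all or no items correct
```

Under this definition, a perfect Guttman pattern returns 0.0 and the fully reversed pattern returns 1.0, matching the anchor points on the slide.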
15
Aberrant Response Pattern Detection: MCI Null Distribution
One challenge was deciding which MCI values to flag. There is no closed-form expression for the MCI null distribution, so it was simulated.
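One simple way to run such a simulation (not necessarily the procedure PARCC used) is to draw null response vectors in which each item is answered correctly with probability equal to its p-value, compute MCI for each, and take an upper percentile as the flagging cutoff. The sketch reuses the mci() function above; the sample size, seed, and percentile are assumptions.

```python
# Hypothetical simulation of an MCI null distribution and a 99th-percentile cutoff.
import numpy as np

rng = np.random.default_rng(seed=0)

def simulate_mci_cutoff(p_values, n_students=10_000, percentile=99):
    p = np.asarray(p_values, dtype=float)
    sims = rng.random((n_students, p.size)) < p      # null-model response vectors
    stats = np.array([mci(row, p) for row in sims])  # mci() from the sketch above
    return np.percentile(stats, percentile)          # flag students above this value
```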
16
Aberrant Response Pattern Detection: Example Simulation Study Results
[Figure: shading indicates the proportion of simulated students flagged by MCI under the 99th-percentile criterion.]
Consider simulated students who earned 5 raw score points on the Grade 7 Mathematics Online EOY. If each of those students had 6 aberrant responses, very few would be flagged by MCI; if they had 16 aberrant responses, 77% would be flagged.
17
Aberrant Response Pattern Detection: Example Simulation Study Results
What do we learn from this? A lot of simulated aberrant responding is required to get flagged by MCI. Very few students at higher score levels (e.g., 50% correct) ever got flagged.
18
Aberrant Response Pattern Detection: Example Simulation Study Results
A standardized log-likelihood person-fit statistic (Drasgow, Levine, & Williams, 1985) appeared to be much more sensitive to simulated aberrant responding than MCI.
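For reference, here is a hedged sketch of the standardized log-likelihood statistic l_z (Drasgow, Levine, & Williams, 1985) for dichotomous items under a 2PL model; the deck gives no implementation details, and the polytomous version used operationally would apply the same standardization with polytomous response probabilities. Item parameters and the ability estimate are assumed known.

```python
# Sketch of the l_z person-fit statistic for dichotomous items under a 2PL model.
import numpy as np

def lz_statistic(x, a, b, theta):
    """x: 0/1 response vector; a, b: 2PL discrimination/difficulty; theta: ability."""
    x, a, b = (np.asarray(v, dtype=float) for v in (x, a, b))
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))                 # P(correct) under 2PL
    l0 = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))       # observed log-likelihood
    e_l0 = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))     # its expected value
    v_l0 = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)      # its variance
    return (l0 - e_l0) / np.sqrt(v_l0)   # large negative values indicate misfit
```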
19
Longitudinal Performance Modeling
Longitudinal performance modeling evaluates performance on PARCC assessments across test administrations and identifies unusual performance gains for a unit of interest (e.g., a school or district). The current plan is to use the cumulative logit regression model approach (Clark, Skorupski, Jirka, McBride, Wang, & Murphy, 2014) to identify unusual changes in test performance across two consecutive administrations of the PARCC assessment.
20
Longitudinal Performance Modeling: Cumulative Logit Regression
The cumulative logit regression model (Agresti, 1996) estimates the probability of a student being classified into each performance level, given the student's performance in the previous grade.
Predictor variable: PARCC scale score in the previous grade in the same content area
Outcome variable: performance level (1–5) on the current test (treated as an ordinal variable)

Prior Score  Current Level  P(Level 1)  P(Level 2)  P(Level 3)  P(Level 4)  P(Level 5)
   734             4           0.09        0.21        0.37        0.31        0.02
21
Cumulative Logit Regression
Logits for the first J−1 cumulative probabilities are

\[ \operatorname{logit}\!\left[P(Y \le j)\right] = \ln\frac{P(Y \le j)}{1 - P(Y \le j)} = \alpha_j + \beta x, \qquad j = 1, \ldots, J-1, \]

where Y is the student's observed performance level, J is the number of performance levels, and x is the student's scale score from the previous grade. The student-level probabilities are then aggregated by summing the probabilities across all students in the unit of interest for each performance level.
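A hedged sketch of fitting this kind of model with statsmodels' OrderedModel; PARCC's implementation details are not public, and the toy data below are invented purely to show the mechanics of producing per-level probabilities from a prior score.

```python
# Sketch of a cumulative logit fit; the data here are simulated, not PARCC data.
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(seed=0)
prior_score = rng.normal(740, 25, size=500)                    # prior-grade scale scores
level = np.clip(np.digitize(prior_score + rng.normal(0, 20, 500),
                            [710, 730, 750, 770]) + 1, 1, 5)   # performance levels 1-5

model = OrderedModel(level, prior_score[:, None], distr="logit")
res = model.fit(method="bfgs", disp=False)

# Predicted probability of each performance level for a prior score of 734,
# analogous to the P(Level 1) ... P(Level 5) row in the example table above.
probs = res.predict(np.array([[734.0]]))
print(probs)
```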
22
Cumulative Logit Regression
Dividing the aggregated probabilities (expected counts) by the total number of students yields the expected proportion of examinees in each performance level. The expected and observed proportions are compared, and a standardized residual is computed to flag units with larger-than-expected performance gains.

            Level 1  Level 2  Level 3  Level 4  Level 5
Expected       0.24     0.36     0.30     0.08     0.02
Observed       0.19     0.28     0.37     0.11     0.05
Difference    -0.05    -0.08    +0.07    +0.03    +0.03
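The deck does not state the exact residual formula. A common choice, assumed here, standardizes each level's observed proportion by the binomial standard error of its expected proportion; the unit size n_students is hypothetical.

```python
# Assumed residual form: z_j = (obs_j - exp_j) / sqrt(exp_j * (1 - exp_j) / n).
import numpy as np

def standardized_residuals(expected, observed, n_students):
    expected = np.asarray(expected, dtype=float)
    observed = np.asarray(observed, dtype=float)
    se = np.sqrt(expected * (1 - expected) / n_students)   # SE of each expected proportion
    return (observed - expected) / se

expected = [0.24, 0.36, 0.30, 0.08, 0.02]
observed = [0.19, 0.28, 0.37, 0.11, 0.05]
print(standardized_residuals(expected, observed, n_students=200))
```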
23
Cumulative Logit Regression
Clark et al. (2014) found that this approach to detecting test misconduct yielded good detection power with conservative Type I error (false positive) rates across a number of simulated conditions. However, the model has never been implemented as a data forensics method in an operational setting, so an exploratory study using data from operational PARCC administrations is planned. The investigation will use PARCC results from spring 2015 and spring 2016.
24
Lessons Learned
25
Response Change Analysis
The response change analysis will continue. There has been some consideration of different rules for identifying "score increases" (e.g., comparing the final score with the score on the first response, the first non-blank response, or the response preceding the final response). Some PARCC states have expressed the desire to obtain response documents (for paper testing) and to know the items on which students had score increases.
26
Plagiarism Analysis
Plagiarism on PCRs may be uncommon and/or difficult to perpetrate, and if there were any plagiarizers, the method for selecting papers for comparison may have missed them. Because the analysis did not detect any suspicious behavior, future analyses will permit PARCC states to request that certain districts/schools be included when other sources of information suggest potential test security violations. Some PARCC states have expressed a desire to receive the responses flagged for similarity.
27
Aberrant Response Pattern Detection
The “simplest” method is not necessarily the best choice. MCI generalized to polytomous items was not sensitive to simulated aberrant responding. In January 2016, the PARCC TAC did not endorse MCI as an aberrant response pattern detection method. PARCC will calculate standardized log-likelihood person-fit statistics to identify aberrant responding starting in 2016.
28
Contact: Jeffrey T. Steedle