1
Overview of the FL DOE Data Forensics Program Steve Addicott, Vice President Dennis Maynes, Chief Scientist Caveon Test Security October 29, 2013
2
Outline of Presentation FLDOE Data Forensics Goals Data Forensics (DF) Process Student-Level Invalidations School-Level Flags and Requests Q&A
3
FLDOE Data Forensics Goals Uphold fairness and validity of test results “I believe that those of us in the field of assessment must now take even greater leadership on the issue of test data integrity.” Dr. Greg Cizek, NCME President 2012-13 CCSSO Conference, June 2013 “I…urge you to do everything you can to ensure the integrity of the data used to measure student achievement.” Arne Duncan US Secretary of Education Letter to Chief State School Officers, June 2011
4
Other Goals “Measure and Manage” Identify risks and irregularities Take action based on data and analysis Communicate zero tolerance of misbehavior to students and educators
5
Data Forensics Process Analyses of test data: first build a “model” of typical question responses, then identify unusual patterns which indicate test scores may not be trustworthy. Examples…
6
Prescriptions for Use of Data Forensics OBP Guidebook Handbook
7
“I urge us to reframe our concerns about test data integrity not as cheating concerns, but as a validity issue.” Dr. Greg Cizek, NCME President 2012-13 CCSSO Conference, June 2013
8
Testing Examiner’s Role Ensure (and then certify) the test administration is fair and proper Declare scores invalid when fairness and validity are negatively impacted Decision depends upon fairness and validity, not whether an individual cheated
9
FLDOE Data Forensics Focus on two groups: Students and Schools. Administrations: EOC, FCAT 2.0, FCAT Retakes. Utilize VERY conservative thresholds.
10
Conservative Thresholds “…it seems to make the most sense to prioritize the allocation of resources.” “If an assessment budget only permits investigating the ‘worst of the worst’, then those resources should be allocated to digging deeper into possible test data invalidity, whether that means instances that exceed a 5 SD criterion, the 20 most outlying test centers, 1% of classrooms, or whatever the resources will allow.” Dr. Greg Cizek, NCME President 2012-13, CCSSO Conference, June 2013
11
A quick discussion of conservative thresholds…. Chance of being hit by lightning = 1 in a million Chance of winning the lottery = 1 in 10 million Chance of DNA false-positive = 1 in 30 million to 1 in a billion Chance of tests being flagged and taken independently = 1 in a TRILLION
12
Statistics Used Similarity Erasures Gains/Losses
13
Similarity Our Most Powerful & “Credible” Statistic Measures degree of similarity between 2 or more test instances Analyze each test instance against all other test instances in the same school
14
Erasures Based on estimated answer changing rates from: Wrong-to-Right Anything-to-Wrong Find answer sheets with unusually many WtR answer changes Extreme statistical outliers could involve tampering, “panic cheating”, etc.
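One way to picture the erasure screen is a standardized count of wrong-to-right changes against an estimated statewide rate. This is a minimal sketch assuming a simple binomial model and an illustrative 2% per-item WtR rate; the actual FLDOE/Caveon erasure model and thresholds are not specified on this slide.

```python
# Hedged sketch: flag answer sheets with unusually many wrong-to-right (WtR)
# erasures. The statewide WtR rate per item and the 5-SD flag are illustrative
# assumptions, not the program's actual model.
import math

def wtr_z_score(wtr_erasures: int, items: int, statewide_wtr_rate: float) -> float:
    """Standardized WtR erasure count under a simple binomial model."""
    expected = items * statewide_wtr_rate
    sd = math.sqrt(items * statewide_wtr_rate * (1.0 - statewide_wtr_rate))
    return (wtr_erasures - expected) / sd

# Example: 14 WtR erasures on a 50-item test, assumed statewide rate of 2% per item.
z = wtr_z_score(wtr_erasures=14, items=50, statewide_wtr_rate=0.02)
print(f"WtR z-score: {z:.1f}")  # extreme outliers (e.g., beyond 5 SD) would be flagged
```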
15
Unusual Gains/Losses Predict score using prior year info. Measure large score increases/decreases against predicted score Extreme Gains/Losses may result from: Pre-knowledge, i.e., “Drill It and Kill It” Coaching Student development—visual acuity
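As a rough illustration of the gains/losses idea, the sketch below predicts this year's score from last year's with a simple least-squares line and flags large standardized residuals. The linear model and the 4-SD flag are assumptions for illustration, not the program's actual method.

```python
# Hedged sketch of a gains/losses screen: predict this year's scale score from
# last year's and flag large standardized residuals.
import statistics

def fit_line(prior, current):
    """Ordinary least-squares slope/intercept for predicting current from prior."""
    mean_x, mean_y = statistics.fmean(prior), statistics.fmean(current)
    sxx = sum((x - mean_x) ** 2 for x in prior)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prior, current))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_gains(prior, current, threshold_sd=4.0):
    """Return indices of students whose gain/loss is an extreme outlier."""
    slope, intercept = fit_line(prior, current)
    residuals = [y - (slope * x + intercept) for x, y in zip(prior, current)]
    sd = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) / sd >= threshold_sd]
```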
16
Student-level Analysis Similarity Analysis only Most credible, strongest No flagging for erasures or gains Invalidate test scores with Similarity Index ≥ 12 Notification letters to be sent to parents Supporting info sent to districts Chances of seeing two (or more) students’ tests so similar, with each doing his/her own work: 0.000000000001
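The figures on this slide are consistent with reading the Similarity Index as the negative base-10 logarithm of that probability, so an index of 12 corresponds to odds of about 1 in a trillion. That reading is an assumption for illustration, not a definition taken from the slide; the exact index is Caveon's.

```python
# Hedged sketch: if the Similarity Index is -log10 of the probability of seeing
# tests this similar under independent work (an assumption consistent with the
# numbers on this slide), an index of 12 corresponds to p = 1e-12.
import math

def similarity_index(p_value: float) -> float:
    return -math.log10(p_value)

print(round(similarity_index(0.000000000001), 1))  # 12.0
```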
17
Steps for Calculating Similarity
1. Analyze each grade/subject statewide to create a model of “normal” test-taking behaviors
2. Use the student’s performance to compute the probability of an incorrect/correct answer on all items
3. Calculate the probability that two students will answer an item identically (Expected Identical Correct/Incorrect vs. Observed Identical Correct/Incorrect)
4. Tests are flagged when the number of identical responses is much greater than expected
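A compact way to see steps 2-4 is to combine two students' per-item probabilities of a correct answer into an expected number of identical responses, then standardize the observed count. The sketch below does this under a uniform-distractor assumption and a normal approximation; both are simplifications and not Caveon's exact formula.

```python
# Hedged sketch of steps 2-4: expected vs. observed identical responses for a
# pair of students. p1[i], p2[i] are each student's modeled probability of
# answering item i correctly (step 2). The uniform-distractor assumption and
# the normal approximation are illustrative simplifications.
import math

def expected_identical(p1, p2, n_distractors=3):
    """Expected counts of identical-correct and identical-incorrect answers."""
    exp_correct = sum(a * b for a, b in zip(p1, p2))
    exp_incorrect = sum((1 - a) * (1 - b) / n_distractors for a, b in zip(p1, p2))
    return exp_correct, exp_incorrect

def similarity_z(p1, p2, observed_identical, n_distractors=3):
    """Standardized excess of observed identical answers over expectation."""
    probs = [a * b + (1 - a) * (1 - b) / n_distractors for a, b in zip(p1, p2)]
    mean = sum(probs)
    var = sum(p * (1 - p) for p in probs)
    return (observed_identical - mean) / math.sqrt(var)
```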
18
Example of Flagged Students
19
Example: 5th Grade Reading Cluster Identifies possible collusion 2 students passed, but their results break the assumption of independent test taking, i.e., the results are not trustworthy.
20
Invalidation Support Materials District Invalidation Spreadsheet Draft Parent Notification Letter Appeals Guide Similarity “Cluster” Spreadsheets (sent upon request by FL DOE)
21
District Invalidations Spreadsheet
22
Similarity Cluster Spreadsheets* Explanations Examinees Summary results Alignment Actual responses *Similarity “Cluster” Spreadsheets sent upon request from FL DOE
24
Alignment Detail Letters = identical correct Numbers = identical incorrect
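As an illustration of how such an alignment row could be rendered, the sketch below shows identical correct answers as the chosen letter, identical incorrect answers as the distractor's position, and non-matching items as a dot. The dot convention and the A-D option coding are assumptions; the slide only states that letters mark identical correct and numbers mark identical incorrect answers.

```python
# Hedged sketch of building an alignment row for two students' responses.
def alignment_row(responses_a, responses_b, key):
    out = []
    for a, b, correct in zip(responses_a, responses_b, key):
        if a != b:
            out.append(".")                        # not identical
        elif a == correct:
            out.append(a)                          # identical correct -> letter
        else:
            out.append(str("ABCD".index(a) + 1))   # identical incorrect -> number
    return "".join(out)

print(alignment_row("ACBDA", "ACADA", "ABBDC"))  # "A3.D1"
```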
25
School-Level Analysis “… I am talking about educator cheating. I don't know if it's 5% or 10%, but I doubt it's 1/10 of 1%. What I do know for certain is that there is uniformly more cheating than we think there is.” Dr. Greg Cizek, NCME President 2012-13 CCSSO Conference, June 2013
26
School-Level Analysis Similarity, Erasures, and Gains Flagged schools conduct internal review Extreme instances may prompt formal investigations and sanctions
28
School-Level Student Data One row of data per test result (student & subject) Identifying information Student Test result Similarity information Erasure information Other information (if available)
29
Identifying Information
Caveon ID | UIN
Student ID | Subject
PAS or CBT ID | Test Name
Last Name | Core Form
First Name | Test Form
MI | Test Group
Grade | Test Date
District | Raw Score
School | Scale Score
District Name | Passed
School Name | Achievement Level
30
Similarity Information
Caveon ID | Expected Incorrect
Similarity Index | Percent Match
Similarity Cluster | Closest Similarity Index
Cluster Identifier** | Closest Match ID
Cluster Index** | Closest Last Name
Matching Test ID | Closest First Name
Questions in Common | Source-Copier Index**
Correct Matches | Dominant Score**
Incorrect Matches | Non-Dominant Score**
Expected Correct | Standardized Difference**
** Only present when Similarity Index > 12
31
Was the Similarity Due to Small groups or large groups? Answer copying? Communication between students? Poor proctoring? Something else? Disclosure of answers Buddy system “Chunking and redirecting”
33
Snippet of Similarity Information
Columns: Caveon ID, Similarity Index, Similarity Cluster, Matching Test ID, Correct Matches, Incorrect Matches, Expected Correct, Expected Incorrect, Percent Match, Closest Similarity Index
[Sample rows of student-level similarity data]
34
Answering the questions Sort/filter by Similarity Cluster How many clusters? What are the sizes? What are the index values? Plot matches on seating chart Are students close? Is there a pattern? Teachers and/or other groupings
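If the cluster spreadsheet is loaded into pandas, the questions above can be answered with a groupby on the Similarity Cluster column. The file name and the assumption that unflagged rows have a blank cluster value are illustrative; the column names follow the fields listed earlier.

```python
# Hedged sketch of the cluster review; file name and exact headers are assumed.
import pandas as pd

df = pd.read_excel("similarity_cluster.xlsx")   # or pd.read_csv(...)
clusters = df.dropna(subset=["Similarity Cluster"]).groupby("Similarity Cluster")

print("Number of clusters:", clusters.ngroups)
print(clusters["Caveon ID"].count().rename("size"))            # cluster sizes
print(clusters["Similarity Index"].max().rename("max index"))  # index values per cluster
```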
35
Patterns to check on seating charts Separation, associations, and index values Index values above 5 Tight groups (communication/pencil tapping?) “Close” pairs (answer copying/blind spots?) Wide separation (cell phones?) Index values between 3 and 5 Content disclosure Chunking/redirecting Index values below 3 Separated pairs are probably noise Larger groups could indicate something else
36
Clustering examples - #1
10 pairs, 3 triplets, 1 quad, 1 quint, 1 (23), 1 (43)
For clusters > 3, use Matching IDs to create sub-clusters
If some index values are very high, filter out the very small index values
Caveon ID | Similarity Index | Similarity Cluster | Matching Test ID
23510 | 1.944 | _0b157dfb | 23737
23597 | 1.6601 | _0b157dfb | 23740
23736 | 37.6168 | _0b157dfb | 23737
23737 | 37.6168 | _0b157dfb | 23736
23740 | 15.6448 | _0b157dfb | 23737
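One way to build the sub-clusters is to treat each Caveon ID / Matching Test ID pair as an edge and take connected components, after dropping edges below an index cutoff. The sketch below uses rows from the example table above; the cutoff of 3 is an illustrative assumption.

```python
# Hedged sketch: split a large cluster into sub-clusters via connected components,
# filtering out very small index values first.
from collections import defaultdict

def sub_clusters(rows, min_index=3.0):
    """rows: iterable of (caveon_id, matching_test_id, similarity_index)."""
    graph = defaultdict(set)
    for a, b, idx in rows:
        if idx >= min_index:
            graph[a].add(b)
            graph[b].add(a)
    seen, components = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(graph[cur] - comp)
        seen |= comp
        components.append(sorted(comp))
    return components

rows = [(23510, 23737, 1.944), (23597, 23740, 1.6601),
        (23736, 23737, 37.6168), (23740, 23737, 15.6448)]
print(sub_clusters(rows))  # [[23736, 23737, 23740]] once the small-index pairs are dropped
```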
37
Clustering examples - #2
Split large groups
If some index values are very high, filter out the very small index values
Caveon ID | Matching Test ID | Closest Similarity Index
23681 | 23701 | 10.24
23701 | 23681 | 10.24
23512 | 23681 | 9.117
23762 | 23681 | 9.073
23704 | 23681 | 6.762
23709 | 23681 | 6.675
23677 | 23701 | 2.915
23565 | 23512 | 2.754
23517 | 23681 | 2.653
23755 | 23709 | 2.642
23628 | 23701 | 2.388
23590 | 23755 | 2.168
23690 | 23512 | 2.139
23661 | 23701 | 1.735
23601 | 23709 | 1.471
23728 | 23690 | 1.345
38
Erasure Information
Only present for paper-and-pencil tests
Students do not erase frequently
Use seating charts and student associations
Fields: Caveon ID, Erasure Index, Wrong-to-Right Erasures, Any-to-Wrong Erasures, Right-to-Wrong Erasures, Wrong-to-Wrong Erasures, WTR Delta Flag, WTR Delta Difference
39
Summary Goal: Fair and valid testing for all students DOE to conduct Data Forensics on FCAT test data Focus on Individual students -- extremely similar tests Schools—Similarity, Gains, and Erasures
40
Follow Up Questions? Victoria.Ash@fldoe.org