Review and Validation of ISAT Performance Levels for 2006 and Beyond. MetriTech, Inc., Champaign, IL.

Presentation transcript:

Review and Validation of ISAT Performance Levels for 2006 and Beyond. MetriTech, Inc., Champaign, IL

Objectives for State Testing Review Committee  Review cut score validation process  Review modifications to grade 8 mathematics cut scores  Review grade 5 mathematics cut score proposal  Review new ISAT reporting scale proposal

Purpose of the Validation Panel  Review cutoff values in relationship to  Performance definitions  Assessment frameworks  Representative item difficulties  District/school performance profiles  Judge the reasonableness of the proposed cutoffs

Process  Review development of ISAT performance levels  Review equating process to SAT-10 vertical scale  Consider proposed cutoffs in relationship to frameworks, definitions, data  Evaluate the reasonableness of the outcomes

Constraints  NCLB requirements for continuity in school/district accountability  Grades 3, 5, 8 frame possibilities for grades 4, 6, 7

Outcome

ISAT/SAT-10 Bridge Study

Bridge Study Purposes  SAT-10 will be the basis for the enhanced ISAT vertical scales  Vertical scale will be the basis for establishing intermediate grade performance levels (reading/math grades 4, 6, 7)

Description of the Equating Sample  9793 SAT-10 Records  9 districts  47 schools  grades 3, 4, 5, 7, 8

ISAT/SAT-10 Data Record Match

Descriptive Statistics for Matched Samples: Reading Tests

Descriptive Statistics for Matched Samples: Mathematics Tests

Descriptive Statistics for Matched Samples: Science Tests

Statistical Comparison of Matched Sample SAT-10 Means With SAT-10 Spring Norms

Statistical Comparison of Matched Sample ISAT Means With ISAT 2005 Test Population

Correlation Between Corresponding ISAT/SAT-10 Ability Estimates

Equating Methods  Rasch (outliers included)  Rasch (outliers excluded)  Equipercentile  Equipercentile with linear smoothing
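The slide lists the candidate methods but not their mechanics. The sketch below is a minimal illustration of equipercentile equating, assuming paired ISAT and SAT-10 scores from the matched sample are available as arrays; all function and variable names (e.g., equipercentile_equate, isat_reading) are hypothetical, as are the cut-score values in the usage comment.

```python
# Minimal equipercentile-equating sketch (illustrative only, not the
# operational procedure). Assumes paired matched-sample score arrays.
import numpy as np

def percentile_ranks(scores, points):
    """Percentile rank of each value in `points` within `scores`
    (percent below plus half the percent exactly at the value)."""
    scores = np.sort(np.asarray(scores))
    points = np.asarray(points)
    below = np.searchsorted(scores, points, side="left")
    at = np.searchsorted(scores, points, side="right") - below
    return 100.0 * (below + 0.5 * at) / len(scores)

def equipercentile_equate(isat_scores, sat10_scores, isat_points):
    """Map ISAT score points onto the SAT-10 scale by matching percentile ranks."""
    pr = percentile_ranks(isat_scores, isat_points)
    grid = np.unique(np.asarray(sat10_scores))       # distinct SAT-10 scores
    grid_pr = percentile_ranks(sat10_scores, grid)   # their percentile ranks
    return np.interp(pr, grid_pr, grid)              # invert the SAT-10 PR function

# Example with hypothetical ISAT cut scores:
# equated_cuts = equipercentile_equate(isat_reading, sat10_reading, [150, 163, 178])
```

The "linear smoothing" variant on the slide would smooth the percentile-rank functions before the final lookup; the Rasch variants instead place both tests on a common IRT ability scale.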

Equating Outcomes  Rasch/Equipercentile results similar with respect to cut points  Rasch:  Equipercentile:  Outlier impact  Tended to eliminate higher ISAT/lower SAT-10 scores  Inconsistent with past practice

Constants For Transforming ISAT Scales to the SAT-10 Vertical Scales
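The constants themselves are not reproduced in this transcript. As a minimal sketch, assuming the transformation is linear in the reported scale scores (the slope and intercept values below are hypothetical), applying the constants would look like:

```python
# Applying hypothetical slope/intercept constants to place an ISAT scale
# score on the SAT-10 vertical scale (the actual constants are in the
# table this slide refers to).
def to_sat10_vertical(isat_score, slope, intercept):
    return slope * isat_score + intercept

# Example with made-up grade 3 reading constants:
# to_sat10_vertical(160, slope=2.1, intercept=275.0)   # -> 611.0
```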

Table for Transforming ISAT Scores to SAT-10 Vertical Scales

Transformation of the Performance Level Cut Scores

ISAT Performance Category Score Ranges on the Current Scales

ISAT Cut Scores Expressed on the SAT-10 Vertical Scales (Lower Bounds)

Reading Cut Scores Expressed on the SAT-10 Vertical Scale

Science Cut Scores Expressed on the SAT-10 Vertical Scale

Mathematics Cut Scores Expressed on the SAT-10 Vertical Scale

Adjustment to the Grade 8 Mathematics Cut Scores

Reasons for Reexamining the Cut Scores  Large Discrepancies Across Grades (Grade 3: 79%; Grade 5: 73%; Grade 8: 54%)  Multiple Panels Used in the Original Derivation (162, 149)  Discrepancies in National Percentile Ranks

National Percentile Ranks Corresponding to Each ISAT Cut Score
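The percentile-rank table is not reproduced here. A sketch of the underlying lookup, assuming a publisher norms table that maps SAT-10 vertical scale scores to national percentile ranks (all values below are hypothetical):

```python
# Sketch: national percentile rank for a cut score, using a hypothetical
# norms table of SAT-10 vertical scale scores and percentile ranks.
import numpy as np

norm_scores = np.array([550, 575, 600, 625, 650])   # SAT-10 scale scores
norm_prs    = np.array([ 10,  25,  50,  75,  90])   # national percentile ranks

def national_pr(cut_on_sat10_scale):
    """Interpolate the national percentile rank for a cut score."""
    return float(np.interp(cut_on_sat10_scale, norm_scores, norm_prs))

# national_pr(612)  ->  62.0 with these hypothetical norms
```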

Estimating Performance Level Cut Scores for Intermediate Grades
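The transcript does not show the interpolation itself. Assuming straight-line interpolation across grade levels on the SAT-10 vertical scale between the anchored grade 3, 5, and 8 cuts (the anchor values below are hypothetical), the grade 4, 6, and 7 cuts could be obtained as follows:

```python
# Sketch of straight-line interpolation of cut scores across grades on the
# SAT-10 vertical scale (anchor values are hypothetical placeholders for
# the grade 3, 5, and 8 cuts shown in the preceding slides).
import numpy as np

anchor_grades = [3, 5, 8]
anchor_meets_cuts = [614.0, 645.0, 676.0]   # hypothetical "Meets" cuts

def interpolated_cut(grade):
    """Linearly interpolate a cut score for an intermediate grade (4, 6, 7)."""
    return float(np.interp(grade, anchor_grades, anchor_meets_cuts))

# interpolated_cut(4) -> 629.5; interpolated_cut(6) -> ~655.3; interpolated_cut(7) -> ~665.7
```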

Reading Cut Scores With Interpolations for All Grades

Mathematics Cut Scores With Interpolations for All Grades

The Evaluation Process  Review performance definitions and assessment frameworks  Review difficulty-ordered item booklets  Review district/school performance profiles  Make initial judgments  Large group discussion  Final judgments

Difficulty-Ordered Item Booklets  Represent a range of probable performances by students at different levels  Primarily represent the Below Standards/Meets Standards performance range

Difficulty-Ordered Reading Booklets

Difficulty-Ordered Mathematics Booklets

District 1 Performance Profile

District 2 Performance Profile

School 1 Performance Profile

Evaluation Worksheet

Evaluation Outcomes

Panel Recommendations for Grade 5 Exceeds Cut Score in Mathematics  Large Discrepancies Across Grades (Grade 3: 34%; Grade 5: 12%; Grade 8: 17% before adjustment, about 29% after adjustment)  Multiple Panels Used in the Original Derivation (the current cut score uses the higher of the two panels' recommendations; the other panel's would yield about 33%)  Exceeds-level items fall within the Meets performance range  Discrepancies in National Percentile Ranks

National Percentile Ranks Corresponding to Each ISAT Cut Score

Options to Consider  Anchor existing grade 3 and adjusted grade 8. Interpolate intermediate grades  Anchor on adjusted grade 3 and adjusted grade 8.

Defining the 2006 ISAT Reporting Scales

Enhanced ISAT Vertical Scale  Linear transformation of the SAT-10 vertical scale  Unit size  Range consideration

Scenario  Unit size approximately twice that of the current scale  Anchor lower end of grade 3 (reading, mathematics) or grade 4 (science) scale at 120  How would things have looked in 2005 if the new reporting scale were used?
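How the reporting-scale constants follow from these two choices is not shown in the transcript. As a sketch, assuming the new scale is a linear function of the SAT-10 vertical scale, anchoring a chosen SAT-10 value at 120 fixes the intercept once a slope (unit size) is picked; all numeric values below are hypothetical:

```python
# Sketch: defining a new reporting scale as a linear transformation of the
# SAT-10 vertical scale. The slope and the SAT-10 score anchored to 120 are
# hypothetical; the operational constants appear in the later slides.
def make_reporting_transform(sat10_at_anchor, slope, anchor_value=120.0):
    """Return a function mapping SAT-10 vertical scale scores to the new
    ISAT reporting scale, anchoring `sat10_at_anchor` at `anchor_value`."""
    intercept = anchor_value - slope * sat10_at_anchor
    return lambda sat10: slope * sat10 + intercept

# Example: anchor a hypothetical lowest grade 3 SAT-10 score (520) at 120
# with a hypothetical slope of 0.9.
# to_isat_2006 = make_reporting_transform(sat10_at_anchor=520.0, slope=0.9)
# to_isat_2006(640.0)  ->  228.0
```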

Reading

Mathematics

Science

Reading Means

Mathematics Means

Science Means

Score Distribution (3, 5, 8)

Score Distribution (4,7)

Constants for Transforming SAT-10 Scales to New ISAT Reporting Scales

Constants for Transforming Existing (1999) ISAT Scales to New (2006) ISAT Reporting Scales
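Those constants are not reproduced in the transcript either; if both published maps are linear (1999 ISAT to SAT-10, then SAT-10 to the 2006 reporting scale), the 1999-to-2006 constants follow by composing them, as in this hypothetical sketch:

```python
# Sketch: composing two linear transformations to convert 1999 ISAT scores
# directly to the 2006 reporting scale (all constants hypothetical).
def compose_linear(a1, b1, a2, b2):
    """Compose y = a2*(a1*x + b1) + b2 into a single slope/intercept pair."""
    return a2 * a1, a2 * b1 + b2

# 1999 ISAT -> SAT-10: slope a1, intercept b1; SAT-10 -> 2006 ISAT: a2, b2.
# slope, intercept = compose_linear(a1=2.1, b1=275.0, a2=0.9, b2=-348.0)
# new_score = slope * old_score + intercept
```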