
1 Reporting Assessment Results in Times of Change: Guidance from the Joint Standards

Carole Gallagher, PhD (cgallag@wested.org)
CCSSO National Conference on Student Assessment, June 26, 2015

2 Purpose of this Presentation

- Share reminders about responsible reporting practices in the current context
- Provide recommendations for the types of information to be communicated to stakeholders (including the media) in times of change
- Focus on particular recommendations from the updated Standards for Educational and Psychological Testing (2014)

3 Presentation Origins

- CCSSO sponsored the original work, with support from state members in the Accountability Systems and Reporting special interest group (ASR SCASS)
- The outcome was a white paper (Gallagher, 2012) intended to provide guidance to states and other jurisdictions to promote responsible reporting of findings from measures of teacher effectiveness

4 Seminal Resources at the Core of this Work

- Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999 and 2014)
- AERA Code of Ethics (2011)
- GRE Guide to the Use of Scores (ETS, 2011)
- Findings from NAEP validity studies (NCES, 2003)
- ED Information Quality Guidelines
- ED Peer Review Guidance (2009)
- Research by Colmers, Goodman, Hambleton, Zenisky, Aschbacher, Herman, and others

5 Key Reminders from Joint Standards

- When tests are revised, users should be informed of the changes, any adjustments made to the score scale, and the degree of comparability of scores from the original and revised tests. (Standard 4.25)
- When substantial changes to tests occur, scores should be reported on a new scale, or a clear statement should be provided to alert users that the scores are not directly comparable with those on earlier versions of the test. (Standard 5.20)

6 Key Reminders from Joint Standards

- When substantial changes are made to a test, documentation should be amended, supplemented, or revised to provide stakeholders with useful information and appropriate cautions. (Standard 7.14)
- When an alteration to a test has occurred, users have the right to information about the rationale for that change and empirical evidence to support the validity of score interpretations from the revised test. (Standard 9.9)

7 Implications During Times of Change

Changes to a core assessment component must be transparent and communicated to stakeholders via reporting tools:

- Test purpose (same test, new purpose)
- Target population (same test, changing population)
- Content assessed (new test, same purpose)
- Item types (same content, new techniques)
- Delivery method/mode of administration (same content, new techniques)
- Scoring methods (new test, new techniques)
- Performance expectations (new test, new expectations)

8 Changes in Test Purpose

Examples of changes in test purpose:

- Measure growth as well as status
- Use for accountability at the state, school, or teacher level
- Use to determine readiness for college and career
- Use for placement decisions at the K–12 or post-secondary levels

What should be communicated to stakeholders?

- Procedures for constructing indices of growth (e.g., gain scores; see the sketch after this slide)
- Rationale for use for this purpose and how scores will be interpreted
- Resulting changes to tested content or frequency of testing
- Technical evidence to support test use for this purpose
- Findings from analyses of decision consistency if the test is used for classification
- How consequences are anticipated and monitored (stakes may change)
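The growth-index and decision-consistency bullets can be made concrete. Below is a minimal sketch of a simple gain score (current scale score minus prior scale score) and a basic decision-consistency check (the proportion of examinees classified the same way by two parallel forms). The function names, data, and the cut score of 500 are illustrative, not taken from the white paper or the Standards.

```python
from typing import Sequence

def gain_scores(prior: Sequence[float], current: Sequence[float]) -> list[float]:
    """Simple growth index: this year's scale score minus last year's.
    Only meaningful if the two years' scores are on a comparable scale."""
    return [c - p for p, c in zip(prior, current)]

def decision_consistency(form_a: Sequence[float], form_b: Sequence[float],
                         cut: float) -> float:
    """Proportion of examinees classified the same way (at/above vs. below
    a cut score) on two parallel forms."""
    agree = sum((a >= cut) == (b >= cut) for a, b in zip(form_a, form_b))
    return agree / len(form_a)

# Illustrative data: scale scores for four students and a cut of 500.
prior = [480.0, 495.0, 510.0, 520.0]
current = [500.0, 505.0, 515.0, 540.0]
print(gain_scores(prior, current))                      # [20.0, 10.0, 5.0, 20.0]

form_a = [480.0, 495.0, 510.0, 520.0]
form_b = [478.0, 502.0, 505.0, 530.0]
print(decision_consistency(form_a, form_b, cut=500.0))  # 0.75
```

A raw gain is interpretable only when the two years' scales have been linked, which is why the rationale and procedures belong in stakeholder reporting.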

9 Changes in Population Tested

Examples of changes in target population:

- Census administration instead of self-selection (e.g., ACT or SAT)
- Students with disabilities formerly assessed under the "2%" flexibility (modified achievement standards)
- Changes in the English learner population; new translations

What should be communicated to stakeholders?

- Documentation of development practices (e.g., universal design)
- Ways in which potential sources of construct-irrelevant variance were evaluated and emerging subgroup differences will be examined
- Norming decisions and practices (a percentile-rank sketch follows this slide)
- Technical evidence to support use for this population
- Information about administration and scoring methods
- How consequences are anticipated and monitored
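As one illustration of what a norming decision involves, the sketch below computes a percentile rank within a norming sample, the kind of statistic that must be re-derived when the tested population shifts from self-selected to census. The data and the particular percentile-rank definition (mean of the below and at-or-below proportions) are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(norm_sample: list[float], score: float) -> float:
    """Percentile rank of `score` within a norming sample, defined here as
    the mean of the proportions strictly below and at-or-below the score."""
    xs = sorted(norm_sample)
    below = bisect_left(xs, score)         # count strictly below
    at_or_below = bisect_right(xs, score)  # count at or below
    return 100.0 * (below + at_or_below) / (2 * len(xs))

# Illustrative census sample; norms must be re-derived whenever the
# tested population changes (e.g., self-selected -> census).
census_scores = [400.0, 450.0, 450.0, 500.0, 520.0, 560.0, 600.0, 640.0]
print(percentile_rank(census_scores, 500.0))  # 43.75
```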

10 Changes in Content Assessed

Examples of changes in tested content:

- Any new or revised standards on which tests are based
- CCSS-like standards (focus on college and career readiness; include listening and speaking; include practice or process standards)
- NGSS-like standards

What should be communicated to stakeholders?

- Documentation of stakeholder involvement in development
- Technical evidence related to content validity, e.g., findings from blueprint analyses or studies of alignment (a blueprint-coverage sketch follows this slide)
- Plan for communicating shifts in what is assessed at each grade
- Plan for mitigating threats associated with opportunity to learn
- How consequences are anticipated and monitored
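A blueprint analysis can be as simple as comparing target item counts per standard against the items actually placed on a form. The sketch below does exactly that; the standard codes and targets are hypothetical, not drawn from any actual blueprint.

```python
from collections import Counter

# Hypothetical blueprint: target item counts per standard on the new form.
blueprint = {"RL.1": 6, "RL.2": 4, "W.1": 5, "SL.1": 3}

# Standards measured by the items actually placed on the form (invented).
form_items = ["RL.1"] * 6 + ["RL.2"] * 3 + ["W.1"] * 5 + ["SL.1"] * 1

actual = Counter(form_items)
for standard, target in blueprint.items():
    n = actual.get(standard, 0)
    status = "OK" if n >= target else f"short by {target - n} item(s)"
    print(f"{standard}: target {target}, actual {n} -> {status}")
```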

11 Changes in Item Types

Examples of new item types used on assessments:

- Performance tasks
- Technology-enhanced items

What should be communicated to stakeholders?

- Rationale for use/measurement theory of action
- Ways in which potential sources of construct-irrelevant variance were identified and addressed
- Documentation about development, administration, and scoring
- Findings from small-scale tryouts, pilot testing, and field testing (item-level data; see the sketch after this slide)
- Plan for analyzing and evaluating potential subgroup differences that may emerge
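Two classical item statistics that commonly appear in field-test reports are the p-value (proportion correct, an index of difficulty) and the point-biserial correlation between an item and the total score (an index of discrimination). A minimal sketch on invented data follows; note that `statistics.correlation` requires Python 3.10 or later.

```python
import statistics

def p_value(item_scores: list[int]) -> float:
    """Classical difficulty index: proportion answering the item correctly."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores: list[int], totals: list[float]) -> float:
    """Pearson correlation between a 0/1 item score and the total score,
    a standard discrimination index for dichotomous items."""
    return statistics.correlation([float(x) for x in item_scores], totals)

# Invented field-test data: one item, five examinees.
item = [1, 0, 1, 1, 0]
total = [42.0, 18.0, 35.0, 40.0, 22.0]
print(round(p_value(item), 2))                # 0.6
print(round(point_biserial(item, total), 2))  # ~0.96
```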

12 Changes in Item Delivery Method

Examples of new delivery methods:

- Mix of computer-supported and paper-and-pencil administration
- Computer-adaptive testing

What should be communicated to stakeholders?

- Rationale for the new approach
- Ways in which potential sources of construct-irrelevant variance were considered and addressed
- Detailed administration and scoring guidance
- Findings from analyses of scores produced under each method (a mode-comparability sketch follows this slide)
- Plan for analyzing and evaluating potential subgroup differences that may emerge
- Documentation of how the comparability of intended inferences will be evaluated and how findings will be used
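One first-pass comparability analysis is a standardized mean difference (Cohen's d) between scores obtained under the two delivery modes. The sketch below computes it with a pooled standard deviation on invented scores; a full comparability study would also examine score distributions, item functioning, and subgroups.

```python
import statistics
from math import sqrt

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference between two independent groups,
    using the pooled (equal-variance) standard deviation."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Invented scale scores under each mode of administration.
paper  = [500.0, 512.0, 489.0, 530.0, 505.0]
online = [498.0, 520.0, 495.0, 525.0, 510.0]
print(round(cohens_d(paper, online), 3))  # near zero -> little mode effect
```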

13 Changes in Scoring Practices

Examples of changes in scoring practices:

- Use of artificial intelligence (AI) scoring of text
- Teacher scoring of performance tasks
- Reporting at the claim or other subscore level
- Combining unlike measures into a composite

What should be communicated to stakeholders?

- Research supporting the use of particular methods or rubrics
- Qualifications and training of scorers
- Reliability estimation procedures consistent with the test structure
- Precision and the standard error of measurement (SEM) for all scores
- Findings from studies of inter-rater reliability
- Documentation of the rationale and methods for assigning weights (a sketch covering SEM, rater agreement, and composite weighting follows this slide)
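Three of the bullets above reduce to simple, well-known formulas: the classical standard error of measurement, SEM = SD × sqrt(1 − reliability); exact agreement as a basic inter-rater index; and a weighted sum of standardized components as a composite. The sketch below implements all three; the scores, ratings, reliability value, and weights are invented.

```python
import statistics
from math import sqrt

def sem(scores: list[float], reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return statistics.stdev(scores) * sqrt(1.0 - reliability)

def exact_agreement(rater1: list[int], rater2: list[int]) -> float:
    """Proportion of responses two raters scored identically (a simple
    inter-rater index; kappa would additionally correct for chance)."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

def weighted_composite(parts: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of already-standardized component scores; the weights
    (which should sum to 1) and their rationale should be documented."""
    return sum(weights[k] * parts[k] for k in parts)

scores = [480.0, 495.0, 510.0, 520.0, 540.0]
print(round(sem(scores, reliability=0.90), 1))      # 7.3
print(exact_agreement([2, 3, 1, 4], [2, 3, 2, 4]))  # 0.75
print(weighted_composite({"test": 0.4, "growth": -0.1, "observation": 0.8},
                         {"test": 0.5, "growth": 0.3, "observation": 0.2}))  # ~0.33
```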

14 Changes in Performance Expectations

Examples of changing performance standards:

- Test now used for a new purpose (stakes have changed)
- Content rigor has changed
- Changes in scaling decisions or in the meaning of scale scores
- New cut scores

What should be communicated to stakeholders?

- New guidance for interpreting scores
- Information about scale rationale and properties
- Documentation about standard setting and SEMs in the vicinity of each cut score (see the sketch after this slide)
- Evidence of the validity of score interpretations for subgroups
- How consequences are anticipated and monitored
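One way to communicate uncertainty near a cut score is to report a band of roughly ±1 SEM around it, within which pass/fail classifications are least certain. The minimal sketch below assumes a single overall SEM for simplicity; a conditional SEM evaluated near the cut would be more precise where available.

```python
def cut_score_band(cut: float, sem: float, width: float = 1.0) -> tuple[float, float]:
    """Interval of +/- `width` SEMs around a cut score; classifications for
    scores inside this band are the least certain."""
    return cut - width * sem, cut + width * sem

lo, hi = cut_score_band(cut=500.0, sem=7.3)
print(f"Least certain classifications: scores in [{lo:.1f}, {hi:.1f}]")
```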

15 In Summary...

- Changes to a number of assessment elements can affect the validity of inferences drawn from scores
- Lack of transparency can undermine stakeholder trust in the assessment system
- Responsible testing practices in times of change call for states to keep stakeholders informed
- Evidence collected should inform test users about the technical quality of the new test, its fairness for the targeted population, how scores should be interpreted, and appropriate uses of test results

16 White Paper and Other Resources

17 Details in White Paper

- Exemplary practices in a number of states and large districts
- Long-standing guidance on developing comprehensive reports, based on research and best practices
- State-of-the-states summary of laws, policies, and regulations that can have an impact on reporting practices

18 Key Report Features Discussed

- Purpose and target audience(s)
- Measures from which results are reported
- Scoring, rating, and performance levels
- How the score was calculated and/or performance was rated (e.g., criteria used)
- Interpretation of results
- Use of the report or database

