Comparability of Assessment Results in the Era of Flexibility

Similar presentations
RIDE – Office of Special Populations

STATE STANDARDIZED ASSESSMENTS. 1969The National Assessment for Educational Progress (NAEP) administered for the first time, Florida participated in the.
On The Road to College and Career Readiness Hamilton County ESC Instructional Services Center Christina Sherman, Consultant.
VALUE – ADDED 101 Ken Bernacki and Denise Brewster.
The State of the State TOTOM Conference September 10, 2010 Jim Leigh Office of Assessment and Information Services Oregon Department of Education.
12 Ways MAP Data Can Be Used in a School. 12 Ways To Use MAP Data Monitor Academic Growth Using National Norms Identify Individual Reading Pathway using.
NEXT GENERATION BALANCED ASSESSMENT SYSTEMS ALIGNED TO THE CCSS Stanley Rabinowitz, Ph.D. WestEd CORE Summer Design Institute June 19,
CALIFORNIA DEPARTMENT OF EDUCATION Tom Torlakson, State Superintendent of Public Instruction California Assessment Update California Mathematics Council.
Ohio’s Assessment Future The Common Core & Its Impact on Student Assessment Evidence by Jim Lloyd Source doc: The Common Core and the Future of Student.
Parent Training California Assessment for Student
Building Effective Assessments. Agenda  Brief overview of Assess2Know content development  Assessment building pre-planning  Cognitive factors  Building.
TOM TORLAKSON State Superintendent of Public Instruction National Center and State Collaborative California Activities Kristen Brown, Ph.D. Common Core.
Charles Pack Jr. WorkKeys and KeyTrain Help Make The Academy of Careers and Technology A West Virginia Exemplary School.
Assessing Students With Disabilities: IDEA and NCLB Working Together.
Including Quality Assurance Within The Theory of Action Presented to: CCSSO 2012 National Conference on Student Assessment June 27, 2012.
CCSSO Criteria for High-Quality Assessments Technical Issues and Practical Application of Assessment Quality Criteria.
Idaho State Department of Education Accessing Your ISAT by Smarter Balanced Data Using the Online Reporting System (ORS) Angela Hemingway Director, Assessment.
CALIFORNIA DEPARTMENT OF EDUCATION Tom Torlakson, State Superintendent of Public Instruction Santa Clara COE Assessment Accountability Network September.
Smarter Balanced Assessment System March 11, 2013.
Standards-Based Assessment Overview K-8 Fairfield Public Schools Fall /30/2015.
Guide to Test Interpretation Using DC CAS Score Reports to Guide Decisions and Planning District of Columbia Office of the State Superintendent of Education.
Understanding Alaska Measures of Progress Results: Reports 1 ASA Fall Meeting 9/25/2015 Alaska Department of Education & Early Development Margaret MacKinnon,
Georgia’s Changing Assessment Landscape Melissa Fincher, Ph.D. Associate Superintendent for Assessment and Accountability Georgia Department for Education.
Future Ready Schools National Assessment of Educational Progress (NAEP) in North Carolina Wednesday, February 13, 2008 Auditorium III 8:30 – 9:30 a.m.
1 Georgia’s Changing Assessment Landscape Melissa Fincher Associate Superintendent for Assessment and Accountability Georgia Department for Education GACIS.
Vertical Articulation Reality Orientation (Achieving Coherence in a Less-Than-Coherent World) NCSA June 25, 2014 Deb Lindsey, Director of State Assessment.
Building an Interim Assessment System: A Workbook for School Districts CCSSO National Conference on Student Assessment Detroit, MI June 22, 2010.
KHS PARCC/SCIENCE RESULTS Using the results to improve achievement Families can use the results to engage their child in conversations about.
1 Perspectives on the Achievements of Irish 15-Year-Olds in the OECD PISA Assessment
Breakout Discussion: Every Student Succeeds Act - Scott Norton Council of Chief State School Officers.
Outcomes By the end of our sessions, participants will have…  an understanding of how VAL-ED is used as a data point in developing professional development.
1 Testing Various Models in Support of Improving API Scores.
Project Update: Next-generation MCAS
You Can’t Afford to be Late!
Next Generation Iowa Assessments
Assessment to Support Competency-Based Pathways
American Institutes for Research
Quarterly Meeting Focus
Smarter Balanced Assessment Results
Student Growth Measurements and Accountability
Understanding the Next-Generation MCAS
Comparability of Assessment Results in the Era of Flexibility
Understanding the Next-Generation MCAS
ASSESSMENT OF STUDENT LEARNING
Release of PARCC Student Results
NWEA Measures of Academic Progress (MAP)
Language Arts Assessment Update
2015 PARCC Results for R.I: Work to do, focus on teaching and learning
Bursting the assessment mythology: A discussion of key concepts
Validating Interim Assessments
Louisiana’s Comprehensive Assessment System
Common Core Update May 15, 2013.
Understanding the Next-Generation MCAS
PARCC Assessments Overview
Standard Setting for NGSS
Understanding the Next-Generation MCAS
SAT and Accountability Evidence and Information Needed and Provided for Using Nationally Recognized High School Assessments for ESSA Kevin Sweeney,
Summative: Formative resources: Interim Assessments:
Timeline for STAAR EOC Standard Setting Process
Shasta County Curriculum Leads November 14, 2014 Mary Tribbey Senior Assessment Fellow Interim Assessments Welcome and thank you for your interest.
Brian Gong Center for Assessment
Connecting OSAS Math Results to Instruction and Program Evaluation
NAEP and International Assessments
Michigan School Testing Conference
Assessment Literacy: Test Purpose and Use
Georgia’s Changing Assessment Landscape
Office of Strategy, Innovation and Performance
State Assessment Update
What it IS, What it means, What it Offers
Assessing Students With Disabilities: IDEA and NCLB Working Together
Presentation transcript:

Comparability of Assessment Results in the Era of Flexibility
Jessica Baghian, Louisiana Department of Education
Brian Gong, Center for Assessment
Jeffrey Nellhaus, Parcc Inc.
CCSSO National Conference on Student Assessment, Austin, Texas, June 28, 2017

Topics for this Session
Jeff Nellhaus, Parcc Inc.
- How comparable are state and NAEP standards for proficiency?
- What purposes are served by comparability?
Jessica Baghian, Louisiana Department of Education
- How has Louisiana achieved comparability to other states?
- Why is comparability to other states important to Louisiana?
Brian Gong, Center for Assessment
- What are the challenges for achieving comparability?
- What approaches can be taken to address the challenges?

How Comparable Are State and NAEP Standards for Proficient Performance?
Beginning in 2005, NCES reported states' cut scores for proficiency in terms of their NAEP scale score equivalents. For example, if a state reported 60% of its students performing Proficient or above on its grade 8 math assessment, NCES determined where the cut score for Proficient would have to fall on the NAEP scale for 60% of that state's students to perform Proficient or above on the NAEP grade 8 math test. The NCES report helped answer the question: "Is State X's standard for proficiency comparable to NAEP's standard for proficiency?"
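To make the mapping concrete, here is a minimal sketch in Python of the percentile computation described above. The data and the function name are my own; the actual NCES procedure involves more machinery (NAEP sampling weights and plausible values), so this only illustrates the core idea.

    # Illustrative only: find the NAEP score c such that p% of a state's
    # students score at or above c, where p is the state's reported percent
    # proficient on its own test. Simulated data, not real NAEP results.
    import numpy as np

    def naep_equivalent_cut(naep_scores, pct_proficient_on_state_test):
        # The cut is the (100 - p)th percentile of the state's NAEP scores.
        return np.percentile(naep_scores, 100 - pct_proficient_on_state_test)

    rng = np.random.default_rng(0)
    sim_scores = rng.normal(280, 35, size=10_000)  # hypothetical grade 8 math NAEP scores
    print(round(naep_equivalent_cut(sim_scores, 60)))  # roughly 271 on the NAEP scale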

NAEP Scale Score Equivalent of State Cut Scores for Proficient Performance: 2009, Grade 8 Mathematics
[Figure: chart of states' NAEP-equivalent cut scores; values shown include 299, 300, 262, and 229. See Notes and Sources below.]

The Good News…
Between 2009 and 2015, states' cut scores for proficiency, in terms of their NAEP scale score equivalents, became:
- closer to the NAEP cut score for proficiency
- higher on average
- closer to each other
While most states' cut scores for proficiency in 2015 remained in NAEP's Basic level, between 2009 and 2015 the number of states with cut scores in NAEP's Below Basic level decreased, and the number with cut scores in NAEP's Proficient level increased.

Change in States' NAEP Equivalent Cut Scores for Proficiency, 2009–2015
[Figure omitted. See Notes and Sources below.]

NAEP Scale Score Equivalent of State Cut Scores for Proficient Performance: 2009, Grade 8 Mathematics (repeated for comparison with 2013)
[Figure omitted. See Notes and Sources below.]

NAEP Scale Score Equivalent of State Cut Scores for Proficient Performance: 2013, Grade 8 Mathematics
[Figure omitted. See Notes and Sources below.]

Some Reasons Why States' Performance Standards for Proficiency Have Become More NAEP-Like
- Waivers provided by the U.S. Department of Education (USED) for setting improvement targets for school and district accountability
- Public reporting of states' NAEP equivalent cut scores for proficiency, which exposed the variability in states' standards
- New generation tests based on college- and career-ready content and performance standards
- The use of NAEP and other external benchmarks of college- and career-readiness in standard setting for new generation tests

Use of NAEP Results in Standard Setting for PARCC
The PARCC assessment RFP in 2013 required that its standard-setting process be informed by established benchmarks for proficiency and college- and career-readiness:
"The offeror shall describe a set of benchmarks to inform standard setting, including
- the percentage of students at or above proficient on the most recent NAEP assessments
- the college-readiness benchmarks on ACT and SAT
- relevant benchmarks on international assessments
- the college- and career-ready benchmark on SBAC assessments"

What Purposes Are Served by Comparability?
Credibility: comparability to build and maintain public support
- Comparability to other states, NAEP, and other tests (SAT, ACT, TIMSS)
Accountability: comparability for making high-stakes decisions
- Comparability across forms, within and across years, and across paper- and computer-based test forms
Trend: comparability to report change over time
- Comparability across forms, across years, and to the former testing program
Research/best practice: comparability for research and identifying best practices
- NAEP plays this role, but data are needed at the school and district level and more frequently
- A reason for the consortia: a common measure

Notes and Sources
Notes
- 2009 grade 8 mathematics figure: In Nebraska, each district develops local assessments to report on standards, so the state was not included in the analyses. California was not included because the state does not test general mathematics.
- 2009–2015 change figure: Nebraska was not included in the 2009 analysis because it did not offer a statewide assessment to report on standards.
- 2013 grade 8 mathematics figure: California and Virginia were not included because the states do not assess general mathematics in grade 8.
Sources
- Phillips, G. (2016). National Benchmarks for State Achievement Standards. Washington, DC: American Institutes for Research.
- U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2009, 2011, and 2013 Mathematics Assessments.
- U.S. Department of Education, Office of Planning, Evaluation and Policy Development, EDFacts SY 2008–09. Washington, DC, 2010.
- The National Longitudinal School-Level State Assessment Score Database (NLSLSASD), 2010.

Contact Information
Jeffrey Nellhaus, jnellhaus@parcconline.org

Louisiana’s Comprehensive Assessment System

Ensuring Comparability with Other States
Louisiana students are just as smart and capable as any in the country. However, academic results have not always afforded Louisiana's students the same opportunities as their peers in other states.

Goals of Louisiana's System
1. High-quality, fully aligned content in the shortest form possible
Purpose: set an instructional vision; hold our system accountable for rigorous reading, writing, and math learning; fully assess the complete scope of standards.
Approach: reduce assessment times where possible; find and use the best items to build forms; use a modified, shortened form.
2. Comparability
Purpose: equity that ensures Louisiana students are held to the same expectations as students anywhere; credibility so that parents know results mean the same in Louisiana as in other states.
Approach: maintain stable scale and cut scores; use consistent administration, scoring, and processing rules; run an external audit.
3. Cohesion grade to grade and within a grade
Purpose: grade to grade, students experience the same quality of items and results to monitor growth; within a grade, provide information on progress toward mastery in order to adjust instruction.
Approach: provide K-high school summative assessments; provide aligned diagnostics and interims for each grade.

Ensuring Comparability with Other States: Areas Studied
To ensure that Louisiana's claims of comparability were defensible, the Department commissioned a third-party validation by the Center for Assessment. The Center studied four core areas:
[Figure listing the four areas omitted.]

Ensuring Comparability with Other States: Claims Studied
The Center also studied three claims:
[Figure listing the three claims omitted.]

Contact Information
Jessica Baghian, Assistant Superintendent, Louisiana Department of Education
jessica.baghian@la.gov

Comparability Challenges and Solution Approaches for Emerging Assessment and Accountability Options
Brian Gong, Center for Assessment
Presentation in the session on "Establishing Comparability of Assessment Results in an Era of Flexibility"
CCSSO National Conference on Student Assessment, June 28, 2017, Austin, TX

Overview
- Description of the context of new calls for comparability
- Three issues, with some possible solution approaches
- Summary

Comparability & Interpretation
The key to comparability is interpretation and use: we want enough comparability to support our intended interpretations and uses. The measurement field has deep knowledge about what affects comparability, what types of interpretations can be supported, and what methods may be used to promote and evaluate comparability of scores and interpretations. However, new desired uses and contexts and new types of assessments challenge us to consider what we mean by "comparable" and how to support interpretations of comparability with new methods.

"Comparability" – We Assume It
In almost every test interpretation and use today, we assume that test scores are comparable:
- We aggregate scores.
- We interpret trends in performance over time.
- We compare individuals and groups to each other.
- We produce derivative scores that assume we can operate mathematically on multiple scores (e.g., index, growth, and value-added scores; see the sketch below).
- We make policy decisions and take practical actions on the basis of these test score interpretations (e.g., school accountability, teacher evaluation, instructional intervention).
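As a small illustration of how derivative scores bake in the comparability assumption, here is a toy gain-score computation (my own example, not from the presentation). The arithmetic is trivial; the point is that the subtraction is only meaningful if the two scale scores sit on a common, vertically articulated, year-to-year-stable scale.

    # Toy example with hypothetical scale scores: a "growth" score silently
    # assumes the two scores are on a comparable scale (same construct,
    # vertical link across grades, stability across years).
    grade4_spring = 230.0  # scale score, grade 4, one spring
    grade5_spring = 244.0  # scale score, grade 5, the next spring

    growth = grade5_spring - grade4_spring
    print(growth)  # 14.0 "points of growth", interpretable only under comparability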

BUT… We Are Uneasy
Because we also want many conditions that are not strictly the same (standardized):
- different test forms for security
- different test forms for efficiency (e.g., CAT)
- different test forms for validity (sampling of the domain)
- different items, translations/formats, cognitive demands, and administration conditions for validity (accommodations, special populations)
- different tests for policy and other reasons (each state; ACT/SAT; NAEP; TIMSS/PISA; AP/IB; Common Core?)
- different tests across time
- different tests across populations
- different tests across time and populations

In Addition, We Want
- different content and skills as grades progress
- individual choice for production, application, and specialization
- individualized information for diagnosis and program evaluation for individuals, subgroups, and programs

Our Dilemma
We want to act as though test scores were strictly comparable, but we also want many conditions that prohibit making the tests and/or testing conditions the same, and in some cases we know the same items are invalid for different individuals. So:
- How can we conceptually understand the dimensions that inform our interpretations and uses?
- What technical tools and approaches are available to support us in making interpretations that involve "comparability of test scores"?

New Options, New Flexibility
- Multiple tests that serve roughly the same purpose but share no items, made comparable through special studies (e.g., a state high school test and college entrance exams)
- Multiple tests that are quite different in purpose and share no items (e.g., a state test and a commercial interim assessment, or another commercial assessment such as OECD's district-level PISA alongside the state test)
- Tests that may allow references from one testing program to another by sharing items (e.g., drawing on openly available item banks with sufficient information to link to scales and/or performance levels)

Why Might a State Want This Type of Flexibility?
Researchers have mapped state proficiency cuts to NAEP and will likely continue to do so, enabling state-to-NAEP and, indirectly, state-to-state comparisons of proficiency. A state might instead want item-level linking because it wants:
- comparisons to a scale other than NAEP
- comparisons at the scale-score level
- control over, and detailed knowledge of, the technical aspects
- control over the timing, interpretation, and publicity
Such a state needs trusted resources to do the linking to an external test, because it cannot develop them on its own.

Comparability Continua
Content comparability (the content basis of test variations), ordered from less to more comparable:
same content area → same content standards → same test specifications → same test items
Score comparability (the score level), ordered from less to more comparable:
pass/fail score or decision → achievement-level score → scale score → raw score

Comparability Continua – 2
Population comparability (population characteristics), ordered from less to more comparable:
adjusted in interpretation → adjusted characteristics → similar characteristics → same students
Reporting-level comparability (the level of the reporting unit), ordered from less to more comparable:
state → district → school → student
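One way to make "where are you on the continua" operational is to encode the four continua as ordered lists, as in this Python sketch (my own encoding for illustration; the presentation proposes the continua, not this code):

    # Each continuum is ordered from its less-comparable end to its
    # more-comparable end, matching the slides above.
    CONTINUA = {
        "content": ["same content area", "same content standards",
                    "same test specifications", "same test items"],
        "score": ["pass/fail decision", "achievement-level score",
                  "scale score", "raw score"],
        "population": ["adjusted in interpretation", "adjusted characteristics",
                       "similar characteristics", "same students"],
        "reporting": ["state", "district", "school", "student"],
    }

    def position(dimension: str, level: str) -> int:
        """Index 0 is the least-comparable end; 3 is the most-comparable end."""
        return CONTINUA[dimension].index(level)

    # Example: a claim of comparable scale scores reported at the district level.
    print(position("score", "scale score"), position("reporting", "district"))  # 2 1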

Context of Interpretation and Use
We can solve some of our problems by better specifying what we mean; we don't always need to create comparability at the "more" end of the continua for content or scores.
- Example: accountability is social valuing and may not need comparable test scores from the assessment (e.g., 1% alternate assessments, 2% modified assessments, ELP assessments, very low on-grade performance).
- Example: a claim about comparable achievement-level performance at the state level.

Item-Bank Linking: A Researchable Task
"Extreme linking" (Dorans) is commonly done, with appropriate safeguards and checks. Other challenges to traditional linking, notably CAT, have been researched, and acceptable solutions have led to wide use (e.g., parameter invariance over item order, test length, time of administration, etc.). Similarly, other item-bank solutions will need to specify under which conditions what types of comparability can be maintained, and to show that empirically; still, this is an exciting option.
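For concreteness, here is a minimal sketch of mean-sigma linking, one classical IRT method for putting two calibrations on a common scale through shared anchor items (my choice of method for illustration; the slide does not prescribe one, and all parameter values below are hypothetical).

    # Mean-sigma linking: use difficulty (b) estimates of anchor items that
    # appear in both item banks to find the linear transformation
    # theta_x = A * theta_y + B that puts scale Y onto scale X.
    import numpy as np

    def mean_sigma_link(b_anchor_x, b_anchor_y):
        A = np.std(b_anchor_x, ddof=1) / np.std(b_anchor_y, ddof=1)
        B = np.mean(b_anchor_x) - A * np.mean(b_anchor_y)
        return A, B

    b_x = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])  # anchors calibrated on scale X
    b_y = np.array([-1.0, -0.3, 0.2, 0.7, 1.3])  # the same items on scale Y
    A, B = mean_sigma_link(b_x, b_y)
    theta_on_x = A * 0.5 + B  # an examinee at theta = 0.5 on scale Y, re-expressed on X
    print(A, B, theta_on_x)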

Summary
- Use the flexibility available to achieve your policy goals and intended uses.
- Specify, specify, specify, so you know what is comparable and what is not, by intention or by constraint. (Where are you on the continua of content, score, population, and reporting-unit comparability?)
- Validity and strict comparability may not go together.
- Use the tools available to support appropriate comparability: focus on valid interpretations as well as technical demands, and empirically check your results.

Questions? Comments? Thank you!

Brian Gong, bgong@nciea.org