Innovation and Growth of Large Scale Assessments. Irwin Kirsch, Educational Testing Service. February 18, 2013.

Similar presentations
Performance Assessment
National Accessible Reading Assessment Projects Defining Reading Proficiency for Accessible Large Scale Assessments Principles and Issues Paper American.
National Accessible Reading Assessment Projects Defining Reading Proficiency for Accessible Large Scale Assessments Discussion of the Principles and Issues.
A. Review of PISA constructs and indices (and variables) to identify those that are likely to be operable and not operable in developing country contexts,
What is a CAT? Introduction: Computer Adaptive Test + performance task.
PISA FOR DEVELOPMENT Technical Workshops: Components and input for ToR of the International Contractor(s), 9th April 2014, OECD Secretariat.
Middle Years Programme
Introducing the Victorian Curriculum Reform: 2004 Consultation Paper. A Framework of ‘Essential Learning’, April 2004.
Brian A. Harris-Kojetin, Ph.D. Statistical and Science Policy
Study on the outcomes of teaching and learning about ‘race’ and racism Kish Bhatti-Sinclair (Division of Social Work Studies) Claire Bailey (Division of.
Fit to Learn Using the Employability Skills Framework to improve your performance at College The Employability Skills Framework has been developed by business.
Unit 3 Siobhan Carey Department for International Development Making cross-national comparisons using micro data.
Literacy Assessment and Monitoring Programme (LAMP) UNESCO Institute for Statistics.
Curriculum Project Garred Kirk. EARL 1: Civics The student understands and applies knowledge of government, law, politics, and the nation’s fundamental.
Learning Objectives, Performance Tasks and Rubrics: Demonstrating Understanding and Defining What Good Is Brenda Lyseng Minnesota State Colleges.
What is PIAAC? About PIAAC: PIAAC is an international large-scale assessment administered in 23 countries. It assessed 16- to 65-year-olds,
Highlights from PIRLS and TIMSS 2011 Jack Buckley National Center for Education Statistics Washington, DC December 11, 2012.
MULTILINGUAL & MULTICULTURAL EDUCATION DEPARTMENT
What do international assessments measure: PISA. Raymond J. Adams, Washington DC, May. This paper is intended to promote the exchange of ideas among.
INACOL National Standards for Quality Online Teaching, Version 2.
ICT and Education Indicators S
Challenges in International Large-Scale Assessments Higher School of Economics, Moscow, Russia, May 16, 2013.
Insights from PISA & TIMSS on our planning of services & policies on curriculum development in Science Education Sci Edu Section, CDI 6 Oct 2014.
Combined Grades Making Them Work Fall 2007 Building Classes of Combined Grades “In successful schools, classrooms are organized to meet the learning.
PIAAC: ORIGINS, INTERNATIONAL DIMENSION, CONCEPTS AND AIMS William Thorn, OECD
Becoming a Teacher Ninth Edition
6-8 Smarter Balanced Assessment Update, English Language Arts, February 2012.
Mark Keese Head of Employment Analysis and Policy Division Directorate for Employment, Labour and Social Affairs Better skills for more inclusive and sustainable.
High School Mathematics: Where Are We Headed? W. Gary Martin Auburn University.
Student Engagement Survey Results and Analysis June 2011.
NCES International Assessments and the International Data Explorer Education Writers Association April 7, 2011 Dana Kelly NCES.
U.S. PIAAC National Supplement: Prison Study Overview. Association of State Correctional Administrators Research and Best Practices Committee, Gaylord.
Multiple Indicator Cluster Surveys Survey Design Workshop Sampling: Overview MICS Survey Design Workshop.
Measuring Learning and Improving Education Quality: International Experiences in Assessment John Ainley South Asia Regional Conference on Quality Education.
Benchmarking with National and International Assessments Larry V. Hedges Northwestern University This paper is intended to promote the exchange of ideas.
A Model for Scaling, Linking, and Reporting.
Smarter Balanced Assessment Update English Language Arts February 2012.
Committee on the Assessment of K-12 Science Proficiency Board on Testing and Assessment and Board on Science Education National Academy of Sciences.
Race to the Top Assessment Program: General & Technical Assessment Discussion. Jeffrey Nellhaus, Deputy Commissioner, January 20, 2010.
Standards-Based Curricula Dr. Dennis S. Kubasko, Jr. NC Teach Summer 2009.
Teaching to the Standard in Science Education By: Jennifer Grzelak & Bonnie Middleton.
Israel Accession Seminar. PIAAC: Programme for the International Assessment of Adult Competencies. Skills strategy in OECD.
Needs Assessment. Farrokh Alemi, Ph.D., George Mason University, June 2003.
Educator’s view of the assessment tool. Contents Getting started Getting around – creating assessments – assigning assessments – marking assessments Interpreting.
Developing NAEP Frameworks: A Look Inside the Process Mary Crovo Deputy Executive Director National Assessment Governing Board November 17, 2007.
What Use Are International Assessments for States? 30 May 2008 Jack Buckley Deputy Commissioner National Center for Education Statistics Institute of Education.
Education and Assessment in France Bruno Trosseille DEPP - Assessment, Forecasting and Performance Directorate Ministry of Education, France International.
A Statistical Linkage Between NAEP and ECLS-K Grade Eight Reading Assessments Enis Dogan Burhan Ogut Young Yee Kim Sharyn Rosenberg NAEP Education Statistics.
Race to the Top Assessment Program: Public Hearing on Common Assessments, January 20, 2010, Washington, DC. Presenter: Lauress L. Wise, HumRRO.
Reliability. Performance on language tests is also affected by factors other than communicative language ability: (1) test method facets, which are systematic.
You Can’t Afford to be Late!
International Large-Scale Assessments – Best practice and what are they good for? Dirk Hastedt, IEA Moscow, October 2015.
International Benchmarking Study: Content Alignment. Mary J. Pitoniak, Nancy Glazer, Luis Saldivia, Educational Testing Service, June 22, 2015. National.
SSA – Technical Cooperation Fund End of Project Conference The Role of International Achievement Studies (OECD PISA, IEA TIMSS, PIRLS…) Importance of Large-scale.
Researchers: 10 altogether; 3 from Denmark, 2 from Finland, 2 from Norway, and 3 from Sweden. Scientific board: professor Antero Malin, Finland (project.
PIRLS: The Trinidad and Tobago Experience. Regional Policy Dialogue on Education, 2-3 December 2008. Harrilal Seecharan, Ministry of Education, Trinidad.
NAEP What is it? What can I do with it? Kate Beattie MN NAEP State Coordinator MN Dept of Education This session will describe what the National Assessment.
New Survey Questionnaire Indicators in PISA and NAEP
What is PIAAC?.
Inquiry and IBL pedagogies
The future of PISA: perspectives for innovation
Literacy Assessment and Monitoring Programme (LAMP)
Assessment Framework and Test Blueprint
PISA • PIRLS • TIMSS Program for International Student Assessment
Booklet Design and Equating
Roland Wilson, David Potter, & Dr. Dru Davison
Topic Principles and Theories in Curriculum Development
Programme for the International Assessment of Adult Competencies
Presentation transcript:

Innovation and Growth of Large Scale Assessments Irwin Kirsch Educational Testing Service February 18, 2013

Overview
- Setting a context
- Growth in large-scale assessments (LSA)
- Features of large-scale assessments (LSA)
- Growing importance of CBA
- Innovations in recent LSA (PIAAC and PISA)
- Future areas for innovation

Setting a Context
Until relatively recently, educational data were not collected in a consistent or standardized manner. In 1958, a group of scholars representing various disciplines met at UNESCO in Hamburg, Germany to discuss issues surrounding the evaluation of schools and students through the systematic collection of data relating to knowledge, skills and attitudes. Their meeting led to a feasibility study of 13-year-olds in 12 countries covering 5 content areas, and to the establishment of the legal entity known as IEA.

Setting a Context
Back in the United States, the Commissioner of Education, Francis Keppel, invited Ralph Tyler in 1963 to develop a plan for the periodic assessment of student learning. Planning meetings were held in 1963 and 1964, and a technical advisory committee was formed. In April 1969, NAEP first assessed in-school 17-year-olds in citizenship, science and writing.

Setting a Context
Tyler’s vision for NAEP was that it would focus on what groups of students know and can do rather than on what score an individual might receive on a test. The assessment would be based on identified objectives whose specifications would be determined by subject matter experts. Reports would be based on the performance of selected groups, not individuals, on the exercises, and would not rely on grade-level norms.

Setting a Context
Prior to IEA and NAEP there were no assessment programs designed to measure students or adults as a group. The primary focus of educational testing had been on measuring individual differences in achievement rather than on students’ learning, and the data that were collected dealt primarily with the inputs to education rather than its yield.

Setting a Context
Interpretations were limited to the set of items used in each assessment, and this basic approach to large-scale assessments remained in place through the 1970s. In the 1980s, programs beginning with NAEP began to use item response theory (IRT) to allow for the creation of scales and the broadening of inferences to include items not administered in a given assessment. New methodology involving marginal estimation was developed to optimize the reporting of proficiency distributions based on complex designs such as BIB spiraling. This approach remains in use today.
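
To make the IRT scaling and marginal-estimation ideas concrete, here is a minimal sketch in Python. It assumes a 2PL model, made-up item parameters, and a standard normal population prior; it illustrates the general logic behind plausible values, not the operational NAEP procedures.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response
    given proficiency theta, discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item parameters (a, b) for a 5-item block.
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b = np.array([-0.5, 0.0, 0.3, 1.0, -1.2])

# One respondent's scored responses (1 = correct, 0 = incorrect).
x = np.array([1, 1, 0, 1, 0])

# Posterior distribution of theta on a quadrature grid, combining the
# response likelihood with a N(0, 1) population prior -- the core idea
# behind marginal estimation of proficiency distributions.
grid = np.linspace(-4, 4, 81)
posterior = np.exp(-0.5 * grid**2)           # prior, up to a constant
for xi, ai, bi in zip(x, a, b):
    p = p_correct(grid, ai, bi)
    posterior *= p**xi * (1 - p)**(1 - xi)   # multiply in each item's likelihood
posterior /= posterior.sum()

# Group-level reporting draws "plausible values" from each respondent's
# posterior instead of assigning a single score to each individual.
rng = np.random.default_rng(0)
plausible_values = rng.choice(grid, size=5, p=posterior)
print(plausible_values)
```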

Growth and Expansion
… not being satisfied with assertions or self reports … in response to policy makers and researchers wanting to know more … asking more challenging questions … and creating both the need and opportunity for new methodological and technological developments.

Growth and Expansion
- Number of assessments
- Participation of countries
- Populations who are surveyed
- Domains/constructs that are measured
- Methodology
- Modes

Growth and Expansion: Large-Scale International Surveys
- School-based: PIRLS, TIMSS, PISA
- Adults: IALS, ALL, PIAAC, STEP

Growth and Expansion
[Diagram: curriculum, life skills, measurement.]

Features of Large-Scale Assessments
LSA are primarily concerned with the accuracy of estimating the distribution of proficiency for a group of respondents rather than for individuals. In this way, the focus is on providing information that can inform policy and further research. LSA differ from individual testing in key ways:

Features of Large-Scale Assessments
- Extensive framework development
- Sampling
- Weighting
- Use of complex assessment designs (e.g., BIB spiraling; see the sketch below)
- IRT modeling
- Population modeling
- Connection to background variables
- Increasing reliance on CBA
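
As an illustration of one such complex design, here is a small Python sketch of BIB spiraling, assuming 7 item blocks assembled into 7 booklets of 3 blocks each; the block labels and the particular cyclic design are illustrative, not taken from any specific assessment.

```python
from itertools import combinations

# Develop the base block {0, 1, 3} cyclically mod 7 so that every pair
# of item blocks appears together in exactly one booklet. This balance
# is what lets IRT link all items onto a common scale even though no
# student takes more than three blocks.
blocks = "ABCDEFG"
base = (0, 1, 3)
booklets = [tuple(blocks[(i + j) % 7] for j in base) for i in range(7)]

for i, bk in enumerate(booklets, 1):
    print(f"Booklet {i}: {' '.join(bk)}")

# Verify the balance property: each pair of blocks co-occurs exactly once.
pair_counts = {}
for bk in booklets:
    for pair in combinations(sorted(bk), 2):
        pair_counts[pair] = pair_counts.get(pair, 0) + 1
assert all(count == 1 for count in pair_counts.values())
```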

Growing Importance of Computer-Based Assessments
Until very recently, all large-scale national and international assessments were paper-based, with some optional computer-based components. PIAAC (2012) was the first large-scale survey of adult skills in which the primary mode of delivery was computer, with paper and pencil as the option. In 2015, PISA will also use computers as the primary mode of delivery, with paper and pencil becoming an option for countries.

Why is Computer-Based Assessment Important for Surveys such as PIAAC and PISA?
- Better reflects the ways in which students and adults access, use and communicate information
- Enables surveys like PIAAC and PISA to broaden the range of skills that can be measured
- Allows these surveys to take better advantage of both the operational and measurement efficiencies that technology can provide

Goals of the PIAAC 2012 and PISA 2015 Assessment Designs
- Establish the comparability of inferences across countries, across assessments and across modes
- Broaden what can be measured, both by extending existing constructs and by introducing new constructs
- Reduce random and systematic error through the use of more complex designs, automated scoring, timing information, and adaptive testing

PIAAC Main Study Cognitive Assessment Design
- Routing: ICT use reported in the BQ determines the starting point; respondents with no computer experience go to the paper path, all others to the CBA Core (Stage 1: ICT; Stage 2: 3 literacy + 3 numeracy tasks). Those who fail the CBA Core are also routed to the paper path.
- Paper path: Core (4 literacy + 4 numeracy tasks), then Literacy (20 tasks) or Numeracy (20 tasks), plus Reading Components.
- CBA path (random assignment): Literacy Stage 1 (9 tasks) + Stage 2 (11 tasks); Numeracy Stage 1 (9 tasks) + Stage 2 (11 tasks); Problem Solving in Technology-Rich Environments (PS-TRE).
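
The routing logic above can be sketched in a few lines of Python. The pass/fail rules and the single-module assignment below are simplifying assumptions for illustration; they are not PIAAC's operational scoring or assignment rules.

```python
import random

def route(computer_experience: bool, passed_core_stage1: bool,
          passed_core_stage2: bool) -> list:
    """Return an illustrative sequence of instruments for one respondent."""
    paper_path = ["Paper Core (4L + 4N)",
                  "Paper Literacy (20 tasks) or Paper Numeracy (20 tasks)",
                  "Reading Components"]
    if not computer_experience:                  # ICT use reported in the BQ
        return paper_path
    if not (passed_core_stage1 and passed_core_stage2):
        return paper_path                        # failed the CBA Core
    # Passed the Core: random assignment among the CBA modules, each
    # delivered in two stages so Stage 2 can adapt to Stage 1 performance.
    module = random.choice([
        "Literacy Stage 1 (9 tasks) then Stage 2 (11 tasks)",
        "Numeracy Stage 1 (9 tasks) then Stage 2 (11 tasks)",
        "Problem Solving in TRE",
    ])
    return ["CBA Core Stage 1 (ICT)", "CBA Core Stage 2 (3L + 3N)", module]

print(route(True, True, True))
```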

Average Proficiency Scores by Domain and Subgroup
[Table: average literacy, numeracy and PSTRE scores for the No ICT, Failed CBA Core, Refused CBA, and CBA subgroups.]

[Figure: Cumulative Distribution of Numeracy Proficiency by Subgroups.]

Percentage of Item-by-Country Interactions*
- Literacy: 8% (146 out of 1,748 pairs; 76 items × 23 countries)
- Numeracy: 7% (118 out of 1,748 pairs; 76 items × 23 countries)
- PSTRE: 3% (8 out of 280 pairs; 14 items × 20 countries)
*Literacy and numeracy interactions go across modes and time.
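
A toy Python sketch of how such item-by-country interactions can be flagged follows. The simulated data and the 0.15 flagging threshold are made up for illustration; operational procedures compare estimated IRT item parameters across countries rather than raw percent-correct values.

```python
import numpy as np

# Simulate international and country-level percent-correct values for
# 76 items in 23 countries, then flag item-by-country pairs whose
# deviation from the international value is large.
rng = np.random.default_rng(1)
n_items, n_countries = 76, 23
p_intl = rng.uniform(0.3, 0.8, size=n_items)            # international values
noise = rng.normal(0.0, 0.05, size=(n_items, n_countries))
p_country = np.clip(p_intl[:, None] + noise, 0.0, 1.0)  # country values

flags = np.abs(p_country - p_intl[:, None]) > 0.15      # assumed threshold
print(f"{flags.sum()} of {flags.size} item-by-country pairs flagged "
      f"({100 * flags.mean():.1f}%)")
```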

Number of Unique Parameters for Each Country - Numeracy

Maintaining and Improving Measurement of Trends
- The proposal for PISA 2015 is to enhance and stabilise the measurement of trend data
- Refocus the balance between random and systematic errors

Maintaining and Improving Measurement of Trends
Construct coverage in the current PISA design by major and minor domains: the height of the bars represents the proportion of items measured in each assessment cycle by domain; the reduced height for minor domains represents the reduction of items, and therefore the degree to which construct coverage has been reduced. Width conveys the relative number of students who respond to each item within the domain.
Recommended approach for measuring trends in PISA 2015 and beyond: stabilize trend by reducing bias, including all items in each minor domain while reducing the number of students responding to each item.
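
Some back-of-the-envelope arithmetic, with made-up numbers, shows the trade-off. Suppose a minor domain has a pool of 60 trend items, 1,500 students take the domain, and each answers 15 of its items:

```python
students, items_per_student = 1500, 15

def responses_per_item(items_in_pool):
    """Expected responses per item when items are spiraled evenly."""
    return students * items_per_student / items_in_pool

print(responses_per_item(20))  # reduced pool of 20 items: 1125 responses/item
print(responses_per_item(60))  # full pool of 60 items:     375 responses/item
```

Keeping all 60 items triples construct coverage, reducing systematic error (bias), at the cost of fewer responses per item, which raises random error per item; pooled IRT estimation across countries tolerates the latter far better than the former.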

Maintaining and Improving Measurement of Trends: Impact Over Cycles
Domain rotation for scientific literacy: major domain in 2006 (new items), minor in 2009 and 2012 (trend items), major again in 2015 (new items reflecting both the old and a new construct), minor in 2018 and 2021, establishing a new trend line from a construct point of view.

Future Innovations
- Introduction of new item types
- Use of fully automated scoring
- More flexible use of languages
- Development of research around process information contained in log files (see the sketch below)
- Introduction of more complex psychometric models
- Development of derivative products
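
As one illustration of the log-file direction, here is a small Python sketch that derives a basic process indicator, time on task per item, from enter/exit events; the event format is a hypothetical assumption.

```python
from collections import defaultdict

# Hypothetical log format: (timestamp_ms, item_id, event_type).
events = [
    (0,      "item1", "enter"), (42_000, "item1", "exit"),
    (42_000, "item2", "enter"), (43_500, "item2", "exit"),
    (43_500, "item3", "enter"), (95_000, "item3", "exit"),
]

# Accumulate time on task per item from paired enter/exit events.
time_on_task = defaultdict(int)
enter_time = {}
for ts, item, kind in events:
    if kind == "enter":
        enter_time[item] = ts
    elif kind == "exit":
        time_on_task[item] += ts - enter_time.pop(item)

for item, ms in time_on_task.items():
    print(f"{item}: {ms / 1000:.1f} s")
```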

Summary
- Large-scale international assessments continue to grow in importance
- Computer-based assessments are now feasible and will become the standard for development and delivery …
  - better reflect the ways in which people now access, use and communicate information
  - add efficiency and quality to the data
  - introduce innovation that broadens what can be measured and reported

Questions and Discussion

The design for PIAAC was able to …
- Broaden what was measured
- Demonstrate high comparability among countries, over time and across modes
- Introduce multi-stage adaptive testing
- Include the use of timing information to better distinguish between omitted and not-reached items (see the sketch below)
- Demonstrate an improvement in the quality of the data that was collected
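
A minimal sketch of how timing information can separate the two kinds of missing responses follows; the 5-second engagement threshold and the data layout are illustrative assumptions, not PIAAC's operational rules.

```python
from typing import List, Optional

def classify_missing(times_ms: List[Optional[int]]) -> List[str]:
    """Label each unanswered item as 'omitted' (seen and skipped)
    or 'not reached' (never meaningfully viewed)."""
    labels = []
    for t in times_ms:
        if t is None or t < 5000:       # no time on task, or too brief to engage
            labels.append("not reached")
        else:                           # engaged with the item, then skipped it
            labels.append("omitted")
    return labels

# Times on task (ms) for one respondent's unanswered items.
print(classify_missing([12000, 300, None, 8000]))
# -> ['omitted', 'not reached', 'not reached', 'omitted']
```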
