VERTICAL SCALING
H. Jane Rogers, Neag School of Education, University of Connecticut
Presentation to the TNE Assessment Committee, October 30, 2006

Scaling Definition: Scaling is a process in which raw scores on a test are transformed to a new scale with desired attributes (e.g., mean, SD)
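For illustration, a minimal sketch (in Python) of such a linear scaling transformation; the raw scores and the target mean of 500 and SD of 100 are invented, not drawn from any operational testing program:

import statistics

def to_scale_scores(raw_scores, target_mean=500.0, target_sd=100.0):
    # Linearly transform raw scores so they have the desired mean and SD
    raw_mean = statistics.mean(raw_scores)
    raw_sd = statistics.pstdev(raw_scores)
    return [target_mean + target_sd * (score - raw_mean) / raw_sd for score in raw_scores]

# Illustrative raw scores on a hypothetical 40-item test
raw = [12, 18, 22, 25, 25, 28, 31, 36]
print(to_scale_scores(raw))   # transformed scores now have mean 500 and SD 100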

Scaling Purposes:
1. Reporting scores on a convenient metric
2. Providing a common scale on which scores from different forms of a test can be reported (after equating or linking)

Scaling There are two distinct testing situations where scaling is needed

Scaling SITUATION 1
Examinees take different forms of a test for security reasons or at different times of year
Forms are designed to the same specifications but may differ slightly in difficulty due to chance factors
Examinee groups taking the different forms are not expected to differ greatly in proficiency

Scaling SITUATION 2
Test forms are intentionally designed to differ in difficulty
Examinee groups are expected to be of differing proficiency
EXAMPLE: test forms designed for different grade levels

EQUATING
For SITUATION 1, we often refer to the scaling process as EQUATING
Equating is the process of mapping the scores on Test Y onto the scale of Test X so that we can say what the score of an examinee who took Test Y would have been had the examinee taken Test X (the scores are exchangeable)
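As one hedged illustration (invented data, not the method of any particular program), mean-sigma linear equating expresses a Form Y score as the Form X score with the same standardized distance from its group's mean:

import statistics

def linear_equate(y_score, x_scores, y_scores):
    # Mean-sigma linear equating: express a Form Y score on the Form X scale
    mean_x, sd_x = statistics.mean(x_scores), statistics.pstdev(x_scores)
    mean_y, sd_y = statistics.mean(y_scores), statistics.pstdev(y_scores)
    return mean_x + (sd_x / sd_y) * (y_score - mean_y)

# Invented scores from two randomly equivalent groups
form_x_scores = [21, 24, 27, 30, 33]   # group that took Form X
form_y_scores = [19, 22, 24, 27, 28]   # group that took Form Y (slightly harder form)
print(linear_equate(24, form_x_scores, form_y_scores))   # Form Y score of 24 on the Form X scale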

EQUATING This procedure is often called HORIZONTAL EQUATING

LINKING
For SITUATION 2, we refer to the scaling process as LINKING, or scaling to achieve comparability
This process is sometimes called VERTICAL EQUATING, although equating is not strictly possible in this case

REQUIREMENTS FOR SCALING
In order to place the scores on two tests on a common scale, the tests must measure the same attribute (e.g., the scores on a reading test cannot be converted to the scale of a mathematics test)

EQUATING DESIGNS FOR VERTICAL SCALING
1. COMMON PERSON DESIGN: Tests to be equated are given to different groups of examinees, with a common group taking both tests
2. COMMON ITEM (ANCHOR TEST) DESIGN: Tests to be equated are given to different groups of examinees, with all examinees taking a common subset of items (anchor items)

EQUATING DESIGNS FOR VERTICAL SCALING
3. EXTERNAL ANCHOR OR SCALING TEST DESIGN: Different groups of examinees take different tests, but all take a common test in addition

Example of Vertical Scaling Design (Common Persons)
[Table: students in grades 2 through 5 are tested in November on overlapping test levels. Grade 2 takes Level 2 (mean 26.6); grade 3 takes Levels 2 and 3 (means 34.7 and 26.1); grade 4 takes Levels 3 and 4 (means 35.3 and 25.9); grade 5 takes the highest level shown (mean 26.0, SD 5.0). The grades that take two adjacent levels (e.g., grades 3 and 4) serve as the common groups that link the level scales.]
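A sketch of one simple way the common group supplies the link, assuming mean-sigma linking and invented grade 3 scores (operational vertical scales are often built with more elaborate, e.g., IRT-based, methods):

import statistics

def common_person_link(upper_level, lower_level):
    # Mean-sigma line placing upper-level scores on the lower-level scale,
    # estimated from one group of students who took both levels
    slope = statistics.pstdev(lower_level) / statistics.pstdev(upper_level)
    intercept = statistics.mean(lower_level) - slope * statistics.mean(upper_level)
    return slope, intercept

# Invented grade 3 results: each student has a Level 3 and a Level 2 score
level3_scores = [22, 25, 26, 28, 30]
level2_scores = [31, 34, 35, 37, 38]
slope, intercept = common_person_link(level3_scores, level2_scores)
print(slope * 27 + intercept)   # a Level 3 score of 27 expressed on the Level 2 scale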

Example of Vertical Scaling Design (Common Items)
[Table: assignment of item blocks to the Year 1, Year 2, and Year 3 test forms; blocks that appear in more than one year serve as the common (anchor) items linking the forms.]
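One hedged sketch of how shared blocks can link forms (chained linear linking through the common block, with invented data; not necessarily the method used for any tests discussed here):

import statistics

def linear_link(from_scores, to_scores):
    # Mean-sigma line mapping the 'from' score scale onto the 'to' scale
    slope = statistics.pstdev(to_scores) / statistics.pstdev(from_scores)
    intercept = statistics.mean(to_scores) - slope * statistics.mean(from_scores)
    return slope, intercept

# Invented data: each group has a form total and a score on the shared item block
year2_total  = [48, 52, 55, 58, 61]   # Year 2 students, Year 2 form totals
year2_anchor = [14, 15, 17, 18, 19]   # the same Year 2 students, common block
year1_anchor = [13, 14, 16, 17, 18]   # Year 1 students, common block
year1_total  = [45, 49, 52, 55, 57]   # the same Year 1 students, Year 1 form totals

a1, b1 = linear_link(year2_total, year2_anchor)   # Year 2 total -> common block
a2, b2 = linear_link(year1_anchor, year1_total)   # common block -> Year 1 total
year2_score = 56
print(a2 * (a1 * year2_score + b1) + b2)          # Year 2 total of 56 on the Year 1 scale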

Problems with Vertical Scaling
If the construct or dimension being measured changes across grades/years/forms, scores on different forms mean different things and we cannot reasonably place scores on a common scale
May be appropriate for a construct like reading; less appropriate for mathematics, science, social studies, etc.

Problems with Vertical Scaling
Both common person and common item designs have practical problems: items may be too easy for one group and too hard for the other
Must ensure that examinees have had exposure to the content of the common items or off-level test (cannot scale up, only down, in the common persons design)

Problems with Vertical Scaling
Scaled scores are not interpretable in terms of what a student knows or can do
Comparison of scores on scales that extend across several years is particularly risky