Developing the Tests for NCLB: No Item Left Behind Steve Dunbar Iowa Testing Programs University of Iowa.

Similar presentations
Michigan Assessment Consortium Common Assessment Development Series Putting Together The Test Blueprint.

Iowa Assessment Update School Administrators of Iowa November 2013 Catherine Welch Iowa Testing Programs.
Beyond Peer Review: Developing and Validating 21st-Century Assessment Systems Is it time for an audit? Thanos Patelis Center for Assessment Presentation.
1 COMM 301: Empirical Research in Communication Kwan M Lee Lect4_1.
Assessment Procedures for Counselors and Helping Professionals, 7e © 2010 Pearson Education, Inc. All rights reserved. Chapter 5 Reliability.
Collecting data Chapter 5
STATE STANDARDIZED ASSESSMENTS. 1969 The National Assessment for Educational Progress (NAEP) administered for the first time, Florida participated in the.
In Today’s Society Education = Testing Scores = Accountability Obviously, Students are held accountable, But also!  Teachers  School districts  States.
Developing an Assessment
Writing High Quality Assessment Items Using a Variety of Formats Scott Strother & Duane Benson 11/14/14.
Assessment: Reliability, Validity, and Absence of bias
ASSESSMENT IN EDUCATION. Copyright Keith Morrison, 2004 TESTS Purposes of the test Type of test Objectives of the test Content.
1 Some Key Points for Test Evaluators and Developers Scott Marion Center for Assessment Eighth Annual MARCES Conference University of Maryland October.
Consistency/Reliability
C R E S S T / U C L A Improving the Validity of Measures by Focusing on Learning Eva L. Baker CRESST National Conference: Research Goes to School Los Angeles,
Using statistics in small-scale language education research Jean Turner © Taylor & Francis 2014.
Validity Lecture Overview Overview of the concept Different types of validity Threats to validity and strategies for handling them Examples of validity.
Now that you know what assessment is, you know that it begins with a test. Ch 4.
Technical Issues Two concerns Validity Reliability
Assessment Literacy Series
Revision Sampling error
Curriculum Alignment Refers the “match” between the content, format, and level of cognition of the curriculum or textbook (English, p ).
Keele Assessment of Participation (KAP): A new instrument for measuring participation restriction in population surveys Ross Wilkie, George Peat, Elaine.
NCCSAD Advisory Board1 Research Objective Two Alignment Methodologies Diane M. Browder, PhD Claudia Flowers, PhD University of North Carolina at Charlotte.
Kaizen–What Can I Do To Improve My Program? F. Jay Breyer, Ph.D. Presented at the 2005 CLEAR Annual Conference September Phoenix,
Review: Cognitive Assessments II Ambiguity (extrinsic/intrinsic) Item difficulty/discrimination relationship Questionnaires assess opinions/attitudes Open-/Close-ended.
CCSSO Criteria for High-Quality Assessments Technical Issues and Practical Application of Assessment Quality Criteria.
Developing Structured Activity Tools. Aligning assessment methods and tools Often used where real work evidence not available / observable Method: Structured.
Assessment in Education Patricia O’Sullivan Office of Educational Development UAMS.
Illustration of a Validity Argument for Two Alternate Assessment Approaches Presentation at the OSEP Project Directors’ Conference Steve Ferrara American.
Assessing Learning for Students with Disabilities Tom Haladyna Arizona State University.
Selecting a Sample. Sampling Select participants for study Must represent a larger group Picked.
Validity and Reliability Neither Valid nor Reliable Reliable but not Valid Valid & Reliable Fairly Valid but not very Reliable Think in terms of ‘the purpose.
FCAT Science High School Test – Grade 11 How did this happen? Why didn’t I know?
RELIABILITY AND VALIDITY OF ASSESSMENT
Evaluating Survey Items and Scales Bonnie L. Halpern-Felsher, Ph.D. Professor University of California, San Francisco.
Reading and Evaluating Research Method. Essential question to ask about the Method: “Is the operationalization of the hypothesis valid? Sections: Section.
Assessment and Testing
Presented By Dr / Said Said Elshama  Distinguish between validity and reliability.  Describe different evidences of validity.  Describe methods of.
Assessment Developing an Assessment. Assessment Planning Process Analyze the environment Agency, clients, TR program, staff & resources Define parameters.
ASSESSMENT CONCERNS KNR 279. Stumbo, 2001 In the 60-plus years since the origins of the profession, it seems little progress in therapeutic recreation.
JS Mrunalini Lecturer RAKMHSU Data Collection Considerations: Validity, Reliability, Generalizability, and Ethics.
Experimental Research Methods in Language Learning Chapter 5 Validity in Experimental Research.
Alternative Assessment Chapter 8 David Goh. Factors Increasing Awareness and Development of Alternative Assessment Educational reform movement Goals 2000,
Review: Alternative Assessments Alternative/Authentic assessment Real-life setting Performance based Techniques: Observation Individual or Group Projects.
AIM: K–8 Science Iris Weiss Eric Banilower Horizon Research, Inc.
Imagine…  A hundred students is taking a 100 item test at 3 o'clock on a Tuesday afternoon.  The test is neither difficult nor easy. So, not ALL get.
No Child Left Behind Impact on Gwinnett County Public Schools’ Students and Schools.
Stages of Test Development By Lily Novita
Michigan Assessment Consortium Common Assessment Development Series Module 16 – Validity.
Jan/Feb 2007 CAPA Examiner Train-the-Trainer CAPA Training of CAPA Examiners.
California Assessment of Student Performance and Progress CAASPP Insert Your School Logo.
INTRODUCTION TO ASSESSMENT METHODS USED IN MEDICAL EDUCATION AND THEIR RATIONALE.
Alternative Assessment Larry D. Hensley University of Northern Iowa Chapter 8.
Next Generation Iowa Assessments.  Overview of the Iowa Assessments ◦ Purpose of Forms E/F and Alignment Considerations ◦ Next Generation Iowa Assessments.
Development of Assessments Laura Mason Consultant.
Reduced STAAR test blueprints
EVALUATING EPP-CREATED ASSESSMENTS
Assessment and Evaluation
Melanie Taylor Horizon Research, Inc.
Validity and Reliability
Validity and Reliability
Clinical Assessment Dr. H
Reliability.
The extent to which an experiment, test or any measuring procedure shows the same result on repeated trials.
Study Questions To what extent do English language learners have opportunity to learn the subject content specified in state academic standards and.
Critically Evaluating an Assessment Task
How can one measure intelligence?
Reliability and Validity
Presentation transcript:

Developing the Tests for NCLB: No Item Left Behind Steve Dunbar Iowa Testing Programs University of Iowa

Test Development: A Technical Concern
Procedures are well-established – it's sort of a 'rocket-art'
Aspects of 'quality' that seem distinct to an observer are inseparable to a developer
Quality control requires resources – talent, time, and money – to do well
TD is the grunt work of assessment

Best Practice in Test Development
Interpret content standards; translate into test specifications (a blueprint sketch follows this slide)
Search for stimulus material; draft items
Do the 3Rs: REVIEW-REVISE-REPLACE
Prepare material for field testing
Oops – we forgot about finding the kids to participate in field testing, many comparable samples of them
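The deck does not show what a test specification actually looks like. As one hedged illustration, a blueprint can be as simple as a mapping from content areas to item counts and intended cognitive levels; the content areas, counts, and form length below are invented for the example.

```python
# Minimal, invented example of a test blueprint for a 40-item form:
# content areas mapped to item counts and intended cognitive level.
blueprint = {
    "Number & Operations":    {"items": 12, "cognitive_level": "recall/application"},
    "Algebraic Thinking":     {"items": 10, "cognitive_level": "application"},
    "Geometry & Measurement": {"items": 10, "cognitive_level": "application/analysis"},
    "Data & Probability":     {"items": 8,  "cognitive_level": "analysis"},
}

# Simple check that the specification adds up to the intended form length.
assert sum(area["items"] for area in blueprint.values()) == 40
```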

More Best Practice in TD
Administer, retrieve, and score tryout materials; get item analysis results to TDers (sketch below)
Do the 3Rs: REVIEW-REVISE-REPLACE
Prepare more material for field testing
Oops – more kids for field testing, more comparable samples
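The slide does not say which item analysis results go back to the developers; the usual first pass is classical item difficulty (proportion correct) and a corrected item-total (point-biserial) discrimination. A minimal sketch, assuming a 0/1-scored response matrix from the tryout:

```python
import numpy as np

def item_analysis(scores):
    """Classical item analysis for a 0/1 matrix (rows = examinees, cols = items).
    Returns per-item difficulty (p-value) and corrected point-biserial discrimination."""
    scores = np.asarray(scores, dtype=float)
    p_values = scores.mean(axis=0)                   # proportion correct per item
    total = scores.sum(axis=1)
    discrimination = []
    for j in range(scores.shape[1]):
        rest = total - scores[:, j]                  # total score without the item itself
        discrimination.append(np.corrcoef(scores[:, j], rest)[0, 1])
    return p_values, np.array(discrimination)

# Tryout data for 6 examinees on 4 items (invented for illustration).
p, rpb = item_analysis([[1, 1, 0, 1],
                        [1, 0, 0, 1],
                        [0, 1, 0, 0],
                        [1, 1, 1, 1],
                        [0, 0, 0, 1],
                        [1, 1, 0, 1]])
```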

What do we get from Best Practice?
Something elusive (important content, interesting materials, good questions, cognitive complexity, comparability)
Something intangible (fairness, alignment with standards, intended consequences)
Something concrete (coverage, rater reliability, a validity or generalizability coefficient, acceptable cost)
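The 'concrete' coefficients on this slide are not defined in the deck. As one familiar example, coefficient alpha estimates internal-consistency reliability from the same examinee-by-item matrix used for item analysis; this is a sketch under that assumption, not a method the deck prescribes.

```python
import numpy as np

def coefficient_alpha(scores):
    """Cronbach's alpha for an examinee-by-item score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                               # number of items
    item_vars = scores.var(axis=0, ddof=1).sum()      # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)        # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)
```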

Some TD Half Truths
Multiple Choice Items
• Development is hard
• Scoring is easy (and public)
• Quality Control built in to TD process
Open-ended Items
• Development is easy
• Scoring is hard (and private)
• Quality Control elusive due to scoring
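The contrast between 'easy, public' and 'hard, private' scoring can be made concrete: multiple-choice responses are checked against a key that can be published, while open-ended responses need trained raters whose agreement must itself be monitored. A minimal sketch under those assumptions (keys, scores, and rubric range invented for illustration):

```python
import numpy as np

# Multiple choice: scoring is a key lookup, and the key can be published.
key = ["B", "D", "A", "C"]
responses = ["B", "D", "C", "C"]
mc_score = sum(r == k for r, k in zip(responses, key))      # 3 of 4 correct

# Open ended: two raters apply a 0-4 rubric; their agreement has to be monitored.
rater1 = np.array([3, 4, 2, 4, 1])
rater2 = np.array([3, 3, 2, 4, 2])
exact_agreement = np.mean(rater1 == rater2)                 # 0.6 exact agreement
adjacent_agreement = np.mean(np.abs(rater1 - rater2) <= 1)  # 1.0 within one point
```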

Comparability in Test Materials
Test form as the unit for judging comparability
Easy to achieve with many items on the test and many potential throwaways in the pool
Experienced test development staff
Good field testing and scoring needed
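The deck does not say how form-to-form comparability of scores is established. One standard technique, assumed here purely for illustration, is linear equating, which places a new form's scores on the reference form's scale by matching means and standard deviations.

```python
import numpy as np

def linear_equate(x, form_x_scores, form_y_scores):
    """Place a score x from Form X on the Form Y scale by matching means and SDs
    (single-group linear equating; a simplification for illustration)."""
    mu_x, sd_x = np.mean(form_x_scores), np.std(form_x_scores, ddof=1)
    mu_y, sd_y = np.mean(form_y_scores), np.std(form_y_scores, ddof=1)
    return mu_y + (sd_y / sd_x) * (x - mu_x)
```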

Group Differences and Fairness
TD seeks a balance
Tension is that balance requires questions, lots of them
Instructional influences confounded with group effects
DIF requires good matching questions
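The slides do not name a DIF procedure. The Mantel-Haenszel statistic, computed after matching examinees on total score, is one common screen and is sketched here as an assumed example; it depends on exactly the kind of good matching questions the slide points to.

```python
import numpy as np

def mantel_haenszel_dif(item, total, group):
    """Mantel-Haenszel DIF screen for one 0/1-scored item.
    item : 0/1 item scores; total : matching score (e.g., total test score);
    group : array of 'ref'/'focal' labels.
    Returns the MH common odds ratio and the ETS MH D-DIF (delta) value."""
    item, total, group = map(np.asarray, (item, total, group))
    num = den = 0.0
    for s in np.unique(total):                    # stratify on the matching score
        m = total == s
        ref, foc = m & (group == "ref"), m & (group == "focal")
        A = np.sum(item[ref] == 1)                # reference group correct
        B = np.sum(item[ref] == 0)                # reference group incorrect
        C = np.sum(item[foc] == 1)                # focal group correct
        D = np.sum(item[foc] == 0)                # focal group incorrect
        T = A + B + C + D
        if T == 0 or (A + B) == 0 or (C + D) == 0:
            continue                              # skip strata with only one group
        num += A * D / T
        den += B * C / T
    alpha_mh = num / den                          # common odds ratio across strata
    return alpha_mh, -2.35 * np.log(alpha_mh)     # delta < 0 flags DIF against the focal group
```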

Cost Factors in Large-Scale Testing
Development Costs
• Recur with each test form
• Are fixed by instrument design
Scoring Costs
• Recur with each test administration
• May change because of 'unexpected' circumstances
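The fixed-versus-recurring distinction can be put in simple arithmetic: development cost scales with the number of forms the instrument design requires, while hand-scoring cost recurs with every administration. All dollar figures and counts below are invented for illustration.

```python
# Invented figures: development is fixed per form, open-ended scoring recurs per student.
forms = 3                      # forms required by the instrument design
dev_cost_per_form = 250_000    # item writing, review, field testing
administrations = 2            # test administrations per year
students = 40_000
oe_scoring_per_student = 4.50  # hand scoring of open-ended items

development_cost = forms * dev_cost_per_form                        # 750,000 (fixed)
scoring_cost = administrations * students * oe_scoring_per_student  # 360,000 per year (recurring)
```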

Validity in Test Development
Best practice ensures content quality, balance, and alignment with standards – critical aspects of validity & reliability
TD is predicated on anticipated use
Other aspects of validity & reliability aren't understood until it's too late, i.e. when the test is operational

Validity & Capacity in NCLB
NCLB is census testing
Census testing places heavy demands on TD and other aspects of an accountability system
Limit on capacity in TD means
• only 1R, or 2Rs
• fewer rounds of field testing
• dwindling pools of test materials
No item left behind