The evidence is in the specifications

Presentation transcript:

The evidence is in the specifications
Test validity: This presentation discusses how building test specifications plays a crucial role in strengthening the validity argument for a test's use and score interpretations.

What validity evidence do test specs provide?
"[Test specifications] help to assure that the test developer's intentions, as specified in the [validity argument], are actually implemented in the creation of assessments. In the process of assessment justification, the blueprint provides documentary evidence as backing for various warrants in the [validity argument]." (Bachman and Palmer, 2010)

Specifications reveal a test's:
- Purpose
- Content
- Format
- Length
- Delivery Mode
- Task Templates
- Administration Procedures
- Psychometric Characteristics
- Scoring Method and Reporting
All of which strengthen the test's score interpretations.
ETS (Educational Testing Service) Standards for Quality and Fairness (2014); AERA, APA, and NCME Standards for Educational and Psychological Testing (2014)

Test specifications are where many of the pieces of evidence contributing to validity are documented.
[Slide image: STANAG 6001 Speaking Assessment]

Test developers focus on clarifying the links from linguistic features in the domain to the results that reflect what test takers can do (my own definition).
[Diagram: Domain features → Test result]

What we’re looking for:

Example “test purpose” The purpose of the TOEFL iBT test is to evaluate the English proficiency of people whose native language is not English. The TOEFL iBT scores are primarily used as a measure of the ability of international students to use English in an academic environment. http://www.ets.org/s/toefl/pdf/toefl_ibt_research_insight.pdf

Example “test content” Reading The Reading section measures test takers’ ability to understand university-level academic texts. TOEFL test takers read 3–5 passages of approximately 700 words each and answer 12–14 questions about each passage. The passages contain all of the information needed to answer the questions; they require no special background knowledge. The questions are mainly intended to assess the test taker’s ability to comprehend factual information, infer information from the passage, understand vocabulary in context, and understand the author’s purpose. http://www.ets.org/s/toefl/pdf/toefl_ibt_research_insight.pdf

Example "task format": These are multiple-choice questions with a single correct answer. Other question types assess the test taker's ability to recognize relationships among facts and ideas in different parts of a passage; such questions have more than four choices and more than one answer, allowing for partial-credit scoring. http://www.ets.org/s/toefl/pdf/toefl_ibt_research_insight.pdf
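To make the partial-credit idea concrete, here is a minimal Python sketch of how a proportional partial-credit rule for such multiple-selection questions could work. The scoring rule, function name, and option labels are illustrative assumptions, not ETS's published algorithm.

```python
# A minimal sketch of partial-credit scoring for a multiple-selection item.
# The proportional rule below is an assumption for illustration only;
# it is not ETS's actual scoring method.

def partial_credit(selected: set[str], keys: set[str]) -> float:
    """Score = (correct selections - incorrect selections) / number of keys,
    floored at zero so selecting everything cannot earn credit."""
    hits = len(selected & keys)
    false_alarms = len(selected - keys)
    raw = (hits - false_alarms) / len(keys)
    return max(0.0, raw)

# Example: a six-option item with three keys; the test taker finds two of them.
print(round(partial_credit({"A", "C"}, {"A", "C", "E"}), 2))  # 0.67
```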

Example “test structure” http://www.ets.org/s/toefl/pdf/toefl_ibt_research_insight.pdf

Test specifications provide a solid foundation... Without detailed test specifications, score interpretations are in jeopardy.

Operationalizing the domain features: creating tasks that match the claims

Test specification components:
- Purpose
- Content
- Format
- Length
- Delivery Mode
- Task Templates
- Administration Procedures
- Psychometric Characteristics
- Scoring Method and Reporting
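Where a program maintains its specs electronically, the components above can be captured as a structured document. The sketch below is a hypothetical Python encoding; every field value is an invented placeholder, not any actual test's specification.

```python
# A hypothetical structure for the specification components listed above.
# All values are invented placeholders for illustration.

from dataclasses import dataclass, field

@dataclass
class TestSpecification:
    purpose: str
    content: list[str]           # skills and domain content the test samples
    format: str                  # e.g., multiple choice, constructed response
    length: int                  # number of items or tasks
    delivery_mode: str           # paper, computer, oral interview, etc.
    task_templates: list[str]    # reusable task "shells" for item writers
    administration: str          # standardized administration procedures
    psychometrics: dict = field(default_factory=dict)  # statistical targets
    scoring_and_reporting: str = ""

spec = TestSpecification(
    purpose="Measure reading proficiency for placement decisions",
    content=["factual information", "inference", "vocabulary in context"],
    format="4-option multiple choice",
    length=40,
    delivery_mode="computer",
    task_templates=["passage + detail question", "passage + inference question"],
    administration="60 minutes, proctored, no reference materials",
    psychometrics={"target_reliability": 0.85, "difficulty_range": (0.3, 0.9)},
    scoring_and_reporting="1 point per item; scaled score reported to stakeholders",
)
```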

No test is perfect, but detailed test specs can minimize contaminants.

Audio: "Would you like to play baseball?"
Stem: Yes, but I don't have a _____.
A. microscope
B. mitt
C. violin
D. laptop

Audio: "I can't believe thousands died in the surprise attack."
Stem: I know. It was a _____.
A. tribute
B. massacre
C. concession
D. suspension

Decontaminating the evidence:
- Conduct extensive reviews of tasks
- Develop a task writing checklist (see the sketch after this list)
- Ensure adequate content representation (despite challenges)
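Parts of a task writing checklist can be automated. The sketch below is a hypothetical Python encoding of a few such checks (stem present, four options, exactly one key, no duplicate options); the rules and field names are assumptions for illustration, not a published checklist.

```python
# A hypothetical, partial automation of a task writing checklist.
# The specific rules below are illustrative assumptions.

def review_item(item: dict) -> list[str]:
    """Return a list of checklist violations for a multiple-choice item."""
    problems = []
    if not item.get("stem", "").strip():
        problems.append("missing stem")
    options = item.get("options", [])
    if len(options) != 4:
        problems.append(f"expected 4 options, found {len(options)}")
    keys = [o for o in options if o.get("key")]
    if len(keys) != 1:
        problems.append(f"expected exactly 1 key, found {len(keys)}")
    if len({o["text"] for o in options}) != len(options):
        problems.append("duplicate options")
    return problems

# The baseball item from the earlier slide passes these checks.
item = {
    "stem": "Yes, but I don't have a _____.",
    "options": [
        {"text": "microscope", "key": False},
        {"text": "mitt", "key": True},
        {"text": "violin", "key": False},
        {"text": "laptop", "key": False},
    ],
}
print(review_item(item))  # [] means no violations found
```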

"Why do I need to know idioms for my training? I can't believe they're on the test!" (Student)
"I've spoken to this student. He can't pass the test, but he clearly knows English!" (Stakeholder)

[Diagram: Claim ← Evidence ← Test task]
Evidence: data that support the claim, derived from the domain (English for interoperability).
Test task: the prompt that generates the production of evidence.

Claim: Candidate can use hypothetical language to speculate about abstract topics.
Evidence: Candidate uses hypothetical language to explain how he'd handle a social issue in his country if he were in a position of influence.
Test task (version 1): "During the speaking part of the test, the rater introduces an imaginary situation and asks the candidate a hypothetical question."
Test task (version 2): "During the speaking part of the test, the rater asks the candidate a preference question using hypothetical language."
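The claim-evidence-task chain lends itself to a simple data model in the spirit of evidence-centered design (Mislevy, Almond, and Lukas, 2003). The Python sketch below is an illustrative assumption about how the chain might be recorded, not the presenter's actual specification format.

```python
# A minimal sketch of the claim -> evidence -> test task chain above.
# The structure is an illustrative assumption, not an established format.

from dataclasses import dataclass

@dataclass
class Claim:
    text: str  # what we want to be able to say about the candidate

@dataclass
class Evidence:
    claim: Claim
    observable: str  # what in the performance would support the claim

@dataclass
class TestTask:
    evidence: Evidence
    prompt_template: str  # reusable prompt that elicits the evidence

claim = Claim("Candidate can use hypothetical language to speculate about abstract topics")
evidence = Evidence(claim, "Candidate uses hypothetical language to explain how "
                           "he'd handle a social issue if he were in a position of influence")
task_v1 = TestTask(evidence, "Rater introduces an imaginary situation and asks a hypothetical question")
task_v2 = TestTask(evidence, "Rater asks a preference question using hypothetical language")
```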

Confirming our evidence: each test specification component supplies a distinct type of evidence for the validity argument.

Content (tasks, text types, audience, topics, etc.):
- The test represents the target domain content and skills
- The test controls for construct-irrelevant variance and construct under-representation

Structure (format, length, delivery mode):
- Different test forms are parallel, yielding reliable results

Psychometric properties (see the sketch after this table):
- Results on different test forms are reliable (correlation coefficient)
- The test measures the intended construct (factor analysis)
- Items are performing well (item difficulty index; item discrimination index)

Administration procedures:
- Test administration is fair to test takers
- Construct-irrelevant variance is minimized

Scoring method and reporting:
- Test results are reliable
- Stakeholders have the information needed to make appropriate decisions
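The psychometric entries above correspond to standard classical test statistics. The Python sketch below computes two of them on invented data: item difficulty (proportion correct) and item discrimination (point-biserial correlation of item score with total score). The same Pearson correlation would also serve as a parallel-forms reliability estimate.

```python
# Classical item statistics named in the table above, on invented data.

import statistics

def difficulty(item_scores: list[int]) -> float:
    """Item difficulty index: proportion of test takers answering correctly."""
    return sum(item_scores) / len(item_scores)

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient (population form)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (statistics.pstdev(x) * statistics.pstdev(y) * len(x))

def discrimination(item_scores: list[int], total_scores: list[float]) -> float:
    """Point-biserial discrimination: correlation of item score (0/1) with total score."""
    return pearson([float(s) for s in item_scores], total_scores)

item = [1, 0, 1, 1, 0, 1, 1, 1]            # 0/1 responses to one item
totals = [34, 18, 30, 29, 15, 31, 27, 33]  # total test scores
print(round(difficulty(item), 2))                                  # 0.75
print(round(discrimination(item, [float(t) for t in totals]), 2))  # positive: high scorers got it right
```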

References (resources used to prepare this presentation)
American Educational Research Association, American Psychological Association, National Council on Measurement in Education [AERA, APA, and NCME]. (2014) Standards for Educational and Psychological Testing. Washington, D.C.
Bachman, L. (2005) Building and Supporting a Case for Test Use. Language Assessment Quarterly, 2(1).
Bachman, L. and Palmer, A. (2010) Language Assessment in Practice. Oxford University Press.
Davidson, F. and Lynch, B. (2002) Testcraft. Yale University Press.
Downing, S. (2006) Test Specifications. In Downing, S. and Haladyna, T. (Eds.), Handbook of Test Development. Routledge.
ETS Standards for Quality and Fairness. (2014) https://www.ets.org/s/about/pdf/standards.pdf
Hendrickson, A., Ewing, M., and Kaliski, P. (2013) Evidence-Centered Design: Recommendations for Implementation and Practice. Journal of Applied Testing Technology, 14.
Hughes, A. (2003) Testing for Language Teachers. Cambridge University Press.
Luecht, R. (2013) Assessment Engineering Task Model Maps, Task Models and Templates as a New Way to Develop and Implement Test Specifications. Journal of Applied Testing Technology, 14.
Messick, S. J. (1989) Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13-103). New York: Macmillan.
Mislevy, R., Almond, R., and Lukas, J. (2003) A Brief Introduction to Evidence-Centered Design. Educational Testing Service.

Contact information
Lisa Alderete
Language Testing Analyst
English Evaluation (Testing) Flight
Defense Language Institute English Language Center
JBSA Lackland, San Antonio, Texas
lisa.alderete@us.af.mil