Principles in language testing What is a good test?

What is the purpose of testing? The purpose of testing is to obtain information on the learners' language skills. Information is costly: the more specific it is, the more it costs. –Is language testing targeting specific information? –Costs here involve human and material resources and TIME. –Once an institution or teacher has decided that the information is needed, they should be ready to meet the costs.

Types of tests Achievement tests (final or progress) Proficiency tests Pro-achievement tests Diagnostic tests Placement tests

Test marking Assessment scale (also: rating scale) –criteria by which performances at a given level are recognized –levels of performance, e.g. 10 (excellent), 9 (very good), 8 (good); bands 0–9 in IELTS; points in the national English examination Level descriptors –verbal descriptions of performances that illustrate each level of competence on the scale
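As a sketch of how a rating scale pairs levels of performance with level descriptors, the following maps a raw score onto an IELTS-style 0–9 band. The descriptors and raw-score cut-off points here are hypothetical, invented purely for illustration.

```python
# Hypothetical level descriptors for an IELTS-style 0-9 band scale.
DESCRIPTORS = {
    9: "expert user",
    8: "very good user",
    7: "good user",
    6: "competent user",
    5: "modest user",
}

def band_for(raw_score):
    # Map a raw test score onto a band via fixed cut-off points
    # (these cut-offs are invented for illustration, not real ones).
    cutoffs = [(36, 9), (32, 8), (27, 7), (23, 6), (19, 5)]
    for cutoff, band in cutoffs:
        if raw_score >= cutoff:
            return band, DESCRIPTORS[band]
    return 4, "limited user or below"

print(band_for(30))  # → (7, 'good user')
```

The point of the sketch is that the scale itself (the cut-offs) and the level descriptors (the verbal labels) are separate objects: markers apply the descriptors, while score conversion is mechanical.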

Communicative language competences Linguistic competences –lexical, grammatical, semantic, phonological, orthographic, orthoepic Sociolinguistic competences –markers of social relations, politeness conventions, expressions of folk wisdom, register differences, dialect and accent Pragmatic competences –discourse competence (the ability to arrange sentences in a proper sequence), functional competence (requests, invitations, etc.) (adapted from CEFR 2001)

Competences vs. skills Competences are tested through skills. The four major skills are subdivided into minor subskills, e.g. reading comprehension: reading for general orientation; reading for information; reading for main ideas; reading for specific information; reading for implications; etc. (CEFR 2001)

What is good testing? It is valid (VALIDITY) It is reliable (RELIABILITY) It is practical (PRACTICALITY) It has a positive impact on the teaching process (WASHBACK EFFECT)

Test validity It shows the appropriateness of the test; OR It shows that a test tests what it is supposed to test; OR A test is valid if it measures accurately what it is intended to measure. To establish that a test is valid, empirical evidence is needed. The evidence comes from different sources…

Types of validity Construct validity: –the extent to which a test measures the underlying psychological construct (“ability, capacity”) –the extent to which a test reflects the essential aspects of the theory on which that test is based –an overarching notion of validity reflected in many subordinate forms of validity

In a more complicated way… If a test does not have construct validity, test scores will show CONSTRUCT-IRRELEVANT VARIANCE. –E.g., in an advanced speaking test candidates may be asked to speak on an abstract topic. Personal engagement in the topic, however, may weaken or improve the performance. BUT: previous knowledge of the abstract topic should not be assessed.

Types of validity Content validity: –the extent to which a test adequately and sufficiently measures the particular skills it sets out to measure (cf. test specifications) Response validity: –… test takers respond in the way expected by the test developers Predictive validity: –… a test accurately predicts future performance Concurrent validity: –… scores on one test relate to scores on another external measure Face validity: –… a test appears to measure whatever it claims to measure (Hughes 2003: 26-35)

Types of validity Nearly 40 different types have been collected on a language testers’ forum… The more different types of validity are established in a test, the more valid that test is considered to be.

Test reliability Quality of test scores resulting from test administration: –accuracy of marking and fairness of scores –consistency of marking: similar scores on different days (intra-rater reliability); similar scores from different markers (inter-rater reliability)
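Inter-rater reliability is often quantified as the correlation between two raters' scores for the same set of performances. A minimal sketch using the Pearson coefficient, computed in plain Python; the band scores below are invented for illustration.

```python
def pearson_r(xs, ys):
    # Pearson correlation between two equal-length lists of scores.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical band scores (0-9) given to ten speaking performances
# by two independent raters.
rater_a = [5, 6, 7, 4, 8, 6, 5, 7, 6, 5]
rater_b = [5, 7, 7, 4, 8, 5, 5, 7, 6, 6]

print(round(pearson_r(rater_a, rater_b), 2))  # → 0.89
```

A high coefficient suggests the two markers rank candidates consistently; the same computation applied to one rater's scores on two occasions gives a rough index of intra-rater reliability.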

Factors influencing reliability
1. The performance of test takers
–a sufficient number of items
–restricted freedom of test behaviour
–unambiguous items, clear instructions and rubrics
–layout, good copies, familiar format
–proper administration
2. The reliability of scorers
–objective scoring vs. subjective scoring
–restricting freedom of response
–a detailed scoring/marking key
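Objective scoring, the first scorer-side factor above, can be illustrated with a fixed answer key: given the key, every scorer produces the same total, removing marker judgement entirely. A minimal sketch; the items and key are hypothetical.

```python
# Hypothetical answer key for a four-item multiple-choice test.
ANSWER_KEY = {1: "b", 2: "a", 3: "d", 4: "c"}

def score(responses):
    # responses: item number -> the candidate's chosen option.
    # One point per item whose answer matches the key exactly.
    return sum(1 for item, answer in responses.items()
               if ANSWER_KEY.get(item) == answer)

print(score({1: "b", 2: "c", 3: "d", 4: "c"}))  # → 3
```

Subjective scoring (essays, interviews) cannot be reduced to such a key, which is why a detailed marking key and rater training matter there instead.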

Test feasibility/practicality It is the ease with which the items/tasks can be replicated in terms of the resources needed, e.g. time, materials, people.

Washback effect (sometimes ‘backwash’) It is a type of impact of examinations/tests on the classroom situation. Washback may be positive or negative.

How to achieve positive washback? 1. Test the abilities/skills whose development you want to encourage. 2. Sample widely and unpredictably. 3. Use direct testing. 4. Make testing criterion-referenced. 5. Base achievement tests on objectives. 6. Make sure that the test is known and understood by students and other teachers.

References and additional reading 1. Alderson, J. C., C. Clapham and D. Wall. 1995. Language Test Construction and Evaluation. Cambridge: CUP. 2. Hughes, A. 2003. Testing for Language Teachers. 2nd ed. Cambridge: CUP. 3. Council of Europe. 2001. Common European Framework of Reference for Languages. Cambridge: CUP.