VALIDITY AND TEST VALIDATION
Prepared by Olga Simonova, Inna Chmykh, Svetlana Borisova, Olga Kuznetsova
Based on materials by Anthony Green

Validity
ABC Test of English results: Ivana 45%, Irina 78%.
Which student is better at English?

Validity
[Diagram: the test's assessment tasks sample only part of the test taker's language ability.]
Some aspects of language ability may not be tested: this is construct under-representation.

Validity
Some abilities that are important to success on a test may not be connected to real-world language ability: the ability to cope with exam stress, awareness of how multiple-choice questions are written, willingness to guess, etc. These are construct-irrelevant factors.

What is validity?
Tests are tools for helping us to make good decisions.
Construct relevance: a test of maths (even if it's very reliable) can't tell us about someone's ability to sing; a test of written grammar can't tell us much about someone's ability to hold a conversation.
Construct representation: does the test cover all aspects of the relevant abilities?

What is validity?
‘Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests’ (American Educational Research Association et al., 1999).
This means that test results can be valid for one purpose and for one particular population of test takers, but not for others. A test may be valid for placement purposes on a general language course, but not for employment selection.

Building a validation argument
What do we want the results to mean? What evidence can we collect to find out whether scores really support this interpretation?
- Evaluation: the test taker's performance is a fair reflection of his/her abilities.
- Generalization: similar scores would be obtained if the test taker were given a different form of the test, or if the raters scoring his/her performance were different.
- Explanation: the test reflects a coherent theory of language ability.
- Utilisation: the tested abilities are relevant to the decision being made about the test taker.

Validation in the assessment cycle:
- at different stages in the cycle, different questions need to be answered;
- different types of validity may be more relevant at each stage;
- tests made for different purposes raise different issues.

Building a validation argument
- Evaluation: the test taker's performance is a fair reflection of his/her abilities. Evidence: test form and administration.
- Generalization: similar scores would be obtained if the raters scoring his/her performance were different. Evidence: test score and rating scales.
- Explanation: the test reflects a coherent theory of language ability. Evidence: test specification.
- Utilisation: the tested abilities are relevant to the decision being made about the test taker. Evidence: test purpose and target language use domain.

Validity in test design “Tests for the measurement of language abilities must be constructed according to a coherent validity framework based on the latest developments in theory and practice.” (Weir, 2005)

Socio-cognitive approach (O’Sullivan & Weir, 2010)
[Framework diagram linking the test taker, the test task, performance and score through five components: context validity, cognitive validity, scoring validity, consequential validity and criterion-related validity.]

Content (context) validity Content validity is based on subject experts' judgments of test content. Does the content of the test adequately cover all the aspects of language ability we are interested in for making this decision?

Content (context) validity
A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned. (Hughes, 2005)
The term content validity was traditionally used to refer to the content coverage of the task. Context validity is preferred as a more inclusive superordinate which signals the need to consider the discoursal, social and cultural context as well as the linguistic parameters under which the task is performed (its operations and conditions). (Weir and Shaw, 2005)

Cognitive (or theory-based) validity Do test takers go through the same mental processes when responding to test tasks as when they use language in the real world in the situations we are interested in?

Cognitive (or theory-based) validity Theory-based validity involves collecting a priori evidence through piloting and trialling before the test event, for example through verbal reports from test takers on the cognitive processing activated by the test task, and a posteriori evidence involving statistical analysis of scores following test administration. (Weir and Shaw, 2005)

Scoring validity
Scoring validity accounts for the extent to which test scores are based on appropriate criteria, exhibit consensual agreement in their marking, are as free as possible from measurement error, are stable over time, and engender confidence as reliable decision-making indicators. (Weir and Shaw, 2005)

Scoring validity = reliability Are the test scores consistent enough for us to have confidence in the results?
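One common way to quantify this consistency is an internal-consistency coefficient such as Cronbach's alpha. A minimal Python sketch, with an invented score matrix purely for illustration:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (test takers x items) score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # per-item variance
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented data: 6 test takers x 4 items, each item scored 0-5.
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 5, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")  # ~0.96 for this matrix
```

Values near 1 suggest the items are measuring something in common; the threshold regarded as adequate varies with how high-stakes the decision is.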

Criterion-related validity
Criterion-related validity relates to the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability. This independent assessment is thus the criterion measure against which the test is validated. (Hughes, 2003)
Are the results of the test consistent with other evidence we have about test takers' abilities?
Criterion-related validity takes two forms: concurrent validity and predictive validity.

Concurrent validity “involves the comparison of the test scores with some other measures of the same candidates taken at roughly the same time as the test.” (Alderson et al., 1995:177) Do scores on our test agree with the results of other tests of the same abilities?
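In practice this is usually reported as a correlation coefficient between the two sets of scores. A minimal sketch, assuming SciPy is available; the score lists are invented:

```python
from scipy.stats import pearsonr

# Invented scores for the same ten candidates on our test and on an
# established test of the same ability, taken at roughly the same time.
our_test   = [45, 78, 62, 55, 90, 70, 48, 83, 66, 59]
other_test = [50, 75, 60, 58, 88, 72, 45, 80, 70, 61]

r, p = pearsonr(our_test, other_test)
print(f"concurrent validity coefficient: r = {r:.2f} (p = {p:.3f})")
```

A high positive correlation supports the claim that the two instruments rank candidates in much the same way; it does not by itself show that either test measures the right construct.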

Predictive validity
Predictive validity entails the comparison of test scores with some other measure for the same candidates taken some time after the test has been given (Alderson et al., 1995); the degree to which a test can predict candidates' future performance (Hughes, 2003).
Did the test accurately predict which test takers would perform best in their jobs, in class, etc.?
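The statistical treatment mirrors concurrent validation, except that the criterion measure is collected later. A sketch with invented data, regressing a later outcome on the earlier test score:

```python
from scipy.stats import linregress

# Invented data: placement-test scores, and end-of-course exam results
# for the same learners collected several months later.
placement  = [45, 78, 62, 55, 90, 70, 48, 83]
final_exam = [52, 81, 65, 60, 85, 74, 50, 79]

fit = linregress(placement, final_exam)
print(f"predictive validity: r = {fit.rvalue:.2f}, "
      f"variance in the outcome explained: r^2 = {fit.rvalue ** 2:.2f}")
```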

Consequential validity (impact)
Does the introduction and use of the test have the intended social consequences? Is there any:
- bias in scoring and interpretation of results?
- unfairness in test use?
- positive or negative effect on teaching and learning?

Face validity
Face validity refers to the test's “surface credibility or public acceptability” (Alderson et al., 1995:172). Bachman (1990:307) states that “face validity is the appearance of real life.”
Do test takers, teachers, politicians and the public generally believe in the value of the test?

Face validity
The assessment is credible to users: it looks as though it measures the skills or abilities of interest. For example, a multiple-choice grammar test does not look as though it really tests the ability to speak English in real-world situations. All kinds of evidence could be used to show that people who pass the test are actually able to communicate effectively, but users may not be convinced because test takers are not actually required to speak. If the test does not have face validity, it is unlikely to be successful.

Construct validity
In recent years the term construct validity has been used to refer to the general, overarching notion of validity. It is not enough to assert that a test has construct validity; empirical evidence is needed. (Hughes, 2003)
The arguments for using the test as a reasonable justification for taking decisions must be presented and examined: this process is validation.

Round-up: suitable data for test validity
Face validity: questionnaires to and interviews with candidates, administrators and other users.
Context validity:
a) Compare test content with specifications/syllabus.
b) Questionnaires to and interviews with 'experts' such as teachers, subject specialists, applied linguists.
c) Expert judges rate test items and texts according to a precise list of criteria (an agreement sketch follows below).
Cognitive validity: students introspect on their test-taking procedures, either concurrently or retrospectively; keystroke logs; eye-tracking.
Concurrent validity:
a) Compare students' test scores with their scores on another test.
b) Compare students' test scores with teachers' rankings.
c) Compare students' test scores with other measures of ability, such as teacher ratings.
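Where two expert judges rate the same items (point c under context validity above), their agreement can be summarised with a chance-corrected statistic such as Cohen's kappa. A self-contained sketch; the judgements are invented:

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Agreement expected by chance, from each rater's category frequencies.
    expected = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented judgements: two experts rate 10 items as matching the test
# specification (1) or not (0).
judge_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
judge_b = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
print(f"Cohen's kappa = {cohen_kappa(judge_a, judge_b):.2f}")  # ~0.47 here
```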

Suitable data for test validity (continued)
Predictive validity:
a) Compare students' test scores with their scores on tests taken some time later.
b) Compare students' test scores with success in a final exam.
c) Compare students' test scores with other measures of their ability taken some time later, such as employers' assessments.
Construct validity:
a) Compare performance on each subtest with other subtests.
b) Compare performance on each subtest with the total of all other subtests (see the sketch below).
c) Compare students' test scores with students' biodata and psychological characteristics.
d) Multitrait-multimethod studies.
e) Factor analysis.
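Points a) and b) under construct validity amount to inspecting the subtest intercorrelations, which numpy computes directly. A sketch with invented subtest scores:

```python
import numpy as np

# Invented scores of 6 test takers on 3 subtests
# (columns: reading, listening, grammar).
subtests = np.array([
    [60, 55, 58],
    [82, 75, 80],
    [45, 50, 47],
    [70, 72, 68],
    [90, 85, 88],
    [55, 60, 52],
], dtype=float)

# Intercorrelation matrix (rowvar=False: columns are the variables).
print(np.corrcoef(subtests, rowvar=False).round(2))

# Each subtest against the total of the remaining subtests.
for j, name in enumerate(["reading", "listening", "grammar"]):
    rest = subtests.sum(axis=1) - subtests[:, j]
    r = np.corrcoef(subtests[:, j], rest)[0, 1]
    print(f"{name} vs. rest: r = {r:.2f}")
```

If a subtest correlates much more weakly with the others than the theory behind the test predicts, that is a signal to investigate what it is actually measuring.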

Who is a validator?
Roles and example validity questions:
- Designers: Does the design of the test reflect an adequate theory of language? Is an appropriate balance of abilities required for success on the test?
- Producers: Do the test items reflect the designers' intentions?
- Organisers and administrators: Is the test organised and administered in a way that will ensure fairness?
- Assessees: Do assessees respond to the test tasks in a way that reflects realistic language processing?
- Scorers: Do scorers consistently and accurately capture the qualities of test takers' performance?
- Users: Are decisions taken by users justified by the test?

Who is a validator?
- Assessment developers (teachers, testing agencies): to check the quality of their own work; to showcase the quality of their tests.
- Assessment users: to check that tests are giving them accurate and relevant information.
- Independent agencies: to enforce or encourage good-quality assessment.

Conclusion
Test validation, according to Alderson et al. (1995:193), is 'time-consuming and difficult'. However, it is essential: a test without validity cannot be useful as a decision-making tool. Applied linguists and teachers should focus more of their efforts on practical research in this field.