VALIDITY - CONSEQUENTIALISM Assoc. Prof. Dr. Şehnaz Şahinkarakaş

"Effect-driven testing" (Fulcher & Davidson, 2007): "the effect that the test is intended to have and to structure the test development to achieve that effect" (p. 144). What does this mean?

DEFINITION OF VALIDITY
"Overall judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions on the basis of test scores or other modes of assessment" (Messick, 1995, p. 741).
What is a score? In general, it is "any coding or summarization of observed consistencies or performance regularities on a test, questionnaire, observation procedure, or other assessment devices such as work samples, portfolios, and realistic problem simulations" (p. 741).

Validity, then, is about making inferences from scores; scores reflect a test taker's knowledge and/or skills as elicited by test tasks.
This differs from early definitions of validity as the degree of correlation between the test and a criterion (the validity coefficient):
- In the early definition there is an upper limit on the possible correlation, and it is directly tied to the reliability of the test (without high reliability a test cannot be valid).
- In the new definition (especially after Messick), validity became the meaning of the test scores, not a property of the test itself.
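To make the early "validity coefficient" definition concrete, here is a minimal Python sketch (not from the slides; the scores and reliability figures are invented for illustration). It computes a test-criterion correlation and the classical ceiling on that correlation implied by reliability, which is why, under the early definition, an unreliable test could not be valid.

```python
import numpy as np

# Hypothetical data: ten test takers' scores on a language test and on a
# criterion measure (e.g., later course grades). All numbers are invented.
test_scores = np.array([52, 61, 48, 70, 65, 55, 72, 60, 45, 68])
criterion = np.array([55, 64, 50, 74, 60, 58, 75, 63, 47, 70])

# Early definition: validity = correlation between test and criterion.
validity_coefficient = np.corrcoef(test_scores, criterion)[0, 1]

# Classical upper limit: the test-criterion correlation cannot exceed the
# square root of the product of the two reliabilities.
test_reliability = 0.81       # assumed internal-consistency estimate
criterion_reliability = 0.90  # assumed
upper_limit = np.sqrt(test_reliability * criterion_reliability)

print(f"validity coefficient r = {validity_coefficient:.2f}")
print(f"theoretical upper limit = {upper_limit:.2f}")
```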

Final remarks on validity (and reliability, fairness, ...):
- These are not based on measurement principles alone; they are social values.
- Correlation coefficients and/or content validity analyses are not enough to establish validity (Messick).
- So, "score validation is an empirical evaluation of the meaning and consequences of measurement" (Messick).

CONSTRUCT VALIDITY
What is a construct? A concept defined in such a way that it becomes measurable (an operational definition); it can stand in relationships with other constructs (e.g., the more anxious, the less self-confident).
Construct validity is the degree to which inferences can be made from the operational definitions back to the theoretical constructs on which those definitions are based. What does this mean?

Two things to consider in construct validation:
- Theory (what goes on in our minds: ideas, theories, beliefs, ...)
- Observation (what we see happening around us: our actual program/treatment)
That is, we develop something observable to reflect what is in our minds (theory). Construct validity is assessing how well we have translated our ideas/theories into our actual programs/measures. What does this mean in testing? How do we do it in testing?

SOURCES OF INVALIDITY
Two major threats:
- Construct underrepresentation: the assessment is too narrow; it fails to include important dimensions of the construct.
- Construct-irrelevant variance: the assessment is too broad; it contains variance associated with other, distinct constructs.

CONSTRUCT-IRRELEVANT VARIANCE
Two kinds:
- Construct-irrelevant difficulty (e.g., undue reading demands in a test of subject-matter knowledge): leads to invalidly low scores.
- Construct-irrelevant easiness (e.g., texts that are highly familiar to some test takers): leads to invalidly high scores.
What do you think about KPDS/YDS in terms of threats to validity?
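A toy simulation can show how construct-irrelevant difficulty produces invalidly low scores. This Python sketch is hypothetical (not from the slides): it contaminates a subject-matter score with reading proficiency, so weak readers score low even though their subject-matter ability is, on average, the same as everyone else's.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# 'True' subject-matter ability, and reading proficiency that is
# unrelated to the construct being assessed.
ability = rng.normal(0, 1, n)
reading = rng.normal(0, 1, n)

# A clean score reflects ability only; a contaminated score also loads
# on reading (construct-irrelevant difficulty).
clean_score = ability + rng.normal(0, 0.5, n)
contaminated_score = ability + 0.6 * reading + rng.normal(0, 0.5, n)

# Weak readers get invalidly low scores on the contaminated test,
# despite having the same average subject-matter ability.
weak_readers = reading < -1
print("weak readers, mean clean score:       ",
      round(clean_score[weak_readers].mean(), 2))
print("weak readers, mean contaminated score:",
      round(contaminated_score[weak_readers].mean(), 2))
```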

SOURCES OF EVIDENCE IN CONSTRUCT VALIDITY (MESSICK, 1995)
Construct validity = the evidential basis for score interpretation.
How do we interpret scores? Evidence is needed for any score interpretation, not just for interpretations in terms of 'theoretical constructs'. How do we do this?

EVIDENCE-RELATED VALIDITY
Two types:
- Convergent validity: providing evidence that two tests believed to measure closely related skills or types of knowledge correlate strongly (i.e., the test MEASURES what it claims to measure).
- Discriminant validity: providing evidence that two tests that do not measure closely related skills or types of knowledge do not correlate strongly (i.e., the test does NOT MEASURE irrelevant attributes).
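In practice, both kinds of evidence are checked with correlations. Below is a minimal Python sketch with simulated scores (the tests and all numbers are invented for illustration): two reading tests that should converge, and a speaking test that should remain distinct.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated underlying abilities for two distinct constructs.
reading_ability = rng.normal(0, 1, n)
speaking_ability = rng.normal(0, 1, n)

# Two reading tests measure the same construct (plus noise);
# the speaking test measures a different construct.
reading_test_a = reading_ability + rng.normal(0, 0.4, n)
reading_test_b = reading_ability + rng.normal(0, 0.4, n)
speaking_test = speaking_ability + rng.normal(0, 0.4, n)

r_convergent = np.corrcoef(reading_test_a, reading_test_b)[0, 1]
r_discriminant = np.corrcoef(reading_test_a, speaking_test)[0, 1]

print(f"convergent evidence  (reading A vs B):       r = {r_convergent:.2f}")   # expect strong
print(f"discriminant evidence (reading A vs speaking): r = {r_discriminant:.2f}")  # expect near zero
```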

ASPECTS OF CONSTRUCT VALIDITY
Validity is a unified concept, but it can be differentiated into distinct aspects:
- Content
- Substantive
- Structural
- Generalizability
- External
- Consequential

CONTENT ASPECT
Content relevance, representativeness, and technical quality (to what extent does the assessment represent the domain?).
It requires identifying the construct DOMAIN to be assessed: to what extent do the domain/tasks cover the construct? All important parts of the construct domain should be covered.

SUBSTANTIVE ASPECT
The theoretical processes underlying the construct, and the degree to which those processes are actually reflected in test performance.
It subsumes the content aspect, but empirical evidence is also needed. This can be gathered from a variety of sources, e.g., think-aloud protocols.

The concept bridging the content and substantive aspects is representativeness. Representativeness has two distinct meanings:
- Mental representation (cognitive psychology)
- The Brunswikian sense of ecological sampling: the correlation between a cue and a property (e.g., the color of a banana is a cue that indicates the ripeness of the fruit).

STRUCTURAL ASPECT
Related to scoring: the scoring criteria and rubrics should be rationally developed, based on the construct.

GENERALIZABILITY
Interpretations should not be limited to the particular tasks assessed; they should generalize to the construct domain (e.g., the degree of correlation between one task and the others).
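One rough way to probe this aspect is to correlate each task with the total of the remaining tasks, as in the hypothetical Python sketch below (invented scores; a full generalizability study would use G-theory, which this does not attempt).

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented score matrix: 50 test takers x 4 tasks drawn from the same
# construct domain (a shared ability plus task-specific noise).
ability = rng.normal(0, 1, 50)
tasks = np.column_stack([ability + rng.normal(0, 0.6, 50) for _ in range(4)])

# How well does each task correlate with the total of the other tasks?
for j in range(tasks.shape[1]):
    rest = np.delete(tasks, j, axis=1).sum(axis=1)
    r = np.corrcoef(tasks[:, j], rest)[0, 1]
    print(f"task {j + 1} vs remaining tasks: r = {r:.2f}")
```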

EXTERNAL ASPECT
Scores' relationships with other measures and with non-assessment behaviours.
Convergent evidence (correspondence between measures of the same construct) and discriminant evidence (distinctness from measures of other constructs) are both important.

CONSEQUENCES
Evaluating the intended and unintended consequences of score interpretation, covering both positive and negative impact.
But negative impact should NOT stem from construct underrepresentation or construct-irrelevant variance.
Two facets: (a) the justification of the testing, based on score meaning or on consequences contributing to score valuation; (b) the function or outcome of the testing, as interpretation or as applied use.

FACETS OF VALIDITY AS A PROGRESSIVE MATRIX (MESSICK, 1995, p. 748)

                      Test Interpretation             Test Use
Evidential Basis      Construct Validity (CV)         CV + Relevance/Utility (R/U)
Consequential Basis   CV + Value Implications (VI)    CV + R/U + VI + Social Consequences

Two facets: (a) the justification of the testing, based on score meaning or on consequences contributing to score valuation; (b) the function or outcome of the testing, as interpretation or as applied use. When these two facets are crossed, a four-fold classification is obtained.

Construct validity appears in every cell of the matrix. This means:
- Validity issues are unified into a single, unitary concept.
- But the distinct features of construct validity should also be emphasized.
What is the implication here? Both meaning and values are intertwined in the validation process. Thus, 'Validity and values are one imperative, not two, and test validation implicates both the science and the ethics of assessment, which is why validity has force as a social value' (Messick, 1995, p. 749).

CONSEQUENTIAL VALIDITY & WASHBACK
The Messickian (unified) view of construct validity = considering the consequences of test use (i.e., washback). What does this mean in validity studies?

Washback is a particular instance of the consequential aspect of construct validity. Investigating washback and other consequences is a crucial step in the process of test validation.
That is, washback is one (but not the only) indicator of the consequential aspect of validity; it is important to investigate washback in order to establish the validity of a test.

To put it differently:
- The modern paradigm of validity is consequential in nature.
- Test impact is part of a validation argument.
- Thus, effect-driven testing should be considered: testers should build tests with the intended effects in mind.

To put it all together:
Value implications + social consequences = CONSEQUENTIAL VALIDITY
(the two fairness-related elements of Messick's consequential validity)

IMPLICATION
Positive washback -> consequential validity -> promoting learning
Negative washback -> lack of validity -> unfairness

But who brings about washback (positive or negative)? People in classrooms (teachers/students)? Test developers?
For Fulcher and Davidson, it is the people in classrooms. Thus more attention should be given to teachers' beliefs about teaching and learning and to the degree of their PROFESSIONALISM.

TASK A9.2 (coursebook, p. 143)
Select one large-scale test you are familiar with. What is its influence, and upon whom? Does it seem reasonable to define such tests by their influence as well?