Test Validity: What it is, and why we care.


Validity: What is validity? Types of validity: content validity, criterion-related validity, construct validity, incremental validity.

What is validity? The validity of a test is the extent to which it measures what it is designed to measure. As we shall see, there are many ways for a test to fail or succeed, so validity is not a single measure.

How to measure validity: Analyze the content of the test. Relate test scores to specific criteria. Examine the psychological constructs measured by the test.

Content validity: Content validity = the extent to which the test elicits a range of responses over the range of skills, understanding, or behavior the test measures. It is most important with achievement tests, because there are usually no external criteria. How can we determine content validity? (Or: how will you know if you are given a good exam in this class?) Compare the questions on the test to the subject matter. If the test looks like a measure of the skill or knowledge it is supposed to measure, we say it has face validity.

Criterion-related validity: Criterion-related validity depends upon relating test scores to performance on some relevant criterion or set of criteria, e.g. validating tests against school marks, supervisor ratings, or the dollar value of productive work. There are two kinds: concurrent and predictive.

Criterion-related validity II: Concurrent validity = the criteria are available at the time of testing, e.g. giving the test to subjects selected for their economic background or diagnostic group; the validity of the MMPI was determined in this manner. Predictive validity = the criteria are not available at the time of testing; it is concerned with how well test scores predict future performance. For example, IQ tests should correlate with academic ratings, grades, problem-solving skills, etc. A good r-value would be .60: how much variance is accounted for?
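The closing question has a direct answer: the proportion of criterion variance a test accounts for is r squared. A minimal sketch:

```python
# Variance in the criterion accounted for by a validity coefficient is r**2.
# Even a "good" validity of r = .60 explains only about a third of the
# criterion variance; the rest is unexplained.
for r in (0.30, 0.60, 0.90):
    print(f"r = {r:.2f} -> variance accounted for = {r**2:.0%}")
```

So r = .60 accounts for 36% of the variance, which motivates the caution in the later slides about predicting individual scores.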

What affects criterion-related validity? i.) Moderator variables: characteristics that define groups, such as sex, age, personality type, etc. A test that is well-validated on one group may be less valid for another. Validity is usually better with more heterogeneous groups, because the range of behaviors and test scores is larger. And therefore: ii.) Base rates: tests are less effective when base rates are very high (why?) or very low.

What affects criterion-related validity? iii.) Test length: for similar reasons of the size of the domain sampled, longer tests tend to be more reliable. Note that this depends on the questions being independent (= every question adding information); when they are not, longer tests are not more reliable, e.g. short forms of the WAIS.
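One standard way to formalize the test-length point, under the slide's independence condition (added items are parallel), is the Spearman-Brown prophecy formula; the reliability of .70 below is a hypothetical example value:

```python
def spearman_brown(r: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor,
    assuming the added items are parallel (independent and equivalent)."""
    return length_factor * r / (1 + (length_factor - 1) * r)

# Doubling a test whose reliability is .70 (hypothetical) raises it:
print(f"{spearman_brown(0.70, 2):.2f}")
# Halving it (a 'short form', as with the WAIS example) lowers it:
print(f"{spearman_brown(0.70, 0.5):.2f}")
```

When items are not independent, the parallel-items assumption fails and the formula overstates the gain, which is exactly the caveat in the slide.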

What affects criterion-related validity? iv.) The nature of the validity criterion: the criterion can be contaminated, especially if the interpretation of test responses is not well-specified; then there is confusion between the validation criteria and the test results = self-fulfilling prophecies.

Construct validity: Construct validity = the extent to which a test measures the construct it claims to measure. Does an intelligence test measure intelligence? Does a neuroticism test measure neuroticism? What is latent hostility, given that it is latent? Construct validity is of particular importance when the thing measured by a test is not operationally defined (as when it is obtained by factor analysis). As Meehl notes, construct validity is very general and often very difficult to determine in a definitive manner.

What is a construct, anyway? Meehl's nomological net: 1.) To say what something is means to say what laws it is subject to; the sum of all laws is a construct's nomological network. 2.) Laws may relate observable and theoretical elements. 3.) A construct is only admissible if at least some of the laws to which it is subject involve observables. 4.) Elaborating a construct's nomological net = learning more about that construct. 5.) Ockham's razor + Einstein's addendum (make things as simple as possible, but no simpler). 6.) Identity means 'playing the same role in the same net'.

How to measure construct validity: i.) Get expert judgments of the content. ii.) Analyze the internal consistency of the test. iii.) Study the relationships between test scores and other non-test variables which are known or presumed to relate to the same construct (sometimes called 'empirical validity'), e.g. Meehl mentions Binet's vindication by teachers. iv.) Question your subjects about their responses in order to elicit their underlying reasons. v.) Demonstrate expected changes over time.
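Method ii (internal consistency) is commonly operationalized as Cronbach's alpha. A minimal sketch, using item scores invented purely for illustration:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list per item, scores in respondent order.
    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)."""
    k = len(item_scores)
    total_scores = [sum(resp) for resp in zip(*item_scores)]
    item_var_sum = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(total_scores))

# Three items whose scores mostly agree across five respondents
# (made-up data), so internal consistency comes out high:
items = [[2, 4, 4, 5, 1],
         [3, 4, 5, 5, 2],
         [2, 5, 4, 4, 1]]
print(f"alpha = {cronbach_alpha(items):.2f}")
```

High alpha tells us the items hang together; it does not by itself tell us they measure the intended construct, which is why the slide lists several other methods.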

How to measure construct validity (continued): vi.) Study the relationships between test scores and other test scores which are known or presumed to relate to the same construct. Convergent versus discriminant validity. Multitrait-multimethod approach: correlations of the same trait measured by the same and different methods should exceed correlations of a different trait measured by the same and different methods. What if correlations of measures of different traits using the same method exceed correlations of measures of the same trait using different methods?
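The multitrait-multimethod comparison can be sketched as a check on a few correlations; the traits, methods, and values below are invented for illustration:

```python
# Hypothetical correlations for two traits (anxiety, hostility)
# measured by two methods (self-report, observer rating):
convergent = 0.65               # same trait, different methods
discriminant_same_method = 0.30  # different traits, same method
discriminant_diff_method = 0.20  # different traits, different methods

# Campbell & Fiske's criteria: convergent correlations should exceed
# both kinds of discriminant correlation. If instead the same-method,
# different-trait correlation were highest, method variance would be
# swamping trait variance -- the failure case the slide asks about.
if convergent > discriminant_same_method > discriminant_diff_method:
    print("pattern consistent with construct validity")
else:
    print("method effects dominate -- construct validity is in doubt")
```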

Incremental validity: Incremental validity refers to the amount of gain in predictive value obtained by using a particular test (or test subset). If we give N tests and are 90% sure of the diagnosis after that, and the next test will make us 91% sure, is it worth 'buying' that gain in validity? A cost/benefit analysis is required.

Measuring validation error: Validity coefficient = the correlation (r) between test score and a criterion. There is no general answer to the question of how high a validity coefficient should be.

Measuring validation error: Coefficient of alienation: k = sqrt(1 - r^2) = the proportion of the error of pure guessing that your estimate retains. If k = 1.0, you have 100% of the error you'd have had if you just guessed (since this means your r was 0). If k = 0, you have achieved perfection: your r was 1, and there was no error at all.* If k = 0.6, you have 60% of the error you'd have had if you guessed. * N.B. This never happens.
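The formula and the three cases above can be checked in a few lines:

```python
from math import sqrt

def coefficient_of_alienation(r: float) -> float:
    """k = sqrt(1 - r**2): the fraction of guessing error that remains."""
    return sqrt(1 - r ** 2)

for r in (0.0, 0.8, 1.0):
    # r = 0 -> k = 1.00 (pure guessing); r = 0.8 -> k = 0.60
    # (60% of guessing error); r = 1 -> k = 0.00 (never happens).
    print(f"r = {r:.2f} -> k = {coefficient_of_alienation(r):.2f}")
```

Note the asymmetry: r must reach 0.8 before even 40% of the guessing error is eliminated.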

Why should we care? We care because k is useful in interpreting the accuracy of an individual's scores: r = 0.60 (good), k = 0.80 (not good); r = 0.70 (great), k = 0.71 (not so great); r = 0.95 (fantastic!), k = 0.31 (so-so). Since even high values of r leave fairly large error margins, the prediction of any individual's criterion score is always accompanied by a wide margin of error. The moral: if you want to predict an individual's performance, you need an extremely high validity coefficient (and even then, you probably won't be able to…)
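The "wide margin of error" can be made concrete via the standard error of estimate, which is just the criterion's standard deviation times k. The IQ-style criterion SD of 15 below is an assumed example value:

```python
from math import sqrt

def prediction_margin(r: float, sd_criterion: float, z: float = 1.96) -> float:
    """Half-width of an approximate 95% prediction interval for an
    individual's criterion score: z * sd_criterion * sqrt(1 - r**2)."""
    return z * sd_criterion * sqrt(1 - r ** 2)

# Hypothetical criterion scaled like IQ (SD = 15):
for r in (0.60, 0.70, 0.95):
    print(f"r = {r:.2f} -> individual score predicted only to within "
          f"+/- {prediction_margin(r, 15):.0f} points")
```

Even r = 0.95 leaves roughly a plus-or-minus 9-point band on this scale, which is the slide's closing moral in numbers.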