Validity Test Validity: What it is, and why we care.

Validity What is validity? What is a construct? Types of validity –Content validity –Criterion-related validity –Construct validity –Incremental validity

Validity What is validity? The validity of a test is the extent to which it measures the construct that it is designed to measure –As we shall see, there are many ways for a test to fail or succeed = validity is not a single measure

Validity Paul Meehl: What is a construct? Meehl’s definition of a construct has 6 main elements, as follows: 1.) To say what a construct is means to say what laws it is subject to. - The sum of all laws is called a construct’s nomological network.

Validity What does ‘nomological’ mean? I had always believed (and previously said) it came from Latin nomin-, meaning ‘name’ I was wrong. In fact it comes from Greek nomo-, the combining form of a word meaning ‘law’ So ‘psychonomics’ is the study of the laws of the psyche, and ‘nomological network’ refers to a network whose components can be described by laws or rules

Validity Paul Meehl: What is a construct? 2.) Laws may relate observable and theoretical elements - The relations must be ‘lawful’, but they may be either causal or statistical (what’s the relation?) - What are the ‘theoretical elements’? Constructs!

Validity Paul Meehl: What is a construct? - What are the ‘theoretical elements’? Constructs! - To escape from circularity and pure speculation, we need to anchor the nomological net, hence: 3.) A construct is only admissible if at least some of the laws to which it is subject involve observables - If not, we could define a self-consistent network of ideas that had no relevance to the real world (and many such networks have been defined!)

Validity Paul Meehl: What is a construct? 4.) Elaboration of a construct’s nomological net = learning more about that construct - We elaborate a construct by drawing new relations, either between elements already in the network, or between those elements and new elements outside of the network - This elaboration is precisely the work of psychometrics

Validity Paul Meehl: What is a construct? 5.) Ockham’s razor + Einstein’s addendum - that is: make things as simple as possible, but no simpler 6.) Identity means ‘playing the same role in the same network’ - If it looks like a duck, walks like a duck, and quacks like a duck: then it is a duck!* * at least pending further investigation

Validity How to measure validity Analyze the content of the test Relate test scores to specific criteria Examine the psychological constructs measured by the test

Validity Content validity Content validity = the extent to which the test elicits a range of responses over the range of skills, understanding, or behavior the test measures Most important with achievement tests, because there are usually no external criteria How can we determine content validity? (or: How will you know if you are given a good exam next week?) Compare the questions on the test to the subject matter If it looks like a measure of the skill or knowledge it is supposed to measure, we say it has face validity

Validity Criterion-related validity Criterion-related validity depends upon relating test scores to performance on some relevant criterion or set of criteria – e.g. validate tests against school marks, supervisor ratings, or the dollar value of productive work Two kinds: concurrent and predictive

Validity Criterion-related validity II Concurrent validity = the criteria are available at the time of testing –e.g. give the test to subjects selected for their economic background or diagnostic group –the validity of the MMPI was determined in this manner Predictive validity = the criteria are not available at the time of testing –concerned with how well test scores predict future performance –For example, IQ tests should correlate with academic ratings, grades, problem-solving skills, etc. –A good r-value would be .60

Validity What affects criterion-related validity? i.) Moderator variables: those characteristics that define groups, such as sex, age, personality type, etc. - a test that is well-validated on one group may work less well with another - validity is usually better with more heterogeneous groups, because the range of behaviors and test scores is larger And therefore: ii.) Base rates: Tests are less effective when base rates are very high or very low (that is, whenever they are skewed from 50/50)

Validity What affects criterion-related validity? iii.) Test length - For reasons related to the size of the domain sampled (think of the binomial rabbits and the quincunx), longer tests tend to be more reliably related to the criterion, as we have previously discussed - Note that this depends on the questions being independent (= every question adding information) - when they are not, longer tests are not more reliable - e.g. short forms of the WAIS - However, note that independence need only be partial (r < 1, but not necessarily r = 0)
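The claim that longer tests are more reliable (when items add independent information) is commonly formalized by the Spearman-Brown prophecy formula, a standard psychometric result not named on the slide; treat the sketch below as an illustration under that assumption.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability of a test lengthened by `length_factor`
    (2.0 = doubled, 0.5 = halved), assuming the added or retained items
    behave like the originals (i.e. are parallel)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a test with reliability .60 raises predicted reliability:
print(round(spearman_brown(0.60, 2.0), 2))  # 0.75
# Halving it (as in a short form) lowers it:
print(round(spearman_brown(0.60, 0.5), 2))  # 0.43
```

This also shows why short forms like the abbreviated WAIS trade reliability for convenience.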

Validity What affects criterion-related validity? iv.) The nature of the validity criterion - Criteria can be contaminated, especially if the interpretation of test responses is not well-specified - then there is confusion between the validation criteria and the test results = self-fulfilling prophecies - in essence we are then stuck at the theoretical level of the nomological net, with no way for empirical study to tell us we are wrong

Validity Construct validity Construct validity = the extent to which a test measures the construct it claims to measure –Does an intelligence test measure intelligence? Does a neuroticism test measure neuroticism? How do we measure latent hostility, given that it is latent? It is of particular importance when the thing measured by a test is not operationally defined (as when it is obtained by factor analysis) As Meehl notes, construct validity is very general and often very difficult to determine in a definitive manner

Validity How to measure construct validity i.) Get expert judgments of the content ii.) Analyze the internal consistency of the test (How can we do this?) iii.) Study the relationships between test scores and other non- test variables which are known/presumed to relate to the same construct (sometimes called ‘empirical validity’) - e.g. Meehl mentions Binet’s vindication by teachers iv.) Question your subjects about their responses in order to elicit underlying reasons for their responses. v.) Demonstrate expected changes over time
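One standard answer to the slide's question in (ii), "How can we do this?", is Cronbach's alpha, which the slide does not name, so treat this as a sketch. The item scores below are invented.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for internal consistency.
    item_scores: one list per item, all over the same respondents."""
    k = len(item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]  # total score per respondent
    sum_item_var = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Three fairly consistent items over five respondents (made-up data):
items = [[3, 4, 5, 2, 4],
         [3, 5, 5, 2, 3],
         [4, 4, 5, 1, 4]]
print(round(cronbach_alpha(items), 2))  # 0.93
```

Values near 1 indicate the items hang together; very low or negative values suggest the items do not measure a single construct.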

Validity How to measure construct validity vi.) Study the relationships between test scores and other test scores which are known/presumed to relate to (or depart from) the construct - Convergent versus discriminant validity - Multitrait-multimethod approach: Correlations of the same trait measured by the same and different measures > correlations of a different trait measured by the same and different measures What if correlations of measures of different traits using the same method > correlations of measures of the same trait using different methods?
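A toy illustration of the multitrait-multimethod logic, with invented correlations: convergent validity wants the same trait measured by different methods to correlate strongly; discriminant validity wants that correlation to exceed the correlation between different traits that merely share a method.

```python
# Hypothetical MTMM correlations (invented numbers, not from any study):
same_trait_diff_method = 0.65   # e.g. anxiety: self-report vs. peer rating
diff_trait_same_method = 0.30   # e.g. anxiety vs. sociability, both self-report

convergent_ok = same_trait_diff_method > 0.50                      # trait survives a change of method
discriminant_ok = same_trait_diff_method > diff_trait_same_method  # trait variance beats method variance
print(convergent_ok, discriminant_ok)  # True True
```

If the inequality ran the other way, we would be measuring the method more than the trait, which is exactly the failure case the slide asks about.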

Validity Incremental validity Incremental validity refers to the amount of gain in predictive value obtained by using a particular test (or test subset) If we give N tests and are 90% sure of the diagnosis after that, and the next test will make us 91% sure, is it worth ‘buying’ that gain in validity? –Cost/benefit analysis is required.

Validity Validity coefficient Validity coefficient = correlation (r) between test score and a criterion There is no general answer to the questions: How high should a validity coefficient be? Or: What shall we use for a criterion?

Validity Measuring validation error Coefficient of determination = r² = the proportion of variation explained Coefficient of alienation = k = √(1 − r²) k is the inverse of correlation: a measure of non-association between two variables If k = 1.0, you have 100% of the error you’d have had if you just guessed (since this means your r was 0) If k = 0, you have achieved perfection: your r was 1, and there was no error at all* If k = 0.6, you have 60% of the error you’d have had if you guessed * N.B. This never happens.
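The two coefficients in code, using an illustrative r of 0.40 (consistent with the SAT example's r² = 0.16):

```python
import math

def coefficient_of_determination(r: float) -> float:
    """r^2: proportion of criterion variance explained by the test."""
    return r ** 2

def coefficient_of_alienation(r: float) -> float:
    """k = sqrt(1 - r^2): remaining error relative to pure guessing."""
    return math.sqrt(1 - r ** 2)

r = 0.40
print(round(coefficient_of_determination(r), 2))  # 0.16
print(round(coefficient_of_alienation(r), 2))     # 0.92
```

Note how little k improves for moderate r: even r = 0.40 still leaves 92% of the guessing error.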

Validity Example The correlation between SAT scores and college performance is 0.40. How much of the variation in college performance is explained by SAT scores? r² = 0.16, so 16% of the variance is explained (and so 84% is not explained). What is the coefficient of alienation? –√(1 − 0.16) = √0.84 = 0.92

Validity Why should we care? k is useful for reporting the accuracy of a test in a unit-free way –the standard error of estimate is not: it will be bigger for tests that measure in larger units than for tests that measure in smaller units