Validity: Conceptual Issues (Furr & Bacharach, Chapter 8)

Contrasting Reliability & Validity
Both are fundamental to a sophisticated understanding of psychometrics, and a clear understanding of the relationship between the two is essential.

Definitions (notice the differences)
Reliability: the degree to which differences in test scores reflect differences among people in their levels of whatever trait affects those scores; a quantitative property of the test scores.
Validity: tied to the interpretation of test scores, and to the theory and implications behind those scores.

The link between them
Validity requires reliability. For stable traits (e.g., intelligence and IQ), scores measured at two points in time should be stable (test-retest reliability); if they are not, the test cannot be a valid measure of IQ. For states (e.g., depression and the BDI), poor internal consistency rules out valid interpretation.
Reliability does not imply validity. A stable-trait measure (e.g., autism and the AQ) may have excellent test-retest reliability or good internal consistency, yet its scores may still not be interpreted in a valid manner.
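To make the two reliability ideas above concrete, here is a minimal Python sketch (not from Furr & Bacharach; all data are simulated and the variable names are illustrative) that computes a test-retest correlation and Cronbach's alpha as an index of internal consistency.

```python
# Minimal sketch: two common reliability estimates (hypothetical, simulated data).
import numpy as np

rng = np.random.default_rng(0)

# Test-retest reliability: correlate scores from two measurement occasions.
true_trait = rng.normal(size=200)                     # stable trait level
time1 = true_trait + rng.normal(scale=0.5, size=200)  # observed scores, occasion 1
time2 = true_trait + rng.normal(scale=0.5, size=200)  # observed scores, occasion 2
test_retest_r = np.corrcoef(time1, time2)[0, 1]

# Internal consistency: Cronbach's alpha from a respondents-by-items matrix.
def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = items."""
    items = np.asarray(items)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

item_scores = true_trait[:, None] + rng.normal(scale=1.0, size=(200, 10))
print(f"test-retest r = {test_retest_r:.2f}, alpha = {cronbach_alpha(item_scores):.2f}")
```

High values on either index say nothing by themselves about whether the intended interpretation of the scores is valid, which is the chapter's point.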

The Iowa story
An employer no longer wants to hire people who might abuse clients. Personality tests? Is there a test that measures the relevant construct? Does it validly measure an abusive personality? Is there a test that was designed to predict the likelihood that a particular individual will abuse people?

What is validity?
Definition, and implications of the contemporary definition of validity.

Validity: Definition
Basic definition: the degree to which a test measures what it is supposed to measure.
Contemporary definition: "the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses" of the test.

Implications of the contemporary definition

Implication 1: Interpretation and use of test scores

Validity is about the interpretation and use of test scores
NEO-PI-R Conscientiousness scale (48 items): high scores reflect an "active process of planning, organizing and carrying out tasks," and people with high scores on this scale are "purposeful, strong willed, and determined."

NEO-PI-R Conscientiousness Scale
What is the correct question about the scale's validity or invalidity? Are the test items valid or invalid? Are the test scores valid or invalid? Or is the interpretation of the test scores valid or invalid?

The question is not "are the items or scores valid or invalid?" The question is: are the authors' interpretations of the scores valid or invalid? Are Conscientiousness scores validly interpreted in terms of planfulness, organization, and determination?

Proposed use of scores
Employers may use the NEO-PI-R Conscientiousness scale to screen potential employees. The belief: the scale differentiates potentially better and worse employees. That is a claim about the predictive power of Conscientiousness scale scores.

A hammer is a useful tool if you need to drive a nail…

But what if you need to saw a piece of wood? A hammer is not useful irrespective of the need; its usefulness depends on the task at hand.

It is simplistic and inaccurate to say "the Conscientiousness scale is valid" without regard to the way in which it will be interpreted and used. What is accurate: scores can be interpreted validly as an indicator of conscientiousness; the scale is not valid as a measure of intelligence or extraversion; and it is not a valid predictor of successful employment.

Compare: "Scores on the Conscientiousness scale of the NEO-PI-R are validly interpreted as a measure of conscientiousness" vs. "The Conscientiousness scale of the NEO-PI-R is valid."

Implication 2: Validity is a matter of degree
Strong vs. weak evidence, not valid vs. invalid. Select a test if the evidence supporting the intended interpretation and use is strong enough.

Concern about the Autism-Spectrum Quotient (AQ)
Marginal internal consistency, so reliability is already a concern. What about validity? Is it valid to interpret a high score on the test as reflecting a high degree of autistic traits?

Interpretation of AQ

Regret vs. autism? (r = .45)

AQ online version: wired.com/wired/archive/9.12/aqtest.html

What is to be measured? What are the relative strengths of the alternatives available to measure that construct? Select the best measures of the specific characteristics to be assessed.

Implication 3: The validity of a test's interpretation is based on evidence and theory
A human-resources anecdote ("…in her experience, use of the NEO-PI-R was useful in selection") is not, by itself, evidence.

"Personality Color Test"
Based on color psychology (Max Lüscher): color preferences reveal something about your personality. A survey of the scientific literature finds almost no empirical evidence for the validity of color preferences as a measure of personality characteristics.

Evidence for the "color test"
Less than clear; the site implies validity. From the web site: "Is the test reliable? We leave that to your opinion. We can only say that there are a number of corporations and colleges that use the Lüscher test as part of their hiring/admissions processes. It can be a useful tool for doctors and psychologists as well and is used to get a quick overview of potential issues patients may have in their lives."

"Color Quiz"
Is the test useful as a measure of personality? Would you want to be denied employment based on such a test?

Empirical evidence and theoretical underpinnings: data from high-quality research must be available; theory alone is not adequate.

Contemporary view of validity
Although validity has traditionally been described in three forms (content, criterion, and construct), the contemporary perspective highlights CONSTRUCT VALIDITY.

Standards
Standards for Educational and Psychological Testing, revised (1999). Co-published by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME).

Remember: the contemporary perspective highlights CONSTRUCT VALIDITY.

The Standards outline five types of evidence relevant for establishing the validity of test interpretations (AERA, APA, NCME, 1999), all in the service of construct validity: associations with other variables, internal structure, test content, response processes, and consequences of use.

Construct Validity: Test Content

Validity Evidence: Test Content Match between the actual content of a test and the content that should be included in the test. Psychological nature of the construct should dictate the appropriate content of the test.

Face Validity
Face validity: the degree to which a measure appears to be related to a specific construct in the judgment of non-experts, such as test takers and representatives of the legal system. A test that LOOKS relevant is more likely to be well received by users and test takers.

Threats to content validity
Construct-irrelevant content: e.g., a test includes questions on content not covered in the book, lectures, or discussion. Construct underrepresentation: e.g., test content fails to represent the full scope of the content implied by the construct. Related practical issues (e.g., time, respondent fatigue, respondent attention): is the content a fair representation?

Content Validity vs. Face Validity
Content validity is the degree to which the content reflects the full domain of the construct and can only be evaluated by experts who have a deep understanding of the construct. Face validity is the degree to which non-experts perceive the test to be relevant to what they believe it measures.

Construct Validity: Internal Structure

Validity Evidence: Internal Structure of the Test For a test to be validly interpreted as a measure of a particular construct, the actual structure of the test should match the theoretically based structure of the construct Does the theoretical basis suggest a unidimensional or a multi-dimensional structure?

Internal Structure
Often assessed by examining the factor structure (factor analysis). Items that correlate more strongly with each other than with other items form clusters called factors. Factor analysis should clarify the number of factors within a set of test questions. Example: self-esteem. Is the construct uni- or multidimensional?

Factor analysis
1. Clarifies the number of factors.
2. Reveals associations among the factors within a multidimensional test.
3. Identifies which items are linked to which factors.
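As an illustration of what factor analysis returns, here is a minimal sketch using simulated item responses and scikit-learn's FactorAnalysis. The two-factor structure, the loading values, and the item assignments are assumptions made up for the example, not results from any real scale.

```python
# Minimal sketch: exploratory factor analysis on simulated responses to a
# 10-item scale with an assumed two-factor structure (hypothetical data).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 500
factors = rng.normal(size=(n, 2))   # two latent factors, e.g. positive vs. negative self-evaluation

# Assumed true loadings: items 1-5 load on factor 1, items 6-10 on factor 2.
loadings = np.zeros((10, 2))
loadings[:5, 0] = 0.8
loadings[5:, 1] = 0.8
responses = factors @ loadings.T + rng.normal(scale=0.6, size=(n, 10))

fa = FactorAnalysis(n_components=2).fit(responses)
print(np.round(fa.components_.T, 2))  # estimated loadings: which items go with which factor
```

The printed loading matrix recovers the two clusters of items, which is exactly the third use of factor analysis listed above.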

Rosenberg Self-Esteem Inventory (RSEI; Rosenberg, 1989)
1. On the whole, I am satisfied with myself.
2. At times, I think I am no good at all.
3. I feel that I have a number of good qualities.
4. I am able to do things as well as most other people.
5. I feel I do not have much to be proud of.
6. I certainly feel useless at times.
7. I feel that I'm a person of worth, at least on an equal plane with others.
8. I wish I could have more respect for myself.
9. All in all, I am inclined to feel that I am a failure.
10. I take a positive attitude toward myself.

RSEI Scree Plot (plot not reproduced here)
How many factors are evident in the plot? Question: this scree plot provides evidence for what type of structure? a. Unidimensional b. Multidimensional
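For readers who want to reproduce a scree plot like the one referenced on this slide, the sketch below plots the eigenvalues of an item correlation matrix from simulated data. The single-factor structure and all values are assumptions for illustration, not the actual RSEI results.

```python
# Minimal sketch: a scree plot is the eigenvalues of the item correlation
# matrix in descending order (hypothetical, simulated unidimensional data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
trait = rng.normal(size=(300, 1))
responses = trait + rng.normal(scale=1.0, size=(300, 10))  # one dominant factor

corr = np.corrcoef(responses, rowvar=False)        # 10 x 10 item correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]       # sort from largest to smallest

plt.plot(range(1, 11), eigenvalues, marker="o")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot: one large eigenvalue suggests a unidimensional structure")
plt.show()
```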

Construct Validity: Response Processes

Validity Evidence: Response Processes
The match between the psychological processes respondents actually use when completing a measure and the processes they should use. Example: "When I say start, raise your finger when you feel 10 seconds have elapsed." The assumption is that respondents rely on the feeling that time is up, but they could use another process, such as covert counting, copying others, or looking at the second hand of a watch.

Response processes
If the response process actually used differs from the one assumed, the scores may not be interpretable as the test developer intended: attention to the internal feel of time passing vs. use of some deliberate strategy to mark the passage of time.

Construct Validity: Associations With Other Variables

Validity Evidence: Association With Other Variables Match between a measure’s actual associations with other measures and the associations that the test should have with the other measures.

Convergent evidence: the degree to which test scores are correlated with tests of related constructs.

Discriminant evidence: the degree to which test scores are uncorrelated with tests of unrelated constructs.
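A minimal sketch of how convergent and discriminant evidence reduce to simple correlations, using simulated scores. The constructs ("orderliness", "shoe size"), effect sizes, and sample are illustrative assumptions, not findings.

```python
# Minimal sketch: convergent vs. discriminant evidence as correlations
# between simulated test scores (all constructs and values hypothetical).
import numpy as np

rng = np.random.default_rng(3)
n = 300
conscientiousness = rng.normal(size=n)

# A related construct: should correlate (convergent evidence)...
orderliness = 0.7 * conscientiousness + rng.normal(scale=0.7, size=n)
# ...and an unrelated construct: should not (discriminant evidence).
shoe_size = rng.normal(size=n)

print("convergent   r =", round(np.corrcoef(conscientiousness, orderliness)[0, 1], 2))
print("discriminant r =", round(np.corrcoef(conscientiousness, shoe_size)[0, 1], 2))
```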

Example hypothesis (Crespi & Badcock): schizophrenia and autism are diametrically opposed constructs.

If the theory is correct, a measure of autism should be uncorrelated with measures of schizophrenia.

Support for Crespi & Badcock's theory? No: the convergent evidence showed that the autism measure correlated positively with schizophrenia measures. One reading of this finding is that autism and schizophrenia are related constructs, i.e., that Crespi & Badcock are wrong. Alternatively, the strong positive correlations could indicate weak validity of the AQ as a measure of the autism construct.

Concurrent validity evidence
The degree to which test scores are correlated with other relevant variables measured at the same time as the primary test of interest. Is the SAT a measure of skills needed for academic success? Compare SAT scores obtained during the senior year of high school to senior-year high school GPA.

Predictive validity evidence
The degree to which test scores are correlated with relevant variables measured at a future point in time. Is the SAT a measure of skills needed for academic success? Compare SAT scores obtained during the senior year of high school to freshman-year college GPA.
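The difference between concurrent and predictive evidence is only the timing of the criterion measure. The sketch below simulates SAT-like scores, same-year high school GPA, and later college GPA to show the two correlations side by side; all variables and values are made up for illustration.

```python
# Minimal sketch: concurrent vs. predictive validity evidence for an
# SAT-like test (all scores simulated; values are hypothetical).
import numpy as np

rng = np.random.default_rng(4)
n = 400
academic_skill = rng.normal(size=n)

sat = academic_skill + rng.normal(scale=0.6, size=n)          # test of interest
hs_gpa = academic_skill + rng.normal(scale=0.6, size=n)       # criterion measured at the same time
college_gpa = academic_skill + rng.normal(scale=0.9, size=n)  # criterion measured a year later

print("concurrent r(SAT, high school GPA) =", round(np.corrcoef(sat, hs_gpa)[0, 1], 2))
print("predictive r(SAT, college GPA)     =", round(np.corrcoef(sat, college_gpa)[0, 1], 2))
```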

Validity Evidence: Consequences of Testing
The social consequences of a test are a facet of validity. According to the Standards for Educational and Psychological Testing, validity includes "the intended and unintended consequences of test use." E.g., does a construct and its measurement benefit one group over another?

Not all agree
Should the consequences of a testing program be considered a facet of the scientific evaluation of the meaning of a test score? Some feel that this is an intrusion of politics into science. Can science be separated from personal and social values?

Summary: the conceptual basis for validity
Construct validity, established through evidence from associations with other variables, internal structure, test content, response processes, and consequences of use.

Validity
Standards for Educational and Psychological Testing (1999): the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of a test.

Validity
Are decisions based on valid interpretations of test scores? Educational placement, access to services, hiring, and clinical decisions.