Face, Content & Construct Validity

Slides:



Advertisements
Similar presentations
 Degree to which inferences made using data are justified or supported by evidence  Some types of validity ◦ Criterion-related ◦ Content ◦ Construct.
Advertisements

Cal State Northridge Psy 427 Andrew Ainsworth PhD
Survey Methodology Reliability and Validity EPID 626 Lecture 12.
Reliability and Validity
© 2006 The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill Validity and Reliability Chapter Eight.
VALIDITY AND RELIABILITY
MEQ Analysis. Outline Validity Validity Reliability Reliability Difficulty Index Difficulty Index Power of Discrimination Power of Discrimination.
5/15/2015Marketing Research1 MEASUREMENT  An attempt to provide an objective estimate of a natural phenomenon ◦ e.g. measuring height ◦ or weight.
Reliability and Validity of Research Instruments
RESEARCH METHODS Lecture 18
Chapter 4 Validity.
Operationalizing Concepts Issues in operationally defining your concepts, validity, reliability and scales.
VALIDITY.
RELIABILITY & VALIDITY
Introduction to Psychometrics Psychometrics & Measurement Validity Some important language Properties of a “good measure” –Standardization –Reliability.
Reliability & Validity
Developing a Hiring System Reliability of Measurement.
Psychometrics, Measurement Validity & Data Collection
Reliability and Validity
Introduction to Psychometrics
Lecture 7 Psyc 300A. Measurement Operational definitions should accurately reflect underlying variables and constructs When scores are influenced by other.
Criterion-related Validity About asking if a test “is valid” Criterion related validity types –Predictive, concurrent, & postdictive –Incremental, local,
Measurement Reliability Objective & Subjective tests Standardization & Inter-rater reliability Properties of a “good item” Item Analysis Internal Reliability.
Introduction to Psychometrics Psychometrics & Measurement Validity Constructs & Measurement “Kinds” of Items Properties of a “good measure” –Standardization.
Psych 231: Research Methods in Psychology
C R E S S T / U C L A Improving the Validity of Measures by Focusing on Learning Eva L. Baker CRESST National Conference: Research Goes to School Los Angeles,
Validity, Reliability, & Sampling
Practical Psychometrics Preliminary Decisions Components of an item # Items & Response Approach to the Validation Process.
Chapter 7 Evaluating What a Test Really Measures
Classroom Assessment A Practical Guide for Educators by Craig A
Understanding Validity for Teachers
Chapter 4. Validity: Does the test cover what we are told (or believe)
Validity.
Test Validity S-005. Validity of measurement Reliability refers to consistency –Are we getting something stable over time? –Internally consistent? Validity.
Measurement Concepts & Interpretation. Scores on tests can be interpreted: By comparing a client to a peer in the norm group to determine how different.
Reliability, Validity, & Scaling
Ch 6 Validity of Instrument
Instrument Validity & Reliability. Why do we use instruments? Reliance upon our senses for empirical evidence Senses are unreliable Senses are imprecise.
Reliability and Validity what is measured and how well.
MEASUREMENT CHARACTERISTICS Error & Confidence Reliability, Validity, & Usability.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 14 Measurement and Data Quality.
MGTO 324 Recruitment and Selections Validity II (Criterion Validity) Kin Fai Ellick Wong Ph.D. Department of Management of Organizations Hong Kong University.
WELNS 670: Wellness Research Design Chapter 5: Planning Your Research Design.
MGTO 231 Human Resources Management Personnel selection II Dr. Kin Fai Ellick WONG.
Validity Is the Test Appropriate, Useful, and Meaningful?
Assessing Learners with Special Needs: An Applied Approach, 6e © 2009 Pearson Education, Inc. All rights reserved. Chapter 4:Reliability and Validity.
Measurement Validity.
Psychometrics & Validation Psychometrics & Measurement Validity Properties of a “good measure” –Standardization –Reliability –Validity A Taxonomy of Item.
Validity Validity: A generic term used to define the degree to which the test measures what it claims to measure.
Evaluating Survey Items and Scales Bonnie L. Halpern-Felsher, Ph.D. Professor University of California, San Francisco.
Chapter 4 Validity Robert J. Drummond and Karyn Dayle Jones Assessment Procedures for Counselors and Helping Professionals, 6 th edition Copyright ©2006.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Assessing Measurement Quality in Quantitative Studies.
Reliability: The degree to which a measurement can be successfully repeated.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
Measurement Reliability Objective & Subjective tests Standardization & Inter-rater reliability Properties of a “good item” Item Analysis Internal Reliability.
PSY 231: Introduction to Industrial and Organizational Psychology
Measurement Experiment - effect of IV on DV. Independent Variable (2 or more levels) MANIPULATED a) situational - features in the environment b) task.
Chapter 6 - Standardized Measurement and Assessment
Validity & Reliability. OBJECTIVES Define validity and reliability Understand the purpose for needing valid and reliable measures Know the most utilized.
Consistency and Meaningfulness Ensuring all efforts have been made to establish the internal validity of an experiment is an important task, but it is.
VALIDITY by Barli Tambunan/
Measurement: Part 2.
Reliability and Validity
Questions What are the sources of error in measurement?
Test Validity.
Classroom Assessment Validity And Bias in Assessment.
Research Methods Lesson 2 Validity.
Week 3 Class Discussion.
پرسشنامه کارگاه.
RESEARCH METHODS Lecture 18
Presentation transcript:

Face, Content & Construct Validity Kinds of attributes we measure Face Validity Content Validity Construct Validity Discriminant Validity  Convergent & Divergent evidence Summary of Reliability & Validity types and how they are demonstrated

What are the different types of “things we measure” ??? The most commonly discussed types are ... Achievement -- “performance” broadly defined (judgements) e.g., scholastic skills, job-related skills, research DVs, etc. Attitude/Opinion -- “how things should be” (sentiments) polls, product evaluations, etc. Personality -- “characterological attributes” (keyed sentiments) anxiety, psychoses, assertiveness, etc. There are other types of measures that are often used… Social Skills -- achievement or personality ?? Aptitude -- “how well some will perform after then are trained and experiences” but measures before the training & experience” some combo of achievement, personality and “likes” IQ -- is it achievement (things learned) or is it “aptitude for academics, career and life” ??

Face Validity Does the test “look like” a measure of the construct of interest? “looks like” a measure of the desired construct to a member of the target population will someone recognize the type of information they are responding to? Possible advantage of face validity .. If the respondent knows what information we are looking for, they can use that “context” to help interpret the questions and provide more useful, accurate answers Possible limitation of face validity … if the respondent knows what information we are looking for, they might try to “bend & shape” their answers to what they think we want -- “fake good” or “fake bad”

Content Validity Does the test contain items from the desired “content domain”? Based on assessment by experts in that content domain Is especially important when a test is designed to have low face validity e.g., tests of “honesty” used for hiring decisions Is generally simpler for “achievement tests” than for “psychological constructs” (or other “less concrete” ideas) e.g., it is a lot easier for “math experts” to agree whether or not an item should be on an algebra test than it is for “psychological experts” to agree whether or not an items should be on a measure of depression. Content validity is not “tested for”. Rather it is “assured” by the informed item selections made by experts in the domain.

Construct Validity Does the test interrelate with other tests as a measure of this construct should ? We use the term construct to remind ourselves that many of the terms we use do not have an objective, concrete reality. Rather they are “made up” or “constructed” by us in our attempts to organize and make sense of behavior and other psychological processes attention to construct validity reminds us that our defense of the constructs we create is really based on the “whole package” of how the measures of different constructs relate to each other So, construct validity “begins” with content validity (are these the right types of items) and then adds the question, “does this test relate as it should to other tests of similar and different constructs?

Discriminant Validity The statistical assessment of Construct Validity … Discriminant Validity Does the test show the “right” pattern of interrelationships with other variables? -- has two parts Convergent Validity -- test correlates with other measures of similar constructs Divergent Validity -- test isn’t correlated with measures of “other, different constructs” e.g., a new measure of depression should … have “strong” correlations with other measures of “depression” have negative correlations with measures of “happiness” have “substantial” correlation with measures of “anxiety” have “minimal” correlations with tests of “physical health”, “faking bad”, “self-evaluation”, etc.

Evaluate this measure of depression…. New Dep Dep1 Dep2 Anx Happy PhyHlth FakBad New Dep Old Dep1 .61 Old Dep2 .49 .76 Anx .43 .30 .28 Happy -.59 -.61 -.56 -.75 PhyHlth .60 .18 .22 .45 -.35 FakBad .55 .14 .26 .10 -.21 .31 Tell the elements of discriminant validity tested and the “conclusion”

Evaluate this measure of depression…. New Dep Dep1 Dep2 Anx Happy PhyHlth FakBad New Dep convergent validity (but bit lower than r(dep1, dep2) Old Dep1 .61 Old Dep2 .49 .76 more correlated with anx than dep1 or dep2 Anx .43 .30 .28 corr w/ happy about same as Dep1-2 Happy -.59 -.61 -.56 -.75 “too” r with PhyHlth PhyHlth .60 .18 .22 .45 -.35 “too” r with FakBad FakBad .55 .14 .26 .10 -.21 .31 This pattern of results does not show strong discriminant validity !!

Summary Based on the things we’ve discussed, what are the analyses we should do to “validate” a measure, what order do we do them (consider the flow chart next page) and why do we do each? Inter-rater reliability -- if test is not “objective” Item-analysis -- looking for items not “positive monotonic” Chronbach’s  -- internal reliability Test-Retest Analysis (r & wg-t) -- temporal reliability Alternate Forms (if there are two forms of the test) Content Validity -- inspection of items for “proper domain” Construct Validity -- correlation and factor analyses to check on discriminant validity of the measure Criterion-related Validity -- predictive, concurrent and/or postdictive