Test Validity

Validity of measurement

Reliability refers to consistency:
- Are we getting something stable over time?
- Is the measure internally consistent?

Validity refers to accuracy:
- Is the measure accurate?
- Are we really measuring what we want?
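
The slide's two reliability questions map onto two standard statistics. As a minimal sketch (Python with NumPy, invented data): stability over time is usually checked with a test-retest correlation, and internal consistency with a coefficient such as Cronbach's alpha (alpha is not named on the slide; it is simply the most common choice).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scores for 30 people tested twice, a month apart.
time1 = rng.normal(50, 10, 30)
time2 = time1 + rng.normal(0, 5, 30)  # mostly stable, plus noise
test_retest_r = np.corrcoef(time1, time2)[0, 1]

def cronbach_alpha(items):
    """Internal consistency; items has one row per person, one column per item."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 5-item scale driven by a single underlying trait.
trait = rng.normal(0, 1, (30, 1))
items = trait + rng.normal(0, 0.8, (30, 5))

print(f"test-retest r    = {test_retest_r:.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```

High values on both would speak to reliability; neither says anything yet about whether the test measures the right thing, which is the validity question the rest of the slides take up.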

Important distinction! The term "validity" is used in two different ways:
- Validity of an assessment or method of collecting data: the validity of a test, questionnaire, or interview.
- Validity of a research study: was the entire study of high quality? Did it have high internal and external validity?

Important distinction! (continued)

Referring to entire studies or research reports:
- OK: "We examined the internal validity of the study."
- OK: "We looked for the threats to validity."
- OK: "That study involved randomly assigning students to groups, so it had strong internal validity, but it was carried out in a special school, so it is weak on external validity."

Referring to a test, questionnaire, or other assessment:
- OK: "The test is a widely used and well-validated measure of student achievement."
- OK: "The checklist they used seemed reasonable, but they did not present any information on its reliability or validity."
- NOT: "The test lacked internal validity." (This sounds very strange to me.)

Types of validity

Validity: the extent to which the instrument (test, questionnaire, etc.) is measuring what it intends to measure.

Examples:
- Math test: is it covering the right content and concepts? Is it also influenced by reading level or background knowledge?
- Attitude assessment: are the questions appropriate? Does it assess different dimensions of attitudes (intensity, direction, etc.)?

Validity is also assessed in a particular context:
- A test may be valid in some contexts and not in others.
- A questionnaire may be useful with some populations and not so useful with other groups.
- NOT: "The test has high validity."
- OK: "The test has been useful in assessing early reading skills among native speakers of English."

Types of validity: Content validity

- The extent to which the items reflect a specific domain of content.
- Is the sample of items really representative? Often a matter of judgment.
- Experts may be asked to rate the relevance and appropriateness of the items or questions, e.g., rate each item: very important / nice to know / not important. (One common way to summarize such ratings is sketched below.)
- "Face validity" refers to whether the items appear to be valid (to the test taker or test user).
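
Where experts rate each item as on the slide, one common way to summarize the judgments is Lawshe's content validity ratio (CVR); the slide does not name it, so treat this as one illustrative option, with an invented panel and ratings.

```python
# CVR = (n_essential - N/2) / (N/2), ranging from -1 to +1.
n_experts = 8  # hypothetical panel size

# Number of experts (out of 8) rating each item "very important" (invented).
ratings = {"item_1": 8, "item_2": 6, "item_3": 3}

for item, n_essential in ratings.items():
    cvr = (n_essential - n_experts / 2) / (n_experts / 2)
    print(f"{item}: CVR = {cvr:+.2f}")
# Items with low or negative CVR are candidates for revision or removal.
```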

Types of validity: Criterion-related validity

Concurrent validity: agreement with a separate measure taken at the same time.
- Common in educational assessments, e.g., the Bayley Scales and the Stanford-Binet (S-B) IQ test, or a complete version and a screening-test version.
- Issue: is there really a strong existing measure, a "gold standard," we can use for validating a new measure?

Predictive validity: agreement with some future measure.
- SAT scores and college GPA.
- GRE scores and graduate school performance.
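
Criterion-related validity is typically reported as a correlation between the test and its criterion. A toy sketch of the predictive case, with invented admissions-test scores and later GPAs echoing the SAT/GPA example:

```python
import numpy as np

rng = np.random.default_rng(1)
test_scores = rng.normal(500, 100, 200)                    # hypothetical
gpa = 2.0 + 0.002 * test_scores + rng.normal(0, 0.4, 200)  # hypothetical

validity_r = np.corrcoef(test_scores, gpa)[0, 1]
print(f"predictive validity coefficient r = {validity_r:.2f}")
```

The concurrent case is identical in form; the only difference is that the criterion (e.g., the established "gold standard" measure) is collected at the same time rather than later.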

Types of validity (cont.): Construct validity

- Does the measure appear to produce results that are consistent with our theories about the construct? Example: we have a "stage model" of development, so does our measure produce scores/results that look like "stages"?
- Convergent validity: does our measure converge (agree) with other measures that should be similar?
- Discriminant validity: does our measure disagree (diverge) where it should be different?
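
Convergent and discriminant evidence can be read off a correlation matrix: our measure should correlate highly with measures of the same construct and weakly with measures of different constructs. A sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
trait = rng.normal(0, 1, n)

our_measure     = trait + rng.normal(0, 0.5, n)  # our new measure
similar_measure = trait + rng.normal(0, 0.5, n)  # same construct
unrelated       = rng.normal(0, 1, n)            # different construct

r_conv = np.corrcoef(our_measure, similar_measure)[0, 1]
r_disc = np.corrcoef(our_measure, unrelated)[0, 1]
print(f"convergent r   = {r_conv:.2f}  (should be high)")
print(f"discriminant r = {r_disc:.2f}  (should be low)")
```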

Stanford Achievement Test Example – Grade 1

Stanford Achievement Test Example – Grade 12

McCarthy Screening Test example

- A test for preschool children (ages 2.5 to 8.5).
- Six subtests: verbal, perceptual-performance, quantitative, general cognitive (composite), memory, motor.
- Reliability evidence for using a short version as a screening test:
  - Split-half correlations for several scales (r = .60 to .80).
  - Test-retest reliability for other scales (on a subset of children) showed a range of correlations, from .32 to .70.
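
The split-half correlations the slide cites come from splitting a scale's items into two halves and correlating the half scores; because each half is shorter than the full scale, the raw correlation is usually stepped up with the Spearman-Brown formula (not named on the slide, but the standard correction). A sketch with invented item data:

```python
import numpy as np

rng = np.random.default_rng(3)
trait = rng.normal(0, 1, (40, 1))
items = trait + rng.normal(0, 1.0, (40, 10))  # 40 children, 10 items

odd_half  = items[:, 0::2].sum(axis=1)  # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)      # Spearman-Brown correction
print(f"half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```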

McCarthy Scales of Children's Abilities: Reliability

- The internal consistency coefficients for the General Cognitive Index (GCI) averaged .93 across 10 age groups between 2.5 and 8.5 years.
- Test-retest reliability of the GCI over a one-month interval was .80.
- Stability coefficients of the cognitive scales ranged from .62 to .76, with the Motor Scale emerging as the only scale that lacked stability (r = .33).

A short version developed as a screening test

Validity information for the short version, from a sample of 60 children with learning disabilities:
- On the full version of the entire test, 53 out of 60 (88%) failed at least 2 of the 6 subtests.
- On the short version (the proposed screening version), 40 out of 60 (67%) failed and would be identified.

Is this enough information?
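
The slide's arithmetic, restated: among children already known to have learning disabilities, the full test flags 88% and the short version 67%. That is a hit rate (sensitivity) only; to answer "Is this enough information?" we would also need to know how the screener behaves with children who do not have learning disabilities.

```python
n_children = 60
flagged_full  = 53  # failed at least 2 of the 6 subtests on the full version
flagged_short = 40  # flagged by the proposed screening version

print(f"full version hit rate:  {flagged_full / n_children:.0%}")   # 88%
print(f"short version hit rate: {flagged_short / n_children:.0%}")  # 67%
# Missing: specificity -- how often the short version correctly passes
# children without learning disabilities (the false-positive side).
```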