

Test Validity S-005

Validity of measurement

Reliability refers to consistency:
– Are we getting something stable over time?
– Internally consistent?

Validity refers to accuracy:
– Is the measure accurate?
– Are we really measuring what we want?

Important distinction!

The term "validity" is used in two different ways:
1. Validity of an assessment or method of collecting data
   – The validity of a test, questionnaire, or interview.
2. Validity of a research study
   – Was the entire study of high quality?
   – Did it have high internal and external validity?

Important distinction!

The term "validity" is used in two different ways:
1. Referring to entire studies or research reports:
   – OK: "We examined the internal validity of the study."
   – OK: "We looked for the threats to validity."
   – OK: "That study involved randomly assigning students to groups, so it had strong internal validity, but it was carried out in a special school, so it is weak on external validity."
2. Referring to a test or questionnaire or some assessment:
   – OK: "The test is a widely used and well-validated measure of student achievement."
   – OK: "The checklist they used seemed reasonable, but they did not present any information on its reliability or validity."
   – NOT: "The test lacked internal validity." (This sounds very strange to me.)

Types of validity

Validity – the extent to which the instrument (test, questionnaire, etc.) is measuring what it intends to measure.

Examples:
– Math test: Is it covering the right content and concepts? Is it also influenced by reading level or background knowledge?
– Attitude assessment: Are the questions appropriate? Does it assess different dimensions of attitudes (intensity, direction, etc.)?

Validity is also assessed in a particular context:
– A test may be valid in some contexts and not in others.
– A questionnaire may be useful with some populations and not so useful with other groups.
– Not: "The test has high validity."
– OK: "The test has been useful in assessing early reading skills among native speakers of English."

Types of validity

Content validity
– The extent to which the items reflect a specific domain of content: Is the sample of items really representative?
– Often a matter of judgment.
– Experts may be asked to rate the relevance and appropriateness of the items or questions, e.g., rate each item: very important / nice to know / not important.
– "Face validity" refers to whether the items appear to be valid (to the test taker or test user).
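Expert ratings like the "very important / nice to know / not important" scheme above are often summarized with Lawshe's content validity ratio (CVR), which compares the number of experts calling an item essential to the panel size. A minimal sketch; the panel and ratings below are invented for illustration:

```python
# Lawshe's content validity ratio for one item:
#   CVR = (n_e - N/2) / (N/2)
# where n_e = number of experts rating the item essential
# (here, "very important") and N = total number of experts.
# CVR ranges from -1 (none say essential) to +1 (all do).

def content_validity_ratio(ratings):
    """ratings: list of expert rating strings for a single item."""
    n = len(ratings)
    n_essential = sum(1 for r in ratings if r == "very important")
    return (n_essential - n / 2) / (n / 2)

# Hypothetical panel of 8 experts rating one math-test item:
ratings = ["very important"] * 6 + ["nice to know", "not important"]
print(content_validity_ratio(ratings))  # (6 - 4) / 4 = 0.5
```

Items with a CVR near or below zero are candidates for revision or removal before the content-validity judgment is made.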

Types of validity

Criterion-related validity

Concurrent validity
– Agreement with a separate measure taken at about the same time.
– Common in educational assessments, e.g., Bayley Scales and Stanford-Binet IQ test; complete version and screening-test version.
– Issue: Is there really a strong existing measure, a "gold standard," we can use for validating a new measure?

Predictive validity
– Agreement with some future measure.
– SAT scores and college GPA.
– GRE scores and graduate school performance.
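Both forms of criterion-related validity are typically quantified as a correlation between the measure and its criterion. A minimal sketch of a predictive-validity check; the test scores and GPA values below are invented for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: admission-test scores and later first-year GPA.
test_scores = [520, 580, 610, 650, 700, 720]
gpa = [2.4, 2.9, 2.8, 3.3, 3.5, 3.6]
print(round(pearson_r(test_scores, gpa), 2))
```

The size of this validity coefficient, computed on students followed up after testing, is the evidence that the test predicts the future criterion.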

Types of validity (cont.)

Construct validity
Does the measure appear to produce results that are consistent with our theories about the construct?
– Example: We have a "stage model" of development, so does our measure produce scores/results that look like "stages"?

Convergent validity
– Does our measure converge (agree) with other measures that should be similar?

And...

Discriminant validity
– Does our measure disagree (diverge) with measures from which it should differ?

Stanford Achievement Test Example – Grade 1

Stanford Achievement Test Example – Grade 12

McCarthy Screening Test example

A test for pre-school children (ages 2.5–8.5).

Six subtests:
– Verbal, perceptual-performance, quantitative, general cognitive (composite), memory, motor.

Reliability evidence for using a short version as a screening test:
– Split-half correlations for several scales (r = .60 to .80).
– Test-retest reliability for other scales (on a subset of children) showed a range of correlations, from .32 to .70.
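Split-half correlations understate full-test reliability, because each half is only half as long as the real test; the standard Spearman-Brown formula steps a half-test correlation up to an estimate for the full length. A minimal sketch, assuming the .60 to .80 figures above are uncorrected half-test correlations:

```python
def spearman_brown(r_half):
    """Spearman-Brown step-up: full-test reliability from a
    split-half correlation, r_full = 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

# The slide's split-half range of r = .60 to .80 would step up to:
print(round(spearman_brown(0.60), 2))  # 1.2 / 1.6 = 0.75
print(round(spearman_brown(0.80), 2))  # 1.6 / 1.8 = 0.89
```

If the reported figures were already corrected, no further adjustment would be needed; test manuals are not always explicit about which is reported.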

McCarthy Scales of Children's Abilities: Reliability

The internal consistency coefficients for the General Cognitive Index (GCI) averaged .93 across 10 age groups between 2.5 and 8.5 years. Test-retest reliability of the GCI over a one-month interval was .80. Stability coefficients of the cognitive scales ranged from .62 to .76, with the Motor Scale emerging as the only scale that lacked stability (r = .33).
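Internal consistency coefficients like the GCI's .93 are usually Cronbach's alpha, computed from the item variances and the variance of the total score. A minimal sketch of the computation; the 3-item, 5-respondent data set below is invented for illustration:

```python
def cronbach_alpha(items):
    """items: list of per-item score lists (respondents in the same
    order in each). alpha = k/(k-1) * (1 - sum(item var) / var(total))."""
    k = len(items)

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(col) for col in zip(*items)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Hypothetical 3-item scale answered by 5 respondents:
items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]
print(round(cronbach_alpha(items), 2))  # 0.95
```

High alpha means the items vary together across respondents, which is the sense of "internal consistency" the slide's .93 is reporting.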

A short version developed as a screening test

Validity information for the short version:
– A sample of 60 children with learning disabilities.
– On the full version of the entire test: 53 out of 60 (88%) failed at least 2 of the 6 subtests.
– On the short version (the proposed screening version): 40 out of 60 (67%) failed (and would be identified).

Is this enough information?
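One way to see what the slide's figures do and do not tell us: the two failure rates give only the marginals, not the cross-classification, so the screener's sensitivity relative to the full test can only be bounded. A minimal sketch using the numbers above:

```python
# Figures from the slide: of 60 children with learning disabilities,
# 53 failed the full test and 40 failed the short screening version.
full_fail, screen_fail, n = 53, 40, 60

print(round(full_fail / n, 2))    # 0.88 fail the full version
print(round(screen_fail / n, 2))  # 0.67 fail the screener

# Best case: all 40 screen failures are among the 53 full-test
# failures, giving sensitivity 40/53 relative to the full test.
print(round(40 / 53, 2))  # 0.75 at most
```

Without knowing how many of the 40 screen failures overlap with the 53 full-test failures, we cannot compute the screener's actual sensitivity or how many at-risk children it misses, which is one concrete reason the answer to "Is this enough information?" is no.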