Part II Knowing How to Assess Chapter 5 Minimizing Error

Presentation transcript:

Part II: Knowing How to Assess. Chapter 5: Minimizing Error
Review of Appl 644 (Personnel Psychology): measurement theory, reliability, validity
Assessment is a broader term than measurement. What does this mean?
  Measurement = precision, a defined scale
  Assessment = other; assessment is often ad hoc, for specific purposes

Background
Quetelet (1835): established the normal distribution as a model of human traits
Used by:
  Galton (measurement of genius)
  Binet et al. (cognitive ability, i.e. "IQ")
  Münsterberg (employment testing)
  J. M. Cattell (perceptual and sensory tests)
Over time, the focus of measurement changed from reliability to validity

Background in Measurement
Adolphe Quetelet (1835): conception of the homme moyen ("average man") as the central value about which measurements of a human trait are grouped according to the normal distribution.
  Physical and mental attributes are normally distributed
  Errors of measurement are normally distributed
  Foundation for psychological measurement

RELIABILITY: CONCEPTS OF MEASUREMENT ERROR
Measurement error and error variance (error happens!)
Table 5.1 Reasons for differences in performance
  I. Person characteristics, long-term and permanent, that influence scores on all tests (e.g. language, skills)
  II. Person characteristics, permanent but specific to the test (e.g. the type of words on the test is more or less recognizable to some)
  III. Temporary characteristics that influence scores on any test (e.g. evaluation apprehension)
  IV. Temporary characteristics specific to the test (e.g. stumped by a word)
  V. Administration effects (e.g. interaction between administrator and examinee)
  VI. Pure chance
Which is the one of primary interest (non-error variance)?

Measurement Error and Error Variance (cont'd)
Category II A is of most interest; the others reflect unwanted sources of variance
Classical theory: $X = t + e$
Assumptions (errors are truly random):
  Obtained score = algebraic sum of t and e
  Not correlated:
    t scores and e scores within one test
    errors in different measures
    errors in one measure with true scores in another

Measurement Error
$X_{ob} = s + e$ (one individual's score). Why was t replaced with s?
Total variance (all of an individual's scores): $\sigma_x^2 = \sigma_s^2 + \sigma_e^2$ (all people's scores, i.e. the population)
  = systematic causes + random error
Systematic variance = wanted systematic variance (true ability) + unwanted systematic variance (systematic error)
  e.g. test-wise; prefers a type of format
  e.g. sleepy every time he takes the test
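As a rough illustration of the decomposition above, the sketch below uses a tiny made-up data set in which the systematic component and the error are exactly uncorrelated, so total variance splits cleanly into the two parts. The scores and variable names are hypothetical and are not taken from the slides.

```python
from statistics import pvariance

# Hypothetical data for four examinees (not from the slides).
s = [10, 10, 20, 20]   # systematic component of each score
e = [-1, 1, -1, 1]     # random error, constructed to be uncorrelated with s
x = [si + ei for si, ei in zip(s, e)]  # obtained scores: X_ob = s + e

var_s = pvariance(s)   # systematic variance
var_e = pvariance(e)   # random error variance
var_x = pvariance(x)   # total variance of obtained scores

# With uncorrelated components, total variance equals the sum of the parts.
print(var_x, var_s + var_e)
```

With real scores the two components cannot be observed separately, which is why reliability has to be estimated rather than computed directly.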

Reliability Is...
Consistency in sets of measures
Freedom from random error variance
Measurement error = random sources of variance (not unwanted systematic error)
Reliability = 1 − (random sources of variance / total variance): $r_{xx} = 1 - \sigma_e^2 / \sigma_x^2$
Reliability is also defined as the "proportion of total variance attributable to systematic sources" (wanted and unwanted)
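The two definitions of reliability on this slide give the same number whenever total variance is the sum of systematic and random error variance. A minimal sketch, using hypothetical variances and function names chosen only for illustration:

```python
def reliability_from_error(var_error: float, var_total: float) -> float:
    """r_xx = 1 - (random error variance / total variance)."""
    return 1.0 - var_error / var_total

def reliability_from_systematic(var_systematic: float, var_total: float) -> float:
    """r_xx = systematic variance (wanted + unwanted) / total variance."""
    return var_systematic / var_total

# Hypothetical values: total variance 100, random error variance 20.
var_total, var_error = 100.0, 20.0
var_systematic = var_total - var_error

r_from_error = reliability_from_error(var_error, var_total)
r_from_systematic = reliability_from_systematic(var_systematic, var_total)
print(round(r_from_error, 2), round(r_from_systematic, 2))  # 0.8 0.8
```

Note that unwanted systematic variance counts toward reliability here, which is one reason a reliable test is not necessarily a valid one.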

Reliability (cont'd)
A necessary (but not sufficient) condition for validity (reliability places a ceiling on validity)
Theoretical relationship between reliability and validity, corrected for both test (x) and criterion (y): $r_{x_\infty y_\infty} = r_{xy} / \sqrt{r_{xx} r_{yy}}$
Why would you want to know what the validity coefficient would be if it were corrected for attenuation in both the test and the criterion?

Reliability and Validity
Correction for unreliability (attenuation) in the criterion (why not for the test as well?)
Validity coefficient corrected for the criterion only: $r_{x y_\infty} = r_{xy} / \sqrt{r_{yy}}$
Assume:
  Obtained validity coefficient = .40
  Reliability of criterion = .25
What is the estimated validity coefficient corrected for attenuation in the criterion?
What is the coefficient of determination?
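One way to work through the exercise above is sketched below. The function name and its optional `r_xx` argument are illustrative assumptions. Leaving `r_xx` at 1.0 corrects for the criterion only, and the coefficient of determination is taken here as the squared correlation.

```python
import math

def correct_for_attenuation(r_xy: float, r_yy: float, r_xx: float = 1.0) -> float:
    """Correct an obtained validity coefficient for unreliability.

    r_yy is the criterion reliability. r_xx is the test reliability;
    leaving it at 1.0 applies the correction to the criterion only.
    """
    if not (0.0 < r_yy <= 1.0 and 0.0 < r_xx <= 1.0):
        raise ValueError("reliabilities must be in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

# Numbers from the slide: obtained validity .40, criterion reliability .25.
corrected = correct_for_attenuation(r_xy=0.40, r_yy=0.25)
print(round(corrected, 2))        # 0.8
print(round(corrected ** 2, 2))   # 0.64, coefficient of determination (corrected)
print(round(0.40 ** 2, 2))        # 0.16, coefficient of determination (obtained)
```

Under these assumptions the corrected coefficient is .80, so the share of criterion variance accounted for rises from 16% to 64% once criterion unreliability is removed.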

Accuracy / Reliability / Validity
Accuracy ≠ reliability
  An inaccurate thermometer may be consistent (reliable)
Accuracy ≠ validity
  An inaccurate thermometer may show validity (high correlations with a Bureau of Standards instrument)
  but still be inaccurate (consistently lower for each paired observation) (Figures 5.1; 5.2)
Why is the concept of "accuracy" meaningless for psychological constructs?
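The thermometer example can be made concrete with a short sketch. The readings are invented for illustration: the hypothetical thermometer reads exactly 2 degrees below the reference instrument every time, so it correlates perfectly with the standard while never being accurate.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations (not from the slides).
standard = [10.0, 15.0, 20.0, 25.0, 30.0]       # reference instrument
thermometer = [t - 2.0 for t in standard]        # consistently 2 degrees low

r = pearson_r(standard, thermometer)
mean_error = sum(t - s for t, s in zip(thermometer, standard)) / len(standard)
print(round(r, 3), mean_error)  # 1.0 -2.0
```

A correlation is blind to a constant offset, so a high validity coefficient says nothing about accuracy in this sense.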

RELIABILITY ESTIMATION
Coefficients of stability: over time (test-retest)
Coefficients of equivalence: equivalent forms (e.g. A and B); errors in sampling the test content domain
Coefficients of internal consistency:
  Kuder-Richardson estimates (assume homogeneity); K-R 20 preferred
  Cronbach's alpha (α), the general version of K-R 20
Where is this in SPSS?
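To show how K-R 20 and Cronbach's alpha relate, the sketch below computes both from a small made-up matrix of right/wrong item scores. It assumes the usual textbook formulas with population variances; the data and function names are hypothetical. For dichotomous items the two coefficients coincide.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one row per examinee, one column per item."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def kr20(scores):
    """K-R 20 for items scored 0/1; p is the proportion passing each item."""
    k, n = len(scores[0]), len(scores)
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_pq / total_var)

# Hypothetical responses: 5 examinees x 4 dichotomous items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
kr = kr20(responses)
print(round(alpha, 3), round(kr, 3))  # 0.8 0.8
```

Alpha also accepts items scored on more than two points (for example rating scales), which is the sense in which it generalizes K-R 20.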

Reliability Estimation (cont'd)
Inter-rater agreement v. reliability: ICC, $r_{wg}$, % agreement (kappa)
See Rosenthal & Rosnow table (handout)
Comparisons Among Reliability Estimates
  Systematic variance must be stable characteristics of the examinee (what is measured)
  Use estimates that make sense for the purpose:
    For re-testing an applicant due to an administration mistake, what is most appropriate?
    For production over a long period?
    An example of a job requiring stability of an attribute?
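Percent agreement and kappa can give quite different pictures of the same ratings, because kappa discounts the agreement expected by chance. A minimal sketch for two raters, assuming Cohen's kappa is the version intended; the ratings are invented for illustration.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters give the same rating."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical hire (H) / reject (R) decisions on ten applicants.
rater_a = ["H", "H", "H", "H", "H", "H", "R", "R", "R", "R"]
rater_b = ["H", "H", "H", "H", "H", "R", "R", "R", "R", "H"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(round(agreement, 2), round(kappa, 3))  # 0.8 0.583
```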

RELIABILITY ESTIMATION
Standard error of measurement (in test score units): $s_e = s_x \sqrt{1 - r_{xx}}$
  where $s_e$ = SEM; $s_x$ = SD of test; $r_{xx}$ = reliability of test
SEM decreases as reliability increases
Three purposes: to determine whether
  two individuals' scores really differ
  an individual's obtained score differs from the true score
  scores discriminate differently in different groups
Do group scores from different geographical areas matter? Why? Give an example.
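The relationship between reliability and the SEM is easy to see numerically. The sketch below assumes a test with a standard deviation of 10 score points, a hypothetical value chosen for round numbers, and shows the SEM shrinking as reliability rises.

```python
import math

def standard_error_of_measurement(sd: float, r_xx: float) -> float:
    """SEM = SD of the test * sqrt(1 - reliability), in test score units."""
    if not 0.0 <= r_xx <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - r_xx)

# Hypothetical test SD of 10 at three reliability levels.
for r_xx in (0.75, 0.84, 0.91):
    sem = standard_error_of_measurement(10.0, r_xx)
    print(r_xx, round(sem, 2))
# 0.75 5.0
# 0.84 4.0
# 0.91 3.0
```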

INTERPRETATIONS OF RELIABILITY COEFFICIENTS
Important to remember: the size of coefficient needed depends upon
  the purpose for which it is used
  the history of the type of measure
    What would be acceptable for a GMA test? For a panel interview?
  the length of the test (how many items are needed?)

VALIDITY: AN EVOLVING CONCEPT
Why is it important for I/O to distinguish between:
  a test that "... purports to measure something", and
  validity as "the degree to which it predicts something else" (i.e. a criterion; making inferences)?
Three Troublesome Adjectives
  Content, criterion-related, construct ("kinds" v. "aspects")
  Meaning (interpretation) v. inferences about a person
  "Unitarian" view: they are all aspects of construct validity
Descriptive and Relational Inferences
  Descriptive inferences (about the score itself): high IQ means the person is smart (trait)
  Relational inferences (about what can be predicted): a high scorer will perform on the job (sign)

Psychometric Validity v. Job Relatedness
Psychometric validity: confirm the meaning of the test intended by the test developer
Job-relatedness (validity) in personnel assessment. Examples?
How does psychometric validity differ from job-relatedness?

VARIETIES OF PSYCHOMETRIC VALIDITY EVIDENCE
Evidence Based on Test Development
Provide evidence for a test you plan to use. Questions to guide evaluation (answer them for your job):
  Did the developer have a clear idea of the attribute?
  Are the mechanics of the measurement consistent with the concepts?
  Is the stimulus content appropriate?
  Was the test carefully and skillfully developed?
Evidence Based on Reliability
Questions to guide evaluation (answer them for your job):
  Is the internal statistical evidence satisfactory?
  Are scores stable over time and consistent with alternative measures?
Example: small-scale model of a steam engine (for an apprentice to become a journeyman)

Evidence from Patterns of Correlates
Confirmatory and disconfirmatory
Questions for evaluation (answer them for a test you will use):
  Does empirical evidence confirm logically expected relations with other variables?
  Does empirical evidence disconfirm alternative meanings of test scores?
  Are the consequences of the test consistent with the meaning of the construct being measured?
Compare FFM to EQ

Beyond Classical Test Theory
Factor analysis (identify latent variables in a set of scores)
  EFA (exploratory)
  CFA (confirmatory)
Which would be most likely to be used to develop a test?

GENERALIZABILITY THEORY
Can the validity of the test be generalized to:
  other times?
  other circumstances?
  other behavior samples?
  other test forms?
  other raters/interviewers?
  other geographical populations?
Give an example of where a test will not perform the same for applicants in different geographical locations.

ITEM RESPONSE THEORY
Classical test theory: a person's score on a test relates to others' scores
IRT: a person's score on a test reflects standing on the latent variable (i.e. "sample free")
Computerized adaptive testing with IRT
Analysis of bias with adverse impact: differential item functioning to assess adverse impact
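The slides do not name a specific IRT model. As a sketch only, the code below assumes the common two-parameter logistic (2PL) form, in which the probability of a correct response depends on the person's standing on the latent variable (theta) and on the item's discrimination (a) and difficulty (b). All parameter values are hypothetical.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model, not taken from the slides).

    theta: person's standing on the latent variable
    a: item discrimination, b: item difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item with discrimination 1.0 and difficulty 0.0.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct(theta, a=1.0, b=0.0), 3))
# -2.0 0.119
# 0.0 0.5
# 2.0 0.881
```

Because the item parameters are defined on the latent scale rather than relative to a particular group of test takers, comparing them across groups is the basic idea behind differential item functioning analyses.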