Test Validity


Test Validity

“… the development of a valid test requires multiple procedures, which are employed at different stages of test construction … The validation process begins with the formulation of detailed trait or construct definitions … Test items are then prepared to fit the construct definitions. Empirical item analyses follow … Other appropriate internal analyses may then be carried out … The final stage includes validation and cross-validation of various scores and interpretive combinations of scores through statistical analyses against external, real-life criteria.” (Anastasi, 1986, p. 3)

“Almost any information gathered in the process of developing or using a test is relevant to its validity … If we think of test validity in terms of understanding what a particular test measures, it should be apparent that virtually any empirical data obtained with the test represent a potential source of validity information.” (Anastasi, 1986, p. 3)

Test Validation Process
1) Define Objectives
2) State Inferences
3) Decide on Methods to Test Inferences
4) Collect Evidence

Types of Validity

Content Validity [the extent to which test items represent a domain]
a) Subject Matter Expert Opinions (e.g., CVR statistic)
b) Internal consistency reliability
c) Correlation with other similar tests

Content relevance
Domain specification
Content coverage
Domain representativeness
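The slide names the CVR statistic without defining it. A minimal sketch, assuming it refers to Lawshe's content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of SMEs rating an item "essential" and N is the panel size. The item names and counts are hypothetical.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR: ranges from -1 (no SME says essential) to +1 (all do)."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half_panel = n_experts / 2
    return (n_essential - half_panel) / half_panel


# Hypothetical panel of 10 SMEs; values are counts of "essential" ratings.
essential_counts = {"item_1": 9, "item_2": 5, "item_3": 2}
cvr_by_item = {
    item: content_validity_ratio(count, 10)
    for item, count in essential_counts.items()
}
print(cvr_by_item)
```

A CVR of 0 means exactly half the panel judged the item essential; whether a given value is high enough to retain an item depends on the panel size and the cutoff convention adopted.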

Steps in a Content Validation Effort

1) Perform a job analysis
   Description of job tasks
   Rating of job tasks on various criteria
   Specification of KSAs
   Rating of KSAs on various criteria
   Link/connect tasks to KSAs

From SIOP Principles: “The characterization of the work domain should be based on accurate and thorough information about the work including analysis of work behaviors and activities, responsibilities of the job incumbents, and/or the KSAOs prerequisite to effective performance on the job. The researcher should indicate what important work behaviors, activities, and worker KSAOs are included in the domain, describe how the content of the domain is linked to the selection procedure, and explain why certain parts of the domain were or were not included in the selection procedure.” (p. 22)

2) Selection of SMEs

From SIOP Principles: “The success of the content-based study is closely related to the qualifications of the subject matter experts (SMEs) … The experts should have thorough knowledge of the work behaviors and activities, responsibilities of job incumbents, and the KSAOs prerequisite to effective performance on the job” (p. 22)

3) Writing (or selecting) and evaluation of selection measure content (test items)

TASK -- KSA MATRIX

To what extent is each KSA needed when performing each job task?
5 = Extremely necessary, the job task cannot be performed without the KSA
4 = Very necessary, the KSA is very helpful when performing the job task
3 = Moderately necessary, the KSA is moderately helpful when performing the job task
2 = Slightly necessary, the KSA is slightly helpful when performing the job task
1 = Not necessary, the KSA is not used when performing the job task

[Matrix: rows are Job Tasks 1 to 13; columns are KSAs A to R]

ITEM RATING FORM

Job classification under review ____________________________________________

[Form: rows for item # 1 to 12 and 41 to 52, each rated against KSA columns (B and C shown)]

Content Validity Issues
Are the job activities and requirements stable across time?
Does successful performance on the test require the same KSAs as successful performance on the job?
Is the type (or mode) of testing procedure the same as that required on the job?
Does the test require some KSAs that are not required on the job?
Not useful when abstract constructs are being measured (content validation assumes that only a small inferential leap is required between the test content and job requirements)

From Anastasi (1986): “When tests are designed for use within special contexts, the relevant constructs are usually derived from content analysis of particular behavior domains” (p. 7).

From SIOP Principles: “When selection procedure content is linked to job content, content-oriented strategies are useful. When selection procedure content is less clearly linked to job content, other sources of validity evidence take precedence” (p. 23).

Types of Validity (cont.)

Criterion-related Validity

Predictive [Correlation between test scores of applicants and their performance scores when some time interval has passed after they are hired]
   Range restriction issue on performance scores
   Time, cost, & pragmatic concerns

Concurrent [Correlation between test scores and performance scores of current employees]
   Motivation level
   Guessing, Faking
   Job experience factor
   Range restriction issue on performance scores
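Both designs end with the same computation: a correlation between test scores and performance scores. A minimal sketch of that validity coefficient as a Pearson correlation follows; the scores are made-up illustration data, not from the presentation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 cases")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: test scores at hire and later supervisor ratings.
test_scores = [52, 61, 47, 70, 58, 66, 49, 73]
performance = [3.1, 3.8, 2.9, 4.2, 3.0, 4.0, 3.3, 4.4]

validity_coefficient = pearson_r(test_scores, performance)
print(round(validity_coefficient, 2))
```

In a predictive design the performance scores would be collected after the time interval; in a concurrent design both lists would come from current employees at about the same time.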

Criterion-related Validity Issues

A) Job Stability

B) Reliable and relevant measure of job performance
From SIOP Principles: “A relevant, reliable, and uncontaminated criterion(s) must be obtained or developed. Of these characteristics, the most important is relevance. A relevant criterion is one that reflects the relative standing of employees with respect to important work behavior(s) or outcome measure(s). If such a criterion measure does not exist or cannot be developed, use of a criterion-related validation strategy is not feasible” (p. 14).

C) Use of a representative sample of people and jobs

D) Large sample (on predictor and criterion)
From SIOP Principles: “A competent criterion-related validity study should be based on a sample that is reasonably representative of the work and candidate pool … A number of factors related to statistical power can influence the feasibility of a criterion-related study. Among these factors are the degree (and type) of range restriction in the predictor or the criterion, reliability of the criterion, and statistical power” (p. 14).

Legal Issues and Criterion-related Validity
Court focus on the content of measures as opposed to criterion validity evidence (relationship between test scores and job performance)
Emphasis on the legal history of tests
Criterion-validity emphasis versus concurrent validity designs
Statistically significant relationships are not always acceptable (consideration of other factors such as test utility)

Factors Affecting the Validity Coefficient [correlation between a test and job performance]
Reliability of both the criterion (job performance) and the predictor (test)
Restriction of range (on both the test and job performance measure)
Contamination of the criterion (e.g., the measure of job performance is affected by variables other than one’s ability or knowledge)

Standard error of estimate (validity coefficient):

$$\sigma_{y'} = \sigma_y \sqrt{1 - r_{xy}^2}$$

where $\sigma_y$ = standard deviation of y (criterion) and $r_{xy}^2$ = correlation between x and y, squared
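A small worked example of the standard error of estimate, assuming the slide's formula is the usual criterion standard deviation multiplied by the square root of one minus the squared validity coefficient. The numbers are illustrative only.

```python
from math import sqrt


def standard_error_of_estimate(sd_criterion: float, validity: float) -> float:
    """Spread of actual criterion scores around the scores predicted from the test."""
    if sd_criterion < 0:
        raise ValueError("sd_criterion must be non-negative")
    if not -1 <= validity <= 1:
        raise ValueError("validity must be a correlation between -1 and 1")
    return sd_criterion * sqrt(1 - validity ** 2)


# Illustrative values: criterion SD of 10 rating points.
for r in (0.0, 0.3, 0.6, 0.9):
    print(r, round(standard_error_of_estimate(10.0, r), 2))
```

With a validity of 0 the prediction error equals the criterion standard deviation itself, so the test adds nothing; the error shrinks only slowly as validity rises.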

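Correction for Attenuation

$$\rho_T = \frac{\rho_{xy}}{\sqrt{\rho_{yy}}}$$

where $\rho_{xy}$ = observed validity coefficient and $\rho_{yy}$ = criterion reliability

Range Restriction (Predictor)

$$\rho_1 = \frac{\rho_2 \, (S_1 / S_2)}{\sqrt{1 - \rho_2^2 + \rho_2^2 \, (S_1^2 / S_2^2)}}$$

Range Restriction (Criterion)

$$\rho_1 = \sqrt{1 - (1 - \rho_2^2) \, \frac{S_2^2}{S_1^2}}$$

where $\rho_1$ = validity coefficient of unrestricted sample, $S_1$ = $\sigma$ of unrestricted sample, $\rho_2$ = validity coefficient of restricted sample, and $S_2$ = $\sigma$ of restricted sample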
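The three corrections on this slide can be sketched as small functions. This assumes the slide shows the standard forms: division by the square root of criterion reliability for attenuation, the common correction for direct range restriction on the predictor, and the equal-error-variance correction for restriction on the criterion. Function and parameter names are illustrative.

```python
from math import sqrt


def correct_for_attenuation(observed_r: float, criterion_reliability: float) -> float:
    """Estimate validity against a perfectly reliable criterion."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return observed_r / sqrt(criterion_reliability)


def correct_predictor_restriction(restricted_r: float, sd_unrestricted: float,
                                  sd_restricted: float) -> float:
    """Correct for direct range restriction on the predictor (test) scores."""
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    ratio = sd_unrestricted / sd_restricted
    return (restricted_r * ratio) / sqrt(
        1 - restricted_r ** 2 + restricted_r ** 2 * ratio ** 2
    )


def correct_criterion_restriction(restricted_r: float, sd_unrestricted: float,
                                  sd_restricted: float) -> float:
    """Correct for range restriction on the criterion (performance) scores."""
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    variance_ratio = (sd_restricted / sd_unrestricted) ** 2
    return sqrt(1 - (1 - restricted_r ** 2) * variance_ratio)


# Illustrative values: observed r = .30, criterion reliability = .64,
# applicant-pool SD twice as large as the SD among those hired.
print(round(correct_for_attenuation(0.30, 0.64), 3))
print(round(correct_predictor_restriction(0.30, 10.0, 5.0), 3))
print(round(correct_criterion_restriction(0.30, 10.0, 5.0), 3))
```

When the two standard deviations are equal, both range-restriction functions return the observed coefficient unchanged, which is a quick sanity check on any implementation.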

Test Utility Key Points

Selection Ratio (SR) = n / N, where n = # job openings and N = # applicants

Test Validity [Criterion-related]: The extent to which test scores correlate with job performance scores [Range is from 0 to 1.0]

Proportion of “Successes” Expected Through the Use of Test of Given Validity and Given Selection Ratio, for Base Rate .60. (From Taylor & Russell, 1939, p. 576)

Rows: validity; columns: selection ratio

Validity   .05   .10   .20   .30   .40   .50   .60   .70   .80   .90   .95
     .00   .60   .60   .60   .60   .60   .60   .60   .60   .60   .60   .60
     .05   .64   .63   .63   .62   .62   .62   .61   .61   .61   .60   .60
     .10   .68   .67   .65   .64   .64   .63   .63   .62   .61   .61   .60
     .15   .71   .70   .68   .67   .66   .65   .64   .63   .62   .61   .60
     .20   .75   .73   .71   .69   .67   .66   .65   .64   .63   .62   .61
     .25   .78   .76   .73   .71   .69   .68   .66   .65   .63   .62   .61
     .30   .82   .79   .76   .73   .71   .69   .68   .66   .64   .62   .61
     .35   .85   .82   .78   .75   .73   .71   .69   .67   .65   .63   .62
     .40   .88   .85   .81   .78   .75   .73   .70   .68   .66   .63   .62
     .45   .90   .87   .83   .80   .77   .74   .72   .69   .66   .64   .62
     .50   .93   .90   .86   .82   .79   .76   .73   .70   .67   .64   .62
     .55   .95   .92   .88   .84   .81   .78   .75   .71   .68   .64   .62
     .60   .96   .94   .90   .87   .83   .80   .76   .73   .69   .65   .63
     .65   .98   .96   .92   .89   .85   .82   .78   .74   .70   .65   .63
     .70   .99   .97   .94   .91   .87   .84   .80   .75   .71   .66   .63
     .75   .99   .99   .96   .93   .90   .86   .81   .77   .71   .66   .63
     .80  1.00   .99   .98   .95   .92   .88   .83   .78   .72   .66   .63
     .85  1.00  1.00   .99   .97   .95   .91   .86   .80   .73   .66   .63
     .90  1.00  1.00  1.00   .99   .97   .94   .88   .82   .74   .67   .63
     .95  1.00  1.00  1.00  1.00   .99   .97   .92   .84   .75   .67   .63
    1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00   .86   .75   .67   .63

Note: A full set of tables can be found in Taylor and Russell (1939) and in McCormick and Ilgen (1980, Appendix B).
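Entries of this kind can be approximated numerically. The sketch below assumes test scores and job performance are bivariate normal with correlation equal to the validity coefficient, that the top fraction of applicants (the selection ratio) is hired, and that "success" means exceeding the performance level reached by the base-rate fraction of unselected applicants. The function name and the Simpson's-rule integration are choices made for this illustration, not the method used to build the published tables.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_success_rate(validity, selection_ratio, base_rate, steps=2000):
    """Approximate P(success | selected) under a bivariate normal model."""
    if not 0 <= validity < 1:
        raise ValueError("this sketch handles validity in [0, 1)")
    if not (0 < selection_ratio < 1 and 0 < base_rate < 1):
        raise ValueError("selection_ratio and base_rate must be in (0, 1)")
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")

    x_cut = STD_NORMAL.inv_cdf(1 - selection_ratio)  # test cutoff (z score)
    y_cut = STD_NORMAL.inv_cdf(1 - base_rate)        # success cutoff (z score)
    residual_sd = sqrt(1 - validity ** 2)

    def integrand(x):
        # Density of test score x times P(performance > y_cut | test score x).
        p_success = 1 - STD_NORMAL.cdf((y_cut - validity * x) / residual_sd)
        return STD_NORMAL.pdf(x) * p_success

    upper = x_cut + 10.0  # the normal tail beyond this point is negligible
    h = (upper - x_cut) / steps
    total = integrand(x_cut) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(x_cut + i * h)
    p_selected_and_successful = total * h / 3
    return p_selected_and_successful / selection_ratio


print(round(expected_success_rate(0.50, 0.50, 0.60), 2))
```

The table's main pattern falls out of the model: for a fixed validity, the expected proportion of successes rises as the selection ratio gets smaller, and with zero validity it stays at the base rate regardless of how selective the organization is.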

Selection Ratio Example (cont.)

Mean Standard Criterion Score of Accepted Cases in Relation to Test Validity and Selection Ratio (From Brown & Ghiselli, 1953, p. 342)

Rows: selection ratio (SR); columns: validity coefficient

      SR   .00   .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55   .60   .65   .70   .75   .80   .85   .90   .95  1.00
     .05   .00   .10   .21   .31   .42   .52   .62   .73   .83   .94  1.04  1.14  1.25  1.35  1.46  1.56  1.66  1.77  1.87  1.98  2.08
     .10   .00   .09   .18   .26   .35   .44   .53   .62   .70   .79   .88   .97  1.05  1.14  1.23  1.32  1.41  1.49  1.58  1.67  1.76
     .15   .00   .08   .15   .23   .31   .39   .46   .54   .62   .70   .77   .85   .93  1.01  1.08  1.16  1.24  1.32  1.39  1.47  1.55
     .20   .00   .07   .14   .21   .28   .35   .42   .49   .56   .63   .70   .77   .84   .91   .98  1.05  1.12  1.19  1.26  1.33  1.40
     .25   .00   .06   .13   .19   .25   .32   .38   .44   .51   .57   .63   .70   .76   .82   .89   .95  1.01  1.08  1.14  1.20  1.27
     .30   .00   .06   .12   .17   .23   .29   .35   .40   .46   .52   .58   .64   .69   .75   .81   .87   .92   .98  1.04  1.10  1.16
     .35   .00   .05   .11   .16   .21   .26   .32   .37   .42   .48   .53   .58   .63   .69   .74   .79   .84   .90   .95  1.00  1.06
     .40   .00   .05   .10   .15   .19   .24   .29   .34   .39   .44   .48   .53   .58   .63   .68   .73   .77   .82   .87   .92   .97
     .45   .00   .04   .09   .13   .18   .22   .26   .31   .35   .40   .44   .48   .53   .57   .62   .66   .70   .75   .79   .84   .88
     .50   .00   .04   .08   .12   .16   .20   .24   .28   .32   .36   .40   .44   .48   .52   .56   .60   .64   .68   .72   .76   .80
     .55   .00   .04   .07   .11   .14   .18   .22   .25   .29   .32   .36   .40   .43   .47   .50   .54   .58   .61   .65   .68   .72
     .60   .00   .03   .06   .10   .13   .16   .19   .23   .26   .29   .32   .35   .39   .42   .45   .48   .52   .55   .58   .61   .64
     .65   .00   .03   .06   .09   .11   .14   .17   .20   .23   .26   .28   .31   .34   .37   .40   .43   .46   .48   .51   .54   .57
     .70   .00   .02   .05   .07   .10   .12   .15   .17   .20   .22   .25   .27   .30   .32   .35   .37   .40   .42   .45   .47   .50
     .75   .00   .02   .04   .06   .08   .11   .13   .15   .17   .19   .21   .23   .25   .27   .30   .32   .33   .36   .38   .40   .42
     .80   .00   .02   .04   .05   .07   .09   .11   .12   .14   .16   .18   .19   .21   .22   .25   .26   .28   .30   .32   .33   .35
     .85   .00   .01   .03   .04   .05   .07   .08   .10   .11   .12   .14   .15   .16   .18   .19   .20   .22   .23   .25   .26   .27
     .90   .00   .01   .02   .03   .04   .05   .06   .07   .08   .09   .10   .11   .12   .13   .14   .15   .16   .17   .18   .19   .20
     .95   .00   .01   .01   .02   .02   .03   .03   .04   .04   .05   .05   .06   .07   .07   .08   .08   .09   .09   .10   .10   .11
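The entries in this table track a simple normal-theory result: if applicants are selected top-down on a normally distributed test score, the mean standard test score of those accepted is the normal density at the cutoff divided by the selection ratio, and the expected mean standard criterion score is that value multiplied by the validity coefficient. A minimal sketch under those assumptions follows (the function names are illustrative); small discrepancies from the printed values are possible, particularly at extreme selection ratios.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mean_test_z_of_selected(selection_ratio: float) -> float:
    """Mean standard test score of those accepted under top-down selection."""
    if not 0 < selection_ratio < 1:
        raise ValueError("selection_ratio must be in (0, 1)")
    cutoff = STD_NORMAL.inv_cdf(1 - selection_ratio)
    return STD_NORMAL.pdf(cutoff) / selection_ratio


def mean_criterion_z_of_selected(validity: float, selection_ratio: float) -> float:
    """Expected mean standard criterion score of those accepted."""
    if not -1 <= validity <= 1:
        raise ValueError("validity must be a correlation between -1 and 1")
    return validity * mean_test_z_of_selected(selection_ratio)


# Compare with the table: SR = .20, validity = .50.
print(round(mean_criterion_z_of_selected(0.50, 0.20), 2))
```

Because the result is the validity coefficient times a constant that depends only on the selection ratio, each row of the table is close to a straight line in validity.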

Example of Brogden and Cronbach & Gleser Models

$$\Delta U = N_s \, r_{xy} \, SD_y \, \bar{Z}_x - N_T \, (C)$$

where
$N_s$ = # of applicants selected
$r_{xy}$ = validity coefficient
$SD_y$ = standard deviation of job performance in dollar terms
$\bar{Z}_x$ = average score on the selection procedure of those selected (standard score)
$N_T$ = number of applicants assessed
$C$ = cost of assessing each applicant
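A worked example of the utility expression on this slide, written as a function. The parameter names and the dollar figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def selection_utility(n_selected: int, validity: float, sd_performance_dollars: float,
                      mean_z_selected: float, n_assessed: int,
                      cost_per_applicant: float) -> float:
    """Dollar gain from selection: benefit of better hires minus assessment cost."""
    if n_selected < 0 or n_assessed < 0:
        raise ValueError("counts must be non-negative")
    if n_selected > n_assessed:
        raise ValueError("cannot select more applicants than were assessed")
    benefit = n_selected * validity * sd_performance_dollars * mean_z_selected
    cost = n_assessed * cost_per_applicant
    return benefit - cost


# Hypothetical case: 10 hires out of 100 applicants, validity .40,
# SD of performance worth $10,000, selected applicants average z = 1.0,
# and testing costs $50 per applicant.
gain = selection_utility(10, 0.40, 10_000.0, 1.0, 100, 50.0)
print(gain)
```

The cost term scales with everyone assessed while the benefit term scales only with those hired, so a cheap test with modest validity can still outperform an expensive test with higher validity when applicant pools are large.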

Construct Validity Multitrait-Multimethod Matrix (Campbell & Fiske, 1959)

Types of Validity (cont.)

Construct Validity [extent to which a test assesses the construct it intends to measure]

Correlation between scores measuring a construct (e.g., anxiety) with one method (e.g., paper & pencil) with scores on the same construct using a different method (e.g., interview) [Convergent validation]

Correlation between scores measuring a construct (e.g., anxiety) using one method (e.g., paper & pencil) with scores on a different construct (e.g., leadership) assessed with a different method (e.g., interview) [Discriminant validation]

“Construct validation is indeed a never-ending process. However, that should not preclude using the test operationally to help solve practical problems and reach real-life decisions as soon as the available validity information has reached an acceptable level for a particular application. This level varies with the type of test and the way it will be used. Establishing this level requires informed professional judgment within the appropriate specialty of professional practice.” (Anastasi, 1986, p. 4)
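The convergent and discriminant comparisons described above can be illustrated with a toy example. The scores below are invented for five people; the point is only the pattern a multitrait-multimethod analysis looks for, namely a same-construct, different-method correlation that is clearly larger than a different-construct, different-method correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same five people.
anxiety_paper_pencil = [1, 2, 3, 4, 5]
anxiety_interview = [2, 1, 4, 3, 5]
leadership_interview = [3, 5, 1, 4, 2]

# Same construct, different methods: should be high.
convergent_r = pearson_r(anxiety_paper_pencil, anxiety_interview)
# Different constructs, different methods: should be near zero or low.
discriminant_r = pearson_r(anxiety_paper_pencil, leadership_interview)

print(round(convergent_r, 2), round(discriminant_r, 2))
```

A full multitrait-multimethod matrix would also include same-method, different-construct correlations, which help reveal how much shared method variance inflates the other values.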

Equal validity, unequal criterion means

[Figure: Performance Criterion (Unsatisfactory to Satisfactory) plotted against Predictor Score (Reject to Accept), with separate non-minority and minority groups]

Equal test scores; minorities performing less well on the job (the test overpredicts their performance)
Minorities hired at the same rate as non-minorities, but their probability of success is small
Can reinforce existing stereotypes

Equal validity, unequal predictor means: Intercept Bias (Test)

[Figure: Performance Criterion (Unsatisfactory to Satisfactory) plotted against Predictor Score (Reject to Accept), with separate minority and non-minority groups]

Job performance is equal
Test scores are greater for non-minorities

Equal predictor means, but validity only for non-minority groups

[Figure: Performance Criterion (Unsatisfactory to Satisfactory) plotted against Predictor Score (Reject to Accept), with separate minority and non-minority groups]

Equal test scores and criterion scores
No validity for minorities (the test should be used only for non-minorities)
No adverse impact: the same numbers are hired in each group
However, more non-minorities will succeed on the job; this can reinforce stereotypes
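The three scenarios above are usually examined by fitting a regression of performance on test score and checking whether one common line serves both groups. The sketch below uses invented data in which both groups have the same slope but different intercepts, so a pooled line overpredicts one group and underpredicts the other. Group labels and helper names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_residual(x, y, slope, intercept):
    """Average of actual minus predicted performance; negative means overprediction."""
    return sum(b - (slope * a + intercept) for a, b in zip(x, y)) / len(x)


# Hypothetical groups with identical test scores but different performance levels.
group_a_test, group_a_perf = [1, 2, 3, 4], [2, 3, 4, 5]
group_b_test, group_b_perf = [1, 2, 3, 4], [1, 2, 3, 4]

pooled_slope, pooled_intercept = fit_line(
    group_a_test + group_b_test, group_a_perf + group_b_perf
)

residual_a = mean_residual(group_a_test, group_a_perf, pooled_slope, pooled_intercept)
residual_b = mean_residual(group_b_test, group_b_perf, pooled_slope, pooled_intercept)
print(pooled_slope, pooled_intercept, residual_a, residual_b)
```

Comparing the separately fitted slopes and intercepts for each group, rather than only the pooled residuals, distinguishes intercept differences from the case where the test predicts performance in one group but not the other.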

Situational specificity or generalizability of test validity across samples?

Fluctuations in validity coefficients may often be due to:
   Small sample sizes (e.g., many studies have samples of 50 or fewer employees)
   Unreliable criterion measures
   Restriction of range in employee samples

Some evidence that certain tests (e.g., aptitude tests) can be generalized across a variety of occupations
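The small-sample point can be made concrete with a confidence interval for a validity coefficient. The sketch below uses the Fisher z transformation, a standard large-sample approximation; the observed value of .30 and the sample sizes are illustrative.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_confidence_interval(r: float, n: int, confidence: float = 0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    z = atanh(r)                      # Fisher z transform of r
    standard_error = 1 / sqrt(n - 3)  # approximate standard error of z
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = tanh(z - critical * standard_error)
    upper = tanh(z + critical * standard_error)
    return lower, upper


# Same observed validity, very different precision.
for sample_size in (50, 500):
    low, high = correlation_confidence_interval(0.30, sample_size)
    print(sample_size, round(low, 2), round(high, 2))
```

With 50 employees the interval around an observed .30 is wide enough that two studies of the same test could report quite different coefficients from sampling error alone, which is one reason apparent situational specificity can be misleading.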

Statistical Power and Hypothesis Testing

Findings of study (columns) by Reality (rows):

Reality                   No significant effect found         Significant effect found
Significance exists       Type II error (“false negative”)    Correct decision (reject null)
No significance exists    Correct decision (accept null)      Type I error (“false positive”)

“Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.” (Fisher, 1935, p. 19)
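Power is the probability of avoiding a Type II error when an effect really exists. A rough sketch for a validity study follows, using the Fisher z approximation for a two-sided test that a correlation differs from zero; the true correlation of .30 and the sample sizes are illustrative assumptions.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_test_power(true_r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of H0: correlation = 0."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1 < true_r < 1:
        raise ValueError("true_r must be strictly between -1 and 1")
    normal = NormalDist()
    critical = normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true correlation is true_r.
    shift = atanh(true_r) * sqrt(n - 3)
    # Probability of landing in either rejection region.
    return normal.cdf(shift - critical) + normal.cdf(-shift - critical)


for sample_size in (50, 200):
    print(sample_size, round(correlation_test_power(0.30, sample_size), 2))
```

When the true correlation is zero the function returns alpha, the Type I error rate; for a nonzero correlation, one minus the returned value is the Type II error rate.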