MGTO 324 Recruitment and Selections Scale and Test Construction Kin Fai Ellick Wong Ph.D. Department of Management of Organizations Hong Kong University of Science & Technology
Prologue: In the last lesson, I discussed the scientific elements of testing. Today, we focus on how a test can be constructed. In particular, you are expected to understand the following: the concepts of “item”, “scale”, and “test”; different item formats; how to write a set of good items; and multiple-item scaling.
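Outline: Part I: Basic Concepts; Part II: Item Formats; Part III: Writing good items; Part IV: Multiple-item scaling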
Part I: Basic Concepts. What is a (psychological) test? A measurement device or technique used to quantify behavior or to aid in the understanding and prediction of behavior; that is, a set of items designed to measure characteristics of human beings that pertain to behavior.
Part I: Basic Concepts. What is an item? A specific stimulus to which a person responds overtly. Examples: “What is the English word for ‘邂逅’ (a Chinese word for a chance encounter)?”; “You can manage well in interpersonal relationships” (rated from 1 = strongly disagree to 5 = strongly agree). Overt behaviors meet the scientific standard: they are observable, measurable, and can be replicated.
Part I: Basic Concepts. What is a scale? The quantified scores obtained from a test. The raw scores are related to some defined theoretical or empirical distribution; a scale is the matching between a raw score and the theoretical meaning of that score, e.g., 0 °C = freezing point and 100 °C = boiling point. The same theoretical meaning can be represented by different scales. Temperature: degree Celsius vs. degree Fahrenheit. Length: meter vs. foot; kilometer vs. mile. Weight: lb vs. kg. Wealth: HK$ vs. US$.
Part I: Basic Concepts. What is a scale? Examples. Thermometer: raw score = 2.4 cm; degree Celsius = 100; theoretically (empirically) = boiling point. HKCEE: raw score = 87; grade = A; theoretically (empirically) = excellent student. IQ test: raw score = 2400; IQ = 130; theoretically (empirically) = gifted individual.
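The point that one theoretical meaning can sit on different scales can be sketched with the temperature example. This is only an illustration; the function names are made up for this sketch, and the conversion formula is the standard Celsius/Fahrenheit relation rather than anything specific to the lecture.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Express a Celsius reading on the Fahrenheit scale."""
    return celsius * 9 / 5 + 32


def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """Express a Fahrenheit reading on the Celsius scale."""
    return (fahrenheit - 32) * 5 / 9


# The theoretical meaning (freezing, boiling) stays the same;
# only the number attached to it changes with the scale.
for meaning, celsius in [("freezing point", 0.0), ("boiling point", 100.0)]:
    fahrenheit = celsius_to_fahrenheit(celsius)
    print(f"{meaning}: {celsius} degrees C = {fahrenheit} degrees F")
```

The same idea applies to test scores: a raw score only becomes interpretable once it is matched to a defined meaning on some scale.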
Part I: Basic Concepts. Essential steps in scale and test construction: (1) Have a clear definition of what the test is supposed to measure (i.e., the psychological construct), e.g., locus of control or self-efficacy. (2) Generate a set of items that at least seems to capture the construct. (3) Determine the scale format. (4) Run pilot tests to assess face validity, reliability, construct validity, and criterion validity. (5) Delete or revise the less useful items. (6) Assess reliability and validity again. (7) Revise again and assess again. (8) Shorten the scale and assess again….
Part II: Item Formats. Dichotomous formats: offer two alternatives for each item, e.g., True/False, or select the more appropriate statement. Examples: “You often spend more than three hours typing every day” and “Generally speaking, my salary accurately represents my contribution to the organization” (Agree vs. Disagree); “Which job, technical or administrative, do you prefer?” (Technical vs. Administrative). Scores: simply count the number of items a person endorses. Commonly used in both educational and personality tests.
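The scoring rule for dichotomous items (count the endorsed items) can be sketched as follows. The function name and the sample responses are hypothetical and only serve the illustration.

```python
def score_dichotomous(responses: list[bool]) -> int:
    """Return the number of items the respondent endorsed (True = endorsed)."""
    return sum(responses)


# Hypothetical respondent answering four Agree/Disagree items.
sample_responses = [True, False, True, True]
print(score_dichotomous(sample_responses))
```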
Part II: Item Formats. Dichotomous formats: some famous scales use dichotomous formats. Locus of control (Rotter, 1966): “the degree to which people believe they are masters of their own fates” (from the OB textbook; Robbins, 2003). People high in externality are less satisfied with their jobs, have higher absenteeism rates, are more alienated from their work, and are less involved in their jobs. Sample items (choose one from each pair): A. Many of the unhappy things in people’s lives are partly due to bad luck. B. People’s misfortunes result from the mistakes they make. A. Heredity plays the major role in determining one’s personality. B. It is one’s experiences in life which determine what one is like.
Part II: Item Formats. Dichotomous formats. Advantages: simple, easy to administer and score; absolute judgment, since people must declare one of the two alternatives. Disadvantages: susceptible to the effect of memorizing materials; requires numerous items to produce reliable results (chance = 50%).
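Why a 50% chance level calls for numerous items can be made concrete with a small binomial calculation. This is a sketch under the assumption that a respondent guesses every item independently; the function name and the test lengths (5 and 20 items) are chosen only for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of getting at least k of n items right by pure guessing,
    where p is the chance of guessing a single item correctly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of reaching 80% correct by guessing alone on a short vs. a longer test.
print(prob_at_least(4, 5))    # 4 or more out of 5 true/false items
print(prob_at_least(16, 20))  # 16 or more out of 20 true/false items
```

With only a few items a guesser reaches a high score fairly often; with more items the same proportion correct becomes very unlikely under guessing.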
Part II: Item Formats. Polytomous formats: offer more than two alternatives (usually 4 to 5), with one correct choice and the others acting as distractors, i.e., multiple-choice. Scores: number of items correctly answered. How many distractors? More distractors may not be better than fewer distractors (Sidick, Barrett, & Doverspike, 1994). Less influenced by memory effects (relative to the dichotomous format). Mainly used in educational tests.
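Scoring a multiple-choice test as the number of items correctly answered can be sketched like this. The answer key, the responses, and the function names are hypothetical; the chance level assumes equally attractive alternatives, which real distractors rarely are.

```python
def score_multiple_choice(answers: list[str], key: list[str]) -> int:
    """Count how many answers match the answer key."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    return sum(given == correct for given, correct in zip(answers, key))


def chance_level(n_alternatives: int) -> float:
    """Probability of guessing one item correctly if all alternatives
    are equally likely to be picked (an idealizing assumption)."""
    return 1 / n_alternatives


answer_key = ["B", "D", "A", "C"]   # hypothetical four-item test
responses = ["B", "A", "A", "C"]    # hypothetical respondent
print(score_multiple_choice(responses, answer_key))
print(chance_level(4))
```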
Part II: Item Formats. Likert-type formats: usually, people are required to express their degree of agreement with a statement, e.g., “I can manage well in interpersonal relationships”. Monotone items: a higher score suggests higher agreement. Response scale: 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree.
Part II: Item Formats. Likert-type formats. Number of alternatives: odd (5, 7, or 9) or even (6 or 10), where an even number avoids a midpoint. Popular in psychological tests. Responses can be subjected to various psychometric analyses, such as factor analysis. For example, the General Self-efficacy Scale. Self-efficacy: “the individual’s belief that he or she is capable of performing a task” (from the OB textbook; Robbins, 2003). People with higher self-efficacy are less likely to give up under difficult situations. Usually perform better than
Part II: Item Formats. Cumulative (Guttman) formats: items on the same dimension are set up in ascending order. A subject with a particular attitude will agree with all items on one side of that position and disagree with the other items. Example: addition, long division, and calculus. Monotone items.
Part II: Item Formats. Cumulative (Guttman) formats. Advantages: when there is no random error, a single score carries complete information about the response pattern, e.g., calculus OK implies division OK implies addition OK; marriage OK implies being a neighbor OK. It also provides a test of the unidimensionality of what is being tested: a cumulative response pattern will not be obtained when the items do not measure only one dimension. Disadvantages: problems resulting from random error; difficult to find domains that are unidimensional. Less popular than the Likert-type format.
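The claim that a single score carries the whole response pattern can be sketched as follows, using the addition / long division / calculus example. The function names are hypothetical, items are assumed to be ordered from easiest to hardest, and responses are assumed to be free of random error.

```python
def is_cumulative(pattern: list[int]) -> bool:
    """Check that a response pattern (1 = pass/agree, 0 = fail/disagree),
    ordered from easiest to hardest item, has no pass after a fail."""
    seen_fail = False
    for response in pattern:
        if response == 0:
            seen_fail = True
        elif seen_fail:
            return False  # a harder item passed after an easier one failed
    return True


def pattern_from_score(score: int, n_items: int) -> list[int]:
    """Rebuild the full pattern from the total score alone, which is only
    valid when the responses form a perfect cumulative pattern."""
    return [1] * score + [0] * (n_items - score)


items = ["addition", "long division", "calculus"]
observed = [1, 1, 0]  # passes addition and long division, fails calculus
print(is_cumulative(observed))
print(pattern_from_score(sum(observed), len(items)) == observed)
print(is_cumulative([1, 0, 1]))  # calculus OK but division not OK
```

A non-cumulative pattern such as the last one signals either random error or items that do not lie on a single dimension.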
Part III: Writing good items. Define clearly: clearly define what you want to measure. Check the face and content validity: are the items out of syllabus? How many psychological factors do I want to measure? List them all. Example: “I want to develop a test that helps me hire employees who have a strong self-learning tendency.” What do you want to measure with the test? What is “self-learning tendency”? Give a clear definition before moving to the next step.
Part III: Writing good items. Think clearly about the item formats: think carefully about what type of test as well as what statistical analyses you want to use. Some statistical analyses may not be appropriate for certain item formats: rank-order data are not appropriate for a t test; polytomous responses may not be analyzable by factor analysis. A Likert-type or dichotomous format seems to be a good default choice.
Part III: Writing good items. Sources of items: discourse and text. From brainstorming or informal conversation, e.g., asking others the meaning of “self-learning”. Using a qualitative method is more systematic: open-ended interviews or questions, focus groups, and content analysis (newspapers, classic literature). The goal is to generate a pool of items that seems to measure what we want to measure, e.g., “When I have problems, I’ll seek help from books before seeking help from people.”
Part III: Writing good items. Nature of items: the meaning of each item should be clear, straightforward, and easily understood. Avoid exceptionally long items. Avoid reading difficulty; don’t use jargon. Social desirability: e.g., “I like self-learning”. Offensiveness: pay attention to the problems of sexism and racism. Avoid double-barreled items, e.g., “Seeking help from people is a better method than seeking help from books because people are more accessible.” Reverse items: e.g., “Most knowledge could not be learned without the help from others” vs. “I can learn almost all knowledge by myself”.
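Reverse items have to be re-keyed before they are combined with the other items. A minimal sketch, assuming a 1-to-5 Likert-type response scale; the item labels, responses, and function names are hypothetical.

```python
def reverse_score(response: int, low: int = 1, high: int = 5) -> int:
    """Flip a response on a low..high scale (on 1..5: 1 <-> 5, 2 <-> 4, 3 stays)."""
    if not low <= response <= high:
        raise ValueError(f"response must be between {low} and {high}")
    return low + high - response


def scale_score(responses: dict[str, int], reverse_keyed: set[str],
                low: int = 1, high: int = 5) -> int:
    """Sum the responses after re-keying the reverse items, so that a higher
    total always points in the same direction."""
    return sum(
        reverse_score(value, low, high) if item in reverse_keyed else value
        for item, value in responses.items()
    )


# Hypothetical respondent; "need_others" is worded against self-learning.
responses = {"learn_by_myself": 4, "need_others": 2}
print(scale_score(responses, reverse_keyed={"need_others"}))
```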
Part III: Writing good items. Generate an item pool: you first need to generate a pool of items. The number of items in this pool is usually much larger than in the final version of the scale. Do a preliminary test, select the useful items, and discard or revise the other items (you will learn these skills later in this course). You may need several rounds of revision before the scale becomes reliable and valid.
Part III: Writing good items. What should be included in a test apart from the basic items? Clear instructions for subjects: give examples whenever possible. Questions for demographic information: age, gender, education level, etc. A declaration of how the collected information will be used: is it confidential? What is the purpose of collecting the data? Who will assess the data? etc.
Part IV: Multiple-item scaling. True score theory: because of limited precision, measures may vary from time to time. The true score can hardly be obtained from only one measure, which may sometimes overestimate and sometimes underestimate the true score. The errors are assumed to be randomly distributed, so the true score can be approximated by averaging multiple responses.
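The averaging argument can be illustrated with a small simulation. This is a sketch under assumptions that are not in the slides: each observed response is the true score plus independent, normally distributed error, and the true score (3.0), the error spread (1.0), and the function names are arbitrary choices.

```python
import random
from statistics import mean


def observe(true_score: float, error_sd: float, rng: random.Random) -> float:
    """One observed response = true score + random error (assumed normal)."""
    return true_score + rng.gauss(0, error_sd)


def mean_abs_error(n_items: int, true_score: float = 3.0, error_sd: float = 1.0,
                   trials: int = 2000, seed: int = 0) -> float:
    """Average distance between the mean of n_items responses and the
    true score, estimated over many simulated respondents."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        average = mean(observe(true_score, error_sd, rng) for _ in range(n_items))
        errors.append(abs(average - true_score))
    return mean(errors)


# Compare a single-item measure with the average of 5 and 20 items.
for n in (1, 5, 20):
    print(n, round(mean_abs_error(n), 3))
```

Under these assumptions the error of the averaged score shrinks as items are added, which is the rationale for multiple-item scaling.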
Part IV: Multiple-item scaling. The concept of multiple-item scaling: suppose I want to measure one’s attitude toward whether secondary students can have romantic relationships. Single item: “I encourage secondary students to develop romantic relationships.” Two items: “I encourage secondary students to develop romantic relationships”; “We should respect secondary students’ freedom to engage in romantic relationships.” The issues of reliability and validity.
Part IV: Multiple-item scaling. Advantages of multiple-item scaling. It allows testing the nature of a construct: a set of items may capture more than one dimension, e.g., intelligence: memory span, verbal ability, and visual-spatial ability. Items measuring the same dimension or construct should be highly inter-correlated; items measuring different dimensions are supposed to have relatively low inter-item correlations. It allows us to check scale validity through factor analysis (discussed again later). It increases reliability and validity: remember the concept of true score theory. Random errors may affect a particular single response; the effects of random errors can be “neutralized” by measuring multiple responses.
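The expectation about inter-item correlations can be checked with a plain Pearson correlation. The data below are invented for five hypothetical respondents, and the item labels are hypothetical too; the sketch only shows the kind of pattern one would look for, not real results.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists of item responses."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented responses of five people to three items.
items = {
    "memory_1": [1, 2, 3, 4, 5],
    "memory_2": [2, 2, 3, 5, 5],  # intended to tap the same dimension as memory_1
    "verbal_1": [3, 5, 1, 2, 4],  # intended to tap a different dimension
}

names = list(items)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, round(pearson(items[first], items[second]), 2))
```

Items written for the same dimension should show a high correlation with each other and a lower one with items written for a different dimension; factor analysis formalizes this inspection.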