
1 Introduction to Psychometrics
Psychometrics & Measurement Validity
Constructs & Measurement
“Kinds” of Items
Properties of a “good measure”
–Standardization
–Reliability
–Validity
Standardization & Inter-rater Reliability

2 Psychometrics (Psychological Measurement)
The process of assigning a value to represent the amount or kind of a specific attribute of an individual.
“Individuals” can be participants, collectives, stimuli, or processes.
We do not “measure individuals”; we measure specific attributes of an individual.
E.g., each participant in the Heptagonal Condition was presented with a 2-inch-wide polygon to view for 10 seconds. Then this polygon and four similar ones were presented, and the participant’s reaction time to identify the polygon presented previously was recorded.
We will focus on measuring attributes of persons in this introduction!

3 Psychometrics is the “centerpiece” of scientific empirical psychological research & practice.
All psychological data result from some form of “measurement.”
“Behaviors” are collected by observation, self-report, or behavioral traces.
Measurement is the process of turning those “behaviors” into “data” for analysis.
For those data to be useful, we need “Measurement Validity.”
The better the measurement, the better the data, and the more accurate and useful the conclusions of the data analysis are for the intended psychological research or application.
Without Measurement Validity, there can’t be Internal Validity, External Validity, or Statistical Conclusion Validity!

4 Most of what we try to measure in Psychology are constructs.
They’re called this because most of what we care about as psychologists are not physical measurements, such as height, weight, pressure & velocity…
…rather, the “stuff of psychology” (learning, motivation, anxiety, social skills, depression, wellness, etc.) are things that “don’t really exist.” Rather, they are attributes and characteristics that we’ve constructed to give organization and structure to behavior.
Essentially all of the things we psychologists research, both as causes and effects, are Attributive Hypotheses with different levels of support and acceptance!

5 Measurement of constructs is more difficult than measurement of physical properties!
We can’t just walk up to someone with a scale, ruler, graduated cylinder, or velocimeter and measure how depressed they are.
We have to figure out some way to turn observations of their behavior, self-reports, or traces of their behavior into variables that give values for the constructs we want to measure.
So measurement is, just like the rest of what we’ve learned about so far in this course, all about representation!
Measurement Validity is the extent to which the data (variable values) we have represent the behaviors (constructs) we want to study.

6 What are the different types of constructs we measure from persons? The most commonly discussed types are…
Demographics – population/subpopulation identifiers
e.g., age, gender, race/ethnicity, history variables
Ability/Skill – “performance” broadly defined
e.g., scholastic skills, job-related skills, research DVs, etc.
Attitude/Opinion – “how things are or should be”
e.g., polls, product evaluations, etc.
Personality – “characterological & contextual attributes of an individual”
e.g., anxiety, psychoses, assertiveness, extroversion, etc.

7 However, it is difficult to categorize many of the things we psychologists measure…
Diagnostic Category
achievement – limits of what can be learned/expressed
&/or personality – private & social expressions
&/or attitude/opinion – beliefs & feelings
Social Skills
achievement – something that has been learned?
&/or personality – how we get along socially is part of “who we are”?
Intelligence
innate (biological) preparedness for learning
&/or achievement – earlier learning = more intelligence
Aptitude
achievement – knowing the things necessary to learn other things
&/or specific capacity – the ability to learn certain skills

8 Each separate “thing” we measure is called an “item”
e.g., a question, a problem, a page, a trial, etc.
Collections of items are called many things…
e.g., survey, questionnaire, instrument, measure, test, or scale
Three “kinds” of item collections you should know…
Scale (Test) – all items are “put together” to get a single score
Subscale (Subtest) – item sets are “put together” to get multiple separate scores
Surveys – each item gives a specific piece of information
Most “questionnaires,” “surveys,” or “interviews” are a combination of all three.
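
To make the three kinds of item collections concrete, here is a minimal Python sketch using invented items q1–q5 and an assumed subscale grouping: the scale sums every item into one score, each subscale sums its own subset of items, and a “survey” item is simply kept as its own variable.

```python
# A scale sums every item into a single score, subscales sum named subsets,
# and a "survey" item is kept as its own variable.
# Items q1-q5 and the subscale grouping are hypothetical.

responses = {"q1": 4, "q2": 2, "q3": 5, "q4": 3, "q5": 1}   # one respondent

subscales = {                        # assumed grouping, for illustration only
    "worry":   ["q1", "q3"],
    "somatic": ["q2", "q4"],
}

scale_score = sum(responses.values())                        # one total score
subscale_scores = {name: sum(responses[i] for i in items)
                   for name, items in subscales.items()}     # separate scores
survey_item = responses["q5"]                                # reported on its own

print(scale_score, subscale_scores, survey_item)   # 15 {'worry': 9, 'somatic': 5} 1
```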

9 Kinds of items #1 → objective items vs. subjective items
There are “scads” of ways of classifying or categorizing items; here are three ways I want you to be familiar with…
“Objective” does not mean “true,” “real,” or “accurate,” and “subjective” does not mean “made up” or “inaccurate.” The distinction is defined by how the observer/interviewer/coder transforms the participant’s responses into data.
Objective Items – no evaluation or decision is needed; either “response = data” or a “mathematical transformation”
e.g., multiple choice, T&F, matching, fill-in-the-blanks (strict)
Subjective Items – the response must be evaluated and a decision or judgment made about what the data value should be
content coding, diagnostic systems, behavioral taxonomies
e.g., essays, interview answers, drawings, facial expressions

10 A bit more about objective vs. subjective…
Seems simple… the objective measure IS the behavior of interest
e.g., # impolite statements, GPA, hourly sales, # publications
Problems? Objective doesn’t mean “representative”…
Seems harder… a subjective rating of the behavior IS the behavior of interest
e.g., friend’s eval, advisor’s eval, manager’s eval, Chair’s eval
Problems? Good subjective measures are “hard work,” but…
Hardest & most common… the construct of interest isn’t a specific behavior
e.g., social skills, preparation for the professorate, sales skill, contribution to the department
Problems? What is the construct & how do we represent it?

11 Kind #2 → Judgments, Sentiments & Scored Sentiments
Judgments → do have a correct answer (e.g., 2 + 2 = 4)
the “behavior,” “response,” or “trace” must be scored (compared to the correct answer) to produce the variable/data
scoring may be objective or subjective, depending on the item
Sentiments → do not have a correct answer (e.g., Like Psyc350?), or have a correct answer but “we won’t check” (e.g., age)
the “behavior,” “response,” or “trace” is the variable/data
scoring may be objective or subjective, depending on the item
Scored Sentiments → do not have a correct answer but do have an “indicative answer” (e.g., Do you prefer to be alone?)
the “behavior,” “response,” or “trace” must be scored (compared to the indicative answer) to produce the variable/data
scoring may be objective or subjective, depending on the item
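
A minimal Python sketch of how these three kinds of items turn a response into data; the item names, keys, and responses are all invented for illustration. A judgment is scored against a correct key, a sentiment is taken at face value, and a scored sentiment is scored against an indicative key.

```python
# Judgments are compared to a correct key, sentiments are taken at face value,
# and scored sentiments are compared to an "indicative" (not correct) answer.
# All item names, keys, and responses below are invented.

correct_key    = {"math1": "4"}      # judgment: "2 + 2 = ?"
indicative_key = {"alone1": "yes"}   # scored sentiment: "Do you prefer to be alone?"

answers = {"math1": "4", "like350": 5, "alone1": "no"}

judgment  = int(answers["math1"] == correct_key["math1"])    # 1 = answered correctly
sentiment = answers["like350"]                               # the response IS the data
scored_sentiment = int(answers["alone1"] == indicative_key["alone1"])  # 1 = indicative answer

print(judgment, sentiment, scored_sentiment)                 # 1 5 0
```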

12 Using Judgments, Sentiments & Scored Sentiments
Judgments → do have a correct answer
Ability/skill
Intelligence
Diagnostic category
Aptitude
Sentiments → do not have a correct answer, or have a correct answer but “we won’t check”
Demographics
Attitude/Opinion
Scored Sentiments → do not have a correct answer but do have an “indicative answer”
Personality
Diagnostic category
Aptitude

13 Kind #3 → Direct Keying vs. Reverse Keying
We want respondents to carefully read and respond to each item of our scale/test. One thing we do is write the items so that some of them are “backwards” or “reversed”…
Consider these items from a depression measure…
1. It is tough to get out of bed some mornings. disagree 1 2 3 4 5 agree
2. I’m generally happy about my life. 1 2 3 4 5
3. I sometimes just want to sit and cry. 1 2 3 4 5
4. Most of the time I have a smile on my face. 1 2 3 4 5
If the person is “depressed,” we would expect them to give a fairly high rating on items 1 & 3, but a low rating on 2 & 4.
Before aggregating these items into a composite scale or test score, we would direct key items 1 & 3 (1=1, 2=2, 3=3, 4=4, 5=5) and reverse key items 2 & 4 (1=5, 2=4, 3=3, 4=2, 5=1).
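
Here is a minimal Python sketch of that keying step, assuming a 1–5 response format and using made-up responses to the four items above: reverse-keyed items are flipped with (hi + lo) - response before the items are summed into a composite score.

```python
# Direct vs. reverse keying on a 1-5 response scale before summing.
# The four responses are an invented "fairly depressed" respondent;
# items 2 and 4 are the reverse-keyed ones.

responses = {1: 4, 2: 2, 3: 5, 4: 1}
reverse_keyed = {2, 4}

def key_item(item, value, lo=1, hi=5):
    """Return the value unchanged (direct key) or flipped (reverse key)."""
    return (hi + lo) - value if item in reverse_keyed else value

keyed = {item: key_item(item, v) for item, v in responses.items()}
depression_score = sum(keyed.values())   # higher now = "more depressed" on every item

print(keyed, depression_score)           # {1: 4, 2: 4, 3: 5, 4: 5} 18
```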

14 Desirable Properties of Psychological Measures
Interpretability of Individual and Group Scores
Population Norms
Validity
Reliability
Standardization

15 Desirable Properties of Psychological Measures
Standardization
Administration & Scoring
Reliability
Inter-rater, Internal Consistency, Test-Retest & Alternate Forms
Validity
Face, Content, Criterion-Related, Construct
Population Norms
Scoring Distribution & Cutoffs
Interpretability of Individual & Group Scores

16 Standardization
Administration – the test is “given” the same way every time
who administers the instrument
specific instructions, order of items, timing, etc.
Varies greatly:
- multiple-choice classroom test → hand it out
- MMPI → hand it out
- WAIS → whole books & courses
Scoring – the test is “scored” the same way every time
who scores the instrument
correct, “partial,” and incorrect answers, points awarded, etc.
Varies greatly:
- multiple-choice test → fill in the bubble sheet
- MMPI → whole books & courses
- WAIS → whole books & courses
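
As a rough illustration of standardized scoring for objective items, here is a minimal Python sketch in which the same (invented) answer key, partial-credit rule, and point values are applied to every answer sheet, so a given response always earns the same number of points regardless of who scores it.

```python
# The same answer key, partial-credit rule, and point values are applied to
# every sheet, so the score does not depend on who does the scoring.
# The key, the partial-credit answers, and the point values are all invented.

ANSWER_KEY     = {"q1": "b", "q2": "d", "q3": "a"}
PARTIAL_CREDIT = {"q3": {"c"}}                      # answers earning partial credit
POINTS = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}

def score_sheet(sheet):
    total = 0.0
    for item, correct in ANSWER_KEY.items():
        answer = sheet.get(item)
        if answer == correct:
            total += POINTS["correct"]
        elif answer in PARTIAL_CREDIT.get(item, set()):
            total += POINTS["partial"]
        else:
            total += POINTS["incorrect"]
    return total

print(score_sheet({"q1": "b", "q2": "a", "q3": "c"}))   # 1.5
```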

17 We need to assess the inter-rater reliability of the scores from “subjective” items.
Have two or more raters score the same set of tests (usually 25-50% of the tests).
Assess the consistency of the scores in different ways for different types of items:
Quantitative items → correlation, intraclass correlation, RMSD
Ordered & categorical items → % agreement, Cohen’s Kappa
Keep in mind → what we really want is “rater validity”
we don’t really just want raters to agree, we want them to be right!
so it is best to compare raters with a “standard” rather than just with each other
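
For the ordered & categorical case, here is a minimal Python sketch of % agreement and Cohen’s Kappa for two hypothetical raters coding the same six cases; the categories and ratings are invented. Kappa corrects the raw agreement for the agreement expected by chance from each rater’s marginal proportions.

```python
# % agreement and Cohen's Kappa for two raters coding the same cases into
# categories. The ratings ("anx", "dep", "none") are invented.
from collections import Counter

rater_a = ["anx", "dep", "anx", "none", "dep", "anx"]
rater_b = ["anx", "dep", "none", "none", "dep", "dep"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n    # simple % agreement

# agreement expected by chance, from each rater's marginal proportions
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
expected = sum(counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)) / n**2

kappa = (observed - expected) / (1 - expected)                  # chance-corrected agreement
print(round(observed, 2), round(kappa, 2))                      # 0.67 0.52
```

The same indices can be computed against a “standard” set of ratings instead of a second rater, which gets closer to the “rater validity” point above.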

18 Ways to improve inter-rater reliability…
Improved standardization of the measurement instrument
do the questions focus respondents’ answers?
will “single sentence” or other response limitations help?
Instruction in the elements of the standardization
is complete explication possible? (borders on “objective”)
if not, we need “conceptual matches”
Practice with the instrument, with feedback
“walk-through” with experienced coders
practice with “common problems” or “historical challenges”
Experience with the instrument
really no substitute
have to worry about “drift” & “generational reinterpretation”
Use of the instrument with the intended population
different populations can have different response tendencies

