Introduction to Psychometrics

Psychometrics & Measurement Validity
Constructs & Measurement
"Kinds" of Items
Properties of a "good measure"
–Standardization
–Reliability
–Validity
Standardization & Inter-rater Reliability

Psychometrics (Psychological Measurement)

The process of assigning a value to represent the amount or kind of a specific attribute of an individual.
"Individuals" can be participants, collectives, stimuli, or processes.
We do not "measure individuals"; we measure specific attributes of an individual.
E.g., each participant in the Heptagonal Condition was presented with a 2-inch-wide polygon to view for 10 seconds. Then this polygon and four similar ones were presented, and the participant's reaction time to identify the polygon presented previously was recorded.
We will focus on measuring attributes of persons in this introduction!

Psychometrics is the "centerpiece" of scientific empirical psychological research & practice.

All psychological data result from some form of "measurement."
"Behaviors" are collected by observation, self-report, or behavioral traces.
Measurement is the process of turning those "behaviors" into "data" for analysis.
For those data to be useful, we need "Measurement Validity."
The better the measurement, the better the data, and the more accurate and useful are the conclusions of the data analysis for the intended psychological research or application.
Without Measurement Validity, there can't be Internal Validity, External Validity, or Statistical Conclusion Validity!

Most of what we try to measure in Psychology are constructs.

They're called this because most of what we care about as psychologists are not physical measurements, such as height, weight, pressure & velocity…
…rather, the "stuff of psychology" → learning, motivation, anxiety, social skills, depression, wellness, etc. are things that "don't really exist." Rather, they are attributes and characteristics that we've constructed to give organization and structure to behavior.
Essentially all of the things we psychologists research, both as causes and effects, are Attributive Hypotheses with different levels of support and acceptance!

Measurement of constructs is more difficult than measurement of physical properties!

We can't just walk up to someone with a scale, ruler, graduated cylinder, or velocimeter and measure how depressed they are.
We have to figure out some way to turn observations of their behavior, self-reports, or traces of their behavior into variables that give values for the constructs we want to measure.
So measurement is, just like the rest of what we've learned about so far in this course, all about representation!
Measurement Validity is the extent to which the data (variable values) we have represent the behaviors (constructs) we want to study.

What are the different types of constructs we measure from persons? The most commonly discussed types are…

Demographics – population/subpopulation identifiers; e.g., age, gender, race/ethnicity, history variables
Ability/Skill – "performance," broadly defined; e.g., scholastic skills, job-related skills, research DVs, etc.
Attitude/Opinion – "how things are or should be"; e.g., polls, product evaluations, etc.
Personality – "characterological & contextual attributes of an individual"; e.g., anxiety, psychoses, assertiveness, extroversion, etc.

However, it is difficult to categorize many of the things we psychologists measure…

Diagnostic Category – achievement (limits of what can be learned/expressed), &/or personality (private & social expressions), &/or attitude/opinion (beliefs & feelings)
Social Skills – achievement (something that has been learned?), &/or personality (how we get along socially is part of "who we are"?)
Intelligence – innate (biological) preparedness for learning, &/or achievement (earlier learning = more intelligence)
Aptitude – achievement (knowing things necessary to learn other things), &/or specific capacity (the ability to learn certain skills)

Each separate "thing" we measure is called an "item"; e.g., a question, a problem, a page, a trial, etc.
Collections of items are called many things; e.g., survey, questionnaire, instrument, measure, test, or scale.
Three "kinds" of item collections you should know (see the sketch below):
Scale (Test) – all items are "put together" to get a single score
Subscale (Subtest) – item sets are "put together" to get multiple separate scores
Survey – each item gives a specific piece of information
Most "questionnaires," "surveys," or "interviews" are a combination of all three.
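To make the scale vs. subscale distinction concrete, here is a minimal sketch in Python; the item names, the groupings, and scoring-by-summation are all illustrative assumptions, not part of any particular instrument:

```python
# Hypothetical responses to six 1-5 rating items (item names invented for illustration).
responses = {"q1": 4, "q2": 2, "q3": 5, "q4": 3, "q5": 1, "q6": 4}

# Scale (Test): all items are "put together" (here, summed) into a single score.
scale_score = sum(responses.values())

# Subscales (Subtests): item sets are scored separately (the grouping is hypothetical).
subscales = {"subscale_A": ["q1", "q2", "q3"], "subscale_B": ["q4", "q5", "q6"]}
subscale_scores = {name: sum(responses[item] for item in items)
                   for name, items in subscales.items()}

print(scale_score)      # 19
print(subscale_scores)  # {'subscale_A': 11, 'subscale_B': 8}
```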

Kind #1 → objective items vs. subjective items

There are "scads" of ways of classifying or categorizing items; here are three ways I want you to be familiar with…
"Objective" does not mean "true," "real," or "accurate"; "subjective" does not mean "made up" or "inaccurate."
The distinction is defined by how the observer/interviewer/coder transforms the participant's responses into data.
Objective Items – no evaluation or decision is needed; either "response = data" or a "mathematical transformation"; e.g., multiple choice, T&F, matching, fill-in-the-blanks (strict)
Subjective Items – the response must be evaluated, and a decision or judgment made about what the data value should be; content coding, diagnostic systems, behavioral taxonomies; e.g., essays, interview answers, drawings, facial expressions

A bit more about objective vs. subjective…

Seems simple → the objective measure IS the behavior of interest; e.g., # impolite statements, GPA, hourly sales, # publications. Problems? Objective doesn't mean "representative"…
Seems harder → a subjective rating of behavior IS the behavior of interest; e.g., friend's eval, advisor's eval, manager's eval, Chair's eval. Problems? Good subjective measures are "hard work," but…
Hardest & most common → the construct of interest isn't a specific behavior; e.g., social skills, preparation for the professorate, sales skill, contribution to the department. Problems? What is the construct & how do we represent it?

Kind #2 → Judgments, Sentiments & Scored Sentiments

Judgments → do have a correct answer (e.g., 2 + 2 = 4)
–the "behavior," "response," or "trace" must be scored (compared to the correct answer) to produce the variable/data
–scoring may be objective or subjective, depending on the item
Sentiments → do not have a correct answer (e.g., Like Psyc350?), or have a correct answer, but "we won't check" (e.g., age)
–the "behavior," "response," or "trace" is the variable/data
–scoring may be objective or subjective, depending on the item
Scored Sentiments → do not have a correct answer, but do have an "indicative answer" (e.g., Do you prefer to be alone?)
–the "behavior," "response," or "trace" must be scored (compared to the indicative answer) to produce the variable/data
–scoring may be objective or subjective, depending on the item
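As a minimal sketch of how these three kinds of items turn a response into a data value (the items, answer keys, and 0/1 scoring below are invented for illustration):

```python
# Judgment: the response is compared to a correct answer to produce the data value.
def score_judgment(response, correct_answer):
    return 1 if response == correct_answer else 0   # 1 = correct, 0 = incorrect

# Sentiment: the response itself is the data value; no answer key is consulted.
def score_sentiment(response):
    return response

# Scored sentiment: no correct answer, but an "indicative" answer; the response
# is compared to it to produce the data value.
def score_scored_sentiment(response, indicative_answer):
    return 1 if response == indicative_answer else 0  # 1 = trait-indicative

print(score_judgment(4, correct_answer=4))                     # 1
print(score_sentiment(27))                                     # 27 (e.g., reported age)
print(score_scored_sentiment("yes", indicative_answer="yes"))  # 1
```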

Using Judgments, Sentiments & Scored Sentiments

Judgments → do have a correct answer: Ability/Skill, Intelligence, Diagnostic Category, Aptitude
Sentiments → do not have a correct answer, or have a correct answer, but "we won't check": Demographics, Attitude/Opinion
Scored Sentiments → do not have a correct answer, but do have an "indicative answer": Personality, Diagnostic Category, Aptitude

Kind #3 → Direct Keying vs. Reverse Keying

We want respondents to carefully read and respond to each item of our scale/test. One thing we do is write the items so that some of them are "backwards" or "reversed"…
Consider these items from a depression measure, each rated on a 1-5 disagree-agree scale…
1. It is tough to get out of bed some mornings.
2. I'm generally happy about my life.
3. I sometimes just want to sit and cry.
4. Most of the time I have a smile on my face.
If the person is "depressed," we would expect them to give a fairly high rating for questions 1 & 3, but a low rating on 2 & 4. Before aggregating these items into a composite scale or test score, we would direct key items 1 & 3 (1=1, 2=2, 3=3, 4=4, 5=5) and reverse key items 2 & 4 (1=5, 2=4, 3=3, 4=2, 5=1).
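Here is a minimal sketch of that keying step for the four items above, assuming 1-5 ratings (the response values are invented):

```python
# Invented 1-5 ratings for the four depression items; items 2 & 4 are reverse keyed.
responses = {1: 5, 2: 2, 3: 4, 4: 1}
reverse_keyed = {2, 4}

def key_item(item, rating, scale_max=5):
    # Direct keying leaves the rating as-is (1=1, ..., 5=5);
    # reverse keying flips it (1=5, 2=4, 3=3, 4=2, 5=1).
    return scale_max + 1 - rating if item in reverse_keyed else rating

keyed = {item: key_item(item, rating) for item, rating in responses.items()}
composite = sum(keyed.values())
print(keyed)      # {1: 5, 2: 4, 3: 4, 4: 5}
print(composite)  # 18 -- all four items now "point" in the same direction
```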

Desirable Properties of Psychological Measures

Interpretability of Individual and Group Scores
Population Norms
Validity
Reliability
Standardization

Desirable Properties of Psychological Measures

Standardization – Administration & Scoring
Reliability – Inter-rater, Internal Consistency, Test-Retest & Alternate Forms
Validity – Face, Content, Criterion-Related, Construct
Population Norms – Scoring Distribution & Cutoffs
Interpretability of Individual & Group Scores

Standardization

Administration – the test is "given" the same way every time
–who administers the instrument
–specific instructions, order of items, timing, etc.
–varies greatly: multiple-choice classroom test → hand it out; MMPI → hand it out; WAIS → whole books & courses
Scoring – the test is "scored" the same way every time
–who scores the instrument
–correct, "partial," and incorrect answers, points awarded, etc.
–varies greatly: multiple-choice test → fill in the bubble sheet; MMPI → whole books & courses; WAIS → whole books & courses

We need to assess the inter-rater reliability of the scores from "subjective" items.

Have two or more raters score the same set of tests (usually 25-50% of the tests).
Assess the consistency of the scores, in different ways for different types of items:
–Quantitative items → correlation, intraclass correlation, RMSD
–Ordered & categorical items → % agreement, Cohen's Kappa
Keep in mind → what we really want is "rater validity"
–we don't really want raters to agree, we want them to be right!
–so it is best to compare raters with a "standard" rather than just with each other
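A minimal sketch of two of the indices named above, % agreement and Cohen's Kappa, for ordered/categorical codes; the two raters' codes are invented for illustration:

```python
from collections import Counter

# Invented diagnostic codes assigned by two raters to the same eight tests.
rater_a = ["dep", "anx", "dep", "none", "anx", "dep", "none", "dep"]
rater_b = ["dep", "anx", "none", "none", "anx", "dep", "dep", "dep"]
n = len(rater_a)

# % agreement: proportion of tests on which the raters assign the same code.
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: summed products of the raters' marginal proportions.
count_a, count_b = Counter(rater_a), Counter(rater_b)
expected = sum((count_a[c] / n) * (count_b[c] / n)
               for c in set(rater_a) | set(rater_b))

# Cohen's Kappa corrects % agreement for agreement expected by chance.
kappa = (observed - expected) / (1 - expected)
print(round(observed, 2), round(kappa, 2))  # 0.75 0.6
```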

Ways to improve inter-rater reliability…

Improved standardization of the measurement instrument
–do questions focus respondents' answers?
–will "single sentence" or other response limitations help?
Instruction in the elements of the standardization
–is complete explication possible? (borders on "objective")
–if not, we need "conceptual matches"
Practice with the instrument, with feedback
–"walk-through" with experienced coders
–practice with "common problems" or "historical challenges"
Experience with the instrument
–really no substitute
–have to worry about "drift" & "generational reinterpretation"
Use of the instrument with the intended population
–different populations can have different response tendencies