The measurement model: what does it mean and what you can do with it? Presented by Michael Nering, Ph. D.

Goals of this session: Present commonly used measurement models (IRT); show how these models form the backbone of any large-scale assessment program (equating, scaling); and discuss the meaning of the measurement models (ability estimation, item characteristics).

Now, really… This session is an introduction to the world of psychometrics. My goal is for you to understand that psychometrics is not a black box; it is really just a set of procedures.

A little about me: Yes, I’m a psychometrician. B.A. in psychology at Kent State; Ph.D. in psychology at the University of Minnesota; working at Measured Progress since 1999. Research areas of interest: IRT, equating, scaling, person fit, adaptive testing.

Why Psychology? Psychometricians typically come from educational measurement programs, psychometric programs, or I/O programs. Ultimately, we are all pursuing an understanding of people by way of “quantification.”

Psychometrics Defined: psychological measurement. Psycho + metrics: the business of measuring psychological “things.”

What are psychological things? Any “latent” trait: any “characteristic” that is not directly “observable.” Examples: depression, bipolar disorder, personality disorder; math, reading, writing, and science abilities. We don’t care which; let’s just call it $\theta$.

Counterparts to Psychometrics: econometrics (the measurement of economic things) and sociometrics (the measurement of social things).

All “metrics” are ultimately a blend of things: psychometrics is a blend of psychology, statistics, and math.

Quantification in Psychology: It has deep roots that originally came from philosophy. In the 1500s, philosophy branched into several disciplines because of the need to quantify certain things to better understand human beings.

Philosophy’s Many Branches: This desire to better understand humans led to two primary areas of study: physiology (in 1543, a Belgian physiologist practices the dissection of cadavers) and psychology (in 1524, Marko Marulić publishes The Psychology of Human Thought).

Yes, I did just use the word “cadaver”… but trust me, it’s okay.

The last 100 years of psychometrics: classical test theory and Spearman’s 1904 contribution (true score theory; reliability theory, p-values, and point-biserial coefficients), followed by item response theory.

Let’s talk about IRT: When I say “measurement model,” I really do mean some sort of IRT model. There have been lots of historical developments, including the Lord & Novick text of 1968. IRT has many advantages over CTT.

So, what is IRT? It is a family of mathematical models that describe the interaction between examinees and test items. Examinee performance can be predicted in terms of the underlying trait. IRT provides a means for estimating scores for people and characteristics of items, and a common framework for describing both.

The ogive: a naturally occurring form that describes something about people. It is used throughout science, engineering, and the social sciences, and also in architecture, carpentry, photography, art, and so forth.

The ogive

A little jargon: The item characteristic curve (ICC) is also called the item response function, the trace line, etc. Stochastic: 1) involving a random variable, or 2) involving chance or probability.

The ICC: Does this one little function really do everything? Scale items and people onto a common metric? Help in standard setting? Serve as the foundation of equating? Carry some meaning in terms of student ability?

Does this one little function really do everything? Let’s talk more about the ICC

The ICC: Any curve in a Cartesian system can be defined by a formula. The simplest formula for the ogive is the logistic function:

$$P_{ij} = \frac{1}{1 + e^{-(\theta_j - b_i)}}$$

The ICC: Here $b_i$ is the item parameter and $\theta_j$ is the person parameter. The function represents the probability of responding correctly to item $i$ given the ability of person $j$.
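The function on this slide can be sketched directly. A minimal sketch, assuming the simplest one-parameter logistic form with no scaling constant (some programs multiply the exponent by D = 1.7, which is not shown here); `icc_1pl` is a hypothetical helper name and b = 0.125 is an illustrative difficulty. It shows two properties of the ICC: at θ = b the probability is exactly 0.5, and the curve is steepest there (the inflection point).

```python
import math

def icc_1pl(theta, b):
    """One-parameter logistic ICC: P(correct | theta) for an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

b_i = 0.125  # illustrative item difficulty

# At theta == b_i the probability is exactly 0.5 (the inflection point).
print(icc_1pl(b_i, b_i))  # 0.5

# The curve is steepest at theta == b_i: compare central-difference slopes.
h = 1e-4
for theta in (b_i - 1.0, b_i, b_i + 1.0):
    slope = (icc_1pl(theta + h, b_i) - icc_1pl(theta - h, b_i)) / (2 * h)
    print(f"theta={theta:+.3f}  slope={slope:.4f}")
```

Because every ICC in this model has the same shape, an item is fully described by where its curve sits on the ability scale, which is why the single item parameter is called "difficulty."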

The item parameter $b_i$ is the inflection point of the ICC; for item $i$ shown here, $b_i = 0.125$.

We can now use the item parameter to calculate p. Let’s assume we have a student with an ability of $\theta = 1.0$, and we have our item parameter $b_i$. Then we can simply plug the numbers into our formula.

Using the item parameters to calculate p p =  i =1.00
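As a worked illustration, assume the student has $\theta_j = 1.0$ and the item is the one from the inflection-point slide, with $b_i = 0.125$ (an assumption, since the numbers on this slide did not survive), using the one-parameter logistic form:

```latex
$$
P_{ij} = \frac{1}{1 + e^{-(\theta_j - b_i)}}
       = \frac{1}{1 + e^{-(1.0 - 0.125)}}
       = \frac{1}{1 + e^{-0.875}}
       \approx \frac{1}{1.4169}
       \approx 0.706
$$
```

Under these assumptions, this student has roughly a 71% chance of answering the item correctly: ability is above the item's difficulty, so the probability is above 0.5.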

Wait a minute! What do you mean, a student with an ability of 1.0?? Does an ability of 0.0 mean that a student has NO ability? What if my student has a reading ability estimate of -1.2? What in the world does that mean????

The ability scale: Ability is on an arbitrary scale that just so happens to be centered around 0.0. We use arbitrary scales all the time: Fahrenheit, Celsius, decibels, the DJIA.

Scaled Scores: Although ability estimates are centered around zero, reported scores are not. Scaled scores are typically a linear transformation of ability estimates, for example: (Ability × Slope) + Intercept.

The need for scaled scores: about half the kids will have negative ability estimates.
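The linear transformation from the scaled-scores slide can be sketched as follows. The slope of 100 and intercept of 500 are hypothetical values chosen only for illustration; each program picks its own constants. `to_scaled_score` and `to_theta` are illustrative names. Note how the negative reading ability estimate of -1.2 from the earlier question maps to a positive reported score.

```python
def to_scaled_score(theta, slope, intercept):
    """Linear transformation of an ability estimate onto a reporting scale."""
    return theta * slope + intercept

def to_theta(scaled, slope, intercept):
    """Inverse transformation: recover the ability estimate from a scaled score."""
    return (scaled - intercept) / slope

# Hypothetical reporting-scale constants (not from any real program).
SLOPE, INTERCEPT = 100.0, 500.0

for theta in (-1.2, 0.0, 1.0):
    print(theta, round(to_scaled_score(theta, SLOPE, INTERCEPT), 1))
# -1.2 380.0
# 0.0 500.0
# 1.0 600.0
```

Because the transformation is linear with a positive slope, it preserves the ordering of students and can be undone exactly, so scaled scores carry the same information as the ability estimates, just without negative numbers.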

Scaled Scores

Use of scaled scores: student/parent-level reports, school/district reports, cross-year comparisons, and performance-level categorization.

There’s a lot here: Scaled scores are surface-level information. Behind the scenes, we use fancy formulas to depict the interaction between students and test items, and there’s a “probabilistic” relationship between students and test items.

Unfortunately, life can get a lot worse. Items vary from one another in a variety of ways: difficulty, discrimination, guessing, and item type (multiple-choice vs. constructed-response).

Items can vary in terms of difficulty. [Figure: ICCs of an easier item and a harder item, plotted against a student’s ability.]

Items can vary in terms of discrimination. Discrimination is reflected by the “pitch” of the ICC, so we allow the ICCs to vary in terms of their slope.

Good item discrimination: for two close ability levels, there is a noticeable difference in p.

Poor item discrimination: for the same two ability levels, the difference in p is smaller.
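The discrimination comparison above can be reproduced with the two-parameter logistic (2PL) ICC, which adds a slope parameter a to the one-parameter form. The model choice, the a values of 2.0 and 0.5, the difficulty of 0.0, and the two ability levels 0.0 and 0.5 are all assumptions for illustration; `icc_2pl` is a hypothetical helper name.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic ICC: a is discrimination (slope), b is difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

theta_low, theta_high = 0.0, 0.5  # two close ability levels (hypothetical)

for label, a in (("good discrimination", 2.0), ("poor discrimination", 0.5)):
    gap = icc_2pl(theta_high, a, 0.0) - icc_2pl(theta_low, a, 0.0)
    print(f"{label}: difference in p = {gap:.3f}")
# good discrimination: difference in p = 0.231
# poor discrimination: difference in p = 0.062
```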

Guessing: This item’s ICC approaches a lower asymptote of 0.25 (for example, the chance rate for random guessing among four options).
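A lower asymptote like the one on this slide is usually modeled with the three-parameter logistic (3PL) ICC, which adds a pseudo-guessing parameter c. This sketch assumes a hypothetical item with a = 1.2, b = 0.0, and c = 0.25, omits the D = 1.7 scaling constant, and uses `icc_3pl` as an illustrative name.

```python
import math

def icc_3pl(theta, a, b, c):
    """3PL ICC: c is the lower asymptote, a the discrimination, b the difficulty."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item parameters.
A, B, C = 1.2, 0.0, 0.25

# As ability falls, the probability flattens out near c instead of reaching 0.
for theta in (-6.0, -3.0, 0.0, 3.0):
    print(theta, round(icc_3pl(theta, A, B, C), 3))
```

One consequence worth noticing: with a guessing parameter, the probability at θ = b is c + (1 - c)/2 = 0.625 for this item, not 0.5, so b is still the inflection point but no longer the "50% point."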

Polytomous Items

I’m sure by now you might be having a couple of thoughts: “How can I get up, open the door, and walk out without anybody noticing?” “I’m stuck in a ‘psycho’metric prison… help me!”

But, trust me … I’m really trying to make a simple point

Items and people interact in a variety of ways. We can use IRT to show that there exists a nice little S-shaped curve that describes this interaction: as ability increases, the probability of a correct response increases.

Advantages of IRT: Because of the stochastic nature of IRT, there are many statistical principles we can take advantage of. A test is a sum of its parts.

The test characteristic curve: A test is made up of many items, and the TCC can be used to summarize across all of them. The TCC is simply the summation of the ICCs along our ability continuum. For any ability level, we can use the TCC to estimate the overall (expected) test score for an examinee.
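Summing ICCs is easy to sketch. This assumes the one-parameter logistic ICC and a hypothetical five-item test; `tcc` and `ITEM_DIFFICULTIES` are illustrative names. Each ICC gives the probability of a correct response, so their sum is the expected number-correct score at that ability.

```python
import math

def icc_1pl(theta, b):
    """One-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def tcc(theta, difficulties):
    """Expected total (number-correct) score at ability theta: the sum of the ICCs."""
    return sum(icc_1pl(theta, b) for b in difficulties)

# Hypothetical five-item test.
ITEM_DIFFICULTIES = [-1.5, -0.5, 0.0, 0.5, 1.5]

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(tcc(theta, ITEM_DIFFICULTIES), 2))
# At theta = 0 the expected score is 2.5, because these difficulties are symmetric around 0.
```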

A bunch of ICCs are on a test

The test characteristic curve

The test characteristic curve: From an observed test score (i.e., a student’s total test score), we can estimate ability. The TCC is used in standard setting to establish performance levels. The TCC can also be used to equate tests from one year to the next.

Estimating Ability: a total score of 3 corresponds to an ability of approximately 0.175.
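Going from a total score back to ability means reading the TCC in reverse. One way to sketch it, assuming a one-parameter logistic model and hypothetical items, is to invert the TCC numerically; bisection works because the TCC is strictly increasing in θ. With made-up items this will not reproduce the slide's 0.175, and operational programs may instead use maximum likelihood or Bayesian ability estimates. `theta_for_total_score` is an illustrative name.

```python
import math

def tcc(theta, difficulties):
    """Expected number-correct score under the 1PL model (sum of ICCs)."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

def theta_for_total_score(score, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """Find theta where the TCC equals an observed total score, by bisection.

    Zero and perfect scores have no finite solution, so they are rejected.
    """
    if not 0 < score < len(difficulties):
        raise ValueError("score must be strictly between 0 and the number of items")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, difficulties) < score:
            lo = mid  # expected score too low: the answer lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0

ITEM_DIFFICULTIES = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical items
print(round(theta_for_total_score(3, ITEM_DIFFICULTIES), 3))
```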

Standard Setting: performance levels of Advanced, Proficient, Basic, and Below Basic (Failing).
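Once cut scores have been set, performance-level categorization is a simple lookup. This sketch assumes hypothetical cut scores on the ability (θ) scale and uses four level labels read from this slide; real cut scores come out of the standard-setting process itself, often on the total-score scale and mapped through the TCC. `performance_level`, `CUTS`, and `LEVELS` are illustrative names.

```python
import bisect

# Hypothetical theta cut scores separating four performance levels, lowest first.
CUTS = [-1.0, 0.0, 1.0]
LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]

def performance_level(theta, cuts=CUTS, levels=LEVELS):
    """Return the performance level whose interval contains theta.

    A student exactly at a cut score is placed in the higher level.
    """
    return levels[bisect.bisect_right(cuts, theta)]

for theta in (-1.5, -0.5, 0.0, 1.2):
    print(theta, performance_level(theta))
```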

Equating Year 1 TCC

Equating Year 2 TCC & Scale

Equating: Our 2nd scale goes away, and our TCCs are closer together.

Equating: The remaining differences are due to non-common items.

Equating: The adjustment to the TCC can be done in a variety of ways. Let’s take a look at one commonly used method of equating, namely the mean shift method.

Mean shift method of equating

These items are common between the two years

Mean shift method of equating 

Mean shift method of equating  

Mean shift method of equating: The difference between $\bar{b}_1$ and $\bar{b}_2$, the mean difficulties of the common items in Year 1 and Year 2, is our “scaling constant.” It is used to adjust all the items administered in Year 2 so that they are on the same scale as Year 1.

Example: The Year 1 common-item mean is $\bar{b}_1 = 0.20$ and the Year 2 mean is $\bar{b}_2 = -0.10$. We need to add 0.30 ($\bar{b}_1 - \bar{b}_2 = 0.20 + 0.10 = 0.30$) in order for the equating items to have the same mean. This 0.30 difference is due to an arbitrary scaling difference and NOT due to any differences in ability.

Mean shift method of equating: 0.30 is then added to all the item difficulty values.

Mean shift method of equating: By shifting all our item difficulties to last year’s scale, we are ultimately putting this year’s TCC onto last year’s scale.
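The mean shift steps can be sketched in a few lines. The common-item difficulties below are hypothetical, chosen only so that their means match the example (0.20 in Year 1, -0.10 in Year 2); the non-common Year 2 items are also made up. `mean_shift_constant` and `shift_to_year1_scale` are illustrative names.

```python
def mean_shift_constant(year1_common, year2_common):
    """Scaling constant: mean common-item difficulty in Year 1 minus that in Year 2."""
    return sum(year1_common) / len(year1_common) - sum(year2_common) / len(year2_common)

def shift_to_year1_scale(year2_difficulties, constant):
    """Add the constant to every Year 2 item difficulty, common and non-common."""
    return [b + constant for b in year2_difficulties]

# Hypothetical common-item difficulties (means 0.20 and -0.10, as in the example).
year1_common = [-0.5, 0.2, 0.9]
year2_common = [-0.8, -0.1, 0.6]

constant = mean_shift_constant(year1_common, year2_common)
print(round(constant, 2))  # 0.3

# Year 2 common items followed by two hypothetical non-common items.
year2_all = [-0.8, -0.1, 0.6, 1.1, -1.4]
print([round(b, 2) for b in shift_to_year1_scale(year2_all, constant)])
```

Only the common items determine the constant, but it is applied to every Year 2 item; that is what moves the whole Year 2 TCC onto the Year 1 scale.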

Equating: The example we just saw is merely one of several equating methods that are available (Kolen & Brennan).

What have we learned? IRT: used to model the interaction between items and people. Item characteristics: items vary in terms of difficulty, discrimination, guessing, etc. Equating: used to relate tests from one year to the next. Scaling: used to represent student ability.

The Assessment Cycle: administration, ICCs and TCCs, equating, ability estimates and scaling, and reporting.

So, how is this all done?

Psychometricians often play the role of the magical wizard of assessments

But, really: This session has served as your training in psychometric methods. For career opportunities, please send along a copy of your vita to: Measured Progress, Attn: Psychometric Department, 171 Watson Road, Dover, NH 03820.