An Introduction to Item Response Theory - IRT

Dalton Andrade (UFSC) Héliton Tavares (UFPA) Adriano Borgatto (UFSC)

Overview
Main ideas, concepts and applications of Item Response Theory (IRT) in different areas
Session 1:
  Main ideas and concepts of IRT
  Unidimensional models for dichotomous and polytomous items
  Estimation methods
  Construction of latent trait scales
Session 2:
  Equating methods
  Differential item functioning (DIF)
  Computerized adaptive testing (CAT)
  Applications of IRT in many different areas

Why do we need to get measurements?
We have been measured since the first day of our lives!

Temperature (degrees): Celsius / Fahrenheit, C = (F - 32)/1.8
Body weight (mass): kilogram / pound, 1 kg ≈ 2.205 lb
Body height: meter / foot, 1 m ≈ 3.281 ft
Blood pressure: millimeters of mercury (mm Hg)
etc.

What about these characteristics?
Satisfaction
Depression (Psychiatry)
Quality of life
Math proficiency/ability (Education)
Use of statistics in the workplace
Web usability (e-commerce)
Diagnostic reasoning (Nursing)
Resistance to change
Organizational environmental performance
and so on ...

They are all examples of what we call a “latent trait”: characteristics that cannot be measured or observed directly. We need to build scales/metrics to measure them.

How do we measure a latent trait?
To measure it, we need:
  A measurement instrument: questionnaire, test, ...
  A scale/metric

Motivation Is it possible to estimate the body height of a person?
INSTRUCTIONS: Please read the questions below about yourself and answer 1 for “YES” and 0 for “NO”, filling in the green line or using the NO/YES options.
1. In bed, I often suffer from cold feet
2. When walking down the stairs, I take two steps at a time
3. I think I would do well on a basketball team
4. As a police officer, I would not make much of an impression
5. In most cars, I sit uncomfortably
6. I literally look up to most of my friends
7. I am able to pick up an object from the top of a cabinet without using a ladder

(cont.)
8. I bump my head quite often
9. I can store luggage in the overhead rack of a plane or bus
10. I usually set the car seat back
11. When I get a ride, I am usually offered the front seat
12. For school pictures, I was always asked to stand in the last row
13. I have trouble fitting into a bus seat
14. Among a group of friends, I would be the one chosen to change light bulbs

“Playing” with body height(*)
[Figure: body height scale marked at 1.50 m, 1.55 m, 1.60 m, 1.65 m, 1.70 m, 1.75 m, 1.80 m and 1.85 m]
(*) Many thanks to Prof. C. A. W. Glas, University of Twente, Netherlands (ABE, SINAPE 2006).

Positioning respondents and items on the same scale

[Figure: respondents Dalton, Adriano and Héliton and items 9, 7, 10, 2 and 4 positioned on the same scale, shown first with ticks from -4 to 3 and then on a transformed scale with ticks from 60 to 130]

Concepts and Constructs
Concept: “An abstraction formed by generalization from particulars”
  Abstractions are hard to define, e.g. intelligence
Construct: a concept with a scientific purpose (i.e. operationalized)
  Can be measured and studied, e.g. IQ
(Source: Psy, Cal State Northridge)

What is item analysis in general?
Item analysis provides a way of measuring the quality of questions - seeing how appropriate they were for the respondents and how well they measured their ability/trait. It also provides a way of re-using items over and over again in different tests with prior knowledge of how they are going to perform; creating a population of questions with known properties (e.g. test bank)

Item Analysis
  Classical Test Theory
  Latent Trait Models
    Item Response Theory: 1PL, 2PL, 3/4/5PL, Grad, Nom, Mult, Ass, Unfold
    Rasch Models ...
  Item Response Theory and Rasch models: similar, but different concepts

Types of items
Dichotomous (body height): “In bed, I often suffer from cold feet”: Yes/No
Polytomous ordinal (memory): “Do you forget to give messages?”: Never, Rarely, Sometimes, Frequently, Always
Polytomous nominal (math): “Multiple choice item with five categories A, B, C, D, E”: usually treated as Right/Wrong
...
One can have more than one type of item in the same questionnaire

Classical Test Theory CTT

Classical test Theory (CTT)
Classical Test Theory (CTT) is often called the “true score model”
It is called classical relative to Item Response Theory (IRT), which is a more modern approach (analog vs. digital)
CTT describes a set of psychometric procedures used to assess the reliability, difficulty, discrimination, etc. of items and scales

Classical Test Theory vs. Latent Trait Models
Classical analysis has the test (not the item) as its basis. Although the statistics generated are often generalised to similar students taking a similar test, they only really apply to those students taking that test
Latent trait models aim to look beyond that, at the underlying traits which are producing the test performance. They are measured at item level and provide sample-free measurement

Classical test Theory (CTT)
CTT assumes that every person has a true score on an item or a scale, which we would observe if we could measure it directly without error
CTT analyses assume that a person’s test score is composed of their “true” score plus some measurement error: X = T + E
This is the common true score model

CTT: Internal Consistency Reliability
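Coefficient alpha (Cronbach’s) can also be defined as:
$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} S_i^2}{S_{Total}^2}\right)$
where $S_{Total}^2$ is the composite variance (the variance of the summed items), $S_i^2$ is the variance of item i, i = 1, …, k, and k is the number of items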

Standard Error of Measurement
The standard error of measurement is the error associated with trying to estimate a true score from a specific test
This error can come from many sources
We can calculate its size by:
$S_{meas} = S\sqrt{1-\alpha}$
where S is the standard deviation of the test scores and $\alpha$ is the reliability
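As a rough illustration of coefficient alpha and the standard error of measurement, here is a minimal Python sketch. The function names and the toy 0/1 response matrix are hypothetical, and the use of sample variances (ddof=1) is an assumption, since the slides do not say which variance estimator is intended.

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for a respondents x items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_var = scores.var(axis=0, ddof=1)        # S_i^2 for each item
    total_var = scores.sum(axis=1).var(ddof=1)   # S_Total^2 of the summed score
    return (k / (k - 1)) * (1.0 - item_var.sum() / total_var)

def standard_error_of_measurement(scores):
    """S_meas = S * sqrt(1 - alpha), with S the SD of the summed score."""
    scores = np.asarray(scores, dtype=float)
    s_total = scores.sum(axis=1).std(ddof=1)
    return s_total * np.sqrt(1.0 - cronbach_alpha(scores))

# Hypothetical data: 5 respondents answering 4 dichotomous items.
toy = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
])
alpha = cronbach_alpha(toy)
sem = standard_error_of_measurement(toy)
print(round(alpha, 3), round(sem, 3))
```

With real data the same two calls apply unchanged; only the `toy` matrix needs to be replaced by the observed item scores.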

Graphical Information
Item summary (CTT parameters)
Key: B    N: 20438    Difficulty: Average    Discrimination: Very Good

Responses      A        B (key)   C        D
% Total        24.10    53.30     16.10     6.50
% Group 1      28.20    31.10     28.10    12.60
% Group 2      29.10    47.50     17.40     6.00
% Group 3      17.10    75.50      5.50     1.90
Rbis           -0.10     0.40     -0.35    -0.34

Item Response Theory IRT

Item Response Theory (IRT)
Item Response Theory (IRT) refers to a family of latent trait models used to establish the psychometric properties of items and scales
It is sometimes referred to as modern psychometrics because, in large-scale educational assessment, testing programs and professional testing firms, IRT has almost completely replaced CTT as the method of choice
IRT has many advantages over CTT that have brought it into more frequent use
IRT is a set of probabilistic models that describe the relationship between a respondent’s magnitude on a construct (a.k.a. latent trait; e.g., extraversion, cognitive ability, affective commitment) and his or her probability of a particular response to an individual item

Some other advantages
Provides more information than classical test theory
Classical test statistics depend on the set of items and the sample examined
IRT item parameters do not depend on the sample examined (within a linear transformation)
Can examine item bias / measurement equivalence and provide conditional standard errors of measurement

Three Basic Components of IRT
Item Response Function (IRF): mathematical function that relates the latent trait to the probability of endorsing an item. IRFs can be plotted as Item Characteristic Curves (ICC), graphs of the probability of endorsing the item as a function of the respondent’s ability
Item Information Function: an indication of item quality; an item’s ability to differentiate among respondents
Invariance: position on the latent trait can be estimated from any items with known IRFs, and item characteristics are population independent within a linear transformation

Item Response Theory IRT

Item Response Theory Models: they depend on the item type
Items scored as right/wrong, Yes/No, etc.
Logistic model (one-dimensional trait) with 1, 2 or 3 parameters

3 Parameters Logistic model (3PL)
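$P(U = 1 \mid \theta, a, b, c) = c + (1 - c)\,\frac{1}{1 + e^{-a(\theta - b)}}$
[Figure: 3PL item characteristic curve with the parameters a, b and c marked; plot labels a = 1.7, b = 0, c = 0]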
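The 3PL item response function can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `irf_3pl` is hypothetical, and the logistic form is written without the scaling constant D = 1.7 that some texts include, since the slide residue does not show which convention is used.

```python
import numpy as np

def irf_3pl(theta, a, b, c=0.0):
    """Probability of a correct/endorsed response under the 3PL model.

    a: discrimination, b: location (difficulty), c: lower asymptote (guessing).
    Setting c = 0 gives the 2PL model; additionally fixing a gives the 1PL.
    """
    theta = np.asarray(theta, dtype=float)
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item: a = 1.2, b = 0.0, c = 0.2 (e.g. a five-option item).
thetas = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
probs = irf_3pl(thetas, a=1.2, b=0.0, c=0.2)
print(np.round(probs, 3))
```

Note that with c > 0 the probability at theta = b is (1 + c)/2 rather than 0.5.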

IRF – Item Parameters Location (b)
An item’s location is defined as the amount of the latent trait needed to have a .5 probability of endorsing the item (in models without guessing; with a guessing parameter c, the probability at b is (1 + c)/2)
The higher the “b” parameter, the higher on the trait a respondent needs to be in order to endorse the item
Analogous to difficulty in CTT
Like z scores, the values of b typically range from -3 to +3 when the (0,1) scale is used (mean 0, standard deviation 1)

IRF – Item Parameters Discrimination (a)
Indicates the steepness of the IRF at the item’s location
An item’s discrimination indicates how strongly related the item is to the latent trait, like loadings in a factor analysis
Items with high discrimination are better at differentiating respondents around the location point; small changes in the latent trait lead to large changes in probability
Vice versa for items with low discrimination

IRF – Item Parameters Guessing (c)
The inclusion of a “c” parameter suggests that respondents very low on the trait may still choose the correct answer. In other words respondents with low trait levels may still have a small probability of endorsing an item This is mostly used with multiple choice testing…and the value should not vary excessively from the reciprocal of the number of choices.

IRF – Item Parameters Upper asymptote (d)
The inclusion of a “d” parameter suggests that respondents very high on the latent trait are not guaranteed (i.e. have a probability of less than 1) to endorse the item
Often used for an item that is difficult to endorse (e.g. suicidal ideation as an indicator of depression)
Not used in most cases

IRF – Item Parameters Asymmetry (f)
The inclusion of an “f” parameter suggests that the item’s IRF may not be symmetrical around the “b” parameter
Not used in most cases

Effect of the “a” parameter
Small “a,” poor discrimination

Effect of the “a” parameter
Larger “a,” better discrimination

Effect of the “b” parameter
Low “b,” “easy item”

Effect of the “b” parameter
Higher “b,” more difficult item; “b” is inversely related to the CTT p (proportion endorsing the item)

Effect of the “c” parameter
c=0, asymptote at zero

Effect of the “c” parameter
c > 0: “low ability” respondents may still endorse the correct response

ICC from real data

More calibrated items Some items have problems and should be revised.

ICCs from body height application

Item Response Theory Some other models

IRT: Nominal Response Model (NRM)
Introduced by Bock (1972)
Polytomous responses in the NRM are unordered
It considers all response categories (h = 1, ..., m_i)
The a_is and b_is are interpreted as in the logistic model

IRT: Graded Response Model (GRM)
Samejima (1969, 1972, 1995)
Likert-scale items (strongly disagree, disagree, neutral, agree, and strongly agree)
GRM considers ordered categories (h = 1, ..., m_i)

Item Response Models
Partial Credit Model (PCM): GRM with a_i = 1
Generalized Partial Credit Model (GPCM)
Rating Scale Model (RSM): Andrich (1978a, 1978b); GRM with b_is = b_i - d_s
Unfolding models (Roberts, 2000): non-cumulative latent traits (attitude, behavior, ...)
Multidimensional models (Reckase): Compensatory Three-Parameter Logistic Model (MC3PLM) ...
These models can be used for one-group or multiple-group analysis

Item Response Theory Item and Test Information Function

IRT – Item Information Function Statistical FISHER Information
Each IRF can be transformed into an item information function (IIF), which gives the precision an item provides at each level of the latent trait.
The information is an index representing the item’s ability to differentiate among individuals.
The variance of the latent trait estimate is the reciprocal of the information, so the standard error of measurement is 1/sqrt(information); more information means less error.
Measurement error is expressed on the same metric as the latent trait level, so it can be used to build confidence intervals.

IRT – Item Information Function
Difficulty parameter - the location of the highest information point Discrimination - height of the information. Large discriminations - tall and narrow IIFs; high precision/narrow range Low discrimination - short and wide IIFs; low precision/broad range.

IRT – Test Information Function
Test Information Function (TIF) – The IIFs are also additive so that we can judge the test as a whole and see at which part of the trait range it is working the best.
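The additivity of item information can be sketched as follows. This is a hedged illustration, not the slides' own computation: the function names and the three item parameter triples are hypothetical, and the 3PL information formula used is the standard one, I(θ) = a² (Q/P) ((P − c)/(1 − c))², which reduces to a² P Q when c = 0.

```python
import numpy as np

def irf_3pl(theta, a, b, c=0.0):
    theta = np.asarray(theta, dtype=float)
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b, c=0.0):
    """Fisher information of one 3PL item at trait level theta."""
    p = irf_3pl(theta, a, b, c)
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def total_information(theta, items):
    """Test information: sum of the item information functions."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)

# Hypothetical item bank: (a, b, c) triples.
items = [(1.0, -1.0, 0.0), (2.0, 0.0, 0.0), (1.5, 1.0, 0.2)]
grid = np.linspace(-4.0, 4.0, 801)
tif = total_information(grid, items)
se = 1.0 / np.sqrt(tif)   # standard error of measurement at each theta
print(grid[np.argmax(tif)], se.min())
```

Plotting `tif` against `grid` shows where on the trait range the test measures best, and `se` gives the corresponding conditional standard error.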

Item Response Theory Important Assumptions: 1. Invariance 2. Dimensionality

IRT - Invariance Invariance - IRT model parameters have an invariance property Examinee trait level estimates do not depend on which items are administered, and in turn, item parameters do not depend on a particular sample of examinees (within a linear transformation). Invariance allows researchers to: efficiently “link” different scales that measure the same construct, compare examinees even if they responded to different items, and implement computerized adaptive testing.

IRT - Dimensionality The models presented make a common assumption of unidimensionality Hattie (1985) reviewed 30 techniques Some propose the ratio of the 1st eigenvalue to the 2nd eigenvalue (Lord, 1980)

PAF and scree plots If the data are dichotomous, factor analyze tetrachoric correlations Assume continuum underlies item responses Dominant first factor

Item Response Theory Estimation: 1. Items (Calibration): Marginal Maximum Likelihood 2. Conditional Ability Distribution

Estimating equations for Item Parameters
Numerical process: EM Algorithm

Estimating equation for ability
Based on the ability distribution, conditional on the response vector
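One common way to use the ability distribution conditional on the response vector is the expected a posteriori (EAP) estimate: the mean of the posterior over a grid of quadrature points. The sketch below assumes known 3PL item parameters, a standard normal prior, and an equally spaced grid on [-4, 4]; the function name and these settings are illustrative assumptions, not the estimating equation from the slides.

```python
import numpy as np

def eap_ability(responses, a, b, c, n_nodes=81):
    """EAP ability estimate and posterior SD for one response vector (0/1)."""
    u = np.asarray(responses)
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    nodes = np.linspace(-4.0, 4.0, n_nodes)       # quadrature points
    prior = np.exp(-0.5 * nodes**2)               # N(0, 1) prior, unnormalized
    # p[q, i]: probability of a correct response to item i at node q
    p = c + (1.0 - c) / (1.0 + np.exp(-a * (nodes[:, None] - b)))
    likelihood = np.prod(np.where(u == 1, p, 1.0 - p), axis=1)
    posterior = likelihood * prior
    posterior /= posterior.sum()
    mean = float(np.sum(nodes * posterior))
    sd = float(np.sqrt(np.sum((nodes - mean) ** 2 * posterior)))
    return mean, sd

# Hypothetical 3-item test with 2PL items (c = 0).
a, b, c = [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]
theta_all_right, _ = eap_ability([1, 1, 1], a, b, c)
theta_all_wrong, _ = eap_ability([0, 0, 0], a, b, c)
print(round(theta_all_right, 3), round(theta_all_wrong, 3))
```

Because the prior pulls estimates toward 0, all-correct and all-incorrect patterns still receive finite estimates, which is one practical reason EAP is often preferred over maximum likelihood for scoring.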

Example 1 SARESP 2007: Portuguese(LP)
3rd grade of high school: Prova-POR-3EM-Manha.pdf
30 multiple choice items
1,001 students (small sample just for presentation)
Data: 3EM_Manha.DAT file
Software: BilogMG, 3EM_Manha.BLM (syntax)
Results: 3EM_Manha.PH1, 3EM_Manha.PH2, 3EM_Manha.PH3, 3EM_Manha.PAR, 3EM_Manha.SCO

Example 2 Lifestyle Questionnaire: next page
15 polytomous ordinal items with four categories: No, Sometimes, Almost always, Always
580 respondents
Data: Estilo.dat file
Software: Multilog, Estilo.MLG (syntax)
Results: Estilo.OUT

Lifestyle Questionnaire: items, dimensions and descriptions
Nutrition
1. Your daily diet includes at least 5 servings of fruits and vegetables.
2. You avoid eating fatty foods (fatty meats, fried foods) and sweets.
3. You eat 5 varied meals a day, including a full breakfast.
Physical Activity
4. You do at least 30 minutes of moderate/vigorous activity, continuously or accumulated, on 5 or more days a week.
5. At least twice a week you do exercises that involve strength and muscle stretching.
6. In your daily routine, you walk or cycle as a means of transport and, preferably, use the stairs instead of the elevator.
Preventive Behavior
7. You know your blood pressure and cholesterol levels and try to keep them under control.
8. You do not smoke and do not drink alcohol (or drink in moderation).
9. You obey traffic rules (as a pedestrian, cyclist or driver); if you drive, you always wear a seat belt and never drink alcohol.
Social Relationships
10. You try to cultivate friendships and are satisfied with your relationships.
11. Your leisure includes meeting friends, group sports activities, and participation in associations or social organizations.
12. You try to be active in your community and feel useful in your social environment.
Stress Management
13. You set aside time (at least 5 minutes) every day to relax.
14. You can hold a discussion without getting upset, even when contradicted.
15. You balance the time devoted to work with the time devoted to leisure.

Is it possible to estimate the body height of a person?
Example 3: Body Height
The same 14 yes/no items listed in the Motivation section, with the same instructions (answer 1 for “YES” and 0 for “NO”).
[Form: responses to items 1 to 14 (NO/YES), with the resulting score and height in cm; the example shown reads 184 cm (6'0").]

Item Response Theory Building the Ability scale 1. Positioning items 2. Interpretation

Positioning Items Definition of anchoring items:
Consider two consecutive levels Y and Z, with Y < Z
We say that an item is an anchor item at level Z if and only if
a) $P(X = 1 \mid \theta = Z) \geq 0.65$
b) $P(X = 1 \mid \theta = Y) < 0.50$
c) $P(X = 1 \mid \theta = Z) - P(X = 1 \mid \theta = Y) \geq 0.30$
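The three anchoring conditions translate directly into a small check. The sketch below is an assumption-laden illustration: it uses a 3PL response function without the D = 1.7 constant, hypothetical item parameters, and a hypothetical helper `anchor_level` that returns the first level Z at which the item anchors.

```python
import numpy as np

def irf_3pl(theta, a, b, c=0.0):
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def is_anchor(p_at_y, p_at_z):
    """Conditions a), b) and c) for two consecutive levels Y < Z."""
    return p_at_z >= 0.65 and p_at_y < 0.50 and (p_at_z - p_at_y) >= 0.30

def anchor_level(a, b, c, levels):
    """Return the first level Z at which the item is an anchor, else None."""
    for y, z in zip(levels[:-1], levels[1:]):
        if is_anchor(irf_3pl(y, a, b, c), irf_3pl(z, a, b, c)):
            return z
    return None

levels = [-2.0, -1.0, 0.0, 1.0, 2.0]          # hypothetical scale levels
print(anchor_level(2.0, 0.5, 0.0, levels))    # discriminating item
print(anchor_level(0.5, 0.5, 0.0, levels))    # flat item: may anchor nowhere
```

Items with low discrimination often fail condition c), so they do not help describe any level of the scale.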

Positioning Items Back to Example 1: Portuguese, 30 items
PositioningItems.xlsx Interpretation of the scale

Test Equating Participants that have taken different tests measuring the same construct, can be placed on the same scale and compared or scored equivalently Equating across grades on math ability Equating across years for placement or admissions tests

Test Equating Example 3: Evening students
3rd grade of high school: Prova-POR-3EM-Noite.pdf
30 multiple choice items
1,001 students (small sample just for presentation)
Data: 3EM_Noite.DAT file
Software: BilogMG, 3EM_Noite.BLM (syntax)
Results: 3EM_Noite.PH1, 3EM_Noite.PH2, 3EM_Noite.PH3, 3EM_Noite.PAR, 3EM_Noite.SCO

Test Equating
Five common items between the two tests: items 15 to 19 in both tests
Invariance principle:
$P(U = 1 \mid \theta, a, b, c) = P(U = 1 \mid \theta^*, a^*, b^*, c^*)$
with $\theta^* = \lambda\theta + \beta$, $b^* = \lambda b + \beta$, $a^* = a/\lambda$ and $c^* = c$
Equal_24_07_Posteriori_ManhaNoite.xls
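To make the invariance principle concrete, the sketch below estimates λ and β from the difficulty parameters of common items and checks that the transformed parameters reproduce the same response probabilities. The mean/sigma method used here is one standard choice and is an assumption; the slides do not state which linking method was applied, and the parameter values are made up for illustration.

```python
import numpy as np

def irf_3pl(theta, a, b, c=0.0):
    theta = np.asarray(theta, dtype=float)
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def mean_sigma_link(b_source, b_target):
    """Mean/sigma linking constants from the b's of the common items."""
    lam = np.std(b_target, ddof=1) / np.std(b_source, ddof=1)
    beta = np.mean(b_target) - lam * np.mean(b_source)
    return lam, beta

def transform_item(a, b, c, lam, beta):
    """Move item parameters to the target scale: a* = a/lam, b* = lam*b + beta."""
    return a / lam, lam * b + beta, c

# Hypothetical difficulties of five common items on the two scales.
b_source = np.array([-1.0, 0.0, 1.0, 0.5, 2.0])
b_target = 1.2 * b_source + 0.3
lam, beta = mean_sigma_link(b_source, b_target)

theta = np.linspace(-3.0, 3.0, 13)
p_old = irf_3pl(theta, 1.5, 0.4, 0.2)
p_new = irf_3pl(lam * theta + beta, *transform_item(1.5, 0.4, 0.2, lam, beta))
print(round(lam, 3), round(beta, 3), np.allclose(p_old, p_new))
```

With estimated rather than exact parameters the common items will not line up perfectly, so λ and β are only estimates of the true scale transformation.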

Test Equating Multiple group equating: k = 1, 2, …., K
Group k: mean $\mu_k$ and variance $\sigma_k^2$
For the reference group R, we set $\mu_R = 0$ and $\sigma_R^2 = 1$

Test Equating Example 4: Example 1 + Example 3
K = 2 and R = 1, so $\mu_1 = 0$ and $\sigma_1^2 = 1$
55 (= 30 + 30 - 5) multiple choice items
2,002 students
Data: 3EM_Equat_MxE.DAT file
Software: BilogMG, 3EM_Equat_MxE.BLM (syntax)
Results: 3EM_Equat_MxE.PH1, 3EM_Equat_MxE.PH2, 3EM_Equat_MxE.PH3, 3EM_Equat_MxE.PAR, 3EM_Equat_MxE.SCO

Test Equating Example 5: National Basic Education Assessment System (SAEB)
5th and 9th grades (Fundamental) and 3rd grade (High school)
Every two years (odd years)
In each grade, the number of items needed is much larger than one student can answer

How many items do we need to cover a matrix?
One example (SARESP): 13 bundles with 8 items each (104 items). Every examinee takes just 3 bundles.
Items by bundle (one row per bundle):
  1   2   3   4   5   6   7   8
 10  79  52  81  68  16  44  22
 93  27 100  76  53  91  87  34
 29  84  45  62  14  61  26  43
 55  28  72  80  97  15  94  67
 64  24  63  78  99  20  11  75
 51  85  69  60 103  48  59  25
 30  56  35  92  21  47  13  33
 83  57  71  42   9  74 102  32
 86  41  98  89  95  12  39  70
 49 101  18  82  77  37  23  65
 58  38  66  19  96 104  31  88
 36  40  90  73  46  54  50  17

BIB (Balanced Incomplete Block) design:
Total: 26 booklets, each with 3 of the 13 bundles.
Booklet   Bundles        Booklet   Bundles
 1         1   2   3     14         1   2   5
 2         2   3   4     15         2   3   6
 3         3   4   5     16         3   4   7
 4         4   5   6     17         4   5   8
 5         5   6   7     18         5   6   9
 6         6   7   8     19         6   7  10
 7         7   8   9     20         7   8  11
 8         8   9  10     21         8   9  12
 9         9  10  11     22         9  10  13
10        10  11  12     23        10  11   1
11        11  12  13     24        11  12   2
12        12  13   1     25        12  13   3
13        13   1   2     26        13   1   4
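The booklet table follows a cyclic pattern: booklets 1 to 13 take bundles (k, k+1, k+2) and booklets 14 to 26 take bundles (k, k+1, k+4), wrapping around after bundle 13. The sketch below regenerates the table from that pattern and counts how often each bundle appears overall and in each position. The function name and the offsets are read off the table and are illustrative; the sketch makes no claim about pairwise balance of the design.

```python
from collections import Counter

def cyclic_booklets(n_bundles=13, offsets=((0, 1, 2), (0, 1, 4))):
    """Build booklets by shifting each offset pattern around the bundles."""
    booklets = []
    for pattern in offsets:
        for start in range(n_bundles):
            booklets.append(tuple((start + o) % n_bundles + 1 for o in pattern))
    return booklets

booklets = cyclic_booklets()

# How many booklets contain each bundle, overall and by position.
overall = Counter(bundle for booklet in booklets for bundle in booklet)
by_position = [Counter(booklet[pos] for booklet in booklets) for pos in range(3)]

for number, booklet in enumerate(booklets, start=1):
    print(number, booklet)
print(dict(overall))
```

In this layout every bundle appears in 6 booklets, twice in each of the three positions, which spreads position effects evenly across bundles.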

Test Equating Example 5: National Basic Education Assessment System - SAEB Common items between grades Common items between years Multiple groups model Items already calibrated and new items

SAEB - LP

SAEB - MT

Differential Item Functioning (DIF)

Differential Item Functioning DIF
How can age groups, genders, cultures, ethnic groups, and socioeconomic backgrounds be meaningfully compared?
DIF can be a research goal in itself, as opposed to just a test of an assumption
Test the equivalence of test items translated into multiple languages
Test for items influenced by cultural differences
Test for intelligence items that are gender biased
Test for age differences in responses to personality items

Post-Administration Activities
DIF analysis helps identify whether a test item accurately reflects real differences between groups or whether the item itself is producing unfair differences.
Discard items that are demonstrably unfair
DIF occurs when individuals with the same score/proficiency respond differently to an item because they belong to different groups
Examples: sex, race, region, EJA (youth and adult education) / non-EJA, etc.

Important: We are not saying, for example, that students from the Northeast cannot have a higher proportion of correct answers on a math item than students from the South! What we are saying is that students with the same math proficiency, whether from the Northeast or the South, should perform the same on the item

DIFFERENCES between groups = Impact
DIFFERENCES between groups matched for ability = DIF

DIF by IRT
Uniform DIF: only the b (difficulty) parameter differs between groups
Non-uniform DIF: both the b (difficulty) and a (discrimination) parameters differ between groups

DIF Uniform

DIF Non Uniform

Computerized Adaptive Testing CAT
An item is given to the participant (usually of easy to moderate difficulty) and their answer allows their trait score to be estimated, so that the next item is chosen to target that trait level
After the second item is answered, their trait score is re-estimated, and so on
CA tests are at least twice as efficient as their paper-and-pencil counterparts, with no loss of precision
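The item selection step of a CAT can be sketched with the maximum-information criterion, which is one common rule; the slides do not say which criterion is intended, so treat it as an assumption. The function name and the three-item 2PL bank below are hypothetical.

```python
import numpy as np

def next_item(theta_hat, a, b, administered):
    """Index of the unused 2PL item with maximum information at theta_hat."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    p = 1.0 / (1.0 + np.exp(-a * (theta_hat - b)))
    info = a**2 * p * (1.0 - p)        # 2PL item information
    for i in administered:             # never repeat an item
        info[i] = -np.inf
    return int(np.argmax(info))

# Hypothetical bank: equal discriminations, difficulties -2, 0 and 2.
a_bank = [1.0, 1.0, 1.0]
b_bank = [-2.0, 0.0, 2.0]
first = next_item(0.1, a_bank, b_bank, administered=set())
second = next_item(0.1, a_bank, b_bank, administered={first})
print(first, second)
```

In a full CAT this call alternates with re-estimating the trait score after each response, until a stopping rule (for example a target standard error or a maximum test length) is met.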

Computerized Adaptive Testing CAT
The implementation of a CAT is not an easy task. It involves different skills in different areas of knowledge and very sensitive issues such as information security, item bank development, choice of estimation methods, criteria for selecting the next item, stopping rules, incorporation of new items etc.

Implementation and use in Brazil
University of São Paulo at São Carlos (USP-SC) Federal University of Santa Catarina (UFSC) Federal University of Pará (UFPA) Cesgranrio Foundation University of Brasília - Cespe/Cebraspe Vunesp Foundation

Applications of IRT in Education
Brazilian assessments
ENEM (Exame Nacional do Ensino Médio / National High School Exam)
SAEB (Sistema Nacional de Avaliação da Educação Básica / National Basic Education Assessment System)
ENCCEJA: National Exam for Certification of Competences of Youngsters and Adults
ANA: National Literacy Assessment
SARESP, SisPAE, SaePE …

Applications of IRT in Education
International assessments
PISA: Programme for International Student Assessment (OECD)
TIMSS: Trends in International Mathematics and Science Study
TALIS: Teaching and Learning International Survey
(T)ERCE (Unesco): (Third) Comparative Latin America and Caribbean Study: “The most important large-scale study of learning achievement in the region, covering 15 countries (Argentina, Brazil, Chile, Colombia, Costa Rica, Ecuador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Dominican Republic and Uruguay) plus the state of Nuevo León (Mexico).”

Applications of IRT in other areas
Environment and Ecology
Almeida, V. L. Avaliação do Desempenho Ambiental de Estabelecimentos de Saúde, por meio da Teoria da Resposta ao Item, como Incremento da criação do Conhecimento Organizacional. Doctoral thesis, PPGEGC/UFSC, 2009.
Trierweiller, A. C., Peixe, B. C. S., Tezza, R., Bornia, A. C., Andrade, D. F. and Campos, L. M. S. Environmental management performance for brazilian industrials: measuring with the item response theory. Work, 41, 2011.
Afonso, M. H. F. Mensuração da Predisposição ao Comportamento Sustentável por Meio da Teoria da Resposta ao Item. Master's dissertation, PPGEP/UFSC, 2013.
Trierweiller, A. C., Peixe, B. C. S., Bornia, A. C., Campos, L. M. S. and Tezza, R. (2013). Evidenciation of environmental management: an evaluation with item response theory. Brazilian Journal of Operations & Production Management, v. 9, no. 2.
Peixe, B. C. S. Mensuração da Maturidade do Sistema de Gestão Ambiental de Empresas Industriais Utilizando a Teoria de Resposta ao Item. Doctoral thesis, PPGEP/UFSC.

Customer Satisfaction
Costa, M.B.F. (2001). Técnica derivada da teoria da resposta ao item aplicada ao setor de serviços. Master's dissertation, PPGMUE/UFPR.
Bortolotti, S.L.V. (2003). Aplicação de um modelo de desdobramento da teoria da resposta ao item – TRI. Master's dissertation, EPS/UFSC.
Bayley, S. (2001). Measuring customer satisfaction. Evaluation Journal of Australasia, v. 1, no. 1, 8-16.

Total Quality Management
Alexandre, J.W.C., Andrade, D.F., Vasconcelos, A.P. and Araújo, A.M.S. (2002). Uma proposta de análise de um construto para a medição dos fatores críticos da gestão pela qualidade através da teoria da resposta ao item. Gestão & Produção, v.9, n.2.
Bosi, M.A. (2010). Um Estudo sobre o Grau de Maturidade e a Evolução da Gestão pela Qualidade Total no Setor de Transformação Cearense por Meio da Teoria da Resposta ao Item. Master's dissertation, GES-LOG/UFC.

Psychiatry / Psychology
Psychiatric scales:
  Beck Depression Inventory (BDI)
  Center for Epidemiologic Studies Depression Scale (CES-D)
  Sex Addiction Screening Scale (Escala de rastreamento de dependência de sexo, ERDS)
Schaeffer, N. C. (1988). An Application of Item Response Theory to the Measurement of Depression. Sociological Methodology, 18, 271–307.
Coleman, M. J., Matthysse, S., Levy, D. L., Cook, S., Lo, J. B. Y., Rubin, D. B. and Holzman, P. S. (2002). Spatial and object working memory impairments in schizophrenia patients: a bayesian item-response theory analysis. Journal of Abnormal Psychology, 111, number 3.
Hays, R., Morales, L. S. and Reise, S. P. (2000). Item response theory and health outcomes measurement in the 21st century. Medical Care, v.38.
Kirisci, L., Hsu, T. C. and Tarter, R. (1994). Fitting a two-parameter logistic item response model to clarify the psychometric properties of the drug use screening inventory for adolescent alcohol and drug abusers. Alcohol Clin. Exp. Res. 18: 1335–1341.
Langenbucher, J. W., Labouvie, E., Sanjuan, P. M., Bavly, L., Martin, C. S. and Kirisci, L. (2004). An application of item response theory analysis to alcohol, cannabis and cocaine criteria in DSM-IV. Journal of Abnormal Psychology 113: 72–80.
Yesavage, J. A., Brink, T. L., Rose, T. L. et al. (1983). Development and validation of a geriatric depression screening scale: a preliminary report. J Psychiat Res, 17: 37-49.
Cúri, M. (2006). Análise de questionários com itens constrangedores. Doctoral thesis, IME/USP, São Paulo.

Organizational Leadership
Scherbaum, C.A., Finlinson, S., Barden, K., & Tamanini, K. Applications of Item Response Theory to Measurement Issues in Leadership Research. Leadership Quarterly, 17, 2006. Applies both cumulative (GRM) and unfolding (GGUM) models.

Attribute Importance
Samartini, A. L. S. Modelos com Variáveis Latentes Aplicados à Mensuração de Importância de Atributos. Doctoral thesis, Escola de Administração de Empresas de São Paulo, 2006.

Applications of IRT in other areas
Quality of Life
Mesbah, M., Cole, B.F. and Lee, M.L.T. (2002), Eds. Statistical methods for quality of life studies: design, measurements and analysis. Boston: Kluwer Academic Publishers.
Genetics: to measure the predisposition of an individual to a specific disease
Tavares, H. R., Andrade, D. F. and Pereira, C.A. (2004). Detection of determinant genes and diagnostic via item response theory. Genetics and Molecular Biology, v. 27, n. 4.

Food insecurity
Parke E. Wilde, Gerald J. and Dorothy R. Friedman (2004). Differential Response Patterns Affect Food-Security Prevalence Estimates for Households with and without Children. J. Nutr. 134: –1915.
Physicians' Clinical Competence
Jishnu Das, Jeffrey Hammer (2005). Which doctor? Combining vignettes and item response to measure clinical competence. Journal of Development Economics 78.

Tezza, R., Bornia, A.C. and Andrade, D.F. (2011). Measuring web usability using item response theory: Principles, features and opportunities. Interacting with Computers, 23.
Menegon, L.S. (2013). Mensuração de Conforto e Desconforto em Poltrona de Aeronave pela Teoria da Resposta ao Item. Doctoral thesis, PPGEP/UFSC.
Adilson (thesis), Silvana (thesis), Juliano (paper)

Cost and Measurement Laboratory (Laboratório de Custos e Medidas, LCM), EPS/UFSC
Research line: Item Response Theory Applied to Organizations, PPGEP/UFSC

Some “Theoretical Applications”
Santosa, V.L.F., Moura, F.A.S., Andrade, D.F. and Gonçalves, K.C.M. (2016). Multidimensional and Longitudinal Item Response Models for Non-ignorable Data. Computational Statistics & Data Analysis (accepted for publication).
Borgatto, A.F., Azevedo, C.L., Pinheiro, A. and Andrade, D.F. (2015). Comparison of Ability Estimation Methods Using IRT for Tests with Different Degrees of Difficulty. Communications in Statistics - Simulation and Computation, 44.
Caio (paper), Mariana (paper), Héliton (paper)

Computational Aspects
Commercial: BilogMG, Multilog, IRTPro, ...
Non-commercial (free): R packages for IRT: ltm, irtoys, mirt, catR, mirtCAT, psych

References
ANDRADE, D. F., TAVARES, H. R., VALLE, R. C. (2000). Teoria da Resposta ao Item: conceitos e aplicações. 14o SINAPE, Associação Brasileira de Estatística.
BAKER, F. B. (1992). Item Response Theory: Parameter Estimation Techniques. Marcel Dekker.
BEATON, A. E., ALLEN, N. L. (1992). Interpreting scales through scale anchoring. Journal of Educational Statistics, v. 17, p. 191–204.
BOCK, R. D., ZIMOWSKI, M. F. (1996). Multiple Group IRT. In: van der Linden, W. J., Hambleton, R. K. (eds). Handbook of Modern Item Response Theory. Springer.
EMBRETSON, S. E., REISE, S. P. (2000). Item Response Theory for Psychologists. New Jersey: Lawrence Erlbaum Associates.
KLEIN, R. (2003). Utilização da Teoria de Resposta ao Item no Sistema Nacional de Avaliação da Educação Básica (SAEB). Ensaio: Avaliação e Políticas Públicas em Educação, Rio de Janeiro, v. 11, n. 40.
LORD, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale: Lawrence Erlbaum Associates.
LORD, F. M., NOVICK, M. R. (1968). Statistical Theories of Mental Test Scores. Reading: Addison-Wesley.
RECKASE, M. D. (2009). Multidimensional Item Response Theory. New York: Springer.
Sistema Nacional de Avaliação da Educação Básica: SAEB 2001, Relatório Técnico. (2002). Consórcio Fundação Cesgranrio/Fundação Carlos Chagas, Rio de Janeiro.

Thank you!!! Dalton F. Andrade (UFSC/Vunesp)
Héliton R. Tavares (UFPA/Vunesp/UF) Adriano Borgatto (UFSC)

Applications Scaling individuals for further analysis
We often collect data in multifaceted forms (e.g. multi-item surveys) and then collapse them into a single raw score
IRT-based scores represent an optimal scaling of individuals on the trait
Most sophisticated analyses require at least interval-level measurement, and IRT scores are closer to interval level than raw scores
Using scaled scores as opposed to raw scores has been shown to reduce spurious results

Applications Scale Construction and Modification
The focus is changing from creating fixed-length, paper-and-pencil tests to creating a “universe” of items with known IRFs that can be used interchangeably
Scales are being designed around IRT properties
Pre-existing scales that were developed using CTT are being “revamped” using IRT

Research trends
Use of response times: the same approach works with any other source of collateral information, e.g. physiological measures, confidence marking, etc.
Item cloning for banking:
  Use a computer to generate new items from an item family for administration to examinees
  Calibrate families of cloned items rather than each individual item
  Use hierarchical IRT to allow for (small) random variation in item parameter values
Multidimensional and/or multivariate approaches

Applications: Computerized Adaptive Testing (CAT)
Computerized adaptive tests are at least twice as efficient as their paper-and-pencil counterparts, with no loss of precision.
CAT is the primary testing approach used by ETS.
An adaptive form of the Headache Impact Survey outperformed its paper-and-pencil counterpart in reducing patient burden, tracking change, and in reliability and validity (Ware et al., 2003).
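The efficiency of adaptive testing comes from choosing each next item to be as informative as possible at the examinee's current ability estimate. The following is a minimal sketch of that idea, assuming a 3PL model with scaling constant D = 1.7 and a maximum-information selection rule; the item bank `BANK` and the function names are hypothetical, and operational CAT programs may use different selection rules.

```python
import math

D = 1.7  # logistic scaling constant (assumption for this sketch)

def p_3pl(theta, a, b, c):
    """Probability of a keyed response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a single 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def next_item(theta, bank, administered):
    """Index of the not-yet-administered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

# Hypothetical item bank: one (a, b, c) tuple per item.
BANK = [(1.0, -1.5, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2), (1.5, 2.0, 0.2)]
```

In a full CAT loop, the ability estimate would be updated after each response and `next_item` called again until a stopping rule (for example, a target standard error) is met.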

Item Response Theory Estimation with BILOG-MG and R

Before we begin… Data preparation
Raw data must be recoded if necessary: negatively worded items must be reverse coded so that all items in the scale point in the positive direction.
Dichotomization (optional): reducing multiple response options to two values (0, 1; right, wrong).
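The two preparation steps can be sketched as follows. This is only an illustration, assuming a 5-point response scale coded 1 to 5 and a cutoff of 4 for the dichotomization; the scale range, the cutoff, and the function names are hypothetical choices, not values given in the tutorial.

```python
def reverse_code(value, low=1, high=5):
    """Reverse-code a response so that high values indicate the positive direction."""
    return low + high - value

def dichotomize(value, cutoff=4):
    """Collapse a multi-option response to 1 (at or above the cutoff) or 0 (below it)."""
    return 1 if value >= cutoff else 0

def prepare(responses, negative_items, low=1, high=5, cutoff=4):
    """Reverse-code the negatively worded items, then dichotomize every item.

    responses: list of raw responses for one respondent
    negative_items: set of 0-based positions of negatively worded items
    """
    prepared = []
    for position, value in enumerate(responses):
        if position in negative_items:
            value = reverse_code(value, low, high)
        prepared.append(dichotomize(value, cutoff))
    return prepared
```

For example, `prepare([5, 1, 3, 2], {1, 3})` treats the second and fourth items as negatively worded before applying the cutoff.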

Estimating 3PL parameters
BILOG-MG (Scientific Software)
Multiple files in the directory (ASCII text): BLM, DAT, NPF, PRM, …
The data file must be saved as ASCII text and contains, for each respondent, an ID number and the individual responses.
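A fixed-width ASCII data file of this kind can be produced with a few lines of code. The sketch below assumes a 4-character ID field followed by one character per item, mirroring the example used in this tutorial; the zero-padding of IDs and the function names are assumptions, and conventions such as codes for omitted responses should be checked against the BILOG-MG manual.

```python
def format_record(person_id, responses, id_width=4):
    """Build one fixed-width line: ID field followed by one character per item."""
    id_field = str(person_id).rjust(id_width, "0")  # zero-padding is an assumption
    if len(id_field) != id_width:
        raise ValueError("ID does not fit in the ID field")
    return id_field + "".join(str(r) for r in responses)

def write_data_file(path, records, id_width=4):
    """Write (person_id, responses) pairs as plain ASCII text, one respondent per line."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for person_id, responses in records:
            handle.write(format_record(person_id, responses, id_width) + "\n")
```

Each written line has the same length, so a column-based format statement can locate the ID and every response by position.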

BILOG-MG input file (*.BLM)
Title lines: 2 (the second may be blank or not)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;

BILOG-MG input file (*.BLM): what each command specifies
>GLOBAL: number of parameters (NPARM), data file name (DFN), number of characters in the ID field (NIDW), file for missing responses (OFNAME).
>SAVE: requested output files for scoring (SCO), parameters (PARM), and covariances (COV).
>LENGTH: number of items (NITEMS).
>INPUT: sample size (SAMPLE).
(4A1,10A1): FORTRAN statement for reading the data.
>TEST: name of the scale/measure (TNAME).
>CALIB: estimation specifications (not the defaults for BILOG-MG).
>SCORE: scoring by maximum likelihood, with no prior distribution of scale scores and no rescaling.
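Maximum likelihood scoring without a prior amounts to finding the ability value that maximizes the likelihood of a respondent's observed response pattern, given the calibrated item parameters. The sketch below illustrates this with a simple grid search under a 3PL model with D = 1.7; it is not BILOG-MG's algorithm, and the grid bounds and function names are assumptions made for illustration.

```python
import math

D = 1.7  # logistic scaling constant (assumption for this sketch)

def p_3pl(theta, a, b, c):
    """Probability of a keyed response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_3pl(theta, a, b, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def ml_theta(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """Grid-search maximum likelihood estimate of theta on [lo, hi]."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(n_steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))
```

Note that with no prior, all-correct and all-incorrect patterns have no finite maximum, so the estimate simply lands on the boundary of the search interval.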

Phase one output file (*.PH1)
CLASSICAL ITEM STATISTICS FOR SUBTEST AGR
Columns: ITEM, NAME, NUMBER TRIED, NUMBER RIGHT, PERCENT, LOGIT/1.7, ITEM*TEST CORRELATION (PEARSON, BISERIAL)
[Table values not reproduced in this transcript]
These statistics can indicate problems in parameter estimation.
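The classical statistics named in these column headers can be computed directly from a 0/1 response matrix. The sketch below is an illustration only: it assumes the logit is ln(p/(1-p)) divided by 1.7 (some programs report the opposite sign), that the item-test correlation uses the total score including the item itself, and that every item has 0 < p < 1. The function name and dictionary keys are hypothetical.

```python
import math
from statistics import NormalDist, mean, pstdev

def classical_item_stats(data, item):
    """Classical statistics for one item.

    data: list of respondents, each a list of 0/1 item scores
    item: 0-based column index of the item
    """
    scores = [row[item] for row in data]
    totals = [sum(row) for row in data]  # total score includes the item (assumption)
    n_tried = len(scores)
    n_right = sum(scores)
    p = n_right / n_tried
    logit = math.log(p / (1.0 - p)) / 1.7  # sign convention is an assumption

    # Pearson (point-biserial) correlation between item score and total score.
    mx, my = mean(scores), mean(totals)
    cov = mean([(x - mx) * (y - my) for x, y in zip(scores, totals)])
    pearson = cov / (pstdev(scores) * pstdev(totals))

    # Biserial correlation: rescale the point-biserial by the normal ordinate at p.
    z = NormalDist().inv_cdf(p)
    biserial = pearson * math.sqrt(p * (1.0 - p)) / NormalDist().pdf(z)

    return {
        "n_tried": n_tried,
        "n_right": n_right,
        "percent": 100.0 * p,
        "logit_over_1_7": logit,
        "pearson": pearson,
        "biserial": biserial,
    }
```

Items with very low or negative item-test correlations are the ones most likely to cause trouble in the calibration phase.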

Phase two output file (*.PH2)
Excerpt of the iteration log (numeric values not reproduced in this transcript): CYCLE 12, CYCLE 13 [FULL NEWTON STEP], and CYCLE 14, each reporting LARGEST CHANGE and -2 LOG LIKELIHOOD.
Check for convergence.

Phase three output file (*.PH3)
Theta estimation: scoring of individual respondents.
Required for DTF (differential test functioning) analyses.

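Parameter file (specified, *.PAR)
The file begins with the title and >COMMENT lines of the run (AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.), followed by one record per item of subtest AGR.
On the slide, the columns of interest are labeled “b”, “c”, and “a”.
[Item parameter values not reproduced in this transcript]
Format statement: (32X,2F12.6,12X,F12.6)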
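A FORTRAN format such as (32X,2F12.6,12X,F12.6) means: skip 32 characters, read two 12-character real numbers, skip 12 characters, and read one more 12-character real number. The sketch below applies that layout to one line of text, assuming the format statement describes how item records in the parameter file are to be read. Which of the three fields holds “a”, “b”, or “c” cannot be recovered from this transcript, so the values are returned unnamed; the function name is hypothetical.

```python
def read_par_record(line):
    """Read three reals from one record laid out as (32X,2F12.6,12X,F12.6)."""
    first = float(line[32:44])   # first F12.6 field, after skipping 32 characters
    second = float(line[44:56])  # second F12.6 field
    third = float(line[68:80])   # third F12.6 field, after skipping 12 more characters
    return first, second, third
```

The slice boundaries follow directly from the field widths, so the same pattern adapts to other format statements by recomputing the offsets.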

Scoring and covariance files
Like the *.PAR file, these files must be specifically requested.
*.COV: provides the parameters as well as the variances/covariances between the parameters. Necessary for DIF analyses.
*.SCO: provides ability score information for each respondent.
