Kendon Conrad, Barth Riley, University of Illinois at Chicago; Michael L. Dennis, Chestnut Health Systems.


Overview
- Global Appraisal of Individual Needs (GAIN)
- Benefits of computerized adaptive testing (CAT)
- CAT: how does it work?
- Examples of CAT in clinical assessment:
  - Triage of persons around treatment decisions using start and stop rules
  - Content balancing over multiple clinical dimensions
  - Identification of persons with atypical symptom presentations

The GAIN
- Comprehensive biopsychosocial instrument designed for intake into substance abuse treatment
- Provides five-axis DSM-IV diagnoses
- Also supports treatment planning, outcome monitoring, and program evaluation
- Versions range from a 2-5 minute screener, to a …-minute Quick, to a 1-2 hour full version
- Over 103 scales, 1,000 created variables, and a text-based narrative report
- The biggest problem is how long it takes

The Benefits of Computerized Adaptive Testing

General and Targeted Measures
- Generalized measures:
  - Heavy response burden
  - Lack specificity
- Targeted measures:
  - Floor and ceiling effects
  - Limited content validity
  - Don't "talk with each other"

Tailoring Outcome Measurement
[Figure: separate instruments A, B, and C contrasted with a CAT that selects items from an item bank and administers them]

Benefits of CAT & Item Banking
[Figure: CAT and item banking compared on respondent burden, tailoring/specificity, coverage of content domains, and floor and ceiling effects]

CAT vs. Short Forms
- CAT has been found to be superior to "short forms" of tests, yielding more precise measures.

CAT: What Is It and How Does It Work?

Computerized Adaptive Testing
[Figure: typical pattern of responses. The test starts at middle difficulty; difficulty increases after a correct response and decreases after an incorrect one. After each response the score is calculated and the next best item is selected based on item difficulty; the score is shown with a ±1 standard error band.]

Item Selection
- There are several methods for selecting items during a CAT.
- The most common method is to find the item that provides the most information given the current estimate of the measure.
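A minimal Python sketch of maximum-information selection, assuming dichotomous (yes/no) items calibrated with the Rasch model in logits. Under that model an item's information at measure θ is p(1 − p), which peaks when the item's difficulty equals θ. The rule therefore reduces to "administer the unused item closest to the current estimate". The item bank and its ids are hypothetical.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Probability of endorsing ('yes') an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at theta, p * (1 - p); it peaks at theta == b."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def select_max_info(theta: float, bank: dict, administered: set) -> str:
    """Return the unadministered item with the most information at the current estimate."""
    candidates = {item: b for item, b in bank.items() if item not in administered}
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda item: item_information(theta, candidates[item]))


# Hypothetical bank: item id -> difficulty in logits.
bank = {"i1": -1.5, "i2": -0.4, "i3": 0.3, "i4": 1.2, "i5": 2.0}
print(select_max_info(0.2, bank, administered=set()))   # i3
print(select_max_info(0.2, bank, administered={"i3"}))  # i2
```

Other selection methods exist, as the slide notes. This one is shown only because it is the common default.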

Item Selection cont.

- Item selection can also take into account the content domains that should be represented in the CAT session.
- Example: items necessary for a DSM-IV diagnosis.

Stop Rules
- The stop rule, which determines when the item administration process of the CAT ends, can be based on:
  - Measurement precision
  - Number of items administered
  - Test-taking time
  - Some combination of the above
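The stop criteria listed above can be combined in a small rule object. This is a hypothetical sketch: the class name and the example thresholds (an SE of 0.35 logits, 16 items) are illustrative choices. Any criterion that is set and satisfied ends the session.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopRule:
    """Hypothetical composite stop rule; leave a criterion as None to disable it."""
    max_se: Optional[float] = None       # stop once the SE (logits) is at or below this
    max_items: Optional[int] = None      # stop after this many items
    max_seconds: Optional[float] = None  # stop once elapsed testing time reaches this

    def should_stop(self, se: float, n_items: int, elapsed: float) -> bool:
        # Any criterion that is enabled and satisfied ends the CAT session.
        if self.max_se is not None and se <= self.max_se:
            return True
        if self.max_items is not None and n_items >= self.max_items:
            return True
        if self.max_seconds is not None and elapsed >= self.max_seconds:
            return True
        return False


# Illustrative thresholds only.
rule = StopRule(max_se=0.35, max_items=16)
print(rule.should_stop(se=0.42, n_items=7, elapsed=40.0))   # False
print(rule.should_stop(se=0.33, n_items=7, elapsed=40.0))   # True (precision reached)
print(rule.should_stop(se=0.50, n_items=16, elapsed=40.0))  # True (item limit reached)
```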

Item Bank Size
- The more items there are in an item bank, the more likely it is that items tailored to an individual's level on the measured variable will be available.
- Typically, item banks consist of hundreds of items.
- The number of items needed will likely depend on:
  - The number of constructs or domains being assessed
  - Whether one wishes to estimate a measure or classify persons into groups

CAT for Clinical Assessment
- The application of CAT to clinical research and assessment raises several new measurement issues:
  - Triage of persons around treatment decisions using start and stop rules
  - Content balancing over multiple clinical dimensions
  - Identification of persons with atypical presentations of symptoms

Example 1: Triage of Individuals to Support Clinical Decision-Making

Classifying Persons Using CAT
- CAT is typically used to estimate a measure.
- Few studies have examined the use of CAT to place persons into diagnostic groups.
- For placing persons into diagnostic groups, it is desirable to vary the level of measurement precision depending on the category in which the person is placed.
- Current CAT procedures do not allow one to vary measurement precision during the CAT session.

Triage of Individuals to Support Clinical Decision Making
- Strategy: use screener measures to set the value of the initial measure, together with variable stop rules designed to maximize precision and efficiency when identifying persons with low, medium, or high symptom severity.
- Implications: taking into account the initial location and/or precision around decision points can further improve the efficacy of assessment without hurting precision for decision making.

Clinical Decision Making
- To facilitate clinical diagnoses, it would be desirable for a CAT to:
  - Classify patients by symptom severity
  - Maximize measurement precision within the area of the measure that is most critical for decision making
  - Use previously collected information to increase the efficiency of the CAT

Study
- We examined the ability of CAT to place persons into low, moderate, and high levels of substance abuse and substance dependence.
- The Substance Problem Scale (SPS) is a 16-item instrument that measures recency of substance use, e.g., "When was the last time you used alcohol or other drugs weekly?"

Defining Cut Points
- Cut points can be established by examining where persons with different levels of severity fall on the measurement continuum.

The Start Rules
- Random: randomly select an item with a difficulty calibration between -0.5 and 0.5 logits (average level of difficulty).
- Screener: select the item whose difficulty most closely approximates the respondent's measure on a previously administered screener (SDScr).
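The two start rules can be sketched in Python as follows. This assumes the bank is a mapping from item id to Rasch difficulty in logits and that the screener has already been converted to a measure on the same logit scale. Both the bank and the screener value below are hypothetical.

```python
import random


def random_start(bank: dict, rng: random.Random, low: float = -0.5, high: float = 0.5) -> str:
    """'Random' start rule: pick at random among items calibrated between low and high logits."""
    eligible = sorted(item for item, b in bank.items() if low <= b <= high)
    if not eligible:
        raise ValueError("no items in the starting difficulty window")
    return rng.choice(eligible)


def screener_start(bank: dict, screener_measure: float) -> str:
    """'Screener' start rule: pick the item whose difficulty is closest to the screener measure."""
    return min(bank, key=lambda item: abs(bank[item] - screener_measure))


# Hypothetical bank (item id -> difficulty in logits) and screener measure.
bank = {"a": -1.0, "b": -0.3, "c": 0.4, "d": 1.1}
print(random_start(bank, random.Random(0)))  # one of b, c
print(screener_start(bank, 0.9))             # d
```

Passing an explicit `random.Random` keeps simulations reproducible, which matters when comparing start rules across simulated respondents.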

The Variable Stop Rule
- Stop rules for the CAT were defined in terms of the maximum standard error of measurement (SE) for the low, mid, and high ranges of substance abuse severity.
- The mid-range stop rule was set to SE = 0.35 for all simulations.
- Low- and high-range stop rules ranged from SE = 0.5 to 0.75 logits.
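A sketch of the variable stop rule, assuming the low/mid/high ranges are bounded by two cut points on the logit scale. The cut points used below (-1.0 and 1.0) are hypothetical. The SE limits default to the mid-range value of 0.35 and the lower end (0.5) of the low/high range described above. A measure exactly on a cut point is treated as mid range here, which is an arbitrary choice.

```python
def se_limit(theta, low_cut, high_cut, low_se=0.5, mid_se=0.35, high_se=0.5):
    """Maximum SE (logits) allowed at the current measure under a variable stop rule."""
    if theta < low_cut:
        return low_se    # low-severity range: looser precision
    if theta > high_cut:
        return high_se   # high-severity range: looser precision
    return mid_se        # decision-making mid range: tight precision


def variable_stop(theta, se, low_cut, high_cut, **limits):
    """Stop when the current SE meets the limit for the range the measure falls in."""
    return se <= se_limit(theta, low_cut, high_cut, **limits)


# Hypothetical cut points at -1.0 and 1.0 logits.
print(variable_stop(0.0, 0.40, -1.0, 1.0))                # False: mid range needs SE <= 0.35
print(variable_stop(-2.0, 0.40, -1.0, 1.0))               # True: low range accepts SE <= 0.5
print(variable_stop(2.0, 0.60, -1.0, 1.0, high_se=0.75))  # True
```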

CAT Standard Error
[Figure: in the middle range, where decisions are made, precision is controlled; in the high and low ranges, where there is little impact on clinical decisions, precision is allowed to vary more.]

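The Item Selection Algorithm
1. Start rule: use the screener to choose the first item.
2. Select and administer the item.
3. Re-estimate the measure and SE.
4. Choose the stop rule for the current range: the high-range stop rule if the measure is in the high range, the mid-range stop rule if it is in the mid range, and otherwise the low-range stop rule.
5. If the stop rule is met, end the test; if not, return to step 2.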
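An end-to-end Python sketch of this loop follows. It uses a screener start, maximum-information selection, re-estimation, and a variable stop rule. Two pieces are assumptions not taken from the source. First, the measure is re-estimated by MAP with a weak normal prior (N(0, 2²)), so that all-"yes" or all-"no" patterns still give finite estimates. Second, the bank, cut points, and simulated respondent are hypothetical.

```python
import math


def p_yes(theta, b):
    """Rasch probability of a 'yes' response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate(responses, prior_sd=2.0, iters=50):
    """MAP estimate of the measure and its SE from [(difficulty, 0/1), ...].

    The weak N(0, prior_sd**2) prior is an assumption; it keeps the estimate finite
    when every response so far is 'yes' or every response is 'no'.
    """
    theta = 0.0
    for _ in range(iters):
        grad = sum(x - p_yes(theta, b) for b, x in responses) - theta / prior_sd**2
        info = sum(p_yes(theta, b) * (1.0 - p_yes(theta, b)) for b, _ in responses) + 1.0 / prior_sd**2
        step = max(-1.0, min(1.0, grad / info))  # Newton step, clipped for stability
        theta += step
        if abs(step) < 1e-8:
            break
    info = sum(p_yes(theta, b) * (1.0 - p_yes(theta, b)) for b, _ in responses) + 1.0 / prior_sd**2
    return theta, 1.0 / math.sqrt(info)


def run_cat(bank, answer, screener_measure, low_cut, high_cut,
            low_se=0.5, mid_se=0.35, high_se=0.5):
    """Screener start, maximum-information selection, variable stop rule."""
    theta, se = screener_measure, float("inf")
    administered, responses = [], []
    while len(administered) < len(bank):
        remaining = [k for k in bank if k not in administered]
        # Under the Rasch model the most informative item is the one closest to theta.
        item = min(remaining, key=lambda k: abs(bank[k] - theta))
        administered.append(item)
        responses.append((bank[item], answer(item)))
        theta, se = estimate(responses)
        limit = low_se if theta < low_cut else high_se if theta > high_cut else mid_se
        if se <= limit:
            break
    return theta, se, administered


# Hypothetical 60-item bank spread over -3..3 logits and a simulated respondent
# who endorses every item easier than 1.5 logits.
bank = {f"i{k}": -3.0 + 6.0 * k / 59 for k in range(60)}
theta, se, items = run_cat(bank, lambda k: int(bank[k] < 1.5), screener_measure=1.0,
                           low_cut=-1.0, high_cut=1.0)
print(f"measure={theta:.2f} SE={se:.2f} items={len(items)}")
```

Because the simulated respondent lands in the high range, the looser high-range limit applies and the session ends well before the bank is exhausted. A respondent near the mid range would need more items to reach SE = 0.35.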

Results
- The screener start rule improved the efficiency of the CAT by approximately 7 percent compared with standard CAT procedures.
- Variable stop rules improved efficiency by 15 to 38 percent compared with standard stop rules, depending on the definition of the mid range of severity.

Results
- Pre-calibration and variable stop rules resulted in accurate and efficient estimation of substance abuse severity.
- The screener start rule had only a small effect on classification precision.

Next Step: Refining the Algorithm

Example 2: Content Balancing over Multiple Dimensions

Measuring Multiple Dimensions
- Strategy: use content balancing methods in combination with conventional item selection procedures to ensure that items are selected from each substantive domain.
- Implications: an individual's clinical profile can be assessed both efficiently and comprehensively, at both the total and subscale levels.

Internal Mental Distress Scale
- The IMDS consists of the following subscales:
  - Depression Symptom Scale
  - Anxiety/Fear Symptom Scale
  - Traumatic Distress Scale
  - Homicidal/Suicidal Scale
- The IMDS also has 4 general somatic items as part of the total scale score.
- Clinicians want estimates of overall severity and of each subscale area.
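One simple way to combine content balancing with conventional selection is a two-step rule. First pick the domain that is furthest below a target share of administered items. Then pick the most informative remaining item within that domain. The source does not say which balancing method was used, so this is an illustrative sketch. The bank, the equal target shares, and the IMDS-like domain labels are hypothetical, and the somatic items are omitted.

```python
import math


def info(theta, b):
    """Rasch item information p * (1 - p) at measure theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


def select_balanced(theta, bank, administered, targets):
    """Pick the next item under a simple target-proportion content-balancing rule.

    bank: item id -> (domain, difficulty in logits)
    targets: domain -> desired share of administered items (shares sum to 1)
    """
    remaining = {k: v for k, v in bank.items() if k not in administered}
    counts = {d: 0 for d in targets}
    for k in administered:
        d = bank[k][0]
        counts[d] = counts.get(d, 0) + 1
    n = max(len(administered), 1)
    open_domains = [d for d in targets if any(dom == d for dom, _ in remaining.values())]
    if not open_domains:
        raise ValueError("no items left in any targeted domain")
    # 1) Content step: the domain furthest below its target share.
    domain = max(open_domains, key=lambda d: targets[d] - counts[d] / n)
    # 2) Information step: the most informative remaining item within that domain.
    pool = {k: b for k, (dom, b) in remaining.items() if dom == domain}
    return max(pool, key=lambda k: info(theta, pool[k]))


bank = {
    "dep1": ("Depression", -0.5), "dep2": ("Depression", 0.1),
    "anx1": ("Anxiety", 0.0), "trm1": ("Trauma", 0.8), "hs1": ("H/S", 1.5),
}
targets = {"Depression": 0.25, "Anxiety": 0.25, "Trauma": 0.25, "H/S": 0.25}
print(select_balanced(0.0, bank, ["dep1", "anx1"], targets))          # trm1
print(select_balanced(0.0, bank, ["dep1", "anx1", "trm1"], targets))  # hs1
```

Without the content step, `dep2` would be chosen next because it is closest to the current measure. This is the kind of drift toward depression items shown in the unbalanced example.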

Internal Mental Distress Scale by Content Area
[Figure: IMDS subscale item calibrations in logits for the H/S, Trauma, Anxiety, Depression, and Somatic items]

Example: No Content Balancing
All screener items administered.

Example: No Content Balancing
Depression: 2, H/S: 1, Anxiety: 1, Trauma: 1
Item: "Think other people don't understand you"; response: "Yes"

Example: No Content Balancing
Depression: 3, H/S: 1, Anxiety: 1, Trauma: 1
Item: "Lost interest in things"; response: "Yes"

Example: No Content Balancing
Depression: 3, H/S: 1, Anxiety: 2, Trauma: 1
Item: "Thoughts people taking advantage of me"; response: "No"

Example: No Content Balancing
Depression: 4, H/S: 1, Anxiety: 2, Trauma: 1
Item: "Shyness"; response: "No"

Example: No Content Balancing
Depression: 4, H/S: 1, Anxiety: 3, Trauma: 1
Item: "Have to repeat action over and over"; response: "Yes"

Results
- If continued to 13 items:
  - Except for the screener items, no homicidal/suicidal (H/S) or trauma items were administered during the CAT session.
  - Precision on the subscales was mixed.

No Content Balancing: Low-Precision Estimates

IMDS by Content Area

IMDS Screener Items
[Figure: screener item calibrations in logits across the Suicidal, Trauma, Anxiety, Depression, and Somatic content areas]
- M1C2: Suicidal thoughts
- M2A: Traumatic memories give distress
- M1D10: Thoughts should be punished
- M1B1: Trapped, lonely, sad, depressed
- M1A2: Sleep trouble
Answers (in the order listed): N, Y, Y, N, N. The overall Rasch severity is estimated as IMDS = -0.5 logits.

IMDS Subscale Calibrations
[Figure: item calibrations in logits for the Depression, Anxiety, Trauma, and Suicidal subscales, with the five screener items marked; the screener items change in rank order of severity]

IMDS Subscale Item Calibrations
[Figure: item calibrations in logits for the Depression, Anxiety, Trauma, and Suicidal subscales, with the five screener items marked]

Re-estimating the IMDS
[Figure: calibrations in logits for the Suicidal, Trauma, Anxiety, Depression, and Somatic items, showing the revised estimate and the five screener items]

Content Balancing: CAT to Full IMDS
Subscale and total scores now have good precision.

Example 3: Identifying Persons with Atypical Symptom Presentations

Overview
- Strategy: Rasch person fit statistics can identify persons with atypical clinical presentations in a computerized adaptive testing context.
- Implications: clients sometimes endorse severe clinical symptoms that are not reflected by their overall scores on standard assessments. Statistics that can identify persons with such atypical presentations therefore have important clinical implications.

Rasch Fit Statistics
- Both infit and outfit follow an approximately chi-square distribution, and high values are of primary concern.
- Infit, or "randomness": more changes between yes and no than would be expected based on overall severity.
  - Low: almost too perfect fit
  - High: more transitions than expected
- Outfit, or "atypicalness": focuses more on the tail ends (groups of answers). Used to detect unexpected, outlying, off-target responses; sensitive to outliers.
  - Low: almost too perfect fit
  - High: endorsed high-severity items but not the precursor items (e.g., easier items)
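Infit and outfit are conventionally computed from residuals. For a dichotomous Rasch item with model probability p and observed response x, outfit is the unweighted mean of (x − p)²/[p(1 − p)]. Infit weights by item variance: Σ(x − p)²/Σp(1 − p). The Python sketch below contrasts a typical pattern with one that endorses the most severe items but not their precursors. The person measure is supplied rather than estimated, and the items are hypothetical. Both patterns have the same raw score, so they would receive the same measure.

```python
import math


def fit_statistics(theta, difficulties, responses):
    """Infit and outfit mean-squares for one person under the dichotomous Rasch model."""
    sq_resid, variances = [], []
    for b, x in zip(difficulties, responses):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        sq_resid.append((x - p) ** 2)      # squared residual
        variances.append(p * (1.0 - p))    # model variance of the response
    # Outfit: unweighted mean of squared standardized residuals (outlier sensitive).
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(variances)
    # Infit: information-weighted, so on-target items count most.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical items, mildest to most severe
typical = [1, 1, 0, 0, 0]    # endorses the milder items only
atypical = [0, 0, 0, 1, 1]   # endorses the most severe items but not the precursors
for label, resp in [("typical", typical), ("atypical", atypical)]:
    infit, outfit = fit_statistics(0.0, difficulties, resp)
    print(f"{label}: infit={infit:.2f} outfit={outfit:.2f}")
# typical: infit=0.50 outfit=0.40
# atypical: infit=3.36 outfit=4.24
```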

Problems with Fit
[Figure: responses by item severity, from low to high, illustrating randomness and atypicalness]

Clinical Implications of Misfit
- Misfit in the context of clinical assessment can reflect:
  - Difficulty understanding the assessment
  - Cross-cultural effects
  - Differential effects of treatment on some symptoms but not others
- Our analyses indicate that there are subgroups who endorse severe symptoms without endorsing milder symptoms.
- Example: atypical suicide profile

Example: Atypical Suicide
- Depression is regarded as the major risk factor for suicide.
- However, there is a less common profile characterized by suicide-related symptoms in the absence of depressive symptoms.
- This profile can be identified through the use of fit statistics (atypicalness).
[Figure: depression and suicide]

Atypical Suicide

Fit Statistics in CAT
- Fit statistics such as infit and outfit become less sensitive to atypical response patterns as the number of items is reduced.
- Because CAT usually administers items that the respondent has a 50% probability of endorsing, a "yes" and a "no" response to a dichotomous item are equally likely, and either response is therefore consistent with the Rasch model.
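The 50% point can be made exact with the standard squared standardized residual for a dichotomous Rasch item. Here P_i is the model probability of "yes" and x_i is the 0/1 response. An item targeted exactly at the person's measure contributes the same amount whether the answer is yes or no. This is the idealized limit of CAT targeting, since real items are only near P = 0.5.

```latex
\begin{aligned}
z_i^2 &= \frac{(x_i - P_i)^2}{P_i(1 - P_i)}, \qquad
\text{Outfit} = \frac{1}{n}\sum_{i=1}^{n} z_i^2, \qquad
\text{Infit} = \frac{\sum_{i=1}^{n} (x_i - P_i)^2}{\sum_{i=1}^{n} P_i(1 - P_i)} \\
P_i = 0.5 &\;\Rightarrow\; (x_i - 0.5)^2 = 0.25 \ \text{for } x_i \in \{0, 1\}
\;\Rightarrow\; z_i^2 = \frac{0.25}{0.25} = 1
\;\Rightarrow\; \text{Outfit} = \text{Infit} = 1
\end{aligned}
```

With perfectly targeted items, every response pattern therefore produces mean-squares of exactly 1, and the statistics carry no information about misfit.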

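Randomness by Number of Items

| Number of items | Below lower cutoff (<) | Between cutoffs | Above upper cutoff (>) |
|---|---|---|---|
| … | … | 58.2% | 18.2% |
| … | … | 55.6% | 16.2% |
| 8 | 35.2% | 52.8% | 12.0% |
| 4 | 51.1% | 44.0% | 4.9% |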

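Atypicalness by Number of Items

| Number of items | Below lower cutoff (<) | Between cutoffs | Above upper cutoff (>) |
|---|---|---|---|
| … | … | 48.1% | 21.7% |
| … | … | 51.1% | 14.6% |
| 8 | 38.4% | 53.2% | 8.4% |
| 4 | 58.2% | 40.0% | 1.8% |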

Next Steps: Alternatives to Infit and Outfit
- Several measures/procedures for detecting misfit have been developed specifically for use with short tests and/or CAT. These include:
  - Adjustment of critical values for fit statistics
  - Statistical process control procedures
  - Modified t, modified H, and modified Z statistics (Dimitrov and Smith, 2006)
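One generic way to adjust critical values for short tests is simulation. Generate the statistic's distribution under the Rasch model for the items actually administered, then flag a person only if the observed value exceeds a simulated upper quantile. The Python sketch below does this for outfit. It illustrates the general idea only and is not the specific procedures cited above. As a simplification, it treats the person's measure as known rather than re-estimating it in each replication, and the item sets are hypothetical.

```python
import math
import random


def outfit(theta, difficulties, responses):
    """Outfit mean-square: mean squared standardized residual (dichotomous Rasch)."""
    total = 0.0
    for b, x in zip(difficulties, responses):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(difficulties)


def simulated_critical_value(theta, difficulties, alpha=0.05, reps=5000, seed=1):
    """Upper (1 - alpha) quantile of outfit when responses truly follow the Rasch model."""
    rng = random.Random(seed)
    probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
    stats = sorted(
        outfit(theta, difficulties, [int(rng.random() < p) for p in probs])
        for _ in range(reps)
    )
    index = min(math.ceil((1 - alpha) * reps) - 1, reps - 1)
    return stats[index]


# Items spread around the measure vs. items all targeted exactly at it (as in CAT).
print(simulated_critical_value(0.0, [-2.0, -1.0, 1.0, 2.0]))  # well above 1
print(simulated_critical_value(0.0, [0.0] * 8))                # 1.0
```

The second call shows the problem in its extreme form. When every item sits at the person's measure, the simulated distribution collapses to exactly 1, so no critical value can separate fitting from misfitting patterns.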

Potential of CAT in Clinical Practice
- Reduce respondent burden
- Reduce staff resources
- Reduce data fragmentation
- Streamline complex assessment procedures
- Assist in clinical decision making
- Identify persons with atypical profiles

Future Research
- How do we put it all together?
- Much of the research on CAT has used computer simulation. There is a need to test working CAT systems in clinical practice.

Contact Information
- A copy of this presentation will be at:
- For information on this method and a paper on it, please contact Barth Riley at
- For information on the GAIN, please contact Michael Dennis at or see