Reducing the duration and cost of assessment with the GAIN: Computer Adaptive Testing

Presentation transcript:

1 Reducing the duration and cost of assessment with the GAIN: Computer Adaptive Testing

2 Evidence-Based Practice
• Requires accurate diagnosis, treatment placement, and outcomes monitoring
• Assessment over a wide range of domains
• The cost of evidence-based assessment is:
– Time
– Respondent burden
– Increased staff resources (including training)

3 Improving Efficiency
• The use of screeners and short-form instruments has significantly improved the efficiency of the assessment process
– Can help determine whether a full assessment is warranted
– But not a substitute for a full assessment
• Lack of precision
• Floor and ceiling effects
• Limited content validity

4 Computerized Adaptive Testing
• Selects items from a large bank of items based on the responses made to previous items.
• Continues to select and administer items until sufficient measurement precision is obtained.
• Combines the precision and comprehensiveness of a full assessment with the efficiency of a screener.

5 CAT Process
[Figure: typical pattern of responses in a CAT. Labels: Increased Difficulty, Middle Difficulty, Decreased Difficulty; Correct, Incorrect; +/- 1 Std. Error.]
Score is calculated and the next best item is selected based on item difficulty.
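The select, score, and stop cycle described on the slides above can be sketched in a few lines. This is a minimal illustration, not the GAIN CAT implementation: it assumes a dichotomous Rasch model, an expected a posteriori (EAP) score with a standard normal prior, and a single fixed standard-error target. The names `rasch_prob`, `estimate`, `run_cat`, and all numeric defaults are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of endorsing an item of difficulty b at trait level theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate(responses, grid=None):
    """EAP estimate and posterior SD on a grid, assuming a N(0, 1) prior.

    responses: list of (difficulty, 0/1) pairs.
    """
    if grid is None:
        grid = [g / 10.0 for g in range(-40, 41)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for b, x in responses:
            p = rasch_prob(t, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.5, max_items=None):
    """Administer items until the standard error falls to se_target.

    bank: dict item_id -> difficulty; answer: callable item_id -> 0/1.
    """
    remaining = dict(bank)
    responses, administered = [], []
    theta, se = 0.0, float("inf")
    limit = max_items if max_items is not None else len(bank)
    while remaining and len(administered) < limit and se > se_target:
        # Under the Rasch model, item information p(1 - p) is largest
        # for the item whose difficulty is closest to the current estimate.
        item = min(remaining, key=lambda i: abs(remaining[i] - theta))
        b = remaining.pop(item)
        responses.append((b, answer(item)))
        administered.append(item)
        theta, se = estimate(responses)
    return theta, se, administered
```

In this sketch `answer` would be the respondent's actual response to the item; in a simulation it can be any function of the item identifier.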

6 CAT in Clinical Assessment

7 CAT in Clinical Assessment: Issues
• Triage of individuals to support clinical decision making
• Measurement of multiple clinical dimensions and subdimensions
• Persons with atypical presentation of symptoms
• Generalizability of assessment to various groups

8 Clinical Decision Making
• How severe are the symptoms?
• What type of treatment is most appropriate?
• Can CAT be used to answer these questions more efficiently?

9 Strategy
• Use CAT to place persons into low, moderate, and high levels of substance abuse and dependency.
• Starting Rules
– Using screener measures to set the initial measure and select the first item
• Variable Stop Rules
– Tight precision around cut points
– Less precision away from cut points

10 CAT Standard Error
[Figure: CAT standard error across the severity range.]
• Middle range, where decisions are made and precision is controlled
• High and low ranges, where there is little impact on clinical decisions and precision is allowed to vary more
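One way to express a variable stop rule of this kind is to make the required standard error depend on how close the current estimate is to a clinical cut point. The sketch below is only an illustration under assumed values: the function names, the `tight` and `loose` targets, and the `band` width are hypothetical and are not taken from the presentation.

```python
def se_target(theta, cut_points, tight=0.3, loose=0.6, band=0.5):
    """Return the standard error required before the CAT may stop.

    A tighter target applies when theta lies within `band` of any cut point;
    a looser target applies elsewhere. All thresholds are assumed values.
    """
    near_cut = any(abs(theta - c) <= band for c in cut_points)
    return tight if near_cut else loose

def should_stop(theta, se, cut_points, **kwargs):
    """True when the current standard error meets the target for this theta."""
    return se <= se_target(theta, cut_points, **kwargs)
```

For example, with cut points at 0.0 and 1.5, an estimate of 0.1 with a standard error of 0.4 would keep testing, while an estimate of 3.0 with the same standard error would stop.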

11 Results
• CAT to full-measure correlations ranged from .87 to .99
• Classification of persons into treatment groups based on CAT and full measure (kappa coefficients) ranged from .66 to .71.
• Screener starting rule improved CAT efficiency by 7 percent
• Variable stop rules improved efficiency by 15-38
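The kappa coefficients reported above measure agreement between two classifications beyond what chance alone would produce. As a hedged illustration of how such a coefficient is computed (Cohen's kappa is assumed here; the slides do not say which variant was used), the function name and the example labels are hypothetical:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of category labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical classifications from a CAT and from the full measure.
cat_groups = ["low", "low", "mod", "mod", "high", "high"]
full_groups = ["low", "mod", "mod", "mod", "high", "high"]
kappa = cohen_kappa(cat_groups, full_groups)
```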

12 Measuring Multiple Dimensions

13 Assessment on Multiple Dimensions
• Instruments often measure multiple domains
• In CAT, treating a multi-domain measure as measuring one domain is problematic:
– Some subdimensions may not be adequately measured

14 Strategy: Content Balancing
• Set an item “quota” for each subscale
– Maximum number of subscale items to administer during the CAT
• An item is selected if:
– Its subscale quota has not been met
– It provides maximum information
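The two selection conditions above can be sketched as a filter followed by a maximum. This is an assumed, simplified version using Rasch item information p(1 - p); the names `information` and `select_item` and the data layout are hypothetical, not the presenters' code.

```python
import math

def information(theta, b):
    """Rasch item information at theta for an item of difficulty b."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def select_item(theta, bank, counts, quotas):
    """Pick the most informative item whose subscale quota is not yet met.

    bank: item_id -> (subscale, difficulty) for items not yet administered.
    counts: subscale -> number of items administered so far.
    quotas: subscale -> maximum number of items to administer.
    """
    eligible = [i for i, (sub, _) in bank.items() if counts.get(sub, 0) < quotas[sub]]
    if not eligible:
        return None  # every subscale quota has been filled
    return max(eligible, key=lambda i: information(theta, bank[i][1]))
```

Once a subscale reaches its quota, its remaining items are skipped even if they would be the most informative, which is what keeps the other subscales represented.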

15 Content Balancing Procedures
Method    Screener  Content Balanced
None      No        No
Screener  Yes       No
Mixed     Yes       Yes
Full      No        Yes

16 Percentage of Items Administered by Subscale
IMDS Scale, N Items, None, Screener, Mixed, Full
Depression ≥ ≥
Homicidal/Suicidal ≥ ≥
Anxiety ≥ 1100 ≥ 3100
Trauma ≥ 1100 ≥ 3100

17 Content Balancing: CAT to Full IMDS Correlations
IMDS Scales, None, Screener, Mixed, Full
IMDS
Depression
Homicidal/Suicidal
Anxiety
Trauma 0.97
Average r

18 Identifying Persons with Atypical Presentation of Symptoms

19 Overview
• Implications: Clients sometimes endorse severe clinical symptoms that are not reflected by overall scores on standard assessments.
– Statistics that can detect atypical presentation of symptoms have important clinical implications.
• Strategy: Identify fit statistics sensitive to atypical presentation in a CAT context

20 Rasch Fit Statistics
• Fit statistics are used to test particular hypotheses.
• Atypicalness: Used to detect unexpected outlying, off-target responses. Outlier sensitive.
– Example: A person with a high level on the measured trait misses an easy item.
• Randomness: Used to detect unexpected inlying, targeted responses.
• Both infit and outfit are chi-square-based mean-square statistics. An infit or outfit value of 1.0 is the value expected when responses fit the Rasch model.
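For a single respondent, the standard mean-square forms of these statistics can be computed as below. This sketch assumes dichotomous items, and it assumes that "atypicalness" corresponds to outfit (the outlier-sensitive statistic) and "randomness" to infit; the slides imply but do not state that mapping. The function name is hypothetical.

```python
import math

def fit_statistics(theta, difficulties, responses):
    """Return (infit, outfit) mean squares for one person under the Rasch model."""
    sq_residuals, variances, z_squared = [], [], []
    for b, x in zip(difficulties, responses):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        var = p * (1.0 - p)
        sq = (x - p) ** 2
        sq_residuals.append(sq)
        variances.append(var)
        z_squared.append(sq / var)  # squared standardized residual
    outfit = sum(z_squared) / len(z_squared)     # unweighted mean square
    infit = sum(sq_residuals) / sum(variances)   # information-weighted mean square
    return infit, outfit

# A person with a high trait level (theta = 2) who misses a very easy item (b = -3).
infit, outfit = fit_statistics(2.0, [-3.0, 2.0], [0, 1])
```

In this example the single off-target miss inflates outfit far more than infit, because outfit does not down-weight responses to items far from the person's level.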

21 Problems with Fit
[Figure: responses by severity (low to high), illustrating randomness and atypicalness.]

22 Clinical Implications of Misfit
• Our analyses indicate that there are subgroups who endorse severe symptoms without endorsement of milder symptoms.
• Examples:
– Atypical suicide
– Substance use withdrawal without dependence

23 Atypicalness by Number of Items
[Figure: atypicalness categories (Uber Typical, Typical, Atypical) by number of items.]

24 Content Balancing and Atypicalness
Atypicalness Category, None, Screener, Mixed, Full, Full IMDS
Proto Typical
Typical
Atypical
Kappa

25 Future Research
• Identify alternative fit statistics that are more sensitive to atypical presentation of symptoms
• Determine when it is likely that someone may present with atypical symptoms, and if so, select items to confirm atypicalness.

26 Generalizability of CAT to Various Groups

27 Overview
• Persons at the same severity level may differ in their endorsement of specific items.
• This is called differential item functioning (DIF)
• On the GAIN, DIF has been detected by:
– Age (adolescent vs. adult)
– Gender
– Ethnicity/Race
– Drug of choice
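The idea that persons at the same severity level may endorse an item at different rates can be illustrated with a Mantel-Haenszel common odds ratio, computed across severity strata. The slides do not say which DIF method was applied to the GAIN, so this technique is swapped in purely as an example; the function name and record layout are hypothetical.

```python
from collections import defaultdict

def mantel_haenszel_odds_ratio(records):
    """Common odds ratio of endorsing one item, reference vs. focal group.

    records: iterable of (stratum, group, endorsed), where stratum is a
    severity level, group is "ref" or "focal", and endorsed is 0 or 1.
    A value near 1.0 suggests little DIF for the item.
    """
    tables = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for stratum, group, endorsed in records:
        tables[stratum][group][0 if endorsed else 1] += 1
    num = den = 0.0
    for t in tables.values():
        a, b = t["ref"]    # reference group: endorsed, not endorsed
        c, d = t["focal"]  # focal group: endorsed, not endorsed
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den if den else float("inf")
```

Stratifying by severity is what separates DIF from a genuine group difference in the trait: the comparison is made only among respondents at the same level.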

28 DIF By GAIN Scale
Scale, Total, Age, Gender, Race, Prim. Drug
Internal Mental Distress
Crime & Violence
Behavioral Complexity
Substance Problems 16859

29 DIF and CAT
• The presence of DIF can limit our ability to generalize measurement findings across different groups.
• Controlling for DIF becomes complicated as the number of DIF items and groups/factors increases.
• Currently exploring a number of methods for controlling DIF in CAT.

30 Potential of CAT in Clinical Practice
• Reduce respondent burden
• Reduce staff resources
• Reduce data fragmentation
• Streamline complex assessment procedures
• Assist in clinical decision making
• Identify persons with atypical profiles
• Improve measurement generalizability

31 Future Research
• How do we put it all together?
• Much of the research in the area of CAT has used computer simulation. There is a need to test working CAT systems in clinical practice.

32 Contact Information
• A copy of this presentation will be at:
• For more information, please contact Barth Riley at