Published by Avice Mitchell. Modified over 9 years ago.
1
Measurement Challenges in Addiction Treatment Research
Michael L. Dennis, Ph.D.
Chestnut Health Systems, Bloomington, IL
Presentation at the International Conference on Outcome Measurement, September 11, 2008, Bethesda, MD.
This presentation was supported by National Institute on Drug Abuse (NIDA) grant no. R37 DA11323 and Center for Substance Abuse Treatment (CSAT), Substance Abuse and Mental Health Services Administration (SAMHSA) contract 270-07-019. The opinions are those of the author and do not reflect official positions of the consortium or government. Available online at www.chestnut.org/LI/Posters or by contacting Joan Unsicker at 720 West Chestnut, Bloomington, IL 61701, phone: (309) 827-6026, fax: (309) 829-4661, e-mail: junsicker@Chestnut.Org
2
Objectives are to...
- Examine why researchers who run more traditional clinical trials need to care about measurement
- Provide explicit, practical examples of how addressing measurement in addiction research can help improve it
3
Since the early 1960s, Jacob Cohen and colleagues have suggested that clinical trials research should:
- Focus on statistical power, which is the probability of finding what you are looking for given that it is there
- Combine data from multiple clinical trials into meta-analyses, which can be used
  - as a more stable estimate of truth
  - to evaluate the accuracy of our early estimates and how methods can be improved
4
In a review of over 200 meta-analyses of medical, social and legal studies published between 1960 and 1990, Lipsey consistently found:
- Less than a third of the individual articles coded even mentioned
  - the statistical power of their core contrast
  - the reliability, validity, or sensitivity of their outcome measure
- Relative to the final effect size estimated from the meta-analysis, the studies averaged less than 50% power
  - in other words, it was more accurate to flip a coin than to use a statistical test the way tests were being used “on average” in the published literature
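To make the "less than 50% power" point concrete, the following is a minimal sketch that approximates the power of a two-group comparison using a normal approximation to the t-test. The function name `approx_power`, the two-sided alpha of .05, and the example effect size and group sizes are illustrative assumptions; none of them come from Lipsey's review.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group mean comparison.

    d is the standardized mean difference (Cohen's d) and n_per_group is
    the size of each group. Uses a normal approximation to the t-test.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected z statistic if the effect is real
    # Probability of landing in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical design: medium effect (d = 0.5), varying group sizes.
    for n in (20, 32, 63, 100):
        print(f"d=0.5, n per group={n}: power ~ {approx_power(0.5, n):.2f}")
```

With these assumed inputs, a study of 20 per group has well under a 50% chance of detecting a true effect of d = 0.5, while about 63 per group reaches roughly 80%.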
5
Movement to Improve the Methodological Quality of Clinical Trials Research
- In 1993, a group of 30 experts (medical journal editors, clinical trialists, epidemiologists, and methodologists) met in Ottawa to try to identify methodological gaps in the literature
- In 1996, this growing group issued the Consolidated Standards of Reporting Trials (CONSORT; www.consort-statement.org)
- Since 2000, NIH has required DSMBs on all Phase 3 and multi-site Phase 2 studies (Notice OD-00-38), which also push CONSORT
- Today virtually every major medical, psychiatric, psychological, criminological, and social journal has signed onto CONSORT
6
Basic ways to increase power
- Increase sample size
- Increase observations
- Target a higher severity/less heterogeneous sample
- Increase implementation
- Reduce measurement error
- Reduce unexplained variance (which may be systematic)
- More accurately model error and unexplained variance in analysis
While the most common approach, these are also the most expensive and logistically difficult to do
Today’s focus
7
Observed Effect Size as a function of “True” effect size (Cohen’s d) and reliability of dependent variable
[Figure: observed effect size by “true” effect size and reliability, with a line labeled “No Measurement Error”]
- “Observed” effect size goes down with lower reliability
8
Sample size required for 80% power as a function of “True” effect size (Cohen’s d) and reliability of dependent variable
- A reliability of .7 doubles sample size requirements
- Increasing reliability from .4 to .7 cuts sample size requirements by over 50%
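The slides do not state the formula behind these two figures, so the following is only a sketch under stated assumptions. It tries two attenuation models: the classical test theory form, where observed d = true d x sqrt(reliability), and a simpler linear form, where observed d = true d x reliability. Sample size uses the usual normal-approximation formula for a two-group comparison. The function names, the true effect of d = 0.5, and alpha = .05 are illustrative choices.

```python
from math import ceil, sqrt
from statistics import NormalDist


def observed_d(true_d, reliability, model="sqrt"):
    """Attenuate a true effect size for unreliability in the outcome.

    model="sqrt"   : classical test theory, d * sqrt(reliability)
    model="linear" : simpler assumption, d * reliability
    """
    if model == "sqrt":
        return true_d * sqrt(reliability)
    if model == "linear":
        return true_d * reliability
    raise ValueError(f"unknown model: {model}")


def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-group comparison (normal approximation)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)


if __name__ == "__main__":
    true_d = 0.5  # hypothetical "true" effect
    baseline = n_per_group(true_d)
    for model in ("sqrt", "linear"):
        for rel in (1.0, 0.7, 0.4):
            n = n_per_group(observed_d(true_d, rel, model))
            print(f"{model:6s} reliability={rel:.1f}: n per group={n} "
                  f"({n / baseline:.2f}x the error-free requirement)")
```

Under the linear assumption, a reliability of .7 roughly doubles the required n and moving from .4 to .7 cuts it by about two-thirds, which matches the slide. Under the square-root assumption the penalty at .7 is closer to 1.4x. Either way, the required sample grows quickly as reliability falls.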
9
Impact of Comprehensive Data Collection Protocol Certification on Measurement Issues
[Figure: bar chart of Cohen's d \a (axis from -0.6 to 0) for Proportion of Inconsistencies (100%)*, Duration (in Minutes)*, Denial/Misrepresentation (Staff Rating)*, Context Effect (Staff Report), Proportion of Missing Data (100%), Atypicalness (Outfit in Logits), and Randomness (Infit in Logits); plotted values: -0.39, -0.25, -0.24, -0.10, -0.04, -0.03]
\a Cohen's d = (Post Certification - Pre Certification)/Pooled STD
* p<.05
Source: GAIN coordinating center
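The footnote's effect size definition can be sketched as follows. This is a minimal illustration that assumes the pre- and post-certification interviews are two independent samples and that "Pooled STD" is the usual degrees-of-freedom-weighted pooled standard deviation. The function name and the interview durations are hypothetical, not GAIN data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(pre, post):
    """(mean(post) - mean(pre)) / pooled SD, treating pre and post as independent samples."""
    n1, n2 = len(pre), len(post)
    pooled_var = ((n1 - 1) * variance(pre) + (n2 - 1) * variance(post)) / (n1 + n2 - 2)
    return (mean(post) - mean(pre)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical interview durations in minutes (not GAIN data).
    pre_cert = [130, 120, 140, 110, 125]
    post_cert = [115, 105, 120, 100, 110]
    print(f"Cohen's d = {cohens_d(pre_cert, post_cert):.2f}")
```

A negative d here means the measure went down after certification, which for durations, inconsistencies, or missing data is the desired direction.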
10
Staff Experience Matters as well
- Major improvement over the first 15 interviews
- Most improvements have occurred by 60 interviews
Source: GAIN coordinating center
11
Key Advantages of Creating Scales and Indices for Clinical Research
- One of the lowest cost ways to reduce measurement error and increase statistical power
- Reduces clinical omissions and backtracking for validity checks
- Increases conceptual robustness and interpretability, and makes it easier to explain to others
- Facilitates profiling over a large number of items
12
Impact of Number of Items on Reliability (Alpha) Observed by Average Inter-item Correlation
- Generally target .7 to .9
- Behavioral measures (e.g., how many days, times) have high reliability and max out around 3-5 items
- Symptom counts related to a syndrome or latent construct usually max out in 5-13 items
- Covert scales (e.g., MMPI), summative indices, and other measures with low inter-item r may take 30 items (or more)
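The relationship on this slide can be reproduced with the standardized (Spearman-Brown) form of Cronbach's alpha, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r is the average inter-item correlation. The sketch below assumes that form. The function names and the example correlations of .5, .3 and .1 are illustrative choices and are not taken from the slide.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def items_needed(target_alpha, mean_r, max_items=500):
    """Smallest number of items reaching target_alpha, or None if max_items is not enough."""
    for k in range(1, max_items + 1):
        if standardized_alpha(k, mean_r) >= target_alpha:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical average inter-item correlations: high, moderate, low.
    for mean_r in (0.5, 0.3, 0.1):
        row = ", ".join(f"k={k}: {standardized_alpha(k, mean_r):.2f}" for k in (3, 5, 13, 30))
        print(f"mean r={mean_r:.1f} -> {row}")
```

With an average correlation of .5, three to five items already give alpha around .75 to .83. With .1, it takes on the order of 30 items to get past .75, which is consistent with the ranges quoted above.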
13
Note that you can also create summary measures across different sources of data
Source: Lennox et al 2006 (CFI=.98)
14
Formal Measurement Models Can Be Used to
- Place people along a more reliable/sensitive ruler (aka common or latent factor)
- Look at the slope/discrimination of items (primarily 2 parameter IRT)
- Relate items in terms of their average severity
- Look at the match/mismatch of people and item locations (primarily Rasch / 1 parameter IRT)
- Study real differences by primary substance, gender, race, age or other groups
- Identify potential bias at the item and test level by gender, race or other groups
- Identify atypical patterns of answers (e.g., outfit)
- Identify random response patterns (e.g., infit) or less valid response patterns
- Replace missing data (whether small amounts or due to computer adaptive testing)
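As a rough illustration of the infit and outfit statistics mentioned in the list above, the sketch below computes both mean-square fit statistics for one person under a dichotomous Rasch model. It assumes the person measure and item difficulties (in logits) have already been estimated. The function names, the item difficulties, and the two response patterns are hypothetical.

```python
from math import exp


def rasch_p(theta, b):
    """Probability of endorsing an item of difficulty b for a person at theta (logits)."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def person_fit(responses, theta, difficulties):
    """Return (infit, outfit) mean squares for one person's 0/1 responses."""
    sq_resid, variances = [], []
    for x, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        sq_resid.append((x - p) ** 2)   # squared residual
        variances.append(p * (1 - p))   # model variance of the response
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variances)) / len(responses)
    # Infit: information-weighted version of the same quantity.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


if __name__ == "__main__":
    difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical item severities
    theta = 0.0                                  # hypothetical person measure
    for label, pattern in (("expected", [1, 1, 1, 0, 0]), ("reversed", [0, 0, 1, 1, 1])):
        infit, outfit = person_fit(pattern, theta, difficulties)
        print(f"{label}: infit={infit:.2f}, outfit={outfit:.2f}")
```

Values near 1 are what the model expects. Values well above 1, as for the reversed pattern, flag answers that are noisier or more surprising than the model predicts. Outfit reacts more strongly to unexpected answers on items far from the person's level.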
15
Impact of Item Discrimination (aka steepness of slope) on Sample Size Requirements
- 16-36% reduction in sample size
- IRT is generally more efficient if the items have low or varied discrimination
- Rasch is generally more efficient if the items have high discrimination
16
Raw v Rasch v IRT Scales (my take)
- Raw, Rasch and IRT scales are generally correlated over .95 and vary by less than 5% in sample size requirements/power
- Raw scales are the easiest to calculate (even by hand) and get most of the benefit. On the down side, the items are not equal, they rarely help you build theory, and they require separate approaches to handle missing data
- Rasch scales focus on highly discriminating items, fitting the data to a common measurement model that is very efficient when comparing items, people, and theories. On the down side, they assume that your focus is on building an interval ruler, that item slopes are similar, and that you want to compare subgroups of people with each other or over time
- IRT scales focus on fitting the measurement model to the data (the opposite of Rasch), explaining additional variance by adding parameters for slope and guessing, and are particularly useful when you have preexisting items with a wide range of discrimination. On the down side, they are more difficult to calculate and require multiple iterations and larger sample sizes.
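The difference between the three scoring approaches can be sketched with the standard logistic item response function. Raw scoring just sums the responses. The Rasch model fixes every item's slope and has no guessing parameter. The IRT variants free the slope (2 parameter) and add a lower asymptote for guessing (3 parameter). The function names and parameter values below are illustrative assumptions.

```python
from math import exp


def item_response(theta, b, a=1.0, c=0.0):
    """Probability of endorsing an item.

    theta : person location in logits
    b     : item severity/difficulty
    a     : slope/discrimination (held at 1.0 for Rasch)
    c     : guessing/lower asymptote (held at 0.0 for Rasch and 2 parameter IRT)
    """
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))


def raw_score(responses):
    """Raw scale: a simple count of endorsed items."""
    return sum(responses)


if __name__ == "__main__":
    theta = 1.0  # hypothetical person, one logit above the item
    print("Rasch   :", round(item_response(theta, b=0.0), 3))
    print("2PL a=2 :", round(item_response(theta, b=0.0, a=2.0), 3))
    print("3PL c=.2:", round(item_response(theta, b=0.0, a=2.0, c=0.2), 3))
    print("Raw     :", raw_score([1, 0, 1, 1, 0]))
```

Each extra parameter lets the curve follow the data more closely, at the cost of more to estimate. That is the trade-off the slide describes between fitting data to the model (Rasch) and fitting the model to the data (IRT).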
17
Structure of GAIN’s Psychopathology Measures and Validity Checks
Example of how scales can also be inter-related and used for validation
- Higher scores associated with alcohol and drug abuse medication (methadone, naltrexone, Antabuse, buprenorphine) and/or substance-induced legal, mental health, physical health, and withdrawal problems
- Higher scores associated with greater dysfunction (e.g., dropping out of school, unemployment, financial problems, homelessness)
- Higher scores associated with mental health treatment (e.g., antidepressants, serotonin reuptake inhibitors (SSRI), monoamine oxidase inhibitors (MAOI), sedatives) and/or a history of traumatic victimization, and/or high levels of stress
- Higher scores associated with mental health treatment (e.g., Ritalin, Adderall, lithium), special/alternative education, school or work problems, gambling and other evidence of impulse control problems, and/or anti-social/borderline personality disorders
- Higher scores associated with arrests, detention/jail time, probation, parole, size of drug habit
18
Internalizing Disorder Subscale Item Calibrations when Considering Diagnoses Separately
[Figure: item calibrations in logits (axis from -3 to 3, increasing severity) for Depression, Anxiety, Trauma, and Suicidal]
- Most common has narrow range of variation
- Small to major gaps in measure
19
Internalizing Disorder Subscale Item Calibrations Considered as a Second Order Factor
[Figure: item calibrations in logits (axis from -3 to 3, increasing severity) for Suicidal, Trauma, Anxiety, Depression, and Somatic]
20
On-Going Debates About SUD Concept
- Formal assumption that symptoms of “physiological dependence” (either tolerance or withdrawal) are markers of high severity
- Debate about whether “abuse” symptoms should be dropped, thought of as early dependence, or thought of as moderate/high severity markers that warrant treatment even in the absence of a full syndrome
- Debate about whether to treat diagnostic orphans (1-2 symptoms of dependence) as abuse or continue to ignore them
- Concern about whether the current symptoms (which were based primarily on adult data) are appropriate for use with adolescents
- Concern about the sensitivity to change
21
Sample Characteristics
                        Adolescents: <18   Young Adult: 18-25   Adults: 26+
                        (n=2474)           (n=344)              (n=661)
Male                    74%                58%                  47%
Caucasian               48%                54%                  29%
African American        18%                27%                  63%
Hispanic                12%                7%                   2%
Average Age             15.6               20.2                 37.3
Substance Disorder      85%                82%                  90%
Internal Disorder       53%                62%                  67%
External Disorder       63%                45%                  37%
Crime/Violence          64%                51%                  34%
Residential Tx          31%                56%                  74%
Current CJ/JJ invol.    69%                74%                  45%
Note: all significant, p < .01
22
Item Relationships Across Substances
[Figure: Rasch severity measure (axis from -0.60 to 0.80) for Time Cons., Role Failure, Fights/troub., Loss of Control, Hazardous, Tolerance, Can't stop, Give up act., Desp.PH/MH, Despite Legal, and Withdrawal]
Item severities as labeled: Desp.PH/MH (+0.10), Give up act. (+0.05), Can't stop (+0.05), Time Cons. (-0.21), Loss of Control (-0.10), Hazardous (-0.03), Despite Legal (+0.10), Role Failure (-0.12), Fights/troub. (0.17), Tolerance (0.00), Withdrawal (+0.34); Average Item Severity (0.00)
- Physiological Sx: while Withdrawal is high severity, Tolerance is only moderate
- Dependence Sx: other dependence symptoms are spread over the continuum
- Abuse Sx: abuse symptoms are also spread over the continuum
- 1st dimension explains 75% of variance (2nd explains 1.2%)
23
Symptom Severity Varied by Drug
[Figure: Rasch severity measure (axis from -0.60 to 0.80) by symptom (Time Cons., Role Failure, Fights/troub., Loss of Control, Hazardous, Tolerance, Can't stop, Give up act., Desp.PH/MH, Despite Legal, Withdrawal) for ALC, AMP, CAN, COC, and OPI]
Legend: AVG (0.00), ALC (-0.44), AMP (+0.89), CAN (-0.67), COC (-0.22), OPI (+0.44)
- Easier to endorse time consuming for CAN
- Easier to endorse fighting/trouble for ALC/CAN
- Easier to endorse hazardous use for ALC/CAN
- Easier to endorse moderate Sx for COC/OPI
- Easier to endorse despite legal problem for ALC/CAN
- Easier to endorse Withdrawal for AMP/OPI
- Withdrawal much less likely for CAN
24
Symptom Severity Varied Even More By Age
[Figure: Rasch severity measure (axis from -0.8 to 1.8) by symptom (Time Cons., Role Failure, Fights/troub., Loss of Control, Hazardous, Tolerance, Can't stop, Give up act., Desp.PH/MH, Despite Legal, Withdrawal) for age groups <18, 18-25, and 26+]
- Adults more likely to endorse most symptoms
- More likely to lead to fights among Adol/YA
- Hazardous use more likely among Adol/YA
- Continued use in spite of legal problems more likely among Adol/YA
25
Rasch Severity by Past Month Status
[Figure: Rasch severity measure (axis from -3.50 to 2.00) by past month status: None; Diagnostic Orphan in early remission; Diagnostic Orphan; Lifetime SUD in early remission; Lifetime SUD in CE 45+ days; Abuse Only; Dependence Only; Both Abuse and Dependence]
- Diagnostic Orphans (1-2 dependence symptoms) are lower, but still overlap with other clinical groups
26
Severity by Past Year Symptom Count
[Figure: Rasch severity measure by past year symptom count]
1. Better gradation
2. Still a lot of overlap in range
27
Severity by Weighted (past month=2, past year=1) Number of Substance x SUD Symptoms
[Figure: Rasch severity measure (axis from -4.00 to 2.00) by weighted count group: 0, 1-4, 5-8, 9-12, 13-16, 17-20, 21-24, 25-30, 31-40, 41+]
1. Better gradation
2. Much less overlap in range
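The weighted count on the x-axis can be sketched as a sum over every substance-by-symptom pair, with each pair scored 2 if it occurred in the past month, 1 if earlier in the past year, and 0 otherwise. The data layout, function names, and example person below are hypothetical. This shows only the raw weighted count and its grouping into the bins from the figure, not how the Rasch measure itself is estimated.

```python
# Recency weights as given in the slide title.
RECENCY_WEIGHTS = {"past_month": 2, "past_year": 1, "other": 0}

# Count groups taken from the figure's x-axis; None means no upper bound.
BINS = [(0, 0), (1, 4), (5, 8), (9, 12), (13, 16), (17, 20),
        (21, 24), (25, 30), (31, 40), (41, None)]


def weighted_symptom_count(symptoms_by_substance):
    """Sum recency weights over every substance x symptom pair."""
    return sum(
        RECENCY_WEIGHTS[recency]
        for symptoms in symptoms_by_substance.values()
        for recency in symptoms.values()
    )


def bin_label(score):
    """Map a weighted count onto the figure's groups, e.g. 6 -> '5-8'."""
    for low, high in BINS:
        if high is None and score >= low:
            return f"{low}+"
        if high is not None and low <= score <= high:
            return str(low) if low == high else f"{low}-{high}"
    raise ValueError("score must be a non-negative integer")


if __name__ == "__main__":
    # Hypothetical person: symptom -> most recent occurrence, per substance.
    person = {
        "alcohol": {"tolerance": "past_month", "hazardous_use": "past_year"},
        "cannabis": {"time_consuming": "past_month", "withdrawal": "other"},
    }
    score = weighted_symptom_count(person)
    print(score, bin_label(score))
```

Counting across substances and weighting by recency spreads people over a wider range than a single 0-11 symptom count, which fits the slide's observation of better gradation and less overlap.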
28
Construct Validity (i.e., does it matter?)
Validation measures: Frequency of Use, Past Week Withdrawal, Emotional Problems, Recovery Environment, Social Risk
- DSM diagnosis \a: 0.47, 0.40, 0.32, 0.30
- Symptom Count Continuous \b: 0.48, 0.43, 0.39, 0.32, 0.31
- Weighted Symptom Rasch \c: 0.57, 0.46, 0.39, 0.32
\a Categorized as past year physiological dependence, non-physiological dependence, abuse, other
\b Raw past year symptom count (0-11)
\c Symptoms weighted by recency (2=past month, 1=2-12 months ago, 0=other)
- Past year symptom count did better than DSM
- Rasch does a little better still
29
Implications for SUD Concept
- “Tolerance” is not a good marker of high severity; withdrawal (and substance-induced health problems) are
- “Abuse” symptoms are consistent with the overall syndrome and represent moderate severity or “other reasons to treat in the absence of the full blown syndrome”
- Diagnostic orphans are lower severity, but relevant
- Pattern of symptoms varies by substance and age, but all symptoms are relevant
- “Adolescents” experienced the same range of symptoms, though they (and young adults) were particularly more likely to be involved with the law, use in hazardous situations, and get into fights at lower severity
- Symptom counts appear to be more useful than the current DSM approach to categorizing severity
- While weighting by recency and drug delineated severity, it did not improve construct validity