Download presentation
Presentation is loading. Please wait.
Published byMarybeth Cox Modified over 9 years ago
1
Division of Geriatrics Using Secondary Data Analysis for Outcomes Research Epi 211 April 2010 Michael Steinman, MD
2
Division of Geriatrics Disclosures: None Acknowledgements: J. Michael McWilliams Ann Nattinger SGIM Research Committee Disclosures and acknowledgements
3
Division of Geriatrics Question: You are a fellow / junior faculty member interested in studying... –Impact of nurse-led HTN clinics on clinical outcomes in patients with HTN –Impact of implementing EMRs on appropriate prescribing in ambulatory surgical patients –Whether quality measures of asthma control in children correlate with actual clinical outcomes in this population
4
Division of Geriatrics Question: Here’s your choice: –A. Get a multimillion dollar grant to conduct a multi-center, multi-year RCT –B. Analyze existing data
5
Division of Geriatrics Appreciate key conceptual and methodologic issues involved in outcomes research employing secondary data analysis Identify and use online tools for locating and learning about datasets relevant to your research Understand the range of resources and support required to successfully complete a secondary data analysis Learning objectives
6
Division of Geriatrics Working with secondary data –Conceptual and methodologic issues Overview of high-value datasets and web-based resources Planning your dataset project –Practical advice Q&A Overview
7
Division of Geriatrics Working with Secondary Data
8
Division of Geriatrics Data that have been collected but not for you (My) Definition of Secondary Data
9
Division of Geriatrics Survey Administrative (claims) Discharge Medical chart / EMR Disease registries Aggregate (ARF, US Census) Combinations and linkages Types of Secondary Data
10
Division of Geriatrics Anything but randomized trials By discipline: Outcomes research Epidemiology Health services research By question: Descriptive Comparative Causal What kinds of research can be conducted with secondary data?
11
Division of Geriatrics Which comes first: question or dataset? a.Research question first b.Dataset first c.Trick question: both Hybrid approach 1.Identify research focus, broad question 2.Consider candidate datasets 3.Hone question 4.Iterate between 2 and 3 Conceiving a Project
12
Division of Geriatrics F easible—data, variables, & resources accessible & available Interesting—to researcher and audience Novel—extends what is already known Ethical—upholds standards Relevant—to patient care, clinical outcomes, policy, etc. What makes a good research question? (FINER) Cummings et al. Conceiving the research question. In: Hulley SB, Cummings SR, Browner WS, et al, eds. Designing clinical research, 3rd ed. Philadelphia: Lippincott Williams & Wilkins, 2007:17-26
13
Division of Geriatrics Compatibility with research question(s) Availability and expense Sample: representativeness, power Measures of interest present and valid Messiness and missingness Local expertise Linkages Selecting a Database
14
Division of Geriatrics Key elements for outcomes research Measure of intervention Measure of outcome(s) –Intermediate outcomes % of patients receiving treatment, measures of glycemic or lipid control (A1c, serum LDL) –Clinical outcomes Death, hospitalization, satisfaction, etc. Measures of important confounders
15
Division of Geriatrics They are not primary data! Efficiency: fast and cheap No regrets Scale and scope Size and detail not otherwise feasible for individual research team Generalizable Novel and creative research questions Often easier IRB review process Advantages of Secondary Data
16
Division of Geriatrics 1.Data mining/overfitting When the analysis precedes the question Does urine cortisol predict Catholicism? 2.Causal inference Inherently limited with observational data But does not preclude quasi-experimental designs to recover causal effects Challenges and Pitfalls
17
Division of Geriatrics 3.Validity of measures Beware of assumptions Problems: coding, reporting, recall biases Solutions: direct validation in subgroup or another data source, literature review, sensitivity analyses 4.Complexity of file structure Row in dataset may not be unit of analysis Skip patterns, proxy respondents Challenges and Pitfalls
18
Division of Geriatrics Want to measure time preferences Behavioral economics: people tend to overvalue the present Explanation for unhealthy habits, underuse of cancer screening? Have measures on financial planning horizons Are the two equivalent? Might financial planning also depend on: Income Source of income, employment status Dependents Inheritance What You Want and What You Have
19
Ask: IF ((piRTab1X007AFinFam = FAMILYR) OR (piRTab1X007AFinFam = FINANCIAL_FAMILYR)) AND ((ACTIVELANGUAGE <> EXTENG) AND (ACTIVELANGUAGE <> EXTSPN)) AND (piInitA106_NumContactKids > 0) AND (piInitA100_NumNRKids > 0) JE012 CHILDREN LIVE WITHIN 10 MILES Section: E Level: Household Type: Numeric Width: 1 Decimals: 0 CAI Reference: SecE.KidStatus.E012_ 2000 Link: G19802000 Link: G1980 2002 Link: HE0122002 Link: HE012 IF {R DOES NOT HAVE SPOUSE/PARTNER and DOES NOT STILL HAVE HOME OUTSIDE NURSING HOME {(CS11/A028=1) and (CS26/A070 NOT 1)}} or {R & SPOUSE/PARTNER} LIVE IN SAME NURSING HOME (CS11/A028=1 and CS12/A030=1): [Do any of your children who do not live with you/Does CHILD NAME] live within 10 miles of you (in R's NURSING HOME CITY, STATE (CS25b/A067))? OTHERWISE: [Do any of your children (who do not live with you)/Does CHILD NAME] live within 10 miles of you (in MAIN RESIDENCE [CITY/CITY, STATE STATE])? 6802 1. YES 4720 5. NO 32 8.DK (Don't Know); NA (Not Ascertained) 4 9. RF (Refused) 2087 Blank. INAP (Inapplicable) A Simple Question? * From the Health and Retirement Study
20
Division of Geriatrics 5.Representativeness of Sample External validity (generalizability) Internal validity (selection bias) Example: comparing outcomes for insured and uninsured patients using hospital discharge data Must be hospitalized to enter sample Not only limits generalizability (to outpatients) But inferences about the sample may be wrong –Sample would need to include uninsured who would have been hospitalized if insured Challenges and Pitfalls
21
Division of Geriatrics Sources Non-response: unit and item Variability in data collection (e.g. across states or over time, collected on subset due to expense) Incomplete linkages Language MCAR: M ╨ Y strong assumption, can ignore MAR: M ╨ Y|X weaker assumption, can fix Non-ignorable, informative: M predicts Y can’t fix Statistical Considerations: Missing Data
22
Division of Geriatrics Approaches Listwise deletion, complete case (ok if MCAR) Imputation Mean imputation (biased standard errors) Multiple imputation (MAR) Weighting techniques (MAR) Random effects models (MAR) Statistical Considerations: Missing Data
23
Division of Geriatrics Complex survey designs Example multistage probability sample (NAMCS) : US divided into PSUs (counties / MSAs) sample of PSUs selected within each PSU, stratify MDs by specialty sample of MDs within each stratum quasi-random sample of patients seen by each MD Survey design Clustering: convenience, ↓precision Stratification: ↑precision, ↑representativeness (protects against a bad sample) Oversampling: ↑representation and precision for subgroup of interest Statistical Considerations: Analyzing Survey Data
24
Division of Geriatrics Survey weights: affect point estimates Individuals may have unequal selection probabilities Need to apply weights to recover representativeness W = 1/p(selection) = # people represented W’s reflect sampling design, adjustments to match to census totals, non-response Survey strata, clusters: affect se’s Need variance estimators that account for correlated data Most statistical packages able to handle Statistical Considerations: Analyzing Survey Data
25
Division of Geriatrics Finding the Right Dataset
26
Division of Geriatrics Finding the Right Dataset Contain variables of interest –predictor, outcome, confounders Relevant time frame –Cross-sectional, longitudinal Feasible –Access: time, bureaucracy, cost –Usable No perfect datasets -> hybrid approach of developing research question
27
Division of Geriatrics Administrative Data (VA) VA has multiple high-value administrative databases –Outpatient visit information Visit date, type of clinic, provider, ICD9 diagnoses –Inpatient information Admitting dx(s), discharge dx(s), CPT codes, bed section, meds administered –Lab data >40 labs –Pharmacy data All inpatient and outpatient fills –Academic affiliation –etc
28
Division of Geriatrics Administrative Data (VA) Huge bureaucracy and paperwork
29
Division of Geriatrics Administrative Data (VA) Messy data Huge size –2 TB server Data analyst
30
Division of Geriatrics Survey Data (NHANES) National Health and Nutrition Examination Survey (NHANES) –Nationally representative sample of >10K patients every 2 years –Extensive interview data on clinical history (including diseases, behaviors, psychosocial parameters, etc.) –Physical exam information (e.g. VS) –Labs, biomarkers
31
Division of Geriatrics Survey Data (NHANES) Free and easy to download (Relatively) easy to use –Although requires careful reading of documentation Serial cross-sectional Disease data self-report Very limited information about providers and systems of care
32
Division of Geriatrics Survey Data (NAMCS) National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS) Nationally representative sample of ~70K outpatient and ED visits per year Physician-completed form about office visit
34
Division of Geriatrics Survey Data (NAMCS) Data more from physician perspective (diagnoses, treatments Rx’ed, etc) and some info on providers (e.g., clinic organization, use of EMRs, etc) Serial cross-sectional –Visit-focused –Not comprehensive, ? value for chronic diseases
35
Division of Geriatrics Discharge Data (NIS) National Inpatient Sample (NIS) –Database of inpatient hospital stays collected from ~20% of US community hospitals by AHRQ –Diagnoses and procedures, severity adjustment elements, payment source, hospital organizational characteristics –Hospital and county identifiers that allow linkage to the American Hospital Association Annual Survey and Area Resource File
36
Division of Geriatrics Discharge Data (NIS) Relatively easy to access (DUA, $200/yr) Relatively easy to use –Though need close attention to documentation Limited data elements Huge data files
37
Division of Geriatrics Web-Based Resources Society of General Internal Medicine (SGIM) Research Dataset Compendium –www.sgim.org/go/datasetswww.sgim.org/go/datasets UCSF K-12 Data Resource Center –http://www.epibiostat.ucsf.edu/courses/RoadmapK12/ PublicDataSetResources/http://www.epibiostat.ucsf.edu/courses/RoadmapK12/ PublicDataSetResources/ Partners in Information Access for the Public Health Workforce –http://phpartners.org/health_stats.htmlhttp://phpartners.org/health_stats.html
40
Division of Geriatrics Finding Additional Resources National Information Center on Health Services Research and Health Care Technology (NICHSR) Inter-University Consortium for Political and Social Research (ICPSR) Partners in Information Access for the Public Health Workforce Roadmap K-12 Data Resource Center (UCSF) List of datasets from the American Sociologic Association Canadian Research Data Centers – Data Sets and Research Tools (Canada) Directory of Health and Human Services Data Resources Publicly Available Databases from National Institute on Aging (NIA) Publicly Available Databases from National Heart, Lung, & Blood Institute (NHLBI) National Center for Health Statistics (NCHS) Data Warehouse Medicare Research Data Assistance Center (RESDAC); and Centers for Medicare and Medicaid Services (CMS) Research, Statistics, Data & Systems Veterans Affairs (VA) data (all available at www.sgim.org/go/datasets)www.sgim.org/go/datasets
41
Division of Geriatrics National Information Center on Health Services Research and Health Care Technology (NICHSR) Databases, data repositories, health statistics Fellowship and funding opportunities Glossaries, research and clinical guidelines Evidence-based practice and health technology assessment Specialized PubMed searches on healthcare quality and costs http://www.nlm.nih.gov/hsrinfo/datasites.html
42
Division of Geriatrics Inter-University Consortium for Political and Social Research (ICPSR) World’s largest archive of social science data Searchable Many sub-archives relevant to HSR –Health and Medical Care Archive –National Archive of Computerized Data on Aging http://www.icpsr.umich.edu/icpsrweb/ICPSR/partners/archives.jsp
43
Division of Geriatrics Planning Your Dataset Project
44
Division of Geriatrics Resources Needed Your effort Computer resources and security Programmer and/or statistician effort PhD statistical support – complex sampling or analyses Time timeline
46
Division of Geriatrics IRB Issues Exempt if no identifiers - Still need IRB determination of exempt status Limited data set – DUA - Usually expedited for IRB Full data use agreement -Still may be expedited for IRB -May need to explain lack of consent to funders/IRB
47
Division of Geriatrics 1.Start with a clear research question and hypothesis 2.Get to know your data source: Why does the database exist? Who reports the data? What are the incentives for accurate reporting? How are the data audited, if at all? Can you link the data to other large databases? 3.Get good documentation of the cohort, variables, and data layout, then read the fine print 4.Consult or collaborate with researchers who have used the database Summary: 10 Tips for Success in Secondary Data Analyses Provided by John Ayanian, MD, MPP, Ellen McCarthy, PhD, “Research with Large Databases”, Harvard School of Public Health
48
Division of Geriatrics 5.Line up computing resources before data arrive 6.Allow time to receive data if not publicly available 7.Learn SAS, Stata, or other statistical software so you can analyze data yourself (or collaborate) 8.Assess data quality (e.g., outliers & missing data) with plots or frequency tables 9.Consult or collaborate with a statistician on your analysis plan, especially for complex surveys with sampling weights 10.Use clinical intuition to interpret results and consult experts as needed Summary: 10 Tips for Success in Secondary Data Analyses Provided by John Ayanian, MD, MPP, Ellen McCarthy, PhD, “Research with Large Databases”, Harvard School of Public Health
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.