Using Secondary Data Analysis for Outcomes Research. Division of Geriatrics. Epi 211, April 2010. Michael Steinman, MD.

Presentation transcript:


Disclosures and acknowledgements
Disclosures: None
Acknowledgements: J. Michael McWilliams, Ann Nattinger, SGIM Research Committee

Question
You are a fellow / junior faculty member interested in studying...
– Impact of nurse-led HTN clinics on clinical outcomes in patients with HTN
– Impact of implementing EMRs on appropriate prescribing in ambulatory surgical patients
– Whether quality measures of asthma control in children correlate with actual clinical outcomes in this population

Question
Here’s your choice:
– A. Get a multimillion dollar grant to conduct a multi-center, multi-year RCT
– B. Analyze existing data

Learning objectives
– Appreciate key conceptual and methodologic issues involved in outcomes research employing secondary data analysis
– Identify and use online tools for locating and learning about datasets relevant to your research
– Understand the range of resources and support required to successfully complete a secondary data analysis

Overview
Working with secondary data
– Conceptual and methodologic issues
Overview of high-value datasets and web-based resources
Planning your dataset project
– Practical advice
Q&A

Working with Secondary Data

(My) Definition of Secondary Data
Data that have been collected, but not for you

Types of Secondary Data
– Survey
– Administrative (claims)
– Discharge
– Medical chart / EMR
– Disease registries
– Aggregate (ARF, US Census)
– Combinations and linkages

What kinds of research can be conducted with secondary data?
Anything but randomized trials
By discipline:
– Outcomes research
– Epidemiology
– Health services research
By question:
– Descriptive
– Comparative
– Causal

Conceiving a Project
Which comes first: question or dataset?
a. Research question first
b. Dataset first
c. Trick question: both
Hybrid approach
1. Identify research focus, broad question
2. Consider candidate datasets
3. Hone question
4. Iterate between 2 and 3

What makes a good research question? (FINER)
Feasible: data, variables, and resources accessible and available
Interesting: to researcher and audience
Novel: extends what is already known
Ethical: upholds standards
Relevant: to patient care, clinical outcomes, policy, etc.
Source: Cummings et al. Conceiving the research question. In: Hulley SB, Cummings SR, Browner WS, et al, eds. Designing clinical research, 3rd ed. Philadelphia: Lippincott Williams & Wilkins, 2007:17-26.

Selecting a Database
– Compatibility with research question(s)
– Availability and expense
– Sample: representativeness, power
– Measures of interest present and valid
– Messiness and missingness
– Local expertise
– Linkages

Key elements for outcomes research
Measure of intervention
Measure of outcome(s)
– Intermediate outcomes: % of patients receiving treatment, measures of glycemic or lipid control (A1c, serum LDL)
– Clinical outcomes: death, hospitalization, satisfaction, etc.
Measures of important confounders

Advantages of Secondary Data
They are not primary data!
– Efficiency: fast and cheap
– No regrets
Scale and scope
– Size and detail not otherwise feasible for individual research team
– Generalizable
Novel and creative research questions
Often easier IRB review process

Challenges and Pitfalls
1. Data mining/overfitting
– When the analysis precedes the question
– Does urine cortisol predict Catholicism?
2. Causal inference
– Inherently limited with observational data
– But does not preclude quasi-experimental designs to recover causal effects
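To make the data-mining pitfall concrete, here is a minimal simulation sketch. All numbers and names are hypothetical, not from the lecture: it screens many predictors that are pure noise against a pure-noise outcome and counts how many clear a conventional 5% threshold. The cutoff uses a large-sample normal approximation for the correlation coefficient, which is an assumption of this sketch.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_hits(n_subjects=200, n_predictors=400, seed=0):
    """Count noise predictors that look 'significant' at roughly p < 0.05.

    Outcome and predictors are independent by construction, so every hit
    is a false positive.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    # Approximate two-sided 5% cutoff for |r| under the null (large n).
    threshold = 1.96 / math.sqrt(n_subjects)
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_subjects)]
        if abs(pearson_r(predictor, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print("False-positive 'findings':", count_spurious_hits())
```

With 400 unrelated predictors, about 5% (around 20) would be expected to pass by chance alone, which is why the question should come before the analysis.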

Challenges and Pitfalls
3. Validity of measures
– Beware of assumptions
– Problems: coding, reporting, recall biases
– Solutions: direct validation in subgroup or another data source, literature review, sensitivity analyses
4. Complexity of file structure
– Row in dataset may not be unit of analysis
– Skip patterns, proxy respondents
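The point that a row may not be the unit of analysis can be sketched as follows. The records and field names (`patient_id`, `sbp`) are hypothetical: the file has one row per visit, so a naive average over rows gives frequent visitors more influence than a patient-level summary would.

```python
from collections import defaultdict

# Hypothetical visit-level file: one row per visit, not per patient.
visits = [
    {"patient_id": "A", "sbp": 150},
    {"patient_id": "A", "sbp": 160},
    {"patient_id": "A", "sbp": 170},
    {"patient_id": "B", "sbp": 120},
    {"patient_id": "C", "sbp": 130},
    {"patient_id": "C", "sbp": 130},
]


def summarize_by_patient(rows):
    """Collapse visit rows to one record per patient."""
    readings = defaultdict(list)
    for row in rows:
        readings[row["patient_id"]].append(row["sbp"])
    return {
        pid: {"n_visits": len(vals), "mean_sbp": sum(vals) / len(vals)}
        for pid, vals in readings.items()
    }


def visit_level_mean(rows):
    """Average over rows: each visit counts once."""
    return sum(r["sbp"] for r in rows) / len(rows)


def patient_level_mean(rows):
    """Average over patients: each patient counts once."""
    summary = summarize_by_patient(rows)
    return sum(s["mean_sbp"] for s in summary.values()) / len(summary)


if __name__ == "__main__":
    print("Visit-level mean SBP:  ", round(visit_level_mean(visits), 1))
    print("Patient-level mean SBP:", round(patient_level_mean(visits), 1))
```

Which of the two is right depends on the research question; the sketch only shows that they can differ.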

What You Want and What You Have
Want to measure time preferences
– Behavioral economics: people tend to overvalue the present
– Explanation for unhealthy habits, underuse of cancer screening?
Have measures on financial planning horizons
Are the two equivalent?
Might financial planning also depend on:
– Income
– Source of income, employment status
– Dependents
– Inheritance

A Simple Question? (from the Health and Retirement Study)
JE012 CHILDREN LIVE WITHIN 10 MILES
Section: E Level: Household Type: Numeric Width: 1 Decimals: 0
CAI Reference: SecE.KidStatus.E012_
2000 Link: G Link: G Link: HE Link: HE012
Ask: IF ((piRTab1X007AFinFam = FAMILYR) OR (piRTab1X007AFinFam = FINANCIAL_FAMILYR)) AND ((ACTIVELANGUAGE <> EXTENG) AND (ACTIVELANGUAGE <> EXTSPN)) AND (piInitA106_NumContactKids > 0) AND (piInitA100_NumNRKids > 0)
IF {R DOES NOT HAVE SPOUSE/PARTNER and DOES NOT STILL HAVE HOME OUTSIDE NURSING HOME {(CS11/A028=1) and (CS26/A070 NOT 1)}} or {R & SPOUSE/PARTNER} LIVE IN SAME NURSING HOME (CS11/A028=1 and CS12/A030=1): [Do any of your children who do not live with you/Does CHILD NAME] live within 10 miles of you (in R's NURSING HOME CITY, STATE (CS25b/A067))?
OTHERWISE: [Do any of your children (who do not live with you)/Does CHILD NAME] live within 10 miles of you (in MAIN RESIDENCE [CITY/CITY, STATE STATE])?
YES
NO
32 8. DK (Don't Know); NA (Not Ascertained)
4 9. RF (Refused)
2087 Blank. INAP (Inapplicable)
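A codebook entry like this one implies that raw values cannot be analyzed as-is. The sketch below is illustrative only: the special codes 8, 9, and blank follow the entry above, but the substantive codes for YES and NO (1 and 5 here) are an assumption, since they are not legible in the transcript, and the sample values are made up. Always confirm codes against the actual documentation.

```python
from collections import Counter

# Special codes as listed in the codebook entry above.
SPECIAL_CODES = {8: "dont_know_or_na", 9: "refused", None: "inapplicable"}
# ASSUMED substantive codes, for illustration only.
SUBSTANTIVE_CODES = {1: "yes", 5: "no"}


def classify(raw):
    """Map a raw codebook value to an analysis label."""
    if raw in SUBSTANTIVE_CODES:
        return SUBSTANTIVE_CODES[raw]
    if raw in SPECIAL_CODES:
        return SPECIAL_CODES[raw]
    return "unexpected_code"


def proportion_yes(raw_values):
    """Share answering yes among respondents who were asked and answered."""
    labels = [classify(v) for v in raw_values]
    answered = [lab for lab in labels if lab in ("yes", "no")]
    return sum(lab == "yes" for lab in answered) / len(answered)


def naive_proportion_yes(raw_values):
    """Wrong on purpose: treats every non-yes value, even blanks, as 'no'."""
    return sum(v == 1 for v in raw_values) / len(raw_values)


# Made-up raw values; None stands for a blank (skip pattern, never asked).
raw = [1, 5, 1, 8, None, None, 9, 5]

if __name__ == "__main__":
    print(Counter(classify(v) for v in raw))
    print("Among those answering:", proportion_yes(raw))
    print("Naive (blanks as 'no'):", naive_proportion_yes(raw))
```

Blank here means the question was skipped by design, which is different from a refusal or a "don't know", so the three are kept as separate labels.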

Challenges and Pitfalls
5. Representativeness of Sample
– External validity (generalizability)
– Internal validity (selection bias)
Example: comparing outcomes for insured and uninsured patients using hospital discharge data
– Must be hospitalized to enter sample
– Not only limits generalizability (to outpatients)
– But inferences about the sample may be wrong
– Sample would need to include uninsured who would have been hospitalized if insured

Statistical Considerations: Missing Data
Sources
– Non-response: unit and item
– Variability in data collection (e.g., across states or over time, collected on subset due to expense)
– Incomplete linkages
Language (M = missingness indicator, Y = outcome, X = observed covariates)
– MCAR (missing completely at random): M ⊥ Y → strong assumption, can ignore
– MAR (missing at random): M ⊥ Y | X → weaker assumption, can fix
– Non-ignorable, informative: M predicts Y → can’t fix
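The difference between MCAR and MAR can be sketched with a small simulation. Everything here is hypothetical (the variables `age` and `sbp`, the effect sizes, and the missingness rates). Under MCAR the complete-case mean stays close to the truth; under MAR it drifts, but adjusting for the observed covariate that drives missingness recovers it.

```python
import random


def avg(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(n=50000, seed=1):
    """Compare complete-case means under MCAR and MAR missingness."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.gauss(60, 10)
        sbp = 120 + 0.5 * (age - 60) + rng.gauss(0, 10)  # outcome Y
        older = age > 60                                  # observed X
        miss_mcar = rng.random() < 0.3                    # ignores X and Y
        miss_mar = rng.random() < (0.8 if older else 0.1)  # depends on X only
        rows.append((older, sbp, miss_mcar, miss_mar))

    full = avg(r[1] for r in rows)
    mcar = avg(r[1] for r in rows if not r[2])
    mar = avg(r[1] for r in rows if not r[3])

    # "Can fix": average within strata of X, then weight each stratum
    # by its share of the full sample (X is observed for everyone).
    adjusted = 0.0
    for stratum in (True, False):
        share = sum(1 for r in rows if r[0] == stratum) / n
        adjusted += share * avg(
            r[1] for r in rows if r[0] == stratum and not r[3]
        )
    return {"full": full, "mcar": mcar, "mar": mar, "adjusted": adjusted}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>9}: {value:.2f}")
```

The stratified adjustment works in this sketch only because missingness depends on nothing but the observed stratum; if it also depended on the outcome itself, no such fix would be available.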

Statistical Considerations: Missing Data
Approaches
– Listwise deletion, complete case (ok if MCAR)
– Imputation
– Mean imputation (biased standard errors)
– Multiple imputation (MAR)
– Weighting techniques (MAR)
– Random effects models (MAR)
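Why mean imputation biases standard errors can be seen in a toy example. The values are made up: filling gaps with the observed mean leaves the mean unchanged but adds observations with zero deviation, so the variance (and any standard error built on it) shrinks artificially.

```python
from statistics import mean, variance

# Made-up measurements; None marks a missing value.
values = [2, 4, None, 8, None, 6]


def complete_cases(data):
    """Listwise deletion: keep only observed values."""
    return [v for v in data if v is not None]


def mean_impute(data):
    """Replace each missing value with the mean of the observed values."""
    observed_mean = mean(complete_cases(data))
    return [observed_mean if v is None else v for v in data]


if __name__ == "__main__":
    cc = complete_cases(values)
    imputed = mean_impute(values)
    print("Complete case: mean", mean(cc), "variance", variance(cc))
    print("Mean imputed:  mean", mean(imputed), "variance", variance(imputed))
```

Here the mean stays at 5 in both versions while the sample variance falls from about 6.67 to 4.0, even though no new information was added.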

Statistical Considerations: Analyzing Survey Data
Complex survey designs
Example multistage probability sample (NAMCS): US divided into PSUs (counties / MSAs) → sample of PSUs selected → within each PSU, stratify MDs by specialty → sample of MDs within each stratum → quasi-random sample of patients seen by each MD
Survey design
– Clustering: convenience, ↓ precision
– Stratification: ↑ precision, ↑ representativeness (protects against a bad sample)
– Oversampling: ↑ representation and precision for subgroup of interest
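The loss of precision from clustering is often summarized with a design effect. The sketch below uses Kish's approximation for equal-sized clusters, DEFF = 1 + (m - 1) × ICC. That formula is not from the slides, and the cluster size and intraclass correlation are hypothetical.

```python
def design_effect(avg_cluster_size, icc):
    """Kish approximation: variance inflation from cluster sampling."""
    return 1 + (avg_cluster_size - 1) * icc


def effective_sample_size(n, avg_cluster_size, icc):
    """Size of a simple random sample with equivalent precision."""
    return n / design_effect(avg_cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical: 3,000 visits, 30 per physician, ICC of 0.05.
    deff = design_effect(30, 0.05)
    n_eff = effective_sample_size(3000, 30, 0.05)
    print(f"Design effect: {deff:.2f}")
    print(f"Effective sample size: {n_eff:.0f}")
```

Under these assumed inputs, 3,000 clustered visits carry roughly the information of 1,200 independent ones.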

Statistical Considerations: Analyzing Survey Data
Survey weights: affect point estimates
– Individuals may have unequal selection probabilities
– Need to apply weights to recover representativeness
– W = 1/p(selection) = # people represented
– W’s reflect sampling design, adjustments to match to census totals, non-response
Survey strata, clusters: affect SEs
– Need variance estimators that account for correlated data
– Most statistical packages able to handle
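The rule W = 1/p(selection) can be illustrated with a toy sample. The selection probabilities and outcomes are invented: an oversampled subgroup with high prevalence pulls the unweighted estimate upward, and weighting restores each respondent's share of the population. This sketch covers point estimates only; standard errors would still need a design-based variance estimator.

```python
def weight(p_selection):
    """Sampling weight: number of people each respondent represents."""
    return 1.0 / p_selection


# (has_hypertension, selection_probability); subgroup oversampled at 0.02.
sample = [
    (1, 0.02), (1, 0.02), (1, 0.02), (0, 0.02),
    (1, 0.005), (0, 0.005), (0, 0.005), (0, 0.005),
]


def unweighted_prevalence(rows):
    return sum(y for y, _ in rows) / len(rows)


def weighted_prevalence(rows):
    total_weight = sum(weight(p) for _, p in rows)
    return sum(y * weight(p) for y, p in rows) / total_weight


def population_represented(rows):
    """Sum of weights estimates the size of the target population."""
    return sum(weight(p) for _, p in rows)


if __name__ == "__main__":
    print("Unweighted prevalence:", unweighted_prevalence(sample))
    print("Weighted prevalence:  ", round(weighted_prevalence(sample), 3))
    print("People represented:   ", round(population_represented(sample)))
```

In this example the unweighted prevalence is 0.50, while the weighted estimate is 0.35 because the oversampled respondents each stand for fewer people.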

Finding the Right Dataset

Finding the Right Dataset
Contain variables of interest
– Predictor, outcome, confounders
Relevant time frame
– Cross-sectional, longitudinal
Feasible
– Access: time, bureaucracy, cost
– Usable
No perfect datasets -> hybrid approach of developing research question

Administrative Data (VA)
VA has multiple high-value administrative databases
– Outpatient visit information: visit date, type of clinic, provider, ICD9 diagnoses
– Inpatient information: admitting dx(s), discharge dx(s), CPT codes, bed section, meds administered
– Lab data: >40 labs
– Pharmacy data: all inpatient and outpatient fills
– Academic affiliation
– etc.

Administrative Data (VA)
Huge bureaucracy and paperwork

Administrative Data (VA)
Messy data
Huge size
– 2 TB server
Data analyst

Survey Data (NHANES)
National Health and Nutrition Examination Survey (NHANES)
– Nationally representative sample of >10K patients every 2 years
– Extensive interview data on clinical history (including diseases, behaviors, psychosocial parameters, etc.)
– Physical exam information (e.g., VS)
– Labs, biomarkers

Survey Data (NHANES)
Free and easy to download
(Relatively) easy to use
– Although requires careful reading of documentation
Serial cross-sectional
Disease data self-report
Very limited information about providers and systems of care

Survey Data (NAMCS)
National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS)
– Nationally representative sample of ~70K outpatient and ED visits per year
– Physician-completed form about office visit

Survey Data (NAMCS)
Data more from physician perspective (diagnoses, treatments Rx’ed, etc.) and some info on providers (e.g., clinic organization, use of EMRs, etc.)
Serial cross-sectional
– Visit-focused
– Not comprehensive, ? value for chronic diseases

Discharge Data (NIS)
National Inpatient Sample (NIS)
– Database of inpatient hospital stays collected from ~20% of US community hospitals by AHRQ
– Diagnoses and procedures, severity adjustment elements, payment source, hospital organizational characteristics
– Hospital and county identifiers that allow linkage to the American Hospital Association Annual Survey and Area Resource File

Discharge Data (NIS)
Relatively easy to access (DUA, $200/yr)
Relatively easy to use
– Though need close attention to documentation
Limited data elements
Huge data files

Web-Based Resources
Society of General Internal Medicine (SGIM) Research Dataset Compendium –
UCSF K-12 Data Resource Center – PublicDataSetResources/
Partners in Information Access for the Public Health Workforce –

Finding Additional Resources
– National Information Center on Health Services Research and Health Care Technology (NICHSR)
– Inter-University Consortium for Political and Social Research (ICPSR)
– Partners in Information Access for the Public Health Workforce
– Roadmap K-12 Data Resource Center (UCSF)
– List of datasets from the American Sociologic Association
– Canadian Research Data Centers
– Data Sets and Research Tools (Canada)
– Directory of Health and Human Services Data Resources
– Publicly Available Databases from National Institute on Aging (NIA)
– Publicly Available Databases from National Heart, Lung, & Blood Institute (NHLBI)
– National Center for Health Statistics (NCHS) Data Warehouse
– Medicare Research Data Assistance Center (RESDAC); and Centers for Medicare and Medicaid Services (CMS) Research, Statistics, Data & Systems
– Veterans Affairs (VA) data
(all available at

National Information Center on Health Services Research and Health Care Technology (NICHSR)
– Databases, data repositories, health statistics
– Fellowship and funding opportunities
– Glossaries, research and clinical guidelines
– Evidence-based practice and health technology assessment
– Specialized PubMed searches on healthcare quality and costs

Inter-University Consortium for Political and Social Research (ICPSR)
World’s largest archive of social science data
Searchable
Many sub-archives relevant to HSR
– Health and Medical Care Archive
– National Archive of Computerized Data on Aging

Planning Your Dataset Project

Resources Needed
– Your effort
– Computer resources and security
– Programmer and/or statistician effort
– PhD statistical support (complex sampling or analyses)
– Time (timeline)

IRB Issues
Exempt if no identifiers
– Still need IRB determination of exempt status
Limited data set (DUA)
– Usually expedited for IRB
Full data use agreement
– Still may be expedited for IRB
– May need to explain lack of consent to funders/IRB

Summary: 10 Tips for Success in Secondary Data Analyses
1. Start with a clear research question and hypothesis
2. Get to know your data source:
– Why does the database exist?
– Who reports the data?
– What are the incentives for accurate reporting?
– How are the data audited, if at all?
– Can you link the data to other large databases?
3. Get good documentation of the cohort, variables, and data layout, then read the fine print
4. Consult or collaborate with researchers who have used the database
Provided by John Ayanian, MD, MPP, and Ellen McCarthy, PhD, “Research with Large Databases”, Harvard School of Public Health

Summary: 10 Tips for Success in Secondary Data Analyses
5. Line up computing resources before data arrive
6. Allow time to receive data if not publicly available
7. Learn SAS, Stata, or other statistical software so you can analyze data yourself (or collaborate)
8. Assess data quality (e.g., outliers and missing data) with plots or frequency tables
9. Consult or collaborate with a statistician on your analysis plan, especially for complex surveys with sampling weights
10. Use clinical intuition to interpret results and consult experts as needed
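Tip 8 can be sketched in a few lines. The variables, allowed codes, and plausible range below are hypothetical: a frequency table surfaces unexpected categories and blanks, and a simple range check flags implausible numeric values before any modeling.

```python
from collections import Counter

# Hypothetical extracts from a freshly received dataset.
sex_values = ["M", "F", "F", "m", "", "F", None, "U"]
age_values = [34, 51, 250, -1, 67]


def frequency_table(values):
    """Count every distinct value, including blanks and None."""
    return Counter(values)


def unexpected_values(values, allowed):
    """Frequency table restricted to values outside the documented codes."""
    return {v: n for v, n in frequency_table(values).items() if v not in allowed}


def out_of_range(values, low, high):
    """Numeric values outside a plausible range (assumed cutoffs)."""
    return [v for v in values if not low <= v <= high]


if __name__ == "__main__":
    print("Sex frequencies:", dict(frequency_table(sex_values)))
    print("Unexpected sex codes:", unexpected_values(sex_values, {"M", "F"}))
    print("Implausible ages:", out_of_range(age_values, 0, 120))
```

The same checks are one-liners in SAS or Stata; the point is to run them before trusting any estimate.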