HIPAA and its Implications on Epidemiological Research Using Large Databases K. Arnold Chan, MD, ScD Harvard School of Public Health Channing Laboratory, Brigham & Women’s Hospital and Harvard Medical School
Brief outline of this presentation ● Using large linked automated data for public health research ● Data development processes to ensure HIPAA-compliance ● Examples ● Some thoughts
Two types of data for public health research ● Primary data – Prospectively collected – Well-designed data collection tool – Informed consent ● Secondary data – Data originally collected for other purposes – May be proprietary – Privacy and confidentiality (particularly important if no prior authorization) – Different data systems
Large linked healthcare databases ● Health insurance claims data – Medicaid – Medicare – Managed Care Organizations (MCO) ● Automated medical records ● Hospital / Clinic IT systems ● Availability of written records ● Need to contact patients / individuals ?
Public health research within MCOs ● Harvard Community Health Plan (subsequently became Harvard Pilgrim HealthCare) ● Kaiser Permanente (several states) ● Group Health Cooperative (Seattle area) ● Others ● HMO Research Network – 10+ MCOs across the U.S.
Public health research within MCOs ● Different types of MCOs – Group model – Staff model – Different relationship with hospitals – Implications on data access ● MCOs with research programs – Separate research departments – Full-time investigators and support staff
Data elements in the MCO data ● Demographic information ● Membership – Start date, termination date, benefit plan,... ● Office visits – Type of visit, diagnosis(es), special procedures ● Special examinations – Radiology, Laboratory examinations ● Hospitalizations ● Drug dispensings ● Linkable by a unique ID
HIPAA and Research with Databases ● Authorization from individual research subjects not feasible ● Individual authorization may be waived by Institutional Review Board or Privacy Board – Minimal Risk – Data reported in aggregate fashion ● No single-case report – “Minimum necessary” principle – De-identification
HIPAA and Research with Databases ● Single MCO studies – Investigators and research staff are MCO employees ● Multiple-MCO studies – May involve transferral of data across MCOs or to a Data Center ● Other types of studies not covered in this presentation – e.g. Generate a de-identified dataset for public or commercial use
HIPAA and data development ● Do not move individual level data unless absolutely necessary – Generate summary tables at each study site – Combine the tables for final report – Smalley et al. Contraindicated use of cisapride: the impact of an FDA regulatory action. JAMA 2000; 284:
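The site-level approach on this slide can be sketched as follows. This is a minimal illustration, not the procedure used in the cited study: the record fields (`age_group`, `sex`) and the function names are assumptions made for the example.

```python
from collections import Counter


def site_summary(records):
    """Count subjects per (age_group, sex) cell at a single study site.

    Only these counts, not the person-level rows, would leave the site.
    """
    return Counter((r["age_group"], r["sex"]) for r in records)


def combine_summaries(summaries):
    """Add the per-site tables cell by cell to build the pooled table."""
    total = Counter()
    for table in summaries:
        total.update(table)
    return total


# Hypothetical person-level data held at two sites (never pooled directly).
site_a = [
    {"age_group": "65-74", "sex": "M"},
    {"age_group": "65-74", "sex": "F"},
]
site_b = [
    {"age_group": "65-74", "sex": "M"},
    {"age_group": "75+", "sex": "F"},
]

pooled = combine_summaries([site_summary(site_a), site_summary(site_b)])
for cell, n in sorted(pooled.items()):
    print(cell, n)
```

Because only cell counts cross site boundaries, the final report can be assembled without any individual-level record leaving its MCO.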
HIPAA and data development ● Randomly generated Study ID to replace True ID – Crosswalk between the two stored at secured location – Destroy the crosswalk after successful linkage of data and quality check – Implications for storage and back-up
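A minimal sketch of the Study ID replacement, assuming records are plain dictionaries keyed by a hypothetical `member_id` field. The function names are illustrative, and the secured storage and eventual destruction of the crosswalk are procedural steps that the code only notes in comments.

```python
import secrets


def build_crosswalk(true_ids):
    """Map each True ID to a randomly generated Study ID.

    The Study ID is random, not derived from the True ID, so it cannot
    be reversed without the crosswalk itself.
    """
    crosswalk = {}
    used = set()
    for true_id in true_ids:
        if true_id in crosswalk:
            continue  # same person seen again: keep the Study ID already assigned
        study_id = secrets.token_hex(8)
        while study_id in used:  # regenerate on the (very unlikely) collision
            study_id = secrets.token_hex(8)
        used.add(study_id)
        crosswalk[true_id] = study_id
    return crosswalk


def apply_crosswalk(records, crosswalk, id_field="member_id"):
    """Return copies of the records with the True ID replaced by the Study ID."""
    out = []
    for r in records:
        new = {k: v for k, v in r.items() if k != id_field}
        new["study_id"] = crosswalk[r[id_field]]
        out.append(new)
    return out


# Hypothetical records; "member_id" stands in for the True ID.
records = [
    {"member_id": "A123", "event": "office visit"},
    {"member_id": "B456", "event": "drug dispensing"},
    {"member_id": "A123", "event": "hospitalization"},
]
crosswalk = build_crosswalk(r["member_id"] for r in records)
deidentified = apply_crosswalk(records, crosswalk)
# In practice the crosswalk is kept at a secured location and destroyed
# once linkage and quality checks are complete.
```

Any backup that captures the crosswalk also has to be covered by the destruction step, which is the storage and back-up implication noted on the slide.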
HIPAA and data development ● Roll-up / transform variables – Age --> Age groups – National Drug Code --> Drug or Group of drugs – ICD-9 diagnosis code --> Disease e.g. A man born on Dec 10, 1934 with diagnosis code xxx.yy received drug – y/o m with Heart Failure received Digoxin
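The roll-up step might look like the sketch below. The lookup tables, the five-year age bands, and the reference date are all assumptions for illustration. The NDC key is a placeholder, not a real code, and a real study would use complete, validated code lists.

```python
from datetime import date

# Hypothetical lookup tables with one entry each.
ICD9_TO_DISEASE = {"428.0": "Heart Failure"}  # ICD-9 428.0: congestive heart failure, unspecified
NDC_TO_DRUG = {"NDC-PLACEHOLDER-1": "Digoxin"}  # placeholder key, not a real NDC


def age_on(dob, ref_date):
    """Age in completed years on ref_date."""
    had_birthday = (ref_date.month, ref_date.day) >= (dob.month, dob.day)
    return ref_date.year - dob.year - (0 if had_birthday else 1)


def age_group(age, width=5):
    """Roll an exact age up into a band such as '65-69'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def roll_up(record, ref_date):
    """Replace date of birth and raw codes with coarser study variables."""
    return {
        "age_group": age_group(age_on(record["dob"], ref_date)),
        "sex": record["sex"],
        "disease": ICD9_TO_DISEASE[record["icd9"]],
        "drug": NDC_TO_DRUG[record["ndc"]],
    }


raw = {
    "dob": date(1934, 12, 10),
    "sex": "M",
    "icd9": "428.0",
    "ndc": "NDC-PLACEHOLDER-1",
}
# The reference date is an assumption; a study would use its own index date.
rolled = roll_up(raw, ref_date=date(2003, 6, 1))
print(rolled)
```

The rolled-up record keeps what the analysis needs (age band, disease, drug) while the exact date of birth and the raw codes stay behind at the site.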
HIPAA and data development ● Preserve temporal sequence of events but disguise the real dates ● e.g. Drug use during pregnancy study – 29 year-old received amoxicillin on Nov 25, 1999 and delivered a baby on Dec 10, 1999 --> – year-old mother delivered in 1999, baby exposed to amoxicillin at -16 days
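One way to keep the order and spacing of events while hiding calendar dates is to express every event as a day offset from an index date, here the delivery date. The sketch below uses hypothetical dates and event names rather than the ones on the slide, and the function name is an assumption.

```python
from datetime import date


def to_relative_timeline(delivery_date, events):
    """Replace calendar dates with day offsets from the delivery date.

    Only the delivery year and the signed offsets are kept, so the
    sequence of events survives but the real dates do not.
    """
    ordered = sorted(events, key=lambda e: e[1])
    return {
        "delivery_year": delivery_date.year,
        "events": [(name, (d - delivery_date).days) for name, d in ordered],
    }


# Hypothetical events, deliberately given out of order.
timeline = to_relative_timeline(
    delivery_date=date(2001, 3, 20),
    events=[
        ("postpartum visit", date(2001, 4, 3)),
        ("amoxicillin dispensing", date(2001, 3, 1)),
    ],
)
print(timeline)
```

A negative offset means the event happened before delivery, so exposure windows can still be defined without knowing the actual dates.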
HIPAA and data development ● Only extract information relevant to the study – e.g. A study of osteoporosis does not require information on subjects' mental health status ● Co-morbid conditions may be relevant – Use proxy measures to describe level of comorbidity ● Charlson's Index (based on concomitant diagnoses) ● Chronic Disease Score (based on co-medications)
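As a sketch of the proxy-measure idea, the code below collapses a subject's list of conditions into a single weighted score in the style of Charlson's Index. The weight table is a small illustrative subset and should be checked against the published index before any real use. The mapping from diagnosis codes to condition names is assumed to happen upstream.

```python
# Illustrative subset of Charlson-style condition weights (verify against
# the published index before use).
CHARLSON_WEIGHTS = {
    "myocardial infarction": 1,
    "congestive heart failure": 1,
    "diabetes": 1,
    "moderate or severe renal disease": 2,
    "metastatic solid tumor": 6,
}


def comorbidity_score(conditions):
    """Sum the weights of the distinct conditions a subject has.

    Conditions that are not in the weight table contribute nothing,
    and a condition recorded more than once is counted once.
    """
    return sum(CHARLSON_WEIGHTS.get(c, 0) for c in set(conditions))


# Hypothetical subject: the analytic file carries only the score,
# not the underlying list of diagnoses.
conditions = ["diabetes", "congestive heart failure", "diabetes"]
analytic_row = {"study_id": "S001", "comorbidity_score": comorbidity_score(conditions)}
print(analytic_row)
```

The analytic dataset then describes how sick a subject is without listing which conditions they have.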
HIPAA and data development ● Geocoding – Describe socioeconomic status of study subjects based on census tract data – Send out (Study ID, address) to a geocoding firm – (Study ID, X1, X2, X3) returned ● X1 : education level ● X2 : income level ● X3 : race/ethnicity information
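The geocoding exchange can be sketched as two separate data flows. Everything here is hypothetical: the field names, the `fake_geocoder` stand-in for the external firm, and the placeholder values it returns.

```python
def outbound_rows(members, crosswalk):
    """Rows sent to the geocoding firm: (Study ID, address) and nothing else."""
    return [(crosswalk[m["member_id"]], m["address"]) for m in members]


def fake_geocoder(rows):
    """Stand-in for the external firm; returns (Study ID, X1, X2, X3).

    The returned values are placeholders, not real census tract data.
    """
    return [(study_id, "X1-value", "X2-value", "X3-value") for study_id, _addr in rows]


def attach_census_variables(analytic_rows, returned):
    """Join the returned tract-level variables onto the analytic file by Study ID."""
    by_id = {sid: (x1, x2, x3) for sid, x1, x2, x3 in returned}
    merged = []
    for row in analytic_rows:
        x1, x2, x3 = by_id[row["study_id"]]
        merged.append({**row, "education": x1, "income": x2, "race_ethnicity": x3})
    return merged


# Hypothetical inputs held at the study site.
members = [{"member_id": "A123", "address": "1 Example St"}]
crosswalk = {"A123": "S001"}
analytic_rows = [{"study_id": "S001", "age_group": "65-69"}]

sent = outbound_rows(members, crosswalk)
merged = attach_census_variables(analytic_rows, fake_geocoder(sent))
print(merged)
```

The firm sees addresses but no clinical data, and the analytic file gains census variables but never contains an address.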
An example Finkelstein et al. Decreasing Antibiotic Use Among US Children: The Impact of Changing Diagnosis Patterns. Pediatrics 2003; 112: ● Data elements involved – Date of birth, gender – Membership – Drug dispensings – Diagnoses in close proximity to antibiotics dispensings ● Data from nine MCOs
Finkelstein et al. Pediatric antibiotics use study ● Data development at each MCO – Extract antibiotics use information – Extract diagnosis of interest (infections) – Use date of birth, gender, and membership data to calculate person-time of interest ● Refined, aggregate data forwarded to the Data Center – Rate of antibiotics use = # of antibiotics use / 1,000 person-years for each age-gender group
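The rate formula on this slide can be worked through in a few lines. This is a sketch under assumptions, not the study's actual code: person-time is supplied as person-days per row, a year is taken as 365.25 days, and the age-group labels and counts are made up.

```python
from collections import defaultdict

DAYS_PER_YEAR = 365.25  # assumed conversion from person-days to person-years


def rates_per_1000_py(rows):
    """Antibiotic dispensings per 1,000 person-years for each age-gender group.

    Each row is (age_group, gender, n_dispensings, person_days).
    """
    events = defaultdict(int)
    days = defaultdict(float)
    for age_group, gender, n_dispensings, person_days in rows:
        events[(age_group, gender)] += n_dispensings
        days[(age_group, gender)] += person_days
    return {
        cell: 1000 * events[cell] / (days[cell] / DAYS_PER_YEAR)
        for cell in events
    }


# Hypothetical site-level rows.
rows = [
    ("0-4", "F", 3, 730.5),
    ("0-4", "F", 1, 730.5),
    ("5-9", "M", 1, 730.5),
]
rates = rates_per_1000_py(rows)
for cell, rate in sorted(rates.items()):
    print(cell, round(rate, 1))
```

In the first group, 4 dispensings over 4 person-years give 1,000 per 1,000 person-years. In the second, 1 dispensing over 2 person-years gives 500.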
HIPAA and data development ● Individual identification is needed for certain types of research – Obtain medical records – Contact patient to conduct interview and/or request specimen – Linkage with external data ● Cancer registry ● National Death Index
HIPAA and data development ● The process – Data extraction, transformation, reduction, and de- identification carried out at each MCO – Governed by State laws and local HIPAA-compliant Standard Operating Procedures – Principle of Limited Dataset / Minimum necessary ● The goal – Highly processed and de-identified data available for concatenation across study sites and complex analyses
k-anonymity and large datasets ● The goal – A de-identified dataset at a certain level of individual anonymity A 43 year-old man with hypertension, diabetes, and anxiety, taking atenolol, rosiglitazone, and lorazepam vs. A man taking a beta-blocker and a thiazolidinedione
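A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below measures k before and after generalizing records in the spirit of the slide's example. The three records, the choice of quasi-identifiers, and the rule of keeping only study-relevant drug classes are assumptions for illustration.

```python
from collections import Counter

# Drug-to-class lookup limited to the classes the hypothetical study needs.
DRUG_CLASS = {
    "atenolol": "beta-blocker",
    "metoprolol": "beta-blocker",
    "rosiglitazone": "thiazolidinedione",
    "pioglitazone": "thiazolidinedione",
}


def k_of(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize(record):
    """Drop exact age and keep only the study-relevant drug classes."""
    classes = {DRUG_CLASS[d] for d in record["drugs"] if d in DRUG_CLASS}
    return {"sex": record["sex"], "drug_classes": tuple(sorted(classes))}


# Hypothetical detailed records.
detailed = [
    {"age": 43, "sex": "M", "drugs": ("atenolol", "rosiglitazone", "lorazepam")},
    {"age": 51, "sex": "M", "drugs": ("metoprolol", "pioglitazone")},
    {"age": 47, "sex": "M", "drugs": ("atenolol", "pioglitazone")},
]
generalized = [generalize(r) for r in detailed]

k_detailed = k_of(detailed, ("age", "sex", "drugs"))
k_generalized = k_of(generalized, ("sex", "drug_classes"))
print(k_detailed, k_generalized)
```

In the detailed form each man is unique (k = 1). After generalization all three share one description (k = 3), at the cost of analytic detail.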
HIPAA, Data Storage and Access ● Implications on Data Backup Plans – Data need to be destroyed after the report is published ● Data only used to support pre-defined analyses ● Ancillary analyses are possible after IRB review and approval
Epidemiology studies using large databases ● In the old days... – Give me all the data, do what I say... – What if the investigator / reviewer wants to do THIS analysis ? – Use existing datasets to test new hypotheses ● Good research practice – Define necessary data elements according to research protocol – Pre-defined analytic plan
Epidemiology studies using large databases ● Keys to protection of human subjects – Competent, responsible investigators and staff – IRB review and oversight – Data development guidelines ● e.g. Good Epidemiology Practice – Information technology ● Some reasonable rules/guidelines are better than no guideline