HIPAA and its Implications on Epidemiological Research Using Large Databases K. Arnold Chan, MD, ScD Harvard School of Public Health Channing Laboratory,

HIPAA and its Implications on Epidemiological Research Using Large Databases K. Arnold Chan, MD, ScD Harvard School of Public Health Channing Laboratory, Brigham & Women’s Hospital and Harvard Medical School

Brief outline of this presentation ● Using large linked automated data for public health research ● Data development processes to ensure HIPAA-compliance ● Examples ● Some thoughts

Two types of data for public health research ● Primary data – Prospectively collected – Well-designed data collection tool – Informed consent ● Secondary data – Data originally collected for other purposes – May be proprietary – Privacy and confidentiality (particularly important if no prior authorization) – Different data systems

Large linked healthcare databases ● Health insurance claims data – Medicaid – Medicare – Managed Care Organizations (MCO) ● Automated medical records ● Hospital / Clinic IT systems ● Availability of written records ● Need to contact patients / individuals ?

Public health research within MCOs ● Harvard Community Health Plan (subsequently became Harvard Pilgrim HealthCare) ● Kaiser Permanente (several states) ● Group Health Cooperative (Seattle area) ● Others ● HMO Research Network – 10+ MCOs across the U.S.

Public health research within MCOs ● Different types of MCOs – Group model – Staff model – Different relationship with hospitals – Implications on data access ● MCOs with research programs – Separate research departments – Full-time investigators and support staff

Data elements in the MCO data ● Demographic information ● Membership – Start date, termination date, benefit plan,... ● Office visits – Type of visit, diagnosis(es), special procedures ● Special examinations – Radiology, Laboratory examinations ● Hospitalizations ● Drug dispensings ● Linkable by a unique ID

HIPAA and Research with Databases ● Authorization from individual research subjects not feasible ● Individual authorization may be waived by Institutional Review Board or Privacy Board – Minimal Risk – Data reported in aggregate fashion ● No single-case report – “Minimum necessary” principle – De-identification

HIPAA and Research with Databases ● Single MCO studies – Investigators and research staff are MCO employees ● Multiple-MCO studies – May involve transferral of data across MCOs or to a Data Center ● Other types of studies not covered in this presentation – e.g. Generate a de-identified dataset for public or commercial use

HIPAA and data development ● Do not move individual level data unless absolutely necessary – Generate summary tables at each study site – Combine the tables for final report – Smalley et al. Contraindicated use of cisapride: the impact of an FDA regulatory action. JAMA 2000; 284: 3036-9.
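The "summary tables at each site, then combine" approach can be illustrated with a minimal sketch. The strata, counts, and the function name `combine_site_tables` are hypothetical and not taken from the cited study; the point is only that the Data Center sees stratum-level counts, never individual rows.

```python
from collections import Counter

def combine_site_tables(site_tables):
    """Sum per-stratum counts from several sites into one table."""
    combined = Counter()
    for table in site_tables:
        combined.update(table)  # adds counts stratum by stratum
    return dict(combined)

# Hypothetical summary tables built locally at each site:
# (age group, sex) -> number of members with the exposure of interest
site_a = {("65-69", "M"): 12, ("65-69", "F"): 9}
site_b = {("65-69", "M"): 7, ("70-74", "F"): 4}

combined = combine_site_tables([site_a, site_b])
```

Only `site_a` and `site_b` would leave their respective sites in this sketch; the individual-level records used to build them stay behind.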

HIPAA and data development ● Randomly generated Study ID to replace True ID – Crosswalk between the two stored at secured location – Destroy the crosswalk after successful linkage of data and quality check – Implications for storage and back-up
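One possible way to replace True IDs with random Study IDs is sketched below. The field name `member_id`, the ID length, and both function names are assumptions made for illustration; secure storage and later destruction of the crosswalk are procedural steps that the code does not address.

```python
import secrets

def build_crosswalk(true_ids):
    """Map each True ID to a unique, randomly generated Study ID."""
    crosswalk = {}
    used = set()
    for true_id in true_ids:
        if true_id in crosswalk:
            continue  # same person seen again, keep the same Study ID
        study_id = secrets.token_hex(8)
        while study_id in used:  # extremely unlikely, but guarantee uniqueness
            study_id = secrets.token_hex(8)
        used.add(study_id)
        crosswalk[true_id] = study_id
    return crosswalk

def apply_crosswalk(records, crosswalk, id_field="member_id"):
    """Return copies of the records with the True ID replaced by the Study ID."""
    out = []
    for rec in records:
        new_rec = {k: v for k, v in rec.items() if k != id_field}
        new_rec["study_id"] = crosswalk[rec[id_field]]
        out.append(new_rec)
    return out

# Hypothetical records; "member_id" stands in for the True ID.
records = [
    {"member_id": "A1", "drug": "digoxin"},
    {"member_id": "A1", "drug": "atenolol"},
    {"member_id": "B2", "drug": "amoxicillin"},
]
crosswalk = build_crosswalk(rec["member_id"] for rec in records)
deidentified = apply_crosswalk(records, crosswalk)
```

Because the Study ID is random rather than derived from the True ID, re-linkage is possible only while the crosswalk exists, which is why its storage and back-up matter.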

HIPAA and data development ● Roll-up / transform variables – Age --> Age groups – National Drug Code --> Drug or Group of drugs – ICD-9 diagnosis code --> Disease e.g. A man born on Dec 10, 1934 with diagnosis code xxx.yy received drug 55555-333-22 --> 65-70 y/o man with Heart Failure received Digoxin
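The roll-up step can be sketched roughly as follows. The lookup tables reuse the placeholder codes from the slide (`xxx.yy`, `55555-333-22`) and are not real ICD-9 or NDC mappings; the reference date and the five-year bands (65-69 rather than the slide's 65-70) are assumptions.

```python
from datetime import date

# Placeholder lookups using the slide's dummy codes, not real code sets.
DIAGNOSIS_TO_DISEASE = {"xxx.yy": "Heart Failure"}
NDC_TO_DRUG = {"55555-333-22": "Digoxin"}

def age_on(birth_date, ref_date):
    """Age in completed years on ref_date."""
    years = ref_date.year - birth_date.year
    if (ref_date.month, ref_date.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday not yet reached in the reference year
    return years

def age_group(age, width=5):
    """Collapse an exact age into a band such as '65-69'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def roll_up(record, ref_date):
    """Replace exact birth date and raw codes with coarser categories."""
    return {
        "age_group": age_group(age_on(record["birth_date"], ref_date)),
        "sex": record["sex"],
        "disease": DIAGNOSIS_TO_DISEASE[record["diagnosis_code"]],
        "drug": NDC_TO_DRUG[record["ndc"]],
    }

raw = {
    "birth_date": date(1934, 12, 10),
    "sex": "M",
    "diagnosis_code": "xxx.yy",
    "ndc": "55555-333-22",
}
rolled = roll_up(raw, ref_date=date(2003, 6, 30))  # hypothetical reference date
```

The rolled-up record carries no birth date or raw codes, only the categories needed for analysis.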

HIPAA and data development ● Preserve temporal sequence of events but disguise the real dates ● e.g. Drug use during pregnancy study – 29 year-old received 55555-333-22 on Nov 25, 1999 and delivered a baby on Dec 10, 1999 --> – 26-30 year-old mother delivered in 1999, baby exposed to amoxicillin at -15 days
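A minimal sketch of the date-disguising idea: keep only the year of an index event (here, delivery) and express every other event as a day offset from it. The function name and the output layout are assumptions; the dates are the ones from the slide.

```python
from datetime import date

def disguise_dates(events, index_date):
    """Keep the index year and express each event as days relative to the index date.

    Negative offsets mean the event happened before the index date.
    """
    return {
        "index_year": index_date.year,
        "events": [(name, (event_date - index_date).days) for name, event_date in events],
    }

delivery = date(1999, 12, 10)
events = [("amoxicillin dispensing", date(1999, 11, 25))]
disguised = disguise_dates(events, delivery)
```

The order of events and the gaps between them are preserved, while the calendar dates themselves never leave the site.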

HIPAA and data development ● Only extract information relevant to the study – e.g. A study of osteoporosis does not require information on subjects' mental health status ● Co-morbid conditions may be relevant – Use proxy measures to describe level of comorbidity ● Charlson's Index (based on concomitant diagnoses) ● Chronic Disease Score (based on co-medications)

HIPAA and data development ● Geocoding – Describe socioeconomic status of study subjects based on census tract data – Send out (Study ID, address) to a geocoding firm – (Study ID, X1, X2, X3) returned ● X1 : education level ● X2 : income level ● X3 : race/ethnicity information

An example Finkelstein et al. Decreasing Antibiotic Use Among US Children: The Impact of Changing Diagnosis Patterns. Pediatrics 2003; 112: 620-7. ● Data elements involved – Date of birth, gender – Membership – Drug dispensings – Diagnoses in close proximity to antibiotics dispensings ● Data from nine MCOs

Finkelstein et al. Pediatric antibiotics use study ● Data development at each MCO – Extract antibiotics use information – Extract diagnosis of interest (infections) – Use date of birth, gender, and membership data to calculate person-time of interest ● Refined, aggregate data forwarded to the Data Center – Rate of antibiotics use = # of antibiotics use / 1,000 person-years for each age-gender group
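The rate formula on this slide can be worked through with a small sketch. The strata and counts below are invented for illustration and are not results from the study; converting person-days to person-years with 365.25 days per year is also an assumption.

```python
def rate_per_1000_person_years(n_events, person_days):
    """Events per 1,000 person-years, with person-time supplied in days."""
    if person_days <= 0:
        raise ValueError("person_days must be positive")
    person_years = person_days / 365.25
    return 1000 * n_events / person_years

# Hypothetical aggregate data for one site:
# (age group, gender) -> (antibiotic dispensings, person-days of membership)
site_table = {
    ("0-2", "F"): (120, 146100),
    ("3-5", "M"): (90, 219150),
}

rates = {
    stratum: rate_per_1000_person_years(n, days)
    for stratum, (n, days) in site_table.items()
}
```

Only the numerators and denominators per age-gender group are needed at the Data Center, which is what makes the aggregate hand-off possible.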

HIPAA and data development ● Individual identification is needed for certain types of research – Obtain medical records – Contact patient to conduct interview and/or request specimen – Linkage with external data ● Cancer registry ● National Death Index

HIPAA and data development ● The process – Data extraction, transformation, reduction, and de-identification carried out at each MCO – Governed by State laws and local HIPAA-compliant Standard Operating Procedures – Principle of Limited Dataset / Minimum necessary ● The goal – Highly processed and de-identified data available for concatenation across study sites and complex analyses

k-anonymity and large datasets ● The goal – A de-identified dataset at a certain level of individual anonymity A 43 year-old man with hypertension, diabetes, and anxiety, taking atenolol, rosiglitazone, and lorazepam vs. A man 40-45 taking a beta-blocker and a thiazolidinedione
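As a rough illustration of k-anonymity: a dataset is k-anonymous with respect to a set of quasi-identifiers if every combination of their values is shared by at least k records. The sketch below computes the smallest such group size; the records, field names, and choice of quasi-identifiers are hypothetical.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(rec[field] for field in quasi_identifiers) for rec in records
    )
    return min(groups.values()) if groups else 0

# Detailed records: each combination is unique, so k = 1.
detailed = [
    {"age": 43, "sex": "M", "drug": "atenolol"},
    {"age": 41, "sex": "M", "drug": "metoprolol"},
]

# The same people after roll-up to age band and drug class: k = 2.
rolled_up = [
    {"age_group": "40-44", "sex": "M", "drug_class": "beta-blocker"},
    {"age_group": "40-44", "sex": "M", "drug_class": "beta-blocker"},
]

k_detailed = k_anonymity(detailed, ["age", "sex", "drug"])
k_rolled_up = k_anonymity(rolled_up, ["age_group", "sex", "drug_class"])
```

Coarser categories raise k at the cost of analytic detail, which is the trade-off the two descriptions on the slide contrast.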

HIPAA, Data Storage and Access ● Implications on Data Backup Plans – Data need to be destroyed after the report is published ● Data only used to support pre-defined analyses ● Ancillary analyses are possible after IRB review and approval

Epidemiology studies using large databases ● In the old days... – Give me all the data, do what I say... – What if the investigator / reviewer wants to do THIS analysis? – Use existing datasets to test new hypotheses ● Good research practice – Define necessary data elements according to research protocol – Pre-defined analytic plan

Epidemiology studies using large databases ● Keys to protection of human subjects – Competent, responsible investigators and staff – IRB review and oversight – Data development guidelines ● e.g. Good Epidemiology Practice – Information technology ● Some reasonable rules/guidelines are better than no guideline
