Audrey R. Chapman, Ph.D. October 22, 2014
Protecting the privacy and confidentiality of research participants is a foundational ethical obligation and a legal requirement.
Large-scale genomic databases and biobanks, which are an important contribution to research, pose risks to privacy and confidentiality.
Many experts currently believe it is not possible to fully assure the privacy and confidentiality of genetic data in databases.
Some proposed approaches may help balance data sharing with reducing the vulnerability of genetic data.

The ethical requirement to protect the privacy and confidentiality of research data derives from respect for persons and acknowledgement of their autonomy.
Autonomy translates into the obligation to obtain consent for medical interventions, participation in research, and use of the data.
Surveys have shown that it is important to individuals to control use of their genetic data.
Research on addiction is especially sensitive because alcohol dependence and other addictions are potentially stigmatizing.
Protecting privacy and assuring consent for use enhance willingness to participate in genetic research.

Alcohol and substance dependence are very highly stigmatized, even when compared with mental disorders.
Some experts propose that a genetic explanation of addiction would decrease stigmatization of addicted people.
Studies are limited, but the preponderance of research suggests geneticization is more likely to increase than to decrease stigmatization:
  ◦ It increases genetic essentialism and perceptions that the problem is serious and intractable.
  ◦ But it is also likely to reduce blame.

The Privacy Rule under the Health Insurance Portability and Accountability Act (HIPAA) (1996) establishes national standards to protect individuals’ medical records and other personal health information.
The Federal Policy for the Protection of Human Subjects, or “Common Rule” (1991), codified at 45 CFR part 46, subparts A through E, protects the privacy of individually identifiable health information while trying to ensure that researchers continue to have access to the medical information necessary to conduct vital research.
The requirements of the two are inconsistent: the Common Rule is considered less stringent.

De-identified information and specimens are not protected under either the Common Rule or the Privacy Rule because their use is no longer considered human subjects research.
De-identification involves removal of the following identifiers:
  ◦ Names
  ◦ Geographical subdivisions smaller than a state, except for the first three digits of a ZIP code
  ◦ All elements of dates (except year) relating to birth date, admission date, and discharge date
  ◦ Telephone numbers
  ◦ Fax numbers
  ◦ Addresses
  ◦ Social security numbers
  ◦ Medical record numbers
  ◦ Health plan beneficiary numbers
  ◦ Account numbers, certificate or license numbers
  ◦ Vehicle identifiers, including license-plate numbers
  ◦ Device identifiers and serial numbers
  ◦ URLs
  ◦ Internet protocol address numbers
  ◦ Biometric identifiers
  ◦ Photographic and comparable images
  ◦ Any other unique identifying number, characteristic, or code

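The identifier list above can be read as a filter over a record. The following is a minimal sketch, not a compliance tool: the field names (`name`, `zip_code`, `birth_date`, and so on) are hypothetical, and a real schema would need its own mapping onto the listed categories.

```python
from datetime import date

# Hypothetical field names standing in for the identifier categories listed above.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "telephone", "fax", "address", "ssn", "medical_record_number",
    "health_plan_number", "account_number", "license_number", "vehicle_id",
    "device_id", "url", "ip_address", "biometric_id", "photo",
}
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date"}


def deidentify(record):
    """Return a copy of `record` with listed identifiers removed or coarsened."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop the field entirely
        if key in DATE_FIELDS:
            # keep only the year, as the list permits
            out[key.replace("_date", "_year")] = value.year
        elif key == "zip_code":
            # keep only the first three digits of the ZIP code
            out["zip3"] = str(value)[:3]
        else:
            out[key] = value
    return out


# Toy record with made-up values.
record = {
    "name": "Jane Doe",
    "telephone": "555-0100",
    "zip_code": "21201",
    "birth_date": date(1950, 3, 14),
    "diagnosis": "alcohol dependence",
}
cleaned = deidentify(record)
```

In this sketch any field that is not on the list, such as `diagnosis`, passes through unchanged.
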
The assumption that stripping away demographic information renders a research record de-identified does not apply to genetic data, which are potentially re-identifiable.
The regulatory dichotomy of all-or-nothing coverage fails to recognize the varying degrees of identifiability of health information.
In addition, anonymizing data precludes recontacting research subjects to share research findings.

Investigators commonly share genomic data and biological samples, raising issues about conformity with informed consents.
As of 2011 there were more than 500 million stored specimens in hundreds of biobanks in the U.S., and the number was estimated to grow by at least 20 million per year.
There is no single rule on whether informed consent is legally required under the Common Rule, or authorization under HIPAA, for integration of data into databases.
The NIH Genomic Data Sharing Policy (2014), which applies to all NIH-funded research that generates large-scale human genomic data, requires investigators to submit the data, as well as relevant associated data (e.g. phenotype and exposure data), to an NIH-designated repository in a timely manner.

Researchers often need to correlate genotype with phenotype, which requires them to match genetic samples with health records.
The Privacy Rule under HIPAA requires an individual to sign an authorization for use and disclosure of protected health information in research.
The Privacy Rule does not give a definitive answer on whether a prospective authorization is valid if it contemplates future addition of individual health information, particularly sensitive information such as information about substance abuse.

No consent: many medical institutions store patients’ information from past treatment without explicit consent to use it for research.
Presumed consent/opt out (Icelandic database).
General consent: many studies ask individuals to give consent for indefinite uses of their samples and data.
  ◦ Advantage: promotes sharing of samples and data and helps to advance scientific research.
  ◦ Disadvantage: does not adequately protect participants’ autonomy, because some participants may not want their samples or data used for certain purposes.

Specific consent for each use of their samples or data.
  ◦ Advantage: maximizes subjects’ autonomy.
  ◦ Disadvantages: burdensome on investigators and participants and impedes sharing, since participants would need to be re-contacted each time to approve a new use of their samples or data.
Tiered consent: a menu of options on consent forms.
  ◦ Participants can consent to general use of their samples and data, allow only certain types of uses (such as research related to their disease), or exclude specific categories (for example, commercial research).
  ◦ Permits full informed consent, promotes sharing, and minimizes the burden on investigators and participants.

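The tiered consent menu described above can be sketched as a small data structure with a permission check. This is an illustrative assumption about how such a menu might be encoded; the class name, field names, and category labels are hypothetical and do not come from any actual consent form or repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredConsent:
    """Hypothetical encoding of one participant's choices on a tiered consent form."""
    general_use: bool = False                    # consented to general research use
    allowed_categories: frozenset = frozenset()  # e.g. research on their own disease
    excluded_categories: frozenset = frozenset() # e.g. commercial research


def permits(consent, category):
    """Return True if the proposed research category falls within the consent."""
    if category in consent.excluded_categories:
        return False  # explicit exclusions always win
    if consent.general_use:
        return True
    return category in consent.allowed_categories


# A participant who allows general use but excludes commercial research.
broad = TieredConsent(general_use=True, excluded_categories=frozenset({"commercial"}))
# A participant who allows only research related to their own disease.
narrow = TieredConsent(allowed_categories=frozenset({"alcohol_dependence"}))
```

Here exclusions take precedence over general consent, which is one possible design choice; a real form could define the precedence differently.
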
Misuse of stored genetic data for purposes other than those in the informed consent is difficult to monitor, especially when data sets are shared between researchers.
Breaches have both individual and group dimensions.
Havasupai Indian tribe (a 650-member tribe living in a remote part of the Grand Canyon):
  ◦ Samples were given to researchers at Arizona State University to investigate the cause of the tribe’s high incidence of diabetes.
  ◦ Researchers had individuals sign general consents for collection of blood samples and health histories.
  ◦ Allegedly without consent, researchers used the samples for other genetic research and published articles about tribe members’ rates of schizophrenia and inbreeding, as well as their genetic relatedness to other native peoples (which conflicted with the tribe’s creation stories).
  ◦ The tribe sued, and the lawsuit was settled in 2010 for $700,000.

Various approaches to breaching privacy:
  ◦ Identity tracing: the hacker does not know who the owner of a DNA sample is but figures it out using DNA identifiers.
  ◦ Attribute disclosure: the hacker knows the donor’s identity but uncovers additional sensitive information about him or her using data submitted to public DNA databases, including genomic areas that were masked (for example, inferring the masked APOE gene status in Jim Watson’s posted genome).

De-identified health records can potentially be re-identified by using computerized network databases that link demographic quasi-identifiers to individuals, including voter registration records and other public record search engines.
  ◦ An initial study reported successful tracing of the medical record of the Governor of Massachusetts using demographic identifiers in hospital discharge information.
  ◦ Another study reported identification of 30% of Personal Genome Project (PGP) participants by demographic profiling using the ZIP code and exact birth dates found in PGP profiles.

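The linkage described above amounts to a join on quasi-identifiers. The sketch below is a simplified illustration using made-up toy data, not the method of either study; the field names (`zip`, `birth_date`, `sex`, `record_id`, `name`) are assumptions.

```python
from collections import defaultdict


def link_records(deidentified, public_records, keys=("zip", "birth_date", "sex")):
    """Match de-identified records to named public records on quasi-identifiers.

    A record counts as re-identified only when exactly one named person in the
    public list shares its combination of quasi-identifiers.
    """
    index = defaultdict(list)
    for person in public_records:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = {}
    for rec in deidentified:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:
            matches[rec["record_id"]] = candidates[0]
    return matches


# Toy data, entirely invented for illustration.
voter_list = [
    {"name": "A. Smith", "zip": "02138", "birth_date": "1945-07-31", "sex": "M"},
    {"name": "B. Jones", "zip": "02139", "birth_date": "1960-01-02", "sex": "F"},
    {"name": "C. Brown", "zip": "02139", "birth_date": "1960-01-02", "sex": "F"},
]
discharge_data = [
    {"record_id": 1, "zip": "02138", "birth_date": "1945-07-31", "sex": "M"},
    {"record_id": 2, "zip": "02139", "birth_date": "1960-01-02", "sex": "F"},
]
reidentified = link_records(discharge_data, voter_list)
```

Record 1 has a unique combination and is linked to a name; record 2 is shared by two people in the public list, so it stays ambiguous in this sketch.
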
A team of Whitehead Institute researchers led by Yaniv Erlich, using a computer, an internet connection, and publicly accessible online resources, was able to identify nearly 50 individuals who were participants in genomic studies.
Methodology used:
  ◦ Analyzed short tandem repeats on the Y chromosomes (Y-STRs) of men whose genetic material was collected by the Center for the Study of Human Polymorphisms and whose genomes were sequenced and made publicly available as part of the 1000 Genomes Project.
  ◦ Genealogists and genetic genealogy companies have established publicly accessible databases that house Y-STR data by surname.
  ◦ Researchers submitted the Y-STRs to these databases to identify surnames.
  ◦ With surnames in hand, the team queried other information sources, including Internet record search engines, obituaries, genealogical websites, and public demographic data from the National Institute of General Medical Sciences.

An empirical analysis estimated that 10-14% of US white male individuals from the middle and upper classes are subject to surname inference, on the basis of scanning the two largest Y-chromosome genealogical websites using a built-in search engine.
  ◦ Limitations: false identification of surnames, spelling variants of surnames, and common use of some surnames.

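The surname-inference step can be pictured as matching a Y-STR profile against a surname-indexed database. The sketch below is a deliberately simplified illustration, not the researchers' actual procedure: the function, the mismatch threshold, and the repeat counts are assumptions, and the marker labels are used only as example keys.

```python
def infer_surname(query, database, max_mismatches=0):
    """Return a surname if exactly one surname matches the Y-STR profile.

    `query` maps marker name -> repeat count; each database entry carries a
    surname and its own marker map. Only markers present in both are compared.
    """
    hits = set()
    for entry in database:
        shared = set(query) & set(entry["markers"])
        if not shared:
            continue  # nothing to compare
        mismatches = sum(query[m] != entry["markers"][m] for m in shared)
        if mismatches <= max_mismatches:
            hits.add(entry["surname"])
    # An ambiguous result (several surnames) is treated as no inference.
    return next(iter(hits)) if len(hits) == 1 else None


# Toy database with invented repeat counts and surnames.
genealogy_db = [
    {"surname": "Alpha", "markers": {"DYS19": 14, "DYS390": 24, "DYS391": 11}},
    {"surname": "Beta", "markers": {"DYS19": 15, "DYS390": 23, "DYS391": 10}},
    {"surname": "Gamma", "markers": {"DYS19": 15, "DYS390": 23, "DYS391": 10}},
]
unique_hit = infer_surname({"DYS19": 14, "DYS390": 24, "DYS391": 11}, genealogy_db)
ambiguous = infer_surname({"DYS19": 15, "DYS390": 23, "DYS391": 10}, genealogy_db)
```

Returning no result when several surnames match is one way to reflect the limitations listed above, since shared profiles and spelling variants can otherwise produce false identifications.
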
Request participants’ consent to open release of their genomic data, with the knowledge that they can potentially be identified.
Access control: move away from open access and take a more restrictive approach.
Cryptographic solutions.
  ◦ Rely on new advances that go beyond the traditional use of encryption for sensitive information by adding mathematical properties.
Certificates of confidentiality.

For studies using data from specimens collected before the effective date of the GDS Policy, an assessment by an IRB, privacy board, or equivalent body is needed that data submission to an NIH repository is not inconsistent with the informed consent provided by the research participant.
For studies initiated after the effective date of the GDS Policy, investigators must obtain participants’ consent for their genomic and phenotypic data to be used for future research purposes and to be shared broadly, even if cell lines or clinical specimens are de-identified.
The informed consent needs to specify whether data will be shared through an unrestricted or a controlled-access repository.

The submitting institution will have to provide an Institutional Certification prior to receipt of an award:
  ◦ Noting any data use limitations in the data sharing plan.
  ◦ Stating that data submission and subsequent data sharing for research purposes are consistent with the informed consent of the study participants from whom the data were obtained.
  ◦ Stating that an IRB, privacy board, and/or equivalent body has reviewed the investigator’s proposal for data submission and assures that it is consistent with 45 CFR Part 46.

Requests for controlled-access data are reviewed by NIH Data Access Committees.
  ◦ Decisions are based on conformity of the proposed research to the data use limitations established by the submitting institution.
Investigators approved to download controlled-access data from NIH-designated repositories, and their institutions, are required to cosign a Data Use Certification specifying that they will:
  ◦ Only use the data for approved research;
  ◦ Not attempt to identify individual participants from whom the data were obtained;
  ◦ Not share any of the data with individuals other than those listed in the data access request;