De-Identification Standards for CDISC Data Models

Slides:



Advertisements
Similar presentations
Fourth National HIPAA Summit April 26, 2002 Implementation of a HIPAA Data Management Strategy Safeguarding privacy interests while making data available.
Advertisements

Data linking – Project update 15 th May 2012 – Homecare & SDS event Atlantic Quay Ellen Lynch & Euan Patterson.
Exploring uncertainty in cost effectiveness analysis NICE International and HITAP copyright © 2013 Francis Ruiz NICE International (acknowledgements to:
Kendle Implementation of Clinical Data Acquisition Standards Harmonization Dr Elke Sennewald Kendle 9th German CDISC User Group Meeting Berlin, 28 September.
The ICH E5 Question and Answer Document Status and Content Robert T. O’Neill, Ph.D. Director, Office of Biostatistics, CDER, FDA Presented at the 4th Kitasato-Harvard.
Common Problems in Writing Statistical Plan of Clinical Trial Protocol Liying XU CCTER CUHK.
Safeguarding Animal Health 1 Proposed BSE Comprehensive Rule: A New Approach to BSE Rulemaking Dr. Christopher Robinson Assistant Director, NCIE BSE Comprehensive.
Accredited Member of the Association of Clinical Research Professionals, USA Tips on clinical trials Maha Al-Farhan B.Sc, M.Phil., M.B.A., D.I.C.
JumpStart the Regulatory Review: Applying the Right Tools at the Right Time to the Right Audience Lilliam Rosario, Ph.D. Director Office of Computational.
© 2011 Octagon Research Solutions, Inc. All Rights Reserved. The contents of this document are confidential and proprietary to Octagon Research Solutions,
Qualification Process for Standard Scripts Hosted in the Open Source Repository ABSTRACT Dante Di Tommaso 1 and Hanming Tu 2 Tehran 1 F. Hoffmann-La Roche.
Sharing Clinical Trial Reports and Data Access – Some practicalities Merete Jørgensen 26 May 2014 DSBS Hillerød.
Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be.
Second Annual Japan CDISC Group (JCG) Meeting 28 January 2004 Julie Evans Director, Technical Services.
Optimizing Data Standards Working Group Meeting Summary
Creating Open Data whilst maintaining confidentiality Philip Lowthian, Caroline Tudor Office for National Statistics 1.
White Papers to Fill the Gaps of Standardization of Tables, Figures, and Listings (Creating Standard Targets) Analyses and Displays for Vital Signs, ECG,
PhUSE Computational Science Working Groups Solutions Through Collaboration.
Company LOGO. Company LOGO PE, PMP, PgMP, PME, MCT, PRINCE2 Practitioner.
Introduction to the Australian Privacy Principles & the OAIC’s regulatory approach Privacy Awareness Week 2016.
How Good is Your SDTM Data? Perspectives from JumpStart Mary Doi, M.D., M.S. Office of Computational Science Office of Translational Sciences Center for.
TRANSBORDER DATA FLOWS INA MEIRING. THE PROTECTION OF PERSONAL INFORMATION ACT (“POPI”) > 'personal information' means information relating to an identifiable,
GCP (GOOD CLINICAL PRACTISE)
An agency of the European Union Guidance on the anonymisation of clinical reports for the purpose of publication in accordance with policy 0070 Industry.
PhUSE Data De-identification Standard for CDSIC SDTM IG 3.2, and EMA Policy 0070 Sherry Meeh 15Sept2016 Integrated Data Analytics and Reporting Janssen.
Event-Level Narrative
Paediatric Medicine: The Paediatric Investigation Plan
Observational Study Working Group
Risk-Based Monitoring Quantitative Metrics Toolkit
HEALTH INSURANCE PORTABILITY AND ACCOUNTABILITY ACT (HIPAA)
The HIPAA Privacy Rule: Implications for Medical Research
FDA’s IDE Decisions and Communications
Supplementary Table 1. PRISMA checklist
The NICE Citizens Council and the role of social value judgements
Pharmacovigilance in clinical trials
Clinical Study Results Publication
Patient Medical Records
ISO 9001:2015 Auditor / Registration Decision Lessons Learned
Why use CDISC for trials not submitted to regulators?
Critical Reading of Clinical Study Results
CPT and Disclosure: Connecting Critical Processes
Standards and Certification Training
Traceability between SDTM and ADaM converted analysis datasets
Regulatory perspective
PhUSE Key Performance Indicator Initiative
In-Depth Report from Optimizing Data Standards Working Group
BIOS:6610 – STATISTICAL METHODS IN CLINICAL TRIALS
Providing Access To Patient Level Clinical Trial Data
Managed Access to NIHR-funded Research Data
PhUSE Computational Science
PhUSE Data De-Identification Working Group
Snapshot of the Clinical Trials Enterprise as revealed by ClinicalTrials.gov Content downloaded: September 2012.
Data Transparency CSS 2018 Webinar 25.April 2018
The HIPAA Privacy Rule and Research
Streamlining IRB Procedures for Expanded Access
PSO Overview for (name of organization’s) PSES Workgroup
Common Problems in Writing Statistical Plan of Clinical Trial Protocol
PhUSE Computational Science Working Groups
CPT and Disclosure: Connecting Critical Processes
Approaches to Implementing in Your Organization
Nicolás J. I. Rodríguez & Arild Mellesdal
Visualizing Subject Level Data in Clinical Trial Data
MODULE B - PROCESS SUBMODULES B1. Organizational Structure
An Update on the CS Standard Analyses and Code Sharing Working Group Nancy Brucken, Principal Statistical Programmer, Syneos Health; Jared Slain, Associate.
QA Reviews Lecture # 6.
General Data Protection regulation (GDPR)
Information on projects
European Commission, DG Environment Air & Industrial Emissions Unit
PSO Overview for (name of organization’s) PSES Workgroup
Safety Analytics Workshop – Computational Science Symposium 2019
Presentation transcript:

De-Identification Standards for CDISC Data Models ICTMC Panel 09. May 2017 Jean-Marc Ferran PhUSE, Data Transparency Working Group Lead

PhUSE Data Transparency Initiative Background There are current efforts by regulators such as EMA to examine how to make Individual Patient Data (IPD) from clinical trials shared more widely Sponsors have started sharing IPD based on request proposals from researchers and… Data in different data models is available Each company seems to be defining their own high-level guidelines for data de-identification It is possible to request data from different companies within same research proposal

Data De-Identification Guidelines Processes Quasi/Direct Identifiers Assessment Rules Residual Risk

Goals Provide peer-reviewed de-identification standards for CDISC data models to the industry Facilitate the assessment of direct and quasi identifiers in CDISC datasets Ensure consistency in de-identified data shared across sponsors

Key Principles Direct & Quasi Identifiers are identified Direct identifiers: One or more direct identifiers can be used to uniquely identify an individual. E.g. Subject ID, Social Security Number, Telephone number, Exact address, etc. It is compulsory to remove or pseudonymize any direct identifier. Quasi identifiers: Quasi identifiers are background information that can be used in connection with other information to identify an individual with a high probability. E.g. Age at baseline, Race, Sex, Events, Specific Findings, etc. Primary & Alternative Rules for De-Identification are assigned Primary rule: Pro-active data de-identification Alternative rule: Reactive data de-identification and special cases Impact on data utility is evaluated qualitatively Implementation guidance for each rule is provided Rules address different scenarios rather than different implementation possibilities Comments are added to guide the reader To explain further the rational of a given assessment To warn users for exceptions or special considerations

Deliverable De-Identification Standards for CDISC SDTM 3.2 Dates Low frequency & rare events Recoding of unique identifiers Handling of free-text variables Extensible code lists Geographical location Sensitive data Quasi identifiers to keep PII of third-party +1300 variables

Residual Risk Assessment Methodology The evaluation of residual risk is a quantitative exercise and involves 4 general steps: Assessing the context of the data sharing. Setting an acceptable threshold for anonymizing the data. Measuring the actual probability of re-identification in the de-identified data. Adjusting data de-identification if necessary.

Working Group Current Activities Reviewing EMA CSR anonymization guidance and providing additional guidance and interpretations on Policy 0070 Developing de-identification standard for CDISC ADaM to be finalized in summer 2017

Policy 0070 Interpretations Evaluation of Risk in CSRs Data Utility Handling of Patient Narratives Review of Published CSRs

Data Utility in CSRs How researchers plan to use narratives? Do you extracted data from narratives to recreate your own IPD set? Do you thread through narratives and how? What outcomes are you looking at? What can you do with a CSR that you cannot do with publications or public registers?

Six Working Groups: Data Transparency & Data De-Identification Nonclinical Topics Optimizing the Use of Data Standards Evaluating the use of FHIR in Clinical Research  Interoperability & Technology  Standard Analyses & Code Sharing 

Questions? Jean-Marc Ferran Special Projects Director, PhUSE E: jean-marc.ferran@phuse.eu W: http://www.phuse.eu/Data_Transparency.aspx

Residual Risk Assessment Example Population: Study population Similar Clinical Studies population Geographical population Risk Metrics: Average risk for controlled disclosure Maximum risk for public disclosure

Current & Possible Future Process Anonymized CSR Selected TFLs

PhUSE De-Identification Working Group Vision “Develop data de-identification standards for CDISC data models” 20+ Participants from Pharma, CROs, Software and Academia Focus first on SDTM Data Privacy Rules & Rational Data Utility

About Us EU & US Annual Conference & Computational Science Symposium Not-for profit Founded in 2004 8,000 Members Worldwide Programmers, Statisticians, Data Managers, eClinical EU & US Annual Conference & Computational Science Symposium 20+ events in EU, US and Asia in 2016 Global Technical Working Groups PhUSE Wiki & PhUSE Code Repository

Key Areas and Rules Dates Low frequency & rare events Must be offset Date of Birth – Derive into Age at baseline and aggregate patients over 89 years old or derive into age folds (10-15, 15-20, 20-25 etc., 18-20, 20-22, 22-24, etc.) Date of Death - Offset Low frequency & rare events Methodology such as one described in IOM report is recommended to be used Variables and datasets at stake have a comment associated with such considerations Recoding of unique identifiers Subject IDs Investigator ID Site IDs Reference ID and Sponsor ID Handling of free-text variables and extensible code lists If critical to the analysis, and not recoded in the dataset. Review and only redact values with personal information. Otherwise remove. Extensible code lists variables are flagged as a warning as free-text may be added Geographical location Aggregation of country to continent unless country is critical to analysis. Site and Investigator names and IDs. must be deleted. Site/Investigator ID may be recoded in some cases. Sensitive data This is the responsibility of the sponsors to define how to handle such data Some quasi identifiers are advised to be kept as-is Important variable for analysis. E.g. Gender De-identification is already in place. E.g. relative dates such as study day PII of third-party Must be removed as they can provide geographical information Information such a evaluator type is however advised to be kept

MH dates and patients > 89 Core Offset Partial Dates DoB/DoD Adaptive Design Extension Studies Relative de-identification No further MH dates and patients > 89

Dates Offset Recommended Algorithm See Appendix 1

Issue with Partial Dates Ex: Delta applied of -14 days

Disclaimer De-Identification Standards for CDISC SDTM 3.2 The views in the deliverable represent the consensus of the Working Group The rules described do not guarantee an acceptable or very small residual risk of re-identification “It is generally recommended if certain conditions are met, that after the application of the rules described in this document, a second pass examining low frequency should be performed to confirm that there are no risks from low frequencies.”

Residual Risk Assessment PhUSE Approach & Criteria “Because identifying a specific set of variables that need to be modified as per the general Safe Harbor approach does not guarantee that the risk of re-identification is always sufficiently small, a second step of residual risk analysis is generally recommended if any of the conditions below are met. There may be residual re-identification risk under certain conditions, such as: the data is not being released through a secure portal with adequate privacy and security controls, the data recipients do not sign a data sharing agreement that has sufficient limitations on what the recipients can and cannot do, the trial is for a rare disease, there are extreme values in the data set, there are observable or knowable serious adverse events in the trial (e.g., deaths and suicides), the data set has extensive demographic and socioeconomic information about the participants, or the data set includes detailed medical histories of the participants. The sponsor can decide whether any of these conditions are met in making the determination about whether this additional residual risk assessment is required”

Publication