Linked2Safety Project (FP7-ICT – 5.3)
A NEXT-GENERATION, SECURE LINKED DATA MEDICAL INFORMATION SPACE FOR SEMANTICALLY-INTERCONNECTING ELECTRONIC HEALTH RECORDS AND CLINICAL TRIALS SYSTEMS ADVANCING PATIENTS SAFETY IN CLINICAL RESEARCH
1st SIG Workshop, Larnaca, 13 November 2012
Athos Antoniades, UCY
Panagiotis Gouvas, UBITECH
FP7, ICT-2011 – 5.3 Page 2
1. Problem Statement
2. Ethical & Legal Aspects
3. Data Cube Definition
4. Linking Data
5. Architecture Overview
- Increasing wealth of primary medical information, BUT:
  - Limited datasets are shared between medical data providers (fragmentation)
  - Limited statistical power
  - Reduced ability to replicate tests
- A solution to the above would accelerate clinical research
- Respect patients’ anonymity, data ownership and privacy
- Patient data cannot be transferred or copied out of the originating institutions
- All machines that hold patient data need to remain offline
- Computation and analysis will be performed offline by the data providers
- Linked2Safety data will not be identifiable and should not lead to the identification of a person, either directly or indirectly (e.g. through back-tracing)
- We need to strictly adhere to consent form requirements and to all ethical and legal requirements (European and national)
- Legal requirements are diverse and differ between countries, institutions and studies
Currently, EU legislation is strict about processing patient data, especially genetic information:
- It does not matter whether the data being processed were previously pseudonymized
- A third party could still link such data back to individuals unofficially, through data matching
- The processing should be carried out on-site (e.g. by CHUV, CING and ZEINCRO) and be subject to strong security measures
The Linked2Safety architecture reflects all of the above. We implement two core ideas to address the legal issues:
- Pre-processing in a closed-world room
- Processing based on data cubes
The operational procedures for creating and semantically enriching the data cubes are as follows:
1. After reviewing the legal and ethical requirements for their data, the data provider’s staff decide which data to include in Linked2Safety and which parameters to define for creating the aggregated data.
2. A staff member of the data provider enters the “closed-world” room, where the data are kept, and performs the aggregation that creates the data cubes. This step includes quality assurance and filtering of the data, based on the settings defined in step 1.
3. The resulting data cubes are stored in RDF format and verified not to contain any personal medical records.
4. The final data cubes are transferred to a server outside the “closed-world” room that is accessible by the Linked2Safety platform.
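The aggregation and verification steps above can be sketched in Python as a minimal illustration, assuming each raw record is a dictionary and a data cube is a mapping from a tuple of dimension values to a case count. The dimension names, the `ALLOWED_DIMENSIONS` whitelist standing in for the step-1 decision, and the missing-value filter are all hypothetical; this is not the project's actual tooling.

```python
from collections import Counter
from itertools import product

# Hypothetical step-1 decision: the only variables the provider agreed to release.
ALLOWED_DIMENSIONS = {"diabetes", "weight_band"}

def build_cube(records, dimensions):
    """Step 2 (sketch): aggregate raw records into counts per dimension combination."""
    if not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError("dimension not approved in step 1")
    # Assumed quality filter: drop records with a missing value in any dimension.
    clean = [r for r in records if all(r.get(d) is not None for d in dimensions)]
    domains = [sorted({r[d] for r in clean}) for d in dimensions]
    counts = Counter(tuple(r[d] for d in dimensions) for r in clean)
    # List every combination explicitly; Counter returns 0 for unseen ones.
    return {combo: counts[combo] for combo in product(*domains)}

def verify_cube(cube, dimensions):
    """Step 3 (sketch): every cell must be an aggregate count, never a record."""
    return all(
        len(key) == len(dimensions) and isinstance(n, int) and n >= 0
        for key, n in cube.items()
    )

# Hypothetical raw records, as they would exist only inside the closed-world room.
records = [
    {"patient_id": "P1", "diabetes": "yes", "weight_band": "80+"},
    {"patient_id": "P2", "diabetes": "no", "weight_band": "<80"},
    {"patient_id": "P3", "diabetes": "yes", "weight_band": "80+"},
    {"patient_id": "P4", "diabetes": "no", "weight_band": None},
]
dims = ["diabetes", "weight_band"]
cube = build_cube(records, dims)
print(cube)
print(verify_cube(cube, dims))
```

Only aggregate counts leave `build_cube`; identifier fields such as `patient_id` never reach the cube keys.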
- Each data provider generates data cubes from its raw patient data
- The created data cubes (anonymised data) are then uploaded to the Linked2Safety platform
[Figure: Front-End]
Data Cube Approach
A paper that analyzes data from a specific study reports:
[Table: counts by Age (rows) and Marital Status (columns: Married, Widowed, Single); cell values not recoverable from the extraction]
Paper 1, which analyzes data from a specific study, reports:
[Table: counts by Age (rows) and Marital Status (columns: Married, Widowed, Single); the 0-16 row is shown as NA, other values not recoverable]
Paper 2, which analyzes data from the same study, reports:
[Table: same layout; the 0-16 row is shown as NA, other values not recoverable]
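The risk of publishing overlapping tables from the same study can be illustrated with a simple differencing example. The tables on the slide are not recoverable, so the age bands and counts below are entirely hypothetical; the point is only that subtracting one published aggregate from another can isolate a very small group, which is one way back-tracing can happen.

```python
# Hypothetical counts of single patients by age band, as two papers on the same
# study might report them with slightly different band boundaries.
paper1 = {"0-16": 40, "17+": 60}
paper2 = {"0-17": 41, "18+": 59}

def isolate_age_17(p1, p2):
    """Differencing: the 0-17 band minus the 0-16 band leaves only 17-year-olds."""
    return p2["0-17"] - p1["0-16"]

# Both papers describe the same 100 patients, so the difference is meaningful.
assert sum(paper1.values()) == sum(paper2.values())
print(isolate_age_17(paper1, paper2))  # 1: a single patient is singled out
```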
Original data:
[Table: counts by Age (rows) and Marital Status (columns: Married, Widowed, Single); values not recoverable]
After perturbation (±1) and cell suppression (cells with fewer than 5 cases):
[Table: same layout; the 0-16 row is suppressed and shown as NA]
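A minimal sketch of the two protections named on the slide, under stated assumptions: the perturbation is read as adding noise drawn uniformly from {-1, 0, +1}, and suppression is decided on the original count before perturbing. Both readings, the function name `protect_cube` and the example counts are assumptions, not the project's documented procedure.

```python
import random

SUPPRESSION_THRESHOLD = 5  # cells with fewer than 5 cases are withheld (slide: "< 5")

def protect_cube(cube, rng):
    """Return a copy of cube with small cells suppressed and the rest perturbed.

    Assumptions: suppression is decided on the original count, and the
    perturbation adds noise drawn uniformly from {-1, 0, +1}.
    """
    protected = {}
    for cell, count in cube.items():
        if count < SUPPRESSION_THRESHOLD:
            protected[cell] = None  # published as NA
        else:
            protected[cell] = count + rng.choice((-1, 0, 1))
    return protected

# Hypothetical counts keyed by (age band, marital status).
original = {
    ("0-16", "Single"): 3,
    ("17+", "Married"): 42,
    ("17+", "Widowed"): 12,
    ("17+", "Single"): 27,
}
print(protect_cube(original, random.Random(2012)))
```

Passing a seeded `random.Random` instance makes the output reproducible, which is convenient for testing.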
- The Semantic Web is built mainly upon Resource Description Framework (RDF) models
- The RDF data model is based on the idea of making statements about resources
- Statements take the form of subject-predicate-object expressions (a.k.a. RDF triples)
- RDF is an official W3C specification (2004)
- Many serialization formats exist (XML, JSON, etc.)
- A collection of RDF statements represents a labeled, directed multigraph
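To make the triple model concrete, the Python sketch below stores statements as (subject, predicate, object) tuples and reads them back as edges of a labeled, directed multigraph. The `ex:` names are hypothetical placeholders; a real application would more likely use an RDF library and full IRIs.

```python
from collections import defaultdict

# A tiny RDF-style graph: each statement is a (subject, predicate, object) triple.
# The "ex:" names are hypothetical examples, not terms from a published vocabulary.
graph = {
    ("ex:drugX", "ex:target", "ex:enzY"),
    ("ex:drugX", "ex:activates", "ex:enzY"),  # second labeled edge, same node pair
    ("ex:enzY", "ex:function", '"carcinogen excretion"'),
}

def edges_between(g, s, o):
    """All predicate labels on directed edges from s to o."""
    return {p for (s2, p, o2) in g if s2 == s and o2 == o}

def outgoing(g, s):
    """Map each predicate leaving s to the set of objects it points to."""
    result = defaultdict(set)
    for s2, p, o in g:
        if s2 == s:
            result[p].add(o)
    return dict(result)

print(edges_between(graph, "ex:drugX", "ex:enzY"))
print(outgoing(graph, "ex:enzY"))
```

The two edges between `ex:drugX` and `ex:enzY` show why the structure is a multigraph, and the empty result in the reverse direction shows it is directed.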
Interconnection = RDF + Semantic Model + Interfaces
[Figure: example graph with nodes S1:drugX, S1:enzY and S2:Cyp1A; edge labels include bioactivity, target, activation, function (Carcinogen excretion) and similarTo]
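The benefit of interconnection can be sketched by merging triples from two sources plus one linking statement. The node names follow the figure labels, but the exact predicates (`S1:target`, `S2:function`, `similarTo`) and the shape of the graph are assumptions. The query only succeeds once the `similarTo` link joins the two sources.

```python
# Hypothetical triples modeled on the figure labels; the exact graph shape is assumed.
source1 = {("S1:drugX", "S1:target", "S1:enzY")}
source2 = {("S2:Cyp1A", "S2:function", "Carcinogen excretion")}
links = {("S1:enzY", "similarTo", "S2:Cyp1A")}  # the interconnecting statement

def target_functions(graph, drug):
    """Functions of entities similar to a drug's targets (needs all three sets)."""
    targets = {o for s, p, o in graph if s == drug and p == "S1:target"}
    similar = {o for s, p, o in graph if s in targets and p == "similarTo"}
    return {o for s, p, o in graph if s in similar and p == "S2:function"}

merged = source1 | source2 | links
print(target_functions(merged, "S1:drugX"))             # {'Carcinogen excretion'}
print(target_functions(source1 | source2, "S1:drugX"))  # set()
```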
[Figure: nodes :Cyp1A, :Cyp2A, :Enzyme and :Protein; properties Function (Carcinogen excretion) and Location (chr2) with rdfs:domain links; an rdfs:subClassOf link between :Enzyme and :Protein; rdf:type links marked as stated or inferred]
In order to achieve such interconnection, we need to align data to a common format.
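The stated/inferred distinction in the figure corresponds to RDFS entailment. Below is a minimal forward-chaining sketch of two standard RDFS rules (rdfs2 for `rdfs:domain`, rdfs9 for `rdfs:subClassOf`). The triples are a hypothetical reading of the figure labels, assuming `:function` has domain `:Enzyme` and `:Enzyme` is a subclass of `:Protein`.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(triples):
    """Forward-chain two RDFS rules until no new triple appears.

    rdfs2: (p rdfs:domain C) and (x p y)           => (x rdf:type C)
    rdfs9: (C rdfs:subClassOf D) and (x rdf:type C) => (x rdf:type D)
    """
    inferred = set(triples)
    while True:
        domains = {(s, o) for s, p, o in inferred if p == DOMAIN}
        subclasses = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        new = set()
        for prop, cls in domains:
            new |= {(x, RDF_TYPE, cls) for x, p, _ in inferred if p == prop}
        for sub, sup in subclasses:
            new |= {(x, RDF_TYPE, sup) for x, p, c in inferred if p == RDF_TYPE and c == sub}
        if new <= inferred:  # fixpoint reached
            return inferred
        inferred |= new

# Hypothetical stated triples based on the figure labels.
stated = {
    (":function", DOMAIN, ":Enzyme"),
    (":Enzyme", SUBCLASS, ":Protein"),
    (":Cyp1A", ":function", '"Carcinogen excretion"'),
}
closure = rdfs_closure(stated)
for triple in sorted(closure - stated):
    print(triple)
```

A complete reasoner would implement the full RDFS rule set, including transitivity of `rdfs:subClassOf` (rdfs11); this sketch covers only the two rules the figure needs.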
[Figure labels: Semantic, SPARQL, Datacube]
[Figure: Clinical EHR records A and B (visible field values include "30", "1/1/1900" and "1/1/1970") are aligned to CommonEHR (always in the closed-world room), aggregated into a data cube and exported as a data cube in RDF format. The RDF excerpt shows qb:Observation resources, each with qb:dataSet, sdmx-dimension:Diabetes, sdmx-dimension:Weight and an sdmx-measure:Cases count ("0"^^xsd:long and "8"^^xsd:long in the two visible observations).]
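The last step in the figure, turning aggregated counts into RDF Data Cube observations, can be sketched as plain Turtle generation. The `qb:` and `sdmx-*` namespace IRIs are the published ones, but `sdmx-dimension:Diabetes`, `sdmx-dimension:Weight` and `sdmx-measure:Cases` are taken from the slide rather than from the SDMX vocabularies, and the `ex:` namespace, observation IRIs and dimension codes are hypothetical.

```python
# qb: and sdmx-*: are published namespaces; the ex: namespace is hypothetical.
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/linked2safety/> .
"""

def cube_to_turtle(cube, dataset="ex:dataset1"):
    """Emit one qb:Observation per cube cell; cube maps (diabetes, weight) -> cases."""
    lines = [PREFIXES]
    for i, ((diabetes, weight), cases) in enumerate(sorted(cube.items()), start=1):
        lines.append(
            f"ex:obs{i} a qb:Observation ;\n"
            f"    qb:dataSet {dataset} ;\n"
            f"    sdmx-dimension:Diabetes ex:{diabetes} ;\n"
            f"    sdmx-dimension:Weight ex:{weight} ;\n"
            f'    sdmx-measure:Cases "{cases}"^^xsd:long .\n'
        )
    return "\n".join(lines)

# Hypothetical dimension codes; the counts mirror the two visible observations.
cube = {("yes", "over80"): 8, ("no", "over80"): 0}
print(cube_to_turtle(cube))
```

Generating Turtle by string formatting keeps the sketch dependency-free; an RDF library would additionally validate IRIs and escape literals.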
[Figure: Front-End]
The most crucial aspects of the project:
- Security and anonymity
- Linking medical data and data cubes
- Providing ways of initiating experiments
- Securely integrating partners in different countries
The consortium has developed an innovative solution that addresses all of the above.
Athos Antoniades, Ph.D., University of Cyprus, Tel:
Panagiotis Gouvas, Ph.D., UBITECH, Tel: