EGEE-III INFSO-RI
Enabling Grids for E-sciencE

The Medical Data Manager: the components
Johan Montagnat, Romain Texier, Tristan Glatard
CNRS, I3S laboratory
Enabling Grids for E-sciencE EGEE-III INFSO-RI Medical Data Manager, R. Texier, July 16,

EGEE Medical Data Manager
Objectives
– Expose a standard grid interface (SRM) for medical image servers (DICOM)
– Use native DICOM storage format
– Fulfill medical applications security requirements
– Do not interfere with clinical practice
[Diagram labels: User Interfaces, Worker Nodes, DICOM clients, DICOM Interface, SRM, DICOM server]
Enabling Grids for E-sciencE EGEE-III INFSO-RI Medical Data Manager, R. Texier, July 16, 2008

Medical data protection
Content
– Medical images (data, confidential)
– Patient folder (attached metadata, very sensitive)
Requirements
– Patient privacy
    Needs fine-grained access control (ACLs on all data and metadata)
    Needs metadata containment (metadata databases administered by accredited staff)
– Data protection
    Needs data encryption (even grid site administrators are not accredited to access the data)
How important is it?
– The medical community will simply not use a system it does not trust (both a technical and a human problem)

MDM main components
Usability
– LFC API provides transparent access
Privacy
– LFC and DPM provide file-level ACLs
– AMGA provides secured metadata communication and ACLs
Data protection
– SRM-DICOM provides on-the-fly data anonymization
    DPM-based (SRM v2 interface)
– Hydra key store provides encryption / decryption transparently
– Data is anonymized prior to transmission
[Diagram labels: LFC, AMGA Metadata, SRM-DICOM Interface]

Exploiting DPM extensibility
DPM can access different storage back-ends through plugins
– The DPM-DICOM plugin prepares the file
DPM exposes a standard Storage Element interface (SRM)
DPM provides standard file exchange protocols and file access control
[Diagram labels: SFN request, DPM head, DPM-DICOM Plugin, DPM-DICOM Library, DICOM GET, File retrieval, Temporary copy, DPM Disks pool, Standard interface]

Medical Data Registration
1. Image is acquired
2. Image is stored in DICOM server
3. gLite client
    3a. Image is registered (a GUID is associated)
    3b. Image key is produced and registered
4. Image metadata are registered
[Diagram labels: gLite API, AMGA Metadata, LFC File Catalog, DICOM server, DPM, Hydra keystore]

Medical Data Registration
– All these steps can be done with a single CLI command
– A DICOM transaction can initiate the registration
[Diagram labels: LFC API, PUSH DICOM, Triggers: DICOM server PUSH, MDM registration]
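
The registration steps (store the image, associate a GUID, produce a key, register the metadata) can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the dictionaries stand in for the DICOM server, LFC, Hydra key store and AMGA, and every name (`register_image`, `dicom_server`, the sample identifiers) is hypothetical and not part of the MDM or gLite interfaces.

```python
import secrets
import uuid

# Hypothetical in-memory stand-ins for the four services (not the real APIs).
dicom_server = {}  # SOP identifier -> raw image bytes
lfc = {}           # GUID -> SOP identifier of the registered image
hydra = {}         # SOP identifier -> encryption key
amga = {}          # GUID -> image metadata


def register_image(sop_id, pixels, metadata):
    """Follow steps 2 to 4 of the registration slide for one acquired image."""
    dicom_server[sop_id] = pixels            # 2. store the image in the DICOM server
    guid = str(uuid.uuid4())                 # 3a. register the image, a GUID is associated
    lfc[guid] = sop_id
    hydra[sop_id] = secrets.token_bytes(16)  # 3b. produce and register the image key
    amga[guid] = dict(metadata)              # 4. register the image metadata
    return guid


guid = register_image("sop-1", b"\x00\x01", {"Modality": "MR"})
```

Wrapping all steps in one function mirrors the idea that a single command, or a DICOM push trigger, can drive the whole registration.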

Registration in Hydra
Each DICOM image is uniquely identified by its Study/Series/SOP identifiers
The Hydra servers generate a key for the selected cipher
The cipher and the key are associated with the unique DICOM identifiers
[Diagram labels: DICOM image, analyze, Study ID, Series ID, SOP ID, Select a cipher and generate a key, Hydra servers]
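
The association between the DICOM identifiers and the generated key can be pictured as follows. This is a minimal sketch, not the Hydra interface: the cipher names, the key lengths in `KEY_LENGTHS` and the function `register_key` are assumptions chosen for illustration.

```python
import secrets

# Hypothetical cipher table: cipher name -> key length in bytes (illustrative only).
KEY_LENGTHS = {"aes-128": 16, "aes-256": 32}

# (study_id, series_id, sop_id) -> (cipher name, key)
key_store = {}


def register_key(study_id, series_id, sop_id, cipher="aes-128"):
    """Generate a key for the chosen cipher and bind it to the DICOM identifiers."""
    identifiers = (study_id, series_id, sop_id)
    if identifiers in key_store:
        raise ValueError("a key is already registered for this image")
    key = secrets.token_bytes(KEY_LENGTHS[cipher])
    key_store[identifiers] = (cipher, key)
    return identifiers


image_id = register_key("study-1", "series-2", "sop-3", cipher="aes-256")
cipher_name, key = key_store[image_id]
```

Keying the store on the identifier triple means the key can later be looked up from the image alone, without any extra handle.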

Registration in the DICOM server
The DPM-DICOM library records the DICOM image in the DICOM server
The file size (which depends on the encryption algorithm and the file format) is computed for registration in the LFC
The size depends on:
– The cipher and the key
– The fields erased in the anonymization step
– The size of the original file
[Diagram labels: DICOM file, DPM-DICOM Library, 1A, 1B, 2, Image anonymization, Size of the encrypted anonymous file, DICOM server]

File identifiers registration
A reference to the file is recorded in the DPM, but no copy of the file is needed in the DPM disk pool
Directories for the Study, Series and SOP identifiers are created in the LFC
The anonymized data fields are registered in the AMGA server
Recorded in each component:
– DPM: SURL and PFN, the size of the file, host of the disk pool
– LFC: LFN and SURL, size of the file
– AMGA: DICOM image metadata
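
As an illustration of what each service holds after registration, the sketch below builds one record per component. The directory layout (one level per Study, Series and SOP identifier under a base directory), the base path, the SURL, the PFN and the host name are all made-up examples; the real naming scheme is not given on the slide.

```python
import posixpath


def build_records(base_dir, study_id, series_id, sop_id,
                  surl, pfn, size, pool_host, metadata):
    """Return the entries recorded in DPM, LFC and AMGA for one image."""
    # Assumed layout: <base>/<study>/<series>/<sop>
    lfn = posixpath.join(base_dir, study_id, series_id, sop_id)
    return {
        "DPM": {"surl": surl, "pfn": pfn, "size": size, "pool_host": pool_host},
        "LFC": {"lfn": lfn, "surl": surl, "size": size},
        "AMGA": {"metadata": dict(metadata)},
    }


records = build_records(
    "/grid/biomed/mdm", "study-1", "series-2", "sop-3",
    surl="srm://dpm.example.org/mdm/sop-3",   # hypothetical SURL
    pfn="dicom://study-1/series-2/sop-3",     # hypothetical PFN
    size=524288,
    pool_host="disk1.example.org",
    metadata={"Modality": "MR"},
)
```

Note that the DPM entry carries only a reference (SURL and PFN), in line with the point that no copy of the file has to sit in the disk pool.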

Access control rights management
To allow a user to access a medical file and its metadata, the owner of the file must set the rights in all the components:
– LFC
– DPM
– Hydra
– AMGA
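
The requirement that rights be set in every component can be sketched as a conjunction of four checks. This is a toy model under stated assumptions: the ACL tables, `grant` and `can_access` are hypothetical and do not reproduce the actual LFC, DPM, Hydra or AMGA commands.

```python
COMPONENTS = ("LFC", "DPM", "Hydra", "AMGA")

# component name -> {file identifier -> set of authorized users}
acls = {name: {} for name in COMPONENTS}


def grant(file_id, user, components=COMPONENTS):
    """Record the user's right on the file in each listed component."""
    for name in components:
        acls[name].setdefault(file_id, set()).add(user)


def can_access(file_id, user):
    """Access to the file and its metadata needs the right in all components."""
    return all(user in acls[name].get(file_id, set()) for name in COMPONENTS)


grant("image-1", "alice")                            # rights set everywhere
grant("image-1", "bob", components=("LFC", "DPM"))   # Hydra and AMGA forgotten
```

In this model `bob` can locate and fetch the file but has neither the key nor the metadata, which is why a partial grant does not amount to access.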

Medical Data Retrieval
1. Get GUID from metadata
2. lcg client
3. Get SFN from GUID
4. Request file
5. Get file key
6. On-the-fly encryption and anonymization; return encrypted file
7. Get file key and decrypt file locally
[Diagram labels: User Interface, Worker Node, gLite API, AMGA Metadata (metadata ACL control), LFC File Catalog (file ACL control), Hydra keystore (key ACL control), SRM-DICOM interface (anonymization & encryption)]

DICOM retrieval: get the DICOM file
The database associates each SURL with a PFN
The PFN associated with a DICOM file is resolved by the DPM-DICOM plugin
The plugin performs a DICOM transaction with the DICOM server to retrieve the medical image
By default, MDM is packaged with the Conquest DICOM server, but it is intended to interface with production servers
[Diagram labels: SURL request, DPM, DPM-DICOM Plugin, DPM-DICOM Library, DICOM GET]
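
The lookup chain (SURL to PFN, then a DICOM transaction for that PFN) can be sketched as two dictionary lookups. Everything here is an assumption for illustration: the SURL and PFN strings are invented, and `dicom_get` is a placeholder for the real DICOM GET performed by the plugin.

```python
# Hypothetical database content: SURL -> PFN (values are made up).
surl_to_pfn = {
    "srm://dpm.example.org/mdm/sop-3": "dicom://study-1/series-2/sop-3",
}

# Stand-in for the DICOM server, indexed by PFN.
dicom_images = {"dicom://study-1/series-2/sop-3": b"DICM-image-bytes"}


def dicom_get(pfn):
    """Placeholder for the DICOM transaction made by the plugin."""
    return dicom_images[pfn]


def retrieve(surl):
    """Resolve the SURL to its PFN, then fetch the image from the DICOM server."""
    pfn = surl_to_pfn.get(surl)
    if pfn is None:
        raise KeyError(f"unknown SURL: {surl}")
    return dicom_get(pfn)


image = retrieve("srm://dpm.example.org/mdm/sop-3")
```

Keeping `dicom_get` behind its own function reflects the intent that the packaged server can be swapped for a production one without touching the resolution step.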

DICOM retrieval: anonymization and encryption
Step 1A: The DPM-DICOM uses the DCMTK library to anonymize the DICOM file
or
Step 1B: The DICOM file is converted to a 3D format (inrimage) without nominative information
Step 2: The DPM-DICOM calls Hydra to encrypt the final file
DPM-DICOM uses the RFIO library to copy the file to a spool disk. The spool disk is only a buffer for the file.
[Diagram labels: SURL request, DPM-DICOM Plugin, DPM-DICOM Library, DICOM file, 1A, 1B, 2, Image anonymization, DICOM server, File transfer, DPM Disks pool, Standard interface]
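
Step 1A amounts to erasing the nominative fields from the image header before anything leaves the site. The sketch below shows the idea on a plain dictionary; it does not use DCMTK, it leaves out the encryption of step 2, and the list of fields treated as nominative is a hypothetical, non-exhaustive choice.

```python
# Attributes treated as nominative in this sketch (an assumed, partial list).
NOMINATIVE_FIELDS = ("PatientName", "PatientID", "PatientBirthDate")


def anonymize(header):
    """Return (anonymized copy, erased fields); the input header is not modified."""
    cleaned = {k: v for k, v in header.items() if k not in NOMINATIVE_FIELDS}
    erased = {k: v for k, v in header.items() if k in NOMINATIVE_FIELDS}
    return cleaned, erased


header = {"PatientName": "DOE^JOHN", "PatientID": "12345", "Modality": "MR"}
cleaned, erased = anonymize(header)
```

Returning the erased fields separately makes it easy to see which values stay behind on the hospital side while only the cleaned header travels with the image.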

Service Distribution
Hospital sites have to remain autonomous
– With strong (in-site) control over the sensitive metadata
The EGEE Data Management System federates distributed data files
AMGA supports database replication but not distribution
– Asynchronous, master-slave model, with partial replication of the directory hierarchy
– The MDM includes a library and a query client that provide federation of multi-site metadata servers. The client is based on the AMGA client and is syntactically compatible with it (transparency).
    - Users can send commands to a single server or to all the servers
    - Users can dynamically add or remove servers
[Diagram label: AMGA]
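
The federation behaviour (one command sent to one server or to all, with servers added and removed at run time) can be sketched as a thin fan-out layer. This is not the MDM client: `FederatedClient`, `make_site` and the command string are hypothetical, and each server is modelled as a plain Python callable.

```python
class FederatedClient:
    """Sends one command to a single metadata server or to all registered servers."""

    def __init__(self):
        self._servers = {}  # server name -> callable taking a command string

    def add_server(self, name, handler):
        self._servers[name] = handler

    def remove_server(self, name):
        del self._servers[name]

    def query(self, command, server=None):
        """Return {server name: result}; server=None means every server."""
        targets = [server] if server is not None else list(self._servers)
        return {name: self._servers[name](command) for name in targets}


def make_site(entries):
    """Build a toy server that answers any command with its own entry list."""
    return lambda command: list(entries)


client = FederatedClient()
client.add_server("site-a", make_site(["img-1"]))
client.add_server("site-b", make_site(["img-2", "img-3"]))

everything = client.query("example-command")                # all servers
only_a = client.query("example-command", server="site-a")   # a single server
client.remove_server("site-b")                              # dynamic removal
remaining = client.query("example-command")
```

Because each hospital's server stays a separate entry, the metadata never has to be copied off site for a multi-site query to work.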

Use cases
File administration
– A system administrator has access to the file for replication / backup procedures
– No access to the file content, nor to the metadata
Image processing
– A neuroscientist has access to the file content for image analysis
– No access to the nominative metadata
Medical analysis
– A physician involved in the patient's healthcare has access to all data and metadata
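
The three use cases can be summarized as a table of rights per role. The sketch below encodes only what the slide states; the resource names (`file`, `content`, `nominative_metadata`) and the function `allowed` are labels invented for this illustration, and rights the slide does not mention are simply left out.

```python
# Role -> resources it may reach, as stated in the use cases.
ROLE_RIGHTS = {
    "system administrator": {"file"},                          # replication / backup only
    "neuroscientist": {"file", "content"},                     # image analysis
    "physician": {"file", "content", "nominative_metadata"},   # all data and metadata
}


def allowed(role, resource):
    """True if the role may reach the resource; unknown roles get nothing."""
    return resource in ROLE_RIGHTS.get(role, set())
```

For example, `allowed("system administrator", "content")` is false because the administrator only ever handles the encrypted file.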

The User Interface
[Diagram labels: Data Management, Workload Management, Information Service, Computing Resources, Storage Resources, Site X, Sites Resources, Logging and real-time monitoring, Author. & Authen., Dynamic evolution, DataSets info, queries, User requests, Resources allocation, Publication, resources info, Indexing]

The User Interface
Very few components
Easy to install
The Hydra client will be part of the standard UI
Standard user configuration for the LFC and BDII
No configuration for the DPM
Only one file for:
– Hydra (services.xml)
– AMGA (.mdclient)
[Diagram labels: LFC, Hydra, AMGA, Multi-server AMGA client, Hydra client, Configuration, Installation]

The MDM components
[Diagram labels: Data Management, Workload Management, Information Service, Computing Resources, Storage Resources, Site X, Sites Resources, Logging and real-time monitoring, Author. & Authen., Dynamic evolution, DataSets info, queries, User requests, Resources allocation, Publication, resources info, Indexing]

MDM is on top of:
– SL4 (and CentOS) for the DPM version of the MDM
– SL3 for the gLite-IO version of the MDM
– Libraries (gLite, DCMTK, etc.)

The MDM components
– AMGA Metadata: AMGA database front-end, access control
– Hydra Key store: access control
– Instrumented DPM Storage Element: SRM v2 interface, access control, DPM-DICOM plugin
– LFC File Catalog: access control
– DICOM Server
– BDII Server

One server
All the components could be on the same server
[Diagram labels: /vo, /dpm, /domain, /home, DPM head node, file, DPM disk servers, …, DPM-DICOM plugin, BDII Server, One server]

Behind the components
The LFC of the BIOMED VO is used
The BDII must be registered with a top-level BDII
By default, AMGA uses a PostgreSQL database to store the metadata
– Other databases can be used (MySQL, SQLite, Oracle)
The DPM is only a buffer:
– The storage area should be small
– The files are already encrypted
– The files in the DPM can be replicated by other servers
[Diagram labels: LFC, AMGA Metadata, SRM-DICOM Interface]

Behind the components
The Hydra server uses MySQL to store the keys
– Each Hydra server uses well-separated tables/databases
The Hydra server runs on top of Tomcat and an Apache server
All the DICOM images are stored in the DICOM server
– If there is no DICOM server, the MDM provides the Conquest server

Installation procedure
Add some yum repositories
Install with
– yum install MDM

Configuration procedure
The server must be registered in EGEE
– The server receives a certificate
Today, there is no automatic configuration procedure
The configuration procedure is described
Some parts of the configuration (firewall, DPM buffer, etc.) are already automatic

What is new?
All the commands are glite-*
The automatic registration of DICOM images in AMGA is flexible
Global RPM for the MDM and yum repositories