An advanced system for managing biomedical images in a GRID environment
G. Aloisio, S. Fiore
SPACI Consortium & University of Salento, Lecce, Italy
Outline
- Scenario
- Related Works
- Issues
- Medical Imaging Environment
- System Architecture
- Data Storage
- Workflow Management
- Metadata Management: GRelC
- Case Studies: CMCC & Climate-G
- Security
- Web Gateway
- Conclusions
Scenario, issues and needs
- Huge amount of medical data produced by several centres (e.g. hospitals)
- Medical Interconnection Network: data sharing among several centres
- Data integration, digital libraries and sharing are FP7 keywords
- Need to move towards open, distributed and service-based environments
Challenging issues
- Security
- Data distribution
- Data format heterogeneity
- Metadata management
- Metadata schema
- Transparent access to the system
- Scalable approach
- …
[Figure labels: Patients, Infrastructure, Clinicians]
Grid for Biomedical Imaging, 1999
[Figure labels: Parallel Rendering, Computational Grid, Globus, Pre Processing, Raw data, Server, HPC, Data Acquisition, Collaborative working]
Surrounding context MediGrid …and more…
Issues (high level)
- Medical image storage and processing
  - CT, MRI, fMRI, PET, SPECT
  - DICOM image format
  - Privacy issues
- Handling of very large data sets
  - Storage management (PACS)
  - Metadata management
  - Security of data
- Processing complex algorithms with large computing power and memory requirements
  - Parallel processing
  - Workflow Management System (WFMS)
AGIR - gLite based www.aci-agir.org
Medicus - Globus based http://dev.globus.org/wiki/Incubator/MEDICUS
Medical Imaging Environment
[Figure labels: Data Virtualization, Info, DICOM Images, Computational Providers, Grid Environment, Cloud Environment]
Main Issues (architectural level)
- Full security support (data security/privacy)
- Integration effort is needed to achieve high-level results
- Transparent management of data & metadata
  - Preservation of data locality (distribution of data)
  - Distribution of metadata
- Access via Web Gateway
- Workflow management for medical purposes
  - High-level tools for end users
  - Composition of medical imaging methods
- Grid support (storage, computation) on a large scale
Medical Imaging - Data Virtualization
- Distributed Data & Metadata
  - Locality/Autonomy
  - Scalability
  - Legacy systems (PACS)
- Security
  - Metadata DB encryption
  - Authorization/policies/roles
  - Data storage encryption
  - Network communication protection
- Interchange Protocols
[Figure labels: Data Virtualization, Info, DICOM Images]
Medical Imaging - Analysis
- Analysis
  - Distributed
  - Grid based
  - Cloud based
  - Complex medical workflow
- Secure
  - Data anonymisation
  - Data encryption
  - Authorization/Authentication
[Figure labels: Data Virtualization, Info, DICOM Images, Computational Providers, Grid Environment, Cloud Environment]
System Architecture
[Figure labels: Security, Web Gateway, Orchestrator/Collective Services, Grid Middleware, Metadata Service, Data Service, Comput. Service, MediGrid Fabric Layer (data and metadata)]
Medical imaging - Storage
Management of Medical Images
- Secure on-site management of DICOM images
- Encryption of data
- Anonymisation of data for analysis outside the centre
- Access control through authorization
- Storage interfaces between the distributed environment and local storage devices
- Secure and efficient data transport protocols
- Backup of data
- …
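The anonymisation item above can be illustrated with a minimal sketch. It assumes the image header is available as a plain dictionary keyed by DICOM attribute keywords; the set of fields to strip, the `anonymise` function and the `site_secret` parameter are hypothetical and far from a complete de-identification profile. The patient identifier is replaced by a keyed hash so that only the originating centre, which holds the secret, can re-link results.

```python
import hashlib
import hmac

# Hypothetical, deliberately short list of directly identifying attributes.
IDENTIFYING_FIELDS = {"PatientName", "PatientBirthDate", "PatientAddress"}


def anonymise(header, site_secret):
    """Return a copy of the header without identifying fields.

    PatientID is replaced by a keyed pseudonym (HMAC-SHA256), so the same
    patient always maps to the same pseudonym within one centre.
    """
    cleaned = {k: v for k, v in header.items() if k not in IDENTIFYING_FIELDS}
    if "PatientID" in cleaned:
        digest = hmac.new(
            site_secret, cleaned["PatientID"].encode("utf-8"), hashlib.sha256
        )
        cleaned["PatientID"] = digest.hexdigest()[:16]
    return cleaned


header = {
    "PatientName": "ROSSI^MARIO",
    "PatientBirthDate": "19600101",
    "PatientID": "12345",
    "Modality": "CT",
}
safe_header = anonymise(header, site_secret=b"example-secret-kept-on-site")
print(sorted(safe_header))
```

A real deployment would also have to cover pixel data with burned-in text, private tags and dates, which this sketch does not attempt.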
Biomedical Imaging Processing Issues
- Huge amount of data
  - 1 radiology department: 10 TB/year
  - 1 CT dataset: 500 MB to 2 GB
- Compute-intensive analysis
  - Noisy registration
  - Difficult segmentation
  - Volume reconstruction
- Need to explore several imaging analysis algorithms, validation methods and visualization pipelines
- Grid Workflow Management System
[Figure labels: Scheduler, Grid Data Mng, CE, Node]
Medical Imaging - Workflow
Workflow example: Denoising, Segmentation, Rendering
[Figure labels: Info, DICOM Images, Computational Providers, Grid Environment, Cloud Environment]
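The denoising, segmentation and rendering chain named on this slide can be sketched as a sequence of stages applied in order. The stage implementations below are toy stand-ins chosen only to keep the example self-contained (a 3-point median filter, a fixed threshold and a maximum-intensity projection); they are assumptions, not the algorithms used by the system, and `run_workflow` is a hypothetical helper rather than a workflow engine.

```python
from functools import partial
from statistics import median


def denoise(volume):
    """3-point median filter along each row (toy stand-in for a denoiser)."""
    return [
        [[median(row[max(0, i - 1): i + 2]) for i in range(len(row))]
         for row in sl]
        for sl in volume
    ]


def segment(volume, threshold):
    """Binary mask: 1 where the voxel value exceeds the threshold."""
    return [[[1 if v > threshold else 0 for v in row] for row in sl]
            for sl in volume]


def render(volume):
    """Maximum-intensity projection across slices (toy stand-in for rendering)."""
    rows, cols = len(volume[0]), len(volume[0][0])
    return [[max(sl[r][c] for sl in volume) for c in range(cols)]
            for r in range(rows)]


def run_workflow(data, stages):
    """Apply each named stage to the output of the previous one."""
    for _name, stage in stages:
        data = stage(data)
    return data


# Two 2x4 slices; the isolated 9 in the first row plays the role of noise.
volume = [
    [[0, 0, 9, 0], [0, 8, 8, 8]],
    [[0, 0, 0, 0], [7, 7, 7, 0]],
]
stages = [
    ("denoising", denoise),
    ("segmentation", partial(segment, threshold=5)),
    ("rendering", render),
]
image = run_workflow(volume, stages)
print(image)  # [[0, 0, 0, 0], [1, 1, 1, 1]]
```

Keeping stages as plain callables makes it easy to swap one algorithm for another, which matches the stated need to explore several analysis algorithms and visualization pipelines.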
Metadata on the GRID
- Metadata is data about data
- Metadata enables search and discovery
- Metadata on the GRID
  - Mainly information about files
  - Usually stored in DBs
- Need simple SOA & Grid interfaces for metadata access
- Advantages
  - Easier to use by clients
  - Robust paradigm
  - Exploits the Grid security paradigm
  - Addresses interoperability

Notes: First, what is metadata? A common definition is that metadata is data about data. On the GRID, this is mainly information describing files that is necessary for running jobs, that is, file metadata. This information is usually stored in databases, so accessing metadata is, in a way, mainly about accessing databases. Having clients go directly to the database, however, is not the most convenient solution. It is better to have a simple interface for metadata access on the GRID, defined in terms of metadata concepts such as entries, keys and values rather than DB concepts. This has several advantages: it is easier for clients to use, since it exposes only metadata concepts and effectively hides the database. A simple interface that exposes a reduced set of DB functionality, in effect a simplified relational database interface, solves most of the problems.
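As a minimal sketch of an interface defined in terms of entries, keys and values, the class below hides a relational backend behind three metadata-level operations. Everything here is an assumption made for illustration: the class name `MetadataCatalog`, its methods, the table layout and the use of SQLite are hypothetical and are not the GRelC interface.

```python
import sqlite3


class MetadataCatalog:
    """Hypothetical key/value metadata interface that hides the database."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS metadata ("
            "entry TEXT NOT NULL, attr_key TEXT NOT NULL, "
            "attr_value TEXT NOT NULL, PRIMARY KEY (entry, attr_key))"
        )

    def set_attr(self, entry, key, value):
        """Attach (or overwrite) one key/value pair on an entry."""
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)",
                (entry, key, str(value)),
            )

    def get_attrs(self, entry):
        """Return all key/value pairs of an entry as a dict of strings."""
        rows = self._db.execute(
            "SELECT attr_key, attr_value FROM metadata WHERE entry = ?",
            (entry,),
        )
        return dict(rows.fetchall())

    def find(self, key, value):
        """Return the entries whose given key has the given value."""
        rows = self._db.execute(
            "SELECT entry FROM metadata "
            "WHERE attr_key = ? AND attr_value = ? ORDER BY entry",
            (key, str(value)),
        )
        return [row[0] for row in rows]


catalog = MetadataCatalog()
catalog.set_attr("/grid/images/scan001.dcm", "modality", "CT")
catalog.set_attr("/grid/images/scan001.dcm", "size_mb", 512)
catalog.set_attr("/grid/images/scan002.dcm", "modality", "MRI")
print(catalog.find("modality", "CT"))  # ['/grid/images/scan001.dcm']
```

The client never writes SQL or sees table names, so the backend could be replaced without changing client code.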
Why are metadata so important?
- Metadata is the key to managing, routing and retrieving your data properly
- PACS systems can be “federated” by exploiting metadata
- Search and discovery of “clinical case studies” is enabled by a metadata system
- Metadata increases the global knowledge about available patient data
- From data to metadata, and from metadata to knowledge, to support physicians for diagnostic and therapeutic purposes
- How to manage metadata “efficiently”, “securely” and “transparently”? METADATA SERVICE
Metadata Service
- Compatibility with distributed systems/security
- Full security support for metadata
  - Authorization (policies, roles, etc.)
  - Authentication (based on grid certificates)
- Access/Integration of metadata sources
  - Distributed vs centralized approach
- Grid access interface
  - Common interface for RDBMS and XML-DB
  - Database backend independence
  - GSI support
Metadata Management: Stack
GRelC Project (starting date 2001)
Grid Relational Catalog (GRelC) is a project that aims to design and develop a set of efficient, secure and transparent Data Grid Services.
[Figure labels: DB, XML]
Grid Metadata Handling System: architecture in the small GRelC DAIS
International Testbeds
- Sites: Lecce (Italy), Beijing (China)
- GRelC Data Access over Data Sources (DB)
- Hosts:
  - gandalf.unile.it (Linux x86)
  - sara.unile.it (Mac OS X)
  - sigma2.unile.it (Linux IA64)
  - gridsurfer.unile.it (FreeBSD)
  - galileo.hpcc.unical.it
  - sepac00.projects.cscs.ch
  - spacina.na.infn.it
Test Performance
CMCC: a fully distributed data grid environment
- Grid Layer
  - VOMS Server
  - CMCC Data Grid Portal
  - Metadata Server (GRelC DAIS)
  - Grid Storage
  - GridFTP
  - GRelC Storage
  - CMCC CA
- P2P layer
- Data and Metadata resources
  - RDBMS (Oracle, MySQL, PostgreSQL)
  - XML DBs (eXist, Xindice, etc.)
- Search & Discovery
  - Two-step process
[Figure labels: Grid SS (three instances)]
A new research effort: Climate-G
The main goal of Climate-G is to create a unified environment for climate change research that brings together, in a single context, large amounts of data geographically spread among several centres, rich metadata descriptions, efficient data access services, advanced data analysis and visualization tools, etc., exploiting and combining knowledge and skills in the fields of climate change and computational science.
Climate-G partners Università del Salento
Metadata Distribution and virtualization
- For each site:
  - Relational DB (index)
  - XML DB (entire schema)
- Virtualization/Integration layer: GRelC DAIS
- Virtualization conceals:
  - Data distribution
  - Number of sites, RDBMS and XML back-ends
  - P2P topology
  - Data integration aspects
  - Technological details
  - …
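The per-site layout above (a relational index plus an XML store holding the full record) can be sketched as follows. This is an illustration under stated assumptions, not the GRelC DAIS implementation: the `Site` and `VirtualCatalog` classes, the site names, the `modality` index column and the reading of discovery as "query the indexes first, then fetch the full XML record from the owning site" are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


class Site:
    """One centre: a small relational index plus a store of full XML records."""

    def __init__(self, name):
        self.name = name
        self.index = sqlite3.connect(":memory:")
        self.index.execute(
            "CREATE TABLE idx (record_id TEXT PRIMARY KEY, modality TEXT)"
        )
        self.xml_store = {}  # stand-in for an XML database

    def add(self, record_id, modality, xml_text):
        with self.index:
            self.index.execute(
                "INSERT INTO idx VALUES (?, ?)", (record_id, modality)
            )
        self.xml_store[record_id] = xml_text


class VirtualCatalog:
    """Hides how many sites exist and which back-end each one uses."""

    def __init__(self, sites):
        self._sites = {site.name: site for site in sites}

    def search(self, modality):
        """Step 1: ask every site's index; return (site, record_id) pairs."""
        hits = []
        for site in self._sites.values():
            rows = site.index.execute(
                "SELECT record_id FROM idx WHERE modality = ? "
                "ORDER BY record_id",
                (modality,),
            )
            hits.extend((site.name, row[0]) for row in rows)
        return hits

    def fetch(self, site_name, record_id):
        """Step 2: retrieve and parse the full XML record from its site."""
        return ET.fromstring(self._sites[site_name].xml_store[record_id])


site_a = Site("site-a")
site_a.add("r1", "CT", "<record><modality>CT</modality><slices>120</slices></record>")
site_b = Site("site-b")
site_b.add("r2", "MRI", "<record><modality>MRI</modality><slices>60</slices></record>")
site_b.add("r3", "CT", "<record><modality>CT</modality><slices>80</slices></record>")

catalog = VirtualCatalog([site_a, site_b])
hits = catalog.search("CT")
slices = [catalog.fetch(site, rid).findtext("slices") for site, rid in hits]
print(hits, slices)
```

The caller works with one catalog object; data stays at the site that owns it and only matching records are transferred in the second step.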
Security
What about security?
- Authentication (mutual process)
- Authorization
  - Access control lists
  - Role Membership Services
- Data encryption (to ensure secure data communication)
- Data anonymisation (to move data outside the centres)
- A certificate for each actor (user, service, host)
- Each action is associated with an actor
- Firewall protection
- …
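The authorization items above (access control lists, role membership, one certificate per actor, every action tied to an actor) can be combined in a small sketch. Actors are identified here by a certificate subject string; the subjects, role names, resource name and the `is_authorized` function are hypothetical, and certificate validation itself is assumed to have happened earlier during mutual authentication.

```python
# Role membership: which certificate subjects hold which role (hypothetical).
ROLE_MEMBERS = {
    "clinician": {"/C=IT/O=ExampleHospital/CN=Alice"},
    "researcher": {"/C=IT/O=ExampleUniversity/CN=Bob"},
}

# Access control list: per resource, the roles allowed to perform each action.
ACL = {
    "study-42": {
        "read": {"clinician", "researcher"},
        "write": {"clinician"},
    },
}

AUDIT_LOG = []  # every decision is recorded together with its actor


def roles_of(subject):
    """Return the set of roles held by a certificate subject."""
    return {role for role, members in ROLE_MEMBERS.items() if subject in members}


def is_authorized(subject, action, resource):
    """Allow the action if any role of the actor appears in the ACL entry."""
    allowed_roles = ACL.get(resource, {}).get(action, set())
    decision = bool(roles_of(subject) & allowed_roles)
    AUDIT_LOG.append((subject, action, resource, decision))
    return decision


alice = "/C=IT/O=ExampleHospital/CN=Alice"
bob = "/C=IT/O=ExampleUniversity/CN=Bob"
print(is_authorized(alice, "write", "study-42"))  # True
print(is_authorized(bob, "write", "study-42"))    # False
```

Unknown actors, actions or resources fall through to an empty set of allowed roles, so the default is to deny.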
Medical Web Gateway
Main Functionalities
- Search & Discovery
- Data access & viz
- Workflow management
- Users and roles mng
- …
Features
- Easy-to-use interfaces
- Platform independent
- Secured by design
- Integrated environment
  - Data and …
  - Analysis tools
  - Visualization tools
  - Post-processing tools
Medical Imaging Environment
[Figure labels: Data Virtualization, Info, DICOM Images, Computational Providers, Grid Environment, Cloud Environment]
Questions?