VPH Overview and Data Requirements
Peter Coveney, University College London
Centre for Computational Science
Advancing science through computers:
– Computational science: algorithms, code development & implementation
– High-performance, data-intensive & distributed computing
– Visualisation & computational steering
– Condensed matter physics & chemistry, materials & life sciences
– Translational medicine, e-Health & VPH – the "data deluge"
Patient-specific medicine
'Personalised medicine': use the patient's genotype/phenotype to better manage a disease, or a predisposition towards a disease, by tailoring medical treatment to the characteristics of the individual patient.
Patient-specific medical simulation: use genotypic and/or phenotypic simulation to customise treatment for each particular patient; computational modelling can be used to predict the outcome of courses of treatment and/or surgery.
Why use patient-specific approaches? Treatments can be assessed for their effectiveness with respect to the patient before being administered, saving the potential expense and trauma of multiple ineffective treatments.
(The IT Future of Medicine Project)
Virtual Physiological Human (VPH)
A €207M initiative in EU FP7. Aims:
– Enable collaborative investigation of the human body across all relevant scales.
– Introduce multiscale methodologies into medical and clinical research.
The VPH framework is descriptive, integrative and predictive.
(Diagram: the scales involved – organism, organ, tissue, cell, organelle, protein, gene, molecule – linked by interactions, cell signals and transcription.)
VPH DEISA Virtual Community established in collaboration with Hermann Lederer, MPS RZG.
P-MEDICINE: from data sharing and integration via VPH models to personalised medicine
– Predictive disease modelling, exploiting the individual patient's data in a federated data warehouse.
– Optimisation of cancer treatment (Wilms' tumour, breast cancer and acute lymphoblastic leukaemia).
– The infrastructure supports: generic, seamless, multi-level data integration; a VPH-specific, multi-level cancer data repository; and model validation and clinical translation through trials.
– Scalable to any disease, provided that predictive modelling is clinically significant on one or more levels and that development of such models is feasible.
(Figures: disease modelling at the molecular, cellular and tissue/organ levels, including a cell-cycle diagram; multi-scale therapy predictions and disease-evolution results; generic multi-level disease modelling.)
Led by a clinical oncologist, Prof Norbert Graf. €13M, 2011–2013, EU FP7. http://www.p-medicine.eu/
P-MEDICINE architecture (architecture diagram)
VPH-SHARE overview
VPH-SHARE is developing the organisational fabric, and integrating the optimised services, to:
– expose and share patient data (imaging, clinical, biomedical signals);
– jointly develop multiscale models for the composition of new VPH workflows from the @neurIST, euHeart, VPHOP and ViroLab projects;
– facilitate collaborations within the VPH community;
– evaluate the effectiveness and fitness-for-purpose of Data and Compute Cloud platforms for biomedical applications.
The project focuses on a key bottleneck: the interface with the wealth of data from medical research infrastructures and from clinical processes.
Led by Rod Hose, Sheffield, UK (http://vph-share.org/).
VPH-Share overview
VPH-Share will provide the organisational fabric, realised as a series of services offered in an integrated framework, to expose and manage data, information and tools; to enable the composition and operation of new VPH workflows; and to facilitate collaborations between members of the VPH community.
Workflow areas: HIV, heart, aneurysms, musculoskeletal.
€11M, 2011–2015, EU FP7. Promotes cloud technologies.
Use case scenario – EP9: VIP for VPH (CNRS and UCL)
September 18th–20th, 2012, London, UK
● Data transfer from the EGI data storage system to the PRACE HPC resource.
Use case scenario – EP9: VIP for VPH
● The data stops over on a server machine on the way from EGI to the supercomputer and on the way back:
– bottleneck, delay?
– lost data?
– data overload?
● Is a specialised, dedicated data-storage solution needed?
Use case scenario – EP9: VIP for VPH
● Data storage requirements for 3D+t image simulation – a quick calculation (in the 3D-heart use case, each frame occupies 16 MB on disk):
{ size per frame × frames per second × t } × user executions × ... = biomed VO: 2.3 PB stored (of which VIP = 0.5 PB)
● A 2D MR balanced steady-state free precession (bSSFP) sequence at 1.5 T was simulated over a cardiac cycle (14 instants) extracted from the ADAM model.
Video from the Virtual Imaging Platform – VIP, http://vip.creatis.insa-lyon.fr
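To make the quick calculation concrete, here is a minimal sketch of the estimate. Only the 16 MB frame size and the 14 instants per cardiac cycle come from the slide; the number of user executions is a hypothetical input:

```python
# Back-of-envelope storage estimate for 3D+t image-simulation output.
# Only the 16 MB frame size and the 14 instants per cardiac cycle come
# from the slide; the execution count is a hypothetical assumption.

FRAME_SIZE_MB = 16        # size per frame on disk (3D-heart use case)
FRAMES_PER_CYCLE = 14     # instants per simulated cardiac cycle
USER_EXECUTIONS = 10_000  # hypothetical number of runs across the VO

total_mb = FRAME_SIZE_MB * FRAMES_PER_CYCLE * USER_EXECUTIONS
print(f"Estimated storage: {total_mb / 1e6:.2f} TB")  # 16 * 14 * 10000 MB = 2.24 TB
```

Scaling the per-execution figure by realistic execution counts across the whole biomed VO is what drives the total towards the petabyte range quoted above.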
Data organisation
– Imaging data (fMRI, etc.) in the standardised DICOM format: the DICOM header includes specs for imaging-data organisation and typical metadata (patientID, doctor, time, etc.); the overall data organisation is stored in a PACS server.
– Microphotographs in JPEG format: limited metadata (patientID, etc.) is encoded in the filename.
– Treatment data in XLS files with a defined (but not verified) structure: patientID and some other metadata are stored in the XLS file.
– Genetic data: all relevant data, including gene snippets, in XLS files; patientID and some other metadata are stored in the XLS as well.
– Simulation model data: so far simulations are run per patient only, i.e. patientID suffices to indicate the relation; parameter sets, which include some metadata, are provided to the simulation; simulation output is largely textual and is contained in a small database.
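As an illustration of the DICOM part of this organisation, a minimal sketch using the third-party pydicom package; the file name is a hypothetical placeholder:

```python
# Pulling the metadata described above out of a DICOM header.
import pydicom

ds = pydicom.dcmread("example_fmri.dcm")  # hypothetical file

print("PatientID:", ds.PatientID)                             # patient identifier
print("Physician:", ds.get("ReferringPhysicianName", "n/a"))  # the "doctor" field
print("Study time:", ds.get("StudyTime", "n/a"))              # acquisition time
```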
Data workflow
1. Replicate the data collection from master storage to storage close to the HPC system(s).
2. Copy the data from that storage into the HPC workspace and start the simulation.
3. Copy the results from the simulation back into the storage.
4. Replicate the results to slave archives that also work on the same data collection.
5. Replicate the simulation results to the master.
6. Slaves notify the master about all extra copies.
7. The master creates or updates the PID records based on the notifications from steps 5 and 6.
A schematic sketch of these seven steps follows below.
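The following rendering is schematic: every object and method is a hypothetical placeholder for the real EUDAT/HPC services, intended only to show the ordering and the data flow.

```python
# Schematic rendering of workflow steps 1-7; all names are hypothetical.

def run_replicated_simulation(collection, master, near_storage, hpc, slaves):
    replica = master.replicate(collection, to=near_storage)  # 1. replicate near HPC
    workspace = hpc.stage_in(replica)                        # 2. copy into workspace...
    results = hpc.run_simulation(workspace)                  #    ...and run
    near_storage.store(results)                              # 3. results back to storage
    for slave in slaves:                                     # 4. replicate to slave archives
        slave.replicate(results)
    master.replicate(results)                                # 5. replicate results to master
    for slave in slaves:                                     # 6. slaves report extra copies
        slave.notify(master, results)
    master.update_pid_records(results)                       # 7. master updates PID records
```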
Workflows are integrated with VPH Portals.
Data locations and infrastructures: IMENSE
Aims:
– A central integrated repository of patient data for project clinicians & researchers
– Storage of, and an audit trail for, computational results
– Interfaces for data collection, editing and display
– Ultimately, a data environment for the integration of multi-scale data and a decision-support environment for clinicians
Critical factors for success and longevity:
– Use standards & open-source solutions
– Use pre-existing EU FP6/FP7 solutions & interaction with the VPH NoE ToolKit
Data locations and infrastructures (diagram)
IMENSE components
– Data repository: the key store for project data, containing all patient data and the simulation data derived from it.
– Integrated web portal: the central interface from which users upload and access data sets and analysis services; it lets users search for patient data on a number of criteria.
– Web services: the web-services platform implements the required data-processing functions.
– Workflow environment: provides a virtual-experiment system from which users can launch pre-defined workflows that automate moving data between the data environment and multiple data-processing services.
A hypothetical portal query is sketched below.
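For illustration, a query to the portal via the web-services layer; the endpoint, query parameters and response fields are assumptions, not the actual IMENSE API:

```python
# Hypothetical search for patient data through the portal's web services.
import requests

resp = requests.get(
    "https://imense.example.org/api/patients",   # hypothetical endpoint
    params={"modality": "MR", "age_min": 40},    # hypothetical search criteria
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()
for record in resp.json():                       # hypothetical response shape
    print(record["patientID"], record["studyDate"])
```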
Tools and protocols for data access
Tools and protocols for data movement
– Many users have data stored on EUDAT and on internal project infrastructure, and need to perform analysis (e.g. simulations or data mining) using a subset of these data.
– The data need to be moved from the storage infrastructure to the resource doing the processing; this may be a PRACE machine, an EGI cluster or a local machine.
– The data collection is likely to be large, and it will be needed at the compute site for an extended period of time.
– When the simulations/analysis have finished, the results need to be brought back to EUDAT.
One possible transfer mechanism is sketched below.
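One way to perform such a transfer is to shell out to GridFTP's globus-url-copy; both URLs are hypothetical, and other tools (iRODS, Globus Online) could serve equally well:

```python
# Staging a collection from project storage to the compute site via GridFTP.
import subprocess

src = "gsiftp://storage.eudat.example.org/vph/collection42/"  # hypothetical source
dst = "gsiftp://login.prace.example.org/scratch/alice/run01/" # hypothetical destination

subprocess.run(
    ["globus-url-copy", "-r", "-p", "4", src, dst],  # -r: recurse, -p 4: parallel streams
    check=True,
)
```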
Requirements
– Users can specify which collections to replicate for a simulation.
– Users should be able to specify which data centres to use for the dynamic replication, preferably close to the HPC system.
– Users can specify how long the data should be kept close to the HPC system.
– Data is moved from the storage to the HPC workspace.
– Replicas across multiple HPC centres should be kept in sync once a simulation is run in one of the centres (step 4 in scenario 2).
– The PID server returns the optimal URL (see the PID service case description for details); a toy resolution rule is sketched below.
– Community managers should be able to manage user permissions.
– Community managers want to know whether the replicas are identical to the source (auditing).
– There is a need to control what users can do in terms of starting replications to, and simulations on, HPC systems, with restrictions on how long a user can keep data in storage…
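The "optimal URL" requirement can be illustrated with a toy resolution rule; the PID-record layout and the selection heuristic (prefer a replica at the same centre as the HPC system) are assumptions for illustration:

```python
# Toy resolution of a collection PID to the "optimal" replica URL.

def optimal_url(pid_record: dict, hpc_centre: str) -> str:
    replicas = pid_record["locations"]
    local = [r for r in replicas if r["centre"] == hpc_centre]
    return (local or replicas)[0]["url"]  # prefer a co-located replica

record = {"locations": [
    {"centre": "SARA",   "url": "gsiftp://sara.example.org/vph/c42"},
    {"centre": "CINECA", "url": "gsiftp://cineca.example.org/vph/c42"},
]}
print(optimal_url(record, "CINECA"))  # prints the CINECA replica's URL
```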
Scenario: multiple simulations are run on multiple PRACE systems.
VPH tools: AHE, the Application Hosting Environment
– Simplifies access to the grid through a community model.
– Simplifies security: the end user avoids grid security and MyProxy configuration and generation.
– Simplifies application setup: the end user does not have to compile, optimise, install or configure applications.
– Simplifies basic workflow: AHE stages the data, runs and polls the job, and fetches the results automatically.
– Simplifies compute access through RESTful web services: clients and services access infrastructure and applications as 'Software as a Service'.
A hypothetical REST interaction is sketched below.
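The sketch below shows an interaction with a RESTful interface in the style AHE exposes; the paths, JSON fields and hosted-application name are illustrative assumptions rather than AHE's actual resource names:

```python
# Submitting a hosted-application run through a hypothetical AHE-style REST API.
import requests

BASE = "https://ahe.example.ac.uk/rest"        # hypothetical AHE server
AUTH = ("alice", "secret")                     # local credentials, no certificate handling

job = requests.post(f"{BASE}/apps/lb3d/jobs",  # hypothetical hosted application
                    json={"input": "sim.conf", "cores": 128},
                    auth=AUTH, timeout=30).json()

state = requests.get(f"{BASE}/jobs/{job['id']}", auth=AUTH, timeout=30).json()
print(state["status"])  # AHE itself stages data, polls the job and fetches results
```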
Bridging the gap
(Diagram: AHE bridges multiple infrastructures – PRACE, the UK NGS (Leeds, Manchester, Oxford, RAL), HECToR, EGI, XSEDE and local resources – through middleware including GridSAM, Globus and UNICORE.)
Authentication via local credentials: Audited Credential Delegation
– A designated individual (PI, sysadmin, …) puts a single certificate into a credential repository controlled by our new "gateway" service.
– The user authenticates to the gateway service via a local authentication service.
– The gateway service provides a session key (not shown) to our modified AHE client and our modified AHE server, enabling the AHE client to authenticate to the AHE server.
– The gateway service obtains a proxy certificate from its credential repository as necessary and gives it to the modified AHE server, which uses it to interact with the grid.
– The user now has no certificate interaction; the private key of the certificate is never exposed to the user.
(Diagram: user → gateway service → modified AHE server → grid middleware → computational grid, with the credential repository attached to the gateway.)
The flow is sketched schematically below.
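The same flow as schematic code; every object and method is a hypothetical stand-in for the gateway, credential repository and AHE components, so only the ordering and the trust boundaries are meaningful:

```python
# Schematic ACD flow; all names are hypothetical placeholders.

def run_job_without_certificates(user, gateway, ahe_client, ahe_server):
    # 1. User authenticates with the local service (username/password);
    #    no certificate or MyProxy interaction on the user's side.
    session_key = gateway.authenticate(user["name"], user["password"])

    # 2. The session key lets the AHE client authenticate to the AHE server.
    ahe_client.set_session(session_key)

    # 3. The gateway pulls a proxy certificate from its repository and hands
    #    it to the AHE server; the private key never reaches the user.
    ahe_server.set_credential(gateway.get_proxy())

    # 4. The AHE server interacts with the grid middleware on the user's behalf.
    return ahe_client.submit(app="bloodflow", inputs=["vessel.xmf"])
```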
AHE & ACD usability study
We have completed a comprehensive usability study that involved:
– comparing AHE+ACD (GUI), AHE (GUI), the UNICORE GUI, the AHE command line and the Globus Toolkit command line;
– 40 users from different UCL departments (Physics, Computer Science, Medical School, Business School, Chemistry, Cancer Institute, Law School);
– task: run a simulation on the grid (NGS) using the above middleware and the credentials given to them (username/password, X.509 certificate).
Result: AHE+ACD scored best with respect to the time needed to run the task, the ease of configuring the tool and the ease of running the whole task.
S. J. Zasada, A. N. Haidar and P. V. Coveney, "On the Usability of Grid Middleware and Security Mechanisms", Phil. Trans. R. Soc. A, 2011, 369 (1949), 3413–3428; doi:10.1098/rsta.2011.0131
User requirements
– Users need to run application codes on one or more resources, potentially drawn from federated grids.
– Users don't care where the resources are or what underlying OS/architecture they use; they just want to run their application.
– Users are interested in minimising their turnaround time; they want their results now!
– The system should be responsive and scalable.
– A user may require access to more than one resource, e.g. a compute machine and a visualisation machine.
So the system must be decentralised and scalable, and must allow users to specify the constraints under which their jobs run.
RAMP – the Resource Allocation Market Place
– Our system uses a combinatorial, multi-attribute reverse-auction mechanism: users specify their requirements and find resources appropriate to their needs.
– It is implemented as a p2p multi-agent system with resource-management and user agents; agents can join or leave the system as required, and p2p means there is no central point of failure.
– Agents act independently and autonomously, and compete for jobs from users.
– The system consists of two types of agent: user agents and resource agents.
A toy version of the auction is sketched below.
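The matching step of such a reverse auction can be illustrated as follows: the user agent states requirements, resource agents bid, and the cheapest feasible bid wins. The attributes and bid format are illustrative assumptions, not the RAMP protocol:

```python
# Toy reverse auction: pick the cheapest bid meeting all requirements.

def select_resource(requirements: dict, bids: list) -> dict:
    def satisfies(bid):
        return (bid["cores"] >= requirements["cores"]
                and bid["walltime_h"] >= requirements["walltime_h"])
    feasible = [b for b in bids if satisfies(b)]
    return min(feasible, key=lambda b: b["price"], default=None)

bids = [
    {"resource": "hector",  "cores": 2048, "walltime_h": 12, "price": 90.0},
    {"resource": "cluster", "cores": 256,  "walltime_h": 48, "price": 35.0},
]
# Only "hector" offers enough cores, so its bid wins despite the higher price.
print(select_resource({"cores": 512, "walltime_h": 6}, bids))
```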
Acknowledgements
People: Derek Groen, Ali Haidar, Nour Shublaq, Hywel Carver, Rupert Nash, Jacob Swadling, Marco Mazzeo, Dave Wright, Ben Jefferys, David Chang.
Funding and projects.