VPH Overview and Data Requirements
Peter Coveney, University College London
1

Centre for Computational Science: advancing science through computers
Computational Science:
–Algorithms, code development & implementation
–High performance, data-intensive & distributed computing
–Visualisation & computational steering
–Condensed matter physics & chemistry, materials & life sciences
–Translational medicine, e-Health & VPH – the "data deluge"
2

Patient-specific medicine
'Personalised medicine': use the patient's genotype/phenotype to better manage disease or a predisposition towards a disease
–Tailoring of medical treatments based on the characteristics of an individual patient
Patient-specific medical simulation
–Use of genotypic and/or phenotypic simulation to customise treatments for each particular patient; computational modelling can be used to predict the outcome of courses of treatment and/or surgery
Why use patient-specific approaches?
–Treatments can be assessed for their effectiveness with respect to the patient before being administered, saving the potential expense and trauma of multiple/ineffective treatments
The IT Future of Medicine Project
3

Virtual Physiological Human (VPH)
€207M initiative in EU-FP7
Aims:
–Enable collaborative investigation of the human body across all relevant scales
–Introduce multiscale methodologies into medical and clinical research
The VPH framework is: descriptive, integrative, predictive
(Diagram: scales from organism, organ, tissue, cell and organelle down to protein, gene and molecule, linked by interactions, cell signals and transcription)
VPH DEISA Virtual Community established in collaboration with Hermann Lederer, MPS RZG
4

P-MEDICINE: From data sharing and integration via VPH models to personalized medicine
Predictive disease modelling: exploiting the individual data of the patient in a federated data warehouse
Optimization of cancer treatment (Wilms tumor, breast cancer and acute lymphoblastic leukemia)
Infrastructure supports:
–generic seamless, multi-level data integration
–VPH-specific, multi-level cancer data repository
–model validation and clinical translation through trials
Scalable for any disease, as long as:
–predictive modelling is clinically significant at one or more levels
–development of such models is feasible
(Diagram: generic multi-level disease modelling at the molecular, cellular and tissue/organ levels, yielding multi-scale therapy predictions and disease evolution results)
Led by a clinical oncologist, Prof Norbert Graf!
€13M, EU FP7

P-MEDICINE architecture 6

VPH-SHARE overview
VPH-SHARE is developing the organisational fabric and integrating the optimised services to:
–expose and share patient data (imaging, clinical, biomedical signals)
–jointly develop multiscale models for the composition of new VPH workflows (euHeart, VPHOP and Virolab projects)
–facilitate collaborations within the VPH community
–evaluate the effectiveness and fitness-for-purpose of Data and Compute Cloud platforms for biomedical applications
The project focuses on a key bottleneck: the interface with the wealth of data from medical research infrastructures and from clinical processes.
Led by Rod Hose, Sheffield, UK

VPH-Share Overview
VPH-Share will provide the organisational fabric, realised as a series of services offered in an integrated framework, to expose and to manage data, information and tools, to enable the composition and operation of new VPH workflows, and to facilitate collaborations between the members of the VPH community.
Application areas: HIV, heart, aneurysms, musculoskeletal
€11M, EU FP7; promotes cloud technologies
8

Use case scenario – EP9: VIP for VPH (CNRS and UCL)
September 18th-20th, London, UK
● Data transfer from the EGI data storage system to the PRACE HPC resource.
9

Use case scenario – EP9: VIP for VPH (CNRS and UCL)
September 18th-20th, London, UK
● The data stops over on a server machine on the way from EGI to the supercomputer and on the way back.
- Bottleneck, delay?
- Lost data?
- Data overload?
● Specialised and dedicated data storage solution?
10

Use case scenario – EP9: VIP for VPH (CNRS and UCL)
● Data storage requirements: 3D+t image simulation, a quick calculation.
(In the use case scenario of the 3D heart, the frame size on disk is 16 MB.)
{ size per frame x frames per second x t } x user executions x ... =
biomed VO: 2.3 PB stored (of which VIP = 0.5 PB)
A 2D MR balanced steady-state free precession (bSSFP) sequence at 1.5 T was simulated on a cardiac cycle (14 instants) extracted from the ADAM model.
Video from The Virtual Imaging Platform (VIP)
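
The slide's back-of-envelope formula can be turned into a small calculation. A minimal sketch follows: the 16 MB frame size and the 14 instants per cardiac cycle come from the slide; the duration and number of user executions are illustrative placeholders, not project figures.

```python
# Storage estimate following the slide's formula:
# {size per frame x frames per second x t} x user executions x ...
def storage_estimate_bytes(frame_size_mb: float,
                           frames_per_second: float,
                           duration_s: float,
                           executions: int) -> float:
    """Estimated volume in bytes for one batch of simulations."""
    mb = 1024 ** 2
    return frame_size_mb * mb * frames_per_second * duration_s * executions

if __name__ == "__main__":
    # e.g. a 1 s cardiac cycle sampled at 14 instants, run 1000 times
    total = storage_estimate_bytes(frame_size_mb=16,
                                   frames_per_second=14,
                                   duration_s=1.0,
                                   executions=1000)
    print(f"~{total / 1024**4:.2f} TiB per 1000 executions")
```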

Data organization
Imaging data (fMRI, etc.) in the standardized DICOM format
–DICOM header includes specs for imaging data organization
–DICOM header includes typical metadata (patientID, doctor, time, etc.)
–overall data organization is stored in a PACS server
Microphotos in the JPEG format
–limited metadata (patientID, etc.) is in the filename
Treatment data in XLS files with a defined (but not verified) structure
–patientID and some other metadata is stored in the XLS file
Genetic data
–all relevant data, incl. gene snippets, in XLS files
–patientID and some other metadata is stored in XLS as well
Simulation model data
–so far simulations are run only per patient, i.e. patientID is sufficient to indicate the relation
–parameter sets are provided to the simulation
–parameter sets include some metadata
–simulation output is mostly textual, contained in a small database
12
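
As a concrete illustration of the DICOM side of this organization, the sketch below reads only the header of a file and extracts the kind of metadata listed above. It assumes the pydicom library; the file path is hypothetical and the attribute list is illustrative, since the fields present vary by modality.

```python
import pydicom

def read_dicom_metadata(path: str) -> dict:
    """Read the DICOM header only (no pixel data) and return key metadata."""
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return {
        "patient_id": ds.get("PatientID"),
        "referring_physician": ds.get("ReferringPhysicianName"),
        "study_date": ds.get("StudyDate"),
        "modality": ds.get("Modality"),
    }

if __name__ == "__main__":
    print(read_dicom_metadata("example.dcm"))  # hypothetical file path
```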

Data workflow
1. Replicate the data collection from master storage to storage close to the HPC system(s).
2. Copy data from the storage to the HPC workspace and start the simulation.
3. Copy results from the simulation back into the storage.
4. Replicate the results to slave archives also working on the same data collection.
5. Replicate the simulation results to the master.
6. Slaves notify the master about all extra copies.
7. The master creates or updates the PID records based on the notifications from steps 5 and 6.
13
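
A schematic sketch of this seven-step workflow is given below. The helper functions are stubs standing in for whatever transfer tool, job scheduler and PID service the infrastructure actually provides (e.g. EUDAT replication and a Handle-based PID server); only the control flow follows the slide.

```python
def replicate(item, src, dst):
    print(f"replicate {item}: {src} -> {dst}")

def stage(item, src, dst):
    print(f"stage {item}: {src} -> {dst}")

def run_hpc_job(workspace):
    print(f"run simulation in {workspace}")
    return "results"

def notify_master(slave, results):
    print(f"{slave} notifies master about its copy of {results}")
    return slave

def update_pid_records(results, locations):
    print(f"master updates PID records for {results}: {locations}")

def run_data_workflow(collection, master, hpc_store, workspace, slaves):
    replicate(collection, master, hpc_store)               # 1. master -> storage near HPC
    stage(collection, hpc_store, workspace)                # 2. storage -> HPC workspace
    results = run_hpc_job(workspace)                       #    ... and start the simulation
    stage(results, workspace, hpc_store)                   # 3. results back into storage
    for slave in slaves:
        replicate(results, hpc_store, slave)               # 4. replicate to slave archives
    replicate(results, hpc_store, master)                  # 5. replicate results to master
    copies = [notify_master(s, results) for s in slaves]   # 6. slaves notify the master
    update_pid_records(results, [master, *copies])         # 7. create/update PID records

run_data_workflow("heart-images", "master-store", "hpc-store",
                  "/scratch/run42", ["slave-a", "slave-b"])
```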

Workflows are integrated with VPH Portals 14

Data locations and infrastructures: IMENSE
Aims:
–Central integrated repository of patient data for project clinicians & researchers
–Storage of, and audit trail for, computational results
–Interfaces for data collection, editing and display
Ultimately provide a data environment for:
–Integration of multi-scale data
–Decision support environment for clinicians
Critical factors for success and longevity:
–Use standards & OS solutions
–Use pre-existing EU FP6/FP7 solutions & interaction with the VPH NoE ToolKit
15

Data locations and infrastructures: 16

IMENSE Components
Data repository – the key store for project data, containing all patient data and the simulation data derived from the patient data.
Integrated web portal – the central interface from which users upload and access data sets and analysis services. The interface provides users with the facility to search for patient data based on a number of criteria.
Web services – the web services platform implements the required data processing functions.
Workflow environment – provides a virtual experiment system from which users can launch pre-defined workflows to automate moving data between the data environment and multiple data processing services.
17
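
To make the portal/web-services split concrete, here is a client-side sketch of the kind of patient-data search call such a web services layer would expose. The base URL, endpoint path, parameter names and token handling are invented for illustration; they are not the documented IMENSE API.

```python
import requests

BASE_URL = "https://imense.example.org/api"   # hypothetical service URL

def search_patients(criteria: dict, token: str) -> list[dict]:
    """Search the data repository for patient records matching the criteria."""
    resp = requests.get(f"{BASE_URL}/patients",
                        params=criteria,
                        headers={"Authorization": f"Bearer {token}"},
                        timeout=30)
    resp.raise_for_status()
    return resp.json()

# e.g. search_patients({"modality": "MR", "study_year": 2011}, token="...")
```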

Tools and protocols for data access

Tools and protocols for data movement
Many users have data stored on EUDAT & internal project infrastructure and need to perform analysis using a subset of this data, e.g. simulations or data mining.
The data needs to be moved from the storage infrastructure to the resource doing the processing; this may be a PRACE machine, an EGI cluster or a local machine.
The data collection is likely to be large, and will be needed at the compute site for an extended period of time.
When the simulations/analysis have finished, the results need to be brought back to EUDAT.
19

Requirements
–Users can specify which collections to replicate for a simulation.
–Users should be able to specify which data centers to use for the dynamic replication, preferably close to the HPC system.
–Users can specify how long the data should be kept close to the HPC system.
–Data is moved from the storage to the HPC workspace.
–Replicas across multiple HPC centers should be kept in sync once a simulation is run in one of the centers (step 4 in scenario 2).
–The PID server returns the optimal URL (see the PID service case description for details).
–Community managers should be able to manage user permissions.
–Community managers want to know whether the replicas are identical to the source (auditing).
–There is a need to control what users can do in terms of starting replications to, and simulations on, HPC systems, and restrictions on how long users can keep data in storage…
20
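
One way to capture these per-simulation requirements is as a policy object handed to the replication service; a sketch follows. The field names and values are invented for illustration and do not correspond to any particular EUDAT or PRACE interface.

```python
replication_policy = {
    "collection": "vph/heart-imaging-2012",        # which collection to replicate
    "preferred_data_centers": ["SARA", "CINECA"],  # close to the target HPC system
    "retention_days": 90,                          # how long to keep data near HPC
    "keep_replicas_in_sync": True,                 # sync across HPC centers after runs
    "verify_against_source": "checksum",           # auditing: replicas identical to source
    "allowed_actions": {                           # per-role restrictions
        "start_replication": ["researcher", "community_manager"],
        "start_simulation": ["researcher"],
    },
}
```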

Scenario: Multiple simulations are run on multiple PRACE systems 21

VPH Tools: AHE
Application Hosting Environment
–Simplifying access to the Grid
–Community model
Simplifies security
–End-user avoids grid security and MyProxy configuration and generation.
Simplifies application setup
–End-user does not have to compile, optimise, install and configure applications.
Simplifies basic workflow
–AHE stages the data, runs and polls the job, and fetches the results automatically.
Simplifies compute access through RESTful web services
–Provides a RESTful interface
–Clients and services access infrastructure and apps as 'Software as a Service'
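
The sketch below shows what job submission through a RESTful hosting service of this kind looks like from the client side. The base URL, endpoint paths and JSON fields are invented for illustration; they are not the actual AHE API.

```python
import time
import requests

AHE_URL = "https://ahe.example.org/rest"   # hypothetical service URL

def submit_and_wait(app: str, inputs: list[str], auth: tuple) -> str:
    # create a job for an application the service already hosts
    # (the user never compiles or installs anything locally)
    job = requests.post(f"{AHE_URL}/jobs", auth=auth,
                        json={"application": app, "inputs": inputs},
                        timeout=30).json()
    job_id = job["id"]
    # the service stages data, submits and polls on the user's behalf;
    # the client just polls the job resource until it finishes
    while True:
        state = requests.get(f"{AHE_URL}/jobs/{job_id}",
                             auth=auth, timeout=30).json()["state"]
        if state in ("DONE", "FAILED"):
            return state
        time.sleep(30)

# e.g. submit_and_wait("hemelb", ["inputs/geometry.xml"], auth=("user", "pass"))
```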

Bridging the gap
(Diagram: bridging PRACE, the UK NGS (Leeds, Manchester, Oxford, RAL), HECToR, EGI, XSEDE and local resources through GridSAM, Globus and UNICORE middleware)

Authentication via local credentials (Audited Credential Delegation)
–A designated individual (PI, sysadmin, …) puts a single certificate into a credential repository controlled by our new "gateway" service.
–The user uses a local authentication service to authenticate to our gateway service.
–Our gateway service provides a session key (not shown) to our modified AHE client and our modified AHE server, to enable the AHE client to authenticate to the AHE server.
–Our gateway service obtains a proxy certificate from its credential repository as necessary and gives it to our modified AHE server to interact with the grid.
–The user now has no certificate interaction.
–The private key of the certificate is never exposed to the user.
(Diagram: user, credential repository, gateway service, modified AHE client and server, grid middleware, computational grid)
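
A toy sketch of this delegation flow is given below. Every class and method name (Gateway, login, proxy_for, and so on) is invented to mirror the steps on the slide; the real ACD components, key generation and auditing are not shown.

```python
class CredentialRepository:
    def get_proxy(self):
        return "proxy-certificate"                   # derived from the designated individual's certificate

class Gateway:
    """Holds the community certificate and issues short-lived session keys."""
    def __init__(self, repo):
        self.repo, self.sessions = repo, {}
    def login(self, username, password):             # local authentication service
        key = f"session-{username}"                  # placeholder for a random session key
        self.sessions[key] = username
        return key
    def proxy_for(self, session_key):                # called by the modified AHE server
        assert session_key in self.sessions          # audit point: which user acted, and when
        return self.repo.get_proxy()                 # the user never sees the private key

class AHEServer:
    def __init__(self, gateway):
        self.gateway = gateway
    def submit(self, session_key, job):
        proxy = self.gateway.proxy_for(session_key)  # delegation happens server-side
        return f"submitted {job} to the grid using {proxy}"

gw = Gateway(CredentialRepository())
server = AHEServer(gw)
key = gw.login("alice", "local-password")            # user authenticates locally only
print(server.submit(key, "hemelb-run"))
```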

AHE & ACD Usability Study
We have completed a comprehensive usability study that involved:
–Comparing AHE+ACD (GUI), AHE (GUI), the UNICORE GUI, the AHE command line and the Globus Toolkit command line
–40 users from different UCL departments (Physics, Computer Science, Medical School, Business School, Chemistry, Cancer Institute, Law School)
–Task: run a simulation on the Grid (NGS) using the above middlewares and the credentials given to them (username/password, X.509 certificate)
–Result: AHE+ACD scored best in respect of: time needed to run the task, ease of configuring the tool, ease of running the whole task.
S. J. Zasada, A. N. Haidar and P. V. Coveney, "On the Usability of Grid Middleware and Security Mechanisms", Phil. Trans. R. Soc. A, 2011, 369 (1949); doi: /rsta
25

User requirements
–Users need to run application codes on one or more resources (potentially from federated grids).
–Users don't care where the resources are or what underlying OS/architecture they are using; they just want to run their application.
–Users are interested in minimizing their turnaround time; they just want their results NOW!
–The system should be responsive and scalable.
–A user may require access to more than one resource, e.g. a compute and a visualisation machine.
So the system must be decentralized, scalable, and allow users to specify the constraints under which their jobs run.
26

RAMP – the Resource Allocation Market Place
–Our system uses a combinatorial, multi-attribute reverse auction mechanism.
–Users can specify their requirements and find resources appropriate to their needs.
–Implemented as a p2p multi-agent system, with resource management and user agents; agents can join or leave the system as required.
–p2p means there is no central point of failure.
–Agents act independently and autonomously, and compete for jobs from users.
–The system consists of two types of agent: user agents and resource agents.
27
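
To illustrate the reverse-auction idea, the sketch below has a user agent pick the cheapest feasible bid from resource agents under a weighted multi-attribute score. The attributes, weights and scoring function are invented and do not reflect the actual RAMP protocol or its combinatorial bidding.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    resource: str
    price: float        # cost in arbitrary units
    cores: int
    wait_hours: float   # expected queue wait

def run_reverse_auction(bids: list[Bid], min_cores: int,
                        w_price: float = 1.0, w_wait: float = 2.0) -> Bid | None:
    """Pick the feasible bid with the lowest weighted score (reverse auction)."""
    feasible = [b for b in bids if b.cores >= min_cores]
    if not feasible:
        return None
    return min(feasible, key=lambda b: w_price * b.price + w_wait * b.wait_hours)

bids = [Bid("hpc-a", price=120, cores=512, wait_hours=6),
        Bid("hpc-b", price=150, cores=1024, wait_hours=1),
        Bid("cluster-c", price=40, cores=64, wait_hours=0.5)]
print(run_reverse_auction(bids, min_cores=256))   # -> hpc-a wins under these weights
```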

Acknowledgements
People: Derek Groen, Ali Haidar, Nour Shublaq, Hywel Carver, Rupert Nash, Jacob Swadling, Marco Mazzeo, Dave Wright, Ben Jefferys, David Chang
Funding and Projects
28