Integrated e-Infrastructure for Distributed, Data-driven, Data- intensive High Performance Computing: Biomedical Requirements Peter V Coveney Centre for Computational Science University College London Integrating the Strengths of the e-Research Community, NeSC, Thursday, 10th March 2011
Computational Biomedicine –HIV/AIDS –Cardiovascular medicine –Cancer ICT, e-Health and the Virtual Physiological Human Infrastructure support Shortcomings in UK infrastructure Major policy hurdles UCL CLMS initiative Conclusions Contents
UCL Projects VPH Network of Excellence – EU (€8M); no HPC ContraCancrum – EU (€3.4M); no HPC VPH-Share – EU (€10.7M); no HPC P-Medicine – EU (€13.7M); no HPC INBIOMEDVision – EU (€2M) MAPPER – EU (€2M); no HPC A new approach to Science at the Life Sciences Interface – EPSRC (£4M) + HECToR Large Scale Lattice-Boltzmann Simulation of Liquid Crystals – EPSRC (£800K) + HECToR
Patient-specific medicine ‘Personalised medicine’ - use the patient’s specific profile to better manage disease or a predisposition towards a disease Tailoring of medical treatments based on the characteristics of an individual patient Patient-specific medical-simulation Use of genotypic and or phenotypic simulation to customise treatments for each particular patient, where computational simulation can be used to predict the outcome of courses of treatment and/or surgery Why use patient-specific approaches? Treatments can be assessed for their effectiveness with respect to the patient before being administered, saving the potential expense of ineffective treatments See: P. V. Coveney et al (eds), Interface Focus, Theme Issue on VPH Vol. 1, No. 3 Online 25 th April 2011
HIV-1 Protease is a common target for HIV drug therapy Enzyme of HIV responsible for protein maturation Target for Anti-retroviral Inhibitors Example of Structure Assisted Drug Design 9 FDA inhibitors of HIV-1 protease So what’s the problem? Emergence of drug resistant mutations in protease Render drug ineffective Drug resistant mutants have emerged for all FDA inhibitors Monomer B Monomer A Flaps Leucine - 90, 190 Glycine - 48, 148 Catalytic Aspartic Acids - 25, 125 Saquinavir P2 Subsite N-terminalC-terminal Medical/clinical domain I: HIV/AIDS Integrate simulation with conventional clinical decision support systems to refine results
The goal: to simulatelarge scale patient specific cerebral blood flow in clinically relevant time frames Objectives: To study cerebral blood flow using patient-specific image-based models. To provide insights into the cerebral blood flow & anomalies. To develop tools and policies by means of which users can better exploit the ability to reserve and co-reserve HPC resources. To develop interfaces which permit users to easily deploy and monitor simulations across multiple computational resources. To visualize and steer the results of distributed simulations in real time Yield patient-specific information which helps plan embolisation of arterio- venous malformations, aneurysms, etc. Medical/clinical domain II: Grid enabled neurosurgical imaging using simulation M. D. Mazzeo and P. V. Coveney, Computer Physics Communications, 178, (12), , (2008). DOI: /j.cpc DOI /j.cpc
Medical/clinical domain III: ContraCancrum Multi-level dataMulti-level Modelling Two dedicated clinical studies in ContraCancrum, one in glioma and one in lung cancer (200 cases/year) Clinically Oriented Translational Cancer Multilevel Modelling
Virtual Physiological Human Funded under EU FP 7; ~ €250M 20 projects: 1 NoE, 5 IPs, 11 STREPs, 3 CAs. “a methodological and technological framework that, once established, will enable collaborative investigation of the human body as a single complex system...” Networking NoE
VPH-Share Overview VPH-Share will provide the organisational fabric (the infostructure), realised as a series of services, offered in an integrated framework, to expose and to manage data, information and tools, to enable the composition and operation of new VPH workflows and to facilitate collaborations between the members of the VPH community. HIV Heart Aneurisms Musculoskeletal €11M, , EU FP7 – Promotes cloud technologies
p-medicine Predictive disease modeling in p-medicine will contribute to the optimization of cancer treatment by fully exploiting the individual data of the patient. p-medicine is focusing on Wilms tumor, breast cancer and acute lymphoblastic leukemia The p-medicine infrastructure supports both a generic seamless, multi-level data integration purpose and a VPH-specific, multi-level, cancer data repository to facilitate model validation and clinical translation through trials. The infrastructure is scalable for any disease as long as predictive modeling is clinically significant in one or more levels (from molecular to tissue level) and the development of such models is feasible (i.e. there is enough understanding of the biological mechanisms involved to develop them). Led by a clinical oncologist - Prof Norbert Graf! Disease Modelling at the molecular Level Disease Modelling at the cellular Level N SG 1 G 2 M G 0 A Disease Modelling at the tissue/organ Level Multi-scale therapy predictions/disease evolution results Multi-level disease modeling €13M, , EU FP7
Large scale data & computing 06/10/2015 Seamless access and integration of distributed, heterogeneous data in a data warehouse repeatedly over time (≈ 200 GB / patient and time point) Seamless access and integration of distributed, heterogeneous data in a data warehouse repeatedly over time (≈ 200 GB / patient and time point) Models are built for use in clinical decision support results are needed in a timely fashion results are needed in a timely fashion It is necessary to have the possibility of seamlessly “plugging in” resources for parallel and large scale computing “here and now” petascale computing is needed to perform e.g.: activities like drug binding affinity determination activities like drug binding affinity determination Blood flow through tumours Gratis via VPH-NoE supervised VPH Virtual Community allocations of time on DEISA and, in future PRACE via MAPPER, …?
MAPPER: Objectives and Challenges MAPPER will develop computational strategies, software and services for distributed multiscale simulations across disciplines, exploiting existing and evolving European e-Infrastructure. Driven by seven exemplar multiscale applications, MAPPER will deploy a computational science infrastructure for distributed multiscale computing on and across European e-Infrastructures. By taking advantage of existing software and services, MAPPER will deliver high quality components aiming at large-scale, heterogeneous, high performance multidisciplinary multiscale computing, while maintaining ease of use and transparency for end users. MAPPER will advance state-of-the-art in high performance computing on e-Infrastructures by enabling distributed execution, across all European e-Infrastructures, of multiscale models.
VPH ToolKit
VPH Virtual Community on DEISA + euHeart in second wave, and other non-VPH EU projects VPH was awarded 2 million standard DEISA core hours for 2009, renewed for 2010 and 2011 HECToR (Cray, UK) SARA (IBM Power 6, Netherlands) DEISA-TeraGrid interoperability project has additional access to LRZ
Computational experiments integrated seamlessly into current clinical practice Clinical decisions influenced by patient specific computations: turnaround time for data acquisition, simulation, post-processing, visualisation, final results and reporting. Fitting the computational time scale to the clinical time scale : –Capture the clinical workflow –Get results which will influence clinical decisions: 1 day? 1 week? –This project - 15 to 30 minutes Development of procedures and software in consultation with clinicians Security/Access is major concern Need to integrate Data, Compute via Workflows On-demand availability of storage, networking and computational resources VPH requires HPC and Data Integration
Many of the projects we are involved in have non-standard requirements with respect to HPC service providers Ability to co-reserve resources HARC Launch emergency simulations SPRUCE Consistent interfaces for federated access AHE Access to back end nodes: steering, visualisation Lightpath network connections Data integration from multiple sources IMENSE Support for software (ReG steering toolkit etc)
Individualized MEdiciNe Simulation Environment IMENSE Data repository – this is the key store for project data containing all patient data, and simulation data derived from the patient data. Integrated web portal – this provides the central interface from which users upload and access data sets, and analysis services. The interface provides users with the facility to search for patient data based on a number of criteria. Web Services – the web services platform implements required data processing functions. Workflow environment – the workflow environment provides a virtual experiment system, from which users can launch pre- defined workflows to automate moving data between the data environment and multiple data processing services. Coveney et al, “An e-Infrastructure Environment for Patient Specific Multiscale Modelling and Treatment”, preprint, 2011
IMENSE Interface IMENSE Environment
Workflows GSEngine is a workflow orchestration engine developed by the ViroLab project Can be used to orchestrate applications launched by AHE It allows services to be orchestrated using both point and click and scripting interfaces Workflows stored in a repository and shared between users Many of the aims of ViroLab similar to VPH-I projects, so GSEngine will be useful here Malawski et al, Future Generation Computer Systems, 26, (1), 138—146, 2010
Inside IMENSE: Integrating the components Coveney et al, “An e-Infrastructure Environment for Patient Specific Multiscale Modelling and Treatment”, preprint, 2011
UK Infrastructural Failures UK computing e-Infrastructure is crumbling. Not a holding partner in PRACE. No Tier-0 site in the works. Only one Tier-1 machine (with issues). HECToR has had several major failures, researchers seem to have trouble using/trusting it, given its usage. What’s happening next? Tier-2 facilities are also being dismantled. NGS core nodes being shut down!! We cannot maintain a good level of e-Science research without the infrastructure to support it Relative to other countries we’re in full scale retreat!
Infrastructure in the UK is fragmented 22 Data HPC NGS Networks ?
TeraGrid eXtreme Digital (XD) Two sets of services: –XES will provide a set of well-known (and standard) protocol specifications and profiles –CPS will support both the diversity of different services and capabilities required by the community From the desktop to the largest machines! XD design is firmly tied to the user requirements of the science and engineering research community. Presents the individual user with a common user environment Caters to both researchers whose computations require very little data movement and those performing very data-intensive computations. Will offer a highly capable service interface to “community user accounts” such as science gateways
We face major policy hurdles For our projects to be successful, we need integrated compute, storage, networks and services. –HPC’s antediluvian policies prevent this from happening They still have a batch job mentality! No coordinated allocations policies in the EU –Need to apply for a project, then if successful apply for compute access Can’t do project if compute application rejected!
Importance of connectivity With limited national facilities, connectivity to other countries becomes crucial. 1-10Gbit wide area networks are needed for large simulations and data movements. However, network provisioning is currently extremely difficult and time-consuming. Researchers end up having to request the links, rather than resource providers.
Policy issues E-science research has always required changes in resource provider policies to thrive. Support for advance machine and network reservations. Including urgent computing. Improvements in accessibility and usability. Support for Audited Credential Delegation. Interoperability between machines & infrastructures. DEISA’s Failure to address this augurs poorly for the future
Political issues Streamlined procedures for UK or EU scientific projects. All-in proposals which, when accepted, grant everything needed for a research project. This includes funding for research as well as HPC resource allocations. More sensible service level agreements. If a simulation uses multiple machines and one fails, a full allocation refund should be given. MAPPER Policy Document – copies available
Supported by the Provost's Strategic Fund Computational Life and Medical Science The CLMS Network is 3 year initiative from September 2010 Management: Dean’s Committee Steering Committee Director: P.V. Coveneyhttp://
1.Expand UCL’s world-leading position in life and biomedical sciences 2.Steering the collaboration with academic institutions: within UCL, with UCLP and the NHS, UK-CMRI, Yale, and others 3.Exploit initiatives in integrative biomedical systems science from the UK Research Council, EU and others around the world 4.Grow collaborations with industry, create business and commercial opportunities, promote UCL IP licensing 5.Plan for the next stages of activity in computational life and medical sciences at UCL CLMS Goals CLMS brings together UCL researchers with clinicians from UCL partners to develop shared data + compute + data transfer + application support services Integrated e-Infrastructure and Services
Conclusions Biomedical projects all put pressure on resource providers to offer new services and new ways of working For interactive and urgent work the batch processing model does not work The very conservative model adopted by HPC providers proscribes their resources from being used in innovative ways to do new science and engage new and different kinds of users If HPC is to be exploited in computational biomedicine it needs to be used in a way that fits in with the medical & clinical workflow VPH and similar initiatives: Will only increase pressure for non-standard services from resource providers