Distributed Computing Infrastructures for e-Science: Future Perspectives EGI Technical Forum 2012 The Clarion Congress Hotel, Freyova 945/33, Prague 18.

Presentation transcript:

Distributed Computing Infrastructures for e-Science: Future Perspectives EGI Technical Forum 2012 The Clarion Congress Hotel, Freyova 945/33, Prague 18 September 2012 Andrew Lyall PhD, ELIXIR Project Manager Distributed Computing for Life-Sciences & Medical Research in the Genome-Age.

European Bioinformatics Institute Outstation of the European Molecular Biology Laboratory International organisation created by treaty (cf. CERN, ESA) 20-year history of service provision and scientific excellence EMBL-EBI has 500+ staff, a €50 million budget, at least a million users, 20 petabytes of data, 10,000 CPUs Data are doubling in less than a year Bandwidth between disk and memory is at least as big an issue as obtaining sufficient CPU-cycles Disk space t

ELIXIR: A sustainable infrastructure for biological information in Europe… medicine bioindustries society ESFRI BMS RI & e-Infrastructure coordinated by EMBL-EBI Entering construction phase Interim board is meeting regularly New data centres commissioned in London Technical hub building construction is under way Twelve countries have joined already Many more have expressed interest Over €100 Million already invested More than 60 proposals for nodes environment

ELIXIR is a distributed infrastructure…

BioMedBridges FP7-funded cluster project - First European consortium coordinated by ELIXIR - Includes the ESFRI BMS RI and European e-Infrastructure Providers Award 10.6 M€, 4 years, 21 partners in 9 countries - Austria, Denmark, Finland, France, Germany, Italy, Netherlands, Sweden, UK e-Infrastructure Construction - to allow interoperability between data and services in the biological, medical, translational and clinical domains: formats, standards, conventions, provenance, models, ontologies, etc. Provide computational ‘data and service’ bridges - between the BMS RIs, linking basic biological research and data to clinical research and associated data

BioMedBridges Technology Watch Representatives of GÉANT, DANTE, EGI.eu, PRACE & CERN & technical experts from the ESFRI BMS RIs Monitor and report on developments and provide advice to the project Facilitate the adoption of e-Infrastructure, technologies & standards by the BioMedBridges WP and the BMS RI Communicate advice from the ICT Infrastructures and the e-Infrastructures to the BioMedBridges partners

Future perspectives… 1. What are the requirements of "big science" for Distributed Computing Infrastructures vs. the requirements of "everyday science"? Are they compatible? 2. Which science fields will drive the evolution of the future DCI? 3. What are the present DCI constraints hampering its take-up in science? 4. What do you expect from Cloud computing for the scientific community?

Biology is becoming “big science”… The Human Genome Project was the archetype for large multinational big-science projects in biology. It has created new ways of doing biology and medical research where data generation and analysis are the main tasks. Hospitals and other health-research institutes will be generating and using huge amounts of data (cf. ELSI). Storing, archiving and moving these data around are now significant challenges. I/O between primary and secondary storage is at least as big a bottleneck as CPU-cycles – this appears not to have been the case in other sciences.
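The scale of the storage challenge can be illustrated with a rough projection. The sketch below starts from the 20 petabytes quoted earlier in the talk and assumes a doubling time of exactly one year, which is only an upper bound on the "less than a year" stated there; the function name and the five-year horizon are illustrative choices, not figures from the presentation.

```python
def projected_volume_pb(initial_pb: float, years: float,
                        doubling_time_years: float) -> float:
    """Return the data volume (in PB) after `years`, assuming the volume
    doubles every `doubling_time_years` (simple exponential growth)."""
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return initial_pb * 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    # 20 PB is the figure quoted in the talk; a one-year doubling time is an
    # assumption (the talk only says "less than a year").
    for year in range(6):
        volume = projected_volume_pb(20.0, year, 1.0)
        print(f"year {year}: about {volume:.0f} PB")
```

A shorter doubling time makes the curve steeper, so these numbers should be read as a conservative estimate under the stated assumption.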

Many different modes of data & bandwidth utilisation [Figure: data flows over the Internet between thousands of small data producers, many millions of small data users, a few international collaborators, a few very large data producers and a significant number of large data users; the volume labels range from megabytes and gigabytes through terabytes and petabytes to exabytes.]

History of Cloud Computing at EMBL-EBI Since its inception, EBI has had the remit to provide services to a wide range of users using an “easy-as-possible” usage model. This led naturally to deployment via a web-browser interface on a thin client – essentially equivalent to Software-as-a-Service. EBI has also been an early adopter of virtualisation as the optimum way to enable its service provision. More recently we have evaluated the cloud marketplace in order to select products for the many diverse cloud projects that we are undertaking.

Drivers for Cloud adoption at EMBL-EBI 1. Compute: Provision of compute to diverse users 2. Deployment: Provision of services at remote locations 3. Big Data: Moving compute to data 4. Security: Providing collaborators with secure access 5. Collaboration: Participation in international projects 6. Other: … Notes: In addition to these EBI-specific drivers, there are also more generic ones including (i) the need to manage the rapid pace of change, (ii) unsustainable increases in cost and (iii) the need to manage increasing complexity.

Example projects 1. The EMBL-EBI Private & Public Clouds 2. The ENSEMBL Amazon Cloud 3. Cloud solutions for personalised medicine 4. Embassy Clouds 5. The Helix-Nebula Cloud

1. Compute Example: The EMBL-EBI Private Cloud The EMBL-EBI systems group provides compute and storage resources to a wide range of internal users. Historically this was provided by assigning physical servers. Have acquired substantial experience of cloud-enabling technologies such as virtualisation. Have just conducted a thorough analysis of the cloud marketplace. Selected VMware ESXi™, vSphere™ & vCloud Director™ to implement a hybrid cloud for internal and external users. Systems can now dynamically allocate resources, and users can interact with their VMs through a web interface or via APIs.

2. Deployment Example: The ENSEMBL Cloud ENSEMBL is the most heavily used of EBI's services. Users in the USA and Japan were reporting unacceptable response times: a solution was needed urgently. After an evaluation, Amazon Web Services (AWS) and Amazon Machine Images (AMIs) running on Amazon Elastic Compute Cloud (EC2) were selected. This provided a very rapid means to test a cloud solution. The project was extremely successful in that it removed the problem altogether and now provides a substantial proportion of the ENSEMBL service at modest cost.

Global use of EMBL-EBI/Sanger ENSEMBL Service

ENSEMBL on AMAZON

3. Big Data Example: Personalised medicine Personalised medicine will require sequencing of the genomes of large numbers of patients and volunteers. It will be necessary to compare at least some of these genomes with the reference data collections. Most hospitals and clinical research institutes will not wish to maintain up-to-date copies of the reference data collections. It will therefore be necessary to send these genomes to the institutes that hold the reference data collections. It seems likely that this will be achieved using secure VMs and secure clouds holding the reference data collections. EMBL-EBI is engaging with stakeholders to evaluate opportunities in this area.
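The argument for sending genomes to the reference data, rather than copying the reference data to every hospital, can be made concrete with a back-of-the-envelope transfer-time estimate. This is a minimal sketch under stated assumptions: the 100 GB per-genome size and the 1 Gbit/s link are hypothetical illustration values, and only the 20 PB figure comes from the talk (where it describes EMBL-EBI's total holdings, used here as a stand-in for the reference collections).

```python
def transfer_hours(size_bytes: float, link_bits_per_second: float) -> float:
    """Hours needed to move `size_bytes` over a link of the given speed,
    ignoring protocol overhead, latency and contention."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 3600


GENOME_BYTES = 100e9      # assumption: about 100 GB of data per genome
REFERENCE_BYTES = 20e15   # 20 PB, the data volume quoted in the talk
LINK_BPS = 1e9            # assumption: a dedicated 1 Gbit/s link

if __name__ == "__main__":
    genome_h = transfer_hours(GENOME_BYTES, LINK_BPS)
    reference_h = transfer_hours(REFERENCE_BYTES, LINK_BPS)
    print(f"one genome to the data: about {genome_h:.2f} hours")
    print(f"the data to one site:   about {reference_h / 24 / 365:.1f} years")
```

Under these assumptions the two transfer times differ by the same factor as the data sizes, about 200,000, which is the practical case for moving compute (or the genome) to the data.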

4. Collaborator “Embassy” Clouds Pharmaceutical companies put significant effort into creating secure “EBI-like” services on their own infrastructure. Many other users with high computational requirements do not wish to recreate our infrastructure on their own site. A secure cloud environment providing “Cloud-Embassies” at EMBL-EBI would obviate this. Embassy owners would have complete control over their virtual infrastructure. Embassy owners could bring their own data and software to compute against EMBL-EBI's data and services. Such services would be managed with legally acceptable collaboration agreements.

5. The Helix-Nebula Science-Cloud Three members of EIROforum (CERN, EMBL & ESA) Thirteen European IT providers (more are joining) A pan-European partnership of academia and industry to create cloud solutions and foster innovation in science Stimulate the creation of a cloud computing market in Europe (cf. USA) Two-year pilot phase, after which it will be made more widely available to the commercial and public domains EMBL will use it for the analysis of large genomes

Conclusions Data management is becoming a significant challenge in biology: size, complexity, ELSI… Organising I/O from disk to memory is as big a challenge as obtaining sufficient CPU-cycles. High-throughput data generators and users will be situated all round Europe. The environment will be very heterogeneous, with complex data and many different modalities of use. Cloud solutions will be key in the approach to big-data challenges and complex international collaborations.