JRA2 Platforms for the Data Commons


JRA2 Platforms for the Data Commons
Matthew Viljoen, Operations Officer, EGI.eu; JRA2 Activity Leader, EGI-Engage

Outline
- Overview: objectives, tasks, partners and effort
- Overview of JRA2 (WP4) activities
  - JRA2.1 Federated Open Data
  - JRA2.2 Federated Cloud
  - JRA2.3 e-Infrastructure Integration
  - JRA2.4 Accelerated Computing
- Use of resources, issues
- Plan for PY2
- Summary

Overview

JRA2 (WP4) objectives, listed by task:
TJRA2.1 Federated Open Data
- Analysis of open data use cases and requirements
- Design and develop the Open Data platform prototype
- Open Data platform demonstrator
TJRA2.2 Federated Cloud
- Evolve the federated IaaS Cloud platform with functionalities required by the Competence Centres (CCs)
- Extend 'open standards'-based interfaces exposing new capabilities
- Maintain interface support for future versions of popular Cloud Management Frameworks (CMFs)
TJRA2.3 e-Infrastructure Integration
- Integration for gCube and the D4Science infrastructure
- EGI-EUDAT harmonisation for Virtual Research Environments
- Canadian Advanced Network for Astronomical Research (CANFAR)
TJRA2.4 Accelerated Computing
- Enabling accelerated computing support in the Information System
- Enabling accelerated computing support in the HTC and Cloud middleware frameworks

JRA2 partners and effort
15 participants. PY1 effort: 69 PMs. Project total effort: 156 PMs (5.2 FTEs).
Task leaders / partners:
- JRA2.1 Federated Open Data: Lukas Dutka / CYFRONET
- JRA2.2 Federated Cloud: Álvaro López García / IFCA-CSIC
- JRA2.3 e-Infrastructures Integration: Enol Fernandez / CSIC
- JRA2.4 Accelerated Computing: Marco Verlato / INFN
(Figures provided by PO.)

[Diagram: EGI services by design phase (Alpha, Beta, Production, Retired) and area (Compute, Storage and Data, Security, Operations, Training). Services shown include Cloud Compute, Container Compute, High Throughput Compute, Online Storage, Archive Storage, Data Transfer, Content Distribution, Federated Data Manager, Data Hub, Configuration Database, Service Monitoring, Discovery, Helpdesk, Attribute Management, IdP Proxy, FitSM and Training Infrastructure. JRA2 contributions are marked: JRA2.1 Federated Open Data (including the Data Hub), JRA2.2 Federated Cloud, JRA2.3 e-Infrastructure Interoperability, JRA2.4 HTC.]
JRA2 addresses technological gaps: user needs that EGI does not meet at present, mainly because the technology is not yet there.

JRA2.1 - Federated Open Data (Lead: CYFRONET, M1-M30)
Overview of each activity, starting with Federated Open Data.

[Architecture diagram, top to bottom: Community Platforms (brokering, community-specific data, tools and applications); Collaboration Platform (VM Image Catalogue of EGI-endorsed VM images, Helpdesk); data-intensive computing (HTC Platform, Cloud compute and storage, GPGPU Platform, Open Data Platform); EGI Core Infrastructure Platform (AAI, Service Registry, Accounting, Monitoring, Federated Service Management); Physical Infrastructure.]
This task focuses on designing and prototyping an Open Data platform that integrates various data repositories and offers a common interface to end users. The diagram shows where it fits in the overall architecture.

JRA2.1 - Federated Open Data: current situation and aims
Present gaps:
- The existing EGI data management infrastructure does not support open data publishing
- Open data must be accessible through diverse authentication technologies
- Metadata stored in the EGI infrastructure is not compatible with "open" metadata standards
- Data sets are often too large to be transferred completely to the user's infrastructure
The EGI Open Data Platform aims to:
- Address the above
- Manage the entire data life cycle, from raw data to preservation
- Combine efficient computation services with open data managed by federated infrastructures

JRA2.1 - Federated Open Data: work outline
- Analysis of requirements from the communities
- Analysis of existing standards and technologies
- Definition of the Open Data Platform (ODP) architecture
- Selection of technologies for the ODP
- Start of prototype implementation

JRA2.1 - Federated Open Data: community requirements collection
- Open Data track at the EGI Conference in Lisbon, 18-22 May 2015
- Custom questionnaires were developed and sent to communities:
  - Biological and Medical Sciences: Human Brain Project, MoBrain, BBMRI
  - Environmental and Earth Sciences: EMSO, LifeWatch
  - Agriculture: Agrodat.hu, agINFRA
  - Astronomy & Astrophysics (A&A): CTA, LOFAR, CANFAR
The communities:
- MoBrain: a Competence Centre to serve translational research from molecule to brain
- BBMRI: Biobanking and Biomolecular Resources Research Infrastructure
- EMSO: European Multidisciplinary Seafloor and water-column Observatory
- LifeWatch: Competence Centre
- Agrodat.hu: Hungarian agricultural knowledge centre
- agINFRA: a research data hub for agriculture, food and the environment
- CTA: an observatory for ground-based gamma-ray astronomy
- LOFAR: Low-Frequency Array, an instrument for radio astronomy
- CANFAR: the Canadian Advanced Network for Astronomical Research

JRA2.1 - Federated Open Data: community requirements
REQ1. Publication of open research data based on policies
REQ2. Make large data sets available without transferring them completely
REQ3. Enable complex metadata queries
REQ4. Integration of open data access and data management with community portals
REQ5. Data identification, linking and citation
REQ6. Enable sharing of data between researchers under certain conditions
REQ7. Sharing and accessing data across federations
REQ8. Long-term data preservation (EUDAT will provide bit preservation)
REQ9. Data provenance
These requirements were extracted from the questionnaire feedback and presented in Milestone 4.1. They are listed in no particular order and currently all have equal priority. Before implementation, we will decide which ones to implement first. For REQ8, we will not do long-term data preservation ourselves. Instead, we plan to integrate EUDAT, another major e-Infrastructure. EGI/EUDAT integration is the subject of dedicated work in JRA2.3, which is discussed in more detail later.

JRA2.1 - Federated Open Data: the Open Data Platform (ODP) from the users' perspective
- Security friendly to non-grid users: no VO certificate is necessary for open data
- User and community data is organized into spaces (virtual folders)
- A single user interface for personal, research and open data management
- Open-data-specific functionality, including DOI registration, publication policies and long-term preservation
- Web interface for data management, including ACLs and sharing
- Data can be accessed from the local filesystem or via grid/cloud protocols
(On the slide, these features are arranged clockwise from the bottom left.)

Open Data Platform interactions
[Diagram: two users' private resources and the public services for data discovery, linked by lazy replication.]
1: opendata create snapshot Data-set-1 (producing snapshot Data-set-1.1)
2: opendata publish collection Data-set-1.1 -> DOI.1
3: discover data -> DOI.1
4: Visit collection web page (HTTP)
5: opendata mount remote DOI.1 /localdir/ (Data-set-1.1 is mounted to /localdir/ and lazily replicated)
6: opendata fork DOI.1 (cloned Data-set-1.1 on the second user's private resources)
This is an example of how users interact with the ODP. A user has a dataset on a private resource, snapshots the data and publishes it, and gets a DOI. Another user discovers it, visits the collection via the ODP and mounts it. That user may then fork the collection and continue to work on it privately.
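To make the order of these operations concrete, here is a minimal in-memory sketch of steps 1-3 and 6. Mounting (step 5) is not modelled. The `opendata` commands above come from the slide, but every class and function below (Dataset, OpenDataRegistry, create_snapshot, fork) is hypothetical. The "DOI.n" values are placeholders, not real DOIs. This is not how Onedata or the ODP is implemented.

```python
# Hypothetical model of ODP steps 1-3 and 6; NOT the real `opendata` tool or Onedata.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Dataset:
    name: str
    files: Dict[str, str]          # file path -> contents
    doi: Optional[str] = None      # set when the dataset is published


@dataclass
class OpenDataRegistry:
    """Stand-in for the public data discovery service (hypothetical)."""
    published: Dict[str, Dataset] = field(default_factory=dict)  # DOI -> dataset
    counter: int = 0

    def publish(self, snapshot: Dataset) -> str:
        # Step 2: publishing assigns an identifier (placeholder format "DOI.<n>").
        self.counter += 1
        snapshot.doi = f"DOI.{self.counter}"
        self.published[snapshot.doi] = snapshot
        return snapshot.doi

    def discover(self, name: str) -> str:
        # Step 3: look up a published collection by name and return its DOI.
        for doi, ds in self.published.items():
            if ds.name == name:
                return doi
        raise KeyError(name)


def create_snapshot(ds: Dataset, version: int) -> Dataset:
    # Step 1: a frozen copy, e.g. "Data-set-1" -> "Data-set-1.1".
    return Dataset(f"{ds.name}.{version}", dict(ds.files))


def fork(registry: OpenDataRegistry, doi: str) -> Dataset:
    # Step 6: a private, modifiable clone of a published collection.
    src = registry.published[doi]
    return Dataset(f"Cloned {src.name}", dict(src.files))


registry = OpenDataRegistry()
original = Dataset("Data-set-1", {"a.csv": "1,2,3"})
doi = registry.publish(create_snapshot(original, 1))
clone = fork(registry, registry.discover("Data-set-1.1"))
clone.files["a.csv"] = "edited"   # the published snapshot is not affected
```

The fork copies the file map, so edits to the clone never touch the published snapshot. This mirrors the idea that a published, DOI-bearing collection stays fixed while users keep working privately.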

JRA2.1 - Federated Open Data: Open Data Platform architecture
[Diagram. Top: EGI User 1 (VO x), Anonymous User 1, EGI User 2 (Onedata space), a Community Portal and a DOI Registrar (e.g. DataCite). They connect through REST, Web GUI, POSIX, OAI-PMH and CDMI interfaces. Middle: the Open Data Platform, comprising Space Managers, Open Data Manager, Metadata Registry, OAI-PMH Data Provider, Authentication and Authorization, and Long Term Retention (generate AIP package for abc). Bottom: storage at EGI Resource Centres, cloud storage and EUDAT.]
The ODP is a gateway to storage and does not duplicate anything. It takes a federated approach to accessing data. The complexity is completely hidden from users, who only need to choose (1) the data they want and (2) how they want to access it.
At the top, users access data through the interface they choose. OpenAIRE, a service for discovering journals and datasets, is fed by a metadata protocol called OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). DOIs (Digital Object Identifiers) allow datasets to be cited.
At the bottom, disparate storage systems have plugins installed that make them work with the ODP. We are developing the ODP prototype based on Onedata. It will have interfaces able to communicate with the individual storage providers, and in almost all cases no changes will be needed on their side.
Components from EUDAT can also plug into the ODP, for example B2SAFE for long-term storage. EGI/EUDAT interoperability is covered later.
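The OAI-PMH data provider is the component that harvesters such as OpenAIRE talk to. The sketch below shows the harvesting side under stated assumptions. It builds a standard `ListRecords` request for the mandatory `oai_dc` (Dublin Core) format and parses records from a response with the Python standard library. The base URL and the sample response, including its identifiers, are made-up placeholders, not a real ODP endpoint. No network request is made.

```python
# Sketch: building an OAI-PMH ListRecords request and parsing Dublin Core records.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    # "verb" and "metadataPrefix" are OAI-PMH request arguments; oai_dc must be
    # supported by every OAI-PMH repository.
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Extract the OAI identifier, titles and dc:identifier values of each record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [t.text for t in rec.iterfind(".//dc:title", NS)],
            "identifiers": [i.text for i in rec.iterfind(".//dc:identifier", NS)],
        })
    return records


# Illustrative response; all identifiers are placeholders.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:odp.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Data-set-1.1</dc:title>
          <dc:identifier>DOI.1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url("https://odp.example.org/oai"))
print(parse_records(SAMPLE))
```

A real harvester would also follow `resumptionToken` elements to page through large result sets. That is omitted here for brevity.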

JRA2.1 - Federated Open Data: HBP testbed updates
- Close collaboration between HBP and EGI since September 2015
- 10 TB data hosting testbed set up on EGI, now ready for production, at CYFRONET and DESY (soon also FZ Jülich and GRNET)
- Use case analysis, testing and optimization
- Serves HBP visualization software
- New use cases under discussion (computation, active/passive datasets, federation using the Open Data Platform)
We now move to an early test of Open Data Platform components: a testbed set up for the Human Brain Project over the last 4 months. The Human Brain Project is a major, groundbreaking, 10-year EU-funded project. It is building scientific research infrastructure that allows researchers across the globe to advance knowledge in neuroscience, computing and brain-related medicine. It has petabyte-scale storage requirements and wants to use EGI to handle the major demands of its storage and processing. The visualization software is running on the FedCloud, but HBP wants to run it (…)
New use cases: computation, integration with Zenodo (passive and active datasets), and geographical distribution of datasets for optimized serving.

JRA2.1 - Federated Open Data: introducing the EGI DataHub
Why? Existing datasets are isolated and not easily discoverable, and bringing computing and data together is a challenge.
Value: unifying access to scientific data of public interest, and making it easier to bring data to computation.
For whom? Research Infrastructures (RIs), research communities and the Long Tail of Science (LToS).
How? A new Data as a Service offering, built on top of the Open Data Platform and accessible to the Federated Cloud. It will manage replicas of publicly available data accessible on EGI.
I'd like to spend some time introducing the DataHub. Unlike the Open Data Platform, which is a technology enabling the federation of data, the DataHub is a service addressing the needs of our community. We first seriously discussed the DataHub concept during a week of meetings between EGI and communities following the Open Science Cloud retreat in Amsterdam in March. We used the Business Model Canvas for initial structuring, as input to the service design and management processes.

JRA2.1 - Federated Open Data: DataHub, current landscape
[Diagram: Public Data Repositories X and Y, each with community-specific data discovery; public clouds (AWS S3) holding an existing replica; EGI Resource Centres with LUSTRE, S3, Ceph and NFS storage; private resources and a private compute cloud.]
To understand what the DataHub is trying to address, this diagram shows the current state of play:
- Some communities host their data and provide means to discover it.
- Other communities do something similar, and use public clouds to access and compute on the data.
- At the bottom are our FedCloud resources with different storage, and private clouds and storage.
The DataHub aims to bring this together.

JRA2.1 - Federated Open Data: landscape changed by the DataHub
Here you can see how the DataHub, in front of the Open Data Platform, improves on the current disparate landscape. Users can discover open data and make it accessible to Federated Cloud resources through the federated data access provided by the ODP, using lazy replication.

JRA2.1 - Federated Open Data: DataHub next steps
- Communities engaged: EBI/ELIXIR, iMarine, etc.
- Presented at the EGI CF 2016 in the DataHub session
- Collecting use cases
- Draft whitepaper written and circulated to initial communities; further discussions before the final version
- The DataHub will be prototyped and launched soon after the Open Data Platform
Other communities include agri-food, Earth Observation and Marine, including SMEs in the area of Earth Observation. We are drawing up plans describing how the DataHub will fit into other infrastructures, e.g. EUDAT. The ODP is a prerequisite for the DataHub. Once the ODP is launched, we will prototype the DataHub for the initial communities.

JRA2.2 - Federated Cloud (Lead: CSIC, M1-M30)
Overview

[Architecture diagram: Community Platforms, Collaboration Platform, data-intensive computing platforms (including Cloud compute and storage), EGI Core Infrastructure Platform, Physical Infrastructure.]
This is where the EGI Federated Cloud (FedCloud) sits in our overall architecture.

The EGI Federated Cloud
[Diagram: cloud providers run cloud sites on OpenStack, OpenNebula or Synnefo, exposing uniform interfaces and behaviour for managing instances (including OpenStack Nova). Endorsed VM images (OVF) are shared and replicated to sites via VMCatcher. EGI e-infrastructure operation tools provide AAI (VO management), the service registry, the information service, accounting and monitoring.]
(The architecture was presented earlier by Peter and is not repeated here.)
Much of the development in this activity improves the experience of users who deploy services and VMs on the FedCloud, by improving the functionality of the Applications Database (AppDB). AppDB is a marketplace where users can discover virtual appliances and VM images and see which FedCloud sites host them.

JRA2.2 - Federated Cloud: activities
- Extend AppDB to support basic VM management operations
- Extend support for VM management standards: extend the OGF open cloud standard interface (OCCI) to support
  - users creating snapshots
  - changing the resources attached to a running VM instance (i.e. resizing)
- Draft specification for relocating VM instances between providers
- EGI Federated Cloud integration tools supporting OpenStack, OpenNebula and Synnefo
These are user-driven activities: new functionality leading to extensions of the standards. The integration tools are aimed at sysadmins and were requested to ease use and adoption of the FedCloud.
OCCI (Open Cloud Computing Interface) is an open standard providing a common interface to different cloud management frameworks (OpenStack, OpenNebula, Synnefo). It gives a standard way of managing VMs (starting, stopping, configuration, etc.). OCCI is a standard of the Open Grid Forum (OGF).
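To illustrate what "a standard way of managing VMs" looks like on the wire, here is a sketch of the HTTP headers an OCCI 1.1 client sends in the `text/occi` rendering. It covers two requests: creating a compute resource and triggering the standard `stop` action. The `compute` kind, the `stop` action, their schemes and the `occi.compute.*` attributes follow the OGF OCCI Infrastructure specification. The template term and scheme, the endpoint path (`/compute/` is typical but provider-defined) and the VM title are hypothetical, site-specific values. The headers are only built here, not sent.

```python
# Sketch of OCCI 1.1 text/occi headers; template names and endpoint are hypothetical.
from typing import List, Tuple

Header = Tuple[str, str]
INFRA = "http://schemas.ogf.org/occi/infrastructure#"
COMPUTE_ACTION = "http://schemas.ogf.org/occi/infrastructure/compute/action#"


def category(term: str, scheme: str, cls: str) -> Header:
    # An OCCI Category header identifies a kind, mixin or action by term + scheme.
    return ("Category", f'{term}; scheme="{scheme}"; class="{cls}"')


def create_compute_headers(title: str, cores: int, memory_gib: float,
                           os_tpl: str, os_tpl_scheme: str) -> List[Header]:
    # Sent with POST to the provider's compute collection (typically /compute/).
    return [
        ("Content-Type", "text/occi"),
        category("compute", INFRA, "kind"),
        category(os_tpl, os_tpl_scheme, "mixin"),   # site-specific VM image template
        ("X-OCCI-Attribute", f'occi.core.title="{title}"'),
        ("X-OCCI-Attribute", f"occi.compute.cores={cores}"),
        ("X-OCCI-Attribute", f"occi.compute.memory={memory_gib}"),  # GiB
    ]


def stop_action_headers() -> List[Header]:
    # Sent with POST to <compute-resource-location>?action=stop.
    return [("Content-Type", "text/occi"), category("stop", COMPUTE_ACTION, "action")]


for name, value in create_compute_headers(
        "test-vm", 2, 4.0, "example_image",
        "http://occi.example.org/occi/infrastructure/os_tpl#"):
    print(f"{name}: {value}")
```

In practice, FedCloud users drive this through clients such as rOCCI rather than by hand. The point of the standard is that the same requests work whether the site runs OpenStack, OpenNebula or Synnefo.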

JRA2.2 - Federated Cloud: achievements I, open standards improvement
- The EGI Federated Cloud is well positioned in OGF and was the main driver of the OCCI 1.2 standard definition.
- OCCI additions were accepted for inclusion in the official OGF specification for OCCI 1.2 (no external extensions).
- OCCI 1.2 includes feedback provided by the EGI Federated Cloud (e.g. attribute definition, JSON rendering).
A major achievement of this activity is the development of OCCI. This is a good example of EGI working with standards bodies, in this case OGF. Feeding outputs into standards ensures sustainability and promotes adoption.

JRA2.2 - Federated Cloud: achievements II
- Developing OCCI implementations: OpenStack (OOI) and rOCCI
- All software is being integrated within UMD/CMD
- Work done as part of EGI-Engage funding
OOI is the OpenStack OCCI implementation. rOCCI is an OCCI framework written in Ruby and is available on major Linux distributions.

JRA2.2 - Federated Cloud: achievements III
- Monitoring support: monitoring probes have been updated and improved, and OpenStack-specific probes have been developed.
- Accounting support: a new accounting record has been developed that includes benchmarking information for the VM.
This is non-user-facing work, but very important behind-the-scenes work that production requires. The OCCI probe was completely rewritten. Two new OpenStack probes were developed: one for VM management and another for object storage (not yet in production). A new probe was also developed to check CA certificates at sites.
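Why does a cloud accounting record need benchmarking information? The sketch below gives one assumed illustration. The field names are hypothetical and do not reproduce the actual EGI cloud usage record schema. The normalisation shown, raw CPU hours scaled by a per-core benchmark score, is one common way to make usage on faster and slower hosts comparable. It is not necessarily how EGI accounting uses the value.

```python
# Hypothetical cloud usage record with benchmark data; not the EGI record schema.
from dataclasses import dataclass


@dataclass
class CloudUsageRecord:
    vm_uuid: str
    site: str
    vo: str
    cpu_duration_s: int        # CPU time consumed by the VM, in seconds
    benchmark_type: str        # benchmark name, e.g. "HEPSPEC06" (assumed)
    benchmark_per_core: float  # benchmark score of one core on the host

    def normalised_cpu_hours(self) -> float:
        # Raw CPU hours scaled by the host's per-core benchmark score.
        return self.cpu_duration_s / 3600 * self.benchmark_per_core


rec = CloudUsageRecord("vm-0001", "SITE-A", "vo.example.org", 7200, "HEPSPEC06", 10.0)
print(rec.normalised_cpu_hours())  # prints 20.0
```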

JRA2.2 - Federated Cloud: achievements IV, security improvements
- AuthN and AuthZ have been improved to satisfy the security teams' requests (i.e. central suspension and removal of users).
- The Keystone-VOMS module is targeted for upstream integration in the upcoming version, as a Keystone federation plugin.
- VM image (VMI) endorsement and distribution procedures and policies have been improved in collaboration with the security teams.
Keystone is the OpenStack component that provides identity, token, catalog and policy services. The Keystone-VOMS module is intended to provide VOMS authentication for OpenStack Keystone.
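The sketch below shows, in simplified form, the kind of decision VOMS-based authentication makes. It reads the VO from the user's VOMS attribute (FQAN), maps the VO to an OpenStack project, and honours a central suspension list. It follows the idea behind Keystone-VOMS, but the functions, the VO-to-project table and the suspended-user list are hypothetical. They are not the module's actual code or configuration. The DNs and the VO "vo.example.org" are placeholders.

```python
# Hypothetical VOMS-to-project authorization logic; not the Keystone-VOMS code.
from typing import Dict, Set

VO_TO_PROJECT: Dict[str, str] = {"fedcloud.egi.eu": "fedcloud", "vo.example.org": "example"}
SUSPENDED_DNS: Set[str] = {"/DC=org/DC=example/CN=Suspended User"}  # central ban list


def vo_from_fqan(fqan: str) -> str:
    # An FQAN looks like /<vo>[/<group>...]/Role=<role>/Capability=NULL;
    # the first path component is the VO name.
    parts = [p for p in fqan.split("/") if p]
    if not parts or "=" in parts[0]:
        raise ValueError(f"not a valid FQAN: {fqan!r}")
    return parts[0]


def authorize(user_dn: str, fqan: str) -> str:
    """Return the project the proxy holder is mapped to, or raise PermissionError."""
    if user_dn in SUSPENDED_DNS:
        raise PermissionError("user centrally suspended")
    vo = vo_from_fqan(fqan)
    if vo not in VO_TO_PROJECT:
        raise PermissionError(f"VO {vo!r} is not mapped to any project")
    return VO_TO_PROJECT[vo]


print(authorize("/DC=org/DC=example/CN=Alice", "/fedcloud.egi.eu/Role=NULL/Capability=NULL"))
```

Checking the suspension list before any VO mapping reflects the security teams' request on the slide: a centrally suspended user is rejected regardless of VO membership.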

JRA2.2 - Federated Cloud: plans, Application DB improvements roadmap
- AppDB will allow users to perform simple VM operations (create, delete, etc.), using the Infrastructure Manager (IM) behind the scenes.
- A web user interface is planned, as an alternative to the current command-line requirements for OCCI and X.509 proxies.
- Design available; implementation started.
These usability improvements are very important for further adoption of the FedCloud. The VM images are stored in AppDB, which will also act as the user interface for creating and destroying VMs. The Infrastructure Manager (IM) is an existing tool developed by the Universitat Politècnica de València (UPV). It improves the usability of IaaS clouds by automating the management of virtual appliances.

JRA2.3 - e-Infrastructure Integration (Lead: INFN, M1-M30)
Overview
In JRA2.2 we talked about further improving the EGI FedCloud, whose model integrates institutional clouds into a scalable computing platform. This model was defined in 2011 and fully implemented in 2014. We now take it a step further by integrating federations of clouds: the EGI FedCloud can itself integrate with other federations or e-Infrastructures. The aim is also to make it easier for scientific communities to create their own cloud federations and to interoperate with the FedCloud.

JRA2.3 - e-Infrastructure Integration: work outline
- Interoperation of 3 e-Infrastructures: EUDAT, CANFAR, gCube/D4Science
- Update of the EGI cloud federation model to accommodate different types of integration
- Use the EGI Federated Cloud as building blocks to create new federations
- Collaboration activities driven by the requirements of user communities involved in WP6
- Activities tracked in a dedicated RT queue

JRA2.3 - e-Infrastructure Integration: cloud federation model
[Diagram: Community Platforms on top of several Cloud Realms; a Collaboration Platform (VM image catalogue of EGI-endorsed images, Helpdesk); and the EGI Core Infrastructure Platform (AAI, Service Registry, Accounting, Monitoring, Federated Service Management).]
Cloud Realm: a subset of cloud providers exposing homogeneous cloud management interfaces and capabilities, which use the services of the EGI Core Infrastructure Platform to create a federation.
These changes to the FedCloud make it easier for scientific communities to create their own customized cloud federations, and the federation model enables federation of different e-Infrastructures. To do this we define a realm: a set of providers with the same cloud management interfaces and capabilities. Two realms are operated in EGI:
- Open Standards Realm (using OCCI: OpenStack, OpenNebula and Synnefo)
- OpenStack Realm (using the native OpenStack Nova interface)
Community Platforms provide community-specific data, tools and applications and can be supported by one or more realms.
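Because a realm is defined purely by the management interface its providers expose, it can be derived mechanically from provider descriptions. The sketch below groups providers into the two realms named above. The site names and their interface lists are hypothetical. The assumption that a provider exposing both interfaces belongs to both realms is also an illustrative choice, not a statement about EGI policy.

```python
# Sketch: deriving cloud realms from (hypothetical) provider interface descriptions.
from collections import defaultdict
from typing import Dict, List

PROVIDERS = {
    "site-a": {"cmf": "OpenStack", "interfaces": {"OCCI", "Nova"}},
    "site-b": {"cmf": "OpenNebula", "interfaces": {"OCCI"}},
    "site-c": {"cmf": "Synnefo", "interfaces": {"OCCI"}},
    "site-d": {"cmf": "OpenStack", "interfaces": {"Nova"}},
}

# The two realms operated in EGI, keyed by the interface that defines them.
REALM_OF_INTERFACE = {"OCCI": "Open Standards Realm", "Nova": "OpenStack Realm"}


def build_realms(providers: Dict[str, dict]) -> Dict[str, List[str]]:
    realms: Dict[str, List[str]] = defaultdict(list)
    for site, info in sorted(providers.items()):
        for iface in sorted(info["interfaces"]):
            # Assumed here: a provider exposing several interfaces joins several realms.
            realms[REALM_OF_INTERFACE[iface]].append(site)
    return dict(realms)


print(build_realms(PROVIDERS))
```

A Community Platform can then target whichever realm matches the client tooling it already uses.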

JRA2.3 - e-Infrastructure Integration: EGI-EUDAT interoperation
Goal: provide end users with seamless access to an integrated infrastructure offering both EGI and EUDAT services.
Roadmap defined by selected user communities (EPOS, ICOS, BBMRI, ELIXIR and EISCAT-3D).
First, 'universal' use case: a user instantiates a VM on the EGI cloud federation to execute a computational job consuming data preserved on EUDAT resources. The results of the analysis can be staged back to EUDAT storage and, if needed, allocated Persistent Identifiers (PIDs) for future use.
Working closely with EPOS and ICOS to implement their use cases.
We now look in more depth at the e-Infrastructure integration done in this activity, starting with EGI and EUDAT. Why do this? To benefit from the services offered by both e-Infrastructures, with clear advantages for existing users of both. The universal use case was demonstrated at the EGI CF 2015 (Bari). EPOS and ICOS, two Earth Observation Research Infrastructures, both presented at the EGI CF 2016. They are actively implementing their use cases and hope for proof-of-concept results later this year. ICOS is the Integrated Carbon Observation System; EPOS is the European Plate Observing System.

EGI-EUDAT integration pilot: processing in EGI, long-term storage in EUDAT
- Access to EGI and EUDAT services with a single user identity
- Data staging between the EGI Federated Cloud and EUDAT services
[Diagram: a VM in the EGI Federated Cloud exchanging data with EUDAT services.]
1. Run analysis
2. Stage output data in B2STAGE
3. Data stored in B2SAFE for long-term preservation
4. Retrieve data
5. Run analysis
A single identity works because EGI and EUDAT use the same underlying security technologies (X.509, proxies, VOMS). The user creates a proxy and instantiates a VM with it. The analysis is run (here, simply the creation of a random file). The data is staged via B2STAGE for long-term preservation in B2SAFE, then retrieved and analysed: the retrieved file's checksum is compared with the original's to validate that they are the same. This simple but very important basic use case paves the way for many more complex, real-life community use cases. There is no time to go into the ICOS and EPOS use cases; please refer to their presentations at the EGI/EUDAT interoperability session in Amsterdam.
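The validation logic of the pilot can be reproduced locally. The sketch below follows the five steps: create a random file, stage it out, retrieve it, and compare SHA-256 checksums. A temporary directory stands in for B2STAGE/B2SAFE. No EGI or EUDAT service, proxy or transfer protocol is involved. The use of SHA-256 is an assumption, since the slide does not say which checksum was used.

```python
# Local simulation of the pilot's round trip; a temp dir stands in for B2STAGE/B2SAFE.
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):  # read in 64 KiB chunks
            h.update(chunk)
    return h.hexdigest()


def round_trip(size: int = 1024 * 1024) -> bool:
    with tempfile.TemporaryDirectory() as vm_dir, tempfile.TemporaryDirectory() as store_dir:
        # 1. "Run analysis": here it just writes a file of random bytes.
        output = Path(vm_dir) / "result.bin"
        output.write_bytes(os.urandom(size))
        original_sum = sha256(output)

        # 2-3. Stage the output to (simulated) long-term storage.
        staged = Path(store_dir) / output.name
        shutil.copyfile(output, staged)

        # 4. Retrieve the data back into the VM's workspace.
        retrieved = Path(vm_dir) / "retrieved.bin"
        shutil.copyfile(staged, retrieved)

        # 5. "Run analysis" on the retrieved copy: verify it matches the original.
        return sha256(retrieved) == original_sum


print(round_trip())  # prints True
```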

JRA2.3 - e-Infrastructure Integration: EGI-CANFAR
Goal: provide interoperable access to astronomy data for European and Canadian astronomers.
How?
- Integrate both e-Infrastructures into a seamless and uniform community platform for international astronomy research collaboration
- Share data: mirrors of the Canada-France-Hawaii Telescope archives
- "Bring the computing near the data" to expand the capabilities of the International Virtual Observatory
- Pilot: pre-production prototype of the e-Infrastructure
CANFAR is a community platform for astronomy based on IVOA standards, running on Compute Canada resources. It offers access to astronomical data (archives and catalogues), access control (GMS), user storage (VOSpace), and cloud processing integrated with telescope data collections (OpenStack). In 2014 it had more than 7,000 users and handled more than 1 PiB of data.

JRA2.3 - e-Infrastructure Integration: CANFAR status
- Analysis and integration of the CANFAR AAI (Group Management System, GMS)
- Tests of the European GMS prototype: https://gms.oats.inaf.it
- Tests of interoperability between the Canadian and European GMS
- Preliminary analysis of VOSpace
The first steps are security-related. GMS is the AAI system for CANFAR. It is now implemented on top of VOMS, and an instance is available for testing at the URL above. VOSpace, an International Virtual Observatory Alliance (IVOA) specification, provides the actual storage, and we are drawing up plans to implement it on the European side. It requires GMS to be set up first, which is why we are doing that first.

JRA2.3 - e-Infrastructure Integration: gCube/D4Science
Goal: extend D4Science computing capabilities to use the EGI Federated Cloud transparently.
How? Extend gCube/D4Science to use EGI Federated Cloud resources by implementing OCCI client capabilities:
- Identity and authorization federation
- Integration at IaaS level and bundling of D4Science-specific Virtual Appliances
- Elastic execution of D4Science processes on the remote Federated Cloud infrastructure by exploiting existing Virtual Appliances
- Production of VM images identical to the ones working at CNR
- Images available on AppDB and replicated to Federated Cloud sites
D4Science.org is an organisation offering a Hybrid Data Infrastructure service and a number of Virtual Research Environments. A major user is the iMarine project, which deals with fisheries management and the conservation of marine living resources. D4Science is built on the gCube software system, an open source system for building and operating Hybrid Data Infrastructures. (…) The first steps are security-related (again!).

JRA2.3 – e-Infrastructure Integration: D4Science/gCube access to the federated cloud
[Screenshot: gCube resource management interface. Callouts: Resources panel (Remote Node, Service Profile, VM Templates, VM Providers); Resources grid (lists available resources according to the selected view); Pinned resource (detailed resource information); Create New (opens the wizard for resource creation: select service profile, then select VM template).]
Illustration of how gCube is able to access the EGI FedCloud.

JRA2.3 – e-Infrastructure Integration: gCube/D4Science integration status
gCube system software:
Tested in the pre-production environment
Under integration
Planned release: end of April 2016
D4Science exploitation:
EGI SLA and EGI OLA to be finalized
D4Science Infrastructure Manager trained
Campaign to promote the EGI FedCloud among VRE Managers started

JRA2.4 – Accelerated Computing (Lead: INFN, M1–M15): Overview

[Figure: EGI platform architecture. Layers from top to bottom: Community Platforms (brokering; community-specific data, tools and applications); Collaboration Platform (EGI-endorsed VM images, helpdesk, VM Image Catalogue); data-intensive computing (HTC Platform, Cloud compute and storage, GPGPU Platform, Open Data Platform); EGI Core Infrastructure Platform (AAI, Service Registry, Accounting, Monitoring, Federated Service Management); Physical Infrastructure.]
Accelerated computing is a way to use energy-efficient and powerful high-performance computing capabilities; the diagram shows where it fits in the overall architecture. The GPGPU (General Purpose Graphics Processing Unit) was originally developed, driven by the demands of the gaming industry, for better graphics rendering by offloading processing from the CPU to the graphics card. GPGPUs are now very prevalent and a cost-effective means of increasing processing power in hardware.

JRA2.4 – Accelerated Computing: Introduction
Goals:
Provide a new, integrated accelerated computing platform for both HTC (grid) and cloud
Extend accounting to include GPGPU usage (HTC and cloud)
Making GPGPUs available to EGI users brings very significant benefits; this activity extends existing components for both High Throughput Computing (HTC/grid) and cloud users (…)

JRA2.4 – Accelerated Computing: Accelerated computing in HTC
Current situation: the interfaces to computing clusters in EGI do not support GPGPUs.
Work plan:
Extend information systems to allow discovery of GPGPU capabilities
Change HTC/grid components to allow submission and execution of jobs using GPGPUs
Information discovery means extending the OGF GLUE standard and extending the information providers to publish GPGPU information. For submission, the CREAM component is being extended with GPGPU support, and a new CREAM version including it will be released.
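To make "submission of jobs using GPGPUs" concrete, here is a sketch that renders a JDL job description, the ClassAd-style `[ Attribute = value; ]` format accepted by CREAM, with a GPU request added. The attribute name GPUNumber is an assumption standing in for whatever the GPU-enabled CREAM release defines (check its release notes), and the script name is hypothetical.

```python
def jdl_value(value) -> str:
    """Render a Python value as a JDL (ClassAd-style) literal."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def gpu_job_jdl(executable: str, gpus: int, arguments: str = "") -> str:
    """Build a JDL job description asking the CE for `gpus` GPUs.

    GPUNumber is an assumed attribute name for the GPU request.
    """
    if gpus < 1:
        raise ValueError("a GPU job must request at least one GPU")
    attrs = {
        "Executable": executable,
        "StdOutput": "std.out",
        "StdError": "std.err",
        "InputSandbox": [executable],
        "OutputSandbox": ["std.out", "std.err"],
        "GPUNumber": gpus,
    }
    if arguments:
        attrs["Arguments"] = arguments
    body = "\n".join(f"  {k} = {jdl_value(v)};" for k, v in attrs.items())
    return "[\n" + body + "\n]"


if __name__ == "__main__":
    print(gpu_job_jdl("run_amber.sh", gpus=2))
```

The point of the design is that the GPU request is just one more attribute: the CE, not the user, maps it onto the batch system's own GPU syntax (Torque, HTCondor, Slurm, SGE or LSF).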

JRA2.4 – Accelerated Computing: HTC GPU achievements
CIRMMP testbed used for the MoBrain applications AMBER, PowerFit and DisVis, with CUDA 5.5
ExecutionEnvironment class deployed at the CIRMMP testbed
Dynamic information providers need the new attributes in the GLUE2.1 draft
HTC GPU-enabled prototype available and tested at:
CIRMMP (local Torque-based GPU cluster)
GRIF/LLR (production HTCondor-based GPU and MIC cluster)
ARNES (production Slurm-based GPU cluster)
Queen Mary (local SGE-based cluster with AMD GPUs)
Plans to test the prototype at INFN-CNAF (LSF 9-based GPU cluster)
WeNMR/AMBER grid portal now exploiting GPU HTC resources
The testbed used for HTC GPGPU testing is CIRMMP, the Centre of Magnetic Resonance in Florence. The GLUE2.1 draft is now being finalized and submitted to the GLUE-WG at the OGF. New GLUE2.1 attributes:
ComputingManager class (the LRMS): TotalPhysicalGPUs, TotalGPUSlots, UsedGPUSlots
ComputingShare class (the batch queue): FreeGPUSlots, UsedGPUSlots
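The new GLUE2.1 attributes are what make GPU discovery possible. The sketch below shows a hypothetical matchmaking step over already-fetched values (in production these would come from the BDII over LDAP): it keeps the queues that can start a job needing a given number of GPU slots. The attribute names follow the list above; the Python field names, the capping rule and the example values are our own assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputingManager:
    """GPU attributes of an LRMS, as listed for the GLUE2.1 draft."""
    name: str
    total_physical_gpus: int  # TotalPhysicalGPUs
    total_gpu_slots: int      # TotalGPUSlots
    used_gpu_slots: int       # UsedGPUSlots


@dataclass(frozen=True)
class ComputingShare:
    """GPU attributes of a batch queue, as listed for the GLUE2.1 draft."""
    queue: str
    manager: str              # name of the ComputingManager serving this queue
    free_gpu_slots: int       # FreeGPUSlots
    used_gpu_slots: int       # UsedGPUSlots


def queues_for_gpu_job(shares, managers, gpus_needed):
    """Return queues that can start a job needing `gpus_needed` GPU slots now.

    A queue is capped by its LRMS: it cannot offer more free GPU slots than
    TotalGPUSlots - UsedGPUSlots of its manager. Best candidates come first.
    """
    managers_by_name = {m.name: m for m in managers}
    candidates = []
    for share in shares:
        manager = managers_by_name.get(share.manager)
        if manager is None:
            continue  # inconsistent publication: skip rather than guess
        lrms_free = manager.total_gpu_slots - manager.used_gpu_slots
        free = min(share.free_gpu_slots, lrms_free)
        if free >= gpus_needed:
            candidates.append((free, share.queue))
    # Most free GPU slots first; queue name breaks ties deterministically.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [queue for _, queue in candidates]
```

Splitting the attributes between the manager and the share mirrors the GLUE model: several queues can draw on the same pool of GPUs, so a queue's published free slots alone may overstate what is actually available.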

JRA2.4 – Accelerated Computing: Accelerated computing in clouds
Supporting GPGPUs in clouds:
Virtualized GPUs are at an early stage
OpenStack supports PCI passthrough
Current status:
Production sites with GPGPUs in the EGI FedCloud
Instructions for site admins, developers and users
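With PCI passthrough, a whole physical GPU is handed to a VM. The sketch below generates the three pieces an OpenStack site typically configures for this: a whitelist of passthrough-capable devices and an alias in nova.conf, plus a flavor extra spec requesting the alias. The option names are the Kilo-era [DEFAULT] ones (later releases renamed and moved them), the scheduler must also have PciPassthroughFilter enabled, and the alias name and product ID below are placeholders; the real product ID comes from `lspci -nn` on the compute node.

```python
import json

NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID of NVIDIA


def gpu_passthrough_settings(product_id: str, alias: str, gpus_per_vm: int):
    """Return (nova.conf options, flavor extra specs) for GPU PCI passthrough.

    Option names follow the Kilo-era [DEFAULT] section of nova.conf; later
    OpenStack releases changed them, so check the docs for your release.
    """
    device = {"vendor_id": NVIDIA_VENDOR_ID, "product_id": product_id}
    nova_conf = {
        # Which physical devices on the compute node may be handed to VMs.
        "pci_passthrough_whitelist": json.dumps(device),
        # A name that flavors use to request such a device.
        "pci_alias": json.dumps({**device, "name": alias}),
    }
    # Flavor extra spec: "<alias>:<count>" devices attached to each VM.
    flavor_extra_specs = {"pci_passthrough:alias": f"{alias}:{gpus_per_vm}"}
    return nova_conf, flavor_extra_specs


if __name__ == "__main__":
    # The product ID is a placeholder: read the real one with `lspci -nn`.
    conf, specs = gpu_passthrough_settings("xxxx", "K40", 1)
    for key, value in conf.items():
        print(f"{key} = {value}")
    print(specs)
```

This is also why GPU-specific flavours are needed today: the GPU request lives in the flavor's extra specs, which is what the longer-term OCCI extensions aim to avoid.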

JRA2.4 – Accelerated Computing: Cloud GPU achievements
IISAS-GPUCloud is in production:
Images of supported VOs installed
Ceilometer added for better monitoring
New site installed at INCD/LIP:
2 compute nodes with NVIDIA Tesla K40 GPUs
OpenStack Kilo, PCI passthrough
Contributed to the OGF standard to allow publishing of GPU information: GLUE2.1 allows discovery of GPU resources
MoBrain and LifeWatch now exploiting GPU cloud resources (…)
IISAS is the Institute of Informatics, Slovak Academy of Sciences. Projects currently benefiting from GPUs in clouds are several MoBrain applications, as described in D6.7, and some LifeWatch applications (presented at EGI CF '16).

JRA2.4 – Accelerated Computing: GPU cloud, ongoing work
Accounting of VMs with GPGPUs: new items are being added to the accounting records
New Federated Cloud sites offering GPU capabilities: CESNET-Metacloud will upgrade its endpoint in April and provide GPU-enabled templates, images and guides
Longer-term goal: OCCI extensions to enable starting VMs with GPUs without the need for GPU-specific images/flavours in the CMF
Accounting work is carried out with the STFC APEL team.
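What "new items in accounting records" enables can be sketched with a simplified per-VM usage record carrying a GPU count, aggregated into GPU-hours per VO (wall time multiplied by the number of GPUs attached). The record layout and the gpu_count field are hypothetical illustrations, not the APEL cloud record schema being defined with the STFC team; site and VO names are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudUsageRecord:
    """Simplified per-VM usage record; gpu_count is the hypothetical new item."""
    vm_uuid: str
    site: str
    vo: str
    wall_duration_s: int  # wall-clock seconds the VM was running
    cpu_count: int
    gpu_count: int = 0    # GPUs passed through to the VM (0 = CPU-only VM)


def gpu_hours_by_vo(records):
    """Sum GPU-hours (wall time x GPUs attached) per VO, ignoring CPU-only VMs."""
    totals = defaultdict(float)
    for record in records:
        if record.gpu_count > 0:
            totals[record.vo] += record.wall_duration_s * record.gpu_count / 3600
    return dict(totals)


if __name__ == "__main__":
    records = [
        CloudUsageRecord("vm-1", "site-a", "vo.example.org", 7200, 4, gpu_count=2),
        CloudUsageRecord("vm-2", "site-a", "vo.example.org", 1800, 2, gpu_count=1),
        CloudUsageRecord("vm-3", "site-b", "other.example.org", 3600, 8),
    ]
    print(gpu_hours_by_vo(records))  # {'vo.example.org': 4.5}
```

Charging by wall time rather than GPU utilisation fits passthrough, where the whole device is reserved for the VM whether or not it is busy.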

Use of Resources and Issues
Numbers will be provided by the PO; work package leaders must provide an explanation.

JRA2 – Effort consumed in PY1
"Overall, WP04 has used 98% of the effort planned for PY1. Among the partners, CESGA (third party of CSIC) and Engineering have exceeded the linear plan in T04.02 (JRA2.2 – Federated Cloud), but this is justified by the implementation of the task and the preparation of three deliverables. CIRMMP has been slightly underspending (68%) after a slow start in the first quarter; it is an active partner and this will be balanced in the next period. Further investigation of Agro-Know's overspend of 277% has revealed that not all of its work was part of the agreed EGI-Engage activities, and only 33% (1 PM) of its spend will be accepted.
Note: IISAS has 5 PM for 15 months (not for the whole project duration), so a linear PM plan cannot apply. It expects to spend all PMs by the end of May 2016, when its task ends.
Note: EGI.eu has 4 PM planned for the VM Catcher activity, which has been postponed to PY2."

Plan for PY2
(Free choice of presentation format)

Plans (JRA2.1, JRA2.2)
JRA2.1:
Finalize plans for the EGI DataHub
Open Data Platform: first prototype
HBP data hosting: developing a production service
JRA2.2:
OCCI extension for VM migration
VM management functionality for AppDB

Plans (JRA2.3, JRA2.4)
JRA2.3:
Completion of the CANFAR and D4Science integration
EUDAT integration (EPOS and ICOS use cases)
JRA2.4 (ending M15):
Finalising the HTC GPU prototype for the SGE and LSF batch systems
Further deployment of GPUs on production EGI Federated Cloud sites (CESNET)
Accounting (work continuing under JRA1.3 after M15)

Summary: achievements grouped by objective
Objective 2 (O2): EGI solutions, related business models and access policies
Data as a Service and DataHub planning
Development of the EGI Federated Cloud
Objective 3 (O3): Offer and expand an e-Infrastructure Commons solution
e-Infrastructure integration (CANFAR, D4Science and EUDAT)
Objective 4 (O4): Open data platform and the European Big Data Value
Open Data Platform preparation
Objective 5 (O5): Promote the adoption and extension of the current EGI services
HBP testbed
EGI/EUDAT integration (ICOS and EPOS use cases)