Data service requirements and provisioning models. Gergely Sipos, with input from several EGI members. EGI-InSPIRE RI-261323, www.egi.eu

Presentation transcript:

Data service requirements and provisioning models
Gergely Sipos
With input from several EGI members
Corresponding author of data services position paper:

Outline
– Emerging use cases (8)
– Possible EGI responses (provisioning models)
– Suggested responses
– Open questions

Use case 1: Scalable, personal storage
E-laboratory: local or cloud installation of virtual laboratory software
Can be customised with services, applications and data according to the collaborators’ needs
Requires:
– Import data into e-laboratory from different 3rd party sources (for integration, curation, processing and visualisation)
– Personal, remote (cloud) storage space for the user
Current limitations of EGI storage:
– Shared among VO members
– Cannot be attached just like a ‘cloud storage’ to user environments (portals, virtualised, desktop clients, etc.)
Source: Lifewatch, BioVeL
Potential solution(s): dCache has a related development in its roadmap?

Use case 2: Metadata discovery
Source: EISCAT_3D
Metadata exists in a specific section of the file
– E.g. Antenna direction
– Will be used in Phase 1 for discovery of files
Further metadata have to be extracted from the data
– E.g. Number of spikes, Type of spikes
– Applications exist that can process the EISCAT files and identify metadata (e.g. with FFT)
– These applications should be collected from the EISCAT community and exposed to OSGC as services
– Can be done in Phase 2
Diagram: EISCAT file (Metadata part, Data part); Metadata generator service 1 ... N; Open Source Geospatial Catalogue (OSGC) at the CESNET site (CZ): Catalogue
Phase 1: In ENVRI
Phase 2: In a H2020 project
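The two phases of use case 2 can be sketched roughly as follows. This is a minimal illustration, not the EISCAT or OSGC implementation: the `key=value` header layout, the function names and the threshold-based spike counter (used here in place of the FFT-based applications the slide mentions) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One catalogue record: a file name plus its searchable metadata."""
    file_name: str
    metadata: dict = field(default_factory=dict)


def read_embedded_metadata(header_lines):
    """Phase 1: parse hypothetical 'key=value' lines from the metadata part."""
    metadata = {}
    for line in header_lines:
        if "=" in line:
            key, value = line.split("=", 1)
            metadata[key.strip()] = value.strip()
    return metadata


def count_spikes(samples, threshold):
    """Phase 2: count upward crossings of the threshold in the data part.

    A simple stand-in for a community-provided metadata generator service.
    """
    spikes = 0
    above = False
    for value in samples:
        if value > threshold and not above:
            spikes += 1
        above = value > threshold
    return spikes


def build_entry(file_name, header_lines, samples, threshold=5.0):
    """Combine embedded (Phase 1) and derived (Phase 2) metadata."""
    entry = CatalogueEntry(file_name, read_embedded_metadata(header_lines))
    entry.metadata["number_of_spikes"] = count_spikes(samples, threshold)
    return entry


def discover(catalogue, **criteria):
    """Return the names of files whose metadata match every criterion."""
    return [
        entry.file_name
        for entry in catalogue
        if all(entry.metadata.get(k) == v for k, v in criteria.items())
    ]
```

A catalogue built this way can be queried on either kind of metadata, for example `discover(catalogue, antenna_direction="north")` or `discover(catalogue, number_of_spikes=2)`.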

ENVRI pilot setup 1 (diagram)
On-site:
– EISCAT archive: 5m files, ~1TB in total
Off-site (EGI Federated Cloud):
– Object Storage at the Juelich site (DE): OpenStack SWIFT, CDMI with HTTP export
– Open Source Geospatial Catalogue (OSGC) at the CESNET site (CZ): Catalogue
Admin tools (data administrators):
– Drop box tool to upload data on-demand from client side
– Near Real Time tool to import data automatically from receiving stations
Scientific users: web browser, wget
Services: Metadata generator service 1 ... N; Processing / visualisation service 1 ... N
Phase 1: In ENVRI
Phase 2: In a H2020 project

ENVRI pilot setup 2 (diagram)
On-site:
– EISCAT archive: ~5m files, ~1TB in total
Off-site:
– EUDAT storage: CSC (Jülich or STFC)
– EUDAT Safe Replication
– EUDAT Metadata Catalogue
Users: scientific users, data administrators

Use case 3: Long term preservation
Bit preservation, data preservation, metadata preservation and software preservation
Source: High Energy Physics (HEP), Digital Cultural Heritage Preservation (DCH-RP), EISCAT_3D, EMSO, EPOS
Potential solution(s):
– Data curation tools and frameworks, virtualized solutions for software testing?
– Zenodo for small datasets and software?
– EGI Applications Database for software?
– PURL for large datasets?

Use case 4: Services for citizen scientists
To receive, curate, store, integrate and share data of citizen scientists
Requires:
– Low/no barrier of submission
– Flexible curation services
– Cost recovery for contributors, etc.
Source: DRIHM, DCH-RP
Potential solution(s): EUDAT Simple Storage for DRIHM?

Use case 5: Data with access restrictions
Providing storage and processing services for data that have access restrictions (ethical, legal or societal reasons)
Technology + legal arrangement. For example:
– Legal guarantee that no one besides the owner of the data accesses it.
– Technology guarantees that the data cannot be downloaded, only processed by certified VMs
Source: Life sciences, Economy?
Potential solution(s):
– Hosting confidential data in the EGI Federated Cloud, and allow access only through certified Virtual Machine images? (~EBI Embassy Cloud)
– Legal arrangement through EGI.eu to guarantee data confidentiality?
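The technology guarantee in use case 5 (no download, processing only by certified VMs) could be expressed as a simple access check like the one below. This is a hedged sketch only: the request format, the `authorise` function and the use of SHA-256 image digests as the certification mechanism are assumptions for illustration, not a description of the EGI Federated Cloud or the EBI Embassy Cloud.

```python
import hashlib


def image_digest(image_bytes):
    """Return the SHA-256 digest identifying a VM image (assumed scheme)."""
    return hashlib.sha256(image_bytes).hexdigest()


def authorise(request, certified_digests):
    """Allow only 'process' requests coming from a certified VM image.

    request: dict with hypothetical keys 'action' and 'image_digest'.
    certified_digests: set of digests of images certified by the data owner.
    """
    # Downloads (and any other action) are always refused.
    if request.get("action") != "process":
        return False
    # Processing is allowed only from a certified image.
    return request.get("image_digest") in certified_digests
```

The legal arrangement mentioned on the slide sits outside such a check; the code only covers the technical half of the guarantee.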

Use case 6: Data preservation from science gateways
Support scientific gateways to transfer users’ computational results from the gateways to repositories
Data can be preserved for long term after being properly indexed with metadata for later reuse and processing by external tools
Automated processes with user control (minimal user input)
Strong relation to use cases 1 and 3
Source: WeNMR (structural biology)
Potential solution(s): An API for EGI gateways on top of long term preservation services?
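One way to picture the gateway API proposed for use case 6 is a helper that packages a computational result with its metadata before handing it to a repository. The sketch below is purely illustrative: the required fields, the `build_deposit` name and the JSON record layout are assumptions, and no existing EGI or repository API is implied.

```python
import hashlib
import json

# Hypothetical minimum metadata a repository might require for indexing.
REQUIRED_FIELDS = ("title", "creator")


def build_deposit(result_bytes, gateway_metadata, user_metadata):
    """Build a JSON deposit record for one computational result.

    gateway_metadata is filled in automatically by the gateway;
    user_metadata holds the minimal input supplied (or overridden) by the user.
    """
    metadata = {**gateway_metadata, **user_metadata}
    missing = [name for name in REQUIRED_FIELDS if not metadata.get(name)]
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    # A checksum and size let the repository verify the transferred result.
    metadata["sha256"] = hashlib.sha256(result_bytes).hexdigest()
    metadata["size_bytes"] = len(result_bytes)
    return json.dumps(metadata, sort_keys=True)
```

Merging the two dictionaries in that order keeps the process automated while leaving the user in control, since user-supplied values take precedence over gateway defaults.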

Use case 7: Open Data services
OpenAIRE: an electronic infrastructure for handling open access, peer-reviewed articles as well as other important forms of publications.
Will be compulsory for H2020 projects
EGI to provide storage capacity and value-added services for OpenAIRE?

Use case 8: Close compute and data in the cloud
Co-locate cloud storage and compute capacity
Run users’ VMs close to data
Source: BioVeL, ESA SSEP, ELIXIR
Potential solution: Broker + Open Search?
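The brokering idea in use case 8 can be illustrated with a small placement function that prefers sites already holding the data. It is a sketch under stated assumptions: the `choose_site` name, the replica map and the free-core counts are hypothetical inputs, and a real broker would obtain them from information and discovery services.

```python
def choose_site(dataset, replicas, free_cores, cores_needed):
    """Pick a cloud site that stores the dataset and can run the user's VM.

    replicas: dict mapping dataset name -> list of sites holding a copy.
    free_cores: dict mapping site name -> currently free CPU cores.
    Returns the co-located site with the most free cores, or None.
    """
    candidates = [
        site
        for site in replicas.get(dataset, [])
        if free_cores.get(site, 0) >= cores_needed
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda site: free_cores[site])
```

Returning `None` when no co-located site has capacity leaves the fallback decision (wait, or move the data) to the caller.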

EGI today → Possible responses
EGI today:
– EGI Core Platform (X509, BDII, APEL, SAM)
– Grid platform (SRM, LFC, AMGA, ...)
– Federated Cloud platform (CDMI, OCCI)
Possible responses:
1. Extend the grid platform
2. Extend the cloud platform (standard interface)
3. New service in the cloud (hosted through OCCI)
4. Federate new services
5. Act as a technology provider
6. Do nothing
Diagram labels: EUDAT platform, AMGA++, NoSQL, X, MapReduce, Metadata portal, Software for community deployment

My suggested responses
(Table columns: Use case; Which strategy should EGI follow to support this use case?; Next step)

1. Scalable, personal storage
Strategy: 3. New service in the cloud: Bring in an external solution that builds on CDMI and could be hosted as an SaaS.

2. Metadata discovery
Strategy: 3. New service in the cloud: OSGC service in the EGI Fed. Cloud. 4. Federate new services: EUDAT Metadata Catalogue, Storage and Safe Replication.
Next step: Evaluate the two pilots, define sustainable setup for the long term.

3. Long term preservation
Strategy: (not specified)

4. Services for citizen scientists
Strategy: 4. Federate new service: EUDAT will develop a Simple Store service for the citizen scientists use case of DRIHM. Federate this.

5. Data with access restrictions
Strategy: (not specified)

6. Data preservation from science gateways
Strategy: 5. Act as a technology provider: Assemble an API for the developers of science gateways. Build on long term preservation services.

7. Open Data services
Strategy: (not specified)

8. Close compute and data in the cloud
Strategy: 2. Extend the cloud platform: Bring in ‘VMI broker’ service into the Federated Cloud. It should use and expose standard interfaces.

Open questions
0. Additional use cases and technologies for consideration?
1. Data storage: What processes, policies and tools should EGI provide to help set up and implement sustainable data management plans?
2. PID infrastructure: Which one and how to support?
3. UMD includes the cross-cutting services. Should we add new services to the UMD?
4. EUDAT: Which EUDAT services should be supported in EGI, and how?
5. Software injection: How can we operate an efficient and scalable software selection and integration process to enable the rapid injection of new software into the production infrastructure?

Thank you