Published by Madlyn Jefferson. Modified over 8 years ago.
1
www.egi.eu EGI-InSPIRE RI-261323
Data service requirements and provisioning models
Gergely Sipos (Gergely.sipos@egi.eu)
With input from several EGI members
Corresponding author of the data services position paper: https://documents.egi.eu/document/2038
2
Outline
Emerging use cases (8)
Possible EGI responses (provisioning models)
Suggested responses
Open questions
3
Use case 1: Scalable, personal storage
E-laboratory: a local or cloud installation of virtual laboratory software
Can be customised with services, applications and data according to the collaborators’ needs
Requires:
–Importing data into the e-laboratory from different third-party sources (for integration, curation, processing and visualisation)
–Personal, remote (cloud) storage space for the user
Current limitations of EGI storage:
–Shared among VO members
–Cannot be attached like ‘cloud storage’ to user environments (portals, virtualised environments, desktop clients, etc.)
Source: Lifewatch, BioVeL
Potential solution(s): dCache has a related development in its roadmap?
4
Use case 2: Metadata discovery
Source: EISCAT_3D
Metadata exists in a specific section of the file
–E.g. antenna direction
–Will be used in Phase 1 for discovery of files
Further metadata has to be extracted from the data
–E.g. number of spikes, type of spikes
–Applications exist that can process the EISCAT files and identify metadata (e.g. with FFT)
–These applications should be collected from the EISCAT community and exposed to OSGC as services
–Can be done in Phase 2
Diagram labels: EISCAT file (metadata part, data part); Metadata generator service 1 ... N; Open Source Geospatial Catalogue (OSGC); CESNET site (CZ); Catalogue; Phase 1: in ENVRI; Phase 2: in a H2020 project
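The two phases above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `EiscatRecord` class is a hypothetical stand-in for an EISCAT file (the real file layout is not described on the slide), and the spike counter uses a plain threshold crossing instead of the FFT-based analysis the slide mentions.

```python
from dataclasses import dataclass, field


@dataclass
class EiscatRecord:
    """Hypothetical stand-in for an EISCAT file: a metadata part plus a data part."""
    metadata: dict
    samples: list = field(default_factory=list)


def count_spikes(samples, threshold):
    """Count rising crossings of `threshold` (a simple stand-in for FFT-based analysis)."""
    spikes = 0
    above = False
    for value in samples:
        if value > threshold and not above:
            spikes += 1
        above = value > threshold
    return spikes


def catalogue_entry(record, threshold=5.0):
    """Merge metadata read from the file (Phase 1) with metadata derived from the data (Phase 2)."""
    entry = dict(record.metadata)
    entry["number_of_spikes"] = count_spikes(record.samples, threshold)
    return entry


record = EiscatRecord(
    metadata={"antenna_direction": "north"},
    samples=[0.1, 0.3, 7.2, 8.0, 0.2, 6.1, 0.4],
)
entry = catalogue_entry(record)
print(entry)
```

A metadata generator service in the sense of the slide would wrap a function like `catalogue_entry` and push the resulting entry to the catalogue; how OSGC ingests such entries is not covered here.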
5
ENVRI pilot setup 1
Diagram labels:
–On-site; Off-site
–EISCAT archive
–5m files, ~1TB in total
–Admin tools
–Drop box tool to upload data on-demand from client side
–Near Real Time tool to import data automatically from receiving stations
–Object Storage: Juelich site (DE), OpenStack SWIFT, CDMI with HTTP export
–Open Source Geospatial Catalogue (OSGC): CESNET site (CZ), Catalogue
–EGI Federated Cloud
–Scientific users; Data administrators
–Web browser; wget
–Metadata generator service 1 ... N
–Processing / visualisation service 1 ... N
–Phase 1: in ENVRI; Phase 2: in a H2020 project
6
ENVRI pilot setup 2
Diagram labels:
–On-site; Off-site
–EISCAT archive
–~5m files, ~1TB in total
–EUDAT storage: CSC (Jülich or STFC)
–EUDAT Safe Replication
–EUDAT Metadata Catalogue
–Scientific users; Data administrators
7
Use case 3: Long term preservation
Bit preservation, data preservation, metadata preservation and software preservation
Source: High Energy Physics (HEP), Digital Cultural Heritage Preservation (DCH-RP), EISCAT_3D, EMSO, EPOS
Potential solution(s):
–Data curation tools and frameworks, virtualized solutions for software testing?
–Zenodo for small datasets and software?
–EGI Applications Database for software?
–PURL for large datasets?
8
Use case 4: Services for citizen scientists
To receive, curate, store, integrate and share data of citizen scientists
Requires:
–Low/no barrier of submission
–Flexible curation services
–Cost recovery for contributors, etc.
Source: DRIHM, DCH-RP
Potential solution(s): EUDAT Simple Storage for DRIHM?
9
Use case 5: Data with access restrictions
Providing storage and processing services for data that has access restrictions (for ethical, legal or societal reasons)
Technology + legal arrangement. For example:
–Legal guarantee that no one besides the owner of the data accesses it.
–Technology guarantees that the data cannot be downloaded, only processed by certified VMs
Source: Life sciences, Economy?
Potential solution(s):
–Hosting confidential data in the EGI Federated Cloud, and allowing access only through certified Virtual Machine images? (~EBI Embassy Cloud)
–Legal arrangement through EGI.eu to guarantee data confidentiality?
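The technology guarantee described above (no download, processing only from certified VMs) could be expressed as a simple access rule. The sketch below is illustrative only: the image identifiers, the `AccessRequest` type and the two operation names are hypothetical, and a real deployment would also need to verify that a request truly originates from the image it claims.

```python
from dataclasses import dataclass

# Hypothetical identifiers of VM images that have passed certification.
CERTIFIED_IMAGES = frozenset({"vmi-certified-001", "vmi-certified-002"})


@dataclass(frozen=True)
class AccessRequest:
    image_id: str   # VM image the request claims to originate from
    operation: str  # "process" or "download"


def is_allowed(request, certified=CERTIFIED_IMAGES):
    """Allow in-place processing from certified images only; never allow download."""
    if request.operation != "process":
        return False
    return request.image_id in certified


print(is_allowed(AccessRequest("vmi-certified-001", "process")))
print(is_allowed(AccessRequest("vmi-certified-001", "download")))
print(is_allowed(AccessRequest("vmi-unknown-999", "process")))
```

The rule is deny-by-default: any operation other than processing is refused regardless of which image asks, which mirrors the "cannot be downloaded" requirement.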
10
Use case 6: Data preservation from science gateways
Support science gateways in transferring users’ computational results from the gateways to repositories
Once properly indexed with metadata, data can be preserved for the long term for later reuse and processing by external tools
Automated processes with user control (minimal user input)
Strong relation to use cases 1 and 3
Source: WeNMR (structural biology)
Potential solution(s): An API for EGI gateways on top of long term preservation services?
11
Use case 7: Open Data services
OpenAIRE: an electronic infrastructure for handling open access, peer-reviewed articles as well as other important forms of publications.
Will be compulsory for H2020 projects
EGI to provide storage capacity and value-added services for OpenAIRE?
12
Use case 8: Close compute and data in the cloud
Co-locate cloud storage and compute capacity
Run users’ VMs close to data
Source: BioVeL, ESA SSEP, ELIXIR
Potential solution: Broker + Open Search?
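As a rough illustration of what a broker could do for this use case, the sketch below picks a site that both hosts the requested dataset and has free compute capacity. The site names, the dictionary layout and the "most free cores wins" rule are all assumptions made for the example; the slide does not specify how the broker or Open Search would work.

```python
def choose_site(dataset, sites):
    """Return the name of a site that holds `dataset` and has free cores.

    Among qualifying sites, the one with the most free cores is preferred.
    Returns None if no site qualifies.
    """
    candidates = [
        site for site in sites
        if dataset in site["datasets"] and site["free_cores"] > 0
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda site: site["free_cores"])["name"]


# Hypothetical federation state.
sites = [
    {"name": "site-a", "datasets": {"genomes"}, "free_cores": 4},
    {"name": "site-b", "datasets": {"genomes", "imagery"}, "free_cores": 16},
    {"name": "site-c", "datasets": {"imagery"}, "free_cores": 0},
]

print(choose_site("genomes", sites))
print(choose_site("climate", sites))
```

Filtering on data location first and capacity second reflects the slide's priority: the VM should run where the data already is, rather than moving the data to wherever capacity happens to be free.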
13
Possible responses
EGI today:
–EGI Core Platform (X509, BDII, APEL, SAM)
–Grid platform (SRM, LFC, AMGA, ...)
–Federated Cloud platform (CDMI, OCCI)
Possible responses:
1. Extend the grid platform
2. Extend the cloud platform (standard interface)
3. New service in the cloud (hosted through OCCI)
4. Federate new services
5. Act as a technology provider
6. Do nothing
Diagram labels: EUDAT platform; AMGA++; NoSQL; X; MapReduce; Metadata portal; Software for community deployment
14
My suggested responses
Use case | Which strategy should EGI follow to support this use case? | Next step
1. Scalable, personal storage | 3. New service in the cloud: Bring in an external solution that builds on CDMI and could be hosted as SaaS. |
2. Metadata discovery | 3. New service in the cloud: OSGC service in the EGI Fed. Cloud. 4. Federate new services: EUDAT Metadata Catalogue, Storage and Safe Replication. | Evaluate the two pilots, define a sustainable setup for the long term.
3. Long term preservation | |
4. Services for citizen scientists | 4. Federate new services: EUDAT will develop a Simple Store service for the citizen scientists use case of DRIHM. Federate this. |
5. Data with access restrictions | |
6. Data preservation from science gateways | 5. Act as a technology provider: Assemble an API for the developers of science gateways. Build on long term preservation services. |
7. Open Data services | |
8. Close compute and data in the cloud | 2. Extend the cloud platform: Bring a ‘VMI broker’ service into the Federated Cloud. It should use and expose standard interfaces. |
15
Open questions
0. Additional use cases and technologies for consideration?
1. Data storage: What processes, policies and tools should EGI provide to help set up and implement sustainable data management plans?
2. PID infrastructure: Which one should be supported, and how?
3. UMD includes the cross-cutting services. Should we add new services to the UMD?
4. EUDAT: Which EUDAT services should be supported in EGI, and how?
5. Software injection: How can we operate an efficient and scalable software selection and integration process to enable the rapid injection of new software into the production infrastructure?
16
Thank you