Onedata Eventually Consistent Virtual Filesystem for Multi-Cloud Infrastructures Michał Orzechowski (CYFRONET AGH)

PROBLEMS ADDRESSED BY ONEDATA
1. Multi-protocol transparent access to data (“[…] but we want POSIX”)
2. Heterogeneity of storage technologies
3. Replica management
4. Easy data sharing and publication (DIO)
5. Metadata management integrated with the data management platform
6. Flexible authentication and authorization
7. Easy integration with external services using APIs
8. High-throughput data processing

PROBLEM 1: MULTI-PROTOCOL TRANSPARENT ACCESS TO DATA IN MULTI-CLOUD ENVIRONMENTS
- Transparently access and create data in multi-cloud environments
- Care less about data locality: all your data are accessible wherever you go
- Use many protocols to access the same data

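ONEDATA for ONEWORLD (architecture diagram). Labels recoverable from the figure: Onezone (entry GUI, REST APIs, OIDC, SAML (in prep.), OAI-PMH, Kademlia DHT (in prep.)); storage back ends: POSIX, Ceph, S3, Swift, GridFTP (in prep.); data access: POSIX, CDMI, data management GUI, REST APIs, FTP/SFTP (in prep.), WebDAV (in prep.); two "NO TRUST" labels.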

[…] BUT WE WANT POSIX
- Support for most POSIX operations on a globally distributed virtual file system
- All data accessible via a unified file system mountable on virtual machines, Grid worker nodes and containers
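Because the virtual file system is mounted like any local directory, existing tools can work on it with ordinary file calls and no Onedata-specific code. A minimal Python sketch, assuming a hypothetical mount point `/mnt/onedata` (the path and the function name `total_size` are illustrative and do not come from the slides):

```python
import os


def total_size(root):
    """Sum the sizes of all regular files under root using plain POSIX calls."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # os.stat is an ordinary POSIX call; on a FUSE mount it is
            # answered by the file system client instead of a local disk.
            total += os.stat(path).st_size
    return total


if __name__ == "__main__":
    # Hypothetical mount point (assumption); use the actual mount directory.
    mount_point = "/mnt/onedata"
    if os.path.isdir(mount_point):
        print(total_size(mount_point))
```

The same function runs unchanged on a local directory, which is the point of exposing the data through POSIX.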

PROTOCOL HANDLERS (PLUGINS) (diagram). Labels recoverable from the figure: Oneclient (FUSE client); Onezone (entry GUI, Kademlia DHT (in prep.)); handlers: POSIX, CDMI, data management GUI, REST APIs, FTP/SFTP (in prep.), WebDAV (in prep.); HTTP access to GUI and REST.

PROBLEM 2: Heterogeneity of storage technologies
- Use the data protocols of your choice to access data wherever you go
- Reduce the difficulty data centre operators face in selecting the right storage technology
- Avoid cloud vendor lock-in

Different types of storage virtualized: POSIX, Ceph, OpenStack Swift

STORAGE SYSTEM DRIVERS (PLUGINS) (diagram). Labels recoverable from the figure: Oneclient (FUSE client); Onezone (entry GUI, Kademlia DHT (in prep.)); drivers: POSIX, Ceph, S3, Swift, GridFTP (in prep.); HTTP access to GUI and REST.

PROBLEM 3: REPLICA MANAGEMENT
- Replicate files on demand and on the fly without any additional effort
- Migrate data between sites on demand with a simple API
- Easily check the location of your data using the GUI or API

Replica Management SIMPLIFIED
- Manage files, not replicas
- File distribution between storage locations is hidden underneath the file structure
- Replica management on a chunk basis
- Missing chunks delivered on the fly
- API for replica management, for pre-staging and for implementing external data policy management
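Chunk-based replica management implies that a site must work out which parts of a file it lacks before serving a read. The following Python sketch illustrates that bookkeeping only; the function name `missing_ranges` and the representation of local chunks as half-open byte ranges are assumptions made for illustration, not Onedata's actual data structures.

```python
def missing_ranges(local_blocks, offset, size):
    """Return the byte ranges of [offset, offset + size) not held locally.

    local_blocks is a list of (start, end) half-open byte ranges that are
    already stored at this site (illustrative representation).
    """
    end = offset + size
    missing = []
    cursor = offset  # first byte of the request not yet accounted for
    for start, stop in sorted(local_blocks):
        if stop <= cursor:
            continue  # block ends before the part still to be covered
        if start >= end:
            break  # block starts after the requested range
        if start > cursor:
            missing.append((cursor, start))  # gap in front of this block
        cursor = max(cursor, stop)
        if cursor >= end:
            break
    if cursor < end:
        missing.append((cursor, end))  # tail of the request is not local
    return missing


# A read of 200 bytes at offset 50 when bytes 0-99 and 200-299 are local:
print(missing_ranges([(0, 100), (200, 300)], 50, 200))  # → [(100, 200)]
```

Only the returned ranges would need to be fetched from another site; the rest of the read can be served locally.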

PROBLEM 4: EASY DATA SHARING WITHOUT BORDERS
- Share large-scale data collections with other communities
- Enable your data to be shared in cross-federation scenarios
- Bring your data and tools as building blocks to the European Open Science Cloud

EASY DATA SHARING
- Team sharing
- Cross-community data sharing
- For groups
- For individuals
- Token based
- Instant and ad-hoc data sharing
- Thanks to effort supported by EGI Engage: open data publication, Handles (DOI) support, OAI-PMH

Open Data Platform workflow

PROBLEM 5: METADATA MANAGEMENT INTEGRATED WITH DATA MANAGEMENT PLATFORM
- Work with data and metadata in one system, avoiding consistency problems
- Monitor metadata changes through the API in order to feed external custom systems
- Advanced data discovery capabilities based on metadata

Integrated metadata management
- All files and directories can have custom user metadata
- API for metadata management
- API for data discovery based on metadata
- Virtual folders based on metadata tags
- Metadata formats: key-value, JSON, RDF
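Discovery by metadata and virtual folders built from tags both come down to looking up files by their attributes. The sketch below shows the idea with an in-memory index over key-value metadata; the class `MetadataIndex` and its methods are hypothetical names chosen for illustration and are not part of the Onedata API.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy in-memory index of key-value metadata attached to file paths."""

    def __init__(self):
        self._meta = {}                   # path -> {key: value}
        self._by_pair = defaultdict(set)  # (key, value) -> set of paths

    def set(self, path, key, value):
        """Attach or overwrite one metadata key on a file or directory."""
        entry = self._meta.setdefault(path, {})
        if key in entry:
            # Drop the stale index entry before storing the new value.
            self._by_pair[(key, entry[key])].discard(path)
        entry[key] = value
        self._by_pair[(key, value)].add(path)

    def find(self, **criteria):
        """Return sorted paths whose metadata matches every given pair."""
        sets = [self._by_pair.get(pair, set()) for pair in criteria.items()]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


index = MetadataIndex()
index.set("/eo/scene1.tif", "mission", "sentinel-1")
index.set("/eo/scene2.tif", "mission", "sentinel-2")
index.set("/eo/scene3.tif", "mission", "sentinel-1")
index.set("/eo/scene3.tif", "processed", "yes")

# A "virtual folder" is then just the result of a query:
print(index.find(mission="sentinel-1"))
print(index.find(mission="sentinel-1", processed="yes"))
```

The first query lists both Sentinel-1 scenes; the second narrows the result to the one that also carries the `processed` tag.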

PROBLEM 6: FLEXIBLE AUTHENTICATION AND AUTHORIZATION
- Control who knows about your data
- Control who can access data at the level of a single file

Authentication and authorization
- Pluggable methods of authentication per zone
- Multiple levels of access control
- ACLs on files and directories
- Group management
- Token-based authentication (macaroons)
- X.509 (in prep.)
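Macaroons are bearer tokens whose holder can add restrictions (caveats) without contacting the issuer, because each caveat is folded into the signature with a chained HMAC. The Python sketch below shows that general principle only. It is a simplified illustration, not Onedata's token format and not a complete macaroon implementation (no third-party caveats, no serialization); the function names and the caveat strings are assumptions.

```python
import hashlib
import hmac


def _chain(key, message):
    """One link of the HMAC chain: sign message with the previous signature."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def mint(root_key, identifier):
    """Issuer creates a token; only the issuer knows root_key."""
    return {"id": identifier, "caveats": [], "sig": _chain(root_key, identifier)}


def attenuate(token, caveat):
    """Any holder can add a caveat; it cannot be removed without root_key."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }


def verify(root_key, token, is_satisfied):
    """Issuer recomputes the chain and checks every caveat against the request."""
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        if not is_satisfied(caveat):
            return False
        sig = _chain(sig, caveat)
    return hmac.compare_digest(sig, token["sig"])


root_key = b"issuer-secret"  # illustrative key
token = attenuate(mint(root_key, "user-42"), "path = /space/shared")
print(verify(root_key, token, lambda c: c == "path = /space/shared"))  # → True
```

Stripping the caveat from the token changes the expected signature, so verification fails; that property is what makes delegating a restricted token safe.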

PROBLEM 7: EASY INTEGRATION USING API WITH EXTERNAL TOOLS
- Integrate external tools with the data management platform through rich APIs, and build more complex environments for data processing

RICH COLLECTION OF APIs
- APIs for all operations
- Flexible permission checking for APIs
- APIs for full, eventually consistent integration with external systems
- APIs fully described using Swagger, so clients can be generated from the API specification
- Easy-to-use command-line clients for the REST API

PROBLEM 8: High-throughput processing (diagram). Labels recoverable from the figure: protocols CDMI, S3, POSIX VFS; P2P links; control and remote data access; CDMI API; storage access, direct where possible; parallel processing nodes using POSIX (Oneclient), CDMI or REST; storage: Ceph, S3, Swift, Lustre.

THROUGHPUT TESTS: 55 Gbit/s on a single node with 5 parallel streams

High-throughput transfers
- Distributed priority queue for cluster-to-cluster transfers
- WAN transfers started by: user in the GUI, APIs, policy, access to remote data
- Block-based transfer: remote data access on the fly, pre-staging, data migration, data replication
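A priority queue lets block transfers that a user is actively waiting for overtake background work. The slide does not state the actual ordering, so the sketch below assumes one: on-the-fly remote access first, then pre-staging, then migration and replication. It is also a single-process illustration built on Python's `heapq`, whereas the slide describes a distributed queue; `TransferQueue` and `PRIORITY` are hypothetical names.

```python
import heapq
import itertools

# Assumed ordering (not from the slides): lower number is served first.
PRIORITY = {
    "on_the_fly": 0,   # a reader is blocked waiting for this block
    "pre_staging": 1,
    "migration": 2,
    "replication": 2,
}


class TransferQueue:
    """Single-process priority queue of block transfers (illustration only)."""

    def __init__(self):
        self._heap = []
        # Sequence numbers keep FIFO order among equal priorities.
        self._counter = itertools.count()

    def submit(self, kind, file_id, block):
        """Queue one block transfer; kind must be a key of PRIORITY."""
        entry = (PRIORITY[kind], next(self._counter), kind, file_id, block)
        heapq.heappush(self._heap, entry)

    def next_transfer(self):
        """Pop the most urgent transfer, or return None if the queue is empty."""
        if not self._heap:
            return None
        _priority, _seq, kind, file_id, block = heapq.heappop(self._heap)
        return kind, file_id, block


queue = TransferQueue()
queue.submit("replication", "file-a", (0, 1024))
queue.submit("on_the_fly", "file-b", (4096, 8192))
print(queue.next_transfer())  # → ('on_the_fly', 'file-b', (4096, 8192))
```

The on-the-fly request is served first even though it was submitted later, which is the behaviour a blocked reader needs.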

EXAMPLE USE CASE

Multi-Cloud Earth Observation Image Processing PoC (diagram). Labels recoverable from the figure: job queue; EOProc; SatCat; processor with Oneclient (FUSE client); EO-PoC spaces; job creation; job location reasoning; Sentinel 1 and Sentinel 2; application composition and deployment with Docker Compose; job deployment; results; VM management API; cloud orchestrator; sites: CESNET, Cyfronet, IPT Poland, AWS, each holding parts of the Sentinel 1 & 2 data; data location API; data replication, pre-staging, on-demand transfer.

QUESTIONS? Please visit: www.onedata.org
Michał Orzechowski (CYFRONET AGH)