DI4R, 30th September 2016, Krakow

Presentation transcript:

Collection and Analysis of Ocean Big Data: Building the EMSODEV Data Management Platform using EGI Federated Cloud. DI4R, 30th September 2016, Krakow. Pasquale Andriani, pasquale.andriani@eng.it

About EMSODEV

EMSODEV scenario [Diagram: the Data Management Platform (DMP), showing data ingestion (real-time and asynchronous, with differing ingestion speed) and data access through the DMP tools]

DMP Design 1/2. ENVRI Reference Model v2.0 phases: data acquisition, data curation, data publishing, data processing, data use. The Computational Viewpoint (CV) of the model has been used to identify a standard set of components (CV Objects) and interfaces, which inspired the design of the EMSODEV DMP architecture in its different phases.

DMP Design 2/2. Mapping of ENVRI CV Objects to EMSODEV DMP architectural components.
[Architecture diagram: EMSODEV DATA MANAGEMENT PLATFORM, with components grouped under the phases data acquisition, data curation, data publishing, data processing and data use; slide annotation: Instrument Controller in the Data Acquisition]
<<external resource>>: EMSO Regional Nodes Data Files
<<experimental lab>>: EMSODEV API
<<security service>>: Authentication & Authorization Tool
<<virtual lab>>: DMP Tools
<<experimental lab>>: Data Analysis Tool
<<instrument controller>>: Sensor Observation Service
<<data transfer service>>: Transfer Flow Orchestrator
<<raw data collector>>: Push Transfer Flow, Pull Transfer Flow
<<data store controller>>: NoSQL DBs, Streaming Store Controller, Distributed File System, Time Series DB
<<catalogue service>>: Metadata and Service Repository
<<data exporter>>: Dataset Exporter
<<data importer>>: Processing Results Importer, Regional Node Importer
<<data stager>>: Stager Engine
<<process controller>>: Batch Processor Engine, Streaming Processor Engine
<<coordination service>>: Analysis Manager
<<data broker>>: Broker Engine

DMP infrastructure on EGI FedCloud.
EMSODEV DMP current prototype. Test VO: fedcloud.egi.eu. Cloud Compute: 8 VMs (8 CPUs + 16GB RAM + 40GB HD).
EMSODEV DMP (SLA requested in early August 2016). Production VO: vo.emsodev.eu. Cloud Compute: ~10 VMs (8 CPUs + 16GB RAM + 40GB HD). File Storage: 5 TB.

DMP Operation – Apache Ambari Dashboard for provisioning, managing, monitoring and securing the EMSODEV cluster hosting the EMSODEV DMP.

Data Acquisition and Curation. At this stage, two raw data collectors exist.
A Pull Transfer Flow: data is retrieved via the API exposed by an OGC SOS server available at the OBSEA observatory, located in Vilanova and managed by the Universitat Politècnica de Catalunya. [Diagram: SOS server API (GetCapabilities, DescribeSensor, GetObservation) supplying OBSEA data to the EMSODEV Data Management Platform]
A Push Transfer Flow: data is sent to a DMP service which “listens” for near-real-time updates to XML files describing sensor data and measurements.
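The three SOS operations named on this slide can be illustrated with a minimal sketch of how a pull collector might build its requests. This is not the EMSODEV implementation: it assumes the server supports the SOS 2.0 key-value-pair (KVP) GET binding, and the endpoint URL, offering, procedure and observed-property identifiers are hypothetical placeholders, since the slides do not give the real ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real OBSEA SOS URL is not given in the slides.
SOS_ENDPOINT = "https://sos.example.org/sos"


def build_sos_url(request, **params):
    """Build a KVP GET URL for one SOS operation."""
    query = {"service": "SOS", "request": request}
    query.update(params)
    return f"{SOS_ENDPOINT}?{urlencode(query)}"


def get_capabilities_url():
    # GetCapabilities negotiates the version through AcceptVersions.
    return build_sos_url("GetCapabilities", AcceptVersions="2.0.0")


def describe_sensor_url(procedure):
    # Asks for the sensor (procedure) description in SensorML 1.0.1.
    return build_sos_url(
        "DescribeSensor",
        version="2.0.0",
        procedure=procedure,
        procedureDescriptionFormat="http://www.opengis.net/sensorML/1.0.1",
    )


def get_observation_url(offering, observed_property, start, end):
    # Restricts the observations to a time window on phenomenonTime.
    return build_sos_url(
        "GetObservation",
        version="2.0.0",
        offering=offering,
        observedProperty=observed_property,
        temporalFilter=f"om:phenomenonTime,{start}/{end}",
    )


if __name__ == "__main__":
    # Identifiers below are made up for illustration only.
    print(get_capabilities_url())
    print(describe_sensor_url("example-ctd-sensor"))
    print(get_observation_url(
        "example-offering",
        "sea_water_temperature",
        "2016-09-01T00:00:00Z",
        "2016-09-02T00:00:00Z",
    ))
```

A collector would typically call GetCapabilities once to discover offerings, DescribeSensor to fetch sensor metadata, and then GetObservation periodically with a sliding time window.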

Data Publishing and Use. Real-time dashboard solution for time-series data analysis. After the storing phase, data is visualized using a real-time dashboard; in particular, we are testing Elasticsearch with Grafana. These two tools were chosen because Elasticsearch allows real-time data search and Grafana provides real-time summaries and charting. Both are open source under the Apache 2 license, one of the common license types in research projects.
Highlighted features: real-time data search; real-time advanced analytics; schema-free; real-time summary and charting; Apache 2 open source license; distributed, scalable, and highly available.
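As an illustration of how time-series observations could reach Elasticsearch so that Grafana can chart them, the sketch below builds a request body for the Elasticsearch `_bulk` API, which expects newline-delimited JSON: one action line followed by one document line per record. The index name and the document fields (`@timestamp`, `sensor`, `parameter`, `value`) are assumptions made for this example, not the EMSODEV schema. Older Elasticsearch versions, such as those current in 2016, also expect a document type in the action line or the URL.

```python
import json


def to_bulk_body(index_name, observations):
    """Serialize observations as an Elasticsearch _bulk body (NDJSON)."""
    lines = []
    for obs in observations:
        # Action line: index the following document into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(obs))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical observation; field names are illustrative only.
    sample = [
        {
            "@timestamp": "2016-09-30T10:00:00Z",
            "sensor": "example-ctd",
            "parameter": "temperature",
            "value": 17.4,
        }
    ]
    print(to_bulk_body("example-observations", sample), end="")
```

The body would be sent with a POST to the `_bulk` endpoint using the `application/x-ndjson` content type. A timestamp field of this kind is what allows Grafana to treat the index as a time series.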

DMP preliminary REST API Built and managed through: Swagger Editor Swagger UI Swagger CodeGen
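For readers unfamiliar with this toolchain, the sketch below shows the kind of Swagger 2.0 document that Swagger Editor edits, Swagger UI renders and Swagger CodeGen turns into client or server stubs. The slides do not describe the actual DMP API, so the title, base path, `/observations` path and its query parameter are purely hypothetical; only the top-level structure (`swagger`, `info`, `paths`) follows the Swagger 2.0 format.

```python
import json


def minimal_swagger_spec():
    """Return a minimal Swagger 2.0 document as a Python dict."""
    return {
        "swagger": "2.0",
        "info": {"title": "Illustrative DMP API", "version": "0.1.0"},
        "basePath": "/api",  # hypothetical
        "produces": ["application/json"],
        "paths": {
            # Hypothetical resource, not taken from the EMSODEV API.
            "/observations": {
                "get": {
                    "summary": "List observations (illustrative operation)",
                    "parameters": [
                        {
                            "name": "parameter",
                            "in": "query",
                            "required": False,
                            "type": "string",
                        }
                    ],
                    "responses": {
                        "200": {"description": "A list of observations"}
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # The JSON output can be pasted into Swagger Editor.
    print(json.dumps(minimal_swagger_spec(), indent=2))
```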

Advantages of using EGI FedCloud:
A ready-to-use IaaS on which to deploy on-demand IT services.
Easy VM and security management via OpenStack Horizon.
Scalable according to community needs (within the boundaries established through the SLA).
Secure VM access via a mechanism (VOMS credentials) based on proxy credentials issued and verified by EGI.
Fast and reliable support (ggus.eu trouble-ticketing and by mail).

Questions? Pasquale Andriani, Engineering Ingegneria Informatica SpA, Italy, pasquale.andriani@eng.it