
1 Big data analytics workflows for climate
Sandro Fiore, Ph.D.
Director of the Advanced Scientific Computing Division, Euro-Mediterranean Center on Climate Change (CMCC)
On behalf of the Ophidia Team
Research Data Alliance, Earth System Science Data Management BOF
Paris - September 24, 2015

2 Data analytics requirements and use cases
Requirements and needs focus on:
- Time series analysis
- Data subsetting
- Model intercomparison
- Multimodel means
- Massive data reduction
- Data transformation (through array-based primitives)
- Climate change signal
- Maps generation
- Ensemble analysis
- Workflow support (tens to hundreds of tasks)
- Metadata management support
- ...
These needs are addressed by the Ophidia big data analytics framework (a minimal usage sketch follows this list).
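To illustrate how some of these requirements (data subsetting, time series analysis) map onto the framework, the following is a minimal sketch using the PyOphidia Python bindings; the server endpoint, credentials and input file are placeholders, and the method names should be checked against the installed PyOphidia version.

# Illustrative sketch only (not from the slides): data subsetting and a
# time-series reduction via PyOphidia. Endpoint, credentials and the input
# file are placeholders.
from PyOphidia import cube

# Connect to an Ophidia server (placeholder endpoint/credentials)
cube.Cube.setclient(username="user", password="passwd",
                    server="ophidia.example.org", port="11732")

# Import a NetCDF variable as a datacube, with 'time' as the array dimension
tasmax = cube.Cube(src_path="/data/tasmax_day.nc",
                   measure="tasmax", imp_dim="time")

# Subset on latitude/longitude indexes, then reduce the time series (mean)
region = tasmax.subset(subset_dims="lat|lon", subset_filter="1:180|1:360")
tsmean = region.reduce(operation="avg")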

3 Ophidia in a nutshell
- Big data stack for scientific data analysis
- Use of parallel operators and parallel I/O
- Support for complex workflows / operational chains
- Extensible: simple API to support framework extensions such as new operators and array-based primitives (currently 50+ operators and 100+ primitives provided)
- Multiple interfaces available (WS-I, GSI/VOMS, OGC-WPS); programmatic access via C and Python APIs (see the Python sketch below)
- Support for both batch & interactive data analysis
Speaker note: In summary, the main contributions of this thesis include: ... and finally the implementation of the terminal ...
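A minimal sketch of the programmatic access path through the Python API is shown below; it assumes a PyOphidia installation and a reachable Ophidia server, with placeholder endpoint and credentials.

# Illustrative only: interactive access through the PyOphidia client.
from PyOphidia import client

# Open a session against an Ophidia server (placeholder endpoint/credentials)
ophclient = client.Client(username="user", password="passwd",
                          server="ophidia.example.org", port="11732")

# Submit a single operator; JSON workflows can be submitted analogously
# (e.g. via wsubmit) for batch-style operational chains
ophclient.submit("oph_list level=2")
print(ophclient.last_response)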

4 EUBrazilCC project
The main objective is the creation of a federated e-infrastructure for research using a user-centric approach. To achieve this, we need to pursue three objectives:
- Adaptation of existing applications to tackle new scenarios emerging from cooperation between Europe and Brazil, relevant to both regions.
- Integration of frameworks and programming models for scientific gateways and complex workflows.
- Federation of resources, to build up a general-purpose infrastructure comprising existing and heterogeneous resources.
Data analytics workflows on heterogeneous datasets including climate, remote sensing data and observations (e.g. NetCDF, LANDSAT, LiDAR).
EU Coordinator: Ignacio Blanquer-Espert, Universitat Politècnica de València, Spain
BR Coordinator: Francisco Vilar Brasileiro, Universidade Federal de Campina Grande, Brazil

5 Climate indicators processing in CLIP-C
- Processing chains for data analysis are being defined to compute climate indicators
- First set of indicators includes TNn, TNx, TXn, TXx (see the TNn sketch below)
- Input files: 6 GB (TasMin, TasMax)
- Workflows have already been implemented; the validation phase is ongoing
- Preliminary tests are ongoing, including on the private cloud environment of CMCC
Parallel approach:
- Inter-parallelism: multiple workflow branches are executed in parallel
- Intra-parallelism: data analysis operators have been parallelized too (e.g. MPI)
- Parallel I/O back-end
EURO4M-MESAN
Co-ordinator: Martin Juckes (STFC)
Website:
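As a concrete example of one such indicator, the following is a hedged sketch of a TNn-like computation (yearly minimum of daily minimum temperature) done server-side with PyOphidia; endpoint, credentials and paths are placeholders, and the parameter names should be verified against the installed PyOphidia version.

# Hedged sketch, not the project's actual workflow: TNn-like indicator.
from PyOphidia import cube

# Placeholder connection settings
cube.Cube.setclient(username="user", password="passwd",
                    server="ophidia.example.org", port="11732")

# Import daily minimum temperature (tasmin), with time as the array dimension
tasmin = cube.Cube(src_path="/data/tasmin_day.nc",
                   measure="tasmin", imp_dim="time")

# TNn: reduce the daily series to yearly minima (concept_level='y')
tnn = tasmin.reduce2(operation="min", dim="time", concept_level="y")

# Export the result to NetCDF for validation and map generation
tnn.exportnc2(output_path="/data/out", output_name="TNn")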

6 OFIDIA: Operational FIre Danger preventIon plAtform
OFIDIA's main objective is to build a cross-border operational fire danger prevention infrastructure that advances the ability of regional stakeholders across the Apulia and Ioannina regions to detect and fight forest wildfires.
Co-ordinator: Prof. G. Aloisio (CMCC)
Website:

7 OFIDIA: Operational FIre Danger preventIon plAtform
Zoom on the Fire Weather Index computation

8 OFIDIA: Operational FIre Danger preventIon plAtform
Zoom on the temperature forecast maps computation (forecast maps at 09:00 UTC and 12:00 UTC)

9 Workflow runtime execution for the processing of three fire danger indices

10 INtegrating Distributed data Infrastructures for Global ExplOitation
INDIGO-DataCloud is a project approved within the E-INFRA call of the Horizon 2020 framework programme of the European Union. It aims at developing a data/computing platform targeting scientific communities, deployable on multiple hardware platforms and provisioned over hybrid (private or public) e-infrastructures. It targets multiple case studies related to different domains, including "Climate Model Intercomparison Data Analysis":
- Interoperability with application-domain-specific software and services (e.g. IS-ENES/ESGF)
- Server-side approach for data analysis
- Two-level workflows to fully address the case study requirements
Co-ordinator: Dr. Davide Salomoni (INFN)
Website:

11 Thanks
Speaker notes:
- File system management (workspace with datacube objects)
- Metadata management -> provenance
- Parallel data warehouse, OLAP datacube primitives/operators (paradigm agnostic, so the framework enables both MapReduce-like operators and parallel ones with a stronger MPI component)
- Quite flexible hierarchy at the storage level: a convergence point between the SQL and NoSQL worlds
- Ophidia and Spark both fall in the big data area, but Ophidia has a stronger focus on multidimensional data, OLAP, high-performance paradigms, and scientific data/applications/dataflows
- SciDB: array-based primitives
- Rasdaman ---> EarthServer: abstraction ---> datacube; non-array data management, parallel I/O, embedded/native data partitioning and distribution; they fall in the same big data area; relies on high-performance database management techniques for I/O; it also provides a native in-memory engine, even though it is not in production yet
- Started in 2009; climate change is our primary use case; we are planning to exploit it in a bioinformatics project with the department of biology at our university
- Cloud deployment; WPS interface (PyWPS) - see the WPS sketch below
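As a hint of how the WPS interface mentioned above could be used, the following sketch invokes a WPS-exposed analytics process with OWSLib; the endpoint URL, the process identifier and the inputs are hypothetical placeholders, not an actual Ophidia/OFIDIA deployment.

# Assumption-laden sketch: calling a WPS-exposed process with OWSLib.
from owslib.wps import WebProcessingService

# Placeholder WPS endpoint; the constructor fetches the capabilities document
wps = WebProcessingService("http://ophidia.example.org/wps")
print([p.identifier for p in wps.processes])  # list the advertised processes

# Execute a hypothetical process that wraps an Ophidia command
execution = wps.execute("ophidia_operator",
                        inputs=[("request", "oph_list level=2")])
print(execution.status)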

12 Mapping a high-level use case description onto a concrete analytics workflow
- A Data Analytics Workflow Modelling Language (DAWML) has been defined
- Extensible schema jointly defined with application-domain scientists
- The schema allows the definition of abstract workflows (see the workflow sketch below)
Speaker note: papers + summary, to define a common background
Reference: Cosimo Palazzo, A. Mariello, Sandro Fiore, Alessandro D'Anca, Donatello Elia, Dean N. Williams, Giovanni Aloisio: A workflow-enabled big data analytics software stack for eScience. HPCS 2015:
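To make the abstract-to-concrete mapping tangible, here is a minimal sketch of a two-task analytics workflow expressed as a JSON-style document and submitted through the PyOphidia client; the field names follow the Ophidia workflow JSON schema as commonly documented and should be verified, and all paths, credentials and endpoints are placeholders.

# Hedged sketch: a two-task chain (import a NetCDF variable, then compute
# yearly minima), submitted as a workflow document.
import json
from PyOphidia import client

workflow = {
    "name": "tnn_demo",
    "author": "example",
    "abstract": "Import tasmin and compute yearly minima (TNn-like chain)",
    "exec_mode": "sync",
    "tasks": [
        {
            "name": "import",
            "operator": "oph_importnc",
            "arguments": ["src_path=/data/tasmin_day.nc",
                          "measure=tasmin", "imp_dim=time"],
        },
        {
            "name": "yearly_min",
            "operator": "oph_reduce2",
            "arguments": ["operation=min", "dim=time", "concept_level=y"],
            "dependencies": [{"task": "import", "type": "single"}],
        },
    ],
}

# Placeholder endpoint/credentials; wsubmit takes the workflow document
ophclient = client.Client(username="user", password="passwd",
                          server="ophidia.example.org", port="11732")
ophclient.wsubmit(json.dumps(workflow))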

