Big data analytics workflows for climate


Big data analytics workflows for climate
Sandro Fiore, Ph.D., Director of the Advanced Scientific Computing Division, Euro-Mediterranean Center on Climate Change (CMCC)
On behalf of the Ophidia Team
Research Data Alliance, Earth System Science Data Management BOF
Paris, September 24, 2015

Data analytics requirements and use cases
Requirements and needs focus on:
- Time series analysis
- Data subsetting
- Model intercomparison
- Multimodel means
- Massive data reduction
- Data transformation (through array-based primitives)
- Climate change signal
- Maps generation
- Ensemble analysis
- Workflow support (tens to hundreds of tasks)
- Metadata management support
- …
Big data analytics framework: Ophidia
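Three of the requirements listed above (data subsetting, multimodel means, climate change signal) can be illustrated with a minimal sketch in plain Python. This is not Ophidia code: the model names, the values, and the helper functions `subset`, `multimodel_mean` and `change_signal` are all hypothetical, and the "signal" is assumed here to be a simple anomaly against a baseline mean.

```python
from statistics import fmean

# Hypothetical annual-mean temperatures (deg C) from three models
# on a shared time axis; the values are made up for illustration.
years = [2000, 2001, 2002, 2003]
models = {
    "model_a": [14.0, 14.2, 14.1, 14.5],
    "model_b": [13.8, 14.0, 14.3, 14.4],
    "model_c": [14.2, 14.1, 14.5, 14.6],
}

def subset(series, years, start, end):
    """Data subsetting: keep values whose year lies in [start, end]."""
    return [v for y, v in zip(years, series) if start <= y <= end]

def multimodel_mean(models):
    """Multimodel mean: average across models at each time step."""
    return [fmean(step) for step in zip(*models.values())]

def change_signal(series, n_baseline):
    """Assumed signal: anomaly against the mean of the first n_baseline steps."""
    baseline = fmean(series[:n_baseline])
    return [v - baseline for v in series]

ensemble = multimodel_mean(models)
signal = change_signal(ensemble, n_baseline=2)
recent = subset(ensemble, years, 2002, 2003)
```

In a real deployment these reductions would run server-side over multidimensional arrays rather than over Python lists; the sketch only shows the shape of the computation.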

Ophidia in a nutshell
- Big data stack for scientific data analysis
- Use of parallel operators and parallel I/O
- Support for complex workflows / operational chains
- Extensible: simple API to support framework extensions like new operators and array-based primitives (currently 50+ operators and 100+ primitives provided)
- Multiple interfaces available (WS-I, GSI/VOMS, OGC-WPS)
- Programmatic access via C and Python APIs
- Support for both batch and interactive data analysis
In summary, the main contributions of this thesis include: … and finally the implementation of the terminal…

EUBrazilCC project
The main objective is the creation of a federated e-infrastructure for research using a user-centric approach. To achieve this, we need to pursue three objectives:
- Adaptation of existing applications to tackle new scenarios emerging from cooperation between Europe and Brazil relevant to both regions.
- Integration of frameworks and programming models for scientific gateways and complex workflows.
- Federation of resources, to build up a general-purpose infrastructure comprising existing and heterogeneous resources.
Data analytics workflows on heterogeneous datasets including climate, remote sensing data and observations (e.g. NetCDF, LANDSAT, LiDAR)
EU Coordinator: Ignacio Blanquer-Espert, iblanque@dsic.upv.es, Universitat Politècnica de València, Spain
BR Coordinator: Francisco Vilar Brasileiro, fubica@dsc.ufcg.edu.br, Universidade Federal de Campina Grande, Brazil

Climate indicators processing in CLIP-C
- Processing chains for data analysis are being defined to compute climate indicators
- First set of indicators includes: TNn, TNx, TXn, TXx
- Input files: 6 GB (TasMin, TasMax)
- Workflows have already been implemented
- Validation phase is ongoing
- Preliminary tests are ongoing on the private cloud environment of CMCC
Parallel approach:
- Inter-parallelism: multiple branches are executed in parallel
- Intra-parallelism: data analysis operators have been parallelized too (e.g. MPI)
- Parallel I/O back-end
EURO4M-MESAN
Co-ordinator: Martin Juckes (STFC)
Website: http://www.clipc.eu/home
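The inter-parallelism idea above (independent branches executed in parallel) can be sketched for the four indicators as follows. The sketch assumes the usual ETCCDI-style reading of the names (TNn/TNx are the minimum/maximum of daily minimum temperature over a period, TXn/TXx the minimum/maximum of daily maximum temperature); the input values, the `branches` table and `run_branch` are hypothetical and are not taken from the CLIP-C workflows. Intra-parallelism (e.g. MPI inside each operator) is not shown.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical daily series (deg C) for one grid point over one period.
tasmin = [2.1, -0.5, 1.3, 3.8, 0.2]
tasmax = [9.4, 7.0, 8.8, 12.1, 6.5]

# Each branch pairs an input variable with a reduction.
branches = {
    "TNn": (tasmin, min),  # minimum of daily minimum temperature
    "TNx": (tasmin, max),  # maximum of daily minimum temperature
    "TXn": (tasmax, min),  # minimum of daily maximum temperature
    "TXx": (tasmax, max),  # maximum of daily maximum temperature
}

def run_branch(item):
    """Run one independent branch and return (indicator name, value)."""
    name, (series, reduce_fn) = item
    return name, reduce_fn(series)

# Inter-parallelism: the four branches share no state, so they can
# be submitted to a pool and executed concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    indicators = dict(pool.map(run_branch, branches.items()))
```

A thread pool is used only to keep the example self-contained; for CPU-bound reductions over multi-gigabyte inputs, the parallelism would come from the analytics back-end rather than from Python threads.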

OFIDIA: Operational FIre Danger preventIon plAtform
OFIDIA's main objective is to build a cross-border operational fire danger prevention infrastructure that advances the ability of regional stakeholders across the Apulia and Ioannina regions to detect and fight forest wildfires.
Co-ordinator: Prof. G. Aloisio (CMCC)
Website: http://www.ofidia.eu/

OFIDIA: Operational FIre Danger preventIon plAtform Zoom on the Fire Weather Index computation

OFIDIA: Operational FIre Danger preventIon plAtform Zoom on the temperature forecast maps computation (panels: 09:00 UTC, 12:00 UTC)

Runtime execution of the workflow that processes three fire danger indices: https://www.youtube.com/watch?v=vxbYF1Zhpuc&feature=youtu.be

INDIGO-DataCloud: INtegrating Distributed data Infrastructures for Global ExplOitation
INDIGO-DataCloud is a project approved within the E-INFRA-1-2014 call of the Horizon 2020 framework programme of the European Community. It aims to develop a data/computing platform targeting scientific communities, deployable on multiple hardware platforms and provisioned over hybrid (private or public) e-infrastructures. It targets multiple case studies related to different domains, including "Climate Model Intercomparison Data Analysis" …
- Interoperability with application-domain-specific software and services (e.g. IS-ENES/ESGF)
- Server-side approach for data analysis
- Two-level workflows to fully address the case study requirements
Co-ordinator: Dr. Davide Salomoni (INFN)
Website: https://www.indigo-datacloud.eu/

Thanks
- File system management (workspace with datacube objects)
- Metadata management -> provenance
- Parallel data warehouse, OLAP datacube primitives/operators (paradigm agnostic, so the framework enables both MapReduce-like operators and parallel ones with a stronger MPI component)
- Hierarchy at the storage level quite flexible
Convergence point between the SQL and NoSQL worlds
Ophidia and Spark fall in the big data area, but Ophidia has a stronger focus on multidimensional data, OLAP, high performance paradigms, scientific data/applications/dataflows.
SciDB array-based primitives
Rasdaman ---> Earth Server
abstraction --> datacube
non array data management, parallel I/O, embedded/native data partitioning, distribution
fall in the same big data area
relies on high performance database management techniques for I/O; it also provides a native in-memory engine, even though it is not in production yet
started in 2009
Climate change is our primary use case; we are planning to exploit it in a bioinformatics project with the department of biology at our university
deploy on cloud
WPS, PyWPS

Mapping a high-level use case description onto a concrete analytics workflow
- A Data Analytics Workflow Modelling Language (DAWML) has been defined
- Extensible schema jointly defined with application-domain scientists
- The schema allows the definition of abstract workflows
papers + summary
To define a common background
Cosimo Palazzo, A. Mariello, Sandro Fiore, Alessandro D'Anca, Donatello Elia, Dean N. Williams, Giovanni Aloisio: A workflow-enabled big data analytics software stack for escience. HPCS 2015: 545-552
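To make the notion of an abstract workflow more concrete, here is a minimal sketch of a workflow described as tasks with dependencies, from which a valid execution order is derived. The field names (`name`, `operator`, `dependencies`), the task names and the `execution_order` helper are assumptions made for illustration; they are not the actual DAWML schema.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical abstract workflow; field and task names are illustrative,
# not the actual DAWML schema.
workflow = {
    "name": "tnn_indicator",
    "tasks": [
        {"name": "import_tasmin", "operator": "import", "dependencies": []},
        {"name": "subset_region", "operator": "subset", "dependencies": ["import_tasmin"]},
        {"name": "reduce_min", "operator": "reduce", "dependencies": ["subset_region"]},
        {"name": "export_map", "operator": "export", "dependencies": ["reduce_min"]},
        {"name": "export_metadata", "operator": "metadata", "dependencies": ["reduce_min"]},
    ],
}

def execution_order(workflow):
    """Return task names in an order that respects every dependency."""
    graph = {t["name"]: set(t["dependencies"]) for t in workflow["tasks"]}
    return list(TopologicalSorter(graph).static_order())

order = execution_order(workflow)
```

The two export tasks depend only on `reduce_min`, so a scheduler would be free to run them as parallel branches; the relative order in which they appear in `order` is not significant.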