Climate Data Analytics in a Big Data world

Slides:



Advertisements
Similar presentations
Climate Analytics on Global Data Archives Aparna Radhakrishnan 1, Venkatramani Balaji 2 1 DRC/NOAA-GFDL, 2 Princeton University/NOAA-GFDL 2. Use-case 3.
Advertisements

Z EGU Integration of external metadata into the Earth System Grid Federation (ESGF) K. Berger 1, G. Levavasseur 2, M. Stockhause 1, and M. Lautenschlager.
IS-ENES [ees-enes] InfraStructure for the European Network for Earth System Modelling IS-ENES will develop a virtual Earth System Modelling Resource Centre.
EGI-Engage EGI-Engage Engaging the EGI Community towards an Open Science Commons Project Overview 9/14/2015 EGI-Engage: a project.
Climate Sciences: Use Case and Vision Summary Philip Kershaw CEDA, RAL Space, STFC.
Initiatives toward Climate Services in France and in the European Communities C. Déandreis (CNRS/IPSL); M. Plieger and W. Som de Cerff (KNMI); Ph. Dandin,
CERFACS: Christian Pagé Laurent Terray Météo-France: Philippe Dandin Pascale Delecluse Serge Planton Jean-Marc Moisselin Maryvonne Kerdoncuff High resolution.
Data Publication and Quality Control Procedure for CMIP5 / IPCC-AR5 Data WDC Climate / DKRZ:
The New Zealand Institute for Plant & Food Research Limited Use of Cloud computing in impact assessment of climate change Kwang Soo Kim and Doug MacKenzie.
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No Processing services.
WP6/SA2: Access to IS-ENES Data Federation SA2 is a European distributed data infrastructure providing access to data from ESM simulations produced in.
1 Adventures in Web Services for Large Geophysical Datasets Joe Sirott PMEL/NOAA.
Welcome to the PRECIS training workshop
CIRCLE-2 Final Conference Adaptation Frontiers: Conference on European Climate Adaptation Research and Practice March 2014, Lisbon, Portugal The.
Next generation Science Gateways in the context of the INDIGO project: a pilot case on large scale climate-change data analytics Roberto Barbera, Riccardo.
Support to scientific research on seasonal-to-decadal climate and air quality modelling Pierre-Antoine Bretonnière Francesco Benincasa IC3-BSC - Spain.
Jost von Hardenberg ISAC-CNR, Torino, Italy with Paolo Davini, Susanna Corti, and many others EUDAT User Forum, Rome,Italy 3-4 February, 2016.
CAS2K11 in Annecy, France September 11 – 14, 2011 Data Infrastructures at DKRZ Michael Lautenschlager.
ICOS To collect high-quality observational data relevant to the greenhouse gas budget of Europe To make the ICOS data freely available to all interested.
An Open Data Platform in the framework of the EGI-LifeWatch Competence Centre Fernando Aguilar Jesús Marco
RI EGI-InSPIRE RI Earth science e-infrastructures workshop Diego Scardaci, EGI.eu Technical Outreach Expert.
Store and exchange data with colleagues and team Synchronize multiple versions of data Ensure automatic desktop synchronization of large files B2DROP is.
EGI-InSPIRE RI EGI Compute and Data Services for Open Access in H2020 Tiziana Ferrari Technical Director, EGI.eu
© Thomas Ludwig Prof. Dr. Thomas Ludwig German Climate Computing Center (DKRZ) University of Hamburg, Department for Computer Science (UHH/FBI) Disks,
WMO WIS strategy – Life cycle data management WIS strategy – Life cycle data management Matteo Dell’Acqua.
EGI-InSPIRE RI An Introduction to European Grid Infrastructure (EGI) March An Introduction to the European Grid Infrastructure.
1 This slide indicated the continuous cycle of creating raw data or derived data based on collections of existing data. Identify components that could.
Intentions and Goals Comparison of core documents from DFIG and Publishing Workflow IG show that there is much overlap despite different starting points.
Accessing the VI-SEEM infrastructure
PIDs in EUDAT Webinar, 15 Februari 2013
Ecological Niche Modelling in the EGI Cloud Federation
The BlueBRIDGE project
This work is licensed under the Creative Commons CC-BY 4.0 licence.
The EUDAT Services Suite
Approaches and Challenges in Managing Persistent Identifiers
Tokamak data mirror for JET and MAST Moving towards an open data repository for European nuclear fusion research.
AP7/AP8: Long-Term Archival of CMIP6 Data
World Conference on Climate Change October 24-26, 2016 Valencia, Spain
EUDAT’s engagement with the Earth Sciences
Alessandro Spinuso, Andreas Rietbrock, Andrè Gemuend,
Tools and Services Workshop
Joslynn Lee – Data Science Educator
Supporting Research on Biodiversity: LifeWatch on the Cloud
ICOS on-demand atmospheric transport computation A use case for interoperability of EGI and EUDAT services Ute Karstens, André Bjärby, Oleg Mirzov, Roger.
INTAROS WP5 Data integration and management
Data Ingestion in ENES and collaboration with RDA
Ideas for an ICOS Competence Centre Implementation of an on-demand computation service Ute Karstens, André Bjärby, Oleg Mirzov, Roger Groth, Mitch Selander,
DI4R, 30th September 2016, Krakow
Research Data Collections WG Plenary 9 Barcelona
EGI-Engage Engaging the EGI Community towards an Open Science Commons
Data Access and Re-use Carl Johan Håkansson EUDAT Service Area Manager
Monitoring of the infrastructure from the VO perspective
EISCAT-3D: a data centric design for extreme scale computing
DATA SPHINX & EUDAT Collaboration
CMIP6 / ENES Data TF Meeting: DKRZ
Production and use of regional climate model projections at the Swedish Meteorological and Hydrological Institute Erik Kjellström Rossby Centre,
ELIXIR Competence Center
European Research Data Services, Expertise & Technology Solutions
Applying Data Warehousing and Big Data Techniques to Analyze Internet Performance Thiago Barbosa, Renan Souza, Sérgio Serra, Maria Luiza and Roger Cottrell.
Copernicus Data in the EGI infrastructure
Quoting and Billing: Commercialization of Big Data Analytics
DATATURB Direct simulation data of turbulent flows
RDA uptake activities and plans: ESGF
EOSC-hub T6.1 Meeting 08 February 2018 Amsterdam Science Park
Digital Object Management for ENES: Challenges and Opportunities
Joining the EOSC Ecosystem
Expand portfolio of EGI services
EOSC-hub Contribution to the EOSC WGs
Photon & Neutron working meeting
Presentation transcript:

Climate Data Analytics in a Big Data world Christian Pagé Research Engineer Xavier Pivan Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (CERFACS), Toulouse, France

Motivations Scientific Perform efficient Data Analysis Large number of realizations (ensemble of scenarios) Uncertainties range estimation Process Higher spatial and temporal resolution Easily share intermediate results with collaborators Achieve a more robust and flexible Data Life Cycle More robust experiments setup Explore several experiment configurations to answer scientific questions Reproducible experiments

Motivations Societal Provide climate projections data to climate change impact researchers, facilitators, practitioners Ease access with better intuitive interfaces Provide more common data formats Generate tailored products from data processing workflows http://climate4impact.eu

Current situation Climate Research Community Data available for scientific analysis: a very large trend Limitations in data access means limitations in data analytics and scientific results Download locally then Analyze: a workflow that cannot be sustained Climate researchers Impact researchers

Practical Example: Climate Community Current situation Practical Example: Climate Community Temperature at 850 hPa field (Aggregated files 30 levels) 10 climate models 1960-1990 & 2040-2070 = 60 years = 21 915 days Daily fields = 1 field per day Global spatial scale 100 km resolution TOTAL: 6 754 500 fields to download ~100 Kb per 2D field = 626 Gb After the analysis post-processing Anomaly of the average of the two periods over a specific country for each climate model Result: 10 times 2D fields over a small domain Estimated datasize after post-processing: 1 Mb Federation Service Data reduction...

Climate Data Distribution: ESGF EGU 7.05.2014 ESGF Data Nodes 2015: 40 worldwide 18 in Europe (coordinated in IS-ENES) IS-ENES ESGF Portals BADC (UK) DKRZ (Germany) IPSL (France) SMHI (Sweden) CMCC (Italy) DMI (Denmark) IS-ENES climate4impact Portal KNMI (Netherlands) Interlinked with Uni. Cantabria downscaling portal (Spain) CLIPC Portal Climate Information Portal for Copernicus Ack: Michael Lautenschlager, DKRZ

Current situation Status CMIP5 data archive: 1.8 PB for 59000 data sets stored in 4.3 Mio Files in 23 ESGF data nodes CMIP5 data is about 50 times CMIP3 Extrapolation to CMIP6: CMIP6 has a more complex experiment structure than CMIP5. Expectations: more models, finer spatial resolution and larger ensembles Factor of 20: 36 PB in 86 Mio Files Factor of 50: 90 PB in 215 Mio Files

Simplified Prototype ENES Use Case Researcher finds data in B2SHARE using B2FIND, or provides PIDs/URLs 2. Researcher performs Data Analytics of selected data using GEF deployed on EGI FedCloud. Output is stored into EGI DataHub. 3. Results are sent back to B2SHARE/B2DROP for researcher to download, or execute another GEF for further calculations or to generate a figure. Resulting figure could be put into B2DROP.

Solutions: Putting it all together Bridge EUDAT / EGI / ESGF / IS-ENES EUDAT Workflow API (GEF)  EGI Federated Cloud EUDAT B2SAFE/B2STAGE/B2SHARE & B2DROP Services ESGF Computing API WPS IS-ENES Data Analytics Services => climate4impact.eu platform Detailed technical prototype schema follows.... along with needs wrt resources

Prototype overview EGI Federated cloud LOCALHOST Transfer with globus B2SAFE B2STAGE LOCALHOST B2SHARE/B2DROP GEF – User Interface Send request / calculation order Data URL or PID Transfer with globus Virtual Machine New data GEF backend deploy docker container Execute calculation command Docker Volume EGI Volume Data New data Data transfer New data In progress Calculation Input EGI Federated cloud Output EUDAT service

Infrastructure in progress EGI infrastructure IAAS: Virtual Machine – CPU / RAM Storage - Docker engine – GEF backend

EUDAT – EGI interoperability We have: IAAS structure VM: CPU – RAM – Storage Docker engine We want to use / set up: Globus Interoperability B2SAFE-B2STAGE / EGI