PRODIGUER: a distribution node for GIEC/IPCC CMIP5 data
Sébastien Denvil, Pôle de Modélisation, IPSL (Friday 27 March)
Context: countdown to the GIEC/IPCC report
- End of 2009 to fall 2010: climate simulations
- End of 2010 (?): data distribution
- End of 2010 to early 2012: scientific publications
- Early 2013: publication of the GIEC/IPCC AR5 report (Assessment Report #5)
- October 2013: Nobel prize
Context: national and European projects
- PRODIGUER: project submitted in September 2008 to the GIS Climat
- In the wake of IS-ENES (FP7), a virtual Earth System Modelling resource centre, and of Metafor (FP7), a metadata standard for climate modelling
- Implementation of these tools at the national level and integration into the international effort
- Must be done in close collaboration with the national computing centres
ESG/CMIP5 timeline
- 2008: design and implement core functionality: browse and search, registration, single sign-on / security, publication, distributed metadata, server-side processing
- Early 2009: testbed. By early 2009 it is expected to include at least seven centres in the US, Europe and Japan: Program for Climate Model Diagnosis and Intercomparison (PCMDI, U.S.), National Center for Atmospheric Research (NCAR, U.S.), Geophysical Fluid Dynamics Laboratory (GFDL, U.S.), Oak Ridge National Laboratory (ORNL, U.S.), British Atmospheric Data Centre (BADC, U.K.), Max Planck Institute for Meteorology (MPI, Germany), The University of Tokyo Centre for Climate System Research (Japan)
- 2009: deal with system integration issues and develop the production system. By summer 2009, the hardware and software requirements will be provided to centres that want to be nodes
- 2010: modelling centres publish data; research and journal article submissions
- 2013: IPCC report
AR5 open issues
- What is the set of runs to be done and, derived from that, what data volumes can we expect?
- Expected participants: where will the data be hosted? Who is going to step up and host the data nodes, and provide the expected level of support in terms of manpower and hardware capability? This includes minimum software and hardware requirements for data-holding sites (e.g. FTP access and ESG authentication and authorization) and a skilled help-desk staff.
- The AR5 archive is to be globally distributed, with support for WG1, WG2 and WG3. Will there be a need for a central (or core) archive, and what will it look like?
- Replication of holdings: disaster protection, a desire to have a replica of the core data archive on every continent, etc.
- Number of users and level of access: scientists, policy makers, economists, health officials, etc.
Orders of magnitude
Climate models, centennial runs. Resolutions used:
- Atmosphere 2.5° (280 km): 144 x 143 x 39
- Ocean 2° (220 km): 180 x 149 x 31
Resulting volumes:
- Atm 2.5° / Ocean 2°: 20 GB/year, 300 years: 5.85 TB
- Atm 1.0° / Ocean 2°: 60 GB/year, 300 years: 17.5 TB
- Atm 0.5° / Ocean 0.5°: 400 GB/year, 30 years: 11.75 TB
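The per-configuration totals follow directly from the annual output rate and the run length; a minimal sketch of that arithmetic, assuming the quoted TB figures use binary units (1 TB = 1024 GB):

```python
# Rough check of the per-configuration volumes quoted above.
# Assumption: 1 TB = 1024 GB; rates and run lengths are those given on the slide.

configs = {
    "Atm 2.5 / Ocean 2.0": (20, 300),    # (GB per simulated year, years)
    "Atm 1.0 / Ocean 2.0": (60, 300),
    "Atm 0.5 / Ocean 0.5": (400, 30),
}

for name, (gb_per_year, years) in configs.items():
    total_tb = gb_per_year * years / 1024  # GB -> TB (binary)
    print(f"{name}: {total_tb:.2f} TB")
# -> about 5.86, 17.58 and 11.72 TB, matching the 5.85 / 17.5 / 11.75 TB quoted
```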
Global data amount
- Raw data, low bound: 565 TB
- Raw data, high bound: 1000 TB
- CMIP5 distribution (25-50% of raw): ( ) to ( ) TB
- Global storage (raw + distributed): ( ) TB
[Figure: LMDz 0.5° (50 km)]
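The blank entries left above can be derived from the stated bounds; a minimal sketch, assuming the distributed CMIP5 share is simply 25-50% of the raw volume and that global storage is the raw archive plus that distributed copy:

```python
# Illustration only: derive the missing table entries from the stated bounds.
raw_low, raw_high = 565, 1000          # TB, from the slide

dist_low = 0.25 * raw_low              # smallest plausible distributed share
dist_high = 0.50 * raw_high            # largest plausible distributed share

print(f"CMIP5 distribution: {dist_low:.0f} to {dist_high:.0f} TB")
print(f"Global storage:     {raw_low + dist_low:.0f} to {raw_high + dist_high:.0f} TB")
# -> roughly 141-500 TB distributed, 706-1500 TB in total
```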
Data management up to now
- Mainly centralised, stored on a SAN
- OPeNDAP access at the supercomputing centre
- Basic data retrieval system
- Access to raw data
- Security / authentication / restriction of data access: not an issue
- No on-demand post-processing
- No metadata integration
- No support for high-level database queries
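As an illustration of the OPeNDAP access mentioned above, a remote dataset can be opened and subset without downloading the whole file; a minimal sketch, assuming a hypothetical OPeNDAP endpoint and variable name:

```python
# Minimal OPeNDAP access sketch (URL and variable name are hypothetical).
# Only the requested slice is transferred, not the whole file.
from netCDF4 import Dataset

url = "http://dods.ipsl.example/thredds/dodsC/ipsl/historical/tas_monthly.nc"
ds = Dataset(url)                 # open the remote dataset over OPeNDAP
tas = ds.variables["tas"]         # e.g. near-surface air temperature
first_decade = tas[0:120, :, :]   # server returns only this hyperslab
print(first_decade.shape)
ds.close()
```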
Data management with Prodiguer
- Move the data as little as possible; keep it close to the supercomputing centres when possible
  - Data access protocols, strong links with the computing centres
- When data needs to be moved, do it quickly and with a minimum of human intervention
  - Management of storage resources, fast networks
- Keep track of what we have, in particular what is on deep storage
  - Metadata and data catalogues
- Exploit the federation of sites
  - Grid middleware, a data grid?
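To make the "keep track of what we have" and "federation of sites" points concrete, here is a conceptual sketch (not any real ESG or grid-middleware API; site names and dataset identifiers are invented) of a replica catalogue that records where each dataset lives and whether a copy sits on deep storage, so a transfer can be planned from the most convenient site:

```python
# Conceptual sketch of a replica catalogue for a federation of sites.
# Site names, dataset identifiers and the selection rule are illustrative only.
from dataclasses import dataclass

@dataclass
class Replica:
    site: str          # hosting centre, e.g. "IDRIS" or "CCRT"
    path: str          # location of the copy at that site
    on_tape: bool      # True if the copy is on deep (tape) storage

catalogue = {
    "IPSL-CM5.historical.r1i1p1.Amon": [
        Replica("IDRIS", "/arch/ipsl/cm5/historical/r1i1p1/Amon", on_tape=True),
        Replica("CCRT",  "/data/ipsl/cm5/historical/r1i1p1/Amon", on_tape=False),
    ],
}

def best_replica(dataset_id: str) -> Replica:
    """Prefer a disk-resident copy over one that must be staged from tape."""
    replicas = catalogue[dataset_id]
    return min(replicas, key=lambda r: r.on_tape)  # False (disk) sorts first

print(best_replica("IPSL-CM5.historical.r1i1p1.Amon").site)   # -> CCRT
```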