IST E-infrastructure shared between Europe and Latin America
Climate Application
Jose M. Gutierrez Valvanuz Fernandez Antonio S. Cofiño Fernando García Jesús Fernandez Richard Miguel San Martín Mauricio Carillo Gabriela Rosas Amelia Diaz Delia Acuña Rodrigo Abarca Claudio Baeza
UC-Spain, SENAMHI-Perú, UDEC-Chile
EGRIS-1, Itacuruçá (Brasil)
Enabling grid computing for climate model simulation: Challenges
Global circulation models provide a coarse description of the ocean and atmosphere (about 200 km resolution) and have to be linked to regional models to obtain useful representations over areas of interest.
CAM and WRF are open-source, state-of-the-art global and regional models, respectively. They need to be run in cascade.
(Diagram, CAM + WRF cascade: Sea surface temperature, CAM, WRF, output converter, NCAR Graphics library.)
Regional models depend on many parameters related to sub-grid physical processes (multi-parametric jobs).
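The "multi-parametric jobs" idea can be illustrated with a minimal sketch that enumerates one regional-model job per combination of physics options. The option names and values below are hypothetical placeholders, not the actual WRF settings used in this work.

```python
from itertools import product

# Hypothetical sub-grid physics options; names and values are illustrative only.
PHYSICS_OPTIONS = {
    "microphysics": [1, 2],
    "cumulus": [1, 2, 3],
    "boundary_layer": [1, 2],
}

def parameter_sweep(options):
    """Return one dict per combination of option values (one regional job each)."""
    names = sorted(options)
    combos = product(*(options[name] for name in names))
    return [dict(zip(names, values)) for values in combos]

jobs = parameter_sweep(PHYSICS_OPTIONS)
print(len(jobs), "regional jobs per global run")
print(jobs[0])
```

Each dictionary would correspond to one regional run driven by the same global output, so the number of jobs grows multiplicatively with the number of options.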
Enabling grid data access to simulations: Challenges
Binary files in meteorological formats (netCDF, GRIB, BUFR, HDF5, etc.) need to be partially accessed (e.g. a certain geographical region).
The THREDDS (Thematic Realtime Environmental Distributed Data Services) project is developing middleware to bridge the gap between data providers and data users.
A recent initiative, the Earth System Grid (ESG) project, has made an initial attempt to grid-enable this technology. To this end, OpenDAP data servers are included within the grid infrastructure, and data enter the grid storage elements when they are first requested from the OpenDAP servers.
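The "data enter the storage elements on first request" behaviour can be pictured with a toy cache. This is only a sketch of the idea: the class, the fetch function, and the dataset name are hypothetical, and no real OpenDAP or grid middleware call is made.

```python
class StorageElementCache:
    """Toy model of a storage element that is filled on first request."""

    def __init__(self, fetch_from_server):
        self._fetch = fetch_from_server   # callable: dataset name -> bytes
        self._replicas = {}               # dataset name -> locally stored copy
        self.server_requests = 0          # how often the remote server was contacted

    def get(self, name):
        # Only the first request for a dataset goes to the remote server.
        if name not in self._replicas:
            self.server_requests += 1
            self._replicas[name] = self._fetch(name)
        return self._replicas[name]

def fake_opendap_fetch(name):
    # Stand-in for a real remote read; returns placeholder bytes.
    return ("contents of " + name).encode()

se = StorageElementCache(fake_opendap_fetch)
first = se.get("cam_sample.nc")
second = se.get("cam_sample.nc")
print(se.server_requests)
```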
Enabling data mining applications on simulations: Challenges
The high-dimensional character of the data involved in climate simulations requires efficient data mining techniques to extract useful knowledge.
Unsupervised clustering partitions the simulation databases into characteristic weather or climate types (or groups) that govern the global dynamics.
The Self-Organizing Map (SOM) is one of the most popular clustering algorithms and is especially suitable for visualizing and modeling high-dimensional data.
The weather types can be locally projected to obtain statistical regional forecasts of variables of interest.
(Figure, right: precipitation at two different stations in Peru for an El Niño period.)
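To make the clustering step concrete, here is a minimal one-dimensional SOM in plain Python. It is a sketch under simplifying assumptions (linear decay of learning rate and neighbourhood width, synthetic two-cluster data), not the implementation used for the results on the slide.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weights are closest to x (squared Euclidean distance)."""
    dists = [sum((wk - xk) ** 2 for wk, xk in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a one-dimensional SOM and return the list of unit weight vectors."""
    rng = random.Random(seed)
    sigma0 = max(n_units / 2.0, 1.0)
    # Initialise each unit at a randomly chosen data sample.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood width
            bmu = best_matching_unit(weights, x)
            for u, w in enumerate(weights):
                # Gaussian neighbourhood on the unit index.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights

# Synthetic data: two well-separated groups standing in for two weather regimes.
rng = random.Random(1)
cold = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(30)]
warm = [[rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)] for _ in range(30)]
som = train_som(cold + warm, n_units=4)
types_cold = {best_matching_unit(som, x) for x in cold}
types_warm = {best_matching_unit(som, x) for x in warm}
print(sorted(types_cold), sorted(types_warm))
```

Each sample is assigned the index of its best-matching unit, which plays the role of the "weather type" label.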
Climate Cascade Demo
Ensemble prediction systems comprise multiple runs of a weather model with slightly different initial conditions and/or model parameterizations. The resulting simulations contain valuable information about the sampled sources of uncertainty.
(Diagram: Sea surface temperature, CAM, then WRF (par 1), WRF (par 2), … WRF (par n); one El Niño year, 365 simulations; SE; SOM.)
Compare the SOM distribution of each parameterization.
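One simple way to compare the SOM distributions of two parameterizations is to compare the relative frequency of each weather type. The sketch below uses the total variation distance as the comparison measure; that choice, and the made-up labels, are assumptions for illustration rather than the method used in the demo.

```python
from collections import Counter

def type_frequencies(bmu_indices, n_types):
    """Relative frequency of each weather type in one simulation."""
    counts = Counter(bmu_indices)
    total = len(bmu_indices)
    return [counts.get(k, 0) / total for k in range(n_types)]

def total_variation(p, q):
    """Half the L1 distance between two discrete distributions (0 means identical)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Made-up daily weather-type labels for two hypothetical parameterizations.
run_par1 = [0, 0, 1, 2, 2, 2, 3, 3]
run_par2 = [0, 1, 1, 1, 2, 3, 3, 3]
p1 = type_frequencies(run_par1, 4)
p2 = type_frequencies(run_par2, 4)
distance = total_variation(p1, p2)
print(p1, p2, distance)
```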
CAM: Community Atmosphere Model
The Community Atmosphere Model (CAM) is the latest in a series of global atmosphere models developed at NCAR for the weather and climate research communities.
–Grid size: 128 x 64 x 27 (X x Y x Z) grid points
–6 output time steps = 197 MB NetCDF, i.e. about 33 MB per time step
–This includes ALL default variables (32 3D + 56 2D)
–WRF only requires 5 3D and 9 2D variables as input (effective size: 5 MB/step = 620 MB/month with 6-hourly input), about 720 GB per 100 years
–1 simulated day takes 8 min, so 1 month takes 4 hours, 1 year 48 hours, 10 years 20 days, and 100 years about 7 months of computation
A case study simulating the climate of the past century will require a CAM job running for about 7 months, so checkpointing is an important feature.
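The figures above can be checked with a few lines of arithmetic. This sketch assumes 30-day months and 360-day years for the run-time and 100-year totals, and a 31-day month for the 620 MB figure, which is what the rounded numbers on the slide appear to correspond to.

```python
MIN_PER_SIM_DAY = 8   # wall-clock minutes per simulated day (from the slide)
MB_PER_STEP = 5       # effective MB per 6-hourly step needed by WRF
STEPS_PER_DAY = 4     # 6-hourly output

def wallclock_hours(sim_days):
    """Wall-clock hours needed to simulate the given number of days."""
    return sim_days * MIN_PER_SIM_DAY / 60.0

def input_volume_mb(sim_days):
    """MB of CAM output that WRF needs for the given number of simulated days."""
    return sim_days * STEPS_PER_DAY * MB_PER_STEP

grid_points = 128 * 64 * 27
month_hours = wallclock_hours(30)
century_days = wallclock_hours(100 * 360) / 24
month_mb = input_volume_mb(31)
century_gb = input_volume_mb(100 * 360) / 1000

print("grid points:", grid_points)
print("1 month:", month_hours, "h")
print("100 years:", century_days, "days of computation")
print("31-day month of WRF input:", month_mb, "MB")
print("100 years of WRF input:", century_gb, "GB")
```

Two hundred days of uninterrupted computation is roughly the "7 months" quoted, which is why restartable (checkpointed) jobs matter here.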
WRF: Weather Research and Forecasting Model
The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs, developed at NCAR with contributions from the research community.
–The current example uses grid dimensions 74 x 61 x 28 (X x Y x Z) grid points
–Time step: 1.5 min (40 steps/h)
–Iberian Peninsula region: grid 63x4x31 points; a 24 h simulation takes 10 min (multiple jobs for each CAM run)
–5.2 MB per time step, 1.1 GB per month, 1.5 TB per 100 years (3-hourly step)
A CAM job will produce multiple WRF jobs during the climate simulation. How will these jobs be triggered?
NetCDF (network Common Data Form)
NetCDF is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data.
CAM netCDF datasets will be accessed from WRF simulations, but CAM data are global and WRF only needs access to a subregion.
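Selecting a subregion of a global field comes down to finding the index ranges that cover a latitude/longitude box. The sketch below does this on an illustrative 128 x 64 grid and formats the result as inclusive `[start:stop]` index ranges, in the style of OPeNDAP constraint expressions. The evenly spaced latitudes, the variable name `TS`, the (time, lat, lon) dimension order, and the bounding box are assumptions made for the example.

```python
def index_range(coords, lo, hi):
    """First and last index of the ascending coordinate list that fall inside [lo, hi]."""
    inside = [i for i, c in enumerate(coords) if lo <= c <= hi]
    if not inside:
        raise ValueError("no grid points inside the requested interval")
    return inside[0], inside[-1]

def hyperslab(var, *ranges):
    """Format a variable name with one inclusive [start:stop] range per dimension."""
    return var + "".join("[%d:%d]" % r for r in ranges)

# Illustrative global grid: 128 longitudes and 64 evenly spaced latitudes.
lons = [i * 360.0 / 128 for i in range(128)]
lats = [-88.59375 + j * 2.8125 for j in range(64)]

# Hypothetical box: 20S to the equator, 278E to 292E.
lat_range = index_range(lats, -20.0, 0.0)
lon_range = index_range(lons, 278.0, 292.0)
expr = hyperslab("TS", (0, 3), lat_range, lon_range)
print(lat_range, lon_range, expr)
```

Only 7 x 5 of the 64 x 128 horizontal grid points fall in this box, which shows how much transfer a subregion request can save.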
NcML: The netCDF Markup Language
Metadata describing the contents of each dataset are extracted from the NetCDF datasets. All generated datasets will be suitable for searching and retrieval, helping scientists query the grid about past and ongoing simulations.
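As a rough illustration of metadata extraction, the sketch below parses a small hand-written NcML document with the Python standard library and collects dimensions, variables, and attributes into a dictionary that could be searched. The sample document (file name, variable, attribute) is made up for the example.

```python
import xml.etree.ElementTree as ET

NCML_NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}

# Hand-written sample; the location, dimensions, and variable are illustrative.
SAMPLE_NCML = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="cam_sample.nc">
  <dimension name="time" length="6" isUnlimited="true"/>
  <dimension name="lat" length="64"/>
  <dimension name="lon" length="128"/>
  <variable name="TS" shape="time lat lon" type="float">
    <attribute name="units" value="K"/>
  </variable>
</netcdf>"""

def summarize(ncml_text):
    """Extract dimensions and variables from an NcML document into plain dicts."""
    root = ET.fromstring(ncml_text)
    dims = {d.get("name"): int(d.get("length"))
            for d in root.findall("nc:dimension", NCML_NS)}
    variables = {}
    for v in root.findall("nc:variable", NCML_NS):
        attrs = {a.get("name"): a.get("value")
                 for a in v.findall("nc:attribute", NCML_NS)}
        variables[v.get("name")] = {
            "shape": v.get("shape").split(),
            "type": v.get("type"),
            "attributes": attrs,
        }
    return {"location": root.get("location"), "dimensions": dims, "variables": variables}

summary = summarize(SAMPLE_NCML)
print(summary)
```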
CAM + WRF in the GRID
(Diagram legend: working in the grid; working locally; not working yet. Components: CAM, Cam2wrf, WRFSI, WRF, Graphics, netCDF, XML, Catalog (LFC), MetaCatalog (AMGA).)
CAM and WRF are running in the EELA testbed as separate processes. Both produce and consume NetCDF datasets from the file catalog. Metadata from the datasets are generated in XML and processed for insertion into AMGA.
Structure of the current status
There is a "static" repository in the file catalog:
and another one with the updated modules:
The following structure will be created to run the CAM+WRF suite in the WN:
The output is the tar-ed 'output' directory stored in
Job Description
JDL file
Shell script executed by the WN
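For orientation, here is a minimal sketch that renders a generic job description of the kind used with gLite-style middleware. The attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox) are common JDL attributes. Every file name and argument is a hypothetical placeholder, and this is not the JDL file used for the CAM+WRF runs.

```python
def render_jdl(executable, arguments, input_files, output_files):
    """Render a simple JDL job description as text."""
    def quoted_list(items):
        return "{" + ", ".join('"%s"' % item for item in items) + "}"

    lines = [
        'Executable = "%s";' % executable,
        'Arguments = "%s";' % arguments,
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        "InputSandbox = %s;" % quoted_list(input_files),
        "OutputSandbox = %s;" % quoted_list(output_files),
    ]
    return "\n".join(lines)

# Hypothetical wrapper script and arguments, for illustration only.
jdl = render_jdl(
    executable="run_cascade.sh",
    arguments="par1",
    input_files=["run_cascade.sh"],
    output_files=["std.out", "std.err"],
)
print(jdl)
```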
What is expected from EGRIS
DAGs and checkpointable job submission:
–Restart of jobs with dependencies
Using the metadata catalog from worker nodes:
–Loading metadata with the AMGA API from the WN
–Integration of the metadata catalog and the dataset catalog
Data access protocol for datasets:
–OpenDAP service in the Storage Element
Development of a portal for job submission and monitoring:
–Authentication management from the portal
–Monitoring the status of jobs
–Retrieval of information from the metadata catalog
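The "restart of jobs with dependencies" requirement can be sketched as a dependency graph in which completed jobs are skipped on resubmission. The job names and the graph below are a hypothetical rendering of the cascade, and the standard-library `graphlib` module (Python 3.9+) stands in for whatever DAG support the middleware provides.

```python
from graphlib import TopologicalSorter

# Hypothetical cascade: each job maps to the set of jobs it depends on.
DEPENDENCIES = {
    "cam": set(),
    "cam2wrf": {"cam"},
    "wrf_par1": {"cam2wrf"},
    "wrf_par2": {"cam2wrf"},
    "som": {"wrf_par1", "wrf_par2"},
}

def jobs_to_run(dependencies, completed):
    """Return the unfinished jobs in an order that respects the dependencies."""
    order = TopologicalSorter(dependencies).static_order()
    return [job for job in order if job not in completed]

# After a failure, only the jobs that did not finish are resubmitted.
remaining = jobs_to_run(DEPENDENCIES, completed={"cam", "cam2wrf"})
print(remaining)
```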