GFDL Data Portal Update: Curator DB Approach
S. Nikonov, V. Balaji, K. Dixon (GFDL)
The 5th GO-ESSP Workshop, June 19-21 2006, LLNL

1 GFDL Data Portal Update: Curator DB Approach

2 Outline
- GFDL Data Portal hardware upgrade
- Data Portal statistics
- Metadata database design for Data Portal use and for the whole modeling process

3 Data Portal Hardware Upgrade
- Dell PowerEdge 2850
- Two Intel 3.2 GHz Xeon processors
- 2 GB RAM
- 300 GB system disk
- Two QLogic QLA2340 Fibre Channel controllers (2 Gb/s)
- Red Hat Enterprise Linux 4.0 ES operating system
- Ten StorageTek FlexLine FLC200 Fibre Channel disk arrays
- Fourteen 250 GB SATA drives per array
- 140 drives, 35 TB raw in total (27 TB usable)
- Data transfer and processing speed increased by 40%
- Plan: double the storage capacity every 2 years
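The capacity figures on this slide can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB), which is how drive vendors quote capacity. It recomputes the raw capacity, then projects usable capacity under the stated plan of doubling every two years. The projection only illustrates that plan and is not a published forecast. The slide does not break down the gap between raw and usable capacity.

```python
# Storage arithmetic for the upgraded Data Portal (inputs taken from the slide).
ARRAYS = 10
DRIVES_PER_ARRAY = 14
DRIVE_GB = 250          # decimal gigabytes, as drive vendors quote them
USABLE_TB = 27          # usable capacity quoted on the slide

drives = ARRAYS * DRIVES_PER_ARRAY
raw_tb = drives * DRIVE_GB / 1000          # GB -> TB (decimal)
usable_fraction = USABLE_TB / raw_tb       # share of raw capacity that is usable


def projected_usable_tb(years: float, doubling_period_years: float = 2.0) -> float:
    """Usable capacity after `years` if it doubles every `doubling_period_years`."""
    return USABLE_TB * 2 ** (years / doubling_period_years)


print(drives, raw_tb, round(usable_fraction, 2))
for y in (2, 4, 6):
    print(y, projected_usable_tb(y))
```

The code reproduces the slide's 140 drives and 35 TB raw. About 77% of the raw capacity is usable. Under the doubling plan, usable capacity would reach 54, 108 and 216 TB after 2, 4 and 6 years.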

4 Data Statistics, 01-Oct-2004 to 1-June-2006
- Total amount of data: 8 TB (a 50% increase over one year)
- 12,500 NetCDF files, average file size 650 MB
- Distinct files requested: 6,000
- Distinct hosts served: ~1,200
- Data transferred: ~20 TB (a twofold increase)
- Average data transferred per day: ~25 GB
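These statistics can be checked for consistency with a few lines of arithmetic, assuming decimal units. The file count times the average file size should reproduce the total holdings. The other two figures relate the requests and the transfer volume to the size of the archive.

```python
# Consistency checks on the Data Portal statistics (inputs taken from the slide).
FILES = 12_500
AVG_FILE_MB = 650
DISTINCT_FILES_REQUESTED = 6_000
TOTAL_TB_QUOTED = 8
TRANSFERRED_TB = 20     # "~20 TB" on the slide

estimated_total_tb = FILES * AVG_FILE_MB / 1_000_000    # MB -> TB (decimal)
requested_fraction = DISTINCT_FILES_REQUESTED / FILES    # share of files ever requested
turnover = TRANSFERRED_TB / TOTAL_TB_QUOTED              # volume served / volume held

print(f"estimated holdings: {estimated_total_tb:.3f} TB")
print(f"files requested at least once: {requested_fraction:.0%}")
print(f"served/held ratio: {turnover:.1f}")
```

The estimate of about 8.1 TB agrees with the quoted 8 TB. Roughly half of the files (48%) were requested at least once. The volume served is about two and a half times the volume held.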

5 Metadata Database Design
- Progress has already been made here at CGAM (the NMM Suite), and the Curator project is partly devoted to developing metadata standards for models and model output. Those ideas and discussions were extremely useful for our design.
- For comprehensive data analysis, the Data Portal should describe not only the data but also how the data were generated.
- It should use the same metadata database as the modeling system (Flexible Runtime Environment). This database is the element that joins the whole system together.
- Analyzing existing data through the Data Portal will help modelers improve models and plan new experiments.
- The Data Portal can therefore be considered a subsystem of the modeling system rather than a separate, independent system.

6 Common functionality scheme of the modeling system

7 Metadata database usage at different stages of the modeling process
[Figure: Metadata Database; stages: Component Building, Model Composition, Experiment Preparation, Postprocessing Plan, Data Portal Service]

8 Main database compartments and their relationships
[Figure: 1. Domains, 2. Physical Processes, 3. Algorithmization, 4. Composition, 5. Simulation]

9 Scheme Rationales
- Process Domains: the arenas in which physical processes act.
- Physical Processes: descriptions of the accepted theoretical approaches to the processes considered in modeling.
- Algorithmization: describes the program modules that implement elementary physical processes.
- Composition: components, couplers, drivers and the technical environment.
- Simulation: describes model output data and their location, including all accompanying administrative information.
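The chain of compartments can be made concrete with a hypothetical, heavily simplified object model in Python. Each class stands for one compartment. Following references from a run back to its domains illustrates the point of slide 5: the portal should describe how data were generated. All class and field names are illustrative and are not the Curator DB schema.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified classes: one per Curator DB compartment.


@dataclass(frozen=True)
class Domain:                 # 1. Process Domains
    name: str


@dataclass(frozen=True)
class PhysicalProcess:        # 2. Physical Processes (identified by name + domain)
    name: str
    domain: Domain


@dataclass(frozen=True)
class Module:                 # 3. Algorithmization (code implementing a process)
    name: str
    process: PhysicalProcess
    cvs_tag: str


@dataclass
class Component:              # 4. Composition (physical component or coupler)
    name: str
    kind: str                 # "physical" or "coupler"
    modules: list[Module] = field(default_factory=list)


@dataclass
class Realization:            # 5. Simulation (one run of an experiment)
    experiment: str
    components: list[Component]
    output_files: list[str] = field(default_factory=list)


def provenance(run: Realization) -> set[tuple[str, str, str]]:
    """(domain, process, CVS tag) triples behind a run's output data."""
    return {(m.process.domain.name, m.process.name, m.cvs_tag)
            for c in run.components for m in c.modules}


# Illustrative objects; every name here is hypothetical.
atmosphere = Domain("atmosphere")
radiation = PhysicalProcess("radiation", atmosphere)
rad_module = Module("radiation_module", radiation, "example_tag")
atmos = Component("atmos_component", "physical", [rad_module])
run = Realization("example_experiment", [atmos], ["atmos.nc"])
```

Here `provenance(run)` returns `{("atmosphere", "radiation", "example_tag")}`. In other words, an output file can be traced back through component, module and process to the domain it describes.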

10 Process Domains
Process domains define the phase spaces of the equations that express physical phenomena in mathematical form. They also serve as containers for elements (gases, aerosols, clouds). A domain description contains common descriptions and the sets of elements that constitute the domain. Examples: the atmosphere or ocean 3D space for dynamics.
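A domain acting as a container of constituent elements maps naturally onto a parent table and a child table. The minimal sketch below uses SQLite as a stand-in database. It borrows the table names Domains and DomConstituents from the compartment-structure slide, but all column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Domains (
    domain_id   INTEGER PRIMARY KEY,
    name        TEXT UNIQUE NOT NULL,   -- e.g. 'atmosphere'
    phase_space TEXT,                   -- e.g. '3D space (dynamics)'
    description TEXT
);
CREATE TABLE DomConstituents (
    domain_id   INTEGER NOT NULL REFERENCES Domains(domain_id),
    element     TEXT NOT NULL,          -- e.g. 'gases', 'aerosols', 'clouds'
    description TEXT,
    PRIMARY KEY (domain_id, element)    -- an element appears once per domain
);
""")


def add_domain(conn, name, phase_space, elements):
    """Record a domain together with the elements it contains."""
    cur = conn.execute("INSERT INTO Domains (name, phase_space) VALUES (?, ?)",
                       (name, phase_space))
    conn.executemany("INSERT INTO DomConstituents (domain_id, element) VALUES (?, ?)",
                     [(cur.lastrowid, e) for e in elements])


def constituents(conn, domain_name):
    """Elements contained in the named domain, alphabetically."""
    rows = conn.execute("""
        SELECT c.element FROM DomConstituents c JOIN Domains d USING (domain_id)
        WHERE d.name = ? ORDER BY c.element""", (domain_name,))
    return [element for (element,) in rows]


add_domain(conn, "atmosphere", "3D space (dynamics)", ["gases", "aerosols", "clouds"])
add_domain(conn, "ocean", "3D space (dynamics)", [])
```

With these rows, `constituents(conn, "atmosphere")` returns `["aerosols", "clouds", "gases"]`.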

11 Physical Processes
- Contains theoretical assumptions, a full description, references and other information specific to the process.
- A process is identified by its name and the domain in which it acts.
- Each kind of process is described in its own table.
- All process tables share a common subset of fields: process id, process name, domain and full description. The remaining fields are process-specific.
- Process name and domain together are one of the criteria used to prevent the same process from being included twice in a component or coupled model.
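The two ideas on this slide can be sketched in SQLite: process tables that share a common set of columns, and identification by name plus domain. A `UNIQUE (process_name, domain)` constraint is used here as one possible way to enforce the identification rule. Radiation and Dynamics are table names from the compartment-structure slide. The column names and the process-specific fields are hypothetical.

```python
import sqlite3

# Columns every process table shares (per the slide); names are hypothetical.
COMMON_COLUMNS = [
    "process_id INTEGER PRIMARY KEY",
    "process_name TEXT NOT NULL",
    "domain TEXT NOT NULL",
    "full_description TEXT",
]


def create_process_table(conn, table, specific_columns):
    """Create one process table: shared columns plus process-specific ones.

    `table` and the column definitions must come from trusted code, because
    SQL identifiers cannot be passed as query parameters.
    """
    columns = COMMON_COLUMNS + list(specific_columns)
    columns.append("UNIQUE (process_name, domain)")
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")


def add_process(conn, table, name, domain, description=""):
    """Insert a process; return False if (name, domain) is already recorded."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                f"INSERT INTO {table} (process_name, domain, full_description) "
                "VALUES (?, ?, ?)",
                (name, domain, description),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
create_process_table(conn, "Radiation", ["spectral_bands INTEGER"])    # hypothetical field
create_process_table(conn, "Dynamics", ["vertical_coordinate TEXT"])   # hypothetical field

first = add_process(conn, "Radiation", "shortwave", "atmosphere")
duplicate = add_process(conn, "Radiation", "shortwave", "atmosphere")
other_domain = add_process(conn, "Radiation", "shortwave", "ocean")

# Because the columns are shared, all process tables can be listed uniformly.
all_processes = conn.execute(
    "SELECT process_name, domain FROM Radiation "
    "UNION ALL SELECT process_name, domain FROM Dynamics ORDER BY 1, 2"
).fetchall()
```

The second insert of ("shortwave", "atmosphere") is rejected. The same name in a different domain is accepted, because name and domain together identify a process.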

12 Algorithmization
- Process codebase: the set of modules implementing a process, including a description of its input data (namelists and datasets), accompanied by a CVS tag.
- Numerical artifices: the set of modules implementing numerical smoothing (filters, artificial viscosity, general algorithms, etc.).
- Tracer models: descriptions that point to the field table files associated with the tracers.
- Grid specs
- Boundary conditions
- Namelists and datasets (model parameters) and field tables (tracers): their locations, versions, descriptions and checksums.
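Recording a location, version and checksum for each namelist, dataset or field table makes it possible to detect when an input file has changed since it was registered. The sketch below assumes a hypothetical `InputFileRecord` row layout. SHA-256 is used as the checksum; the slide does not say which checksum the database stores.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class InputFileRecord:
    """Hypothetical row describing a namelist, dataset or field table file."""
    kind: str          # "namelist", "dataset" or "fieldtable"
    location: str      # absolute path
    version: str       # e.g. a CVS tag
    description: str
    checksum: str      # SHA-256 hex digest


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_record(path: Path, kind: str, version: str, description: str) -> InputFileRecord:
    return InputFileRecord(kind, str(path.resolve()), version, description, file_checksum(path))


def is_unchanged(record: InputFileRecord) -> bool:
    """True if the file at the recorded location still matches its checksum."""
    return file_checksum(Path(record.location)) == record.checksum


# Usage with a throwaway namelist (contents are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    nml = Path(tmp) / "input.nml"
    nml.write_text("&example_nml\n  run_length_days = 30\n/\n")
    record = make_record(nml, "namelist", "example_tag", "run length settings")
    before_edit = is_unchanged(record)
    nml.write_text("&example_nml\n  run_length_days = 31\n/\n")
    after_edit = is_unchanged(record)
```

Editing one parameter in the namelist changes its checksum, so `is_unchanged` reports the file as modified.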

13 Composition
- The main actors here are components.
- A component can be of two types: a physical component or a coupler.
- A component consists of modules.
- The modules that make up a component are determined by the physical processes that are to participate in the final model. These sets of modules are described in the Algorithmization part of the database.
- Another entity of the Composition compartment is the driver: a program unit responsible for running components (individually or as a whole coupled model).
- A component is the minimal unit that a driver can run.
- Components have a PMIOD description, which the system should use to decide whether components are compatible. Another criterion applied at the component-building stage is that a component or coupled model must not contain the same process of the same domain twice.
- The Coupled Model table describes the set of components that make up the final coupled model.
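The two composition rules can be sketched as a validation function: no duplicate (process, domain) pair across a coupled model, and compatible components. A real PMIOD is much richer than this. Here it is replaced by a hypothetical simplification in which each component lists the fields it needs and the fields it provides, and a coupled model counts as compatible when every needed field is provided by some member.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    processes: list[tuple[str, str]]                # (process name, domain) of its modules
    inputs: set[str] = field(default_factory=set)   # simplified stand-in for PMIOD inputs
    outputs: set[str] = field(default_factory=set)  # simplified stand-in for PMIOD outputs


def duplicate_processes(components):
    """(process, domain) pairs that occur more than once across the model."""
    counts = Counter(p for c in components for p in c.processes)
    return sorted(p for p, n in counts.items() if n > 1)


def unmet_inputs(components):
    """Per component, the input fields that no member component provides."""
    provided = set().union(*(c.outputs for c in components))
    return {c.name: sorted(c.inputs - provided)
            for c in components if c.inputs - provided}


def check_coupled_model(components):
    """Return human-readable problems; an empty list means the model passes."""
    problems = []
    for proc, dom in duplicate_processes(components):
        problems.append(f"process {proc!r} in domain {dom!r} appears more than once")
    for name, missing in unmet_inputs(components).items():
        problems.append(f"{name}: no component provides {', '.join(missing)}")
    return problems


# Hypothetical components and field names, for illustration only.
atmos = Component("atmos", [("radiation", "atmosphere"), ("convection", "atmosphere")],
                  inputs={"sst"}, outputs={"wind_stress", "heat_flux"})
ocean = Component("ocean", [("dynamics", "ocean")],
                  inputs={"wind_stress", "heat_flux"}, outputs={"sst"})
extra_radiation = Component("atmos_variant", [("radiation", "atmosphere")])

ok = check_coupled_model([atmos, ocean])
dup = check_coupled_model([atmos, ocean, extra_radiation])
missing = check_coupled_model([atmos])
```

The atmosphere-ocean pair passes. Adding a second atmospheric radiation module is flagged as a duplicate process. The atmosphere alone is flagged because nothing provides `sst`.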

14 Simulation
The Simulation compartment contains tables giving a full description of a conducted experiment, including:
- Institution
- Author
- Project
- Scenario
- Experiment
- Realization
- Postprocessing plan
- Variables
- Variable bundles
- Metadata standards
- Data fields
- Files
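For data discovery, the Simulation tables form a hierarchy running from experiment through realization and file to the fields each file contains. The SQLite sketch below is minimal. It borrows the table names Experiments, Realizations, OutDataFiles and OutDataFields from the compartment-structure slide, but every column, path and variable name is hypothetical. The query answers a typical portal question: which files of a given experiment contain a given variable?

```python
import sqlite3

SCHEMA = """
CREATE TABLE Experiments   (exp_id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                            project TEXT, scenario TEXT,
                            is_public INTEGER);  -- public/not public flag
CREATE TABLE Realizations  (real_id INTEGER PRIMARY KEY,
                            exp_id INTEGER REFERENCES Experiments, run_number INTEGER);
CREATE TABLE OutDataFiles  (file_id INTEGER PRIMARY KEY,
                            real_id INTEGER REFERENCES Realizations, path TEXT UNIQUE);
CREATE TABLE OutDataFields (field_id INTEGER PRIMARY KEY,
                            file_id INTEGER REFERENCES OutDataFiles,
                            variable TEXT, units TEXT);
"""


def files_with_variable(conn, experiment, variable):
    """Paths of all files in any realization of `experiment` that hold `variable`."""
    rows = conn.execute("""
        SELECT f.path
        FROM Experiments e
        JOIN Realizations r  ON r.exp_id  = e.exp_id
        JOIN OutDataFiles f  ON f.real_id = r.real_id
        JOIN OutDataFields d ON d.file_id = f.file_id
        WHERE e.name = ? AND d.variable = ?
        ORDER BY f.path""", (experiment, variable))
    return [path for (path,) in rows]


# Hypothetical sample data: one experiment with two realizations.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Experiments VALUES (1, 'exp_A', 'example project', 'example scenario', 1)")
conn.executemany("INSERT INTO Realizations VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 2)])
conn.executemany("INSERT INTO OutDataFiles VALUES (?, ?, ?)", [
    (1, 1, "/archive/exp_A/run1/atmos.nc"),
    (2, 1, "/archive/exp_A/run1/ocean.nc"),
    (3, 2, "/archive/exp_A/run2/atmos.nc"),
])
conn.executemany("INSERT INTO OutDataFields VALUES (?, ?, ?, ?)", [
    (1, 1, "tas", "K"), (2, 2, "tos", "K"), (3, 3, "tas", "K"),
])
```

`files_with_variable(conn, "exp_A", "tas")` returns the two atmosphere files, one from each realization.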

15 Compartment Structure of Curator Database
[Figure: compartments and their tables, as recoverable from the slide]
- Process Domains: Domains (Atmosphere, Ocean, Ice, Land, Surf_Boundary Layer, Rivers, Lakes), DomConstituents
- Processes: Dynamics, Radiation, IceProc, BiotaProc, Hydrology, CloudProc, Chemistry, Convection
- Algorithmization: ProcCodeBase, NumArtificies, GridSpecs, BoundCond, InitCond, TracerModels, NameLists, DataSets
- Composition: Components, CmpPMIOD, CmpDrivers, CouplModels, Services, Versioning, Compiling, PlatformEnv, Others...
- Simulation: Projects, Experiments, Scenarios, Realizations, PostProc, Variables, OutDataFields, OutDataFiles

16 Modes of working with the database
- Research mode: the modeler introduces new physical processes or new algorithmizations into modeling, and builds new components from newly developed modules for future use in coupled models. New components must be described in the database. Model runs conducted for this development work are not recorded in the DB, except for the final ones that demonstrate the physical correctness of the new approach.
- Production mode: the experimenter composes a coupled model from the available components described in the database, builds a scenario and a postprocessing plan, and runs the experiment. All of this activity is recorded in the database.
A carefully designed, user-friendly GUI is critical for these two modes. Otherwise users will avoid the database-based way of working, the DB will remain empty, and the project will fail.
- Automatic mode: applications fill the database with metadata extracted from data files, or read the metadata they need from it during execution. The most progress has been made here, using the Simulation compartment of the Curator DB.
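In automatic mode, filling the database from data files amounts to a scan-and-record loop, like the daemon on the next slide. The polling sketch below rests on several assumptions:
- The storage is a directory tree of `*.nc` files.
- SQLite stands in for the Curator DB, and the table layout is hypothetical.
- A pluggable `extract` function stands in for real NetCDF attribute reading, which could be done with a NetCDF library such as the netCDF4 package.

```python
import sqlite3
import tempfile
import time
from pathlib import Path
from typing import Callable

Extractor = Callable[[Path], dict]


def no_extraction(path: Path) -> dict:
    """Placeholder: a real harvester would read NetCDF attributes here."""
    return {}


def init_db(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS OutDataFiles (
        path TEXT PRIMARY KEY, size_bytes INTEGER, mtime REAL, recorded_at REAL)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS FileAttributes (
        path TEXT REFERENCES OutDataFiles(path), name TEXT, value TEXT,
        PRIMARY KEY (path, name))""")


def scan_once(conn, root: Path, extract: Extractor = no_extraction) -> list[str]:
    """Record every *.nc file under `root` not yet in the DB; return new paths."""
    known = {row[0] for row in conn.execute("SELECT path FROM OutDataFiles")}
    new_files = []
    for path in sorted(root.rglob("*.nc")):
        key = str(path)
        if key in known:
            continue
        st = path.stat()  # system information about the file
        with conn:        # one transaction per file
            conn.execute("INSERT INTO OutDataFiles VALUES (?, ?, ?, ?)",
                         (key, st.st_size, st.st_mtime, time.time()))
            conn.executemany("INSERT INTO FileAttributes VALUES (?, ?, ?)",
                             [(key, k, str(v)) for k, v in extract(path).items()])
        new_files.append(key)
    return new_files


def run_daemon(conn, root: Path, interval_s: float = 600, extract: Extractor = no_extraction):
    """Poll forever; interval_s is an arbitrary example value."""
    while True:
        scan_once(conn, root, extract)
        time.sleep(interval_s)


# Demonstration on a throwaway directory (file contents are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "exp1").mkdir()
    (root / "exp1" / "a.nc").write_bytes(b"placeholder")
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    demo_extract = lambda p: {"title": f"demo file {p.name}"}
    first = [Path(p).name for p in scan_once(conn, root, demo_extract)]
    second = scan_once(conn, root, demo_extract)
    (root / "exp1" / "b.nc").write_bytes(b"placeholder")
    third = [Path(p).name for p in scan_once(conn, root, demo_extract)]
    n_attrs = conn.execute("SELECT COUNT(*) FROM FileAttributes").fetchone()[0]
```

Polling is the simplest design. Each scan records only files the database has not seen, so a restart does not create duplicates.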

17 Current usage of Curator DB
- The Simulation part of the DB is currently in operational use. It is kept up to date and used in Data Portal activity.
- The DB serves the GFDL Data Portal web site for data discovery and navigation (IPCC CM2.1). A daemon scans the Data Portal storage for newly added data files. It records in the DB the metadata extracted from these files, together with system information about them.
- It is used to make the metadata in data files on the Data Portal consistent with the standards defined in the DB. The application queries the DB for the metadata standard that applies to a given file, compares the file's metadata against it, and fixes the file.
- It is used by an automatic tool that configures the DODS Aggregation Server. The tool checks the experiment's status (public/not public) in the DB, requests all the metadata needed to generate the DODS XML configuration file, and creates that file.
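The consistency step described above is a compare-then-fix operation on metadata attributes. The minimal sketch below assumes that a metadata standard maps attribute names to required values, and that the DB lookup and the file's attributes are already available as plain dicts. Writing the fixed attributes back to the NetCDF file is left out. The attribute names and values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    attribute: str
    found: object      # None if the attribute is missing from the file
    expected: object


def compare_to_standard(file_attrs: dict, standard: dict) -> list[Discrepancy]:
    """List attributes whose values differ from the standard (sorted by name)."""
    return [Discrepancy(name, file_attrs.get(name), expected)
            for name, expected in sorted(standard.items())
            if file_attrs.get(name) != expected]


def apply_standard(file_attrs: dict, standard: dict) -> dict:
    """Return corrected attributes; attributes outside the standard are kept."""
    fixed = dict(file_attrs)
    fixed.update(standard)
    return fixed


# Hypothetical standard (as it might come from the DB) and file attributes.
standard = {"Conventions": "CF-1.0", "institution": "example institute"}
file_attrs = {"Conventions": "COARDS", "title": "run 1"}

issues = compare_to_standard(file_attrs, standard)
fixed = apply_standard(file_attrs, standard)
```

Two discrepancies are reported: a wrong `Conventions` value and a missing `institution`. After `apply_standard`, the comparison comes back empty, and the file's own `title` is preserved.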

18 Table examples - 1
[Figure: example tables from the Physical Processes and Algorithmization compartments: Radiation, Dynamics, ProcCodeBase, TracerModel]

19 Table examples - 2
[Figure: example tables from the Simulation and Composition compartments: Experiments, Realization, OutDataFields, OutDataFiles, Components, CoupledModels]

20 Data examples - 1
[Figure: example data from the Experiments table]

21 Data examples - 2
[Figure: example data from the OutDataFields and OutDataFiles tables]

22 Thanks! Questions?

