Curator: Gap Analysis (from a schema perspective)
Rocky Dunlap, Spencer Rugaber
Georgia Tech

Deliverables
What do we want the outcome to be...
- At the end of this session today
- At the end of this year of Curator
- 3-5 years from now
Let’s think in terms of tangible products

Expected outcomes from Curator one year from now
- Schema side: A coherent, upper-level metadata model for describing modeling components, configured and unconfigured models, model-generated datasets, and other software resources such as modeling frameworks
  - But what form does the product take?
- Software side (CDP-Curator): A prototype of an online repository of modeling resources, including a faceted search interface, a download mechanism, and basic compatibility checking
Expected outcome at the end of this session
Overview of Schemata
- NMM Component
- NMM Potential Model
- NMM Model
- Gridspec
- Component Input and Output (CIAO) (was PMIOD)
- NMM Application
- Framework
- ESG Ontology
- FMS Runtime Environment (FRE) XML

Other Schemata
- NetCDF + CF
- FLUME Metadata
- PRISM Schemata
  - SMIOC (Specific Model Input Output Config.)
  - SCC
  - AD
- Legacy Computing Markup Language (LCML)
Schemata Links

Schema | Source | Notation | Description | Status
Curator NMM | Curator | XSD | Generalized NMM | +
  Component |  |  |  | +
  Potential Model |  |  |  | +
  Model |  |  |  | +
  Application |  |  |  | +
  CIAO |  |  |  | +
  Framework |  |  |  | +
  Gridspec |  |  |  | +
NMM |  | XSD | Models | +
Curator Complete | Curator | XSD | Comprehensive Curator schema | +
FRE | GFDL | XML | FMS Runtime Environment | +
Curator Database | GFDL | SQL | Models and workflow (FRE++) | +
ESG |  |  |  |
  ESG Ontology | ESG | OWL | Datasets | ?
CDP | Curator |  | Community Data Portal | +
ESMF Component DB | ESMF | SQL | Models | +
PRISM |  | XSD |  |
  PMIOD |  |  | Potential Model Input Output Description | +
  SCC |  |  | High-level configuration | -
  SMIOC |  |  | Model run (fields) | +
  AD |  |  | Application Description Model Platform | -
FLUME | MET Ofc. | ? | Flexible Unified Model Environment | -
NetCDF | NCAR | Table | Data format for gridded datasets | ?
CF | NCAR | Table | Standard names for physical quantities | ?
LCML | MIT | XSD | Legacy Computing Markup Language | -

Status key:
+ Will be actively included in Curator
- Will not be included as part of Curator
? Extent of incorporation into Curator unknown
NMM Component
- Describes a single modeling component, not necessarily a complete or runnable model
- Technical properties (compilers, platforms)
- Scientific properties (parameterizations)
- Numerical properties (grids, calendars)
- Interface (coupling), inputs and outputs

NMM Potential Model
- Composition of NMM Components that could form an executable model, but is not configured
- Basic info such as name, version, description, contact info, etc.
- At its most basic level: just a list of component identifiers
- Most “interesting” metadata still resides in NMM Component

NMM Model
- Describes a fully configured and executable model
- Set of configured components (e.g., actual values used during a model run, or that could be used to perform a run)
- Parameter settings, input files, output files, grid resolution
- Pointer back to NMM Potential Model
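To make the relationship between the three NMM levels concrete, here is a minimal Python sketch, not the NMM schema itself. All class and attribute names (`Component`, `PotentialModel`, `ConfiguredComponent`, `Model`, `unconfigured_ids`) are hypothetical and chosen only to mirror the bullets above: a potential model is little more than a list of component identifiers, and a model holds configured components plus a pointer back to its potential model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """One modeling component; not necessarily runnable on its own."""
    identifier: str
    technical: Dict[str, str] = field(default_factory=dict)   # e.g. compilers, platforms
    scientific: Dict[str, str] = field(default_factory=dict)  # e.g. parameterizations
    numerical: Dict[str, str] = field(default_factory=dict)   # e.g. grids, calendars


@dataclass
class PotentialModel:
    """An unconfigured composition: basic info plus component identifiers."""
    name: str
    version: str
    component_ids: List[str]


@dataclass
class ConfiguredComponent:
    """A component together with the values used (or usable) for a run."""
    component_id: str
    parameters: Dict[str, str] = field(default_factory=dict)
    input_files: List[str] = field(default_factory=list)
    output_files: List[str] = field(default_factory=list)


@dataclass
class Model:
    """A configured model that points back to its potential model."""
    potential_model: PotentialModel
    configured: List[ConfiguredComponent]

    def unconfigured_ids(self) -> List[str]:
        """Identifiers listed in the potential model but not yet configured."""
        done = {c.component_id for c in self.configured}
        return [cid for cid in self.potential_model.component_ids if cid not in done]


# Illustrative values only; none of these names come from a real model.
potential = PotentialModel("demo-coupled", "0.1", ["atmos", "ocean"])
model = Model(potential, [ConfiguredComponent("atmos", {"timestep_s": "1800"})])
print(model.unconfigured_ids())  # ['ocean']
```

Keeping the descriptive metadata on `Component` and only identifiers on `PotentialModel` reflects the point that most of the interesting metadata still resides at the component level.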
Gridspec
- A standard description of the grids used to discretize model output
- Proposal to add to CF
- Mosaic made up of tiles
- Tiles have properties:
  - Regularity
  - Uniformity
  - Coordinate system
  - Edges, vertices, angles, arc type
- Exchange grid for moving data between grid tiles

CIAO (Component Input and Output)
- Describes coupling fields supported/used by a component
- Basically: a list of fields, where each field has:
  - Local name, standard name
  - Units
  - Data type
  - I/O direction (Input, Output, In/Out)
  - Field dependencies
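The mosaic and tile structure can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the Gridspec format: the names `Tile`, `Mosaic`, `exchanges`, and `partners` are hypothetical, tile geometry (edges, vertices, angles) is omitted, and the exchange grid is reduced to a list of tile-name pairs between which data moves.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Tile:
    """A single grid tile with a few of the properties listed above."""
    name: str
    regular: bool            # regularity
    uniform: bool            # uniformity
    coordinate_system: str
    arc_type: str


@dataclass
class Mosaic:
    """A mosaic is a collection of tiles plus the exchanges between them."""
    name: str
    tiles: List[Tile]
    # Assumption: each exchange is recorded as an unordered pair of tile names.
    exchanges: List[Tuple[str, str]] = field(default_factory=list)

    def partners(self, tile_name: str) -> List[str]:
        """Names of tiles that exchange data with the given tile."""
        out = []
        for a, b in self.exchanges:
            if a == tile_name:
                out.append(b)
            elif b == tile_name:
                out.append(a)
        return sorted(out)


# Illustrative two-tile mosaic; the property values are placeholders.
north = Tile("north", True, True, "spherical", "great_circle")
south = Tile("south", True, True, "spherical", "great_circle")
mosaic = Mosaic("two-tile-demo", [north, south], [("north", "south")])
print(mosaic.partners("south"))  # ['north']
```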
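A field list of this kind is enough to support a basic compatibility check between two components. The Python sketch below is one possible reading, not the CIAO schema: `Field`, `Direction`, and `unmet_inputs` are hypothetical names, and the matching rule (same standard name and same units, with In/Out fields counted as both inputs and outputs) is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Direction(Enum):
    INPUT = "Input"
    OUTPUT = "Output"
    IN_OUT = "In/Out"


@dataclass(frozen=True)
class Field:
    """One coupling field, following the attributes listed above."""
    local_name: str
    standard_name: str
    units: str
    data_type: str
    direction: Direction
    depends_on: Tuple[str, ...] = ()  # local names of fields this one depends on


def unmet_inputs(consumer: List[Field], provider: List[Field]) -> List[str]:
    """Local names of consumer inputs that the provider does not output.

    Assumption: two fields match when standard name and units are identical.
    """
    offered = {
        (f.standard_name, f.units)
        for f in provider
        if f.direction in (Direction.OUTPUT, Direction.IN_OUT)
    }
    return [
        f.local_name
        for f in consumer
        if f.direction in (Direction.INPUT, Direction.IN_OUT)
        and (f.standard_name, f.units) not in offered
    ]


# Illustrative field lists; local names and types are placeholders.
ocean = [Field("sst", "sea_surface_temperature", "K", "real8", Direction.OUTPUT)]
atmos = [
    Field("t_surf", "sea_surface_temperature", "K", "real8", Direction.INPUT),
    Field("ice_frac", "sea_ice_area_fraction", "1", "real8", Direction.INPUT),
]
print(unmet_inputs(atmos, ocean))  # ['ice_frac']
```

Matching on standard name rather than local name is what lets components with different internal naming be compared at all; a fuller check would also need unit conversion and grid compatibility.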
NMM Application
- “describe an application which ‘bins’ or ‘wraps’ several models into one, for example an ensemble” (NMM Website: cms.nerc.ac.uk/NMM/content/category/6/22/51/)

Framework Schema
- Description of a software framework used in Earth system modeling
- Only basics: name, version, etc.
Earth System Grid (ESG) Ontology
- Class hierarchy used by ESG
- Example classes:
  - Activity: ClimateModelExperiment, Ensemble, Simulation
  - Grid: LogicallyRectangular, PixelBased, Triangular, Unstruct.
  - ModelType: Atmosphere
  - Resource: Software, Dataset

FMS Runtime Environment XML
- XML configuration file used to assemble, compile, execute, and post-process an FMS model (used at GFDL); in effect, a workflow description
- Directory locations (CVS, output, temp)
- Batch queues and commands
- Experiment descriptions (regression tests, production runs, run lengths, number of processors, platforms)
- Fortran namelists, parameter settings
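The example classes can be read as a small subclass tree. The Python sketch below encodes them as a parent map and answers "is-a" queries. It is an illustration only: the real ontology is expressed in OWL, the single level of nesting is assumed from the slide, `PARENT`, `is_a`, and `subclasses` are hypothetical names, and "Unstructured" is an assumed expansion of the abbreviated class name "Unstruct.".

```python
from typing import Dict, List, Optional

# Child class -> parent class; top-level classes map to None.
PARENT: Dict[str, Optional[str]] = {
    "Activity": None,
    "ClimateModelExperiment": "Activity",
    "Ensemble": "Activity",
    "Simulation": "Activity",
    "Grid": None,
    "LogicallyRectangular": "Grid",
    "PixelBased": "Grid",
    "Triangular": "Grid",
    "Unstructured": "Grid",  # assumed full name for the abbreviation "Unstruct."
    "ModelType": None,
    "Atmosphere": "ModelType",
    "Resource": None,
    "Software": "Resource",
    "Dataset": "Resource",
}


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or descends from it in the parent map."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


def subclasses(parent: str) -> List[str]:
    """Direct subclasses of the given class, sorted by name."""
    return sorted(c for c, p in PARENT.items() if p == parent)


print(is_a("Ensemble", "Activity"))  # True
print(subclasses("Resource"))        # ['Dataset', 'Software']
```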
UML Package Diagram
Conceptual Model of CDP-Curator RDF