Strategies for Adding EML Support to the GCE Data Toolbox for Matlab
Wade Sheldon
Georgia Coastal Ecosystems LTER (WWW: gce-lter.marsci.uga.edu/lter)
Background
- Needed a universal solution for processing tabular data sets (the majority of IM work)
- Goals:
  - Import from various data sources
  - Standardize units, date formats, attribute names
  - Assign metadata descriptors
  - Validate/QAQC
  - Generate statistical summaries, plots, maps
  - Export to various data/metadata formats
  - Support sub-setting & queries, super-setting (unions/joins)
  - Support automation of all steps
  - Automatically capture metadata throughout interactive processing
Background
- Developed a Matlab data structure specification for storing a data table tightly coupled with its metadata
- Developed a ‘Toolbox’ (function library) for working with these data structures
- Many roles in GCE IS:
  - Primary tool for acquisition and QAQC of data from the monitoring network and PI submissions
  - Data/metadata packaging (linked to RDMS)
  - Data distribution (flexible formats)
  - New role: automated harvesting/processing/QC/web posting of remote data stores (USGS, NOAA) and post-processing of CSI arrays downloaded via modem
- Began public distribution of the toolbox in 2002 (primarily for end-user analysis of GCE data)
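The idea of a data table tightly coupled with its metadata, with processing steps captured automatically, can be illustrated with a minimal sketch. This is Python rather than the toolbox's Matlab code, and the `Table` class, its field names, and the `convert_units` function are all hypothetical; the actual data structure specification is not shown in these slides.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    """Hypothetical container: column data plus attribute metadata and a history log."""
    names: list                      # attribute (column) names
    units: list                      # one unit string per column
    values: list                     # one list of numbers per column
    history: list = field(default_factory=list)  # processing notes

def convert_units(table, column, new_unit, factor):
    """Return a new Table with one column rescaled and the step logged in history."""
    idx = table.names.index(column)
    new_values = [list(col) for col in table.values]
    new_values[idx] = [v * factor for v in table.values[idx]]
    new_units = list(table.units)
    old_unit = new_units[idx]
    new_units[idx] = new_unit
    note = f"converted {column} from {old_unit} to {new_unit} (factor {factor})"
    # The input table is left untouched; metadata travels with the new copy.
    return Table(list(table.names), new_units, new_values, table.history + [note])
```

Because the function returns data, units, and history together, the metadata cannot drift out of step with the values it describes, which is the property the slide emphasizes.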
Toolbox Metadata Standard
- Full implementation of FLED (+ user-extensible content)
- Attribute-level metadata managed with the data
- General documentation descriptors stored in a simple array format (Category, Field, Value); designed for pre-formatted metadata, but parseable/updateable
- Simple user-editable style definition tables used to produce formatted ASCII metadata
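The (Category, Field, Value) array layout can be sketched as a flat list of rows that is easy to parse and update. The sketch below is Python for illustration only, not the toolbox's Matlab implementation; the function names `lookup_meta`, `update_meta`, and `format_meta`, the sample rows, and the output style are all hypothetical.

```python
# Hypothetical flat (Category, Field, Value) metadata array.
# The categories, fields, and values are invented for this example.
metadata = [
    ("Dataset", "Title", "Example salinity survey"),
    ("Dataset", "Abstract", "Placeholder abstract text"),
    ("Site", "Name", "Example marsh site"),
]

def lookup_meta(rows, category, field):
    """Return the value of the first row matching category and field, else None."""
    for cat, fld, val in rows:
        if cat == category and fld == field:
            return val
    return None

def update_meta(rows, category, field, value):
    """Return a new row list with the first matching row replaced, or a row appended."""
    updated = []
    found = False
    for cat, fld, val in rows:
        if cat == category and fld == field and not found:
            updated.append((cat, fld, value))
            found = True
        else:
            updated.append((cat, fld, val))
    if not found:
        updated.append((category, field, value))
    return updated

def format_meta(rows):
    """Render rows as plain 'Category_Field: Value' ASCII lines (an assumed style)."""
    return "\n".join(f"{cat}_{fld}: {val}" for cat, fld, val in rows)
```

A style definition table, as mentioned on the slide, would presumably replace the fixed format string in `format_meta` with user-editable templates.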
EML Differences
- Higher granularity
- Hierarchical structure (vs. flatter 3-tier)
- Different delineation of semantic/numerical attribute descriptors (much overlap, but different philosophy)
- New unit dictionary requirements for validation run contrary to the toolbox's units/unit conversion conventions (at odds with the non-IM end-user focus of the toolbox)
- XML-based (requires extra steps for presentation)
Strategy
- Short term: develop XSLT to convert EML (primarily dataset, entity, attribute) to ASCII headers for importing metadata along with data
- Medium term: switch to an EML-oriented metadata schema (e.g. use similar arrays, but support direct EML schema mapping by using XPath syntax for category/field info)
- Long term: add support for direct caching of EML docs and include native XML routines for syncing metadata during processing (requires that more users adopt the latest Matlab version, R13)
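The short- and medium-term ideas can be combined in a rough sketch: flatten an EML-like document into (path, value) rows keyed by XPath-style strings, then print the rows as an ASCII header. The sketch uses Python's standard `xml.etree.ElementTree` in place of XSLT and is not the toolbox's code. The sample document is a simplified, namespace-free fragment that borrows a few EML element names, and the function names `eml_to_rows` and `rows_to_header` and the header layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified EML-like fragment (assumption: no namespaces, only a few elements).
EML_SAMPLE = """<eml>
  <dataset>
    <title>Example data set</title>
    <dataTable>
      <entityName>example_table</entityName>
      <attributeList>
        <attribute>
          <attributeName>Salinity</attributeName>
          <attributeDefinition>Water salinity</attributeDefinition>
        </attribute>
        <attribute>
          <attributeName>Temp</attributeName>
          <attributeDefinition>Water temperature</attributeDefinition>
        </attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml>"""

def eml_to_rows(xml_text):
    """Flatten leaf elements into (xpath_like_path, text) rows in document order."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, path):
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:
            rows.append((path, text))
        counts = {}
        for child in elem:
            # Positional index distinguishes repeated siblings, e.g. attribute[2].
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, "/" + root.tag)
    return rows

def rows_to_header(rows):
    """Render rows as 'path: value' ASCII header lines (an assumed layout)."""
    return "\n".join(f"{path}: {value}" for path, value in rows)

if __name__ == "__main__":
    print(rows_to_header(eml_to_rows(EML_SAMPLE)))
```

Keeping the full path in each row means the flat array could, in principle, be mapped back to its position in the EML hierarchy, which is the point of using XPath syntax for the category/field information.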
Significance
- Allow the IM community to take full advantage of these tools/capabilities for their own site’s data with minimal re-mastering (EML + ASCII/Matlab table)
- Allow the LTER IM community to showcase research-oriented, metadata-driven tools to bolster support for EML efforts immediately
- If full EML support is achieved, the toolbox could become a useful mechanism for automatically producing EML-documented/validated data sets (datalogging -> harvest -> process -> QC -> dataset+EML -> validation)