Published by Arthur Brown. Modified over 8 years ago.
1
Data Formats, Flags and Vocabularies Roy Lowry British Oceanographic Data Centre SeaDataNet Training Course, Ostend, June 16-19, 2008
2
File Formats Available formats Format Selection Criteria Types of Data Delivery Use Case Issues SeaDataNet Profiling Objectives SeaDataNet Profiling Details
3
Available Formats Three format profiles are being developed for SeaDataNet data transfers SeaDataNet ODV Profile Simple ASCII format based on a spreadsheet model SeaDataNet MEDATLAS Profile Minor variation on an established ASCII format SeaDataNet CF NetCDF Profile Binary data conforming to API and content model based on an established community standard (CF)
4
Format Selection Criteria The $64,000 question is “What format should I use for my data?” The answer depends on the type of data and on the data delivery use case
5
Types of Data Think of data in terms of ‘feature types’ Profiles (x, y, t effectively fixed: z varies) Bottle casts, CTDs, XBTs, radiosondes, core profiles Point series (x, y, z effectively fixed: t varies) Current meters, wave statistics, sea level, wind velocity Trajectories (x, y, z (sometimes), t all vary) Underway data (TSG, bathymetry, meteorology), undulator data, airborne measurements Grids (two or more of x, y, z, t vary systematically) Satellite data, model output, synthesised data products
6
Types of Data Most of our data may be modelled in terms of these feature types For example: CTD data –Modelled well by the ‘profile’ type Recording current meter data –Modelled well by the ‘point series’ type Moored ADCP –Modelled poorly by ‘point series’ type (needs to be considered as one point series per depth bin) –But is modelled well by ‘grid’ with z, t varying and x, y fixed
7
Delivery Use Case Issues Data exchange between consenting Mediterranean partners Data provider holds data in MEDATLAS format Data recipient wants data in MEDATLAS format Could be addressed using Nemo software to convert MEDATLAS to ODV profile
8
Delivery Use Case Issues Problems with this approach Recipient needs to do unnecessary work converting ODV to MEDATLAS Risk of information loss in the conversion process MEDATLAS is used by a significant proportion of the SeaDataNet community Consequently, the transaction system development overhead to support exchange in MEDATLAS format was considered worthwhile
9
Format Recommendations Mandatory formats Use ODV for Profiles Point series Trajectories (including underway ADCP) Use NetCDF for Grids Data that don’t fit comfortably into ODV due to shape or volume Data for use with NetCDF-enabled tools
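The mandatory-format recommendations can be read as a small decision rule. The following is a minimal sketch of that rule; the function name, the feature-type strings and the two boolean parameters are illustrative assumptions, not part of any SeaDataNet tool.

```python
# Feature types for which the slide recommends the ODV profile.
ODV_FEATURE_TYPES = {"profile", "point series", "trajectory"}


def recommend_format(feature_type, fits_odv=True, needs_netcdf_tools=False):
    """Return 'ODV' or 'NetCDF' following the mandatory-format recommendations.

    fits_odv and needs_netcdf_tools are hypothetical switches standing in for
    "data that don't fit comfortably into ODV" and "data for use with
    NetCDF-enabled tools".
    """
    feature_type = feature_type.lower()
    if feature_type == "grid":
        return "NetCDF"
    if feature_type not in ODV_FEATURE_TYPES:
        raise ValueError(f"unknown feature type: {feature_type!r}")
    if not fits_odv or needs_netcdf_tools:
        return "NetCDF"
    return "ODV"
```

For example, `recommend_format("Profile")` gives `"ODV"`, while a trajectory that is too large for ODV (`fits_odv=False`) gives `"NetCDF"`.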
10
Format Recommendations Optional format Use MEDATLAS for Whatever you use MEDATLAS for at the moment
11
SeaDataNet Profiling Objectives Two objectives Providing linkage between data and SeaDataNet metadata (CDI record) Standardising semantics Consistent labelling of parameters –Use terms from a controlled vocabulary (more on this later) Consistent labelling of storage units –Use terms from a controlled vocabulary –Parameter definition DOES NOT dictate storage unit
12
SeaDataNet ODV Profile Described in BSCW document (Word) https://www.ifremer.fr/bscw/bscw.cgi/d93460/Specification%20of%20SeaDataNet%20Data%20Transport%20Formats Examples of profile, point series and trajectory data (Excel) https://www.ifremer.fr/bscw/bscw.cgi/d93465/Examples%20of%20SeaDataNet%20variant%20ODV%20spreadsheet-based%20import%20format
13
SeaDataNet ODV Profile ODV format based on a spreadsheet model with three types of row Comment row One cell with text starting with // Column header row Data row Column header and data rows have three types of column Metadata columns Primary variable data columns (value + flag) Data columns (value + flag pairs)
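The three row types can be told apart with very little logic. This is a minimal sketch, assuming tab-separated cells and that the first non-comment row is the column header row; the column names in the sample are made up for illustration and are not taken from the format specification.

```python
def classify_rows(lines):
    """Yield (kind, cells) for each non-empty line of an ODV-style file.

    kind is 'comment', 'header' or 'data'. Assumes the first row that does
    not start with // is the column header row.
    """
    header_seen = False
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("//"):
            yield "comment", [line]  # a comment row is a single text cell
        elif not header_seen:
            header_seen = True
            yield "header", line.split("\t")
        else:
            yield "data", line.split("\t")


# Hypothetical three-line file: one comment, one header row, one data row.
sample = [
    "//SDN_parameter_mapping\n",
    "Cruise\tStation\tDepth [m]\tQV\n",
    "CR1\tS1\t5.0\t1\n",
]
kinds = [kind for kind, _ in classify_rows(sample)]
```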
14
SeaDataNet ODV Profile SeaDataNet profile extensions CDI linkage Addition of two extra metadata columns (LOCAL_CDI_ID and EDMO_code) Semantic mapping Structured comment records immediately preceding the ODV column header record First record is ‘//SDN_parameter_mapping’ Followed by one mapping record for each data column in the file
15
SeaDataNet ODV Profile Mapping record example // SDN:LOCAL:Depth SDN:P011::ADEPZZ01 SDN:P061::ULAA –Subject element is the column heading text excluding ODV units field (e.g. ‘Depth’ for ‘Depth [m]’) –Object element is the SeaDataNet URN for the parameter (SDN:P011::ADEPZZ01) –Units element is the SeaDataNet URN for the data storage units (SDN:P061::ULAA) More about URNs and what we can do with them later…
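A mapping record like the one above can be split into its subject, object and units elements. This sketch assumes the whitespace-separated layout as the record appears in this transcript and a column heading without embedded spaces; real files may delimit the three elements differently, so treat the parsing rule as an assumption. The class and function names are hypothetical.

```python
from typing import NamedTuple


class ParameterMapping(NamedTuple):
    subject: str    # column heading text, without the ODV units field
    parameter: str  # SeaDataNet URN for the parameter (object element)
    units: str      # SeaDataNet URN for the data storage units


def parse_mapping_record(record):
    """Parse one semantic mapping comment record into its three elements."""
    if not record.startswith("//"):
        raise ValueError("mapping records are comment rows starting with //")
    fields = record[2:].split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 elements, found {len(fields)}")
    subject_urn, parameter, units = fields
    prefix = "SDN:LOCAL:"
    if not subject_urn.startswith(prefix):
        raise ValueError("subject element should start with SDN:LOCAL:")
    return ParameterMapping(subject_urn[len(prefix):], parameter, units)


mapping = parse_mapping_record(
    "// SDN:LOCAL:Depth SDN:P011::ADEPZZ01 SDN:P061::ULAA"
)
```

Here `mapping.subject` is `"Depth"`, which is the text to match against the column heading `Depth [m]`.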
16
SeaDataNet ODV Profile SeaDataNet Metadata and Primary Variables Profile data Metadata (x,y,t) set to nominal profile position and time (same for every data value) Primary variable is the z co-ordinate (depth in metres or pressure in decibars) Point series data Metadata (x,y,t) set to the measurement location and series start time (same for every data value) Primary variable is the t co-ordinate (Chronological Julian Day - days elapsed since 00:00 on January 1 4713 BC) Trajectory data Metadata (x,y,t) set to measurement time and position Primary variable is the z co-ordinate (depth in metres or pressure in decibars)
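The Chronological Julian Day used as the point series primary variable can be computed from a calendar date and time. This is a minimal sketch using only the Python standard library; it assumes naive datetimes that are already in the time zone the data use, and the function names are illustrative.

```python
from datetime import datetime, timedelta

# Offset between Python's proleptic Gregorian ordinal (day 1 = 0001-01-01)
# and the Chronological Julian Day count, which starts at midnight.
# Check: 2000-01-01 has ordinal 730120 and CJD 2451545.
CJD_OFFSET = 1721425


def to_cjd(dt):
    """Convert a naive datetime to a Chronological Julian Day (float)."""
    midnight = datetime(dt.year, dt.month, dt.day)
    day_fraction = (dt - midnight) / timedelta(days=1)
    return dt.toordinal() + CJD_OFFSET + day_fraction


def from_cjd(cjd):
    """Convert a Chronological Julian Day back to a naive datetime."""
    whole_days = int(cjd // 1)
    day_fraction = cjd - whole_days
    return datetime.fromordinal(whole_days - CJD_OFFSET) + timedelta(days=day_fraction)
```

Because the count starts at midnight rather than noon, the value is half a day larger than the astronomical Julian Date for the same instant.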
17
SeaDataNet ODV Profile Watchpoints File extension should be .txt Field separator is the tab character (not semi-colon) Physical file mapping The format is capable of holding multiple SeaDataNet data objects in a single physical file The SeaDataNet 1 system CANNOT support this Means aggregation and splitting tools (or a lot of patience!) will be required (hardly rocket science)
18
SeaDataNet MEDATLAS Profile Those who want to use MEDATLAS know it better than me, so I’m not going to try and teach the format! The most important SeaDataNet extension is the link to CDI records, which is done by a pair of structured comment records for each SeaDataNet object thus: *EDMO_CODE = EDMO identifier of the data centre managing the CDI *LOCAL_CDI_ID = local identifier of the station
19
SeaDataNet MEDATLAS Profile We can also add standardised semantic mapping records as per ODV such as: * SDN:LOCAL:Temperature SDN:P011::TEMPS901 SDN:P061::UPAA However, once the mapping between MEDATLAS parameter codes and P011 is completed, these become unnecessary
20
SeaDataNet CF NetCDF Profile This is VERY immature, so currently there is nothing to teach ASCII formats should be sufficient for most SeaDataNet 1 transactions Further work during the next 6 months Partners who feel they need NetCDF for their data should contact the Technical Task Team (Dick Schaap or Roy Lowry)
21
SeaDataNet Qualifying Flags What is a Qualifying Flag? SeaDataNet Flags Conflict resolution
22
What is a Qualifying Flag? Back in the mists of time (IODE in early 1980s?) it was decreed that all data values should be accompanied by a ‘flag’ in the form of a 1-byte code Built into many data format specifications (MEDATLAS, BODC PXF/QXF, GF3…) Initially thought of as a data quality label However, it provides the only metadata ‘hook’ that is unambiguously linked to a specific data value Consequently, it has suffered information overload carrying other information about non-quality issues We cannot correct this without major re-engineering of data held as files, which isn’t going to happen
23
SeaDataNet Flags Information overloading has led to two types of flag in SeaDataNet Quality Flags 0 – quality unknown 1 – good value (looks good and no reported problems) 2 – probably good value (associated with a known malfunction but looks OK) 3 – probably bad value (associated with a known malfunction but looks wrong) 4 – bad value (clearly wrong)
24
SeaDataNet Flags Information overloading has led to two types of flag in SeaDataNet Information flags 5 – changed value (during quality control) 6 – below detection (true value <quoted value) 7 – value in excess (true value >quoted value) 8 – interpolated value (special case of a changed value) 9 – missing value A – phenomenon uncertain (e.g. question over identification of biological specimen)
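The two groups of flags can be held in a pair of lookup tables. The sketch below copies the codes and labels from the two flag slides; the function name and the returned tuple layout are illustrative assumptions.

```python
QUALITY_FLAGS = {
    "0": "quality unknown",
    "1": "good value",
    "2": "probably good value",
    "3": "probably bad value",
    "4": "bad value",
}

INFORMATION_FLAGS = {
    "5": "changed value",
    "6": "below detection",
    "7": "value in excess",
    "8": "interpolated value",
    "9": "missing value",
    "A": "phenomenon uncertain",
}


def describe_flag(flag):
    """Return (flag type, label) for a one-character SeaDataNet flag."""
    flag = str(flag).upper()
    if flag in QUALITY_FLAGS:
        return ("quality", QUALITY_FLAGS[flag])
    if flag in INFORMATION_FLAGS:
        return ("information", INFORMATION_FLAGS[flag])
    raise ValueError(f"unknown flag: {flag!r}")
```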
25
Conflict Resolution We can now see the problems caused by overloading How can we tell the difference between a ‘good changed value’ and a ‘bad changed value’? Simple answer is that we can’t. We can indicate the value was changed (flag 5), good (flag 1) or bad (flag 4) So we have to compromise…
26
Conflict Resolution How do we compromise? By prioritising flag assignments Initially, all flags are set to 0, 9, 7, 6 or A (detection level and uncertainty information comes from the originator, not QA) Next we either interpolate or replace and flag appropriately (8 or 5) Finally we switch remaining zero flags to 1, 2, 3 or 4 as appropriate This is not ideal and we need to do better in SeaDataNet 2.
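The three-step prioritisation can be sketched as a single function. This is an illustration rather than SeaDataNet software: the parameter names are hypothetical, and letting an interpolation or replacement override an originator flag such as 9 is an assumption about how step two applies.

```python
INITIAL_FLAGS = {"0", "6", "7", "9", "A"}  # step 1: set from originator information
QC_VERDICTS = {"1", "2", "3", "4"}         # step 3: quality control outcome


def assign_flag(initial="0", edit=None, verdict=None):
    """Return the final flag for one value.

    initial: flag set in step 1 (one of 0, 6, 7, 9, A).
    edit:    None, 'interpolated' or 'replaced' (step 2).
    verdict: None or a quality flag 1-4 (step 3), used only if the flag is still 0.
    """
    if initial not in INITIAL_FLAGS:
        raise ValueError(f"invalid initial flag: {initial!r}")
    flag = initial

    # Step 2: a changed value is flagged 8 (interpolated) or 5 (replaced).
    if edit == "interpolated":
        flag = "8"
    elif edit == "replaced":
        flag = "5"
    elif edit is not None:
        raise ValueError(f"unknown edit: {edit!r}")

    # Step 3: only remaining zero flags receive a quality verdict.
    if flag == "0" and verdict is not None:
        if verdict not in QC_VERDICTS:
            raise ValueError(f"invalid verdict: {verdict!r}")
        flag = verdict
    return flag
```

The compromise is visible in the last step: a value flagged 6 that quality control judges bad keeps its 6, so the quality verdict is lost.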
27
Vocabularies What are vocabularies and mappings? Vocabularies for Metadata Vocabularies for Data Vocabulary Access Vocabulary Maintenance
28
What is a Vocabulary? A vocabulary is a list of standardised terms used to populate a metadata field The SeaDataNet vocabulary model considers each such term to possess A key (a permanent, semantically neutral identifier for the term, possibly a mnemonic) A term (full human-readable label) An abbreviation (short human-readable label) A definition (full explanation of the term’s meaning)
29
What is a Mapping? A mapping is a set of relationships between terms Each relationship consists of a subject term (sometimes called subject concept), a predicate and an object term The predicate gives the relationship ‘meaning’ Predicates may be simple to underpin something like a thesaurus (e.g. SKOS) exactMatch - synonyms narrowMatch – subject concept totally embraces the object concept broadMatch – subject concept is totally embraced by the object concept majorMatch – subject and object have a lot in common but some unique semantic elements minorMatch - subject and object have something in common but significant unique semantic elements
30
What is a Mapping? Predicates may also be semantically rich such as: hasUnits – links a parameter to a unit of measurement isMember – links a person to a group hasName – links a person to a label Mappings between defined entities with semantically rich predicates are what computer scientists call an ontology
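A mapping is naturally stored as a list of subject, predicate, object triples. The sketch below uses made-up identifiers (the `ex:` names are hypothetical and belong to no SeaDataNet vocabulary) to show how such triples can be queried.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical triples using the predicates named on the slides.
mappings = [
    Mapping("ex:person1", "isMember", "ex:groupA"),
    Mapping("ex:person1", "hasName", "ex:labelA"),
    Mapping("ex:person2", "isMember", "ex:groupA"),
    Mapping("ex:termX", "exactMatch", "ex:termY"),
]


def objects_of(triples, subject, predicate):
    """Return every object linked to subject by predicate."""
    return [m.object for m in triples
            if m.subject == subject and m.predicate == predicate]


def subjects_of(triples, predicate, obj):
    """Return every subject linked to obj by predicate."""
    return [m.subject for m in triples
            if m.predicate == predicate and m.object == obj]
```

With these triples, `subjects_of(mappings, "isMember", "ex:groupA")` lists both hypothetical people in the group.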
31
Vocabularies for Metadata Many fields in SeaDataNet metadata are linked through the document schema to appropriate vocabularies These cover subject areas such as: Discovery parameters Instruments Platforms Geographic locations (e.g. ports, sea areas) Lists to be used are defined in the metadata guidance documentation. List references (e.g. P021) provide the key to vocabulary access information
32
Vocabularies for Data There are four vocabularies needed for data in SeaDataNet ‘Light’ Parameter Usage Vocabulary (P012) ‘The Full Monty’ Parameter Usage Vocabulary (P011) SeaDataNet flags (L201) Units Vocabulary (P061)
33
Vocabularies for Data ‘Light’ Parameter Usage Vocabulary (P012) Terms to describe parameters (i.e. column headings) Kept as pure (no methods) and as simple as possible Definitions available Mapped to MEDATLAS/GF3 extended terms Should be the first port of call for SeaDataNet data providers
34
Vocabularies for Data ‘Full’ Parameter Usage Vocabulary (P011) Comprehensive (nearly 20,000 terms) but can be hard to navigate Microsoft Access navigation tool used inside BODC could be made available on request True superset of P012, so all P012 URLs have an identical P011 equivalent Handling data files will be easier if P011 version is used in SeaDataNet data files Port of call if P012 fails to deliver
35
Vocabularies for Data SeaDataNet data qualifier flags (L201) The full list of the flags discussed previously Units Vocabulary (P061) Unlike MEDATLAS or the BODC internal system, SeaDataNet policy is to label a value with parameter and units INDEPENDENTLY The vocabulary is a standardised description of the units used; it does not dictate the units An aspiration is to develop units interconversion based on P061 terms
36
Vocabulary Access There are five ways to access the SeaDataNet vocabularies SeaDataNet Vocabulary Portal Term and list URLs HTTP-POX interface SOAP API BODC client interface But I’m only going to cover the first four as the portal should cover SeaDataNet needs
37
Vocabulary Access SeaDataNet Vocabulary Portal User input through a web form at http://seadatanet.maris2.nl/v_bodc_vocab/welcome.aspx Returns a human-readable table with key, term, abbreviation, definition and modification date columns Table may be exported as a semicolon-delimited ‘CSV’ ASCII file
38
Vocabulary Access Term and List URLs User input is a URL Returns an XML document based on the SKOS standard List documents include labels and definitions for all terms in the list Term documents include labels, definition and mappings for the term
39
Vocabulary Access URL syntax Namespace base (http://vocab.ndg.nerc.ac.uk/) ‘list’ or ‘term’ List identifier (e.g. P021) List version or ‘current’ Term identifier for term URL (e.g. TEMP) Examples List (SeaDataNet Parameter Discovery Vocabulary) http://vocab.ndg.nerc.ac.uk/list/P021/current/ Term (CF Standard Name for sea temperature) http://vocab.ndg.nerc.ac.uk/term/P071/current/CFSN0335
40
Vocabulary Access Example response for the term URL above (values only): SDN:P071:7:CFSN0335; sea_water_temperature; 2008-02-26T10:02:57.564+0000
41
Vocabulary Access In SeaDataNet data and metadata we use URNs, not URLs (in case the server namespace changes) URN syntax is Namespace base (SDN) List identifier (e.g. P021) List version or null field for ‘current’ Term identifier (e.g. TEMP) For example, the URL http://vocab.ndg.nerc.ac.uk/list/P021/current/TEMP is represented by the URN SDN:P021::TEMP. URN to URL conversion is simple string slicing
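The string slicing can be sketched in a few lines. The function names are illustrative. Note that the URL syntax slide gives `term` as the path segment for term URLs while the example above uses `list`, so the sketch takes the segment as a parameter; defaulting it to `term` is an assumption.

```python
BASE = "http://vocab.ndg.nerc.ac.uk/"


def urn_to_url(urn, kind="term"):
    """Convert a URN such as SDN:P021::TEMP to a vocabulary server URL.

    kind is the 'list' or 'term' path segment; an empty version field in
    the URN means 'current'.
    """
    parts = urn.split(":")
    if len(parts) != 4 or parts[0] != "SDN":
        raise ValueError(f"not a four-field SDN URN: {urn!r}")
    _, list_id, version, term_id = parts
    return f"{BASE}{kind}/{list_id}/{version or 'current'}/{term_id}"


def url_to_urn(url):
    """Convert a term URL on the vocabulary server back to a URN."""
    if not url.startswith(BASE):
        raise ValueError(f"URL is outside the namespace base: {url!r}")
    _kind, list_id, version, term_id = url[len(BASE):].strip("/").split("/")
    if version == "current":
        version = ""  # 'current' is written as a null field in the URN
    return f"SDN:{list_id}:{version}:{term_id}"
```

Keeping the namespace base in one constant reflects the reason for using URNs in the first place: if the server namespace changes, only `BASE` needs updating.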
42
Vocabulary Access HTTP-POX API User input is a URL Returns an XML document based on a BODC-defined schema Provides access to List catalogue List contents (keys, terms, abbreviations, definitions, mappings) Mappings Plaintext searches across lists Term verification The API is documented at http://www.bodc.ac.uk/products/web_services/vocab/methods.html
43
Vocabulary Access SOAP API User input is a programmatic service call from a Java, Perl, PHP, Python, etc. application Returns an XML document based on a BODC-defined schema Provides access to List catalogue List contents (keys, terms, abbreviations, definitions, mappings) Mappings Plaintext searches across lists Term verification The API is documented at http://www.bodc.ac.uk/products/web_services/vocab/methods.html The WSDL is available from http://vocab.ndg.nerc.ac.uk/
44
Vocabulary Maintenance What if you can’t find the term you need? Initially contact the SeaDataNet help desk (sdn-userdesk@seadatanet.org) If they cannot resolve your problem they will pass the problem on to me I will endeavour to add new terms or identify appropriate existing terms Adding terms may involve discussions with vocabulary governance authorities This can take time (possibly 2-3 weeks) so please try to think ahead