Data Formats, Flags and Vocabularies Roy Lowry British Oceanographic Data Centre SeaDataNet Training Course, Ostend, June 16-19, 2008.



File Formats  Available formats  Format Selection Criteria  Types of Data  Delivery Use Case Issues  SeaDataNet Profiling Objectives  SeaDataNet Profiling Details

Available Formats  Three format profiles are being developed for SeaDataNet data transfers  SeaDataNet ODV Profile  Simple ASCII format based on a spreadsheet model  SeaDataNet MEDATLAS Profile  Minor variation on an established ASCII format  SeaDataNet CF NetCDF Profile  Binary data conforming to API and content model based on an established community standard (CF)

Format Selection Criteria
- The $64,000 question is “What format should I use for my data?”
- The answer depends on the type of data and on the data delivery use case

Types of Data
- Think of data in terms of ‘feature types’
- Profiles (x, y, t effectively fixed: z varies)
  - Bottle casts, CTDs, XBTs, radiosondes, core profiles
- Point series (x, y, z effectively fixed: t varies)
  - Current meters, wave statistics, sea level, wind velocity
- Trajectories (x, y, z (sometimes), t all vary)
  - Underway data (TSG, bathymetry, meteorology), undulator data, airborne measurements
- Grids (two or more of x, y, z, t vary systematically)
  - Satellite data, model output, synthesised data products

Types of Data  Most of our data may be modelled in terms of these feature types  For example:  CTD data –Modelled well by the ‘profile’ type  Recording current meter data –Modelled well by the ‘point series’ type  Moored ADCP –Modelled poorly by ‘point series’ type (needs to be considered as one point series per depth bin) –But is modelled well by ‘grid’ with z, t varying and x, y fixed

Delivery Use Case Issues
- Data exchange between consenting Mediterranean partners
  - Data provider holds data in MEDATLAS format
  - Data recipient wants data in MEDATLAS format
  - Could be addressed using the NEMO software to convert MEDATLAS to the ODV profile

Delivery Use Case Issues  Problems with this approach  Recipient needs to do unnecessary work converting ODV to MEDATLAS  Risk of information loss in the conversion process  MEDATLAS is used by a significant proportion of the SeaDataNet community  Consequently, the transaction system development overhead to support exchange in MEDATLAS format was considered worthwhile

Format Recommendations  Mandatory formats  Use ODV for  Profiles  Point series  Trajectories (including underway ADCP)  Use NetCDF for  Grids  Data that don’t fit comfortably into ODV due to shape or volume  Data for use with NetCDF-enabled tools
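The mandatory recommendations above amount to a small decision rule. The sketch below is only an illustration of that rule: the function name, the `fits_odv` and `needs_netcdf_tools` arguments and the lower-case feature type labels are assumptions made for the example, not part of any SeaDataNet tool.

```python
# Mandatory format for each feature type, as listed on the slide.
RECOMMENDED_FORMAT = {
    "profile": "ODV",
    "point series": "ODV",
    "trajectory": "ODV",
    "grid": "NetCDF",
}


def recommend_format(feature_type, fits_odv=True, needs_netcdf_tools=False):
    """Return 'ODV' or 'NetCDF' for a feature type (hypothetical helper)."""
    key = feature_type.strip().lower()
    if key not in RECOMMENDED_FORMAT:
        raise ValueError(f"unknown feature type: {feature_type!r}")
    # Data that do not fit comfortably into ODV (shape or volume), or that
    # are destined for NetCDF-enabled tools, go to NetCDF regardless of type.
    if not fits_odv or needs_netcdf_tools:
        return "NetCDF"
    return RECOMMENDED_FORMAT[key]
```

A moored ADCP treated as a z, t grid would be passed in as `"grid"`; the same instrument treated as one point series per depth bin would be passed in as `"point series"`.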

Format Recommendations  Optional format  Use MEDATLAS for  Whatever you use MEDATLAS for at the moment

SeaDataNet Profiling Objectives  Two objectives  Providing linkage between data and SeaDataNet metadata (CDI record)  Standardising semantics  Consistent labelling of parameters –Use terms from a controlled vocabulary (more on this later)  Consistent labelling of storage units –Use terms from a controlled vocabulary –Parameter definition DOES NOT dictate storage unit

SeaDataNet ODV Profile
- Described in BSCW document (Word)
  - …ation%20of%20SeaDataNet%20Data%20Transport%20Formats
- Examples of profile, point series and trajectory data (Excel)
  - …es%20of%20SeaDataNet%20variant%20ODV%20spreadsheet-based%20import%20format

SeaDataNet ODV Profile  ODV format based on a spreadsheet model with three types of row  Comment row  One cell with text starting with //  Column header row  Data row  Column header and data rows have three types of column  Metadata columns  Primary variable data columns (value + flag)  Data columns (value + flag pairs)

SeaDataNet ODV Profile  SeaDataNet profile extensions  CDI linkage  Addition of two extra metadata columns (LOCAL_CDI_ID and EDMO_code)  Semantic mapping  Structured comment records immediately preceding the ODV column header record  First record is ‘//SDN_parameter_mapping’  Followed by one mapping record for each data column in the file

SeaDataNet ODV Profile
- Mapping record example
  - //<subject>SDN:LOCAL:Depth</subject><object>SDN:P011::ADEPZZ01</object><units>SDN:P061::ULAA</units>
    - Subject element is the column heading text excluding the ODV units field (e.g. ‘Depth’ for ‘Depth [m]’)
    - Object element is the SeaDataNet URN for the parameter (SDN:P011::ADEPZZ01)
    - Units element is the SeaDataNet URN for the data storage units (SDN:P061::ULAA)
- More about URNs and what we can do with them later
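A mapping record of this kind is easy to pick apart in software. The sketch below assumes that the subject, object and units elements are delimited by XML-style `<subject>`, `<object>` and `<units>` tags inside the `//` comment; that delimiter syntax and the function names are assumptions of this example and should be checked against the format specification.

```python
import re

# Assumed record layout: //<subject>...</subject><object>...</object><units>...</units>
MAPPING_RE = re.compile(
    r"^//\s*<subject>(?P<subject>.*?)</subject>\s*"
    r"<object>(?P<object>.*?)</object>\s*"
    r"<units>(?P<units>.*?)</units>\s*$"
)


def parse_mapping_record(line):
    """Split one semantic mapping record into its three elements."""
    match = MAPPING_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a mapping record: {line!r}")
    return {name: value.strip() for name, value in match.groupdict().items()}


def column_label(record):
    """Return the column heading text carried by the subject element."""
    # The subject looks like SDN:LOCAL:Depth; the heading is the third field.
    return record["subject"].split(":", 2)[2]
```

The label returned by `column_label` can then be compared with the column header row (with the units field in square brackets stripped) to check that every data column has a mapping.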

SeaDataNet ODV Profile
- SeaDataNet Metadata and Primary Variables
  - Profile data
    - Metadata (x, y, t) set to nominal profile position and time (same for every data value)
    - Primary variable is the z co-ordinate (depth in metres or pressure in decibars)
  - Point series data
    - Metadata (x, y, t) set to the measurement location and series start time (same for every data value)
    - Primary variable is the t co-ordinate (Chronological Julian Day: days elapsed since 00:00 on 1 January 4713 BC)
  - Trajectory data
    - Metadata (x, y, t) set to measurement time and position
    - Primary variable is the z co-ordinate (depth in metres or pressure in decibars)
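For point series the time co-ordinate has to be converted to and from Chronological Julian Day. The sketch below assumes the usual definition, in which day zero began at 00:00 on 1 January 4713 BC, so that 00:00 on 1 January 1970 is day 2440588.0 (Julian Date 2440587.5 plus half a day). It also assumes naive datetimes that are already in the time zone used for the series; the function names are illustrative.

```python
from datetime import datetime, timedelta

# Chronological Julian Day at 1970-01-01 00:00 (Julian Date 2440587.5 + 0.5).
CJD_AT_1970 = 2440588.0
EPOCH_1970 = datetime(1970, 1, 1)


def to_cjd(moment):
    """Convert a naive datetime to Chronological Julian Day (float days)."""
    return CJD_AT_1970 + (moment - EPOCH_1970) / timedelta(days=1)


def from_cjd(cjd):
    """Convert a Chronological Julian Day back to a naive datetime."""
    return EPOCH_1970 + timedelta(days=cjd - CJD_AT_1970)
```

Because the value is a floating-point day count, times of day survive only to the precision of the float, which is worth remembering when comparing converted timestamps for equality.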

SeaDataNet ODV Profile
- Watchpoints
  - File extension should be .txt
  - Field separator is the tab character (not semi-colon)
  - Physical file mapping
    - The format is capable of holding multiple SeaDataNet data objects in a single physical file
    - The SeaDataNet 1 system CANNOT support this
    - This means aggregation and splitting tools (or a lot of patience!) will be required (hardly rocket science)
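A splitting tool of the kind mentioned above could look roughly like the following. This is a minimal sketch under several assumptions: comment rows start with `//`, the first non-comment row is the column header row, every data row carries its own LOCAL_CDI_ID value, and each output simply repeats all comment rows. The function name is hypothetical.

```python
def split_by_cdi(text):
    """Split tab-separated ODV text into one text block per LOCAL_CDI_ID."""
    comments = []   # comment rows, repeated in every output block
    header = None   # the column header row
    id_col = None   # position of the LOCAL_CDI_ID column
    groups = {}     # LOCAL_CDI_ID -> list of data rows, in file order
    for line in text.splitlines():
        if line.startswith("//"):
            comments.append(line)
        elif header is None:
            header = line
            id_col = header.split("\t").index("LOCAL_CDI_ID")
        elif line.strip():
            key = line.split("\t")[id_col]
            groups.setdefault(key, []).append(line)
    if header is None:
        raise ValueError("no column header row found")
    return {
        key: "\n".join(comments + [header] + rows) + "\n"
        for key, rows in groups.items()
    }
```

Each value in the returned dictionary can be written to its own .txt file. Aggregation is the reverse operation, with the extra check that all input files share the same header and mapping rows.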

SeaDataNet MEDATLAS Profile  Those who want to use MEDATLAS know it better than me, so I’m not going to try and teach the format!  The most important SeaDataNet extension is the link to CDI records, which is done by a pair of structured comment records for each SeaDataNet object thus:  *EDMO_CODE = EDMO identifier of the data centre managing the CDI  *LOCAL_CDI_ID = local identifier of the station

SeaDataNet MEDATLAS Profile
- We can also add standardised semantic mapping records as per ODV, such as:
  - *<subject>SDN:LOCAL:Temperature</subject><object>SDN:P011::TEMPS901</object><units>SDN:P061::UPAA</units>
- However, once the mapping between MEDATLAS parameter codes and P011 is completed, these become unnecessary

SeaDataNet CF NetCDF Profile  This is VERY immature, so currently there is nothing to teach  ASCII formats should be sufficient for most SeaDataNet 1 transactions  Further work during the next 6 months  Partners who feel they need NetCDF for their data should contact the Technical Task Team (Dick Schaap or Roy Lowry)

SeaDataNet Qualifying Flags  What is a Qualifying Flag?  SeaDataNet Flags  Conflict resolution

What is a Qualifying Flag?  Back in the mists of time (IODE in early 1980s?) it was decreed that all data values should be accompanied by a ‘flag’ in the form of a 1-byte code  Built into many data format specifications (MEDATLAS, BODC PXF/QXF, GF3…)  Initially thought of as a data quality label  However, it provides the only metadata ‘hook’ that is unambiguously linked to a specific data value  Consequently, it has suffered information overload carrying other information about non-quality issues  We cannot correct this without major re-engineering of data held as files, which isn’t going to happen

SeaDataNet Flags
- Information overloading has led to two types of flag in SeaDataNet
- Quality flags
  - 0 – quality unknown
  - 1 – good value (looks good and no reported problems)
  - 2 – probably good value (associated with a known malfunction but looks OK)
  - 3 – probably bad value (associated with a known malfunction but looks wrong)
  - 4 – bad value (clearly wrong)

SeaDataNet Flags
- Information overloading has led to two types of flag in SeaDataNet
- Information flags
  - 5 – changed value (during quality control)
  - 6 – below detection (true value < quoted value)
  - 7 – value in excess (true value > quoted value)
  - 8 – interpolated value (special case of a changed value)
  - 9 – missing value
  - A – phenomenon uncertain (e.g. question over identification of biological specimen)

Conflict Resolution
- We can now see the problems caused by overloading
- How can we tell the difference between a ‘good changed value’ and a ‘bad changed value’?
- The simple answer is that we can’t. A single flag can say that the value was changed (flag 5), good (flag 1) or bad (flag 4), but only one of these
- So we have to compromise

Conflict Resolution  How do we compromise?  By prioritising flag assignments  Initially, all flags are set to 0, 9, 7, 6 or A (detection level and uncertainty information comes from the originator, not QA)  Next we either interpolate or replace and flag appropriately (8 or 5)  Finally we switch remaining zero flags to 1, 2, 3 or 4 as appropriate  This is not ideal and we need to do better in SeaDataNet 2.
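The three-step prioritisation can be written down as a small function. This is one reading of the slide rather than an official algorithm: in particular it assumes that an interpolation or replacement in step two overrides whatever flag the value carried before, and the function and argument names are invented for the example.

```python
# Information flags that come from the originator rather than from QA.
ORIGINATOR_FLAGS = {"9", "7", "6", "A"}
# Quality verdicts that may replace a remaining zero flag.
QUALITY_FLAGS = {"1", "2", "3", "4"}


def assign_flag(originator_flag="0", interpolated=False, changed=False,
                quality=None):
    """Return the single SeaDataNet flag for one data value."""
    # Step 1: start from 0 unless the originator supplied 9, 7, 6 or A.
    flag = originator_flag if originator_flag in ORIGINATOR_FLAGS else "0"
    # Step 2: interpolated or replaced values are flagged 8 or 5.
    if interpolated:
        flag = "8"
    elif changed:
        flag = "5"
    # Step 3: only flags that are still zero receive a quality verdict.
    if flag == "0" and quality in QUALITY_FLAGS:
        flag = quality
    return flag
```

The compromise is visible in the last step: a value flagged 5, 6, 7, 8 or A never receives a quality verdict, so a ‘bad changed value’ cannot be told apart from a ‘good changed value’.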

Vocabularies  What are vocabularies and mappings?  Vocabularies for Metadata  Vocabularies for Data  Vocabulary Access  Vocabulary Maintenance

What is a Vocabulary?
- A vocabulary is a list of standardised terms used to populate a metadata field
- The SeaDataNet vocabulary model considers each such term to possess
  - A key (a permanent, semantically neutral identifier for the term, possibly a mnemonic)
  - A term (full human-readable label)
  - An abbreviation (short human-readable label)
  - A definition (full explanation of the term’s meaning)

What is a Mapping?  A mapping is a set of relationships between terms  Each relationship consists of a subject term (sometimes called subject concept), a predicate and an object term  The predicate gives the relationship ‘meaning’  Predicates may be simple to underpin something like a thesaurus (e.g. SKOS)  exactMatch - synonyms  narrowMatch – subject concept totally embraces the object concept  broadMatch – subject concept is totally embraced by the object concept  majorMatch – subject and object have a lot in common but some unique semantic elements  minorMatch - subject and object have something in common but significant unique semantic elements

What is a Mapping?  Predicates may also be semantically rich such as:  hasUnits – links a parameter to a unit of measurement  isMember – links a person to a group  hasName – links a person to a label  Mappings between defined entities with semantically rich predicates are what computer scientists call an ontology
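A mapping in this sense is just a collection of subject, predicate, object triples. The sketch below shows one simple way to hold and query such triples; the term identifiers are made up for illustration and do not come from any SeaDataNet list.

```python
from collections import namedtuple

# One relationship: subject term, predicate, object term.
Triple = namedtuple("Triple", "subject predicate object")

# Hypothetical terms, mixing thesaurus-style and semantically rich predicates.
TRIPLES = [
    Triple("LOCAL:sea_temp", "exactMatch", "LOCAL:temperature_of_sea"),
    Triple("LOCAL:temperature", "narrowMatch", "LOCAL:sea_temp"),
    Triple("LOCAL:sea_temp", "hasUnits", "LOCAL:degC"),
]


def objects_of(subject, predicate, triples):
    """Return every object linked to a subject by the given predicate."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]
```

Here `objects_of("LOCAL:temperature", "narrowMatch", TRIPLES)` lists the terms that the broader concept totally embraces, matching the definition of narrowMatch given above.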

Vocabularies for Metadata  Many fields in SeaDataNet metadata are linked through the document schema to appropriate vocabularies  These cover subject areas such as:  Discovery parameters  Instruments  Platforms  Geographic locations (e.g. ports, sea areas)  Lists to be used are defined in the metadata guidance documentation.  List references (e.g. P021) provide the key to vocabulary access information

Vocabularies for Data  There are four vocabularies needed for data in SeaDataNet  ‘Light’ Parameter Usage Vocabulary (P012)  ‘The Full Monty’ Parameter Usage Vocabulary (P011)  SeaDataNet flags (L201)  Units Vocabulary (P061)

Vocabularies for Data  ‘Light’ Parameter Usage Vocabulary (P012)  Terms to describe parameters (i.e. column headings)  Kept as pure (no methods) and as simple as possible  Definitions available  Mapped to MEDATLAS/GF3 extended terms  Should be the first port of call for SeaDataNet data providers

Vocabularies for Data  ‘Full’ Parameter Usage Vocabulary (P011)  Comprehensive (nearly 20,000 terms) but can be hard to navigate  Microsoft Access navigation tool used inside BODC could be made available on request  True superset of P012, so all P012 URLs have an identical P011 equivalent  Handling data files will be easier if P011 version is used in SeaDataNet data files  Port of call if P012 fails to deliver

Vocabularies for Data
- SeaDataNet data qualifier flags (L201)
  - The full list of the flags discussed previously
- Units Vocabulary (P061)
  - Unlike MEDATLAS or the BODC internal system, SeaDataNet policy is to label a value with parameter and units INDEPENDENTLY
  - The vocabulary is a standardised description of the units used; it does not dictate the units
  - An aspiration is to develop units interconversion based on P061 terms

Vocabulary Access  There are five ways to access the SeaDataNet vocabularies  SeaDataNet Vocabulary Portal  Term and list URLs  HTTP-POX interface  SOAP API  BODC client interface  But I’m only going to cover the first four as the portal should cover SeaDataNet needs

Vocabulary Access  SeaDataNet Vocabulary Portal  User input through a web form at  Returns a human-readable table with key, term, abbreviation, definition and modification date columns  Table may be exported as a semicolon-delimited ‘CSV’ ASCII file
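The exported file can be loaded with a few lines of code. The sketch below assumes that the first row of the export holds column names and that those names are `key`, `term`, `abbreviation`, `definition` and `modified`; the real export may label its columns differently, and the sample row is invented.

```python
import csv
import io


def read_vocab_export(text):
    """Index a semicolon-delimited vocabulary export by term key."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {row["key"]: row for row in reader}


# Illustrative export with assumed column names and made-up content.
SAMPLE = (
    "key;term;abbreviation;definition;modified\n"
    "TEMP;Example term;ExTerm;Example definition text;2008-06-16\n"
)
terms = read_vocab_export(SAMPLE)
print(terms["TEMP"]["term"])
```

Using the `csv` module rather than a plain `split(";")` keeps definitions that contain quoted semicolons intact.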

Vocabulary Access  Term and List URLs  User input is a URL  Returns an XML document based on the SKOS standard  List documents include labels and definitions for all terms in the list  Term documents include labels, definition and mappings for the term
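Reading such a document needs only a standard XML parser. The sketch below parses a hand-written sample that uses the standard RDF and SKOS namespaces; the sample content, the example.org addresses and the exact set of elements are assumptions, and the documents returned by the server may be laid out differently.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hand-written illustration, not a real server response.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/term/P021/current/TEMP">
    <skos:prefLabel>Example label</skos:prefLabel>
    <skos:altLabel>ExLbl</skos:altLabel>
    <skos:definition>Example definition.</skos:definition>
    <skos:exactMatch rdf:resource="http://example.org/term/X001/current/ABC"/>
  </skos:Concept>
</rdf:RDF>"""


def parse_skos(xml_text):
    """Return label, definition and mappings for each concept found."""
    root = ET.fromstring(xml_text)
    terms = []
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        mappings = []
        for child in concept:
            # Mapping elements point at another term via rdf:resource.
            target = child.get(f"{{{RDF}}}resource")
            if target is not None:
                predicate = child.tag.split("}", 1)[1]
                mappings.append((predicate, target))
        terms.append({
            "url": concept.get(f"{{{RDF}}}about"),
            "label": concept.findtext(f"{{{SKOS}}}prefLabel"),
            "definition": concept.findtext(f"{{{SKOS}}}definition"),
            "mappings": mappings,
        })
    return terms
```

A list document would yield one dictionary per term with empty mappings, while a term document would yield a single dictionary including its mappings.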

Vocabulary Access  URL syntax  Namespace base (  ‘list’ or ‘term’  List identifier (e.g. P021)  List version or ‘current’  Term identifier for term URL (e.g. TEMP)  Examples  List (SeaDataNet Parameter Discovery Vocabulary)   Term (CF Standard Name for sea temperature) 

Vocabulary Access - SDN:P071:7:CFSN0335 sea_water_temperature T10:02:

Vocabulary Access  In SeaDataNet data and metadata we use URNs, not URLs (in case the server namespace changes)  URN syntax is  Namespace base (SDN)  List identifier (e.g. P021)  List version or null field for ‘current’  Term identifier (e.g. TEMP)  For example the URL is represented by the URN SDN:P021::TEMP  URN to URL conversion is simple string slicing
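The string slicing involved can be sketched as follows for term URNs. The URL layout follows the syntax described earlier (namespace base, ‘term’, list identifier, version or ‘current’, term identifier); the base address is passed in as a parameter, and the example.org value used here is a placeholder, not the real server address.

```python
def urn_to_url(urn, base):
    """Convert a term URN such as SDN:P021::TEMP into a term URL."""
    namespace, list_id, version, term_id = urn.split(":")
    if namespace != "SDN" or not list_id or not term_id:
        raise ValueError(f"not a SeaDataNet term URN: {urn!r}")
    # A null version field in the URN means the current list version.
    version = version or "current"
    return f"{base.rstrip('/')}/term/{list_id}/{version}/{term_id}"


def url_to_urn(url, base):
    """Convert a term URL back into its URN."""
    prefix = base.rstrip("/") + "/term/"
    if not url.startswith(prefix):
        raise ValueError(f"not a term URL under {base!r}: {url!r}")
    list_id, version, term_id = url[len(prefix):].strip("/").split("/")
    if version == "current":
        version = ""
    return f"SDN:{list_id}:{version}:{term_id}"


BASE = "http://vocab.example.org"  # placeholder namespace base
print(urn_to_url("SDN:P021::TEMP", BASE))
```

Keeping the base as a parameter reflects the reason URNs are used in the first place: if the server namespace changes, only that one value needs updating.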

Vocabulary Access HTTP-POX API  User input is a URL  Returns an XML document based on a BODC-defined schema  Provides access to  List catalogue  List contents (keys, terms, abbreviations, definitions, mappings)  Mappings  Plaintext searches across lists  Term verification  The API is documented at

Vocabulary Access
- SOAP API
  - User input is a programmatic service call from a Java, Perl, PHP, Python, etc. application
  - Returns an XML document based on a BODC-defined schema
  - Provides access to
    - List catalogue
    - List contents (keys, terms, abbreviations, definitions, mappings)
    - Mappings
    - Plaintext searches across lists
    - Term verification
  - The API is documented at
  - The WSDL is available from vocab.ndg.nerc.ac.uk/

Vocabulary Maintenance  What if you can’t find the term you need?  Initially contact the SeaDataNet help desk (sdn-  If they cannot resolve your problem they will pass the problem on to me  I will endeavour to add new terms or identify appropriate existing terms  Adding terms may involve discussions with vocabulary governance authorities  This can take time (possibly 2-3 weeks) so please try to think ahead