CAA 2009 Cross Cal 9, Jesus College, Cambridge, UK, March 2009
Caveats, Versions, Quality and Documentation Specification
Chris Perry
METADATA
The general concept is to have a standard way to describe all products in the CAA
The level and detail of the description may differ (e.g. between CEF and non-CEF products)
The semantics are defined in the MDD
For non-CEF products we are using detached CEF headers to describe the products
We need to work within the constraints of the MDD
CAVEATS
The caveats provide a means to warn users about uncommon features or problems in the data
The MDD supports specification of caveats at each hierarchical level within the data model
  FILE_CAVEATS, DATASET_CAVEATS etc.
There are some important considerations and limitations
  Handling and merging of metadata
  Need for support of fine-grained time specification
CAVEATS
Only metadata at FILE level can vary
Merging of FILE metadata is done on delivery
All other metadata applies to the whole dataset
Use of detached headers is strongly encouraged
CAVEATS
[Figure: example of merged file caveats]
CAVEATS
FILE_CAVEATS in the file should consist of processing information (e.g. s/w and cal info)
If time-varying metadata needs to be specified, this is done by providing a separate dataset
The metadata entry is set to a fixed value that references the dataset using the "*" reference
The reference can be any valid CAA dataset
However, we make some recommendations for caveat files
CAVEATS
Recommendation
Caveat datasets should be in CEF format
They should use the CQ (rather than CP) type
Each record should contain an ISO time range plus the caveat information (e.g. a text string)
The records may overlap but should be sorted on start time, then stop time
All the normal CAA/CEF formatting rules apply
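
The ordering rule for caveat records can be sketched in a few lines of Python. This is only an illustration, not CAA code: the `Caveat` type, the helper names, the sample caveat texts, the whole-second `start/stop` time format and the half-open interval test are all assumptions made for the example.

```python
from datetime import datetime
from typing import NamedTuple

# Assumed timestamp layout (whole seconds, UTC); real files may carry fractions.
ISO_FMT = "%Y-%m-%dT%H:%M:%SZ"


class Caveat(NamedTuple):
    """Hypothetical in-memory form of one caveat record."""
    start: datetime
    stop: datetime
    text: str


def parse_range(time_range: str, text: str) -> Caveat:
    """Split an ISO 'start/stop' time range and attach the caveat text."""
    start_s, stop_s = time_range.split("/")
    return Caveat(datetime.strptime(start_s, ISO_FMT),
                  datetime.strptime(stop_s, ISO_FMT),
                  text)


def sort_caveats(records):
    """Sort on start time, then stop time; overlapping records are allowed."""
    return sorted(records, key=lambda c: (c.start, c.stop))


def caveats_at(records, when):
    """Return the text of every caveat whose range covers `when`.

    Assumption: the range is treated as start-inclusive, stop-exclusive.
    """
    return [c.text for c in records if c.start <= when < c.stop]


# Made-up sample records, deliberately overlapping and out of order.
records = sort_caveats([
    parse_range("2009-03-02T00:00:00Z/2009-03-03T00:00:00Z", "calibration update"),
    parse_range("2009-03-01T00:00:00Z/2009-03-05T00:00:00Z", "reduced resolution"),
    parse_range("2009-03-01T00:00:00Z/2009-03-02T12:00:00Z", "instrument mode change"),
])
```

Because the records are sorted on both keys, a reader or an ingestion check can scan the file once and still cope with overlapping ranges.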
CAVEATS
[Figure: example caveats dataset (C1_CQ_RAP_CAVEATS)]
VERSIONS
The CAA MDD has two items intended for versioning: DATASET_VERSION and the file VERSION_NUMBER. Supplementary information is given in DATASET_CAVEATS, and in FILE_CAVEATS for individual file information.
The DATASET_VERSION information is specified as one or more lines of free-form text. The data provider can use it in whatever way they consider most appropriate to give visibility of the provenance of the processing that gave rise to the data contained within a given file. Reprocessed data should generally have a new DATASET_VERSION and an updated description in DATASET_CAVEATS.
The file VERSION_NUMBER is an integer value that increases monotonically and is used for configuration control of files ingested within the CAA system. This ensures that ingested products have a unique identifier, allowing the provenance of individual files to be tracked.
VERSIONS
On delivery the CAA system can select the most recent fragments of files to produce the most up-to-date timeline.
Overlaps make ingestion checks difficult, so fixed intervals (e.g. day files) are preferred
Only FILE_CAVEATS and DATASET_VERSION are merged; other metadata is treated as static.
The VERSION_NUMBER of the delivered file will be set to a six-digit value corresponding to the yymmdd of the most recently ingested file.
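
As a rough sketch of the delivery step, assuming day files and a simple tuple layout, the selection of the most recent fragments and the derived VERSION_NUMBER might look like this. The tuple layout and the function names `select_latest` and `delivered_version_number` are hypothetical and are not part of the CAA system.

```python
from datetime import date


def select_latest(fragments):
    """Keep, for each data day, the most recently ingested file.

    `fragments` is a list of (data_day, ingest_date, file_name) tuples;
    this layout is an assumption made for the example.
    """
    latest = {}
    for fragment in fragments:
        data_day, ingest_date, _name = fragment
        if data_day not in latest or ingest_date > latest[data_day][1]:
            latest[data_day] = fragment
    # Return the chosen fragments in time order to form the timeline.
    return [latest[day] for day in sorted(latest)]


def delivered_version_number(selected):
    """Six-digit yymmdd of the most recently ingested file among `selected`.

    Returned as a string so that a leading zero in the year is preserved.
    """
    newest = max(ingest_date for _day, ingest_date, _name in selected)
    return newest.strftime("%y%m%d")


# Made-up example: the file for 1 February 2003 was ingested twice.
fragments = [
    (date(2003, 2, 1), date(2008, 6, 30), "day1_v1"),
    (date(2003, 2, 2), date(2008, 6, 30), "day2_v1"),
    (date(2003, 2, 1), date(2009, 3, 12), "day1_v2"),
]
selected = select_latest(fragments)
version = delivered_version_number(selected)
```

With fixed day intervals the selection reduces to one dictionary lookup per file, which is one reason overlapping intervals are harder to check.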
VERSIONS
DATASET_VERSION is merged on delivery
VERSIONS
Keep DATASET_VERSION short, e.g. use an ID and maintain the running history in the static DATASET_CAVEATS
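
To illustrate why a short DATASET_VERSION helps, here is a minimal sketch that assumes merging on delivery means collecting the distinct values found in the contributing files, in order of first appearance. That merge rule, the function name and the sample IDs are assumptions for illustration; the actual CAA behaviour may differ.

```python
def merge_dataset_versions(per_file_versions):
    """Collect distinct DATASET_VERSION values, preserving first-seen order.

    Assumed merge rule, for illustration only.
    """
    merged = []
    for value in per_file_versions:
        value = value.strip()
        if value and value not in merged:
            merged.append(value)
    return merged


# Made-up short IDs: three day files, two of them from the same processing run.
merged = merge_dataset_versions(["V2.1", "V2.1", "V3.0"])
```

Under this assumption a short ID per processing run keeps the merged entry to a handful of tokens, while the detailed history stays in the static DATASET_CAVEATS.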
QUALITY
CAA defines a QUALITY metadata entry with a standard range of 1 (poor) to 4 (excellent), and 0 for N/A
QUALITY is a parameter metadata item
If a value is assigned in the metadata, it applies to the whole dataset
  This is not usually appropriate except for support data
Instead, specify a parameter name
  This provides per-record values
In theory it could reference an alternate quality dataset using the “*” reference
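
A per-record quality parameter can be checked against the standard range with a short sketch such as the one below. The function names are hypothetical, and the check covers only the range stated above (0 for N/A, 1 to 4 otherwise).

```python
def is_valid_quality(value):
    """True if `value` is an integer in the standard range 0..4."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 4


def invalid_records(quality_column):
    """Return the indices of records whose quality value is out of range."""
    return [i for i, q in enumerate(quality_column) if not is_valid_quality(q)]


# Made-up per-record values; 5 and -1 fall outside the range.
bad = invalid_records([4, 0, 5, 2, -1])
```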
DOCUMENTATION
Currently CAA handles documents via lists on the web documentation pages
Key documentation will continue to be made available in this way
But we also need to catalogue all documents for the long-term archive
This will follow the same scheme as for non-CEF products
DOCUMENTATION
Detached CEF headers will be used to supply the static metadata.
Documents will be assigned a dataset ID and a unique file ID.
Documents will use the CD data type (see MDD).
In many cases there will only be a single document within a dataset.
Where there are many documents within a dataset (e.g. ESOC anomaly reports), a CSV file will hold the file-varying metadata.
All the usual metadata rules apply except that, as with non-CEF products, no parameter metadata is supplied.
If a time does not apply to the document, , will be used in the file ID as specified in the MDD.
DOCUMENTATION
Keywords to help with document location can be included in the DATASET_DESCRIPTION metadata.
It is intended that, where possible, the contents of the documents will be indexed to support simple text search.
There may be a need to extend some of the MDD enumerated lists; please advise us if you cannot describe your documents with the current terms.
A web service will be provided to access documentation based on the unique file ID.
Documents referenced within CEF files may be configured for automatic delivery (through the caveats delivery scheme).