1 CAA 2009 Cross Cal 9, Jesus College, Cambridge, UK, 25-27 March 2009 Caveats, Versions, Quality and Documentation Specification Chris Perry.

Presentation transcript:

1 Caveats, Versions, Quality and Documentation Specification
Chris Perry

2 METADATA
The general concept is to have a standard way to describe all products in the CAA. The level and detail of the description may differ (e.g. between CEF and non-CEF products). The semantics are defined in the MDD. For non-CEF products, CEF detached headers are used to describe the products. All descriptions need to work within the constraints of the MDD.

3 CAVEATS
The caveats provide a means to warn users about uncommon features or problems in the data. The MDD supports specification of caveats at each hierarchical level within the data model (FILE_CAVEATS, DATASET_CAVEATS, etc.). There are some important considerations and limitations: the handling and merging of metadata, and the need to support fine-grained time specification.

4 CAVEATS
Only metadata at FILE level can vary. Merging of FILE metadata is done on delivery. All other metadata applies to the whole dataset. Use of detached headers is strongly encouraged.

5 CAVEATS

6 CAVEATS
Example of merged file caveats

7 CAVEATS
FILE_CAVEATS in the file should consist of processing information (e.g. software and calibration information). If time-varying metadata needs to be specified, this is done by providing a separate dataset. The metadata entry is then set to a fixed value that references that dataset using the "*" reference. The reference can be any valid CAA dataset; however, we make some recommendations for caveats files.

8 CAVEATS
Recommendation: Caveat datasets should be in CEF format. They should use the CQ (rather than CP) type. Each record should contain an ISO time range plus the caveat information (e.g. a text string). The records may overlap but should be sorted on start time, then stop time. All the normal CAA/CEF formatting rules apply.
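The ordering recommendation for caveat records can be illustrated with a minimal Python sketch. The record layout here is an assumption made for illustration (an ISO time range written as `start/stop` with a trailing `Z`, plus a text string); the names `CaveatRecord`, `parse_record`, `is_sorted` and `caveats_at` are hypothetical and not part of the CAA software.

```python
from datetime import datetime
from typing import NamedTuple

# Assumed timestamp layout, e.g. 2003-01-05T00:00:00Z
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


class CaveatRecord(NamedTuple):
    start: datetime
    stop: datetime
    text: str


def parse_record(time_range: str, text: str) -> CaveatRecord:
    """Parse an assumed 'start/stop' ISO time range plus caveat text."""
    start_s, stop_s = time_range.split("/")
    return CaveatRecord(
        datetime.strptime(start_s, ISO_FORMAT),
        datetime.strptime(stop_s, ISO_FORMAT),
        text,
    )


def sort_records(records):
    """Sort on start time, then stop time, as recommended."""
    return sorted(records, key=lambda r: (r.start, r.stop))


def is_sorted(records) -> bool:
    """Check the recommended ordering; overlapping records are allowed."""
    return all(
        (a.start, a.stop) <= (b.start, b.stop)
        for a, b in zip(records, records[1:])
    )


def caveats_at(records, when: datetime):
    """Return the text of every caveat whose range covers `when`."""
    return [r.text for r in records if r.start <= when < r.stop]


records = sort_records([
    parse_record("2003-01-05T00:00:00Z/2003-01-06T00:00:00Z", "Detector noisy"),
    parse_record("2003-01-01T00:00:00Z/2003-01-10T00:00:00Z", "Preliminary calibration"),
])
print(is_sorted(records))
print(caveats_at(records, datetime(2003, 1, 5, 12)))
```

Because the records may overlap, a lookup for a given time can return more than one caveat, which is why `caveats_at` returns a list rather than a single string.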

9 CAVEATS
Example (C1_CQ_RAP_CAVEATS)

10 VERSIONS
The CAA MDD has two items intended for versioning: DATASET_VERSION and the file VERSION_NUMBER. Supplementary information is provided by DATASET_CAVEATS and, for individual file information, by FILE_CAVEATS. The DATASET_VERSION information is specified as one or more lines of free-form text that the data provider can use in whatever way they consider most appropriate to give visibility of the provenance of the processing that gave rise to the data contained within a given file. Reprocessed data should generally have a new DATASET_VERSION and an updated description in DATASET_CAVEATS. The file VERSION_NUMBER is an integer value that increases monotonically and is used for configuration control of files ingested within the CAA system. This ensures that ingested products have a unique identifier, allowing the provenance of individual files to be tracked.

11 VERSIONS
On delivery the CAA system can select the most recent fragments of files to produce the most up-to-date time line. Overlaps make ingestion checks difficult, so fixed intervals (e.g. day files) are preferred. Only FILE_CAVEATS and DATASET_VERSION are merged; other metadata is treated as static. The VERSION_NUMBER of the delivered file will be set to a six-digit value corresponding to the yymmdd of the most recently ingested file.
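The delivery behaviour described on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the CAA implementation: the `IngestedFile` fields, the rule of picking the highest VERSION_NUMBER per day file, and the de-duplicating merge of FILE_CAVEATS are all hypothetical, and "most recently ingested file" is taken to mean the newest of the files that contribute to the delivery.

```python
from datetime import date
from typing import NamedTuple


class IngestedFile(NamedTuple):
    day: date             # fixed interval covered by the file (a day file)
    version_number: int   # file VERSION_NUMBER, higher means newer
    ingested: date        # ingestion date (hypothetical field)
    file_caveats: str     # FILE_CAVEATS text carried by the file


def select_latest(files):
    """Keep, for each day, the file with the highest VERSION_NUMBER."""
    latest = {}
    for f in files:
        current = latest.get(f.day)
        if current is None or f.version_number > current.version_number:
            latest[f.day] = f
    return [latest[day] for day in sorted(latest)]


def merge_file_caveats(selected):
    """Collect distinct FILE_CAVEATS entries in time order (assumed merge rule)."""
    merged = []
    for f in selected:
        if f.file_caveats not in merged:
            merged.append(f.file_caveats)
    return merged


def delivered_version_number(selected):
    """Six-digit yymmdd of the most recently ingested contributing file."""
    return max(f.ingested for f in selected).strftime("%y%m%d")


files = [
    IngestedFile(date(2004, 1, 1), 1, date(2008, 11, 3), "s/w v1.0, cal v3"),
    IngestedFile(date(2004, 1, 1), 2, date(2009, 2, 17), "s/w v1.1, cal v4"),
    IngestedFile(date(2004, 1, 2), 1, date(2008, 11, 3), "s/w v1.0, cal v3"),
]
selected = select_latest(files)
print(merge_file_caveats(selected))        # caveats of the two contributing files
print(delivered_version_number(selected))  # 090217
```

Using fixed day intervals keeps the selection step a simple per-day comparison; with arbitrary overlapping intervals the same step would need interval splitting, which is the ingestion difficulty the slide refers to.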

12 VERSIONS
DATASET_VERSION is merged on delivery

13 VERSIONS
Keep the dataset version short, e.g. use an ID and maintain the running history in the static DATASET_CAVEATS.

14 QUALITY
CAA defines a QUALITY metadata entry with a standard range of 1 (poor) to 4 (excellent), and 0 for N/A. QUALITY is a parameter metadata item. If a value is assigned in the metadata, it applies to the whole dataset. This is not usually appropriate except for support data. Instead, specify a parameter name; this provides per-record values. In theory, an alternate quality dataset could be referenced using the "*" reference.
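The two ways of setting QUALITY (a fixed value for the whole dataset, or a parameter name giving per-record values) can be sketched as below. The record representation (a list of dictionaries), the parameter name `flux_quality`, and the function names are assumptions made for illustration only; the "*" dataset reference is not handled here.

```python
VALID_QUALITY = range(0, 5)  # 0 = N/A, 1 = poor, up to 4 = excellent


def check_quality(value) -> int:
    """Validate a single quality value against the standard 0-4 range."""
    value = int(value)
    if value not in VALID_QUALITY:
        raise ValueError(f"QUALITY must be in the range 0-4, got {value}")
    return value


def resolve_quality(quality_entry: str, records):
    """Return one quality value per record.

    A numeric entry applies to the whole dataset; any other entry is
    treated as the name of a parameter holding per-record values.
    """
    entry = quality_entry.strip()
    if entry.isdigit():
        return [check_quality(entry)] * len(records)
    return [check_quality(rec[entry]) for rec in records]


records = [
    {"flux": 1.2, "flux_quality": 3},
    {"flux": 0.9, "flux_quality": 1},
]
print(resolve_quality("flux_quality", records))  # per-record values
print(resolve_quality("4", records))             # one value for every record
```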

15 DOCUMENTATION
Currently CAA handles documents via lists on the web documentation pages. Key documentation will continue to be made available in this way, but all documents also need to be catalogued for the long-term archive. This will follow the same scheme as for non-CEF products.

16 DOCUMENTATION
Detached CEF headers will be used to supply the static metadata. Documents will be assigned a dataset ID and a unique file ID. Documents will use the CD data type (see MDD). In many cases there will only be a single document within a dataset. Where there are many documents within a dataset (e.g. ESOC anomaly reports), a CSV file will hold the file-varying metadata. All the usual metadata rules apply except that, as with non-CEF products, no parameter metadata is supplied. If a time does not apply to the document, , will be used in the file ID as specified in the MDD.

17 DOCUMENTATION
Keywords to help with document location can be included in the DATASET_DESCRIPTION metadata. It is intended that, where possible, the contents of the documents will be indexed to support simple text search. There may be a need to extend some of the MDD enumerated lists; please advise us if you cannot describe your documents with the current terms. A web service will be provided to access documentation based on the unique file ID. Documents referenced within CEF files may be configured for automatic delivery (through the caveats delivery scheme).