The BADC-CSV Format Meeting user and metadata requirements Graham A Parton*, Sam J Pepler British Atmospheric Data Centre, Rutherford Appleton Laboratory,


The BADC-CSV Format
Meeting user and metadata requirements
Graham A Parton*, Sam J Pepler
British Atmospheric Data Centre, Rutherford Appleton Laboratory, HSIC, Didcot, Oxfordshire, UK, OX11 0QZ *

Introduction - the need for the format
In 2007 the British Atmospheric Data Centre (BADC) undertook a user survey to determine the skill base within its user community. Results from the survey (figure 1) indicated:
- a high proportion of users are able to handle ASCII files (such as CSV data)
- a high degree of familiarity within the user community with spreadsheet programmes such as Excel

The BADC uses the NASA-Ames format for ASCII data. NASA-Ames was devised primarily for aircraft observations, but can be adapted for many kinds of atmospheric observation data. However:
- Users find the NASA-Ames format complex and confusing, and strip the header off to import the text file into Excel.
- The metadata is generally not used in its machine-readable form, but is simply read by the researcher.
- Much effort is expended supporting data producers in the creation of NASA-Ames files, as this is complicated and cannot be done simply from spreadsheet packages like Excel.
- Metadata fields offered by NASA-Ames are fixed and inflexible, with other desirable metadata elements confined to the comments section.

As a consequence, a new ASCII format meeting the needs of supplier, data centre and end user was required.

Figure 1: BADC survey results. Top panel: user familiarity with various format types. Bottom panel: user proficiency with various analysis tools (BADC User Survey, 2007).

Format Requirements
To meet the requirements of data suppliers, data centres and end users, the following criteria were set. The format should be:
- open source
- human readable
- recognisable by spreadsheet programmes (e.g. Excel, OpenOffice Calc)
- easy to generate within spreadsheet and other common data processing software and scripting languages (e.g.
IDL, MATLAB, Python)
- conformant with metadata conventions (including CF, Dublin Core, NASA-Ames, ISO19115)
- checkable by some libraries for levels of compliance

To meet these requirements a structured comma-separated-value format was developed. The format contains:
- a designated metadata section
- flexibility for additional metadata elements
- a controlled list of metadata tags
- the data section

Checks for compliance with common standards were also set. The result is the BADC-CSV format. The format description document can be found in the CEDA Document Repository at:

Compliance
Submitted BADC-CSV files are checked for format compliance. All BADC-CSV files must meet the following requirements:
- CSV: the file should conform to the Excel dialect of the CSV file format.
- Structure: data and metadata sections exist.
- Valid metadata: each metadata entry has the right number of values and refers to legal objects.

The controlled metadata list (see the appendix of the format description document for details) allows further checks to be made on the files. Some metadata elements are compulsory, others are desirable. Thus, three levels of compliance result:
- Basic: parameter names exist for all columns and the basic structure of the file is correct. This provides a file carrying the same information as a table of numbers with column headings. This level requires valid metadata.
- Complete: mandatory metadata exists. Metadata should exist for some items. Requires basic compliance.
- Standardised: metadata values are taken from standard lists where appropriate. Requires complete compliance.

References
BADC User Survey 2007:
BADC-CSV Format Description Document:

Further Reading
NASA-Ames Format: Gaines S. E. and Hipskind R.
S., Format Specification for Data Exchange, version 1.3, 1998,
netCDF format: The NetCDF Users Guide,
CF Metadata: Eaton B, Gregory J, Drach B, Taylor K, Hankin S, Caron J, Signell R, Bentley P, Rappa G, CF metadata conventions: NetCDF Climate and Forecast (CF) Metadata Conventions, Version 1.4, 2009
ISO19115: available from

Figure 2: An example file (BADC-CSV Format Description Document), with the file type identifier, metadata section and data section labelled.

Structure of the BADC-CSV format
The format contains three sections (as highlighted in the example in figure 2):
1. File type identifier
2. Metadata section
3. Data section

File type identifier
The first metadata line in the file should be the Conventions line, which aids recognition of the file type. It is given as shown below to conform to the CF conventions, and is the only metadata tag that is capitalised; all tags that follow this line are in lower case.
Conventions,G,BADC-CSV,1

Metadata section
All metadata entries are of the format:
<tag>,<column reference>,<value>[,<value>, …]
where:
- <tag> is a metadata tag, which may be an item from the list of controlled metadata items or may be one generated by the user.
- <column reference> is the column to which the metadata applies. “G” indicates that the metadata applies globally. This allows reference to variables and the data as found in NetCDF.
- <value>, … is one or more comma-separated values associated with the metadata element.
For readability, metadata tags can be repeated on subsequent lines.

Data section
The data section consists of a record with a single “data” entry, followed by a line of the column references, the data records and a terminating “end data” entry. The “end data” entry allows a partial (truncated) file to be flagged.
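The three-part structure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the BADC's own reader or compliance checker: the function name parse_badc_csv, the shape of the returned values and the sample values are all hypothetical, and only the layout rules stated in the text (tag, column reference, values; a “data” record; a column reference line; an “end data” record) are relied upon.

```python
import csv
import io


def parse_badc_csv(text):
    """Split BADC-CSV text into (metadata, columns, rows, complete).

    metadata maps (tag, column_ref) to a list of value lists, one per line,
    because a tag may be repeated on subsequent lines.
    complete is True only if the terminating "end data" record was seen.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    metadata = {}
    columns = None
    rows = []
    in_data = False
    complete = False
    for record in reader:
        record = [field.strip() for field in record]
        if not any(record):
            continue  # ignore blank lines
        # A marker record has a single entry (any trailing fields are empty).
        is_marker = not any(record[1:])
        if not in_data:
            if is_marker and record[0] == "data":
                in_data = True
                continue
            if len(record) < 3:
                raise ValueError(f"metadata needs tag, ref and value: {record}")
            tag, ref, values = record[0], record[1], record[2:]
            metadata.setdefault((tag, ref), []).append(values)
        elif is_marker and record[0] == "end data":
            complete = True
            break
        elif columns is None:
            columns = record  # first line after "data" holds column references
        else:
            rows.append(record)
    return metadata, columns, rows, complete


# Illustrative content only; the values are made up for this sketch.
SAMPLE = """\
Conventions,G,BADC-CSV,1
title, G, Illustrative file
variable_name, temp, air temperature
comments, temp, first comment line
comments, temp, second comment line
data
time, temp
0, 280.5
1, 281.0
end data
"""

if __name__ == "__main__":
    metadata, columns, rows, complete = parse_badc_csv(SAMPLE)
    print(columns, rows, complete)
```

Returning the complete flag separately, rather than raising an error, lets a caller still use the records of a truncated file while knowing that the “end data” entry was missing.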
data
end data

Example file (figure 2):
    Conventions,G,BADC-CSV,1
    title, G, My data file
    creator, G, G Parton, CEDA
    contributor, G, Sam Pepler, BADC
    creator, met_temp, S Aylingby, CEDA
    variable_name, time, time, days since
    variable_name, temp, air temperature
    variable_name, met_temp, met station air temperature
    creator, met_temp, unknown,Met Office
    comments, met_temp, measured using a thermometer
    comments, met_temp, the instrument materials
    comments, met_temp, field details the main
    comments, met_temp, material of the instrument
    comments, met_temp, only
    instrument_materials, met_temp, glass and mercury
    coordinate_variable,1, X
    location_name, G, Rutherford Appleton Lab
    data
    time, temp, met_temp
    0.8, 2.4, , 3.4, , 3.5, , 6.7, , 5.7, 5.8
    end data

Figure 3: Plot of surface temperatures from SYNOP messages stored at the BADC in the BADC-CSV format.

BADC-CSV file in use
The BADC already stores observational data from the UK Met Office’s MetDB system as BADC-CSV formatted data, including land SYNOP messages. This dataset was used to carry out a field trial of generating an entire dataset in the BADC-CSV format using commonly available data preparation tools: Excel and Python. The metadata elements were prepared within Excel, while Python scripts handled the incoming ASCII files, sorted the data and output BADC-CSV files. To further field-test the dataset, sample plots of the data (see figure 3 for an example) were generated using IDL. Publication-quality plots were prepared within only a couple of hours, including the time needed to write scripts to read in the BADC-CSV formatted data.
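The scripts used in the field trial are not reproduced on the poster. As a rough sketch of the output step only, the following Python shows how a BADC-CSV file could be written with the standard csv module in its Excel dialect. The function name write_badc_csv, its arguments and the sample values are assumptions made for illustration; the version value 1 on the Conventions line is taken from the example file.

```python
import csv
import io


def write_badc_csv(stream, metadata, columns, rows):
    """Write a BADC-CSV file to an open text stream.

    metadata: iterable of (tag, column_ref, value, ...) tuples, written in order.
    columns:  list of column references for the data section.
    rows:     iterable of data records, each with one value per column.
    """
    writer = csv.writer(stream, dialect="excel")
    # File type identifier: the only capitalised metadata tag.
    writer.writerow(["Conventions", "G", "BADC-CSV", "1"])
    for tag, ref, *values in metadata:
        writer.writerow([tag, ref, *values])
    # Data section: "data", column references, records, "end data".
    writer.writerow(["data"])
    writer.writerow(columns)
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {row}")
        writer.writerow(row)
    writer.writerow(["end data"])


if __name__ == "__main__":
    # Illustrative values only; none of these come from the MetDB dataset.
    buffer = io.StringIO()
    write_badc_csv(
        buffer,
        metadata=[
            ("title", "G", "Illustrative file"),
            ("variable_name", "temp", "air temperature"),
        ],
        columns=["time", "temp"],
        rows=[[0, 280.5], [1, 281.0]],
    )
    print(buffer.getvalue())
```

When writing to a real file rather than an in-memory buffer, the file should be opened with newline="" so that the csv module controls the line endings.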