European Space Weather Week 3 Brussels, November 13-17, 2006 Atmospheric Data Management - A Challenge - Anne De Rudder and Sue Latham Rutherford Appleton.

Presentation transcript:

European Space Weather Week 3 Brussels, November 13-17, 2006 Atmospheric Data Management - A Challenge - Anne De Rudder and Sue Latham Rutherford Appleton Laboratory, UK

In 2 or 3 decades, the universe of data has gone from … to … [slide images not reproduced]

The BADC
o One of the NERC designated Data Centres and a component of the NCAS
o Documented long-term data archive (currently about 130 catalogued datasets)
o About 8,000 registered users worldwide, of whom 3,000 have applied for access to specific datasets and 2,000 have downloaded data in the past year
o Data management in support of NERC research programmes, grants and facilities and, occasionally, of some international research projects
o Data are distributed via the web
o Assistance to users regarding atmospheric data issues (trajectories, online help desk, visualisation facilities, software, links, …)

Contents
o Data policies – their purpose and implementation
o Model versus observation
o Metadata
o Citation and publication
o Data access networks (grids)
o Speaking the same language
o A few traps to beware of

Data policies
Aims
o Ensuring the swift exchange of knowledge within a research project.
o Ensuring that the newly acquired knowledge, or at least the material on which it relies, is kept for possible future reference, improvement and use, and is made available to the community.
o Ensuring that the data is documented in a way that will allow long-term access to it and understanding of it.
o Ensuring that researchers' rights are not infringed.
Data management plans (DMPs)
o To implement the principles outlined in the data policy
o To plan how and when data will be generated, shared and stored within a project
o DMPs also include arrangements for the provision of supporting third-party data (e.g. met data from the UK Met Office, provision of near-real-time (NRT) data or forecasts to support field campaigns)

Data policies
To ease the exchange of knowledge within the project:
o Submission schedule and deadlines taking into account the synergy between the different groups taking part in the project
o Common format (often seen as a devilish obstacle in our Excel times…)
o Provision of a workspace (e.g. BSCW) to be used as
   o a discussion forum
   o a way to work on common documents
   o a way to validate and format preliminary data
To provide a long-term archive to the community:
o Regular backups on at least two storage media and in two locations
o Advertisement of the dataset (dataset catalogue, dataset "publication")

Data policies
To ensure that this long-term archive can be read, interpreted and used:
o Use of a worldwide metadata standard (CF Convention)
o Use of formats that allow the metadata to be attached to the data inseparably
o Documentation (metadata) should be as specific, accurate, explicit and complete as possible

Metadata
o To associate with a dataset the key terms that will allow its discovery.
o To give all the information needed to read, understand and interpret the data.
Metadata standards
o Integrate a terminology, recommendations on the metadata content and some format considerations
o The Climate and Forecast (CF) Metadata Convention was developed for NetCDF but is largely applicable to information provided with any atmospheric data, regardless of its format.
Providing (good) metadata and conforming to metadata standards is a habit that still needs to be acquired…
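As a rough illustration of what "conforming to a metadata standard" can mean in practice, the sketch below checks a dataset description for a handful of attribute names used by the CF conventions (Conventions, title, institution, source, history, standard_name, units, long_name). The function name, the dictionary layout and the sample values are assumptions made for this example, and the check is deliberately stricter and far smaller than CF itself; it is not a CF compliance checker.

```python
# Minimal sketch, not a CF compliance checker.
# The attribute names below are used by the CF conventions; the function,
# the dict layout and the sample dataset are hypothetical.

# File-level attributes examined by this sketch.
GLOBAL_ATTRS = ("Conventions", "title", "institution", "source", "history")

# Per-variable attributes examined by this sketch (an illustrative subset;
# CF does not require all three on every variable).
VARIABLE_ATTRS = ("standard_name", "units", "long_name")


def missing_metadata(global_attrs, variables):
    """Return a sorted list of messages, one per absent or empty attribute."""
    problems = []
    for name in GLOBAL_ATTRS:
        if not global_attrs.get(name):
            problems.append(f"global attribute '{name}' is missing")
    for var_name, attrs in variables.items():
        for name in VARIABLE_ATTRS:
            if not attrs.get(name):
                problems.append(
                    f"variable '{var_name}': attribute '{name}' is missing"
                )
    return sorted(problems)


# Hypothetical dataset description with a few gaps.
example_globals = {
    "Conventions": "CF-1.0",
    "title": "Example temperature profiles",
    "institution": "Example institute",
}
example_variables = {
    "air_temperature": {"standard_name": "air_temperature", "units": "K"},
}

report = missing_metadata(example_globals, example_variables)
for line in report:
    print(line)
```

A report like this could be returned to a data supplier at submission time, so that gaps are filled while the people who know the data are still available.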

Data policies
Protecting researchers' work and rights: Temporary restriction of access
o Purpose: to allow the researchers to be the first to analyse and publish their data, while at the same time ensuring some synergy between the different groups participating in the project
o During the project or for a certain period of time after its end, access is restricted to the project participants…
   o With exceptions for close collaborators or participants in associated projects
   o This retention period ranges from 1 to …10 years!
   o Password-protected system
   o Modalities of application and of access granting vary (e.g. consultation of PI, list of authorised users, etc.)
o … after which the data is released to the public domain.

Data policies
Access to restricted data – Authorised Users
o Project participants
   o Immediate availability
   o On application
o External collaborators (during retention period)
   o Must apply for access
   o Applications channelled through Project PI(s)
o Public
   o Discovery metadata immediately visible
   o Free access to the data after the retention period (sometimes, Conditions of Use continue to apply)
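The access rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the role names, the function names and the dates are hypothetical, project participants are assumed to get immediate access (some projects ask them to apply as well), and any Conditions of Use that persist after release are not modelled.

```python
from datetime import date

# Hypothetical role names used only in this sketch.
PARTICIPANT = "participant"
COLLABORATOR = "collaborator"
PUBLIC = "public"


def can_access_data(role, today, release_date, application_approved=False):
    """Decide whether a user may download the data on a given day.

    release_date is the end of the retention period. Before that date only
    participants, and collaborators whose application was approved (for
    example via the project PI), get access. From that date on the data is
    open to everyone.
    """
    if today >= release_date:
        return True
    if role == PARTICIPANT:
        return True
    if role == COLLABORATOR:
        return application_approved
    return False


def can_see_discovery_metadata(role):
    """Discovery metadata is visible to everyone from the start."""
    return True


# Hypothetical retention period ending on 1 January 2010.
release = date(2010, 1, 1)
during = date(2006, 11, 13)

print(can_access_data(PARTICIPANT, during, release))
print(can_access_data(COLLABORATOR, during, release))
print(can_access_data(COLLABORATOR, during, release, application_approved=True))
print(can_access_data(PUBLIC, during, release))
print(can_access_data(PUBLIC, release, release))
```

Keeping the release date as data rather than hard-coding it reflects the point that retention periods differ widely from one project to another.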

Data policies
Protecting researchers' work and rights: Conditions of use and publication
o Applying during the project and sometimes after it has ended
o Sometimes included in the data files, as a stamp
o Committing the user to respect rules such as
   o Restricting the use of the data to the research topic stated at the time of application
   o Not disclosing the data to other parties
   o Contacting the data provider
   o Acknowledging the data provider
   o Offering co-authorship to the data provider

Data policies
o Research facility
o National programme
o International project
o Intercontinental initiative

Model versus observation
"Nobody believes a modelling paper except the author. Everybody believes an observational paper… except the author." (Quoted by David Stevenson, University of Edinburgh, at a UTLS Ozone Science Meeting)
Is there such a clear difference between the two things?
o Is processed or derived data observation or modelling?
o Is a programme "model data"?
For the purpose of data management, model data =
o any output of model computation (e.g. simulations),
o datasets resulting from some kind of data assimilation technique,
o compilations of observations from different sources (synthesized datasets)
… which have in common that they are more likely to be superseded by newer versions, or to be superseded sooner, than observations are. They are also usually the end-product of a project, while observations are a starting point for further analyses and studies.

Model versus observation
BADC Guidelines for the Archival of Simulated Data
o Codes archived only as metadata to support model output
o Datasets peer-reviewed at regular intervals (a few years)
o Criteria to select model runs to be archived for the long term
   o Likely future existence of a community of potential users.
   o Historical, legal or scientific importance likely to persist.
   o The results will be used in an intercomparison exercise.
   o Integration of observation data in a way that adds value to the observations.
   o The results have been the basis of a publication.
   o The results have confirmed or led to some outstanding discovery.

Citation and publication
Some projects bring together the worlds of librarians and data scientists, e.g. CLADDIER, to investigate how datasets can be (better)
o versioned
o catalogued
o peer-reviewed
o referenced in papers
o published

Citation and publication [figure not reproduced]

E-grids
Networks linking several organisations with similar or complementary competences in such a way as to ensure their interoperability.
E.g. a network of data repositories, models and computers allowing the user to search and use these resources simultaneously and transparently.
Issues:
o Transfer of information (balance between redundant storage and speed of transfer)
o Authentication (security and access)
o Format conversion
o Vocabulary (metadata standards)

E-grids [figure not reproduced]

E-grids
The NERC Data Grid (NDG) Project
o Infrastructure system to enable the discovery and retrieval of data held at distributed data centres via one single portal
o Partners: BADC, BODC, PCMDI (LLNL)
o Security issues tackled through "role mapping", i.e. definition of equivalent authorisations (sparing the user the need to register with each organisation)
o A discovery metadatabase already exists, based on MOLES (Metadata Objects for Links in Environmental Science)
Further, we intend to make the connection between data held in managed archives and data held by individual research groups seamless, so that the same tools can be used to compare and manipulate data from both sources.
What will be completely new is the ability to compare and contrast data from an extensive range of (US, European, UK, NERC) datasets from within one specific context.
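The idea of "role mapping" can be pictured as a lookup table that translates an authorisation granted by a user's home organisation into the equivalent authorisation at another organisation. The sketch below is purely illustrative: the organisation names, role names and function are hypothetical and do not describe how the NDG actually implements it.

```python
# Minimal sketch of role mapping between organisations.
# All organisation and role names are hypothetical.

# (home organisation, role there) -> {target organisation: equivalent role}
ROLE_MAP = {
    ("centre_a", "project_member"): {"centre_b": "registered_researcher"},
    ("centre_b", "registered_researcher"): {"centre_a": "project_member"},
}


def map_role(home_org, home_role, target_org):
    """Return the equivalent role at target_org, or None if none is agreed.

    A user is never granted anything that the two organisations have not
    explicitly declared equivalent.
    """
    if home_org == target_org:
        return home_role
    return ROLE_MAP.get((home_org, home_role), {}).get(target_org)


print(map_role("centre_a", "project_member", "centre_b"))
print(map_role("centre_a", "guest", "centre_b"))
```

Returning None for unmapped roles keeps the default restrictive: a missing entry means no access rather than an implicit grant.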

E-grids [figure not reproduced]

Speaking the same language
Standard terminologies
o Sets of terms of reference with, sometimes, unique identifiers (key values), definitions and version numbers
o System of relationships between terms (synonyms, inclusion, related terms)
o Underpin catalogues and search engines
o Ex.: GCMD, CF, SeaDataNet
MOLES (Metadata Objects for Links in Environmental Science)
o The metadata scheme underpinning the NDG discovery tool (based on a set of XML records) and the next BADC catalogue (relational metadatabase)
o Developed in-house
o Integrates tentative mappings between GCMD, CF, SeaDataNet
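To make the notion of a standard terminology more concrete, the sketch below models terms with identifiers, definitions, version numbers, synonyms, inclusion ("broader") links and mappings to other vocabularies. The class names, identifiers, vocabulary names and sample entries are all invented for illustration; they are not taken from GCMD, CF, SeaDataNet or MOLES.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One entry of a controlled vocabulary (hypothetical structure)."""
    identifier: str
    label: str
    definition: str = ""
    version: int = 1
    synonyms: set = field(default_factory=set)
    broader: set = field(default_factory=set)      # identifiers of including terms
    related: set = field(default_factory=set)      # identifiers of related terms
    mappings: dict = field(default_factory=dict)   # other vocabulary -> label there


class Vocabulary:
    def __init__(self):
        self._terms = {}

    def add(self, term):
        self._terms[term.identifier] = term

    def find(self, text):
        """Look a term up by label or synonym, ignoring case."""
        needle = text.strip().lower()
        for term in self._terms.values():
            names = {term.label.lower()} | {s.lower() for s in term.synonyms}
            if needle in names:
                return term
        return None

    def ancestors(self, identifier):
        """Return identifiers of all terms that include the given term."""
        seen = set()
        stack = list(self._terms[identifier].broader)
        while stack:
            current = stack.pop()
            if current in seen or current not in self._terms:
                continue
            seen.add(current)
            stack.extend(self._terms[current].broader)
        return seen


# Hypothetical entries.
vocab = Vocabulary()
vocab.add(Term("T001", "temperature", definition="Generic temperature"))
vocab.add(Term(
    "T002", "air temperature",
    synonyms={"atmospheric temperature"},
    broader={"T001"},
    mappings={"vocab_b": "AIR TEMP"},  # invented label in an invented vocabulary
))

hit = vocab.find("Atmospheric Temperature")
print(hit.identifier, hit.mappings, vocab.ancestors(hit.identifier))
```

A search engine built on such a structure can match a query against synonyms and then widen or narrow it along the inclusion links, which is what allows terminologies to underpin catalogues.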

Lessons learnt and traps to avoid
o Envisage the data policy at an early stage of a project proposal, taking into account already running projects that may become associated or involved.
o Design and develop an open standard terminology with direct input from the researchers and carefully thought-out relationships between terms.
o Do not try to build a terminology that covers everything; focus on the vocabulary needed in your community.
o Resist the temptation to replace tools (software, applications, conceptual tools) every time a shiny new one is launched on the market.