Managing data. Why do it?
What is “managing” data?
  Data
    – Accessible?
    – Enough storage?
    – Backed up?
  Metadata
    – Change history?
    – Link to research proposal/experimental design/analysis tools/published papers/...?
    – Will it be needed for 5, 10, 20 years/lawsuits?
I already know how to do it
  Yes, but... have you considered moving:
    – From the desktop to the (semantic!) web
    – From small-scale to continental-scale
    – From disciplinary to multi-disciplinary
    – From a private collection to a public library
    – From unfunded to funded
    – From the academic realm to popular science
In other words... data sharing
  Data sharing matters even if you do not share
  We all use shared data to some degree
  Data sharing is de rigueur in some disciplines
  Data sharing is legally required in some cases
  Data sharing is necessary for open science
  Data sharing is a minimum for collaborations
Benefits of data sharing
  Public benefits
    – New research: new results from data reuse
    – Reproducibility: access to provenance info
    – Economies of scale: fund projects once
  Private benefits
    – Prestige: promote one’s work in new ways
    – Career development: count data as research output
    – Efficiency: division of labour, streamlined research
This requires changes to how data are managed
  Data need to be discoverable
  Data need to be standardised
  Data need to be put in context
  Data need to acknowledge contributors
  Data need to include use conditions
  Data need to link to other relevant resources
  (Data need to be peer-reviewed?)
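The requirements above could be checked against a dataset's metadata record roughly as follows. This is a minimal sketch, not a real metadata standard: the `DatasetRecord` class, its field names, the `missing_requirements` helper and the example values are all hypothetical, chosen only to mirror the list on this slide. The open question about peer review is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str = ""
    keywords: List[str] = field(default_factory=list)       # discoverable
    format_standard: str = ""                               # standardised
    context: str = ""                                       # put in context
    contributors: List[str] = field(default_factory=list)   # acknowledged
    licence: str = ""                                       # use conditions
    related_links: List[str] = field(default_factory=list)  # linked resources


# One check per requirement on the slide, in the same order.
REQUIREMENTS = {
    "discoverable": lambda r: bool(r.title and r.keywords),
    "standardised": lambda r: bool(r.format_standard),
    "in context": lambda r: bool(r.context),
    "contributors acknowledged": lambda r: bool(r.contributors),
    "use conditions": lambda r: bool(r.licence),
    "linked to related resources": lambda r: bool(r.related_links),
}


def missing_requirements(record: DatasetRecord) -> List[str]:
    """Return the names of the requirements the record does not yet meet."""
    return [name for name, check in REQUIREMENTS.items() if not check(record)]


# Made-up example record: no context and no links to related resources.
example = DatasetRecord(
    title="Example site recordings",
    keywords=["acoustics", "monitoring"],
    format_standard="WAV",
    contributors=["A. Researcher"],
    licence="CC BY",
)
print(missing_requirements(example))
# ['in context', 'linked to related resources']
```

A check like this only confirms that a field is filled in, not that its content is any good; judging quality is closer to the peer-review question raised on the slide.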
Case study: environmental acoustics
  Emerging, inter-disciplinary research
  Environmental research using sound recordings of the environment
  Used to monitor and analyse the environment
  Recordings require sophisticated statistics
  No existing theory, method, tool or standard
Growing pains
  Started on a small scale
  Initially used to monitor individual sites
  Opportunistic deployment of recorders
  Ad hoc management of data
  Closed system, inaccessible data
  Serves the needs of a small group at QUT
  Competes with similar efforts elsewhere
A new opportunity
  Became part of the Australian Supersite Network
  Data part of a long-term monitoring program
  Deploy recorders at continental scale
  Interest in integrating with other data sources
  Obligation to publicise and describe data
  Requires collaboration with partner sites
  Need for storage capacity and server software
From managing data to eResearch
  Develop better tools for science
  Two workshops to define data standards
  Build a dedicated acoustic repository
  Develop a library of acoustic data tools
  Embed recordings with environmental data
  Release data under Creative Commons
  Release platform as open source
Freebies!
  New research opportunities
    – Approached by potential collaborators
    – Partnerships with community groups (research + linkage)
    – Special issue of Ecological Informatics
  Raise public profile and image
    – Newspaper articles
    – Radio interviews
    – Documentary
  Access to in-kind support
    – Funding of national deployment of equipment
    – Access to existing infrastructure and services