VO Sandpit, November 2009 Tracking the impact of data – how? Sarah 1st Altmetrics conference, London, September 2014

Presentation transcript:

Tracking the impact of data – how? Sarah 1st Altmetrics conference, London, September 2014

Who are we and why do we care about data?
The UK’s Natural Environment Research Council (NERC) funds six data centres which between them have responsibility for the long-term management of NERC's environmental data holdings.
We deal with a variety of environmental measurements, along with the results of model simulations in:
- Atmospheric science
- Earth sciences
- Earth observation
- Marine Science
- Polar Science
- Terrestrial & freshwater science, Hydrology and Bioinformatics
- Space Weather

OpenAIRE Portal
Develop an Open Access, participatory infrastructure for scientific information that includes:
- Publications
- Datasets
- Projects
- Interlinking

Data, Reproducibility and Science
Science should be reproducible – other people doing the same experiments in the same way should get the same results.
Observational data is not reproducible (unless you have a time machine!)
Therefore we need to have access to the data to confirm the science is valid!

It used to be “easy”…
Suber cells and mimosa leaves. Robert Hooke, Micrographia, 1665
The Scientific Papers of William Parsons, Third Earl of Rosse
…but datasets have gotten so big, it’s not useful to publish them in hard copy anymore

Hard copy of the Human Genome at the Wellcome Collection

Creating a dataset is hard work!
"Piled Higher and Deeper" by Jorge Cham
Managing and archiving data so that it’s understandable by other researchers is difficult and time consuming too.
We want to reward researchers for putting that effort in!

Most people have an idea of what a publication is

Some examples of data (just from the Earth Sciences)
1. Time series, some still being updated, e.g. meteorological measurements
2. Large 4D synthesised datasets, e.g. Climate, Oceanographic, Hydrological and Numerical Weather Prediction model data generated on a supercomputer
3. 2D scans, e.g. satellite data, weather radar data
4. 2D snapshots, e.g. cloud camera
5. Traces through a changing medium, e.g. radiosonde launches, aircraft flights, ocean salinity and temperature
6. Datasets consisting of data from multiple instruments as part of the same measurement campaign
7. Physical samples, e.g. fossils

What is a Dataset?
DataCite’s definition (siness_Models_Principles_v1.0.pdf):
Dataset: "Recorded information, regardless of the form or medium on which it may be recorded including writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow, charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data." (from the U.S. National Institutes of Health (NIH) Grants Policy Statement via DataCite's Best Practice Guide for Data Citation).
In my opinion a dataset is something that is:
- The result of a defined process
- Scientifically meaningful
- Well-defined (i.e. clear definition of what is in the dataset and what isn’t)

What metrics do we use for our data?

Data centre metrics – produced 15th July 2014
Metric | Breakdown | CEDA numbers | Notes
Number of discovery dataset records in the DCS | Quarterly | NEODC 26; BADC 242; UKSSDC 11 | Compliance with NERC data management policy. Reflects how many data sets NERC has. The number of dataset discovery records visible from the NERC data discovery service.
Web site visits | Quarterly | BADC: 61,600; NEODC: 10,200 | Active use and visibility of the data centre. Site visits from standard web log analysis systems, such as Webalizer. Sensible web crawler filters should have been applied.
Web site page views | Quarterly | BADC: 219,900; NEODC: 25,800 | See web visits notes.
Queries closed this period | Quarterly | 362 helpdesk queries; 838 dataset applications | Active use and visibility of the data centre. Queries marked as resolved within the quarter. A query is a request for information, a problem or ad hoc data request.
Queries received in period | Quarterly | 388 helpdesk queries; 860 dataset applications | Active use and visibility of the data centre. See closed query notes.

Data centre metrics – produced 15th July 2014
Metric | Breakdown | CEDA numbers | Notes
Percent queries dealt with in 3 working days | Quarterly | 84.06 (11.57% resolved after 3 days) (10.23% resolved after 3 days) |
Queries receiving initial response within 1 working day | | Helpdesk %; Dataset applications % | Responsiveness. See closed query notes.
Identifiable users actively downloading | None | Over year to date: BADC: 4065; NEODC: 362 | Use and visibility of the data centre. An estimate of the number of users using data access services over the year.
Number of metadata records in data centre web site | None | BADC: 240; NEODC: 33 | INSPIRE compliance. Reflects how many data sets NERC has.
Number of datasets available to view via the data centre web site | None | (Metric in development) | INSPIRE compliance. Usable services.
Number of datasets available to download via the data centre web site | None | (Metric in development) | INSPIRE compliance. Usable services.

Metric | Breakdown | CEDA numbers | Notes
NERC funded Data centre staff (FTE) | None | 14 (estimate for FY 14/15) | Data management costs. Efficiency. Number of full time equivalent posts employed to perform data centre functions.
Direct costs of Data Stewardship in data centre | None | (reportable at end of financial year) | Data management costs. Efficiency. Cost to NERC.
Capital Expenditure directly related to Data Stewardship at data centre | None | (reportable at end of financial year) | Data management costs. Efficiency.
Direct Receipts from Data Licenses and Sales | None | £0 (CEDA does not charge for data) | Commercial value of data products and services.
Number of projects with Outline Data Management Plans | None | (Metric in development) | Means of tracking projects’ adoption of good DM practice. Outline DMP is at proposal stage.
Number of projects with Full Data Management Plans | None | (Metric in development) | Means of tracking projects’ adoption of good DM practice. Full DMP is at funded stage.
Users by area | UK | 2534 (61%) | Active use. Visibility of the data centre internationally. Percentage of user base in terms of geographical spread.
 | Europe | 494 (12%) |
 | Rest of the world | % |
 | Unknown | 79 (2%) |
Users by institute type | University | 2934 (71%) | Active use. Visibility of the data centre sectorially. Percentage of user base in terms of the users’ host institute type.
 | Government | 694 (17%) |
 | NERC | 160 (4%) |
 | Other | 277 (7%) |
 | Commercial | 42 (1%) |
 | School | 35 (1%) |
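
The percentage columns in the users table can be reproduced from the raw counts. The sketch below is only an illustration of that arithmetic, not the data centre's own reporting code: it takes the users-by-institute-type counts from the slide, and the function name `percentage_shares` is a hypothetical helper introduced here.

```python
from typing import Dict

# Counts of users by institute type, copied from the slide's table.
users_by_institute: Dict[str, int] = {
    "University": 2934,
    "Government": 694,
    "NERC": 160,
    "Other": 277,
    "Commercial": 42,
    "School": 35,
}


def percentage_shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Return each category's share of the total, rounded to a whole percent.

    Hypothetical helper for illustration only.
    """
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = percentage_shares(users_by_institute)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Because each share is rounded independently, the rounded percentages do not have to add up to exactly 100.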

After the data is downloaded, what happens then?
Short answer: We don’t know!!
Unless the data user comes back to us to tell us.
Or we stumble across a paper which:
- Cites us
- Or mentions us in a way that we can find
- And tells us which dataset the authors used.
This is why we’re working with other groups (like CODATA, Force11, RDA, DataCite, Thomson Reuters,…) to promote data citation.

The Noble Eight-Fold Path to Citing Data
1. Importance
2. Credit and attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and verifiability
8. Interoperability and flexibility
Principles are supplemented with a glossary, references and examples

How we (NERC) cite data
We use digital object identifiers (DOIs) as part of our dataset citation because:
- They are actionable, interoperable, persistent links for (digital) objects
- Scientists are already used to citing papers using DOIs (and they trust them)
- Academic journal publishers are starting to require datasets be cited in a stable way, i.e. using DOIs.
- We have a good working relationship with the British Library and DataCite
NERC’s guidance on citing data and assigning DOIs can be found at:
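
To make "actionable link" concrete, here is a minimal sketch of turning a DOI into a resolvable URL and assembling a citation string around it. It is not NERC's citation format: the creator/year/title/publisher/identifier ordering loosely follows the pattern DataCite recommends, the punctuation is an assumption, and the dataset, authors and DOI `10.1234/example-dataset` are invented placeholders.

```python
from dataclasses import dataclass

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


@dataclass
class Dataset:
    """Minimal citation metadata for a dataset (hypothetical structure)."""
    creators: str
    year: int
    title: str
    publisher: str
    doi: str


def doi_url(doi: str) -> str:
    """Turn a bare DOI into an actionable link via the DOI resolver."""
    return DOI_RESOLVER + doi.strip()


def format_citation(ds: Dataset) -> str:
    """Assemble a citation string; the punctuation here is an assumption."""
    return f"{ds.creators} ({ds.year}): {ds.title}. {ds.publisher}. {doi_url(ds.doi)}"


# Placeholder record: none of these values refer to a real dataset.
example = Dataset(
    creators="Example, A.; Sample, B.",
    year=2014,
    title="Hypothetical rainfall time series",
    publisher="Example Data Centre",
    doi="10.1234/example-dataset",
)
citation = format_citation(example)
print(citation)
```

The point of the sketch is that the identifier at the end of the citation is itself a link, so a reader can go from the reference list to the dataset's landing page in one step.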

Dataset catalogue page (and DOI landing page)
- Dataset citation
- Clickable link to Dataset in the archive

Another example of a cited dataset

Data metrics – the state of the art!
Data citation isn’t common practice (unfortunately)
Data citation counts don’t exist yet
To count how often BADC data is used we have to:
1. Search Google Scholar for “BADC”, “British Atmospheric Data Centre”
2. Scan the results and weed out false positives
3. Read the papers to figure out what datasets the authors are talking about (if we can)
4. Count the mentions and citations (if any)
We’re working with DataCite and Thomson Reuters to get data citation counts.
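
The search-and-count part of the procedure above can be sketched in code, which also shows why it is so laborious: a script can find and tally text matches, but weeding out false positives and working out which dataset was used still need a person. This is an illustrative sketch, not the process BADC actually runs; the paper texts are invented, and it works on text already in hand rather than querying Google Scholar.

```python
import re
from typing import Dict

# Search terms from the slide: the acronym (case-sensitive, whole word)
# and the full centre name (case-insensitive).
ACRONYM = re.compile(r"\bBADC\b")
FULL_NAME = re.compile(r"British\s+Atmospheric\s+Data\s+Centre", re.IGNORECASE)


def count_mentions(text: str) -> int:
    """Count occurrences of either search term in one paper's text."""
    return len(ACRONYM.findall(text)) + len(FULL_NAME.findall(text))


def candidate_papers(papers: Dict[str, str]) -> Dict[str, int]:
    """Return papers with at least one mention, mapped to their mention count.

    These are only candidates: false positives still have to be removed by hand.
    """
    counts = {title: count_mentions(text) for title, text in papers.items()}
    return {title: n for title, n in counts.items() if n > 0}


# Invented example texts, for illustration only.
papers = {
    "Paper A": "Data were obtained from the British Atmospheric Data Centre (BADC).",
    "Paper B": "We thank the BADC for access; BADC staff provided support.",
    "Paper C": "Model output was produced on a local cluster.",
}
candidates = candidate_papers(papers)
print(candidates)
```

Note that neither candidate text says which dataset was used, which is exactly the gap that formal data citation is meant to close.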

Altmetrics and social media for data?
Mainly focussing on citation as a first step, as it’s most commonly accepted by researchers.
We have a social media presence, mainly used for announcements about service availability.
We definitely want ways of showing our funders that we provide a good service to our users and the research community.
And we want to be able to tell our depositors what impact their data has had!

RDA Bibliometrics for Data WG – preliminary survey results
Launched 3rd September
As of 17th September – 63 responses, 100% completion
Survey link still live: bibliometrics_data
Respondents by subject area:
- Science 3
- Earth sciences 16
- Physics 4
- Scientometrics and bibliometrics 4
- Engineering 2
- Chemistry 1
- Biology (inc. zoology) 2
- STEM 1
- Medicine & biomedical research 8
- Energy 1
- Admin for research 2
- Computer science 4
- Social science, policy and economics 4
- Librarian and digital curation 11

Current use

Future and missing
In the future, what would you like to use to evaluate the impact of data?
Most popular suggestions:
- Data citations
- Actual use in professional practice
- Download statistics
- Mentions in social media
- DOIs/PIDs
- Altmetrics
- Well regarded indicators
Also pleas for:
- Easy to use and set up
- Radically different tools
- Whatever tool can provide reliable information
- Best estimate of societal benefit in $$ terms
What is currently missing and/or needs to be created for bibliometrics for data to become widely used?
Most popular suggestions:
- Culture change!
- Principles and standards for consistent practice (and enforcement of these)
- Use of PIDs
- Mature tools for data citation, publishing, discovery and impact analysis
- Openness in papers and patents
Also:
- Research on what current metrics actually measure
- Infrastructure
- Free apps

Please help!
Survey link still live! bliometrics_data
Please pass on the link to anyone who might be interested and encourage others to fill in the survey!
Share your experience with altmetrics – join the RDA WG on Publishing Data Bibliometrics: publishing-data-bibliometrics-wg.html
Thank you!
Sarah /379914/
Work funded by the European Commission as part of the project OpenAIREplus (FP7-INFRA , Grant Agreement no )