VO Sandpit, November 2009
Tracking the impact of data – how?
Sarah Callaghan sarah.callaghan@stfc.ac.uk @sorcha_ni
1st Altmetrics conference, London, 25-26 September 2014
Who are we and why do we care about data?
The UK's Natural Environment Research Council (NERC) funds six data centres which between them have responsibility for the long-term management of NERC's environmental data holdings. We deal with a variety of environmental measurements, along with the results of model simulations, in:
Atmospheric science
Earth sciences
Earth observation
Marine science
Polar science
Terrestrial & freshwater science, hydrology and bioinformatics
Space weather
OpenAIRE Portal
www.openaire.eu
Develop an Open Access, participatory infrastructure for scientific information that includes:
Publications
Datasets
Projects
Interlinking
Data, Reproducibility and Science
Science should be reproducible – other people doing the same experiments in the same way should get the same results.
Observational data is not reproducible (unless you have a time machine!)
Therefore we need to have access to the data to confirm the science is valid!
http://www.flickr.com/photos/31333486@N00/1893012324/sizes/o/in/photostream/
It used to be "easy"…
Suber cells and mimosa leaves. Robert Hooke, Micrographia, 1665
The Scientific Papers of William Parsons, Third Earl of Rosse, 1800-1867
…but datasets have gotten so big that it's no longer useful to publish them in hard copy
Hard copy of the Human Genome at the Wellcome Collection
Creating a dataset is hard work!
"Piled Higher and Deeper" by Jorge Cham www.phdcomics.com
Managing and archiving data so that it's understandable by other researchers is difficult and time-consuming too. We want to reward researchers for putting that effort in!
Most people have an idea of what a publication is
Some examples of data (just from the Earth Sciences)
1. Time series, some still being updated, e.g. meteorological measurements
2. Large 4D synthesised datasets, e.g. climate, oceanographic, hydrological and numerical weather prediction model data generated on a supercomputer
3. 2D scans, e.g. satellite data, weather radar data
4. 2D snapshots, e.g. cloud camera
5. Traces through a changing medium, e.g. radiosonde launches, aircraft flights, ocean salinity and temperature
6. Datasets consisting of data from multiple instruments as part of the same measurement campaign
7. Physical samples, e.g. fossils
What is a Dataset?
DataCite's definition (http://www.datacite.org/sites/default/files/Business_Models_Principles_v1.0.pdf):
Dataset: "Recorded information, regardless of the form or medium on which it may be recorded, including writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow, charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data." (from the U.S. National Institutes of Health (NIH) Grants Policy Statement, via DataCite's Best Practice Guide for Data Citation)
In my opinion a dataset is something that is:
The result of a defined process
Scientifically meaningful
Well-defined (i.e. a clear definition of what is in the dataset and what isn't)
What metrics do we use for our data?
Data centre metrics – produced 15th July 2014

Metric: Number of discovery dataset records in the DCS
Breakdown: Quarterly
CEDA numbers: NEODC: 26; BADC: 242; UKSSDC: 11
Notes: Compliance with NERC data management policy. Reflects how many data sets NERC has. The number of dataset discovery records visible from the NERC data discovery service.

Metric: Web site visits
Breakdown: Quarterly
CEDA numbers: BADC: 61,600; NEODC: 10,200
Notes: Active use and visibility of the data centre. Site visits from standard web log analysis systems, such as webaliser. Sensible web crawler filters should have been applied.

Metric: Web site page views
Breakdown: Quarterly
CEDA numbers: BADC: 219,900; NEODC: 25,800
Notes: See web visits notes.

Metric: Queries closed this period
Breakdown: Quarterly
CEDA numbers: 362 helpdesk queries; 838 dataset applications
Notes: Active use and visibility of the data centre. Queries marked as resolved within the quarter. A query is a request for information, a problem or an ad hoc data request.

Metric: Queries received in period
Breakdown: Quarterly
CEDA numbers: 388 helpdesk queries; 860 dataset applications
Notes: Active use and visibility of the data centre. See closed query notes.
Data centre metrics – produced 15th July 2014

Metric: Percent queries dealt with in 3 working days
Breakdown: Quarterly
CEDA numbers: 84.06% (11.57% resolved after 3 days); 87.67% (10.23% resolved after 3 days)
Notes: Responsiveness. See closed query notes.

Metric: Queries receiving initial response within 1 working day
CEDA numbers: Helpdesk: 93.57%; Dataset applications: 97.91%
Notes: Responsiveness. See closed query notes.

Metric: Identifiable users actively downloading
Breakdown: None
CEDA numbers: Over year to date: BADC: 4065; NEODC: 362
Notes: Use and visibility of the data centre. An estimate of the number of users using data access services over the year.

Metric: Number of metadata records in data centre web site
Breakdown: None
CEDA numbers: BADC: 240; NEODC: 33
Notes: INSPIRE compliance. Reflects how many data sets NERC has.

Metric: Number of datasets available to view via the data centre web site
Breakdown: None
CEDA numbers: (Metric in development)
Notes: INSPIRE compliance. Usable services.

Metric: Number of datasets available to download via the data centre web site
Breakdown: None
CEDA numbers: (Metric in development)
Notes: INSPIRE compliance. Usable services.
Data centre metrics – produced 15th July 2014

Metric: NERC funded Data centre staff (FTE)
Breakdown: None
CEDA numbers: 14 (estimate for FY 14/15)
Notes: Data management costs. Efficiency. Number of full time equivalent posts employed to perform data centre functions.

Metric: Direct costs of Data Stewardship in data centre
Breakdown: None
CEDA numbers: (reportable at end of financial year)
Notes: Data management costs. Efficiency. Cost to NERC.

Metric: Capital Expenditure directly related to Data Stewardship at data centre
Breakdown: None
CEDA numbers: (reportable at end of financial year)
Notes: Data management costs. Efficiency.

Metric: Direct Receipts from Data Licenses and Sales
Breakdown: None
CEDA numbers: £0 (CEDA does not charge for data)
Notes: Commercial value of data products and services.

Metric: Number of projects with Outline Data Management Plans
Breakdown: None
CEDA numbers: (Metric in development)
Notes: Means of tracking projects' adoption of good DM practice. Outline DMP is at proposal stage.

Metric: Number of projects with Full Data Management Plans
Breakdown: None
CEDA numbers: (Metric in development)
Notes: Means of tracking projects' adoption of good DM practice. Full DMP is at funded stage.

Metric: Users by area
CEDA numbers: UK: 2534 (61%); Europe: 494 (12%); Rest of the world: 1024 (25%); Unknown: 79 (2%)
Notes: Active use. Visibility of the data centre internationally. Percentage of user base in terms of geographical spread.

Metric: Users by institute type
CEDA numbers: University: 2934 (71%); Government: 694 (17%); NERC: 160 (4%); Other: 277 (7%); Commercial: 42 (1%); School: 35 (1%)
Notes: Active use. Visibility of the data centre sectorially. Percentage of user base in terms of the users' host institute type.
After the data is downloaded, what happens then?
Short answer: we don't know!
Unless the data user comes back to us to tell us.
Or we stumble across a paper which:
cites us
or mentions us in a way that we can find
and tells us what dataset the authors used.
This is why we're working with other groups (like CODATA, Force11, RDA, DataCite, Thomson Reuters, …) to promote data citation.
The Noble Eight-Fold Path to Citing Data
1. Importance
2. Credit and attribution
3. Evidence
4. Unique identification
5. Access
6. Persistence
7. Specificity and verifiability
8. Interoperability and flexibility
Principles are supplemented with a glossary, references and examples: http://force11.org/datacitation
How we (NERC) cite data
We use digital object identifiers (DOIs) as part of our dataset citation because:
They are actionable, interoperable, persistent links for (digital) objects
Scientists are already used to citing papers using DOIs (and they trust them)
Academic journal publishers are starting to require that datasets be cited in a stable way, i.e. using DOIs
We have a good working relationship with the British Library and DataCite
NERC's guidance on citing data and assigning DOIs can be found at: http://www.nerc.ac.uk/research/sites/data/doi.asp
Dataset catalogue page (and DOI landing page)
Dataset citation
Clickable link to dataset in the archive
Another example of a cited dataset
http://www.charme.org.uk/
Data metrics – the state of the art!
Data citation isn't common practice (unfortunately)
Data citation counts don't exist yet
To count how often BADC data is used we have to:
1. Search Google Scholar for "BADC", "British Atmospheric Data Centre"
2. Scan the results and weed out false positives
3. Read the papers to figure out what datasets the authors are talking about (if we can)
4. Count the mentions and citations (if any)
We're working with DataCite and Thomson Reuters to get data citation counts.
http://www.lol-cat.org/little-lovely-lolcat-and-big-work/
Altmetrics and social media for data?
Mainly focussing on citation as a first step, as it's the most commonly accepted by researchers.
We have a social media presence: @CEDAnews – mainly used for announcements about service availability
We definitely want ways of showing our funders that we provide a good service to our users and the research community.
And we want to be able to tell our depositors what impact their data has had!
RDA Bibliometrics for Data WG – preliminary survey results
Launched 3rd September
As of 17th September: 63 responses, 100% completion
Survey link still live: https://www.surveymonkey.com/s/RDA_bibliometrics_data
Responses by discipline:
Science: 3
Earth sciences: 16
Physics: 4
Scientometrics and bibliometrics: 4
Engineering: 2
Chemistry: 1
Biology (inc. zoology): 2
STEM: 1
Medicine & biomedical research: 8
Energy: 1
Admin for research: 2
Computer science: 4
Social science, policy and economics: 4
Librarian and digital curation: 11
Current use
Future and missing
In the future, what would you like to use to evaluate the impact of data?
Most popular suggestions:
Data citations
Actual use in professional practice
Download statistics
Mentions in social media
DOIs/PIDs
Altmetrics
Well-regarded indicators
Also pleas for:
Easy to use and set up
Radically different tools
Whatever tool can provide reliable information
Best estimate of societal benefit in $$ terms
What is currently missing and/or needs to be created for bibliometrics for data to become widely used?
Most popular suggestions:
Culture change!
Principles and standards for consistent practice (and enforcement of these)
Use of PIDs
Mature tools for data citation, publishing, discovery and impact analysis
Openness in papers and patents
Also:
Research on what current metrics actually measure
Infrastructure
Free apps
Please help!
Survey link still live! https://www.surveymonkey.com/s/RDA_bibliometrics_data
Please pass on the link to anyone who might be interested and encourage others to fill in the survey!
Share your experience with altmetrics – join the RDA WG on Publishing Data Bibliometrics: https://rd-alliance.org/group/rdawds-publishing-data-bibliometrics-wg.html
Thank you!
Sarah Callaghan sarah.callaghan@stfc.ac.uk @sorcha_ni
http://weknowmemes.com/generator/meme/379914/
Work funded by the European Commission as part of the project OpenAIREplus (FP7-INFRA-2011-2, Grant Agreement no. 283595)