British Library Datasets Programme JISC RSP Winter School February 2011 Max Wilkinson.

Presentation transcript:

British Library Datasets Programme JISC RSP Winter School February 2011 Max Wilkinson

2 Today’s Talk 1. The British Library 2. Data in scholarly communication 3. The problem with data 4. The Datasets Programme: Vision, Strategy, Activity (DataCite) 5. Other Projects

3 The British Library Exists for everyone who wants to do research – for academic, personal, and commercial purposes. Covers all subject areas – sciences, technology, medicine, arts, humanities, social sciences… Receives a copy of every item published in the UK. Holds over 150 million items, with 3 million items added each year. Used by over 16,000 people each day (on site and online).

4 The British Library: some facts and figures. Helping people advance knowledge to enrich lives. GIA Funding 08/09: £94.8m operational, £12m capital. Other funding secured 07/08: c.£33m. National library of the UK. Serves researchers, business, libraries, education & the general public. Collection includes over 2m sound recordings, 5m reports, theses and conference papers, the world’s largest patents collection (c.50m). 3 main sites in London and Yorkshire. Circa 2,000 staff. Business and IP Centre: Providing inspiration, and enabling protection of creative capital and business development. Generates value to the UK economy each year of 4.4 times public funding. Collection fills over 600km of shelving and grows at 11km per year. 70 Tb of digital material through voluntary deposit. British Library Act 1972: National centre for reference, study, bibliographical and other information services, in relation both to scientific and technological matters, and to the humanities. Science and Innovation Investment Framework, H.M. Treasury (2004): UK research base must have ready and efficient access to information of all kinds – such as experimental data sets, journals, theses, conference proceedings and patents. This is the life blood of research and innovation. The largest document supply service in the world. Secure e-delivery and ‘just in time’ digitisation enables desktop delivery within 2 hours.

5 Who do we serve? The Researcher - We provide access to research level materials to all sectors including academia, industry, government, charities and NGOs. Business - The British Library also has a critical role supporting businesses of all sizes, from individual entrepreneurs through to major organisations. The Learner - We have an important role to play in supporting education from primary schools to developing future researchers of any age. The Library Community - We play a key role in supporting the wider UK Library Community and information network. The General Public - The services we offer include exhibitions and events, tours and web services which digitally showcase our collection.

6 Modern science relies on good data

7 Scholarly record Discovery Access Record Permanence Citation Metadata Exposure Trust Fabrics Copyright Scholarly record

8 The Foundation for Research Data is a crucial component of the scholarly record. Re-acquisition may be impossible Datasets are essential to the British Library’s mission to advance the World’s knowledge.

9 Current Situation No effective way to link between datasets and articles; No widely used method to identify datasets; No widely used method to cite datasets.

10 As a result… Datasets are: Difficult to discover Difficult to access In danger of being lost

11 Difficult to Discover. Good luck finding the data! “Source: Committee on Climate Change”

12 Data are diverse in the Digital Landscape Seismic measurements taken by a geologist. An audio archive of birdsong created by an ornithologist. Genetic data collected by a medical researcher. A survey of public opinions collected by a sociologist.

13 Re-join the gap… (No) effective way to link between articles and datasets (No) widely used method to identify datasets (No) widely used method to cite datasets Articles Underlying data

14 Datasets – first class citizens? Data is difficult to manage after project funding ceases Informal networks provide the primary means of sharing Only 21% use a national or international facility Datasets are not included in impact analysis Good luck finding it or getting permission to use it (your discipline may vary) Source: UKRDS Study: The Data Imperative. Managing the UK’s research data for future use (Feb 2009)

15 Scholarly record Discovery Access Record Permanence Citation Metadata Exposure Trust Fabrics Copyright Scholarly record

16 Research training based on scholarly communication Discovery Access Record Permanence Citation Metadata Exposure Trust Fabrics Copyright Scholarly record Rarely includes data

17 Scholarly communication requires intellectual exchanges Discovery Access Record Permanence Citation Metadata Exposure Trust Fabrics Copyright Scholarly record No such data fabric

18 Scholarly discourse requires a record and provenance Discovery Access Record Permanence Citation Metadata Exposure Trust Fabrics Copyright Scholarly record Almost non-existent for data

19 The Datasets Programme We envision a future where researchers can: Discover, access, reuse, and reference datasets. Track the impact of the data that they generate and receive appropriate credit. Our approach is to: Provide a focus for the community to establish needs, requirements and agreement. Explore novel technology and creative solutions.

20 Two key concepts INCENTIVE SUSTAINABILITY

21 Projects and activities Follow us on twitter

22 A Key Component for Many Goals? Cite, Reuse, Verify, Track Impact, Access, Find, Make Visible: Persistent Identification

23 Citation using Digital Object Identifiers (DOIs) Dataset G.Yancheva, N. R. Nowaczyk et al (2007) Rock magnetism and X-ray flourescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA Article Citation G. Yancheva, N. R. Nowaczyk et al (2007) Influence of the intertropical convergence zone on the East Asian monsoon Nature 445, How to reference Published Article (Abstract or full text) The DOI system offers an easy, internet actionable way to connect the article with the underlying publication But a complete scholarly record would also link to the evidential datasets and their location, e.g. PANGAEA doi: /nature05431

24 doi: /nature05431 leads to a landing page

25 Digital Object Identifiers (DOIs) offer a solution Most widely used identifier for scientific articles Researchers, authors, publishers know how to use them Put datasets on the same playing field as articles Connecting an Article with the Underlying Data Dataset Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi: /PANGAEA URIs are commonly used but can decay (e.g. Wren JD: URL decay in MEDLINE - a 4-year follow-up study. Bioinformatics. 2008, Jun 1;24(11):1381-5).

26 doi: /PANGAEA

27 Dataset citation using Digital Object Identifiers (DOIs) Dataset G.Yancheva, N. R. Nowaczyk et al (2007) Rock magnetism and X-ray flourescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA doi: /PANGAEA Article G. Yancheva, N. R. Nowaczyk et al (2007) Influence of the intertropical convergence zone on the East Asian monsoon Nature 445, doi: /nature05431 Data Citation Scholarly record is complete
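The citation pattern shown on this slide (creators, year, title, publisher, DOI) can be sketched in Python. This is only an illustration of that pattern, not a DataCite tool: the names DatasetRecord and format_citation, and every value in the example record, are hypothetical placeholders rather than anything taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal fields needed for a dataset citation (hypothetical structure)."""
    creators: str
    year: int
    title: str
    publisher: str
    doi: str


def format_citation(record: DatasetRecord) -> str:
    """Render a citation as: Creators (Year) Title, Publisher doi:DOI."""
    return (
        f"{record.creators} ({record.year}) {record.title}, "
        f"{record.publisher} doi:{record.doi}"
    )


# Hypothetical record, for illustration only.
example = DatasetRecord(
    creators="A. Researcher et al",
    year=2011,
    title="Example sediment core measurements",
    publisher="Example Data Centre",
    doi="10.1234/example-dataset.v1",
)
print(format_citation(example))
```

Keeping the DOI as its own field means the same record could feed both a printed citation and a clickable link.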

28 Projects – DataCite DataCite is an international consortium which aims to: Establish easier access to scientific research data on the Internet Increase acceptance of research data as legitimate, citable contributions to the scientific record Support data archiving that will permit results to be verified and re-purposed for future study.

29 DataCite Support researchers by enabling them to locate, identify, and cite research datasets with confidence. Support data centres by providing persistent identifiers for datasets, workflows and standards for data publication. Support publishers by enabling research articles to be linked to the underlying data. DataCite : Data Centres :: CrossRef : Publishers

30 Digital Object Identifier (DOI) doi: / Prefix Suffix

31 DOI prefix doi: / Prefix Suffix The British Library provides data centres with a unique prefix for DataCite DOIs. For example, Archaeology Data Service uses

32 DOI suffix doi: / Prefix Suffix Suffix generated by the data centre. Guidelines for DOI syntax are being developed
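The prefix/suffix structure described on the slides above can be sketched in Python. This is a minimal illustration, not DataCite's implementation: the function name split_doi and the DOI 10.1234/example-dataset.v1 are hypothetical placeholders. It assumes only that a DOI is a prefix beginning with "10." and a suffix, separated by the first slash.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI string into (prefix, suffix) at the first slash."""
    # Drop an optional leading "doi:" label and surrounding whitespace.
    text = doi.strip()
    if text.lower().startswith("doi:"):
        text = text[4:].strip()
    # The prefix identifies the registrant (e.g. a data centre);
    # the suffix is chosen by that registrant and may itself contain slashes.
    prefix, separator, suffix = text.partition("/")
    if not separator or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return prefix, suffix


# Hypothetical DOI, for illustration only.
prefix, suffix = split_doi("doi:10.1234/example-dataset.v1")
print(prefix)  # 10.1234
print(suffix)  # example-dataset.v1
```

Splitting at the first slash only is deliberate: the suffix is left to the data centre, so the sketch makes no assumption about its internal syntax.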

33 Resolving a DOI doi: / Prefix Suffix Resolving the DOI:
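Resolution can be sketched as turning a DOI into a web address that a resolver redirects to the landing page. The resolver address on the original slide did not survive in this transcript, so the sketch below assumes the public DOI proxy at https://doi.org/; the function name doi_to_url and the DOI used are hypothetical placeholders.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy; the slide's own resolver URL is not preserved.
RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolvable URL from a DOI string."""
    # Drop an optional leading "doi:" label and surrounding whitespace.
    text = doi.strip()
    if text.lower().startswith("doi:"):
        text = text[4:].strip()
    # Percent-encode characters that are unsafe in a URL path, keeping "/" intact.
    return RESOLVER + quote(text, safe="/")


# Hypothetical DOI, for illustration only.
print(doi_to_url("doi:10.1234/example-dataset.v1"))
# https://doi.org/10.1234/example-dataset.v1
```

Because the citation carries only the DOI, the data centre can move the dataset and update the registered location without breaking published references.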

34 DOIs resolve to an open landing page

35 DataCite Service  Built a service for data centres to mint DOIs for datasets and store associated metadata (  British Library is trialling the service with several UK data centres, including:

36 Projects and activities

37 For more information on the BL Datasets Programme Max Wilkinson: Programme Manager; Datasets WebSite Follow us on twitter

38 Follow On slides

39 SageCite: Data citation in bioinformatics workflow Sage Bionetworks data capture and analysis workflow (Taverna: myExperiment) Data Citation service integration points and recommendations Benefits analysis SageCite: Integration of data citation services into multi-contributor bioinformatics workflow. Establishing data attribution and credit mechanisms. ► INCENTIVE Sage Bionetworks: Aggregating datasets from contributors to create massive coherent datasets that can be used for systems level analysis of disease

40 Dryad UK: Repository sustainability Expand Publisher base Seamless integration into publisher workflow Sustainability models for datasets supplementary to publication Dryad UK: Define a business case and pilot service integrating DataCite DOIs and dataset archiving into publisher workflows ► SUSTAINABILITY Leveraging the Dryad Consortium, which is addressing the acquisition and storage of long tail supplementary data

41 Discovery Science Technology & Medicine Focussing on discovery services in the library’s integration engine Based on commissioned consultations  Data resources  Selection guidelines  Making available through library search facilities

42 Dataset Discovery Project

43 Access SSCR Focussing on streamlining access to established and high value data collections: Resource guides for datasets; Streamlining access to established data centres; Raising profiles of high impact datasets, e.g. Olympics and 2011 census. Also piloting dataset surfacing through the Library's search facilities

44 Projects – British Atmospheric Data Centre British Atmospheric Data Centre (BADC): Natural Environment Research Council's designated data centre for the Atmospheric Sciences. Assists researchers to locate, access and interpret atmospheric data and ensures the long-term integrity of this data. A joint project is underway to improve the citability of BADC datasets. Publications based on the data will underlie the 2013 Intergovernmental Panel on Climate Change (IPCC) Report.

45 Challenges to Explore Helping people to … Developing and sustaining… Providing a…

46 A combination of eight social and technical factors – ideally there would be: Personal attribution and credit for data publication; An established mechanism for citation of datasets; A generic minimum metadata standard for datasets; A tool to permit the easy creation of well-structured metadata; A standard mechanism for packaging data files and their metadata; Appropriate repositories to archive and publish research datasets; Reciprocal citation links between datasets and research articles; Mechanisms for quality control of data publications.