
British Library Datasets Programme JISC RSP Winter School February 2011 Max Wilkinson.


1 British Library Datasets Programme JISC RSP Winter School February 2011 Max Wilkinson

2 Today’s Talk: 1. The British Library 2. Data in scholarly communication 3. The problem with data 4. The Datasets Programme: Vision, Strategy, Activity (DataCite) 5. Other Projects

3 The British Library. Exists for everyone who wants to do research – for academic, personal, and commercial purposes. Covers all subject areas – sciences, technology, medicine, arts, humanities, social sciences… Receives a copy of every item published in the UK. Holds over 150 million items, with 3 million items added each year. Used by over 16,000 people each day (on site and online).

4 The British Library: some facts and figures. Helping people advance knowledge to enrich lives. GIA Funding 08/09: £94.8m operational, £12m capital. Other funding secured 07/08: c.£33m. National library of the UK. Serves researchers, business, libraries, education & the general public. Collection includes over 2m sound recordings, 5m reports, theses and conference papers, and the world’s largest patents collection (c.50m). 3 main sites in London and Yorkshire. Circa 2,000 staff. Business and IP Centre: providing inspiration, and enabling protection of creative capital and business development. Generates value to the UK economy each year of 4.4 times public funding. Collection fills over 600km of shelving and grows at 11km per year. 70 Tb of digital material through voluntary deposit. British Library Act 1972: National centre for reference, study, bibliographical and other information services, in relation both to scientific and technological matters, and to the humanities. Science and Innovation Investment Framework 2004-2014, H.M. Treasury (2004): UK research base must have ready and efficient access to information of all kinds – such as experimental data sets, journals, theses, conference proceedings and patents. This is the life blood of research and innovation. The largest document supply service in the world. Secure e-delivery and ‘just in time’ digitisation enable desktop delivery within 2 hours.

5 Who do we serve? The Researcher – We provide access to research level materials to all sectors including academia, industry, government, charities and NGOs. Business – The British Library also has a critical role supporting businesses of all sizes, from individual entrepreneurs through to major organisations. The Learner – We have an important role to play in supporting education from primary schools to developing future researchers of any age. The Library Community – We play a key role in supporting the wider UK Library Community and information network. The General Public – The services we offer include exhibitions and events, tours and web services which digitally showcase our collection.

6 Modern science relies on good data

7 Scholarly record [diagram labels: Discovery Access Record Permanence Citation Metadata Exposure Trust Fabrics Copyright]

8 The Foundation for Research. Data is a crucial component of the scholarly record. Re-acquisition may be impossible. Datasets are essential to the British Library’s mission to advance the world’s knowledge.

9 Current Situation: No effective way to link between datasets and articles; no widely used method to identify datasets; no widely used method to cite datasets.

10 As a result… Datasets are: difficult to discover; difficult to access; in danger of being lost.

11 Difficult to Discover. Good luck finding the data! “Source: Committee on Climate Change”

12 Data are diverse in the Digital Landscape: Seismic measurements taken by a geologist. An audio archive of birdsong created by an ornithologist. Genetic data collected by a medical researcher. A survey of public opinions collected by a sociologist.

13 Re-join the gap between articles and underlying data: (No) effective way to link between articles and datasets; (No) widely used method to identify datasets; (No) widely used method to cite datasets.

14 Datasets – first class citizens? Data is difficult to manage after project funding ceases. Informal networks provide the primary means of sharing. Only 21% use a national or international facility. Datasets are not included in impact analysis. Good luck finding it or getting permission to use it (your discipline may vary). Source: UKRDS Study: The Data Imperative. Managing the UK’s research data for future use (Feb 2009)

15 Scholarly record [diagram labels: Discovery Access Record Permanence Citation Metadata Exposure Trust Fabrics Copyright]

16 Research training based on scholarly communication: rarely includes data [diagram labels: Discovery Access Record Permanence Citation Metadata Exposure Trust Fabrics Copyright Scholarly record]

17 Scholarly communication requires intellectual exchanges: no such data fabric [diagram labels: Discovery Access Record Permanence Citation Metadata Exposure Trust Fabrics Copyright Scholarly record]

18 Scholarly discourse requires a record and provenance: almost non-existent for data [diagram labels: Discovery Access Record Permanence Citation Metadata Exposure Trust Fabrics Copyright Scholarly record]

19 The Datasets Programme. We envision a future where researchers can: Discover, access, reuse, and reference datasets. Track the impact of the data that they generate and receive appropriate credit. Our approach is to: Provide a focus for the community to establish needs, requirements and agreement. Explore novel technology and creative solutions.

20 Two key concepts: INCENTIVE, SUSTAINABILITY

21 Projects and activities: www.bl.uk/datasets. Follow us on Twitter: @datasetsBL

22 A Key Component for Many Goals? Persistent Identification: Cite, Reuse, Verify, Track Impact, Access, Find, Make Visible

23 Citation using Digital Object Identifiers (DOIs). Dataset: G. Yancheva, N. R. Nowaczyk et al (2007) Rock magnetism and X-ray flourescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA. Article citation: G. Yancheva, N. R. Nowaczyk et al (2007) Influence of the intertropical convergence zone on the East Asian monsoon, Nature 445, 74-77, doi:10.1038/nature05431. How to reference the published article (abstract or full text): the DOI system offers an easy, internet-actionable way to connect the citation with the underlying publication. But a complete scholarly record would also link to the evidential datasets and their location, e.g. PANGAEA.

24 doi:10.1038/nature05431 leads to a landing page

25 Digital Object Identifiers (DOIs) offer a solution. Most widely used identifier for scientific articles. Researchers, authors and publishers know how to use them. Put datasets on the same playing field as articles. Connecting an Article with the Underlying Data: Dataset: Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840. URIs are commonly used but can decay (e.g. Wren JD: URL decay in MEDLINE – a 4-year follow-up study. Bioinformatics. 2008, Jun 1;24(11):1381-5).

26 doi:10.1594/PANGAEA.587840

27 Dataset citation using Digital Object Identifiers (DOIs). Dataset: G. Yancheva, N. R. Nowaczyk et al (2007) Rock magnetism and X-ray flourescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA, doi:10.1594/PANGAEA.587840. Article: G. Yancheva, N. R. Nowaczyk et al (2007) Influence of the intertropical convergence zone on the East Asian monsoon, Nature 445, 74-77, doi:10.1038/nature05431. Data citation: the scholarly record is complete.
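The paired citations on this slide can be sketched as a small data structure, which makes it clear that the dataset and the article each carry their own DOI. This is an illustrative sketch only: the `Record` class, the `cite` function and the exact citation layout are assumptions made for the example, not a format prescribed by the talk or by DataCite.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One citable item; field names are hypothetical, chosen for this sketch."""
    authors: str
    year: int
    title: str
    source: str  # data centre (for a dataset) or journal reference (for an article)
    doi: str


def cite(record: Record) -> str:
    """Render a citation string that ends with the item's DOI."""
    return (f"{record.authors} ({record.year}) {record.title}, "
            f"{record.source}. doi:{record.doi}")


# The two records from the slide: the dataset and the article built on it.
dataset = Record(
    authors="G. Yancheva, N. R. Nowaczyk et al",
    year=2007,
    title=("Rock magnetism and X-ray flourescence spectrometry analyses on "
           "sediment cores of the Lake Huguang Maar, Southeast China"),
    source="PANGAEA",
    doi="10.1594/PANGAEA.587840",
)
article = Record(
    authors="G. Yancheva, N. R. Nowaczyk et al",
    year=2007,
    title=("Influence of the intertropical convergence zone on the "
           "East Asian monsoon"),
    source="Nature 445, 74-77",
    doi="10.1038/nature05431",
)

for record in (dataset, article):
    print(cite(record))
```

Keeping the DOI as a separate field, rather than embedding it in free text, is what lets the article and its underlying data be linked to each other.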

28 Projects – DataCite. DataCite is an international consortium which aims to: Establish easier access to scientific research data on the Internet. Increase acceptance of research data as legitimate, citable contributions to the scientific record. Support data archiving that will permit results to be verified and re-purposed for future study.

29 DataCite. Support researchers by enabling them to locate, identify, and cite research datasets with confidence. Support data centres by providing persistent identifiers for datasets, workflows and standards for data publication. Support publishers by enabling research articles to be linked to the underlying data. DataCite : Data Centres :: CrossRef : Publishers

30 Digital Object Identifier (DOI): doi:10.4124/0003.569 (Prefix: 10.4124, Suffix: 0003.569)

31 DOI prefix: doi:10.4124/0003.569 (Prefix: 10.4124). The British Library provides data centres with a unique prefix for DataCite DOIs. For example, the Archaeology Data Service uses 10.5284.

32 DOI suffix: doi:10.4124/0003.569 (Suffix: 0003.569). Suffix generated by the data centre. Guidelines for DOI syntax are being developed.

33 Resolving a DOI: doi:10.4124/0003.569. Resolving the DOI: http://dx.doi.org/10.4124/0003.569
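The prefix/suffix split and the resolver address from the DOI slides can be sketched in a few lines of Python. This is a minimal sketch rather than anything shown in the talk: the function names `split_doi` and `resolver_url` are hypothetical, and it assumes a DOI written as on the slides, with an optional `doi:` label and the first `/` separating the prefix from the suffix.

```python
from typing import Tuple
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # resolver address as shown on the slide


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    text = doi.strip()
    if text.lower().startswith("doi:"):
        text = text[4:]
    prefix, sep, suffix = text.partition("/")
    prefix, suffix = prefix.strip(), suffix.strip()
    # DOI prefixes begin with the directory indicator "10."
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build the resolver URL for a DOI, percent-encoding the suffix."""
    prefix, suffix = split_doi(doi)
    return f"{RESOLVER}{prefix}/{quote(suffix, safe='/')}"


print(split_doi("doi:10.4124/0003.569"))
print(resolver_url("doi:10.4124/0003.569"))
```

Only the first slash is treated as the separator because the suffix is chosen by the data centre and may itself contain slashes.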

34 DOIs resolve to an open landing page

35 DataCite Service. Built a service for data centres to mint DOIs for datasets and store associated metadata (http://api.datacite.org). British Library is trialling the service with several UK data centres, including: [data centre names not captured in this transcript]

36 Projects and activities: www.bl.uk/datasets

37 For more information on the BL Datasets Programme: Max Wilkinson, Programme Manager, Datasets. Email: max.wilkinson@bl.uk. Email: datasets@bl.uk. Website: www.bl.uk/datasets. Follow us on Twitter: @datasetsBL

38 Follow-on slides

39 SageCite: Data citation in bioinformatics workflow. Sage Bionetworks data capture and analysis workflow (Taverna, myExperiment). Data citation service integration points and recommendations. Benefits analysis. SageCite: Integration of data citation services into multi-contributor bioinformatics workflow. Establishing data attribution and credit mechanisms. ► INCENTIVE. Sage Bionetworks: Aggregating datasets from contributors to create massive coherent datasets that can be used for systems-level analysis of disease.

40 Dryad UK: Repository sustainability. Expand publisher base. Seamless integration into publisher workflow. Sustainability models for datasets supplementary to publication. Dryad UK: Define a business case and pilot service integrating DataCite DOIs and dataset archiving into publisher workflows. ► SUSTAINABILITY. Leveraging the Dryad Consortium, which is addressing the acquisition and storage of long-tail supplementary data.

41 Discovery: Science, Technology & Medicine. Focussing on discovery services in the Library’s integration engine. Based on commissioned consultations: data resources; selection guidelines; making available through Library search facilities.

42 Dataset Discovery Project

43 Access: SSCR. Focussing on streamlining access to established and high value data collections: resource guides for datasets; streamlining access to established data centres; raising profiles of high impact datasets, e.g. 2012 Olympics and 2011 census. Also piloting dataset surfacing through the Library’s search facilities.

44 Projects – British Atmospheric Data Centre. British Atmospheric Data Centre (BADC): Natural Environment Research Council's designated data centre for the Atmospheric Sciences. Assists researchers to locate, access and interpret atmospheric data and ensures the long-term integrity of this data. A joint project is underway to improve the citability of BADC datasets. Publications based on the data will underlie the 2013 Intergovernmental Panel on Climate Change (IPCC) Report.

45 Challenges to Explore: Helping people to … Developing and sustaining… Providing a…

46 A combination of eight social and technical factors – ideally there would be: Personal attribution and credit for data publication; An established mechanism for citation of datasets; A generic minimum metadata standard for datasets; A tool to permit the easy creation of well-structured metadata; A standard mechanism for packaging data files and their metadata; Appropriate repositories to archive and publish research datasets; Reciprocal citation links between datasets and research articles; Mechanisms for quality control of data publications.

