Digital Challenges – Bridging the gap between publication and data Adam Farquhar Head of Digital Library Technology The British Library IASSIST, Tampere,

Digital Challenges – Bridging the gap between publication and data Adam Farquhar Head of Digital Library Technology The British Library IASSIST, Tampere, 27 May 2009

The British Library: ‘This is the life blood of research and innovation’
GIA Funding 08/09: £94.8m operational, £12m capital
Other funding secured 07/08: c.£33m
Helping people advance knowledge to enrich lives
National library of the UK. Serves researchers, business, libraries, education & the general public
Collection includes over 2m sound recordings, 5m reports, theses and conference papers, and the world’s largest patents collection (c.50m)
The largest document supply service in the world. Secure e-delivery and ‘just in time’ digitisation enable desktop delivery within 2 hours
2 main sites in London and Yorkshire. Circa 2,000 staff
Business and IP Centre: providing inspiration, and enabling protection of creative capital and business development
Generates value to the UK economy each year of 4.4 times public funding
Collection fills over 600km of shelving and grows at 11km per year
30 TB of digital material, growing rapidly
Science and Innovation Investment Framework, H.M. Treasury (2004), Information infrastructure 2.23: The growing UK research base must have ready and efficient access to information of all kinds – such as experimental data sets, journals, theses, conference proceedings and patents. This is the life blood of research and innovation.

3 Supporting research
Science, Technology & Medicine
  Document Supply service provides 1.4m articles/year, primarily to scientists
  Renewed engagement with researchers using digital content and online services
  In-depth focus on biomedicine and energy/environment
  Collection includes journals, patents, theses and more, and is updated by some 9,000 articles every day
Social Sciences
  A significant international collection of books, journals, reports, theses, official publications and other materials
  A unique collection of grey literature, of special interest to practitioners and theoreticians
  Research collaboration with ESRC
Arts & Humanities
  Greatest research collection of its kind in the world
  World-class curatorial expertise by subject, medium and geographical area
  BL has been developing world-leading e-innovations for the past decade (e.g. International Dunhuang Project) and building a significant corpus of digitised texts
  Research collaboration with AHRC, British Academy and HEIs

4 Building the Digital Research Infrastructure
BL Digital library system
  Large scale, highly resilient digital store
  Continuous validation & correction
  Long term digital storage for BL content & eLegal deposit/distribution
Long term access (digital preservation)
  Leading EU-funded digital preservation project ‘Planets’ (16 partners)
  Developing cost models and case studies with UCL (‘Life’ projects)
  Addressing root causes of digital obsolescence
[Map labels: Edinburgh, Aberystwyth, Boston Spa, St. Pancras, Cambridge Univ., Oxford Univ.]

5 Digital Library
Live content streams: Sound Archives; Voluntary Digital Donations; Nineteenth Century Digitised Books; Born Digital Newspapers
Storage: >440,000 digital items; >30 terabytes of content
Coming soon: eJournals; Digitised Newspapers

6 Role of the British Library in Science, Technology and Medicine
Long history of collecting scientific and technical literature
Serves business & industry, researchers, academics and students
Dedicated reading rooms in London
The Library operates the world’s largest document delivery service: millions of items each year to customers all over the world, predominantly in the STM disciplines
Indexing the UK input into Medline/PubMed
Creation of AMED (Allied and Complementary Medicine A&I Database): research articles on complementary medicine and allied health
Lead Partner in UK PubMed Central

7 WorldWideScience.org
Global science gateway based on the US Department of Energy’s Science.gov service
Multilateral partnership to enable federated searching of national and international scientific databases and portals
Launched in 2008
Large number of countries already providing access to publicly funded research outputs; latest addition is China
Chaired by British Library

8 UK PubMed Central
Number of articles: 1.4 million
Over 2,500 manuscripts submitted by grant holders
Information held on 20,000 research grants awarded to 9,000 PIs by UKPMC Funders
Downloads have grown strongly, with over 300,000 in March 2009
UKPMC users are predominantly UK based (70%) but the service is accessed across the world
Working with the Bioscience community and Funders to develop the service based on UK research community needs
Launched in January 2007

9 Research Information Centre – the research lifecycle
Based on Microsoft’s SharePoint product
Developed with Microsoft External Research Team
DOI: /ADVCOMP
Beta tested by 25 bioscience research teams (academia & commercial) in UK & US
Supports full research life-cycle
Accessible by web browser
Configured for biosciences but flexible
Designed for collaboration

10 Social Science Collection and Research
New team established in 2006
Priorities: define and develop the collection, improve accessibility, raise awareness, build networks, build capacity
Strong focus on researcher needs
Develop strategies for grey literature and data access
Build the collection of government publications
  Recent and historic print collections with LSE and Oxford Soc Science Library, …
  Digital and web collections with TNA and UK e-OP ‘digital continuity’
  Managing Access to Government Information Collaboratively (MAGIC) with LSE
©Clive Sherlock

11 Social Science Collection and Research
Research collaborations
  Voices of the UK; Children’s play in the media age
Knowledge exchange, awareness and capacity building
  Corporate and Social Responsibility seminars
  Multi-modal PhD seminars
  ESRC Festival of Social Science
  ESRC Interns
  Postgraduate training days, thematic study days, ESDS seminars
  Public events: Census 2011 to explain the role of quantitative and qualitative social surveys
©Clive Sherlock

12 Books and data – a parable
A scientist measured environmental conditions to determine their impact on leather bindings
When the project was complete, he printed the data, bound it, and submitted it to UK copyright libraries
Thirty years later, a scientist took it off the shelf and started to reuse the data, and collect anew
When his project was complete, he had 30,000 images and megabytes of data
  Too big for any shelf
  Not interesting for a data centre
  Is the project web site enough?

13 Journals and data – a problem
In 2003, Legal Deposit Legislation in the UK was extended to cover digital material
  Building on the 1911 Legal Deposit Act
Electronic journal articles are covered: they will be collected and archived for the long term …
But supplementary material is not covered
  For now, it remains on the publishers’ web sites

14 Long-term access is critical
According to a PARSE.Insight survey
  50% needed research data gathered by other researchers that was not available
Within High Energy Physics
  More than 90% think that data preservation is important - crucial
  Benefits include
    Verify scientific results independently (60%)
    Combine past and future data (60%)
    Re-analyze in the light of new theories and future results (75%)
  45%: old data could have improved their scientific results
  40%: important HEP data have been lost in the past
Many are willing to share
  80% would provide data behind tables and figures
  45% would provide “raw” data
  But 50% believe costs to repackage for sharing are high

15 Widening gap
A widening gap in the scientific record between published research and the data that underlies it
  Published work held by libraries
  Datasets held by data centres
No effective way to link between datasets and articles
  No widely used method to identify datasets
  No widely used method to cite datasets
As a result, datasets are
  Difficult to discover
  Difficult to access
  Second-class citizens in the scientific record

16 Datasets in the scholarly record (OECD White Paper)
45% of journal publishers provide access to datasets associated with journal articles they publish (ALPSP)
But there are no rules about how to publish, present, cite, or otherwise catalogue datasets
Citation: Main mortality estimate: Estimated settler mortality. Settler mortality is calculated from the mortality rates of European-born soldiers, sailors, and bishops when stationed in colonies. It measures the effects of local diseases on people without inherited or acquired immunities. Source: Acemoglu et al. (2001), based on Curtin (1989) and other sources.
Citation: Tertiary school enrollment: School enrollment, tertiary (% of gross). Source: Barro and Lee (2000) and their databases

17 Datasets – first class citizens?
Datasets
  Data is difficult to manage after project funding ceases
  Informal networks provide the primary means of sharing
  Only 21% use a national or international facility
  Datasets are not included in impact analysis
  Good luck finding it (your discipline may vary)!
  UKRDS Study
Published articles
  Libraries ensure long-term storage and management
  Established funded services provide the primary means of access
  Nearly all published articles are held in multiple national libraries
  Articles and citations form the backbone of impact analysis
  Catalogues and full-text search support discovery

18 Global responses to the challenge
Research council mandates
  Data management plans
  Data retention plans
Funded initiatives
  Australian National Data Service
  UK Research Data Service
  UK Digital Curation Centre
  US DataNet programme
  JISC Data programme
  EU Science Data Infrastructure, …
STM publishers
  Brussels Declaration: Raw research data should be made freely available to all researchers

19 Persistent Identification: a key component for many goals
[Diagram labels: Cite, Reuse, Verify, Track Impact, Access, Find, Make Visible]

20 Dataset citation using Digital Object Identifiers (DOIs)
The DOI system offers an easy way to connect the article with the underlying data
Several organisations have started to assign DOIs to datasets
  IUCr, ICPSR, OECD through CrossRef
  PANGAEA, Mare, and others through TIB (German Science Library)
Dataset: G. Yancheva, N. R. Nowaczyk et al (2007) Rock magnetism and X-ray fluorescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA doi: /PANGAEA
Article: G. Yancheva, N. R. Nowaczyk et al (2007) Influence of the intertropical convergence zone on the East Asian monsoon Nature 445, doi: /nature05431
[Diagram: the article cites the dataset]

21 It looks so easy
Organisational challenges
  Data centres, funders have regional or disciplinary scope
  Universities have teaching and research mission and competitive relationships
  Publishers do not cover unpublished material
  Consortia of the above require large and fragile coalitions
  We need a consortium of national institutions with a long-term stewardship role
Social challenges
  Acceptance by key stakeholders including funders, data centres, universities, researchers, publishers
  Use by data creators and authors
Technical challenges
  Robust infrastructure
  Identifying the right thing
  Ensuring longevity

22 DataCite
Organisations with the national science library role are working together to establish a European and global infrastructure to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence
Publishing agents (data centres, research institutes) are responsible for:
  Quality assurance
  Content storage and access
  Creating the identifier
  Creating and updating metadata
The DataCite registration agency
  Maintains the resolution infrastructure
  Maintains a searchable database of metadata
  Manages the identifiers over the long term
  Establishes and shares best practice

23 Memorandum of Understanding
Paris, March 2, 2009
Recognizing the importance of research datasets as the foundation of knowledge and sharing a common commitment to promote and establish persistent access to such datasets, we, the signed parties, hereby express our interest to work together to promote global access to research data.
Our long term vision is to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence.

24 Initial Signatories
Technische Informationsbibliothek (TIB), Germany
Library of the ETH Zürich, Switzerland
L’Institut de l’Information Scientifique et Technique (INIST), France
Library of TU Delft, The Netherlands
Technical Information Center of Denmark
The British Library

25 Key facts about DOI
Usage
  >35m DOIs have been assigned
  >2m resolutions each month
Organizational
  Not-for-profit International DOI Foundation (IDF)
  Provides social infrastructure
  Includes registration agencies
  Registration done in co-operation with a publication agent
  Publication agents are responsible for the content
Technical
  A DOI Name is a persistent identifier used to cite and link resources
  Linked to an object, not to a location
  The location may change, but the DOI remains the same
  The DOI System holds metadata about objects including their URL
  Resolution redirects the user from a DOI name to the URL
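
The technical points on this slide (a DOI name is bound to an object, the system stores the object's current URL, and resolution redirects to that URL) can be illustrated with a minimal in-memory sketch. This is not the API of the DOI System or the Handle System; the class `ToyResolver`, its method names, the DOI name, and the URLs are all hypothetical placeholders chosen for illustration.

```python
class ToyResolver:
    """In-memory stand-in for a DOI-style resolver (illustrative only)."""

    def __init__(self):
        # DOI name -> metadata dict, which includes the object's current URL
        self._records = {}

    def register(self, doi, url, **metadata):
        # DOI names are case-insensitive, so normalise before storing.
        self._records[doi.lower()] = {"url": url, **metadata}

    def update_url(self, doi, new_url):
        # The object has moved: only the stored location changes.
        self._records[doi.lower()]["url"] = new_url

    def resolve(self, doi):
        # A real resolver answers with an HTTP redirect; here we return the URL.
        return self._records[doi.lower()]["url"]


resolver = ToyResolver()
doi = "10.9999/example.dataset.1"  # hypothetical DOI name
resolver.register(doi, "https://data.example.org/old/ds1", title="Example dataset")
first = resolver.resolve(doi)

# The data centre reorganises its web site; the citation does not change.
resolver.update_url(doi, "https://data.example.org/new/ds1")
second = resolver.resolve(doi)

print(first)
print(second)
```

The same DOI name resolves to the old location before the update and to the new one afterwards, which is why a citation that uses the DOI stays valid when the dataset moves.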

26 Strengths and weaknesses of DOI
DOIs have some strong advantages
  Accepted by researchers and scientists
  Mature infrastructure
  Put datasets on the same playing field as articles
But perceived as
  Expensive: the current IDF business model favours larger registration agencies
  Publisher oriented: the largest registration agency is the publisher-oriented CrossRef

27 DataCite Structure
[Diagram labels: DataCite; National Institution, Data Centre; National Institution, Data Centre; …; “Carries”; “Works with”; International DOI Foundation; Global Handle System]

28 Typical workflow (Data Centre)
Data Centre registers with DataCite
Data Centre ingests a dataset and assigns an identifier
Data Centre registers the dataset by submitting an XML file containing relevant bibliographic metadata and the URL for the dataset’s access page
Metadata drawn from ISO for referencing electronic information: language, publisher, publishing date, publishing place, author, title, size, edition
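
As a rough illustration of the registration step, the sketch below builds a small XML record carrying the metadata fields listed on this slide together with the URL of the dataset's access page. The element names, the `build_record` helper, the DOI name, and the URL are hypothetical; this is not the DataCite metadata schema, only one plausible shape for such a file.

```python
import xml.etree.ElementTree as ET

# Field names follow the list on the slide; the element names are hypothetical.
FIELDS = ["author", "title", "publisher", "publishing_date",
          "publishing_place", "language", "size", "edition"]


def build_record(doi, landing_url, metadata):
    """Return an XML string describing one dataset (illustrative layout)."""
    root = ET.Element("dataset", attrib={"doi": doi})
    ET.SubElement(root, "url").text = landing_url
    for field in FIELDS:
        if field in metadata:  # skip fields the data centre did not supply
            ET.SubElement(root, field).text = str(metadata[field])
    return ET.tostring(root, encoding="unicode")


record = build_record(
    "10.9999/example.dataset.1",            # hypothetical DOI name
    "https://data.example.org/datasets/1",  # hypothetical access page
    {
        "author": "A. Researcher",
        "title": "Example dataset",
        "publisher": "Example Data Centre",
        "publishing_date": "2009",
    },
)
print(record)
```

Keeping the field list in one place makes it easy to see which bibliographic elements a data centre would need to supply before registering a dataset.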

29 Typical workflow (2)
Author
  Includes citation using the DOI, just like an article
Reader
  Follows the resolvable link that includes the DOI (or searches for it), just like an article
  Reaches a unique landing page at the Data Centre for the dataset
    Open to every reader
    Includes the DOI and metadata to help the reader decide if the dataset will help
  May need to take additional steps to access the dataset
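
The author and reader steps can be sketched as two small helpers: one formats a dataset citation that ends in the DOI, the other turns the DOI into a resolvable link. The function names, the citation layout, and the sample metadata are assumptions made for illustration; the slides do not prescribe a citation format. The sketch uses the public DOI proxy at https://doi.org/ as the default resolver address.

```python
from urllib.parse import quote


def doi_link(doi, resolver="https://doi.org/"):
    """Build a resolvable link for a DOI name via a DOI proxy."""
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return resolver + quote(doi, safe="/")


def cite_dataset(authors, year, title, publisher, doi):
    """Format a dataset citation (hypothetical layout, not a standard)."""
    return f"{authors} ({year}) {title}. {publisher}. doi:{doi}"


doi = "10.9999/example.dataset.1"  # hypothetical DOI name
citation = cite_dataset("A. Researcher", 2009, "Example dataset",
                        "Example Data Centre", doi)
link = doi_link(doi)

print(citation)
print(link)
```

A reader who follows the link is redirected by the resolver to the data centre's landing page for the dataset, as described on the slide.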

30 Research Data in Articles

31 Thanks!
The British Library has a duty of care for the scientific record
  Renewed engagement in STM and Social Sciences
  Actively partnering to achieve goals
There is a widening gap between published research and the data that underlies it
DataCite will support researchers by enabling them to locate, identify, and cite research datasets with confidence
This is the start of a long and open dialogue
  There are many open issues to address
  We welcome your comments, questions, and ideas!
adam.farquhar bl.uk