Digital Challenges – Bridging the gap between publication and data Adam Farquhar Head of Digital Library Technology The British Library IASSIST, Tampere,


1 Digital Challenges – Bridging the gap between publication and data Adam Farquhar Head of Digital Library Technology The British Library IASSIST, Tampere, 27 May 2009

2 The British Library: ‘This is the life blood of research and innovation’
  Helping people advance knowledge to enrich lives
  National library of the UK. Serves researchers, business, libraries, education & the general public
  GIA funding 08/09: £94.8m operational, £12m capital. Other funding secured 07/08: c.£33m
  Collection includes over 2m sound recordings, 5m reports, theses and conference papers, and the world’s largest patents collection (c.50m)
  The largest document supply service in the world. Secure e-delivery and ‘just in time’ digitisation enable desktop delivery within 2 hours
  2 main sites, in London and Yorkshire. Circa 2,000 staff
  Business and IP Centre: providing inspiration, and enabling protection of creative capital and business development
  Generates value to the UK economy each year of 4.4 times public funding
  Collection fills over 600km of shelving and grows at 11km per year
  30 TB of digital material, growing rapidly
  Science and Innovation Investment Framework 2004-2014, H.M. Treasury (2004), Information infrastructure, 2.23: The growing UK research base must have ready and efficient access to information of all kinds – such as experimental data sets, journals, theses, conference proceedings and patents. This is the life blood of research and innovation.

3 Supporting research
  Science, Technology & Medicine
    Document Supply service provides 1.4m articles/year, primarily to scientists
    Renewed engagement with researchers using digital content and online services
    In-depth focus on biomedicine and energy/environment
    Collection includes journals, patents, theses and more, and is updated by some 9,000 articles every day
  Social Sciences
    A significant international collection of books, journals, reports, theses, official publications and other materials
    A unique collection of grey literature, of special interest to practitioners and theoreticians
    Research collaboration with ESRC
  Arts & Humanities
    Greatest research collection of its kind in the world
    World-class curatorial expertise by subject, medium and geographical area
    BL has been developing world-leading e-innovations for the past decade (e.g. International Dunhuang Project) and building a significant corpus of digitised texts
    Research collaboration with AHRC, British Academy and HEIs

4 Building the Digital Research Infrastructure
  BL Digital library system
    Large scale, highly resilient digital store
    Continuous validation & correction
    Long term digital storage for BL content & eLegal deposit/distribution
  Long term access (digital preservation)
    Leading EU-funded digital preservation project ‘Planets’ (16 partners)
    Developing cost models and case studies with UCL (‘Life’ projects)
    Addressing root causes of digital obsolescence
  [Map labels: Edinburgh -2009, Aberystwyth, Boston Spa, St. Pancras, Cambridge Univ., Oxford Univ.]

5 Digital Library Live
  Content Streams: Sound Archives, Voluntary Digital Donations, Nineteenth Century Digitised Books, Born Digital Newspapers
  Storage: >440,000 Digital Items, >30 Terabytes of Content
  Coming soon: eJournals, Digitised Newspapers

6 Role of the British Library in Science, Technology and Medicine
  Long history of collecting scientific and technical literature
  Serves business & industry, researchers, academics and students
  Dedicated reading rooms in London
  The Library operates the world’s largest document delivery service: millions of items each year to customers all over the world, predominantly in the STM disciplines
  Indexing the UK input into Medline/PubMed
  Creation of AMED (Allied and Complementary Medicine A&I Database): research articles on complementary medicine and allied health
  Lead Partner in UK PubMed Central

7 WorldWideScience.org
  Global science gateway based on the US Department of Energy’s Science.Gov service
  Multilateral partnership to enable federated searching of national and international scientific databases and portals
  Launched in 2008
  Large number of countries already providing access to publicly funded research outputs; the latest addition is China
  Chaired by the British Library

8 UK PubMed Central
  Launched in January 2007
  Number of articles: 1.4 million
  Over 2,500 manuscripts submitted by grant holders
  Information held on 20,000 research grants awarded to 9,000 PIs by UKPMC Funders
  Downloads have grown strongly, with over 300,000 in March 2009
  UKPMC users are predominantly UK based (70%) but the service is accessed across the world
  Working with the Bioscience community and Funders to develop the service based on UK research community needs

9 Research Information Centre – the research lifecycle
  Based on Microsoft’s Sharepoint product
  Developed with Microsoft External Research Team
  DOI:10.1109/ADVCOMP.2007.14
  Beta tested by 25 bioscience research teams (academia & commercial) in UK & US
  Supports full research life-cycle
  Accessible by web browser
  Configured for biosciences but flexible
  Designed for collaboration

10 Social Science Collection and Research
  New team established in 2006
  Priorities: define and develop the collection, improve accessibility, raise awareness, build networks, build capacity
  Strong focus on researcher needs
  Develop strategies for grey literature and data access
  Build the collection of government publications
    Recent and historic print collections with LSE and Oxford Soc Science Library, …
    Digital and web collections with TNA and UK e-OP ‘digital continuity’
    Managing Access to Government Information Collaboratively (MAGIC) with LSE
  ©Clive Sherlock

11 Social Science Collection and Research
  Research collaborations
    Voices of the UK; Children’s play in the media age
  Knowledge exchange, awareness and capacity building
    Corporate and Social Responsibility seminars
    Multi-modal PhD seminars
    ESRC Festival of Social Science
    ESRC Interns
    Postgraduate training days, thematic study days, ESDS seminars
    Public events - Census 2011 to explain the role of quantitative and qualitative social surveys
  ©Clive Sherlock

12 Books and data – a parable
  A scientist measured environmental conditions to determine their impact on leather bindings
  When the project was complete, he printed the data, bound it, and submitted it to UK copyright libraries
  Thirty years later, a scientist took it off the shelf and started to reuse the data, and collect anew
  When his project was complete, he had 30,000 images and megabytes of data
    Too big for any shelf
    Not interesting for a data centre
    Is the project web site enough?

13 Journals and data – a problem
  In 2003, Legal Deposit Legislation in the UK was extended to cover digital material
    Building on the 1911 Legal Deposit Act
  Electronic journal articles are covered: they will be collected and archived for the long term
  … But supplementary material is not covered
    For now, it remains on the publisher web sites

14 Long-term access is critical
  According to a Parse.Insight survey
    50% needed research data gathered by other researchers that was not available
  Within High Energy Physics
    More than 90% think that data preservation is important - crucial
    Benefits include
      Verify scientific results independently (60%)
      Combine past and future data (60%)
      Re-analyze in the light of new theories and future results (75%)
    45% - old data could have improved their scientific results
    40% - important HEP data have been lost in the past
  Many are willing to share
    80% would provide data behind tables and figures
    45% would provide “raw” data
    But 50% believe costs to repackage for sharing are high

15 Widening gap
  A widening gap in the scientific record between published research and the data that underlies it
    Published work held by libraries
    Datasets held by data centres
  No effective way to link between datasets and articles
    No widely used method to identify datasets
    No widely used method to cite datasets
  As a result, datasets are
    Difficult to discover
    Difficult to access
    Second-class citizens in the scientific record

16 Datasets in the scholarly record (OECD White Paper)
  45% of journal publishers provide access to datasets associated with journal articles they publish (ALPSP)
  But there are no rules about how to publish, present, cite, or otherwise catalogue datasets
  Citation: Main mortality estimate: Estimated settler mortality. Settler mortality is calculated from the mortality rates of European-born soldiers, sailors, and bishops when stationed in colonies. It measures the effects of local diseases on people without inherited or acquired immunities. Source: Acemoglu et al. (2001), based on Curtin (1989) and other sources.
  Citation: Tertiary school enrollment: School enrollment, tertiary (% of gross). Source: Barro and Lee (2000) and their databases

17 Datasets – first class citizens?
  Datasets (UKRDS Study)
    Data is difficult to manage after project funding ceases
    Informal networks provide the primary means of sharing
    Only 21% use a national or international facility
    Datasets are not included in impact analysis
    Good luck finding it (your discipline may vary)!
  Published articles
    Libraries ensure long-term storage and management
    Established funded services provide the primary means of access
    Nearly all published articles are held in multiple national libraries
    Articles and citations form the backbone of impact analysis
    Catalogues and full-text search support discovery

18 Global responses to the challenge
  Research council mandates
    Data management plans
    Data retention plans
  Funded initiatives
    Australian National Data Service
    UK Research Data Service
    UK Digital Curation Centre
    US DataNet programme
    JISC Data programme
    EU Science Data Infrastructure, …
  STM publishers
    Brussels Declaration: Raw research data should be made freely available to all researchers

19 Persistent Identification: a key component for many goals
  [Diagram labels: Cite, Reuse, Verify, Track Impact, Access, Find, Make Visible, ?]

20 Dataset citation using Digital Object Identifiers (DOIs)
  The DOI system offers an easy way to connect the article with the underlying data
  Several organisations have started to assign DOIs to datasets
    IUCR, ICPSR, OECD through CrossRef
    PANGAEA, Mare, and others through TIB (German Science Library)
  Dataset: G. Yancheva, N. R. Nowaczyk et al (2007) Rock magnetism and X-ray flourescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA doi:10.1594/PANGAEA.587840
  Cites
  Article: G. Yancheva, N. R. Nowaczyk et al (2007) Influence of the intertropical convergence zone on the East Asian monsoon, Nature 445, 74-77 doi:10.1038/nature05431
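
The two DOIs on this slide can be turned into resolvable links mechanically. The following is a minimal Python sketch, not anything shown in the talk: it assumes the public DOI proxy at https://doi.org/ (the slide does not name a resolver), and the helper names `doi_link` and `cite_dataset` are hypothetical.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy; the slide itself does not name a resolver.
DOI_RESOLVER = "https://doi.org/"


def doi_link(doi: str) -> str:
    """Turn a DOI name (with or without a 'doi:' label) into a resolvable link."""
    name = doi.strip()
    if name.lower().startswith("doi:"):
        name = name[4:].strip()
    # Keep the '/' that separates prefix and suffix; percent-encode unsafe characters.
    return DOI_RESOLVER + quote(name, safe="/")


def cite_dataset(authors: str, year: int, title: str, publisher: str, doi: str) -> str:
    """Format a dataset citation in roughly the shape of the slide's example (hypothetical helper)."""
    return f"{authors} ({year}) {title}, {publisher}, {doi_link(doi)}"


if __name__ == "__main__":
    print(doi_link("doi:10.1594/PANGAEA.587840"))  # the dataset
    print(doi_link("doi:10.1038/nature05431"))     # the article that cites it
```

Both the dataset and the article go through the same function, which is the point of the slide: a dataset with a DOI can be cited and followed just like an article.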

21 It looks so easy
  Organisational challenges
    Data centres, funders have regional or disciplinary scope
    Universities have teaching and research mission and competitive relationships
    Publishers do not cover unpublished material
    Consortia of the above require large and fragile coalitions
    We need a consortium of national institutions with a long-term stewardship role
  Social challenges
    Acceptance by key stakeholders including funders, data centres, universities, researchers, publishers
    Use by data creators and authors
  Technical challenges
    Robust infrastructure
    Identifying the right thing
    Ensuring longevity

22 DataCite
  Organisations with the national science library role are working together to establish a European and global infrastructure to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence
  Publishing agents (data centres, research institutes) are responsible for:
    Quality assurance
    Content storage and access
    Creating the identifier
    Creating and updating metadata
  The DataCite registration agency
    Maintains the resolution infrastructure
    Maintains a searchable database of metadata
    Manages the identifiers over the long term
    Establishes and shares best practice

23 Memorandum of Understanding
  Paris, March 2, 2009
  Recognizing the importance of research datasets as the foundation of knowledge and sharing a common commitment to promote and establish persistent access to such datasets, we, the signed parties, hereby express our interest to work together to promote global access to research data. Our long term vision is to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence.

24 Initial Signatories
  Technische Informationsbibliothek (TIB), Germany
  Library of the ETH Zürich, Switzerland
  L’Institut de l’Information Scientifique et Technique (INIST), France
  Library of TU Delft, The Netherlands
  Technical Information Center of Denmark
  The British Library

25 Key facts about DOI
  Usage
    >35m DOIs have been assigned
    >2m resolutions each month
  Organizational
    Not-for-profit International DOI Foundation (IDF)
      Provides social infrastructure
      Includes registration agencies
    Registration done in co-operation with a publication agent
    Publication agents are responsible for the content
  Technical
    A DOI Name is a persistent identifier used to cite and link resources
    Linked to an object, not to a location
      The location may change, but the DOI remains the same
    The DOI System holds metadata about objects including their URL
    Resolution redirects the user from a DOI name to the URL
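
The technical points above (identifier bound to an object, location free to change, resolution as a redirect) can be illustrated with a toy in-memory registry. This is a sketch under stated assumptions, not the Handle System or any real DOI API: the class `ToyResolver`, the DOI, and the example.org URLs are all hypothetical.

```python
class ToyResolver:
    """In-memory stand-in for a DOI resolver (illustration only, not a real API)."""

    def __init__(self):
        self._records = {}

    def register(self, doi, url, **metadata):
        # DOI names are case-insensitive, so normalise the lookup key.
        self._records[doi.lower()] = {"url": url, "metadata": metadata}

    def update_url(self, doi, new_url):
        # The object moved: only the stored location changes, never the DOI.
        self._records[doi.lower()]["url"] = new_url

    def resolve(self, doi):
        # Resolution maps the DOI name to the object's current URL.
        return self._records[doi.lower()]["url"]


resolver = ToyResolver()
doi = "10.1234/example.dataset"  # hypothetical DOI
resolver.register(doi, "https://data.example.org/old/42", title="Example dataset")
resolver.update_url(doi, "https://archive.example.org/datasets/42")
print(resolver.resolve(doi))
```

A citation that contains only the DOI keeps working after `update_url`, which is why the slide stresses that the DOI is linked to the object rather than to a location.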

26 Strengths and weaknesses of DOI
  DOIs have some strong advantages
    Accepted by researchers and scientists
    Mature infrastructure
    Put datasets on the same playing field as articles
  But perceived as
    Expensive
      The current IDF business model favours larger registration agencies
    Publisher oriented
      The largest registration agency is the publisher-oriented CrossRef

27 DataCite Structure
  [Diagram labels: DataCite; National Institution; Data Centre; National Institution; Data Centre; …; Carries; Works with; International DOI Foundation; Global Handle System]

28 Typical workflow (Data Centre)
  Data Centre registers with DataCite
  Data Centre ingests a dataset and assigns an identifier
  Data Centre registers the dataset by submitting an XML file containing relevant bibliographic metadata and the URL for the dataset’s access page
  Metadata drawn from ISO 690-2 for referencing electronic information: author, title, publisher, publishing date, publishing place, language, size, edition
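
The registration step above can be sketched as building a small XML record from the listed fields. This is only an illustration: the slide gives the field names but no schema, so the element names, the `build_record` helper, the DOI, and the URL below are assumptions rather than the DataCite format.

```python
import xml.etree.ElementTree as ET

# Field names follow the slide's ISO 690-2 list; the element names are hypothetical.
FIELDS = ("author", "title", "publisher", "publishing_date",
          "publishing_place", "language", "size", "edition")


def build_record(doi, landing_url, **fields):
    """Serialise a dataset's identifier, access-page URL, and metadata as XML."""
    unknown = set(fields) - set(FIELDS)
    if unknown:
        raise ValueError(f"unexpected metadata fields: {sorted(unknown)}")
    root = ET.Element("dataset", {"doi": doi})
    ET.SubElement(root, "url").text = landing_url
    # Emit fields in a fixed order so records are easy to compare.
    for name in FIELDS:
        if name in fields:
            ET.SubElement(root, name).text = fields[name]
    return ET.tostring(root, encoding="unicode")


record = build_record(
    "10.1234/example.dataset",               # hypothetical DOI
    "https://data.example.org/datasets/42",  # hypothetical access page
    author="A. Researcher",
    title="Example dataset",
    publisher="Example Data Centre",
    publishing_date="2009",
    language="en",
)
print(record)
```

Rejecting unknown field names keeps the record limited to the bibliographic elements the slide lists; a real registration would follow whatever schema the registration agency publishes.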

29 Typical workflow (2)
  Author
    Includes citation using the DOI, just like an article
  Reader
    Follows the resolvable link that includes the DOI (or searches for it), just like an article
    Reaches a unique landing page at the Data Centre for the dataset
      Open to every reader
      Includes the DOI and metadata to help the reader decide if the dataset will help
    May need to take additional steps to access the dataset

30 Research Data in Articles

31 Thanks!
  The British Library has a duty of care for the scientific record
    Renewed engagement in STM and Social Sciences
    Actively partnering to achieve goals
  There is a widening gap between published research and the data that underlies it
  DataCite will support researchers by enabling them to locate, identify, and cite research datasets with confidence
  This is the start of a long and open dialogue
    There are many open issues to address
    We welcome your comments, questions, and ideas!
  Email: adam.farquhar {@} bl.uk

