Pure Silver: Reusing and Repurposing Bibliographic Data in a Current Research Information System and Institutional Repository
15 September 2010
Robin Armstrong Viner
Cataloguing Manager, Library & Historic Collections
University of Aberdeen
Introduction
- Background
- Pure UK Data Model
- Common European Research Information Format (CERIF)
- Workflow
- Bibliographic Data Harvesting
- Institutional Repository Integration
- Bibliographic Data Export
- Benefits and Lessons Learned for ‘Interesting Times’
Background
- 2008-09 – Review of the University’s existing publications database
  - Likely requirements of the Research Excellence Framework (REF)
  - Lessons learned from the Research Assessment Exercise 2008 (RAE 2008)
    - Data quality issues
    - Technical support issues
    - Usability issues
- Existing publications database
  - Time consuming
    - All publications entered manually
    - Separate workflow for adding publications to our Full Text Institutional Repository, the Aberdeen University Research Archive
  - Inconsistent
    - Only publications selected for RAE 2008 checked
    - Changes made outside the database
  - Incomplete
    - Between 90% (2005) and 60% (2008) of publications recorded
    - Less than 3% of publications added to the Institutional Repository
  - Isolated
    - No web interface
    - No integration with the University web pages for individual members of staff
Pure
- Spring 2009 – Pure from Atira selected as the University’s Current Research Information System (CRIS)
  - Offered synchronisation with the University’s existing data sources
  - Offered integration with external data sources
  - Offered integration with the Institutional Repository
  - Offered Research Portal and CV Modules
  - Offered improved usability
Pure UK Data Model
- Eight Families linked by Relationships (or Roles)
  - Activities, Events, Impacts, Journals, Organisations, Persons, Projects and Research Outputs
- Additional Families added to reflect REF
  - Assessment Environments, Assessment Organisations (Organisations), Assessment Persons (Persons) and Impact Statements (Impacts)
- Four additional Families added to support Workflow
  - External Organisations (Organisations), External Persons (Persons), Publishers (Organisations) and Users (Persons)
- Families have Templates, Classifications and Elements
- Relationships have Classifications
Pure UK Data Model
- Research Output Family
  - Nine Templates covering 40 Classifications
    - Book/Report: Anthology, Book, Commissioned Report, Other Report and Scholarly Edition
    - Contribution to journal: Article, Article/Book/Film Review, Comment/Debate, Editorial, Letter, Scientific Review and Special Issue
    - Chapter in Book/Conference Proceeding/Report: Chapter, Conference Contribution, Entry for Dictionary/Encyclopaedia, Foreword/Postscript and Other Contribution
    - Contribution to Conference: Abstract, Other, Paper and Poster
    - Contribution to Specialist Publication: Article, Article/Book/Film Review, Featured Article, Editorial, Letter and Special Issue
    - Non-Textual Form: Artefact, Composition, Data set/Database, Design, Digital or Visual Products, Exhibition, Performance, Software and Web Publication/Site
    - Other Contribution
    - Patent
    - Working Paper: Discussion Paper and Working Paper
  - Additional Templates developed for other UK customers
    - Thesis
  - 41 Elements
    - Abstract, Bibliographical note, Document Filename, Documents Document version, Documents Embargoed until, Documents Type, Documents Visibility, DOIs, Edition, Editors First name, Editors Last name, Group author, ID, IPC (International Patent Classification), ISBN, ISBN (Electronic), Issue number, Keywords, Library keywords, Number of pages, Original language, Pages (from-to), Patent number, Peer review, Place of publication, Publication date, Series Name, Series Number, Series Volume, Short description, Size, Status, Subtitle of the contribution in original language, Subtitle of the host publication in the original language, Title of the contribution in the original language, Title of the host publication in the original language, URL Text, URLs Web address (URLs), UT, Visibility and Volume
  - Linked to eight Families with an additional 20 Elements
    - Event: City, Country, End Date, Start Date and Title
    - External Organisations: Name
    - External Persons: Current Organisations, First Names, Last Names and Roles
    - Journals: Electronic ISSN, ISSN and Title
    - Organisations
    - Persons
    - Projects: ID and Name
    - Publisher
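The Template and Classification structure of the Research Output Family can be pictured as a simple lookup. The sketch below is illustrative only: the dictionary layout and the function names are assumptions, not Pure's internal representation, and it assumes that the Other Contribution and Patent Templates each carry a single Classification of the same name (the slide lists none for them). Under that assumption the totals agree with the nine Templates and 40 Classifications quoted above.

```python
# Templates and Classifications as listed on the slide.
# Assumption: "Other Contribution" and "Patent" each have one
# Classification with the same name as the Template.
RESEARCH_OUTPUT_TEMPLATES = {
    "Book/Report": [
        "Anthology", "Book", "Commissioned Report", "Other Report",
        "Scholarly Edition",
    ],
    "Contribution to journal": [
        "Article", "Article/Book/Film Review", "Comment/Debate", "Editorial",
        "Letter", "Scientific Review", "Special Issue",
    ],
    "Chapter in Book/Conference Proceeding/Report": [
        "Chapter", "Conference Contribution",
        "Entry for Dictionary/Encyclopaedia", "Foreword/Postscript",
        "Other Contribution",
    ],
    "Contribution to Conference": ["Abstract", "Other", "Paper", "Poster"],
    "Contribution to Specialist Publication": [
        "Article", "Article/Book/Film Review", "Featured Article",
        "Editorial", "Letter", "Special Issue",
    ],
    "Non-Textual Form": [
        "Artefact", "Composition", "Data set/Database", "Design",
        "Digital or Visual Products", "Exhibition", "Performance",
        "Software", "Web Publication/Site",
    ],
    "Other Contribution": ["Other Contribution"],
    "Patent": ["Patent"],
    "Working Paper": ["Discussion Paper", "Working Paper"],
}


def classification_count():
    """Total number of Classifications across all Templates."""
    return sum(len(names) for names in RESEARCH_OUTPUT_TEMPLATES.values())


def templates_for(classification):
    """Return every Template that offers the given Classification.

    A Classification name such as "Article" can appear under more than
    one Template, so the result is a list.
    """
    return [
        template
        for template, names in RESEARCH_OUTPUT_TEMPLATES.items()
        if classification in names
    ]
```

Because a Classification name is not unique across Templates, a record needs both the Template and the Classification to be unambiguous.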
CERIF Data Model
- Common European Research Information Format (CERIF)
  - Standard data model developed by the European Union and now supported by euroCRIS
  - Aims to enable access to CRIS and exchange of information between CRIS
- Entities linked by Relationships
- Four Entity Types
  - Core: OrganisationUnit, Person and Project
    - Represented by the Organisation, Person and Project Families
  - Link: connects two Entities
  - Result: ResultPatent, ResultProduct and ResultPublication
    - Represented by the Research Output Family
  - Second Level
    - Represented by Elements in each Family
- Three Features
  - Additional, Multiple Language and Semantics
Workflow – Users
- A single user workflow in the CRIS which allows individual members of staff (or their designated administrators) to
  - Add their Activities, Impacts and Research Outputs
    - Harvesting Bibliographic Data from external sources
    - Linking them to Organisation, Person and Project Data harvested from internal sources
  - Add the Full Text of Research Outputs for inclusion in the Institutional Repository
    - Either they are notified that the Research Output has been validated and the Full Text made available in the Institutional Repository if it was attached
    - Or the Research Output is returned to them with a request for more information or the appropriate version of the Full Text
  - Update the Research Portal and their University Web Pages
    - The Bibliographic Data displays immediately in the Research Portal and their University Web Pages unless
      - The Research Output has not been published
      - They choose not to make the Bibliographic Data publicly available
  - Update their Research Outputs, including adding the Full Text
    - The Bibliographic Data will be updated immediately on the Research Portal and their University Web Pages
    - The Full Text will be temporarily removed from the Institutional Repository if it had previously been made available
    - Either they are notified that the Research Output has been re-validated and the Full Text made available in the Institutional Repository
Workflow – Institutional Repository Team
- A single workflow in the CRIS allowing the Institutional Repository team to
  - Validate published Research Outputs
    - Checking the Bibliographic Data
    - Checking the Full Text against SHERPA RoMEO
    - Either triggering the transfer of the Full Text to the appropriate collection in the Institutional Repository
    - Or returning the Research Output to the individual member of staff requesting more information or the appropriate version of the Full Text
  - Re-validate published Research Outputs updated by individual members of staff
    - Either triggering the transfer of the Full Text to the Institutional Repository, setting the embargo if appropriate
  - Manage embargoes
  - Update the Research Portal and the individual member of staff’s University Web Page automatically with any changes
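The validation cycle described in the two workflow slides can be sketched as a small state machine. This is a minimal illustration, not Pure's implementation: the class, method and attribute names are hypothetical, the status names ‘For Validation’, ‘Validated’ and ‘For Revalidation’ are those used elsewhere in these slides, and the transfer to the Institutional Repository is only simulated by appending to a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FOR_VALIDATION = "For Validation"
VALIDATED = "Validated"
FOR_REVALIDATION = "For Revalidation"


@dataclass
class ResearchOutput:
    """Hypothetical stand-in for a Research Output record in the CRIS."""
    title: str
    status: str = FOR_VALIDATION
    full_text: Optional[str] = None
    # Simulated log of files sent to the Institutional Repository.
    transfers: List[str] = field(default_factory=list)

    def add_full_text(self, filename: str) -> None:
        """Member of staff attaches (or replaces) the Full Text."""
        self.full_text = filename
        # Adding Full Text to a validated record sends it back for checking.
        if self.status == VALIDATED:
            self.status = FOR_REVALIDATION

    def validate(self) -> None:
        """Repository team accepts the record after their checks."""
        if self.status not in (FOR_VALIDATION, FOR_REVALIDATION):
            raise ValueError(f"cannot validate from status {self.status!r}")
        self.status = VALIDATED
        # Setting the status to 'Validated' is what triggers the transfer.
        if self.full_text is not None:
            self.transfers.append(self.full_text)


# Usage: validate a record, then attach a Full Text and re-validate it.
output = ResearchOutput(title="Example output")
output.validate()
output.add_full_text("accepted_manuscript.pdf")
status_after_upload = output.status
output.validate()
```

The sketch leaves out the return-to-author path and embargo handling, both of which the slides describe but do not specify in enough detail to model.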
Bibliographic Data Harvesting
- Individual members of staff (or their designated administrators) can import Bibliographic Data via
  - arXiv, PubMed and Web of Science Application Programming Interfaces (APIs)
    - Each API mapped to the Elements in the Research Output Family by Atira
    - Thomson Reuters Unique Tag (UT) used to highlight potential duplicates
    - Access to the Scopus API is currently under negotiation
  - BibTeX, RefMan and RIS Formats
    - Allows import of data harvested from a variety of sources
    - Planned developments include adding EndNote to the list of sources
- Planned developments include
  - Harvesting of InCites Bibliometric Data
  - Web of Science harvesting service
    - Allows regular updating of the CRIS with new Bibliographic Data
  - Integration of SHERPA RoMEO and JULIET APIs
    - Allows individual members of staff to see what their funder requires and their publisher permits
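As a rough illustration of file-based import and duplicate highlighting, the sketch below parses a minimal RIS file and flags records whose identifier is already known. It is not how Pure does it. The function names are hypothetical, the sample records are made up, and carrying the Thomson Reuters UT in the RIS `AN` tag is purely an assumption for the example; the parser also ignores continuation lines.

```python
import re

# An RIS line is a two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of {tag: [values]} dictionaries."""
    records = []
    current = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # blank or continuation line: ignored in this sketch
        tag, value = match.groups()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


def flag_potential_duplicates(existing_uts, records, ut_tag="AN"):
    """Pair each record with True if its UT has been seen before.

    ut_tag is an assumption: the tag under which the UT arrives.
    """
    seen = set(existing_uts)
    flagged = []
    for record in records:
        uts = record.get(ut_tag, [])
        flagged.append((record, any(ut in seen for ut in uts)))
        seen.update(uts)
    return flagged


# Made-up sample data for illustration only.
SAMPLE = """TY  - JOUR
TI  - First example title
AN  - WOS:000000000000001
ER  -
TY  - JOUR
TI  - Second example title
AN  - WOS:000000000000002
ER  -
"""

records = parse_ris(SAMPLE)
flags = flag_potential_duplicates({"WOS:000000000000001"}, records)
```

Flagging rather than rejecting matches the slide's wording: the UT highlights potential duplicates and leaves the decision to the user.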
Bibliographic Data Harvesting
- CRIS pre-populated with enhanced and new Bibliographic Data provided by Thomson Reuters
  - Included in InCites subscription
- Three Bibliographic Data Sets uploaded
  - Existing Bibliographic Data enhanced with Web of Science Bibliographic Data
    - Enhancements included abstracts, DOIs and Thomson Reuters UT
    - Given ‘Validated’ status in the workflow
  - Existing Bibliographic Data which could not be matched to Web of Science Bibliographic Data
    - Often relating to the humanities but including items with limited or poor quality Bibliographic Data
    - Given ‘For Validation’ status in the workflow
  - Web of Science Bibliographic Data linked to the University by Thomson Reuters which could not be matched to the existing Bibliographic Data
    - Matched using addresses
Internal Data Harvesting
- Internal Data sources
  - Grants and Contracts Database: Projects
  - HR Database: Organisations and Persons
- Mapped to Elements in Organisation, Person and Project Families by University Data team
Institutional Repository Integration
- Bibliographic Data and Full Text transferred from the CRIS to the Institutional Repository via Lightweight Network Interface (LNI)
  - Transfer triggered when Institutional Repository team change Research Output Workflow Status to ‘Validated’
  - Workflow Status changed to ‘For Revalidation’ when Full Text added to ‘Validated’ Research Outputs
- 50 Elements from Research Output Family and linked Families mapped to 23 Qualified Dublin Core Elements
  - Abstract (description.abstract)
  - Bibliographical Note (description)
  - Current Organisations (contributor.institution)
  - Document Type (type)
  - Document Version (description.version)
  - DOIs (identifier.uri)
  - Edition (identifier.citation)
  - Editor’s First Name (identifier.citation)
  - Editor’s Last Name (identifier.citation)
  - Electronic ISBN (identifier.isbn)
  - Electronic ISSN (identifier.issn)
  - Event City (contributor.other)
  - Event Country (contributor.other)
  - Event End Date (contributor.other)
  - Event Start Date (contributor.other)
  - Event Title (contributor.other)
  - First Name (contributor.[Defined by Pure element [Person] role])
  - Funding Organisation (contributor.funder)
  - Group Author (contributor.author)
  - ID (identifier.other)
  - International Patent Classification (subject.classification)
  - ISBN (identifier.isbn)
  - ISSN (identifier.issn)
  - Issue Number (identifier.citation)
  - Journal Title (identifier.citation)
  - Keywords (subject)
  - Last Name (contributor.[Defined by Pure element [Person] role])
  - Library Keywords (subject.lcc)
  - Number of Pages (format.extent)
  - Original Language (language.iso)
  - Pages (From-To) (identifier.citation)
  - Patent Number (identifier.other)
  - Peer Review (description.status)
  - Place of Publication (identifier.citation)
  - Publication Date (identifier.citation and date.issued)
  - Publisher Name (identifier.citation)
  - Role (contributor.[Defined by Pure element [Person] role])
  - Series Name (identifier.citation)
  - Series Number (identifier.citation)
  - Series Volume (identifier.citation)
  - Short Description (description)
  - Size (format.extent)
  - Subtitle of the Contribution in the Original Language (title)
  - Subtitle of the Host Publication in the Original Language (identifier.citation)
  - Title of the Contribution in the Original Language (title)
  - Title of the Host Publication in the Original Language (identifier.citation)
  - UT (identifier.other)
  - Volume (identifier.citation)
  - Web Address (identifier.uri)
- Metadata Profile based on analysis of Scottish Practice
  - Some conflicts with the Institutional Repository Infrastructure for Scotland (IRIScotland) Metadata Agreement
- Bibliographic Data re-presented using MODS crosswalk
  - Developed by the University of Helsinki
  - Enhanced by Atira for UK Pure Data Model
  - Enhanced by Scottish Digital Library Consortium (SDLC) for the University
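The element mapping listed above behaves like a lookup table in which several Pure Elements can share one Qualified Dublin Core element and one Pure Element can feed more than one. The sketch below shows that idea on a small subset of the mapping. It is an assumption-laden illustration, not the MODS crosswalk itself: the function name and record layout are hypothetical, and the assembly of a formatted citation string for identifier.citation is not modelled (values are simply collected).

```python
# Subset of the Pure Element -> Qualified Dublin Core mapping from the slide.
# Each Pure Element maps to a list because one Element can feed several
# Dublin Core elements (for example, Publication Date).
CROSSWALK = {
    "Abstract": ["description.abstract"],
    "DOIs": ["identifier.uri"],
    "ISSN": ["identifier.issn"],
    "Keywords": ["subject"],
    "Original Language": ["language.iso"],
    "Publication Date": ["identifier.citation", "date.issued"],
    "Title of the Contribution in the Original Language": ["title"],
    "UT": ["identifier.other"],
    "Web Address": ["identifier.uri"],
}


def to_qualified_dc(pure_record):
    """Convert a {Pure Element: value or list of values} record.

    Returns {Dublin Core element: [values]}. Elements with no entry in
    CROSSWALK are dropped rather than guessed at.
    """
    dc_record = {}
    for element, value in pure_record.items():
        values = value if isinstance(value, list) else [value]
        for target in CROSSWALK.get(element, []):
            dc_record.setdefault(target, []).extend(values)
    return dc_record


# Usage with made-up values.
example = to_qualified_dc({
    "Title of the Contribution in the Original Language": "An example title",
    "DOIs": ["10.1000/example"],
    "Web Address": "http://example.org/item",
    "Publication Date": "2010",
    "Internal Note": "not mapped",
})
```

Keeping the mapping as data rather than code makes it straightforward to compare against a metadata agreement or to adjust when the repository profile changes.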
Bibliographic Data Export
- Bibliographic Data exported
  - Automatically to Research Portal and the University Web Pages of individual members of staff
    - Individual members of staff set visibility of Bibliographic Data and Full Text for the Portal and select items for their University Web Pages
  - Manually through management reports and through lists and reports downloaded by individual members of staff
    - Reports can be scheduled or created on demand
- Planned developments include export and electronic submission of REF Data
- Bibliographic and Bibliometric Data use limited by copyright and licence restrictions
  - Thomson Reuters unable to grant permission for public reuse of abstracts
  - Thomson Reuters unwilling to grant permission for systematic public reuse of Bibliometric Data
Benefits and Lessons Learned for ‘Interesting Times’
- Streamlined process
  - For individual members of staff
    - Reusing and repurposing Bibliographic Data allows Research Outputs to be recorded and the Full Text added to the Institutional Repository, the Research Portal and the University Web Pages of individual members of staff in minutes without the need to rekey Bibliographic Data
  - For the Institutional Repository team
    - Importing Bibliographic Data from external sources ensures better quality data entry requiring fewer corrections and less augmentation
    - Single integrated workflow eliminates double checking of Bibliographic Data
- Increased content and profile for the CRIS and the Institutional Repository
  - In the first month the number of Research Outputs increased by 20% and there was a potential 35% increase to the content of the Institutional Repository
  - The Institutional Repository team request the Full Text as Research Outputs are added
- Improved internal profile for Library & Historic Collections with
  - Research Information System Project Board
    - Vice Principal for Research & Commercialisation; Heads of research of the College of Arts & Social Sciences, the College of Life Sciences & Medicine and the College of Physical Sciences; and the Research & Innovation Team
  - Individual members of staff
    - Demonstrating research support
- Compliance
  - Moving towards a single source of Full Text minimising the risk of infringing Copyright
- Collaborative working
  - CRIS procured jointly with the University of St Andrews
    - Separate installations of a single data model
  - Institutional Repository externally hosted solution supported by the Scottish Digital Library Consortium (SDLC)
    - Institutional Repository is one of five Scottish repositories hosted by SDLC, including St Andrews’ Digital Research Repository (DRR)
    - Separate installations of the same integration model
  - Bibliographic Data licences and subscriptions jointly negotiated with Atira, St Andrews (and now other members of the Pure UK User Group)
