Integrate external services in DSpace submission process
How to make self-deposit easy and improve metadata quality and presence of full-text
Andrea Bollini – Susanna Mornati
Topics
  Some context:
    - CINECA: a brief overview
    - DSpace as part of a CRIS solution
  Integration of external services:
    - Bibliographic databases: Scopus, PubMed, CrossRef, arXiv, etc.
    - Publisher policies: Sherpa/Romeo
  Make the repository an active actor:
    - Discovering missing content
    - Improving full-text presence
www.cineca.it | Integrate external services in DSpace submission process | OR2013| July 2013
The Company as of last week!
  - Interuniversity consortium, non-profit
  - Founded in 1969
  - Headquarters in Bologna
  - 57 members: 54 universities, 2 research institutes, MIUR
  - Owned companies: Kion, SCS
  - Employees: 400 (+150 Kion)
  - Total turnover: €70M
The Merge
  - The "merging process" of the three Italian consortia started in September 2012 and was concluded on 1 July 2013 (last week!)
  - 2.0: 67 members, more than 700 employees (+150 Kion)
  - The only Italian interuniversity consortium
What CINECA does
  Higher Education:
    - Solutions & services for university administration
    - Services for the Ministry of Education, University and Research (MIUR)
  Scientific Research:
    - High Performance Computing (FERMI: 2nd in the EU, 7th worldwide)
    - Scientific visualization & interactive virtual environments
  Technological Innovation:
    - Data center
    - Information and knowledge management services
    - Health care systems
How we work with Universities
  [Diagram: Cineca Board of Directors; Product Managers Board; U-GOV & SURplus Restricted Board; Customer Service; Technical & Delivery Board; Cineca Technical Board; University Customers Focus Groups. Flows labelled: Requirements, Apps Road Map, Tech]
Solutions for HE
  [Diagram legend: ERP; Best of Breed; AU GW = Authentication Gateway]
SURplus: supporting the World of Research
  - Collect institutional research output for evaluation and assessment purposes
  - Measure research results for benchmarking
  - Preserve ICT investments and maximize ROI
  - Disseminate data to enhance impact and visibility
Why Open Source?
  - Adopting open-source solutions lets the SURplus team customize and enhance the source code according to each institution's needs.
  - The open-source community provides innovative, high-quality and secure software, and working with and for it is a stimulating challenge.
SURplus: CINECA's CRIS System
  - An interoperable infrastructure made of different components
  - Ingestion of data from any legacy system adopted by an institution
  - Maintenance of specific functional requirements, data model and preferred technologies at the level of applications
  - Data warehouse and Business Intelligence tools to facilitate the aggregation of data and the application of measurement parameters and algorithms
SURplus: Dimensions
  - Beginning of activities: 2004
  - 9 institutions
  - 22 institutional repositories
  - Total modules: 77
DSpace: SURplus' Open Archive Module
  - CINECA is a registered service provider with DuraSpace
  - Long-term collaboration with the DSpace community, since 2003
  The OA Module, developed on DSpace:
    - Manages collection and dissemination of research results
    - Simplifies data collection processes
    - Service integration
  - Upgrades are periodically released to the open-source community
DSpace-CRIS: SURplus' Expertise & Skills
  - DSpace-CRIS: designed together with the Hong Kong University and released as open source
  - "dissemination of entities' descriptions in the research environment which go beyond publications"
IR as part of a CRIS system: what changes?
  Benefits:
    - Strong deposit mandate
    - More funding: professional support, HA infrastructure, dedicated team
  Issues to mitigate:
    - The IR becomes a critical application
    - Authors perceive deposit as a "requirement": wasted time, late submission
  Advocacy:
    - The information already exists in other databases!
    - Make the submission process easy
New first submission step
  - Free search form over the main metadata common to all publication types (article, book, etc.): title of the contribution, year, authors/editors
  - Available providers: each provider is a Spring service
New first submission step
  - Lookup by unique identifier
  - Each provider declares which identifiers it is able to manage
New first submission step
  - For each result, the providers that match the record are shown
  - Results are grouped by DOI
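The DOI-based grouping of results can be sketched as follows. This is only an illustration, not the code behind the slides: the `LookupHit` and `DoiGrouping` names, the DOI normalization rules and the handling of hits without a DOI are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical type: one search hit as returned by one provider.
final class LookupHit {
    final String provider;
    final String doi;   // may be null when the provider has no DOI for the record
    final String title;

    LookupHit(String provider, String doi, String title) {
        this.provider = provider;
        this.doi = doi;
        this.title = title;
    }
}

// Hypothetical helper: groups hits from several providers by DOI.
final class DoiGrouping {
    // Assumed normalization: trim, lower-case, strip a "doi:" or doi.org URL prefix.
    static String normalizeDoi(String doi) {
        String d = doi.trim().toLowerCase(Locale.ROOT);
        return d.replaceFirst("^(doi:|https?://(dx\\.)?doi\\.org/)", "");
    }

    // Hits sharing a normalized DOI end up in one group (first-seen order is kept);
    // a hit without a DOI gets a group of its own.
    static Map<String, List<LookupHit>> groupByDoi(List<LookupHit> hits) {
        Map<String, List<LookupHit>> groups = new LinkedHashMap<>();
        int withoutDoi = 0;
        for (LookupHit hit : hits) {
            boolean missing = hit.doi == null || hit.doi.trim().isEmpty();
            String key = missing ? "no-doi-" + (withoutDoi++) : normalizeDoi(hit.doi);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(hit);
        }
        return groups;
    }
}
```

Each group then corresponds to one row in the result list, and the `provider` values in the group are the providers shown next to that row.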
Modal box: publication details
  - Records from different providers are merged to get richer metadata
  - The system guesses a collection for the submission, but the user can change it if required
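One plausible way to merge records from several providers is a field-by-field merge in priority order. The sketch below assumes records are plain field-name/value maps and that the first non-empty value wins; the slides do not say which merge rule is actually used, and `RecordMerger` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: merges normalized records coming from different providers.
final class RecordMerger {
    // Records are given in priority order. For each field the first non-empty
    // value wins, and fields missing from earlier records are filled in from
    // later ones, which is how the merged record becomes richer.
    static Map<String, String> merge(List<Map<String, String>> recordsInPriorityOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> record : recordsInPriorityOrder) {
            for (Map.Entry<String, String> field : record.entrySet()) {
                String value = field.getValue();
                if (value != null && !value.trim().isEmpty()) {
                    merged.putIfAbsent(field.getKey(), value.trim());
                }
            }
        }
        return merged;
    }
}
```

With a PubMed record listed first and a Scopus record second, the title comes from PubMed, while a volume number present only in Scopus is still carried over.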
Manual submission
  - When lookup fails, the user can always proceed manually
Batch import from external source
  - Import data (identifiers or structured text) can be entered manually or uploaded as a file
  - The format/provider must be specified by the user
Batch import from external source
  Requests are processed:
    - Inline, for specific providers and/or within configured data limits: the submitter can immediately complete the pre-filled submissions
    - In a background process: the submitter receives a summary email with the import result, and the pre-filled submissions are available as in-progress submissions in MyDSpace
  - The legacy batch import feature for JSPUI has already been shared as a pull request on GitHub, see DS-1252
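The choice between inline and background processing could look like the sketch below. It assumes the stricter reading of "and/or" (a request runs inline only when the provider allows it and the request is within the configured limit); `BatchImportPolicy`, its fields and the limit are hypothetical.

```java
import java.util.Set;

// Hypothetical policy object: decides how a batch import request is processed.
final class BatchImportPolicy {
    enum Mode { INLINE, BACKGROUND }

    private final Set<String> inlineProviders; // providers allowed to run inline (assumed config)
    private final int maxInlineRecords;        // configured data limit (assumed config)

    BatchImportPolicy(Set<String> inlineProviders, int maxInlineRecords) {
        this.inlineProviders = inlineProviders;
        this.maxInlineRecords = maxInlineRecords;
    }

    // Inline only when the provider is enabled for it AND the request is small
    // enough; everything else goes to the background process, which ends with
    // a summary email to the submitter.
    Mode choose(String provider, int recordCount) {
        boolean inlineAllowed = inlineProviders.contains(provider);
        boolean withinLimit = recordCount <= maxInlineRecords;
        return (inlineAllowed && withinLimit) ? Mode.INLINE : Mode.BACKGROUND;
    }
}
```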
Enhanced Describe step: showing metadata source
Technical details
  - PubMed Lookup Provider: WGET http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23297105&retmode=xml&rettype=full
  - Each lookup provider (PubMed, arXiv, Scopus) loads its original record (PubMed record, arXiv record, Scopus record) into a Java bean
  - Translation logic (mapping file): original record -> normalized record
  - Enhancer plugins: split and aggregate fields, derive data (ISSN, journal title, …)
  - Translation logic (mapping file): normalized record -> DSpace item (repository)
  Java bean for the PubMed record:
    public class PubmedItem {
        private String pubmedID;
        private String doi;
        private String issn;
        private String eissn;
        private String journalTitle;
        private String title;
        private String pubblicationModel;
        private String year;
        private String volume;
        private String issue;
        private String language;
        private List<String> type;
        private List<String> primaryKeywords;
        private List<String> secondaryKeywords;
        …
  Spring configuration:
    <bean name="pubmedLookupProvider" class="...lookup.PubmedLookupProvider">
        <property name="pubmedService" ref="pubmedService"/>
    </bean>
    <bean name="pubmedService" class="...service.PubmedService"/>
  Provider classes (the provider implements SubmissionLookupProvider):
    public class PubmedLookupProvider extends ConfigurableLookupProvider
    public abstract class ConfigurableLookupProvider …
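Two steps of this pipeline lend themselves to a small sketch: building the efetch request shown on the slide, and applying a mapping file to a normalized record. The class names, the properties-style mapping format and the mapping keys are assumptions made for illustration; the slides do not show the real mapping file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: builds the efetch URL used on the slide for a given PubMed ID.
final class PubmedUrls {
    static String efetchUrl(String pubmedId) {
        if (pubmedId == null || !pubmedId.matches("\\d+")) {
            throw new IllegalArgumentException("PubMed IDs are numeric: " + pubmedId);
        }
        return "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
                + "?db=pubmed&id=" + pubmedId + "&retmode=xml&rettype=full";
    }
}

// Hypothetical translator: maps normalized field names to repository metadata
// fields using a properties-style mapping file (assumed format: "source = target").
final class MappingTranslator {
    private final Properties mapping = new Properties();

    MappingTranslator(String mappingFileContent) {
        try {
            mapping.load(new StringReader(mappingFileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fields without a mapping entry are dropped; mapped fields are renamed.
    Map<String, String> toRepositoryMetadata(Map<String, String> normalized) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : normalized.entrySet()) {
            String target = mapping.getProperty(field.getKey());
            if (target != null && field.getValue() != null) {
                out.put(target, field.getValue());
            }
        }
        return out;
    }
}
```

A mapping such as `title = dc.title` and `issn = dc.identifier.issn` would turn a normalized record into values keyed by DSpace metadata fields, ready to pre-fill the submission.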
Enhanced upload step
  - Using the ISSN or EISSN provided in the describe step, the upload form shows the publisher policy from the Sherpa/Romeo database on the right-hand side
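Since the policy lookup is keyed on the ISSN or EISSN typed in the describe step, it can help to check the value before querying the external service. The sketch below uses the standard ISSN check-digit rule (weights 8 down to 2, modulo 11, with X standing for 10); whether the submission form actually performs such a check is an assumption, and the `Issn` class is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: validates an ISSN/EISSN before it is used for a policy lookup.
final class Issn {
    static boolean isValid(String issn) {
        if (issn == null) {
            return false;
        }
        // Accept "1234-567X" and "1234567x" alike.
        String s = issn.trim().toUpperCase(Locale.ROOT).replace("-", "");
        if (!s.matches("\\d{7}[\\dX]")) {
            return false;
        }
        // Weighted sum of the first seven digits, weights 8 down to 2.
        int sum = 0;
        for (int i = 0; i < 7; i++) {
            sum += (s.charAt(i) - '0') * (8 - i);
        }
        int check = (11 - sum % 11) % 11;
        char expected = (check == 10) ? 'X' : (char) ('0' + check);
        return s.charAt(7) == expected;
    }
}
```

An invalid value can then be reported in the form straight away instead of producing an empty policy box in the upload step.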
Enhanced upload step
  - Access policy for the bitstream: open access, embargo, intranet, etc.
  - Deposit of the full text to the national database for individual CVs
What is the problem?
  (Very) late submissions cause issues for the repository at both the technical and the organizational level:
    - The system is subjected to periods of intense input activity. DSpace, like IR software in general, scales well for read operations but less well for write operations
    - IR staff involved in the workflow get a lot of tasks to perform in a short period
  Make researchers aware:
    - Remind researchers about the IR's presence
    - Intercept new content early
How do we plan to mitigate the problem?
  Citation databases provide APIs to perform searches (we already use them for the lookup), and in some cases they provide additional APIs or search filters/indexes that allow more refined searches and scanning of the database. The interesting filters/indexes are:
    - Time-based (much better if related to insertion in the citation database)
    - Author ID (better if related to a «standard/common» identifier such as ORCID)
    - Affiliation
    - Subject category
Implementation idea
  Allow the researcher to store personal preferences about scanning:
    - Enabled providers (e.g. disable arXiv if you are not a physicist)
    - Frequencies
    - Subject category filters
  Author IDs will be stored in and retrieved from the researcher profile. Subject categories could be proposed from previous items or from the researcher profile.
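The per-researcher scanning preferences could be modelled roughly as below. This is a sketch of the idea only: `ScanPreferences`, the single frequency expressed in days and the rule that an empty category set means "no filter" are all assumptions.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Set;

// Hypothetical model of one researcher's scanning preferences.
final class ScanPreferences {
    private final Set<String> enabledProviders;   // e.g. "scopus", "pubmed"
    private final int frequencyDays;              // assumed: one frequency for all providers
    private final Set<String> subjectCategories;  // empty set = no subject filter (assumption)

    ScanPreferences(Set<String> enabledProviders, int frequencyDays, Set<String> subjectCategories) {
        this.enabledProviders = enabledProviders;
        this.frequencyDays = frequencyDays;
        this.subjectCategories = subjectCategories;
    }

    // A provider is scanned only if the researcher enabled it and enough days
    // have passed since the last scan (a provider never scanned is always due).
    boolean isScanDue(String provider, LocalDate lastScan, LocalDate today) {
        if (!enabledProviders.contains(provider)) {
            return false;
        }
        if (lastScan == null) {
            return true;
        }
        return ChronoUnit.DAYS.between(lastScan, today) >= frequencyDays;
    }

    // Keeps a found record only if it falls in one of the chosen subject categories.
    boolean matchesSubject(String category) {
        return subjectCategories.isEmpty() || subjectCategories.contains(category);
    }
}
```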
DSpace-CRIS: Researcher profile
Who are the potential targets?
  - ORCID
  - Scopus
  - Web of Science
  - arXiv
  - PubMed Central
  - DBLP
  - RePEc
  - The repository itself!
The repository as source of missing content?
  The submitter has to match the authors of a publication with the University staff to highlight internal authors:
    - Sometimes matches are missing
    - Other times matches are wrong (homonyms)
    - External authors could become «internal» at some point in the future
The repository as source of missing content?
  - Send an email to internal «co-authors» when a submission is made: prevents wrong attribution (and reduces duplication)
  - Allow researchers to unclaim publications from their profile: last chance to fix a wrong attribution
  - Allow researchers to claim publications: fixes missing attribution and/or engages new researchers
  The last two features are included in the DSpace-CRIS add-on
Current implementation: claim/unclaim publications in the repository
  - The current status of the publication is shown (U: Unlinked)
  - You can claim it:
    - A: Active, simple claim
    - S: Make it a selected publication
    - H: Claim it but hide it from your public profile
Current implementation: claim/unclaim publications in the repository
  - You can unclaim a publication (U: Unlink)
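The four status letters map naturally onto an enumeration. The sketch below is illustrative and is not the DSpace-CRIS data model: the `ClaimStatus` name and the two helper methods are assumptions derived from the descriptions on the slides.

```java
// Hypothetical enumeration of the link between a researcher and a publication.
enum ClaimStatus {
    UNLINKED('U'),  // not claimed; the researcher can claim it
    ACTIVE('A'),    // simple claim
    SELECTED('S'),  // claimed and marked as a selected publication
    HIDDEN('H');    // claimed but hidden from the public profile

    private final char code;

    ClaimStatus(char code) {
        this.code = code;
    }

    char code() {
        return code;
    }

    boolean isClaimed() {
        return this != UNLINKED;
    }

    // Assumption: only active and selected claims appear on the public profile.
    boolean isShownOnPublicProfile() {
        return this == ACTIVE || this == SELECTED;
    }

    static ClaimStatus fromCode(char code) {
        for (ClaimStatus status : values()) {
            if (status.code == code) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown claim status code: " + code);
    }
}
```

Claiming a publication is then a transition from `UNLINKED` to one of the other three values, and unclaiming is the transition back to `UNLINKED`.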
Current implementation: claim/unclaim publications in the repository
Improve fulltext presence
  - Use the Sherpa/Romeo policy database to analyze repository content
  - Use external database APIs to find an actual full text (arXiv, PubMed, ... why not the publisher version via library subscription?)
  - Send an email to researchers to validate the PDFs found or to ask for an «author» version
  - Use statistics to encourage upload
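Analyzing repository content against the policy database boils down to selecting items whose journal allows self-archiving but which still lack a file. The sketch below assumes that each item already carries its Sherpa/Romeo colour and a full-text flag; the `FulltextCandidates` and `Item` types are hypothetical, and restricting the selection to "green" journals is an illustrative choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical analysis step: find items worth asking the researcher a full text for.
final class FulltextCandidates {
    // Hypothetical view of a repository item for this analysis.
    static final class Item {
        final String id;
        final String romeoColour;  // e.g. "green"; null when the journal is not in Sherpa/Romeo
        final boolean hasFulltext;

        Item(String id, String romeoColour, boolean hasFulltext) {
            this.id = id;
            this.romeoColour = romeoColour;
            this.hasFulltext = hasFulltext;
        }
    }

    // Items in "green" journals without a full text: the policy allows deposit,
    // so these are the first candidates for an email to the researcher.
    static List<String> idsToAskFor(List<Item> items) {
        List<String> ids = new ArrayList<>();
        for (Item item : items) {
            if ("green".equalsIgnoreCase(item.romeoColour) && !item.hasFulltext) {
                ids.add(item.id);
            }
        }
        return ids;
    }
}
```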
Sherpa/Romeo Statistics (Example)
  - 51% ISSN
  - 36% not in Sherpa
  - 24,000 items
  - 32% green
  - 21,000 items
  - 7.3% have a full text…
  - 5.3% open access
SURplus: forecast for 2014
  - 50+ institutional repositories (DSpace)
  - 10 research portals (DSpace-CRIS)
www.cineca.it | Innovative Open Source Technologies for a CRIS: SURplus | euroCRIS | May 2013
Thank you!
Andrea Bollini, a.bollini@cineca.it
SURplus - http://www.cineca.it/en/content/surplus
DSpace-CRIS - http://cilea.github.com/dspace-cris