Towards a Dataset for the Development of Alternative Impact Metrics: Notes from the DiggiCORE project Petr Knoth CORE (Connecting REpositories) Knowledge.

Presentation transcript:

Towards a Dataset for the Development of Alternative Impact Metrics: Notes from the DiggiCORE project Petr Knoth CORE (Connecting REpositories) Knowledge Media Institute The Open University @petrknoth, #diggicore In the previous keynote speeches, we have heard a lot about why reuse of open access content is important and what benefits it can provide. Heather Joseph said clearly in her keynote that open access = access + reuse. I believe that the Open Access community has been very successful over the last years in the former, but much less successful in the latter (reuse). I have a background in natural language processing and started developing the CORE open access aggregation system out of frustration, as I couldn't get hold of content that was open. In this presentation, I will talk about a dataset we have created in the DiggiCORE project and want to share with the community. I hope this dataset will be used for text mining and for research on new impact metrics.

What is Open Access exactly? By “open access” to [peer-reviewed research literature], we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. [BOAI, 2002] Let me now revisit some of the goals of OA; I am sure this is familiar. I would like to use this definition to show the importance of building an infrastructure that supports the reuse of OA content.

COAR: About harvesting and aggregations … “Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, Open Access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.” [COAR manifesto] COAR and CORE are two different things.

SPARC’s position paper on IRs “For the repository to provide access to the broader research community, users outside the university must be able to find and retrieve information from the repository. Therefore, institutional repository systems must be able to support interoperability in order to provide access via multiple search engines and other discovery tools. An institution does not necessarily need to implement searching and indexing functionality to satisfy this demand: it could simply maintain and expose metadata, allowing other services to harvest and search the content. This simplicity lowers the barrier to repository operation for many institutions, as it only requires a file system to hold the content and the ability to create and share metadata with external systems.”

Outline The CORE aggregation system and the DiggiCORE project How to data mine open access data from CORE and why In this paper, we use the term institutional repositories (which are the main interest of the Open Repositories conference), to refer also to subject-based repositories or archives or systems for depositing research outputs used by open access publishers. As a result, the conclusions of this paper and recommendations are equally valid for both the green (self-archiving) and gold (OA publishing) routes to OA.

The objective of DiggiCORE Software for exploration and analysis of very large and fast-growing amounts of research publications stored across Open Access Repositories (OAR). DiggiCORE is part of the CORE family of projects. The project will finish in April next year, but the services will continue to be provided. This also applies to Open Access journals.

The DiggiCORE project Partners Advisory Board

The CORE aggregator

CORE statistics Content: 13M+ records, 500+ repositories, 1M+ full-texts. UK national aggregator, project started 2011. Full-text aggregator (not just metadata). 0.5 million monthly visits, but only 150k six months ago. Placed among Top 10 search engines for research that go beyond Google [JISC, 2013]. Listed among Top 100 Thesis and Dissertation Resources. Used by many researchers and organisations, including the European Library and UNESCO. If you think this number is small, you are right, but this is not necessarily an issue with CORE; it reflects the lack of interoperability and standardisation in this area. We are working with repositories to improve the situation.

CORE supports a three-level access architecture
Raw data access. Users: developers, DLs, DL researchers, companies … Apps: CORE API, CORE Data Dumps.
Transaction information access. Users: researchers, students, life-long learners … Apps: CORE Portal, CORE Mobile, CORE (recommendation) Plugin.
Analytical information access. Users: funders, government, business intelligence … Apps: Repository Analytics, CORE Policy Compliance Analytics.

CORE Applications CORE Portal – Allows searching and navigating scientific publications aggregated from Open Access repositories. Offers faceted search.

CORE Applications CORE Mobile – Allows searching and navigating scientific publications aggregated from Open Access repositories

CORE Applications CORE Plugin – A plugin for a system that provides recommendations for related items.

CORE Applications Repository Analytics – An analytical tool supporting providers of open access content (in particular repository managers).

CORE Applications Policy Compliance Analytics (under development) – Tool to support the implementation and monitoring of the UK HEFCE OA policy.

Two ways to access the CORE dataset CORE API – to develop software on top of CORE. Data dumps – to experiment with the aggregated data.

CORE API Enables external systems to interact with OA data (JSON or XML): search, download of metadata and content, content recommendation, citation references, statistics … Used by: libraries, institutional repositories, developers.

Data dumps Cleaned and enriched with additional information. Distributed as two large zip files: metadata + full-texts. Created as part of the Digging into Connected Repositories (DiggiCORE) project.

DiggiCORE networks Three networks: (a) semantically related papers, (b) citation network, (c) author citation network
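One way to picture how network (c) relates to network (b) is to project paper-level citations onto authors. The following is only a minimal sketch under stated assumptions: the paper identifiers, author names, and the function `author_citation_network` are hypothetical toy examples, not the format or method used in the DiggiCORE dataset.

```python
from collections import Counter

# Hypothetical toy data (not taken from the CORE dataset):
# paper id -> list of authors, and (citing paper, cited paper) pairs.
paper_authors = {
    "p1": ["Ada", "Ben"],
    "p2": ["Cy"],
    "p3": ["Ada"],
}
paper_citations = [("p1", "p2"), ("p3", "p2"), ("p3", "p1")]


def author_citation_network(paper_authors, paper_citations):
    """Project a paper citation network onto authors.

    Each (citing author, cited author) edge is weighted by the number of
    paper-level citations that connect the two authors. Self-citations
    are kept; a real analysis might choose to filter them out.
    """
    edges = Counter()
    for citing, cited in paper_citations:
        for citing_author in paper_authors.get(citing, []):
            for cited_author in paper_authors.get(cited, []):
                edges[(citing_author, cited_author)] += 1
    return edges


author_edges = author_citation_network(paper_authors, paper_citations)
for (source, target), weight in sorted(author_edges.items()):
    print(f"{source} -> {target}: {weight}")
```

Real data would first need author disambiguation, since the same name string can refer to several people.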

Data model

Data model – expressiveness Basic article metadata (Dublin Core, BIBO); related articles information (MuSim); information about authors (BIBO, FOAF); identification of the full-text content (MPEG21-DIDL); citation information (BIBO).

Examples of usage Author disambiguation; mining URLs from papers to detect trends; tagging of chemical compounds for image retrieval; citation analysis; content recommendation; detecting collaboration patterns of scientific communities; monitoring of OA growth; any form of text or data mining … The API is useful for services and the data dumps for offline experiments.
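As an illustration of the "mining URLs from papers" use case, the sketch below counts which hosts are linked from a set of full texts. It assumes the full texts are already available as plain strings; the regular expression, the function name `count_url_hosts`, and the sample texts are illustrative assumptions rather than part of CORE.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple, deliberately conservative URL pattern (an assumption of this
# sketch): stop at whitespace, angle brackets, parentheses and brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def count_url_hosts(texts):
    """Count how often each host name is linked across the given texts."""
    hosts = Counter()
    for text in texts:
        for url in URL_PATTERN.findall(text):
            # Drop sentence punctuation that got attached to the URL.
            host = urlparse(url.rstrip(".,;")).netloc.lower()
            if host:
                hosts[host] += 1
    return hosts


# Hypothetical stand-ins for full texts taken from a data dump.
sample_texts = [
    "Code is at https://github.com/example/tool and data at http://example.org/data.",
    "See https://github.com/example/other (accessed 2013).",
]
host_counts = count_url_hosts(sample_texts)
print(host_counts.most_common())
```

Grouping such counts by publication year would give a rough view of how linking habits change over time.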

Why use it? It contains only OA content, so you can legally mine it. You can redistribute it, which is essential for reproducible research. Very large and growing. Kept up to date. Ability to rerun experiments with new data.

Why use it? Open infrastructure for open science. Not owned or managed by a for-profit company, so you can run your own services: new opportunities, and no giving away of your research to commercial companies.

How to be harvested by CORE? An OAI-PMH compliant repository or journal. Repository content should be OA (BOAI definition of OA). The repository is ideally registered in at least one of the OA registries (OpenDOAR, ROAR); a journal should be registered in DOAJ. The repository complies with good practices of content referencing.
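To make the OAI-PMH requirement concrete, here is a rough sketch of what a harvester asks a compliant repository for. It builds a `ListRecords` request for the `oai_dc` metadata format and parses a response, including the `resumptionToken` used for paging. The base URL and the sample response are made-up examples, and this is not CORE's actual harvesting code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    if resumption_token:
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], next resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(
        ".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, token or None


# Hypothetical response from an imaginary repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://repo.example.org/oai")
records, next_token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://repo.example.org/oai", next_token)
print(first_url)
print(records, next_token)
```

A full-text aggregator additionally needs the metadata to point at the actual document, which is where good practices of content referencing matter.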

Conclusions Researchers who want to mine content or build services that rely on aggregators can get access to the CORE dataset. Researchers can deploy their own solutions, not just rely on commercial providers.

Thank you! Open access needs freely exploitable datasets

References 1/2
[BOAI, 2002] Budapest Open Access Initiative. (2002) http://www.opensocietyfoundations.org/openaccess/boai-10-recommendations
[Crow, 2002] Crow, R. (2002). The case for institutional repositories: a SPARC position paper. ARL Bimonthly Report 223.
[Knoth & Zdrahal, 2012] Knoth, P. and Zdrahal, Z. (2012). CORE: Three Access Levels to Underpin Open Access. D-Lib Magazine, 18, 11/12, Corporation for National Research Initiatives. http://dx.doi.org/10.1045/november2012-knoth
[Konkiel, 2012] Konkiel, S. (2012). Are Institutional Repositories Doing Their Job? https://blogs.libraries.iub.edu/scholcomm/2012/09/11/are-institutional-repositories-doing-their-job/
[Laakso & Bjork, 2012] Laakso, M., & Björk, B. C. (2012). Anatomy of open access publishing: a study of longitudinal development and internal structure. BMC Medicine, 10(1), 124.
I can specify the differences between this and RIOXX.

References 2/2
[Morrison, 2012] Morrison, L. (2012). 5 reasons why I can’t find Open Access publications. http://mmitscotland.wordpress.com/2012/08/06/5-reasons-why-i-cant-find-open-access-publications-2/
[OAI-PMH v2.0, 2008] The Open Archives Initiative Protocol for Metadata Harvesting Version 2.0 (OAI-PMH), Implementation Guidelines (2008). http://www.openarchives.org/OAI/openarchivesprotocol.html
[ResourceSync draft, 2013] ResourceSync protocol draft. (2013) http://www.niso.org/workrooms/resourcesync/
[Salo, 2008] Salo, D. (2008). Innkeeper at the roach motel. Library Trends, 57(2), 98-123.
[Van de Sompel et al, 2004] Van de Sompel, H., Nelson, M. L., Lagoze, C., & Warner, S. (2004). Resource harvesting within the OAI-PMH framework. D-Lib Magazine, 10(12), 1082-9873.