Building on the shoulders of Giants: the Scholarly Web

Building on the shoulders of Giants: the Scholarly Web Tim Brody Intelligence, Agents, Multimedia Group University of Southampton 28/11/2018

This Talk

The “Research Literature”
The Open Access Literature
Why Open Access?
Open Archives Initiative
Citebase Search
Open Access effect on research
Summary & URLs

This talk is my impression of how the research world could be improved, and of the tools that I have produced to help do that. What I am concerned with is the research literature: reports written by researchers that are published as part of the public record. While the majority of the research literature is accessible only by paying access charges (to cover the cost of printing, distribution, etc.), the Internet gives us a new possibility: Open Access literature. Very little of the current research literature is Open Access. By Open Access I mean that it is free at the point at which the user views the full-text. But it is the authors who need to be convinced of the benefits if the research literature is to become Open Access.

The Open Archives Initiative provides a technical framework for the way we would like to see Open Access literature implemented: the role of the IPR holders is reduced to simply providing the information, while others develop the services and add-ons. (Of course the IPR holder may also provide services, but they shouldn’t use the IPR monopoly to gain a service monopoly.) Citebase Search is my own OAI service provider, which provides a citation database add-on to e-print archives. Using the citation database in Citebase we can look at the effect of Open Access e-print archives on how research is reported (and perhaps how research is done). Lastly, I’ll give some URLs for more information on what we do.

The Research Literature

The grey literature: technical reports, monographs, presentations
Royalty literature: books
Refereed journal corpus

What is the Research Literature? The research literature is the way that researchers communicate ideas and results with one another. How this communication is done depends a lot on the subject. For example, physicists make use of a lot of technical reports (hence the success of the LANL tech report server). This informal communication, or ‘grey’ literature, also includes monographs and presentations, all of which are widely used to communicate ideas and results. Much of the grey literature is already Open Access.

While the Sciences tend to rely on reports, the Humanities produce significant numbers of books. It is unlikely that books, where the author derives royalty payments, will ever be Open Access, as it isn’t in the author’s interest. However, where an author’s academic assessment may rely on their research output, we still need the ability to assess the impact of royalty literature (perhaps by providing Open Access to the book’s references).

Of primary interest to researchers, certainly within the Sciences, is the peer-reviewed literature. This is mainly the journal literature but also, particularly in my own field of Computer Science, conference proceedings. It is this literature that we are most interested in converting to an Open Access model.

The Refereed Journal Literature

Written without the expectation of royalties
Akin to “Advertising” for the authors and their work
Reviewed for free by peers
Est. 20,000 Peer-reviewed Journals
B.L. archives 60,000 serials
Est. 2,000,000 Articles Annually

What makes the refereed literature different to other published material is that the authors do not expect to be paid for the reports that they produce. In fact, it is quite the opposite. While authors of novels and artists are concerned about theft, it is in the interest of research authors to have their work as widely read and copied as possible (and hence to be more widely recognised, have greater impact and so on). In this way the refereed journal literature is more akin to advertising for the author than to other forms of publication. What distinguishes the refereed literature from advertisements is that it is reviewed and certified by peer experts; without that certification (and the reputation of the journal it is published in), a report is unlikely to be read and built on by other researchers.

The journal literature is a huge volume of work, and one that researchers have only partial access to: of an estimated 20,000 journals, most universities have access to only 4 or 5 thousand.

Impact cycle begins: Research is done
Researchers write pre-refereeing “Pre-Print”
Submitted to Journal
Pre-Print reviewed by Peer Experts – “Peer-Review”
12-18 Months
Pre-Print revised by article’s Authors
Refereed “Post-Print” Accepted, Certified, Published by Journal
Researchers can access the Post-Print if their university has a subscription to the Journal
New impact cycles: New research builds on existing research

Although I’m sure everyone is familiar with the ping-pong of journal publication, it probably helps to define the system that I’m talking about, and how Open Access may help to improve it.

Open Access Literature

Research Archives (“self-archiving”)
250,000 arXiv.org
500,000 citeseer
1,000s in institutional & other repositories
Open Access Journals
BioMed Central
Time-delayed access
PubMed Central
500,000 HighWire Press
Personal Web pages

So how much of the journal literature is Open Access? The highest-profile Open Access repositories cited by self-archiving and open access proponents are arXiv.org and Citeseer. As these are collections built from “self-archived” (i.e. author-contributed) reports, it is difficult to know whether an author-contributed report has been published in a peer-reviewed journal; certainly my experience of Citeseer is that it contains a lot of informal technical reports. Even so, these archives hint at how Open Access literature could work.

Open Access publication falls into two branches: either upfront charges, of which BioMed Central’s e-journals are a good example, or time-delayed access. (It’s worth noting that virtually no money is made charging for access to literature 6 months after it is released, hence many publishers are happy to release their articles after a time delay as a way to reduce the pressure to provide Open Access.)

A large and unknown amount of literature exists in informal, personal archives. But without some structure behind personal archives it is difficult for users and services alike to tell the difference between technical reports and peer-reviewed literature. Regardless of how access may be provided, the literature currently available free at the point of access is a fraction of the literature produced annually.

“Skywriting”: All research, accessible to all potential users, anywhere, anytime

Impact cycle begins: Research is done
Researchers write pre-refereeing “Pre-Print”
Pre-Print self-archived to University’s Eprint Website
Submitted to Journal
Pre-Print reviewed by Peer Experts – “Peer-Review”
12-18 Months
Pre-Print revised by article’s Authors
Refereed “Post-Print” Accepted, Certified, Published by Journal
Post-Print self-archived to University’s Eprint Website
Researchers can access the Post-Print if their university has a subscription to the Journal
New impact cycles: New research builds on existing research
New impact cycles: Self-archived research impact is greater (and faster) because access is maximized (and accelerated)

Now adding on one method of Open Access, author self-archiving, we can see how it fits into the existing publication cycle.

Why Open Access?

Maximise research impact through maximised access
Efficiency
ADS Est. to provide $250 million benefit to astronomy
Continuous and comprehensive assessment
Periphery benefits
Publicly funded research publicly accessible
3rd World Access (and even some 1st World!)
Easier to identify plagiarism (do a Google search!)

So how do we convince the stakeholders that Open Access is a good idea? If authors publish papers for impact (and hence to improve their reputation, get grants, tenure and so on), then that impact must be limited by the amount of access there is to their work. So to maximise impact authors need to maximise access: both publishing in the high-impact journals (which certifies the work as being high quality) and allowing as many eyeballs as possible to see their work.

Speed and access are important components of communication. The slower the communication and the less access there is, the more difficult it is for ideas to spread and develop. A recent paper estimated that the Astrophysics Data System, a system that joins up all astronomy publications and data sources, provided a 250 million dollar benefit to worldwide astronomy.

There are also strong arguments for Open Access for other stakeholders. Currently there is the anomaly that taxpayers fund public research and yet cannot access the results of that research (the peer-reviewed literature). Open Access would allow researchers in the 3rd World to have equal access to the research literature.

Separate Content from Services

On the web: use a full-text search engine
Research literature: A&I, publisher, library, aggregator, journal contents, society …
Create the Scholarly Web:

Many of the new e-journals and corporate publishers only provide access to their literature through their own services. That means that if you want access to the literature, you must also pay for search engines, Web editors, and so on. I think it would be a much more honest, and competitive, system if the content was separated from the services. While the bodies that serve research papers would charge for the peer-review service and for maintaining the quality of writing and formatting, the add-on services (search engines, navigation etc.) could be created by anyone and would compete on the basis of quality rather than on which content they hold.

Given this “Scholarly Web”, all services would have access to all the literature. Doing a comprehensive literature search would be as simple as searching the Web with Google, rather than having to search many sources because every service has access to a different subset of the research literature. Couple this with citation links and the world could have a comprehensive “Google” for the research literature.

OAI Protocol for Metadata Harvesting

Service requests metadata records by
All records
Created/updated between given dates
Subset
Repository returns metadata records
Metadata record is XML
To be OAI compliant must support at least Dublin Core

The OAI-PMH is a Web protocol for transferring metadata between repositories (for example e-print archives) and services. A service harvests records by requesting records, with the option of only getting new or updated records since a certain date, or by a subset (if the repository exposes sets). The records returned are a combination of a header (with repository-unique identifier and datestamp) and a metadata record in XML. Any metadata that can be XML encoded can be exported, although to be OAI compliant the repository must support at least Dublin Core (as well as any other formats).
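
To make the request and response shape concrete, here is a minimal Python sketch of a harvester's two halves: building a ListRecords request and reading the header and Dublin Core title out of a response. The verb, argument names and XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications; the base URL, the set name, the sample record and the function names are made up for illustration, and a real harvester would also have to fetch the URL and follow resumption tokens, which this sketch leaves out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords request for Dublin Core metadata."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        params["from"] = from_date      # only records created/updated since
    if until_date:
        params["until"] = until_date    # ... and up to this date
    if set_spec:
        params["set"] = set_spec        # only a subset, if sets are exposed
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, datestamp, title) for each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        datestamp = rec.findtext("oai:header/oai:datestamp", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, datestamp, title))
    return records


# A hand-written, heavily trimmed response; the record is invented.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2003-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example e-print</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url("http://example.org/oai", from_date="2003-01-01"))
print(parse_records(SAMPLE_RESPONSE))
```

The header carries what an incremental harvest needs (identifier and datestamp), which is why the sketch reads it separately from the metadata payload.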

OAI-PMH

Separates content (repositories) from services
Open protocol
Cheap to implement, flexible in use
Scalable?
4 million records from OCLC
HTTP-like caching techniques
(OAI-PMH can be used in closed systems)

Citebase Search: OAI Service

1000 users per-day (“visits”)
7000 hits
260,000 full-text records
6 million references
Of which 1.3 million linked to full-text
3 million Web download hits (uk.arXiv.org)

Some general information on Citebase. Users are predominantly following links from arXiv.org abstract pages to get the references and citations-to for the article they are looking at. The Web download logs are used to provide a “reading” measure for arXiv articles, and to compare that read impact with citation impact scores.

Citebase Search

This is an overview of what Citebase does. Metadata is harvested using the OAI-PMH (the “discovery” phase). The metadata harvested via OAI is the title, abstract, authors, and citation. For each article found, the full-text is retrieved and its reference list is extracted and stored. The references are then linked by looking up the equivalent citation in the metadata store. Users can then search and navigate the citation database through a Web interface. Citebase also provides an OAI interface that allows the metadata with linked references to be harvested.
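
The linking step can be pictured with a small Python sketch. This is not Citebase's code: the key scheme (journal name with punctuation stripped, plus volume and first page), the field names and the sample records are all assumptions made for illustration. The idea is only that harvested citations and parsed references are reduced to the same normalised key, so that a reference can be looked up in the metadata store.

```python
import re


def citation_key(journal, volume, page):
    """Normalise a citation to a lookup key (hypothetical scheme)."""
    journal_norm = re.sub(r"[^a-z]", "", journal.lower())
    return (journal_norm, str(volume).strip(), str(page).strip())


def build_index(records):
    """Index harvested metadata records by their citation key."""
    index = {}
    for rec in records:
        key = citation_key(rec["journal"], rec["volume"], rec["page"])
        index[key] = rec["id"]
    return index


def link_references(references, index):
    """Pair each parsed reference with the matching record id, or None."""
    return [
        (ref, index.get(citation_key(ref["journal"], ref["volume"], ref["page"])))
        for ref in references
    ]


# Invented data: one harvested record and two references parsed from a full-text.
records = [
    {"id": "oai:example.org:1", "journal": "Phys. Rev. D", "volume": 50, "page": 1234},
]
references = [
    {"journal": "Phys Rev D", "volume": "50", "page": "1234"},   # in the store
    {"journal": "Nucl. Phys. B", "volume": "400", "page": "1"},  # not in the store
]

links = link_references(references, build_index(records))
for ref, target in links:
    print(ref["journal"], "->", target)
```

References that find no match simply stay unlinked, which is consistent with only part of the reference total being linked to full-text.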

Citebase Search

This is the user interface for Citebase, which has the familiar meta-field search but also the ability to rank results by a number of criteria including citation impact.

Citebase Search

The abstract page shows the usual title/authors/abstract and some analysis of the current article. The graph shows over time when the paper has been cited and when it has been downloaded.

Citebase Search: Navigation by Citation Links

Diagram labels: Past, Future, Current Article, Article with reference list, Reference link, Related, Co-cited

Following the abstract are links to related pages by citations. These links can go backwards in time using the reference list, forwards in time through the articles that have cited the current one, and sideways by either related or co-cited papers. Related papers are papers that have a similar reference list (often where an author has used the same references more than once!). Co-cited is where two papers have been cited next to each other, the same idea as author co-citation. However, co-cited papers can only be found for articles that have been cited, so this can’t be used for new articles.
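
The four directions of navigation can be sketched over a toy citation graph in Python. This is an illustration under assumptions, not Citebase's implementation: the graph is a plain dictionary from each paper to the set of papers in its reference list, “co-cited” is taken to mean appearing in the same reference list, “related” is taken to mean sharing references, and the paper names are invented.

```python
from collections import Counter


def cited_by(references, paper):
    """Forwards in time: papers whose reference list includes `paper`."""
    return {p for p, refs in references.items() if paper in refs}


def co_cited(references, paper):
    """Sideways: papers listed alongside `paper`, with co-citation counts."""
    counts = Counter()
    for refs in references.values():
        if paper in refs:
            counts.update(r for r in refs if r != paper)
    return counts


def related(references, paper):
    """Sideways: papers sharing references with `paper`, with overlap size."""
    own = references.get(paper, set())
    counts = Counter()
    for other, refs in references.items():
        shared = len(own & refs)
        if other != paper and shared:
            counts[other] = shared
    return counts


# Invented graph: A, B and C are citing papers; X, Y and Z are older papers.
references = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "Z"},
    "C": {"Y"},
}

print(sorted(references["A"]))          # backwards in time: A's reference list
print(sorted(cited_by(references, "Y")))
print(co_cited(references, "X"))
print(related(references, "A"))
print(co_cited(references, "A"))        # empty: A has not been cited yet
```

The last call shows the limitation mentioned above: a paper with no citations has no co-cited papers, whereas related papers are available as soon as its reference list has been parsed.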

Citebase Search

This is the reference list, as parsed from the full-text. “eprint” takes the user to the Citebase abstract page of the cited article; “journal” links are bespoke links for the American Physical Society journals.

Citebase Search

These are the articles that have cited the current article; following these links will take the user towards newer papers.

Citebase Search: “Co-cited”

And co-cited articles. The development version of Citebase also includes Related articles.

The Effect of Open Access

Based on arXiv.org
Oldest and most comprehensive online archive
Correlation of Citation Impact with Web Impact (downloads)
Effect of Open Access on citation behaviour

As well as being used to navigate the literature, citation links can be used to find patterns in the research literature.

This graph divides papers into 3 sets by citation impact: bottom quartile, middle 50%, and top quartile. This is a fairly arbitrary division, as the distribution of citations is an exponential decay; that is, decreasing numbers of papers get ever increasing numbers of citations. The hits to papers in each subset are then plotted against the time delay between the paper being uploaded and then downloaded. High impact papers receive more downloads, and over a longer period of time. That they receive more hits over time is a reflection of users following citations to articles, whose delay is determined by the research cycle.
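
The grouping behind this graph can be sketched in a few lines of Python. The sketch assumes a rank-based split (lowest quarter of papers by citation count, highest quarter, and everything in between) and measures delay in whole months; the actual thresholds and time units used for the graph are not stated here, and the paper ids and numbers are invented.

```python
from collections import defaultdict


def impact_groups(citations):
    """Split paper ids into bottom quartile, middle 50%, top quartile by rank."""
    ranked = sorted(citations, key=lambda p: (citations[p], p))
    n = len(ranked)
    q = n // 4  # size of each outer quartile
    groups = {}
    for i, paper in enumerate(ranked):
        if i < q:
            groups[paper] = "bottom"
        elif i >= n - q:
            groups[paper] = "top"
        else:
            groups[paper] = "middle"
    return groups


def hits_by_delay(downloads, groups):
    """Count downloads per (group, delay in months since upload)."""
    counts = defaultdict(int)
    for paper, delay_months in downloads:
        counts[(groups[paper], delay_months)] += 1
    return dict(counts)


# Invented data: citation counts per paper, and one (paper, delay) per download.
citations = {"p1": 0, "p2": 1, "p3": 3, "p4": 40}
downloads = [("p4", 0), ("p4", 6), ("p4", 6), ("p1", 0), ("p2", 1)]

groups = impact_groups(citations)
hist = hits_by_delay(downloads, groups)
print(groups)
print(hist)
```

Plotting each group's counts against delay would give one line per impact set, which is the shape of the graph described above.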

Citation Latency

The duration of the research article cycle can be estimated by looking at the delay between articles appearing and then being cited. We refer to this as “Citation Latency”. Within arXiv.org this is measured using the repository-generated datestamps. Of course, authors will read articles outside of arXiv.org, leading to different real-world latencies.
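
As a rough sketch of the measurement, the Python below takes a datestamp per article and a list of (citing, cited) pairs, and counts citations by the number of months between the two datestamps. The month granularity, the decision to skip negative delays, the function names and the sample dates are assumptions made for the example rather than details of the analysis itself.

```python
from collections import Counter
from datetime import date


def latency_months(cited_date, citing_date):
    """Whole calendar months between the cited and the citing datestamps."""
    return (citing_date.year - cited_date.year) * 12 + (
        citing_date.month - cited_date.month
    )


def latency_histogram(datestamps, citations):
    """Count citations by latency; negative delays are skipped (assumption)."""
    hist = Counter()
    for citing, cited in citations:
        months = latency_months(datestamps[cited], datestamps[citing])
        if months >= 0:
            hist[months] += 1
    return hist


def peak_latency(hist):
    """Latency with the most citations; ties go to the shorter latency."""
    return max(hist, key=lambda m: (hist[m], -m))


# Invented datestamps and (citing, cited) pairs.
datestamps = {
    "a": date(2001, 1, 10),
    "b": date(2001, 3, 5),
    "c": date(2001, 3, 20),
    "d": date(2002, 1, 2),
}
citations = [("b", "a"), ("c", "a"), ("d", "a"), ("d", "b")]

hist = latency_histogram(datestamps, citations)
print(sorted(hist.items()))
print(peak_latency(hist))
```

Restricting `citations` to cited articles deposited in a given year and repeating the count gives one such histogram per year, which is how a change in the peak over time could be compared.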

We can look at how the citation latency has changed over the lifetime of arXiv.org by separating the papers into subsets by the year of the cited article (so articles deposited in 2002 will have at most 19 months of citations). If we compare an early year, 1994 (the 4th line up from the bottom), to a recent year, 2001 (4th line from the top), you can see that the ramp-up to the highest rate of citations (the “peak” citation rate) falls from a period of 12 months to only 2-3 months. This suggests that increased use of arXiv.org has reduced the duration of the research cycle between an article being posted and then cited. As this graph is un-normalised, the lines get higher as the number of articles in arXiv increases.

Conclusions

High impact papers are read more (and this can be measured online)
Web downloads may be a pre-indicator of impact
Faster access leads to reduced Citation Latencies
Hence faster research cycles, higher impact, and more productivity

Summary

The Web makes Open Access research literature possible, and hence more effective scholarship
Services compete without holding the literature hostage
OAI allows repositories to concentrate on getting and storing the literature
Citebase Search provides citation navigation for OAI archive(s)
Or anyone else who wants to provide a service

The Last Slide

Tim Brody tdb01r@ecs.soton.ac.uk
Citebase Search http://opcit.eprints.org/ (papers & presentations)
Citebase Search http://citebase.eprints.org/
EPrints.org (advocacy, answers & software) http://www.eprints.org/
Prof. Stevan Harnad <harnad@ecs.soton.ac.uk>

I am a doctoral student in the Intelligence, Agents, Multimedia Group at the University of Southampton working with digital library systems: Citebase Search, E-Prints UK, TARDIS & OAI.