1
Prototypes of pro-active approaches to support the archiving of web references for scholarly communications
Richard Wincewicz (1), Peter Burnhill (1) & Herbert Van de Sompel (2)
(1) EDINA, University of Edinburgh; (2) Los Alamos National Laboratory
2
The Project Team
2013 – 2015, funded by the Andrew W. Mellon Foundation
Los Alamos National Laboratory, Research Library: Herbert Van de Sompel, Harihar Shankar, [Martin Klein, Rob Sanderson]
University of Edinburgh, Language Technology Group: Claire Grover, Beatrice Alex, Colin Matheson, Richard Tobin, [Ke “Adam” Zhou]
University of Edinburgh, EDINA*: Peter Burnhill, Muriel Mewissen (Project Manager), Tim Stickland, Richard Wincewicz, [Neil Mayo]
* Centre for Service Delivery & Digital Expertise
3
Overview 1. Introduction 2. Evidence 3. Remedy
4
1. Introduction
5
Reference Rot
Links to Web at Large resources are subject to Reference Rot, a combination of two factors:
Link Rot: the link stops working, e.g. HTTP 404 “Not Found”.
Content Drift: the linked content changes over time, possibly to the extent that it is no longer representative of the content that was initially referenced.
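The two factors can be told apart mechanically. The sketch below is illustrative only and is not the method used in the study cited on the following slides: it assumes a fingerprint of the page was stored at referencing time, and it treats any byte-level change as drift, which overstates drift for pages with dynamic elements. The function names are hypothetical.

```python
import hashlib
from typing import Optional


def fingerprint(body: bytes) -> str:
    """Hash a response body so two captures can be compared cheaply."""
    return hashlib.sha256(body).hexdigest()


def classify_reference(status: int, body: Optional[bytes],
                       reference_fingerprint: str) -> str:
    """Classify the current state of a referenced URI.

    status: HTTP status code observed when dereferencing the URI today.
    body: response body, or None if nothing was returned.
    reference_fingerprint: fingerprint taken when the URI was referenced
        (an assumption: the slides do not say such a value is stored).
    """
    if status >= 400 or body is None:
        return "link rot"        # the link no longer resolves
    if fingerprint(body) != reference_fingerprint:
        return "content drift"   # resolves, but the content has changed
    return "intact"
```

A production check would replace the exact hash comparison with a text similarity measure, so that a changed advertisement or timestamp is not counted as drift.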
6
2. Evidence
7
Articles that Link to Articles & to Web At Large Resources (PMC) Martin Klein et al. (2014) Scholarly context not found http://dx.doi.org/10.1371/journal.pone.0115253
8
Articles that Link to Articles & to Web At Large Resources (Elsevier) Martin Klein et al. (2014) Scholarly context not found http://dx.doi.org/10.1371/journal.pone.0115253
9
Articles with URI References (PMC)
Articles                                  479,194
with URI references                       399,005
with URI references to articles           240,857
with URI references to Web at Large       156,160
Martin Klein et al. (2014) Scholarly context not found http://dx.doi.org/10.1371/journal.pone.0115253
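As a worked check of the counts above, the shares can be computed directly. This sketch assumes each row counts articles and is a subset of the 479,194 total; the slide does not state that explicitly.

```python
# Counts copied from the slide; the interpretation as article counts
# that are subsets of the total is an assumption.
TOTAL_ARTICLES = 479_194
COUNTS = {
    "with URI references": 399_005,
    "with URI references to articles": 240_857,
    "with URI references to Web at Large": 156_160,
}


def share_of_total(count: int, total: int = TOTAL_ARTICLES) -> float:
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


for label, count in COUNTS.items():
    print(f"{label}: {share_of_total(count)}%")
```

Under that reading, roughly a third of the PMC articles in the corpus reference at least one Web at Large resource.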
10
Link Rot (PMC) Martin Klein et al. (2014) Scholarly context not found http://dx.doi.org/10.1371/journal.pone.0115253
11
Link Rot (Elsevier) Martin Klein et al. (2014) Scholarly context not found http://dx.doi.org/10.1371/journal.pone.0115253
12
Links from arXiv, Elsevier, PMC to TLD Targets Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE http://dx.doi.org/10.1371/journal.pone.0115253
13
Grey is Link Rot – Referenced Content Not Accessible Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE http://dx.doi.org/10.1371/journal.pone.0115253
14
Grey is Not Archived - Referenced Content Lost Martin Klein et al. (2014) Scholarly context not found. In: PLOS ONE http://dx.doi.org/10.1371/journal.pone.0115253
15
Content Drift – http://dl00.org [screenshots from 2000, 2004, 2005, 2008] (a) Dynamic content: values on the webpage change over time (b) Static content, but very different (often unrelated) web pages
16
3. Remedy
17
Create Snapshots of Referenced Resources
Various web archives support on-demand creation of snapshots of URIs (manual, API): archive.today, Internet Archive, perma.cc, webcitation.org
When creating snapshots, maintain: Original URI, Snapshot URI, Date/Time of snapshot
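The three values to maintain fit naturally in a small record. A minimal sketch, assuming a Python reference tool; the class and field names are hypothetical, and the midnight UTC time of day in the usage line is made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SnapshotRecord:
    """What to keep for every snapshot: the three values from the slide."""
    original_uri: str      # the URI as cited
    snapshot_uri: str      # where the archived copy lives
    captured_at: datetime  # when the snapshot was taken

    def to_dict(self) -> dict:
        """Serialise with the timestamp normalised to UTC, ISO 8601."""
        return {
            "original_uri": self.original_uri,
            "snapshot_uri": self.snapshot_uri,
            "captured_at": self.captured_at.astimezone(timezone.utc).isoformat(),
        }


record = SnapshotRecord(
    original_uri="http://www.stanford.edu",
    snapshot_uri="http://archive.is/FAy6o",
    captured_at=datetime(2014, 8, 15, tzinfo=timezone.utc),
)
```

Keeping the original URI alongside the snapshot URI matters because the snapshot alone becomes useless if the archive that holds it disappears.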
18
Create Snapshots of Referenced Resources
Snapshots can be created at various stages. The closer to the moment of referencing, the better the image captured.
Stage                Actor                              Snapshot Quality
Preparation          Author / reference tool            best
Submission / Issue   Editor / manuscript system         good
Publication          Aggregator / publisher platform    ok
Post-publication     Librarian / IR, journal archive    better than nothing
19
Authoring - Zotero Plugin Demonstrator Richard Wincewicz (2014) Prototype Hiberlink plugin for Zotero for pro-active archiving and temporal references https://www.youtube.com/v/ZYmi_Ydr65M%26vq
20
Publication - OJS
24
Publication - HiberActive Service Demonstrator Martin Klein et al. (2014) HiberActive: Pro-Active Archiving of web references from scholarly articles Open Repositories 2014 http://www.slideshare.net/martinklein0815/hiberactive
25
Reference Resources Robustly
When referencing resources, include:
Original URI – allows the user to revisit the URI as it is at the time of reading, if the URI is still operational.
Snapshot URI – allows the user to visit the snapshot, if one was created and if the web archive in which it was created is still operational.
Date/Time – together with the original URI, allows the user to visit any snapshot created around that Date/Time in any web archive around the world (using Memento infrastructure).
(2015) Robust Links - Motivation http://robustlinks.mementoweb.org/about/
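The Date/Time plus original URI combination works through Memento datetime negotiation (RFC 7089), in which a client sends an `Accept-Datetime` header to a TimeGate. The sketch below only builds the request and does not send it. The TimeGate base URL is an assumption about the public Time Travel aggregator, and `memento_request` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Tuple

# Assumed aggregator TimeGate; substitute the TimeGate you actually use.
TIMEGATE_BASE = "http://timetravel.mementoweb.org/timegate/"


def memento_request(original_uri: str, when: datetime) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for a Memento datetime negotiation.

    The Accept-Datetime value uses the HTTP date format in GMT,
    as required by RFC 7089.
    """
    when_utc = when.astimezone(timezone.utc)
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    return TIMEGATE_BASE + original_uri, headers


url, headers = memento_request(
    "http://www.stanford.edu",
    datetime(2014, 8, 15, tzinfo=timezone.utc),
)
```

A TimeGate answers such a request by redirecting to the snapshot closest to the requested datetime, which is why the reference stays usable even when the original snapshot's archive is gone.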
26
Reference Resources Actionably
When referencing resources, use Link Decorations to convey Original URI, Snapshot URI, Date/Time:
<a href="http://www.stanford.edu" data-versionurl="http://archive.is/FAy6o" data-versiondate="2014-08-15">
<a href="http://www.stanford.edu" data-versiondate="2014-08-15">
<a href="http://archive.is/FAy6o" data-originalurl="http://www.stanford.edu" data-versiondate="2014-08-15">
Herbert Van de Sompel et al. (2015) Robust Links - Link Decorations http://robustlinks.mementoweb.org/spec/
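The three decoration patterns on this slide can be generated from the same three values. A minimal sketch: `robust_link` and its parameters are hypothetical names, while the attribute names are the ones shown on the slide.

```python
from html import escape
from typing import Optional


def robust_link(text: str, original_uri: str, version_date: str,
                snapshot_uri: Optional[str] = None,
                href_is_snapshot: bool = False) -> str:
    """Render an anchor carrying Robust Links decorations.

    href_is_snapshot=False: href is the original URI; the snapshot, if
        any, goes in data-versionurl.
    href_is_snapshot=True: href is the snapshot; the original URI goes
        in data-originalurl.
    """
    if href_is_snapshot:
        if snapshot_uri is None:
            raise ValueError("a snapshot URI is required when linking to it")
        attrs = [("href", snapshot_uri), ("data-originalurl", original_uri)]
    else:
        attrs = [("href", original_uri)]
        if snapshot_uri is not None:
            attrs.append(("data-versionurl", snapshot_uri))
    attrs.append(("data-versiondate", version_date))
    rendered = " ".join(f'{name}="{escape(value, quote=True)}"'
                        for name, value in attrs)
    return f"<a {rendered}>{escape(text)}</a>"
```

Calling it with only a date gives the second pattern, which is the one available to a publisher that has no snapshot of its own.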
27
Robust Links Using Link Decorations, JavaScript, Memento API Demo - http://robustlinks.mementoweb.org/demo/uri_references_js.html robustlinks.js - https://github.com/mementoweb/robustlinks
28
Activate Robust Links
Existing articles currently carry no Link Decorations, but they do have an article publication date:
Express the article publication date in an actionable manner (‘datePublished’ or ‘dateModified’ Schema.org properties) in HTML pages that contain URI references.
Tailor robustlinks.js to exclude links to articles.
Inject robustlinks.js in HTML pages that contain URI references.
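The first two steps can be sketched server-side in Python, although robustlinks.js itself works in the browser. The sketch assumes the date is exposed as a `<meta itemprop="datePublished">` element and that "links to articles" can be approximated by DOI resolver hosts. Both are assumptions, as are the class and function names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Assumption: links resolved through these hosts point to articles.
ARTICLE_HOSTS = ("doi.org", "dx.doi.org")


class ReferenceScanner(HTMLParser):
    """Collect the page's datePublished value and its outgoing links."""

    def __init__(self) -> None:
        super().__init__()
        self.date_published: Optional[str] = None
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("itemprop") == "datePublished":
            self.date_published = attr.get("content")
        elif tag == "a":
            href = attr.get("href") or ""
            if href.startswith(("http://", "https://")):
                self.links.append(href)


def links_to_activate(html: str) -> Tuple[Optional[str], List[str]]:
    """Return the publication date and the non-article links to activate."""
    scanner = ReferenceScanner()
    scanner.feed(html)
    targets = [href for href in scanner.links
               if urlparse(href).hostname not in ARTICLE_HOSTS]
    return scanner.date_published, targets
```

Each returned link, paired with the publication date, is enough to ask a Memento TimeGate for the snapshot closest to that date.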
29
Users Follow Robust Links into Web Archives
The combination of the referenced URI and the article publication date:
Leads users to a snapshot in a web archive, created as close as possible to the article publication date.
Addresses link rot.
Addresses content drift.
30
Create Archive Copies
When ingesting new content into the platform:
Parse for URI references.
Create snapshots in web archives of select URIs.
For these URIs, use Link Decorations in HTML to convey: original URI, snapshot URI, snapshot Date/Time.
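The ingest steps above can be sketched as a single pass over the HTML. This is a simplified illustration, not the HiberActive implementation: the regular expression only matches anchors whose sole attribute is `href`, and `should_archive` and `create_snapshot` are hypothetical callables that a platform would supply (the latter wrapping whichever web archive API is used).

```python
import re
from html import escape
from typing import Callable, Tuple

# Simplification: matches only <a href="..."> with no other attributes.
ANCHOR = re.compile(r'<a\s+href="(https?://[^"]+)"\s*>')


def decorate_on_ingest(html: str,
                       should_archive: Callable[[str], bool],
                       create_snapshot: Callable[[str], Tuple[str, str]]) -> str:
    """Snapshot selected URI references and add Link Decorations.

    should_archive(uri) decides which URIs are selected.
    create_snapshot(uri) returns (snapshot_uri, snapshot_date).
    """
    def decorate(match: "re.Match[str]") -> str:
        uri = match.group(1)
        if not should_archive(uri):
            return match.group(0)  # leave unselected links untouched
        snapshot_uri, snapshot_date = create_snapshot(uri)
        return (f'<a href="{uri}" '
                f'data-versionurl="{escape(snapshot_uri, quote=True)}" '
                f'data-versiondate="{escape(snapshot_date, quote=True)}">')

    return ANCHOR.sub(decorate, html)
```

Passing the archiver in as a callable keeps the parsing and decoration logic testable without network access, and lets the platform switch archives without touching it.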
31
Users Follow Robust Links into Web Archives
The Link Decorations:
Lead users to the created snapshot, if the web archive is operational.
Lead users to a snapshot in any web archive, created as close as possible to the snapshot Date/Time.
Address link rot.
Address content drift.
32
Prototypes of pro-active approaches to support the archiving of web references for scholarly communications
Richard Wincewicz (1), Peter Burnhill (1) & Herbert Van de Sompel (2)
(1) EDINA, University of Edinburgh; (2) Los Alamos National Laboratory
http://hiberlink.org #hiberlink