Archival HTTP Redirection Retrieval Policies Temporal Web Analytics Workshop 2013, Rio De Janiro Ahmed AlSum, Michael L. Nelson Old Dominion University.

Slides:



Advertisements
Similar presentations
A Web-Based Resource Model for eScience: Object Reuse & Exchange 2008 Microsoft eScience Conference Indianapolis, December 8, 2008.
Advertisements

Hiberlink is funded by the Andrew W. Mellon Foundation Investigating Reference Rot in Web-Based Scholarly Communication Martin Klein Los Alamos National.
Its all about sharing Sylvia van Peteghem, Ghent University.
Towards Building a Collection of Web Archiving Research Articles Brenda Reyes Ayala and Cornelia Caragea University of North Texas
Using the Memento MediaWiki Extension to Avoid Spoilers Shawn M. Jones Old Dominion University.
Using OAI-PMH for Resource Exchange OAI Metadata Harvesting Workshop, JCDL 03 Michael L. Nelson, Terry L. Harrison Old Dominion University Norfolk VA
Open Annotation Collaboration Rob Sanderson, Herbert Van de Sompel DMSS Meeting, May 14-15, Stanford, CA Robert Sanderson –
Operational Auditing--Fall Evidence = Support n Auditors collect information n Evidence should be sufficient, competent, relevant & useful n Methodology.
© 2001 Franz J. Kurfess Introduction 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.
THIS IS THE TITLE OF MY POSTER. THIS IS THE TITLE OF MY POSTER NAMES, NAMES, & NAMES This is my abstract. This is my abstract. This is my abstract. This.
Prototypes of pro-active approaches to support the archiving of web references for scholarly communications Richard Wincewicz 1, Peter Burnhill 1 & Herbert.
EVALUATING THE TEMPORAL COHERENCE OF ARCHIVED PAGES SCOTT G. AINSWORTH MICHAEL L. NELSON OLD DOMINION UNIVERSITY IIPC 2015 APRIL 27 – MAY 1, 2015 STANFORD.
The Water Cycle Task Process Evaluation Conclusion Introduction
July 25, 2012 Arlington, Virginia Digital Preservation 2012warcreate.com WARCreate Create Wayback-Consumable WARC Files from Any Webpage Mat Kelly, Michele.
Memento Update CNI Task Force Meeting, Spring Memento Herbert Van de Sompel Robert Sanderson Michael L. Nelson Giant Leaps.
Persistent Annotations Deserve New URIs Abdulla Alasaadi Old Dominion University Michael L. Nelson JCDL 2011 Ottawa,
OAI-PMH for Resource Harvesting Tutorial OAI4, October 20 th 2005, CERN, Geneva, Switzerland OAIResource Software Her This work supported in part by the.
Web HTTP Hypertext Transfer Protocol. Web Terminology ◘Message: The basic unit of HTTP communication, consisting of structured sequence of octets matching.
Dec 9-11, 2003ICADL Challenges in Building Federation Services over Harvested Metadata Hesham Anan, Jianfeng Tang, Kurt Maly, Michael Nelson, Mohammad.
TEMPORAL SPREAD IN ARCHIVED COMPOSITE RESOURCES (WORK IN PROGRESS) SCOTT G. AINSWORTH MICHAEL L. NELSON OLD DOMINION UNIVERSITY COMPUTER SCIENCE WADL 2013.
ResourceSync was funded by the Sloan Foundation & JISC A Modular Framework for Web-Based Resource Synchronization Martin Klein Los Alamos National Laboratory.
Scott Ainsworth, Ahmed AlSum, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson Old Dominion University, USA {sainswor, aalsum, hany, mweigle,
Visualizing Digital Collections at Archive-It Michele C. Weigle, Michael L. Nelson Web Sciences and Digital Libraries (WS-DL) Lab Department of Computer.
Profiling Web Archives Michael L. Nelson Ahmed AlSum, Michele C. Weigle Herbert Van de Sompel, David Rosenthal IIPC General Assembly Paris, France, May.
Web Server Design Week 4 Old Dominion University Department of Computer Science CS 495/595 Spring 2010 Martin Klein 2/03/10.
Инвестиционный паспорт Муниципального образования «Целинский район»
Music Video Redundancy and Half-Life in YouTube Matthias Prellwitz and Michael L. Nelson TPDL 2011 Berlin,
(x – 8) (x + 8) = 0 x – 8 = 0 x + 8 = x = 8 x = (x + 5) (x + 2) = 0 x + 5 = 0 x + 2 = x = - 5 x = - 2.
The OAI: overview and historical context OAI Open Meeting – Washington DC – January 23 rd 2001 Herbert Van de Sompel & Carl Lagoze Cornell University --
Hiberlink – Towards Time Travel for the Scholarly Web July 25 th 2013, Indianapolis, IN, USA 1 Hiberlink – Towards Time Travel for the Scholarly Web Martin.
Profiling Web Archive Coverage for Top-Level Domain & Content Language Ahmed AlSum, Michele C. Weigle, Michael L. Nelson, and Herbert Van de Sompel International.
Open Archives Initiative Object Reuse & Exchange Resource Map Discovery Michael L. Nelson * Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Robert Sanderson,
Client-Side Preservation Techniques for ORE Aggregations Michael L. Nelson & Sudhir Koneru Old Dominion University, Norfolk VA OAI-ORE Specification Roll-Out.
Thumbnail Summarization Techniques For Web Archives Ahmed AlSum * Stanford University Libraries Stanford CA, USA 1 Michael L. Nelson.
OAI Object Reuse & Exchange: Atom Serialization Nordbib Workshop, September , Stockholm, Sweden OAI-ORE: Atom Serialization The ORE Editors are:
Oct 12-14, 2003NSDL Challenges in Building Federation Services over Harvested Metadata Kurt Maly, Michael Nelson, Mohammad Zubair Digital Library.
The Availability and Persistence of Web References in D-Lib Magazine Frank McCown, Sheffan Chan, Michael L. Nelson and Johan Bollen Old Dominion University.
Hiberlink is funded by the Andrew W. Mellon Foundation Investigating Reference Rot in Web-Based Scholarly Communication Martin Klein Los Alamos National.
Hiberlink is funded by the Andrew W. Mellon Foundation The Missing Link Proposal #hiberlink #memento Herbert.
CNI Task Force Meeting April 7, 2008 OAI-ORE Project Briefing David Reynolds Tim DiLauro Sayeed Choudhury Library Digital Programs Sheridan Libraries Johns.
Client-Side Preservation Techniques for ORE Aggregations Michael L. Nelson & Sudhir Koneru Old Dominion University, Norfolk VA OAI-ORE Specification Roll-Out.
OAI Object Reuse & Exchange: Discovery Nordbib Workshop, September , Stockholm, Sweden OAI-ORE: Discovery The ORE Editors are: Carl Lagoze (Cornell.
The New Subject Coding Scheme project Alan Paull (APS Ltd, working with CETIS)
照片档案整理 一、照片档案的含义 二、照片档案的归档范围 三、 卷内照片的分类、组卷、排序与编号 四、填写照片档案说明 五、照片档案编目及封面、备考填写 六、数码照片整理方法 七、照片档案的保管与保护.
공무원연금관리공단 광주지부 공무원대부등 공적연금 연계제도 공무원연금관리공단 광주지부. 공적연금 연계제도 국민연금과 직역연금 ( 공무원 / 사학 / 군인 / 별정우체국 ) 간의 연계가 이루어지지 않고 있 어 공적연금의 사각지대가 발생해 노후생활안정 달성 미흡 연계제도 시행전.
Web Programming Week 1 Old Dominion University Department of Computer Science CS 418/518 Fall 2007 Michael L. Nelson 8/27/07.
Жюль Верн ( ). Я мальчиком мечтал, читая Жюля Верна, Что тени вымысла плоть обретут для нас; Что поплывет судно громадней «Грейт Истерна»; Что.
Introduction to Digital Libraries Assignment #1 Old Dominion University Department of Computer Science CS 751/851 Spring 2015 Michael L. Nelson 01/22/15.
Resurrecting My Revolution Using Social Link Neighborhood in Bringing Context to the Disappearing Web Hany SalahEldeen & Michael Nelson Resurrecting My.
Introduction to OAI Static Repositories By Thomas G. Habing Grainger Engineering Library.
Evaluating the SiteStory Transactional Web Archive With the ApacheBench Tool Justin F. Brunelle Michael L. Nelson Lyudmila Balakireva Robert Sanderson.
Profiling Web Archives
When Should I Make Preservation Copies of Myself?
Who and What Links to the Internet Archive
Profiling Web Archive Coverage for Top-Level Domain & Content Language
Signposting the Scholarly Web: An Overview
Lazy Preservation, Warrick, and the Web Infrastructure
Introduction to medical Terminolog MEDICAL TERMINOLOGY.
Title (Case Study) Abstract Clinical Evaluation Treatment Outcomes
Introduction to Digital Libraries Assignment #3
Visualizing Digital Collections at Archive-It
Tweet URL Analysis Guoxin Sun, Kehan Lyu, Liyan Li
Characterization of Search Engine Caches
Persistent Annotations Deserve New URIs
Introduction to Digital Libraries Week 13: Reference Linking & OpenURL
Title Introduction: Discussion & Conclusion: Methods & Results:
SUPERPOWERS POLICIES EVALUATION BY REGION
Introduction to Digital Libraries Assignment #3
Introduction to Digital Libraries Assignment #1
Old Dominion University Computer Science IIPC New Member
Presentation transcript:

Archival HTTP Redirection Retrieval Policies Temporal Web Analytics Workshop 2013, Rio De Janiro Ahmed AlSum, Michael L. Nelson Old Dominion University Norfolk VA, USA Robert Sanderson, Herbert Van de Sompel Los Alamos National Laboratory Los Alamos NM, USA

Agenda Introduction Abstract Model Experiment And Results Retrieval Policies

Memento Terminologies URI-R, R URI-M, M URI-T, TM Original Resource Memento TimeMap

Live Redirect redirects to % curl -I HTTP/ Moved …. Location: …

Live Redirect redirects to

Archived Redirect redirects redirects

Abstract Model

M1M1M1M1 M1M1M1M1 M2M2M2M2 M2M2M2M2 M3M3M3M3 M3M3M3M3

URI Stability URI’s stability is a count of the change in HTTP responses across time (200, 3xx, or 4xx) and the number of different URIs in the “Location” for 3xx status code. High Stability = 1 No Stability = 0

Timemap Redirection Categories All Mementos have 200 HTTP status code All Mementos have redirection to the same URI. All Mementos have redirection to different URIs. Mementos have different HTTP status code. Stability =1 Stability ≈ 0

URI Reliability M1M1M1M1 M1M1M1M1 3xx3xx M2M2M2M2 M2M2M2M2 3xx3xx M3M3M3M3 M3M3M3M3 3xx3xx rel=original R` MM rel=original R` MM rel=original R` MM Stability =1 ?? ?? ?? xx3xx

HTTP Redirection Relationship between URI-R & URI-M Case 1 Case 2Case 3Case 4Case 5

Experiment & Results

Experiment Dataset: 10,000 sample URIs from HTTP Status/Code Percentage (10,000 URI-R) OK (200) 82.83% Redirection (3xx) 14.71% Redirection (301) 8.4% Redirection (302) 6.1% Redirection (others) 0.2% Not-Found (4xx) 1.18% Others 1.28% HTTP Status/Code Percentage (894,717 URI-M) OK (200)93.46% Redirection (3xx)5.69% Not-Found (4xx)0.26% Others0.59% URIs Live HTTP status code Memento HTTP status code

Time spanNumber of Mementos

URI Stability Stability in semi-log scale Stability for |TM(R)| < 300

URI Reliability Reliabilityin semi-log scale Reliabilityfor |TM(R)| < 300

HTTP Redirection Relationship between URI-R & URI-M Case 1 Case 2Case 3Case 4Case % 2.74% 1.34% 1.33% 13.7%

RETRIEVAL POLICIES ARCHIVED HTTP REDIRECTION RETRIEVAL POLICIES

Current Wayback Machine Policy

Policy one: URI-R with HTTP redirection Retrieve the memento M for R. Status(M) =200 Status(M) =3xx Stop Go to Policy 2 Stop Yes No

Policy one: URI-R with HTTP redirection Evaluation: o Policy scope has: 1471 URIs (that have live redirection) o 77 out of 1471 have no mementos at all o 17 out of 77 have been retrieved mementos based on live redirection

Policy two: URI-M with HTTP redirection Accept-Datetime: Sun, 13 May

Policy two: URI-M with HTTP redirection Evaluation: o Policy scope: 2980 TimeMap (that showed HTTP redirection status code in at least one memento) o Success criteria: Using policy two contributed to the original TimeMap o Success percentage: 58% of the cases

Conclusion Quantitative study with 10,000 URIs. 48% were not fully stable through time. 27% were not perfectly reliable through time. New archival retrieval policy: o Policy one: successfully retreived mementos for17 out of 77 o Policy two: Expanded the timemap for 58% of