Archive What I See Now Mat Kelly, Michael L. Nelson, Michele C. Weigle Old Dominion University Web Science and Digital Libraries Research Group ws-dl.blogspot.com
What’s the Problem? Web archives capture a lot but not everything. Individuals’ interests may not be captured. Timely capture is important. Capture capability must be enabled for all. (2013 Archive-It Partner Meeting, Salt Lake City, Utah, November 12, 2013)
Use Case: Capturing Breaking Stories
The Amateur Archivist’s Approach: Users take ad hoc approaches, such as screenshots of pages. Why? Tools are hard. – Build more accessible tools – Appeal to standards (e.g., WARC, ISO 28500:2009) – Make tools interoperable
The Institutional Dilemma: Safety of archives requires $. No $, no institution. Users’ hard drives fail – no access to Save-As files and screenshots. A hybrid approach is needed to leverage institutional safety, formats, and tech while still allowing direct user deposits.
Video Here: Show a use case where other tools cannot capture – e.g., behind authentication – Juxtapose to Archive.is, WebCite, Save Webpage As
So we built it! WARCreate – Google Chrome extension. Create web archives from browser. Capture personalized content. Preserve on a whim.
1. Mat Kelly and Michele C. Weigle, "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2012), Washington, DC, June 2012.
2. Mat Kelly, Michele C. Weigle, Michael Nelson, "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," Digital Preservation 2012, Tools Demo Session: Web Archiving; 2012 Jul 25; Washington, DC.
WARCreate – How it Works
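The slide's diagram is not reproduced here, but the end product of a capture is a WARC file. As a rough illustration of what a single WARC/1.0 response record looks like on disk, here is a minimal sketch. It is not WARCreate's actual code (WARCreate is a browser extension written in JavaScript); the function name `build_warc_response_record` and the sample URI are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_payload: bytes) -> bytes:
    """Serialize one WARC/1.0 'response' record (hypothetical helper).

    Layout: header lines, a blank line, the record block (the raw HTTP
    response), then two CRLFs that terminate the record.
    """
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {warc_date}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_payload)}",
    ]
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    return head + http_payload + b"\r\n\r\n"


# Assumed sample input: a tiny HTTP response as it would be seen on the wire.
sample_http = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
record = build_warc_response_record("http://example.com/", sample_http)
```

A complete WARC file typically also begins with a `warcinfo` record and pairs each response with a `request` record; those are omitted to keep the sketch short.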
Preserving the Original Context: Facebook-Supplied Data Dump vs. Archive created from WARCreate in Wayback. Liberated Data Doesn’t Give The Whole Picture.
Preserving the Original Context: Using Scraping Tools (e.g., wget) vs. Archive created from WARCreate in Wayback. The Target Controls What is Allowed.
Preserving the Original Context: A Crawler Has No Context vs. Archive created from WARCreate in Wayback. No Credentials, No Entry, No Archiving.
Preserving the Original Context: IA/Heritrix Obey Robots vs. Archive created from WARCreate in Wayback. No Means No, if They Say and You Obey.
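To make the robots point concrete, the sketch below shows how a polite crawler decides whether it may fetch a URL, using Python's standard `urllib.robotparser`. The robots.txt content, the user-agent string, and the URLs are made-up examples, and this is not Heritrix's implementation (Heritrix is written in Java).

```python
from urllib.robotparser import RobotFileParser


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed example rules: everything under /private/ is off limits to all crawlers.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

blocked = crawler_may_fetch(EXAMPLE_ROBOTS, "example-crawler", "http://example.com/private/page")
allowed = crawler_may_fetch(EXAMPLE_ROBOTS, "example-crawler", "http://example.com/public/page")
```

A crawler that obeys these rules never requests the disallowed page, so it never appears in the archive. A browser-based capture does not consult robots.txt, because it saves what the user's browser has already loaded.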
PROBLEM: Users don’t know what to do with WARCs
So, again, we built it! Web Archiving Integration Layer (WAIL): Heritrix, Wayback, etc. packaged for PC. GUI front-end allows “One-Click Preservation”. Provides means to replay WARCs.
1. Mat Kelly, Michele C. Weigle, Michael Nelson, "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving," Personal Digital Archiving 2013, Poster Session; 2013 Feb 21; College Park, MD.
2. Mat Kelly, Michael Nelson and Michele C. Weigle, "WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy," Digital Preservation 2013, Workshops and Sessions: Web Archiving; 2013 Jul 24; Alexandria, VA.
PROBLEM: Users want to preserve but store at institutions for safekeeping. PROBLEM: Even with replay, not everyone wants to use Chrome.
The Plan: 1. Port 2. Add functionality in: … to upload WARCs to: 3. Implement Sequential Archiving
Porting WARCreate to Firefox: Disjoint extension/add-on APIs – little logic can be re-used. Problems with HTTP header capture in Chrome are trivial in Firefox – Chrome = highly asynchronous fetching. JavaScript code to save to the local file system from Chrome for WARCreate is re-usable.
The Plan, step 1 (Port): ✓ In βeta now!
Uploading WARCs: An Open Question. Working with Archive-It to determine the feasibility of user-provided WARCs. Consideration of data integrity. Should data be merged with Archive-It crawled WARCs? – How do we account for your vs. my privacy?
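One common building block for the data integrity question is a fixity digest computed before upload and re-checked on receipt. The sketch below produces a digest in the `sha1:` plus Base32 form often seen in WARC `WARC-Payload-Digest` headers. Whether Archive-It or WARCreate would use this scheme is an assumption; the slide leaves the question open, and the function name `payload_digest` is hypothetical.

```python
import base64
import hashlib


def payload_digest(payload: bytes) -> str:
    """Return a 'sha1:<base32>' digest string for the given bytes."""
    raw = hashlib.sha1(payload).digest()  # 20 bytes, so Base32 needs no padding
    return "sha1:" + base64.b32encode(raw).decode("ascii")


# Assumed workflow: the uploader sends the digest alongside the WARC, and the
# receiver recomputes it to confirm the bytes arrived unchanged.
sent_bytes = b"example WARC bytes"
sent_digest = payload_digest(sent_bytes)
received_ok = payload_digest(sent_bytes) == sent_digest
```

A digest only shows that the bytes were not altered in transit or storage. It says nothing about whether the capture itself is trustworthy, which is the harder part of accepting user-provided WARCs.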
Sequential Archiving? The Digital Libraries Approach ★ - versus - Discovery & Scraping: The Information Retrieval Approach. personal stream: wall / posts / my tweets; global stream: news feed / streams / followees’ tweets; multimedia-photos: photos / N/A; multimedia-videos: videos / N/A; photo collection: albums / N/A; posts: notes / N/A; friends / circles / following
Sequential Archiving = Lots of Maintenance: Only (and optionally) applied on recognized sites, with scraping as the fallback for establishing hierarchy. The spec lives online; tools refer to it and so are always up to date. Standardized spec* prototype is live online. * M. Kelly, An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication, Aug 2012
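The "recognized sites first, scraping as fallback" idea can be sketched as a lookup with a default. Everything concrete here is hypothetical: the `SITE_SPECS` table, the host `social.example`, its section paths, and the function name `plan_capture_sequence` are illustrative only and are not taken from the actual spec or from WARCreate.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical per-site spec: ordered sections to capture on a recognized site.
SITE_SPECS = {
    "social.example": ["/wall", "/photos", "/notes", "/friends"],
}


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags (the scraping fallback)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def plan_capture_sequence(page_url: str, page_html: str) -> list:
    """Return URLs to archive in order: from the spec if the host is recognized,
    otherwise from links scraped out of the page."""
    host = urlparse(page_url).hostname
    if host in SITE_SPECS:
        return [urljoin(page_url, path) for path in SITE_SPECS[host]]
    collector = LinkCollector()
    collector.feed(page_html)
    return [urljoin(page_url, href) for href in collector.links]


known = plan_capture_sequence("https://social.example/me", "")
unknown = plan_capture_sequence(
    "https://unknown.example/", '<a href="/a">x</a><a href="b.html">y</a>'
)
```

Keeping the table in one online location is what lets the tools stay current, and it is also where the maintenance burden named in the slide title comes from: every site redesign means updating the spec.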
Summary: Firefox WARCreate in Beta – Chrome WARCreate users can currently Archive What They See Now. Sequential Archiving is implemented in Chrome WARCreate and needs porting. Next Big Hurdle: working with Archive-It on WARC upload logistics.
Archive What I See Now: Download Our Archiving Tools! Share Your Use Cases for Capturing the Unpreserved and the Unpreservable. Help Us Improve Our Tools, Give Feedback! WARCreate for Chrome: Create WARC files from any web page from your browser. Web Archiving Integration Layer (WAIL): One-Click Preservation. Heritrix, Wayback and Others On Your PC! In Beta. Available Soon!