1
Lazy Preservation, Warrick, and the Web Infrastructure
Frank McCown
Old Dominion University
Computer Science Department
Norfolk, Virginia, USA
Internet Archive Tutorial, JCDL 2007
Vancouver, BC
June 19, 2007
2
Available at http://warrick.cs.odu.edu/
McCown, et al., Brass: A Queueing Manager for Warrick, IWAW 2007.
McCown, et al., Factors Affecting Website Reconstruction from the Web Infrastructure, ACM/IEEE JCDL 2007.
McCown and Nelson, Evaluation of Crawling Policies for a Web-Repository Crawler, HYPERTEXT 2006.
McCown, et al., Lazy Preservation: Reconstructing Websites by Crawling the Crawlers, ACM WIDM 2006.
3
What Types of Websites Are Lost?
Marshall, McCown, and Nelson, Evaluating Personal Archiving Strategies for Internet-based Information, IS&T Archiving 2007.
4
Success of website recovery each week
On average, we recovered 61% of a website in any given week.
5
Overlap with Internet Archive
Overall, the Internet Archive (IA) contained only 46% of the resources available in search engine (SE) caches.
6
Diagram: which parts of a web server are recoverable from the Web Infrastructure.
Recoverable: static files (HTML files, PDFs, images, style sheets, JavaScript, etc.)
Other labeled components: config, Perl script, dynamic page, database
Legend: Recoverable, Not Recoverable
7
Injecting Server Components into Crawlable Pages
Erasure codes: server components are encoded into blocks and injected into HTML pages; recovering at least m blocks is enough to rebuild them.
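The idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scheme used by Warrick or described in the papers above: it uses the simplest possible erasure code (m data blocks plus one XOR parity block, so any m of the m + 1 blocks suffice), and the HTML comment format and the function names `encode`, `inject`, `extract`, and `decode` are hypothetical. A production system would more likely use a code that tolerates more than one lost block.

```python
import base64
import re
from functools import reduce

# Hypothetical comment format used to carry one block per HTML page.
BLOCK_RE = re.compile(r"<!-- block (\d+) ([A-Za-z0-9+/=]+) -->")


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, m: int) -> list[bytes]:
    """Split data into m equal-size blocks and append one XOR parity block."""
    # Prefix the original length so zero padding can be stripped on decode.
    payload = len(data).to_bytes(4, "big") + data
    size = -(-len(payload) // m)  # ceiling division
    payload = payload.ljust(size * m, b"\0")
    blocks = [payload[i * size:(i + 1) * size] for i in range(m)]
    return blocks + [reduce(xor_bytes, blocks)]


def inject(html: str, index: int, block: bytes) -> str:
    """Append a block to a page as a base64-encoded HTML comment."""
    return f"{html}\n<!-- block {index} {base64.b64encode(block).decode()} -->"


def extract(pages: list[str]) -> dict[int, bytes]:
    """Collect every block found in the recovered pages, keyed by index."""
    found = {}
    for page in pages:
        for index, b64 in BLOCK_RE.findall(page):
            found[int(index)] = base64.b64decode(b64)
    return found


def decode(blocks: dict[int, bytes], m: int) -> bytes:
    """Rebuild the original data from any m of the m + 1 blocks."""
    if len(blocks) < m:
        raise ValueError(f"need at least {m} blocks, got {len(blocks)}")
    blocks = dict(blocks)
    missing = [i for i in range(m) if i not in blocks]
    if missing:
        # Exactly one data block is missing; XOR of all the others restores it.
        (lost,) = missing
        blocks[lost] = reduce(
            xor_bytes, (blocks[i] for i in range(m + 1) if i != lost)
        )
    payload = b"".join(blocks[i] for i in range(m))
    length = int.from_bytes(payload[:4], "big")
    return payload[4:4 + length]


if __name__ == "__main__":
    script = b'#!/usr/bin/perl\nprint "hello";\n'  # stand-in server component
    m = 3
    pages = [
        inject(f"<html><body>page {i}</body></html>", i, block)
        for i, block in enumerate(encode(script, m))
    ]
    survivors = pages[:1] + pages[2:]  # pretend one page was never cached
    assert decode(extract(survivors), m) == script
```

Because each block travels inside an ordinary HTML comment, it would be stored wherever the page itself is cached, and losing any single page still leaves enough blocks to rebuild the component.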