Published by Carol Hancock. Modified over 9 years ago.
1
An Approach to Persistence of Web Resources
  Joachim Feise
  University of California, Irvine
  Information and Computer Science
  http://www.ics.uci.edu/~jfeise/
  jfeise@ics.uci.edu
2
Motivation
  Web resources change often
  Previous versions are no longer accessible
  Only the webmaster may know the resource history
  The Web doesn’t have a memory
  Who needs a history of Web resources?
    Organizations
    Development teams
    Historians
    Journalists
3
Current Approaches
  Search Engines
    Only one version
    Stored versions often outdated
  The Internet Archive
    Currently 14-15 TB of Web resources
    Starting October 1996
    No metadata storage
    Related resources may be scattered across files
    Online access probably infeasible
    Low collection frequency
4
Our Architecture
  [Diagram components: Web, Proxy/Cache, Configuration Management System]
5
Resource Storage and Access
  Modified Squid Proxy/Web Cache
    Piggybacking on cache functionality
  Access of historical versions
    Detection of date/revision selection
    Navigational features: next/previous day/month/revision
  Connection to Configuration Management System
    Retrieval of requested revision from CMS
    Possibility of distribution of the CMS storage
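The slides do not say how a date or revision selection is encoded in a request, so the following is only a minimal sketch under stated assumptions: the selection travels in hypothetical `rev` and `date` query parameters, and the CMS is stood in for by a plain list of `(date, content)` pairs sorted by date. It shows one way a proxy could detect the selection, map it to a stored revision, and find the neighbouring revisions for next/previous navigation.

```python
from bisect import bisect_right
from datetime import date
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_selection(url):
    """Separate a request URL into (plain URL, selector).

    The 'rev' and 'date' parameter names are assumptions made for this
    sketch; they are not taken from the presentation.
    """
    parts = urlsplit(url)
    selector = None
    remaining = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "rev":
            selector = ("rev", int(value))
        elif key == "date":
            selector = ("date", date.fromisoformat(value))
        else:
            remaining.append((key, value))
    plain = urlunsplit(parts._replace(query=urlencode(remaining)))
    return plain, selector


def resolve(history, selector):
    """Return the index of the selected revision, or None if there is none.

    history is a list of (date, content) pairs sorted by date; it is a
    hypothetical in-memory stand-in for the CMS.
    """
    if not history:
        return None
    if selector is None:
        return len(history) - 1          # no selection: newest revision
    kind, value = selector
    if kind == "rev":                    # revisions are numbered from 1
        return value - 1 if 1 <= value <= len(history) else None
    dates = [stored for stored, _ in history]
    position = bisect_right(dates, value)
    return position - 1 if position > 0 else None   # latest on or before date


def neighbours(index, count):
    """Indices of the previous and next revision (None at either end)."""
    previous = index - 1 if index > 0 else None
    following = index + 1 if index < count - 1 else None
    return previous, following
```

A date selection is resolved here to the latest revision stored on or before that date, which is one plausible reading of "access of historical versions"; a real proxy would fetch the resolved revision from the CMS instead of a list.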
6
Transparency
  Transparent access through Proxy
  Browser usage doesn’t change
  Comparison of last stored version with current version
  User’s selection is stored in CMS with current date/time
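One reading of the comparison step is that a fetched resource is checked in only when it differs from the last stored version, stamped with the current date and time. The sketch below illustrates that reading. The `RevisionStore` class and its methods are hypothetical stand-ins for the CMS, and comparing SHA-256 digests is an assumption; the slides do not describe how the comparison is performed.

```python
import hashlib
from datetime import datetime, timezone


class RevisionStore:
    """Hypothetical in-memory stand-in for the configuration management system."""

    def __init__(self):
        self._revisions = {}   # url -> list of (timestamp, digest, body)

    def record(self, url, body, now=None):
        """Store body as a new revision unless it equals the last stored one.

        Returns True when a new revision was stored, False otherwise.
        """
        now = now or datetime.now(timezone.utc)
        digest = hashlib.sha256(body).hexdigest()
        history = self._revisions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False               # unchanged since the last stored version
        history.append((now, digest, body))
        return True

    def history(self, url):
        """Return a copy of the stored revisions for url, oldest first."""
        return list(self._revisions.get(url, []))
```

Because the check happens inside the proxy on every fetch, the browser needs no changes, which matches the transparency goal on this slide.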
7
Example
8
Limitations
  Resource location changes
  Resource deletion
  Collection frequency
  Difficulty of capturing highly dynamic resources
  Only pages visited get collected
  Link consistency problems
9
Legal Issues
  Intellectual Property Rights and Privacy
    Configuration for opt-out/opt-in strategies
    Granularity: group-wide/company-wide settings
    Deleting all old revisions?
    Copyright issues
  Access rights
    Who can view what?
    Rights may change over time
  Censorship
    Bypassing with P2P technology, e.g., Freenet
10
Conclusions
  New approach to access histories of Web resources
  Designed for online access with standard browser
  Prototype implementation
    Used for performance tests
    Scalability remains to be tested
  Considering backend storage replacement, e.g., with a DeltaV server
  Legal issues exist
11
Thank You
  Thank you for your attention