Stanford Archival Vault (SAV)
Brian Cooper, Hector Garcia-Molina
Department of Computer Science, Stanford University
Problem: Preserving Data
Data decays over time:
- Media decay
- System failure
- Human error or malicious actions
"Preserving the bits"
- Goal: ensure the bits survive failures
- Not (yet) dealing with the harder problem of preserving meaning, e.g., formats, natural language
Redundancy + periodic verification = reliability
SAV Architecture
Application/user level:
- Data Creation/Import: collects data for archiving, e.g., a web crawler
- User Interface: allows direct access to archived data and allows SAV configuration
Upper Layers (not yet implemented): e.g., security, indexing, metadata
"Core" SAV components:
- Reliability Layer: ensures objects survive failures by replicating them to remote SAV sites
- Object Store: basic object storage and retrieval; manages references between objects
Replication: Site networks
- Sites form "replication agreements", in which they agree to replicate data
- The agreement specifies the data to replicate, which may be a subset of all the data in the archive
- Partner sites periodically connect and compare their data, looking for errors
[Figure: a network of SAV sites linked by replication agreements; some sites are strongly connected, others weakly connected]
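The periodic comparison between partner sites can be illustrated with a minimal sketch. It assumes, hypothetically, that each site records a SHA-256 signature for every object when it is archived, and that an object counts as damaged when its current bytes no longer match that signature; the class names `SiteInventory` and `ReplicationAgreement` are illustrative, not SAV's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

// Hypothetical site: object bytes keyed by ID, plus the signature recorded at archive time.
class SiteInventory {
    final Map<String, byte[]> contents = new HashMap<>();
    final Map<String, String> signatures = new HashMap<>();

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required in every Java platform
        }
    }

    void archive(String id, byte[] data) {
        contents.put(id, data.clone());
        signatures.put(id, sha256Hex(data));
    }

    // An object is intact only if it exists and still matches its recorded signature.
    boolean isIntact(String id) {
        byte[] data = contents.get(id);
        return data != null && sha256Hex(data).equals(signatures.get(id));
    }
}

class ReplicationAgreement {
    // For each object covered by the agreement, repair the local copy from the
    // partner if the local copy is missing or damaged and the partner's is intact.
    // Returns the IDs that were repaired.
    static Set<String> reconcile(SiteInventory local, SiteInventory remote, Set<String> agreedIds) {
        Set<String> repaired = new TreeSet<>();
        for (String id : agreedIds) {
            if (!local.isIntact(id) && remote.isIntact(id)) {
                local.contents.put(id, remote.contents.get(id).clone());
                local.signatures.put(id, remote.signatures.get(id));
                repaired.add(id);
            }
        }
        return repaired;
    }
}
```

Running `reconcile` in both directions on a schedule is one way to realize "redundancy + periodic verification"; a real deployment would exchange signatures over the network rather than share in-memory maps.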
Replication: Data sets
- SAV replicates different data sets separately, e.g., web pages under agreement A and Usenet articles under agreement B
- "Replication sets" should grow without human intervention: SAV traverses the link structure, starting from designated objects, to find the objects in each set
- A new object automatically becomes part of the correct replication set
[Figure: two SAV sites, each starting a traversal; the legend distinguishes objects in the replication set, objects not in it, a newly added object, and object references]
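Computing a replication set by traversal can be sketched as a breadth-first search over object references. This assumes, hypothetically, that the archive's reference structure is available as a map from object ID to the IDs it references, and that set membership means reachability from the agreement's start objects; the edge direction followed and the name `ReplicationSet.members` are assumptions, not SAV's documented behavior.

```java
import java.util.*;

class ReplicationSet {
    // Returns every object reachable from the start objects by following references.
    // refs: hypothetical adjacency map, object ID -> IDs that object references.
    static Set<String> members(Map<String, List<String>> refs, Collection<String> startIds) {
        Set<String> seen = new LinkedHashSet<>(); // keeps discovery order
        Deque<String> queue = new ArrayDeque<>();
        for (String s : startIds) {
            if (seen.add(s)) queue.add(s);
        }
        while (!queue.isEmpty()) {
            String id = queue.poll();
            for (String next : refs.getOrDefault(id, List.of())) {
                if (seen.add(next)) queue.add(next); // enqueue each object only once
            }
        }
        return seen;
    }
}
```

Because membership is recomputed by traversal rather than stored as a fixed list, re-running the traversal picks up newly linked objects without anyone editing the agreement by hand.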
Write-once repository
- Deletions and modifications are disallowed: any object that has been deleted or modified must have been corrupted, and is replaced
Challenges:
- Constructing structures of objects: object references are constrained to point from new objects to old ones
- Representing modifications: archive a new version of the object, forming a version chain
- Finding objects: indexes
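The write-once constraints can be made concrete with a minimal sketch. It assumes, hypothetically, that the store assigns sequential IDs and rejects any reference to an ID it does not already hold, which forces references to point from new objects to old ones; a "modification" becomes a new object that points back to its predecessor, forming a version chain. `WriteOnceStore` and `ArchivedObject` are illustrative names only.

```java
import java.util.*;

// An archived object is immutable once created.
final class ArchivedObject {
    final long id;
    final String content;
    final List<Long> refs;       // references to older objects only
    final Long previousVersion;  // null for the first version in a chain

    ArchivedObject(long id, String content, List<Long> refs, Long previousVersion) {
        this.id = id;
        this.content = content;
        this.refs = List.copyOf(refs);
        this.previousVersion = previousVersion;
    }
}

class WriteOnceStore {
    private final Map<Long, ArchivedObject> objects = new HashMap<>();
    private long nextId = 1;

    // Archive a new object; every reference must name an object already stored.
    long put(String content, List<Long> refs, Long previousVersion) {
        for (Long r : refs) {
            if (!objects.containsKey(r)) throw new IllegalArgumentException("dangling ref " + r);
        }
        if (previousVersion != null && !objects.containsKey(previousVersion)) {
            throw new IllegalArgumentException("unknown previous version " + previousVersion);
        }
        long id = nextId++;
        objects.put(id, new ArchivedObject(id, content, refs, previousVersion));
        return id;
    }

    // "Modifying" an object archives a new version that links back to the old one.
    long update(long oldId, String newContent) {
        ArchivedObject old = get(oldId);
        return put(newContent, old.refs, oldId);
    }

    ArchivedObject get(long id) {
        ArchivedObject o = objects.get(id);
        if (o == null) throw new NoSuchElementException("no object " + id);
        return o;
    }

    // Walk the version chain from newest to oldest.
    List<String> history(long newestId) {
        List<String> out = new ArrayList<>();
        for (Long cur = newestId; cur != null; cur = get(cur).previousVersion) {
            out.add(get(cur).content);
        }
        return out;
    }
}
```

The store has no delete or overwrite operation at all, so the only way to change what a reader sees is to append a new version.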
Write once repository: Indexes
- Indexes are key to performance, e.g., locating an object quickly by its signature, or answering the "Who points to me?" question
- Indexes are disposable: they can be rebuilt at any time from the SAV objects
- "Bookmarks" with well-known names are indexed and used to find collections of related objects, e.g., pages from the same web site
[Figure: a bookmark with a well-known name leading to a group of related objects in SAV, alongside other, unrelated objects]
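Since indexes are disposable, they can be derived entirely by scanning the stored objects. The sketch below rebuilds a "who points to me?" reverse index and a bookmark index from scratch. It assumes, hypothetically, that each stored object exposes its outgoing references and an optional bookmark name, and that each well-known name belongs to exactly one object; `StoredObject` and `DisposableIndexes` are illustrative names.

```java
import java.util.*;

// Hypothetical stored object: its ID, outgoing references, and optional bookmark name.
final class StoredObject {
    final String id;
    final List<String> refs;
    final String bookmarkName; // null if this object is not a bookmark

    StoredObject(String id, List<String> refs, String bookmarkName) {
        this.id = id;
        this.refs = List.copyOf(refs);
        this.bookmarkName = bookmarkName;
    }
}

class DisposableIndexes {
    final Map<String, Set<String>> pointedToBy = new HashMap<>(); // target -> referrers
    final Map<String, String> bookmarks = new HashMap<>();        // well-known name -> object ID

    // Throw away nothing but the index: rebuild it by scanning every archived object.
    static DisposableIndexes rebuild(Collection<StoredObject> objects) {
        DisposableIndexes idx = new DisposableIndexes();
        for (StoredObject o : objects) {
            for (String target : o.refs) {
                idx.pointedToBy.computeIfAbsent(target, k -> new TreeSet<>()).add(o.id);
            }
            if (o.bookmarkName != null) {
                idx.bookmarks.put(o.bookmarkName, o.id);
            }
        }
        return idx;
    }

    // Answers "who points to me?" without scanning the archive again.
    Set<String> whoPointsTo(String id) {
        return pointedToBy.getOrDefault(id, Set.of());
    }
}
```

Because the objects themselves are the source of truth, a lost or damaged index needs no replication of its own; it is simply rebuilt.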
Implementation
- SAV built using Java and CORBA
- Tested on the Stanford Database Group website
- Basic user interface for archive management
Future work
- Archiving the whole Internet: scalability; defining meaningful subsets
- Other replication models?
- Preserving meaning
- Security: preserving sensitive documents; protecting intellectual property