Download presentation
Presentation is loading. Please wait.
Published byCarol Warren Modified over 8 years ago
1
3/17/2005 CS 791/891 Digital Preservation 1 LOCKSS: A Permanent Web Publishing and Access System V. Reich & D. S. H. Rosenthal Presented By Roopa D. Vegesna Graduate Student
2
3/17/2005CS 791/891 Digital Preservation2 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
3
3/17/2005CS 791/891 Digital Preservation3 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
4
3/17/2005CS 791/891 Digital Preservation4Introduction For centuries libraries and publishers have stable roles. For centuries libraries and publishers have stable roles. –Publishers produced information. –Libraries provided access to this information. Problems: Problems: –Building digital collections. –Access to future generations. –Publishers asked to assure persistency.
5
3/17/2005CS 791/891 Digital Preservation5 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
6
3/17/2005CS 791/891 Digital Preservation6 Specific Needs Future generations of scientists need access to this literature for research, teaching, and learning. Future generations of scientists need access to this literature for research, teaching, and learning. Current and future librarians need an inexpensive, robust mechanism, which they control, to ensure their communities maintain long-term access to this essential literature. Current and future librarians need an inexpensive, robust mechanism, which they control, to ensure their communities maintain long-term access to this essential literature. Current and future publishers need assurances that their journals' editorial values and brands will be available only to authorized and authenticated readers. Current and future publishers need assurances that their journals' editorial values and brands will be available only to authorized and authenticated readers.
7
3/17/2005CS 791/891 Digital Preservation7 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
8
3/17/2005CS 791/891 Digital Preservation8Solution Technically, any solution must satisfy three requirements: Technically, any solution must satisfy three requirements: –The content must be preserved as bits. –Access to the bits must be preserved. –The ability to parse and understand the bits must be preserved. The above issues are addressed by The above issues are addressed by –Lots Of Copies Keep Stuff Safe (LOCKSS)
9
3/17/2005CS 791/891 Digital Preservation9 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
10
3/17/2005CS 791/891 Digital Preservation10 Overview of LOCKSS LOCKSS is open source, peer-to-peer software that functions as a persistent access preservation system. LOCKSS is open source, peer-to-peer software that functions as a persistent access preservation system. LOCKSS allows libraries to run web caches for specific journals. LOCKSS allows libraries to run web caches for specific journals. LOCKSS is a digital preservation Internet appliance, not an archive. LOCKSS is a digital preservation Internet appliance, not an archive. A key difference between LOCKSS and "general library collections" is that the action of preserving material in the collection is intertwined with the provision of access to the end user. A key difference between LOCKSS and "general library collections" is that the action of preserving material in the collection is intertwined with the provision of access to the end user.
11
3/17/2005CS 791/891 Digital Preservation11 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
12
3/17/2005CS 791/891 Digital Preservation12 Data Flow Each LOCKSS cache (oval) collects journal content from the publisher's web site as it is published. Readers (circles) can get content from the publisher site. When the publisher's web site is not available (gray) to a local community, readers from that community get content from their local institution's cache. The caches "talk" to each other to maintain the content's integrity over time.
13
3/17/2005CS 791/891 Digital Preservation13 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
14
3/17/2005CS 791/891 Digital Preservation14 LOCKSS – Web Caches With the LOCKSS model, libraries run persistent web caches. With the LOCKSS model, libraries run persistent web caches. These caches collect content as it is published and are never flushed. These caches collect content as it is published and are never flushed. The LOCKSS caches challenge each other to vote in polls providing that their copies of journal volumes, issues and articles are the same. The LOCKSS caches challenge each other to vote in polls providing that their copies of journal volumes, issues and articles are the same.
15
3/17/2005CS 791/891 Digital Preservation15 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
16
3/17/2005CS 791/891 Digital Preservation16 LOCKSS - 3 perspectives Readers Perspective: Readers Perspective: –Readers expect minimum delay with no further interaction and proper search results. Librarians Perspective: Librarians Perspective: –Librarians want to provide both immediate and long- term access to the readers. Publishers Perspective: Publishers Perspective: –Publishers want to maintain journal brand and image. They want material available for future society members and other subscribers.
17
3/17/2005CS 791/891 Digital Preservation17 Reader’s Perspective LOCKSS focuses on preserving the service of having links resolve to, or searches to find, the relevant content. LOCKSS focuses on preserving the service of having links resolve to, or searches to find, the relevant content. An institution using LOCKSS to preserve access to a journal in effect runs a web cache. An institution using LOCKSS to preserve access to a journal in effect runs a web cache. At intervals the cache crawls the journal publisher's web site and pre-loads itself with newly published (but not yet read) content. At intervals the cache crawls the journal publisher's web site and pre-loads itself with newly published (but not yet read) content. The LOCKSS cache transparently supplies pages it is preserving even if those pages are no longer available from the original publisher's web site. The LOCKSS cache transparently supplies pages it is preserving even if those pages are no longer available from the original publisher's web site.
18
3/17/2005CS 791/891 Digital Preservation18 Librarian’s Perspective A library using LOCKSS pays for the equipment and staff time to run and manage a cache. A library using LOCKSS pays for the equipment and staff time to run and manage a cache. Unlike normal caches, the LOCKSS cache is never flushed and, over the long term, the full content remains accessible. Unlike normal caches, the LOCKSS cache is never flushed and, over the long term, the full content remains accessible. In normal operation, an ordinary cache will only act as a proxy but in a rough analog of inter- library loan, LOCKSS caches cooperate to detect and repair damage In normal operation, an ordinary cache will only act as a proxy but in a rough analog of inter- library loan, LOCKSS caches cooperate to detect and repair damage
19
3/17/2005CS 791/891 Digital Preservation19 Publisher’s Perspective LOCKSS enables librarians to collaborate to preserve readers' access to the content to which they subscribe, but it also addresses the publisher's concerns. LOCKSS enables librarians to collaborate to preserve readers' access to the content to which they subscribe, but it also addresses the publisher's concerns. –Because content is provided to other caches only to repair damage to content they previously held, no new leakage paths are introduced. –Because the reader is supplied preferentially from the publisher, with the cache only as a fallback, the publisher sees the same interactions they would have seen without LOCKSS.
20
3/17/2005CS 791/891 Digital Preservation20 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
21
3/17/2005CS 791/891 Digital Preservation21 LOCKSS – Content Preservation LOCKSS has two tasks in preserving content: LOCKSS has two tasks in preserving content: –It needs to detect, and if possible repair, any damage that occurs through hardware failure, carelessness or hostile action. –It must also detect, and if possible render ineffective, any attacks.
22
3/17/2005CS 791/891 Digital Preservation22 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
23
3/17/2005CS 791/891 Digital Preservation23 Damage to contents is a normal part Damage to contents is a normal part –In the absence of damage the hashes will agree. –If they disagree, one of the losers calls a sequence of polls to walk down the tree of directories to locate the damaged files. –When a damaged file is located, a new copy is fetched to replace it. –If the file is not available from the publisher, it will be requested from one of the winning caches. –If a cache receives a request for a page from another cache, it examines its memory of agreeing votes to see if the requester once agreed with it about the page in question. If the requester did, a new copy will be supplied. Detecting and Repairing Damage
24
3/17/2005CS 791/891 Digital Preservation24 How it works? Cache A Cache B bad ok If cache A determines that the content it holds is corrupted, it then asks for a new copy of that content from either the publisher or one of the other LOCKSS caches on the net. Note: Cache B will never give a copy of an article to Cache A if the requesting cache has not shown in the past that it had a copy of the content.
25
3/17/2005CS 791/891 Digital Preservation25 How it works? Cache A Cache B ok Cache B allows download of replacement data to Cache A Wide-area replication (“Lots of Copies”) ensures valid data (“Keep Stuff Safe”) ok Thus Lots Of Keep Stuff Safe and peer review of redundant content ensures data integrity.
26
3/17/2005CS 791/891 Digital Preservation26 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
27
3/17/2005CS 791/891 Digital Preservation27 A "bad guy“ is the one whose goal might be to change the consensus about some content in the system without being detected. A "bad guy“ is the one whose goal might be to change the consensus about some content in the system without being detected. –A "bad guy" who infiltrated only a few caches and made matching changes to each of their contents would appear to be random damage. The other caches would not change their contents to match. –If the "bad guy" infiltrated a substantial number of caches, even a small majority, he would cause polls whose results were close. However, the close results of the polls would alert the system's operators that something was wrong. –Only if the "bad guy" infiltrated the overwhelming majority of caches would his change be both effective and undetected. Hampering the "Bad Guy"
28
3/17/2005CS 791/891 Digital Preservation28 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
29
3/17/2005CS 791/891 Digital Preservation29 Project Status - Alpha Test Design and development of LOCKSS started in 1999. Design and development of LOCKSS started in 1999. An "alpha" version of the software, without a user interface or any precautions against the "bad guy", ran from May 2000 through March 2001 with around 15 caches. An "alpha" version of the software, without a user interface or any precautions against the "bad guy", ran from May 2000 through March 2001 with around 15 caches. Alpha sites were Stanford University, the University of California, Berkeley, the Los Alamos National Laboratory (LANL), the University of Tennessee, Harvard University, and Columbia University. Alpha sites were Stanford University, the University of California, Berkeley, the Los Alamos National Laboratory (LANL), the University of Tennessee, Harvard University, and Columbia University. This test established that the basic mechanisms worked. The system was able to collect the test content and repair both deliberate and accidental damage to it. This test established that the basic mechanisms worked. The system was able to collect the test content and repair both deliberate and accidental damage to it.
30
3/17/2005CS 791/891 Digital Preservation30 Project Status – Beta Test The worldwide "beta" test began in April 2001, using an almost complete implementation of the system. The worldwide "beta" test began in April 2001, using an almost complete implementation of the system. Approximately 35 publishers are endorsing the test. Over 40 libraries, with about 60 widely distributed and varyingly configured caches, have signed on to the project. Approximately 35 publishers are endorsing the test. Over 40 libraries, with about 60 widely distributed and varyingly configured caches, have signed on to the project. They include major institutions, such as the Library of Congress and the British Library, and smaller institutions, such as the University of Otago in New Zealand. They include major institutions, such as the Library of Congress and the British Library, and smaller institutions, such as the University of Otago in New Zealand.
31
3/17/2005CS 791/891 Digital Preservation31 List of Topics Introduction Introduction Specific Needs Specific Needs Solution Solution LOCKSS Overview LOCKSS Overview Data Flow Data Flow Web Caches Web Caches 3 Perspectives 3 Perspectives Content Preservation Content Preservation Detecting and Repairing Damage Detecting and Repairing Damage Hampering the “Bad Guy” Hampering the “Bad Guy” Project Status Project Status Conclusion Conclusion
32
3/17/2005CS 791/891 Digital Preservation32 Conclusion Thus LOCKSS has shown it can affordably: Thus LOCKSS has shown it can affordably: –Get permission to preserve copyright content. –Automatically collect it from the publisher. –Prove that the collection is OK. –Preserve it despite hardware/software failure and attack by powerful adversaries. –Disseminate it transparently to users. "...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident." – Thomas Jefferson, 1791 "...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident." – Thomas Jefferson, 1791
33
3/17/2005CS 791/891 Digital Preservation33 References Up-to-date project status is available at http://lockss.stanford.edu Up-to-date project status is available at http://lockss.stanford.edu http://lockss.stanford.edu Awesome QuickTime presentation Awesome QuickTime presentation http://www.lockss.org/news/LOCKSS_SHOW.mov Current list of Participating libraries. Current list of Participating libraries. http://lockss.stanford.edu/about/users.htm Current Publishers and Titles Current Publishers and Titles http://lockss.stanford.edu/about/titles.htm
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.