3/17/2005 CS 791/891 Digital Preservation 1 LOCKSS: A Permanent Web Publishing and Access System V. Reich & D. S. H. Rosenthal Presented By Roopa D. Vegesna.

LOCKSS: A Permanent Web Publishing and Access System
V. Reich & D. S. H. Rosenthal
Presented by Roopa D. Vegesna, Graduate Student

List of Topics
Introduction
Specific Needs
Solution
LOCKSS Overview
Data Flow
Web Caches
3 Perspectives
Content Preservation
Detecting and Repairing Damage
Hampering the “Bad Guy”
Project Status
Conclusion

Introduction
For centuries, libraries and publishers have had stable roles.
–Publishers produced information.
–Libraries provided access to this information.
Problems:
–Building digital collections.
–Providing access to future generations.
–Publishers are asked to assure persistence.

Specific Needs
Future generations of scientists need access to this literature for research, teaching, and learning.
Current and future librarians need an inexpensive, robust mechanism, which they control, to ensure their communities maintain long-term access to this essential literature.
Current and future publishers need assurances that their journals' editorial values and brands will be available only to authorized and authenticated readers.

Solution
Technically, any solution must satisfy three requirements:
–The content must be preserved as bits.
–Access to the bits must be preserved.
–The ability to parse and understand the bits must be preserved.
The above issues are addressed by
–Lots Of Copies Keep Stuff Safe (LOCKSS)

Overview of LOCKSS
LOCKSS is open source, peer-to-peer software that functions as a persistent access preservation system.
LOCKSS allows libraries to run web caches for specific journals.
LOCKSS is a digital preservation Internet appliance, not an archive.
A key difference between LOCKSS and "general library collections" is that the action of preserving material in the collection is intertwined with the provision of access to the end user.

Data Flow
Each LOCKSS cache (oval) collects journal content from the publisher's web site as it is published. Readers (circles) can get content from the publisher site. When the publisher's web site is not available (gray) to a local community, readers from that community get content from their local institution's cache. The caches "talk" to each other to maintain the content's integrity over time.
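
The fallback in this data flow can be illustrated with a minimal sketch. It assumes a toy in-memory cache and a caller-supplied publisher fetch function; `LocalCache` and `serve` are hypothetical names for illustration, not part of the LOCKSS software.

```python
from typing import Callable, Dict, Optional


class LocalCache:
    """Toy stand-in for a library's cache (hypothetical, not the LOCKSS API)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def collect(self, url: str, body: bytes) -> None:
        # Content is stored as it is published and is never evicted.
        self._store[url] = body

    def get(self, url: str) -> Optional[bytes]:
        return self._store.get(url)


def serve(
    url: str,
    fetch_from_publisher: Callable[[str], Optional[bytes]],
    cache: LocalCache,
) -> Optional[bytes]:
    """Prefer the publisher; fall back to the local cache if it is unavailable."""
    try:
        body = fetch_from_publisher(url)
    except ConnectionError:
        body = None  # publisher unreachable for this community
    if body is not None:
        return body
    return cache.get(url)
```

The reader always asks for the same URL; only the source of the bytes changes when the publisher cannot be reached.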

LOCKSS – Web Caches
With the LOCKSS model, libraries run persistent web caches.
These caches collect content as it is published and are never flushed.
The LOCKSS caches challenge each other to vote in polls proving that their copies of journal volumes, issues and articles are the same.
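
The slide does not say how a vote proves that two copies match. One plausible approach, assumed here purely for illustration, is for the cache calling the poll to issue a fresh random nonce and for each voter to return a hash of the nonce plus its own copy, so a voter cannot answer without actually holding the content. The function names and the choice of SHA-256 are assumptions, not the LOCKSS protocol.

```python
import hashlib
import os
from typing import Dict


def vote(nonce: bytes, content: bytes) -> str:
    """A voter's answer: hash of the poller's nonce followed by its own copy."""
    return hashlib.sha256(nonce + content).hexdigest()


def run_poll(my_content: bytes, peer_contents: Dict[str, bytes]) -> Dict[str, bool]:
    """Return, per peer, whether its vote agrees with the poller's own copy.

    peer_contents simulates the remote caches; in a real system each peer
    would compute vote() itself and send back only the hash.
    """
    nonce = os.urandom(16)  # fresh challenge, so old votes cannot be replayed
    expected = vote(nonce, my_content)
    return {
        peer: vote(nonce, content) == expected
        for peer, content in peer_contents.items()
    }
```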

LOCKSS - 3 Perspectives
Reader's Perspective:
–Readers expect minimum delay with no further interaction and proper search results.
Librarian's Perspective:
–Librarians want to provide both immediate and long-term access to the readers.
Publisher's Perspective:
–Publishers want to maintain journal brand and image. They want material available for future society members and other subscribers.

Reader’s Perspective
LOCKSS focuses on preserving the service of having links resolve to, or searches to find, the relevant content.
An institution using LOCKSS to preserve access to a journal in effect runs a web cache.
At intervals the cache crawls the journal publisher's web site and pre-loads itself with newly published (but not yet read) content.
The LOCKSS cache transparently supplies pages it is preserving even if those pages are no longer available from the original publisher's web site.
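
The periodic pre-loading step can be sketched as follows. This is only an illustration: `preload`, `list_published` and `fetch` are hypothetical names, and the cache is modeled as a plain dictionary from URL to page bytes.

```python
from typing import Callable, Dict, Iterable, List


def preload(
    cache: Dict[str, bytes],
    list_published: Callable[[], Iterable[str]],
    fetch: Callable[[str], bytes],
) -> List[str]:
    """One crawl pass: store every published URL the cache does not yet hold.

    Existing entries are never overwritten or removed, mirroring a cache
    that is never flushed. Returns the URLs added during this pass.
    """
    added = []
    for url in list_published():
        if url not in cache:
            cache[url] = fetch(url)
            added.append(url)
    return added
```

Running `preload` again after the publisher adds an issue picks up only the new pages, which is why the content is in the cache before any reader asks for it.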

Librarian’s Perspective
A library using LOCKSS pays for the equipment and staff time to run and manage a cache.
Unlike normal caches, the LOCKSS cache is never flushed and, over the long term, the full content remains accessible.
In normal operation, an ordinary cache will only act as a proxy, but in a rough analog of inter-library loan, LOCKSS caches cooperate to detect and repair damage.

Publisher’s Perspective
LOCKSS enables librarians to collaborate to preserve readers' access to the content to which they subscribe, but it also addresses the publisher's concerns.
–Because content is provided to other caches only to repair damage to content they previously held, no new leakage paths are introduced.
–Because the reader is supplied preferentially from the publisher, with the cache only as a fallback, the publisher sees the same interactions they would have seen without LOCKSS.

LOCKSS – Content Preservation
LOCKSS has two tasks in preserving content:
–It needs to detect, and if possible repair, any damage that occurs through hardware failure, carelessness or hostile action.
–It must also detect, and if possible render ineffective, any attacks.

Detecting and Repairing Damage
Damage to contents is a normal part
–In the absence of damage the hashes will agree.
–If they disagree, one of the losers calls a sequence of polls to walk down the tree of directories to locate the damaged files.
–When a damaged file is located, a new copy is fetched to replace it.
–If the file is not available from the publisher, it will be requested from one of the winning caches.
–If a cache receives a request for a page from another cache, it examines its memory of agreeing votes to see if the requester once agreed with it about the page in question. If the requester did, a new copy will be supplied.
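
The walk down the directory tree can be sketched as a recursive hash comparison. This is a simplified illustration, not the LOCKSS polling protocol: the tree is modeled as nested dictionaries, the "winning" copy is passed in directly as `consensus` instead of being established by a poll at each level, and `digest` and `locate_damage` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Union

# A file is bytes; a directory maps names to files or subdirectories.
Tree = Union[bytes, Dict[str, "Tree"]]


def digest(node: Tree) -> str:
    """Hash a file's bytes, or a directory's sorted names and child hashes."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"file:" + node).hexdigest()
    h = hashlib.sha256(b"dir:")
    for name in sorted(node):
        h.update(name.encode("utf-8") + b"\0" + digest(node[name]).encode("ascii"))
    return h.hexdigest()


def locate_damage(local: Tree, consensus: Tree, path: str = "") -> List[str]:
    """Return paths whose content differs, descending only where hashes disagree."""
    if digest(local) == digest(consensus):
        return []  # whole subtree agrees, no need to look inside
    if isinstance(local, bytes) or isinstance(consensus, bytes):
        return [path or "/"]  # disagreement narrowed down to a single file
    damaged: List[str] = []
    for name in sorted(set(local) | set(consensus)):
        child = f"{path}/{name}"
        if name not in local or name not in consensus:
            damaged.append(child)  # missing or extra entry
        else:
            damaged.extend(locate_damage(local[name], consensus[name], child))
    return damaged
```

Because agreeing subtrees are skipped after one comparison, the work is concentrated on the part of the collection that is actually damaged.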

How It Works
[Figure: Cache A (bad), Cache B (ok)]
If cache A determines that the content it holds is corrupted, it then asks for a new copy of that content from either the publisher or one of the other LOCKSS caches on the net.
Note: Cache B will never give a copy of an article to Cache A if the requesting cache has not shown in the past that it had a copy of the content.

How It Works
[Figure: Cache A (ok), Cache B (ok)]
Cache B allows download of replacement data to Cache A.
Wide-area replication (“Lots of Copies”) ensures valid data (“Keep Stuff Safe”).
Thus Lots Of Copies Keep Stuff Safe, and peer review of redundant content ensures data integrity.
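
The rule that Cache B only repairs a cache that has previously shown it held the content can be sketched as below. The class and method names are hypothetical, and the "memory of agreeing votes" is modeled simply as a set of (peer, URL) pairs.

```python
from typing import Dict, Optional, Set, Tuple


class RepairingCache:
    """Toy cache that serves repairs only to peers it once agreed with."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.content: Dict[str, bytes] = {}
        # Memory of agreeing votes: (peer name, URL) pairs seen in past polls.
        self.agreed: Set[Tuple[str, str]] = set()

    def record_agreement(self, peer: str, url: str) -> None:
        """Remember that `peer` once proved it held the same copy of `url`."""
        self.agreed.add((peer, url))

    def handle_repair_request(self, requester: str, url: str) -> Optional[bytes]:
        """Supply a copy only if the requester previously agreed about this URL."""
        if (requester, url) not in self.agreed:
            return None  # refusing here avoids opening a new leakage path
        return self.content.get(url)
```

For example, after `b.record_agreement("A", url)`, a later request from A for that URL is served, while the same request from a cache that never voted in agreement is refused.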

Hampering the "Bad Guy"
A "bad guy" is one whose goal might be to change the consensus about some content in the system without being detected.
–A "bad guy" who infiltrated only a few caches and made matching changes to each of their contents would appear to be random damage. The other caches would not change their contents to match.
–If the "bad guy" infiltrated a substantial number of caches, even a small majority, he would cause polls whose results were close. However, the close results of the polls would alert the system's operators that something was wrong.
–Only if the "bad guy" infiltrated the overwhelming majority of caches would his change be both effective and undetected.
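
The three cases above amount to classifying each poll as a landslide win, a landslide loss, or a close result. A minimal sketch follows; the 80% landslide threshold and all names are assumptions chosen for illustration, since the slides give no numbers.

```python
from enum import Enum


class Outcome(Enum):
    AGREE = "landslide agreement: local copy is good"
    REPAIR = "landslide disagreement: local copy is damaged, repair it"
    ALARM = "close result: alert the operators"


def classify_poll(agree: int, disagree: int, landslide: float = 0.8) -> Outcome:
    """Classify a poll from the point of view of the cache that called it.

    `landslide` is an assumed threshold: the share of votes one side needs
    for the result to count as decisive instead of suspiciously close.
    """
    total = agree + disagree
    if total == 0:
        raise ValueError("a poll needs at least one vote")
    if agree / total >= landslide:
        return Outcome.AGREE
    if disagree / total >= landslide:
        return Outcome.REPAIR
    return Outcome.ALARM
```

With these assumptions, a few corrupted voters lose by a landslide and look like ordinary damage, whereas an attacker controlling roughly half the voters produces a close poll and raises an alarm.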

Project Status - Alpha Test
Design and development of LOCKSS started in
An "alpha" version of the software, without a user interface or any precautions against the "bad guy", ran from May 2000 through March 2001 with around 15 caches.
Alpha sites were Stanford University, the University of California, Berkeley, the Los Alamos National Laboratory (LANL), the University of Tennessee, Harvard University, and Columbia University.
This test established that the basic mechanisms worked. The system was able to collect the test content and repair both deliberate and accidental damage to it.

Project Status – Beta Test
The worldwide "beta" test began in April 2001, using an almost complete implementation of the system.
Approximately 35 publishers are endorsing the test. Over 40 libraries, with about 60 widely distributed and varyingly configured caches, have signed on to the project.
They include major institutions, such as the Library of Congress and the British Library, and smaller institutions, such as the University of Otago in New Zealand.

Conclusion
Thus LOCKSS has shown it can affordably:
–Get permission to preserve copyright content.
–Automatically collect it from the publisher.
–Prove that the collection is OK.
–Preserve it despite hardware/software failure and attack by powerful adversaries.
–Disseminate it transparently to users.
"...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident." – Thomas Jefferson, 1791
