CLOCKSS, LOCKSS & Barrels of Stuff: Libraries and Publishers share the Task Peter Burnhill EDINA, University of Edinburgh ICOLC, Rome, 13 October 2006
Overview 1. Introduction 2. What is CLOCKSS? 3. CLOCKSS Project 4. Progress Report 5. Recap & look to the future 6. How you can engage with CLOCKSS
1. Introduction Director, EDINA National Data Centre (1 of 2 in UK) Based at University of Edinburgh Designated & largely funded by the JISC I could say a lot more about EDINA (if given half a chance…) SUNCAT (national serials union catalogue), OpenURL Router, Onix for Serials, etc Member of Information Services Directorate (my other 50%), University of Edinburgh Research-led university in Scotland’s capital city Active in CURL & SCURL Contributes to JISC work Keen to engage in international initiatives Could also speak of the Digital Curation Centre … But here to speak of CLOCKSS, a University, Research Library, Learned Society and Publisher initiative
2. What is CLOCKSS? Public good solution to problem of global significance How to preserve & ensure continuing access to electronic scholarly content Partly an organisational solution Collaboration and shared governance between Libraries & Publishers Partly a technical solution LOCKSS technology In short, it’s Controlled use of LOCKSS
So, what is LOCKSS? “Lots of Copies Keep Stuff Safe” Digital Preservation Infrastructure Decentralized, Peer to Peer, Continuous Content Audit & Repair “computers chattering away to one another across the Internet” Open Source
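The "Continuous Content Audit & Repair" idea can be illustrated with a minimal sketch. This is not the LOCKSS polling protocol: it assumes a single majority vote over whole-content SHA-256 hashes, and the box names, the `audit_and_repair` function and the sample content are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 hex digest of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(boxes: dict[str, bytes]) -> list[str]:
    """Compare the copies held by each box and overwrite any copy that
    disagrees with the majority. Returns the names of the repaired boxes.

    Assumption: one simple majority vote over full-content hashes,
    a deliberate simplification of peer-to-peer audit.
    """
    votes = Counter(digest(content) for content in boxes.values())
    winning_hash, count = votes.most_common(1)[0]
    if count <= len(boxes) / 2:
        raise RuntimeError("no majority; cannot decide which copy is good")

    good_copy = next(c for c in boxes.values() if digest(c) == winning_hash)
    repaired = []
    for name, content in list(boxes.items()):
        if digest(content) != winning_hash:
            boxes[name] = good_copy  # repair from a copy the majority agrees on
            repaired.append(name)
    return repaired


# Hypothetical boxes; one copy has been damaged.
boxes = {
    "edinburgh": b"article v1",
    "stanford": b"article v1",
    "indiana": b"article v1 (bit rot)",
}
repaired = audit_and_repair(boxes)
print(repaired)
```

Running the audit again on the repaired set finds nothing to fix, which is the sense in which the audit is continuous: it is repeated, not run once.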
3. The CLOCKSS Project Two-year (2006 -) demonstrator project that intends: open reporting of progress & outcome a public demonstration that this solution really can be trusted for the long term scalability in terms of publisher content & library deployment The project was first funded by its participants, now with additional NDIIPP grant support from the Library of Congress to assist reporting.
Who is in CLOCKSS? Consortium acting on behalf of the wider community of libraries and publishers was 6 Libraries and 6 Publishers –Including learned societies acting as publishers now 7 Libraries and 12 Publishers will be …. sufficient to cover the bases Commitment based on stewardship of libraries & responsibility of publishers
Libraries University of Edinburgh New York Public Library Indiana University Rice University Stanford University University of Virginia + OCLC (recently joined as the 7th) Aim to add more to cover ‘tectonic plates’ of all types of geography
Publishers Blackwell Publishing Elsevier Nature Publishing Group Oxford University Press SAGE Publications Springer Taylor and Francis John Wiley & Sons American Chemical Society American Medical Association American Physiological Society Institute of Physics + aim to add all the rest …
Equal Partners Librarians have made a strategic decision, with publishers, that retains their role as stewards, as memory institutions Publishers have made a strategic decision to trust and engage those libraries, committing to the prospect of continuing access Both are exploring social and technical models over an initial two-year period, working to build a full-scale production system Costs of the initiative are shared equally between the parties, with additional funds to support audit & reporting from NDIIPP, the National Digital Information Infrastructure and Preservation Program administered by the US Library of Congress
Agreed Mission “CLOCKSS is a not-for-profit community partnership between publishers and libraries that is developing a distributed, validated, comprehensive archive that preserves and ensures continuing access to electronic scholarly content”
Community Governance Governed by both library and publisher partners Each partner represents an organization but collectively represents each sector Libraries & Publishers (& Learned Societies?) No single point of failure or institutional interest will prevent long-term governance Consensus driven, united for support of scholarly communication over the long term Complementary to territorial arrangements for legal deposit
Format Migration Ingest format from publishers (during the project) of both/either: 1. as delivered to the Web 2. as XML source files Access format is “on the fly” When content is requested Process is transparent to the reader rosenthal/01rosenthal.html
Reduce the cost of ingest allowing more material to be preserved value for money Postpone costs of migration taking advantage both of the time value of money, and of the technology cost curve. Migrate material upon reader request vastly lowering amount of content that needs to be processed Allow what the reader sees to be the result of best available technology at time of access Preserve the original look-and-feel, (which can be a large part of the value)
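Migration on reader request can be sketched as follows. This is an illustration under stated assumptions, not the LOCKSS implementation: the `Archive` class, the `CONVERTERS` registry, the format names and the trivial converter are all hypothetical.

```python
from typing import Callable

# Hypothetical converter registry: (stored format, wanted format) -> function.
# The format names and the trivial converter are invented for illustration.
CONVERTERS: dict[tuple[str, str], Callable[[bytes], bytes]] = {
    ("legacy-markup", "html"): lambda data: b"<html>" + data + b"</html>",
}


class Archive:
    """Keeps every item in its ingest format; migrates only when served."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[str, bytes]] = {}
        self.migrations_run = 0

    def ingest(self, key: str, fmt: str, data: bytes) -> None:
        # Ingest is cheap: the original bytes are stored untouched.
        self._store[key] = (fmt, data)

    def serve(self, key: str, reader_formats: list[str]) -> tuple[str, bytes]:
        fmt, data = self._store[key]
        if fmt in reader_formats:
            return fmt, data  # the reader can still use the original
        for wanted in reader_formats:
            convert = CONVERTERS.get((fmt, wanted))
            if convert is not None:
                self.migrations_run += 1
                return wanted, convert(data)  # converted copy; original kept
        raise LookupError(f"no converter from {fmt!r} to any of {reader_formats}")


archive = Archive()
for n in range(1000):
    archive.ingest(f"article-{n}", "legacy-markup", b"text %d" % n)

# Only the one article a reader asks for is migrated.
served_format, body = archive.serve("article-42", ["html"])
print(served_format, archive.migrations_run)
```

In this sketch a thousand items are ingested but only one conversion is run, and the stored original is never overwritten, so a better converter available later can still be applied to it.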
4. Progress Report We are up and running, with two LOCKSS boxes per Library Partner, and one for observation at each Publisher We are ingesting content from the Publishers The (C)LOCKSS boxes are chattering away We meet via teleconference on a weekly basis We are readying to simulate our first ‘trigger event’ We are beginning to report and preparing to audit The CLOCKSS is ticking …
Incoming email, 9 October 2004
Dear CLOCKSS technical group,
We are pleased to announce the release of more content to the CLOCKSS network. Today we have released additional content for the four previously released titles, three are from Oxford University Press and one from SAGE Publications:
Age and Ageing (OUP)
Environment & Urbanization (SAGE)
Journal of Experimental Botany (OUP)
Toxicological Sciences (OUP)
CLOCKSS libraries
The CLOCKSS system works differently from the LOCKSS system. The titles will be automatically configured for your CLOCKSS boxes. You NEED NOT DO ANYTHING -- the content will be automatically ingested and preserved. … we will be paying close attention to make sure that harvesting and auditing is going smoothly. In the future we will transition some of this responsibility [to] each institution hosting CLOCKSS boxes. We will send details on how to do that in the future.
We are working to develop and test plugins for titles that currently have manifest pages. We will be releasing these in the near future.
Thomas S. Assistant Director & Technical Manager,
5. Recap & look to future What’s the Problem? What are the Threats? Need for Public Good Trust in Library Stewardship Purpose of CLOCKSS Strategies Transition to full production
What’s the Problem? Coming of the digital and the Web accidentally changed the business relationship between librarians and publishers. With rare exceptions, libraries no longer take physical custody of the content, but provide access to web materials This has disrupted the role libraries have played in society for hundreds of years as trusted keepers of information and culture There is concern that what is now digital may cease to be available Our digital cultural and intellectual heritage is at risk.
What are the Threats? Continuous and Abrupt changes Technology storage media, hardware, software, formats Commerce here one day, gone the next Organizations (even Institutions) shifting priorities, politics, staffing Natural disasters Human folly: errors and attacks
Need for Public Good To ensure Content kept safe on behalf of scholarly community Global access to content on a continuing basis The ‘trigger event’ Establish a self-sustaining large dark archive That keeps costs low, with revenue to sustain operations and access
Trust in Library Stewardship Decentralized, to gain leverage from existing infrastructure of libraries The libraries hold content, act as custodians On trigger event Board ensures content becomes available again No single point of failure, nor institutional interest, will prevent provision of access Libraries are here for the long term
Transition to full production
Review strategies Replication - more copies are safer Migration - move copies forward in time Transparency - open source software Diversity - no single point of failure Audit – to confirm data really is preserved Sustainable economics - cost effective processes, more materials preserved per Euro/Dollar/Pound
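"More copies are safer" can be given a rough worked example. The sketch below assumes that copies fail independently, which is an idealisation that the Diversity strategy is meant to approximate; the per-copy loss probability of 0.01 is purely illustrative and is not a figure from the project. The copy counts 7 and 14 correspond to seven library partners with one or two boxes each.

```python
def prob_all_copies_lost(p_single: float, copies: int) -> float:
    """Probability that every copy is lost, assuming each copy is lost
    independently with probability p_single (an idealised assumption)."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_single ** copies


# Illustrative only: 0.01 is an invented per-copy loss probability.
for n in (1, 2, 7, 14):
    print(n, prob_all_copies_lost(0.01, n))
```

The figure falls geometrically with each added copy, but only to the extent that failures really are independent; copies sharing one site, one administrator or one software stack would fail together, which is why replication is listed alongside diversity and audit.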
Review Vision Comprehensive How much is all? Need to define scope and ambition Once content is in the archive it stays in the archive Stock and flow Will be available in perpetuity for use by the (dark) archive Need to ensure access by ‘loss’ trigger Need to investigate ‘end-licence’ trigger of back runs? Will serve as a secure backup to world-wide e-copies of material Implications of these 4 statements are being investigated during the two-year project
Purpose of CLOCKSS to preserve content over time, & ensure there is always prospect of service access Being.. comprehensive of all electronic scholarly content with keen focus on published journal articles & the like globally secure against regional disaster of all types: natural, commercial, political through deployment across ‘tectonic plates’ of all forms of geography
6. How you can engage with CLOCKSS Be a clockss-watcher! Let us have ideas & feedback; register interest to be an Associate; undertake advocacy In turn, we will use your support to build the community archive to preserve and ensure continuing access to electronic scholarly content.
(Controlled) Lots of Copies (to) Keep Stuff Safe Publisher buy-in, shared governance Many geographically distributed sites guard against catastrophic failure, natural or man-made Many independently administered repositories guard against “insider attacks” Many open source software contributors (eyes and minds) guard against technical arrogance
I should end here I do have additional slides on: Look and Feel to Readers Fancy graphics But I should close & invite questions
Join Us
Look and Feel to Readers When content is served to the user from a LOCKSS & CLOCKSS Box Look and feel is as close as possible to what the publisher published Preserve content & presentation
Format Migration Ingest format (during the project) both/either: 1. As delivered to the Web 2. XML source files Access format is “on the fly” When content is requested Process is transparent to the reader rosenthal/01rosenthal.html
Reduce the cost of ingest allowing more material to be preserved (VFM) Postpone costs of migration taking advantage both of the time value of money, and of the technology cost curve. Migrate material upon reader request vastly lowering amount of content that needs to be processed Allow what the reader sees to be the result of best available technology at time of access Preserve the original look-and-feel, which can be a large part of the value
Comprehensive Once content is in the archive it stays in the archive Will be available in perpetuity for use by the archive Will serve as a secure backup to world-wide e-copies of the material