Published by Hilda Leonard. Modified over 9 years ago.
1
The Archival Problem & Infrastructure for Solutions
What needs to be archived and what needs to be done?
Richard Boulderstone, Director eStrategy
February 2010
2
What needs to be archived?
- Most things...
- At least a sample of most things...
3
What needs to be done?
Ingest
- Transition from print to digital information resources
- Heterogeneity, complexity and scale of digital content
- Interactive items
- Should we validate?
Storage
- Long term authenticity of items
- Loss or corruption
- External references
Access
- Securely share content with other legal deposit libraries
- Long term access: beyond the life of the original hardware & software platform (aka Digital Preservation)
- Controlled access: public domain, legal deposit, licensed
- Content must be easy to find. Important!!!
4
Digital Library Architecture & Design Considerations
- Started with the long-term storage problem
- Wanted a cost-effective, highly resilient store (highly unlikely to lose items or have items corrupted) with long-term integrity
- Analysis showed that magnetic tape solutions had limitations:
  - As the size of the store grows (petabytes), total recovery time can be long
  - Cost of tape is not much less than commodity spinning disk
- Wanted continuous validation to ensure content retained integrity
- Disk storage market tends to focus ‘value-added’ products on high transaction rates, large capacity and high reliability; our requirement is low cost, large capacity and reasonable performance
- Needed to share some of the archived content with other legal deposit libraries
- Architect resilience into the system: is ‘backup’ part of the architecture?
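The slide asks for continuous validation of stored content but does not say how it is done. A minimal sketch of one common approach, a checksum (fixity) audit, is shown below. It assumes each item is a file and that a manifest of SHA-256 digests was recorded at ingest; the function names `sha256_of` and `find_corrupted` and the manifest layout are hypothetical, not taken from the Digital Library System.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the names of items whose current digest differs from the manifest.

    Assumption: `manifest` maps a relative file name to the digest recorded
    at ingest. A missing file is reported as damaged as well.
    """
    damaged = []
    for name, expected in sorted(manifest.items()):
        item = root / name
        if not item.is_file() or sha256_of(item) != expected:
            damaged.append(name)
    return damaged
```

Run on a schedule over the whole store, a check like this detects silent corruption on disk without needing a full restore, which is the property that is hard to get from tape.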
5
The Digital Library System Store
7
CONTENT STREAMS with operational DIGITAL LIBRARY INGEST
ID | Name | Description
1 | DIGITISED BOOKS & JOURNALS | Microsoft-funded digitised nineteenth century books accessible from DLS to the Reading Rooms via ILS. Ingest is complete.
2 | VOLUNTARY ELECTRONIC LEGAL DEPOSIT (VELD) | Deposited hand held media and offline electronic media submitted ‘in lieu’ of electronic legal deposit legislation; includes journals, books etc (formerly known as VDEP material; excludes scholarly e-journals, which are identified as a separate stream). A very limited amount of this content is accessible from DLS in the Reading Rooms via ILS.
3 | FIELD SOUND RECORDINGS | Born digital field recordings created by Sound Archive. Low volume, accessible from DLS to Reading Rooms via Sound Server.
4 | eJOURNALS | Scholarly e-Journals sent to the BL as part of the Voluntary Deposit scheme. Ingest of simple eJournals (Stream 2) is live. Progress towards ingest of complex eJournals is halted while the technical options are being considered. Work has started on a project to ingest ESTAR eJournals.
5a | DIGITAL NEWSPAPERS | Contemporary newspapers to be supplied digitally & directly to the BL by newspaper publishers. A pilot of a small number of titles from a single publisher is live. Progress is halted pending agreements from publishers.
5b | LEGACY DIGITISED NEWSPAPERS | Scanned historical newspapers already in the BL’s current collection. Ingest of JISC-1 is currently underway.
8
CONTENT STREAMS prioritised for DIGITAL LIBRARY INGEST
ID | Name | Description
6 | WEB ARCHIVING | An archive of web-sites gathered after gaining permission from rights holders. Following Legal Deposit Regulations the BL will be able to harvest sites from the uk domain without asking for permission. The project to ingest this collection is at the shape stage.
7 | BORN DIGITAL SOUND | Born digital sound recordings, acquisitions & voluntary deposits. Expanding, with implications for Gateway project & links to Moving Image stream. The project to ingest this collection is at the shape stage.
9
OTHER CONTENT STREAMS for DIGITAL LIBRARY INGEST
ID | Name | Description
8 | LEGACY DIGITISED MASTERS | All existing image-based digitisation products, largely on hand-held media with significant preservation risk.
9 | NEW DIGITISED MASTERS | Digitised images from forthcoming projects including Single-Sheet Digitisation, Vulnerable Items Imaging, possibly Greek Manuscripts. Fast Track to Safety has been used to process the Vulnerable Collection Items (VCI) images; these are now DLS-ready. The same process can be used for the single-sheet digitised objects & other Aleph catalogued content.
10 | DIGITISED SOUND | Digitised archival sound recordings funded by JISC (ASR1 & ASR2).
11 | eMANUSCRIPTS | Hybrid collections, comprising paper, computer & other media with significant issues (technical, privacy, rights management etc).
12 | DATABASES | Including large numerical datasets.
13 | DIGITAL MAPS | Including OS/OSNI MasterMap data and other contacts / agreements (actually a dataset rather than a ‘map’).
14 | E-THESES | Digitised & born digital theses funded by JISC (eThOS Project).
15 | ELECTRONIC GREY LITERATURE | Scientific, technical & business documents, conference papers, newsletters, e-govt documents, i.e. not readily available through commercial channels. Some of this content is already ingested to DLS via the VELD route (see 2 above).
16 | eBOOKS | Assumes born-digital material. Some of this content is already ingested to DLS via the VELD route (see 2 above).
17 | MOVING IMAGES | Digital recordings of television programmes, online podcasts etc.
10
1. Ingest: Some remaining issues
- ‘Dynamic’ content: updated after initial deposit
  - Currently use snapshot, version-based approach
  - Other generic solutions?
- Should we archive published outputs, underlying data or both?
- Growing diversity of content
  - Should we validate to ensure long-term access?
  - Container formats may hide significant complexity (3D PDF)
- Scale
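The snapshot, version-based approach to ‘dynamic’ content can be illustrated with a small sketch. It is only an assumption about how such a scheme might look: each deposit of changed content becomes a new immutable version, and unchanged re-deposits are detected by digest. The class `VersionedItem` and its methods are hypothetical names, not part of the Digital Library System.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedItem:
    """An archived item kept as an append-only list of snapshots."""

    item_id: str
    # Each snapshot is (sha256 digest, content); earlier entries are never changed.
    snapshots: list = field(default_factory=list)

    def deposit(self, content: bytes) -> int:
        """Store a snapshot and return the current 1-based version number.

        Re-depositing content identical to the latest snapshot adds nothing.
        """
        digest = hashlib.sha256(content).hexdigest()
        if not self.snapshots or self.snapshots[-1][0] != digest:
            self.snapshots.append((digest, content))
        return len(self.snapshots)

    def get(self, version: Optional[int] = None) -> bytes:
        """Return the requested version, or the latest when none is given."""
        if version is None:
            version = len(self.snapshots)
        if not 1 <= version <= len(self.snapshots):
            raise KeyError(f"{self.item_id}: no version {version}")
        return self.snapshots[version - 1][1]
```

A scheme like this preserves every deposited state but captures nothing that changes between deposits, which is one reason the slide asks about other generic solutions.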
12
Legal Deposit Libraries Shared Infrastructure
Map labels: Edinburgh -2010, Aberystwyth, Boston Spa, St. Pancras, Cambridge Univ., Oxford Univ.
- Large scale, highly resilient digital store
- Complete copies of content at each node
- Continuous validation & correction
- Long term digital storage for BL content & eLegal deposit distribution
- Distribution of eLegal deposit content (NLW, NLS and Oxford & Cambridge)
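With complete copies at each node, a damaged copy can be corrected from the others. The slide does not describe the correction mechanism, so the sketch below uses majority voting over replica digests purely as an illustration; the function name `repair_replicas` and the in-memory representation of a node's copy as bytes are assumptions.

```python
import hashlib
from collections import Counter


def repair_replicas(replicas: dict[str, bytes]) -> list[str]:
    """Overwrite minority copies with the majority copy, in place.

    `replicas` maps a node name to that node's copy of one item.
    Returns the sorted names of the nodes that were repaired.
    Raises ValueError when no copy is held by a strict majority.
    """
    if not replicas:
        raise ValueError("no replicas supplied")
    digests = {
        node: hashlib.sha256(data).hexdigest() for node, data in replicas.items()
    }
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no strict majority; manual intervention needed")
    good = next(data for node, data in replicas.items() if digests[node] == winner)
    repaired = []
    for node in sorted(replicas):
        if digests[node] != winner:
            replicas[node] = good  # replace the damaged copy
            repaired.append(node)
    return repaired
```

A production store would more likely compare each copy against a digest recorded at ingest rather than rely on voting alone, since voting cannot detect the same corruption reaching most nodes.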
13
Agreement between UK Legal Deposit Libraries
- Use of a single IT infrastructure, based on the BL Digital Library System, to share legal deposit content
- Use of a single ingest point (Boston Spa) for legal deposit content
- Deployment of ‘nodes’ at BL, NLW & NLS for resilience, operational efficiency and autonomy of operation. Oxford and Cambridge to access content from the BL node.
- Consistent approach to preservation, metadata standards, SLAs (service level agreements) and infrastructure operations
- Access controls
- Trinity College Dublin will be included when legislation allows
14
Digital Library System Contents
Live Content Streams
- Sound Archives (BL)
- Voluntary Digital Donations (Vol. Scheme)
- Nineteenth Century Digitised Books (BL)
- Born Digital Newspapers (BL Pilot)
- eJournals (Vol. Scheme)
- Digitised Newspapers (BL)
Storage
- >500,000 digital items
- ~50 terabytes of content
15
Long-Term Access (aka Digital Preservation)
- Dedicated digital preservation team at BL
- Digital Library System currently supports bit-level preservation: long term integrity of ingested ‘bits’
- Also need to support content-level preservation, where the DLS is able to provide long-term access to the content, ensuring that users can render and use preserved content
- The Planets Project will deliver preservation modules for DLS in summer 2010:
  - Identification of at-risk content
  - Support for file format migrations
  - Technology watch service
16
2. Storage: Some remaining issues
- Ongoing cost
  - Storage
  - Can we share common costs (tools, technology watch, test-beds)?
- Can ‘dynamic’ items be frozen and, more importantly, unfrozen?
- How many file formats and software packages will become obsolete, requiring heroic efforts to recreate the original user experience?
- How are external references maintained over time?
18
Digital Policy & Rights Management
- To provide the widest possible access to our digital collections while respecting the terms and conditions of licenses, voluntary schemes and regulations
- Most content is controlled by copyright/legal deposit restrictions. Will this change?
- Current access control supports:
  - Embargoed (no access)
  - Authorised staff only
  - Reading room only
- To be developed:
  - Internet
  - Single consecutive use at legal deposit libraries
  - Secure container so that readers can use their own PCs to access legal deposit content
  - Mobile (anywhere) access
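The three access levels currently supported can be expressed as a simple decision rule. The sketch below is illustrative only: the enum `AccessLevel`, the function `may_access` and the two boolean request attributes are hypothetical, and the slide does not say whether authorised staff can open reading-room-only items from elsewhere, so the sketch assumes they cannot.

```python
from enum import Enum


class AccessLevel(Enum):
    EMBARGOED = "embargoed"
    STAFF_ONLY = "authorised staff only"
    READING_ROOM = "reading room only"


def may_access(
    level: AccessLevel, is_authorised_staff: bool, in_reading_room: bool
) -> bool:
    """Decide whether a request may open an item at the given access level."""
    if level is AccessLevel.EMBARGOED:
        return False  # no access for anyone
    if level is AccessLevel.STAFF_ONLY:
        return is_authorised_staff
    if level is AccessLevel.READING_ROOM:
        # Assumption: physical presence in a reading room is the only test.
        return in_reading_room
    raise ValueError(f"unknown access level: {level!r}")
```

The levels listed as still to be developed, such as single use at a time or a secure container on readers' own PCs, need state or client-side enforcement and would not fit a stateless check like this one.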
19
Content Navigation & Discovery
- The most important issue
- Catalogue model designed for two levels of hierarchy (title & holdings)
- Using the Ex Libris Primo product as the initial solution (Lucene full-text search engine embedded in the product)
- Much more needed. Need help!
  - Persistent links
  - Full featured commercial search engines
  - Semantic web/Linked data/RDF triples
  - Text mining, entity extraction
  - Information visualisation techniques
  - Hardware developments, mobile technologies, large displays
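Full-text search engines such as Lucene are built around an inverted index, which maps each term to the documents containing it. The toy sketch below shows only that core idea; it is not Lucene's or Primo's API, and the function names `build_index` and `search` and the simple tokeniser are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

An index like this answers keyword queries but offers no ranking, entity recognition or linking between items, which is where the wish list on the slide (linked data, text mining, visualisation) goes further.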
20
3. Access: Some remaining issues
- With a huge quantity of content, how can people find what they want?
- How can we support the development of sophisticated content navigation tools?
- Where should we invest in resource discovery?
21
Conclusion
- We have developed a highly resilient, scalable store for digital items
- We will need to archive a very broad range of content
- The BL Digital Library System will be used by the legal deposit libraries to share legal deposit content
- However, this feels like the beginning of a very long journey!
- We will need considerable help along the way
Thank you.