IBM Experiences in Digital Collection Building Virtual Libraries


Night Watch by Rembrandt, 1642
A few years ago the painting was damaged by a mentally disturbed man with a knife and had to be repaired.
Digital Collection Building 19-8-2019

Long Term Preservation
What is digital preservation?
- Digital preservation can be defined as: "The act of maintaining information in a correct and independently understandable form, over the long term" (OAIS).
- Safe storage and permanent access are both part of digital preservation
- Connection between the research area of digital preservation and a national digital repository for digitized cultural heritage

Long Term Preservation challenge
The sense of urgency to resolve the problem is increasing:
- More and more objects are only available in digital form ("born digital")
- There is no mechanism for systematic preservation of information published on the Web
- National digital repository for digitized cultural heritage
No complete integrated solutions exist yet, but:
- there are many initiatives in the academic world
- the topic is on the agenda of the European Commission
The DIAS system can be a solution:
- to structure the complex problem
- to resolve, already today, a part of the problem
- to facilitate gradual incorporation of further partial solutions

Technology Preservation
Within Long Term Preservation (LTP) three areas of interest can be identified: medium, technology, and intellectual preservation.
Medium Preservation
- Medium preservation is the concern for preserving the medium on which information is stored, such as tapes, disks, optical disks, CD-ROMs and the like.
Technology Preservation
- We need to be aware of technology obsolescence as even more of a problem than medium decay, and undertake steps of technology preservation. Rather than simply refreshing, we also need migration and emulation:
  - migrating information forward through technology / format stages as they become available and as the old technologies / formats cease being supported by vendors and the user community
  - emulating old and obsolete technologies / formats on current technology platforms
Intellectual Preservation
- There remains a third preservation requirement, intellectual preservation, which addresses the integrity and authenticity of the information as originally recorded.
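The integrity side of intellectual preservation is commonly supported by fixity checks: a checksum is recorded when an object is stored and recomputed later to detect any change to the bit stream. The slides do not describe how DIAS handles this, so the following is only a minimal Python sketch; the function names `fixity_digest` and `verify_fixity` and the sample data are hypothetical.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a stored object's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """Check that the object still matches the digest recorded earlier."""
    return fixity_digest(data) == recorded_digest


# Illustrative data only: a stand-in for an archived file.
original = b"scanned page 1"
recorded = fixity_digest(original)  # would be stored alongside the object

unchanged_ok = verify_fixity(original, recorded)
altered_ok = verify_fixity(b"scanned page 2", recorded)
print(unchanged_ok, altered_ok)
```

A matching digest only shows that the bits are unchanged; authenticity in the broader sense still depends on provenance information kept with the object.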

Introduction Koninklijke Bibliotheek – Library of the Netherlands
- KB was one of the first libraries to develop an electronic deposit with long-term preservation as the key objective
Koninklijke Bibliotheek (KB):
- Medium-sized national library, founded in 1798
- Financed by the Ministry of Education, Culture and Science
- Annual budget: €50 million
Digital archiving and R&D:
- Permanent preservation: €1.1 million structurally allotted to staff and system maintenance
- €1.2 million permanently dedicated to research
- Digital archiving and preservation embedded in the organization
Departments:
- e-Depot (electronic publications)
- Digital Preservation
- IT

Mission
The KB is the National Library of the Netherlands
- We give researchers and students access to research information
- We enable everyone to share in the riches of our cultural heritage
- We foster the national infrastructure for scientific information
- We further permanent access to digital information within an international context

E-Depot – Digital Information Archiving System
- KB is responsible for all publications appearing in the Netherlands (and international publications from 2002)
- At the heart of the e-Depot system is a technical component called DIAS (Digital Information Archiving System), based on the OAIS (Open Archival Information System) Reference Model.
- This Reference Model establishes a common framework of terms and concepts which comprise an Open Archival Information System (OAIS). It is also a technical recommendation for use in developing a broader consensus on what is required for an archive to provide permanent, or indefinite long-term, preservation of digital information.
- The e-Depot system at the KB is one of the first operational digital archives worldwide based on the OAIS.

Digital preservation research
- KB's research initiatives are geared towards ensuring long-term access and focus on two digital preservation strategies: emulation and migration.
- Emulation aims to render the digital objects in their original form and to preserve all functionality of the objects.
- Migration focuses on converting the digital objects so that they remain accessible in a future computer environment.
- The digital preservation department is looking into properties of file formats, preservation metadata and the development of a module called the Preservation Manager.
- The Preservation Manager contains information about how to render digital objects on the file format level.
- The Preservation Manager will provide the information necessary to plan actions before a specific file type becomes inaccessible.
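The idea of planning actions before a file type becomes inaccessible can be pictured as a lookup over a format registry. The sketch below is an assumption for illustration only and is not the design of the actual Preservation Manager; `FormatRecord`, `formats_at_risk`, the format names, viewer names and years are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FormatRecord:
    """Hypothetical registry entry: how a file format can be rendered."""
    name: str
    viewers: Tuple[str, ...]        # applications assumed able to render it
    supported_until: Optional[int]  # assumed last year of support, None if open-ended


def formats_at_risk(registry: List[FormatRecord], year: int, horizon: int = 2) -> List[str]:
    """Return names of formats whose support ends within `horizon` years."""
    return [
        record.name
        for record in registry
        if record.supported_until is not None
        and record.supported_until - year <= horizon
    ]


# Illustrative entries only, not real support dates.
registry = [
    FormatRecord("format-a", ("viewer-a",), None),
    FormatRecord("format-b", ("viewer-b",), 2021),
    FormatRecord("format-c", ("viewer-c",), 2030),
]

at_risk = formats_at_risk(registry, year=2019)
print(at_risk)
```

Formats returned by such a query would be candidates for a migration or emulation plan while they can still be rendered.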

Scale
Current activities are focused on e-publications based on PDF
Volume:
- 8 million e-publications currently
- 10 million e-publications eventually (based on current agreements)
Size:
- 1 e-publication equals 1 MB on average
- 1 Terabyte for every 1 million publications
Capacity:
- 5,000 – 50,000 e-publications ingested per day
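The figures above can be checked with a short calculation. This is a sketch using decimal units (1 TB = 1,000,000 MB); the function names are illustrative, and the ingest durations simply divide the stated volume by the stated daily capacity.

```python
MB_PER_TB = 1_000_000  # decimal units: 1 TB = 1,000,000 MB


def storage_tb(publications: int, avg_mb: float = 1.0) -> float:
    """Storage needed in TB for a number of publications of average size avg_mb."""
    return publications * avg_mb / MB_PER_TB


def days_to_ingest(publications: int, per_day: int) -> float:
    """Days needed to ingest a volume at a constant daily rate."""
    return publications / per_day


current_tb = storage_tb(8_000_000)       # 8 million publications at 1 MB each
eventual_tb = storage_tb(10_000_000)     # 10 million publications at 1 MB each
fastest = days_to_ingest(8_000_000, 50_000)
slowest = days_to_ingest(8_000_000, 5_000)

print(current_tb, eventual_tb)  # 8.0 10.0
print(fastest, slowest)         # 160.0 1600.0
```

At the stated rates, the current 8 million publications correspond to roughly 160 to 1,600 days of ingest at constant throughput.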

Requirements of a national digital repository
- A function for depositing collections and corresponding metadata
- A way of searching in deposited collections
- A request function for specific TIFF files or a complete collection of TIFF files
- A delivery function for the requested files
- A security system that should prevent the unwarranted use of collections by other institutions or third parties
- The service should be available on a daily basis and independent of the location of the institutions
- Development of a data model that facilitates long-term preservation and permanent access to digital images

Upcoming challenges
- Growth will increase dramatically with the introduction of new collections
Increasing capacity:
- Ingest of digitised master images
- Ingest of websites
- 10 TB to 500 TB
Implementation of research results:
- Characterisation
- Operational Preservation Manager
- Migration module
- Emulator

Digitization
The KB is involved in many national and international digitization projects:
- The Memory of the Netherlands
- Metamorfoze
- IMPACT: IMProving ACcess to Text
- Atlantic World: Dutch-American relationship since the 17th Century
- Atlas Van der Hagen and Atlas Beudeker
- Bibliotheca Universalis - Linschoten
- Bookbindings
- The Anglo-Dutch heritage
- Transatlantic Digital Library

Digitization
KB has looked into potential strategies for storing digitized material in its electronic deposit system.
Each digitized page results in:
- One high quality master image (TIFF or PNG or JPEG2000)
- One or more derived lower quality images (e.g. JPG for access)
- Multiple machine-readable text files (e.g. obtained through OCR)
- Descriptive metadata
- Technical metadata about the digitization process
- Structural metadata describing the layout of the page (e.g. separate articles)
In most cases, each digitized page is part of a larger whole:
- Most books and newspapers are multipage items
- This requires additional structural metadata to describe images as part of the whole

Digitization: open questions
- How to link and manage the different representations of the same object?
  - One physical representation, one digital master, multiple derived files, only one bibliographic description
  - MPEG-21 DIDL
- Which file format to choose?
  - TIFF is very inefficient for storage; JPEG2000 and PNG are more efficient
  - Certainly not JPG (lossy compression)
- Is lossless compression allowed to reduce storage volume?
  - At what level? Compressed file formats, zip/tar packages, storage hardware compression
- Which files constitute an AIP?
  - A single master image (not a correct representation of a multipage intellectual entity; relations to be resolved outside the AIP)
  - Full set of master images for each intellectual entity
  - Including OCR text
- Are access copies to be considered a new AIP obtained through migration?
  - Access copies vs. preservation copies
  - Storing the full set of access copies vs. generating access copies on the fly
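The question about lossless compression comes down to whether the original bit stream can be restored exactly. The Python sketch below illustrates that property with the standard `zlib` module; the byte string standing in for a master image is invented, and the slides do not state which compression method, if any, the KB chose.

```python
import hashlib
import zlib

# Stand-in for an uncompressed master image (illustrative bytes only).
master = bytes(range(256)) * 64

packed = zlib.compress(master, 9)   # lossless compression, highest level
restored = zlib.decompress(packed)  # must reproduce the original exactly

# Compare digests to confirm the round trip changed nothing.
same_bits = hashlib.sha256(restored).digest() == hashlib.sha256(master).digest()
print(len(master), len(packed), same_bits)
```

A lossy format such as JPG cannot pass this round-trip check, which is the reason it is ruled out for master images above.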

Component Business Model
Conceptual approach:
- A Business Component Map is a tabular view of the business components in the scope of interest (Source: Internal, 2004).
- Columns are Business Competencies, defined as large business areas with characteristic skills and capabilities, for example product development or supply chain.
- A Business Component is a part of an enterprise that has the potential to operate independently, in the extreme as a separate company, or as part of another company.
- An Accountability Level characterizes the scope and intent of activity and decision-making. The three levels used in CBM are Directing, Controlling and Executing:
  - Directing is about strategy, overall direction and policy.
  - Controlling is about monitoring, managing exceptions and tactical decision making.
  - Executing is about doing the work.
[Figure: example Business Component Map with the accountability levels Directing, Controlling and Executing. Labels recovered from the figure, in extraction order: Business Planning Business Unit Tracking Sales Management Credit Assessment Reconciliation Compliance Staff Appraisals Relationship Management Sector Management Product Management Production Administration Product Fulfillment Sales Marketing Campaigns Product Directory Credit Administration Customer Accounts General Ledger Document Management Customer Dialogue Contact Routing Staff Administration Business New Business Development Servicing & Sales Product Fulfilment Financial Control and Accounting Sector Planning Portfolio Planning Account Planning Sales Planning Fulfilment Planning]

KB's Component Business Model
The competencies are clustered around 5 business areas:
- Service Management includes all the competencies needed to deliver collection objects and associated services to the customers of the KB across the supported channels.
- Acquisitions Management involves the competencies needed to acquire, process and catalogue all publications, both for the research collection and the deposit collection.
- Preservation Management focuses on the competencies needed to facilitate access to the different collections over time, addressing decay or obsolete technology support associated with a collection.
- Business Management identifies all the competencies associated with the management of any business.
- IT Management relates to the competencies needed to manage the overall IT infrastructure.
KB differs from other organizations because its electronic deposit solution should be operational for hundreds of years without major interruptions.

KB's Component Business Model
51 competencies have been identified in order for the KB to operate

Digital Information Archiving System Architecture

Complexity and data types
Web sites:
- Dynamically generated pages
- Volatile external references
- No generally accepted notion of versioning
- Security hurdles on web sites
- Web server environment needed
Applications / CD-ROMs:
- Dependent on operating system and peripheral devices
- Require user interaction
- Potential dependencies on additional software: drivers, DLLs
Static data format:
- Only dependent upon viewer application
- Format migration will preserve digital object
- Autonomous digital object
[Figure: data types plotted against complexity]

Depending on the main focus, different implementation strategies can be identified:
Supplier focus:
- Quality assurance
- Automatic ingest
- Security (identification, authentication, authorization)
Consumer focus:
- Delivery channels
- Content metadata
- Security (identification, authentication, authorization)
Decompose into components:
- AIP composition
- SIP requirements
- Self-describing
- Content (bibliographical) metadata
- Preservation (technical) metadata
Preservation focus:
- Media preservation
- Technical metadata
- Migration tooling
- Emulation tooling

Information Packages are the building blocks of every electronic deposit
- Submission Information Package (SIP): the Information Package identified by the producer in the submission agreement with the OAIS.
- Archival Information Package (AIP): Content Information and the associated Preservation Description Information required to preserve the Content Information over the long term. This information includes the related Packaging Information.
- Dissemination Information Package (DIP): an Information Package that contains part or all of one or more AIPs and that is distributed to the consumer as requested.
[Figure: Information Package, consisting of Content Information and Preservation Description, with Descriptive Information about the Package]
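The relationship between the three package types can be sketched as data structures. This is an illustration of the definitions above only, not the DIAS or OAIS data model; every class, field and function name here is a hypothetical choice, and the sample files are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SIP:
    """Submission package as delivered by a producer (hypothetical fields)."""
    producer: str
    files: Dict[str, bytes]
    metadata: Dict[str, str]


@dataclass
class AIP:
    """Archival package: content plus preservation description and packaging info."""
    identifier: str
    content: Dict[str, bytes]
    preservation_description: Dict[str, str]
    packaging_info: Dict[str, str] = field(default_factory=dict)


@dataclass
class DIP:
    """Dissemination package: part or all of one or more AIPs for a consumer."""
    consumer: str
    files: Dict[str, bytes]
    source_aips: List[str]


def ingest(sip: SIP, identifier: str) -> AIP:
    """Turn a SIP into an AIP, carrying producer metadata along."""
    description = {"producer": sip.producer, **sip.metadata}
    packaging = {"file_count": str(len(sip.files))}
    return AIP(identifier, dict(sip.files), description, packaging)


def disseminate(aips: List[AIP], wanted: List[str], consumer: str) -> DIP:
    """Collect the requested files from the given AIPs into one DIP."""
    files: Dict[str, bytes] = {}
    sources: List[str] = []
    for aip in aips:
        selected = {name: data for name, data in aip.content.items() if name in wanted}
        if selected:
            files.update(selected)
            sources.append(aip.identifier)
    return DIP(consumer, files, sources)


# Illustrative data only.
sip = SIP("publisher-x", {"article.pdf": b"%PDF", "cover.tif": b"II*"}, {"title": "Example"})
aip = ingest(sip, "aip-0001")
dip = disseminate([aip], ["article.pdf"], "reader-1")
print(sorted(dip.files), dip.source_aips)
```

Here the DIP carries only part of one AIP, which matches the "part or all of one or more AIPs" wording of the definition.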

Producer Focus:
- Infrastructure to support different ingest streams (Batch Builder)
- Stable SIP and DIP definitions
- Alignment of QA processes between supplier and deposit organization
[Figure: functional model with the labels Ingest, Preservation, Data Management, Access, Archival Storage, Delivery & Capture, Packaging, Administration, Monitoring & Logging, Query, AIP, SIP, DIP]

Consumer Focus:
- Solid identification, authentication and authorization infrastructure
- Support for multi-channel delivery (Packaging & Delivery)
- Collection building (value-add of deposit organization)
[Figure: functional model with the labels Ingest, Preservation, Data Management, Access, Archival Storage, Delivery & Capture, Packaging, Administration, Monitoring & Logging, Query, AIP, SIP, DIP]

Preservation Focus:
- Management of technical metadata
- Strong focus on providing accessibility (identification, performing technical preservation, building rendering environment)
- Definition of authenticity criteria
[Figure: functional model with the labels Ingest, Preservation, Data Management, Access, Archival Storage, Delivery & Capture, Packaging, Administration, Monitoring & Logging, Query, AIP, SIP, DIP]

Typical DIAS implementation scenario
- Requirements analysis: the project starts with a requirements analysis and a fit-gap analysis to see how far DIAS-Core meets the requirements
- Application development: possible new application components have to be designed, built and tested
- Infrastructure design and implementation: definition and implementation in the customer's environment, conforming to the DIAS compliant infrastructure
- Delivery: through Fast Deploy, deliver the DIAS solution in the different environments (DTA)
- Acceptance: acceptance of the system by the customer

Interoperability
Interoperability will be key to the success of the next generation long-term electronic deposit systems

Reference Model for an "Open Archival Information System" – ISO 14721

IBM's DIAS is based on some OAIS modifications developed in the Networked European Deposit Library (EU project, 2000 – 2002)
- Delivery & Capture handles the pre-processing of digital objects to be ingested. It receives or captures digital objects and offers a working space for verification in conformance with the specifications for ingestion into the electronic deposit system.
- Packaging & Delivery is the output interface of the deposit system. It handles the post-processing of digital objects retrieved from the electronic deposit system. It negotiates access requests, delivers and installs electronic publications along with the appropriate software for viewing or running the electronic publication, and handles the metadata for direct access by the requestor.
[Figure: functional model with the labels Ingest, Preservation, Data Management, Access, Archival Storage, Delivery & Capture, Packaging, Administration, Monitoring & Logging, Query, AIP, SIP, DIP]

DIAS-Core: Technical Components
[Figure: component diagram. Labels recovered from the figure: ISIP, IDIP, Ingest, Access, Archival Storage, Preservation Planning, Administration, CM Resource Manager, TSM Storage Server, Loader, Retriever, Data Management, AccessManager, CM Library Server, Logging, Reporting, Preservation Manager, Monitoring and Control]

DIAS-Core: Application Architecture
[Figure: layered architecture with a Client Layer, Midtier Layer, Server Layer and Physical Storage Layer. Labels recovered from the figure, in extraction order: Admin Client Admin Access Manager Server Retriever Client Retriever DB2 CM LS CM RM1..n Tape Magnetic Disk SAN Loader TSM Optical Access Manager Logger Persistent Identifier Generator Monitoring & Control]

Content Manager, together with other members of the DB2, Tivoli and WebSphere product families, covers the majority of components in the e-Depot architecture
[Figure: e-Depot architecture mapped to products. Labels recovered from the figure: Archival Storage, Ingest, Preservation, Data Management, Access, Delivery & Capture, Packaging, Administration, External System, Monitoring, Content Manager, DB2, WebSphere Application Server, TSM]

Delivery & Capture and Packaging & Delivery
Java applications running on WebSphere Application Server take care of specialized DIAS functions and customer-specific requirements.
Ingest and Access:
- Implement SIP and DIP interfaces
- Translate and validate customer metadata to DIAS metadata
Delivery & Capture and Packaging & Delivery:
- Customer functions to create SIPs and extract DIPs
Customer functions for Preservation Manager:
- Implement functions to monitor file formats and support migration of file formats

References on long-term preservation
- Raymond Lorie and Raymond J. van Diessen: Long-Term Preservation of Complex Processes, in: IS&T Archiving Conference, Washington, DC, April 26-29, 2005.
- J.R. van der Hoeven, R.J. van Diessen and K. van der Meer: Development of a Universal Virtual Computer (UVC) for long-term preservation of digital objects, Journal of Information Science, vol. 31(3), p. 196-208, 2005.
- Raymond van Diessen and Raymond Lorie: UVC: A Universal Computer for Long-Term Preservation of Digital Information, RJ 10338, IBM Almaden Research Center, San Jose, CA, 2005.
- Erik Oltmans, Raymond J. van Diessen and Hilde van Wijngaarden: Preservation Functionality in a Digital Archive, in: ACM/IEEE Joint Conference on Digital Libraries, Tucson, AZ, June 7-11, 2004.
- Raymond J. van Diessen and Titia van der Werf-Davelaar: Authenticity in a Digital Environment, in: IBM / KB Long-term Preservation Study Report Series, IBM Global Services Netherlands, 2002, ISBN/ISSN: 90-6259-155-8.
- Raymond J. van Diessen: Preservation Requirements in a Deposit System, in: IBM / KB Long-term Preservation Study Report Series, IBM Global Services Netherlands, 2002, ISBN/ISSN: 90-6259-156-6.
- Raymond J. van Diessen and Ben J. van Rijnsoever: Managing Media Migration in a Deposit System, in: IBM / KB Long-term Preservation Study Report Series, IBM Global Services Netherlands, 2002, ISBN/ISSN: 90-6259-158-2.
- Raymond J. van Diessen and Johan F. Steenbakkers: The Long-Term Preservation Study of the DNEP Project - an Overview of the Results, in: IBM / KB Long-term Preservation Study Report Series, IBM Global Services Netherlands, 2002, ISBN/ISSN: 90-6259-154-X.

Important sites related to DIAS and IBM's long-term preservation effort:
- DIAS Solution: http://www.ibm.com/nl/dias/
- IBM Alphaworks UVC: http://www.alphaworks.ibm.com/tech/uvc
- KB: http://www.kb.nl/site/sitemap-en.html
- Kopal: http://kopal.langzeitarchivierung.de/