A hybrid approach of digital long term preservation to institutional repositories - A case study of DSpace/SRB Integration Ya-ning Arthur Chen, Feng-chien Chung Computing Centre, Academia Sinica 11 April, ISGC 2008
Outline Background of MAAT From Website to Institutional Repository Long Term Preservation & OAIS The Hybrid Approach Future
MAAT – Background The Metadata Architecture & Application Team (MAAT) was established in 2002 to conduct metadata research and provide supporting services for the National Digital Archives Program (NDAP) in Taiwan. To date, MAAT has supported over 80 digital library projects of the Taiwan E-Learning & Digital Archive Program (TELDAP, formerly NDAP).
MAAT – Motivation A number of documents have been created and can be categorized into –questionnaires, –work sheets, –meeting records, –metadata mapping tables, –system specifications, –best practices of metadata standards, –technical reports, –research papers, –briefings, and –tutorial materials. Most documents of the MAAT website are arranged in a static manner.
MAAT Website Academia Sinica
MAAT - Consideration 1 Document management and repository –over 1,000 documents and URL links have been arranged and served at the MAAT website. –the MAAT website needs an effective system of document management. Access control –The MAAT website still lacks access control for document access.
MAAT - Consideration 2 Workflow reengineering –The MAAT website adopts a centralized model to maintain documents and website arrangement. –This model is complicated and labor-intensive, and the overhead cost is very high. Usage Statistics Report
MAAT - Challenge Too many publications, Too much change (that is, many document versions), Too many contributors, and Too many institutions.
Implementation Level Static Website Institutional Repository Phase 1: from website to IR
DSpace - feature Captures –Digital research material in any format –Directly from creators (e.g. faculty) –Large-scale, stable, managed long-term storage Describes –Descriptive metadata (Dublin Core) –Technical metadata (file size, format…) –Rights metadata (licenses, creative commons…) Distributes –Via WWW, with necessary access control Preserves –Persistent ID and Handle –Bitstream format registry
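A minimal sketch of what a descriptive Dublin Core record could look like, built with the Python standard library. The element namespace is the standard Dublin Core one; the `record` wrapper element, the `build_dc_record` helper, and the sample values are assumptions made for illustration, not DSpace's internal representation.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields):
    """Build a flat XML record from (element, value) pairs (hypothetical helper)."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # wrapper element name is an assumption
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Sample values are invented for illustration only.
record = build_dc_record([
    ("title", "Metadata mapping table (example)"),
    ("creator", "MAAT"),
    ("type", "Technical report"),
    ("format", "application/pdf"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Repeating an element (for example two `creator` entries) only requires listing it twice, which is why the helper takes pairs rather than a dictionary.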
DSpace - Data Model
MAAT – Content 1 Content Type – 支援計畫 (Documents from the Projects we support) – 出版與活動 (Documents of Publication and Activity) – 計畫管理 (Project Management related – restricted documents) – 研究發展 (Research & Development - restricted documents) –48 communities, 110 collections, 783 items Document Format –User uploaded: 794 PDF files, 446 MS Word files, 59 MS PowerPoint slides, 27 XML files, 17 JPEG images, 16 HTML files, 7 MS Excel files… and others –System generated: over 1,900 plain text files (mainly DSpace license files)…
MAAT – Content 2 Access Method –DSpace user browse and search interface –Search engines (Google, Yahoo, etc.) –OAI-PMH harvesting
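A harvester reaches a repository through OAI-PMH by sending HTTP requests with a `verb` parameter. The sketch below only builds a `ListRecords` request URL; `ListRecords`, `metadataPrefix`, `set`, and the `oai_dc` prefix come from the OAI-PMH specification, while the base URL and the function name are placeholders, not the MAAT endpoint.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# The base URL is a placeholder; a real repository publishes its own.
url = oai_list_records_url("https://repository.example.org/oai/request")
print(url)
```

The `oai_dc` prefix is the one format every OAI-PMH repository is required to support, which makes it a reasonable default for a first harvest.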
MAAT DSpace
DSpace - Consideration The Need for Extending DSpace Storage Capabilities –The number of documents grows so fast that a large-capacity storage solution is required The Lack of a Risk Management Mechanism –Reliable backup and disaster recovery systems are not included in the default DSpace installation
Implementation Level Static Website Institutional Repository Phase 1: from website to IR Institutional Repository + Grid Phase 2: from IR to Long Term Preservation
DSpace/SRB Approach 1 In 2004, NARA (with NSF/NPACI) funded a project aimed at integrating DSpace and SRB to –allow DSpace to use the data grid as a storage layer –permit the exchange of authentic documents between them NARA Proposal & Participants –San Diego Supercomputer Center (SDSC), a member of the National Partnership for Advanced Computational Infrastructure (NPACI), an NSF-sponsored program –MIT Libraries –UC San Diego Libraries (UCSD) –Hewlett Packard Laboratories (HP) –National Archives and Records Administration (NARA)
DSpace/SRB Approach 2 DSpace can have multiple bitstream stores, and each store can be either traditional (file system) storage or SRB storage. Both kinds of store are specified by configuration parameters, and both are configured in dspace.cfg.
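The idea of numbered stores of two kinds, selected by configuration, can be sketched as follows. This is a conceptual model only: the class names, the dictionary layout, and the `incoming` setting are assumptions for illustration and are not the actual dspace.cfg keys or DSpace code, and `GridStoreStub` keeps data in memory instead of talking to an SRB server.

```python
import os
import tempfile


class LocalStore:
    """Traditional storage: each bitstream is a file under one directory."""

    def __init__(self, directory):
        self.directory = directory

    def put(self, name, data):
        path = os.path.join(self.directory, name)
        with open(path, "wb") as handle:
            handle.write(data)
        return path


class GridStoreStub:
    """Stand-in for a data grid store; holds bytes in memory only."""

    def __init__(self, collection):
        self.collection = collection
        self.objects = {}

    def put(self, name, data):
        location = f"{self.collection}/{name}"
        self.objects[location] = data
        return location


def build_stores(config):
    """Create one store object per numbered configuration entry."""
    stores = {}
    for number, entry in config.items():
        if entry["type"] == "local":
            stores[number] = LocalStore(entry["dir"])
        elif entry["type"] == "grid":
            stores[number] = GridStoreStub(entry["collection"])
        else:
            raise ValueError(f"unknown store type: {entry['type']}")
    return stores


# Hypothetical configuration: store 0 is local, store 1 is on the grid.
config = {
    0: {"type": "local", "dir": tempfile.mkdtemp()},
    1: {"type": "grid", "collection": "/examplezone/home/dspace"},
}
stores = build_stores(config)
incoming = 1  # assumed setting: which store receives new bitstreams
location = stores[incoming].put("bitstream-0001", b"example bytes")
print(location)
```

Because callers only use `put` on whichever store number is selected, switching new deposits from local disk to the grid is a configuration change rather than a code change.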
Examination of DSpace/SRB An Open Archival Information System (OAIS) is an archive intended to preserve information for access and use by a Designated Community
OAIS Functional Model
Workflow
OAIS Functional Model…Again (diagram labels mapped onto the OAIS model: DSpace & SRB Administration; DSpace RDBMS & SRB MCAT; DSpace Submit Interface; DSpace User Interface; SRB Mass Storage; DSpace Ingest; DSpace Batch Import)
Producer, Management and Consumer Producer –DSpace may ingest the SIP from the producer and generate the AIP for management and storage Management –SRB may receive the AIP, store and manage the data, and supply the AIP for access Consumer –DSpace may process the access request and generate the proper DIP for dissemination (diagram labels: DSpace RDBMS & SRB MCAT; DSpace Submit Interface; DSpace User Interface; SRB Mass Storage; DSpace Ingest; DSpace Batch Import; SIP, AIP, DIP)
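The SIP to AIP to DIP flow described above can be sketched as three small steps. This is an illustration of the OAIS package roles only: the field names, the in-memory `archive` dictionary standing in for SRB storage, the identifier, and the SHA-256 fixity check are all assumptions, not how DSpace or SRB implement them.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package received from the producer."""
    title: str
    content: bytes


@dataclass
class AIP:
    """Archival Information Package kept in storage."""
    identifier: str
    title: str
    content: bytes
    checksum: str


@dataclass
class DIP:
    """Dissemination Information Package delivered to the consumer."""
    identifier: str
    title: str
    content: bytes


def ingest(sip, identifier):
    """Ingest step: turn a SIP into an AIP with fixity information."""
    checksum = hashlib.sha256(sip.content).hexdigest()
    return AIP(identifier, sip.title, sip.content, checksum)


def store(archive, aip):
    """Storage step: keep the AIP under its identifier."""
    archive[aip.identifier] = aip


def disseminate(archive, identifier):
    """Access step: verify the stored AIP and derive a DIP from it."""
    aip = archive[identifier]
    if hashlib.sha256(aip.content).hexdigest() != aip.checksum:
        raise ValueError("fixity check failed")
    return DIP(aip.identifier, aip.title, aip.content)


archive = {}  # in-memory stand-in for the storage layer
aip = ingest(SIP("Meeting record (example)", b"minutes"), "example/1")
store(archive, aip)
dip = disseminate(archive, "example/1")
print(dip.title)
```

Keeping the checksum inside the AIP lets the access step detect silent corruption before anything is handed to a consumer.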
Archives arrangement Logical Archives structure: –DSpace allows multi-level communities and one level of collections –Archival principles: principle of provenance, principle of respect des fonds Physical Files Arrangement: –SRB Mass Storage Technology
Future 1 Best Practice & SOP for DSpace/SRB integration Deeper Check Against Activities of OAIS Preservation Planning and policy –Monitor Producer/Management/Consumer’s service requirements and emerging technology, develop archival strategy & migration plan
Future 2 Feasibility Evaluation –Migrate from SRB to other advanced technologies, such as SRM, iRODS… –Adopt a metadata approach to enhance digital preservation, such as PREMIS and METS (e.g. structural map, behavior section…)
Thank You