Robust Technologies for Automated Ingestion and Long-Term Preservation of Digital Information. PI: Joseph JaJa. Co-PIs: Allison Druin and Doug Oard.


Robust Technologies for Automated Ingestion and Long-Term Preservation of Digital Information
PI: Joseph JaJa
Co-PIs: Allison Druin and Doug Oard
Major Collaborators: Library of Congress, National Archives, Shoah Visual History Foundation, ICDL, SDSC, Georgia Tech, SLAC

Scientific Research Objectives
- Development of tools and technologies for automated ingestion and management of preservation processes.
- Evaluation and demonstration of tools on widely different collections.
- Overall layered architecture based on distributed repositories using open standards, web technologies, and data grid technologies.
- Overall approach captures all essential elements of the Open Archival Information System (OAIS) Reference Model.

Accomplishments
- Development of a Global Digital Format Registry prototype based on scalable and secure web technologies: FOCUS (FOrmat CUration Service).
- Automated ingestion tools and testing on the ICDL collection: ICDL Book Builder.
- Preliminary design of policy-driven integrity auditing for distributed archives.
- Detailed design of a deep archive based on erasure-resilient codes.

Format Obsolescence
- Handling of digital formats is an essential part of long-term preservation.
- Preservation of any object must include ways to render the object and, if necessary, transform it.
- What needs to be preserved:
  - The different essential aspects of objects.
  - Tools for capturing the essential format characteristics of information stored as digital objects.

FOrmat CUration Service
- Maintains persistent information on digital formats and on the applications used to access and manipulate them.
- Accessible either directly through LDAP or indirectly through SOAP (Web Services).
[Diagram: Web Service Agent and Format Registry, with access paths labeled LDAP and SOAP]

FOCUS on LDAP/SOAP
- Interoperability: LDAP and SOAP provide standard, platform-independent models and protocols.
- Scalability: LDAP is a proven scalable technology. The LDAP schema can be extended and the server can be replicated with ease. The SOAP server side can be extended without affecting clients.
- Security: SOAP can run on top of SSL (HTTPS). LDAP also provides its own secure authentication and authorization methods.

FOCUS Data Model
dc=umiacs, dc=umd, dc=edu
  ou=Format-Registry
    ou=Applications
      Adobe Acrobat v6.0
      Adobe Photoshop v7.0
      Jhove 1.0
    ou=Formats
      Adobe PDF v1.4
      CompuServe GIF 1989a
      JPEG Image Format 2000
Properties recorded for entries (two groups on the slide):
- General descriptive properties. Processing: rendering, editing, conversion, and validation services/systems.
- General descriptive properties. Processing: format taken as input and/or output.
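The directory layout above can be sketched in a few lines of Python. This is only an illustration of how entries might be addressed, not the FOCUS implementation: the slide does not say which naming attribute the entries use, so the `cn=` prefix, the `REGISTRY` dictionary, and the helper names are all assumptions. The escaping is a simplified version of the RFC 4514 rules for distinguished names.

```python
# Hypothetical sketch of the registry tree shown on the slide.
# The "cn" naming attribute is an assumption; the slide does not name it.
BASE_DN = "dc=umiacs,dc=umd,dc=edu"

REGISTRY = {
    "Applications": ["Adobe Acrobat v6.0", "Adobe Photoshop v7.0", "Jhove 1.0"],
    "Formats": ["Adobe PDF v1.4", "CompuServe GIF 1989a", "JPEG Image Format 2000"],
}


def escape_rdn_value(value: str) -> str:
    """Escape characters that are special inside a DN value (simplified RFC 4514)."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in '"+,;<>\\':
            out.append("\\" + ch)
        elif ch == "#" and i == 0:
            out.append("\\#")
        elif ch == " " and i in (0, last):
            out.append("\\ ")
        else:
            out.append(ch)
    return "".join(out)


def entry_dn(branch: str, name: str) -> str:
    """Build the distinguished name of a registry entry under ou=Format-Registry."""
    if name not in REGISTRY[branch]:
        raise KeyError(f"{name!r} is not registered under {branch}")
    return f"cn={escape_rdn_value(name)},ou={branch},ou=Format-Registry,{BASE_DN}"


if __name__ == "__main__":
    print(entry_dn("Formats", "Adobe PDF v1.4"))
```

A client that talks to the registry directly over LDAP would use a name like this as the search base or entry name. A client going through SOAP would never see it.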

FOCUS Service Model
[Diagram: Web Service Agent, Format Registry, and the Identification, Validation, Rendering, and Conversion Services]
- Identification Service: identifies the format of a specific digital object (DO) using its internal signature.
- Validation Service: determines a verification service to verify the format of a specific DO.
- Rendering Service: identifies current rendering conditions for a specific digital format.
- Conversion Service: locates transformation services to convert a DO from its source format to the format of interest.
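Two of these services lend themselves to a short sketch: identification by internal signature, and locating conversion services. The Python below is a minimal illustration under stated assumptions, not the FOCUS code. The signature table uses the well-known leading bytes of PDF 1.4, GIF89a, and JPEG 2000 (JP2) files. The `CONVERTERS` table, its service names, and the idea of chaining several conversions with a breadth-first search are hypothetical.

```python
from collections import deque

# Leading-byte signatures for the three formats named in the data model.
SIGNATURES = [
    (b"%PDF-1.4", "Adobe PDF v1.4"),
    (b"GIF89a", "CompuServe GIF 1989a"),
    (bytes.fromhex("0000000c6a5020200d0a870a"), "JPEG Image Format 2000"),
]

# Hypothetical registry of conversion services: (source, target) -> service name.
CONVERTERS = {
    ("TIFF", "JPEG2000"): "tiff2jp2-service",
    ("JPEG2000", "PDF"): "jp2-to-pdf-service",
}


def identify(data: bytes):
    """Return the registry name of the format whose signature starts the data."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def find_conversion_path(source: str, target: str):
    """Breadth-first search for the shortest chain of conversion services.

    Returns a list of service names (empty if source == target),
    or None when no chain of registered services reaches the target.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for (src, dst), service in CONVERTERS.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [service]))
    return None


if __name__ == "__main__":
    print(identify(b"GIF89a" + bytes(10)))
    print(find_conversion_path("TIFF", "PDF"))
```

Real signature matching is usually more involved (offsets, multiple patterns per format), so the prefix test here should be read as the simplest possible case.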

International Children’s Digital Library (ICDL)
- Joint project between UMD and the Internet Archive, funded by NSF and IMLS (Allison Druin).
- Goal: efficient search, browsing, and reading of a collection of 10,000 books in 100 languages.
- Current holdings: almost 1,000 books in over 30 languages, with innovative book readers and browsing tools.
- Books are digitized in TIFF format, and each page of each book is processed into 6 sizes of JPEG2000.

Producer – Archive Workflow Network (PAWN)
- Distributed and secure ingestion of digital objects into the archive.
- Use of platform-independent web/grid technologies.
- Ease of integration with data grids or digital libraries.
- XML representation of metadata and bitstream.
- Self-describing bitstream submissions.
- Accountability of transfer and guarantee of data integrity.
- Currently being used to ingest SLAC data into the National Archives.
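The ideas of a self-describing submission and a guarantee of data integrity can be illustrated with a small Python sketch. The slides do not show PAWN's actual XML layout or which checksum it uses, so the element names (`submission`, `metadata`, `field`, `bitstream`, `digest`) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_submission(name: str, payload: bytes, metadata: dict) -> str:
    """Producer side: describe a bitstream in XML, including size and digest.

    The element and attribute names are hypothetical, not PAWN's schema.
    """
    root = ET.Element("submission")
    meta = ET.SubElement(root, "metadata")
    for key, value in metadata.items():
        ET.SubElement(meta, "field", name=key).text = value
    bitstream = ET.SubElement(root, "bitstream", name=name, size=str(len(payload)))
    digest = ET.SubElement(bitstream, "digest", algorithm="SHA-256")
    digest.text = hashlib.sha256(payload).hexdigest()
    return ET.tostring(root, encoding="unicode")


def verify_submission(manifest_xml: str, received: bytes) -> bool:
    """Archive side: recompute size and digest and compare with the manifest."""
    bitstream = ET.fromstring(manifest_xml).find("bitstream")
    expected_size = int(bitstream.get("size"))
    expected_digest = bitstream.findtext("digest")
    return (
        len(received) == expected_size
        and hashlib.sha256(received).hexdigest() == expected_digest
    )


if __name__ == "__main__":
    manifest = build_submission("page1.tif", b"example bytes", {"title": "Example"})
    print(verify_submission(manifest, b"example bytes"))
```

Because the manifest travels with the data, the receiving side can check every transfer without consulting the producer again, which is one way to provide the accountability the slide mentions.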

More About PAWN
[Diagram: Producer 1, Producer 2, Producer n; Scheduler; Bitstream; Validation Service; Digital Archive]

ICDL Book Builder
- Purpose: archive the digital book collection of the ICDL (International Children’s Digital Library).
- Builder allows users to:
  - Select books from the ICDL
  - Map metadata from the ICDL database
  - Create Submission Information Packages (SIPs) and transfer them into PAWN

ICDL Ingestion Steps
5-step process:
1. User queries the ICDL database under given criteria (e.g., book ID, title, number of pages).
2. Select books to ingest.
3. Choose a mapping of ICDL metadata into Dublin Core.
4. Download book contents and create the SIP.
5. Send the packet to PAWN.
PAWN then transfers the SIP to the archive.
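The five steps can be sketched as a small Python pipeline. Everything concrete here is a placeholder: the in-memory `ICDL_DB` records, the source field names, the `DC_MAPPING` table, and the `fetch_pages` and `send` callables stand in for the real ICDL database, the user's mapping choice, the download step, and the hand-off to PAWN.

```python
# Hypothetical stand-in for the ICDL database; the records are placeholders.
ICDL_DB = [
    {"book_id": 1, "book_title": "Example Book A", "lang": "en", "pages": 24},
    {"book_id": 2, "book_title": "Example Book B", "lang": "fr", "pages": 32},
]

# Step 3: one possible mapping from (assumed) ICDL fields to Dublin Core elements.
DC_MAPPING = {"book_id": "identifier", "book_title": "title", "lang": "language"}


def query_books(db, **criteria):
    """Step 1: return the records matching every given criterion."""
    return [b for b in db if all(b.get(k) == v for k, v in criteria.items())]


def map_to_dublin_core(book, mapping):
    """Step 3: rename source fields to Dublin Core element names."""
    return {dc: str(book[src]) for src, dc in mapping.items() if src in book}


def create_sip(book, mapping, fetch_pages):
    """Step 4: bundle mapped metadata with the downloaded content."""
    return {
        "descriptive": map_to_dublin_core(book, mapping),
        "content": fetch_pages(book),
    }


def ingest(db, criteria, selected_ids, mapping, fetch_pages, send):
    """Run steps 1 to 5 and return the number of SIPs sent."""
    candidates = query_books(db, **criteria)                             # step 1
    selected = [b for b in candidates if b["book_id"] in selected_ids]  # step 2
    for book in selected:
        send(create_sip(book, mapping, fetch_pages))                    # steps 3-5
    return len(selected)


if __name__ == "__main__":
    outbox = []
    fake_fetch = lambda book: [b"page"] * book["pages"]
    print(ingest(ICDL_DB, {"lang": "en"}, {1}, DC_MAPPING, fake_fetch, outbox.append))
```

Passing `fetch_pages` and `send` in as parameters keeps the sketch independent of any particular storage or transport, which is a design choice of this example rather than something the slides describe.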

Submission Information Package (SIP)
- METS handles all areas of a SIP except Physical Object and Descriptive Information.
- Descriptive Information can be embedded into METS as a third-party XML schema.
- The submission agreement constrains how a SIP is structured and described.
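Embedding descriptive information in METS as a third-party schema can be sketched with Python's standard XML library. The example wraps Dublin Core elements inside a METS `dmdSec`/`mdWrap`/`xmlData` section and lists the content files in a `fileSec` and `structMap`. The function name, the IDs, and the sample field values are assumptions, and the output is a minimal skeleton, not the SIP profile used by the project.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(dc_fields: dict, file_names: list) -> str:
    """Build a minimal METS document with Dublin Core embedded in a dmdSec."""
    root = ET.Element(mets("mets"))

    # Descriptive metadata: Dublin Core wrapped as third-party XML.
    dmd = ET.SubElement(root, mets("dmdSec"), ID="dmd1")
    wrap = ET.SubElement(dmd, mets("mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, mets("xmlData"))
    for name, value in dc_fields.items():
        ET.SubElement(xml_data, f"{{{DC_NS}}}{name}").text = value

    # File inventory and a flat structural map pointing at each file.
    group = ET.SubElement(ET.SubElement(root, mets("fileSec")), mets("fileGrp"))
    div = ET.SubElement(ET.SubElement(root, mets("structMap")), mets("div"), DMDID="dmd1")
    for i, file_name in enumerate(file_names, start=1):
        file_id = f"file{i}"
        entry = ET.SubElement(group, mets("file"), ID=file_id)
        ET.SubElement(
            entry,
            mets("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_name},
        )
        ET.SubElement(div, mets("fptr"), FILEID=file_id)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_mets({"title": "Example Book A", "language": "en"}, ["page1.jp2"]))
```

What the submission agreement would add on top of this skeleton (required fields, naming rules, file groupings) is not specified in the slides, so it is left out.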

Collaboration Examples and Success Stories
- A prototype Producer-Archive Workflow Network (PAWN) is currently being used to ingest the SLAC collection at NARA-II.
- Several parties have expressed interest in collaborating with us to further develop the design and implementation of the Global Digital Format Registry.
- Designed and built a “grid brick” for NARA-I, which is currently in use for demonstrating the distributed pilot persistent archive linking UMD, SDSC, NARA-II, and Georgia Tech.

Broad Impact
- A workshop organized by R. Moore, J. JaJa, and A. Rajasekar to assess the suitability of the SRB for long-term preservation was held on Dec 8-9. Over 70 people from the archiving, digital library, and grid communities participated.
- Interactions with NDIIPP partners, NARA partners, Don Sawyer’s group at NASA Goddard, etc.

Challenges
- We have to work with constantly changing requirements and assumptions: most of the non-technical issues are still open-ended, in addition to the core problem of dealing with technology evolution.
- Graduate students would rather work in core disciplines.
- Open-ended research issues: there is no rigorous methodology for distinguishing between different approaches, and no clear way to measure progress.