Archive Ingest and Handling Test: ODU’s Perspective Michael L. Nelson Department of Computer Science Old Dominion University


Archive Ingest and Handling Test: ODU’s Perspective Michael L. Nelson Department of Computer Science Old Dominion University NDIIPP Partners Meeting, Airlie House, VA, July

Fortress Model
Five Easy Steps for Preservation:
1. Get a lot of $
2. Buy a lot of disks, machines, tapes, etc.
3. Hire an army of staff
4. Load a small amount of data
5. “Look upon my archive ye Mighty, and despair!”
image from:

ODU’s Research Goals
We’re in the CS department, not the library
  – Less infrastructure (bad)
  – More freedom (good)
Interested in repository/object interaction
  – Long-range vision: repositories fade away; objects are responsible for their own preservation
  – Could we accomplish this with our “bucket” technology?
Significant questions about archive granularity
Transition to MPEG-21 Digital Item Declaration Language (DIDL) based buckets
New models for digital preservation?

Buckets
Buckets: self-contained, web-accessible objects
  – Grew out of research for serving NASA documents, esp. NACA Reports
  – Implicit assumptions:
      1 bucket = 1 logical item (N physical items)
      Display is for human use
      Bucket contents are DOM-parsable

Which Interface?
Display based on web use
Display based on archival use

Bucket / MPEG-21 Model
[Figure labels: MPEG-21 DIDL, Payload, Bucket Infrastructure, methods, logs, support libraries]

MPEG-21 DIDL
A generic, powerful complex object metadata format
  – Based on an abstract data model
  – Semantics separated from syntax
      i.e. the tags don’t mean anything; a little disconcerting at first glance
  – Digital library use championed by LANL

MPEG-21 DIDL Data Model
How to encode Archive?
  – 1 file = 1 DID
  – 1 archive = 1 container
  – 1 archive = 1 component
  – 1 file = 1 component

File = 1 Component
8 file archive for demo purposes…
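The one-Component-per-file encoding can be sketched roughly as follows. This is a minimal illustration, not the ODU implementation: it assumes the whole archive maps to a single enclosing Item (the slides do not say which enclosing element was used), and the function name, MIME types, and `ref` values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI commonly seen on MPEG-21 DIDL documents (assumed here).
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"


def build_archive_didl(file_refs):
    """Build a DIDL tree with one Component per file in the archive.

    file_refs is a list of (mime_type, ref) pairs. Wrapping all Components
    in a single Item is an assumption made for this sketch.
    """
    ET.register_namespace("didl", DIDL_NS)
    root = ET.Element(f"{{{DIDL_NS}}}DIDL")
    item = ET.SubElement(root, f"{{{DIDL_NS}}}Item")
    for mime_type, ref in file_refs:
        # One Component per file, each holding a single Resource by reference.
        component = ET.SubElement(item, f"{{{DIDL_NS}}}Component")
        ET.SubElement(
            component,
            f"{{{DIDL_NS}}}Resource",
            {"mimeType": mime_type, "ref": ref},
        )
    return root
```

Calling `ET.tostring(build_archive_didl([...]), encoding="unicode")` serializes the tree for inspection.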

Looking Inside the Archive

Looking at a Single File…

Design Decisions: File Storage
Store each file as a
  – Big: each file is base64’d into the DIDL
  – Small: each file is ref’d from the DIDL to a directory
Filename = MD5 hash of the original file name (not contents!) + a version number
Example:
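The naming rule and the two storage variants can be sketched as follows. This is a minimal illustration rather than the slide's original example: the `.` separator between hash and version, the UTF-8 encoding of the name, and both function names are assumptions.

```python
import base64
import hashlib


def stored_filename(original_name: str, version: int) -> str:
    """Return MD5(original file *name*) plus a version number.

    The hash covers the name, not the file contents. The "." separator
    and UTF-8 encoding are assumptions made for this sketch.
    """
    digest = hashlib.md5(original_name.encode("utf-8")).hexdigest()
    return f"{digest}.{version}"


def inline_payload(content: bytes) -> str:
    """Base64 text for embedding a file by value (the "Big" variant)."""
    return base64.b64encode(content).decode("ascii")
```

Because the hash is taken over the name, a new version of the same file keeps the same hash prefix and differs only in the version suffix.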

Archive Sizes

Design Decisions: Ingestion
For every program/process to apply to a file, create a corresponding
  – JHOVE
  – Unix “file”
  – Fred URI
  – MD5 of file contents
Expandable, scriptable list of metadata extraction / analysis programs
Ingestion is parallelized over a workstation cluster
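An expandable list of extraction programs applied to every file might look roughly like this. It is a sketch under stated assumptions: the registry layout and function names are hypothetical, a local thread pool stands in for the workstation cluster, and only two pure-Python extractors are included so the example runs without JHOVE or the Unix `file` command installed.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def md5_of_contents(path):
    """MD5 of the file contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def size_in_bytes(path):
    """File size as a string (a stand-in extractor, not from the slides)."""
    return str(os.path.getsize(path))


# Hypothetical registry: add an entry to run another extractor on every file.
EXTRACTORS = {"md5": md5_of_contents, "size": size_in_bytes}


def ingest_file(path, extractors=EXTRACTORS):
    """Run every registered extractor and collect one result per program."""
    return {name: extract(path) for name, extract in extractors.items()}


def ingest_all(paths, workers=4):
    """Ingest files in parallel; threads here stand in for a cluster."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(ingest_file, paths)))
```

Keeping the extractors in a plain dictionary means a new analysis program is added by registering one more callable, without touching the ingest loop.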

Example Output: MD5
perl/Digest::M D a1bcd2b e7cf05f36066d4cdc9cf

Conversion: AVI -> VOB
Investigated PDF -> SVG, but tools were not mature
Selected “transcode” for AVI -> VOB conversion
Also implemented ImageMagick-based rules for standard graphics conversion

Conversion: Linking Old to New
If the previous version of the Resource was specified as:
then the new version of the resource is specified as:

Harvard Ingest
Harvard’s model was the most similar to our MPEG-21 model
Ingesting from another archive is (roughly) the same as initial ingest
  – Save any metadata that was delivered in the original METS file as a
      We don’t trust it, but it might be useful for future forensics
  – Re-ingest in the normal way
Our export is part of the bucket API:
External Metadata image/jpeg Canon Canon EOS D

“In Vivo” Preservation
As part of the ingest process, we looked for copies of the ingested web page in the “living web”
  – Idea: find all replicated / similar pages and maintain pointers to them
  – Problem: We could find related documents, but finding copies was difficult
      Term Frequency (TF) – easy to compute
      Inverse Document Frequency (IDF) – difficult to compute
  – Solution: lexical signatures, Phelps & Wilensky:
  – Spinoff research:
      Terry Harrison’s MS thesis
      Frank McCown’s Ph.D. dissertation
      Joan Smith’s Ph.D. dissertation
      NSF proposal on “in vivo” preservation
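A lexical signature in this sense is a handful of terms that score highest on TF-IDF for a page. The sketch below is a generic TF-IDF ranking, not the exact Phelps & Wilensky procedure or the ODU code. The document-frequency table and corpus size are hypothetical inputs, which is exactly the part the slide calls difficult to compute for the live web, and unseen terms are assumed to have a document frequency of 1.

```python
import math
import re
from collections import Counter


def lexical_signature(text, doc_freq, corpus_size, k=5):
    """Return the k terms with the highest TF-IDF score for one page.

    doc_freq maps a term to the number of corpus documents containing it.
    Terms missing from doc_freq are treated as appearing in one document
    (an assumption made for this sketch).
    """
    terms = re.findall(r"[a-z]+", text.lower())
    tf = Counter(terms)

    def score(term):
        df = doc_freq.get(term, 1)
        return tf[term] * math.log(corpus_size / df)

    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(tf, key=lambda term: (-score(term), term))
    return ranked[:k]
```

The resulting terms can be submitted as a search-engine query to look for copies of the page elsewhere on the web.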

The DIP is the TMD*
Using METS or MPEG-21, there is no need for a separate transfer metadata format
METS & MPEG-21 can be the lumps of XML exchanged between harvesters & repositories
  – sompel/12vandesompel.html
Web servers can be made to automatically expose their contents via OAI-PMH
* Eat your heart out, Marshall McLuhan
Figure 1, Bekaert & Van de Sompel
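From the harvester's side, asking a repository for its complex-object records is a plain HTTP request. The sketch below only builds an OAI-PMH `ListRecords` URL. The base URL and the metadata prefix value are placeholders, since the prefix a given repository offers for METS or DIDL records varies and is not stated in the slides.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    verb, metadataPrefix, set, and from are standard OAI-PMH arguments;
    the values passed in are up to the caller.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"
```

For example, `list_records_url("http://example.org/oai", "oai_dc")` requests Dublin Core records from a hypothetical endpoint; a repository that exposes METS or DIDL would be queried with whatever prefix its `ListMetadataFormats` response advertises.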