JHOVE2 A Next-Generation Architecture for Format-Aware Preservation Processing Stephen Abrams Harvard University Evan Owens Portico Tom Cramer Stanford.

Presentation transcript:

JHOVE2
A Next-Generation Architecture for Format-Aware Preservation Processing
Stephen Abrams, Harvard University
Evan Owens, Portico
Tom Cramer, Stanford University
Digital Library Federation Fall Forum
Philadelphia, November 5-7, 2007

JHOVE2 project
Two-year NDIIPP-funded collaborative project to develop a “next generation” architecture for format-aware preservation processing
 – Harvard University: Stephen Abrams, Gary McGath, Robin Wendler
 – Portico: Evan Owens, John Meyer, Sheila Morrissey
 – Stanford University: Tom Cramer, Richard Anderson, Hannah Frost, Rachel Gollub, Nancy Hoebelheinrich, Keith Johnson
Open source
 – Educational Community License (ECL)
 – SourceForge

JHOVE2 project goals
Refactor the existing architecture
 – Rectify known inefficiencies and idiosyncrasies
 – Simplify the process of integration
 – Encourage third-party extensions
Provide enhancements
 – Separate identification from validation
 – Standardized error handling
 – Standardized handling of validation profiles
 – Standardized reporting using METS, with XSL transform
 – More sophisticated data model
 – Arbitrary processing modules

JHOVE2 project goals
Develop modules
 – Signature-based identification using DROID
 – Validation and characterization
 – Symbolic display of selected binary formats
 – API-level editing capability
 – Policy-based assessment
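
Signature-based identification, listed above, can be illustrated with a minimal sketch. It does not use DROID or its API; the `SIGNATURES` table and the `identify` function are hypothetical names introduced here, and the table holds only a handful of well-known leading byte sequences, where a real signature registry holds far more and also matches at offsets other than the start of the file.

```python
from typing import Optional

# Hypothetical, illustrative signature table: leading "magic" bytes -> format name.
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),        # little-endian TIFF header
    (b"MM\x00*", "TIFF"),        # big-endian TIFF header
    (b"\xff\xd8\xff", "JPEG"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the first format whose signature prefixes the data, else None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```

Identification of this kind only names a candidate format; it says nothing about whether the file is well-formed or valid, which is why the goals above treat identification and validation as separate steps.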

Data model
Implicit assumption in JHOVE
 – 1 object = 1 file = 1 format
But what about…
 – TIFF with embedded ICC profile and XMP metadata
     1 object = 1 file = 3 formats
 – JPEG 2000 JPX fragmentation
     1 object = n files = 1 format
 – ESRI Shapefile
     1 object = 3 files = 3 formats
JHOVE2 will support processing of complex aggregate objects and nested formatted bit streams
 – 1 object = n files = m formats
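
The “1 object = n files = m formats” relationship can be sketched as a small data structure. This is an assumption about one possible shape, not the JHOVE2 design: the class names `BitStream`, `SourceFile`, and `DigitalObject` are hypothetical, as are the file names and the format labels used for the Shapefile components.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BitStream:
    """A formatted bit stream that may contain nested formatted bit streams."""
    format: str
    children: List["BitStream"] = field(default_factory=list)

    def formats(self) -> Set[str]:
        # Collect this stream's format plus those of every nested stream.
        found = {self.format}
        for child in self.children:
            found |= child.formats()
        return found


@dataclass
class SourceFile:
    name: str
    stream: BitStream


@dataclass
class DigitalObject:
    """An aggregate object: n files, m formats."""
    files: List[SourceFile]

    def formats(self) -> Set[str]:
        found: Set[str] = set()
        for source_file in self.files:
            found |= source_file.stream.formats()
        return found


# 1 object = 1 file = 3 formats
tiff = DigitalObject([
    SourceFile("image.tif", BitStream("TIFF", [BitStream("ICC"), BitStream("XMP")])),
])

# 1 object = 3 files = 3 formats (format labels are illustrative)
shapefile = DigitalObject([
    SourceFile("roads.shp", BitStream("Shapefile main")),
    SourceFile("roads.shx", BitStream("Shapefile index")),
    SourceFile("roads.dbf", BitStream("dBASE")),
])
```

Nesting the streams inside each file, rather than listing formats flatly on the object, keeps the record of which format is embedded in which.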

Common “backplane”
Outer loop is an iteration over digital objects
Inner loop of processes is applied against each object, passing a common memory structure

    while (has-another-object) {
        while (has-another-process) {
            process (object, state);
        }
    }
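
The nested loop above can be sketched in runnable form. This is a minimal illustration under stated assumptions: `run_backplane`, `identify_step`, and `validate_step` are hypothetical names, objects are represented by file names only, and the shared state is assumed to be reset for each object (the slide does not say whether the common memory structure is per object or global).

```python
from typing import Callable, Dict, Iterable, List

State = Dict[str, object]
Process = Callable[[str, State], None]


def run_backplane(objects: Iterable[str], processes: List[Process]) -> Dict[str, State]:
    """Apply every process to every object, threading a shared state through."""
    results: Dict[str, State] = {}
    for obj in objects:                # outer loop: digital objects
        state: State = {}              # assumption: fresh state per object
        for process in processes:      # inner loop: processes, in order
            process(obj, state)
        results[obj] = state
    return results


def identify_step(obj: str, state: State) -> None:
    # Toy identification by file extension, standing in for a real module.
    state["format"] = "TIFF" if obj.endswith(".tif") else "unknown"


def validate_step(obj: str, state: State) -> None:
    # Reads what the earlier step wrote into the shared state.
    state["checked"] = state["format"] != "unknown"


report = run_backplane(["a.tif", "b.bin"], [identify_step, validate_step])
```

Because each process sees only the object and the state, a later step can build on an earlier step's findings without the two modules knowing about each other.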

Validation
There is a useful distinction between well-formedness, validity, renderability, and usability
 – Well-formedness and validity are “bright line” determinations relative to a specification
 – Renderability is a “bright line” determination relative to a specific rendering tool
 – Usability is a “fuzzy” determination relative to local policies and heuristics
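
One way to keep the four determinations apart is to record them as separate fields. The sketch below is an assumption for illustration only: the `Assessment` class, the tool names, and the 0.0 to 1.0 usability scale are all hypothetical and do not come from the JHOVE2 design.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Assessment:
    well_formed: bool   # bright line, relative to the format specification
    valid: bool         # bright line, relative to the format specification
    # Bright line, but only relative to a named rendering tool.
    renderable: Dict[str, bool] = field(default_factory=dict)
    # Fuzzy: a score under local policy, here assumed to lie in 0.0..1.0.
    usability: float = 0.0

    def is_usable(self, threshold: float) -> bool:
        """Usability becomes a yes/no answer only once local policy sets a threshold."""
        return self.usability >= threshold


result = Assessment(
    well_formed=True,
    valid=False,
    renderable={"viewer-a": True, "viewer-b": False},  # hypothetical tools
    usability=0.7,
)
```

The same object can therefore be usable under one institution's threshold and not under another's, while its well-formedness and validity stay fixed.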

Policy-based assessment
Evaluate objects based on prior characterization and locally-defined policy rules and heuristics, for example:
 – Risk of technological obsolescence
 – Risk of transformative loss
Codify assessment methodologies and best practice recommendations
Develop a formal language in which to express policy rules
Implement a rules engine
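
A rules engine of the kind proposed above can be sketched in a few lines. Everything here is hypothetical: the `Rule` class, the `assess` function, the property names `format` and `compression`, and the two example rules are invented for illustration, and Python lambdas stand in for the formal policy language the slide calls for.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Characterization = Dict[str, object]


@dataclass
class Rule:
    name: str
    risk: str                                     # category of risk the rule flags
    applies: Callable[[Characterization], bool]   # predicate over prior characterization


def assess(props: Characterization, rules: List[Rule]) -> List[str]:
    """Return one finding per rule that fires against the characterization."""
    return [f"{rule.risk}: {rule.name}" for rule in rules if rule.applies(props)]


# Hypothetical local policy.
RULES = [
    Rule("format not on local preferred list", "obsolescence",
         lambda p: p.get("format") not in {"TIFF", "PDF"}),
    Rule("lossy compression", "transformative loss",
         lambda p: p.get("compression") == "lossy"),
]

findings = assess({"format": "GIF", "compression": "lossless"}, RULES)
```

Keeping the rules as data, separate from the `assess` loop, is what lets each institution supply its own policy without changing the engine.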

Format support
Audio      AIFF, WAVE
Color      ICC
Document   PDF
GIS        Shapefile
Image      GIF, JPEG, JPEG 2000, TIFF
Text       ASCII, HTML, SGML, UTF-8, XML

Schedule
6 months of community outreach, requirements gathering, and design
6 months implementation of core APIs and the engine
1 year implementation of modules
Continual prototyping and re-factoring

Questions (for you)?
Do you care about the open source license (ECL)?
Do you care about the distribution platform (SourceForge)?
Do you have functional requirements or use cases?
 – How do you use JHOVE today?
 – What needs doesn’t it meet?
What types of policy assessments do you perform?
 – How do you quantify risk?
 – What is your underlying assessment model?
Are you aware of existing expression languages and engines for rules-based assessment?

Questions (for you)?
What can we do to facilitate integration into existing (or planned) systems and workflows?
What can we do to facilitate third-party development and extension?
 – What help would you need to implement your own modules?
 – Would you be interested in a co-development arrangement with the JHOVE2 project?
Do you have interesting test files that you are willing to share?

Questions (from you)?