JHOVE2 A Next-Generation Architecture for Format-Aware Preservation Processing Stephen Abrams Harvard University Evan Owens Portico Tom Cramer Stanford.

1 JHOVE2: A Next-Generation Architecture for Format-Aware Preservation Processing
Stephen Abrams, Harvard University
Evan Owens, Portico
Tom Cramer, Stanford University
Digital Library Federation Fall Forum, Philadelphia, November 5-7, 2007

2 JHOVE2 project
Two-year NDIIPP-funded collaborative project to develop a “next generation” architecture for format-aware preservation processing
  – Harvard University: Stephen Abrams, Gary McGath, Robin Wendler
  – Portico: Evan Owens, John Meyer, Sheila Morrissey
  – Stanford University: Tom Cramer, Richard Anderson, Hannah Frost, Rachel Gollub, Nancy Hoebelheinrich, Keith Johnson
Open source
  – Educational Community License (ECL)
  – SourceForge

3 JHOVE2 project goals
Refactor the existing architecture
  – Rectify known inefficiencies and idiosyncrasies
  – Simplify the process of integration
  – Encourage third-party extensions
Provide enhancements
  – Separate identification from validation
  – Standardized error handling
  – Standardized handling of validation profiles
  – Standardized reporting using METS, with XSL transform
  – More sophisticated data model
  – Arbitrary processing modules

4 JHOVE2 project goals
Develop modules
  – Signature-based identification using DROID
  – Validation and characterization
  – Symbolic display of selected binary formats
  – API-level editing capability
  – Policy-based assessment
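
As an illustration of what signature-based identification means in general, here is a minimal sketch that matches leading "magic" bytes against a small table. It does not use DROID or its signature registry; the table, the `identify` function, and the returned labels are assumptions made for this example only.

```python
# Hypothetical signature table: (leading bytes, format label).
# These are well-known magic numbers, not DROID's own signature data.
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),      # little-endian TIFF header
    (b"MM\x00*", "TIFF"),      # big-endian TIFF header
    (b"\xff\xd8\xff", "JPEG"),
]


def identify(data: bytes) -> str:
    """Return the label of the first signature that prefixes the data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


print(identify(b"%PDF-1.4 ..."))
print(identify(b"plain text"))
```

Identification of this kind only says what a bit stream claims to be; deciding whether it conforms to the format specification is the separate validation step listed among the project goals.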

5 Data model
Implicit assumption in JHOVE
  – 1 object = 1 file = 1 format
But what about…
  – TIFF with embedded ICC profile and XMP metadata: 1 object = 1 file = 3 formats
  – JPEG 2000 JPX fragmentation: 1 object = n files = 1 format
  – ESRI Shapefile: 1 object = 3 files = 3 formats
JHOVE2 will support processing of complex aggregate objects and nested formatted bit streams
  – 1 object = n files = m formats
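
One way to picture "1 object = n files = m formats" is a small tree: an object holds files, and each file holds a bit stream that may contain nested bit streams. The sketch below is illustrative only; the class names (`DigitalObject`, `SourceFile`, `BitStream`), the file names, and the format labels are assumptions, not the JHOVE2 data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BitStream:
    """A formatted run of bytes that may contain nested formatted bit streams."""
    format: str
    nested: List["BitStream"] = field(default_factory=list)

    def formats(self) -> List[str]:
        # Depth-first walk: this stream's format, then those nested inside it.
        found = [self.format]
        for child in self.nested:
            found.extend(child.formats())
        return found


@dataclass
class SourceFile:
    name: str
    stream: BitStream


@dataclass
class DigitalObject:
    files: List[SourceFile]

    def formats(self) -> List[str]:
        found: List[str] = []
        for f in self.files:
            found.extend(f.stream.formats())
        return found


# 1 object = 1 file = 3 formats
tiff = DigitalObject([
    SourceFile("image.tif", BitStream("TIFF", [BitStream("ICC"), BitStream("XMP")])),
])

# 1 object = 3 files = 3 formats (main, index, and attribute files)
shapefile = DigitalObject([
    SourceFile("roads.shp", BitStream("SHP")),
    SourceFile("roads.shx", BitStream("SHX")),
    SourceFile("roads.dbf", BitStream("DBF")),
])

print(len(tiff.files), tiff.formats())
print(len(shapefile.files), shapefile.formats())
```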

6 Common “backplane”
Outer loop is an iteration over digital objects
Inner loop of processes applied against each object, passing a common memory structure
while (has-another-object) {
  while (has-another-process) {
    process (object, state);
  }
}
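
The two loops on this slide can be sketched in runnable form as follows. This is a minimal illustration, not the JHOVE2 engine: the `run_backplane` function, the two toy processes, and the choice to start each object with a fresh `state` dictionary are all assumptions.

```python
from typing import Callable, Dict, Iterable, List

State = Dict[str, object]
Process = Callable[[str, State], None]


def identify(obj: str, state: State) -> None:
    # Toy identification by file extension (hypothetical, for illustration).
    state["format"] = "TIFF" if obj.endswith(".tif") else "unknown"


def validate(obj: str, state: State) -> None:
    # Reads what the earlier process left in the shared state.
    state["checked"] = state["format"] != "unknown"


def run_backplane(objects: Iterable[str], processes: List[Process]) -> Dict[str, State]:
    results: Dict[str, State] = {}
    for obj in objects:              # outer loop: digital objects
        state: State = {}            # common memory structure for this object
        for process in processes:    # inner loop: processes applied in order
            process(obj, state)
        results[obj] = state
    return results


results = run_backplane(["a.tif", "b.bin"], [identify, validate])
print(results)
```

Because every process receives the same `state`, a later process such as validation can build on what identification recorded without the two being coupled directly.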

7 Validation
There is a useful distinction between well-formedness, validity, renderability, and usability
  – Well-formedness and validity are “bright line” determinations relative to a specification
  – Renderability is a “bright line” determination relative to a specific rendering tool
  – Usability is a “fuzzy” determination relative to local policies and heuristics

8 Policy-based assessment
Evaluate objects based on prior characterization and locally-defined policy rules and heuristics, for example:
  – Risk of technological obsolescence
  – Risk of transformative loss
Codify assessment methodologies and best practice recommendations
Develop a formal language in which to express policy rules
Implement a rules engine
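
To make the idea of rules applied to prior characterization concrete, here is a minimal sketch of a rules engine. The slide calls for a formal policy language; this example stands in plain Python predicates for it. The `Rule` class, the `assess` function, the property names, and both sample policies are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Characterization = Dict[str, object]


@dataclass
class Rule:
    name: str
    applies: Callable[[Characterization], bool]
    finding: str


def assess(props: Characterization, rules: List[Rule]) -> List[str]:
    """Return one finding per rule whose predicate matches the properties."""
    return [f"{r.name}: {r.finding}" for r in rules if r.applies(props)]


# Hypothetical local policy, invented for this example.
PREFERRED_FORMATS = {"TIFF", "PDF", "XML"}
rules = [
    Rule("obsolescence",
         lambda p: p.get("format") not in PREFERRED_FORMATS,
         "format is outside the locally preferred set"),
    Rule("transformative-loss",
         lambda p: bool(p.get("lossy")),
         "lossy encoding; further migration may compound loss"),
]

findings = assess({"format": "JPEG", "lossy": True}, rules)
for line in findings:
    print(line)
```

Keeping the rules as data separate from the engine is what would let each institution supply its own policies without changing the assessment code.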

9 Format support
Audio: AIFF, WAVE
Color: ICC
Document: PDF
GIS: Shapefile
Image: GIF, JPEG, JPEG 2000, TIFF
Text: ASCII, HTML, SGML, UTF-8, XML

10 Schedule
6 months of community outreach, requirements gathering, and design
6 months of implementation of core APIs and the engine
1 year of implementation of modules
Continual prototyping and refactoring

11 Questions (for you)?
Do you care about the open source license (ECL)?
Do you care about the distribution platform (SourceForge)?
Do you have functional requirements or use cases?
  – How do you use JHOVE today?
  – What needs does it not meet?
What types of policy assessments do you perform?
  – How do you quantify risk?
  – What is your underlying assessment model?
Are you aware of existing expression languages and engines for rules-based assessment?

12 Questions (for you)?
What can we do to facilitate integration into existing (or planned) systems and workflows?
What can we do to facilitate third-party development and extension?
  – What help would you need to implement your own modules?
  – Would you be interested in a co-development arrangement with the JHOVE2 project?
Do you have interesting test files that you are willing to share?

13 Questions (from you)?

