Harvard University Library: Global Digital Format Registry. An Update, July 2006.


1 Harvard University Library
Global Digital Format Registry
An Update
July 2006

2 Global Digital Format Registry
“The Global Digital Format Registry (GDFR) will provide sustainable services to collect, review, store, discover, and deliver significant representation information about digital formats.”
–Centrally-organized collection and review
–Distributed storage, discovery, and delivery via a peer-to-peer network

3 The GDFR project
Harvard University Library (HUL) funded for 2 years by the Mellon Foundation
Staffing and technical work subcontracted by HUL to OCLC (June 2006)
Project oversight
–Steering Committee (SC) for policy oversight
–Technical Working Group (TWG) for technical oversight
–Active solicitation of the international stakeholder community for review and comment

4 Deliverables
Functional requirements
Technical specifications
Implementation plan (technology platform)
Inter-nodal protocol
Reference software implementation for nodes
–Released under LGPL
Editorial process
Initial population
Succession plan

5 Schedule
Month 1: Staffing, establish public web site
Months 2-6: Consultation, design, prototyping
Public discussion planned for DLF Fall Forum, Boston, November 2006
Months 7-12: Protocol, node implementation
Months 13-18: Initial population, inter-nodal testing
Months 19-24: Integration testing

6 What is a format?
“A serialization of an abstract information model”
–A set of syntactic and semantic rules for mapping from an information model to a byte stream (and, in most instances, for mapping back)
Encompasses the nominal sense of “file format” as well as a range of conceptual models from the micro to the macro level
–IEEE 754 floating point number … File system
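As a small illustration of this definition, the sketch below treats IEEE 754 binary64 (the micro-level example named on the slide) as a format: a rule set that maps an abstract number to a byte stream and back. It relies only on Python's standard struct module. The function names serialize and deserialize are illustrative and are not GDFR terminology.

```python
import struct


def serialize(value: float) -> bytes:
    """Map an abstract number to its big-endian IEEE 754 binary64 byte stream."""
    return struct.pack(">d", value)


def deserialize(stream: bytes) -> float:
    """Map an 8-byte big-endian IEEE 754 binary64 stream back to a number."""
    (value,) = struct.unpack(">d", stream)
    return value


encoded = serialize(1.0)
print(encoded.hex())         # 3ff0000000000000
print(deserialize(encoded))  # 1.0
```

The same two-way mapping idea applies at the macro end of the range (a file system), only with a far larger rule set.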

7 GDFR network
Peer-to-peer network communicating over a common protocol
Structured delegation for distribution
–DNS analogy
“Root” node
Top-level nodes
–Distribution classes
Local data
Unvetted data
Vetted data

8 Representation Information
Identifiers
Responsibility
Classification
Relationships
Specifications
Signatures
Grammar
Tools
Assessment

9 Identifiers
Canonical and alias identifiers in a variety of naming systems
–Common usage: “TIFF”
–MIME: “image/tiff”
–PRONOM PUID: “fmt/10”
–LC FDD: “fdd000022”
Canonical GDFR-defined identifier in the “info” URI scheme
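One way to picture the alias scheme is a lookup table keyed by (naming system, identifier) that resolves every alias to a single canonical identifier. The sketch below uses the four TIFF aliases from the slide. The canonical value is a placeholder, because the slide does not give the syntax of GDFR "info" URIs; the naming-system keys and function names are likewise assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Placeholder only: the actual GDFR "info" URI syntax is not stated on the slide.
CANONICAL_TIFF = "info:gdfr/EXAMPLE-TIFF"

# Alias identifiers from the slide, keyed by (naming system, identifier).
ALIASES = {
    ("common", "TIFF"): CANONICAL_TIFF,
    ("mime", "image/tiff"): CANONICAL_TIFF,
    ("puid", "fmt/10"): CANONICAL_TIFF,
    ("fdd", "fdd000022"): CANONICAL_TIFF,
}


def resolve(naming_system: str, identifier: str) -> Optional[str]:
    """Return the canonical identifier for an alias, or None if unknown."""
    return ALIASES.get((naming_system, identifier))


def aliases_of(canonical: str) -> List[Tuple[str, str]]:
    """List every (naming system, identifier) alias of a canonical identifier."""
    return sorted(key for key, value in ALIASES.items() if value == canonical)


print(resolve("mime", "image/tiff") == resolve("puid", "fmt/10"))  # True
```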

10 Responsibility
Creator
Owner
Maintenance agency and process
Legal conditions for use

11 Classification
Ontological CLASSES, abstract families, concrete formats, and relationships
BYTESTREAM
  IMAGE
    STILL
      RASTER
        GIF
          GIF87a
          GIF89a is-new-version-of GIF87a
        JPEG
          ISO 10918-1
          JFIF is-subtype-of ISO 10918-1
        TIFF
          TIFF 4.0
          TIFF 5.0 is-new-version-of TIFF 4.0
          TIFF 6.0 is-new-version-of TIFF 5.0
          TIFF/IT is-subtype-of TIFF 6.0
          TIFF/IT/CT is-subtype-of TIFF/IT
          TIFF/IT/CT/P1 is-subtype-of TIFF/IT/CT

12 Relationships
Subtype
ASCII is-subtype-of UTF-8
UTF-8 has-subtype ASCII
Version
TIFF 6.0 is-version-of TIFF 5.0
TIFF 5.0 has-version TIFF 6.0
Encapsulation
WAVE can-contain μ-law
μ-law is-contained-by WAVE
Affinity
JPEG is-similar-to SPIFF
SPIFF is-similar-to JPEG
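Each relationship on this slide appears together with its inverse, which suggests that only one direction needs to be asserted and the other can be derived. The sketch below stores relationships as (subject, relation, object) triples, derives the inverses, and follows the subtype chain for the TIFF/IT family from the Classification slide. The triple representation and the function names are assumptions for illustration, not the GDFR data model, and the chain walk assumes each format has at most one parent per relation.

```python
# Relation names from the slide, each paired with its inverse.
INVERSE = {
    "is-subtype-of": "has-subtype",
    "has-subtype": "is-subtype-of",
    "is-version-of": "has-version",
    "has-version": "is-version-of",
    "can-contain": "is-contained-by",
    "is-contained-by": "can-contain",
    "is-similar-to": "is-similar-to",  # affinity is symmetric
}


def with_inverses(triples):
    """Return the asserted triples plus the inverse of each one."""
    closed = set(triples)
    for subject, relation, obj in triples:
        closed.add((obj, INVERSE[relation], subject))
    return closed


def chain(fmt, triples, relation="is-subtype-of"):
    """Follow one relation transitively from fmt, stopping on any cycle."""
    parents = {s: o for s, r, o in triples if r == relation}
    seen = {fmt}
    result = []
    while fmt in parents and parents[fmt] not in seen:
        fmt = parents[fmt]
        seen.add(fmt)
        result.append(fmt)
    return result


asserted = [
    ("ASCII", "is-subtype-of", "UTF-8"),
    ("TIFF 6.0", "is-version-of", "TIFF 5.0"),
    ("WAVE", "can-contain", "μ-law"),
    ("JPEG", "is-similar-to", "SPIFF"),
    ("TIFF/IT", "is-subtype-of", "TIFF 6.0"),
    ("TIFF/IT/CT", "is-subtype-of", "TIFF/IT"),
    ("TIFF/IT/CT/P1", "is-subtype-of", "TIFF/IT/CT"),
]

print(chain("TIFF/IT/CT/P1", asserted))
# ['TIFF/IT/CT', 'TIFF/IT', 'TIFF 6.0']
```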

13 Specifications
Bibliographic citation, including descriptive (e.g. ISBN) and actionable (e.g. URI) identifiers
IP considerations probably prohibit the free distribution of specification documents

14 Signatures
External
–Generally indicative
–File extension(s)
Internal
–Generally dispositive
–Magic number
–Other well-defined internal syntactic structures
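The distinction between indicative and dispositive signatures can be sketched as an identification routine that trusts an internal magic number when one matches and falls back to the file extension only as a hint. The magic numbers below are the well-known GIF and TIFF header bytes; the table layout and function names are assumptions for illustration and do not describe how GDFR stores or applies signatures.

```python
import os

# External signatures: indicative only.
EXTENSIONS = {".gif": "GIF", ".tif": "TIFF", ".tiff": "TIFF"}

# Internal signatures: magic numbers at offset 0.
MAGIC = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF header
    (b"MM\x00*", "TIFF"),  # big-endian TIFF header
]


def external_guess(filename):
    """Guess a format from the file extension, or return None."""
    _, ext = os.path.splitext(filename)
    return EXTENSIONS.get(ext.lower())


def internal_guess(data):
    """Guess a format from the leading bytes, or return None."""
    for magic, name in MAGIC:
        if data.startswith(magic):
            return name
    return None


def identify(filename, data):
    """Prefer the internal signature; use the extension only as a fallback."""
    return internal_guess(data) or external_guess(filename)


# A TIFF byte stream with a misleading .gif extension is reported as TIFF.
print(identify("scan.gif", b"II*\x00\x08\x00\x00\x00"))  # TIFF
```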

15 Grammar
Formal notation of a format
Typed to permit multiple parallel formulations, e.g. BNF, ABNF, BSDL, DFDL, EAST
May be feasible only for relatively simple formats

16 Tools
Services, systems, and tools using formats as inputs or outputs
Described in terms of some functional taxonomy, e.g. edit, transform, render

17 Assessment
Format-specific risk assessment
Typed to permit multiple parallel formulations
–LC Sustainability/Quality & Functionality (SQF)
–OCLC INFORM
–DSTC PANIC
–Cornell Virtual Remote Control (VRC)

18 General development goals
First create a generalized registry framework, then specialize it for the GDFR application
–To the extent that this does not affect other goals and schedules
Platform/network transport independent
Full information content of GDFR is expressible in XML form
GDFR network is re-instantiatable from its XML expression
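The last two goals amount to a round-trip property: exporting the registry to XML and rebuilding it from that XML should lose nothing. The sketch below checks that property for a toy record using Python's standard xml.etree.ElementTree. The element and attribute names (registry, format, identifier, system) are invented for illustration; the slides do not define the GDFR XML schema.

```python
import xml.etree.ElementTree as ET


def to_xml(records):
    """Serialize registry records to an XML string (hypothetical schema)."""
    root = ET.Element("registry")
    for record in records:
        fmt = ET.SubElement(root, "format", name=record["name"])
        for system, value in sorted(record["identifiers"].items()):
            ident = ET.SubElement(fmt, "identifier", system=system)
            ident.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the registry records from their XML expression."""
    records = []
    for fmt in ET.fromstring(text).findall("format"):
        identifiers = {i.get("system"): i.text for i in fmt.findall("identifier")}
        records.append({"name": fmt.get("name"), "identifiers": identifiers})
    return records


records = [
    {"name": "TIFF", "identifiers": {"mime": "image/tiff", "puid": "fmt/10"}},
]

# Re-instantiating from the XML expression yields the same content.
print(from_xml(to_xml(records)) == records)  # True
```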

19 Related Work
PRONOM
www.nationalarchives.gov.uk/pronom/
Representation Information Registry/Repository
dev.dcc.ac.uk/twiki/bin/view/Main/DCCRegRepV04
LC Digital Formats Web
www.digitalpreservation.gov/formats/
NARA GDFR governance investigation

