Global Digital Format Registry (GDFR)


1 Global Digital Format Registry (GDFR)
Stephen Abrams, Harvard University, Cambridge, Massachusetts, USA. University of Edinburgh, 4 November 2006.

2 Global Digital Format Registry
“The Global Digital Format Registry (GDFR) will provide sustainable services to collect, review, store, discover, and deliver significant representation information about digital formats.” Centrally organized collection and review; distributed storage, discovery, and delivery via a peer-to-peer network.

3 Format and digital preservation
Preservation is concerned with ensuring access to managed digital assets over time. Thus, preservation activities are focused on viability, fixity, authenticity, interpretability, and renderability. The last two are primarily a function of format.

4 Without format typing, all content is opaque
ffd8ffe000104a ffed0fb050686f74 6f73686f e d 03e90a e e666f00 000002f40240ffeeffee fc d f 494d03ed0a f6c f 6e a

5 Without format typing, all content is opaque
ffd8ffe000104a ffed0fb050686f74 6f73686f e d 03e90a e e666f00 000002f40240ffeeffee fc d f 494d03ed0a f6c f 6e a
SOI, APP0 (JFIF 1.2), APP13 (IPTC), APP2 (ICC), DQT, SOF0 (183x512), DRI, DHT, SOS, ECS0, RST0, ECS1, RST1, ECS2, ...

6 Without format typing, all content is opaque
ffd8ffe000104a ffed0fb050686f74 6f73686f e d 03e90a e e666f00 000002f40240ffeeffee fc d f 494d03ed0a f6c f 6e a
SOI, APP0 (JFIF 1.2), APP13 (IPTC), APP2 (ICC), DQT, SOF0 (183x512), DRI, DHT, SOS, ECS0, RST0, ECS1, RST1, ECS2, ...
Edward Burne-Jones (British, ), The Days of Creation: the First Day. Watercolor and gouache, 102.2×35.5 cm. Fogg Art Museum, Harvard University, Bequest of Grenville L. Winthrop.
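The step from opaque bytes to named segments can be illustrated with a minimal Python sketch. It assumes a well-formed JPEG stream without fill bytes, knows only the marker codes named on the slide, and stops at SOS instead of decoding the entropy-coded data. The function name `list_jpeg_segments` and the synthetic `sample` bytes are illustrative and not part of the GDFR.

```python
import struct

# Second byte of the FFxx marker code for the segments named on the slide.
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT", 0xDA: "SOS",
}

def list_jpeg_segments(data: bytes) -> list:
    """Return marker names from SOI up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("no SOI marker: not a JPEG byte stream")
    names = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        names.append(MARKER_NAMES.get(code, f"0x{code:02X}"))
        if code == 0xDA:  # entropy-coded data follows; stop here
            break
        # The segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return names

# Synthetic stream (an assumption, not the slide's image):
# SOI, a 16-byte APP0 segment, a 4-byte DRI segment, and an SOS marker.
sample = (
    b"\xff\xd8"
    + b"\xff\xe0\x00\x10" + b"JFIF\x00\x01\x02" + bytes(7)
    + b"\xff\xdd\x00\x04\x00\x08"
    + b"\xff\xda\x00\x02"
)
print(list_jpeg_segments(sample))
```

Without the knowledge that the stream is JPEG, none of the length fields or marker codes could be interpreted, which is the point the slide makes.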

7 What is a format?
Informally, “a serialized encoding of an abstract information model.” Encompasses the nominal sense of “file format” as well as a range of conceptual entities from the micro level (e.g. an IEEE 754 floating point number) to the macro level (e.g. a file system).

8 GDFR project
Two DLF-sponsored invitational workshops: University of Pennsylvania, January 2003, and Washington, March 2003. Provisional data and service models. Two independent demonstration projects: FRED, Format Registry Demonstrator, TOM (Typed Object Model) [John Ockerbloom, University of Pennsylvania]; FOCUS, Format Curation Service (LDAP) [Joseph JaJa, University of Maryland].

9 The GDFR project
Harvard University Library (HUL) funded for 2 years by the Mellon Foundation. Staffing and technical work subcontracted by HUL to OCLC (July 2006). Project oversight: Steering Committee (SC) for policy oversight; Technical Working Group (TWG) for technical oversight. Active solicitation of the international stakeholder community for review and comment.

10 General development goals
A generalized registry framework, specialized for the GDFR application. Globally fault tolerant. Platform independent. Open source. Re-use of well-known products and protocols. Human and machine interfaces. Localization and accessibility. Full information content expressible in XML form, and re-instantiatable from that expression.
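The last goal, round-tripping registry content through XML, can be sketched as follows. This is a minimal illustration only: the `FormatRecord` class, the element names, and the identifier `example:tiff-6.0` are assumptions and do not reflect the actual GDFR schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class FormatRecord:
    """Hypothetical registry record; the fields are assumptions."""
    identifier: str
    name: str
    version: str

def to_xml(record: FormatRecord) -> str:
    """Express the full record content as an XML string."""
    root = ET.Element("format", {"id": record.identifier})
    ET.SubElement(root, "name").text = record.name
    ET.SubElement(root, "version").text = record.version
    return ET.tostring(root, encoding="unicode")

def from_xml(text: str) -> FormatRecord:
    """Re-instantiate a record from its XML expression."""
    root = ET.fromstring(text)
    return FormatRecord(root.get("id"), root.findtext("name"), root.findtext("version"))

record = FormatRecord("example:tiff-6.0", "TIFF", "6.0")
xml_text = to_xml(record)
print(xml_text)
print(from_xml(xml_text) == record)
```

The round trip is lossless only because every field of the record is written out; any field omitted from `to_xml` would break re-instantiation.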

11 Data model
ISO 11179, Information technology – Metadata registries (MDR); LC Digital Formats Web (Caroline Arms and Carl Fleischhauer, LC); OASIS/ebXML Registry Information Model; PRONOM (Adrian Brown, TNA); Representation Information Registry/Repository, RIRR (David Giaretta, JISC DCC), dev.dcc.ac.uk/twiki/bin/view/Main/DCCRegRepV04.

12 Format properties
Canonical (GDFR) and alias identifiers; version; description; classification; relationships; disclosure (open, proprietary, closed); documentation; orientation (text vs. binary); byte order; internal/external signatures (e.g. magic number/file extension).
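As an illustration of internal versus external signatures, the following sketch checks a file's leading bytes against a few well-known magic numbers and, separately, its extension against a small lookup table. The tables and the `identify` function are hypothetical and far smaller than anything a real registry would hold.

```python
from pathlib import Path
from typing import Optional, Tuple

# Internal signatures: (byte offset, magic bytes, format name).
INTERNAL_SIGNATURES = [
    (0, b"GIF87a", "GIF87a"),
    (0, b"GIF89a", "GIF89a"),
    (0, b"\xff\xd8\xff", "JPEG"),
    (0, b"II*\x00", "TIFF (little-endian)"),
    (0, b"MM\x00*", "TIFF (big-endian)"),
]

# External signatures: file extension to format family.
EXTERNAL_SIGNATURES = {
    ".gif": "GIF", ".jpg": "JPEG", ".jpeg": "JPEG",
    ".tif": "TIFF", ".tiff": "TIFF",
}

def identify(name: str, head: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (format by magic number, format by extension)."""
    internal = next(
        (fmt for offset, magic, fmt in INTERNAL_SIGNATURES
         if head[offset:offset + len(magic)] == magic),
        None,
    )
    external = EXTERNAL_SIGNATURES.get(Path(name).suffix.lower())
    return internal, external

print(identify("scan.tif", b"MM\x00*\x00\x00\x00\x08"))
print(identify("photo.gif", b"\xff\xd8\xff\xe0"))  # extension and content disagree
```

Returning both results separately lets a caller notice when the extension and the content disagree, as in the second call.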

13 Format properties – taxonomy
Ontological CLASSES, abstract families, concrete formats, and relationships:
BYTESTREAM
  IMAGE
    STILL
      RASTER
        GIF
          GIF87a
          GIF89a is-new-version-of GIF87a
        JPEG
          ISO
          JFIF is-extension-of ISO
        TIFF
          TIFF 4.0
          TIFF 5.0 is-new-version-of TIFF 4.0
          TIFF 6.0 is-new-version-of TIFF 5.0
          TIFF/IT is-extension-of TIFF 6.0
          TIFF/IT/CT is-subtype-of TIFF/IT
          TIFF/IT/CT/P1 is-subtype-of TIFF/IT/CT
TIFF/EP (ISO ), TIFF/IT (ISO 12639)

14 Format properties – relationships
Subtype: ASCII is-subtype-of UTF-8. Extension: DNG is-extension-of TIFF 6.0. Containment: WAVE can-contain μ-law. Equivalence: DXF (ASCII) is-equivalent-to DXF (binary). Version: TIFF 6.0 is-version-of TIFF 5.0. Affinity: SPIFF is-similar-to JPEG.
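One way to picture these relationships is as subject-predicate-object triples that can be followed transitively. The sketch below uses the slide's own examples as data; the `related` function and the choice of a triple list are assumptions, not the GDFR data model.

```python
from collections import defaultdict

# The six example relationships from the slide, as triples.
RELATIONSHIPS = [
    ("ASCII", "is-subtype-of", "UTF-8"),
    ("DNG", "is-extension-of", "TIFF 6.0"),
    ("WAVE", "can-contain", "μ-law"),
    ("DXF (ASCII)", "is-equivalent-to", "DXF (binary)"),
    ("TIFF 6.0", "is-version-of", "TIFF 5.0"),
    ("SPIFF", "is-similar-to", "JPEG"),
]

def related(subject, predicates):
    """Follow the given relationship types transitively from subject."""
    edges = defaultdict(list)
    for s, p, o in RELATIONSHIPS:
        if p in predicates:
            edges[s].append(o)
    seen, stack = [], [subject]
    while stack:
        node = stack.pop()
        for target in edges[node]:
            if target not in seen:
                seen.append(target)
                stack.append(target)
    return seen

# Formats whose specifications DNG builds on, directly or indirectly.
print(related("DNG", {"is-extension-of", "is-version-of"}))
```

A query like this is what would let a repository holding DNG files discover that TIFF 6.0 documentation is also relevant to them.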

15 Format properties – documentation
Public domain specifications managed and replicated in the network. For non-public domain specifications, full bibliographic citation with actionable identifiers. Mechanism for agents to register a locally-held copy with terms of use.

16 Format properties
Typed grammar, e.g. ABNF, BNF, BSDL, DFDL, EAST; typed assessment, e.g. LC SQF, OCLC INFORM, DSTC PANIC, Cornell VRC; dependencies (hardware, media, software); release date; withdrawal date; developer; support; rights; processes (using format as input/output).

17 GDFR network Peer-to-peer network communicating over a common protocol

18 What is a format?
Four conceptual entities: AIM, abstract information model; CIS, coded information set (semantic); SIS, structural information set (syntactic); SBS, serialized byte stream. Three encodings: FEM, format encoding model, FEM : AIM → CIS; FEF, format encoding form, FEF : CIS → SIS; FES, format encoding scheme, FES : SIS → SBS.

19 What is a format? A format is a triple, F = (FCS, FEF, FES)
Informed by the Unicode character encoding model

20 TIFF
AIM: discrete rectangular sampling of visual phenomena
CIS                      SIS            SBS
header [
  byte-order (‘MM’)      uint8 uint8    4d4d
  version (42)           uint16         002a
  offset                 uint32         …
]
ifd [
  count                  uint16         …
  entry [
    tag                  uint16         …
    type                 uint16         …
    count                uint32         …
    value                uint32         …
  ]
  offset                 uint32         …
]
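The header rows can be made concrete with Python's `struct` module, which plays the role of the last encoding step (typed fields to bytes). This is a minimal sketch: the function names are illustrative, and the IFD offset of 8 (an IFD placed immediately after the header) is an assumption chosen for the example.

```python
import struct

def serialize_tiff_header(ifd_offset: int) -> bytes:
    """Typed fields to bytes: two uint8 characters, a uint16, a uint32."""
    # ">" selects big-endian packing, matching the 'MM' byte-order mark.
    return struct.pack(">2sHI", b"MM", 42, ifd_offset)

def parse_tiff_header(data: bytes):
    """Bytes back to (byte order, version, offset of the first IFD)."""
    order = data[:2]
    if order == b"MM":
        prefix = ">"   # big-endian
    elif order == b"II":
        prefix = "<"   # little-endian
    else:
        raise ValueError("not a TIFF byte-order mark")
    version, offset = struct.unpack(prefix + "HI", data[2:8])
    if version != 42:
        raise ValueError("unexpected TIFF version number")
    return order.decode("ascii"), version, offset

header = serialize_tiff_header(8)   # assumed offset, for illustration
print(header.hex())                 # 4d4d002a00000008
print(parse_tiff_header(header))    # ('MM', 42, 8)
```

The first four bytes of the output, 4d4d 002a, are the SBS values for the byte-order and version fields.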

21 Data set
Files      1000s   Bytes              GB
7,422      7       38,646,601,502     36.0
9,742      10      53,715,244,688     50.0
11,592     12      68,072,456,800     63.4
14,282     14      91,196,250,606     84.9
119,584    120     109,148,953,709    101.7
119,633            110,026,051,163    102.5
119,635            110,038,664,163
119,963            111,134,003,731    103.5

22 Data set Syntactically, a tabular (delimited) set of numbers
Semantically defined by the codebook, schema, etc.
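The split between syntax and semantics can be sketched by parsing one delimited row and then applying a codebook to it. The column names and types in `CODEBOOK` are assumptions read off the headers of the preceding slide, and the tab delimiter is likewise assumed.

```python
# Hypothetical codebook: (column name, type) for each position in a row.
CODEBOOK = [
    ("files", int),
    ("files_thousands", int),
    ("bytes", int),
    ("gigabytes", float),
]

def parse_row(line: str, delimiter: str = "\t") -> dict:
    """Split a row syntactically, then interpret each cell via the codebook."""
    cells = line.split(delimiter)
    if len(cells) != len(CODEBOOK):
        raise ValueError("row does not match the codebook")
    # Thousands separators are presentation, not part of the value.
    return {
        name: kind(cell.replace(",", ""))
        for (name, kind), cell in zip(CODEBOOK, cells)
    }

row = parse_row("7,422\t7\t38,646,601,502\t36.0")
print(row)
```

Without the codebook, the same line is only four numbers; with it, each number acquires a name, a type, and a unit.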

23 Summary
The GDFR is an enabling technology that will support digital repository operations and preservation activities. It enables the typing of digital objects at an appropriate level of granularity. It enables the future recovery of the syntax and semantics associated with typed digital objects. It is a means to pool and redistribute the expertise of the digital preservation community.

24 For more information http://www.formatregistry.org/

