The National Archives Washington DC July 10, 2008

1 GDFR Pilot Discussion
The National Archives, Washington DC, July 10, 2008

2 Agenda
Introductions – (All)
Purpose of meeting – (Dale)
Roles – (Dale, Richard)
Background/history – (Stephen)
GDFR Governance Workshop – (Richard, Robert)
Architecture – (Stephen)
Current state – (Andrea)
Relationship to PRONOM – (Andrea)
Issues and observations – (Dale)
Use cases – (Andrea)
Discussion of pilot – (All)
Review next steps from GDFR Governance Workshop Report – (Richard, Robert)
Outreach to other interested parties – (All)
Next steps – (All)

3 Introductions
All

4 Purpose of the meeting
Dale Flecker

5 Roles
Harvard – Dale Flecker
NARA – Richard Steinbacher

6 Background/History
Stephen Abrams

7 Background/History
Format is the key piece of representation information that permits preservation activities to be focused on interpretable/renderable content, not just opaque bit strings.
[Figure: hex dump of the start of a JPEG file (ffd8ffe000104a …) annotated with its parsed structure: SOI, APP0 (JFIF 1.2), APP13 (IPTC), APP2 (ICC), DQT, SOF0 (183x512), DRI, DHT, SOS, ECS0, ...]
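To make "interpretable content, not just opaque bit strings" concrete, here is a minimal Python sketch that walks the marker segments at the start of a JPEG stream, the same structure the annotated dump shows. It relies only on the standard JPEG segment layout (0xFF, a marker byte, then a two-byte big-endian length); the `segment` helper and the sample stream are hypothetical test data, and the function is illustrative, not part of GDFR.

```python
import struct
from typing import List

# Marker codes (the byte after 0xFF) for the segments named in the slide.
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT", 0xDA: "SOS",
}

def list_jpeg_segments(data: bytes) -> List[str]:
    """Name the JPEG marker segments that precede the compressed image data.

    Stops at SOS, because the entropy-coded data after it is not
    length-prefixed. Fill bytes and unusual standalone markers are not handled.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    names = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        names.append(MARKER_NAMES.get(marker, f"0xFF{marker:02X}"))
        if marker == 0xDA:  # SOS: compressed scan data follows
            break
        pos += 2 + length
    return names

def segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment (hypothetical test data, not a real image)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload

# APP0 payload: "JFIF\0", version 1.2, then 7 bytes of density/thumbnail fields,
# giving the length 0x0010 seen in the slide's dump (ffe0 0010 4a...).
sample = (b"\xff\xd8"
          + segment(0xE0, b"JFIF\x00\x01\x02" + bytes(7))
          + segment(0xDB, bytes(65))
          + segment(0xDA, bytes(10)))
print(list_jpeg_segments(sample))  # ['SOI', 'APP0', 'DQT', 'SOS']
```

A real identification or validation tool would go on to interpret each segment's payload (for example, reading the image dimensions from SOF0); this sketch stops at naming them.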

8 Background/History
Traditional methods of managing format information, e.g. the IANA MIME registry, are insufficiently descriptive and granular for effective preservation planning and intervention:
The application/msword format is essentially defined as anything produced by the Word application.
TIFF 6.0, TIFF/IT, TIFF/EP, GeoTIFF, … all map to the single type image/tiff.
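A tiny sketch of the granularity problem: inverting a format-to-MIME mapping shows every TIFF variant named on the slide collapsing into one type. The mapping and the helper `formats_behind_mime` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List

# The four TIFF-family formats named on the slide, each labelled with the
# MIME type it is normally exchanged under.
FORMAT_TO_MIME = {
    "TIFF 6.0": "image/tiff",
    "TIFF/IT": "image/tiff",
    "TIFF/EP": "image/tiff",
    "GeoTIFF": "image/tiff",
}

def formats_behind_mime(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a format -> MIME type mapping to expose which formats collide."""
    groups = defaultdict(list)
    for fmt, mime in mapping.items():
        groups[mime].append(fmt)
    return dict(groups)

groups = formats_behind_mime(FORMAT_TO_MIME)
# A repository that records only "image/tiff" cannot tell which of these
# specifications governs a given file.
print(groups["image/tiff"])  # ['TIFF 6.0', 'TIFF/IT', 'TIFF/EP', 'GeoTIFF']
```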

9 Background/History
Two DLF-sponsored invitational workshops:
  Univ. Pennsylvania, January 2003
  Washington, March 2003
Two independent demonstration projects:
  FRED (John Ockerbloom, Univ. Pennsylvania)
  FOCUS (Joseph JaJa, Univ. Maryland)

10 Background/History
Evolving consensus on scope:
  A forum for documenting normative definitions of format syntax and semantics
  A common facility to pool and share scarce technical expertise on a global basis
  A channel for the distribution of that expertise to the international community of preservation practitioners
  A foundation for additional value-added services requiring detailed knowledge of digital formats

11 Background/History
Peer-to-peer network of independent but cooperating registries

12 Background/History
Harvard University Library (HUL) funded for 2 years by the Andrew W. Mellon Foundation
Technical deliverables only; no funded governance/policy activity
Staffing and technical work subcontracted to OCLC (July 2006)

13 NARA Governance Workshop
Richard Steinbacher, Robert Chadduck

14 Architecture
Stephen Abrams

15 Architecture
A generic distributed registry framework, specialized for the GDFR application
Based on well-known products and protocols
Human and machine interfaces
Full information content expressible in XML form; can be re-instantiated from that expression
Platform independence
Globally fault tolerant
Open source
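The requirement that the full information content be expressible in XML and re-instantiable from it amounts to a lossless round trip. The sketch below assumes a hypothetical, much-simplified `FormatRecord` with invented element names (`format`, `name`, `version`, `mimeType`) and an invented identifier; it is not the GDFR schema, whose data model extends PRONOM 4.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

@dataclass
class FormatRecord:
    # Hypothetical, heavily simplified record for illustration only.
    identifier: str
    name: str
    version: str
    mime_types: List[str] = field(default_factory=list)

def to_xml(rec: FormatRecord) -> str:
    """Express every field of the record as XML."""
    root = ET.Element("format", id=rec.identifier)
    ET.SubElement(root, "name").text = rec.name
    ET.SubElement(root, "version").text = rec.version
    for mime in rec.mime_types:
        ET.SubElement(root, "mimeType").text = mime
    return ET.tostring(root, encoding="unicode")

def from_xml(text: str) -> FormatRecord:
    """Re-instantiate a record from its XML expression."""
    root = ET.fromstring(text)
    return FormatRecord(
        identifier=root.get("id"),
        name=root.findtext("name"),
        version=root.findtext("version"),
        mime_types=[e.text for e in root.findall("mimeType")],
    )

rec = FormatRecord("example-0001", "Tagged Image File Format", "6.0", ["image/tiff"])
xml_text = to_xml(rec)
# The round trip must be lossless: nothing may live only in the database.
assert from_xml(xml_text) == rec
```

The design point is that the XML form, not any particular database, is the authoritative expression, which is what lets a node be rebuilt from an export.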

16 Architecture
Data model is an extension of PRONOM 4

17 Architecture
Based on the OCLC IWSA/RFA framework

18 Architecture
Java, Apache/Tomcat, Berkeley DB XML
GNU LGPL license, covering both technology newly developed for the project and pre-existing OCLC technology

19 Current state
Andrea Goethals

20 Current state: schedule
July 31, 2008: contract with OCLC ends; GDFR source node at Harvard goes public in beta mode
August 2008 to August 2010: Harvard maintains GDFR software, website and source node

21 Current state: GDFR Home website
It moved!
  Old GDFR Home:
  New GDFR Home:
All existing GDFR docs migrated from the old GDFR Home website
Over the next month:
  Updated documentation!
  Demo source node?

22 Current state: architecture
Currently:
  One GDFR source node, where all data additions and edits are performed
  Many GDFR mirror nodes, with replicated data
Future?
  Multiple GDFR source nodes?
  Multiple interoperable format registry source nodes?
  "Discoverable" from GDFR Home website
Each node has 2 interfaces:
  For humans: user interface
  For machines: web service interface
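A minimal sketch of the single-source, many-mirror topology, assuming an in-memory dictionary of records: edits are accepted only by the source node (and only from authorized users), while mirrors are read-only replicas. `SourceNode`, `MirrorNode`, the record key and the full-copy replication are hypothetical simplifications, not GDFR's implementation.

```python
from typing import Dict, Optional, Set

class SourceNode:
    """The one node where additions and edits happen.

    The authorized-user set is a hypothetical stand-in for GDFR's
    account requirement; real authentication is not modelled.
    """

    def __init__(self, authorized_users: Set[str]) -> None:
        self.records: Dict[str, str] = {}
        self.authorized_users = authorized_users

    def put(self, user: str, key: str, value: str) -> None:
        if user not in self.authorized_users:
            raise PermissionError(f"{user!r} may not edit the source node")
        self.records[key] = value


class MirrorNode:
    """A read-only replica: it offers lookups but no edit operation."""

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.records.get(key)

    def replicate_from(self, source: SourceNode) -> None:
        # Simplest possible replication: copy the full data set.
        self.records = dict(source.records)


source = SourceNode(authorized_users={"editor"})
source.put("editor", "example-0001", "TIFF 6.0")
mirrors = [MirrorNode(), MirrorNode()]
for mirror in mirrors:
    mirror.replicate_from(source)
```

Keeping all writes on one node avoids conflicting edits; the trade-off is that the source is a single point of control, which is why multiple source nodes appear only as a future question.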

23 Current state: GDFR source node
Housed by Harvard for now
Populated with test data: ~2000 formats from the Magic database
Need an authorized account to add/edit data
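Assuming "Magic database" refers to a file(1)-style collection of magic numbers, the idea behind such records is signature matching on the first bytes of a file. The sketch below hard-codes a handful of well-known signatures; the table and the `identify` function are illustrative, and real magic rules (offsets, masks, nested tests) are much richer.

```python
from typing import Optional

# A few well-known leading-byte signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF 87a"),
    (b"GIF89a", "GIF 89a"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]

def identify(header: bytes) -> Optional[str]:
    """Return the first format whose signature prefixes the header, else None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None

print(identify(b"%PDF-1.4\n"))  # PDF
```

Note that a prefix match like this cannot distinguish TIFF 6.0 from GeoTIFF, which is exactly the kind of finer-grained knowledge a format registry is meant to supply.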

24 Current state: GDFR mirror nodes
Test mirror nodes at OCLC and Harvard
Anyone can run a mirror node
Synchronize data with the source node
Can brand your mirror node
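The slides do not say how mirror synchronization works. One common approach, shown here purely as a hypothetical sketch, is for the source node to keep an ordered change log so that each mirror pulls only the changes newer than the last sequence number it has applied. `Source`, `Mirror` and `changes_since` are invented names, and the transport (the slides place synchronization on the web service interface) is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Change = Tuple[int, str, str]  # (sequence number, record key, new value)

@dataclass
class Source:
    # Every edit is appended with a strictly increasing sequence number.
    log: List[Change] = field(default_factory=list)

    def edit(self, key: str, value: str) -> None:
        self.log.append((len(self.log) + 1, key, value))

    def changes_since(self, seq: int) -> List[Change]:
        return [change for change in self.log if change[0] > seq]

@dataclass
class Mirror:
    records: Dict[str, str] = field(default_factory=dict)
    last_seq: int = 0

    def sync(self, source: Source) -> int:
        """Apply only the changes this mirror has not yet seen; return how many."""
        changes = source.changes_since(self.last_seq)
        for seq, key, value in changes:
            self.records[key] = value
            self.last_seq = seq
        return len(changes)

src = Source()
src.edit("example-0001", "TIFF 6.0")
src.edit("example-0002", "GeoTIFF")
mirror = Mirror()
first = mirror.sync(src)    # pulls both changes
src.edit("example-0001", "TIFF 6.0 (revised)")
second = mirror.sync(src)   # pulls only the one new change
```

Pulling deltas keeps synchronization cheap for independently run mirrors, and a mirror that falls behind simply catches up from its last sequence number.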

25 Current state: Mirror node set-up
Dependencies:
  Apache 2 (mod_rewrite, mod_jk, mod_perl2)
  Tomcat 5.5.x
  Berkeley DB XML
  Perl 5.8.x
  Java 1.5
Installation & configuration – half day
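The dependency list can double as a pre-installation check. This sketch treats each listed version as a minimum (an assumption; "5.5.x" might instead mean the 5.5 series only), skips the version test for Berkeley DB XML because none is given, and does not check the Apache modules. How versions are detected on a host is left out: `check_dependencies` takes them as input, and the sample host versions are made up.

```python
from typing import Dict, List, Optional, Tuple

# Minimum versions read off the slide; None means no version was listed.
REQUIRED: Dict[str, Optional[Tuple[int, ...]]] = {
    "apache": (2,),
    "tomcat": (5, 5),
    "berkeley-dbxml": None,
    "perl": (5, 8),
    "java": (1, 5),
}

def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '5.5.26' into (5, 5, 26)."""
    return tuple(int(part) for part in text.split("."))

def check_dependencies(found: Dict[str, str]) -> List[str]:
    """Report missing components and any older than the listed minimum."""
    problems: List[str] = []
    for name, minimum in REQUIRED.items():
        if name not in found:
            problems.append(f"missing: {name}")
        elif minimum is not None and parse_version(found[name]) < minimum:
            problems.append(f"too old: {name} {found[name]}")
    return problems

# Hypothetical host inventory.
host = {"apache": "2.2.9", "tomcat": "5.5.26", "berkeley-dbxml": "2.4.16",
        "perl": "5.8.8", "java": "1.4.2"}
print(check_dependencies(host))  # ['too old: java 1.4.2']
```

Tuple comparison does the version ordering: (5, 5, 26) is not less than (5, 5), while (1, 4, 2) is less than (1, 5).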

26 User interface
Mirror node: search, browse, lookup/retrieve, export, manage node
Source node: same as mirror node, plus add, edit
Sneak preview

27 Current state: machine interface
Web services using SRU (Search/Retrieve via URL)
Can do everything supported by the human user interface, except browsing
Plus mirror-to-source node synchronization
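SRU requests are plain HTTP GETs carrying standard parameters such as `operation`, `version`, `query` (a CQL expression), `startRecord` and `maximumRecords`. The sketch below builds a searchRetrieve URL; the endpoint, the CQL index `format.name` and the SRU version `1.1` are all assumptions, since the slides do not give GDFR's actual values.

```python
from urllib.parse import urlencode

def sru_search_url(base_url: str, cql_query: str,
                   start: int = 1, max_records: int = 10) -> str:
    """Build an SRU searchRetrieve request URL (a sketch, not GDFR's client)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed; the slides do not state the SRU version
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint and CQL index name; neither comes from the slides.
url = sru_search_url("http://gdfr.example.org/sru", 'format.name = "TIFF"')
print(url)
```

Because an SRU query is just a URL, any HTTP client can act as a "machine" user of a node, which is what makes a shared web service API across registries plausible.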

28 Relationship to PRONOM
Andrea Goethals

29 Relationship to PRONOM – what’s the problem?
Two different “format” registries
  Overlapping but diverging data models
  No common format model
  No mechanism to exchange data
PRONOM is in production; GDFR is not yet
  PRONOM has been publicly available for over 4 years and is used by some preservation repositories
  Interoperates with DROID
  Basis for the Planets project
How many format registries does the digital preservation community need? Depends on how different they are…

30 Relationship to PRONOM – core differences
Who governs the registry and makes policy, scope and enhancement decisions?
  PRONOM: TNA
  GDFR: community-based
Who adds and edits format information?
  PRONOM: TNA (does take addition requests)
Where is the format information physically located?
  PRONOM: at TNA
  GDFR: replicated in different geographic locations

31 Relationship to PRONOM – what’s the solution?
Recognize there is a problem – DONE
Mutual willingness to resolve
TNA desire to participate in a GDFR pilot
Common web service API across the registries?
  PRONOM could become a GDFR node
  PRONOM and GDFR could each support a new web service API
Crosswalk PRONOM PUIDs and GDFR GFIDs?
Use common format identification tools (DROID, JHOVE, etc.) with either registry
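Because the two data models overlap but diverge, a PUID/GFID crosswalk cannot assume a one-to-one correspondence. Here is a minimal sketch that indexes equivalence pairs in both directions and flags identifiers mapping to more than one counterpart; the identifiers are placeholders, not real PRONOM PUIDs or GDFR GFIDs, and `build_crosswalk`/`ambiguous` are hypothetical helpers.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple

Index = DefaultDict[str, Set[str]]

def build_crosswalk(pairs: Iterable[Tuple[str, str]]) -> Tuple[Index, Index]:
    """Index (PUID, GFID) equivalence pairs in both directions."""
    puid_to_gfid: Index = defaultdict(set)
    gfid_to_puid: Index = defaultdict(set)
    for puid, gfid in pairs:
        puid_to_gfid[puid].add(gfid)
        gfid_to_puid[gfid].add(puid)
    return puid_to_gfid, gfid_to_puid

def ambiguous(index: Index) -> Dict[str, Set[str]]:
    """Identifiers that correspond to more than one identifier on the other side."""
    return {key: targets for key, targets in index.items() if len(targets) > 1}

# Placeholder identifiers only.
pairs = [("PUID-A", "GFID-1"), ("PUID-B", "GFID-2"), ("PUID-B", "GFID-3")]
puid_to_gfid, gfid_to_puid = build_crosswalk(pairs)
flagged = ambiguous(puid_to_gfid)  # PUID-B is flagged: it maps to GFID-2 and GFID-3
```

Flagged entries are where the models split a format differently, and they would need human review before either registry could rely on the crosswalk.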

32 Issues and Observations
Dale Flecker

33 Use cases
Andrea Goethals

34 Use cases – 3 sets (see handout)
Higher-level use cases submitted by many institutions (early 2003)
Lower-level use case model created for the software design (2006–2007)
Use cases arising from informal talks and meetings

35 Key use cases – discussed but not supported
Determine duplicates
Notifications/warnings
Determine migration/emulation pathways
Determine at-risk formats (machine-actionable risk assessments)
Support registration and discovery of GDFR nodes
Authentication of nodes and users (outside the UI)
Storage of local profiles separate from central formats
Synchronizations based on vetted or non-vetted data
Determine “quality” of format information
Multiple source nodes
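Determining migration pathways is listed as discussed but not supported. If a registry (or an external service) held "format A can be converted to format B" relationships, the question reduces to a shortest-path search. The breadth-first search below is a hypothetical sketch over an invented conversion graph; GDFR holds no such data, and `migration_path` is not a GDFR function.

```python
from collections import deque
from typing import Dict, List, Optional

def migration_path(conversions: Dict[str, List[str]],
                   start: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of format conversions.

    `conversions` maps a format to the formats some tool can convert it to.
    Returns the formats from start to target inclusive, or None if unreachable.
    """
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        fmt = queue.popleft()
        if fmt == target:
            # Walk the predecessor links back to the start, then reverse.
            path: List[str] = []
            node: Optional[str] = fmt
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in conversions.get(fmt, []):
            if nxt not in previous:
                previous[nxt] = fmt
                queue.append(nxt)
    return None

# Invented conversion graph for illustration only.
graph = {
    "WordPerfect 5.1": ["RTF"],
    "RTF": ["ODF Text", "PDF/A-1"],
    "ODF Text": ["PDF/A-1"],
}
print(migration_path(graph, "WordPerfect 5.1", "PDF/A-1"))
# ['WordPerfect 5.1', 'RTF', 'PDF/A-1']
```

Shortest is only one possible criterion; a real pathway service would also weigh the fidelity and risk of each conversion step, which ties this use case to the question of how evaluative GDFR should be.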

36 Use cases – common issues
How evaluative should GDFR be? Neutral vs. judgmental
Are services in the scope of GDFR? Should GDFR provide services directly (notifications, validation, etc.) or should GDFR be a reference that can be used by external services?

37 Discussion of pilot
All

38 Discussion of pilot: purposes

39 Discussion of pilot: pilot use cases

40 Discussion of pilot: process

41 Discussion of pilot: participants

42 Review next steps from the GDFR Governance Workshop Report
Richard Steinbacher, Robert Chadduck

43 Outreach to other interested parties
All

44 Next steps?
All
