Published by Lee Shepherd
1
GDFR Pilot Discussion
The National Archives, Washington DC
July 10, 2008
2
Agenda
Introductions – (All)
Purpose of meeting – (Dale)
Roles – (Dale, Richard)
Background/history – (Stephen)
GDFR Governance Workshop – (Richard, Robert)
Architecture – (Stephen)
Current state – (Andrea)
Relationship to PRONOM – (Andrea)
Issues and observations – (Dale)
Use cases – (Andrea)
Discussion of pilot – (All)
Review next steps from GDFR Governance Workshop Report – (Richard, Robert)
Outreach to other interested parties – (All)
Next steps – (All)
3
Introductions All
4
Purpose of the meeting Dale Flecker
5
Roles
Harvard – Dale Flecker
NARA – Richard Steinbacher
6
Background/History Stephen Abrams
7
Background/History
Format is the key piece of representation information that permits preservation activities to be focused on interpretable/renderable content, not just opaque bit strings
[Figure: a JPEG file shown two ways]
  Opaque bit string: ffd8ffe000104a ffed0fb050686f74 6f73686f ...
  Interpreted structure: SOI, APP0 (JFIF 1.2), APP13 (IPTC), APP2 (ICC), DQT, SOF0 (183x512), DRI, DHT, SOS, ECS0, ...
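To make the contrast between opaque bytes and interpretable structure concrete, here is a minimal sketch that walks the marker segments of a JPEG stream. It assumes a well-formed stream in which every segment before SOS carries a two-byte big-endian length; the function name `list_segments` and the sample bytes are illustrative and not part of the presentation.

```python
import struct

# Labels for the JPEG markers named on the slide (second marker byte -> label).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT",
    0xDA: "SOS", 0xD9: "EOI",
}


def list_segments(data: bytes) -> list[str]:
    """Return marker labels in file order, stopping at SOS.

    Assumption: every segment between SOI and SOS is length-prefixed.
    Entropy-coded data after SOS is not parsed.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    labels = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The length field counts itself (2 bytes) plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        labels.append(MARKER_NAMES.get(marker, f"0x{marker:02X}"))
        if marker == 0xDA:
            break
        pos += 2 + length
    return labels


# Synthetic sample: SOI, a 16-byte APP0 (JFIF) segment, a stub DQT, a stub SOS.
sample = bytes.fromhex(
    "ffd8"
    "ffe00010" "4a46494600" "0102" "00" "0001" "0001" "0000"
    "ffdb0004" "0000"
    "ffda0002"
)
print(list_segments(sample))
```

Without knowing the format, the sample is just 30 bytes; with the format's syntax, the same bytes resolve into named segments.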
8
Background/History
Traditional methods of managing format information, e.g. the IANA MIME registry, are insufficiently descriptive and granular for effective preservation planning and intervention
  The application/word format is essentially defined as anything produced by the Word application
  TIFF 6.0, TIFF/IT, TIFF/EP, GeoTIFF, … all map to image/tiff
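The granularity problem can be shown with a small lookup. This is only a sketch: the table holds the four TIFF variants named on the slide, and the function name `formats_for_mime` is hypothetical.

```python
# Finer-grained format names from the slide, each keyed to its MIME type.
FORMAT_TO_MIME = {
    "TIFF 6.0": "image/tiff",
    "TIFF/IT": "image/tiff",
    "TIFF/EP": "image/tiff",
    "GeoTIFF": "image/tiff",
}


def formats_for_mime(mime: str) -> list[str]:
    """Return every known format that shares the given MIME type."""
    return sorted(name for name, m in FORMAT_TO_MIME.items() if m == mime)


candidates = formats_for_mime("image/tiff")
print(len(candidates), "formats share image/tiff:", candidates)
```

Going from MIME type back to format is one-to-many, so the MIME type alone cannot tell a preservation system which specification governs a given file.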
9
Background/History
Two DLF-sponsored invitational workshops
  Univ. Pennsylvania, January 2003
  Washington, March 2003
Two independent demonstration projects
  FRED, John Ockerbloom, Univ. Pennsylvania
  FOCUS, Joseph JaJa, Univ. Maryland
10
Background/History
Evolving consensus on scope
  A forum for documenting normative definitions of format syntax and semantics
  A common facility to pool and share scarce technical expertise on a global basis
  A channel for the distribution of that expertise to the international community of preservation practitioners
  A foundation for additional value-added services requiring detailed knowledge of digital formats
11
Background/History Peer-to-peer network of independent, but cooperating registries
12
Background/History
Harvard University Library (HUL) funded for 2 years by the Andrew W. Mellon Foundation
Technical deliverables only; no funded governance/policy activity
Staffing and technical work subcontracted to OCLC (July 2006)
13
NARA Governance Workshop
Richard Steinbacher Robert Chadduck
14
Architecture Stephen Abrams
15
Architecture
A generic distributed registry framework, specialized for the GDFR application
Based on well-known products and protocols
Human and machine interfaces
Full information content expressible in XML form; can be re-instantiated from that expression
Platform independence
Globally fault tolerant
Open source
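The requirement that registry content be expressible in XML and re-instantiated from that expression amounts to a lossless round trip. A minimal sketch follows, assuming a flat record of string fields; the element names (`format`, `name`, `version`, `mime`) are hypothetical and do not reflect the actual GDFR schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict[str, str]) -> str:
    """Serialize a flat record as one XML element with a child per field."""
    root = ET.Element("format")  # hypothetical element name
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def record_from_xml(xml_text: str) -> dict[str, str]:
    """Rebuild the record from its XML expression."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text or "" for child in root}


record = {"name": "TIFF", "version": "6.0", "mime": "image/tiff"}
xml_text = record_to_xml(record)
restored = record_from_xml(xml_text)
print(xml_text)
print(restored == record)
```

If the round trip preserves every field, a node's content can be exported, moved to another platform, and rebuilt there.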
16
Architecture Data model is an extension of PRONOM 4
17
Architecture Based on the OCLC IWSA/RFA framework
18
Architecture
Java, Apache/Tomcat, Berkeley DB XML
GNU LGPL license
  Including technology newly developed for the project and pre-existing OCLC technology
19
Current state Andrea Goethals
20
Current state: schedule
July 31, 2008
  Contract with OCLC ends
  GDFR source node at Harvard goes public in beta mode
August 2008 up to August 2010
  Harvard maintains GDFR software, website and source node
21
Current state: GDFR Home website
It moved!
  Old GDFR Home:
  New GDFR Home:
All existing GDFR docs migrated from the old GDFR Home website
Over the next month
  Updated documentation!
  Demo source node?
22
Current state: architecture
Currently:
  One GDFR source node
    Where all data additions and edits are performed
  Many GDFR mirror nodes
    Replicated data
Future?
  Multiple GDFR source nodes?
  Multiple interoperable format registry source nodes?
“Discoverable” from GDFR Home website
Each node has 2 interfaces
  For humans: user interface
  For machines: web service interface
23
Current state: GDFR source node
Housed by Harvard for now
Populated with test data: ~2000 formats from Magic database
Need authorized account to add/edit data
24
Current state: GDFR mirror nodes
Test mirror nodes at OCLC and Harvard
Anyone can run a mirror node
Synchronize data with the source node
Can brand your mirror node
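One way to picture the single-source, many-mirror arrangement is a pull model in which a mirror asks the source for everything changed since its last synchronization. The sketch below is an assumption about how such a scheme could work, not a description of the GDFR implementation; the class names, the revision counter, and the method names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SourceNode:
    """Holds the authoritative records; all additions and edits happen here."""
    records: dict[str, tuple[int, str]] = field(default_factory=dict)
    revision: int = 0

    def put(self, format_id: str, description: str) -> None:
        # Each add or edit bumps a global revision counter (hypothetical scheme).
        self.revision += 1
        self.records[format_id] = (self.revision, description)

    def changes_since(self, revision: int) -> dict[str, tuple[int, str]]:
        return {k: v for k, v in self.records.items() if v[0] > revision}


@dataclass
class MirrorNode:
    """Holds a read-only replica that is refreshed from the source."""
    records: dict[str, str] = field(default_factory=dict)
    synced_revision: int = 0

    def synchronize(self, source: SourceNode) -> int:
        """Pull changed records and return how many were updated."""
        changes = source.changes_since(self.synced_revision)
        for format_id, (_, description) in changes.items():
            self.records[format_id] = description
        self.synced_revision = source.revision
        return len(changes)


source = SourceNode()
mirror = MirrorNode()
source.put("format-a", "first description")
first = mirror.synchronize(source)
source.put("format-a", "edited description")
source.put("format-b", "another format")
second = mirror.synchronize(source)
third = mirror.synchronize(source)
print(first, second, third)
```

Because edits flow in one direction only, mirrors never need conflict resolution; that simplicity is lost once multiple source nodes are allowed.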
25
Current state: Mirror node set-up
Dependencies
  Apache 2 (mod_rewrite, mod_jk, mod_perl2)
  Tomcat 5.5.x
  Berkeley DBXML
  Perl 5.8.x
  Java 1.5
Installation & configuration – half day
26
User interface
Mirror node
  Search, browse, lookup/retrieve, export, manage node
Source node
  Same as mirror node
  Plus: add, edit
Sneak preview
27
Current state: machine interface
Web services using SRU
Can do everything supported by the human user interface
  Except browsing
  Plus mirror-to-source node synchronization
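For readers unfamiliar with SRU, a search is an HTTP GET whose query string carries a CQL query. The sketch below builds such a request URL using standard SRU parameter names (`operation`, `version`, `query`, `maximumRecords`). The base URL, the choice of version 1.1, and the CQL index `name` are assumptions for illustration; the slides do not say which endpoint, version, or indexes a GDFR node exposes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_search_url(base_url: str, cql_query: str, maximum_records: int = 10) -> str:
    """Build an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed version
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and index name.
url = sru_search_url("http://gdfr.example.org/sru", 'name = "TIFF"')
decoded = parse_qs(urlsplit(url).query)
print(url)
```

The response would be an XML document of matching records, which a client can parse with any XML library.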
28
Relationship to PRONOM
Andrea Goethals
29
Relationship to PRONOM – what’s the problem?
Two different “format” registries
  Overlapping but diverging data models
  No common format model
  No mechanism to exchange data
PRONOM is in production, GDFR is not yet
  PRONOM has been publicly available for over 4 years and is used by some preservation repositories
  Interoperates with DROID
  Basis for PLANETS projects
How many format registries does the digital preservation community need?
  Depends on how different they are…
30
Relationship to PRONOM – core differences
Who governs the registry and makes policy, scope and enhancement decisions?
  PRONOM: TNA
  GDFR: community-based
Who adds and edits format information?
  PRONOM: TNA (does take addition requests)
Where is the format information physically located?
  PRONOM: at TNA
  GDFR: replicated in different geographic locations
31
Relationship to PRONOM – what’s the solution?
Recognize there is a problem – DONE
Mutual willingness to resolve
  TNA desire to participate in a GDFR pilot
Common web service API across the registries?
  PRONOM could become a GDFR node
  PRONOM and GDFR could each support a new web service API
Cross-walk PRONOM PUIDs and GDFR GFIDs?
Use common format identification tools (DROID, JHOVE, etc.) with either registry
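A PUID-to-GFID cross-walk could be as simple as a two-way lookup table, provided the mapping is one-to-one. The sketch below checks that property while building the reverse direction. Every identifier pair in it is a made-up placeholder: the slides do not show the GFID syntax, and no pairing here comes from either registry.

```python
# Placeholder pairs for illustration only; not taken from PRONOM or GDFR.
PUID_TO_GFID = {
    "fmt/1": "gfid:example-a",
    "fmt/2": "gfid:example-b",
}


def invert(crosswalk: dict[str, str]) -> dict[str, str]:
    """Build the reverse lookup, rejecting mappings that are not one-to-one."""
    inverse: dict[str, str] = {}
    for puid, gfid in crosswalk.items():
        if gfid in inverse:
            raise ValueError(f"{gfid} maps to both {inverse[gfid]} and {puid}")
        inverse[gfid] = puid
    return inverse


GFID_TO_PUID = invert(PUID_TO_GFID)
print(GFID_TO_PUID["gfid:example-a"])
```

If the two data models split formats at different granularity, the one-to-one check fails, which is the practical consequence of having no common format model.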
32
Issues and Observations
Dale Flecker
33
Use cases Andrea Goethals
34
Use cases – 3 sets (see handout)
Higher-level use cases submitted by many institutions (early 2003)
Lower-level use case model created for the software design (2006-7)
Use cases arising from informal talks and meetings
35
Key use cases – discussed but not supported
Determine duplicates
Notifications/warnings
Determine migration/emulation pathways
Determine at-risk formats (machine-actionable risk assessments)
Support the registry & discovery of GDFR nodes
Authentication of nodes and users (outside the UI)
Storage of local profiles separate from central formats
Synchronizations based on vetted or non-vetted data
Determine “quality” of format information
Multiple source nodes
36
Use cases – common issues
How evaluative should GDFR be?
  Neutral vs judgmental
Are services in the scope of GDFR?
  Should GDFR provide services directly (notifications, validation, etc.) or should GDFR be a reference that can be used by external services?
37
Discussion of pilot All
38
Discussion of pilot Purposes
39
Discussion of pilot Pilot use cases
40
Discussion of pilot Process
41
Discussion of pilot Participants
42
Review next steps from the GDFR Governance Workshop Report
Richard Steinbacher Robert Chadduck
43
Outreach to other interested parties
All
44
Next steps? All