NCSU Libraries, 27 March 2006
Digital Preservation in State Government, Wilmington, NC

North Carolina Geospatial Data Archiving Project
Workflow, Tools, and Resources
Jim Tuttle, Geospatial Data Librarian
Project Overview
- Partnership between a university library (NCSU Libraries) and a state agency (NCCGIA, the North Carolina Center for Geographic Information and Analysis)
- One of eight projects in the first NDIIPP funding round, "Building a Network of Partners"
- Focus on state and local geospatial content in North Carolina
- Objective: engage existing state and federal geospatial data infrastructures in preservation
Content Complexity
- Multi-file objects
- Spatial databases
- Ancillary data files
- Time-versioning
- Diverse data sources and metadata practices
Workflow Overview
- Acquisition
- Format Migration
- Submission Information Package (SIP) Creation
- Ingest Metadata
Acquisition: Workflow
- Collection creation/declaration
- File manifest
- Metadata seed file
- Transfer of data to the processing machine
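A file manifest of the kind listed above can be produced with the Python standard library alone. This is a minimal sketch, assuming the manifest is a CSV of relative path and size in bytes; the function names `build_manifest` and `manifest_to_csv` and the column layout are hypothetical, not the project's actual manifest format.

```python
import csv
import io
import os
from pathlib import Path


def build_manifest(root):
    """Walk `root` and return (relative_path, size_in_bytes) rows sorted by path.

    Hypothetical helper: the project's real manifest fields are not documented here.
    """
    root = Path(root)
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            # POSIX-style relative paths keep manifests comparable across platforms.
            rel = full.relative_to(root).as_posix()
            rows.append((rel, full.stat().st_size))
    rows.sort()
    return rows


def manifest_to_csv(rows):
    """Serialize manifest rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["path", "size_bytes"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(manifest_to_csv(build_manifest(target)), end="")
```

Taking the manifest before transfer and again on the processing machine gives a simple check that nothing was lost in transit.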
Acquisition: Tools and Resources
- PHP/PostgreSQL form
- Python automation scripting
- Threat analysis
  - ClamAV
  - Unix 'file' utility, for example:
      $ file putty
      putty: MS-DOS executable (EXE), OS/2 or MS Windows
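The 'file' utility identifies formats by inspecting leading "magic" bytes rather than trusting file extensions, which is how it recognizes `putty` as a Windows executable. The sketch below imitates that idea for a handful of signatures so that unexpected executables in a donated collection can be set aside for review. It is a hypothetical, much-reduced stand-in for 'file' (not a wrapper around it), the signature table is an assumption chosen for illustration, and it does not replace virus scanning with ClamAV.

```python
# Hypothetical, deliberately small signature table: (leading bytes, description).
SIGNATURES = [
    (b"MZ", "MS-DOS/Windows executable"),
    (b"\x7fELF", "ELF executable"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF", "PDF document"),
    (b"\x00\x00\x27\x0a", "ESRI shapefile main or index file"),  # file code 9994, big-endian
]

# Descriptions treated as executable content to quarantine for manual review.
EXECUTABLE_TYPES = {"MS-DOS/Windows executable", "ELF executable"}


def sniff(path):
    """Return a description based on the file's leading bytes, or 'unknown'."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, description in SIGNATURES:
        if head.startswith(magic):
            return description
    return "unknown"


def flag_executables(paths):
    """Return the subset of paths whose content looks like an executable."""
    return [p for p in paths if sniff(p) in EXECUTABLE_TYPES]
```

In practice the real utility can also be called from a Python script, for example with `subprocess.run(["file", path], capture_output=True, text=True)`, which covers far more formats than this table.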
Acquisition: Tools and Resources
- MD5 checksum, for example:
      $ md5sum O-view.vsd
      69b3e2f6cff1537bd607f5522d0c5c4d  O-view.vsd
- JHOVE
- Format registries
  - PRONOM (UK National Archives)
  - GDFR (Harvard/Mellon)
  - Fred (LC)
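The same MD5 value that `md5sum` prints can be computed from a Python automation script with the standard `hashlib` module. This is a minimal sketch; the helper names are hypothetical. It reads files in chunks because geospatial rasters and databases can be larger than available memory, and `md5sum_line` mimics the `md5sum` output layout (digest, two spaces, file name).

```python
import hashlib


def md5_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, streaming it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5sum_line(path):
    """Format a line the way md5sum does: digest, two spaces, file name."""
    return f"{md5_file(path)}  {path}"


def verify(path, expected_hex):
    """Return True if the file still matches a previously recorded digest."""
    return md5_file(path) == expected_hex.strip().lower()
```

Recording the digest at acquisition and calling `verify` after transfer or migration detects silent corruption; MD5 serves here as a fixity check, not as protection against deliberate tampering.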
Format Migration: Workflow
- On-receipt migration of selected formats
- Object-level metadata creation/augmentation
Format Migration: Tools and Resources
- Python batch process wrappers
- ArcCatalog metadata templates
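A Python batch wrapper for on-receipt migration might take the shape sketched below. Everything here is an assumption for illustration: the `migrations` table, the `.e00` example extension, and the converter callables are hypothetical, and in practice each converter would invoke whatever tool performs the real conversion. The wrapper only selects files by format, dispatches them, and records outcomes, so one failed file does not stop the batch.

```python
from pathlib import Path


def run_batch(paths, migrations):
    """Apply on-receipt migrations to selected formats.

    `migrations` maps a lowercase extension (e.g. ".e00") to a hypothetical
    converter(src_path) -> output_path. Files whose extension is not in the
    table are skipped, i.e. kept in their received format.
    Returns (path, status, detail) records for the processing log.
    """
    log = []
    for path in map(Path, paths):
        converter = migrations.get(path.suffix.lower())
        if converter is None:
            log.append((str(path), "skipped", "no migration for this format"))
            continue
        try:
            out = converter(path)
        except Exception as exc:  # record the failure and continue with the batch
            log.append((str(path), "failed", repr(exc)))
        else:
            log.append((str(path), "migrated", str(out)))
    return log
```

Keeping the log as plain tuples makes it easy to write into the workflow database or a spreadsheet alongside the object-level metadata.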
SIP Item Creation: Workflow
- Submission Information Package grouping: ontology logic based on defined multi-file complex format components and directory structure
- Repository-agnostic item grouping
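As one illustration of grouping by complex-format components, the sketch below collects the files of an ESRI shapefile (which shares a base name across several extensions in one directory) into a single item and treats every other file as an item of its own. The suffix list and the `group_items` helper are assumptions for illustration, not the project's actual ontology; `.shp.xml` (ArcCatalog metadata) is checked before `.shp` so the longer suffix wins.

```python
import posixpath
from collections import defaultdict

# Assumed shapefile component suffixes; longer suffixes must come first.
SHAPEFILE_SUFFIXES = (".shp.xml", ".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg")


def group_items(relpaths):
    """Group POSIX-style relative paths into repository-agnostic items.

    Files in the same directory sharing a shapefile base name become one item
    keyed "dir/base"; any other file becomes a stand-alone item keyed by its path.
    """
    items = defaultdict(list)
    for rel in relpaths:
        directory, name = posixpath.split(rel)
        lower = name.lower()
        for suffix in SHAPEFILE_SUFFIXES:
            if lower.endswith(suffix):
                key = posixpath.join(directory, name[: -len(suffix)])
                break
        else:
            key = rel  # not a known component: an item of its own
        items[key].append(rel)
    return {key: sorted(files) for key, files in items.items()}
```

Because the output is a plain mapping of item keys to file lists, it does not depend on any particular repository's packaging format.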
SIP Item Creation: Tools and Resources
- Python scripts highly dependent on:
  - Explicit understanding of the ontological relationships among complex format components
  - Logical directory structure as dictated by the data-producer software
- Spreadsheet illustrating item assignment for manual review
- Automated revision of assignments based on spreadsheet modifications
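The review loop described above (export the assignment, let a person edit it, re-read the edits) can be sketched with a two-column CSV that any spreadsheet program opens. This is a minimal sketch; the `item,path` layout and the rule that an empty item cell drops a file are assumptions, not the project's documented conventions.

```python
import csv
import io
from collections import defaultdict


def items_to_csv(items):
    """Write an assignment (item id -> list of paths) as CSV for manual review."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["item", "path"])
    for item_id in sorted(items):
        for path in items[item_id]:
            writer.writerow([item_id, path])
    return buf.getvalue()


def items_from_csv(text):
    """Rebuild the assignment from a possibly hand-edited CSV.

    A reviewer moves a file by changing its `item` cell; rows whose item cell
    is left empty are dropped from every item (assumed convention).
    """
    items = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        item_id = row["item"].strip()
        if item_id:
            items[item_id].append(row["path"])
    return {key: sorted(paths) for key, paths in items.items()}
```

Round-tripping through a flat file keeps the human step in a familiar tool while the scripts remain the single place where assignments are applied.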
Ingest Metadata: Workflow
- Extraction of elements from multiple sources
- Crosswalk of metadata to the archive ingest record (DSpace Qualified Dublin Core), to METS, and to an external Workflow Management Database
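A crosswalk to the DSpace ingest record can be sketched with the standard `xml.etree.ElementTree` module. This sketch assumes the producer metadata is FGDC CSDGM XML (the kind ArcCatalog exports) and writes the `dublin_core.xml` file used by DSpace's Simple Archive Format; the particular FGDC-to-DC field choices are illustrative assumptions, not the project's actual mapping, and the METS and workflow-database targets are not shown.

```python
import xml.etree.ElementTree as ET

# Illustrative crosswalk: FGDC CSDGM path -> (DC element, DC qualifier).
# The target fields chosen here are assumptions for this sketch.
FGDC_TO_DC = [
    ("idinfo/citation/citeinfo/title", ("title", "none")),
    ("idinfo/citation/citeinfo/origin", ("contributor", "author")),
    ("idinfo/citation/citeinfo/pubdate", ("date", "issued")),
    ("idinfo/descript/abstract", ("description", "abstract")),
    ("idinfo/keywords/theme/themekey", ("subject", "none")),
]


def crosswalk(fgdc_xml):
    """Return dublin_core.xml text built from an FGDC record's XML string."""
    source = ET.fromstring(fgdc_xml)  # root element is <metadata>
    target = ET.Element("dublin_core")
    for path, (element, qualifier) in FGDC_TO_DC:
        # findall() also picks up repeatable elements such as multiple themekeys.
        for node in source.findall(path):
            text = (node.text or "").strip()
            if text:
                value = ET.SubElement(target, "dcvalue", element=element, qualifier=qualifier)
                value.text = text
    return ET.tostring(target, encoding="unicode")
```

Driving the crosswalk from a mapping table, rather than hard-coding each field, makes it straightforward to add sources or targets as producers' metadata practices differ.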
Ingest Metadata: Tools and Resources
- Python XML libraries
- XSL/XSLT
- NOID (Nice Opaque Identifier) persistent identifiers
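NOID identifiers can carry a final check character that catches common transcription errors. The sketch below computes it as we understand the NOID check digit algorithm: each character's ordinal in a 29-character alphabet (0 for characters outside it, such as "/") is multiplied by its 1-based position, and the sum modulo 29 indexes back into the alphabet. It is not the NOID minter itself, which also manages templates, counters, and bindings, and the helper names are hypothetical.

```python
# NOID "extended digits": 0-9 plus the 19 consonants left after removing
# the vowels and the letters l and y (29 characters in all).
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"


def noid_check_char(identifier):
    """Return the check character for `identifier` (without its check char)."""
    total = sum(
        position * max(XDIGITS.find(ch), 0)  # find() gives -1 for '/', treated as 0
        for position, ch in enumerate(identifier, start=1)
    )
    return XDIGITS[total % len(XDIGITS)]


def is_valid(noid_with_check):
    """Return True if the final character is the correct check character."""
    return noid_check_char(noid_with_check[:-1]) == noid_with_check[-1]
```

For `13030/xf93gt2` the weighted sum is 891, and 891 mod 29 is 21, so the check character is `q`, giving `13030/xf93gt2q`.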
Conclusion
- There are plenty of free, open-source tools
- The robustness required of an ingest process is inversely proportional to the demands placed on data producers to prepare data for ingest: the less producers are asked to do, the more the ingest process itself must handle
- Finding the balance between cost-saving automation and the accuracy and flexibility of human intervention is difficult
For More Information
- NCGDAP: North Carolina Geospatial Data Archiving Project
- NDIIPP: National Digital Information Infrastructure and Preservation Program
- ClamAV
- Unix file utility: 'man file'
- JHOVE: JSTOR/Harvard Object Validation Environment
- PRONOM Format Registry
- GDFR: Global Digital Format Registry (in planning)
- Fred Format Registry (proof-of-concept): http://tom.library.upenn.edu/cgi-bin/fred?cmd=Default&sid=ca21d10e67b269a75a98fe369d2ab670
- XSLT: eXtensible Stylesheet Language Transformations
- NOID: Nice Opaque IDentifier

Jim Tuttle, Geospatial Data Librarian
jim_tuttle at ncsu dot edu