NCSU Libraries
9 October 2006, EPA Meeting
Preservation Partnership with Library of Congress: NDIIPP and the North Carolina Geospatial Data Archiving Project
Steve Morris, Jim Tuttle, Rob Farrell, Jeff Essic

What is NDIIPP? Why NCSU Libraries?
NDIIPP = National Digital Information Infrastructure and Preservation Program
Responding to concern that we might be in the middle of a “digital dark age”
Congress earmarked $100 million for digital preservation efforts through 2010
Timeline
– Aug. 2003: Library of Congress (LC) puts out call for proposals for “preservation partners”
– Sept. 2004: LC finalizes agreements with eight principal partners, including NCSU
– Oct. 2004: the three-year projects begin
A cooperative agreement … not a grant
– Emphasis on ongoing interaction with LC and other partners, with transfer of learning experience to LC as primary outcome

NC Geospatial Data Archiving Project (NCGDAP)
Partner: NC Center for Geographic Information & Analysis (state agency)
Focus: State and local agency digital geospatial data in NC as state demonstration
Objective: Engage existing spatial data infrastructure (SDI) in the problem of preservation
Tied to the NC OneMap initiative, which provides for seamless access to data, metadata, and inventories

Geospatial Data Types: Vector & Attribute Data
Time series
Parcel Boundary Changes
North Raleigh, NC

Geospatial Data Types: Vector Data
Time series
Parcel Boundary Changes
North Raleigh, NC

Geospatial Data Types: Aerial Imagery

Geospatial Data Types: Aerial Imagery
85+ NC counties with orthophotos
1-5 flights per county
GB per flight

Today’s Geospatial Data as Tomorrow’s Cultural Heritage
Future uses of data are difficult to anticipate (as with Sanborn Maps).

Digital Preservation Points of Failure
Data is not saved, or …
can’t be found, or …
media is obsolete, or …
media is corrupt, or …
format is obsolete, or …
file is corrupt, or …
meaning is lost
Solutions:
– Migration
– Emulation
– Encapsulation
– XML

Risks to Digital Geospatial Data
Producer focus on current data
– Data overwrite as common practice
Future support of data formats in question
– No open, supported format for vector data
Shift to web services-based access
– Data becoming more ephemeral
Inadequate or nonexistent metadata
– Impedes discovery and use
Increasing use of spatial databases for data management
– Complex entities: the whole is greater than the sum of the parts

NCGDAP Approach to Preservation
Technical solutions: How do we archive acquired content over the long term?
– Build a data repository: not as an end in itself but as a catalyst for discussion within the data community
– Develop a repository ingest workflow: create technical points of engagement with the NDIIPP partners
Cultural/Organizational solutions: How do we make the data more preservable, and more likely to be archived, from the point of production?
– Engage the data producer community and spatial data infrastructure through outreach; influence practice
– Sell the problem to software vendors and standards development
– Find overlap with more compelling business problems: disaster preparedness, business continuity, road building, etc.
– Start a discussion about roles at the local, state, and federal level

Repository Ingest Workflow
Flexible, extensible processes
Clear, documented procedures
Adherence to standard practices, where they exist
Automation

Technical Solution: Building a Digital Repository
Three “Rights”:
– Right format
– Right tags (metadata)
– Right relationship
Oh, and of course, valid for the rest of the Digital Age!
NCGDAP is about researching methodologies…

What is the “Right” Format???
Well, it’s complicated…
Databases
Multi-part datasets
Open Source Developments
Web Services

Our Format Methodology
Decide on archival format(s)
Migrate non-archival formats
Archive both versions of the data set
We need a methodology that can do this a few hundred thousand times…initially.

The Zip Codes Example

Where Is the Data Set?

Here Is One!

Needles in the Haystack
Computer Programs Written
– Utilize functionality of GIS
– Iterate through the data sets
– Create “bundles” for deposit
Process Steps
1. Locate a data set
2. Determine the format
3. Make appropriate conversion
4. Create and isolate “bundle” with new and original format
5. Repeat
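
The process steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the extension table, the migration policy, the bundle layout, and every function name are assumptions made for the example, and the real conversion step (which the slide says relies on GIS functionality) is left to a caller-supplied `convert` function.

```python
import shutil
from collections import defaultdict
from pathlib import Path

# Hypothetical policy tables (assumptions for illustration only):
# which file extension identifies a data set, and which formats get migrated.
PRIMARY_EXTENSIONS = {".shp": "shapefile", ".e00": "arc-export", ".tif": "geotiff"}
MIGRATION_TARGETS = {"arc-export": "shapefile"}


def locate_datasets(root):
    """Step 1: group files sharing a directory and base name into one data set."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[(path.parent, path.stem)].append(path)
    return groups


def determine_format(files):
    """Step 2: identify the format from the first recognized file extension."""
    for f in files:
        fmt = PRIMARY_EXTENSIONS.get(f.suffix.lower())
        if fmt:
            return fmt
    return None


def build_bundles(root, out_dir, convert):
    """Steps 3-5: convert where the policy requires it, then bundle both versions."""
    bundles = []
    for (_parent, stem), files in locate_datasets(root).items():
        fmt = determine_format(files)
        if fmt is None:
            continue  # not a recognized data set; skip it
        # A running counter keeps bundle names unique when base names repeat.
        bundle = Path(out_dir) / f"{len(bundles):04d}_{stem}"
        original = bundle / "original"
        original.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, original / f.name)
        target = MIGRATION_TARGETS.get(fmt)
        if target:
            converted = bundle / "converted"
            converted.mkdir(exist_ok=True)
            convert(files, target, converted)  # caller-supplied conversion
        bundles.append((stem, fmt, target))
    return bundles
```

Grouping by base name is one way to treat a multi-part data set (for example the `.shp`, `.shx`, and `.dbf` files of a shapefile) as a single unit; a production workflow would need more robust format detection than file extensions.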

Custom Tools

Hub-and-spoke Metadata Transformation
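
The slide gives only the title, so the following is a sketch of the general hub-and-spoke pattern rather than the project's implementation. Each metadata format (a spoke) defines one mapping to a common hub vocabulary, so converting between any two formats needs one mapping per format instead of one per pair. The format names, field names, and sample record are simplified placeholders chosen for illustration.

```python
# Each spoke declares how its field names map to the hub's field names.
# These mappings are simplified placeholders, not complete schemas.
SPOKE_TO_HUB = {
    "fgdc": {"title": "title", "abstract": "description", "origin": "creator"},
    "dublin_core": {"title": "title", "description": "description", "creator": "creator"},
    "local_inventory": {"name": "title", "notes": "description", "agency": "creator"},
}


def to_hub(record, source):
    """Translate a spoke record into the hub vocabulary, dropping unmapped fields."""
    mapping = SPOKE_TO_HUB[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def from_hub(hub_record, target):
    """Translate a hub record into a spoke vocabulary by inverting its mapping."""
    inverse = {hub: spoke for spoke, hub in SPOKE_TO_HUB[target].items()}
    return {inverse[key]: value for key, value in hub_record.items() if key in inverse}


def transform(record, source, target):
    """Any-to-any conversion always passes through the hub."""
    return from_hub(to_hub(record, source), target)


# Made-up sample record, for illustration only.
sample = {"title": "Example parcels", "abstract": "Parcel boundaries", "origin": "Example County"}
print(transform(sample, "fgdc", "local_inventory"))
```

Adding a new format in this design means writing one new entry in `SPOKE_TO_HUB`; the trade-off is that any field the hub cannot represent is lost in transit.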

Preserving Local Collections

Geologic and Historic Topographic Maps: Georeferencing and Preservation

Historic Topographic Map Preservation
165 Historic 15-minute series topographic maps for NC
Date range:
Documentation at
Available on NCSU Libraries Geodata server

Geologic Map Preservation
290 Geologic Maps for NC
Map sources are US Geological Survey, NC Geological Survey, theses and dissertations
Documentation at
Public download at

Geologic Map Preservation
1,200 – 24,000
1:500,000 – 1:2.5 M
1:31,680 – 1:430,000

NCGS Project Summary
Project came to us - workplan and intern identified
Preservation risk - data was stored on external drive
Content is in high demand by patrons, hardcopy only, scarce to obtain
Collection acquired at no cost to Libraries
Data files publicly available for download
Partnership with NC Dept. of Environment and Natural Resources; increasing interest in preservation
Early raster dataset for NCGDAP – test for large data volumes, ingest process, metadata creation
NCGS Open File Report forthcoming

NCGDAP: Engagement with the Data Community
Engaging spatial data infrastructure
– Evaluating metadata and content standard adherence
– Cultivation of content exchange networks
– Sept survey of current practice in local agencies
External partnerships
– Partners on JISC-funded effort in the UK (Edinburgh)
Engaging software vendors
– Meetings with ESRI development teams
Engaging standards development processes
– Nov. 2005, partnered with University of Edinburgh on presenting the preservation problem to the Open Geospatial Consortium (OGC) Technical Committee
– Oct. 2006, partnered with NARA on initiating a formal working group on digital preservation within the OGC

NCGDAP on the Road
Presentations, posters, and workshops Jan Sept
Highlights:
– O’Reilly Where 2.0
– OGC Meeting (Germany)
– Digital Curation Center (UK)
– IS&T Archiving (Canada)
– IASSIST (UK)
– ESRI International
– Joint NDIIPP & JISC Meeting
National/International: 37
State/Local: 21

NCGDAP: Future Directions
Project shifting to data acquisition mode
Current contract ends Oct
Likely continuation of project funding through Oct
Four responses to additional LC “Requests for Expression of Interest (RFEI)”
– Development of content exchange networks
– Development of tool for automated capture of web mapping services
– Participation in repository exchange tests
– Multi-state project involving State Archives
… RFEI status pending

Questions?
North Carolina Geospatial Data Archiving Project website
Library of Congress NDIIPP website