Published by Gertrude Greer. Modified over 9 years ago.
1
State and Local Agency Digital Geospatial Data Preservation: The North Carolina Experience
Steve Morris, NCSU Libraries
Earth Sciences Information Partners (ESIP) Workshop, July 8, 2009
2
NC Geospatial Data Archiving Project (NCGDAP)
One of eight initial collection-building projects in the Library of Congress NDIIPP (National Digital Information Infrastructure and Preservation Program)
Lead organizations: North Carolina State University Libraries and North Carolina Center for Geographic Information & Analysis (NCCGIA)
Focus: State and local government geospatial data in NC
Repository development as catalyst for discussion
Goal: Engage spatial data infrastructure in data archiving
Initial 3-year project extended to Dec. 2009
3
NCGDAP Data Types – Raster
Digital orthophotography
Satellite imagery
Static data
4
NCGDAP Data Types – Vector Data
Point, line, and polygon
Attached attribute data
Often updated
5
Note: Percentages based on the actual number of respondents to each question
Downtown Raleigh, NC, near the State Capitol (2005 Wake County ortho imagery)
Ortho imagery = Durable
  Static
  Simple structure
  Mostly open formats
Vector data = Volatile
  Frequent update
  Complex structure
  Mostly proprietary (commercial) formats
6
NCGDAP Data Types – Spatial Databases
Vector and raster data
Relationships
Behaviors
Annotation
Data models
7
Geospatial Data: Compelling Issues
Dynamic content
  Constantly updated information
  Data versioning
Digital object complexity
  Spatially-enabled databases
  Complicated, multi-component formats
Proprietary formats
8
Ingest Challenges: General
Data consists of multi-file, multi-format objects
Ancillary data files can be shared by datasets
Some format conversions involve one-to-many relationships
Compressed archive files are common and behave unpredictably
And all the usual challenges: format validation, validity checking, threat scanning,…
9
Where is the Dataset?
10
Here’s One!
Files
  Multi-file dataset
  Georeferencing
  Metadata file
  Symbolization file
  Additional documentation
  License
  Disclaimer
More Metadata
  FGDC
  Acquisition metadata
  Transfer metadata
  Ingest metadata
  Archive rights
  Archive processes
  Collection metadata
  Series metadata
11
Ingest Challenges: Metadata
Metadata is encoded in a variety of ways
  The FGDC content standard for metadata lacked an encoding standard (it arrived pre-XML); this is addressed in the ISO 19115/19139 North American Profile implementation
  XML (varied schemas), TXT, HTML
Metadata is missing
  Only about 25% of local agencies use FGDC
Metadata is wrong
  Metadata is commonly asynchronous with the data
Inconsistent use of dataset naming, etc.
  e.g., “Streets” vs. “Wake County Streets”
12
NCGDAP Metadata Summary
Existing geospatial metadata often needs:
  Remediation – to fix errors or omissions
  Normalization – to adhere to a standard structure
  Synchronization – so that the metadata matches the data at hand
If there is no metadata:
  Can build minimal metadata using templates and auto-extraction
  Lose key information such as data quality, lineage, data dictionaries
Automating metadata for repository ingest
  Raster data is easy – large sets of consistently structured files
  Vector data is hard – each dataset is a different story
Many additional administrative and technical metadata elements are not accommodated by FGDC
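As a rough illustration of "minimal metadata using templates and auto-extraction", the sketch below fills a template from properties that can be read off the file itself. It is not the NCGDAP implementation: the template field names and the `build_minimal_record` helper are hypothetical, and the field names are not FGDC element names.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical template. The field names are illustrative only and are
# not FGDC element names. None marks a value nobody has supplied yet.
TEMPLATE = {
    "title": None,
    "originator": "UNKNOWN",
    "data_quality": None,
    "lineage": None,
}


def build_minimal_record(filename, content, template=TEMPLATE, **known):
    """Build a minimal metadata record for one file.

    filename: path of the file as received.
    content:  the file's bytes.
    known:    any values a person can supply (e.g. originator).
    """
    path = PurePosixPath(filename)
    record = dict(template)
    # Auto-extracted values: everything here comes from the file itself.
    record.update(
        title=path.stem.replace("_", " "),
        file_name=path.name,
        format_extension=path.suffix.lower(),
        size_bytes=len(content),
        sha256=hashlib.sha256(content).hexdigest(),
    )
    # Human-supplied values override the template defaults.
    record.update(known)
    # Fields that cannot be extracted stay empty and are listed explicitly.
    record["missing"] = sorted(k for k, v in record.items() if v is None)
    return record


record = build_minimal_record(
    "county/wake_streets.shp", b"abc", originator="Wake County GIS"
)
```

The `missing` list makes the slide's caveat concrete: data quality and lineage cannot be recovered from the file, so they remain empty unless the producer supplies them.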
13
Extended Curation: Feedback and Outreach
Diagram labels: Data Receipt; Format Processing; Metadata Processing; Ingest Processes; Content Producers; Industry Standards Organizations
14
Spatial Data Infrastructure and Archiving
Metadata standards and outreach
  Metadata quality, best practices
Inventories
  Reduce “contact fatigue”, shareable information store
Content exchange networks
  Leverage more compelling business reasons to put data in motion
  Automate process, add technical & administrative metadata
Framework data communities
  Snapshot frequency, schemas, format strategies
16
Content Packaging Issues
Geospatial datasets are typically complex, multi-file objects
Data are often accompanied by ancillary data, which must be associated with the data item
Rights information and licenses must be associated with the item
Various implementations in different domains (METS, IMS-CP, XFDU, etc.)
Simpler .zip-based packages are also used (MEF, KMZ, etc.)
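A minimal sketch of the simpler, zip-based packaging idea follows. It bundles the dataset files with a manifest that carries a rights statement and per-file checksums. The `manifest.json` layout and both function names are assumptions made for illustration; this does not follow METS, XFDU, MEF, or any other named packaging specification.

```python
import hashlib
import io
import json
import zipfile


def build_package(files, rights):
    """Bundle files and a manifest into one zip archive, returned as bytes.

    files:  dict mapping member name to file bytes.
    rights: free-text rights or license statement.
    """
    # Hypothetical manifest layout, not taken from any packaging standard.
    manifest = {
        "rights": rights,
        "files": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(files.items())
        },
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def verify_package(package_bytes):
    """Return True if every file listed in the manifest matches its checksum."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return all(
            hashlib.sha256(zf.read(name)).hexdigest() == digest
            for name, digest in manifest["files"].items()
        )


package = build_package(
    {
        "streets.shp": b"geometry",
        "streets.shp.xml": b"<metadata/>",
        "license.txt": b"terms",
    },
    rights="Public record",
)
```

Keeping the rights statement and the ancillary files inside the same archive is one way to make sure they stay associated with the data item when it moves between systems.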
17
Spatial Database Approaches
Manage database forward over time
Extract data layers to preservable form
Set aside archival snapshot of database
18
GeoMAPP: Geospatial Multistate Archival and Preservation Partnership
Partners (NC, KY, UT, Library of Congress, NCSU):
  State geospatial organizations
  State Archives
State-to-state and geo-to-Archives collaboration
Organizational and technical diversity across states
Archives as part of spatial data infrastructure
Selection and appraisal processes
Retention schedule development
Data transfer to archives
Development of enhanced business cases
19
NCGDAP Learning Outcomes
Preservation of GIS projects is needed to support re-creation of past work
Preservation of data representations is needed to document decision-making processes
Validation, remediation, and conversion of data and metadata are expensive: push for improvements upstream
Some repositories handle “items”: can result in “atomization” of data
For vendors, frame data preservation as a “customer problem”: must build the business case
20
Thank You!
Steve Morris
Head, Digital Library Initiatives
North Carolina State University Libraries
steven_morris@ncsu.edu
North Carolina Geospatial Data Archiving Project: http://www.lib.ncsu.edu/ncgdap
GeoMAPP: http://www.geomapp.net
21
Draft of Utah’s GIS to Archives Data Flow
AGRC exports data from SGID and splits out datasets by series. Metadata occasionally incomplete.
Local governments supply GIS datasets on CD/DVD to AGRC. Metadata often missing.
All metadata is completed to FGDC standards.
AGRC creates geoPDF files of individual datasets, plus ZIP files of the native format. One ZIP file would contain all the pieces belonging to one shapefile or, alternatively, the file would contain a geodatabase. Geodatabases would not be just one big database with everything in it (multiple series and years). Instead, the native files would be composed of a single downloadable file per series per year.
AGRC copies these files to Archives’ FTP server.
Metadata harvested to populate Archive’s Finding Aids.
Example FTP Site Structure:
ftp.archives-agrc.utah.gov/Archives
  o Biota (Dublin Core Metadata)
  o Boundaries (Dublin Core Metadata)
      MunicipalityRecords-Series-26846 (Dublin Core Metadata)
        2000
          o MunicipalBoundaries.zip (FGDC Metadata)
          o MunicipalBoundaries.pdf (FGDC Metadata)
        2001
        2002
        2003
      CountyBoundaries-Series-26845 (Dublin Core Metadata)
        2003
        2004
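The "one file per series per year" layout in the example FTP structure can be sketched as a small path builder. The nesting (category, then series, then year, then files) is a reading of the flattened slide text rather than a documented AGRC convention, and the function names are hypothetical.

```python
from pathlib import PurePosixPath

# Root folder as it appears in the example FTP structure.
ARCHIVE_ROOT = PurePosixPath("Archives")


def archive_path(category, series_name, series_number, year, dataset, ext):
    """Build the path for one deliverable of one series for one year."""
    series_dir = f"{series_name}-Series-{series_number}"
    return ARCHIVE_ROOT / category / series_dir / str(year) / f"{dataset}.{ext}"


def deliverables(category, series_name, series_number, year, dataset):
    """Each dataset is delivered twice: native format (ZIP) and geoPDF."""
    return [
        str(archive_path(category, series_name, series_number, year, dataset, ext))
        for ext in ("zip", "pdf")
    ]


paths = deliverables(
    "Boundaries", "MunicipalityRecords", 26846, 2000, "MunicipalBoundaries"
)
```

Because the series number and the year are both part of the path, each ZIP stays a single downloadable unit per series per year, rather than one large database spanning multiple series and years.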
22
Kentucky Metadata Workflow into DSpace and iRODS Environment
Diagram labels:
  DSpace
    Database with Dublin Core Descriptive and Administrative Metadata
    Single item & batch ingest into DSpace by Archivist
    Metadata & content entered by agencies using template and modified by Archivist
  iRODS
    Database with Administrative & Preservation Metadata
    Batch metadata extraction using iRODS rules
    Preservation metadata from iRODS rules
  Content Files
  Distributed Storage Layer: UNC, other, KDLA
23
Source Metadata Translation
Hub-and-spoke model à la Echo DEPository
Repository-agnostic, modular conversion hub
Facilitates repository software migration & inter-archive exchange
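The hub-and-spoke idea can be sketched in a few lines: each format supplies one converter into a shared hub record and one converter out of it, so N formats need 2N converters instead of N(N-1) pairwise ones. The format names, key names, and hub record below are all hypothetical and are not drawn from Echo DEPository or any real schema.

```python
# Hypothetical hub record: a flat dict with "title" and "creator" keys.
TO_HUB = {}
FROM_HUB = {}


def register(fmt, to_hub, from_hub):
    """Register the two spoke converters for one metadata format."""
    TO_HUB[fmt] = to_hub
    FROM_HUB[fmt] = from_hub


def translate(record, source, target):
    """Translate a record by going through the hub representation."""
    hub_record = TO_HUB[source](record)
    return FROM_HUB[target](hub_record)


# Two illustrative spokes. The key names are made up for this sketch.
register(
    "fgdc_like",
    lambda r: {"title": r["idinfo.title"], "creator": r["idinfo.origin"]},
    lambda h: {"idinfo.title": h["title"], "idinfo.origin": h["creator"]},
)
register(
    "dc_like",
    lambda r: {"title": r["dc:title"], "creator": r["dc:creator"]},
    lambda h: {"dc:title": h["title"], "dc:creator": h["creator"]},
)

source_record = {
    "idinfo.title": "Wake County Streets",
    "idinfo.origin": "Wake County GIS",
}
translated = translate(source_record, "fgdc_like", "dc_like")
```

Adding a new repository format means writing one pair of converters and leaving the existing spokes untouched, which is what makes the approach useful for repository software migration.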
24
GeoMAPP: Geospatial Multistate Archival and Preservation Partnership
Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA), State Archives of NC, with Library of Congress
Partners:
  State geospatial organizations of Kentucky and Utah
  State Archives of Kentucky and Utah
  NCSU Libraries in catalytic/advisory role
State-to-state and geo-to-Archives collaboration
2-year project: Nov. 2007-Dec. 2009
Archives as part of Spatial Data Infrastructure
25
GeoMAPP: Project Components
Introduce GIS organizations and State Archives to each other
Archival selection and appraisal processes
Retention schedule development
Data transfer to archives
Development of enhanced business case
26
NC Geospatial Data Archiving Project (NCGDAP)
Temporal data management vs. long-term preservation
Repository Goal
  Capture at-risk data
  Explore technical and organizational challenges
Project End Goal
  Data Producers: Improved temporal data management practices
  Archives: More efficient means of acquiring and preserving data; progress towards best practices
27
Geospatial Data Preservation Challenges
Loss of memory about the data is also a problem
Data capture
  Backups are common, but not long-term archives
  Producer focus on current data
  Shift to web services-based access
Inadequate or non-existent metadata
  Consistent NC survey statistics: Only 40% of data producers create and maintain metadata
  Existing metadata often needs to be normalized, synchronized with the data, and remediated
28
Ongoing Challenges
When to automate and when not to
Learn first from human intervention
Minimizing risk of error related to human intervention
Accepting that ingest packages used will evolve over time (implications for archive?)
Handling post-ingest migrations
29
Challenge: Preservation Metadata
Results from a 2006 survey of all 100 NC counties and the 25 largest NC municipalities
30
Some Key Metadata Decisions
Capture “transfer set” metadata
Normalize, synchronize, and remediate existing metadata, and retain original metadata record
Treat contact information as archival
Update metadata with format conversions
Use ESRI Profile of FGDC
  Added technical and administrative elements
  Has an XML schema
  ArcCatalog tool support
Use simple rights encoding scheme
Record metadata in a workflow management database
31
SIP Item Creation: Workflow
Submission Information Package grouping – Ontology logic based on defined multi-file complex format components and directory structure
Repository-agnostic item grouping
(Slide footer: NCSU Libraries, 27 March 2006, Digital Preservation in State Government - Wilmington)
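One way to picture grouping by "defined multi-file complex format components and directory structure" is the sketch below, which collects the component files of an ESRI shapefile that share a directory and base name. It is a simplified stand-in, not the NCGDAP ontology logic: the extension table covers only one format and the function name is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: a small, illustrative table of component extensions for one
# multi-file format (the ESRI shapefile). A real ingest workflow would
# need definitions for many more formats.
SHAPEFILE_PARTS = {".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg"}


def group_shapefiles(paths):
    """Group shapefile components by directory and base name.

    Returns (datasets, leftovers). datasets maps "directory/basename"
    to the sorted list of member paths; leftovers are ungrouped files.
    """
    candidates = defaultdict(list)
    leftovers = []
    for raw in paths:
        p = PurePosixPath(raw)
        if p.name.lower().endswith(".shp.xml"):
            # "streets.shp.xml" is the sidecar metadata for "streets.shp".
            stem = p.name[: -len(".shp.xml")]
        elif p.suffix.lower() in SHAPEFILE_PARTS:
            stem = p.stem
        else:
            leftovers.append(raw)
            continue
        candidates[str(p.parent / stem)].append(raw)

    datasets = {}
    for key, members in candidates.items():
        # Treat the group as a dataset only if the main .shp file exists.
        if any(m.lower().endswith(".shp") for m in members):
            datasets[key] = sorted(members)
        else:
            leftovers.extend(members)
    return datasets, sorted(leftovers)


received = [
    "wake/streets.shp",
    "wake/streets.shx",
    "wake/streets.dbf",
    "wake/streets.prj",
    "wake/streets.shp.xml",
    "wake/readme.txt",
    "wake/parcels.dbf",
]
datasets, leftovers = group_shapefiles(received)
```

Nothing in the grouping depends on a particular repository, which is the sense in which item grouping can be repository-agnostic. Files that do not fit a known format, such as a stray `.dbf` table, are reported rather than silently attached to a dataset.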
32
Metadata Overview
Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata
  Version one (1994) mandated for use by federal agencies
  Descriptive metadata, plus some administrative and technical
  Extensive use at state level, spotty use at local level
  Problem: content standard without an encoding spec
  FGDC profiles: ESRI, NBII, Remote Sensing, etc.
ISO Standards
  ISO 19115: Geographic Information – Metadata (2003)
  ISO 19139: Geographic Information – Metadata – XML Schema Implementation (2007)
  North American Profile of ISO to replace FGDC CSDGM