State and Local Agency Digital Geospatial Data Preservation The North Carolina Experience Steve Morris NCSU Libraries Earth Sciences Information Partners.

Presentation transcript:

State and Local Agency Digital Geospatial Data Preservation: The North Carolina Experience
Steve Morris, NCSU Libraries
Earth Sciences Information Partners (ESIP) Workshop, July 8, 2009

NC Geospatial Data Archiving Project (NCGDAP)
- One of eight initial collection building projects in the Library of Congress NDIIPP (National Digital Information Infrastructure and Preservation Program)
- Lead organizations: North Carolina State University Libraries and North Carolina Center for Geographic Information & Analysis (NCCGIA)
- Focus:
  - State and local government geospatial data in NC
  - Repository development as catalyst for discussion
  - Goal: Engage spatial data infrastructure in data archiving
- Initial 3-year project extended to Dec

NCGDAP Data Types – Raster
- Digital orthophotography
- Satellite imagery
- Static data

NCGDAP Data Types – Vector Data
- Point, line, and polygon
- Attached attribute data
- Often updated

Note: Percentages based on the actual number of respondents to each question
Downtown Raleigh, NC, near the State Capitol: 2005 Wake County ortho imagery
- Ortho imagery = durable: static, simple structure, mostly open formats
- Vector data = volatile: frequent update, complex structure, mostly proprietary (commercial) formats

NCGDAP Data Types – Spatial Databases
- Vector and raster data
- Relationships
- Behaviors
- Annotation
- Data models

Geospatial Data: Compelling Issues
- Dynamic content
  - Constantly updated information
  - Data versioning
- Digital object complexity
  - Spatially enabled databases
  - Complicated, multi-component formats
  - Proprietary formats

Ingest Challenges: General
- Data consists of multi-file, multi-format objects
- Ancillary data files can be shared by datasets
- Some format conversions involve one-to-many relationships
- Compressed archive files are common and behave unpredictably
- And all the usual challenges: format validation, validity checking, threat scanning, …
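Because compressed deliveries "behave unpredictably", one defensive habit is to inspect a ZIP before extracting it. The sketch below is a minimal illustration, not the NCGDAP workflow: it lists members and flags nested archives and path-traversal entries using only Python's standard `zipfile` module. The function name `inspect_zip` and the set of suffixes treated as nested archives are assumptions chosen for the example.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Suffixes treated as "nested archive" warnings (an assumption for this sketch).
ARCHIVE_SUFFIXES = {".zip", ".gz", ".tar", ".tgz", ".7z", ".rar"}


def inspect_zip(data: bytes) -> dict:
    """List a ZIP's members and flag nested archives and unsafe paths, without extracting."""
    report = {"members": [], "nested_archives": [], "unsafe_paths": []}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            name = info.filename
            path = PurePosixPath(name)  # ZIP entry names use "/" separators
            # Absolute names or ".." segments could write outside the target directory.
            if path.is_absolute() or ".." in path.parts:
                report["unsafe_paths"].append(name)
            # A ZIP inside a ZIP needs its own inspection pass before ingest.
            if path.suffix.lower() in ARCHIVE_SUFFIXES:
                report["nested_archives"].append(name)
            report["members"].append(name)
    return report
```

A delivery that reports nested archives or unsafe paths would be set aside for manual review rather than extracted automatically.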

Where is the Dataset?

Here’s One!
- Files
  - Multi-file dataset
  - Georeferencing
  - Metadata file
  - Symbolization file
  - Additional documentation
  - License
  - Disclaimer
- More metadata
  - FGDC
  - Acquisition metadata
  - Transfer metadata
  - Ingest metadata
  - Archive rights
  - Archive processes
  - Collection metadata
  - Series metadata

Ingest Challenges: Metadata
- Metadata is encoded in a variety of ways
  - The FGDC content standard for metadata lacked an encoding standard (it arrived before XML); this is addressed in the ISO 19115/19139 North American Profile implementation
  - XML (varied schemas), TXT, HTML
- Metadata is missing
  - Only about 25% of local agencies use FGDC
- Metadata is wrong
  - Metadata is commonly out of sync with the data
- Inconsistent use of dataset naming, etc.
  - e.g., “Streets” vs. “Wake County Streets”
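Since incoming metadata may arrive as XML, HTML, or plain text, an ingest step has to decide how to parse each record. The following is a minimal heuristic sketch, assuming only that the record is already decoded to a string; the function name `sniff_metadata_encoding` is hypothetical, and a real workflow would still need to identify which XML schema is in use.

```python
def sniff_metadata_encoding(text: str) -> str:
    """Heuristically classify a metadata record as "html", "xml", or "txt"."""
    # Skip a byte-order mark and leading whitespace; only the start of the file is inspected.
    head = text.lstrip("\ufeff \t\r\n")[:2000].lower()
    # Check for HTML first, since XHTML pages can also begin with "<?xml".
    if head.startswith("<!doctype html") or "<html" in head:
        return "html"
    # Any other markup is treated as XML; the schema still has to be identified.
    if head.startswith("<"):
        return "xml"
    # Everything else (e.g. indented "Title: ..." text records) falls through to plain text.
    return "txt"
```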

NCGDAP Metadata Summary
- Existing geospatial metadata often needs:
  - Remediation – to fix errors or omissions
  - Normalization – to adhere to a standard structure
  - Synchronization – so that the data at hand matches the metadata
- If no metadata, then:
  - Can build minimal metadata using templates and auto-extraction
  - Lose key information such as data quality, lineage, data dictionaries
- Automating metadata for repository ingest
  - Raster data is easy – large sets of consistently structured files
  - Vector data is hard – each dataset is a different story
- Many additional administrative and technical metadata elements not accommodated by FGDC
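As an illustration of "minimal metadata using templates and auto-extraction", the sketch below reads the bounding box from a shapefile's 100-byte main-file header (layout per the ESRI Shapefile Technical Description) and drops it into a skeletal FGDC-style XML record. The function names are hypothetical. It also assumes the layer is already in geographic coordinates: FGDC bounding coordinates are decimal degrees, so projected data (e.g. NC State Plane) would need reprojection first, which this sketch does not do.

```python
import struct
import xml.etree.ElementTree as ET


def read_shp_bbox(header: bytes) -> tuple:
    """Return (xmin, ymin, xmax, ymax) from the 100-byte .shp main-file header."""
    if len(header) < 100:
        raise ValueError("shapefile header must be 100 bytes")
    (file_code,) = struct.unpack(">i", header[0:4])  # big-endian file code
    if file_code != 9994:
        raise ValueError("not a shapefile (file code != 9994)")
    # Bytes 28-67: version, shape type (little-endian ints), then Xmin, Ymin, Xmax, Ymax doubles.
    _version, _shape_type, xmin, ymin, xmax, ymax = struct.unpack("<ii4d", header[28:68])
    return xmin, ymin, xmax, ymax


def minimal_fgdc(title: str, bbox: tuple) -> str:
    """Fill a skeletal FGDC CSDGM record; quality, lineage, etc. still need a human."""
    west, south, east, north = bbox
    root = ET.Element("metadata")
    idinfo = ET.SubElement(root, "idinfo")
    citeinfo = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
    ET.SubElement(citeinfo, "title").text = title
    bounding = ET.SubElement(ET.SubElement(idinfo, "spdom"), "bounding")
    for tag, value in (("westbc", west), ("eastbc", east), ("northbc", north), ("southbc", south)):
        ET.SubElement(bounding, tag).text = f"{value:.6f}"
    return ET.tostring(root, encoding="unicode")
```

This captures only what the file itself can tell us; as the slide notes, data quality, lineage, and data dictionaries cannot be recovered this way.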

Extended Curation: Feedback and Outreach
- Ingest processes: Data Receipt, Format Processing, Metadata Processing
- Feedback and outreach to: Content Producers, Industry, Standards Organizations

Spatial Data Infrastructure and Archiving
- Metadata standards and outreach
  - Metadata quality, best practices
- Inventories
  - Reduce “contact fatigue”; shareable information store
- Content exchange networks
  - Leverage more compelling business reasons to put data in motion
  - Automate process; add technical & administrative metadata
- Framework data communities
  - Snapshot frequency, schemas, format strategies

Content Packaging Issues
- Geospatial datasets are typically complex, multi-file objects
- Data are often accompanied by ancillary data, which must be associated with the data item
- Rights information and licenses must be associated with the item
- Various implementations in different domains (METS, IMS-CP, XFDU, etc.)
- Simpler .zip-based packages also used (MEF, KMZ, etc.)
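The simplest of these approaches, a .zip-based package, can still keep dataset components, rights, and fixity information together. The sketch below is a hypothetical layout (`data/`, `rights.txt`, `manifest-sha256.txt`) that borrows the checksum-manifest idea from BagIt; it is not a conformant BagIt bag, METS, or MEF package.

```python
import hashlib
import io
import zipfile


def build_package(files: dict, rights: str) -> bytes:
    """Bundle dataset components, a rights statement, and a SHA-256 manifest in one ZIP."""
    manifest_lines = []
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(files):
            data = files[name]
            zf.writestr(f"data/{name}", data)
            # One "checksum  path" line per component file.
            manifest_lines.append(f"{hashlib.sha256(data).hexdigest()}  data/{name}")
        zf.writestr("rights.txt", rights)  # rights travel inside the package
        zf.writestr("manifest-sha256.txt", "\n".join(manifest_lines) + "\n")
    return buf.getvalue()


def verify_package(blob: bytes) -> bool:
    """Recompute every checksum listed in the manifest."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for line in zf.read("manifest-sha256.txt").decode().splitlines():
            digest, name = line.split("  ", 1)
            if hashlib.sha256(zf.read(name)).hexdigest() != digest:
                return False
    return True
```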

Spatial Database Approaches
- Manage database forward over time
- Extract data layers to preservable form
- Set aside archival snapshot of database
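Two of these approaches, setting aside a snapshot and extracting layers, can be sketched with SQLite standing in for a spatial database (a GeoPackage is an SQLite file). This is an assumption for illustration only: ESRI geodatabases would need vendor tools, and the sketch exports attribute columns as CSV without decoding geometry. The function names are hypothetical; `Connection.backup` is the standard `sqlite3` online backup API.

```python
import csv
import io
import sqlite3


def snapshot_database(live: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Copy every page of the live database into `target` (SQLite online backup API)."""
    live.backup(target)


def extract_layer_csv(conn: sqlite3.Connection, table: str) -> str:
    """Export one table ("layer") as CSV text; geometry BLOBs are not decoded here."""
    # The table name is interpolated, so it must come from a trusted list, not user input.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # header row from column names
    writer.writerows(cur)
    return out.getvalue()
```

In practice `target` would be a file-backed connection, e.g. `sqlite3.connect("parcels_snapshot.sqlite")` (a hypothetical filename), closed once the backup finishes.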

GeoMAPP: Geospatial Multistate Archival and Preservation Partnership
- Partners (NC, KY, UT, Library of Congress, NCSU):
  - State geospatial organizations
  - State Archives
- State-to-state and geo-to-Archives collaboration
  - Organizational and technical diversity across states
- Archives as part of spatial data infrastructure
  - Selection and appraisal processes
  - Retention schedule development
  - Data transfer to archives
  - Development of enhanced business cases

NCGDAP Learning Outcomes
- Preservation of GIS projects is needed to support re-creation of past work
- Preservation of data representations is needed to document decision-making processes
- Validation, remediation, and conversion of data and metadata is expensive: push for improvements upstream
- Some repositories handle “items”: can result in “atomization” of data
- For vendors, frame data preservation as a “customer problem”: must build the business case

Thank You!
Steve Morris
Head, Digital Library Initiatives
North Carolina State University Libraries
North Carolina Geospatial Data Archiving Project
GeoMAPP

Draft of Utah’s GIS to Archives Data Flow
- AGRC exports data from SGID and splits out datasets by series. Metadata occasionally incomplete.
- Local governments supply GIS datasets on CD/DVD to AGRC. Metadata often missing.
- All metadata is completed to FGDC standards.
- AGRC creates GeoPDF files of individual datasets, plus ZIP files of the native format. One ZIP file would contain all the pieces belonging to one shapefile or, alternatively, a geodatabase. Geodatabases would not be one big database with everything in it (multiple series and years); instead, the native files would be a single downloadable file per series per year.
- AGRC copies these files to the Archives’ FTP server.
- Metadata is harvested to populate the Archives’ finding aids.
Example FTP site structure:
ftp.archives-agrc.utah.gov/Archives
  Biota (Dublin Core metadata)
  Boundaries (Dublin Core metadata)
    MunicipalityRecords-Series (Dublin Core metadata)
      2000
        MunicipalBoundaries.zip (FGDC metadata)
        MunicipalBoundaries.pdf (FGDC metadata)
      2001
      2002
      2003
    CountyBoundaries-Series (Dublin Core metadata)
      2003
      2004
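The series/year layout in the example tree can be generated mechanically, which helps keep transfers consistent. The sketch below is hypothetical: `plan_transfer` and its `(theme, series, year, dataset)` tuples are invented for illustration, and the path convention is inferred from the example structure above (relative to the `Archives` folder), not taken from AGRC's actual scripts.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_transfer(datasets):
    """Map (theme, series, year, dataset) records to one ZIP and one GeoPDF path each."""
    plan = defaultdict(list)
    for theme, series, year, dataset in datasets:
        # Layout inferred from the example tree: Archives/<theme>/<series>-Series/<year>/
        folder = PurePosixPath("Archives", theme, f"{series}-Series", str(year))
        for ext in ("zip", "pdf"):  # native-format ZIP plus GeoPDF rendering
            path = str(folder / f"{dataset}.{ext}")
            if path in plan[(series, year)]:
                # Each series/year folder should hold one copy of each file; refuse duplicates.
                raise ValueError(f"duplicate file for {series} {year}: {path}")
            plan[(series, year)].append(path)
    return dict(plan)
```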

Kentucky Metadata Workflow into DSpace and iRODS Environment
- Metadata & content entered by agencies using template and modified by Archivist
- Single item & batch ingest into DSpace by Archivist
- DSpace: database with Dublin Core descriptive and administrative metadata; content files
- iRODS: distributed storage layer (KDLA, UNC, other)
  - Batch metadata extraction using iRODS rules
  - Preservation metadata from iRODS rules
  - Database with administrative & preservation metadata

Source Metadata Translation
- Hub-and-spoke model à la ECHO DEPository
  - Repository agnostic
  - Modular conversion hub
  - Facilitates repository software migration & inter-archive exchange
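The appeal of a hub-and-spoke model is that each schema needs only a converter into and out of one internal record. With N schemas that is 2N converters instead of N(N-1) pairwise ones. The sketch below is a deliberately simplified, lossy illustration. The hub field names and the flattened FGDC dictionary (keyed by FGDC short names such as `origin` and `pubdate`) are assumptions, not the ECHO DEPository implementation.

```python
# Spokes into the hub: each source schema gets one converter to the hub record.
def fgdc_to_hub(rec: dict) -> dict:
    return {"title": rec["title"], "creator": rec["origin"],
            "date": rec["pubdate"], "summary": rec["abstract"]}


def dc_to_hub(rec: dict) -> dict:
    return {"title": rec["dc:title"], "creator": rec["dc:creator"],
            "date": rec["dc:date"], "summary": rec["dc:description"]}


# Spokes out of the hub: each target schema gets one converter from the hub record.
def hub_to_fgdc(hub: dict) -> dict:
    return {"title": hub["title"], "origin": hub["creator"],
            "pubdate": hub["date"], "abstract": hub["summary"]}


def hub_to_dc(hub: dict) -> dict:
    return {"dc:title": hub["title"], "dc:creator": hub["creator"],
            "dc:date": hub["date"], "dc:description": hub["summary"]}


TO_HUB = {"fgdc": fgdc_to_hub, "dc": dc_to_hub}
FROM_HUB = {"fgdc": hub_to_fgdc, "dc": hub_to_dc}


def translate(record: dict, source: str, target: str) -> dict:
    """Translate source -> hub -> target; adding a schema means adding two spokes."""
    return FROM_HUB[target](TO_HUB[source](record))
```

Swapping repository software then means writing one new outbound spoke rather than rewriting every pairwise crosswalk.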

GeoMAPP: Geospatial Multistate Archival and Preservation Partnership
- Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA), State Archives of NC, with Library of Congress
- Partners:
  - State geospatial organizations of Kentucky and Utah
  - State Archives of Kentucky and Utah
  - NCSU Libraries in catalytic/advisory role
- State-to-state and geo-to-Archives collaboration
- 2-year project: Nov … Dec
- Archives as part of Spatial Data Infrastructure

GeoMAPP: Project Components
- Introduce GIS organizations and State Archives to each other
- Archival selection and appraisal processes
- Retention schedule development
- Data transfer to archives
- Development of enhanced business case

NC Geospatial Data Archiving Project (NCGDAP)
Temporal data management vs. long-term preservation
- Repository goal
  - Capture at-risk data
  - Explore technical and organizational challenges
- Project end goal
  - Data producers: improved temporal data management practices
  - Archives: more efficient means of acquiring and preserving data; progress towards best practices

Geospatial Data Preservation Challenges
- Data capture
  - Backups are common, but not long-term archives
  - Producer focus on current data
  - Shift to web services-based access
- Inadequate or non-existent metadata
  - Consistent NC survey statistics: only 40% of data producers create and maintain metadata
  - Existing metadata often needs to be normalized, synchronized with the data, and remediated
- Loss of memory about the data is also a problem

Ongoing Challenges
- When to automate and when not to
  - Learn first from human intervention
  - Minimize risk of error related to human intervention
- Accepting that ingest packages used will evolve over time (implications for archive?)
- Handling post-ingest migrations

Challenge: Preservation Metadata
Results from a 2006 survey of all 100 NC counties and the 25 largest NC municipalities

Some Key Metadata Decisions
- Capture “transfer set” metadata
- Normalize, synchronize, and remediate existing metadata, and retain original metadata record
- Treat contact information as archival
- Update metadata with format conversions
- Use ESRI Profile of FGDC
  - Added technical and administrative elements
  - Has an XML schema
  - ArcCatalog tool support
- Use simple rights encoding scheme
- Record metadata in a workflow management database
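Several of these decisions (retain the original record, update metadata with format conversions, simple rights codes, a workflow management database) fit naturally into a small relational schema. The SQLite sketch below is hypothetical: the table and column names, the rights code `"public"`, and the helper functions are invented to show the pattern, not the schema NCGDAP actually used.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS item (
    item_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    rights TEXT NOT NULL,             -- simple rights code (hypothetical scheme)
    original_metadata BLOB NOT NULL,  -- record as received; never edited
    current_metadata TEXT NOT NULL    -- normalized/remediated working copy
);
CREATE TABLE IF NOT EXISTS event (
    event_id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES item(item_id),
    action TEXT NOT NULL,
    detail TEXT,
    sha256 TEXT                       -- checksum of the conversion output
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register_item(conn, title, rights, original: bytes, normalized: str) -> int:
    """Store both the untouched original record and the normalized copy."""
    with conn:  # commits on success
        cur = conn.execute(
            "INSERT INTO item (title, rights, original_metadata, current_metadata) VALUES (?, ?, ?, ?)",
            (title, rights, original, normalized),
        )
    return cur.lastrowid


def record_conversion(conn, item_id, detail, output: bytes, updated_metadata: str) -> str:
    """Log a format conversion and update the working metadata in one transaction."""
    digest = hashlib.sha256(output).hexdigest()
    with conn:
        conn.execute(
            "INSERT INTO event (item_id, action, detail, sha256) VALUES (?, 'format_conversion', ?, ?)",
            (item_id, detail, digest),
        )
        conn.execute("UPDATE item SET current_metadata = ? WHERE item_id = ?", (updated_metadata, item_id))
    return digest
```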

SIP Item Creation: Workflow
- Submission Information Package grouping: ontology logic based on defined multi-file complex format components and directory structure
- Repository-agnostic item grouping
NCSU Libraries, 27 March 2006, Digital Preservation in State Government, Wilmington
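Grouping files into items by format components and directory structure can be illustrated for shapefiles. The sketch below uses a hypothetical "ontology" consisting of one set of component suffixes, including the `.shp.xml` metadata file that ESRI tools write beside a shapefile. It groups files by directory plus base name, reports leftovers (readmes, licenses, shared ancillary files) for separate handling, and flags groups missing a required component. Function and constant names are assumptions.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical format "ontology": suffixes that belong to one shapefile dataset.
SHAPEFILE_PARTS = {".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg", ".shp.xml"}
REQUIRED = {".shp", ".shx", ".dbf"}


def component_suffix(name: str) -> str:
    """Return the component suffix, treating the two-part ".shp.xml" as one unit."""
    lower = name.lower()
    if lower.endswith(".shp.xml"):
        return ".shp.xml"
    return PurePosixPath(lower).suffix


def group_items(paths):
    """Group files into candidate SIP items; return (items, leftovers, incomplete keys)."""
    items = defaultdict(list)
    leftovers = []
    for p in paths:
        path = PurePosixPath(p)
        suffix = component_suffix(path.name)
        if suffix in SHAPEFILE_PARTS:
            stem = path.name[: -len(suffix)]
            # Key includes the directory so same-named datasets in different folders stay apart.
            items[str(path.parent / stem)].append(p)
        else:
            leftovers.append(p)  # ancillary files need another rule or a human decision
    incomplete = [
        key for key, files in items.items()
        if not REQUIRED <= {component_suffix(PurePosixPath(f).name) for f in files}
    ]
    return dict(items), leftovers, incomplete
```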

Metadata Overview
- Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata
  - Version one (1994) mandated for use by federal agencies
  - Descriptive metadata, plus some administrative and technical
  - Extensive use at state level, spotty use at local level
  - Problem: content standard without an encoding spec
  - FGDC profiles: ESRI, NBII, Remote Sensing, etc.
- ISO standards
  - ISO 19115: Geographic Information – Metadata (2003)
  - ISO 19139: Geographic Information – Metadata – XML Schema Implementation (2007)
  - North American Profile of ISO to replace FGDC CSDGM
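Because the FGDC standard defined content but not an encoding, tools settled on XML records whose element names are the CSDGM short names (`idinfo`, `citeinfo`, `bounding`, `metstdn`, and so on). The sketch below reads a few of those elements with Python's standard `xml.etree.ElementTree`. It assumes that common short-name XML encoding. The function name is hypothetical, and records in other encodings (text, HTML, ISO 19139) would need different handling.

```python
import xml.etree.ElementTree as ET


def read_fgdc_summary(xml_text: str) -> dict:
    """Pull title, originator, date, bounding box, and standard name from an FGDC XML record."""
    root = ET.fromstring(xml_text)  # root element is <metadata>

    def text(path):
        el = root.find(path)
        return el.text.strip() if el is not None and el.text else None

    bbox = {tag: text(f"idinfo/spdom/bounding/{tag}") for tag in ("westbc", "eastbc", "northbc", "southbc")}
    return {
        "title": text("idinfo/citation/citeinfo/title"),
        "originator": text("idinfo/citation/citeinfo/origin"),
        "pubdate": text("idinfo/citation/citeinfo/pubdate"),
        "bbox": {k: float(v) for k, v in bbox.items() if v is not None},
        "metadata_standard": text("metainfo/metstdn"),
    }
```

Missing elements come back as `None` (or are omitted from `bbox`), which makes gaps visible for the remediation step rather than failing the ingest.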