Preservation of Digital Geospatial Data: Challenges and Opportunities. Steve Morris, Head of Digital Library Initiatives, North Carolina State University.

Presentation transcript:

Preservation of Digital Geospatial Data: Challenges and Opportunities. Steve Morris, Head of Digital Library Initiatives, North Carolina State University Libraries. NARA Meeting, Dec. 14, 2005.

Note: Percentages based on the actual number of respondents to each question.

2. Outline: Digital geospatial data types; risks to digital geospatial data; overview of the NC Geospatial Data Archiving Project; preservation challenges and possible solutions.

3. Geospatial data types: Vector data

4. Geospatial data types: Satellite imagery

5. Geospatial data types: Aerial imagery

6. Geospatial data types: Aerial imagery

7. Geospatial data types: Aerial imagery

8. Geospatial data types: Tabular data (with vector)

9. Time series, vector data: Parcel boundary changes, North Raleigh, NC

10. Time series, ortho imagery: Vicinity of Raleigh-Durham International Airport

11. Today’s geospatial data as tomorrow’s cultural heritage

12. Risks to Digital Geospatial Data: .shp .mif .gml .e00 .dwg .dgn .bsb .bil .sid
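
The extension list on this slide shows how many formats an archive may receive. A minimal sketch, assuming file extensions are an acceptable first cue for triage (they are not authoritative; signature-based identification would be more reliable), of tallying an incoming collection by format. The lookup table and `inventory_formats` are hypothetical illustrations, not part of the project's tooling.

```python
from pathlib import PurePath

# Hypothetical lookup table (illustrative, not exhaustive): extension -> format name.
FORMAT_BY_EXTENSION = {
    ".shp": "Esri shapefile",
    ".mif": "MapInfo Interchange Format",
    ".gml": "Geography Markup Language",
    ".e00": "ArcInfo export (E00)",
    ".dwg": "AutoCAD drawing",
    ".dgn": "MicroStation design file",
    ".bsb": "BSB nautical chart raster",
    ".bil": "Band interleaved by line raster",
    ".sid": "MrSID compressed raster",
}


def inventory_formats(paths):
    """Count files per recognized format; unknown extensions are grouped, not dropped."""
    counts = {}
    for p in paths:
        ext = PurePath(p).suffix.lower()  # case-insensitive match on the last suffix
        name = FORMAT_BY_EXTENSION.get(ext, "unrecognized")
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Keeping an "unrecognized" bucket makes gaps in the lookup table visible instead of silently skipping files.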

13. Risks to Digital Geospatial Data: Producer focus on current data; time-versioned content generally not archived; future support of data formats in question; vast range of data formats in use, which is complex; shift to “streaming data” for access; archives have been a by-product of providing access; preservation metadata requirements (descriptive, administrative, technical, DRM); Geodatabases (complex functionality).

14. NC Geospatial Data Archiving Project: Partnership between a university library (NCSU) and a state agency (NCCGIA). Focus on state and local geospatial content in North Carolina (state demonstration). Tied to the NC OneMap initiative, which provides seamless access to data, metadata, and inventory information. Objective: engage existing state and federal geospatial data infrastructures in preservation.

15. Targeted Content. Resource types: GIS “vector” (point/line/polygon) data; digital orthophotography; digital maps; tabular data (e.g., assessment data). Content producers: mostly state, local, and regional agencies; some university, not-for-profit, and commercial producers; selected local federal projects.

16. Local Government GIS: Archival Issues. Data resources are highly distributed and subject to frequent update, and are more detailed, current, and accurate than federal/state data resources. North Carolina local agency GIS environment: 100 counties, 95 with GIS; 85 counties with high-resolution orthophotography; a growing number of municipal systems. Value: a $162 million-plus investment (est. in 2003).

17. Work Plan in a Nutshell. Work from existing data inventories: NC OneMap Data Sharing Agreements as the “blanket,” individual agreements as the “quilt.” Partnership: work with existing geospatial data infrastructures (state and federal). Technical approach: METS with FGDC, PREMIS?, GeoDRM?; DSpace now, re-ingest into a different environment; web services consumption for archival development.

18. NCGDAP Philosophy of Engagement. Take the data in the manner in which it can be obtained. Provide feedback to producer organizations and inform the state geospatial infrastructure. “Wrangle” and archive the data. Note the ‘Project’ in ‘North Carolina Geospatial Data Archiving Project’: the process, the learning experience, and the engagement with geospatial data infrastructures are more important than the archive.

19. Big Challenges: format migration paths; management of data versions over time; preservation metadata; harnessing geospatial web services; preserving cartographic representation; keeping content repository-agnostic; preserving geodatabases; more …

20. Vector Data Format Issues. Vector data is much more complicated than image data. ‘Archiving’ vs. ‘permanent access’: an ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology, then it does not provide permanent access. Piles of XML need to be widely understood piles. GML: need widely accepted application schemas (like OSMM?). The Geodatabase conundrum: export feature classes and lose topology, annotation, relationships, etc., or use the Geodatabase as the primary archival platform (some are now thinking this way).

21. GIS Software Used: NC Local Agencies (Source: NC OneMap Data Inventory 2004)

22. Vector Data Format Options. Option A: use an open format, accepting a really unfortunate transformation and limited vendor support for the output object. Option B: use a closed format but retain the original content, and count on short- and medium-term vendor support. Option C: do both to buy time, and look for an open, ASCII-based solution (watch GML activity). There is no sweet spot, just an evolving and changing mix of flawed options that are used in combination.

23. Geography Markup Language Issues. GML is still more useful as a transfer format than as an archival format, and support is limited even for transfer. “Permanent access” requirements: profiles and application schemas that are widely understood and supported, so that use does not require “digital archaeology.” Role of the GML Simple Features Profile? Assessing formats for preservation: sustainability factors, quality and functionality factors. Apply the same approach to GML profiles and application schemas?

24. Geography Markup Language Issues. Plans for an environmental scan of existing GML profiles and application schemas, recording for each: schema name (e.g., OSMM, top10NL, ESRI GML, LandGML); responsible agency, and whether the schema has official government status; GML version and known unsupported GML components; schema history and known interoperation with other schemas; vendor support, translator support, and stability over time.
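
The scan criteria above map naturally onto one record per schema. A minimal sketch, assuming a simple in-memory record is enough for the scan; the class, its field names, and the `needs_follow_up` rule are hypothetical choices, not a format the project defined.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GmlSchemaScanEntry:
    """One row of a hypothetical environmental-scan table for a GML profile or schema."""
    schema_name: str                              # e.g. "OSMM", "top10NL"
    responsible_agency: str
    official_government_status: Optional[bool]   # None means "not yet determined"
    gml_version: str
    unsupported_gml_components: List[str] = field(default_factory=list)
    schema_history: str = ""
    interoperates_with: List[str] = field(default_factory=list)
    vendor_support: List[str] = field(default_factory=list)
    translator_support: List[str] = field(default_factory=list)
    stable_over_time: Optional[bool] = None


def needs_follow_up(entry: GmlSchemaScanEntry) -> bool:
    """Flag entries where a key preservation question is still unanswered (hypothetical rule)."""
    return (
        entry.official_government_status is None
        or entry.stable_over_time is None
        or not entry.vendor_support
    )
```

Using `None` for "unknown" keeps unanswered questions distinct from a recorded "no."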

25. Managing Time-versioned Content

26. Managing Time-versioned Content. Many local agency data layers are continuously updated; e.g., some county cadastral data is updated daily, and older versions are not generally available. Individual versioned datasets will wander off from the archive. How do users “get the current metadata/DRM/object” from a versioned dataset found “in the wild”? How do we certify concurrency and agreement between the metadata and the data?
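
One possible way to certify that a metadata record and a data file belong together, offered as a sketch and not as the project's method: embed a checksum of the data in the metadata when the record is produced, then recompute it later. The `data_sha256` field name is a hypothetical choice. A match shows only that the metadata was written against those exact bytes, not that its content is correct.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_fixity(metadata: dict, data: bytes) -> dict:
    """Return a copy of the metadata with a checksum of the data embedded (hypothetical field)."""
    record = dict(metadata)
    record["data_sha256"] = sha256_of(data)
    return record


def metadata_matches_data(metadata: dict, data: bytes) -> bool:
    """True only if the metadata carries a checksum and it matches these exact bytes."""
    expected = metadata.get("data_sha256")
    return expected is not None and expected == sha256_of(data)
```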

27. Managing Time-versioned Content. Can we manage the relationship loosely, using a persistent identifier link to a parent object? [Diagram: version, Persistent ID Resolver, Parent Object Manager, version]
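
The loose parent-object relationship posed on this slide can be sketched as a small resolver: every archived version carries a persistent ID, and the resolver maps that ID back to a parent record holding current metadata and the full version list. This is an in-memory sketch under that assumption; the class names and the `ex:` identifiers are hypothetical, and a real resolver would sit behind a persistent-identifier service.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParentObject:
    """Hypothetical parent record that knows every archived version of one dataset."""
    parent_id: str
    current_metadata: dict
    versions: List[str] = field(default_factory=list)  # version IDs, oldest first


class Resolver:
    """Maps the persistent ID carried by any version back to its parent object."""

    def __init__(self) -> None:
        self._parents: Dict[str, ParentObject] = {}
        self._version_to_parent: Dict[str, str] = {}

    def register_version(self, parent: ParentObject, version_id: str) -> None:
        # Reuse the stored parent if this ID is already known.
        stored = self._parents.setdefault(parent.parent_id, parent)
        stored.versions.append(version_id)
        self._version_to_parent[version_id] = stored.parent_id

    def resolve(self, version_id: str) -> ParentObject:
        return self._parents[self._version_to_parent[version_id]]

    def latest_version(self, version_id: str) -> str:
        """From any version found 'in the wild', find the newest archived version."""
        return self.resolve(version_id).versions[-1]
```

The versions themselves never need updating; only the parent record changes as new snapshots arrive.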

28. Preservation Metadata Issues. FGDC metadata comes in many flavors, so incoming metadata needs processing; crosswalk elements to PREMIS, MODS? Metadata wrapper/content packaging: METS (Metadata Encoding and Transmission Standard) vs. other industry solutions. Need a geospatial industry solution for the ‘METS-like problem.’ GeoDRM is a likely trigger: a wrapper to enforce licensing (MPEG 21 references in OGIS Web Services 3).
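
To illustrate what a crosswalk involves, here is a partial sketch that maps a few FGDC CSDGM citation elements (title, originator, publication date, abstract) to their closest MODS counterparts. It covers only those four elements and assumes un-namespaced FGDC XML with a `<metadata>` root; the function name is hypothetical, and a production crosswalk would also have to handle the many FGDC "flavors" noted above.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a MODS tag name with its namespace."""
    return f"{{{MODS_NS}}}{tag}"


def fgdc_to_mods(fgdc_xml: str) -> ET.Element:
    """Crosswalk a few FGDC citation elements to MODS (partial, illustrative)."""
    fgdc = ET.fromstring(fgdc_xml)  # assumed root: <metadata>
    mods = ET.Element(q("mods"))
    cite = fgdc.find("idinfo/citation/citeinfo")
    if cite is not None:
        title = cite.findtext("title")
        if title:
            title_info = ET.SubElement(mods, q("titleInfo"))
            ET.SubElement(title_info, q("title")).text = title.strip()
        for origin in cite.findall("origin"):  # FGDC allows repeated originators
            if origin.text:
                name = ET.SubElement(mods, q("name"))
                ET.SubElement(name, q("namePart")).text = origin.text.strip()
        pubdate = cite.findtext("pubdate")
        if pubdate:
            origin_info = ET.SubElement(mods, q("originInfo"))
            ET.SubElement(origin_info, q("dateIssued")).text = pubdate.strip()
    abstract = fgdc.findtext("idinfo/descript/abstract")
    if abstract:
        ET.SubElement(mods, q("abstract")).text = abstract.strip()
    return mods
```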

29. Metadata Availability

30. Harnessing Geospatial Web Services

36. Geospatial Web Service Types. Image services: deliver an image resulting from a query against the underlying data; limited opportunity for analysis. Feature services: stream actual feature data; greater opportunity for data analysis. Other: geocoding services, routing, etc.
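
The image/feature distinction shows up directly in the requests a harvester would send. A sketch, assuming OGC WMS 1.1.1 and WFS 1.0.0 key-value-pair requests; the endpoint URL and layer names are placeholders. The WMS `GetMap` call returns a rendered picture, while the WFS `GetFeature` call returns the features themselves.

```python
from urllib.parse import urlencode


def wms_getmap_url(base, layer, bbox, width=1024, height=1024,
                   srs="EPSG:4326", fmt="image/png"):
    """Image service: the server renders a picture of the data (WMS 1.1.1 KVP)."""
    params = {
        "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
        "LAYERS": layer, "STYLES": "", "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width, "HEIGHT": height, "FORMAT": fmt,
    }
    return f"{base}?{urlencode(params)}"


def wfs_getfeature_url(base, type_name, bbox):
    """Feature service: the server returns the feature data (WFS 1.0.0 KVP)."""
    params = {
        "SERVICE": "WFS", "VERSION": "1.0.0", "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "BBOX": ",".join(str(v) for v in bbox),
    }
    return f"{base}?{urlencode(params)}"
```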

38. Geospatial Web Services Rights Issues. Example: desktop GIS-accessible ArcIMS. 39 of 100 NC counties have desktop GIS-accessible ArcIMS services. It is difficult to know how many of these counties actually expect users either to (A) access data through desktop GIS for viewing only, or (B) extract and download data.

39. Harnessing Geospatial Web Services. Automated content identification: ‘capabilities files,’ registries, catalog services. WMS (Web Map Service) for batch extraction of image atlases: a last-ditch capture option that preserves cartographic representation and retains records of the decision-making process. Feature services (WFS) later. Rights issues in the web services space are ambiguous.
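
Batch atlas extraction from a WMS amounts to cutting an area into page-sized tiles and issuing one `GetMap` per tile. A sketch under stated assumptions: WMS 1.1.1 parameters, a geographic SRS, and a fixed tile grid; `atlas_requests` and the endpoint are hypothetical. Pixel height is scaled to each tile's aspect ratio so pages are not stretched, and an empty `STYLES` requests the server's default styles.

```python
from urllib.parse import urlencode


def tile_bbox(minx, miny, maxx, maxy, cols, rows):
    """Split a bounding box into a cols x rows grid, row by row from the lower-left."""
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((minx + c * dx, miny + r * dy,
                          minx + (c + 1) * dx, miny + (r + 1) * dy))
    return tiles


def atlas_requests(base, layers, bbox, cols, rows, size=2048,
                   srs="EPSG:4326", fmt="image/png"):
    """Build one WMS 1.1.1 GetMap URL per atlas page (hypothetical batch-capture helper)."""
    urls = []
    for tile in tile_bbox(*bbox, cols, rows):
        width_units, height_units = tile[2] - tile[0], tile[3] - tile[1]
        params = {
            "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
            "LAYERS": ",".join(layers), "STYLES": "", "SRS": srs,
            "BBOX": ",".join(str(v) for v in tile),
            "WIDTH": size,
            "HEIGHT": max(1, round(size * height_units / width_units)),
            "FORMAT": fmt,
        }
        urls.append(f"{base}?{urlencode(params)}")
    return urls
```

Saving the request URLs alongside the images is one way to keep a record of exactly what was captured.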

40. “Web mash-ups” and the New Mainstream Geospatial Web Services

41. Preserving Cartographic Representation

42. Preserving Cartographic Representation. The true counterpart of the old map is not the GIS dataset, but rather the cartographic representation that builds on that data: intellectual choices about symbolization and layer combinations; data models, analysis, annotations. Cartographic representation is typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration. Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a different problem.

43. Preserving Cartographic Representation. Image-based approaches: generate images using Map Book or similar tools; harvest existing atlas images; capture atlases from WMS servers; export ‘layouts’ or ‘maps’ to image. Vector-based approaches: store explicitly in the data format (e.g., Feature Class Representation in ArcGIS 9.2); archive and upward-migrate existing files (.avl, .apr, .lyr, .mxd, etc.); SVG, VML, or other XML approaches. Other?

44. Preserving Cartographic Representation

45. Preserving Cartographic Representation

46. Repository Architecture Issues. Interest in how geospatial content interacts with widely available digital repository software. Focus on salient, domain-specific issues. Challenge: remain repository-agnostic. Avoid “imprinting” on the repository software environment: the preservation package should not be the same as the ingest object of the first environment. Tension between exploiting repository software features and becoming software-dependent.
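
One way to keep a preservation package independent of any repository's ingest object is to store content in a neutral, self-describing layout. The sketch below swaps in a minimal BagIt-style layout (payload under `data/`, a `bagit.txt` declaration, and a SHA-256 manifest) as an example of such a layout; it is not the project's packaging approach, it omits optional BagIt tag files, and `make_bag` and `verify_bag` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir: Path, files: dict) -> Path:
    """Write a minimal BagIt-style package; files maps relative payload paths to bytes."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel, content in sorted(files.items()):
        target = data_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        manifest_lines.append(f"{hashlib.sha256(content).hexdigest()}  data/{rel}\n")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    (bag_dir / "manifest-sha256.txt").write_text("".join(manifest_lines), encoding="utf-8")
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute every payload checksum listed in the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel = line.split(maxsplit=1)
        if hashlib.sha256((bag_dir / rel).read_bytes()).hexdigest() != digest:
            return False
    return True
```

Because the package is plain files plus text manifests, any later repository can re-ingest it without depending on the first one's internal object model.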

47. Preserving Geodatabases. Spatial databases in general vs. the ESRI Geodatabase “format.” Not just data layers and attributes, but also topology, annotation, relationships, and behaviors. ESRI Geodatabase archival issues: XML export, Geodatabase history, File Geodatabase, Geodatabase replication. Some are looking to the Geodatabase as an archival platform (in addition to feature class export).

48. Geodatabase Availability. Local agencies, especially municipalities, are increasingly turning to the ESRI Geodatabase format to manage geospatial data. According to the 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in that format.

49. Evolving Geodatabase Handling Approaches (project stage: planned approach)
Original Proposal (Nov. 2003): Export feature classes as shapefiles; archive Geodatabases less than 2 GB in size.
Finalized Work Plan (Dec. 2004): Also export content as Geodatabase XML.
Possible Future Work Plan Changes: Explore maintenance of some archival content in Geodatabase form; explore Geodatabase replication as an archive development approach; archive Geodatabases of unlimited size.

50. Efficient Content Replication. Content replication is also needed for disaster preparedness, state and federal data improvement projects, and aggregation by regional geospatial web service providers. WFS, e.g.: is it efficient for complete content transfer? An rsync-like function, plus rights management, inventory processes, and metadata management, informed by data update cycles. Archiving delta files vs. complete replication: need to avoid requiring “digital archaeology” in the future.
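
The delta-versus-full-replication question can be made concrete by comparing two snapshots feature by feature. A sketch, assuming each snapshot is a mapping from a stable feature ID to its attributes and geometry, and that canonical JSON is an acceptable basis for hashing; the functions are hypothetical. Keep in mind that an archive of deltas is only usable if a complete baseline snapshot is also kept, which is the "digital archaeology" risk the slide raises.

```python
import hashlib
import json


def feature_hashes(features: dict) -> dict:
    """Map feature ID -> SHA-256 of its record, serialized as canonical (sorted-key) JSON."""
    return {
        fid: hashlib.sha256(json.dumps(rec, sort_keys=True).encode("utf-8")).hexdigest()
        for fid, rec in features.items()
    }


def snapshot_delta(old: dict, new: dict) -> dict:
    """Report which features were added, removed, or changed between two snapshots."""
    old_h, new_h = feature_hashes(old), feature_hashes(new)
    added = sorted(set(new_h) - set(old_h))
    removed = sorted(set(old_h) - set(new_h))
    changed = sorted(fid for fid in set(old_h) & set(new_h) if old_h[fid] != new_h[fid])
    return {"added": added, "removed": removed, "changed": changed}
```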

51. Points of Engagement with the Open Geospatial Consortium (OGC). GML for archiving. GeoDRM: adding preservation use cases. Content packaging: an industry solution? Web Services Context Documents: can we save data state as well as application state? Content replication: is this a layer in the architecture? Persistent identifiers.

52. Project Outcomes. Demonstration archive. Outreach activity, planting seeds: international, national, state, local, commercial. Learning experience, informing: spatial data infrastructure; commercial vendors (data/software/consulting); repository software communities; metadata practice (both GIS and preservation); rights management developments; data and interoperability standards.

53. Content Identification and Selection. Work from the NC OneMap Data Inventory. Combine with inventory information from various state agencies and from previous NCSU efforts. Develop a methodology for selecting from among “early,” “middle,” and “late” stage products. Develop criteria for time series development. Investigate use of emerging Open Geospatial Consortium technologies in data identification.

54. Content Acquisition. Work from NC OneMap Data Sharing Agreements as a starting point (the “blanket”). Secure individual agreements (the “quilt”). Investigate use of OGC technologies in capture. Explore use of METS as a metadata wrapper. Ingest FGDC metadata; crosswalk to MODS? PREMIS? Maybe METS DRM short term; GeoDRM long term. Consider links to services; version management. Get the geospatial community to tackle the content packaging problem (maybe MPEG 21?).
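
Using METS as a wrapper means placing the FGDC record in a descriptive metadata section and listing the data files in a file section, with a structural map tying them together. A minimal sketch, assuming one FGDC record per package and file locations given as URLs; `wrap_in_mets` and the ID values are hypothetical, and a complete METS document would usually also carry a header, administrative metadata, and checksums.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def m(tag: str) -> str:
    """Qualify a METS tag name with its namespace."""
    return f"{{{METS}}}{tag}"


def wrap_in_mets(fgdc_xml: str, file_urls: list) -> ET.Element:
    """Wrap FGDC metadata and a list of data files in a minimal METS document (sketch)."""
    mets = ET.Element(m("mets"))
    # Descriptive metadata: the FGDC record embedded verbatim.
    dmd = ET.SubElement(mets, m("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="FGDC")
    ET.SubElement(wrap, m("xmlData")).append(ET.fromstring(fgdc_xml))
    # File inventory and a flat structural map pointing at each file.
    grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"), USE="original")
    div = ET.SubElement(ET.SubElement(mets, m("structMap")), m("div"), DMDID="DMD1")
    for i, url in enumerate(file_urls, start=1):
        fid = f"FILE{i}"
        file_el = ET.SubElement(grp, m("file"), ID=fid)
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": url})
        ET.SubElement(div, m("fptr"), FILEID=fid)
    return mets
```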

55. Partnership Building. Work within the context of the NC OneMap initiative: a state, local, and federal partnership; the state expression of the National Map; defined characteristic: “Historic and temporal data will be maintained and available.” Advisory Committee drawn from the NC Geographic Information Coordinating Council subcommittees. Seek external partners: National States Geographic Information Council; FGDC Historical Data Committee; … more.

56. Content Retention and Transfer. Ingest into DSpace. Explore how geospatial content interacts with existing digital repository software environments. Investigate re-ingest into a second platform. Challenge: keep the collection repository-agnostic. Start to define format migration paths. Special problem: geodatabases. Pursue a long-term solution. Roles of data-producing agencies, state agencies, NC OneMap, and NCSU.

57. Project Status. Completing the inventory analysis stage. Storage system and backup deployed. DSpace deployed to production. Metadata workflow finalized. Ingest workflow near finalization. Content migration workflow near finalization. Regional site visits planned for coming months. Wide range of outreach/collaboration: FGDC, ESRI, EDINA (JISC), USGS, OGC, TRB, etc. Pilot project: georegistering digital archival geologic maps.

58. Questions? Contact: Steve Morris, Head, Digital Library Initiatives, NCSU Libraries, ph: (919)