Published by Andrew Ward. Modified over 9 years ago.
1
Preservation of Digital Geospatial Data: Challenges and Opportunities
Steve Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
NARA Meeting, Dec. 14, 2005
2
Note: Percentages based on the actual number of respondents to each question
Outline
  Digital Geospatial Data: Types
  Risks to Digital Geospatial Data
  Overview of NC Geospatial Data Archiving Project
  Preservation Challenges and Possible Solutions
3
Geospatial data types: Vector data
4
Geospatial data types: Satellite imagery
5
Geospatial data types: Aerial imagery
6
Geospatial data types: Aerial imagery
7
Geospatial data types: Aerial imagery
8
Geospatial data types: Tabular data (w/vector)
9
Time series – vector data
Parcel Boundary Changes 2001-2004, North Raleigh, NC
10
Time series – Ortho imagery
Vicinity of Raleigh-Durham International Airport 1993-2002
11
Today’s geospatial data as tomorrow’s cultural heritage
12
Risks to Digital Geospatial Data
.shp .mif .gml .e00 .dwg .dgn .bsb .bil .sid
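The extensions listed on this slide suggest how many formats an archive may need to recognize at ingest. A minimal sketch of a format inventory follows, assuming that file extensions are a good enough first-pass signal. The format labels and the `inventory` helper are illustrative and are not part of the project's tooling.

```python
from collections import Counter
from pathlib import PurePath

# Extensions from the slide, mapped to commonly used format names.
# The labels are illustrative; a production registry would carry more detail.
FORMATS = {
    ".shp": "ESRI Shapefile",
    ".mif": "MapInfo Interchange Format",
    ".gml": "Geography Markup Language",
    ".e00": "ArcInfo interchange (export) file",
    ".dwg": "AutoCAD drawing",
    ".dgn": "MicroStation design file",
    ".bsb": "BSB raster nautical chart",
    ".bil": "Band-interleaved-by-line raster",
    ".sid": "MrSID compressed raster",
}


def inventory(filenames):
    """Count incoming files per format, case-insensitively by extension."""
    counts = Counter()
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        counts[FORMATS.get(ext, "unknown")] += 1
    return counts
```

Files that land in the `"unknown"` bucket are the ones that would need manual review before a migration path can be chosen.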
13
Risks to Digital Geospatial Data
  Producer focus on current data
    Time-versioned content generally not archived
  Future support of data formats in question
    Vast range of data formats in use--complex
  Shift to “streaming data” for access
    Archives have been a by-product of providing access
  Preservation metadata requirements
    Descriptive, administrative, technical, DRM
  Geodatabases
    Complex functionality
14
NC Geospatial Data Archiving Project
  Partnership between university library (NCSU) and state agency (NCCGIA)
  Focus on state and local geospatial content in North Carolina (state demonstration)
  Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventory information
  Objective: engage existing state/federal geospatial data infrastructures in preservation
15
Targeted Content
  Resource Types
    GIS “vector” (point/line/polygon) data
    Digital orthophotography
    Digital maps
    Tabular data (e.g. assessment data)
  Content Producers
    Mostly state, local, regional agencies
    Some university, not-for-profit, commercial
    Selected local federal projects
16
Local Government GIS: Archival Issues
  Data resources are highly distributed and subject to frequent update
  More detailed, current, accurate than federal/state data resources
  North Carolina local agency GIS environment
    100 counties, 95 with GIS
    85 counties with high resolution orthophotography
    Growing number of municipal systems
    Value: $162 million plus investment (est. in 2003)
17
Work plan in a Nutshell
  Work from existing data inventories
  NC OneMap Data Sharing Agreements as the “blanket”, individual agreements as the “quilt”
  Partnership: work with existing geospatial data infrastructures (state and federal)
  Technical approach
    METS with FGDC, PREMIS?, GeoDRM?
    DSpace now; re-ingest to different environment
    Web services consumption for archival development
18
NCGDAP Philosophy of Engagement
  Take the data in the manner in which it can be obtained
  Provide feedback to producer organizations/inform state geospatial infrastructure
  “Wrangle” and archive data
  Note the ‘Project’ in ‘North Carolina Geospatial Data Archiving Project’: the process, the learning experience, and the engagement with geospatial data infrastructures are more important than the archive
19
Big Challenges
  Format migration paths
  Management of data versions over time
  Preservation metadata
  Harnessing geospatial web services
  Preserving cartographic representation
  Keeping content repository-agnostic
  Preserving geodatabases
  More …
20
Vector Data Format Issues
  Vector data much more complicated than image data
  ‘Archiving’ vs. ‘Permanent access’
    An ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology then it does not provide permanent access
    Piles of XML need to be widely understood piles
    GML: need widely accepted application schemas (like OSMM?)
  The Geodatabase conundrum
    Export feature classes, and lose topology, annotation, relationships, etc.
    … or use the Geodatabase as the primary archival platform (some are now thinking this way)
21
GIS Software Used: NC Local Agencies
Source: NC OneMap Data Inventory 2004
22
Vector Data Format Options
  Option A: use an open format and have a really unfortunate transformation and limited vendor support for the output object
  Option B: use closed format but retain the original content and count on short- and medium-term vendor support
  Option C: do both to buy time and look for an open, ASCII-based solution (watch GML activity)
  No sweet spot, just an evolving and changing mix of flawed options that are used in combination.
23
Geography Markup Language Issues
  GML still more useful as a transfer format than an archival format, support limited even for transfer
  “Permanent access” requirements: profiles and application schemas widely understood and supported, avoid requiring “digital archaeology”
    Role of GML Simple Features Profile?
  Assessing formats for preservation: sustainability factors, quality & functionality factors
    Apply same approach to GML profiles and application schemas?
24
Geography Markup Language Issues
  Plans for environmental scan of existing GML profiles and application schemas
    schema name (e.g. OSMM, top10NL, ESRI GML, LandGML)
    responsible agency; schema has official government status?
    GML version; known unsupported GML components
    schema history; known interoperation with other schemas
    vendor support; translator support; stability over time
25
Managing Time-versioned Content
26
Managing Time-versioned Content
  Many local agency data layers continuously updated
    E.g., some county cadastral data updated daily; older versions not generally available
  Individual versioned datasets will wander off from the archive
    How do users “get current metadata/DRM/object” from a versioned dataset found “in the wild”?
    How do we certify concurrency and agreement between the metadata and the data?
27
Managing Time-versioned Content
  Can we manage the relationship loosely using a persistent identifier link to a parent object?
  [Diagram labels: version, Persistent ID Resolver, Parent Object Manager, version]
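One way to picture the loose relationship asked about here is a lookup from a persistent identifier to a parent record that knows about every version. The sketch below is a toy in-memory model under that assumption. `ParentObject`, `Resolver`, and the identifier shown in the usage note are hypothetical names, not components of the project.

```python
from dataclasses import dataclass, field


@dataclass
class ParentObject:
    """Hypothetical parent record shared by all versions of one dataset."""
    title: str
    metadata_url: str
    versions: list = field(default_factory=list)  # ISO 8601 dates, e.g. "2004-06-30"


class Resolver:
    """Toy persistent-ID resolver: maps an identifier to its parent object."""

    def __init__(self):
        self._parents = {}

    def register(self, persistent_id, parent):
        self._parents[persistent_id] = parent

    def resolve(self, persistent_id):
        # Raises KeyError for identifiers the resolver has never seen.
        return self._parents[persistent_id]

    def latest_version(self, persistent_id):
        # ISO 8601 date strings sort chronologically as plain strings.
        return max(self.resolve(persistent_id).versions)
```

A version found "in the wild" would only need to carry the identifier, for example `resolver.resolve("example:wake/parcels")`, to lead a user back to current metadata and to the list of sibling versions.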
28
Preservation Metadata Issues
  FGDC Metadata
    Many flavors, incoming metadata needs processing
    Cross-walk elements to PREMIS, MODS?
  Metadata wrapper/Content packaging
    METS (Metadata Encoding and Transmission Standard) vs. other industry solutions
    Need a geospatial industry solution for the ‘METS-like problem’
    GeoDRM a likely trigger: wrapper to enforce licensing (MPEG 21 references in OGIS Web Services 3)
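To make the wrapper idea concrete, the following sketch embeds an FGDC record in a METS descriptive metadata section and points to one data file. It is an illustration under simplifying assumptions, not the project's packaging profile: the `wrap` function, the IDs, and the file reference are hypothetical, and the output is not validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def wrap(fgdc_xml, data_href):
    """Build a minimal METS element tree around one FGDC record and one file."""
    mets = ET.Element(f"{{{METS}}}mets")

    # Descriptive metadata: the FGDC record is embedded as-is.
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="dmd1")
    md_wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="FGDC")
    xml_data = ET.SubElement(md_wrap, f"{{{METS}}}xmlData")
    xml_data.append(ET.fromstring(fgdc_xml))

    # File section: a pointer to the data object (hypothetical location).
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp")
    f = ET.SubElement(grp, f"{{{METS}}}file", ID="file1")
    ET.SubElement(f, f"{{{METS}}}FLocat",
                  {"LOCTYPE": "URL", f"{{{XLINK}}}href": data_href})

    # Structural map: ties the file to its descriptive metadata.
    struct = ET.SubElement(mets, f"{{{METS}}}structMap")
    div = ET.SubElement(struct, f"{{{METS}}}div", DMDID="dmd1")
    ET.SubElement(div, f"{{{METS}}}fptr", FILEID="file1")
    return mets
```

Calling `ET.tostring(wrap(record, "parcels_2004.zip"), encoding="unicode")` would serialize the package. Cross-walked PREMIS or MODS sections could be added as further metadata sections in the same way.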
29
Metadata Availability
30
Harnessing Geospatial Web Services
36
Geospatial Web Service Types
  Image services
    Deliver image resulting from query against underlying data
    Limited opportunity for analysis
  Feature services
    Stream actual feature data, greater opportunity for data analysis
  Other
    Geocoding services
    Routing, etc.
38
Geospatial Web Services Rights Issues
  Example: Desktop GIS-accessible ArcIMS
  39 of 100 NC counties have desktop GIS-accessible ArcIMS services
  It is difficult to know how many of these counties actually expect users to either:
    A) access data through desktop GIS for viewing only, or
    B) extract and download data
39
Harnessing Geospatial Web Services
  Automated content identification
    ‘capabilities files,’ registries, catalog services
  WMS (Web Map Service) for batch extraction of image atlases
    last ditch capture option
    preserve cartographic representation
    retain records of decision-making process
    … feature services (WFS) later.
  Rights issues in the web services space are ambiguous
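Batch extraction of an image atlas from a WMS server amounts to issuing one GetMap request per tile of a grid. The sketch below only builds the request URLs, using standard WMS 1.1.1 GetMap parameters. The endpoint, layer name, grid size, and the `getmap_urls` helper are assumptions for illustration, and a real harvest would first read the server's capabilities document and confirm that extraction is permitted.

```python
from urllib.parse import urlencode


def getmap_urls(base_url, layer, bbox, rows, cols, size=1024, srs="EPSG:4326"):
    """Return one WMS 1.1.1 GetMap URL per tile of a rows x cols grid.

    bbox is (minx, miny, maxx, maxy). Tiles are requested as size x size
    pixel images, so non-square tiles will be stretched by the server.
    """
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    urls = []
    for r in range(rows):
        for c in range(cols):
            tile = (minx + c * dx, miny + r * dy,
                    minx + (c + 1) * dx, miny + (r + 1) * dy)
            params = {
                "SERVICE": "WMS",
                "VERSION": "1.1.1",
                "REQUEST": "GetMap",
                "LAYERS": layer,
                "STYLES": "",  # empty value asks for the server's default style
                "SRS": srs,
                "BBOX": ",".join(f"{v:.6f}" for v in tile),
                "WIDTH": size,
                "HEIGHT": size,
                "FORMAT": "image/png",
            }
            urls.append(f"{base_url}?{urlencode(params)}")
    return urls
```

Because the server renders each tile with its own symbology, the saved images capture the cartographic representation as well as the data, which is the point made on the slide.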
40
“Web mash-ups” and the New Mainstream Geospatial Web Services
41
Preserving Cartographic Representation
42
Preserving Cartographic Representation
  The true counterpart of the old map is not the GIS dataset, but rather the cartographic representation that builds on that data:
    Intellectual choices about symbolization, layer combinations
    Data models, analysis, annotations
  Cartographic representation typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration
  Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a different problem
43
Preserving Cartographic Representation
  Image-based approaches
    Generate images using Map Book or similar tools
    Harvest existing atlas images
    Capture atlases from WMS servers
    Export ‘layouts’ or ‘maps’ to image
  Vector-based approaches
    Store explicitly in the data format (e.g. Feature Class Representation in ArcGIS 9.2)
    Archive and upward-migrate existing files (.avl, .apr, .lyr, .mxd, etc.)
    SVG, VML or other XML approaches
  Other?
44
Preserving Cartographic Representation
45
Preserving Cartographic Representation
46
Repository Architecture Issues
  Interest in how geospatial content interacts with widely available digital repository software
    Focus on salient, domain-specific issues
  Challenge: remain repository agnostic
    Avoid “imprinting” on repository software environment
    Preservation package should not be the same as the ingest object of the first environment
    Tension between exploiting repository software features vs. becoming software dependent
47
Preserving Geodatabases
  Spatial databases in general vs. ESRI Geodatabase “format”
    Not just data layers and attributes, but also topology, annotation, relationships, behaviors
  ESRI Geodatabase archival issues
    XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication
  Some looking to Geodatabase as archival platform (in addition to feature class export)
48
Geodatabase Availability
Local agencies, especially municipalities, are increasingly turning to the ESRI Geodatabase format to manage geospatial data. According to the 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in that format.
49
Evolving Geodatabase Handling Approaches
  Project Stage: Planned Approach
  Original Proposal (Nov. 2003): Export feature classes as shapefiles; archive Geodatabases less than 2 GB in size
  Finalized Work Plan (Dec. 2004): Also export content as Geodatabase XML
  Possible Future Work Plan Changes: Explore maintenance of some archival content in Geodatabase form; explore Geodatabase replication as an archive development approach; archive Geodatabases of unlimited size
50
Efficient Content Replication
  Content replication also needed for:
    Disaster preparedness
    State and federal data improvement projects
    Aggregation by regional geospatial web service providers
  WFS, e.g.: efficiency in complete content transfer?
  Rsync-like function, plus: rights management, inventory processes, metadata management, informed by data update cycles
  Archiving delta files vs. complete replication – need to avoid requiring “digital archaeology” in the future
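The "rsync-like function" can be approximated, in its simplest form, by comparing checksum manifests of two snapshots and transferring only what differs. The sketch below works at whole-file granularity, which is coarser than rsync's block-level comparison. The `manifest` and `delta` helpers are hypothetical, and rights, inventory, and metadata handling are left out.

```python
import hashlib
from pathlib import Path


def manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def delta(old, new):
    """Compare two manifests and list added, removed, and changed paths."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Storing each manifest alongside a complete copy, rather than archiving only deltas, is one way to avoid the future "digital archaeology" the slide warns about, since every snapshot remains usable without replaying earlier changes.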
51
Points of Engagement with the Open Geospatial Consortium (OGC)
  GML for archiving
  GeoDRM -- Adding preservation use cases
  Content Packaging -- Industry solution?
  Web Services Context Documents
    Can we save data state as well as application state?
  Content Replication
    Is this layer in the architecture?
  Persistent Identifiers
52
Project Outcomes
  Demonstration archive
  Outreach activity – planting seeds
    International, national, state, local, commercial
  Learning experience, informing:
    Spatial data infrastructure
    Commercial vendors (data/software/consulting)
    Repository software communities
    Metadata practice (both GIS & preservation)
    Rights management developments
    Data and interoperability standards
53
Content Identification and Selection
  Work from NC OneMap Data Inventory
  Combine with inventory information from various state agencies and from previous NCSU efforts
  Develop methodology for selecting from among “early,” “middle,” and “late” stage products
  Develop criteria for time series development
  Investigate use of emerging Open Geospatial Consortium technologies in data identification
54
Content Acquisition
  Work from NC OneMap Data Sharing Agreements as a starting point (the “blanket”)
  Secure individual agreements (the “quilt”)
  Investigate use of OGC technologies in capture
  Explore use of METS as a metadata wrapper
    Ingest FGDC metadata; Xwalk to MODS? PREMIS?
    Maybe METS DRM short term; GeoDRM long term
    Consider links to services; version management
  Get the geospatial community to tackle the content packaging problem (maybe MPEG 21?)
55
Partnership Building
  Work within context of the NC OneMap initiative
    State, local, federal partnership
    State expression of the National Map
    Defined characteristic: “Historic and temporal data will be maintained and available”
  Advisory Committee drawn from the NC Geographic Information Coordinating Council subcommittees
  Seek external partners
    National States Geographic Information Council
    FGDC Historical Data Committee
    … more
56
Content Retention and Transfer
  Ingest into DSpace
    Explore how geospatial content interacts with existing digital repository software environments
    Investigate re-ingest into a second platform
    Challenge: keep the collection repository-agnostic
  Start to define format migration paths
    Special problem: geodatabases
  Pursue long term solution
    Roles of data producing agencies, state agencies; NC OneMap; NCSU
57
Project Status
  Completing inventory analysis stage
  Storage system and backup deployed
  DSpace deployed to production
  Metadata workflow finalized
  Ingest workflow near finalization
  Content migration workflow near finalization
  Regional site visits planned for coming months
  Wide range of outreach/collaboration: FGDC, ESRI, EDINA (JISC), USGS, OGC, TRB, etc.
  Pilot project, georegistering digital archival geologic maps
58
Questions?
Contact: Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
ph: (919) 515-1361
Steven_Morris@ncsu.edu