
Long-term preservation of digital geospatial data: challenges for ensuring access and encouraging reuse Anne Robertson, EDINA & Steve Morris, NCSU Libraries.


1 Long-term preservation of digital geospatial data: challenges for ensuring access and encouraging reuse Anne Robertson, EDINA & Steve Morris, NCSU Libraries EDINA National Data Centre University of Edinburgh North Carolina State University Libraries NCGDAP Architecture Working Group OGC TC/PC Meeting Bonn, 9th November 2005

2 Objectives Why we’re here: Introduce preservation and access use cases to OGC Find points of intersection with OGC initiatives Flesh out research agenda for preservation of geospatial digital data “Permanent access and reuse” not just preservation

3 North Carolina Preservation Partners North Carolina State University Libraries –University-wide GIS services since 1992 –New focus on publishing WMS services for use by external clients or service aggregators –Archiving local agency geospatial data since 2000 NC Center for Geographic Information & Analysis –State government GIS agency –Maintains state’s Corporate Geographic Database –Coordinates many SDI initiatives, including NC OneMap NC OneMap –Seamless access to local, state, and federal data; component part of National Map –WMS services available individually from sources or through aggregator viewer –Focus on standards, best practices, data sharing agreements, inventories, and metadata outreach

4 NC Geospatial Data Archiving Project Cooperative project with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP) –One of 8 NDIIPP partnership projects, others focusing on web pages, numeric data, video, business records, etc. –Focus on developing a network of partners, identifying preservation issues in various domain areas NCGDAP: 3 year project focused on preservation of state and local agency digital geospatial data –Identify and acquire data –Develop digital repository; ingest and manage content Objective: engage existing spatial data infrastructures in process of data preservation

5 NCGDAP Project Phases Content Identification and Selection –Work from existing inventory processes –Select from among “early”, “middle”, and “late” stage information products Content Acquisition –Acquire state and local agency content –Investigate methods of automating archive development Partnership Building –Work within NC OneMap framework (infrastructure) –Several other emerging geo-preservation projects Content Retention and Transfer –Metadata and ingest workflow –Emphasis on repository-agnostic approach, avoid “imprinting” one environment –Initially using DSpace open source software, re-ingest into a different environment later

6 Common Themes – Cartographic Representation The counterpart to the map is not just the dataset but also models, symbology, interpretation. These key elements give real meaning – how are these captured for reuse?

7 Common Themes – GML for archiving? Interest in alternative to proprietary vector file formats “Permanent access” requirements: –profiles and application schemas widely understood and supported, avoid requiring “digital archaeology” –Role of GML Simple Features Specification? Assessing formats for preservation: sustainability factors, quality & functionality factors Planned environmental scan of existing GML profiles and application schemas –Collaboration with National Archives and Records Administration and FGDC Historical Data Working Group –Vendor support? Official status? Stability over time? How to handle proprietary formats? –UC Santa Barbara/Stanford NDIIPP project working on format registry –Spatial databases pose special challenges

8 Common Themes – Content replication Need efficient means to replicate content to archive –North Carolina: 100 counties and 140 municipalities Content replication also needed for: –Disaster preparedness –State and federal data improvement projects –Aggregation by regional geospatial web service providers WFS, e.g.: efficiency in complete content transfer? Rsync-like function, plus: rights management, inventory processes, metadata management, informed by data update cycles Archiving delta files vs. complete replication – need to avoid requiring “digital archaeology” in the future Other models: LOCKSS (Lots of Copies Keeps Stuff Safe)
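
The "rsync-like function" on this slide can be illustrated with a checksum manifest comparison. This is a minimal sketch, not the NCGDAP implementation: the function names `build_manifest` and `plan_transfer` are hypothetical, and rights management, inventory and metadata handling are left out.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def plan_transfer(source: dict[str, str], archive: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and list what a replication pass would have to do."""
    return {
        # present at the source agency, not yet in the archive
        "new": sorted(p for p in source if p not in archive),
        # present in both, but the content differs
        "changed": sorted(p for p in source if p in archive and source[p] != archive[p]),
        # held by the archive, no longer offered by the source
        "withdrawn": sorted(p for p in archive if p not in source),
    }
```

An archive that wants complete replication rather than delta files would copy the whole of each "changed" file as a new snapshot and keep "withdrawn" files, so that no later reconstruction from deltas is needed.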

9 Common Themes – Time versioning How to manage datasets that change over time? –Versions will live in different repositories, must handle relationships outside of the individual repository Industry focus on most current data … but increased demand for temporal data –e.g., land use change detection, business trends analysis –Much older data lost -- “Digital dark age” Draft NCGDAP approach: manage information for “serial objects” separately, link to serial entity via persistent identifier (Handle) –Support “get current data/metadata/DRM” operations –Avoid managing volatile information (e.g., service connections) in individual static metadata records –Other technologies: OpenURL for service connections?
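
The draft "serial object" approach can be pictured as a small record kept outside any single repository. The sketch below is an assumption about how such a record might look, not the project's data model: the class names, fields and the identifier value in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Version:
    """One static snapshot of a dataset, held in some repository."""
    issued: date
    repository: str  # hypothetical repository name
    location: str    # hypothetical location inside that repository


@dataclass
class SerialObject:
    """A dataset that changes over time, keyed by a persistent identifier."""
    identifier: str                      # e.g. a Handle
    versions: list[Version] = field(default_factory=list)
    service_url: Optional[str] = None    # volatile, so kept here and not in static metadata

    def add_version(self, version: Version) -> None:
        """Register a snapshot and keep the list ordered by issue date."""
        self.versions.append(version)
        self.versions.sort(key=lambda v: v.issued)

    def current(self) -> Version:
        """Support a 'get current data' operation."""
        if not self.versions:
            raise LookupError(f"no versions registered for {self.identifier}")
        return self.versions[-1]

    def as_of(self, when: date) -> Version:
        """Return the latest version issued on or before the given date."""
        candidates = [v for v in self.versions if v.issued <= when]
        if not candidates:
            raise LookupError(f"no version of {self.identifier} as of {when}")
        return candidates[-1]
```

For example, a record created as `SerialObject("hdl:0000/example")` (a made-up identifier) could hold snapshots from two different repositories, and `as_of(date(2003, 6, 1))` would answer a temporal query without either repository knowing about the other.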

10 EDINA A National Data Centre for Tertiary Education since 1995 –based at the University of Edinburgh Data Library Our mission... to enhance the productivity of research, learning and teaching in UK higher and further education GeoServices team - provide SDI components to UK academic sector Substantial experience in handling and delivering key geospatial data and geo-referenced information OGC members since 1999 Strategic move toward interoperability & shared services role – use of OGC interface specifications in our projects and services

11 GRADE project introduction According to the OECD Follow up Group on Issues of Access to Publicly Funded Research Data¹: “More widespread and efficient access to and sharing of research data will have substantial benefits for most areas of scientific research.” Evidence of re-use of data within UK data centres is low: –“Level of re-use of data held in the AHDS and ESRC archives has been disappointingly low” (Alison Allden, 2003) –“NERC spends about £5 million per annum on data management, but unclear what benefit it derives from this. More research is needed to establish benefits and value of data re-use” (Mark Thorley, 2003) –Qualidata survey of qualitative data re-use (2000): 44% of respondents used a colleague's data rather than acquiring archived data via a dissemination service (33%) ¹ Interim Report, 20 October 2002

12 GRADE project introduction Within UK academia there is a focus on the potential use of digital repositories to assist with a variety of facets of digital asset management including encouraging reuse of research data GRADE will investigate and report on the technical and cultural issues around the reuse of geospatial data within the context of discipline-based repositories Particular focus on sharing and reuse of derived geospatial data EDINA leading GRADE with consortium partners: –AHRC Research Centre for Studies in Intellectual Property and Technology Law, School of Law, Edinburgh University –National Oceanography Centre, Southampton University –Variety of other associate partners including NCGDAP, British Atmospheric Data Centre, Ordnance Survey

13 Common Themes – Digital Rights UK environment, a complex one –dominant provider of base vector geospatial data –array of spaceborne survey data available, much free for non-commercial use –Stakeholder interest from research funders (research councils) and research hosts (institutions) When we consider the reuse of derived geospatial data, concerns over data ownership, IPR and copyright often suppress any initial enthusiasm We can offer the geoDRM discussion real scenarios of –IPR issues for derived geospatial data and –Geospatial data reuse/sharing use cases

14 Derived Data Example [Workflow diagram with columns Input, Processing and Output. Labels: OS Landline; ground survey; historic OS maps; 2001 orthophotos; scan; georeference; accuracy assessment; planimetric correction; GPS survey; digitise coastline positions; calculation of cliff retreat; ESRI Shapefile and tables of retreat.] Source: Use case provision of derived geospatial data as part of the GRADE project in scoping digital repositories (draft report)

15 Common Themes – Content Packaging Consider a geospatial data asset deposited into a repository: it’s more than one file –GML and associated schema! –proprietary vector format plus cartographic representation detail –geodatabase –raster with header file –Data set metadata and IPR info What is the best method to package data? In the eLibrary world, the Metadata Encoding and Transmission Standard (METS), IMS Content Packaging (IMS CP) and MPEG-21 DIDL are used for repository objects “Interoperable repositories need to encode, exchange and describe complex objects in agreed ways” What direction is the GI industry taking with content packaging?
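
To make the "more than one file" point concrete, the sketch below bundles the components of one asset into a single zip with a manifest that records each file's role and checksum. It is an illustration only: the function name `package_asset`, the role labels and the manifest layout are assumptions, and the result does not follow METS, IMS CP or MPEG-21 DIDL.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def package_asset(files: dict[str, Path], package_path: Path) -> dict:
    """Bundle component files into one zip plus a manifest.json describing them.

    `files` maps a role label (e.g. "data", "schema", "metadata", "rights")
    to the file playing that role. File names are assumed to be unique.
    """
    manifest = {"components": []}
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for role, path in files.items():
            payload = path.read_bytes()
            manifest["components"].append({
                "role": role,
                "name": path.name,
                "sha256": hashlib.sha256(payload).hexdigest(),
                "bytes": len(payload),
            })
            bundle.writestr(path.name, payload)
        # The manifest travels inside the package so the roles survive transfer.
        bundle.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

A standards-based package would carry the same kind of information (component roles, fixity values, rights) in an agreed schema, which is what lets a different repository unpack it later.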

16 Common Themes – Persistent Identifiers Once a geospatial data asset is deposited within a repository, there is a need to be able to persistently identify this asset Particular repository software packages use particular schemes, e.g. Fedora uses the ‘info’ URI scheme Requirement to ensure identifier is actionable We are thinking about OpenURL Resolvers and perhaps Digital Object Identifier (DOI) for handle schemes What direction is the GI industry taking with persistent identifiers?

17 Common Themes – ‘data plus services’ model National Library of New Zealand http://wiki.tertiary.govt.nz/static/wikifarm/InstitutionalRepositories.uploads/Main/IR_report.pdf

18 Conclusions Aim is to flesh out research agenda Presented 7 common themes from our work Shift to web services consumption poses threat to secondary archive development … but can geospatial web services be put to use in preservation processes? Encourage GI community to connect with these issues or outcome may be that archive community will fail to take account of OGC work Where to from here?

19 Contact details Anne Robertson GRADE Project Manager EDINA National Data Centre a.m.robertson@ed.ac.uk GRADE web site: http://edina.ac.uk/projects/grade Steve Morris Head of Digital Library Initiatives North Carolina State University Libraries Steven_morris@ncsu.edu NCGDAP web site: http://www.lib.ncsu.edu/ncgdap/ Questions?

