Cooperative Project with Library of Congress on Preservation of Digital Geospatial Data
Steve Morris, Head of Digital Library Initiatives, NCSU Libraries

1 Cooperative Project with Library of Congress on Preservation of Digital Geospatial Data
Steve Morris, Head of Digital Library Initiatives, NCSU Libraries

2 NC Geospatial Data Archiving Project (NCGDAP)
Note: Percentages based on the actual number of respondents to each question
- Partnership between NCSU Libraries and NC Center for Geographic Information & Analysis
- $520,000 funding – 3 years
- Focus on state and local geospatial content in North Carolina (state demonstration)
- Address NC OneMap objective: “Historic and temporal data will be maintained and available.”
- One of eight projects in the first NDIIPP funding round: “Building a Network of Partners”

4 NDIIPP Overview
- National Digital Information Infrastructure and Preservation Program
- Congress appropriated $100 million for this effort and instructed the Library to spend an initial $25 million to develop and execute a congressionally approved strategic plan
- Eight initial projects, 2004-2007: web pages, cultural heritage, numeric data, video, business records, mixed content, geospatial (2)
- Developing partnerships and identifying issues
- Extensive interaction among NDIIPP projects

5 Targeted Content
- Resource Types
  - GIS “vector” (point/line/polygon) data
  - Digital orthophotography
  - Digital maps
  - Tabular data (e.g. assessment data)
- Content Producers
  - Mostly state, local, regional agencies
  - Some university, not-for-profit, commercial
  - Selected local federal projects

6 Risks to Digital Geospatial Data
.shp .mif .gml .e00 .dwg .dgn .bsb .bil .sid

7 Risks to Digital Geospatial Data
- Focus on current data
  - Archiving data does not guarantee “permanent access”
- Future support of data formats in question
  - Need to migrate formats or allow for emulation
- Data failure
  - “Bit rot”, media failure
- Preservation metadata requirements
  - Descriptive, administrative, technical, DRM
- Shift to “streaming data” for access
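The “bit rot” and media failure risk on this slide is commonly handled with periodic fixity checks: record a checksum for every archived file, then recompute and compare later. The following is a minimal sketch of that idea, not the project's actual tooling. The choice of SHA-256, the in-memory manifest, and the function names `sha256_of`, `build_manifest`, and `audit` are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its current digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Listed in the manifest but no longer on disk.
        "missing": sorted(set(manifest) - set(current)),
        # On disk but never recorded in the manifest.
        "unexpected": sorted(set(current) - set(manifest)),
        # Present in both, but the bytes no longer match.
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

A fixity check of this kind only detects damage. Recovering from it still depends on keeping a second copy, and it does nothing for the format-obsolescence risk listed on the same slide.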

8 Time series – vector data
Parcel Boundary Changes 2001-2004, North Raleigh, NC

9 Time series – Ortho imagery
Vicinity of Raleigh-Durham International Airport 1993-2002

10 Today’s geospatial data as tomorrow’s cultural heritage

11 Earlier NCSU Acquisition Efforts
- NCSU University Extension project 2000-2001
  - Target: County/city data in eastern NC
  - “Digital rescue” not “digital preservation”
- Project learning outcomes
  - Confirmed concerns about long term access
  - Need for efficient inventory/acquisition
  - Wide range in rights/licensing
  - Need to work within statewide infrastructure
  - Acquired experience; unanticipated collaboration

12 One Earlier Project Outcome: Directory of County and City Services
- Among top 15 most used resources on library web site
- 99.5% of directory users from outside

13 NDIIPP Project Phases
- Content Identification and Selection
- Content Acquisition
- Partnership Building
- Content Retention and Transfer
All 8 NDIIPP cooperative projects adhere to this structure

14 Content Identification and Selection
- Work from NC OneMap Data Inventory
- Combine with inventory information from various state agencies and from previous NCSU efforts
- Develop methodology for selecting from among “early,” “middle,” and “late” stage products
- Develop criteria for time series development
- Investigate use of emerging Open Geospatial Consortium technologies in data identification

15 Content Acquisition
- Work from NC OneMap Data Sharing Agreements as a starting point (the “blanket”)
- Secure individual agreements (the “quilt”)
- Investigate use of OGC technologies in capture
- Use METS (Metadata Encoding and Transmission Standard) as a metadata wrapper
  - Bundle data files, metadata, ancillary documentation
  - Supplement FGDC metadata with additional administrative, technical, and descriptive metadata
  - Encode rights (Digital Rights Management – DRM)
  - Links to services
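To make the “metadata wrapper” idea concrete, here is a rough sketch of a METS document that points at an FGDC metadata record and lists the bundled data files. It is an illustration under stated assumptions, not the project's actual METS profile: the function name `build_mets`, the object identifier, the file paths, and the MIME types are hypothetical, and the output is not claimed to validate against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


XLINK_HREF = f"{{{XLINK_NS}}}href"


def build_mets(object_id, metadata_href, data_files):
    """Wrap one dataset: a reference to its FGDC record plus its data files.

    data_files is a list of (href, mimetype) pairs; all values are
    illustrative placeholders supplied by the caller.
    """
    root = ET.Element(mets("mets"), {"OBJID": object_id})

    # Descriptive metadata section: reference the external FGDC record.
    dmd = ET.SubElement(root, mets("dmdSec"), {"ID": "dmd1"})
    ET.SubElement(
        dmd,
        mets("mdRef"),
        {"LOCTYPE": "URL", "MDTYPE": "FGDC", XLINK_HREF: metadata_href},
    )

    # File section: one entry per bundled file.
    file_sec = ET.SubElement(root, mets("fileSec"))
    group = ET.SubElement(file_sec, mets("fileGrp"), {"USE": "data"})

    # Structural map: tie the files to the descriptive metadata.
    struct = ET.SubElement(root, mets("structMap"))
    div = ET.SubElement(struct, mets("div"), {"DMDID": "dmd1"})

    for index, (href, mimetype) in enumerate(data_files, start=1):
        file_id = f"file{index}"
        entry = ET.SubElement(
            group, mets("file"), {"ID": file_id, "MIMETYPE": mimetype}
        )
        ET.SubElement(
            entry, mets("FLocat"), {"LOCTYPE": "URL", XLINK_HREF: href}
        )
        ET.SubElement(div, mets("fptr"), {"FILEID": file_id})

    return root
```

Calling `ET.tostring(build_mets(...), encoding="unicode")` serializes the wrapper. Administrative, technical, and rights metadata would go in an `amdSec`, which this sketch leaves out.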

16 Partnership Building
- Work within context of the NC OneMap initiative
  - Explore state, local, federal partnerships
  - Defined characteristic: “Historic and temporal data will be maintained and available”
  - Advisory Committee drawn from the NC Geographic Information Coordinating Council subcommittees
- Seek external partners
  - National States Geographic Information Council
  - FGDC Historical Data Committee
  - … more

17 Content Retention and Transfer
- Ingest into DSpace open source digital repository software
  - Look more generically at the issue of putting geospatial content into digital repositories
  - Investigate re-ingest into a second platform
- Start to define format migration paths
  - Special problem: geodatabases
- Pursue long term solution
  - Roles of data producing agencies, state agencies; NC OneMap; NCSU

18 Big Geoarchiving Challenges
- Format migration paths
- Management of data versions over time
- Preservation metadata
- Preserving cartographic representation
- Keeping content repository-agnostic
- Preserving geodatabases
- Harnessing geospatial web services
- More …

19 Vector Data Format Issues
- Vector data much more complicated than image data
- ‘Preservation’ vs. ‘Permanent access’
  - An ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology then it does not provide permanent access
- Piles of XML need to be widely understood piles
  - GML: need widely accepted application schemas (like OSMM?)
- The Geodatabase conundrum
  - Export feature classes, and lose topology, annotation, relationships, etc.
  - … or use the Geodatabase as the primary archival platform (some are now thinking this way)

20 Geography Markup Language Issues
- GML still more useful as a transfer format than an archival format; support limited even for transfer
- FGDC Historical Data Working Group investigations into GML for use in archiving
- Plans for environmental scan of existing GML profiles and application schemas
  - schema or profile name (e.g. OSMM, top10NL, ESRI GML, LandGML)
  - responsible agency; schema has official government status?
  - GML version; known unsupported GML components
  - schema history; known interoperation with other schemas
  - vendor support; translator support
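The attributes listed for the planned environmental scan map naturally onto a simple record type. The sketch below is one possible way to hold a scan entry, offered as an assumption rather than the working group's actual data model. The class name `GmlSchemaScanRecord`, the field names, and the `open_questions` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GmlSchemaScanRecord:
    """One row of a GML profile / application schema scan (illustrative)."""

    schema_name: str
    responsible_agency: str
    # None means "not yet determined", as opposed to a known yes or no.
    official_government_status: Optional[bool] = None
    gml_version: Optional[str] = None
    unsupported_gml_components: list[str] = field(default_factory=list)
    schema_history: str = ""
    interoperates_with: list[str] = field(default_factory=list)
    vendor_support: list[str] = field(default_factory=list)
    translator_support: list[str] = field(default_factory=list)

    def open_questions(self) -> list[str]:
        """Return the names of fields the scan has not filled in yet."""
        return [
            name
            for name, value in vars(self).items()
            if value is None or value == "" or value == []
        ]
```

Note that an empty list is treated as "unknown" here, so a schema with no known unsupported components would need an explicit marker. That trade-off is a simplification for the sketch.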

21 Managing Time-versioned Content
- Many local agency data layers continuously updated
  - Older versions not generally available
- Individual versioned datasets will wander off from the archive
  - How do users “get current metadata/DRM/object” from a versioned dataset found “in the wild”?
- How do we certify concurrency and agreement between the metadata and the data?
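One way to approach the “in the wild” question is to key archive records on a digest of the data itself, so that anyone holding a copy can look up which version it is and where the current metadata lives. The following is a minimal sketch of that idea under stated assumptions. The `VersionRegistry` class, its methods, the series names, and the metadata URIs are hypothetical, and the slides do not say the project adopted this approach.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionRecord:
    series: str          # e.g. one county's parcel layer
    version_date: str    # ISO 8601 date, so string order is date order
    data_sha256: str     # digest of the exact bytes that were archived
    metadata_uri: str    # where the matching metadata record lives


class VersionRegistry:
    """In-memory lookup from data digest to version record (illustrative)."""

    def __init__(self) -> None:
        self._by_digest: dict[str, VersionRecord] = {}

    def register(self, series: str, version_date: str,
                 data: bytes, metadata_uri: str) -> VersionRecord:
        digest = hashlib.sha256(data).hexdigest()
        record = VersionRecord(series, version_date, digest, metadata_uri)
        self._by_digest[digest] = record
        return record

    def identify(self, data: bytes) -> Optional[VersionRecord]:
        """Find the record for a copy found outside the archive, if any."""
        return self._by_digest.get(hashlib.sha256(data).hexdigest())

    def latest(self, series: str) -> Optional[VersionRecord]:
        """Return the most recent registered version in a series."""
        candidates = [r for r in self._by_digest.values() if r.series == series]
        return max(candidates, key=lambda r: r.version_date, default=None)
```

Because the digest covers exact bytes, a re-zipped or reprojected copy would not match. A real solution would need some normalization step or an embedded identifier, which is part of why the slide poses this as an open question.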

22 Preservation Metadata Issues
- FGDC Metadata
  - Many flavors; incoming metadata needs processing
- Other standards: PREMIS, MODS
- Metadata wrapper
  - METS (Metadata Encoding and Transmission Standard) vs. other industry solutions
  - Need a geospatial industry solution for the ‘METS-like problem’
  - GeoDRM a likely trigger: wrapper to enforce licensing (MPEG-21 references in OGC Web Services 3)

23 Preserving Cartographic Representation
- The true counterpart of the old map is not the GIS dataset, but rather the cartographic representation that builds on that data:
  - Intellectual choices about symbolization, layer combinations
  - Data models, analysis, annotations
- Cartographic representation typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration
- Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a different problem

24 Preserving Cartographic Representation

25 Preserving Cartographic Representation
- Image-based approaches (“desiccated data”)
  - Generate images using Map Book or similar tools
  - Harvest existing atlas images
  - Capture atlases from WMS servers
  - Export ‘layouts’ or ‘maps’ to image
- Vector-based approaches
  - Store explicitly in the data format (e.g. Feature Class Representation in ArcGIS 9.2)
  - Archive and upward-migrate existing files (.avl, .apr, .lyr, .mxd, etc.)
  - SVG, VML or other XML approaches
- Other?

26 Preserving Cartographic Representation

27 Preserving Cartographic Representation

28 Preserving Geodatabases
- Not just data layers and attributes, but also topology, annotation, relationships, behaviors
- ESRI Geodatabase archival issues
  - XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication
- Growing use of geodatabases by municipal, county agencies
- Some looking to Geodatabase as archival platform (in addition to feature class export)

29 Geodatabase Availability
According to the 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in geodatabase format.
Note: Percentages based on the actual number of respondents to each question

30 Evolving Geodatabase Handling Approaches
Project Stage | Planned Approach
Original Proposal (Nov. 2003) | Export feature classes as shapefiles; archive Geodatabases less than 2 GB in size
Finalized Work Plan (Dec. 2004) | Also export content as Geodatabase XML
Possible Future Work Plan Changes | Explore maintenance of some archival content in Geodatabase form; explore Geodatabase replication as an archive development approach; archive Geodatabases of unlimited size

31 Harnessing Geospatial Web Services
- Automated content identification
  - ‘capabilities files,’ registries, catalog services
- WMS (Web Map Service) for batch extraction of image atlases
  - last ditch capture option
  - preserve cartographic representation
  - retain records of decision-making process
- … feature services (WFS) later.
- Rights issues in the web services space are ambiguous
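Batch extraction of an image atlas from a WMS server amounts to dividing an area of interest into tiles and issuing one GetMap request per tile. The sketch below only constructs the request URLs, using the standard WMS 1.1.1 GetMap parameters; it does not contact a server. The helper names `tile_bboxes` and `getmap_url`, the endpoint, the layer names, and the default image size are assumptions for illustration.

```python
from urllib.parse import urlencode


def tile_bboxes(minx, miny, maxx, maxy, cols, rows):
    """Yield (minx, miny, maxx, maxy) tiles, row by row from the bottom-left."""
    tile_w = (maxx - minx) / cols
    tile_h = (maxy - miny) / rows
    for row in range(rows):
        for col in range(cols):
            yield (
                minx + col * tile_w,
                miny + row * tile_h,
                minx + (col + 1) * tile_w,
                miny + (row + 1) * tile_h,
            )


def getmap_url(base_url, layers, bbox, srs="EPSG:4326",
               size=(1024, 1024), image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL for one tile.

    Assumes base_url has no query string of its own.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default styles
        "SRS": srs,
        "BBOX": ",".join(f"{value:.6f}" for value in bbox),
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"
```

Saving each request URL alongside the image it returned is one simple way to “retain records of decision-making process” for the capture. Whether a given service's terms permit bulk harvesting is the rights question the slide flags as ambiguous.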

32 Partnerships
- ESRI
  - Discussing software requirements: meetings with development teams April 2005
- Open Geospatial Consortium (OGC)
  - Meet with Architecture Working Group Nov. 2005
- National Archives and Records Administration
  - Investigations into GML for archiving; planned presentation to NARA technology team
- FGDC Historical Data Working Group
  - General geospatial data preservation issues

33 Partnerships
- EDINA (University of Edinburgh, UK)
  - NCSU is Associate Partner on UK project for geospatial institutional repositories
- UC Santa Barbara & Stanford University
  - Other NDIIPP geospatial project
- EROS Data Center
  - Planned site visit
- Project visits to regional GIS groups
  - Albemarle Regional GIS meeting Nov. 3
  - More planned …

34 Progress to Date
- Completion of project agreements
- Hiring staff
- Acquisition and deployment of storage system (12.4 TB capacity – two 16.8 TB systems)
- Testing and deployment of repository software
- Development of metadata workflow
- Development of ingest workflow
- Pilot project with NC Geologic Survey data
… Initial focus on developing the “plumbing”

35 Questions for You?
- What are your current practices for:
  - Archiving data and managing time versions
  - Managing geodatabase versions
  - Transfer mechanisms for data
    - to regional entities?
    - to off-site storage for disaster recovery?
  - Archiving project files and finished products
- What rights issues exist with regard to putting county and city data into an archive?
- What would you like this project to do?

36 Ways to Participate in NCGDAP
- Identifying data for inclusion in the repository
- Discussing data format strategies
- Sharing ideas about archiving approaches and architectures
- Sharing and identifying concerns about rights issues, liability, etc.
- Hosting project visits to regional GIS groups
- Using the Local Government GIS listserv to discuss preservation issues?

37 Questions?
Contact: Steve Morris, Head, Digital Library Initiatives, NCSU Libraries
