Jim Tuttle North Carolina State University Libraries

Presentation transcript:

Tools Development and Demonstration: North Carolina Geospatial Data Archiving Project. Jim Tuttle, North Carolina State University Libraries

Process Overview: data transfer; threat and format analysis, validation; archive package organization; selective format migration; metadata normalization and supplementation; source metadata translation; statistics collection; extra-repository AIP management.
This presentation discusses some of the tools created by the NCGDAP to facilitate the archive ingest workflow. They are organized here by process.

Data Transfer: Python; md5sum comparison; 'transfer set' metadata capture in a 'seed file'.
The copy process from external drives and optical disks is handled by a Python script. It first creates a manifest of the contents, including MD5 checksums, and then copies all files, creating and comparing checksums of each file as it is copied. Files that fail the comparison are recopied, and the operator is warned after 3 failed attempts. As the data set, known as a 'transfer set', is copied to the local server, the operator captures metadata about the set, such as from whom it was acquired, under what circumstances it was obtained, and the permissions that apply to it, and enters it into a form that wraps it in XML. The community and collection menus are dynamically populated from the DSpace database. The form creates both machine-actionable, Unix-style permissions and text-based, human-readable permissions. Although we don't currently use the permissions, we expect that they might be important for controlling access in the future.
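The manifest, copy, and verify loop described above can be sketched roughly as follows. This is a minimal illustration, not the NCGDAP script: the function names (`md5_of`, `build_manifest`, `copy_transfer_set`) and the choice to return a list of failed paths rather than print a warning are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(source_dir):
    """Map each file path (relative to source_dir) to its MD5 checksum."""
    source_dir = Path(source_dir)
    return {
        str(path.relative_to(source_dir)): md5_of(path)
        for path in sorted(source_dir.rglob("*"))
        if path.is_file()
    }


def copy_transfer_set(source_dir, dest_dir, max_attempts=3):
    """Copy every file in the manifest, verifying each copy by checksum.

    Returns the relative paths that still failed after max_attempts tries
    (hypothetical interface: the caller decides how to warn the operator).
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    failed = []
    for rel_path, expected in build_manifest(source_dir).items():
        target = dest_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        for _attempt in range(max_attempts):
            shutil.copyfile(source_dir / rel_path, target)
            if md5_of(target) == expected:
                break
        else:
            # Every attempt failed the checksum comparison.
            failed.append(rel_path)
    return failed
```

Building the manifest before copying means the checksums describe the source media as received, so later copies can be checked against the same reference.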

Threat and format analysis, validation: Python wrappers for the following: virus (ClamAV); compressed files (tar, zip, gzip, bzip); geodatabases (extension and size); executable files (magic numbers); JHOVE validation.
We've written Python classes to wrap utilities such as the ClamAV anti-virus scanner; the Unix 'file' utility, which identifies executable files by their magic numbers; and JHOVE, which is used for format validation although it doesn't currently recognize most geospatial formats. The script scans for files that require early intervention, including compressed files and geodatabases. Compressed archives are handled manually because their unpacking locations are inconsistent.
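To illustrate the magic-number idea, here is a small pure-Python stand-in for the kind of check the 'file' utility performs. It is a sketch only: the project wraps the real utilities, the signature table is a short illustrative subset, the geodatabase extensions are an assumption, and the size test mentioned on the slide is omitted because the source gives no threshold.

```python
from pathlib import Path

# (offset, signature bytes, label): an illustrative subset, not a full database.
SIGNATURES = [
    (0, b"\x7fELF", "executable (ELF)"),
    (0, b"MZ", "executable (DOS/Windows)"),
    (0, b"PK\x03\x04", "compressed (zip)"),
    (0, b"\x1f\x8b", "compressed (gzip)"),
    (0, b"BZh", "compressed (bzip2)"),
    (257, b"ustar", "compressed (tar)"),
]

# Assumed extensions for ESRI geodatabases (personal: .mdb, file: .gdb).
GEODATABASE_EXTENSIONS = {".mdb", ".gdb"}


def classify_bytes(header):
    """Return a label if the leading bytes match a known signature, else None."""
    for offset, magic, label in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return None


def scan_file(path):
    """Flag a path that needs early intervention; return None otherwise."""
    path = Path(path)
    if path.suffix.lower() in GEODATABASE_EXTENSIONS:
        return "geodatabase"
    with open(path, "rb") as handle:
        # 512 bytes is enough to reach the tar signature at offset 257.
        return classify_bytes(handle.read(512))
```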

Archive package organization: ESRI ArcGIS toolbar for selected formats.
We have a three-stage archive item organization process. The first stage is handled by a custom toolbar written for ESRI's ArcCatalog. The primary purpose of the toolbar is format conversion, but it also provides some metadata and organization support. We plan to rewrite this as a Visual Basic extension to make it more portable.

Archive package organization: rule-based Python logic (filestem; extension relationships, i.e. multi-file format validation; directory structure); manual intervention; metadata.doc; NOID assignment.
The second stage is a Python script that builds a comma-delimited file of suggested item groupings based on several factors, such as non-unique filestems and directory structure (coverage 'info' directories). The script then performs a complex-format validation to ensure that the required files for each format are present. The operator can then import the suggested item groupings into a spreadsheet to check their accuracy. The spreadsheet, with modifications if necessary, is then used to feed the item-building script. As items are grouped, a NOID persistent identifier is assigned to each item.
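A simplified sketch of the filestem grouping and multi-file validation might look like the following. It assumes only one rule, the three mandatory shapefile components (.shp, .shx, .dbf); the function names and CSV columns are hypothetical, and coverage 'info' directory handling and NOID assignment are left out.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath

# Required component files per format; only the shapefile rule is shown here.
REQUIRED_EXTENSIONS = {"shapefile": {".shp", ".shx", ".dbf"}}


def suggest_groupings(paths):
    """Group paths that share a directory and filestem into candidate items."""
    groups = defaultdict(set)
    for raw in paths:
        path = PurePosixPath(raw)
        groups[(str(path.parent), path.stem.lower())].add(path.suffix.lower())
    return groups


def missing_parts(extensions, fmt="shapefile"):
    """Return the required extensions that a candidate item lacks."""
    return REQUIRED_EXTENSIONS[fmt] - set(extensions)


def to_csv(groups):
    """Write the suggested groupings as comma-delimited text for review."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["directory", "filestem", "extensions", "missing"])
    for (directory, stem), exts in sorted(groups.items()):
        missing = missing_parts(exts) if ".shp" in exts else set()
        writer.writerow(
            [directory, stem, " ".join(sorted(exts)), " ".join(sorted(missing))]
        )
    return buffer.getvalue()
```

Writing the suggestions to a comma-delimited file keeps the operator in the loop: the groupings can be corrected in a spreadsheet before anything is built from them.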

Selective Format Migration: conversions using the ArcGIS toolbar (e00 interchange to coverage to shapefile; geodatabase to raster, shapefile, etc.); original files retained.
We've identified the shapefile as the format likely to survive the longest and retain the most functionality. We always retain the original formats.

Metadata Normalization & Supplementation: agency-specific XML templates in ArcCatalog with synchronization flags; provenance and curation metadata scripted.
Often, when metadata is present, it is minimal or incorrect. By using synchronization flags in an agency-specific metadata template, we modify only select elements. ArcCatalog automates some metadata augmentation, but some is handled by Python. All FGDC metadata is updated to reflect NCGDAP acquisition.

Source Metadata Translation: hub-and-spoke model a la Echo Depository; repository-agnostic; modular conversion hub; facilitates repository software migration and inter-archive exchange.
We've taken inspiration from UIUC and implemented a simple hub-and-spoke metadata translation process. We've created a central hub with which schema-specific spokes can interoperate. We've tried to abstract out the repository as much as possible. This approach should facilitate archive exchange. Currently we have input spokes for FGDC and our seed file metadata, and output spokes for QDC and our workflow management database.
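The hub-and-spoke arrangement can be sketched as a registry of input and output functions that meet at a neutral record. This is an illustration under assumptions, not the project's code: the `Hub` class, the three hub field names, and the particular FGDC-to-QDC field mapping are invented for the example, although the FGDC element paths and Dublin Core term names are real.

```python
import xml.etree.ElementTree as ET


def fgdc_input_spoke(xml_text):
    """Input spoke: read a few FGDC CSDGM elements into a neutral hub record."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("idinfo/citation/citeinfo/title"),
        "originator": root.findtext("idinfo/citation/citeinfo/origin"),
        "abstract": root.findtext("idinfo/descript/abstract"),
    }


def qdc_output_spoke(record):
    """Output spoke: map hub fields to qualified Dublin Core terms."""
    mapping = {
        "title": "dc:title",
        "originator": "dc:creator",
        "abstract": "dcterms:abstract",
    }
    return {
        mapping[key]: value
        for key, value in record.items()
        if key in mapping and value is not None
    }


class Hub:
    """Central hub: any registered input spoke can feed any output spoke."""

    def __init__(self):
        self.inputs = {}
        self.outputs = {}

    def register_input(self, name, spoke):
        self.inputs[name] = spoke

    def register_output(self, name, spoke):
        self.outputs[name] = spoke

    def translate(self, source, target, payload):
        return self.outputs[target](self.inputs[source](payload))
```

Because every spoke only talks to the hub record, adding a new schema means writing one spoke rather than one converter per pair of formats.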

Statistics Collection: Python-scripted statistics generation (number of files by format; cumulative size by format; mean file size; collection size; agency contribution).
The processing scripts contain functions to capture statistics about transfer sets and about the processes themselves. This information is stored in the workflow management database.
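The statistics listed on this slide reduce to a single pass over the file inventory. The sketch below assumes the inventory is available as (agency, format, size in bytes) triples and measures agency contribution by total bytes; both the input shape and the function name are hypothetical.

```python
from collections import defaultdict


def collect_statistics(files):
    """Summarize an iterable of (agency, format_name, size_in_bytes) triples."""
    files_by_format = defaultdict(int)
    size_by_format = defaultdict(int)
    size_by_agency = defaultdict(int)
    for agency, fmt, size in files:
        files_by_format[fmt] += 1
        size_by_format[fmt] += size
        size_by_agency[agency] += size
    total_files = sum(files_by_format.values())
    collection_size = sum(size_by_format.values())
    return {
        "files_by_format": dict(files_by_format),
        "size_by_format": dict(size_by_format),
        "size_by_agency": dict(size_by_agency),
        "collection_size": collection_size,
        # Guard against an empty transfer set.
        "mean_file_size": collection_size / total_files if total_files else 0.0,
    }
```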

Extra-repository AIP management: workflow management database populated as a spoke on the metadata/ingest hub; external tracking of NOID, Handle, ISO keywords, and other metadata for interaction with other systems.
The workflow management database (WMD) is a simple MySQL database that serves several functions. First, it provides insurance against DSpace/Postgres failure: with the WMD and our files on disk, we can rebuild our archive. Second, it allows us to generate reports concerning the content of the archive. Third, it may provide a means to eventually integrate the data collected in the project into university collections by using existing search tools.
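As a rough picture of what such external tracking could look like, the sketch below keeps identifiers and keywords in two tables. It is an assumption-laden illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table names, columns, and helper functions are hypothetical rather than the project's schema.

```python
import sqlite3


def open_tracking_db(path=":memory:"):
    """Create the (hypothetical) tracking tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS item (
               noid   TEXT PRIMARY KEY,
               handle TEXT,
               title  TEXT
           )"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS item_keyword (
               noid    TEXT NOT NULL REFERENCES item(noid),
               keyword TEXT NOT NULL
           )"""
    )
    return conn


def record_item(conn, noid, handle, title, keywords=()):
    """Store one archive item and its keywords in a single transaction."""
    with conn:
        conn.execute("INSERT INTO item VALUES (?, ?, ?)", (noid, handle, title))
        conn.executemany(
            "INSERT INTO item_keyword VALUES (?, ?)",
            [(noid, keyword) for keyword in keywords],
        )


def items_for_keyword(conn, keyword):
    """Report the identifiers of every item tagged with a keyword."""
    rows = conn.execute(
        """SELECT item.noid, item.handle
           FROM item
           JOIN item_keyword ON item_keyword.noid = item.noid
           WHERE item_keyword.keyword = ?
           ORDER BY item.noid""",
        (keyword,),
    )
    return rows.fetchall()
```

Keeping both identifiers outside the repository is what makes the rebuild scenario plausible: the files on disk can be matched back to their items without consulting DSpace.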

Questions? Jim Tuttle, Geospatial Data Librarian & Project Coordinator, NCGDAP, NCSU Libraries. jim_tuttle at ncsu dot edu, http://www.lib.ncsu.edu/ncgdap/