Published by Avice Curtis. Modified over 9 years ago.
1
NCSU Libraries
Ingest Workflow Issues: Metadata
North Carolina Geospatial Data Archiving Project
Steve Morris
North Carolina State University Libraries
2
How the Data is Received
Data is delivered as is: no control over organization of received data
Contributing organizations
–County and municipal agencies
–State agencies
–Regional councils of government
Data transfer modes
–CD/DVD, External Drive
–FTP or Web Download
3
Ingest Challenges: General
Data consists of multi-file, multi-format objects
Ancillary data files can be shared by datasets
Some formats require conversion now
Some format conversions involve one-to-many relationships
Compressed archive files are common and behave unpredictably
And all the usual challenges: format validation, validity checking, threat scanning, …
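One way to picture the "multi-file, multi-format objects" problem is a shapefile, which is stored as several sibling files (.shp, .shx, .dbf, and so on). The sketch below is only an illustration, not the project's ingest code: it assumes that files sharing a directory and base name belong to one candidate object, and the function name `group_by_stem` and the sample file names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_stem(filenames):
    """Group file names that share a directory and base name.

    Assumption (not from the slides): siblings such as roads.shp,
    roads.dbf and roads.shp.xml form one candidate data object.
    """
    groups = defaultdict(list)
    for name in filenames:
        path = PurePosixPath(name)
        # Strip every suffix, so "roads.shp.xml" also maps to "roads".
        base = path.name.split(".", 1)[0]
        groups[str(path.parent / base)].append(name)
    return {key: sorted(members) for key, members in groups.items()}


# Made-up file listing for illustration.
files = [
    "streets/roads.shp",
    "streets/roads.shx",
    "streets/roads.dbf",
    "streets/roads.shp.xml",
    "parcels/parcels.shp",
    "parcels/parcels.dbf",
]
objects = group_by_stem(files)
```

A rule this simple does not cover ancillary files shared by several datasets, which is one reason the slide lists them as a separate challenge.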
4
Ingest Challenges: Metadata
Metadata is encoded in a variety of ways
–The FGDC content standard for metadata lacked an encoding standard (it arrived pre-XML); this will soon be addressed in the ISO 19115/19139 FGDC implementation
–XML (varied schemas), TXT, HTML
Metadata is missing
–Only about 25% of local agencies use FGDC
Metadata is wrong
–Metadata is commonly out of sync with the data
Inconsistent use of dataset naming, etc.
5
Some Key Decisions
Capture “transfer set” metadata
Normalize, synchronize, and remediate existing metadata, and retain original metadata record
Treat contact information as archival
Update metadata with format conversions
Use ESRI Profile of FGDC
–Added technical and administrative elements
–Has an XML schema
–ArcCatalog tool support
Use simple rights encoding scheme
Record metadata in a workflow management database
6
What is Transfer Set Metadata?
Administrative and technical metadata associated with a transfer device or download
Propagates to individual data objects
PHP Application Interface for Transfer Set Metadata Capture
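Propagation from a transfer set to its data objects can be sketched as a simple merge. This is a minimal illustration, not the PHP application mentioned on the slide: the field names and values are made up, and letting object-level values win over transfer-set values is an assumption.

```python
def propagate(transfer_set, data_objects):
    """Copy transfer-set metadata onto each data object record.

    Assumption: a value already recorded on the object takes
    precedence over the inherited transfer-set value.
    """
    return [{**transfer_set, **obj} for obj in data_objects]


# Hypothetical field names and values for illustration only.
transfer_set = {"transfer_mode": "FTP", "received": "2006-03-01"}
data_objects = [
    {"name": "streets"},
    {"name": "parcels", "received": "2006-03-02"},
]
records = propagate(transfer_set, data_objects)
```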
7
If No Metadata, What Then?
Autoextract a subset of technical and descriptive metadata through ArcCatalog
Apply an agency-specific metadata template (many elements are static within the context of the agency)
Acquire information from the NC OneMap Inventory
–Data Source
–Contact Info
–Datum, Coordinate System
Acquire information from agency web site
Avoid direct inquiries to local agencies (“contact fatigue”)
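The fallback sources above can be thought of as a gap-filling pass over an incomplete record. The sketch below is one possible arrangement, not the project's workflow: the priority order (autoextracted values, then inventory, then agency template), the element names, and the sample values are all assumptions made for illustration.

```python
def fill_gaps(record, sources):
    """Fill missing or empty elements from (name, values) sources.

    Sources are consulted in the order given. Existing non-empty
    values are never overwritten, and the source of each filled
    element is recorded.
    """
    filled = dict(record)
    provenance = {}
    for source_name, values in sources:
        for element, value in values.items():
            if not filled.get(element) and value:
                filled[element] = value
                provenance[element] = source_name
    return filled, provenance


# Made-up record and sources for illustration.
record = {"title": "Streets", "datum": ""}
sources = [
    ("autoextract", {"datum": "NAD83", "format": "Shapefile"}),
    ("inventory", {"contact": "County GIS Office", "datum": "NAD27"}),
    ("template", {"publisher": "Henderson County", "contact": "Template contact"}),
]
filled, provenance = fill_gaps(record, sources)
```

Keeping a provenance map makes it possible to tell supplied values from original ones later, which fits the decision to retain the original metadata record.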
8
What Gets Remediated and Why?
Key technical elements that are wrong
–Datum, coordinate system, format, …
Title
–Qualify to the agency (e.g. “Streets” becomes “Henderson County Streets”)
Keywords
–Add ISO keywords
–NCSU GIS Lookup terms added later if needed for access
These are basic requirements for access and use
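The title rule lends itself to a small helper. This is a minimal sketch, assuming the agency name is simply prefixed when it is not already part of the title; the function name `qualify_title` is hypothetical and the slides do not say how the step was actually implemented.

```python
def qualify_title(title, agency):
    """Prefix a dataset title with the agency name if it is missing."""
    title = " ".join(title.split())  # collapse stray whitespace
    if agency.casefold() in title.casefold():
        return title  # already qualified, leave unchanged
    return f"{agency} {title}"


qualified = qualify_title("Streets", "Henderson County")
```

Checking for an existing agency name first keeps the step safe to run more than once on the same record.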
9
Metadata Tools
ArcCatalog
–Automated metadata extraction
ArcGIS Toolbar
–Metadata synchronization, normalization, templating
cns and mp
–Raw text handling
Python classes
–Ingest workflow
10
Source Metadata Translation
Hub-and-spoke model a la Echo DEPository
–repository agnostic
–modular conversion hub
–facilitate repository software migration & inter-archive exchange
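The appeal of a hub-and-spoke model is that each format needs only one reader into a common hub form and one writer out of it, rather than a converter for every pair of formats. The sketch below illustrates that idea only. It is not the Echo DEPository design or the project's code: the hub is assumed to be a plain dictionary, and the two toy formats ("keyvalue" and "json") are stand-ins chosen for brevity.

```python
import json

READERS = {}  # format name -> function(text) -> hub record (dict)
WRITERS = {}  # format name -> function(hub record) -> text


def reader(fmt):
    """Register a function that converts one format into the hub form."""
    def register(func):
        READERS[fmt] = func
        return func
    return register


def writer(fmt):
    """Register a function that converts the hub form into one format."""
    def register(func):
        WRITERS[fmt] = func
        return func
    return register


@reader("keyvalue")
def read_keyvalue(text):
    pairs = (line.split(":", 1) for line in text.splitlines() if ":" in line)
    return {key.strip(): value.strip() for key, value in pairs}


@writer("keyvalue")
def write_keyvalue(record):
    return "\n".join(f"{key}: {value}" for key, value in record.items())


@reader("json")
def read_json(text):
    return json.loads(text)


@writer("json")
def write_json(record):
    return json.dumps(record, sort_keys=True)


def translate(text, source, target):
    """Translate via the hub: source format -> dict -> target format."""
    return WRITERS[target](READERS[source](text))


as_json = translate("title: Streets\ndatum: NAD83", "keyvalue", "json")
```

Adding a new repository format under this arrangement means registering one reader and one writer, leaving the existing spokes untouched.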
11
What is the Rights Encoding?
Purpose: Define a basic set of codes to hold dataset rights information in a script-actionable form, and to assign related text for use in constructing brief rights statements.
Propagates to individual data objects
Structure: Codes are assigned on a fixed string position basis. Rights assigned to particular user types are grouped after a flag character for that user group.
Initial User Groups:
–NCSU Faculty/Staff/Students (Code “N”)
–General Public (Code “P”)
–Library of Congress (Code “L”)
Initial Rights Types:
–Use
–Redistribute
–Commercial Use
12
Sample Rights Record
M01N110P110L110
Interpretation: This dataset was acquired in a mediated transaction directly from the data producer (acquired on media or via arranged download). There is no data agreement but there is a data disclaimer. NCSU, General Public, and LC all can use and redistribute the data but commercial use is not allowed.
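Because the codes are positional, a record like the sample can be decoded with a short script. The sketch below is inferred from the single sample and its interpretation, so the layout is an assumption rather than the project's specification: position 1 is taken as the acquisition mode, positions 2 and 3 as data agreement and disclaimer flags, and each user-group flag is assumed to be followed by three 0/1 digits in the order use, redistribute, commercial use. The name `parse_rights` is hypothetical.

```python
import re

RIGHTS = ("use", "redistribute", "commercial_use")
USER_GROUPS = {
    "N": "NCSU Faculty/Staff/Students",
    "P": "General Public",
    "L": "Library of Congress",
}
# Assumed layout: mode letter, two 0/1 flags, then one or more
# blocks of a user-group flag followed by three 0/1 digits.
RECORD = re.compile(
    r"(?P<mode>[A-Z])(?P<agreement>[01])(?P<disclaimer>[01])"
    r"(?P<grants>(?:[NPL][01]{3})+)"
)


def parse_rights(code):
    """Decode a rights record into a nested dictionary."""
    match = RECORD.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognized rights record: {code!r}")
    groups = {}
    for flag, bits in re.findall(r"([NPL])([01]{3})", match["grants"]):
        groups[USER_GROUPS[flag]] = {
            right: bit == "1" for right, bit in zip(RIGHTS, bits)
        }
    return {
        "acquisition_mode": match["mode"],
        "data_agreement": match["agreement"] == "1",
        "disclaimer": match["disclaimer"] == "1",
        "groups": groups,
    }


parsed = parse_rights("M01N110P110L110")
```

Under these assumptions the sample decodes to no data agreement, a disclaimer present, and use and redistribution allowed for all three groups with commercial use disallowed, which matches the interpretation given on the slide.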
13
Deferred Activities
Implementing METS and PREMIS
Developing a serial object metadata scheme
14
Ongoing Challenges
When to automate and when not to
–Learn first from human intervention
–Minimizing risk of error related to human intervention
Accepting that ingest packages used will evolve over time (implications for archive?)
Handling post-ingest migrations
15
Engagement Opportunities
NCGDAP partner NCCGIA runs the NC OneMap Metadata Outreach Program
Provide feedback to spatial data infrastructure about metadata inconsistencies, lack of adherence to best practices
Partner with industry and standards organizations on addressing metadata issues such as poor standards support for versioned data (e.g., through OGC Data Preservation Working Group)