1
Metadata, Ingest, and Data Feeds
What we do with your data and why
Nicole Lawrence, DLG
Mike Kanning, GALILEO
GALILEO Users Conference, July 12, 2018
2
Your presenters
Nicole Lawrence, Project Manager, Digital Library of Georgia
Mike Kanning, Developer, GALILEO
3
What are we going to talk about?
The DLG data supply chain, including:
How we gather data
What we do with it
Batch processing
Spatial lookups and other improvements
Public websites
DPLA harvest process
4
The Data Supply Chain
Sources: OAI-PMH Harvest, Exported Data, Locally Created
Processing: DLG Processing, DLGadmin
Public websites: Georgia Portal, Civil Rights Digital Library, Civil War in the American South
Data feed: DLG OAI-PMH Data Feed
Harvesters: Digital Public Library of America, EBSCO
5
How we gather data
6
The Data Supply Chain: How we gather data
7
How we gather data: OAI-PMH harvest
8
How we gather data: Exported data
9
How we gather data: Locally created
10
What we do with your data
11
The Data Supply Chain: What we do with it
12
Steps in DLG processing
01 Normalize: system validation; faceting
02 Enhance: missing fields; DLG-specific fields
03 Map: crosswalk original scheme to DLG; ensure proper field headings and content
04 Convert: native format to active XML
13
Steps in DLG processing: Normalizing
14
Steps in DLG processing: Normalizing
15
Steps in DLG processing: Enhancement
16
Steps in DLG processing: Enhancement
17
Steps in DLG processing: Crosswalk
18
Steps in DLG processing: Data verification
19
Steps in DLG processing: Data verification
20
Steps in DLG processing: Convert
21
Steps in DLG processing: Convert
22
Steps in DLG processing: Convert
23
Steps in DLG processing: Convert
24
Steps in DLG processing: Convert
25
Batch processing
26
The Data Supply Chain: Ingesting
27
DLGAdmin’s Batch System
Diagram labels: Batch Import, Batch, Batch Items, Batch Commit
The DLGAdmin batch system is used by DLG staff to ingest, improve, and validate new records, as well as to update existing records. Batches are created as units of work and, when complete, are “committed” to the public index. Batches also serve as an audit trail for records. Batch processing is complex and can take up a lot of system time, so batch jobs are queued and worked as background processes.
28
Populating Batches: Form, XML
29
Populating Batches: Search Results
30
A Populated Batch
31
Committing a Batch
Commit jobs are submitted to a worker queue and worked one at a time. Status notifications are available via a Slack integration. Completed commits show in the list of batches, and if a commit fails, the user can revise the batch and retry the commit job. Once a commit completes, item records are created or updated, and each new or changed record is added to the search index. The change is live on the public DLG site as soon as this happens. Viewing an item record shows the history of batch items that were used to create or update it.
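To make the commit flow concrete, here is a minimal Python sketch of a queue that works commit jobs one at a time, records each batch in a record's history, and allows a failed batch to be revised and retried. It is an illustration under stated assumptions, not DLGAdmin's actual code: the `Batch` class, the `commit_batch` and `work_queue` functions, the `slug` and `title` fields, and the plain dict standing in for the search index are all hypothetical.

```python
from collections import deque


class Batch:
    """A hypothetical unit of work holding items to create or update."""

    def __init__(self, name, items):
        self.name = name
        self.items = items      # list of dicts with "slug" and "title" keys
        self.status = "pending"
        self.error = None


def commit_batch(batch, index):
    """Create or update one record per batch item in a dict-based index."""
    # Validate every item first so a failing batch leaves the index untouched.
    for item in batch.items:
        if not item.get("title"):
            raise ValueError(f"item {item.get('slug')!r} has no title")
    for item in batch.items:
        record = index.setdefault(item["slug"], {"history": []})
        record["title"] = item["title"]
        record["history"].append(batch.name)  # audit trail of batches


def work_queue(queue, index):
    """Work queued commit jobs strictly one at a time."""
    while queue:
        batch = queue.popleft()
        try:
            commit_batch(batch, index)
            batch.status = "committed"
            batch.error = None
        except ValueError as exc:
            batch.status = "failed"
            batch.error = str(exc)


index = {}
good = Batch("batch-1", [{"slug": "photo-a", "title": "Photo A"}])
bad = Batch("batch-2", [{"slug": "photo-b", "title": ""}])
work_queue(deque([good, bad]), index)
status_after_first_run = bad.status

# Revise the failed batch, then retry the commit job.
bad.items[0]["title"] = "Photo B"
work_queue(deque([bad]), index)
print(status_after_first_run, bad.status, sorted(index))
```

A real system would run the worker as a background process and post status notifications; the sketch only shows the ordering, the retry path, and the per-record history.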
32
Spatial lookups (and other improvements)
33
GeoJSON
On import, DLGAdmin generates and indexes a GeoJSON object for each record with spatial metadata. GeoJSON is a standard format used for plotting shapes on maps, like the one shown here. We hope to improve this process to map objects with higher fidelity (getting the pin closer to where a photo was actually taken, for example) and to support looking up coordinates for novel locations.
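As a rough illustration of what a GeoJSON object for a single record might look like, the Python sketch below builds a GeoJSON Feature with a Point geometry. The function name `to_geojson_feature`, the `title` property, and the sample coordinates are assumptions made for this example; the slides do not show the structure DLGAdmin actually generates.

```python
import json


def to_geojson_feature(title, latitude, longitude):
    """Build a GeoJSON Feature with a Point geometry for one record."""
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("coordinates out of range")
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON (RFC 7946) orders positions as [longitude, latitude].
            "coordinates": [longitude, latitude],
        },
        "properties": {"title": title},
    }


# Hypothetical record with made-up coordinates.
feature = to_geojson_feature("Example photograph", 33.9519, -83.3576)
print(json.dumps(feature, indent=2))
```

Note the longitude-first ordering: swapping the two values is a common cause of pins landing in the wrong place.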
34
Indexing Dates
Example dc_date values: / 1732/1783 0000/ 1/31/1991 approx. 1934 circa July 1, June 30, 1998 1/21/1999-4/4/2012 5/1986 1776-7
On import, DLGAdmin also parses the dc_date field for year values that apply to the record. Ranges are parsed to include all years within the range. A variety of formats commonly found in the metadata are handled. We are working to improve this process to make it easier to return items from a user-provided range (e.g. ) and to increase the fidelity of the indexed date values (e.g., indexing a date down to the individual day or month rather than just the year).
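The year extraction described above could be sketched in Python roughly as follows. This is a simplified illustration, not DLGAdmin's parser: the function name `years_from_dc_date` and the regular expression are assumptions, an all-zero year is skipped by assumption, any value containing two or more four-digit years is treated as a range, and abbreviated ranges such as 1776-7 are not expanded.

```python
import re

# A four-digit year that is not part of a longer run of digits.
YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def years_from_dc_date(value):
    """Return the sorted list of years that a dc_date string covers."""
    years = [int(y) for y in YEAR.findall(value)]
    years = [y for y in years if y > 0]  # assumption: skip an all-zero year
    if len(years) < 2:
        return years
    # Two or more years: index every year from the earliest to the latest.
    return list(range(min(years), max(years) + 1))


for sample in ["1732/1783", "1/31/1991", "approx. 1934",
               "June 30, 1998", "1/21/1999-4/4/2012", "5/1986"]:
    covered = years_from_dc_date(sample)
    print(f"{sample!r}: {covered[0]} to {covered[-1]} ({len(covered)} years)")
```

Indexing every year in a range is what lets a search for a single year match a record whose date is given only as a span.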
35
Public Websites
36
The Data Supply Chain: Public Access
37
DLG Public Website: dlg.usg.edu
38
Other Websites: CRDL and AMSO
39
Other Websites: CRDL and AMSO
40
DPLA harvest process
41
The Data Supply Chain: Harvesting
42
DLG’s OAI-PMH Feed
43
DPLA Metadata Application Profile
44
DLG in DPLA
45
Questions?