Metadata, Ingest, and Data Feeds

1 Metadata, Ingest, and Data Feeds
What we do with your data and why
Nicole Lawrence, DLG
Mike Kanning, GALILEO
GALILEO Users Conference, July 12, 2018

2 Your presenters
Nicole Lawrence, Project Manager, Digital Library of Georgia
Mike Kanning, Developer, GALILEO

3 What are we going to talk about?
The DLG data supply chain, including: how we gather data; what we do with it; batch processing; spatial lookups and other improvements; public websites; the DPLA harvest process.

4 The Data Supply Chain
Diagram labels: OAI-PMH Harvest, Exported Data, Locally Created; DLG Processing; DLGadmin; Georgia Portal, Civil Rights Digital Library, Civil War in the American South; DLG OAI-PMH Data Feed; Digital Public Library of America, EBSCO.

5 How we gather data

6 The Data Supply Chain: How we gather data
Diagram: the data supply chain from slide 4.

7 How we gather data: OAI-PMH harvest

8 How we gather data: Exported data

9 How we gather data: Locally created

10 What we do with your data

11 The Data Supply Chain: What we do with it
Diagram: the data supply chain from slide 4.

12 Steps in DLG processing
Normalize (01): system validation; faceting
Enhance (02): missing fields; DLG-specific fields
Map (03): crosswalk original scheme to DLG; ensure proper field headings and content
Convert (04): native format to active XML
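The four steps above can be pictured as a small pipeline over one record. This is only a rough sketch under stated assumptions, not DLG's actual tooling: the function names, the `CROSSWALK` table, the `collection` field, the `undated` placeholder, and the `<item>` XML layout are all hypothetical and chosen for illustration. Only the `dc_date` field name appears elsewhere in this presentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk from a partner's field names to DLG-style names.
CROSSWALK = {"title": "dcterms_title", "creator": "dcterms_creator", "date": "dc_date"}


def normalize(record):
    """Step 01: trim whitespace and store each field as a list of non-empty strings."""
    out = {}
    for field, value in record.items():
        values = value if isinstance(value, list) else [value]
        cleaned = [v.strip() for v in values if v and v.strip()]
        if cleaned:
            out[field] = cleaned
    return out


def enhance(record, collection):
    """Step 02: fill a missing field and add a (hypothetical) DLG-specific field."""
    out = dict(record)
    out.setdefault("date", ["undated"])  # placeholder value, an assumption
    out["collection"] = [collection]
    return out


def map_fields(record):
    """Step 03: rename fields via the crosswalk; unknown fields keep their name."""
    return {CROSSWALK.get(field, field): values for field, values in record.items()}


def convert(record):
    """Step 04: serialize the record as a simple XML string."""
    item = ET.Element("item")
    for field in sorted(record):
        for value in record[field]:
            ET.SubElement(item, field).text = value
    return ET.tostring(item, encoding="unicode")


def process(record, collection):
    """Run the four steps in the order given on the slide."""
    return convert(map_fields(enhance(normalize(record), collection)))


xml_out = process({"title": "  Street scene ", "creator": ["", "Doe, Jane"]}, "example")
print(xml_out)
```

Keeping each step as a separate function mirrors the slide's ordering and makes it easy to inspect a record between steps.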

13 Steps in DLG processing: Normalizing

14 Steps in DLG processing: Normalizing

15 Steps in DLG processing: Enhancement

16 Steps in DLG processing: Enhancement

17 Steps in DLG processing: Crosswalk

18 Steps in DLG processing: Data verification

19 Steps in DLG processing: Data verification

20 Steps in DLG processing: Convert

21 Steps in DLG processing: Convert

22 Steps in DLG processing: Convert

23 Steps in DLG processing: Convert

24 Steps in DLG processing: Convert

25 Batch processing

26 The Data Supply Chain: Ingesting
Diagram: the data supply chain from slide 4.

27 DLGAdmin’s Batch System
Diagram labels: Batch Import, Batch Items, Batch Commit.
The DLGAdmin batch system is used by DLG staff to ingest, improve, and validate new records, as well as to update existing records. Batches are created as units of work and, when complete, are “committed” to the public index. Batches also serve as an audit trail for records. Batch processing is complex and can take up a lot of system time, so batch jobs are queued and worked as background processes.

28 Populating Batches Form XML

29 Populating Batches Search Results

30 A Populated Batch

31 Committing a Batch
Commit jobs are submitted to a worker queue and worked one at a time. Status notifications are available via a Slack integration. Completed commits show in the list of batches, and in the event of an error the user is given the opportunity to revise the batch and retry the commit job. Once a commit is complete, item records are either created or updated, and the new or changed record is added to the search index. The change is live on the public DLG site as soon as this happens. Viewing the item record shows the history of batch items that were used to create or update a given record.
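A toy model may help make the commit flow concrete. The sketch below is an assumption-laden illustration, not DLGAdmin's code: the dictionaries `records`, `history`, and `status`, the batch names, and the "missing id" validation rule are all hypothetical. It shows a single worker taking commit jobs off a queue one at a time, creating or updating records, recording which batches touched each record, and marking a failed batch so that it could be revised and retried.

```python
import queue
import threading

records = {}  # stands in for item records plus the search index: id -> metadata
history = {}  # id -> names of the batches that created or updated the record
status = {}   # batch name -> "committed" or "failed: <reason>"


def commit(batch_name, items):
    """Validate the whole batch first, then create or update each record."""
    for item in items:
        if "id" not in item:
            raise ValueError("batch item is missing an id")
    for item in items:
        records[item["id"]] = item
        history.setdefault(item["id"], []).append(batch_name)


def worker(jobs):
    """Work commit jobs one at a time until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        name, items = job
        try:
            commit(name, items)
            status[name] = "committed"
        except ValueError as err:
            # The batch is left untouched so it can be revised and retried.
            status[name] = f"failed: {err}"
        finally:
            jobs.task_done()


jobs = queue.Queue()
thread = threading.Thread(target=worker, args=(jobs,))
thread.start()
jobs.put(("batch-1", [{"id": "a", "title": "Map"}]))
jobs.put(("batch-2", [{"title": "No id"}]))
jobs.put(("batch-3", [{"id": "a", "title": "Map, revised"}]))
jobs.put(None)
thread.join()

print(status)
print(history["a"])
```

Using one worker thread keeps commits strictly ordered, which matches the "one at a time" behavior described on the slide; a real system would also persist the queue and send the status notifications.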

32 Spatial lookups (and other improvements)

33 GeoJSON
On import, DLGAdmin generates and indexes a GeoJSON object for each record with spatial metadata. GeoJSON is a standard format used for plotting shapes on maps, like the one shown on this slide. We hope to improve this process to introduce higher fidelity for object mapping (getting the pin closer to where a photo was actually taken, for example) and to support the lookup of coordinates for novel locations.
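For readers unfamiliar with the format, here is a minimal sketch of what a GeoJSON object for one place could look like. It is not the structure DLGAdmin actually stores: the function name, the `placename` property, and the approximate coordinates for Athens, Georgia are assumptions for illustration. The one firm rule it demonstrates comes from the GeoJSON standard itself: coordinates are ordered longitude first, then latitude.

```python
import json


def geojson_point(placename, latitude, longitude):
    """Build a GeoJSON Feature with a Point geometry for a single place."""
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("coordinates out of range")
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON orders coordinates as [longitude, latitude].
            "coordinates": [longitude, latitude],
        },
        "properties": {"placename": placename},  # hypothetical property name
    }


# Approximate, illustrative coordinates for Athens, Georgia.
feature = geojson_point("Athens, Georgia", 33.96, -83.38)
print(json.dumps(feature, indent=2))
```

Swapping latitude and longitude is the most common mistake when producing GeoJSON by hand, which is why the sketch takes them as named arguments and reorders them internally.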

34 Indexing Dates
Example date values from the slide: / 1732/1783 0000/ 1/31/1991 approx. 1934 circa July 1, June 30, 1998 1/21/1999-4/4/2012 5/1986 1776-7
On import, DLGAdmin also parses the dc_date field for year values that apply to the record. Ranges are parsed to include all years within the range. A variety of formats commonly found in the metadata are handled. We are working to improve this process to make it easier to return items from a user-provided range (e.g ) and also to increase the fidelity of the indexed date values (e.g., index a date down to the individual day or month rather than just the year).
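The year extraction described above can be sketched in a few lines. This is a simplified illustration and not DLGAdmin's parser: the function name and the exact rules are assumptions. It treats `YYYY/YYYY` and `<date>-<date>` values as ranges and expands them to every year in between; any other value yields the four-digit years it contains. Abbreviated ranges such as `1776-7` are not expanded here, and the all-zero year `0000` is ignored.

```python
import re
from typing import List

# A four-digit year that is not part of a longer run of digits.
YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def years_from_dc_date(value: str) -> List[int]:
    """Return the sorted years a dc_date value applies to (simplified rules)."""
    text = value.strip()
    slash_range = re.fullmatch(r"(\d{4})/(\d{4})", text)
    if slash_range:
        start, end = int(slash_range.group(1)), int(slash_range.group(2))
    else:
        parts = text.split("-")
        found = [YEAR.findall(part) for part in parts]
        if len(parts) == 2 and all(len(years) == 1 for years in found):
            # "<date>-<date>" with exactly one year on each side is a range.
            start, end = int(found[0][0]), int(found[1][0])
        else:
            # Not a recognized range: index each distinct non-zero year.
            return sorted({int(y) for y in YEAR.findall(text) if int(y) > 0})
    if start > end:
        start, end = end, start
    return list(range(start, end + 1))


print(years_from_dc_date("approx. 1934"))        # [1934]
print(years_from_dc_date("5/1986"))              # [1986]
print(len(years_from_dc_date("1732/1783")))      # 52
print(years_from_dc_date("1/21/1999-4/4/2012"))  # every year from 1999 through 2012
```

Indexing every year in a range makes a simple per-year lookup possible, at the cost of losing day and month detail, which is the fidelity limitation the slide mentions.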

35 Public Websites

36 The Data Supply Chain: Public Access
Diagram: the data supply chain from slide 4.

37 DLG Public Website: dlg.usg.edu

38 Other Websites: CRDL and AMSO

39 Other Websites: CRDL and AMSO

40 DPLA harvest process

41 The Data Supply Chain: Harvesting
Diagram: the data supply chain from slide 4.

42 DLG’s OAI-PMH Feed

43 DPLA Metadata Application Profile

44 DLG in DPLA

45 Questions?

