Published by Vanessa Atkins. Modified over 9 years ago.
1
Discovering and Describing DPLA Collections
Students: Madhura Parikh, Zhang Zhang
Karen Wickett, Unmil P. Karadkar
School of Information, The University of Texas at Austin
2
in the portal
4
in the data
6
So what?
Collection-level entities and collection descriptions can support a range of functions:
– representing data providers
– providing context for items
– managing and presenting search results
– assessing relevance and accessibility
– supporting the contribution of collections by users
Modeling Cultural Collections for Digital Aggregation and Exchange Environments. CIRSS Technical Report 201310-1, University of Illinois at Urbana-Champaign.
7
Approach
Based on collection/item propagation rules
– link item-level attribute/value pairs to collection-level attribute/value pairs
  – collection attributes from a collection-level schema
  – item attributes from DPLA’s Metadata Application Profile (MAP)
– in general, allow reasoning in either direction
– we are experimenting with building descriptions of collections using:
  – descriptions of items
  – collection membership
  – a guiding propagation rule
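A propagation rule can be pictured as a small data structure naming the item-level source fields, the collection-level target attribute, and a function that combines item values. This is a minimal sketch under assumptions: the `PropagationRule` class, its fields, and the flat dictionary record shape are hypothetical, not the authors' or DPLA's implementation. `apply` reasons upward (items to collection); `holds_for` reasons downward by checking that an item's values are covered by the collection's value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, List[str]]  # hypothetical flat record: field name -> list of values


@dataclass
class PropagationRule:
    """Hypothetical collection/item propagation rule."""
    item_fields: List[str]                     # source attributes in item records
    collection_field: str                      # target attribute in the collection record
    combine: Callable[[List[str]], List[str]]  # turns pooled item values into collection values

    def apply(self, items: List[Record]) -> Record:
        """Reason upward: build the collection-level value from member items."""
        pooled = [v for item in items for f in self.item_fields for v in item.get(f, [])]
        return {self.collection_field: self.combine(pooled)}

    def holds_for(self, item: Record, collection: Record) -> bool:
        """Reason downward: check that every item value is covered by the collection value."""
        allowed = set(collection.get(self.collection_field, []))
        return all(v in allowed for f in self.item_fields for v in item.get(f, []))


# Usage: propagate item subjects to a sorted, de-duplicated collection subject list.
rule = PropagationRule(["subject"], "subjects", lambda vs: sorted(set(vs)))
items = [{"subject": ["Strikes", "Minnesota"]}, {"subject": ["Minnesota"]}]
print(rule.apply(items))  # {'subjects': ['Minnesota', 'Strikes']}
```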
8
Collection-level properties
– Collection title
– Collection description
– Begin date
– End date
– Geographic boundary
– Places
– Subjects
– Formats
– Languages
– Genres
– Rights
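The property list maps naturally onto a record type. This hypothetical sketch assumes field names and types (the slides give no serialization) and adds a helper that reports which properties are still empty, i.e. candidates for derivation from item data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CollectionDescription:
    """Hypothetical collection-level record mirroring the property list."""
    title: str
    description: str = ""
    begin_date: Optional[int] = None  # assumed: stored as a year
    end_date: Optional[int] = None
    geographic_boundary: List[Tuple[float, float]] = field(default_factory=list)  # (lon, lat) ring
    places: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    formats: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)
    genres: List[str] = field(default_factory=list)
    rights: List[str] = field(default_factory=list)

    def missing_properties(self) -> List[str]:
        """Names of properties that have not been populated yet, in declaration order."""
        return [name for name, value in vars(self).items() if value in (None, "", [])]
```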
9
Approach
– Take data from the DPLA based on collection membership
  – e.g., all items in the “Minnesota Newspapers Collection”
– Pick a target collection-level field
  – e.g., dc:subject
– Identify source data fields in item records
  – e.g., dc:subject and dc:description
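These first three steps can be sketched as a membership filter over item records followed by collection of the chosen source fields. The simplified record shape (flat dictionaries with `collection`, `subject`, and `description` keys) and the function name `source_values` are assumptions; actual DPLA records nest these fields differently.

```python
from typing import Dict, Iterable, List


def source_values(items: Iterable[dict], collection_title: str,
                  source_fields: List[str]) -> Dict[str, List[str]]:
    """Gather values of the chosen source fields from items that belong to one collection."""
    gathered: Dict[str, List[str]] = {f: [] for f in source_fields}
    for item in items:
        # Assumed: item["collection"] is a list of collection titles.
        if collection_title not in item.get("collection", []):
            continue  # not a member of the target collection
        for f in source_fields:
            value = item.get(f, [])
            # An item field may hold a single string or a list of strings.
            gathered[f].extend([value] if isinstance(value, str) else value)
    return gathered
```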
10
Approach (cont’d)
– Aggregate item data from across the collection
  – e.g., all unique subject strings, along with frequency counts
– Derive collection-level values for the selected attribute
  – e.g., five subjects for the Minnesota Newspapers Collection
– Add the attribute/value pair to the collection record
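One plausible reading of the aggregate, derive, and add steps is frequency ranking: count each distinct value, keep the k most frequent, and write them to the collection record. The slides do not say how the five subjects were chosen, so `derive_top_values`, `enrich`, and the top-k rule are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def derive_top_values(values: List[str], k: int = 5) -> List[str]:
    """Count distinct non-blank values; return the k most frequent (ties keep first-seen order)."""
    counts = Counter(v.strip() for v in values if v.strip())
    return [v for v, _ in counts.most_common(k)]


def enrich(record: Dict[str, object], attribute: str,
           values: List[str], k: int = 5) -> Dict[str, object]:
    """Return a copy of the collection record with the derived attribute/value pair added."""
    enriched = dict(record)
    enriched[attribute] = derive_top_values(values, k)
    return enriched
```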
11
WARNING: The following presentation contains strong, graphic imagery.
12
Collection-level Metadata Generation
Support for: portal users, humanities scholars
13
Architecture
[Figure: items (I) in a collection (C) pass through Extract, Aggregate, Derive, and Populate/Enrich stages. Aggregated date, subject, and spatial values feed a Date Deriver, a Subject Deriver, and a Spatial Deriver, which produce collection date, subject, and spatial values for the collection description.]
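The Extract, Aggregate, Derive, and Populate flow can be sketched as a function that pools each field's values across member items and hands each pooled list to a pluggable deriver, standing in for the Date, Subject, and Spatial Derivers. The function name and dict-of-lists record shape are hypothetical.

```python
from typing import Callable, Dict, List

Deriver = Callable[[List[str]], List[str]]  # pooled item values -> collection values


def describe_collection(items: List[Dict[str, List[str]]],
                        derivers: Dict[str, Deriver],
                        base: Dict[str, object]) -> Dict[str, object]:
    # Extract + Aggregate: pool each field's values across all member items.
    aggregated: Dict[str, List[str]] = {}
    for item in items:
        for fld, values in item.items():
            aggregated.setdefault(fld, []).extend(values)
    # Derive + Populate: each deriver turns one aggregated field into collection-level values.
    description = dict(base)
    for fld, deriver in derivers.items():
        description[fld] = deriver(aggregated.get(fld, []))
    return description
```

Keeping derivers as plain functions keyed by field makes it easy to add, say, a spatial deriver without touching the aggregation step.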
14
ArtStor Dates Format Variations
15
Date Processing
Begin and end dates: imperfect but consistent
16
Inside the Date Deriver
[Figure: aggregated date values pass through a Parser Factory (years with known formats) and a Rule Factory (begin year, end year, additional rules) to produce collection date values.]
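A minimal sketch of the date deriver's two stages, assuming the parser factory amounts to a list of regular expressions for known formats and that the begin/end rules take the earliest and latest parsed years. The patterns are illustrative examples, not the actual ArtStor or DPLA formats; values matching no pattern are skipped, leaving them for additional rules.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical "parser factory": one regex per known format, capturing 4-digit years.
YEAR_PATTERNS = [
    re.compile(r"^(\d{4})$"),                          # 1867
    re.compile(r"^(\d{4})-(\d{4})$"),                  # 1867-1900
    re.compile(r"^(\d{4})-\d{2}-\d{2}$"),              # 1867-05-01
    re.compile(r"^\d{1,2}/\d{1,2}/(\d{4})$"),          # 3/30/2015
    re.compile(r"^(?:circa|ca\.)\s*(\d{4})$", re.I),   # circa 1900, ca. 1900
]


def parse_years(raw: str) -> List[int]:
    """Return the years found by the first matching known format, or [] if none match."""
    text = raw.strip()
    for pattern in YEAR_PATTERNS:
        match = pattern.match(text)
        if match:
            return [int(g) for g in match.groups()]
    return []  # unknown format: left for additional rules


def derive_date_range(values: Iterable[str]) -> Tuple[Optional[int], Optional[int]]:
    """Begin-year and end-year rules: earliest and latest parsed years."""
    years = [y for v in values for y in parse_years(v)]
    if not years:
        return None, None
    return min(years), max(years)
```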
17
Subject - Phrases
18
Commonalities and Differences
19
Thresholded Boundaries
Variants
– Ojibwe / Ojibway
– GLBT / LGBT
Hierarchies
– Labor Unions -- Minnesota -- Minneapolis -- Newspapers
– Labor Unions -- Organizing
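Variant merging and thresholding could be approximated with a normalization key plus a minimum cluster frequency, and hierarchies with a prefix roll-up of subdivided headings. This is a crude, hypothetical heuristic (sorting acronym letters, trimming a plural "s" and trailing vowels), not the authors' rules, and it assumes subdivisions are separated by "--".

```python
from collections import Counter, defaultdict
from typing import Dict, List


def variant_key(term: str) -> str:
    """Hypothetical normalization: acronyms -> sorted letters; words -> rough lowercase stem."""
    if term.isupper() and len(term) <= 5:
        return "".join(sorted(term))  # GLBT and LGBT -> BGLT
    key = term.lower()
    if len(key) > 4 and key.endswith("s"):
        key = key[:-1]  # Indians -> indian
    while len(key) > 4 and key[-1] in "aeiouy":
        key = key[:-1]  # Ojibwa, Ojibway, Ojibwe -> ojibw
    return key


def cluster_variants(subjects: List[str], threshold: int = 2) -> Dict[str, List[str]]:
    """Group spelling variants; keep clusters whose total frequency meets the threshold."""
    counts = Counter(subjects)
    clusters: Dict[str, Counter] = defaultdict(Counter)
    for term, n in counts.items():
        clusters[variant_key(term)][term] += n
    kept: Dict[str, List[str]] = {}
    for members in clusters.values():
        if sum(members.values()) >= threshold:
            label = members.most_common(1)[0][0]  # most frequent spelling labels the cluster
            kept[label] = sorted(members)
    return kept


def roll_up(headings: List[str]) -> Counter:
    """Credit each heading to all of its broader prefixes."""
    totals: Counter = Counter()
    for heading in headings:
        parts = [p.strip() for p in heading.split("--")]
        for i in range(1, len(parts) + 1):
            totals[" -- ".join(parts[:i])] += 1
    return totals
```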
20
Automatic? Descriptions
22
Inside the Subject Deriver
[Figure: aggregated subject, title, and description values pass through a Parser Factory (tokenizer) and a Rule Factory (threshold detector, cluster generator, WordNet analyzer, other rules) to produce collection subject values.]
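The tokenizer and threshold-detector stages can be sketched as document-frequency filtering over item titles and descriptions. The stopword list and `min_share` threshold are hypothetical, and the cluster generator and WordNet analyzer are not modeled here.

```python
import re
from collections import Counter
from typing import Iterable, List

# Small illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "is", "are", "by", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens, minus stopwords and very short tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS and len(t) > 2]


def candidate_terms(texts: Iterable[str], min_share: float = 0.1) -> List[str]:
    """Terms appearing in at least `min_share` of the texts (document-frequency threshold)."""
    texts = list(texts)
    df: Counter = Counter()
    for text in texts:
        df.update(set(tokenize(text)))  # count each term once per text
    cutoff = min_share * len(texts)
    return sorted(t for t, n in df.items() if n >= cutoff)
```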
23
Current Description
id: 49b09ce719c5184f166920a1a7c1e8cd
Title: Minnesota Newspapers Collection
Description: The Minnesota Digital Library is now providing access to some of Minnesota's historical newspapers. We are focusing our attention on titles, volumes and issues that were never microfilmed, and where the originals are frail and not frequently available to the public.
24
Enhanced Description
collectionResource
dateCreated: 3/30/2015
itemCount: 3528
date.begin: 1867
date.end: 2009
subjects: [Helpers, lockouts, Drivers, Indian, Indians, American, Sauk, Minnesota, Minneapolis, Gay, GLBT, Homosexuality, missions, Mission, Community, Ojibwa, Ojibway, Ojibwe, Pine, River, County, Strikes, Petroleum, Union]
25
Enhanced Description
spatial.boundary: [[153.06667, -27.28333], [-99.8111038208, 41.5272712708], [-94.8796463013, 47.4731407166], [132.270004272, -14.4532003403], [153.06667, -27.28333]]
formats: newspapers
languages: English, Dakota
dataProviders: [“Bemidji State University”, “Center for Human Resources and Labor Studies”, “Heritage Group North”, “Morrison County Historical Society”, “Morrison County Historical Society”, “Quatrefoil Library”, “Sauk Centre Area Historical Society”, “Synod of Lakes and Prairies”]
rights:
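The `spatial.boundary` value reads like a closed ring of [longitude, latitude] pairs whose four distinct points are in convex position. Assuming the spatial deriver computes a convex hull of item coordinates (the slides do not say), Andrew's monotone chain algorithm is one way to do it. A planar hull on longitude/latitude ignores the antimeridian and is sensitive to mis-geocoded outliers; the points near [153.1, -27.3] and [132.3, -14.5] above lie in Australia.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns a closed counter-clockwise ring (first point repeated)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    ring = lower[:-1] + upper[:-1]  # drop each chain's last point (it starts the other chain)
    return ring + [ring[0]]
```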
26
Visual Assessment
27
Collection Profiles
[Figure: DPLA with its hubs (S), collections (C), and data providers (D).]
Support for: DPLA, hub, and data provider staff
28
Approach
– Numeric characterization (for now)
  – ignore semantic assessment
– Assess consistency
  – enhance automation, computation
– Assess compliance with MAP (3.1)
  – required and recommended fields
– Support visual analysis (early stage)
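Compliance with required and recommended fields reduces, numerically, to per-field coverage: the share of item records with a non-empty value. In this sketch the `REQUIRED` and `RECOMMENDED` lists are placeholders, not the actual MAP 3.1 field sets, and the flat record shape is assumed.

```python
from typing import Dict, Iterable, List

# Placeholder field lists (assumption); substitute the MAP 3.1 required/recommended sets.
REQUIRED = ["title", "rights"]
RECOMMENDED = ["date", "subject", "language", "format", "spatial"]


def has_value(record: dict, fld: str) -> bool:
    """True if the field is present and not empty."""
    return record.get(fld) not in (None, "", [])


def field_coverage(records: List[dict], fields: List[str]) -> Dict[str, float]:
    """Share of records (0.0 to 1.0) with a non-empty value for each field."""
    if not records:
        return {f: 0.0 for f in fields}
    return {f: sum(has_value(r, f) for r in records) / len(records) for f in fields}


def compliance_report(records: Iterable[dict]) -> Dict[str, Dict[str, float]]:
    """Coverage of required and recommended fields across a collection's items."""
    records = list(records)
    return {"required": field_coverage(records, REQUIRED),
            "recommended": field_coverage(records, RECOMMENDED)}
```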
29
Collection Profile
– DPLA collection data
– Administrative data
– Collection and item details
30
Collection Details
31
Item Details
32
Visual Analysis
id: 49b09ce719c5184f166920a1a7c1e8cd
Title: Minnesota Newspapers Collection
– Item titles
– Item rights
33
Other Fields
publisher, format, coordinates, names, spatial, subjects
34
Subjects - Assessment
35
Subjects - Analysis
36
Correlations
coordinates, names
37
coordinates, names, subjects
38
Implications and Ongoing Work
– Collection description dashboard
– Evaluation of developed algorithms and metrics
39
Contact
Unmil P. Karadkar
Karen Wickett
Acknowledgements
Temple Teaching Fellowship, School of Information, UT Austin
Mark Matienzo, Tom Johnson, Gretchen Gueguen, and the DPLA staff
Student programmers: Jiexian Li, Zheyuan Zhu, Nan Guo, Ruoying Li, Jeremy Tzou, Julia Link, Andrew Florance, Joshua Sheehy, Meghanath Reddy, Robert Flores, Sowmya Sadhasivam
40
Discussion
– Collection description dashboard: which features? which fields?
– Evaluation of developed algorithms and metrics