Discovering and Describing DPLA Collections Students: Madhura Parikh, Zhang Zhang Karen Wickett, Unmil P. Karadkar School of Information The University.




1 Discovering and Describing DPLA Collections
Students: Madhura Parikh, Zhang Zhang
Karen Wickett, Unmil P. Karadkar
School of Information, The University of Texas at Austin

2 in the portal

3

4 in the data

5

6 So what?
Collection-level entities and collection descriptions can support a range of functions:
– representing data providers
– providing context for items
– managing and presenting search results
– assessing relevance and accessibility
– supporting the contribution of collections by users
(Modeling Cultural Collections for Digital Aggregation and Exchange Environments. CIRSS Technical Report 201310-1, University of Illinois at Urbana-Champaign.)

7 Approach
Based on collection/item propagation rules
– link item-level attribute/value pairs to collection-level attribute/value pairs
   – collection attributes from a collection-level schema
   – item attributes from DPLA’s Metadata Application Profile
– in general, allow reasoning in either direction
– we are experimenting with building descriptions of collections, using:
   – descriptions of items
   – collection membership
   – a guiding propagation rule
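Because a propagation rule can run in either direction, it may help to see one rule of each kind side by side. The sketch below is a minimal illustration, not the authors' rule set: it assumes items are plain dictionaries, and the field names `year` and `rights` as well as both rules (a collection's date range spans its item years; a collection-wide rights statement fills in items that lack one) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Item = Dict[str, object]  # hypothetical item record: field name -> value


def items_to_collection_dates(items: List[Item]) -> Optional[Tuple[int, int]]:
    """Upward rule (hypothetical): the collection's date range spans every item year."""
    years = [item["year"] for item in items if isinstance(item.get("year"), int)]
    if not years:
        return None
    return min(years), max(years)


def collection_to_item_rights(collection_rights: str, items: List[Item]) -> List[Item]:
    """Downward rule (hypothetical): a collection-level rights value fills in missing item rights."""
    # Returns new dicts, so the original item records are left unchanged.
    return [item if item.get("rights") else {**item, "rights": collection_rights}
            for item in items]
```

The upward rule reasons from item descriptions plus collection membership to a collection-level value; the downward rule reasons from a collection-level value back to its members.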

8 Collection-level properties
– Collection title
– Collection description
– Begin date
– End date
– Geographic boundary
– Places
– Subjects
– Formats
– Languages
– Genres
– Rights
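One way to hold the collection-level properties listed above is a simple record type whose empty fields mark what still needs to be derived. This is a sketch under assumptions: the class name `CollectionRecord`, the Python types chosen for each property, and the `missing_properties` helper are all hypothetical, not part of the collection-level schema the slides refer to.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CollectionRecord:
    """Hypothetical container mirroring the collection-level properties on the slide."""
    title: str
    description: str = ""
    begin_date: Optional[int] = None
    end_date: Optional[int] = None
    # Closed ring of (longitude, latitude) pairs, first point repeated at the end.
    geographic_boundary: List[Tuple[float, float]] = field(default_factory=list)
    places: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    formats: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)
    genres: List[str] = field(default_factory=list)
    rights: List[str] = field(default_factory=list)

    def missing_properties(self) -> List[str]:
        """Names of properties that are still empty, i.e. candidates for derivation."""
        return [name for name, value in asdict(self).items() if value in (None, "", [])]
```

For example, a record created with only a title, a date range, and languages would report `description`, `subjects`, `rights`, and the other empty properties as missing.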

9 Approach
Take data from the DPLA based on collection membership
– e.g. all items in the “Minnesota Newspapers Collection”
Pick a target collection-level field
– e.g. dc:subject
Identify source data fields in item records
– e.g. dc:subject and dc:description

10 Approach (cont.)
Aggregate item data from across the collection
– e.g. all unique subject strings, along with frequency counts
Derive collection-level values for the selected attribute
– e.g. five subjects for the Minnesota Newspapers Collection
Add the attribute/value pair to the collection record
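The aggregate-and-derive steps can be sketched as a frequency count followed by a top-N cut. This assumes each item is a dictionary mapping field names (here `dc:subject`, `dc:description`) to lists of strings; the function names and the "keep the N most frequent" rule are illustrative assumptions, since the slides only say that five subjects were derived.

```python
from collections import Counter
from typing import Dict, Iterable, List


def aggregate_values(items: Iterable[Dict[str, List[str]]], source_fields: List[str]) -> Counter:
    """Count every value found in the chosen source fields across a collection's items."""
    counts: Counter = Counter()
    for item in items:
        for field_name in source_fields:
            for value in item.get(field_name, []):
                counts[value.strip()] += 1
    return counts


def derive_top_values(counts: Counter, n: int = 5) -> List[str]:
    """Keep the n most frequent values as the collection-level values (ties keep first-seen order)."""
    return [value for value, _ in counts.most_common(n)]
```

Usage: `derive_top_values(aggregate_values(items, ["dc:subject", "dc:description"]))` yields up to five candidate subjects for the collection record.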

11 WARNING: The following presentation contains strong, graphic imagery.

12 Collection-level Metadata Generation Support for: Portal users, Humanities scholars

13 Architecture
[Diagram: the item records (I) of a collection (C) pass through Extract, Aggregate, Derive, Populate, and Enrich stages. Aggregated date, subject, and spatial values feed a Date Deriver, a Subject Deriver, and a Spatial Deriver, whose collection-level date, subject, and spatial values populate the Collection Description.]
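The architecture can be read as one loop over target fields, each with its own deriver. The sketch below is a hypothetical rendering: `build_description`, the mapping of target fields to source fields, and the idea of derivers as plain callables are assumptions made for illustration, not the project's actual interfaces.

```python
from typing import Callable, Dict, List

Item = Dict[str, List[str]]                 # hypothetical: field name -> list of values
Deriver = Callable[[List[str]], List[str]]  # aggregated values -> collection-level values


def build_description(items: List[Item], derivers: Dict[str, Deriver],
                      source_fields: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Extract, aggregate, derive, and populate one target collection field at a time."""
    description: Dict[str, List[str]] = {}
    for target, deriver in derivers.items():
        # Extract and aggregate: pool the values of every source field over all items.
        aggregated = [value for item in items
                      for name in source_fields[target]
                      for value in item.get(name, [])]
        # Derive, then populate the collection record only when something was found.
        derived = deriver(aggregated)
        if derived:
            description[target] = derived
    return description
```

Keeping each deriver a separate callable mirrors the diagram's separate Date, Subject, and Spatial Derivers: a new target field only needs a new entry in `derivers` and `source_fields`.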

14 ArtStor Dates Format Variations

15 Date Processing Begin and end dates imperfect but consistent

16 Inside the Date Deriver
[Diagram: aggregated date values (D) enter a Parser Factory, which picks out years in known formats, and a Rule Factory (begin year, end year, additional rules), which produces the collection date values.]
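A date deriver of this shape can be sketched as a list of format patterns (the parser side) plus begin-year and end-year rules. The four patterns below are hypothetical examples of "known formats", chosen for illustration; the slides do not list which formats the project recognized, and the additional rules are not modeled here.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical "known formats": each pattern captures the year(s) it can recognize.
KNOWN_FORMATS = [
    re.compile(r"^(\d{4})\s*-\s*(\d{4})$"),                       # 1867-1900
    re.compile(r"^\d{1,2}/\d{1,2}/(\d{4})$"),                      # 3/30/1915
    re.compile(r"^(\d{4})-\d{2}(?:-\d{2})?$"),                     # 1915-03 or 1915-03-30
    re.compile(r"^(?:ca\.?|circa)?\s*(\d{4})s?$", re.IGNORECASE),  # 1915, ca. 1915, 1910s
]


def parse_years(value: str) -> List[int]:
    """Return the years in a date string if it matches a known format, else []."""
    text = value.strip()
    for pattern in KNOWN_FORMATS:
        match = pattern.match(text)
        if match:
            return [int(year) for year in match.groups()]
    return []


def derive_date_range(values: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Begin-year and end-year rules: earliest and latest recognized year."""
    years = [year for value in values for year in parse_years(value)]
    if not years:
        return None
    return min(years), max(years)
```

Unrecognized strings such as "undated" simply contribute no years, so a few odd values do not block the begin and end dates.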

17 Subject - Phrases

18 Commonalities and Differences

19 Thresholded Boundaries
– Variants: Ojibwe / Ojibway; GLBT / LGBT
– Hierarchies: Labor Unions -- Minnesota -- Minneapolis -- Newspapers; Labor Unions -- Organizing
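Variant merging and a frequency threshold can be combined in a few lines. In this sketch the variant table `VARIANTS`, the lowercase normalization, and the `min_share` cutoff are hypothetical choices; a real system might draw variants from an authority file, and the hierarchy problem shown on the slide is not addressed here.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical variant table mapping alternate forms to one preferred form.
VARIANTS: Dict[str, str] = {"ojibway": "ojibwe", "ojibwa": "ojibwe", "glbt": "lgbt"}


def normalize(term: str) -> str:
    """Lowercase a term and map known variants to a single preferred form."""
    key = term.strip().lower()
    return VARIANTS.get(key, key)


def thresholded_subjects(terms: Iterable[str], min_share: float = 0.05) -> List[str]:
    """Merge variants, then keep terms that cover at least min_share of all occurrences."""
    counts = Counter(normalize(term) for term in terms)
    total = sum(counts.values())
    if total == 0:
        return []
    return [term for term, count in counts.most_common() if count / total >= min_share]
```

Merging before thresholding matters: two variants that each fall below the cutoff on their own can clear it once counted together.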

20 Automatic? Descriptions

21

22 Inside the Subject Deriver
[Diagram: aggregated subject, title, and description values enter a Parser Factory (tokenizer) and a Rule Factory (threshold detector, cluster generator, WordNet analyzer, other rules), which produces the collection subject values.]
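The tokenizer and threshold detector can be approximated by splitting subject, title, and description strings into words, dropping stop words, and keeping words that recur. The stop list, the token pattern, and `min_count` are hypothetical; the cluster generator and WordNet analyzer are not modeled. Note that word-level tokens split multi-word headings apart ("Pine River" becomes "Pine" and "River").

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical stop list; a fuller system would use a standard list.
STOPWORDS = {"the", "and", "of", "in", "a", "an", "for", "to", "on"}
TOKEN = re.compile(r"[A-Za-z][A-Za-z'-]+")


def tokenize(text: str) -> List[str]:
    """Split text into alphabetic tokens, dropping stop words (case is preserved)."""
    return [tok for tok in TOKEN.findall(text) if tok.lower() not in STOPWORDS]


def candidate_subjects(subjects: Iterable[str], titles: Iterable[str],
                       descriptions: Iterable[str], min_count: int = 2) -> List[str]:
    """Pool tokens from subject, title, and description values; keep the frequent ones."""
    counts: Counter = Counter()
    for text in [*subjects, *titles, *descriptions]:
        counts.update(tokenize(text))
    return [tok for tok, n in counts.most_common() if n >= min_count]
```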

23 Current Description
id: 49b09ce719c5184f166920a1a7c1e8cd
Title: Minnesota Newspapers Collection
Description: The Minnesota Digital Library is now providing access to some of Minnesota's historical newspapers. We are focusing our attention on titles, volumes and issues that were never microfilmed, and where the originals are frail and not frequently available to the public

24 Enhanced Description
collectionResource
dateCreated: 3/30/2015
itemCount: 3528
date.begin: 1867
date.end: 2009
subjects: [Helpers, lockouts, Drivers, Indian, Indians, American, Sauk, Minnesota, Minneapolis, Gay, GLBT, Homosexuality, missions, Mission, Community, Ojibwa, Ojibway, Ojibwe, Pine, River, County, Strikes, Petroleum, Union]

25 Enhanced Description
spatial.boundary: [[153.06667, -27.28333], [-99.8111038208, 41.5272712708], [-94.8796463013, 47.4731407166], [132.270004272, -14.4532003403], [153.06667, -27.28333]]
formats: newspapers
languages: English, Dakota
dataProviders: [“Bemidji State University”, “Center for Human Resources and Labor Studies”, “Heritage Group North”, “Morrison County Historical Society”, “Morrison County Historical Society”, “Quatrefoil Library”, “Sauk Centre Area Historical Society”, “Synod of Lakes and Prairies”]
rights:
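The slides do not say how `spatial.boundary` is computed. One plausible choice that yields a closed ring of [longitude, latitude] points is the convex hull of the item coordinates, sketched here with Andrew's monotone chain algorithm. The function name `boundary_ring` is hypothetical, and the planar treatment of longitude/latitude ignores the antimeridian and the curvature of the Earth.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def boundary_ring(points: List[Point]) -> List[Point]:
    """Convex hull (Andrew's monotone chain) as a closed ring, first point repeated."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts + pts[:1]
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # counter-clockwise, no duplicate endpoints
    return hull + hull[:1]
```

Because a hull must enclose every point, a single outlying coordinate, such as a mis-geocoded place name, stretches the whole boundary.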

26 Visual Assessment

27 Collection Profiles
Support for: DPLA, Hub, and Data Provider staff
[Diagram: DPLA, with hubs (S), collections (C), and data providers (D)]

28 Approach
Numeric characterization
– (for now) ignore semantic assessment
Assess consistency
– enhance automation, computation
Assess compliance with MAP (3.1)
– required, recommended fields
Support visual analysis (early stage)
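A numeric compliance check can start from per-field completeness: the share of items in a collection that have a non-empty value for each field. The field lists below are placeholders, not the actual MAP 3.1 required and recommended properties, and the all-or-nothing `compliant` flag is one assumed policy among several possible ones.

```python
from typing import Dict, List

# Placeholder field lists; substitute the actual MAP 3.1 required/recommended properties.
REQUIRED = ["title", "rights", "dataProvider"]
RECOMMENDED = ["date", "subject", "language", "spatial"]


def completeness(items: List[Dict[str, object]], fields: List[str]) -> Dict[str, float]:
    """Share of items with a non-empty value for each field (0.0 to 1.0)."""
    if not items:
        return {name: 0.0 for name in fields}
    return {name: sum(1 for item in items if item.get(name)) / len(items) for name in fields}


def compliance_report(items: List[Dict[str, object]]) -> Dict[str, object]:
    required = completeness(items, REQUIRED)
    return {
        "required": required,
        "recommended": completeness(items, RECOMMENDED),
        # Flag the collection as compliant only if every item has every required field.
        "compliant": all(share == 1.0 for share in required.values()),
    }
```

Scores in the 0.0 to 1.0 range are easy to chart per collection, which suits the visual analysis step.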

29 Collection Profile
– DPLA collection data
– Administrative data
– Collection and item details

30 Collection Details

31 Item Details

32 Visual Analysis
id: 49b09ce719c5184f166920a1a7c1e8cd
Title: Minnesota Newspapers Collection
[Charts: item titles, item rights]

33 Other Fields
[Charts: publisher, format, coordinates, names, spatial, subjects]

34 Subjects - Assessment

35 Subjects - Analysis

36 Correlations
[Charts: coordinates, names]

37
[Charts: coordinates, names, subjects]

38 Implications and Ongoing Work
– Collection description dashboard
– Evaluation of developed algorithms and metrics

39 Contact
Unmil P. Karadkar
Karen Wickett
Acknowledgements
Temple Teaching Fellowship, School of Information, UT Austin
Mark Matienzo, Tom Johnson, Gretchen Gueguen, and the DPLA staff
Student programmers: Jiexian Li, Zheyuan Zhu, Nan Guo, Ruoying Li, Jeremy Tzou, Julia Link, Andrew Florance, Joshua Sheehy, Meghanath Reddy, Robert Flores, Sowmya Sadhasivam

40 Discussion
– Collection description dashboard: which features? which fields?
– Evaluation of developed algorithms and metrics

