1
Enrichment and Structuring of Archival Description Metadata
Kalliopi Zervanou*, Ioannis Korkontzelos**, Antal van den Bosch* & Sophia Ananiadou**
* Tilburg Centre for Cognition & Communication, The University of Tilburg, NL
K.Zervanou@uvt.nl, Antal.vdnBosch@uvt.nl
** National Centre for Text Mining, The University of Manchester, UK
Ioannis.Korkontzelos@manchester.ac.uk, Sophia.Ananiadou@manchester.ac.uk
2
ACL/LaTeCH-Portland, June 24th 2011
Research on Metadata
Developing standards:
– collection specific (e.g. EAD, MARC21)
– cross-collection (e.g. Dublin Core)
Providing mappings:
– across schemas
– ontologies (ad hoc or standard, e.g. CIDOC-CRM)
Discarding metadata for IR (Koolen et al., 2007)
Exploiting metadata for IR (Zhang & Kamps, 2009)
3
The IISH EAD dataset
EAD: XML standard for encoding archival descriptions
Challenges:
– Variety of languages used
– Varying type and amount of information
– Style: enumerations, lists, incomplete sentences
4
Motivation & Objectives
Improved search and retrieval:
– content-based metadata document clustering
– content-based/semantic search
– support exploratory search
– link across collections, metadata formats & institutions
– create unified metadata knowledge resources
5
Method overview
7
Pre-processing
EAD/XML element selection & extraction:
– EAD elements containing free text & archive content information
Language identification (n-gram method):
– Identifier trained on the Europarl corpus
Text snippet length: ~20 tokens
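The slides name only an "n-gram method" for language identification, so the following is a minimal sketch of one common variant (character n-gram profiles compared by rank, in the style of Cavnar & Trenkle), not necessarily the identifier used by the authors. The function names, the `top_k` cutoff and the two one-sentence training texts are illustrative assumptions; the slides state that the actual identifier was trained on Europarl.

```python
from collections import Counter

def ngram_profile(text, n_max=3, top_k=300):
    """Return the most frequent character n-grams (1..n_max), most frequent first."""
    counts = Counter()
    for token in text.lower().split():
        padded = f"_{token}_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so the order is deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]

def out_of_place(doc_profile, lang_profile, max_penalty=300):
    """Sum of rank differences; n-grams missing from the language profile get max_penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )

def identify(text, profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

# Hypothetical toy training data standing in for a real corpus such as Europarl
TRAINING = {
    "en": "the archive contains the correspondence and the minutes of the meetings of the trade union",
    "nl": "het archief bevat de correspondentie en de notulen van de vergaderingen van de vakbond",
}
PROFILES = {lang: ngram_profile(text) for lang, text in TRAINING.items()}

print(identify("minutes of the meetings", PROFILES))
print(identify("notulen van de vergaderingen", PROFILES))
```

Profiles built from such short texts are unreliable for anything beyond illustration. Identification quality on snippets of about 20 tokens depends heavily on the size and domain of the training corpus.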
8
Snippet length based on language
11
Enrichment & Structuring
Topic detection: automatic term recognition using the C-value method
Agglomerative hierarchical term clustering:
– complete, single & average linkage criteria
– document co-occurrence & lexical similarity measures
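As a rough illustration of the C-value scoring mentioned above, the sketch below applies the published C-value formula to a small table of candidate terms: a candidate's frequency is weighted by the log of its length in words, and candidates nested inside longer candidates have the average frequency of those longer candidates subtracted. The linguistic filter that extracts candidates is omitted, the frequencies are invented for the example, and the helper names are hypothetical. The full method also corrects the counts of doubly nested terms, which this simplification skips.

```python
import math

def is_nested(short, long):
    """True if the word sequence `short` occurs inside the longer sequence `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )

def c_value(freq):
    """Score candidates; `freq` maps terms (tuples of words) to corpus frequencies."""
    scores = {}
    for a, f_a in freq.items():
        # Longer candidate terms that contain `a`
        containers = [b for b in freq if is_nested(a, b)]
        if containers:
            # Discount occurrences that are explained by the longer terms
            f_a = f_a - sum(freq[b] for b in containers) / len(containers)
        # log2 of the length in words: single-word candidates score 0
        scores[a] = math.log2(len(a)) * f_a
    return scores

# Hypothetical candidate frequencies, for illustration only
FREQ = {
    ("trade", "union"): 10,
    ("trade", "union", "congress"): 4,
    ("international", "trade", "union"): 2,
}

for term, score in sorted(c_value(FREQ).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

In this toy table, "trade union" is nested in two longer candidates whose mean frequency is 3, so its score is 1 × (10 − 3) = 7.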
14
Term results (automatic evaluation)
15
Results
C-value performs best on candidates that occur as non-nested at least once
Average linkage criterion & document co-occurrence: provide broader and richer hierarchies
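To make the combination of average linkage and document co-occurrence concrete, here is a small sketch of agglomerative clustering over terms. The slides do not define the co-occurrence measure, so Jaccard overlap between the sets of documents containing each term is used here as an assumption. The term-to-document table and all function names are likewise hypothetical.

```python
from itertools import combinations

def jaccard(docs_a, docs_b):
    """Document co-occurrence similarity (assumed here): overlap of document sets."""
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0

def average_linkage(c1, c2, term_docs):
    """Mean pairwise similarity between the terms of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(jaccard(term_docs[a], term_docs[b]) for a, b in pairs) / len(pairs)

def cluster(term_docs):
    """Repeatedly merge the two most similar clusters; return the merge history."""
    clusters = [(term,) for term in sorted(term_docs)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], term_docs),
        )
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges

# Hypothetical mapping from terms to the ids of documents they occur in
TERM_DOCS = {
    "trade union": {1, 2, 3},
    "labour movement": {1, 2, 3, 4},
    "strike committee": {3, 4},
    "peace conference": {5, 6},
}

for left, right in cluster(TERM_DOCS):
    print(left, "+", right)
```

The merge history doubles as the term hierarchy: early merges are tight clusters and later merges are broader groupings. Replacing the mean in `average_linkage` with `min` or `max` over the pairwise similarities would give the complete and single linkage criteria respectively.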
16
Questions? Check out our poster!