HIVE as a Machine-aided Indexing Tool
– Personal keyword use without vocabulary control
– Machine-aided indexing: term extraction
– Participant judgments: relevant and not relevant
– Inter-indexer consistency
– Rolling's measure
– Hooper's measure
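Rolling's and Hooper's measures both compare the term sets that two indexers assign to the same record. Let A and B be the number of terms each indexer assigned and C the number they share. Rolling's measure is 2C / (A + B), and Hooper's measure is C / (A + B - C). The Python sketch below computes both. The example keywords are hypothetical, and comparing terms as exact, case-sensitive strings is an assumption. A real comparison might first normalize case or map variants to a preferred form.

```python
def _overlap(terms_a, terms_b):
    """Return (A, B, C): the size of each indexer's term set and of their intersection."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        raise ValueError("both indexers assigned no terms; consistency is undefined")
    return len(a), len(b), len(a & b)


def rolling(terms_a, terms_b):
    """Rolling's measure: 2C / (A + B)."""
    a, b, c = _overlap(terms_a, terms_b)
    return 2 * c / (a + b)


def hooper(terms_a, terms_b):
    """Hooper's measure: C / (A + B - C), i.e. shared terms over all distinct terms."""
    a, b, c = _overlap(terms_a, terms_b)
    return c / (a + b - c)


# Hypothetical keywords from two participants indexing the same record
indexer_1 = ["birds", "migration", "ecology"]
indexer_2 = ["birds", "migration", "climate", "north america"]
print(f"Rolling: {rolling(indexer_1, indexer_2):.3f}")  # 2*2/(3+4) = 0.571
print(f"Hooper:  {hooper(indexer_1, indexer_2):.3f}")   # 2/(3+4-2) = 0.400
```

For the same pair of term sets, Hooper's value H never exceeds Rolling's value R, because H = R / (2 - R).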
Organizing Scientific Data Sets
HIVE/Dryad Evaluation
Questions
– Given Dryad article metadata (title, abstract, depositor-supplied keywords), what are the best approaches for suggesting terms from selected controlled vocabularies (MeSH, ITIS, TGN)?
– Can one approach be used for subject, taxonomic, and geographic indexing?
Method
– Create a "gold standard" of manually indexed records based on a mapping of Dryad, MEDLINE, and BIOSIS Previews records to MeSH, TGN, and ITIS
– Evaluate state-of-the-art techniques for automatic subject, taxonomic, and geographic indexing
Preliminary results
– For taxonomic name indexing, untrained KEA++ performs almost as well as state-of-the-art taxonomic name extraction (FindIt).
– For geographic name indexing with TGN, a simple graph-based ranking algorithm outperforms KEA++.
Craig Willis, Hollie White, Lee Richardson, Casey Rawson, Jane Greenberg, Bob Losee, Ryan Scherle, Todd Vision
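The slide does not say which metrics were used to score suggested terms against the gold standard. This sketch assumes a common choice: precision, recall, and F1 over the top k suggestions for each record, macro-averaged across records. The function names and example terms are hypothetical.

```python
from typing import Iterable, Sequence


def prf_at_k(suggested: Sequence[str], gold: Iterable[str], k: int) -> tuple:
    """Precision, recall, and F1 of the top-k ranked suggestions against one record's gold terms."""
    top = set(suggested[:k])  # duplicates in the ranking are counted once
    gold_set = set(gold)
    tp = len(top & gold_set)
    precision = tp / len(top) if top else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_prf(records, k: int) -> tuple:
    """Average per-record (precision, recall, F1) over (suggested, gold) pairs."""
    scores = [prf_at_k(suggested, gold, k) for suggested, gold in records]
    if not scores:
        raise ValueError("no records to evaluate")
    n = len(scores)
    return tuple(sum(column) / n for column in zip(*scores))


# Hypothetical ranked suggestions for one record and its gold-standard terms
suggested = ["Birds", "Animal Migration", "Seasons", "Ecosystem"]
gold = {"Birds", "Animal Migration", "Climate Change"}
print(prf_at_k(suggested, gold, k=4))
```

With a cutoff of k, F1 reduces to 2C / (A + B), where C is the number of correct suggestions. That is the same formula as Rolling's measure, so the score can also be read as consistency between the tool and the manual indexers.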
Thesaurus Walking: Automatic Indexing with Controlled Vocabularies
Questions
– Starting from the locations of terms in a document and moving to the controlled terms the indexer assigned, how do indexers navigate a thesaurus?
– How can this knowledge be used to improve techniques for automatic indexing with controlled vocabularies?
– How can this knowledge be used to improve thesauri?
Methodology
– Unsupervised, graph-based approach using random walks on thesauri
Preliminary results
– Indexer-assigned controlled terms are identified at a rate much higher than random, though still far from perfect.
– This suggests that the method is best used in combination with other, dissimilar automatic indexing methods.
Craig Willis, Bob Losee, Jane Greenberg
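The slide describes the method only as unsupervised random walks on a thesaurus. This is a minimal sketch that assumes one specific variant, random walk with restart (personalized PageRank):

– Thesaurus terms found in the document become the seed (restart) nodes.
– Thesaurus relationships such as BT, NT, and RT become undirected edges.
– The walk's stationary probabilities rank the candidate controlled terms.

The restart probability of 0.15, the 50 iterations, and the toy thesaurus below are illustrative assumptions. They are not the study's settings, and the relationships are not actual MeSH entries.

```python
from collections import defaultdict


def build_graph(relations):
    """Build an undirected adjacency map from (term, related_term) pairs such as BT/NT/RT links."""
    graph = defaultdict(set)
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def random_walk_with_restart(graph, seeds, restart=0.15, iterations=50):
    """Power iteration for a random walk that jumps back to the seed terms with probability `restart`.

    seeds maps each term found in the document to a weight (e.g. its occurrence count).
    """
    total = sum(seeds.values())
    restart_dist = {t: w / total for t, w in seeds.items()}
    scores = dict(restart_dist)
    for _ in range(iterations):
        nxt = defaultdict(float)
        for node, mass in scores.items():
            neighbours = graph.get(node)
            if neighbours:
                # Spread the non-restart mass evenly over related terms
                share = (1 - restart) * mass / len(neighbours)
                for nb in neighbours:
                    nxt[nb] += share
            else:
                # Dangling term with no thesaurus links: send its walk mass back to the seeds
                for t, p in restart_dist.items():
                    nxt[t] += (1 - restart) * mass * p
        for t, p in restart_dist.items():
            nxt[t] += restart * p
        scores = dict(nxt)
    return scores


def suggest(scores, k, exclude=()):
    """Return the k highest-scoring terms, optionally excluding some (e.g. the seeds)."""
    ranked = sorted((t for t in scores if t not in exclude), key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy thesaurus fragment (hypothetical relationships, for illustration only)
relations = [("Vertebrates", "Birds"), ("Vertebrates", "Fishes"),
             ("Birds", "Passeriformes"), ("Birds", "Raptors"), ("Birds", "Animal Migration")]
graph = build_graph(relations)
seeds = {"Passeriformes": 2, "Raptors": 1}  # terms located in the document, with counts
print(suggest(random_walk_with_restart(graph, seeds), 3, exclude=seeds))
```

Because every step redistributes exactly the mass it holds, the scores remain a probability distribution. Terms reachable from several seeds, such as a shared broader term, accumulate the most probability.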