Published by Arthur Knight. Modified over 9 years ago.
Human-Assisted Machine Annotation
Sergei Nirenburg, Marjorie McShane, Stephen Beale
Institute for Language and Information Technologies
University of Maryland Baltimore County
What is tagged?
Text segmentation, punctuation, special characters
Numbers, dates, named entities
Morphosyntactic features
Parts of speech
Syntactic dependencies (full syntactic parses produced)
Ontologically-grounded lexical semantics (subsumes word sense selection, all word types)
Extra-ontological (parametric) lexical semantics
Ontologically-grounded compositional semantics (semantic dependencies, using case roles and about 300 other ontological-semantic relations)
Time
Space
Aspect
Modality (including speaker attitudes)
Causality
Textual co-reference
Real-world reference
Rhetorical relations
Preprocessing editor, syntax browser/editor and TMR browser/editor windows in DEKADE: tools for interactive editing and post-editing
Domains and genres: currently general news, economics news, travel and meetings, and medical texts. New domains require resource augmentation (enhancement of the ontology and the lexicon).
Amount done: the project is just starting; about 2,500 words have been annotated so far.
Speed (no resource augmentation): 100K words in one year with 2 annotators and 50% of a systems support person.
Speed (with resource augmentation): add one knowledge engineer.
Speed (with resource and analyzer improvements): add one software engineer.
If resource augmentation is undertaken, we estimate that the speed of annotation will double in the second year and will increase by a further 50% in the third year.
Interannotator agreement: 100%, because, if the amount and rate of work are as above, each stage of annotation will be done by a single annotator. The automatic component of annotation will always be consistent, which bodes well for the overall consistency of annotation using the HAMA method.
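The speed estimate can be worked through as simple arithmetic. The sketch below is illustrative only: it assumes the 100K words/year baseline applies to the first year and that the third-year increase of 50% is measured against the second-year rate, which the slide does not state explicitly. The function name and parameters are hypothetical.

```python
def projected_throughput(base_words_per_year=100_000, growth_factors=(2.0, 1.5)):
    """Return projected words annotated per year, starting with year one.

    Assumption: each growth factor multiplies the previous year's rate
    (x2 in year two, then x1.5 in year three).
    """
    rates = [base_words_per_year]
    for factor in growth_factors:
        rates.append(round(rates[-1] * factor))
    return rates


yearly = projected_throughput()
print(yearly)       # per-year word counts for years one to three
print(sum(yearly))  # cumulative words over the three years
```

Under these assumptions the yearly figures come to 100K, 200K and 300K words, or 600K words over three years.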
Possible Applications
“Gold standard” TMRs, annotations produced by the HAMA method with OntoSem and DEKADE, can be used as a training corpus for machine learning research or as interlingual representations for MT purposes.
But the OntoSem/DEKADE environment can be used in many more ways. TMRs constitute structured, ontologically-grounded knowledge directly usable by automatic reasoning systems. Production of “gold standard” TMRs using the HAMA method leads to the augmentation of the ontology and the lexicon, thus facilitating performance improvements in the automatic component of annotation work.
This means that the TMRs and the automatic part of the process of their production promise to improve the quality of question answering, information extraction, summarization and other advanced NLP and AI applications.