David Farwell, Stephen Helmreich: Computing Research Laboratory/New Mexico State University
Lori Levin, Teruko Mitamura: Language Technologies Institute/Carnegie Mellon University
Bonnie Dorr, Rebecca Green: Institute for Advanced Computer Studies/University of Maryland
Eduard Hovy: Information Sciences Institute/University of Southern California
Keith Miller, Florence Reeder: MITRE Corporation
Owen Rambow, Nizar Habash: Columbia University
Columbia, CRL/NMSU, ISI/USC, LTI/CMU, MITRE, UMIACS/UMD
What we annotate
- Multiple comparable bilingual text corpora
  - parallel text corpora
  - multiple translations of texts
- Genre: newspaper texts / DARPA corpus
Goals
- Common representation (interlingua)
- Common methodology and tools
- Observe and catalogue different surface realizations of the same meaning across and within languages
Annotation Process
- Text is syntactically parsed (Connexor / IL0)
- Parse is reviewed and corrected (TrEd)
- Annotation to IL1 (Tiamat)
  - content words annotated for sense (Omega)
  - arguments annotated for thematic role (LCS)
- 2 English translations of 6 articles; source languages: Arabic, French, Hindi, Japanese, Korean, Spanish
- 12 annotators, 2 at each site
- Total: 144 annotated texts to IL1 level
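To make the two IL1 layers concrete, here is a minimal sketch of a dependency-tree node that carries an IL0 syntactic relation plus the two IL1 additions listed above: a sense label on content words and a thematic role on arguments. The class, field names, POS tags, sense identifiers and role labels are all hypothetical; this is not the Tiamat, Omega or LCS data format.

```python
from dataclasses import dataclass, field
from typing import Optional

CONTENT_POS = ("NOUN", "VERB", "ADJ", "ADV")  # assumed tag set for content words

@dataclass
class Node:
    """A token in a hypothetical IL0 dependency tree, extended with IL1 labels."""
    word: str
    pos: str
    relation: str                     # IL0: dependency relation to the head
    sense: Optional[str] = None       # IL1: sense id (stand-in for an Omega entry)
    theta_role: Optional[str] = None  # IL1: thematic role w.r.t. the governing predicate
    children: list["Node"] = field(default_factory=list)

def missing_senses(node: Node) -> list:
    """Depth-first list of content words that still lack a sense annotation."""
    missing = [node.word] if node.pos in CONTENT_POS and node.sense is None else []
    for child in node.children:
        missing.extend(missing_senses(child))
    return missing

# "The minister visited Paris": IL0 parse with IL1 labels only partly filled in.
# Sense ids and role labels here are illustrative placeholders.
tree = Node("visited", "VERB", "root", sense="visit-v-1", children=[
    Node("minister", "NOUN", "subj", sense="minister-n-1", theta_role="agent",
         children=[Node("The", "DET", "det")]),
    Node("Paris", "NOUN", "obj", theta_role="location"),
])
print(missing_senses(tree))  # ['Paris']
```

A check like `missing_senses` is the kind of completeness test an annotation tool might run before a text is counted as annotated to IL1; whether Tiamat does this is not stated on the slide.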
Results: Agreement & Time
- Tools (Tiamat)
- Manuals (IL0 for 7 languages, IL1)
- Inter-annotator agreement: kappa = .83 (mK), .66 (wn), .59 (theta roles)
- Annotation time: 4 hours/annotator/text, 250 words/text, 2 annotators/text = approx. 2 person-years for a 100K-word corpus at IL1
- Next step: merge IL1 representations and develop transformation algorithms to produce IL2
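Two of the figures on this slide can be checked with a little arithmetic. The sketch below assumes the standard two-rater Cohen's kappa (the slide does not say which kappa variant was used) and assumes roughly 1,600 working hours per person-year, a figure not given on the slide. The function names and the example theta-role labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Two-rater Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

def annotation_person_years(corpus_words, words_per_text=250, hours_per_text=4,
                            annotators_per_text=2, hours_per_person_year=1600):
    """Estimate annotation effort; hours_per_person_year is an assumed figure."""
    texts = corpus_words / words_per_text
    total_hours = texts * hours_per_text * annotators_per_text
    return total_hours / hours_per_person_year

# Hypothetical theta-role labels from two annotators on the same five arguments.
annotator_1 = ["agent", "theme", "theme", "goal", "agent"]
annotator_2 = ["agent", "theme", "goal", "goal", "agent"]
print(round(cohen_kappa(annotator_1, annotator_2), 2))  # 0.71
print(annotation_person_years(100_000))  # 2.0 (400 texts x 4 h x 2 annotators = 3,200 h)
```

With 2,000 hours per person-year the same 3,200 hours come to 1.6 person-years, so the slide's "approx. 2" depends on the working-year assumption.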