ACL Birds of a Feather Corpus Annotation with Interlingual Content Interlingual Annotation of Multilingual Text Corpora Bonnie Dorr, David Farwell, Rebecca.

1 ACL Birds of a Feather Corpus Annotation with Interlingual Content Interlingual Annotation of Multilingual Text Corpora Bonnie Dorr, David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith Miller, Teruko Mitamura, Owen Rambow, Florence Reeder, Advaith Siddharthan CMU, Columbia University, ISI/USC, Mitre, New Mexico State University, University of Maryland

2 Theory Goal 1: Define a semantic interlingual (IL) representation that can be used for annotation Goal 2: Use the IL to semantically annotate a multilingual parallel corpus Basic Premise: the definition of the IL is informed by comparing multiple languages and multiple English translations per foreign-language text

3 Annotations: Multi-Layered Representation IL0: Normalized deep-syntactic dependency IL1: IL0 structure + semantic annotations from Omega ontology IL2: Unifies different IL1 for semantically similar sentences; structurally, a forest of dependencies with semantic annotations from Omega ontology, plus coreference ILmore: whatever is unhandled so far
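The layering on this slide can be pictured as a dependency tree (IL0) whose content words later receive ontology concepts (IL1). The following is a minimal sketch under stated assumptions, not the project's actual data format: the `Node` class, the `annotate_il1` function, the relation names, and the concept labels in `toy_lexicon` are all hypothetical placeholders, not real Omega identifiers. Restricting concepts to nouns, verbs, adjectives, and adverbs follows the annotation task described on the Completed Annotations slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One node of a dependency tree (hypothetical stand-in for IL0/IL1)."""
    lemma: str
    pos: str
    relation: str = "root"
    children: List["Node"] = field(default_factory=list)
    # Empty at IL0; filled with ontology concept labels at IL1.
    concepts: List[str] = field(default_factory=list)


# Only these parts of speech receive concepts in this sketch.
CONTENT_POS = {"noun", "verb", "adjective", "adverb"}


def annotate_il1(node: Node, lexicon: Dict[str, List[str]]) -> Node:
    """Return a copy of an IL0 tree with concepts attached to content words."""
    concepts = lexicon.get(node.lemma, []) if node.pos in CONTENT_POS else []
    return Node(
        lemma=node.lemma,
        pos=node.pos,
        relation=node.relation,
        children=[annotate_il1(child, lexicon) for child in node.children],
        concepts=list(concepts),
    )


# Toy fragment of the example sentence; structure and labels are illustrative.
il0 = Node("announce", "verb", children=[
    Node("Sheikh Mohamed", "noun", "subject"),
    Node("at", "preposition", "modifier", children=[
        Node("ceremony", "noun", "object"),
    ]),
])

# Placeholder concept labels, NOT real Omega ontology entries.
toy_lexicon = {
    "announce": ["CONCEPT-ANNOUNCE"],
    "ceremony": ["CONCEPT-CEREMONY-A", "CONCEPT-CEREMONY-B"],
    "at": ["CONCEPT-LOCATION"],
}

il1 = annotate_il1(il0, toy_lexicon)
```

Building IL1 as a copy keeps the IL0 tree unchanged, which mirrors the slide's description of IL1 as the IL0 structure plus semantic annotations. A node may carry more than one concept, since annotators could select one or more.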

4 Notation IL0 Sheikh Mohamed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center.”

5 Notation IL1 Sheikh Mohamed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center.”

6 Notation IL2 Sheikh Mohamed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center.” (In progress; coreference not shown)

7 Languages Seven Languages –Arabic, French, Hindi, Japanese, Korean, Spanish as source languages; English as a target language Domains and Genres Economic News Total source corpus of about one million words –125 source news articles in each language –Three professional English translations for each article

8 Annotation Support Resources Built Annotation Manuals –Seven IL0 Manuals (English Completed, Foreign in progress) –One IL1 Manual –IL2 Manual (in progress) Annotation Tools –Created Tiamat for Annotation –Reused TrEd tree editor from Prague as is (thanks!)

9 Completed Annotations Completed six pairs of English translations (250 words apiece) from each of the source languages at the IL1 level. Ten annotators were asked to annotate only nouns, verbs, adjectives, and adverbs with Omega concepts. Annotators selected one or more concepts from both WordNet and Mikrokosmos-derived nodes.

10 Inter-annotator Agreement
                Annotators   Agreement   Kappa
MikroKosmos     3.50         0.745       0.743
WordNet         6.08         0.660       0.657
Theta Roles     5.75         0.538       0.509
For 95% completed annotations
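To make the relationship between the agreement and kappa figures on this slide concrete, here is a small sketch of a kappa computation. The slide does not say which kappa variant was used, so the two-annotator Cohen's kappa below is a swapped-in illustration, not necessarily the project's method; the function names and the toy labels are hypothetical. The second function assumes only that kappa has the usual form (observed - expected) / (1 - expected) and solves it for the expected (chance) agreement.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def implied_chance_agreement(observed: float, kappa: float) -> float:
    """Solve kappa = (observed - expected) / (1 - expected) for expected."""
    return (observed - kappa) / (1.0 - kappa)


# Toy labels (hypothetical), not project data.
toy_kappa = cohen_kappa(["x", "x", "y", "y"], ["x", "y", "y", "y"])

# Agreement and kappa values as reported on the slide.
reported = {
    "MikroKosmos": (0.745, 0.743),
    "WordNet": (0.660, 0.657),
    "Theta Roles": (0.538, 0.509),
}
implied = {name: implied_chance_agreement(po, k) for name, (po, k) in reported.items()}
```

Under that assumption, the reported figures imply a chance agreement of roughly 0.008 for MikroKosmos, 0.009 for WordNet, and 0.059 for theta roles, which is why kappa sits so close to raw agreement in the first two rows.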

11 Planned Production Rate Ed, David ? Future Plans Completed first year of a three-year project subject to Renewal

12 Potential Collaboration Share resources –Tools –Manuals Use a common corpus –Future comparative analysis Discussions –AMTA 2004 IL workshop –Other venues

