ACL Birds of a Feather: Corpus Annotation with Interlingual Content
Interlingual Annotation of Multilingual Text Corpora
Bonnie Dorr, David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith Miller, Teruko Mitamura, Owen Rambow, Florence Reeder, Advaith Siddharthan
CMU, Columbia University, ISI/USC, MITRE, New Mexico State University, University of Maryland

Theory
Goal 1: Define a semantic interlingual (IL) representation that can be used for annotation
Goal 2: Use the IL to semantically annotate a multilingual parallel corpus
Basic premise: the definition of the IL is informed by comparing multiple languages and multiple English translations of each foreign-language text

Annotations: Multi-Layered Representation
IL0: Normalized deep-syntactic dependency
IL1: IL0 structure + semantic annotations from the Omega ontology
IL2: Unifies different IL1 representations for semantically similar sentences; structurally, a forest of dependencies with semantic annotations from the Omega ontology, plus coreference
ILmore: whatever is unhandled so far
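The layering above can be pictured as one dependency tree that gains annotations from level to level. The following is a minimal sketch under stated assumptions, not the project's actual data format: the `ILNode` class, the `annotate` helper, the relation labels, and the concept labels in the toy lexicon are all hypothetical and are not real Omega entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ILNode:
    """One node of a dependency tree (hypothetical structure, for illustration only)."""
    lemma: str
    relation: str  # dependency relation to the parent (labels here are made up)
    concepts: List[str] = field(default_factory=list)  # IL1: ontology concept labels
    theta_role: Optional[str] = None  # IL1: theta role, if annotated
    children: List["ILNode"] = field(default_factory=list)


def annotate(node: ILNode, lexicon: Dict[str, List[str]]) -> ILNode:
    """Return an IL1-style copy of an IL0-style tree.

    The tree shape is kept unchanged; only concept labels are added,
    looked up by lemma in a toy lexicon. The input tree is not modified.
    """
    return ILNode(
        lemma=node.lemma,
        relation=node.relation,
        concepts=list(lexicon.get(node.lemma, [])),
        theta_role=node.theta_role,
        children=[annotate(child, lexicon) for child in node.children],
    )


# IL0-style tree: structure only, no semantic annotation yet.
il0 = ILNode("announce", "ROOT", children=[
    ILNode("Sheikh Mohamed", "SUBJ"),
    ILNode("want", "OBJ"),
])

# Placeholder concept labels; a real lexicon would point into the Omega ontology.
toy_lexicon = {"announce": ["ANNOUNCE"], "want": ["WANT"]}

il1 = annotate(il0, toy_lexicon)
```

In this sketch an IL2-style representation would be a list of such trees (a forest) for semantically similar sentences, with coreference links added on top; that step is left out here.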

Notation: IL0
Sheikh Mohamed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center.”

Notation: IL1
Sheikh Mohamed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center.”

Notation: IL2
Sheikh Mohamed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center.”
(In progress; coreference not shown)

Languages
Seven languages
–Arabic, French, Hindi, Japanese, Korean, and Spanish as source languages; English as the target language
Domains and Genres
Economic news
Total source corpus of about one million words
–125 source news articles in each language
–Three professional English translations for each article

Annotation Support Resources Built
Annotation manuals
–Seven IL0 manuals (English completed, foreign in progress)
–One IL1 manual
–IL2 manual (in progress)
Annotation tools
–Created Tiamat for annotation
–Reused the TrEd tree editor from Prague as is (thanks!)

Completed Annotations
Completed six pairs of English translations (250 words apiece) from each of the source languages at the IL1 level
Ten annotators were asked to annotate only nouns, verbs, adjectives, and adverbs with Omega concepts
Annotators selected one or more concepts from both WordNet and Mikrokosmos-derived nodes

Inter-annotator Agreement
[Table: columns Annotators, Agreement, Kappa; rows MikroKosmos, WordNet, Theta Roles; cell values missing]
For 95% completed annotations
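Agreement and kappa, the two measures named on this slide, can be illustrated with a small worked example. The slide does not say which kappa variant the project computed, so the sketch below uses Cohen's kappa for two annotators as a stand-in. It also assumes each annotator assigns exactly one label per item, whereas the project's annotators could select several concepts. The function name and the labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who each give one label per item."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label if each
    # annotator labels independently according to their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels, not project data.
annotator_1 = ["X", "X", "Y", "Y"]
annotator_2 = ["X", "Y", "Y", "Y"]
kappa = cohen_kappa(annotator_1, annotator_2)  # observed 0.75, chance 0.5, kappa 0.5
```

Raw agreement here is 0.75, but half of that is expected by chance given how often each annotator uses each label, which is why kappa comes out lower than the agreement rate.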

Planned Production Rate
Ed, David ?
Future Plans
Completed first year of a three-year project, subject to renewal

Potential Collaboration
Share resources
–Tools
–Manuals
Use a common corpus
–Future comparative analysis
Discussions
–AMTA 2004 IL workshop
–Other venues