Feb 23, 2005. Interlingua Annotation of Multilingual Corpora (IAMTC) Project. Lori Levin and Teruko Mitamura, Language Technologies Institute, Carnegie Mellon University.

Presentation transcript:

Interlingua Annotation of Multilingual Corpora (IAMTC) Project. Lori Levin and Teruko Mitamura, Language Technologies Institute, Carnegie Mellon University

IAMTC project members. Collaboration: New Mexico, Maryland, Columbia, MITRE, CMU, ISI. Members: Bonnie Dorr (Maryland), David Farwell (NMSU), Rebecca Green (Maryland), Nizar Habash (Columbia), Stephen Helmreich (NMSU), Eduard Hovy (ISI), Lori Levin (CMU), Keith Miller (MITRE), Teruko Mitamura (CMU), Owen Rambow (Columbia), Flo Reeder (MITRE), Advaith Siddharthan (Columbia)

IL-Annotation Outcomes IL design –Three levels of depth: IL0, IL1, and IL2 Annotation methodology –Manuals, tools, evaluations Annotated parallel texts –Foreign language original and multiple English translations –Foreign languages: Arabic, French, Hindi, Japanese, Korean, Spanish

Uniqueness of Annotation Effort Multi-parallel –Three versions of each text: original language and two English translations –Shows multiple surface realizations of the same meaning Multi-lingual –Each text is in at least two languages (English and one other) –The methodology is applied to multi-parallel corpora in six languages: Arabic, French, Hindi, Japanese, Korean, Spanish

Motivation Interlingua designed for MT –Multiple English translations of same source show translation divergences. Some phenomena: Lexical level: word changes; Syntactic level: phrasing, thematization, nominalization; Semantic level: additional/different content; Discourse level: multi-clause structure, anaphor; Pragmatic level: Speech Acts, implicatures, style, interpersonal. Causes of divergence –Genuine ambiguity/vagueness of source meaning –Translator error/reinterpretation

IL Development: Staged, deepening IL0: –Shows simple dependency structure IL1: –Replace open class lexical items with concept names –Replace grammatical relation labels with semantic role labels IL2: (under development) –Separates shared portions and unresolved portions of divergent sentences

Details of IL0 Deep syntactic dependency representation: –Removes auxiliary verbs, determiners, and some function words –Normalizes passives, clefts, etc. –Removes strongly governed prepositions –Includes syntactic roles (Subj, Obj)

Construction of IL0 Dependency parsers: Connexor (English), Tapanainen and Jarvinen, 1997; Kabocha (Japanese). Hand-corrected. Extensive manual and instructions on IAMTC Wiki website –for English, Spanish, Japanese, and possibly others

Syntactic Variation Resolved at IL0 Passive –The gangster killed at least 3 innocent bystanders. –At least 3 innocent bystanders were killed by the gangster. Other transitivity alternations

Example of IL0 (TrEd, Pajas, 1998) Sheikh Mohammed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center”

Example of IL0 Sheikh Mohammed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center” Nodes (word, part of speech, relation): announced V Root; Mohamed PN Subj; Sheikh PN Mod; Defense_Minister PN Mod; who Pron Subj; also Adv Mod; of P Mod; UAE PN Obj; at P Mod; ceremony N Obj; inauguration N Mod
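
The node listing above can be held in a small tree structure. The following is a minimal Python sketch, not the project's actual data format: the class name `IL0Node`, the helper `render`, and in particular the head-dependent attachments are assumptions, since the transcript preserves only the order of the nodes and not the indentation of the original slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IL0Node:
    """One node of an IL0 dependency tree: word, part of speech, syntactic relation."""
    word: str
    pos: str
    rel: str
    children: List["IL0Node"] = field(default_factory=list)

    def add(self, child: "IL0Node") -> "IL0Node":
        """Attach a dependent and return it so further dependents can hang off it."""
        self.children.append(child)
        return child


def render(node: IL0Node, depth: int = 0) -> List[str]:
    """Return the tree as indented 'word POS relation' lines, head before dependents."""
    lines = ["  " * depth + f"{node.word} {node.pos} {node.rel}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Attachments below are assumed; the slide text only gives the node order.
root = IL0Node("announced", "V", "Root")
subj = root.add(IL0Node("Mohamed", "PN", "Subj"))
subj.add(IL0Node("Sheikh", "PN", "Mod"))
minister = subj.add(IL0Node("Defense_Minister", "PN", "Mod"))
minister.add(IL0Node("who", "Pron", "Subj"))
minister.add(IL0Node("also", "Adv", "Mod"))
of = minister.add(IL0Node("of", "P", "Mod"))
of.add(IL0Node("UAE", "PN", "Obj"))
at = root.add(IL0Node("at", "P", "Mod"))
ceremony = at.add(IL0Node("ceremony", "N", "Obj"))
ceremony.add(IL0Node("inauguration", "N", "Mod"))

print("\n".join(render(root)))
```

A head-first traversal of this assumed tree visits the nodes in the same order as the slide listing, which is the only structural check the transcript allows.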

Details of IL1 Associate open-class lexical items with Omega Ontology items Replace syntactic relations by one of approx. 20 semantic (theta) roles (from Dorr) e.g., AGENT, THEME, GOAL, INSTR… No treatment of prepositions, quantification, negation, time, modality, idioms, proper names, NP-internal structure… Nodes may receive more than one concept –Average: about 1.2

Construction of IL1: TIAMAT annotation tool. Manual for converting IL0 to IL1 is available

Syntactic Variation Resolved at IL1 Lexical Synonymy –The toddler sobbed, and he attempted to console her. –The baby wailed, and he tried to comfort her. Thematic Divergence –Bob enjoys playing with his kids. –Playing with his kids pleases Bob.

Example of IL1 Sheikh Mohammed, who is also the Defense Minister of the United Arab Emirates, announced at the inauguration ceremony that “we want to make Dubai a new trading center”

Example of IL1: internal representation The study led them to ask the Czech government to recapitalize CSA at this level. [3, lead, V, lead, Root, LEAD<GET, GUIDE] [2, study, N, study, AGENT, SURVEY<WORK, REPORT] [4, they, N, they, THEME, ---, ---] [6, ask, V, ask, PROPOSITION, ---, ---] [9, government, N, government, GOAL, AUTHORITIES, GOVERNMENTAL-ORGANIZATION] [8, Czech, Adj, Czech, MOD, CZECH~CZECHOSLOVAKIA, ---] [11, recapitalize, V, recapitalize, PROP, CAPITALIZE<SUPPLY, INVEST] [12, csa, N, csa, THEME, AIRLINE<LINE, ---] [16, at, P, value_at, GOAL, ---, ---] [15, level, N, level, ---, DEGREE, MEASURE] [14, this, Det, this, ---, ---, ---] (fifth field: semantic roles; sixth and seventh fields: concepts from the Omega Ontology)
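
The bracketed records above are regular enough to read mechanically. The sketch below is an illustration only: the field names (`index`, `word`, `pos`, `lemma`, `role`, `concepts`) are assumed from the shape of the records, and `parse_il1` is a hypothetical helper, not part of the TIAMAT tool.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class IL1Node(NamedTuple):
    index: int                 # position of the word in the sentence
    word: str                  # surface form as shown in the record
    pos: str                   # part-of-speech tag
    lemma: str                 # fourth field (assumed to be a normalized form)
    role: Optional[str]        # semantic role, or None when the slot holds '---'
    concepts: Tuple[str, ...]  # zero, one, or two Omega concept names


def parse_il1(text: str) -> List[IL1Node]:
    """Parse '[i, word, POS, lemma, ROLE, CONCEPT, CONCEPT]' records from a string."""
    nodes = []
    for body in re.findall(r"\[([^\]]*)\]", text):
        fields = [f.strip() for f in body.split(",")]
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {body!r}")
        idx, word, pos, lemma, role, c1, c2 = fields
        concepts = tuple(c for c in (c1, c2) if c != "---")
        nodes.append(IL1Node(int(idx), word, pos, lemma,
                             None if role == "---" else role, concepts))
    return nodes


sample = (
    "[3, lead, V, lead, Root, LEAD<GET, GUIDE] "
    "[2, study, N, study, AGENT, SURVEY<WORK, REPORT] "
    "[4, they, N, they, THEME, ---, ---] "
    "[15, level, N, level, ---, DEGREE, MEASURE]"
)
nodes = parse_il1(sample)
for node in nodes:
    print(node.index, node.word, node.role, node.concepts)
```

Keeping the concepts as a tuple reflects the point made on the IL1 details slide that a node may receive more than one concept.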

Tiamat: annotation interface For each new sentence: For each word to be annotated (shown with dependents)

Tiamat: annotation interface For each new sentence: Candidate concepts Step 1: find Omega concepts for objects and events

Tiamat: annotation interface (note: similarity to PDT annotation interface) For each new sentence: Candidate concepts Step 1: find Omega concepts for objects and events Step 2: select event frame (theta roles)

Details of IL2 Start capturing meaning: –Handle proper names: one of around 5 classes (PERSON, LOCATION, TIME, ORGANIZATION…) –Conversives (buy vs. sell) at the FrameNet level –Non-literal language usage (open the door to customers vs. start doing business) –Extended paraphrases involving syntax, lexicon, grammatical features –Possible incorporation of other ‘standardized’ notations for temporal and spatial expressions Still excluded: –Quantification and negation –Discourse structure –Pragmatics

Variation Resolved at IL2 Morphological Derivation –I was surprised that he destroyed the old house. –I was surprised by his destruction of the old house. Differences in clause subordination –This is Joe’s new car, which he bought in New York. –This is Joe’s new car. He bought it in New York. N-N Compounds –She loves velvet dresses. –She loves dresses made of velvet.

IL2 (continued) Head Switching –Mike Mussina excels at pitching. –Mike Mussina pitches well. –Mike Mussina is a good pitcher. Lexical Conflation –Lindbergh flew across the Atlantic Ocean. –Lindbergh crossed the Atlantic Ocean by plane.

Not normalized Comparatives vs. Superlatives –He’s smarter than everybody else. –He’s the smartest one. Different Sentence Types –Who composed the Brandenburg Concertos? –Tell me who composed the Brandenburg Concertos. Inverse Relationship –Only 20% of the participants arrived on time. –80% of the participants were late. Inference –The Porto player kicked the ball into the net. –The Porto player scored a goal. Viewpoint Variation –Stop getting in the way. –Stop trying to help.

Note from Lori: In my version of PowerPoint the color blocks on the next slide don’t line up with the text correctly. I didn’t have time to fix it, so I inserted the other version of the same slide. If you have time to fix the color box version, then you can delete the two slides after that. Otherwise, you can delete the color box version.

Theoretical goal: Getting at meaning K1E1: Starting on January 1 of next year, SK Telecom subscribers can switch to less expensive LG Telecom or KTF. … The Subscribers cannot switch again to another provider for the first 3 months, but they can cancel the switch in 14 days if they are not satisfied with services like voice quality. K1E2: Starting January 1st of next year, customers of SK Telecom can change their service company to LG Telecom or KTF … Once a service company swap has been made, customers are not allowed to change companies again within the first three months, although they can cancel the change anytime within 14 days if problems such as poor call quality are experienced. Color-block labels: Semantically identical; Semantically equivalent; Additional/less information; Semantically different: Different information

Getting at Meaning (Two translations of Korean original text) Starting on January 1 of next year, SK Telecom subscribers can switch to less expensive LG Telecom or KTF. … The Subscribers cannot switch again to another provider for the first 3 months, but they can cancel the switch in 14 days if they are not satisfied with services like voice quality. Starting January 1st of next year customers of SK Telecom can change their service company to LG Telecom or KTF … Once a service company swap has been made, customers are not allowed to change companies again within the first three months, although they can cancel the change anytime within 14 days if problems such as poor call quality are experienced.

Color Key Black: same meaning and same expression Green: small syntactic difference Blue: lexical difference Red: not contained in the other text Purple: larger difference –Need to use some inference to know that the meaning is the same

Getting at meaning (Two translations of a Japanese original text) This year, too, in addition to the birth of Mitsubishi Chemical, which has already been announced, other rather large-scale mergers may continue, and be recorded as a "year of mergers." This year, which has already seen the announcement of the birth of Mitsubishi Chemical Corporation as well as the continuous numbers of big mergers, may too be recorded as the “year of the merger” for all we know. Observations: More lexical similarity. More differences in dependency relations.

Common Aspects of Meaning This year, too, in addition to the birth of Mitsubishi Chemical, which has already been announced, other rather large-scale mergers may continue, and be recorded as a "year of mergers." This year, which has already seen the announcement of the birth of Mitsubishi Chemical Corporation as well as the continuous numbers of big mergers, may too be recorded as the “year of the merger” for all we know. Common aspects: Big mergers continue this year; Mergers continue in addition to the birth of Mitsubishi Chemical; Birth of Mitsubishi Chemical; Someone announces the birth of Mitsubishi Chemical; Someone records this year as the year of the merger

Divergences that can be resolved This year, too, in addition to the birth of Mitsubishi Chemical, which has already been announced, other rather large-scale mergers may continue, and be recorded as a "year of mergers." This year, which has already seen the announcement of the birth of Mitsubishi Chemical Corporation as well as the continuous numbers of big mergers, may too be recorded as the “year of the merger” for all we know. Resolvable divergences: Mergers are big; Someone announces the birth of Mitsubishi Chemical; Someone records something as the year of the merger

Benefits for Other Projects: MT, Question Answering, Summarization, Information Retrieval, Information Extraction, Text Mining, etc.

Approaches to Evaluation: Inter-annotator agreement (completed); Sentence generation from extracted annotation structure; Comparison of interlingual structures (graph comparisons); Ontology growth (or shrinkage) rate (per unit of text) –Competing goals: Addressing coverage gaps (1/3 of open class words marked as having no concept); Omega seems too rich: hard to distinguish between senses; granularity of concept selection

Inter-annotator Agreement Is the IL sufficiently defined to permit consistent annotation? –Ontology –Theta-roles –Coverage and precision

Evaluation webpage

Inter-annotator agreement The difficulty is that more than one sense can be selected for a given annotation –Standard kappa does not apply in this case Two alternatives for calculating expected probability of agreement: –Agreement and kappa for positive senses –Agreement and kappa for all senses Both were explored –Positive sense agreement and kappa are shown here

Positive agreement annotations Construct a table for each word: –For each annotator and each sense, whether or not that sense was selected by that annotator Calculate agreement = [formula not recoverable] Calculate kappa using Monte Carlo simulation of P(E)
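
The procedure above can be sketched in Python under stated assumptions. The slide's agreement formula did not survive in the transcript, so the sketch substitutes mean pairwise Jaccard overlap of the selected senses as a stand-in, and it estimates P(E) by having each simulated annotator pick the same number of senses uniformly at random. Both choices, and the function names, are illustrative assumptions rather than the project's actual method.

```python
import random
from itertools import combinations
from typing import List, Sequence, Set


def pairwise_agreement(selections: Sequence[Set[str]]) -> float:
    """Mean Jaccard overlap of selected senses over all annotator pairs (assumed measure)."""
    pairs = list(combinations(selections, 2))
    total = 0.0
    for a, b in pairs:
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)


def expected_agreement(senses: List[str], selections: Sequence[Set[str]],
                       trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(E): agreement when senses are picked at random."""
    rng = random.Random(seed)
    pool = sorted(senses)
    total = 0.0
    for _ in range(trials):
        # Each simulated annotator keeps their original number of selections.
        simulated = [set(rng.sample(pool, len(s))) for s in selections]
        total += pairwise_agreement(simulated)
    return total / trials


def mc_kappa(senses: List[str], selections: Sequence[Set[str]],
             trials: int = 2000, seed: int = 0) -> float:
    """kappa = (P(A) - P(E)) / (1 - P(E)), with P(E) estimated by simulation."""
    p_a = pairwise_agreement(selections)
    p_e = expected_agreement(senses, selections, trials, seed)
    if p_e == 1.0:
        return 1.0  # only one possible outcome, so agreement is forced
    return (p_a - p_e) / (1.0 - p_e)


senses = ["s1", "s2", "s3", "s4"]          # hypothetical candidate senses for one word
same = [{"s1"}, {"s1"}]                    # two annotators pick the same sense
different = [{"s1"}, {"s2"}]               # two annotators pick different senses
print(mc_kappa(senses, same), mc_kappa(senses, different))
```

Fixing the seed keeps the estimate reproducible; a real evaluation would also need to decide how to pool the per-word tables into a corpus-level figure, which the slide does not specify.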

Evaluation results – positive examples. Table (values not recoverable): columns A#, APA, Kappa for each annotator group (annotators who finished 95% of their annotations, 90% of their annotations, 50% of their annotations, and all annotators); rows Mikrokosmos, WordNet, Theta Roles.

All cases count. Count 0,0 and 1,1 agreements: $T_{00}$, $T_{11}$. Count 0,1 and 1,0 disagreements: $T_{10}$, $T_{01}$. Count number of 0 & 1 for annotators 1 & 2: $A_{01}$, $A_{11}$; $A_{02}$, $A_{12}$. Divide all counts by number of senses. Agreement $= T_{00} + T_{11}$. Kappa $= 2\,(T_{00} T_{11} - T_{10} T_{01}) / (A_{01} A_{12} + A_{02} A_{11})$ [marginal prob.]
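
A short Python sketch of the all-cases computation follows, assuming each annotator's choices for a word are encoded as a 0/1 vector over the candidate senses. The function name and the reading of $T_{10}$ as "annotator 1 chose the sense, annotator 2 did not" are assumptions; the formula is symmetric in $T_{10}$ and $T_{01}$, so the reading does not affect the result.

```python
import math
from typing import Sequence, Tuple


def all_cases_agreement(ann1: Sequence[int], ann2: Sequence[int]) -> Tuple[float, float]:
    """Return (agreement, kappa) for two 0/1 selection vectors over the same senses."""
    if len(ann1) != len(ann2) or not ann1:
        raise ValueError("both annotators need one 0/1 entry per sense")
    n = len(ann1)
    pairs = list(zip(ann1, ann2))
    # Joint proportions: first digit is annotator 1's choice, second is annotator 2's.
    t00 = sum(1 for a, b in pairs if a == 0 and b == 0) / n
    t11 = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    t10 = sum(1 for a, b in pairs if a == 1 and b == 0) / n
    t01 = sum(1 for a, b in pairs if a == 0 and b == 1) / n
    # Marginal proportions: A_{vk} is the share of value v chosen by annotator k.
    a11 = sum(ann1) / n
    a01 = 1.0 - a11
    a12 = sum(ann2) / n
    a02 = 1.0 - a12
    agreement = t00 + t11
    denom = a01 * a12 + a02 * a11
    # Kappa is undefined when both annotators select everything or nothing.
    kappa = 2 * (t00 * t11 - t10 * t01) / denom if denom else math.nan
    return agreement, kappa


# Four candidate senses; annotator 1 selects two of them, annotator 2 selects one.
agreement, kappa = all_cases_agreement([1, 1, 0, 0], [1, 0, 0, 0])
print(agreement, kappa)
```

For two annotators and binary choices this expression is algebraically the same as Cohen's kappa, $(P(A) - P(E)) / (1 - P(E))$, with $P(E)$ computed from the marginals.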

All Cases Agreement / Kappa. Table (values not recoverable): columns Agree and Kappa under "All cases" and under "Exclude zero-pairs"; rows Theta Roles, WordNet, Mikrokosmos.

Annotation Issues 1. Post-annotation consistency checking –Novice annotators may make inconsistent annotations within the same text. –Intra-annotator consistency checking procedure, e.g., if two nodes in different sentences are co-indexed, then annotators must ensure that the two nodes carry the same meaning in the context of the two different sentences 2. Post-annotation reconciliation

Post-annotation reconciliation Question: How much can annotators be brought into agreement? Procedure: –Each annotator sees all annotations, votes Yes/Maybe/No on each –Annotators then discuss all differences (telephone conference) –Annotators then vote again, independently –We collapse all Yes and Maybe votes, compare them with No to identify all serious disagreements
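
The final step of the procedure can be sketched as follows. This is an illustrative Python sketch: the vote layout (annotation id mapped to each annotator's vote) and the reading of "serious disagreement" as a split between collapsed Yes/Maybe votes and No votes are assumptions, as are all names in the code.

```python
from typing import Dict, List

ACCEPT_VOTES = {"Yes", "Maybe"}   # collapsed into a single "accept" class
ALL_VOTES = ACCEPT_VOTES | {"No"}


def serious_disagreements(votes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return ids of annotations where accept (Yes/Maybe) and No votes both occur."""
    flagged = []
    for annotation_id, by_annotator in votes.items():
        unknown = set(by_annotator.values()) - ALL_VOTES
        if unknown:
            raise ValueError(f"unexpected vote(s) {unknown} for {annotation_id}")
        collapsed = {vote in ACCEPT_VOTES for vote in by_annotator.values()}
        if len(collapsed) > 1:  # both True (accept) and False (No) are present
            flagged.append(annotation_id)
    return flagged


# Hypothetical second-round votes from three annotators on three annotations.
second_round = {
    "sent1/node3": {"ann_a": "Yes", "ann_b": "Maybe", "ann_c": "Yes"},
    "sent1/node9": {"ann_a": "Yes", "ann_b": "No", "ann_c": "Maybe"},
    "sent2/node4": {"ann_a": "No", "ann_b": "No", "ann_c": "No"},
}
print(serious_disagreements(second_round))
```

Under this reading, a unanimous No is not a disagreement among annotators; whether such cases should be tracked separately is left open by the slide.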

Results of Reconciliation: Annotators derive common methodology; Small errors and oversights removed during discussion; Inter-annotator agreement improved; Serious problems of interpretation or error identified

Annotation across Translations Question: How different are the translations? Procedure: –Annotator sees annotations across both translations, identifies differences of form and meaning –Annotator selects ‘true’ meaning(s) Results (work still in progress): –Impacts ontology richness/conciseness –Improvement in Interlingua representation ‘depth’ –Useful for IL2 design development Observations: –This is very hard work –Methodology unclear: what is seen first, how to show alternatives, what to do with results…

Outcomes: how have we done? IL design –IL0 and IL1 finished –IL2 in the works Annotation methodology –Manuals for IL0 in at least three languages –Manual for converting IL0 to IL1 –Annotation tools for IL0 and IL1 –Evaluation of inter-coder agreement –Procedure for annotator reconciliation Around 144 annotated parallel texts in IL0 and IL1 –Six texts from six different source languages –Two English translations of each text –10-12 annotators for each text

Next Steps: Foreign language annotation standards and tools; Development of IL2; Addressing coverage gaps (1/3 of open class words marked as having no concept)

Contact information URLs and Wiki pages: –Project website: –PIs: –Annotators: Annotator/IAMTC-Annotator.wiki Text Annotation: anyone interested to try??? –Download the tools –Download the texts –Have fun (if you’re so inclined!)…

Extra Slides

IAMTC Tasks Interlingua Content Development –Three-level design: IL0, IL1, IL2 (and possibly more…) –Linguistic/semantic divergences: Noun-noun compound; Thematic roles; Named entities and Time expressions; Conjunctions. Ontology reduction. Tool Development. Evaluation Methodology. Annotation of 7 languages