27 January 2010 A modality lexicon and its use in automatic tagging Kathryn Baker, Michael Bloodgood, Bonnie Dorr, Nathaniel W. Filardo, Lori Levin, Christine Piatko.


27 January 2010. A modality lexicon and its use in automatic tagging. Kathryn Baker, Michael Bloodgood, Bonnie Dorr, Nathaniel W. Filardo, Lori Levin, Christine Piatko. Presented May 20, 2010 by Lori Levin, Language Technologies Institute, Carnegie Mellon University.

Context
- SCALE 2009: Summer Camp in Applied Language Engineering, Johns Hopkins University Human Language Technology Center of Excellence
- SIMT: Semantically Informed MT. Can we improve statistical MT with semantic knowledge? Experiments with modality and named entities.

Modality Tagger Output

Example 1:
Input: Americans should know that we can not hand over Dr. Khan to them.
Output: Americans [should] [know] that we [can] [not] [hand] over Dr. Khan to them.

Example 2:
Input: He managed to hold general elections in the year 2002, but he can not be ignorant of the fact that the world at large did not accept these elections.
Output: He [managed] to [hold] general elections in the year 2002, but he [can not be] ignorant of the fact that the world at large did [not accept] these [elections].

(Bracketed words were highlighted on the original slide as carrying modality tags.)
Trigger: a lexical item that carries a modal meaning.
Target: the head of the proposition that the trigger scopes over.
Holder: the experiencer or cognizer of the modality.

Outline
- A modality annotation scheme
- A modality lexicon
- A string-based modality tagger
- A tree-based modality tagger
- Evaluation of the taggers
- Semantically informed MT

Core Cases of Modality

                     | Necessity               | Possibility
Epistemic            | John must have arrived. | John may have arrived.
Deontic/Situational  | John has to leave now.  | You may leave now. / One can get to Staten Island using a ferry.

(van der Auwera and Ammann, World Atlas of Language Structures)

Related Concepts: Factivity
Did the proposition happen or not?
- John went to New York.
- John may go to New York.
- If John goes to New York, he will visit MOMA.
- John bought a ticket to go to NY.
(FactBank: Saurí and Pustejovsky)

Related Concepts: Evidentiality
Source of information:
- First-hand experience or hearsay: They say that John went to NY.
- Sensory information: I heard that John went to NY.
- Conclusion from evidence: I don't see John, so he must have gone to NY.

Other Related Concepts
- Speaker attitude and sentiment
- Conditionality
- Hypotheticality
- Realis and irrealis mood
- Tense, aspect, etc.

Modality

Example 1:
Input: Americans should know that we can not hand over Dr. Khan to them.
Output: Americans [should] [know] that we [can] [not] [hand] over Dr. Khan to them.

Example 2:
Input: He managed to hold general elections in the year 2002, but he can not be ignorant of the fact that the world at large did not accept these elections.
Output: He [managed] to [hold] general elections in the year 2002, but he [can not be] ignorant of the fact that the world at large did [not accept] these [elections].

(Bracketed words were highlighted on the original slide as modality triggers and targets.)

Modality Annotation and Tagging
Annotation: humans add labels to text, following instructions from a coding manual that defines an annotation scheme.
Tagging: a program automatically assigns labels.
Goals:
- Design an annotation scheme that can be followed with high intercoder agreement and low annotation time and cost
- Train a tagger on human-annotated data
- Build a tagger based on the annotation scheme

The Inventory of Modalities in the Annotation Scheme
(H = holder, the experiencer or cognizer; P = proposition)
- Belief: with what strength does H believe P?
- Requirement: does H require P?
- Permissive: does H allow P?
- Intention: does H intend P?
- Effort: does H try to do P?
- Ability: can H do P?
- Success: does H succeed in P?
- Want: does H want P?
Joint work with Sergei Nirenburg, Marge McShane, Teruko Mitamura, Owen Rambow, Mona Diab, Eduard Hovy, Bonnie Dorr, Christine Piatko, Michael Bloodgood.

The Annotation Scheme
Identify a modality target P, then choose one of these modalities (the first one that applies):
- H requires [P to be true/false]
- H permits [P to be true/false]
- H succeeds in [making P true/false]
- H does not succeed in [making P true/false]
- H is trying [to make P true/false]
- H is not trying [to make P true/false]
- H intends [to make P true/false]
- H does not intend [to make P true/false]
- H is able [to make P true/false]
- H is not able [to make P true/false]
- H wants [P to be true/false]
- H firmly believes [P is true/false]
- H believes [P may be true/false]

Six Simplifications
1. Transparency to negation
2. Duality of require and permit
3. Ordering for entailment
4. Annotators were not asked to nest modalities
5. Default is Firmly Believe
6. Annotators were not asked to mark the holder

Simplifications: Transparency to Negation
Some modalities have negatives in the annotation scheme: not intend, not try, not be able, not succeed. Believe and want do not have negatives in our annotation scheme because of the similarity of:
- "I don't want him to go" / "I want him not to go": both are coded as H wants P to be false.
- "I don't believe he will go" / "I believe he will not go": both are coded as H believes P to be false.

Simplifications: Duality of Require and Permit
Require and permit do not have negations in the annotation scheme because:
- "Not require P to be true" means "Permit P to be false"
- "Not permit P to be true" means "Require P to be false"
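The duality above can be sketched as a tiny normalization function. This is a minimal illustration of the scheme's rule, not the project's code; the function and value names are mine, and `polarity` stands for the truth value that P is required or permitted to have.

```python
# Sketch of the require/permit duality used by the annotation scheme:
#   not-require(P = true)  is coded as  permit(P = false)
#   not-permit(P = true)   is coded as  require(P = false)
def apply_negation(modality, polarity):
    """Push an outer negation into the scheme's require/permit labels."""
    if modality == "require":
        return ("permit", not polarity)
    if modality == "permit":
        return ("require", not polarity)
    raise ValueError("duality applies only to require/permit")
```

Because of this duality, annotators never need negated forms of require or permit: every negated case normalizes to the positive label of its dual.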

Simplifications: Ordering for Entailment
"John managed to go to NY." What modality is this? Success? Intent? Effort? Desire? Ability?
Two entailment groupings, ordered with respect to each other:
1. {requires → permits}
2. {succeeds → tries → intends → is able → wants}
Both apply before "believe", which is not in an entailment relation with either grouping. The annotators are instructed to choose the first modality in the list that applies.
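The ordered-choice rule can be sketched directly. The label names below are illustrative rather than the project's exact tags; the point is only that the first applicable modality in the fixed order wins.

```python
# Fixed order over the two entailment groupings, with belief last.
MODALITY_ORDER = [
    "require", "permit",                            # grouping 1
    "succeed", "try", "intend", "be_able", "want",  # grouping 2
    "firmly_believe", "believe_may",                # belief applies last
]

def choose_modality(applicable):
    """Return the first modality in MODALITY_ORDER that applies;
    default to firm belief when nothing else does."""
    for modality in MODALITY_ORDER:
        if modality in applicable:
            return modality
    return "firmly_believe"

# "John managed to go to NY" entails success, effort, intention,
# ability, and desire -- but only the first in the order is annotated.
picked = choose_modality({"succeed", "try", "intend", "be_able", "want"})
```

With this rule, a sentence like the example receives exactly one label ("succeed") even though it entails several weaker modalities.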

Simplifications: No Embedding of Modalities
"He might be able to swim": only ability is tagged. Modals are never considered as targets of other modals in the annotation process.

Six Simplifications
1. Transparency to negation
2. Duality of require and permit
3. Ordering for entailment
4. Annotators were not asked to nest modalities
5. → Default is Firmly Believe
6. Annotators were not asked to mark the holder

Six Simplifications
1. Transparency to negation
2. Duality of require and permit
3. Ordering for entailment
4. Annotators were not asked to nest modalities
5. Default is Firmly Believe
6. → Annotators were not asked to mark the holder

English Modality Lexicon
- Modality trigger words: might, should, require, permit, need, try, possible, fail, etc.
- About 150 lemmas, plus five forms for each verb where applicable: bare infinitive, present tense -s, past tense, past participle, present participle.
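The five verb forms per lemma could be generated as below. This is a deliberately naive sketch that handles only regular English morphology (with a trivial final-e rule); a real lexicon would list irregular and spelling-change forms explicitly rather than generate them.

```python
# Naive generator for the five verb forms listed above (illustration
# only; irregular verbs like "be" or "go" would need stored forms).
def verb_forms(bare):
    stem = bare[:-1] if bare.endswith("e") else bare
    past = bare + "d" if bare.endswith("e") else bare + "ed"
    return {
        "bare_infinitive": bare,            # e.g. "need"
        "present_s": bare + "s",            # "needs"
        "past": past,                       # "needed"
        "past_participle": past,            # "needed"
        "present_participle": stem + "ing", # "needing"
    }
```

For a regular trigger verb like "need", this yields needs, needed, needed, needing alongside the bare form.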

English Modality Lexicon Example: need
- Pos: VB
- Modality: Require
- Trigger word: need
- Subcategorization codes:
  - V3-passive-basic: Large helicopters are needed to dispatch urgent relief materials.
  - V3-I3-basic: The government will need to work continuously for at least a year. / We will need them to work continuously.
  - T1-monotransitive-for-V3-verbs: We need a Sir Sayyed again to maintain this sentiment.
  - T1-passive-for-V3-verb: He is needed to work continuously.
  - modal-auxiliary-basic: He need not go.
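A lexicon entry like the one above might be stored in a machine-readable form such as the following. The field names are illustrative, not the project's actual schema; the values are taken from the slide.

```python
# One possible machine-readable encoding of the "need" entry
# (hypothetical schema; subcat codes map to example sentences).
NEED_ENTRY = {
    "lemma": "need",
    "pos": "VB",
    "modality": "Require",
    "trigger": "need",
    "subcat": {
        "V3-passive-basic":
            "Large helicopters are needed to dispatch urgent relief materials.",
        "V3-I3-basic":
            "The government will need to work continuously for at least a year.",
        "T1-monotransitive-for-V3-verbs":
            "We need a Sir Sayyed again to maintain this sentiment.",
        "T1-passive-for-V3-verb":
            "He is needed to work continuously.",
        "modal-auxiliary-basic":
            "He need not go.",
    },
}
```

Keying the subcategorization codes to example sentences keeps each pattern self-documenting when new triggers are added.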

Modality

Example 1:
Input: Americans should know that we can not hand over Dr. Khan to them.
Output: Americans [should] [know] that we [can] [not] [hand] over Dr. Khan to them.

Example 2:
Input: He managed to hold general elections in the year 2002, but he can not be ignorant of the fact that the world at large did not accept these elections.
Output: He [managed] to [hold] general elections in the year 2002, but he [can not be] ignorant of the fact that the world at large did [not accept] these [elections].

(Bracketed words were highlighted on the original slide as modality triggers and targets.)

String-Based English Modality Tagger
- Input: text that has been tagged with parts of speech.
- Mark triggers: mark spans of words that are exact matches to entries in the modality lexicon and that have the same part of speech.
- Mark targets: the next non-auxiliary verb to the right of a trigger.
- Spans of words can be marked multiple times with different triggers and targets.
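The string-based procedure above can be sketched in a few lines. This is a toy reconstruction under stated assumptions: the lexicon is keyed on (word, POS), the auxiliary list is mine (the slide does not give one), and Penn Treebank tags are assumed (verbs start with "VB", modals are "MD").

```python
# Words treated as auxiliaries when searching for a target verb
# (assumed list; the real system's inventory is not on the slide).
AUXILIARIES = {"be", "am", "is", "are", "was", "were", "been", "being",
               "have", "has", "had", "do", "does", "did"}

def tag_modality(tagged_tokens, lexicon):
    """tagged_tokens: list of (word, POS) pairs.
    lexicon: dict mapping (word, POS) -> modality name.
    Marks each trigger (exact word + POS match against the lexicon)
    and, as its target, the next non-auxiliary verb to its right.
    The same span may be marked by several triggers."""
    marks = []
    for i, (word, pos) in enumerate(tagged_tokens):
        modality = lexicon.get((word.lower(), pos))
        if modality is None:
            continue
        target = None
        for j in range(i + 1, len(tagged_tokens)):
            w, p = tagged_tokens[j]
            if p.startswith("VB") and w.lower() not in AUXILIARIES:
                target = j
                break
        marks.append({"trigger": i, "target": target, "modality": modality})
    return marks

sentence = [("Americans", "NNPS"), ("should", "MD"), ("know", "VB"),
            ("that", "IN"), ("we", "PRP"), ("can", "MD"), ("not", "RB"),
            ("hand", "VB"), ("over", "RP")]
toy_lexicon = {("should", "MD"): "Require", ("can", "MD"): "Able"}
marks = tag_modality(sentence, toy_lexicon)
```

On the running example, "should" triggers Require with target "know", and "can" triggers Able with target "hand", matching the tagger output shown earlier.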

The Structure-Based English Modality Tagger
[Slide shows the parse tree of "Americans should know that we can not hand over Dr. Khan to them", with the template (VP (MD should) VB) matched: "should" is the trigger and "know" the target.]
Uses Tsurgeon (Stanford NLP tools) to find trees that match templates and to mark modality triggers and targets.

The Structure-Based English Modality Tagger
Two steps: (1) Tsurgeon template matching, (2) percolation of labels up the tree. Result for the example sentence:

(S (NP Americans/NNPS)
   (VP-require should/MD-TrigRequire know/VB-TargRequire
      (that (S (NP we/PRP)
               (VP-NOTAble can/MD-TrigAble not/RB-TrigNegation hand/VB-TargNOTAble over
                  (NP Dr/NNP Khan/NNP) (PP to them))))))
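The percolation step can be sketched as follows. This is an illustrative reconstruction, not the project's code: trees are (label, children) tuples, leaves are "word/TAG" strings, and the rule shown simply copies a trigger's modality up to the nearest dominating VP.

```python
# Sketch of percolation: after Tsurgeon marks trigger/target leaves,
# the modality label is copied onto the dominating VP node.
def percolate(tree, label):
    """tree: (node_label, [children]) with string leaves like
    'should/MD-TrigRequire'. Returns (new_tree, found_label)."""
    if isinstance(tree, str):
        return tree, label in tree
    node, children = tree
    new_children, found = [], False
    for child in children:
        new_child, hit = percolate(child, label)
        new_children.append(new_child)
        found = found or hit
    if found and node == "VP":
        # A VP dominating a TrigRequire leaf becomes VP-Require, etc.
        node = "VP-" + label.replace("Trig", "")
    return (node, new_children), found

vp = ("VP", ["should/MD-TrigRequire", "know/VB-TargRequire"])
labeled, _ = percolate(("S", [("NP", ["Americans/NNPS"]), vp]), "TrigRequire")
```

Running this on the toy tree relabels the VP containing "should" as VP-Require while leaving the NP and S nodes untouched.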

What Was Covered
- 15 subcategorization patterns
- 150 lemmas
- Expressions of modality with lexical triggers

What wasn’t covered Non-lexical modality  Imperatives  Other constructions  It will be a long time/a cold day in hell before… Targets in coordinate structures  To do next Word sense disambiguation  Can, must: deontic or epistemic  Manage: manage to do something vs manage a project Transitivity alternations: alternate mappings between grammatical relations and semantic roles  The plan succeeded  The government succeeded in its plan.  The government succeeded ????

Evaluation: Agreement Between String-Based and Structure-Based Taggers
Calculated Kappa over sentences from the English side of the Urdu-English corpus for MTEval 2009.
Example: TargPermit ("John is allowed to go to NY")
- 585 matching by both taggers
- 163 matching by just the structure-based tagger
- 194 matching by just the string-based tagger
- no match by either tagger
Triggers: Kappa = .82
Targets: Kappa = .76
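Cohen's Kappa for two taggers can be computed from a 2x2 agreement table as below. The counts in the usage example are hypothetical (the slide does not report the "no match by either tagger" cell), so this only illustrates the calculation, not the slide's .82/.76 figures.

```python
# Cohen's kappa for two binary annotators/taggers from a 2x2 table:
#   both      = marked by both
#   only_a    = marked by A only
#   only_b    = marked by B only
#   neither   = marked by neither
def cohens_kappa(both, only_a, only_b, neither):
    n = both + only_a + only_b + neither
    p_observed = (both + neither) / n
    a_yes = (both + only_a) / n
    b_yes = (both + only_b) / n
    p_expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical counts for illustration only:
k = cohens_kappa(both=40, only_a=10, only_b=10, neither=40)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over simple percent overlap when the "yes" class is rare.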

Evaluation: Structure-Based Tagger
- Recall: not feasible to look for all expressions of modality that we didn't tag; there is no gold-standard annotated corpus.
- Precision: 249 sentences that were tagged with triggers and targets, from the English side of the MTEval 2009 training sentences. 86.3% correct, but ranging from about 82% to about 92% depending on genre.

Precision: Errors
- A light verb or noun is the correct syntactic target but not the correct semantic target:
  - Earthquake-affected areas in Pakistan will be provided the required number of tents and blankets by November 15.
  - The decision should be taken on delayed cases on the basis of merit.
- Wrong word sense:
  - In Bayas, Sikhs attacked a train under cover of night and killed everyone.
  - The process of provision of relief goods to needy people should be managed by the Army and the Edhi Trust.
  - Should be allowed to work like this in the future. ("like": succeed in something)

Precision: Errors
- Wrong subcategorization pattern: The officials should consider themselves as servants of the people.
- Coordinate structures: Many large helicopters are needed to dispatch urgent relief materials to the many affected in far-flung areas of the Neelam Valley and only America can help us in this regard.

Recall: What Did We Miss?
- Special forms of negation:
  - There was no place to seek shelter.
  - The buildings should be reconstructed, not with the RCC, but with the wood and steel sheets.
- Constructional and phrasal triggers: President Pervaiz Musharraf has said that he will not rest unless the process of rehabilitation is completed.
- Random lexical omissions: It is not possible in the middle of winter to re-open the roads.

SIMT: Semantically Informed MT
[Slide repeats the modality-tagged parse tree of "Americans should know that we can not hand over Dr. Khan to them", produced by (1) Tsurgeon template matching and (2) percolation.]

Integration of the Modality Tagger with Syntax-Based SMT
- Joshua syntax-based SMT system (Callison-Burch)
- Tag modalities on the English side of the training data.
- Without modality tags: BLEU 26.4
- With modality tags: BLEU 26.7

Advantages of SIMT
- Good for translation between a less commonly taught language (LCTL) and a common language: modality can be analyzed on the common language and projected via word alignments to the LCTL.
- Depth of semantic analysis.
- Robustness of the statistical approach.

Summary
- Modality annotation scheme
- Modality lexicon
- Automatic modality tagger
- A method for integrating semantics into SMT: good for translation between LCTLs and common languages

Future Work
Improvements to the tagger:
- Add patterns for constructions without simple lexical triggers
- Word sense disambiguation (manage, attack, etc.)
- Semantic composition of multiple modalities and negation
- Tagging of holders
Applications of the tagger:
- Further experiments with SIMT
- Integration into a tagger for Committed Belief (factivity)

END