Automatic Hedge Detection

Automatic Hedge Detection Morgan Ulinski and Julia Hirschberg May 8, 2017

What is Hedging?
Hedge: a word or phrase that adds ambiguity or uncertainty; it signals the speaker's lack of commitment to what they are saying.
E.g., "I think John will arrive tomorrow."
E.g., "John may arrive tomorrow."
E.g., "John will arrive tomorrow sort of early."

Hedge Types
Relational: words that signal uncertainty in the relation between the speaker and the utterance. E.g., think, may, probably, in my opinion.
Propositional: words that add uncertainty to the propositional content of the utterance (degree of quantity, frequency). E.g., some, frequently, kind of, sort of, among others.
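
As a minimal sketch, the two hedge types above could be represented as a typed lexicon. The word lists and function name here are illustrative, drawn only from the examples on this slide, not the authors' full resource:

```python
# Illustrative typed hedge lexicon (examples from the slide, not the full list).
RELATIONAL = {"think", "may", "probably", "in my opinion"}
PROPOSITIONAL = {"some", "frequently", "kind of", "sort of"}

def hedge_type(phrase):
    """Return 'relational', 'propositional', or None for a candidate cue."""
    phrase = phrase.lower()
    if phrase in RELATIONAL:
        return "relational"
    if phrase in PROPOSITIONAL:
        return "propositional"
    return None
```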

Detection of Non-Committed vs. Committed Belief
"Aaron may know/NCB that Bill had/CB an accident, and Chris told/CB me Doris knows/ROB. I hope/CB Bill gets/NA better soon!"
Detect the hedge word in the sentence:
"Aaron <hRel>may</hRel> know that Bill had an accident, and Chris told me Doris knows. I hope Bill gets better soon!"
Goal: improve the CB tagger by adding hedge features.

Hedge Features
Word features: HedgeToken, HedgeLemma, HedgeType (prop/rel)
Dependency features: HedgeToken{Child,Parent,DepAncestor,Sibling}, HedgeLemma{Child,Parent,DepAncestor,Sibling}, HedgeType{Child,Parent,DepAncestor,Sibling}
Sentence features: SentenceContainsHedge
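
A sketch of how the word- and sentence-level features might be assembled for one token; the representation is hypothetical (the slides do not specify one), and the dependency features would additionally require a parse:

```python
def hedge_features(sentence_tokens, hedge_index, lemma, hedge_type):
    """Build word- and sentence-level hedge features.

    sentence_tokens: list of surface tokens
    hedge_index: position of the detected hedge, or None if no hedge found
    """
    feats = {"SentenceContainsHedge": hedge_index is not None}
    if hedge_index is not None:
        feats["HedgeToken"] = sentence_tokens[hedge_index]
        feats["HedgeLemma"] = lemma
        feats["HedgeType"] = hedge_type  # 'prop' or 'rel'
    return feats
```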

Hedge Detection
Dictionary-based: simple lookup in a list of hedge words.
Rule-based: use rules based on context, part of speech, and dependency structure.
"I assume his train was late." (hedge) vs. "When will the President assume office?" (non-hedge)
Rule: if assume has a dependent with relation ccomp, mark it as a hedge; otherwise, non-hedge.
"Her work is pretty good." (hedge) vs. "She has a pretty face." (non-hedge)
Rule: if the part of speech of pretty is adverb, mark it as a hedge; otherwise, non-hedge.
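
The two example rules can be sketched over a toy dependency representation; a real system would use a parser, so the hand-built token dicts here are purely illustrative:

```python
def is_hedge(token):
    """Apply the two example rules. `token` is a dict with 'lemma', 'pos',
    and 'deps', a list of (relation, child_lemma) pairs from a parse."""
    if token["lemma"] == "assume":
        # Hedge only when it takes a clausal complement (ccomp),
        # as in "I assume [his train was late]".
        return any(rel == "ccomp" for rel, _ in token["deps"])
    if token["lemma"] == "pretty":
        # Hedge only in adverbial use ("pretty good"), not adjectival.
        return token["pos"] == "ADV"
    return False
```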

Baseline Belief Results (without hedge detection)*

Tag (count)       Precision  Recall  F-measure
ROB (256)         28.02      19.92   23.29
NCB (193)         44.93      16.06   23.66
NA (2762)         77.49      56.34   65.24
CB (4299)         69.80      74.78   72.21
Overall (49643)   70.69      64.62   67.52

CB: committed belief; NA: not applicable; NCB: non-committed belief; ROB: reported belief.
*Experiments used 5-fold cross-validation on the 2014 DEFT Committed Belief Corpus (Release No. LDC2014E55).
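
The F-measures in these results follow from precision and recall as the harmonic mean, F = 2PR/(P+R); a quick sanity check (small discrepancies come from the rounded P/R values shown in the table):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Baseline CB row: P=69.80, R=74.78 gives F close to the reported 72.21.
```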

Belief Results: Dictionary lookup

Tag (count)       Precision  Recall  F-measure
ROB (256)         30.22      21.48   25.11
NCB (193)         49.28      17.62   25.95
NA (2762)         77.69      56.73   65.58
CB (4299)         70.27      75.04   72.58
Overall (49643)   71.18      65.01   67.95

Belief Results: Rule-based hedge detector

Tag (count)       Precision  Recall  F-measure
ROB (256)         31.63      24.22   27.43
NCB (193)         50.60      21.76   30.43
NA (2762)         77.89      56.52   65.51
CB (4299)         70.58      74.95   72.70
Overall (49643)   71.36      65.07   68.07

Improvements are primarily in ROB and NCB, though these represent small portions of the overall corpus.

Process
Data Acquisition: get word sense annotations through crowd-sourcing.
Analysis: analyze the accuracy of Turker judgments.
Classification: train a classifier to disambiguate hedge from non-hedge uses.
Evaluation: compare to current lexicon-based hedge detectors.

Obtaining Word Sense Annotations
For each hedge word we currently have (80 words, 40 phrases), get hedge and non-hedge definitions from WordNet.
For each sentence containing hedge word(s), use the definitions to formulate a task for AMT:
"The book takes [about] 400 pages to say what Veblen said in one sentence."
Does the [about] in this sentence mean: almost, approximately, near; on the verge of; regarding; other?
Notes: now 80 hedges and 40 multi-word phrases; previously 133 with all the tenses, but AMT data was gathered for only the 47 that had appropriate alternate definitions. Example senses of about: "it was about 2 o'clock", "he was about the lake", "he was talking about John". Task layout: 10 questions per HIT, randomized order, 1 gold check question.
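
The task layout in the notes (10 questions per HIT, randomized placement, one gold check question) could be assembled roughly as follows; the function and field names are hypothetical:

```python
import random

def build_hits(sentences, gold_question, per_hit=10, seed=0):
    """Group annotation questions into HITs of `per_hit` items:
    per_hit - 1 real questions plus one gold check question
    inserted at a random position in each HIT."""
    rng = random.Random(seed)
    real_per_hit = per_hit - 1
    hits = []
    for i in range(0, len(sentences), real_per_hit):
        batch = sentences[i:i + real_per_hit]  # slice copies, safe to mutate
        batch.insert(rng.randrange(len(batch) + 1), gold_question)
        hits.append(batch)
    return hits
```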

New Annotated Data

Potential hedges: 20,683 total; 9,311 hedge and 11,372 non-hedge (45.02% hedge).

The most frequent potential hedges are about (2,124), think (1,724), like (1,507), know (1,399), and could (915); the tail includes rare phrases such as "and so forth", "to a certain extent", "in some ways", and "et cetera", with as few as 2 and 1 occurrences.

Hedge use varies widely by word: "a little" = 83.2% (cf. "a little girl", "He's a little much for me to handle."); about = 12.3% (cf. "I was talking about Steve.", "He's about 5 inches tall.").

Table 4. Analysis of AMT Annotated Data. (Forum posts from the 2014 DEFT Committed Belief Corpora, release nos. LDC2014E55, LDC2014E106, LDC2014E125.)
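
The reported hedge incidence can be reproduced directly from the raw counts:

```python
# Counts from the annotated data: 9,311 hedge vs. 11,372 non-hedge uses.
hedge, non_hedge = 9311, 11372
total = hedge + non_hedge          # 20,683 potential hedges
incidence = 100 * hedge / total    # about 45.02% hedge
```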

Future Work
Methods: SVMs, neural nets.
Features: part of speech, position of the hedge, lemmatization, LIWC features, bigrams/trigrams, likelihood of hedge vs. non-hedge use for the word.
Integrate the new (disambiguating) hedge detector into the BeST system.
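
One of the proposed features, the likelihood of hedge vs. non-hedge use for a word, could be estimated from per-word annotation counts; a sketch with illustrative toy counts (not the real corpus statistics):

```python
def hedge_prior(counts):
    """counts: word -> (hedge_count, non_hedge_count).
    Returns word -> estimated P(hedge | word)."""
    return {w: h / (h + n) for w, (h, n) in counts.items() if h + n > 0}
```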

Thank you!