Opinions in Question Answering
Jan Wiebe (University of Pittsburgh), Claire Cardie (Cornell University), Ellen Riloff (University of Utah)


Overview
Techniques and tools to support multi-perspective question answering (MPQA)
Goals:
• Produce high-level summaries of opinions
• Incorporate rich information about opinions extracted from text

Overview
Opinion-oriented information extraction
Extract opinion frames for individual expressions
Combine to create opinion-oriented “scenario” templates
Opinion Summary Template

MPQA Corpus
Grew out of the 2002 ARDA NRRC Workshop on Multi-Perspective Question Answering
Detailed annotations of opinions
Freely available (thanks to David Day): nrrc.mitre.org/NRRC/publications.htm

Collaborations
Interactions with end-to-end system teams
Integrated corpus annotation
Pilot opinion evaluation

Outline
Recent activities
• Subjective sentence identifier
• Clause intensity identifier
• Extended annotation scheme Version 1
• Q&A corpus
• Nested opinions
• Opinion summaries
What’s next

Subjective Sentence Identifier
Input is unlabeled data
Evaluated on manual annotations of the MPQA corpus
Accuracy as good as supervised systems which classify all sentences

Subjective Sentence Identifier
Bootstraps from a known subjective vocabulary, labeling the sentences it can with confidence
Extraction pattern learner finds clues of subjectivity in that corpus
Incorporated into a statistical model trained on the automatically labeled data
Multiple classification strategies
• 76% accuracy with 54% baseline
• 80% subj. precision and 66% subj. recall
• 80% obj. precision and 51% obj. recall
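The first bootstrapping step (labeling only the sentences that can be labeled with confidence) might look roughly like the sketch below. This is an illustration under stated assumptions, not the system described on the slide: the seed vocabulary `STRONG_CLUES`, the threshold `min_subj`, the zero-clue rule for "objective", and the function name `label_confidently` are all hypothetical.

```python
import re

# Hypothetical seed vocabulary; the actual subjective lexicon is not given in the slides.
STRONG_CLUES = {"furious", "condemned", "criticized", "happy", "unreasonable"}

def label_confidently(sentence, min_subj=2):
    """Return 'subjective', 'objective', or None when no confident label applies.

    Assumed rule: at least `min_subj` clue words means subjective,
    zero clue words means objective, anything in between is left unlabeled.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    hits = sum(1 for tok in tokens if tok in STRONG_CLUES)
    if hits >= min_subj:
        return "subjective"
    if hits == 0:
        return "objective"
    return None  # left for the later, learned classifier

labels = [
    label_confidently("The report has been strongly criticized and condemned by many countries."),
    label_confidently("Where did Mugabe vote in the 2002 presidential election?"),
    label_confidently("People are happy because Chavez has fallen, she said."),
]
```

Sentences that come back as `None` are the ones a pattern learner and statistical model would have to handle after training on the confidently labeled portion.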

Clause-level intensity (strength) identification
Maximum intensity of the opinions in a clause
Neutral, low, medium, high
Evaluated on manual annotations of the MPQA corpus

Example
Opinionated sentence: I am furious that my landlord refused to return my security deposit until I sued them.
[Figure: tree diagram of the sentence's words, with parts labeled High Strength, Medium Strength, and Neutral]

Clause-level intensity (strength) identification
Classification and regression learners
Accuracy: how many clauses are assigned exactly the correct class?
Mean Squared Error: how close are the answers to the right ones?
Accuracy: classification > regression
• 23-79% over baseline
MSE: regression > classification
• 57-64% over baseline
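The two measures reward different behavior, which a small worked example can make concrete. The sketch below assumes the four intensity classes are mapped to the integers 0 to 3 (that mapping, and the toy predictions, are assumptions for illustration only).

```python
# Assumed ordinal encoding of the four intensity classes.
LEVELS = {"neutral": 0, "low": 1, "medium": 2, "high": 3}

def accuracy(gold, pred):
    """Fraction of clauses assigned exactly the correct class."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def mean_squared_error(gold, pred):
    """Average squared distance between predicted and gold intensity levels."""
    return sum((LEVELS[g] - LEVELS[p]) ** 2 for g, p in zip(gold, pred)) / len(gold)

gold = ["high", "medium", "neutral", "low"]
pred_a = ["high", "neutral", "neutral", "high"]   # exact on two clauses, far off on two
pred_b = ["medium", "low", "low", "medium"]       # never exact, always one level off

acc_a, mse_a = accuracy(gold, pred_a), mean_squared_error(gold, pred_a)  # 0.5, 2.0
acc_b, mse_b = accuracy(gold, pred_b), mean_squared_error(gold, pred_b)  # 0.0, 1.0
```

Here `pred_a` wins on accuracy while `pred_b` wins on MSE, mirroring how a classifier and a regressor can each look better under one of the two measures.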

Opinion Frames
direct subjective annotation
  Span: “strongly criticized and condemned”
  Source:
  Strength (intensity): high
  Attitudes: negative toward the report
  Target: report
The report has been strongly criticized and condemned by many countries.
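One way to picture an opinion frame is as a small record with the fields listed on the slide. The sketch below is hypothetical: the class name `OpinionFrame` and the field types are assumptions, and the source is left unset because the slide does not show its value.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OpinionFrame:
    """Hypothetical container for one direct subjective annotation."""
    span: str                       # text anchoring the opinion
    intensity: str                  # neutral, low, medium, or high
    source: Optional[str] = None    # not shown on the slide, so left unset
    attitudes: List[str] = field(default_factory=list)
    target: Optional[str] = None

frame = OpinionFrame(
    span="strongly criticized and condemned",
    intensity="high",
    attitudes=["negative"],
    target="report",
)
```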

Major Attitude Types
Positive
Negative
Arguing for one's world view
Intention

Negative and Positive Example
People are happy because Chavez has fallen, she said.
direct subjective annotation
  span: are happy
  source:
  attitude:
attitude annotation
  span: are happy because Chavez has fallen
  type: positive and negative
  positive target:
  negative target:
target annotation
  span: Chavez has fallen
target annotation
  span: Chavez

Arguing for World View Example
Putin remarked that events in Chechnia “could be interpreted only in the context of the struggle against international terrorism.”
direct subjective annotation
  span: remarked
  source:
  attitude:
attitude annotation
  span: could be interpreted only in the context of the struggle against international terrorism
  type: argue for world view
  target:
target annotation
  span: events in Chechnia

Characteristics
Sarcastic: "Great, keep on buying dollars so there'll be more and more poor people in the country," shouted one.
Speculative: Leaders probably held their breath…
Characteristics of the linguistic realization

Q&A Corpus
Includes 98 documents from the NRRC corpus, split into four topics:
• Kyoto Protocol
• 2002 elections in Zimbabwe
• U.S. annual human rights report
• 2002 coup in Venezuela

Q&A Corpus
Includes 30 questions
15 questions classified as fact
• What is the Kyoto Protocol about?
• What is the Kiko Network?
• Where did Mugabe vote in the 2002 presidential election?
15 questions classified as opinion
• How do European Union countries feel about the US opposition to the Kyoto Protocol?
• Are the Japanese unanimous in their opinion of Bush’s position on the Kyoto Protocol?
• What was the American and British reaction to the reelection of Mugabe?

Q&A Corpus
Answer annotations added by two annotators
• Minimal spans that constituted or contributed to an answer
Confidence
Partial?

Difficulties in Corpus Creation
Annotating answers
• Difficult to decide what constitutes an answer:
    Q: “Did most Venezuelans support the 2002 coup?”
    A: “Protesters…failed to gain the support of the army.” ???
• Not clear what sources to attribute to collective entities
    European Union: The EU Parliament? Great Britain? GB government? Tony Blair?
    The Japanese: The Japanese government? Emperor Akihito? Empress Michiko? The Kiko Network?

Q&A Corpus
Interannotator agreement
• 85% on average
• using Wiebe et al.’s agr(a||b) measure
• 78% and 93%, respectively, for each annotator
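A directional agreement measure of this kind can be sketched as follows. The sketch assumes agr(a||b) is the fraction of annotator a's spans that overlap at least one span from annotator b; that reading, the half-open character offsets, and the toy spans are assumptions, not details taken from the slides.

```python
def overlaps(x, y):
    """True when two half-open (start, end) character spans share any text."""
    return x[0] < y[1] and y[0] < x[1]

def agr(a_spans, b_spans):
    """Assumed definition: share of a's spans matched by some span of b."""
    if not a_spans:
        return 0.0
    matched = sum(1 for a in a_spans if any(overlaps(a, b) for b in b_spans))
    return matched / len(a_spans)

# Toy answer spans from two hypothetical annotators.
annotator_a = [(0, 5), (10, 15), (20, 25), (30, 35)]
annotator_b = [(3, 8), (12, 14), (40, 45)]

a_given_b = agr(annotator_a, annotator_b)   # 2 of a's 4 spans are matched
b_given_a = agr(annotator_b, annotator_a)   # 2 of b's 3 spans are matched
```

Because the measure is directional, the two values generally differ, which is consistent with the slide reporting one figure per annotator alongside an average.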

Evaluating MPQA Opinion Annotations
Answer probability: estimate P(opinion answer | opinion question) and P(fact answer | fact question)
• Low-level opinion information is a reliable predictor
    facts: 78%
    opinions: 93%
Answer rank
• Sentence-based retrieval
• Filter based on opinion annotations
• Examine rank of first sentence w/answer
• Filtering improves answer rank
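The answer-rank experiment can be illustrated with a toy ranking. In this sketch the dictionary keys `is_opinion` and `has_answer`, the function name `first_answer_rank`, and the four-sentence ranking are all hypothetical; the slides do not describe the retrieval system itself.

```python
def first_answer_rank(ranked, keep=lambda s: True):
    """1-based rank of the first answer-bearing sentence among those kept by the filter."""
    rank = 0
    for sentence in ranked:
        if not keep(sentence):
            continue  # filtered out, so it does not occupy a rank
        rank += 1
        if sentence["has_answer"]:
            return rank
    return None  # no answer sentence survived the filter

# Toy retrieval output for an opinion question, best-scoring sentence first.
ranked = [
    {"is_opinion": False, "has_answer": False},
    {"is_opinion": True,  "has_answer": False},
    {"is_opinion": False, "has_answer": False},
    {"is_opinion": True,  "has_answer": True},
]

unfiltered_rank = first_answer_rank(ranked)                               # 4
filtered_rank = first_answer_rank(ranked, keep=lambda s: s["is_opinion"])  # 2
```

Dropping sentences without opinion annotations moves the first answer sentence from rank 4 to rank 2 in this toy case, which is the kind of improvement the slide reports.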

Summary Representations of Opinions
Direct subjective annotation
  Source:
  Attitude:
Opinion Summary Template

Reporting in text
Clapp sums up the environmental movement’s reaction: “The polluters are unreasonable”
Charlie was angry at Alice’s claim that Bob was unhappy

Hierarchy of Perspective & Speech Expressions
Charlie was angry at Alice’s claim that Bob was unhappy
[Figure: hierarchy with nodes: implicit speech event, angry, claim, unhappy]
Clapp sums up the environmental movement’s reaction: “The polluters are unreasonable”
[Figure: hierarchy with nodes: implicit speech event, sums up, reaction]

Baseline 1: Only filter through writer
66% correct
[Figure: hierarchy over the nodes implicit, angry, claim, unhappy]
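A writer-only baseline can be sketched as attaching every expression directly to the writer's implicit speech event. The gold hierarchy below (angry under the implicit speech event, claim under angry, unhappy under claim) is an assumed reading of the example sentence, and per-link accuracy is an assumed scoring scheme, so the number produced on this one toy sentence is not comparable to the 66% reported on the slide.

```python
ROOT = "implicit speech event"  # the writer's top-level perspective

# Assumed gold hierarchy for "Charlie was angry at Alice's claim that Bob was unhappy":
# each expression maps to its parent expression.
gold_parent = {"angry": ROOT, "claim": "angry", "unhappy": "claim"}

def writer_baseline(expressions):
    """Attach every expression directly to the writer's implicit speech event."""
    return {expr: ROOT for expr in expressions}

def link_accuracy(gold, pred):
    """Fraction of expressions whose predicted parent equals the gold parent."""
    return sum(pred[expr] == parent for expr, parent in gold.items()) / len(gold)

baseline_pred = writer_baseline(gold_parent)
baseline_score = link_accuracy(gold_parent, baseline_pred)  # only "angry" is attached correctly
```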

Baseline 2: Dependency Tree
72% correct
[Figure: hierarchy over the nodes implicit, angry, claim, unhappy]

ML Approach
Features
• Parse-based
• Positional
• Lexical
• Genre-specific
IND decision trees (mml criterion)
78% correct

Summary Representations of Opinions
Direct subjective annotation
  Source:
  Attitude:
Opinion Summary Template

Opinion Summaries
Summaries based on manual annotations
• Single-document summaries
• Opinion annotations grouped by source and target
• Sources characterized by degree of subjectivity/objectivity
• Simple graph-based graphical interface
    Overview of entire graph
    Focus on portion of the graph
    Drill-down to opinion annotations (highlighted)
    Grouping/deleting of sources/targets
    JGRAPH package
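Grouping opinion annotations by source and target, the core of such a summary, can be sketched in a few lines. The annotation dictionaries, their keys, and the example values are hypothetical; the slides do not specify the data format or how the graph is built on top of it.

```python
from collections import defaultdict

def group_by_source_target(annotations):
    """Collect annotations under (source, target) pairs, one group per graph edge."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[(ann["source"], ann["target"])].append(ann)
    return dict(groups)

# Hypothetical annotations from a single document.
annotations = [
    {"source": "many countries", "target": "report", "attitude": "negative"},
    {"source": "many countries", "target": "report", "attitude": "negative"},
    {"source": "people", "target": "Chavez", "attitude": "negative"},
]

summary = group_by_source_target(annotations)
edge_counts = {pair: len(anns) for pair, anns in summary.items()}
```

Each key in `summary` corresponds to one source-to-target edge that a graph interface could display, with the grouped annotations available for drill-down.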

The next 6 months
Identify individual expressions of subjectivity
Perform manual annotations
Extract Sources
Opinion summaries with automatic annotations