AQUAINT Phase II Six Month Workshop – October 2004: Fusing Rich Information Extracted from Multiple Media and Languages to Generate Contextualized, Complex Answers

Presentation transcript:

AQUAINT Phase II Six Month Workshop – October 2004
Fusing Rich Information Extracted from Multiple Media and Languages to Generate Contextualized, Complex Answers
Vasileios Hatzivassiloglou, Kathleen R. McKeown, Dan Jurafsky, Wayne H. Ward, James H. Martin
Columbia University, Stanford University, University of Colorado at Boulder, University of Texas at Dallas

Phase II Vision
Provide long, detailed, and complex answers
Handle question types other than factual questions
Develop a unified, extensible framework for treating such questions

Research Goals
Develop a new unified strategy for generating and piecing together complex answers
Shallow semantic analysis annotates answer fragments, allowing answer filtering, comparison, and composition
Extend analysis to multiple languages, media, and linked questions

Semantic Analysis
Multiple levels
Top level provides appropriate fillers for slots dependent on the question type
  – Events (who? when? where? completed? conditional?)
  – Opinions (target, holder, group, actual opinion predicate, time frame, polarity, strength)
  – Definitions
  – Biographies

Semantic Analysis Support
Bottom level annotates text with general features that can be used to determine the higher-level features
  – Semantic roles (from semantic parser)
  – Time expressions
  – Lexical polarity and semantic strength values

Maximum Coverage of Information
A new approach for formalizing the problem of information selection
Input:
  – Set of text units (e.g., sentences) that are potentially relevant to the answer
  – Set of concepts that are desirable in the answer (e.g., representations of related events)
  – Matrix showing which text unit covers which concepts
  – Information weights assigned to each concept
  – Costs assigned to each text unit
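
One way to write the inputs above as an optimization problem is sketched below. The symbols are assumptions introduced here for illustration and do not appear on the slide: $x_i$ marks whether text unit $i$ is selected, $y_j$ whether concept $j$ is covered, $a_{ij}$ is the coverage matrix, $w_j$ the concept weights, $c_i$ the unit costs, and $B$ a total cost budget (for example, an answer length limit).

```latex
\begin{aligned}
\max_{x,\,y}\quad & \sum_{j} w_j\, y_j \\
\text{s.t.}\quad  & \sum_{i} c_i\, x_i \le B, \\
                  & y_j \le \sum_{i} a_{ij}\, x_i \quad \text{for every concept } j, \\
                  & x_i \in \{0,1\},\quad y_j \in \{0,1\}.
\end{aligned}
```

Under this reading, a concept contributes its weight once no matter how many selected units mention it, so redundant units add cost without adding value.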

Example
[Figure: coverage matrix of text units T1, T2, T3, T4 against concepts C1, C2, C3, C4, …, illustrating that I(T1) = I(T2 & T3)]

Benefits of the approach
Formalization allows decoupling of the features (concepts) from the information selection algorithm
Problem translates to a well-known complexity theory problem (maximum set cover)
Proof that under this model, this part of Q&A is NP-hard

But there is a silver lining…
Efficient and effective greedy algorithm for Maximum Set Cover can be applied here
Solution guaranteed to cover at least (1-1/e) ≈ 63% of the information in the ideal solution
Evaluation over DUC data showed that this approach addresses redundancy effectively (see Filatova & Hatzivassiloglou, Coling 04)
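
A minimal sketch of a greedy selection loop of this kind follows. It is an illustration, not the authors' implementation: the function name `greedy_max_coverage`, the budget parameter, and the toy coverage data are all hypothetical. The sketch ranks units by newly covered weight per unit cost; the (1-1/e) bound quoted above is the classical result for greedy maximum coverage, and this cost-ratio variant is not claimed to preserve it as written.

```python
def greedy_max_coverage(covers, weights, costs, budget):
    """Greedily pick text units by newly covered concept weight per unit cost.

    covers:  dict mapping text unit -> set of concepts it covers
    weights: dict mapping concept -> information weight
    costs:   dict mapping text unit -> cost (e.g., length)
    budget:  total cost allowed (an assumption; the slides name no budget)
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = set(covers)
    while remaining:
        best, best_ratio = None, 0.0
        for unit in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + costs[unit] > budget:
                continue
            # Only concepts not yet covered count toward the gain.
            gain = sum(weights[c] for c in covers[unit] - covered)
            ratio = gain / costs[unit]
            if ratio > best_ratio:
                best, best_ratio = unit, ratio
        if best is None:  # nothing affordable adds new information
            break
        chosen.append(best)
        covered |= covers[best]
        spent += costs[best]
        remaining.remove(best)
    return chosen, covered


# Hypothetical toy data: T1 covers exactly what T2 and T3 cover together.
covers = {
    "T1": {"C1", "C2"},
    "T2": {"C1"},
    "T3": {"C2"},
    "T4": {"C3", "C4"},
}
weights = {"C1": 1.0, "C2": 1.0, "C3": 1.0, "C4": 1.0}
costs = {"T1": 1.0, "T2": 1.0, "T3": 1.0, "T4": 1.0}

chosen, covered = greedy_max_coverage(covers, weights, costs, budget=2.0)
print(chosen, sorted(covered))
```

With these toy inputs, once T1 is selected, T2 and T3 contribute no new weight, which is how the formulation penalizes redundancy.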

Definitional Questions
Approach: Combine data-driven and knowledge-based methods
The latter anticipate what “should” be in the definition (e.g., “X is a kind of Y”)
System improvements
  – Doubled predicate pattern coverage in 2004
  – Increased system robustness
  – Included rewriting of pronominal references

Learning Definitional Predicates
Before, we used hand-annotated examples
Now, we
  – bootstrap from a few known patterns (X caused Y) signaling a given relationship to
  – find many pairs for this relationship (attack/explosion, speeding/ticket)
  – use statistical data to find new such relationships without the patterns
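
The bootstrapping idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the regular expression standing in for the "X caused Y" seed pattern, the helper names, and the four toy sentences. The final step uses plain co-occurrence as a stand-in for the statistical data mentioned on the slide, whose actual form is not described.

```python
import re

# Hypothetical seed pattern for "X caused Y" (single-word X and Y only).
SEED_PATTERN = re.compile(r"(?:the )?(\w+) caused (?:the |a )?(\w+)", re.IGNORECASE)


def harvest_pairs(sentences):
    """Collect (cause, effect) word pairs matched by the seed pattern."""
    pairs = set()
    for sentence in sentences:
        for x, y in SEED_PATTERN.findall(sentence):
            pairs.add((x.lower(), y.lower()))
    return pairs


def find_unpatterned_mentions(sentences, pairs):
    """Find sentences that mention a known pair without using the seed pattern."""
    hits = []
    for sentence in sentences:
        if SEED_PATTERN.search(sentence):
            continue
        words = set(re.findall(r"\w+", sentence.lower()))
        for x, y in sorted(pairs):
            if x in words and y in words:
                hits.append((x, y, sentence))
    return hits


# Toy corpus, invented for this example.
corpus = [
    "The attack caused the explosion.",
    "Speeding caused a ticket.",
    "After the attack, an explosion shook the market.",
    "He got a ticket for speeding.",
]

pairs = harvest_pairs(corpus)
hits = find_unpatterned_mentions(corpus, pairs)
for x, y, sentence in hits:
    print(f"{x}/{y}: {sentence}")
```

Sentences found in the second step are candidates for learning new ways of expressing the same relationship.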

Extracting Definitions
First place in “question-based” DUC 2004 definitions among 22 teams
Who is Sonia Gandhi?
Congress President Sonia Gandhi, who married into what was once India’s most powerful political family, is the first non-Indian since independence 50 years ago to lead the Congress. After Prime Minister Rajiv Gandhi was assassinated in 1991, Gandhi was persuaded by the Congress to succeed her husband to continue leading the party as the chief, but she refused. The BJP had shrugged off the influence of the 51-year-old Sonia Gandhi when she stepped into politics early this year, dismissing her as a “foreigner.” Sonia Gandhi is now an Indian citizen. Gandhi, who is 51, met her husband when she was an 18-year old student at Cambridge in London, the first time she was away from her native Italy.

New Work in Opinions
Localize opinion to a specific predicate; add time and opinion holder attributes
Use WordNet hypernym/hyponym relationships to propagate positive/negative polarity values at the word level
Calculate measure of semantic strength
Participated in recent opinion pilot
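
As a rough illustration of polarity propagation over lexical links, the sketch below spreads seed polarities breadth-first through a small hand-written graph. The graph, the seed words, and the function name are hypothetical; the links are not taken from WordNet, and the propagation rule the project used is not specified on the slide.

```python
from collections import deque


def propagate_polarity(seeds, neighbors):
    """Spread seed polarities (+1 / -1) breadth-first along lexical links.

    A word keeps the first polarity that reaches it; conflicts are not resolved.
    """
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for other in neighbors.get(word, ()):
            if other not in polarity:
                polarity[other] = polarity[word]
                queue.append(other)
    return polarity


# Toy stand-in for hypernym/hyponym links (invented, not actual WordNet data).
neighbors = {
    "praise": ["compliment", "acclaim"],
    "compliment": ["praise", "flattery"],
    "criticism": ["rebuke"],
    "rebuke": ["criticism", "reprimand"],
}
seeds = {"praise": 1, "criticism": -1}

polarity = propagate_polarity(seeds, neighbors)
print(polarity)
```

A fuller version would also need a rule for words reachable from both positive and negative seeds, which this sketch ignores.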

New Work in Events
Tested event model (participants + connecting verb) as a possible set of information concepts
Significant improvement over a word-based approach (tf*idf)
Use clusters of related events to learn automatically which relationships are random and which are typical of an event type

Fusing Rich Information Extracted from Multiple Media and Languages to Generate Contextualized, Complex Answers
Project Status
Wayne Ward, James H. Martin, Kadri Hacioglu, Sameer Pradhan, Steven Bethard, Ying Chen, Benjamin Douglas (University of Colorado)
Dan Jurafsky (Stanford University)

Initial Focus
Semantic Role Structure for QA
  – Approaches complementary to Columbia
Specific Work On
  – Opinions
  – Time Expressions
  – Events
Multi-Lingual Work
  – English, Chinese, Arabic tools

Thematic Parse Accuracy

PropBank Data
           ID           Class   Combined
Gold       96 (97,96)   93      91 (91,90)
Charniak   87 (92,82)   92      81 (86,76)

TREC Data
           ID           Class   Combined
Charniak   73 (76,71)   84      63 (65,61)

Alternate Algorithms
Dependency tree based
  – Potentially more robust because of simpler path structures
  – Different “view” from Minipar, based on rules, not trained on TreeBank
Chunking
  – SVM chunks syntactic base phrases
  – Second SVM classifies chunks with semantic roles
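
Chunk-based taggers are commonly read out through per-token begin/inside/outside (IOB) labels. The slide does not say which encoding was used, so the IOB scheme, the function name, and the tag sequence below are assumptions for illustration; the sentence is borrowed from the thematic role tagging slide in this deck.

```python
def iob_to_chunks(tokens, tags):
    """Group tokens into labeled chunks from IOB tags (B-X, I-X, O)."""
    chunks, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            if current:
                chunks.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # "O": token is outside any chunk
            if current:
                chunks.append((label, " ".join(current)))
            current, label = [], None
    if current:
        chunks.append((label, " ".join(current)))
    return chunks


tokens = "President William McKinley was shot by anarchist Leon Czolgosz".split()
# Hypothetical output of a role-labeling chunker for this sentence.
tags = [
    "B-PATIENT", "I-PATIENT", "I-PATIENT", "O", "B-PREDICATE",
    "B-AGENT", "I-AGENT", "I-AGENT", "I-AGENT",
]

chunks = iob_to_chunks(tokens, tags)
print(chunks)
```

In a two-stage setup like the one described, the first classifier would produce the chunk boundaries and the second would assign the role labels; the decoding step is the same either way.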

Semantic Parsing in Chinese
Syntactic parser
  – SVM POS tagger
  – Retrained Collins parser
  – Chinese Treebank 2.0
  – Performance: P/R = 78.9/76.4
Semantic parser
  – PropBank Tags
  – Features: Syntactic Path, Target, Phrasal Category
  – Data: 1023 sentences as training set, 113 sentences as test set
  – Performance: P/R = 81.6/67.1
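
For readers who prefer a single figure, precision and recall are often combined into the balanced F-measure, F1 = 2PR / (P + R). The slide reports only P and R; the F1 values computed below are derived here for convenience and are not results reported by the authors.

```python
def f1(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


syntactic_f1 = f1(78.9, 76.4)  # syntactic parser P/R from the slide
semantic_f1 = f1(81.6, 67.1)   # semantic parser P/R from the slide
print(round(syntactic_f1, 1), round(semantic_f1, 1))
```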

Opinion/Opinion_Holder
Joint work with Columbia
Opinion ID as supervised Machine Learning
Answer “How does X feel about Y”
Propositional opinions (prop arg of verb)
Same SVM framework as general semantic tagger
Annotated FrameNet and PropBank sentences
If [OH she] hadn’t known [O that he liked nothing about her] she might have mistaken that note in his voice for admiration

Opinion/Opinion_Holder
Two different SVM architectures for Opinion
  – Single classifier walk constituent tree CxC
  – 2 stage: find propositions then classify op/non-op PxP
Opinion and Opinion_Holder

Time Expressions
Recognize time expressions in English and Chinese
SVM chunking and tagging problem
Language independent representation
Participated in TERN evaluation
That’s 30 percent more than [the same period [a year ago.]]


Event Detection
Train and test on TimeBank corpus
Determine phrases describing events
Chunk EVENT expressions in TimeBank
Label with attribute
  – REPORTING, PERCEPTION, ASPECTUAL, I_ACTION, I_STATE, STATE, OCCURRENCE

Arabic Work
SVM based NLP tools for Arabic
Tokenizer
Part-Of-Speech tagger
Syntactic base phrase chunker
Trained on Arabic TreeBank


Next 18 months
Complete opinion work
Much more focus on events
Processing audio documents
  – Produce word lattice with ASR
  – Use chunking tagger to parse word lattice
Dialog
  – Decomposition
  – Clarification
  – Follow-up

Thematic Role Tagging
Assigning semantic labels to sentence elements.
Elements are arguments of some predicate or participants in some event.
[DATE In 1901] [PATIENT President William McKinley] was [PREDICATE shot] [AGENT by anarchist Leon Czolgosz] [LOCATION at the Pan-American Exposition]

Use of thematic tagging in QA
Generating novel answers involving
  – Opinions (believe, confirm, deny, negate)
  – Events (activities with a starting and ending point involving fixed participants)
  – Causal questions
Query: What effect does a prism have on light?
Thematic Tagging: [RESULT What effect] does [CAUSE a prism] have on [THEME light]?
Now search for a RESULT that has ‘prism’ as a CAUSE.
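
The search step can be sketched as a lookup over role-tagged sentences. This is a simplified illustration rather than the project's retrieval method: the bracket parser, the function names, and the two candidate sentences are hypothetical, and matching is a plain substring test on the CAUSE span.

```python
import re

# Matches spans written as [ROLE some text], as in the slide's notation.
ROLE_SPAN = re.compile(r"\[\s*([A-Z_]+)\s+([^\[\]]+)\]")


def parse_roles(tagged):
    """Map each role label to its text (a repeated role keeps the last span)."""
    return {role: text.strip() for role, text in ROLE_SPAN.findall(tagged)}


def find_result_for_cause(cause_word, tagged_sentences):
    """Return RESULT spans from sentences whose CAUSE mentions cause_word."""
    answers = []
    for sentence in tagged_sentences:
        roles = parse_roles(sentence)
        if cause_word in roles.get("CAUSE", "").lower() and "RESULT" in roles:
            answers.append(roles["RESULT"])
    return answers


query = "[RESULT What effect] does [CAUSE a prism] have on [THEME light]?"
query_roles = parse_roles(query)

# Hypothetical role-tagged candidate sentences, invented for this example.
candidates = [
    "[CAUSE A prism] [PREDICATE splits] [THEME white light] [RESULT into a spectrum of colors]",
    "[AGENT Newton] [PREDICATE studied] [THEME light]",
]

answers = find_result_for_cause("prism", candidates)
print(query_roles)
print(answers)
```

The second candidate is skipped because it has no CAUSE span, even though it shares the word "light" with the query, which is the advantage role-based matching has over keyword overlap.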