LEDIR: An Unsupervised Algorithm for Learning Directionality of Inference Rules
Advisor: Hsin-Hsi Chen
Reporter: Chi-Hsin Yu
Date: 2007.12.11
From EMNLP & CoNLL 2007 (Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning)

Outline
- Introduction
- Related Work
- Learning Directionality of Inference Rules
- Experimental Setup
- Experimental Results
- Conclusion

Introduction (1)
- Inference rule: X eats Y ⇔ X likes Y
- Examples:
  - "I eat spicy food." ⇒ "I like spicy food." (YES)
  - "I like rollerblading (inline skating)." ⇒ "I eat rollerblading." (NO)
- Preferred form: X eats Y ⇒ X likes Y (the rule is asymmetric)
- Given a candidate rule pi ⇔ pj, there are four possible cases: (1) pi ⇒ pj only, (2) pj ⇒ pi only, (3) pi ⇔ pj (bidirectional), (4) no plausible inference
- Plausibility splits these into 2 sets: {1, 2, 3} vs. {4}
- Directionality distinguishes the 3 sets: {1}, {2}, {3}
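As a minimal sketch of this tagging scheme (the tag names below are illustrative, not taken from the paper), the four possible labels for a candidate rule pi ⇔ pj can be written as:

```python
from enum import Enum

class RuleTag(Enum):
    FORWARD = 1        # pi => pj only
    BACKWARD = 2       # pj => pi only
    BIDIRECTIONAL = 3  # pi <=> pj
    NOT_PLAUSIBLE = 4  # no valid inference in either direction

# Plausibility is the binary split {FORWARD, BACKWARD, BIDIRECTIONAL} vs. {NOT_PLAUSIBLE};
# directionality further distinguishes the first three tags.
```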

Introduction (2)
- Applications (for improving the performance of):
  - QA (Harabagiu and Hickl, 2006)
  - Multi-Document Summarization (Barzilay et al., 1999)
  - IR (Anick and Tipirneni, 1999)
- Proposed algorithm: LEDIR (LEarning Directionality of Inference Rules, pronounced "leader")
  - Filtering incorrect rules (case 4)
  - Identifying the directionality of the correct ones (case 1, 2, or 3)

Related Work
- Learning inference rules
  - Barzilay and McKeown (2001) for paraphrases; DIRT (Lin and Pantel, 2001) and TEASE (Szpektor et al., 2004) for inference rules
  - Limitation: low precision, and bidirectional rules only
- Learning directionality
  - Chklovski and Pantel (2004)
  - Zanzotto et al. (2006)
  - Torisawa (2006)
  - Geffet and Dagan (2005)

Learning Directionality of Inference Rules (1) – Formal Definition
- An instance has the form ⟨x, p, y⟩, where p is a binary semantic relation (a verb or another relation) and x, y are the entities it connects
- Given a candidate rule pi ⇔ pj, the task is to assign one of the four cases above
- Plausibility: 2 sets, {1, 2, 3} vs. {4}
- Directionality: 3 sets, {1}, {2}, {3}

Learning Directionality of Inference Rules (2) – Underlying Assumptions
- Distributional hypothesis (Harris, 1954)
  - Words that appear in the same contexts tend to have similar meanings
  - Used for modeling lexical semantics
- Directionality hypothesis
  - If two binary semantic relations tend to occur in similar contexts and the first one occurs in significantly more contexts than the second, then the second most likely implies the first and not vice versa
- Generality example: if "X eats Y" occurs 3,000 times and "X likes Y" occurs 8,000 times, the rule should be X eats Y ⇒ X likes Y

Learning Directionality of Inference Rules (3) – Underlying Assumptions (cont.)
- Concepts in semantic space are much richer for reasoning about inferences than simple surface words
- The context of a relation p of the form ⟨x, p, y⟩ is modeled using the semantic classes cx and cy of the words that can be instantiated for x and y respectively
- Context similarity of two relations is measured with the overlap coefficient: |X ∩ Y| / min(|X|, |Y|)
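A minimal sketch of the overlap coefficient applied to two context sets (the function and example sets are illustrative):

```python
def overlap_coefficient(contexts_a, contexts_b):
    """Overlap coefficient |A ∩ B| / min(|A|, |B|) of two context sets."""
    if not contexts_a or not contexts_b:
        return 0.0
    return len(contexts_a & contexts_b) / min(len(contexts_a), len(contexts_b))

# Example: contexts represented as (class_of_x, class_of_y) pairs
ctx_eat = {("person", "food"), ("animal", "food"), ("person", "substance")}
ctx_like = {("person", "food"), ("person", "activity"), ("animal", "food"), ("person", "person")}
print(overlap_coefficient(ctx_eat, ctx_like))  # 2 / 3 ≈ 0.67
```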

Learning Directionality of Inference Rules (4) – Selectional Preferences
- The relational selectional preferences (RSPs) of a binary relation p are the sets of semantic classes C(x) and C(y) of the words x and y observed with p
  - C(x) = { cx : x occurs in an instance of p, cx is the class of term x }
  - C(y) = { cy : y occurs in an instance of p, cy is the class of term y }
- Example: "x likes y", using semantic classes from WordNet
  - C(x) = {individual, social_group, …}
  - C(y) = {individual, food, activity, …}
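A minimal sketch of collecting RSPs from relation instances, assuming a simple word-to-class lookup (the `word_class` mapping and the instance triples are hypothetical):

```python
from collections import namedtuple

Instance = namedtuple("Instance", ["x", "p", "y"])

# Hypothetical word -> semantic class lookup (in the paper, classes come from CBC clusters or a WordNet cut)
word_class = {"john": "individual", "mary": "individual",
              "pizza": "food", "skiing": "activity"}

def relational_selectional_preferences(instances, relation):
    """Return (C(x), C(y)): the sets of classes seen in the x and y slots of `relation`."""
    cx, cy = set(), set()
    for inst in instances:
        if inst.p == relation:
            cx.add(word_class.get(inst.x, "unknown"))
            cy.add(word_class.get(inst.y, "unknown"))
    return cx, cy

instances = [Instance("john", "likes", "pizza"),
             Instance("mary", "likes", "skiing"),
             Instance("john", "eats", "pizza")]
print(relational_selectional_preferences(instances, "likes"))
# e.g. ({'individual'}, {'food', 'activity'})
```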

Learning Directionality of Inference Rules (5) – Inference Plausibility and Directionality
- The context similarity of two relations pi and pj is the overlap coefficient of their RSPs (the intersection of their class sets, normalized by the smaller set)

Learning Directionality of Inference Rules (6) – Inference Plausibility and Directionality (cont.)
- A rule is judged plausible if the context similarity of pi and pj exceeds a threshold α
- For plausible rules, the ratio of the sizes of the two relations' context sets is compared against a threshold β to decide the direction (or bidirectionality); see the sketch below
- α and β will be determined by experiments
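A rough sketch of the plausibility/directionality decision as it follows from the directionality hypothesis (the exact formulas are not reproduced on this slide; the threshold values and the ratio test below are assumptions):

```python
def classify_rule(ctx_i, ctx_j, alpha=0.4, beta=2.0):
    """Classify a candidate rule pi <=> pj from the context sets of pi and pj.

    ctx_i, ctx_j: sets of contexts (e.g. (class_x, class_y) pairs) for pi and pj.
    alpha: minimum context similarity for the rule to be plausible (assumed value).
    beta:  how many times larger one context set must be to imply a direction (assumed value).
    """
    sim = overlap_coefficient(ctx_i, ctx_j)   # defined in the earlier sketch
    if sim < alpha:
        return "not plausible"                 # case 4
    if len(ctx_j) / len(ctx_i) >= beta:
        return "pi => pj"                      # pj is much more general, case 1
    if len(ctx_i) / len(ctx_j) >= beta:
        return "pj => pi"                      # pi is much more general, case 2
    return "pi <=> pj"                         # comparable generality, case 3
```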

Learning Directionality of Inference Rules (7) – Two Models (JRM and IRM)
- Model 1: Joint Relational Model (JRM)
  - Counts the actual joint occurrences of a relation p with its arguments in the corpus
- Model 2: Independent Relational Model (IRM)
  - Builds the context of a relation from the x and y slots independently; the candidate contexts are the Cartesian product of the classes seen in each slot

Learning Directionality of Inference Rules (8) – Model 1: Joint Relational Model (JRM)
- Context similarity of two relations: the overlap coefficient of pi and pj over their jointly observed class-pair contexts
- Estimating the frequencies: (the estimation formula appears as an equation on the original slide and is not reproduced in this transcript)
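A rough sketch contrasting how JRM and IRM context sets could be built from instances, reusing the helpers from the earlier sketches (a simplification under my own assumptions; the paper additionally weights contexts by estimated frequencies, which is omitted here):

```python
from itertools import product

def jrm_contexts(instances, relation):
    """JRM: class-pair contexts from actually observed (x, y) pairs of `relation`."""
    return {(word_class.get(i.x, "unknown"), word_class.get(i.y, "unknown"))
            for i in instances if i.p == relation}

def irm_contexts(instances, relation):
    """IRM: treat the x and y slots independently and take the Cartesian product
    of the classes seen in each slot."""
    cx, cy = relational_selectional_preferences(instances, relation)
    return set(product(cx, cy))
```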

Experiment Setup (1)
- Inference rules
  - The candidate inference rules are taken from the DIRT resource (Lin and Pantel, 2001)
  - DIRT consists of 12 million rules extracted from 1 GB of newspaper text

Experiment Setup (2)
- Semantic classes must strike the right balance between abstraction and discrimination
- First set of semantic classes
  - Obtained by running the CBC clustering algorithm (Pantel and Lin, 2002) on the TREC-9 and TREC-2002 newswire collections, consisting of over 600 million words
  - Resulted in 1628 clusters, each representing a semantic class
- Second set of semantic classes
  - Obtained from WordNet 2.1 (Fellbaum, 1998), using only the WordNet noun hierarchy
  - A cut at depth four resulted in a set of 1287 semantic classes
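A minimal sketch of what a depth-four cut of the WordNet noun hierarchy could look like with NLTK (my own approximation, not the paper's tooling; the exact cut criterion is an assumption):

```python
from nltk.corpus import wordnet as wn  # requires the NLTK WordNet data to be downloaded

def noun_classes_at_depth(depth=4):
    """Collect noun synsets exactly `depth` hyponym steps below 'entity.n.01';
    each such synset stands in for one semantic class."""
    frontier = {wn.synset("entity.n.01")}
    for _ in range(depth):
        frontier = {h for s in frontier for h in s.hyponyms()}
    return frontier

classes = noun_classes_at_depth(4)
print(len(classes))
```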

Experiment Setup (3)
- Implementation
  - Parsed the 1999 AP newswire collection, consisting of 31 million words, with Minipar (Lin, 1993)
- Gold standard construction
  - Randomly sampled 160 inference rules of the form pi ⇔ pj from DIRT; removing 3 nominalization rules left 157 rules
  - Two annotators: 57 rules were used as a training set to train the annotators, and 100 rules formed the blind test set
  - Inter-annotator agreement: kappa = 0.63
  - The annotators then resolved their disagreements together to produce the final gold standard
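For reference, Cohen's kappa between two annotators can be computed as follows (the label lists are illustrative, not the actual annotation data):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical tags (1-4, as in the four cases) from two annotators over a handful of rules
annotator_a = [1, 3, 4, 2, 3, 4, 1, 3]
annotator_b = [1, 3, 4, 3, 3, 4, 2, 3]
print(cohen_kappa_score(annotator_a, annotator_b))
```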

Experiment Setup (4)
- Baselines
  - B-random: randomly assigns one of the four possible tags to each candidate inference rule
  - B-frequent: assigns the most frequently occurring tag in the gold standard to each candidate inference rule
  - B-DIRT: assumes each inference rule is bidirectional and assigns the bidirectional tag to each candidate inference rule
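A minimal sketch of the three baselines, using the illustrative `RuleTag` enum from above (`gold_tags` is a hypothetical list of gold-standard tags):

```python
import random
from collections import Counter

def b_random(rules):
    return [random.choice(list(RuleTag)) for _ in rules]

def b_frequent(rules, gold_tags):
    most_common = Counter(gold_tags).most_common(1)[0][0]
    return [most_common for _ in rules]

def b_dirt(rules):
    return [RuleTag.BIDIRECTIONAL for _ in rules]
```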

Experimental Results (1)
- Evaluation criterion: accuracy
- Parameter selection
  - Ran all the algorithms with different parameter combinations on the development set (the 57 DIRT rules), resulting in a total of 420 experiments
  - Used the accuracy statistic to pick the best parameter combination for each of the four systems
  - Then used these parameter values to obtain the corresponding percentage accuracies on the test set for each of the four systems
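A minimal sketch of that kind of parameter sweep over α and β on the development rules, reusing `classify_rule` from the earlier sketch (the grids, helper names, and the string encoding of gold tags are assumptions):

```python
def accuracy(predicted, gold):
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def sweep_parameters(dev_rules, dev_gold, alphas, betas):
    """dev_rules: list of (ctx_i, ctx_j) context-set pairs; dev_gold: gold tags
    encoded with the same strings that classify_rule returns."""
    best = (None, None, -1.0)
    for a in alphas:
        for b in betas:
            preds = [classify_rule(ci, cj, alpha=a, beta=b) for ci, cj in dev_rules]
            acc = accuracy(preds, dev_gold)
            if acc > best[2]:
                best = (a, b, acc)
    return best  # (best alpha, best beta, development-set accuracy)
```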

Experimental Results (2)

Experimental Results (3)
- (Result charts on the original slide; the visible annotations mark baselines of 66% and 48.48%)

Conclusion
- The problem of semantic inference
  - Fundamental to understanding natural language
  - An integral part of many natural language applications
- The Directionality Hypothesis
  - Can indeed be used to filter incorrect inference rules
  - This result is one step in the direction of solving the basic problem of semantic inference

Thanks!!