ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI), Roberto Navigli. A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch.

Presentation transcript:

A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch. Roberto Navigli, Paola Velardi and Stefano Faralli

Motivations
We present a graph-based approach to learning a lexical taxonomy automatically, starting from a domain corpus and the Web. Unlike other approaches, we learn both concepts and relations entirely from scratch, in 3 steps:
1) term extraction
2) definition and hypernym extraction
3) graph pruning

Taxonomy Learning Workflow (diagram)
Stages: terminology extraction; definition & hypernym extraction; domain filtering; graph pruning.
Inputs: domain corpus; Web glossaries & documents; upper terms.
Intermediate results: domain terms; hypernym graph.
Output: induced taxonomy.

Terminology Extraction
From the domain corpus we extract domain terms, e.g.: flow network, hash function, information processing, maximum likelihood, mesh generation, pattern recognition.

Definition & Hypernym Extraction + Domain Filtering
Definition extraction (WCL) for the domain term "flow network", over the domain corpus and Web glossaries & documents, returns candidate definitions:
"In graph theory, a flow network is a directed graph."
"Global Cash Flow Network is a business opportunity to make money online."
"A flow network is a network with two distinguished vertices."
Domain filtering labels each candidate as domain or non-domain.

Definition & Hypernym Extraction + Domain Filtering
Hypernym extraction on the retained definitions of "flow network":
"In graph theory, a flow network is a directed graph." gives the hypernym "directed graph".
"A flow network is a network with two distinguished vertices." gives the hypernym "network".

Definition & Hypernym Extraction + Domain Filtering
Terms from the previous iteration (here "directed graph") are processed in the same way:
"A directed graph is a data structure..." gives the hypernym "data structure".
"A directed graph is a graph where..." gives the hypernym "graph".
Nodes of the hypernym graph so far: flow network, directed graph, network, graph, data structure.
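The iteration shown on the last three slides can be sketched as follows. This is only an illustration under stated assumptions: `extract_hypernyms` is a hypothetical stand-in for definition extraction, domain filtering and hypernym extraction, backed here by a toy table holding the slide examples, and edges are stored as (hypernym, term), a direction the slides do not fix.

```python
# Toy table standing in for definition + hypernym extraction (assumption);
# the entries are the examples shown on the slides.
TOY_HYPERNYMS = {
    "flow network": ["directed graph", "network"],
    "directed graph": ["data structure", "graph"],
}


def extract_hypernyms(term):
    """Hypothetical extractor: look the term up in the toy table."""
    return TOY_HYPERNYMS.get(term, [])


def grow_hypernym_graph(initial_terms, max_iterations=5):
    """Grow a set of (hypernym, term) edges, feeding new hypernyms back in."""
    edges = set()
    seen = set(initial_terms)
    frontier = list(initial_terms)
    for _ in range(max_iterations):
        next_frontier = []
        for term in frontier:
            for hypernym in extract_hypernyms(term):
                edges.add((hypernym, term))
                if hypernym not in seen:
                    seen.add(hypernym)
                    next_frontier.append(hypernym)
        if not next_frontier:  # nothing new to define: stop early
            break
        frontier = next_frontier
    return edges


graph_edges = grow_hypernym_graph(["flow network"])
```

Starting from "flow network", the first pass adds "directed graph" and "network"; the second pass defines those new terms and adds "data structure" and "graph".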

Hypernym Extraction Algorithm (1)
Large training set with many uncommon patterns, e.g. “X is a ADJ term that refers to a kind of Y”.
Annotated with 4 fields: definiendum (D), definitor (V) containing the verbal pattern, definiens (H) containing the hypernym, and the rest of the sentence (R).
– An [acid] (often represented by the generic formula HA) / is traditionally considered / any chemical compound / that, when dissolved in water, gives a solution with a hydrogen ion activity greater than in pure water
The algorithm builds a set of word lattices from the training set. Independent lattices are created for each of the 3 basic fields.

Hypernym Extraction Algorithm (2)
Lattice learning consists of three steps:
1. Each sentence in the training set is pre-processed and each field is generalized to a star pattern.
Example: “[In arts, a chiaroscuro] D [is] V [a monochrome picture] H.” gives D=“In *, a ”, V=“is”, H=“a * ”
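A minimal sketch of star-pattern generalization, under assumptions the slide does not state: words outside a small set of frequent words become "*", consecutive stars collapse into one, and the defined term is replaced by a placeholder. The word list `FREQUENT_WORDS`, the `<TARGET>` placeholder and the function name are hypothetical.

```python
import re

# Hypothetical list of frequent words that are kept verbatim (assumption).
FREQUENT_WORDS = {"in", "a", "an", "the", "is", "of", ","}
TARGET = "<TARGET>"  # hypothetical placeholder for the defined term


def star_pattern(field_text, target):
    """Generalize one field (D, V or H) of a definition to a star pattern."""
    text = field_text.replace(target, TARGET)
    # Tokens: the placeholder, runs of word characters, or single punctuation marks.
    tokens = re.findall(r"<TARGET>|\w+|[^\w\s]", text)
    pattern = []
    for token in tokens:
        if token == TARGET or token.lower() in FREQUENT_WORDS:
            symbol = token
        else:
            symbol = "*"
        # Collapse runs of infrequent words into a single star.
        if symbol == "*" and pattern and pattern[-1] == "*":
            continue
        pattern.append(symbol)
    return " ".join(pattern)


d_pattern = star_pattern("In arts, a chiaroscuro", "chiaroscuro")
v_pattern = star_pattern("is", "chiaroscuro")
h_pattern = star_pattern("a monochrome picture", "chiaroscuro")
```

With this word list the three fields of the chiaroscuro sentence generalize to "In * , a <TARGET>", "is" and "a *", which matches the slide's example up to tokenization and the placeholder.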

Hypernym Extraction Algorithm (3)
2. Clustering: for each field, the training sentences are then clustered according to the star patterns they belong to.
In arts, a chiaroscuro is a monochrome picture.
In mathematics, a graph is a data structure that consists of...
In computer science, a pixel is a dot that is part of a computer image.
Star patterns: D: In *, a V: is H: a *
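The clustering step amounts to grouping sentences that share a star pattern. The sketch below assumes the D-field patterns have already been computed in step 1; the pattern strings, including the `<TARGET>` placeholder, are illustrative, and the fourth sentence is taken from an earlier slide to show a second cluster.

```python
from collections import defaultdict

# (sentence, star pattern of its D field); the patterns are assumed to be
# precomputed, and their exact form is hypothetical.
annotated = [
    ("In arts, a chiaroscuro is a monochrome picture.", "In * , a <TARGET>"),
    ("In mathematics, a graph is a data structure that consists of...", "In * , a <TARGET>"),
    ("In computer science, a pixel is a dot that is part of a computer image.", "In * , a <TARGET>"),
    ("A flow network is a network with two distinguished vertices.", "A <TARGET>"),
]


def cluster_by_pattern(pairs):
    """Group sentences whose field generalizes to the same star pattern."""
    clusters = defaultdict(list)
    for sentence, pattern in pairs:
        clusters[pattern].append(sentence)
    return dict(clusters)


clusters = cluster_by_pattern(annotated)
```

The three "In ..., a ..." sentences fall into one cluster and the "A flow network ..." sentence into another; in the method described on the slides, each such cluster is then turned into one lattice.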

Hypernym Extraction Algorithm (4)
3. Word-Class Lattice construction: for each sentence cluster, a WCL is created by means of a greedy alignment algorithm.

Performance in definition extraction
Outperforms existing methods for definition extraction (corpora: Wikipedia, UKWac).

Precision in hypernym extraction (corpora: Wikipedia, UKWac)
Pattern-based methods achieve much lower recall: 62 vs. 383 hypernyms extracted from UKWac.

The iterative growth of the hypernym graph

Graph Pruning
Given the hypernym graph:
1) We disconnect false roots and false leaves.
2) We weight edges and nodes with a novel weighting algorithm.
3) We apply the Chu-Liu/Edmonds algorithm to obtain an optimal branching.
As a result we obtain a tree-like taxonomy.
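Step 3 can be illustrated with a compact recursive version of the Chu-Liu/Edmonds algorithm. This is a simplified sketch, not the authors' implementation: it computes a maximum-weight spanning arborescence from a single given root and assumes every node is reachable from that root, whereas an optimal branching may have several roots. The edge weights are invented for illustration (the slides do not give the weighting algorithm), and edges are assumed to point from hypernym to hyponym.

```python
def find_cycle(parent):
    """Return the nodes of a cycle in a child -> parent map, or None."""
    for start in parent:
        path = []
        node = start
        while node in parent and node not in path:
            path.append(node)
            node = parent[node]
        if node in path:
            return path[path.index(node):]
    return None


def max_arborescence(edges, root):
    """edges: {(hypernym, hyponym): weight}. Returns {node: chosen parent}."""
    # 1. Greedily keep the heaviest incoming edge of every non-root node.
    best = {}
    for (u, v), w in edges.items():
        if v == root or u == v:
            continue
        if v not in best or w > edges[(best[v], v)]:
            best[v] = u
    cycle = find_cycle(best)
    if cycle is None:
        return best
    # 2. Contract the cycle into one node and re-score edges entering it.
    members = set(cycle)
    merged = ("cycle", frozenset(members))
    contracted, origin = {}, {}
    for (u, v), w in edges.items():
        if u in members and v in members:
            continue
        if v in members:
            key, score = (u, merged), w - edges[(best[v], v)]
        elif u in members:
            key, score = (merged, v), w
        else:
            key, score = (u, v), w
        if key not in contracted or score > contracted[key]:
            contracted[key], origin[key] = score, (u, v)
    # 3. Solve the smaller problem, then expand the cycle again.
    parent = {}
    for v, u in max_arborescence(contracted, root).items():
        orig_u, orig_v = origin[(u, v)]
        parent[orig_v] = orig_u
    for v in members:
        parent.setdefault(v, best[v])  # keep cycle edges except the broken one
    return parent


# Toy noisy hypernym graph; all weights are hypothetical.
noisy_edges = {
    ("data structure", "graph"): 0.9,
    ("graph", "directed graph"): 0.8,
    ("directed graph", "flow network"): 0.7,
    ("network", "flow network"): 0.4,
    ("graph", "network"): 0.5,
    ("flow network", "graph"): 0.95,  # noisy edge that closes a cycle
}
taxonomy = max_arborescence(noisy_edges, root="data structure")
```

In the toy graph the greedy choice first selects the noisy edge from "flow network" to "graph" and closes a cycle; contracting the cycle and re-scoring the entering edges leads the algorithm to drop that edge, so every node ends up with exactly one parent.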

From the Noisy Hypernym Graph...

...to a Tree-like Taxonomy

Application to ACL Taxonomy
ACL Anthology from 1979 to 2010 (4176 papers).
29 upper terms from WordNet’s abstraction.
10,000 terms extracted, first 2000 inspected, 1006 selected (eliminated e.g.: word pair, input sentence, human judgement).
5 iterations, 1329 definitions, 1031 nodes, 1274 edges.
After pruning: 936 nodes, 935 edges.
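The figures on this slide can be cross-checked with a few lines of arithmetic. The variable names are ours; the numbers are those reported above.

```python
# Figures reported on the slide.
inspected_terms, selected_terms = 2000, 1006
nodes_before, edges_before = 1031, 1274
nodes_after, edges_after = 936, 935

selection_rate = selected_terms / inspected_terms  # share of inspected terms kept
nodes_removed = nodes_before - nodes_after
edges_removed = edges_before - edges_after
# A tree over n nodes has exactly n - 1 edges.
has_tree_edge_count = edges_after == nodes_after - 1

print(f"selected {selection_rate:.1%} of the inspected terms")
print(f"pruning removed {nodes_removed} nodes and {edges_removed} edges")
print(f"edge count matches a single tree: {has_tree_edge_count}")
```

About half of the inspected terms were kept, and pruning removed 95 nodes and 339 edges. The pruned graph has exactly one edge fewer than it has nodes, which is what a single tree requires (a forest of k trees would have n - k edges).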

Evaluation (5 annotators)
A second application of the algorithm, starting from the IJCAI collection, gave similar results.

Evaluation: WordNet reconstruction
Same evaluation strategy as in Kozareva & Hovy (EMNLP 2010).
Only nodes that appear both in WordNet and in the acquired taxonomy are considered in the evaluation (as in K&H).
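The restriction to shared nodes can be sketched as below. The slide does not say how links are scored, so comparing direct (hypernym, hyponym) edges and reporting precision and recall is an assumption made for illustration; the function names and the toy edge sets are hypothetical.

```python
def restrict(edges, nodes):
    """Keep only the edges whose two endpoints are both in `nodes`."""
    return {(h, t) for h, t in edges if h in nodes and t in nodes}


def evaluate(induced_edges, gold_edges):
    """Precision and recall of direct is-a edges over the shared nodes."""
    induced_nodes = {n for edge in induced_edges for n in edge}
    gold_nodes = {n for edge in gold_edges for n in edge}
    common = induced_nodes & gold_nodes
    induced = restrict(induced_edges, common)
    gold = restrict(gold_edges, common)
    correct = induced & gold
    precision = len(correct) / len(induced) if induced else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Toy data (invented): "widget" and "structure" each occur on one side only.
induced = {("graph", "directed graph"), ("directed graph", "flow network"), ("graph", "widget")}
gold = {("graph", "directed graph"), ("graph", "flow network"), ("structure", "graph")}
precision, recall = evaluate(induced, gold)
```

Edges touching "widget" or "structure" are ignored because those nodes occur on one side only, so neither taxonomy is penalized for concepts the other lacks.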

Future work
From “strict” taxonomy to lattice.
An in-house implementation of Google “define” to overcome search limitations (no API for Google define).
Extension to other languages.

April 2011

Legend: initial terminology; upper terms; hypernyms from iterations I, II, III, IV and V.