Download presentation
Presentation is loading. Please wait.
Published bySabina Parrish Modified over 9 years ago
1
ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI) Roberto Navigli 1 A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli {navigli,velardi,faralli}@di.uniroma1.it http://lcl.uniroma1.it
2
Motivations We present a graph-based approach to learn a lexical taxonomy automatically starting from a domain corpus and the Web. Unlike other approaches, we learn both concepts and relations entirely from scratch in 3 steps: 1) term extraction 2) definition and hypernym extraction 3) graph pruning A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
3
Taxonomy Learning Workflow Terminology extraction Definition & hypernym extraction Domain filtering Graph pruning Domain Corpus Web glossaries & documents Domain terms Upper terms Domain terms Hypernym graph Induced taxonomy A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
4
Taxonomy Learning Workflow Terminology extraction Definition & hypernym extraction Domain filtering Graph pruning Domain Corpus Web glossaries & documents Domain terms Upper terms Domain terms Hypernym graph Induced taxonomy A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
5
Terminology Extraction Domain Corpus Domain terms flow network hash function information processing maximum likelihood mesh generation pattern recognition A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli http://lcl.uniroma1.it/termextractor
6
Hypernym graph Terminology extraction Definition & hypernym extraction Domain filtering Graph pruning Domain Corpus Web glossaries & documents Domain terms Upper terms Domain terms Induced taxonomy Taxonomy Learning Workflow A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
7
Definition & Hypernym Extraction + Domain Filtering definition extraction (WCL) In graph theory, a flow network is a directed graph. Global Cash Flow Network is a business opportunity to make money online. A flow network is a network with two distinguished vertices. Domain terms flow network domain non domain A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli Domain Corpus Web glossaries & documents
8
Definition & Hypernym Extraction + Domain Filtering In graph theory, a flow network is a directed graph. hypernym extraction A flow network is a network with two distinguished vertices. flow network directed graph network directed graph network A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli definition extraction (WCL) Domain terms flow network Domain Corpus Web glossaries & documents
9
Definition & Hypernym Extraction + Domain Filtering hypernym extraction A directed graph is a data structure... directed graph A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli Domain Corpus Web glossaries & documents definition extraction (WCL) A directed graph is a graph where... graph flow network directed graph network graph data structure Terms from previous iteration
10
Hypernym Extraction Algorithm (1) Large training set with many uncommon patterns “X is a ADJ term that refers to a kind of Y” Annotated with 4 fields: definiendum (D), definitor (V) containing the verbal pattern and definiens (H) containing the hypernym, and the rest of the sentence (R). – An (often represented by the generic formula HA)/ is traditionally considered / any chemical compound / that, when dissolved in water, gives a solution with a hydrogen ion activity greater than in pure water The algorithm builds a set of word lattices from the training set. Independent lattices are created for each of the 3 basic fields A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
11
Hypernym Extraction Algorithm (2) Lattice learning consists of three steps: 1.each sentence in the training set is pre- processed and each field is generalized to a star pattern “[In arts, a chiaroscuro] D [is] V [a monochrome picture] H.” D=“In *, a ”, V=“is”, H=“a * ” A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
12
Hypernym Extraction Algorithm (3) 2.Clustering: for each field, the training sentences are then clustered according to the star patterns they belong to; In arts, a chiaroscuro is a monochrome picture. In mathematics, a graph is a data structure that consists of... In computer science, a pixel is a dot that is part of a computer image. D: In *, a V: is H: a * A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
13
Hypernym Extraction Algorithm (4) 3. Word-Class Lattice construction: for each sentence cluster, a WCL is created by means of a greedy alignment algorithm A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
14
Performance in definition extraction Outperforms existing methods for definition extraction WikipediaUKWac corpus A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
15
Precision in hypernym extraction Wikipedia UKWac Pattern-based methods achieve much lower recall: 62 vs. 383 hypernyms extracted from UKWac A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
16
The iterative growth of the hypernym graph A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
17
The iterative growth of the hypernym graph A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
19
Terminology extraction Definition & hypernym extraction Domain filtering Graph pruning Domain Corpus Web glossaries & documents Domain terms Upper terms Domain terms Hypernym graph Induced taxonomy Taxonomy Learning Workflow A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
20
Graph Pruning Given the hypernym graph 1) We disconnect false roots and false leaves. 2) We weight edges and nodes with a novel weighting algorithm. A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
21
Graph Pruning Given the hypernym graph 1) We disconnect false roots and false leaves. 2) We weight edges and nodes with a novel weighting algorithm. A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli 0 5 3 2 1 0 2 1 2 1 555 888 10 7 7 9
22
Graph Pruning 2) We weight edges and nodes with a novel weighting algorithm. Given the hypernym graph 1) We disconnect false roots and false leaves. 3) We apply Chu-Liu/Edmond's algorithm, to obtain an Optimal Branching. A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli 0 5 3 2 1 0 2 1 2 1 555 888 10 7 7 9
23
Graph Pruning As a result we obtain a tree-like taxonomy. 2) We weight edges and nodes with a novel weighting algorithm. Given the hypernym graph 1) We disconnect false roots and false leaves. 3) We apply Chu-Liu/Edmond's algorithm, to obtain an Optimal Branching. A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
24
From the Noisy Hypernym Graph... A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
25
...to a Tree-like Taxonomy A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
27
Application to ACL Taxonomy ACL Anthology from year 1979 to 2010 (4176 papers). 29 upper terms from WordNet’s abstaction 10,000 terms extracted, first 2000 inspected, 1006 selected (eliminated e.g. : word pair, input sentence, human judgement) 5 iterations, 1329 definitions, 1031 nodes 1274 edges After pruning, 936 nodes 935 edges A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
28
Evaluation (5 annotators) A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli Another application of the algorithm starting from IJCAI collection, similar results
29
Evaluation: WordNet reconstruction Same evaluation strategy as in Kozareva&Hovy (EMNLP2010) Only nodes both in WordNet and in the acquired taxonomy are considered in the evaluation (as in K&H)
30
Future work From “strict” taxonomy to lattice A in-house implementation of google “define” to overcome search limitations (no API for Google define) Extension to other languages A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch Roberto Navigli, Paola Velardi and Stefano Faralli
31
http://lcl.uniroma1.it April 2011
32
Hypernyms from iteration I Upper terms Initial terminology Hypernyms from iteration II Hypernyms from iteration III Hypernyms from iteration IV Hypernyms from iteration V
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.