A Metric-based Framework for Automatic Taxonomy Induction
Hui Yang and Jamie Callan
Language Technologies Institute, Carnegie Mellon University
ACL 2009, Singapore
ROADMAP
- Introduction
- Related Work
- Metric-Based Taxonomy Induction Framework
- The Features
- Experimental Results
- Conclusions
INTRODUCTION
Semantic taxonomies, such as WordNet, play an important role in solving knowledge-rich problems.
Limitations of manually-created taxonomies:
- Rarely complete
- Difficult to include new terms from emerging/changing domains
- Time-consuming to create, which can make them infeasible for specialized domains and personalized tasks
INTRODUCTION
Automatic taxonomy induction is a solution to:
- Augment existing resources
- Quickly produce new taxonomies for specialized domains and personalized tasks
Subtasks in automatic taxonomy induction:
- Term extraction
- Relation formation
This paper focuses on relation formation.
RELATED WORK
Pattern-based approaches
- Define lexical-syntactic patterns for relations, and use these patterns to discover instances
- Have been applied to extract is-a, part-of, sibling, synonym, causal, etc. relations
- Strength: highly accurate
- Weakness: sparse coverage of patterns
Clustering-based approaches
- Hierarchically cluster terms based on similarities of their meanings, usually represented by a feature vector
- Have only been applied to extract is-a and sibling relations
- Strength: allow discovery of relations which do not explicitly appear in text; higher recall
- Weaknesses: generally fail to produce coherent clusters for small corpora [Pantel and Pennacchiotti 2006]; hard to label non-leaf nodes
A UNIFIED SOLUTION
Combine the strengths of both approaches in a unified framework:
- Flexibly incorporate heterogeneous features
- Use lexical-syntactic patterns as one type of feature in a clustering framework
Metric-based Taxonomy Induction
THE FRAMEWORK
A novel framework, which:
- Incrementally clusters terms
- Transforms taxonomy induction into a multi-criteria optimization
- Uses heterogeneous features
Optimization based on two criteria:
- Minimization of taxonomy structures (the Minimum Evolution Assumption)
- Modeling of term abstractness (the Abstractness Assumption)
LET'S BEGIN WITH SOME IMPORTANT DEFINITIONS
A taxonomy is a data model with:
- A concept set
- A relationship set
- A domain
MORE DEFINITIONS
A full taxonomy (figure: a tree rooted at "game equipment", with children "ball" and "table"):
AssignedTermSet = {game equipment, ball, table, basketball, volleyball, soccer, table-tennis table, snooker table}
UnassignedTermSet = {}
MORE DEFINITIONS
A partial taxonomy (figure: the same tree with only some terms attached):
AssignedTermSet = {game equipment, ball, table, basketball, volleyball}
UnassignedTermSet = {soccer, table-tennis table, snooker table}
MORE DEFINITIONS
Ontology Metric
(figure: the example taxonomy annotated with edge distances 1, 1.5, and 2, and pairwise metric values d(·,·) = 1, 2, and 4.5 between concepts such as "ball" and "table")
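To make the figure concrete: a sketch of the ontology metric suggested by the annotations, in which the distance between two concepts is the sum of the edge weights along the path connecting them (a reconstruction; the weights themselves are estimated later by ridge regression).

```latex
% Ontology metric as a path sum of edge weights (reconstructed sketch)
d(c_x, c_y) = \sum_{e \in \mathrm{path}(c_x, c_y)} w(e)
```

For instance, if the edges from "ball" down to "basketball" and "volleyball" each have weight 1, then d(basketball, volleyball) = 2.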
ASSUMPTIONS
Minimum Evolution Assumption: the optimal ontology is the one that introduces the least information change.
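Stated formally (a sketch; T(C) denotes the set of full taxonomies over the concept set C, T0 is the initial partial taxonomy, and Info(T) is defined in the extra slides):

```latex
T^{*} = \operatorname*{arg\,min}_{T \in \mathcal{T}(C)} \left| \mathrm{Info}(T) - \mathrm{Info}(T_{0}) \right|
```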
ILLUSTRATION
Minimum Evolution Assumption
(animation over several slides: starting from an empty structure, "ball" is inserted, then "table", then "game equipment"; at each step the taxonomy grows by the attachment that changes its information the least)
ASSUMPTIONS
Abstractness Assumption: each abstraction level has its own information function.
(figure: the example taxonomy, with "game equipment" at an abstract level and "ball" and "table" at a more concrete level)
MULTI-CRITERIA OPTIMIZATION
- Minimum Evolution objective function
- Abstractness objective function
- Combined via a scalarization variable
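The combined objective was shown as a formula image; a plausible reconstruction, with a scalarization variable λ in [0, 1] trading off the two criteria:

```latex
\min_{T} \; \lambda \, O_{\mathrm{ME}}(T) + (1 - \lambda) \, O_{\mathrm{abs}}(T)
```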
ESTIMATING THE ONTOLOGY METRIC
- Assume the ontology metric is a linear interpolation of some underlying feature functions
- Use ridge regression to estimate and predict the ontology metric
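In symbols (a sketch under the stated linearity assumption; the f_k are the feature functions of the next slide, the alpha_k their weights, and eta the ridge penalty):

```latex
d(c_x, c_y) = \sum_{k} \alpha_k \, f_k(c_x, c_y),
\qquad
\hat{\alpha} = \operatorname*{arg\,min}_{\alpha} \lVert \mathbf{y} - \mathbf{F}\alpha \rVert_2^2 + \eta \lVert \alpha \rVert_2^2
```

where y stacks distances observed in existing training taxonomies and F the corresponding feature values.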
THE FEATURES
Our framework allows a wide range of features to be used.
- Input to a feature function: two terms
- Output: a numeric score measuring the semantic distance between the two terms
Feature types we use (the framework is not restricted to these):
- Contextual features
- Term co-occurrence
- Lexical-syntactic patterns
- Syntactic dependency features
- Word length difference
- Definition overlap, etc.
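A minimal sketch of the estimation step in Python, assuming a list of hypothetical feature functions and sklearn's Ridge; the real system's features and training pairs come from the taxonomies and auxiliary corpora described later.

```python
import numpy as np
from sklearn.linear_model import Ridge

def feature_vector(term_x, term_y, feature_fns):
    """Stack the feature-function scores for one term pair."""
    return np.array([f(term_x, term_y) for f in feature_fns])

def fit_ontology_metric(training_pairs, known_distances, feature_fns, penalty=1.0):
    """Estimate d(cx, cy) = sum_k alpha_k * f_k(cx, cy) by ridge regression
    against distances observed in existing (training) taxonomies."""
    F = np.array([feature_vector(x, y, feature_fns) for x, y in training_pairs])
    model = Ridge(alpha=penalty)  # sklearn's alpha is the L2 penalty (eta above)
    model.fit(F, known_distances)
    return model

def predict_distance(model, term_x, term_y, feature_fns):
    """Predict the ontology-metric distance for an unseen term pair."""
    return float(model.predict(feature_vector(term_x, term_y, feature_fns)[None, :])[0])
```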
EXPERIMENTAL RESULTS
Task: reconstruct taxonomies from WordNet and ODP (not the entire WordNet or ODP, but fragments of each).
Ground truth: 50 hypernym taxonomies from WordNet; 50 hypernym taxonomies from ODP; 50 meronym taxonomies from WordNet.
Auxiliary datasets: 1000 Google documents per term or per term pair; 100 Wikipedia documents per term.
Evaluation metric: F1-measure (averaged by leave-one-out cross-validation).
DATASETS
PERFORMANCE OF TAXONOMY INDUCTION
Compare our system (ME) with other state-of-the-art systems:
- HE: 6 is-a patterns [Hearst 1992]
- GI: 3 part-of patterns [Girju et al. 2003]
- PR: a probabilistic framework [Snow et al. 2006]
- ME: our metric-based framework
PERFORMANCE OF TAXONOMY INDUCTION
- Our system (ME) consistently gives the best F1 on all three tasks.
- Systems using heterogeneous features (ME and PR) achieve a significant absolute F1 gain (>30%).
FEATURES VS. RELATIONS
This is the first study of the impact of different features on taxonomy induction for different relations.
- Co-occurrence and lexical-syntactic patterns are good for is-a, part-of, and sibling relations.
- Contextual and syntactic dependency features are only good for the sibling relation.
FEATURES VS. ABSTRACTNESS
This is the first study of the impact of different features on taxonomy induction for terms at different abstraction levels.
- Contextual, co-occurrence, lexical-syntactic pattern, and syntactic dependency features work well for concrete terms.
- Only co-occurrence works well for abstract terms.
CONCLUSIONS
This paper presents a novel metric-based taxonomy induction framework, which:
- Combines the strengths of pattern-based and clustering-based approaches
- Achieves better F1 than 3 state-of-the-art systems
It also presents the first study of the impact of different features on taxonomy induction, for different types of relations and for terms at different abstraction levels.
CONCLUSIONS
This work is a general framework, which:
- Allows a wide range of features
- Allows different metric functions at different abstraction levels
This work has the potential to learn more complex taxonomies than previous approaches.
THANK YOU AND QUESTIONS
EXTRA SLIDES
FORMAL FORMULATION OF TAXONOMY INDUCTION
The task of taxonomy induction: construct a full ontology T given a set of concepts C and an initial partial ontology T0.
- Keep adding concepts from C into T0 (note that T0 may be empty)
- Until a full ontology is formed
GOAL OF TAXONOMY INDUCTION
Find the optimal full ontology such that the information change since T0 is least. Note that this follows from the Minimum Evolution Assumption.
GET TO THE GOAL
Goal: since the optimal set of concepts is always C, concepts are added incrementally.
GET TO THE GOAL
- Plug in the definition of information change
- Transform into a minimization problem
This yields the Minimum Evolution objective function.
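The equations on this slide were figures; a plausible reconstruction of the Minimum Evolution objective, using the Info(T) definition from the "More Definitions" slides below:

```latex
O_{\mathrm{ME}}(T) = \left| \mathrm{Info}(T) - \mathrm{Info}(T_{0}) \right|
= \Bigl| \sum_{c_i, c_j \in T} d(c_i, c_j) - \sum_{c_i, c_j \in T_{0}} d(c_i, c_j) \Bigr|
```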
EXPLICITLY MODEL ABSTRACTNESS
- Model abstractness for each level by a least-squares fit
- Plug in the definition of the amount of information for an abstraction level
This yields the Abstractness objective function.
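Again a reconstruction rather than the slide's exact formula: one reading of "least-squares fit per level" is to penalize, at each abstraction level L, the squared deviation of observed distances from a level-specific metric d_L:

```latex
O_{\mathrm{abs}}(T) = \sum_{L \in T} \sum_{c_i, c_j \in L} \bigl( d(c_i, c_j) - d_{L}(c_i, c_j) \bigr)^{2}
```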
THE OPTIMIZATION ALGORITHM
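The algorithm itself was shown as a figure. Below is a minimal sketch of what an incremental, minimum-evolution insertion loop could look like, assuming a hypothetical `taxonomy` object with a `concepts()` accessor and a `candidate_attachments(term)` enumerator, plus the fitted metric from the earlier sketch; this is an illustration, not the authors' exact procedure.

```python
def information(taxonomy, distance):
    """Info(T): sum of pairwise metric distances over all concept pairs in T.
    Recomputed from scratch here for clarity; a real implementation would
    update incrementally."""
    concepts = list(taxonomy.concepts())
    return sum(distance(x, y)
               for i, x in enumerate(concepts)
               for y in concepts[i + 1:])

def induce_taxonomy(taxonomy, unassigned_terms, distance):
    """Greedy incremental insertion: attach each new term at the position
    where the change in Info(T) is smallest (Minimum Evolution Assumption)."""
    for term in unassigned_terms:
        base_info = information(taxonomy, distance)
        # candidate_attachments yields copies of the taxonomy with `term`
        # attached at each possible position (hypothetical interface).
        best = min(taxonomy.candidate_attachments(term),
                   key=lambda t: abs(information(t, distance) - base_info))
        taxonomy = best
    return taxonomy
```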
MORE DEFINITIONS
Information in a Taxonomy T
(figure: the example taxonomy with edge distances 1, 1.5, and 2, and pairwise values d(·,·) = 1, 2, and 4.5, as before)
MORE DEFINITIONS
Information in a Level L
(figure: one level of the taxonomy, e.g. "ball" and "table", with within-level pairwise values such as d(·,·) = 1 and 2)
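The two quantities the figures illustrate, reconstructed to be consistent with the objectives above:

```latex
\mathrm{Info}(T) = \sum_{c_i, c_j \in T} d(c_i, c_j)
\qquad
\mathrm{Info}(L) = \sum_{c_i, c_j \in L} d(c_i, c_j)
```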
EXAMPLES OF FEATURES
Contextual features
- Global Context KL-Divergence = KL-divergence(1000 Google documents for Cx, 1000 Google documents for Cy)
- Local Context KL-Divergence = KL-divergence(left two and right two words for Cx, left two and right two words for Cy)
Term co-occurrence
- Pointwise Mutual Information (PMI), where counts are taken as: # of sentences containing the term(s); or # of documents containing the term(s); or n as in "Results 1-10 of about n for ..." in Google
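A minimal sketch of the PMI feature in Python, assuming sentence-level occurrence counts from some corpus (per the slide, document counts or search-engine hit counts are interchangeable count sources):

```python
import math

def pmi(term_x, term_y, sentences):
    """Pointwise mutual information of two terms, with probabilities
    estimated from sentence-level occurrence counts."""
    n = len(sentences)
    cx = sum(term_x in s for s in sentences)
    cy = sum(term_y in s for s in sentences)
    cxy = sum(term_x in s and term_y in s for s in sentences)
    if cx == 0 or cy == 0 or cxy == 0:
        return 0.0  # PMI undefined for unseen terms/pairs; fall back to 0
    return math.log((cxy / n) / ((cx / n) * (cy / n)))
```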
EXAMPLES OF FEATURES
Syntactic dependency features
- Minipar Syntactic Distance = average length of the syntactic paths between the terms in parse trees of sentences containing both terms
- Modifier Overlap = # of overlaps between modifiers of the terms (e.g., red apple, red pear)
- Object Overlap = # of overlaps between objects of the terms when the terms are subjects (e.g., "A dog eats an apple", "A cat eats an apple")
- Subject Overlap = # of overlaps between subjects of the terms when the terms are objects (e.g., "A dog eats an apple", "A dog eats a pear")
- Verb Overlap = # of overlaps between verbs of the terms when the terms are subjects/objects (e.g., "A dog eats an apple", "A cat eats a pear")
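A sketch of the overlap-style features, assuming dependency triples have already been extracted by a parser (Minipar in the original work); the (head, relation, dependent) triple format and relation labels here are hypothetical:

```python
def modifier_overlap(term_x, term_y, triples):
    """# of shared modifiers: e.g. 'red apple' and 'red pear' share 'red'.
    `triples` is an iterable of (head, relation, dependent) tuples."""
    mods_x = {dep for head, rel, dep in triples if head == term_x and rel == "mod"}
    mods_y = {dep for head, rel, dep in triples if head == term_y and rel == "mod"}
    return len(mods_x & mods_y)

def verb_overlap(term_x, term_y, triples):
    """# of shared governing verbs when the terms act as subjects or objects,
    e.g. 'a dog eats ...' and 'a cat eats ...' share the verb 'eats'."""
    def verbs(term):
        return {head for head, rel, dep in triples
                if dep == term and rel in ("subj", "obj")}
    return len(verbs(term_x) & verbs(term_y))
```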
EXAMPLES OF FEATURES
Lexical-syntactic patterns
(table of patterns shown on slide)
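The pattern table itself is not recoverable from the text export; below is a sketch of how Hearst-style is-a patterns (the HE baseline cited earlier) can be turned into count features, with two illustrative patterns rather than the system's actual set:

```python
import re

# Two classic Hearst-style is-a patterns; the system's real pattern set
# is in the table on the slide and may differ.
ISA_PATTERNS = [
    r"{hyper}\s*,?\s+such as\s+(?:\w+\s*,\s*)*{hypo}",  # "X such as Y"
    r"{hypo}\s+and other\s+{hyper}",                    # "Y and other X"
]

def isa_pattern_count(hypo, hyper, sentences):
    """Count sentences matching any is-a pattern for the pair (hypo, hyper)."""
    regexes = [re.compile(p.format(hypo=re.escape(hypo),
                                   hyper=re.escape(hyper)), re.IGNORECASE)
               for p in ISA_PATTERNS]
    return sum(any(r.search(s) for r in regexes) for s in sentences)
```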
EXAMPLES OF FEATURES
Miscellaneous features
- Definition Overlap = # of non-stopword overlaps between the definitions of the two terms
- Word Length Difference