A Metric-based Framework for Automatic Taxonomy Induction
Hui Yang and Jamie Callan
Language Technologies Institute, Carnegie Mellon University
ACL 2009, Singapore
Roadmap
- Introduction
- Related Work
- Metric-Based Taxonomy Induction Framework
- The Features
- Experimental Results
- Conclusions
Introduction
- Semantic taxonomies, such as WordNet, play an important role in solving knowledge-rich problems.
- Limitations of manually created taxonomies:
  - Rarely complete
  - Difficult to include new terms from emerging or changing domains
  - Time-consuming to create, which may make them infeasible for specialized domains and personalized tasks
Introduction
- Automatic taxonomy induction is a solution: it can
  - Augment existing resources
  - Quickly produce new taxonomies for specialized domains and personalized tasks
- Subtasks in automatic taxonomy induction:
  - Term extraction
  - Relation formation
- This paper focuses on relation formation.
Related Work
- Pattern-based approaches
  - Define lexical-syntactic patterns for relations, and use these patterns to discover instances
  - Have been applied to extract is-a, part-of, sibling, synonym, causal, etc. relations
  - Strength: highly accurate
  - Weakness: sparse coverage of patterns
- Clustering-based approaches
  - Hierarchically cluster terms based on similarities of their meanings, usually represented by feature vectors
  - Have only been applied to extract is-a and sibling relations
  - Strength: allow discovery of relations that do not explicitly appear in text; higher recall
  - Weaknesses: generally fail to produce coherent clusters for small corpora [Pantel and Pennacchiotti 2006]; hard to label non-leaf nodes
A Unified Solution
- Combine the strengths of both approaches in a unified framework
- Flexibly incorporate heterogeneous features
- Use lexical-syntactic patterns as one type of feature in a clustering framework
- Metric-based taxonomy induction
The Framework
A novel framework, which
- Incrementally clusters terms
- Transforms taxonomy induction into a multi-criteria optimization problem
- Uses heterogeneous features
Optimization is based on two criteria:
- Minimization of taxonomy structures (Minimum Evolution Assumption)
- Modeling of term abstractness (Abstractness Assumption)
Let's Begin with Some Important Definitions
A taxonomy is a data model with:
- A concept set
- A relationship set
- A domain
More Definitions
A full taxonomy [figure: game equipment at the root, with ball and table as children, and the specific items below them]:
- AssignedTermSet = {game equipment, ball, table, basketball, volleyball, soccer, table-tennis table, snooker table}
- UnassignedTermSet = {}
More Definitions
A partial taxonomy [figure: the same game-equipment taxonomy, with only some terms attached]:
- AssignedTermSet = {game equipment, ball, table, basketball, volleyball}
- UnassignedTermSet = {soccer, table-tennis table, snooker table}
More Definitions
Ontology metric: a distance between two concepts in the taxonomy. [Figure: the game-equipment taxonomy with edge distances such as 1, 1.5, and 2; the distance d between two concepts accumulates along the path connecting them, e.g., d = 1, d = 2, and d = 4.5 in the example.]
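As a concrete reading of the figure, one natural formalization, a sketch consistent with the example distances rather than a verbatim formula from the slides, defines the ontology metric between two concepts as the accumulated edge weights along the path connecting them:

```latex
d(c_x, c_y) \;=\; \sum_{e \,\in\, \mathrm{path}(c_x,\, c_y)} w(e)
```

So, for example, a path through edges of weight 2, 1.5, and 1 gives d = 4.5.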
Assumptions
Minimum Evolution Assumption: the optimal ontology is the one that introduces the least information change.
Illustration: Minimum Evolution Assumption
[Animated sequence of figures: starting from an empty taxonomy, the terms ball, table, and game equipment are attached one at a time, keeping at each step the structure that introduces the least information change.]
Assumptions
Abstractness Assumption: each abstraction level has its own information function.
Assumptions
Abstractness Assumption [figure: the game-equipment taxonomy with its abstraction levels, game equipment at the most abstract level and ball and table below it]
Multiple Criteria Optimization
Combine two objectives via a scalarization variable:
- Minimum Evolution objective function
- Abstractness objective function
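The slide's equation is not preserved in the transcript; a standard scalarized form of such a two-criteria objective (the symbols here are illustrative, not the authors' notation) would be:

```latex
\hat{T} \;=\; \arg\min_{T}\; \lambda \, \mathcal{O}_{\mathrm{ME}}(T) \;+\; (1-\lambda)\, \mathcal{O}_{\mathrm{abs}}(T),
\qquad \lambda \in [0, 1]
```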
Estimating the Ontology Metric
- Assume the ontology metric is a linear interpolation of some underlying feature functions
- Use ridge regression to estimate and predict the ontology metric
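A minimal sketch of this estimation step, assuming numpy and illustrative names (`F`, `fit_ridge`, and `lam` are not from the slides): the metric for a term pair is modeled as a weighted sum of feature values, and the weights are fit with the closed-form L2-regularized least squares (ridge) solution.

```python
import numpy as np

def fit_ridge(F, d, lam=1.0):
    """Fit weights w so that F @ w approximates observed distances d.

    F   -- (n_pairs, n_features) matrix of feature values for term pairs
    d   -- (n_pairs,) observed ontology-metric values, e.g. from training taxonomies
    lam -- ridge regularization strength
    """
    n_features = F.shape[1]
    # Closed-form ridge solution: (F^T F + lam * I)^-1 F^T d
    return np.linalg.solve(F.T @ F + lam * np.eye(n_features), F.T @ d)

def predict_distance(w, features):
    """Predict the ontology metric for one term pair from its feature vector."""
    return float(np.asarray(features) @ w)
```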
The Features
Our framework allows a wide range of features to be used.
- Input to a feature function: two terms
- Output: a numeric score measuring the semantic distance between the two terms
We can use the following types of feature functions, though we are not restricted to these (a minimal interface sketch follows the list):
- Contextual features
- Term co-occurrence
- Lexical-syntactic patterns
- Syntactic dependency features
- Word length difference
- Definition overlap, etc.
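A minimal sketch of the feature-function contract described above (the type alias and the example feature are assumptions for illustration, not from the slides):

```python
from typing import Callable

# A feature function maps a pair of terms to a numeric semantic-distance score.
FeatureFunction = Callable[[str, str], float]

def word_length_difference(term_x: str, term_y: str) -> float:
    """Example feature: absolute difference in character length between two terms."""
    return float(abs(len(term_x) - len(term_y)))

score = word_length_difference("ball", "basketball")  # 6.0
```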
Experimental Results
- Task: reconstruct taxonomies from WordNet and ODP (not the entire WordNet or ODP, but fragments of each)
- Ground truth: 50 hypernym taxonomies from WordNet; 50 hypernym taxonomies from ODP; 50 meronym taxonomies from WordNet
- Auxiliary datasets: 1000 Google documents per term or per term pair; 100 Wikipedia documents per term
- Evaluation metric: F1-measure (averaged by leave-one-out cross-validation)
Datasets
Performance of Taxonomy Induction
Compare our system (ME) with other state-of-the-art systems:
- HE: 6 is-a patterns [Hearst 1992]
- GI: 3 part-of patterns [Girju et al. 2003]
- PR: a probabilistic framework [Snow et al. 2006]
- ME: our metric-based framework
Performance of Taxonomy Induction
- Our system (ME) consistently gives the best F1 on all three tasks.
- Systems using heterogeneous features (ME and PR) achieve a significant absolute F1 gain (>30%).
Features vs. Relations
This is the first study of the impact of using different features on taxonomy induction for different relations.
- Co-occurrence and lexico-syntactic patterns are good for is-a, part-of, and sibling relations.
- Contextual and syntactic dependency features are only good for the sibling relation.
Features vs. Abstractness
This is the first study of the impact of using different features on taxonomy induction for terms at different abstraction levels.
- Contextual, co-occurrence, lexical-syntactic pattern, and syntactic dependency features work well for concrete terms.
- Only co-occurrence works well for abstract terms.
Conclusions
This paper presents a novel metric-based taxonomy induction framework, which
- Combines the strengths of pattern-based and clustering-based approaches
- Achieves better F1 than 3 state-of-the-art systems
It is also the first study of the impact of using different features on taxonomy induction for different types of relations and for terms at different abstraction levels.
Conclusions
This work is a general framework, which
- Allows a wider range of features
- Allows different metric functions at different abstraction levels
It has the potential to learn more complex taxonomies than previous approaches.
Thank You and Questions
huiyang@cs.cmu.edu
callan@cs.cmu.edu
Extra Slides
Formal Formulation of Taxonomy Induction
The task of taxonomy induction: construct a full ontology T given a set of concepts C and an initial partial ontology T_0.
- Keep adding concepts from C into T_0 (note that T_0 could be empty)
- Until a full ontology is formed
Goal of Taxonomy Induction
Find the optimal full ontology such that the information change since T_0 is least. Note that this follows from the Minimum Evolution Assumption.
Get to the Goal
Since the optimal set of concepts is always C, concepts can be added incrementally.
Get to the Goal
- Plug in the definition of information change
- Transform into a minimization problem: the Minimum Evolution objective function
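The equations on these slides are not preserved in the transcript. A sketch of what such an objective can look like, consistent with the later slide that defines the information in a taxonomy via pairwise distances (the notation is assumed, not the authors'):

```latex
\mathrm{Info}(T) \;=\; \sum_{c_x,\, c_y \,\in\, T} d(c_x, c_y),
\qquad
\mathcal{O}_{\mathrm{ME}}(T) \;=\; \bigl|\, \mathrm{Info}(T) - \mathrm{Info}(T_0) \,\bigr|
```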
Explicitly Model Abstractness
- Model abstractness for each level by a least-squares fit
- Plug in the definition of the amount of information for an abstraction level: the Abstractness objective function
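Again, the slide's equation is missing. One plausible shape for a per-level least-squares objective (purely illustrative notation) sums, over levels, the squared deviation between the observed distances at a level and that level's own information function g_L, per the Abstractness Assumption:

```latex
\mathcal{O}_{\mathrm{abs}}(T) \;=\;
\sum_{L \,\in\, \mathrm{levels}(T)} \;
\sum_{c_x,\, c_y \,\in\, L}
\bigl( d(c_x, c_y) - g_L(c_x, c_y) \bigr)^{2}
```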
The Optimization Algorithm
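The algorithm itself is not preserved on this slide. Below is a greedy sketch that matches the incremental description elsewhere in the deck; the function and variable names are assumptions, and the authors' actual algorithm may differ. It repeatedly attaches the unassigned term, at the attachment point, that minimizes the combined objective.

```python
def induce_taxonomy(unassigned, parent_of, root, combined_objective):
    """Greedy incremental taxonomy induction sketch (names are illustrative).

    unassigned         -- terms not yet attached to the taxonomy
    parent_of          -- dict mapping each attached term to its parent
    root               -- root concept of the (possibly trivial) partial taxonomy
    combined_objective -- scores a candidate parent map (lower is better),
                          e.g. the scalarized ME + abstractness objective
    """
    unassigned = list(unassigned)
    while unassigned:
        best = None  # (score, term, parent)
        nodes = [root] + list(parent_of)  # all current attachment points
        for term in unassigned:
            for parent in nodes:
                candidate = dict(parent_of)
                candidate[term] = parent  # hypothetically attach term here
                score = combined_objective(candidate)
                if best is None or score < best[0]:
                    best = (score, term, parent)
        _, term, parent = best  # commit the cheapest insertion
        parent_of[term] = parent
        unassigned.remove(term)
    return parent_of
```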
More Definitions
Information in a taxonomy T. [Figure: the game-equipment taxonomy with edge distances (1, 1.5, 2) and pairwise metric values such as d = 1, d = 2, and d = 4.5; the information in T is defined from these pairwise distances.]
More Definitions
Information in a level L. [Figure: one abstraction level of the taxonomy, with pairwise metric values such as d = 1 and d = 2 among the terms at that level.]
Examples of Features
Contextual features:
- Global Context KL-Divergence = KL-divergence(1000 Google documents for C_x, 1000 Google documents for C_y)
- Local Context KL-Divergence = KL-divergence(left two and right two words of C_x, left two and right two words of C_y)
Term co-occurrence:
- Pointwise Mutual Information (PMI), where counts are the number of sentences containing the term(s), the number of documents containing the term(s), or n as in "Results 1-10 of about n for ..." in Google
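A minimal sketch of the PMI computation under one of the count choices above (document counts; the function and argument names are illustrative):

```python
import math

def pmi(count_x, count_y, count_xy, total):
    """Pointwise mutual information from document counts.

    count_x, count_y -- documents containing each term
    count_xy         -- documents containing both terms (must be > 0)
    total            -- total number of documents
    """
    p_x, p_y = count_x / total, count_y / total
    p_xy = count_xy / total
    return math.log(p_xy / (p_x * p_y))

# e.g. pmi(120, 80, 30, 10000) > 0 indicates the terms co-occur
# more often than chance.
```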
Syntactic dependency features:
- Minipar Syntactic Distance = average length of the syntactic paths between the terms, in syntactic parse trees of sentences containing them
- Modifier Overlap = number of overlaps between the modifiers of the terms (e.g., red apple, red pear)
- Object Overlap = number of overlaps between the objects of the terms when the terms are subjects (e.g., A dog eats apple; A cat eats apple)
- Subject Overlap = number of overlaps between the subjects of the terms when the terms are objects (e.g., A dog eats apple; A dog eats pear)
- Verb Overlap = number of overlaps between the verbs of the terms when the terms are subjects/objects (e.g., A dog eats apple; A cat eats pear)
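These overlap features all reduce to counting shared items between two extracted sets; a minimal sketch (extracting modifiers, subjects, objects, and verbs from parses is assumed to happen elsewhere):

```python
def overlap(items_x, items_y):
    """Count shared items between two feature sets extracted for two terms."""
    return len(set(items_x) & set(items_y))

# Modifier Overlap for "apple" vs. "pear", given modifiers seen in a corpus:
modifier_overlap = overlap({"red", "ripe"}, {"red", "green"})  # 1 ("red")
```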
Examples of Features
Lexical-syntactic patterns. [The pattern table on this slide is not preserved in the transcript.]
Examples of Features
Miscellaneous features:
- Definition Overlap = number of non-stopword overlaps between the definitions of two terms
- Word Length Difference
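A minimal sketch of Definition Overlap (the stopword list and whitespace tokenization are assumptions):

```python
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is", "that"}

def definition_overlap(def_x: str, def_y: str) -> int:
    """Count non-stopword word overlaps between two term definitions."""
    words_x = {w for w in def_x.lower().split() if w not in STOPWORDS}
    words_y = {w for w in def_y.lower().split() if w not in STOPWORDS}
    return len(words_x & words_y)

# e.g. glosses for "basketball" and "volleyball" both mention "ball" and "game".
```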