A Metric-based Framework for Automatic Taxonomy Induction Hui Yang and Jamie Callan Language Technologies Institute Carnegie Mellon University ACL2009,



A Metric-based Framework for Automatic Taxonomy Induction Hui Yang and Jamie Callan Language Technologies Institute Carnegie Mellon University ACL2009, Singapore

ROADMAP
- Introduction
- Related Work
- Metric-Based Taxonomy Induction Framework
- The Features
- Experimental Results
- Conclusions

INTRODUCTION
- Semantic taxonomies, such as WordNet, play an important role in solving knowledge-rich problems
- Limitations of manually created taxonomies:
  - Rarely complete
  - Difficult to extend with new terms from emerging or changing domains
  - Time-consuming to create, which may make them infeasible for specialized domains and personalized tasks

INTRODUCTION
- Automatic taxonomy induction is a solution to:
  - Augment existing resources
  - Quickly produce new taxonomies for specialized domains and personalized tasks
- Subtasks in automatic taxonomy induction:
  - Term extraction
  - Relation formation
- This paper focuses on relation formation

RELATED WORK
- Pattern-based approaches
  - Define lexical-syntactic patterns for relations, and use these patterns to discover instances
  - Have been applied to extract is-a, part-of, sibling, synonym, causal, and other relations
  - Strength: highly accurate
  - Weakness: sparse coverage of patterns
- Clustering-based approaches
  - Hierarchically cluster terms based on the similarity of their meanings, usually represented by feature vectors
  - Have only been applied to extract is-a and sibling relations
  - Strengths: allow discovery of relations that do not appear explicitly in text; higher recall
  - Weaknesses: generally fail to produce coherent clusters for small corpora [Pantel and Pennacchiotti 2006]; hard to label non-leaf nodes
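To make the clustering-based idea concrete, here is a minimal Python sketch of bottom-up (agglomerative) clustering over term feature vectors. The average-link criterion, cosine similarity, and the toy vectors are our assumptions for illustration, not the method of any system cited above.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_link(a, b, vectors):
    """Mean cosine similarity over all cross-cluster term pairs."""
    sims = [cosine(vectors[x], vectors[y]) for x in a for y in b]
    return sum(sims) / len(sims)


def agglomerate(vectors):
    """Repeatedly merge the two most similar clusters until one remains.

    vectors maps each term to its feature vector (e.g. context-word counts).
    Returns the merge history as (cluster_a, cluster_b) pairs of frozensets.
    """
    clusters = [frozenset([term]) for term in vectors]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_link(pair[0], pair[1], vectors))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
        merges.append((a, b))
    return merges


vectors = {  # hypothetical toy feature vectors
    "basketball": [1.0, 0.0, 1.0],
    "volleyball": [1.0, 0.0, 0.9],
    "snooker table": [0.0, 1.0, 0.0],
    "table-tennis table": [0.0, 1.0, 0.1],
}
merges = agglomerate(vectors)
```

The merge history is a binary tree whose internal nodes carry no names, which is the "hard to label non-leaf nodes" weakness listed above.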

A UNIFIED SOLUTION: METRIC-BASED TAXONOMY INDUCTION
- Combine the strengths of both approaches in a unified framework
  - Flexibly incorporate heterogeneous features
  - Use lexical-syntactic patterns as one type of feature in a clustering framework

THE FRAMEWORK
- A novel framework, which:
  - Incrementally clusters terms
  - Transforms taxonomy induction into a multi-criteria optimization
  - Uses heterogeneous features
- Optimization based on two criteria:
  - Minimization of taxonomy structures (Minimum Evolution Assumption)
  - Modeling of term abstractness (Abstractness Assumption)

LET'S BEGIN WITH SOME IMPORTANT DEFINITIONS
- A taxonomy is a data model that represents a concept set and a relationship set within a domain

MORE DEFINITIONS
A full taxonomy:
[Figure: taxonomy rooted at Game Equipment, with ball and table below it]
- AssignedTermSet = {game equipment, ball, table, basketball, volleyball, soccer, table-tennis table, snooker table}
- UnassignedTermSet = {}

MORE DEFINITIONS
A partial taxonomy:
[Figure: taxonomy rooted at Game Equipment, with ball and table below it]
- AssignedTermSet = {game equipment, ball, table, basketball, volleyball}
- UnassignedTermSet = {soccer, table-tennis table, snooker table}
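The two term sets can be held in a small data structure. Below is a Python sketch that assumes a taxonomy is stored as a child-to-parent map. The class name `PartialTaxonomy` is hypothetical, and placing basketball and volleyball under ball is our reading of the figure.

```python
from dataclasses import dataclass, field


@dataclass
class PartialTaxonomy:
    """A taxonomy under construction: placed terms plus terms still waiting."""
    parent: dict = field(default_factory=dict)    # term -> parent term (root -> None)
    unassigned: set = field(default_factory=set)  # UnassignedTermSet

    @property
    def assigned(self):
        """AssignedTermSet: every term that already has a position."""
        return set(self.parent)

    def is_full(self):
        """A full taxonomy has an empty UnassignedTermSet."""
        return not self.unassigned

    def assign(self, term, parent):
        """Move a term from the unassigned set into the taxonomy under an existing parent."""
        if term not in self.unassigned:
            raise ValueError(f"{term!r} is not waiting to be assigned")
        if parent not in self.parent:
            raise ValueError(f"parent {parent!r} is not in the taxonomy")
        self.unassigned.remove(term)
        self.parent[term] = parent


# The partial taxonomy from the slide (parent links are a hypothetical reading).
partial = PartialTaxonomy(
    parent={"game equipment": None, "ball": "game equipment",
            "table": "game equipment", "basketball": "ball", "volleyball": "ball"},
    unassigned={"soccer", "table-tennis table", "snooker table"},
)
```

Assigning the three remaining terms turns this partial taxonomy into the full one on the previous slide.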

MORE DEFINITIONS
Ontology metric:
[Figure: taxonomy containing ball and table, with edge distances 1.5, 2, and 1; example term-pair distances d = 2, d = 1, and d = 4.5]
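One way to read the ontology metric figure is that the distance between two terms is the sum of the edge distances on the path joining them. This reading is consistent with 4.5 = 1.5 + 2 + 1. The Python sketch below assumes that reading. Which edge carries which weight is a hypothetical placement.

```python
def ancestors(term, parent):
    """Return [term, its parent, grandparent, ..., root]."""
    chain = [term]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def path_distance(x, y, parent, weight):
    """Sum of edge weights on the tree path between terms x and y.

    parent maps child -> parent (root -> None); weight maps child -> weight of
    the edge linking it to its parent.
    """
    up_x, up_y = ancestors(x, parent), ancestors(y, parent)
    on_y_path = set(up_y)
    lca = next(node for node in up_x if node in on_y_path)  # lowest common ancestor
    return (sum(weight[n] for n in up_x[:up_x.index(lca)])
            + sum(weight[n] for n in up_y[:up_y.index(lca)]))


# Hypothetical placement of the slide's edge distances 1.5, 2 and 1.
parent = {"game equipment": None, "ball": "game equipment",
          "table": "game equipment", "basketball": "ball"}
weight = {"ball": 2.0, "table": 1.5, "basketball": 1.0}
```

With this placement, the two single-edge distances come out as 2 and 1, and the three-edge path from basketball to table comes out as 4.5.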

ASSUMPTIONS
Minimum Evolution Assumption: the optimal ontology is the one that introduces the least information change.

ILLUSTRATION: Minimum Evolution Assumption
[Figure sequence: ball, table, and Game Equipment are added to the taxonomy step by step]
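The illustration adds terms one at a time. Below is a minimal Python sketch of a single greedy step under the Minimum Evolution Assumption. It makes two assumptions. First, the information in a taxonomy is the sum of pairwise path distances. Second, a hypothetical `edge_weight` function scores a new parent-child edge (for example, a learned metric). The sketch only searches over parents for one given term, and it is not the authors' implementation.

```python
from itertools import combinations


def distance(x, y, parent, weight):
    """Tree-path distance: sum of child->parent edge weights up to the common ancestor."""
    def chain(term):
        out = [term]
        while parent[out[-1]] is not None:
            out.append(parent[out[-1]])
        return out

    cx, cy = chain(x), chain(y)
    on_cy = set(cy)
    lca = next(node for node in cx if node in on_cy)
    return (sum(weight[n] for n in cx[:cx.index(lca)])
            + sum(weight[n] for n in cy[:cy.index(lca)]))


def info(parent, weight):
    """Information in a taxonomy: sum of distances over all unordered term pairs."""
    return sum(distance(x, y, parent, weight) for x, y in combinations(parent, 2))


def best_parent(new_term, parent, weight, edge_weight):
    """Try every existing node as the parent of new_term and keep the placement
    whose information change |Info(T') - Info(T)| is smallest."""
    before = info(parent, weight)
    best, best_change = None, float("inf")
    for candidate in parent:
        trial_parent = {**parent, new_term: candidate}
        trial_weight = {**weight, new_term: edge_weight(new_term, candidate)}
        change = abs(info(trial_parent, trial_weight) - before)
        if change < best_change:
            best, best_change = candidate, change
    return best, best_change


# Hypothetical current taxonomy and edge scores for the new term "volleyball".
parent = {"game equipment": None, "ball": "game equipment", "table": "game equipment"}
weight = {"ball": 1.0, "table": 1.0}
scores = {("volleyball", "ball"): 1.0, ("volleyball", "table"): 3.0,
          ("volleyball", "game equipment"): 2.5}
```

Here, placing volleyball under ball adds the least information (6.0, against 9.5 under game equipment and 12.0 under table), so it is the minimum-evolution choice.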

ASSUMPTIONS
Abstractness Assumption: each abstraction level has its own information function.

ASSUMPTIONS
Abstractness Assumption
[Figure: taxonomy with Game Equipment, ball, and table]

MULTI-CRITERIA OPTIMIZATION
[Equation: combines the Minimum Evolution objective function and the Abstractness objective function through a scalarization variable]
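Scalarization turns the two objectives into one number per candidate. This sketch assumes the common convex-combination form lam * ME + (1 - lam) * ABS, with `lam` standing in for the scalarization variable. The objective values are made up.

```python
def scalarized_choice(candidates, obj_me, obj_abs, lam=0.5):
    """Return the candidate minimizing lam * obj_me(c) + (1 - lam) * obj_abs(c)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return min(candidates,
               key=lambda c: lam * obj_me(c) + (1.0 - lam) * obj_abs(c))


# Hypothetical objective values for two candidate placements.
me = {"under ball": 1.0, "under table": 3.0}
abstractness = {"under ball": 4.0, "under table": 0.0}
```

With lam = 0.5 the abstractness term dominates and "under table" wins. With lam = 0.9 the minimum-evolution term dominates and "under ball" wins, which shows how the scalarization variable trades off the two criteria.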

ESTIMATING ONTOLOGY METRIC
- Assume the ontology metric is a linear interpolation (weighted combination) of some underlying feature functions
- Use ridge regression to estimate and predict the ontology metric
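A linear metric with ridge-regression weights can be fitted in closed form. This pure-Python sketch makes several assumptions: a metric of the form d(cx, cy) = sum_j w_j h_j(cx, cy) with no intercept, training targets taken from distances in a known taxonomy, and a hypothetical regularization strength `alpha`. A numerical library would normally replace the hand-written solver.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression (no intercept): (X^T X + alpha I) w = X^T y.

    Each row of X holds the feature scores h_j(c_x, c_y) for one training term
    pair; y holds that pair's distance in the training taxonomy.
    """
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(n)]
    return solve(A, b)


def predict_distance(w, features):
    """Predicted ontology-metric distance: weighted sum of feature scores."""
    return sum(wi * hi for wi, hi in zip(w, features))
```

Larger `alpha` shrinks the weights toward zero. This keeps the fit stable when feature functions are strongly correlated, as co-occurrence and pattern counts often are.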

THE FEATURES
- Our framework allows a wide range of features to be used
- Input to each feature function: two terms
- Output: a numeric score measuring the semantic distance between the two terms
- Feature function types we can use include, but are not limited to:
  - Contextual features
  - Term co-occurrence
  - Lexical-syntactic patterns
  - Syntactic dependency features
  - Word length difference
  - Definition overlap, etc.

EXPERIMENTAL RESULTS
- Task: reconstruct taxonomies from WordNet and ODP
  - Not the entire WordNet or ODP, but fragments of them
- Ground truth: 50 hypernym taxonomies from WordNet; 50 hypernym taxonomies from ODP; 50 meronym taxonomies from WordNet
- Auxiliary datasets: 1000 Google documents per term or per term pair; 100 Wikipedia documents per term
- Evaluation metric: F1-measure (averaged over leave-one-out cross-validation)
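The slides do not spell out how F1 is computed. A common choice, assumed here, scores the predicted (parent, child) relation pairs against the gold pairs and averages the per-fold scores over the leave-one-out runs:

```python
def relation_f1(predicted, gold):
    """F1 over sets of (parent, child) relation pairs."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(fold_scores):
    """Average F1 over leave-one-out folds (one held-out taxonomy per fold)."""
    return sum(fold_scores) / len(fold_scores)


gold = {("game equipment", "ball"), ("game equipment", "table"),
        ("ball", "basketball"), ("ball", "volleyball")}
predicted = {("game equipment", "ball"), ("game equipment", "table"),
             ("table", "basketball")}
```

For this toy pair of sets, precision is 2/3 and recall is 1/2, giving F1 = 4/7.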


P ERFORMANCE OF TAXONOMY INDUCTION  Compare our system (ME) with other state-of-the-art systems  HE: 6 is-a patterns [Hearst 1992]  GI: 3 part-of patterns [Girju et al. 2003]  PR: a probabilistic framework [Snow et al. 2006]  ME: our metric-based framework

P ERFORMANCE OF TAXONOMY INDUCTION  Our system (ME) consistently gives the best F1 for all three tasks.  Systems using heterogeneous features (ME and PR) achieve a significant absolute F1 gain (>30%)

FEATURES VS. RELATIONS
- This is the first study of the impact of different features on taxonomy induction for different relations
- Co-occurrence and lexical-syntactic patterns are good for is-a, part-of, and sibling relations
- Contextual and syntactic dependency features are good only for the sibling relation

FEATURES VS. ABSTRACTNESS
- This is the first study of the impact of different features on taxonomy induction for terms at different abstraction levels
- Contextual, co-occurrence, lexical-syntactic pattern, and syntactic dependency features work well for concrete terms
- Only co-occurrence works well for abstract terms

C ONCLUSIONS  This paper presents a novel metric-based taxonomy induction framework, which  Combines strengths of pattern-based and clustering-based approaches  Achieves better F1 than 3 state-of-the-art systems  The first study on the impact of using different features on taxonomy induction for different types of relations and for terms at different abstraction levels

C ONCLUSIONS  This work is a general framework, which  Allows a wider range of features  Allows different metric functions at different abstraction levels  This work has a potential to learn more complex taxonomies than previous approaches



FORMAL FORMULATION OF TAXONOMY INDUCTION
- The task of taxonomy induction: construct a full ontology T, given a set of concepts C and an initial partial ontology T0
  - Keep adding concepts in C into T0 (note that T0 could be empty)
  - Stop when a full ontology is formed
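The incremental formulation is a simple loop. In this Python sketch, `choose_placement` is a hypothetical callback that stands in for the optimization step, which picks the next concept and its parent. `first_alphabetical` is a toy rule used only to exercise the loop.

```python
def induce(concepts, initial, choose_placement):
    """Grow an initial partial ontology T0 until every concept in C is placed.

    initial: dict term -> parent for T0 (may be empty; the root maps to None).
    choose_placement: hypothetical callback (taxonomy, remaining) -> (term, parent).
    """
    taxonomy = dict(initial)
    remaining = set(concepts) - set(taxonomy)
    while remaining:
        term, parent = choose_placement(taxonomy, remaining)
        if term not in remaining:
            raise ValueError(f"{term!r} is not an unplaced concept")
        if parent is not None and parent not in taxonomy:
            raise ValueError(f"parent {parent!r} is not in the taxonomy yet")
        taxonomy[term] = parent
        remaining.remove(term)
    return taxonomy


def first_alphabetical(taxonomy, remaining):
    """Toy placement rule: next term alphabetically, attached under the root."""
    root = next((t for t, p in taxonomy.items() if p is None), None)
    return min(remaining), root
```

When T0 is empty, the first placed concept becomes the root, because no parent exists yet.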

GOAL OF TAXONOMY INDUCTION
- Find the optimal full ontology such that the information change since T0 is the least
- This follows from the Minimum Evolution Assumption

G ET TO THE G OAL  Goal: Since the optimal set of concepts is always C Concepts are added incrementally

GET TO THE GOAL
- Plug in the definition of information change
- Transform into a minimization problem
- Result: the Minimum Evolution objective function
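Write Info(T) for the amount of information in taxonomy T, and add one concept c_z per step. One way to express the minimum-evolution step is shown below. The absolute-difference form of the information change is our assumption.

```latex
\Delta\mathrm{Info}(T_{n+1}, T_n) = \bigl|\mathrm{Info}(T_{n+1}) - \mathrm{Info}(T_n)\bigr|,
\qquad T_{n+1} = T_n \cup \{c_z\}

\mathrm{obj}_{ME}(c_z) = \Delta\mathrm{Info}(T_n \cup \{c_z\}, T_n),
\qquad c_z^{*} = \arg\min_{c_z \in C \setminus T_n} \mathrm{obj}_{ME}(c_z)
```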

EXPLICITLY MODEL ABSTRACTNESS
- Model abstractness for each level by a least-squares fit
- Plug in the definition of the amount of information for an abstraction level
- Result: the Abstractness objective function
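Here is a hedged reconstruction of the abstractness objective. It assumes that each level i gets its own linear metric, with weights w_i fitted by least squares on training pairs from that level. In the formulas, h is the vector of feature scores, L_i is the set of terms at level i, and d gives the training distances. It also assumes that a level's information is the sum of its pairwise predicted distances.

```latex
\mathbf{w}_i = \arg\min_{\mathbf{w}} \sum_{(c_x, c_y) \in L_i}
  \bigl(d(c_x, c_y) - \mathbf{w}^{\top}\mathbf{h}(c_x, c_y)\bigr)^2

\mathrm{Info}_i(L) = \sum_{x < y;\; c_x, c_y \in L} \mathbf{w}_i^{\top}\mathbf{h}(c_x, c_y)

\mathrm{obj}_{ABS}(c_z) = \bigl|\mathrm{Info}_i(L_i \cup \{c_z\}) - \mathrm{Info}_i(L_i)\bigr|
```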


MORE DEFINITIONS
Information in a Taxonomy T
[Figure: taxonomy containing ball and table, with edge distances 1.5, 2, and 1; example term-pair distances d = 2, d = 1, and d = 4.5]
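Suppose the information in a taxonomy is taken to be the sum of the ontology-metric distances over all term pairs, an assumption consistent with the distances shown in the figure. It can then be written as follows, where d(c_x, c_y) is the sum of edge distances along the path between c_x and c_y:

```latex
\mathrm{Info}(T) = \sum_{x < y;\; c_x, c_y \in T} d(c_x, c_y)
```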

MORE DEFINITIONS
Information in a Level L
[Figure: terms at one level, including ball; example term-pair distances d = 2, d = 1, and d = 1]

EXAMPLES OF FEATURES
- Contextual features
  - Global Context KL-Divergence = KL-Divergence(1000 Google documents for Cx, 1000 Google documents for Cy)
  - Local Context KL-Divergence = KL-Divergence(left two and right two words for Cx, left two and right two words for Cy)
- Term co-occurrence
  - Point-wise Mutual Information (PMI), where the counts are the # of sentences containing the term(s), the # of documents containing the term(s), or n as in "Results 1-10 of about n for …" in Google
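Minimal Python versions of the contextual and co-occurrence features follow. They assume whitespace-tokenized text, single-word terms, and add-eps smoothing for the KL divergence (the smoothing scheme and `eps` value are our choices). Fetching the Google documents is out of scope.

```python
import math
from collections import Counter


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information log(P(x, y) / (P(x) P(y))) from raw counts.

    Counts can be sentences, documents, or search-engine hit counts; total is the
    size of the collection they were counted over.
    """
    if min(count_xy, count_x, count_y) == 0:
        return float("-inf")  # no co-occurrence evidence
    return math.log(count_xy * total / (count_x * count_y))


def local_context(tokens, term, k=2):
    """The k words to the left and right of every occurrence of a one-word term."""
    context = []
    for i, token in enumerate(tokens):
        if token == term:
            context.extend(tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k])
    return context


def kl_divergence(tokens_p, tokens_q, eps=1e-6):
    """KL(P || Q) between add-eps smoothed unigram distributions of two token lists."""
    cp, cq = Counter(tokens_p), Counter(tokens_q)
    vocab = set(cp) | set(cq)
    zp = sum(cp.values()) + eps * len(vocab)
    zq = sum(cq.values()) + eps * len(vocab)
    kl = 0.0
    for word in vocab:
        p = (cp[word] + eps) / zp
        q = (cq[word] + eps) / zq
        kl += p * math.log(p / q)
    return kl
```

The local-context feature compares `kl_divergence(local_context(doc, cx), local_context(doc, cy))`. The global variant passes the full token lists of each term's document set.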

- Syntactic dependency features
  - Minipar Syntactic Distance = average length of syntactic paths in syntactic parse trees for sentences containing the terms
  - Modifier Overlap = # of overlaps between modifiers of the terms; e.g., red apple, red pear
  - Object Overlap = # of overlaps between objects of the terms when the terms are subjects; e.g., a dog eats an apple; a cat eats an apple
  - Subject Overlap = # of overlaps between subjects of the terms when the terms are objects; e.g., a dog eats an apple; a dog eats a pear
  - Verb Overlap = # of overlaps between verbs of the terms when the terms are subjects/objects; e.g., a dog eats an apple; a cat eats a pear
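Given relations already extracted by a dependency parser (the slides use Minipar), the overlap features reduce to set intersections. The tuple formats below are hypothetical: (modifier, noun) pairs and (subject, verb, object) triples.

```python
def modifier_overlap(x, y, modifier_pairs):
    """Shared modifiers; modifier_pairs holds (modifier, noun) pairs such as ("red", "apple")."""
    def mods(term):
        return {m for m, noun in modifier_pairs if noun == term}
    return len(mods(x) & mods(y))


def object_overlap(x, y, svo):
    """Shared objects when x and y appear as subjects; svo holds (subject, verb, object)."""
    def objects(term):
        return {o for s, _, o in svo if s == term}
    return len(objects(x) & objects(y))


def subject_overlap(x, y, svo):
    """Shared subjects when x and y appear as objects."""
    def subjects(term):
        return {s for s, _, o in svo if o == term}
    return len(subjects(x) & subjects(y))


def verb_overlap(x, y, svo):
    """Shared verbs when x and y appear as either subject or object."""
    def verbs(term):
        return {v for s, v, o in svo if term in (s, o)}
    return len(verbs(x) & verbs(y))
```

On the slide's examples, each overlap is 1. For instance, "a dog eats an apple; a cat eats an apple" gives an object overlap of 1 between dog and cat.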

EXAMPLES OF FEATURES  Lexical-Syntactic Patterns

EXAMPLES OF FEATURES  Miscellaneous Features  Definition Overlap = # of non-stopword overlaps between definitions of two terms.  Word Length Difference