Concept Hierarchy Induction by Philipp Cimiano


Objective Structure information into categories Provide a level of generalization to define relationships between data Application: Backbone of any ontology

Overview Different approaches to acquiring concept hierarchies from text corpora. Various clustering techniques. Evaluation. Related work. Conclusion.

Machine Readable Dictionaries Entries such as 'a tiger is a mammal' or 'mammals such as tigers, lions or elephants' are highly regular; this regularity of dictionary entries can be exploited: the head of the first NP in the definition is taken as the hypernym.
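
As a rough illustration (not the authors' implementation), the head-of-first-NP heuristic can be sketched with a few regular expressions; `hypernym_from_gloss` and the example glosses are invented here:

```python
import re

# Sketch of the MRD heuristic: take the head of the first NP of a
# dictionary gloss as the hypernym candidate.
def hypernym_from_gloss(term, gloss):
    # Drop a leading copula frame such as "a tiger is ...", if present.
    m = re.match(r"(?:an?|the)?\s*" + re.escape(term) + r"\s+is\s+", gloss, re.I)
    rest = gloss[m.end():] if m else gloss
    # Strip a leading determiner, then cut the first NP at a preposition,
    # relative clause, or punctuation; its last word is the head.
    rest = re.sub(r"^(?:an?|the)\s+", "", rest.strip(), flags=re.I)
    np = re.split(r"\s+(?:of|that|which|with|in|for)\s+|[,.]", rest)[0]
    return np.split()[-1] if np.split() else None

print(hypernym_from_gloss("tiger", "a tiger is a large striped mammal"))
# -> mammal (a valid hypernym)
print(hypernym_from_gloss("corolla", "the part of a flower formed by the petals"))
# -> part (an invalid hypernym: the exception case)
```

The second call shows why the heuristic needs exception handling: for glosses like 'the part of a flower ...' the first NP head encodes a part-of relation, not is-a.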

Example

Exception

Exception
is-a(corolla, part): NOT VALID
is-a(republican, member): NOT VALID
is-a(corolla, flower): NOT VALID
is-a(republican, political party): NOT VALID

Exception

Alshawi's solution

Results using MRDs Dolan et al.: 87% of the extracted hypernym relations are correct. Calzolari cites a precision of > 90%. Alshawi: a precision of 77%.

Strengths And Weaknesses Strengths: correct, explicit knowledge; a robust basis for ontology learning. Weakness: dictionaries are domain-independent, so domain-specific vocabulary is missing.

Lexico-Syntactic Patterns Task: automatically learning hyponym relations from corpora. 'Such injuries as bruises, wounds and broken bones' yields: hyponym(bruise, injury), hyponym(wound, injury), hyponym(broken bone, injury).

Hearst patterns 'Such injuries as bruises, wounds and broken bones'
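
A toy matcher for this one Hearst pattern ('such NP as NP, NP and NP') might look like the sketch below; the `singular` helper is a crude stand-in for real lemmatization, and both function names are invented:

```python
import re

# Very rough singularization, sufficient for the toy example only.
def singular(w):
    if w.endswith("ies"):
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w

# Match "such NP as NP (, NP)* (and|or NP)" and emit (hyponym, hypernym) pairs.
def such_as_hyponyms(phrase):
    m = re.search(r"such ([a-z ]+?) as (.+)", phrase, re.I)
    if not m:
        return []
    hyper = singular(m.group(1).strip())
    items = [p.strip() for p in re.split(r",| and | or ", m.group(2)) if p.strip()]
    return [(singular(i), hyper) for i in items]

pairs = such_as_hyponyms("such injuries as bruises, wounds and broken bones")
# -> [('bruise', 'injury'), ('wound', 'injury'), ('broken bone', 'injury')]
```

A real system would operate on NP-chunked text rather than raw strings, since the regular expression cannot tell where the enumeration of noun phrases ends.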

Requirements Occur frequently in many text genres. Accurately indicate the relation of interest. Be recognizable with little or no pre-encoded knowledge

Strengths And Weaknesses Strengths: the patterns are identified easily and are accurate. Weaknesses: the patterns occur rarely, and many is-a relations are never expressed in a Hearst-style pattern.

Distribution Similarity 'you shall know a word by the company it keeps’ [Firth, 1957]. semantic similarity of words – similarity of the contexts.
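
The distributional hypothesis is typically operationalized as the cosine between context-count vectors. A minimal sketch, with invented toy context counts:

```python
import math
from collections import Counter

# Cosine similarity between two sparse context-count vectors.
def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Toy context counts: words co-occurring with each term.
hotel = Counter({"book": 4, "stay": 3, "room": 5})
inn = Counter({"book": 2, "stay": 4, "room": 3, "meal": 1})
car = Counter({"drive": 5, "park": 3, "rent": 2})

sim_hotel_inn = cosine(hotel, inn)  # high: shared contexts
sim_hotel_car = cosine(hotel, car)  # 0.0: no shared contexts
```

Terms that keep the same company ('hotel', 'inn') score high; terms with disjoint contexts score zero, which is exactly the signal clustering builds on.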

Using distribution similarity

Strengths And Weaknesses Strength: produces a reasonable concept hierarchy. Weaknesses: the cluster tree lacks a clear and formal interpretation; it does not provide any intensional description of concepts; similarities may be accidental (sparse data).

Formal Concept Analysis (FCA)

FCA output

Similarity measures

Smoothing

Evaluation Semantic cotopy (SC). Taxonomy overlap (TO)
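
As a hedged sketch (the precise definitions are in the underlying papers), the semantic cotopy of a concept c is the set of all its super- and subconcepts, including c itself; taxonomy overlap then compares cotopies across the learned and reference hierarchies. The toy taxonomy and Jaccard-style overlap below are illustrative only:

```python
# Toy child -> parent map; None marks the root.
parent = {"dog": "mammal", "cat": "mammal", "mammal": "animal", "animal": None}

def ancestors(c):
    out = []
    while parent[c] is not None:
        c = parent[c]
        out.append(c)
    return out

# Semantic cotopy: the concept plus all its super- and subconcepts.
def cotopy(c):
    descendants = {x for x in parent if c in ancestors(x)}
    return {c} | set(ancestors(c)) | descendants

# One simple way to compare two cotopies: Jaccard-style overlap.
def overlap(a, b):
    return len(a & b) / len(a | b)
```

For example, cotopy("mammal") is {mammal, animal, dog, cat}, and the overlap between the cotopies of "dog" and "cat" is 0.5 in this toy taxonomy.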

Evaluation Measure

100% Precision Recall

Low Recall

Low Precision

Results

Results

Results

Results

Strengths And Weaknesses Strengths: FCA generates formal concepts and provides an intensional description. Weaknesses: the size of the lattice can get exponential in the size of the context; spurious clusters; finding appropriate labels for the clusters.

Problems with Unsupervised Approaches to Clustering Data sparseness leads to spurious syntactic similarities. Produced clusters cannot be appropriately labeled. Unsupervised approaches are dependent upon calculating the similarity of words on the basis of linguistic context.

Guided Clustering Hypernyms are directly used to guide agglomerative clustering. Sources: WordNet and Hearst patterns (an approach matching lexico-syntactic patterns to find hypernyms). Two terms are only clustered if there is a corresponding common hypernym according to an oracle.

Similarity Computation Similarity is calculated by taking the cosine between the corresponding context vectors of two terms.

Similarity Computation Ten most similar terms of the tourism reference taxonomy

The Hypernym Oracle Three sources: WordNet, Hearst patterns matched in a corpus, and Hearst patterns matched on the World Wide Web. The oracle records hypernyms together with the amount of evidence found in support of each hypernym.

WordNet Collect hypernyms found in any synset dominating a synset that contains the term t. Include the number of times the hypernym appears in a dominating synset.

Hearst Patterns (Corpus) NP = noun phrase Record number of isa-relations found between two terms

Hearst Patterns (WWW) Download 100 Google abstracts for each concept and clue. Use the clues to form Google queries and to search the resulting abstracts for terms that have isa-relationships with the concept. Again, record the number of relationships found between the two terms (term and concept).

Evidence Total Evidence for Hypernyms: time: 4 vacation: 2 period: 2

Clustering Algorithm Input a list of terms Calculate the similarity between each pair of terms and sort from highest to lowest For each potential pair to be clustered consult the oracle.
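
The loop described above can be sketched as follows. This collapses the four oracle cases into their common core; `sim` and `oracle` are toy stand-ins, and all names are invented for illustration:

```python
from itertools import combinations

# Hypernym-guided clustering sketch: visit pairs in order of decreasing
# similarity and only cluster them when the oracle supplies a common hypernym.
def guided_clustering(terms, sim, oracle):
    taxonomy = {}                       # term -> its direct hypernym
    leftovers = []
    pairs = sorted(combinations(terms, 2), key=lambda p: sim(*p), reverse=True)
    for t1, t2 in pairs:
        common = oracle(t1, t2)         # best-evidence common hypernym, or None
        if common is None:
            leftovers.append((t1, t2))  # set aside for further processing
            continue
        taxonomy.setdefault(t1, common)
        taxonomy.setdefault(t2, common)
    return taxonomy, leftovers

# Toy similarity (character overlap) and a hard-coded toy oracle.
sim = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))
oracle = lambda a, b: "accommodation" if {a, b} <= {"hotel", "inn", "hostel"} else None
tax, rest = guided_clustering(["hotel", "inn", "hostel", "excursion"], sim, oracle)
```

Here all three lodging terms end up under "accommodation", while every pair involving "excursion" is set aside, mirroring case 4 of the oracle consultation.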

Consulting the Oracle case 1 If term 1 is a hypernym of term 2 or vice-versa: Create appropriate subconcept relationship.

Consulting the Oracle case 2 Find the common hypernym h for both terms with the greatest evidence. If one term has already been classified as t', three subcases apply: t' = h; h is a hypernym of t'; or t' is a hypernym of h.

Consulting the Oracle case 3 Neither term has been classified: Each term becomes a subconcept of the common hypernym.

Consulting the Oracle case 4 The terms do not share a common hypernym: Set aside the terms for further processing.

r-matches For all unprocessed terms (those with no similar terms), check for r-matches: t1 r-matches t2 if t2 literally contains the string t1 (e.g. 'credit card' matches 'international credit card').
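
The r-match test is simple string containment per the slide; the `head_isa` helper below is a hypothetical illustration of deriving isa pairs from it (the same idea reappears later as the head heuristic):

```python
# t1 r-matches t2 when t2 literally contains the string t1 (and differs from it).
def r_matches(t1, t2):
    return t1 != t2 and t1 in t2

# Hypothetical helper: derive isa(t2, t1) pairs from r-matches over a term list.
def head_isa(terms):
    return [(t2, t1) for t1 in terms for t2 in terms if r_matches(t1, t2)]

pairs = head_isa(["credit card", "international credit card"])
# -> [('international credit card', 'credit card')]
```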

Further Processing These steps are run on the pairs that were set aside. If either term in a pair is already classified as t', the other term is classified under t' as well. Otherwise, place both terms under the hypernym of either term with the most evidence. Any unclassified terms are added under the root concept.

Example

Evaluation Taxonomic overlap (TO) and sibling overlap (SO), which measures the quality of clusters. Both measures are compared against a handcrafted reference concept hierarchy. Leaf nodes are ignored; otherwise, hierarchies with every concept directly subordinate to the root node can be rated very highly.

Evaluation Tourism domain: Lonely Planet, Mecklenburg. Finance domain: Reuters-21578. Baseline: Caraballo's method, post-processing the hierarchy with hypernyms.

Tourism Results—TO These are F-measures. Caraballo's method is the gold standard: after agglomerative clustering, label each cluster with its most frequent hypernym (which must be a hypernym of at least two cluster members; otherwise the cluster is removed).

Finance Results—TO These are F-measures; same setup as above.

Tourism Results—SO These are F-measures; same setup as above.

Finance Results—SO These are F-measures; same setup as above.

Human Evaluation Scores: 3 = correct, 2 = almost correct, 1 = not completely wrong, 0 = wrong. # = number of non-root taxonomic relations.

Future Work Take word sense into consideration for the WordNet source.

Summary Hypernym-guided agglomerative clustering works well: it beats the gold standard (Caraballo's method), receives good human evaluation scores, provides labels for clusters, and avoids spurious similarities. It is also faster than plain agglomerative clustering, because similarities are calculated between single elements rather than between clusters.

Learning from Heterogeneous Sources of Evidence Many ways to learn concept hierarchies Can we combine different paradigms? Any manual attempt to combine strategies would be ad hoc Use supervised learning to combine techniques

Determining relationships with machine learning Example: Determine if a pair of words has an “isa” relationship

Feature 1: Matching patterns in a corpus Given two terms t1 and t2, record how many times a Hearst pattern indicating an isa-relation between t1 and t2 is matched in the corpus. Normalize by the maximum number of Hearst-pattern matches found for t1.
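
The normalization can be sketched in a few lines; the match counts and function name below are invented for illustration:

```python
# Toy match counts: how often a Hearst pattern asserted isa(t1, t2).
counts = {("conference", "event"): 6, ("conference", "meeting"): 3}

# Feature value: the count for (t1, t2), normalized by the maximum
# count observed for t1 over all second terms.
def hearst_feature(t1, t2):
    max_t1 = max((c for (a, _), c in counts.items() if a == t1), default=0)
    return counts.get((t1, t2), 0) / max_t1 if max_t1 else 0.0
```

With these toy counts, hearst_feature("conference", "event") is 1.0 and hearst_feature("conference", "meeting") is 0.5, so the feature always lies in [0, 1].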

Example This provided the best F-measure with a single-feature classifier

Feature 2: Matching patterns on the web Use the Google API to count the matches of a certain expression on the Web

Feature 3: Downloading webpages Allows for matching expressions with a more complex linguistic structure Assign functions to each of the Hearst patterns to be matched Use these “clues” to decide what pages to download Download 100 abstracts matching the query “such as conferences”

Example

Feature 4: WordNet – All senses Is there a hypernym relationship between t1 and t2? Can be more than one path from the synsets of t1 to the synsets of t2

Feature 5: WordNet – First sense Only consider the first sense of t1

Feature 6: “Head”- heuristic If t1 r-matches t2 we derive the relation isa(t2,t1) e.g. t1 = “conference” t2 = “international conference” isahead(“international conference”,”conference”)

Feature 7: Corpus-based subsumption t1 is a subclass of t2 if all the syntactic contexts in which t1 appears are also shared by t2

Feature 8: Document-based subsumption t1 is a subclass of term t2 if t2 appears in all documents in which t1 appears. The feature value is the ratio: (# of pages where t1 and t2 occur) / (# of pages where t1 occurs).
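
That ratio is straightforward to compute; the toy document collection below is invented:

```python
# Toy pages, each represented as the set of terms it contains.
docs = [
    {"apple", "fruit"},
    {"apple", "fruit", "juice"},
    {"fruit", "vegetable"},
]

# Fraction of t1's pages that also contain t2.
def doc_subsumption(t1, t2):
    t1_docs = [d for d in docs if t1 in d]
    both = [d for d in t1_docs if t2 in d]
    return len(both) / len(t1_docs) if t1_docs else 0.0
```

Here doc_subsumption("apple", "fruit") is 1.0 (every apple page mentions fruit) while the reverse direction is only 2/3, which is the asymmetry the subsumption feature exploits.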

Example

Naïve Threshold Classifier Used as a baseline. Classify an example as positive if the value of a given feature is above some threshold t. For each feature, the threshold is varied from 0 to 1 in steps of 0.01.
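
The threshold sweep can be sketched as below; the feature values and labels are invented toy data, and the function names are mine:

```python
# F-measure of boolean predictions against boolean labels.
def f_measure(preds, labels):
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

# Sweep t over [0, 1] in steps of 0.01 and keep the best-F threshold.
def best_threshold(values, labels):
    grid = [i / 100 for i in range(101)]
    return max(grid, key=lambda t: f_measure([v >= t for v in values], labels))

values = [0.9, 0.8, 0.4, 0.2, 0.1]
labels = [True, True, False, False, False]
t = best_threshold(values, labels)
```

On this toy data any threshold between 0.4 (exclusive) and 0.8 (inclusive) separates the classes perfectly, so the sweep finds an F-measure of 1.0.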

Baseline Measures

Evaluation Classifiers Naïve Bayes Decision Tree Perceptron Multi-layer perceptron

Evaluation Strategies Undersampling: remove a number of majority-class examples (non-isa examples). Oversampling: add additional examples to the minority class. Varying the classification threshold: try threshold values other than 0.5. Introducing a cost matrix: different penalties for different types of misclassification. One-class SVMs: only consider positive examples.
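
Undersampling, the first of these strategies, is simple to sketch; the toy dataset of (term-pair-id, is-isa) examples is invented:

```python
import random

# Drop majority-class (non-isa) examples until the classes are balanced.
def undersample(examples, seed=0):
    pos = [e for e in examples if e[1]]
    neg = [e for e in examples if not e[1]]
    rng = random.Random(seed)              # fixed seed for reproducibility
    return pos + rng.sample(neg, len(pos))

# 2 positive (isa) examples vs. 10 negative ones, mimicking the imbalance.
data = [("a", True), ("b", True)] + [(f"n{i}", False) for i in range(10)]
balanced = undersample(data)
```

Oversampling is the mirror image (duplicating minority examples), which is why both address the unbalanced dataset but trade off information loss against overfitting.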

Results

Results (cont.)

Discussion The best results achieved with the one-class SVM (F = 32.96%) More than 10 points above the baseline classifier average (F = 21.28%) and maximum (F = 21%) strategies More than 14 points better than the best single-feature classifier (F = 18.84%) using the isawww feature Second best results obtained with a Multilayer Perceptron using oversampling or undersampling

Discussion Gain insight from finding which features were most used by classifiers Used this information to modify features and rerun experiments

Summary Using different approaches is useful Machine learning approaches outperform naïve averaging Unbalanced character of the dataset poses a problem SVMs (which are not affected by the imbalance) produce the best results This approach can show which features are the most reliable as predictors

Related Work Taxonomy construction: lexico-syntactic patterns, clustering, and linguistic approaches. Taxonomy refinement. Taxonomy extension.

Lexico-syntactic patterns Hearst. Iwanska et al.: added extra patterns. Poesio et al.: anaphora resolution. Ahmad et al.: applying the patterns to specific domains. Etzioni et al.: patterns matched on the WWW. Cederberg and Widdows: precision improved with Latent Semantic Analysis. Others are working on learning patterns automatically.

Clustering Hindle: groups nouns semantically; derives verb-subject and verb-object dependencies from a 6-million-word sample of Associated Press news stories. Pereira et al.: top-down soft clustering algorithm with deterministic annealing; words can appear in different clusters (capturing multiple meanings of words). Caraballo: bottom-up clustering approach to build a hierarchy of nouns; uses conjunctive and appositive constructions for nouns derived from the Wall Street Journal corpus.

Clustering (cont.) The ASIUM System The Mo'K Workbench Grefenstette Gasperin et al. Reinberger et al. Lin et al. CobWeb Crouch et al. Haav Curran et al. Terascale Knowledge Acquisition

Linguistic Approaches Linguistic analysis is exploited more directly, rather than just for feature extraction. OntoLT: uses a shallow parser to label parts of speech and grammatical relations (e.g. HeadNounToClass-ModToSubClass, which maps a common noun to a concept or class). OntoLearn: analyzes multi-word terms compositionally with respect to an existing semantic resource (WordNet). Morin et al.: tackle the problem of projecting semantic relations between single terms to multiple terms (e.g. projecting the isa-relation between apple and fruit to an isa-relation between apple juice and fruit juice).

Linguistic Approaches Sanchez and Moreno – download first n hits for a search word and process the neighborhood linguistically to determine candidate modifiers for the search term Sabou - inducing concept hierarchies for the purpose of modeling web services (applies methods not to full text, but to Java-documentation of web services)

Taxonomy Refinement Hearst and Schutze, Widdows, Maedche, Pekar and Staab, Alfonseca et al.

Taxonomy Extension Agirre et al. Faatz and Steinmetz Turney

Conclusions Compared different hierarchical clustering approaches with respect to effectiveness, speed, and traceability. Set-theoretic approaches, such as FCA, can outperform similarity-based approaches.

Conclusions Presented an algorithm for clustering guided by a hypernym oracle. More efficient than agglomerative clustering.

Conclusions Used machine learning techniques to effectively combine different approaches for learning taxonomic relations from text. A learned model indeed outperforms all single approaches. It also outperforms naïve combinations of them.

Open Issues Which similarity or weighting measure should be chosen? (This depends largely on the dataset.) Which features should be considered to represent a certain term? Can features be aggregated (clustered themselves) to represent a term at a more abstract level? How should we model the polysemy of terms? Can we automatically induce lexico-syntactic patterns, ideally unsupervised? What other approaches are there for combining different paradigms, and how can we compare them? All of these issues are general to the problem of concept hierarchy induction.

Questions