AUTOMATIC GLOSS FINDING for a Knowledge Base using Ontological Constraints Bhavana Dalvi (PhD Student, LTI) Work done with: Prof. William Cohen, CMU; Prof. Einat Minkov, University of Haifa; Prof. Partha Talukdar, IISc Bangalore 1

Motivation 2

Need for gloss finding
• KBs are useful for many NLP tasks, e.g. question answering
• There is a lot of research on fact extraction to populate KBs
• Glosses can help further in applications such as:
  - Word/entity sense disambiguation
  - Information retrieval
• Automatically constructed KBs, e.g. NELL and YAGO, lack glosses 3

Example: Gloss finding
Class constraints:
• Inclusion: Every entity that is of type "Fruit" is also of type "Food".
• Mutual Exclusion: If an entity is of type "Food", then it cannot be of type "Organization". 4

Example: Gloss finding 5

6

Knowledge bases: NELL / Freebase / YAGO
Candidate glosses: DBpedia abstracts / Wiktionary definitions 7

Gloss Finding 8

Problem Definition

Input                                          | Example
---------------------------------------------- | -------------------------------------------
KB classes                                     | Food, Fruits, Company, ...
Ontological constraints                        | Subset, Mutex
Entities 'E' belonging to KB categories        | Banana, Microsoft
Lexical strings 'L' that refer to entities 'E' | 'MS', 'microsoft inc'
Candidate glosses                              | G3: "Apple, formerly Apple Computer Inc., is an American multinational corporation headquartered in Cupertino ..."

Output: matching of candidate glosses to entities in the KB
E.g. (Apple, G3) → Company:Apple 9

Can we use existing techniques?
• Problem: match potential glosses to appropriate entities in the KB.
• Entity linking: assumes glosses already exist on the KB side
  - Our input KB does not have glosses: a chicken-and-egg problem
• Ontology alignment: assumes both ends being matched are structured databases
  - Our problem is asymmetric: a structured KB without glosses on one side; candidate glosses that contain text but no structure on the other 10

Proposed Gloss Finding Procedure
• Decide the head-NP for a gloss: the NP being defined
  G3: Apple, formerly Apple Computer Inc., is an American multinational corporation headquartered in Cupertino ...
• Select candidate glosses whose head-NP string-matches a KB entity
• For each gloss, this yields a set of candidate KB entities
  (Apple, G3) → (Fruit:Apple, Company:Apple)
• Classify the head-NP into KB classes using ontological constraints
  (Apple, G3) → Company
• Choose the KB entity match based on the chosen KB category
  (Apple, G3) → Company:Apple 11
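The candidate-matching steps above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the head-NP heuristic is a crude stand-in for real head-NP detection, and the KB entities and gloss are made-up examples.

```python
# Sketch of candidate matching on a toy KB. head_np is a crude
# stand-in for the paper's actual head-NP detection.

def head_np(gloss):
    """Take the text before the first comma / ' is ' / ' was ' as the head NP."""
    for sep in (",", " is ", " was "):
        if sep in gloss:
            return gloss.split(sep)[0].strip()
    return gloss.split()[0]

def candidate_entities(gloss, kb_entities):
    """Return KB entities (written 'Category:Name') whose name matches the head NP."""
    name = head_np(gloss).lower()
    return [e for e in kb_entities if e.split(":")[1].lower() == name]

kb = ["Fruit:Apple", "Company:Apple", "Company:Microsoft"]
g3 = ("Apple, formerly Apple Computer Inc., is an American multinational "
      "corporation headquartered in Cupertino ...")
print(candidate_entities(g3, kb))  # → ['Fruit:Apple', 'Company:Apple']
```

Both senses of "Apple" survive the string match; disambiguating between them is exactly what the classification step is for.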

Building Classifiers 12

Training classifiers for KB Categories
Train: unambiguous glosses
Test: ambiguous glosses 13

Assumptions
• If a gloss has only one candidate entity matching in the KB, then that match is correct
  - i.e. we assume the KB is correct and complete in terms of senses
  - This assumption holds for 81% of the NELL dataset
• Given the category, a mention is unambiguous [Suchanek WWW'07, Nakashole ACL'13]
  - i.e. we can differentiate between entities of different categories, but not within a category 14

Methods
• Baselines
  - SVM learning: train binary classifiers using unambiguous glosses, then predict categories for ambiguous glosses
  - Label propagation: PIDGIN [Wijaya et al. CIKM'13], a graph-based label propagation method
• GLOFIN: semi-supervised EM with ontological constraints 15

Proposed Method: GLOFIN
Initialize the model with a few seeds per class
Iterate until convergence (of data likelihood):
  E step: predict labels for unlabeled points
    For each unlabeled datapoint:
    • Find P(class | datapoint) for all classes
    • Assign a consistent bit vector of labels in accordance with the ontological constraints
  M step: recompute model parameters using seeds + predicted labels for unlabeled points 16
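The EM loop can be sketched with a multinomial Naive Bayes model (one of the GLOFIN variants). Everything below is toy data, and the E step here uses a plain argmax where GLOFIN actually assigns a constraint-consistent bit vector via a mixed integer program.

```python
# Minimal GLOFIN-style EM sketch with Naive Bayes. Toy data only;
# the argmax in the E step is a stand-in for the constrained assignment.
from collections import Counter
import math

def train_nb(labeled):
    """M step: per-class word counts for a multinomial Naive Bayes model."""
    model = {}
    for words, label in labeled:
        model.setdefault(label, Counter()).update(words)
    return model

def log_posterior(model, words, vocab_size):
    """Unnormalized log P(class | datapoint), with add-one smoothing."""
    scores = {}
    for c, counts in model.items():
        total = sum(counts.values())
        scores[c] = sum(math.log((counts[w] + 1) / (total + vocab_size))
                        for w in words)
    return scores

# Seeds stand in for unambiguous glosses, unlabeled points for ambiguous ones.
seeds = [(["fruit", "tree", "sweet"], "Fruit"),
         (["software", "company", "shares"], "Company")]
unlabeled = [["company", "shares", "sweet"], ["tree", "fruit"]]
vocab = {w for ws, _ in seeds for w in ws} | {w for x in unlabeled for w in x}

model = train_nb(seeds)
for _ in range(5):  # in practice, iterate until the data likelihood converges
    # E step: label each unlabeled point (GLOFIN: consistent bit vector via MIP)
    predicted = []
    for x in unlabeled:
        scores = log_posterior(model, x, len(vocab))
        predicted.append((x, max(scores, key=scores.get)))
    # M step: refit on seeds + current predictions
    model = train_nb(seeds + predicted)
```

The refit in the M step lets evidence from confidently labeled ambiguous glosses sharpen the class models on later iterations.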

Estimating class parameters and assignment probabilities
• Naive Bayes: independent multinomial distributions per word
• K-Means: cosine similarity between centroid and datapoint
• von Mises-Fisher: data distributed on a unit hypersphere 18
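For intuition, the two geometric choices can be sketched as scoring functions on sparse bag-of-words vectors (the Naive Bayes choice is the usual per-class multinomial log-likelihood). The code and the kappa value are illustrative assumptions, not values from the paper.

```python
# Illustrative scoring functions for the K-Means and von Mises-Fisher
# variants; vectors are sparse dicts of word weights.
import math

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def kmeans_score(centroid, x):
    # K-Means variant: similarity of a datapoint to the class centroid
    return cosine(centroid, x)

def vmf_log_density(mu, x, kappa=10.0):
    # von Mises-Fisher variant: for unit vectors the log-density is
    # kappa * (mu . x) plus a normalizer that is constant across classes
    return kappa * cosine(mu, x)

centroid_fruit = {"fruit": 2.0, "tree": 1.0}
centroid_company = {"company": 2.0, "shares": 1.0}
x = {"fruit": 1.0, "sweet": 1.0}
print(kmeans_score(centroid_fruit, x) > kmeans_score(centroid_company, x))  # → True
```

Note that with unit-normalized data, the vMF score is a monotone function of the K-Means cosine score; the two variants differ in how soft assignments and concentrations enter the EM updates.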

Mixed Integer Linear Program
Input: P(C_j | X_i), class constraints (Subset, Mutex)
Output: consistent bit vector y_ji for each X_i
Objective: max { likelihood of assignment - constraint violation penalty } 20
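For a small ontology, the same choice the MIP makes can be shown by brute force over label sets. This sketch uses hard constraints, whereas the paper's program trades likelihood against a soft violation penalty; the classes, constraints, and scores below are toy examples.

```python
# Brute-force version of the constrained assignment: among label sets
# consistent with the subset/mutex constraints, pick the highest scoring.
from itertools import combinations

classes = ["Food", "Fruit", "Company"]
subset = [("Fruit", "Food")]    # subset constraint: every Fruit is a Food
mutex = [("Food", "Company")]   # mutual exclusion: Food and Company are disjoint

def consistent(labels):
    """Check a label set against the subset and mutex constraints."""
    labels = set(labels)
    ok_subset = all(parent in labels for child, parent in subset if child in labels)
    ok_mutex = all(not (a in labels and b in labels) for a, b in mutex)
    return ok_subset and ok_mutex

def best_assignment(log_probs):
    """Among constraint-consistent label sets, pick the highest-scoring one."""
    best, best_score = set(), float("-inf")
    for r in range(1, len(classes) + 1):
        for labels in combinations(classes, r):
            if consistent(labels):
                score = sum(log_probs[c] for c in labels)
                if score > best_score:
                    best, best_score = set(labels), score
    return best

# A gloss like G3 ("Apple, formerly Apple Computer Inc., ...") scores
# high on Company, so the consistent assignment is {Company}:
print(best_assignment({"Food": -3.0, "Fruit": -4.0, "Company": -0.5}))  # → {'Company'}
```

The MIP formulation is what makes this tractable for real ontologies, where enumerating all label subsets is infeasible.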

Experiments 22

Candidate glosses
• DBpedia is a database derived from Wikipedia
• We use short abstracts (definitions up to 500 characters, from the Wikipedia page)
• E.g. McGill University is a research university located in Montreal Quebec Canada Founded in 1821 during the British colonial era the university bears the name of James McGill a prominent Montreal merchant from Glasgow Scotland and alumnus of Glasgow University whose bequest formed the beginning of the university. 23

Knowledge bases 24

GLOFIN vs. SVM & Label propagation Freebase Dataset: Performance on ambiguous glosses 25

GLOFIN vs. SVM & Label propagation NELL Dataset: Performance on ambiguous glosses 26

Are the datasets close to the real world?
• A large fraction of the data is used for training: 80% of NELL, 90% of Freebase
• In real-world scenarios, the amount of training data might be a small fraction of the dataset
• We simulate this by using 10% of the unambiguous glosses for training 27

Small amount of training data Freebase Dataset: Performance on ambiguous glosses 28

Small amount of training data NELL Dataset: Performance on ambiguous glosses 29

Compare variants of GLOFIN Freebase Dataset 30

Compare variants of GLOFIN NELL Dataset 31

Some more experiments ... 32
• Evaluating the quality of automatically acquired seeds
• Manually creating a gold standard for the NELL dataset
• Different ways of scaling GLOFIN
• NELL-to-Freebase mappings via common glosses

Conclusions and Future Work ... 33

Conclusions
• A completely unsupervised method for gloss finding
  - using unambiguous matches as training data
  - hierarchical classification instead of entity linking
• Our proposed method GLOFIN: GLOFIN ≥ Label Propagation ≥ SVM
• Variants of hierarchical GLOFIN: Naive Bayes ≥ K-Means, von Mises-Fisher
• Ontological constraints help for all GLOFIN variants: hierarchical GLOFIN ≥ flat GLOFIN
• In future work, we would like to add new entities to the KB. 34

head-NP | Gloss | Candidate NELL entities | Entity selected by GLOFIN
McGill_University | McGill University is a research university located in Montreal Quebec Canada Founded in 1821 during the British colonial era the university bears the name of James McGill a prominent ... | University:E, Sports_team:E | University:E
Kingston_upon_Hull | Kingston upon Hull frequently referred to as Hull is a city and unitary authority area in the ceremonial county of the East Riding of Yorkshire England It stands on the River Hull at its junction with ... | City:E, Visual_Artist:E | City:E
Robert_Southey | Robert Southey was an English poet of the Romantic school one of the so called Lake Poets and Poet Laureate for 30 years from 1813 to his death in 1843 Although his fame has been long eclipsed by that ... | Person_Europe:E, Person_Africa:E, Politician_USA:E | Person_Europe:E
35

Thank You Questions? 36

Extra Slides 37

Comparison of GLOFIN Approximations 38

Eval: quality of seeds for the NELL KB
• Noisy seeds: only 81% of leaf category assignments are correct
• Hierarchical labeling can help: 94% of higher-level category labels are correct 39

Creating a gold standard for NELL
• A gold standard for evaluation on ambiguous glosses
• For most glosses, the precise category is part of NELL 40

NELL – Freebase mappings via common glosses 41

Pros and Cons of GLOFIN

Advantages:
• Generative EM framework that can build on SSL methods: Naive Bayes, K-Means, vMF
• Can label unseen datapoints once models are learnt

Limitations:
• Assumption: the input KB is complete and accurate
• All experiments are done in a transductive setting: need to extend for missing entities and categories in the KB 42

Future work ...
• Adding new entities to existing KB categories
  - KBs are usually incomplete w.r.t. coverage of entities
  - GLOFIN classifies mentions into categories
• Introducing new clusters of entities: missing categories in the KB
  - Extensions similar to Exploratory EM [Dalvi et al. ECML'13]
  - New categories: entities belonging to them, along with glosses for those entities 43