A Graph-based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto.


A Graph-based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto
Nara Institute of Science and Technology
EMNLP-CoNLL 2007, June 2007, Prague, Czech Republic

2 Background
- Named Entity: proper nouns (e.g. Shinzo Abe (Person), Prague (Location)), time/date expressions (e.g. June 29 (Date)) and numerical expressions (e.g. 10%)
- Named Entities play an important role in many NLP applications (e.g. IE, QA)
- Named Entity Recognition (NER): treated as a sequential tagging problem; machine learning methods have been proposed, but recall is usually low
- A large-scale NE dictionary is useful for NER
=> Semi-automatic methods for compiling NE dictionaries are in demand

3 Resource for NE dictionary construction
- Wikipedia: a multilingual encyclopedia on the Web
- 382,613 gloss articles (Japanese version, as of June 20, 2007)
- Gloss indices consist of nouns or proper nouns
- Articles are HTML (semi-structured text): lists and tables can be used as clues for NE type categorization
- Linked articles are glossed by anchor texts within articles
- Each article has one or more categories
=> Wikipedia has useful information for NE categorization and can be considered a suitable resource

4 Objective
- Extract Named Entities by assigning proper NE labels to the gloss indices of Wikipedia
(figure: example indices labeled Person, Product, Location, Organization, Natural Object)

5 Use of Wikipedia features
- Anchors in an article refer to other related articles
- Anchors in list elements have dependencies on each other
=> We make 3 assumptions about dependencies between anchors
(example list structure: Burt Bacharach … composer; Dillard & Clark; Carpenters; Karen Carpenter, labeled PERSON, VOCATION, ORGANIZATION, ORGANIZATION, PERSON)
Assumption 1: the latter element in a list item tends to be in an attribute relation to the former element
Assumption 2: elements in the same itemization tend to be in the same NE category
Assumption 3: a nested element tends to be in a part-of relation to the upper element

6 Overview of our approach
- Focus on the HTML list structure in Wikipedia
- Make 3 assumptions about dependencies between anchors
- Formalize the NE categorization problem as labeling NE classes to anchors in lists
- Define 3 kinds of cliques (edges: Sibling, Cousin and Relative) between anchors, based on the 3 assumptions
- Construct graphs based on the 3 defined cliques
- CRFs for NE categorization in Wikipedia:
  - Define potential functions over the 3 edge types (and nodes) to provide a conditional distribution over the graphs
  - Estimate the MAP label assignment over the graphs using Conditional Random Fields

7 Conditional Random Fields (CRFs)
- Conditional Random Fields [Lafferty 2001]: discriminative, undirected models
- Define the conditional distribution p(y|x)
- Arbitrary features can be used
- Globally optimized over all possible label assignments
- Can handle label dependencies by defining potential functions over cliques (2 or more nodes)
(figure: output nodes y1, y2, y3, …, yn conditioned on the input x)
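Written out, the conditional distribution on this slide takes the standard form from Lafferty et al. (2001): it factorizes into a product of potential functions over the cliques of the undirected graph, normalized per input:

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})} \prod_{c \in \mathcal{C}} \Psi_c(\mathbf{y}_c, \mathbf{x}),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \prod_{c \in \mathcal{C}} \Psi_c(\mathbf{y}'_c, \mathbf{x})
```

Here \(\mathcal{C}\) is the set of cliques and \(Z(\mathbf{x})\) sums over all possible label assignments, which is what allows global optimization over the whole graph rather than per-node decisions.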

8 Use of dependencies for categorization
- The NE categorization problem is cast as labeling classes to anchors
- Each edge of the constructed graphs corresponds to a particular dependency
- Estimate the MAP label assignment over the constructed graphs using Conditional Random Fields
- Our formulation can also extract anchors that have no gloss article
(example: Dillard & Clark .. country rock; Carpenters; Karen Carpenter, where some anchors have a gloss article and others do not)

9 Clique definition based on the HTML tree structure
(figure: an HTML list containing Dillard & Clark … country rock, Carpenters, Karen Carpenter, with Sibling, Cousin and Relative edges)
- Sibling: the latter element tends to be an attribute or a concept of the former element
- Cousin: the elements tend to have a common NE category (e.g. ORGANIZATION)
- Relative: the latter element tends to be a constituent part of the former element
=> Use these 3 relations as the cliques of the CRFs
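These three relations can be read off a parsed list mechanically. Below is a minimal sketch (not the authors' code): each list item is represented as a hypothetical pair (anchors, sub_items), where anchors are the anchor texts inside the item in document order and sub_items are the items of its nested list, and the three edge sets are collected according to the definitions above.

```python
def extract_cliques(items):
    """Collect Sibling, Cousin and Relative edges from a nested list.

    items: list of (anchors, sub_items) pairs, where anchors is a list of
    anchor texts in one list item and sub_items has the same shape.
    """
    sibling, cousin, relative = [], [], []
    # Cousin: heads of items in the same itemization, linked pairwise
    heads = [anchors[0] for anchors, _ in items if anchors]
    cousin.extend(zip(heads, heads[1:]))
    for anchors, sub_items in items:
        # Sibling: consecutive anchors inside a single list item
        sibling.extend(zip(anchors, anchors[1:]))
        # Relative: item head linked to the head of each nested item
        for sub_anchors, _ in sub_items:
            if anchors and sub_anchors:
                relative.append((anchors[0], sub_anchors[0]))
        # Recurse into the nested itemization
        s, c, r = extract_cliques(sub_items)
        sibling += s
        cousin += c
        relative += r
    return sibling, cousin, relative

# The example list from the slides
items = [
    (["Burt Bacharach", "composer"], []),
    (["Dillard & Clark"], []),
    (["Carpenters"], [(["Karen Carpenter"], [])]),
]
sibling, cousin, relative = extract_cliques(items)
```

On this example, the sketch yields the Sibling edge (Burt Bacharach, composer), Cousin edges between adjacent item heads, and the Relative edge (Carpenters, Karen Carpenter), matching the three assumptions.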

10 A graph constructed from the 3 clique definitions
(figure: a list containing Burt Bacharach … "On my own" … 1986; Dillard & Clark; Gene Clark; Carpenters … "As Time Goes By" … 2000; Karen Carpenter, connected by edges labeled S: Sibling, C: Cousin, R: Relative)
- Estimate the MAP label assignment over the graph

11 Model
- A potential function for nodes, and potential functions for the Sibling, Cousin and Relative cliques
- The constructed graphs include cycles, so exact inference is computationally expensive
=> Introduce Tree-based Reparameterization (TRP) [Wainwright 2003] for approximate inference
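The model on this slide is a pairwise CRF over the three edge sets. A plausible formulation, with a node potential \(\phi\) and one edge potential per clique type (the symbol names here are my assumption, not the slide's notation), is:

```latex
p(\mathbf{y} \mid \mathbf{x}) =
\frac{1}{Z(\mathbf{x})}
\prod_{i \in V} \phi(y_i, \mathbf{x})
\prod_{(i,j) \in E_S} \psi_S(y_i, y_j, \mathbf{x})
\prod_{(i,j) \in E_C} \psi_C(y_i, y_j, \mathbf{x})
\prod_{(i,j) \in E_R} \psi_R(y_i, y_j, \mathbf{x})
```

where \(V\) is the set of anchor nodes and \(E_S\), \(E_C\), \(E_R\) are the Sibling, Cousin and Relative edge sets. Because these edges can form cycles, the partition function and marginals are approximated with TRP rather than computed exactly.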

12 Experiments
The aims of the experiments are to:
1. Compare the graph-based (relational) approach to the node-wise (independent) approach, to investigate how relational classification improves accuracy
2. Investigate the effect of the defined cliques
3. Compare the CRF models to baseline models based on SVMs
4. Show the effectiveness of using marginal probability for filtering NE candidates

13 Dataset
- Randomly sampled 2,300 articles (Japanese version as of October 2005)
- Anchors in list elements are hand-annotated with NE class labels
  - We used the Extended Named Entity Hierarchy (Sekine et al. 2002)
  - We reduced the number of classes to 13 from the original 200+ in order to avoid data sparseness
- Classification targets: 16,136 anchors (14,285 of those are NEs)

NE Class         # of articles
EVENT                    121
PERSON                  3315
UNIT                      15
LOCATION                1480
FACILITY                2449
TITLE                     42
ORGANIZATION             991
VOCATION                 303
NATURAL_OBJECT          1132
PRODUCT                 1664
NAME_OTHER                24
TIMEX/NUMEX             2749
OTHER                   1851
ALL                    16136

14 Experiments (CRFs)
- To investigate which clique type contributes to classification accuracy, we construct models from all possible combinations of the defined cliques
- 8 models: SCR, SC, SR, CR, S, C, R, I
- Classification is performed on each connected subgraph
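The model inventory is simply every non-empty subset of the three clique types plus the independent model I (no edges). As a quick illustration of how the eight names arise (the naming scheme follows the slide):

```python
from itertools import combinations

clique_types = "SCR"  # Sibling, Cousin, Relative
# Enumerate subsets from largest to smallest, then append the
# independent model I, which uses no clique edges at all.
models = ["".join(subset)
          for size in range(len(clique_types), 0, -1)
          for subset in combinations(clique_types, size)] + ["I"]
print(models)  # → ['SCR', 'SC', 'SR', 'CR', 'S', 'C', 'R', 'I']
```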

15 Experimental settings (baseline) and evaluation
- Baseline: Support Vector Machines (SVMs) [Vapnik 1998]
- We evaluate two baseline models:
  - I model: each anchor text is classified independently
  - P model: anchor texts are ordered by their linear position in the HTML, and history-based classification is performed (the (j-1)-th classification result is used in the j-th classification)
- For multi-class classification: one-versus-rest
- Evaluation: 5-fold cross validation, F1 value

16 Results (F1 value)
(table: F1 values for the CRF models SCR, SC, SR, CR, S, C, R and I, and the SVM P and I models, on the whole dataset (ALL) and on anchors without articles (no article); each CRF model is illustrated by its graph of S, C and R edges)
ALL: whole dataset; no article: anchors without articles

17 Results (F1 value)
1. Graph-based vs. node-wise: we performed a McNemar paired test on labeling disagreements => the difference was significant (p < 0.01)

18 Results (F1 value)
2. Which clique contributed most? => the Cousin clique: Cousin cliques provided the highest accuracy improvements, compared to the Sibling and Relative cliques

19 Results (F1 value)
3. CRFs vs. SVMs. Significance test: McNemar paired test on labeling disagreements

20 Filtering NE candidates using marginal probability
- Goal: construct dictionaries from the extracted NE candidates; methods with lower cost are desirable
- Extract only confident NE candidates -> use the marginal probability provided by the CRFs
- Marginal probability: the probability of a particular label assignment for a node; it can be regarded as the "confidence" of the classifier
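The filtering step itself is straightforward once each anchor carries its MAP label and that label's marginal probability. A minimal sketch, using illustrative data rather than figures from the paper:

```python
def precision_recall_at(predictions, gold, threshold):
    """Precision/recall of the candidates whose marginal clears the threshold.

    predictions: list of (anchor, map_label, marginal_probability)
    gold:        dict mapping each anchor to its true NE label
    """
    kept = [(a, label) for a, label, p in predictions if p >= threshold]
    correct = sum(1 for a, label in kept if gold.get(a) == label)
    precision = correct / len(kept) if kept else 1.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy predictions (marginals are made up for illustration)
predictions = [
    ("Carpenters", "ORGANIZATION", 0.98),
    ("Karen Carpenter", "PERSON", 0.95),
    ("Dillard & Clark", "PERSON", 0.55),  # a low-confidence mistake
]
gold = {
    "Carpenters": "ORGANIZATION",
    "Karen Carpenter": "PERSON",
    "Dillard & Clark": "ORGANIZATION",
}
# At a high threshold the low-confidence mistake is filtered out,
# raising precision at the cost of recall.
p, r = precision_recall_at(predictions, gold, threshold=0.9)
```

Sweeping the threshold over [0, 1] traces out exactly the precision-recall curve shown on the next slide.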

21 Precision-Recall curve
- The Precision-Recall curve is obtained by thresholding the marginal probability of the MAP estimation in the CR model of the CRFs
- At the chosen operating point, recall is about 0.57 and precision is about 0.97
- With proper thresholding of the marginal probability, an NE dictionary can be constructed at lower cost

22 Summary and future work
- Summary
  - Proposed a method for categorizing NEs in Wikipedia
  - Defined 3 kinds of cliques (Sibling, Cousin and Relative) over the HTML tree
  - The graph-based model achieved significant improvements compared to the node-wise model and the baseline methods (SVMs)
  - NEs can be extracted at lower cost by exploiting marginal probability

23 Summary and future work
- Future work
  - Use fine-grained NE classes: for many NLP applications (e.g. QA, IE), an NE dictionary with a fine-grained label set will be a useful resource, but classification with statistical methods becomes difficult when the label set is large, because of insufficient positive examples
  - Incorporate the hierarchical structure of label sets into our models (hierarchical classification): previous work suggests that exploiting the hierarchical structure of label sets improves classification accuracy

24  Thank you.