Published by Shon Armstrong. Modified over 9 years ago.
1 A Graph-based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto
Nara Institute of Science and Technology
EMNLP-CoNLL 2007, 29th June, Prague, Czech Republic
2 Background
- Named Entities (NEs): proper nouns (e.g. Shinzo Abe (Person), Prague (Location)), time/date expressions (e.g. June 29 (Date)) and numerical expressions (e.g. 10%)
- NEs play an important role in many NLP applications (e.g. IE, QA)
- Named Entity Recognition (NER) task
  - Treated as a sequential tagging problem
  - Machine learning methods have been proposed
  - Recall is usually low
- A large-scale NE dictionary is useful for NER
  - Semi-automatic methods for compiling NE dictionaries are in demand
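Treating NER as sequential tagging means assigning one tag to every token. The sketch below assumes the common BIO encoding (the slides do not say which encoding is used), and the tokenized sentence is a hypothetical one built from the examples above.

```python
def to_bio(tokens, spans):
    """Convert entity spans (start, end_exclusive, label) into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # tokens inside the entity
    return tags


# Hypothetical sentence using the slide's example entities.
tokens = ["Shinzo", "Abe", "visited", "Prague", "on", "June", "29"]
spans = [(0, 2, "PERSON"), (3, 4, "LOCATION"), (5, 7, "DATE")]
print(to_bio(tokens, spans))
# ['B-PERSON', 'I-PERSON', 'O', 'B-LOCATION', 'O', 'B-DATE', 'I-DATE']
```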
3 Resource for NE dictionary construction: Wikipedia
- A multilingual encyclopedia on the Web
  - 382,613 gloss articles (Japanese version, as of June 20, 2007)
  - Gloss indices (article titles) are nouns or proper nouns
- HTML (semi-structured text)
  - Lists and tables can be used as clues for NE type categorization
  - Anchor texts in articles link to the articles that gloss them
  - Each article has one or more categories
- Wikipedia contains useful information for NE categorization and can be considered a suitable resource
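As a rough illustration of using list structure as a clue, here is a minimal sketch with Python's standard `html.parser` that collects anchor texts appearing inside HTML lists, together with their list nesting depth. The class name `ListAnchorParser` and the `(depth, text, href)` output format are hypothetical; this is not the authors' extraction code, and real Wikipedia HTML would need more handling (tables, templates, links to missing articles).

```python
from html.parser import HTMLParser


class ListAnchorParser(HTMLParser):
    """Collect (list depth, anchor text, href) for every <a> inside <ul>/<ol>."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # current <ul>/<ol> nesting depth
        self.in_anchor = False  # True while inside an <a> that sits in a list
        self.href = None
        self.text = []
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            self.in_anchor = True
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            self.depth -= 1
        elif tag == "a" and self.in_anchor:
            self.anchors.append((self.depth, "".join(self.text).strip(), self.href))
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            self.text.append(data)  # anchor text may arrive in several chunks


html = ('<ul><li><a href="/wiki/Carpenters">Carpenters</a>'
        '<ul><li><a href="/wiki/Karen_Carpenter">Karen Carpenter</a></li></ul>'
        '</li></ul><p><a href="/x">not in a list</a></p>')
parser = ListAnchorParser()
parser.feed(html)
parser.close()
print(parser.anchors)
# [(1, 'Carpenters', '/wiki/Carpenters'), (2, 'Karen Carpenter', '/wiki/Karen_Carpenter')]
```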
4 Objective
- Extract Named Entities by assigning proper NE labels to the gloss indices of Wikipedia
[Figure: example anchors labeled Person, Product, Location, Organization and Natural Object]
5 Use of Wikipedia features
- Features of Wikipedia articles
  - Anchors in an article refer to other, related articles
  - Anchors in list elements depend on each other
  => We make 3 assumptions about the dependencies between anchors
[Figure: an example of a list structure (Burt Bacharach…composer, Dillard & Clark, Carpenters, Karen Carpenter) annotated with the labels ORGANIZATION, PERSON, VOCATION, ORGANIZATION, PERSON]
- Assumption 1: The latter element in a list item tends to be in an attribute relation to the former element
- Assumption 2: Elements in the same itemization tend to be in the same NE category
- Assumption 3: A nested element tends to be in a part-of relation to the upper element
6 Overview of our approach
- Focus on the HTML list structure in Wikipedia
  - Make 3 assumptions about dependencies between anchors
  - Formalize NE categorization as the problem of labeling anchors in lists with NE classes
  - Define 3 kinds of cliques (edges: Sibling, Cousin and Relative) between anchors based on the 3 assumptions
  - Construct graphs based on the 3 defined cliques
- CRFs for NE categorization in Wikipedia
  - Define potential functions over the 3 edge types (and nodes) to provide a conditional distribution over the graphs
  - Estimate the MAP label assignment over the graphs using Conditional Random Fields
7 Conditional Random Fields (CRFs)
- Conditional Random Fields [Lafferty 2001]
  - Discriminative, undirected models
  - Define the conditional distribution p(y|x)
- Features
  - Arbitrary features can be used
  - Global optimization over all possible label assignments
  - Label dependencies can be handled by defining potential functions over cliques (2 or more nodes)
[Figure: undirected graph connecting the observation x to the labels y1, y2, y3, …, yn]
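For reference, the standard CRF conditional distribution [Lafferty 2001] can be written with clique potentials Ψ_c and feature weights λ_k (this notation is assumed here; the slide's own symbols were lost in extraction):

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{c \in \mathcal{C}} \Psi_c(\mathbf{y}_c, \mathbf{x}),
\qquad
\Psi_c(\mathbf{y}_c, \mathbf{x}) = \exp\Big( \sum_k \lambda_k f_k(\mathbf{y}_c, \mathbf{x}) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \prod_{c \in \mathcal{C}} \Psi_c(\mathbf{y}'_c, \mathbf{x})
```

Z(x) sums over every possible label assignment y', so the model scores whole assignments jointly rather than one label at a time, and each feature f_k may look at any part of x.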
8 Use of dependencies for categorization
- NE categorization as the problem of labeling anchors with classes
  - Each edge of the constructed graphs corresponds to a particular dependency
  - Estimate the MAP label assignment over the constructed graphs using Conditional Random Fields
- Our formulation can also extract anchors that have no gloss article
[Figure: Dillard & Clark..country rock, Carpenters, Karen Carpenter; the legend distinguishes anchors whose article exists from anchors whose article does not exist]
9 Clique definition based on HTML tree structure
[Figure: Sibling, Cousin and Relative edges drawn on the HTML tree and on the rendered list "Dillard & Clark…country rock / Carpenters / Karen Carpenter"]
- Sibling: the latter element tends to be an attribute or a concept of the former element
- Cousin: the elements tend to have a common NE category (e.g. ORGANIZATION)
- Relative: the latter element tends to be a constituent part of the former element
- These 3 relations are used as the cliques of the CRFs
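A minimal sketch of turning a nested list into typed edges. The exact edge rules are an assumption (a simplified reading of the three relations, not the paper's definitions): Sibling links consecutive anchors in one list item, Cousin links anchors at the same position in adjacent items of the same list, and Relative links the first anchor of an item to the first anchor of each nested item. The `Item` class, `build_edges` and the example list (loosely modeled on the slide's example) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """One list item: the anchors it contains, in order, plus any nested list."""
    anchors: list[str]
    sublist: list[Item] = field(default_factory=list)


def build_edges(items: list[Item]) -> list[tuple[str, str, str]]:
    """Return (type, u, v) edges with type in {"S", "C", "R"} (simplified rules)."""
    edges: list[tuple[str, str, str]] = []

    def visit(siblings: list[Item]) -> None:
        for item in siblings:
            # Sibling: consecutive anchors inside the same list item.
            for a, b in zip(item.anchors, item.anchors[1:]):
                edges.append(("S", a, b))
            # Relative: first anchor of the item -> first anchor of each nested item.
            for child in item.sublist:
                if item.anchors and child.anchors:
                    edges.append(("R", item.anchors[0], child.anchors[0]))
            visit(item.sublist)
        # Cousin: anchors at the same position in adjacent items of one list.
        for prev, nxt in zip(siblings, siblings[1:]):
            for a, b in zip(prev.anchors, nxt.anchors):
                edges.append(("C", a, b))

    visit(items)
    return edges


example = [
    Item(["Dillard & Clark"], [Item(["Gene Clark"])]),
    Item(["Carpenters", "As Time Goes By", "2000"], [Item(["Karen Carpenter"])]),
]
for edge in build_edges(example):
    print(edge)
```

Using anchor texts as node identifiers keeps the sketch short; a real implementation would need unique node ids, since the same anchor text can occur twice on a page.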
10 A graph constructed from the 3 clique definitions
[Figure: graph over the list "Burt Bacharach…"On my own"…1986 / Dillard & Clark / Gene Clark / Carpenters…"As Time Goes By"…2000 / Karen Carpenter", with edges labeled S (Sibling), C (Cousin) and R (Relative)]
- Estimate the MAP label assignment over the graph
11 Model
- A potential function for nodes, and potential functions for Sibling, Cousin and Relative cliques
[Figure: the constructed graph with S, C and R edges]
- The constructed graphs include cycles, so exact inference is computationally expensive
  -> Introduce Tree-based Reparameterization (TRP) [Wainwright 2003] for approximate inference
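To make the cost concrete, here is a minimal sketch of exact inference by brute-force enumeration on a tiny graph containing one cycle. It assumes log-linear node potentials plus one potential per edge type, i.e. log p(y|x) = Σ_v log Ψ_V(y_v, x) + Σ_(u,v) log Ψ_type(y_u, y_v, x) − log Z(x). All labels, node names and potential values are hypothetical. Enumeration costs |labels|^|nodes| evaluations, which is why approximate inference such as TRP is needed on real graphs; this sketch does not implement TRP.

```python
import itertools
import math


def log_score(assign, node_pot, edges, edge_pot):
    """Unnormalized log-probability: node potentials plus one potential per typed edge."""
    total = sum(node_pot[v][assign[v]] for v in node_pot)
    for kind, u, v in edges:
        total += edge_pot[kind](assign[u], assign[v])
    return total


def exact_inference(nodes, labels, node_pot, edges, edge_pot):
    """MAP assignment and per-node marginals by enumerating all
    len(labels) ** len(nodes) assignments (feasible only for tiny graphs)."""
    scores = {}
    for combo in itertools.product(labels, repeat=len(nodes)):
        scores[combo] = log_score(dict(zip(nodes, combo)), node_pot, edges, edge_pot)
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    best = max(scores, key=scores.get)
    marginals = {v: {y: 0.0 for y in labels} for v in nodes}
    for combo, s in scores.items():
        p = math.exp(s - log_z)
        for v, y in zip(nodes, combo):
            marginals[v][y] += p
    return dict(zip(nodes, best)), marginals


# Hypothetical example: three anchors joined by Cousin edges forming a cycle.
nodes = ["a", "b", "c"]
labels = ["PER", "ORG"]
node_pot = {  # log node potentials; b on its own slightly prefers PER
    "a": {"PER": 0.0, "ORG": 2.0},
    "b": {"PER": 0.5, "ORG": 0.0},
    "c": {"PER": 0.0, "ORG": 2.0},
}
edges = [("C", "a", "b"), ("C", "b", "c"), ("C", "a", "c")]
edge_pot = {"C": lambda y1, y2: 1.0 if y1 == y2 else 0.0}  # reward a shared label

map_labels, marginals = exact_inference(nodes, labels, node_pot, edges, edge_pot)
print(map_labels)  # {'a': 'ORG', 'b': 'ORG', 'c': 'ORG'}
```

The Cousin edges pull b to ORG: its marginal p(y_b = ORG | x) comes out at about 0.77 even though its node potential alone favors PER.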
12 Experiments
The aims of the experiments are to:
1. Compare the graph-based (relational) approach with the node-wise (independent) approach, to investigate how much relational classification improves classification accuracy
2. Investigate the effect of the defined cliques
3. Compare the CRF models with baseline models based on SVMs
4. Show the effectiveness of using marginal probability to filter NE candidates
13 Dataset
- 2300 randomly sampled articles (Japanese version, as of October 2005)
  - Anchors in list elements are hand-annotated with NE class labels
  - Label set: Extended Named Entity Hierarchy (Sekine et al. 2002)
  - The number of classes was reduced from the original 200+ to 13 to avoid data sparseness
- Classification targets: 16136 anchors (14285 of them are NEs)

NE class          # of anchors
EVENT             121
PERSON            3315
UNIT              15
LOCATION          1480
FACILITY          2449
TITLE             42
ORGANIZATION      991
VOCATION          303
NATURAL_OBJECT    1132
PRODUCT           1664
NAME_OTHER        24
TIMEX/NUMEX       2749
OTHER             1851
ALL               16136
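The counts in the slide 13 table can be cross-checked against the stated totals. A short sketch, assuming OTHER is the only non-NE class (the arithmetic below is consistent with that reading):

```python
# NE class counts transcribed from the slide 13 dataset table.
CLASS_COUNTS = {
    "EVENT": 121, "PERSON": 3315, "UNIT": 15, "LOCATION": 1480,
    "FACILITY": 2449, "TITLE": 42, "ORGANIZATION": 991, "VOCATION": 303,
    "NATURAL_OBJECT": 1132, "PRODUCT": 1664, "NAME_OTHER": 24,
    "TIMEX/NUMEX": 2749, "OTHER": 1851,
}

num_classes = len(CLASS_COUNTS)
total = sum(CLASS_COUNTS.values())
# Assumption: OTHER is the only class that is not a named entity.
named_entities = total - CLASS_COUNTS["OTHER"]
print(num_classes, total, named_entities)  # 13 16136 14285
```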
14 Experiments (CRFs)
- To investigate which clique types contribute to classification accuracy, we construct models from all possible combinations of the defined cliques
  - 8 models (SCR, SC, SR, CR, S, C, R, I)
- Classification is performed on each connected subgraph
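The 8 models are exactly the subsets of {S, C, R}, with the empty subset being the independent I model. A minimal sketch of enumerating them and of splitting a graph into the connected subgraphs that are classified separately, assuming edges are stored as (type, u, v) triples; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def model_names(cliques=("S", "C", "R")):
    """All clique combinations, largest first; the empty combination is the I model."""
    names = []
    for k in range(len(cliques), -1, -1):
        for combo in combinations(cliques, k):
            names.append("".join(combo) or "I")
    return names


def connected_subgraphs(nodes, edges, model):
    """Split nodes into connected components using only edges whose type is in model."""
    adj = defaultdict(set)
    for kind, u, v in edges:
        if kind in model:  # "I" contains no edge type, so every node stays isolated
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            n = stack.pop()
            comp.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        components.append(sorted(comp))
    return components


print(model_names())  # ['SCR', 'SC', 'SR', 'CR', 'S', 'C', 'R', 'I']
edges = [("S", "a", "b"), ("C", "b", "c"), ("R", "c", "d")]
print(connected_subgraphs(["a", "b", "c", "d"], edges, "S"))  # [['a', 'b'], ['c'], ['d']]
```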
15 Baseline: Support Vector Machines (SVMs) [Vapnik 1998]
- We evaluate two models:
  - I model: each anchor text is classified independently
  - P model: anchor texts are ordered by their linear position in the HTML, and history-based classification is performed (the (j-1)-th classification result is used in the j-th classification)
- Multi-class classification: one-versus-rest
- Evaluation: 5-fold cross-validation, measured by F1-value
[Figure: experimental settings (baseline) for the I model and the P model]
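The difference between the two baselines is only in what each decision can see. A minimal sketch: `classify(anchor, prev_label)` stands in for the one-versus-rest SVMs, and `toy_classifier` is a purely hypothetical rule used to show how the (j-1)-th prediction flows into the j-th decision.

```python
from typing import Callable, List, Optional

Classifier = Callable[[str, Optional[str]], str]


def classify_independent(anchors: List[str], classify: Classifier) -> List[str]:
    """I model: every anchor is labeled on its own (no history feature)."""
    return [classify(anchor, None) for anchor in anchors]


def classify_history_based(anchors: List[str], classify: Classifier) -> List[str]:
    """P model: anchors in HTML order; the previous prediction is a feature."""
    labels: List[str] = []
    prev: Optional[str] = None
    for anchor in anchors:
        prev = classify(anchor, prev)
        labels.append(prev)
    return labels


def toy_classifier(anchor: str, prev_label: Optional[str]) -> str:
    """Hypothetical stand-in for the SVMs: '&' signals an ORGANIZATION;
    otherwise copy the previous label, defaulting to PERSON."""
    if "&" in anchor:
        return "ORGANIZATION"
    return prev_label if prev_label is not None else "PERSON"


anchors = ["Dillard & Clark", "Carpenters"]
print(classify_independent(anchors, toy_classifier))    # ['ORGANIZATION', 'PERSON']
print(classify_history_based(anchors, toy_classifier))  # ['ORGANIZATION', 'ORGANIZATION']
```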
16 Results (F1-value)
Subset      #      CRF-SCR  CRF-SC  CRF-SR  CRF-CR  CRF-S  CRF-C  CRF-R  CRF-I  SVM-P  SVM-I
ALL         14285  .7854    .7855   .7822   .7862   .7817  .7845  .7813  .7805  .7798  .7790
no article  3898   .5465    .5484   .5223   .5495   .5271  .5475  .5273  .5249  .5386  .5278
ALL: whole dataset; no article: anchors without articles
[Figure: graph structures of the CRF models (SCR, SC, SR, CR, S, C, R, I) and of the SVM models (P, I)]
17 Results (F1-value): same table as slide 16
1. Graph-based vs. node-wise
- McNemar paired test on labeling disagreements => the difference was significant (p < 0.01)
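McNemar's test compares two classifiers on the same items using only the items where exactly one of them is correct. A minimal sketch with the common continuity-corrected statistic; whether the authors applied the continuity correction is not stated on the slides, and the counts in the usage lines are hypothetical, not the paper's.

```python
import math


def mcnemar(b, c):
    """McNemar's test with continuity correction.

    b: items that model A labels correctly and model B does not
    c: items that model B labels correctly and model A does not
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0  # no disagreements: no evidence of a difference
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = mcnemar(120, 60)  # hypothetical disagreement counts
print(round(stat, 2), p < 0.01)  # 19.34 True
```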
18 Results (F1-value): same table as slide 16
2. Which clique contributed most? => the Cousin clique
- Cousin cliques provided the largest accuracy improvements compared to Sibling and Relative cliques
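The Cousin conclusion can be read directly off the slide 16 table by comparing each single-clique CRF model with the independent CRF I model. A short sketch using only the published F1-values:

```python
# CRF F1-values transcribed from the slide 16 results table.
F1 = {
    "ALL": {"SCR": .7854, "SC": .7855, "SR": .7822, "CR": .7862,
            "S": .7817, "C": .7845, "R": .7813, "I": .7805},
    "no article": {"SCR": .5465, "SC": .5484, "SR": .5223, "CR": .5495,
                   "S": .5271, "C": .5475, "R": .5273, "I": .5249},
}


def single_clique_gains(scores):
    """F1 gain of each single-clique model (S, C, R) over the I model."""
    return {k: round(scores[k] - scores["I"], 4) for k in ("S", "C", "R")}


for subset, scores in F1.items():
    gains = single_clique_gains(scores)
    best_model = max(scores, key=scores.get)
    print(subset, gains, "best:", best_model)
# ALL {'S': 0.0012, 'C': 0.004, 'R': 0.0008} best: CR
# no article {'S': 0.0022, 'C': 0.0226, 'R': 0.0024} best: CR
```

On both subsets the C model gains the most, and the best CRF overall is the CR model.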
19 Results (F1-value): same table as slide 16
3. CRFs vs. SVMs
- Significance test: McNemar paired test on labeling disagreements
20 Filtering NE candidates using marginal probability
- Construct dictionaries from the extracted NE candidates
  - Lower-cost methods are desirable
  - Extract only confident NE candidates -> use the marginal probability provided by CRFs
- Marginal probability: the probability of a particular label assignment for a node y_i
  - This can be regarded as the "confidence" of a classifier
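A minimal sketch of confidence-based filtering, assuming CRF inference has already produced a MAP label per anchor and per-node marginals p(y_i | x). The confidence of an anchor is taken to be the marginal of its MAP label. The function name and all probabilities below are hypothetical.

```python
def confident_candidates(map_labels, marginals, threshold):
    """Keep anchors whose MAP label has marginal probability >= threshold.

    map_labels: {anchor: label}; marginals: {anchor: {label: probability}}
    Returns {anchor: (label, confidence)} for the kept anchors.
    """
    kept = {}
    for anchor, label in map_labels.items():
        confidence = marginals[anchor][label]
        if confidence >= threshold:
            kept[anchor] = (label, confidence)
    return kept


map_labels = {"Carpenters": "ORGANIZATION", "Karen Carpenter": "PERSON"}
marginals = {  # hypothetical marginals
    "Carpenters": {"ORGANIZATION": 0.95, "PERSON": 0.05},
    "Karen Carpenter": {"PERSON": 0.60, "ORGANIZATION": 0.40},
}
print(confident_candidates(map_labels, marginals, 0.9))
# {'Carpenters': ('ORGANIZATION', 0.95)}
```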
21 Precision-Recall Curve
[Figure: precision-recall curve obtained by thresholding the marginal probability of the MAP estimation in the CR model of CRFs]
- At the marked point, recall is about 0.57 and precision is about 0.97
- With proper thresholding of the marginal probability, an NE dictionary can be constructed at lower cost
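Sweeping the threshold traces out a precision-recall curve. A minimal sketch, assuming each prediction is reduced to a (confidence, is_correct) pair and that recall is measured against all classification targets, so discarding a candidate can only lower recall. Reporting precision 1.0 when nothing is kept is a convention chosen here, and the data are hypothetical.

```python
def precision_recall_curve(predictions, thresholds):
    """predictions: list of (confidence, is_correct); returns (threshold, P, R) triples."""
    total = len(predictions)
    curve = []
    for t in thresholds:
        kept = [ok for conf, ok in predictions if conf >= t]
        correct = sum(kept)  # True counts as 1
        precision = correct / len(kept) if kept else 1.0
        recall = correct / total if total else 0.0
        curve.append((t, precision, recall))
    return curve


predictions = [(0.99, True), (0.95, True), (0.80, False), (0.60, True)]
for t, prec, rec in precision_recall_curve(predictions, [0.0, 0.9]):
    print(t, prec, rec)
# 0.0 0.75 0.75
# 0.9 1.0 0.5
```

Raising the threshold trades recall for precision, which is the trade-off behind the operating point marked on the slide.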
22 Summary and future work
Summary
- Proposed a method for categorizing NEs in Wikipedia
  - Defined 3 kinds of cliques (Sibling, Cousin and Relative) over the HTML tree
  - The graph-based model achieved significant improvements compared to the node-wise model and the baseline methods (SVMs)
- NEs can be extracted at lower cost by exploiting marginal probability
23 Summary and future work
Future work
- Use fine-grained NE classes
  - For many NLP applications (e.g. QA, IE), an NE dictionary with a fine-grained label set will be a useful resource
  - Classification with statistical methods becomes difficult when the label set is large, because of insufficient positive examples
- Incorporate the hierarchical structure of label sets into our models (hierarchical classification)
  - Previous work suggests that exploiting the hierarchical structure of label sets improves classification accuracy
24 Thank you.