A Framework for Named Entity Recognition in the Open Domain Richard Evans Research Group in Computational Linguistics University of Wolverhampton UK

Introduction
- In MUC-7, systems were required to identify PERSON, ORGANIZATION, and ARTIFACT entities as well as LOCATION names.
- We may envisage different scenario contexts in which other types of entity are of interest.
- Thus far, most techniques for NER are scenario specific and do not transfer well to different scenarios.
- NERO is a system for NER that is intended to apply in any scenario, with no prior knowledge of the types of entity pertinent to that scenario.

The Method for NER in the Open Domain
1. Automatic derivation of a document-specific typology of Named Entities
2. Identification of NEs in documents
3. Classification of NEs in line with the derived typology

Typology Derivation (1/4)
1. Collection of hypernyms
2. Clustering hypernyms
3. Labelling clusters

Typology Derivation (2/4): Collection of Hypernyms
- Follows Hearst's method for identifying hyponyms in corpora (Hearst 92). Where X is a named entity, Y is considered a potential hypernym in the patterns:
  1. Y such as X
  2. Y like X
  3. X or other Y
  4. X and other Y
- For better recall, the method used here treats the Internet, and not only the document in which the NE appears, as the source of hypernyms.
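The four patterns above can be sketched as simple regular expressions. This is a minimal illustration, not NERO's implementation: the capitalised-token heuristic for X and the exact pattern wording are assumptions.

```python
import re

# Hearst-style patterns relating a named entity X to a candidate hypernym Y.
# "hyp_first": the hypernym precedes the entity; "ent_first": the reverse.
PATTERNS = [
    (re.compile(r"\b([a-z]+) such as ((?:[A-Z]\w+ ?)+)"), "hyp_first"),
    (re.compile(r"\b([a-z]+) like ((?:[A-Z]\w+ ?)+)"), "hyp_first"),
    (re.compile(r"\b((?:[A-Z]\w+ ?)+)or other ([a-z]+)"), "ent_first"),
    (re.compile(r"\b((?:[A-Z]\w+ ?)+)and other ([a-z]+)"), "ent_first"),
]

def collect_hypernyms(text):
    """Return (entity, hypernym) pairs matched by the four patterns."""
    pairs = []
    for pattern, order in PATTERNS:
        for m in pattern.finditer(text):
            if order == "hyp_first":
                hyp, ent = m.group(1), m.group(2)
            else:
                ent, hyp = m.group(1), m.group(2)
            pairs.append((ent.strip(), hyp))
    return pairs
```

Running the collector over web snippets rather than the source document is what gives the recall gain described above.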

Typology Derivation (3/4): Clustering Hypernyms
- Hard bottom-up hierarchical clustering algorithm: compares all pairs of clusters and merges the most similar pair according to a group-average similarity function.
- Similarity is based on Learning Accuracy (Hahn and Schnattinger 98), applied as a method to obtain the taxonomic similarity between Hyp1 and Hyp2:

  TS = d(ROOT, C) / (d(Hyp1, C) + d(Hyp2, C) + d(ROOT, C))

  where C is the most specific node that subsumes both Hyp1 and Hyp2, ROOT is the most general node in the ontology, and d(X, Y) is the shortest distance in nodes from X to Y.
- The clustering algorithm stops when the similarity of the most similar pair drops below a threshold (0.5).
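The taxonomic similarity can be computed directly from this definition. A sketch under the assumption that the ontology is available as a parent map (a toy stand-in for the real taxonomy; the names are illustrative):

```python
def taxonomic_similarity(hyp1, hyp2, parent):
    """TS = d(ROOT, C) / (d(Hyp1, C) + d(Hyp2, C) + d(ROOT, C)), where C is
    the most specific node subsuming both hypernyms. `parent` maps each node
    to its parent, and the ROOT node maps to None."""
    def path_to_root(node):
        path = [node]
        while parent[node] is not None:
            node = parent[node]
            path.append(node)
        return path

    p1, p2 = path_to_root(hyp1), path_to_root(hyp2)
    c = next(n for n in p1 if n in p2)      # most specific common subsumer C
    d_h1_c, d_h2_c = p1.index(c), p2.index(c)
    d_root_c = len(p1) - 1 - d_h1_c
    denom = d_h1_c + d_h2_c + d_root_c
    return d_root_c / denom if denom else 1.0  # identical root nodes
```

With the 0.5 threshold from the slide, clusters keep merging while the group-average of these pairwise scores stays at or above 0.5.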

Typology Derivation (4/4): Labelling Clusters
- Here, the hypernyms in a cluster are taken to be words.
- For all senses of all words in a cluster, increasingly general hypernyms are listed.
- The most specific hypernym common to all words in a cluster is used to label the cluster as a whole.
- Each label is assigned a height, H, which is the mean depth of the words in the cluster measured from the label.
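The labelling step can be sketched as follows, simplifying away word senses and substituting precomputed hypernym paths for ontology lookups (the data shapes here are assumptions):

```python
def label_cluster(words, hypernym_paths):
    """Label a cluster with the most specific hypernym shared by all of its
    words, and compute H, the mean depth of the words measured from the
    label. `hypernym_paths` maps each word to its increasingly general
    hypernyms, starting with the word itself."""
    for candidate in hypernym_paths[words[0]]:
        if all(candidate in hypernym_paths[w] for w in words):
            height = sum(hypernym_paths[w].index(candidate)
                         for w in words) / len(words)
            return candidate, height
    return None, None   # no common hypernym found
```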

Identification of Named Entities: Capitalised Word Normalisation (1/2)
- NEs are signalled by capitalised words in documents, but capitalisation can also be context dependent (section headings, new sentences, emphasis, etc.).
- A method is therefore required to identify whether a given string is normally capitalised in all contexts, or whether its capitalisation is context dependent. This is referred to as normalisation (Mikheev 00).

Identification of Named Entities: Capitalised Word Normalisation (2/2)
- In NERO, the normalisation method exploits memory-based learning (Daelemans et al. 01). Feature vectors include:
  - positional information
  - the proportion of times the word is capitalised in the document
  - the proportion of times the word is sentence initial in the document and in the BNC (Burnard 95)
  - whether the instance appears in a gazetteer of person names or, following (Mikheev 01), in a list of the top 100 most frequent sentence-initial words in the BNC
  - the PoS of the word and the surrounding words
  - agreement of the word's grammatical number with a following verb to be or to have
- 10-fold cross validation showed P R in classifying NORMALLY_CAPITALISED words and P 100 R in classifying NOT_NORMALLY_CAPITALISED words.
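A few of these features can be sketched from document counts alone. The remaining features (position, PoS, BNC statistics) are omitted, and all names here are illustrative rather than NERO's own:

```python
def normalisation_features(word, sentences, person_gazetteer, frequent_initial):
    """Build a partial feature vector for capitalised-word normalisation.
    `sentences` is a list of token lists; the person gazetteer and the
    top-100 sentence-initial word list are passed in as plain sets."""
    hits = [(i, j) for i, sent in enumerate(sentences)
            for j, tok in enumerate(sent) if tok.lower() == word.lower()]
    n = len(hits)
    return {
        "prop_capitalised": sum(sentences[i][j][0].isupper()
                                for i, j in hits) / n,
        "prop_sentence_initial": sum(j == 0 for i, j in hits) / n,
        "in_person_gazetteer": word.lower() in person_gazetteer,
        "frequent_initial_word": word.lower() in frequent_initial,
    }
```

Vectors like this would then be handed to a memory-based learner (e.g. TiMBL, the tool associated with Daelemans et al.) for the NORMALLY_CAPITALISED vs NOT_NORMALLY_CAPITALISED decision.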

Classification of Named Entities
- Assumption: normally capitalised sequences of words are NEs, and each NE can be associated with a capitalised sequence C that covers its position in the document.
- Let T be a type that subsumes hypernyms {t_1, …, t_m}, and let H be inversely proportional to the height of T.
- Let w be a word that matches, or is a substring of, a capitalised sequence C, where C has hypernyms (c_1, …, c_n) and each hypernym c_i has a frequency fc_i.
- g(c_i, t_j) is a function that maps to 1 when c_i is identical to t_j, and to 0 otherwise.
- The likelihood that w is classified as T is given by:
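The formula itself did not survive this transcript. The sketch below is consistent with the definitions on this slide, under the assumption (ours, not the paper's stated rule) that the score is the H-weighted sum of the frequencies fc_i of C's hypernyms that match some hypernym of T:

```python
def classification_score(c_hypernyms, c_freqs, t_hypernyms, h):
    """Score word w (via its capitalised sequence C) against type T.
    g(c_i, t_j) is 1 when c_i is identical to t_j and 0 otherwise; the
    H-weighted combination here is an assumption about the lost formula."""
    g = lambda c, t: 1 if c == t else 0
    return h * sum(f * g(c, t)
                   for c, f in zip(c_hypernyms, c_freqs)
                   for t in t_hypernyms)
```

Under this reading, w is assigned the type T with the highest score over all derived clusters.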

The Test Corpus

DOC   #Words   #NEs   Genre
a     -        -      Legal
j     -        -      Psych.
j     -        -      Art
k     -        -      Lit.
k     -        -      Lit.
k     -        -      Lit.
k     -        -      Lit.
n     -        -      Lit.
win   -        -      Tech.
TOT   -        -

Evaluation (1/4)
1. Normalisation
2. Typology derivation
3. Named entity classification

Evaluation (2/4): Evaluating Normalisation
- NORMALLY_CAPITALISED: P R
- NOT_NORMALLY_CAPITALISED: P 100 R
- Poor performance due to:
  - use of direct speech
  - incomplete documents

Evaluation (3/4): Evaluating Typology Derivation
- The clustering algorithm produces a large number of clusters, but NE classification means that only a small subset appear in the final tagged file. The quality of this subset of clusters was assessed.
- In this evaluation, a TP is either a cluster that correlates well with a human-derived type (perfect match), or a cluster in which more than half the hypernyms are good indicators of a human-derived type (partial match).
- P: R:
- This sets a performance limit on NE classification.

Evaluation (4/4): Evaluating NE Classification

DOC   NORM ACC   #NEs   IDd   CORRECT     INCORRECT    UNCERTAIN
a     -          -      -     -           -            -
j     -          -      -     -           -            -
j     -          -      -     -           -            -
k     -          -      -     -           -            -
k     -          -      -     -           -            -
k     -          -      -     -           -            -
k     -          -      -     -           -            -
n     -          -      -     -           -            -
win   -          -      -     -           -            -
TOT   -          -      -     - (41.41)   397 (48.35)  84 (10.23)

Conclusion (1/2): Overview
- Presented a framework for NER in the open domain.
- Pros: largely unsupervised, not reliant on large quantities of annotated data, domain independent.
- Cons: hypernym collection is vulnerable to data sparseness; the clustering process is vulnerable to word sense ambiguity; performance is poor.

Conclusion (2/2): Future Work
- Alternative approaches to hypernym collection (IR, corpus-based, Internet-based)
- Word sense disambiguation
- Alternative clustering algorithms (e.g. soft algorithms, alternative similarity thresholds)
- Alternative classification rules (weighting on the basis of WSD, or cluster size, as well as the height of the label)
- Further evaluation: comparison with systems used in MUC-7, in which a pre-defined typology is known

Related Work
- IE: MUC-7 (Chinchor 98); IREX (Sekine 99)
- Recent work in NER with the MUC-7 typology: (Zhou and Su 02); shared task at CoNLL-03
- Extended set of NE types: (Sekine et al. 02)
- Normalisation: (Mikheev 00; Collins 02)

Thank you!