Multi-Criteria-based Active Learning for Named Entity Recognition ACL 2004.

Presentation transcript:

Multi-Criteria-based Active Learning for Named Entity Recognition ACL 2004

Introduction Active learning is based on the assumption that a small number of annotated examples and a large number of unannotated examples are available. Unlike supervised learning, in which the entire corpus is labeled manually, active learning selects the most useful examples for labeling and adds the newly labeled examples to the training set to retrain the model. This procedure is repeated until the model reaches a certain level of performance.
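A minimal sketch of this pool-based loop; the helper functions train, evaluate, select_batch and annotate are hypothetical placeholders, not part of the paper:

```python
def active_learning(labeled, unlabeled, train, evaluate, select_batch, annotate,
                    batch_size, target_score):
    """Generic pool-based active-learning loop; all helpers are hypothetical."""
    model = train(labeled)                        # train on the small seed set
    while unlabeled and evaluate(model) < target_score:
        batch = select_batch(model, unlabeled, batch_size)  # pick the most useful examples
        labeled.extend(annotate(batch))           # a human annotates the selected batch
        unlabeled = [x for x in unlabeled if x not in batch]
        model = train(labeled)                    # retrain on the enlarged training set
    return model
```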

Introduction Much existing work in the area focuses on two approaches: –certainty-based methods (Thompson et al. 1999; Tang et al. 2002; Schohn and Cohn 2000; Tong and Koller 2000; Brinker 2003) and –committee-based methods (McCallum and Nigam 1998; Engelson and Dagan 1999; Ngai and Yarowsky 2000), both of which select the most informative examples, i.e. those about which the current model is most uncertain.

Introduction We aim to minimize the human annotation effort while still reaching the same level of performance as a supervised learning approach. For this purpose, we take a more comprehensive view of the contribution of individual examples and, more importantly, maximize the contribution of a batch based on three criteria: informativeness, representativeness and diversity.

Multi-criteria for NER Active Learning In NER, the SVM classifies a word into –the positive class “1”: the word is part of an entity, or –the negative class “-1”: the word is not part of an entity. Each word is represented as a high-dimensional feature vector including surface word information, orthographic features, a POS feature and semantic trigger features (Shen et al. 2003). The semantic trigger features consist of special head nouns for an entity class, which are supplied by users. Furthermore, a window (size = 7), which represents the local context of the target word w, is also used to classify w.
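As a rough illustration of such a representation (the helper name word_features and the specific orthographic features are invented for this sketch, not taken from Shen et al. 2003):

```python
def word_features(tokens, pos_tags, i, window=7):
    """Sparse features for classifying tokens[i] within a size-7 context window;
    illustrative only."""
    half = window // 2
    feats = {}
    for offset in range(-half, half + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset}]={tokens[j].lower()}"] = 1.0   # surface word in window
            feats[f"pos[{offset}]={pos_tags[j]}"] = 1.0          # POS tag in window
    w = tokens[i]
    feats["cap"] = float(w[0].isupper())                         # orthographic: capitalized
    feats["digit"] = float(any(c.isdigit() for c in w))          # orthographic: contains digit
    return feats
```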

Informativeness Measure for Word An example may be informative for the learner if the distance of its feature vector to the hyperplane is less than the distance of the support vectors to the hyperplane (equal to 1). The distance of a word's feature vector to the hyperplane is computed as shown below. The example with minimal Dist, i.e. the one that comes closest to the hyperplane in feature space, is considered the most informative for the current model.
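A sketch of the distance, presumably the standard SVM functional margin (support vectors s_1, …, s_L, multipliers α_i, labels y_i, kernel k, bias b):

```latex
\mathrm{Dist}(w) \;=\; \left| \sum_{i=1}^{L} \alpha_i \, y_i \, k(s_i, w) + b \right|
```

Under this definition the support vectors themselves have Dist = 1, matching the margin condition stated above.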

Informativeness Measure for Named Entity Let NE = w_1 … w_N, in which w_i is the feature vector of the i-th word of NE. Three scoring functions combine the word-level distances (see the formulas sketched below): –Info_Avg: the average over the N words –Info_Min: the minimum over the N words –Info_S/N: the proportion of the N words within the margin
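Plausible forms of the three scores, assuming they are built directly on the word-level Dist above (the exact thresholding in Info_S/N is an assumption):

```latex
\mathrm{Info\_Avg}(NE) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{Dist}(w_i), \quad
\mathrm{Info\_Min}(NE) = \min_{1 \le i \le N} \mathrm{Dist}(w_i), \quad
\mathrm{Info\_S/N}(NE) = \frac{\bigl|\{\, w_i : \mathrm{Dist}(w_i) < 1 \,\}\bigr|}{N}
```

Smaller values of the first two and a larger value of the third indicate a more informative named entity.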

Representativeness The representativeness of an example can be evaluated based on how many examples are similar or near to it. Similarity Measure between Words: we adapt the cosine-similarity measure to SVM as shown below.
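A sketch of the adapted measure, assuming "adapting cosine similarity to SVM" means taking the cosine in the kernel-induced feature space:

```latex
\mathrm{Sim}(w_i, w_j) \;=\; \frac{k(w_i, w_j)}{\sqrt{k(w_i, w_i)\, k(w_j, w_j)}}
```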

Representativeness Similarity Measure between Named Entities: we employ the dynamic time warping (DTW) algorithm (Rabiner et al. 1978) to find an optimal alignment between the words in the two sequences that maximizes the accumulated similarity between them. Let NE1 = w_11 w_12 … w_1n … w_1N (n = 1, …, N) and NE2 = w_21 w_22 … w_2m … w_2M (m = 1, …, M) denote the two word sequences to be matched. The overall similarity measure Sim* then has to be normalized, since longer entities would otherwise accumulate larger scores.
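A minimal sketch of a DTW-style alignment over pairwise word similarities, normalized by the alignment length (the function name, the exact recurrence and the normalization are assumptions, not the paper's precise formulation):

```python
def dtw_similarity(ne1, ne2, sim):
    """Accumulated similarity of the best alignment between two word sequences,
    normalized by the alignment length; illustrative only."""
    n, m = len(ne1), len(ne2)
    # acc[i][j] = (best accumulated similarity, path length) for an alignment ending at (i, j)
    acc = [[(float("-inf"), 0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = sim(ne1[i], ne2[j])
            if i == 0 and j == 0:
                acc[i][j] = (s, 1)
            else:
                candidates = []
                if i > 0:
                    candidates.append(acc[i - 1][j])
                if j > 0:
                    candidates.append(acc[i][j - 1])
                if i > 0 and j > 0:
                    candidates.append(acc[i - 1][j - 1])
                best_val, best_len = max(candidates)
                acc[i][j] = (best_val + s, best_len + 1)
    total, length = acc[n - 1][m - 1]
    return total / length   # normalize so long entities are not favored
```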

Representativeness Measure for Named Entity Given a set of machine-annotated named entities NESet = {NE_1, …, NE_N}, the representativeness of a named entity NE_i in NESet is quantified by its density, sketched below. If NE_i has the largest density among all the entities in NESet, it can be regarded as the centroid of NESet and also the most representative example in NESet.
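A sketch of the density measure, assuming it is the average similarity of NE_i to every other entity in the set:

```latex
\mathrm{Density}(NE_i) \;=\; \frac{1}{N-1}\sum_{j \ne i} \mathrm{Sim}(NE_i, NE_j)
```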

Diversity The diversity criterion is to maximize the training utility of a batch. We prefer a batch in which the examples have high variance with respect to each other; for example, with a batch size of 5, we try not to select five nearly identical examples at a time. We propose two methods, local and global, to make the examples in a batch diverse enough.

Diversity - Global Consideration For the global consideration, we cluster all named entities in NESet based on the similarity measure. –We employ a K-means clustering algorithm (Jelinek 1997). Named entities in the same cluster may be considered similar to each other, so we select named entities from different clusters at one time.

Diversity - Local Consideration When selecting a machine-annotated named entity, we compare it with all previously selected named entities in the current batch. If the similarity between it and any of them is above a threshold β, the example is not allowed to be added to the batch. The order in which examples are considered is based on some measure, such as the informativeness measure, the representativeness measure or their combination. In this way, we avoid selecting overly similar examples (similarity ≥ β) in a batch. The threshold β may be set to the average similarity between the examples in NESet.

Sample Selection Strategies Sample Selection Strategy 1 Given: NESet = {NE_1, …, NE_N}, BatchSet with the maximal size K, and INTERSet with the maximal size M. Steps: (1) BatchSet = Ø, INTERSet = Ø; (2) select the M entities with the highest Info score from NESet into INTERSet; (3) cluster the entities in INTERSet into K clusters; (4) add the centroid entity of each cluster to BatchSet.
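A sketch of Strategy 1 following the steps above; info_score, kmeans and centroid_of are hypothetical helpers standing in for the informativeness measure and the clustering step:

```python
def strategy_1(ne_set, info_score, kmeans, centroid_of, K, M):
    """Pre-select the M most informative entities, cluster them into K clusters,
    and return one centroid entity per cluster; illustrative only."""
    # Steps 1-2: build the intermediate set of the M most informative entities
    inter_set = sorted(ne_set, key=info_score, reverse=True)[:M]
    # Step 3: cluster the intermediate set into K clusters
    clusters = kmeans(inter_set, K)
    # Step 4: the centroid entity of each cluster forms the batch
    return [centroid_of(cluster) for cluster in clusters]
```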

Sample Selection Strategies Sample Selection Strategy 2
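The details of Strategy 2 are not given on this slide; a plausible sketch, assuming it combines the informativeness and representativeness scores and enforces the local diversity check with threshold β (the weight lam and all helper names are assumptions):

```python
def strategy_2(ne_set, info_score, density, similarity, K, beta, lam=0.6):
    """Plausible reconstruction: rank candidates by a combined
    informativeness/representativeness score and enforce local diversity."""
    ranked = sorted(ne_set,
                    key=lambda ne: lam * info_score(ne) + (1 - lam) * density(ne),
                    reverse=True)
    batch = []
    for ne in ranked:
        # local diversity: skip candidates too similar to anything already chosen
        if all(similarity(ne, chosen) < beta for chosen in batch):
            batch.append(ne)
        if len(batch) == K:
            break
    return batch
```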

Experiment Settings In order to evaluate the effectiveness of our selection strategies, we apply them to recognizing protein (PRT) names in the biomedical domain using the GENIA corpus V1.1 (Ohta et al. 2002), and person (PER), location (LOC) and organization (ORG) names in the newswire domain using the MUC-6 corpus. The batch size is K = 50 in GENIA and K = 10 in MUC-6.

Overall Result in GENIA and MUC-6 Table 2 shows the amount of training data needed to achieve the performance of supervised learning using various selection methods, viz. Random, Strategy1 and Strategy2.

Effectiveness of Informativeness-based Selection Method

Effectiveness of Two Sample Selection Strategies Table 4: Comparison of training data sizes for the multi-criteria-based selection strategies and the informativeness-criterion-based selection (Info_Min) to achieve the same performance level as supervised learning. Figure 6: Active learning curves: effectiveness of the two multi-criteria-based selection strategies compared with the informativeness-criterion-based selection (Info_Min). Table 4 values — Info_Min: 51.9K, Strategy1: 40K, Strategy2: 31K.

Conclusions We propose a multi-criteria-based approach to selecting examples based on their informativeness, representativeness and diversity, which are incorporated together through two strategies (local and global). The labeling cost can be reduced significantly, by at least 80%, compared with supervised learning. In future work, we will study how to overcome the limitation of Strategy 1 discussed in Section 3 by using a more effective clustering algorithm. Another interesting direction is to study when to stop active learning.