Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm
Chen, Yi-wen (陳憶文)
Graduate Institute of Computer Science & Information Engineering, National Taiwan Normal University
Main Reference: 1. Z. Xu, C. Hogan, R. Bauer, "Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm," ICDM Workshops.

I. INTRODUCTION
1. Effective active learning algorithms reduce human labeling effort and produce better learning results.
2. However, efficient active learning algorithms for real-world, large-scale data have not yet been well addressed, either in the machine learning community or in practical industrial applications.
3. Existing batch mode active learning algorithms cannot overcome the computational bottleneck of the greedy algorithm, which takes O(KN) time, where K is the number of examples in the batch and N is the total number of unlabeled examples in the collection.
We prove that the selection objective function is submodular, i.e., that it exhibits the diminishing returns property: labeling a datum when only a small amount of labeled data is available yields more learnable information for the underlying classifier than labeling it when a large amount of labeled data is already available.
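For reference, the diminishing returns property has a standard formal statement; this formulation is supplied from the submodularity literature, not recovered from the slides:

```latex
% Submodularity (diminishing returns): for all A \subseteq B \subseteq N
% and every x \in N \setminus B, the marginal gain of adding x shrinks
% as the selected set grows.
R(A \cup \{x\}) - R(A) \;\ge\; R(B \cup \{x\}) - R(B)
```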

II. RELATED WORKS
1. Several active learning algorithms [3], [4] have been proposed to improve the support vector machine classifier. Coincidentally, these approaches use the same selection scheme: choosing the next unlabeled example closest to the current decision hyperplane in the kernel space.
2. Brinker [5] incorporated a diversity measure into the batch mode support vector machine active learning problem. This algorithm employs a scoring function to select unlabeled data, combining distance to the decision boundary [4] with diversity (that is, distance to the already selected data); a sketch follows this list.
3. Batch mode active learning considering diversity has also been applied to relevance feedback in information retrieval.
Common ground:
1. They explicitly or implicitly model the diversity of the selected dataset.
2. They solve the NP-hard combinatorial optimization problem with a greedy algorithm.
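A minimal sketch of the Brinker-style greedy batch selection described above. The trade-off parameter lam, the cosine distance, and the assumption of row-normalized feature vectors are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def greedy_batch_select(margins, X, k, lam=0.5):
    """Greedy batch selection combining uncertainty and diversity
    (illustrative sketch, Brinker-style).

    margins: |f(x)| for each unlabeled example (smaller = more uncertain)
    X:       row-normalized feature vectors of the unlabeled pool
    k:       batch size
    lam:     uncertainty/diversity trade-off (assumed value)
    """
    uncertainty = -margins                      # closer to hyperplane = higher score
    selected = [int(np.argmax(uncertainty))]    # first pick: most uncertain example
    for _ in range(k - 1):
        # Diversity of each candidate: distance to its nearest selected
        # example (cosine distance, assuming row-normalized X).
        sims = X @ X[selected].T                # (n, |selected|) similarities
        diversity = 1.0 - sims.max(axis=1)
        score = lam * uncertainty + (1 - lam) * diversity
        score[selected] = -np.inf               # never reselect
        selected.append(int(np.argmax(score)))
    return selected
```

Each round rescans the whole pool, which is exactly the O(KN) cost the paper sets out to reduce.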

Submodular Objective Function
In the batch mode active learning problem, we aim to select a subset A of unlabeled examples from the full unlabeled pool N to acquire labels. We formulate batch mode active learning as a constrained optimization problem: select the set of data that maximizes the reward objective function while staying within the defined cost constraint:

\max_{A \subseteq N} R(A) \quad \text{subject to} \quad C(A) \le B \qquad (1)

R(A): the reward function of a candidate unlabeled set A
C(A): the cost of labeling A
B: the cost constraint (labeling budget)
The informativeness of unlabeled examples to the classifier is well captured by their uncertainty and diversity.

Submodular Objective Function
Uncertainty is a widely used selection criterion for pool-based active learning algorithms. Uncertainty can be measured by different heuristics, including uncertainty sampling in the logistic regression classifier [9], query by committee in the Naïve Bayes classifier [10], and version space reduction in the support vector machine classifier [4]. We focus only on support vector machine classifiers in this paper. Among the SVM heuristics, the MaxMin margin and ratio margin algorithms need to retrain the SVM classifier, which requires significant computational overhead. So we use the simple margin algorithm, which measures the uncertainty of an unlabeled example by its distance to the current separating hyperplane.
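A minimal sketch of simple-margin uncertainty scoring with scikit-learn; the use of SVC and decision_function here is an illustrative assumption, not the paper's implementation:

```python
import numpy as np
from sklearn.svm import SVC

def simple_margin_uncertainty(clf, X_unlabeled):
    """Simple margin: an example is more uncertain the closer it lies
    to the current separating hyperplane, i.e. the smaller |f(x)| is."""
    margins = np.abs(clf.decision_function(X_unlabeled))
    return -margins  # higher score = more uncertain

# Usage: train on the labeled seed set, then rank the unlabeled pool.
# clf = SVC(kernel="linear").fit(X_labeled, y_labeled)
# scores = simple_margin_uncertainty(clf, X_pool)
# most_uncertain = np.argsort(scores)[::-1][:10]
```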

Submodular Objective Function
[Eq. (2) appeared here as an image and is not recoverable from the transcript.]
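Based on the surrounding slides, which score examples by combining uncertainty with distance to the already-selected examples, a plausible reconstruction of the marginal reward is the following; this is an assumption, not the paper's exact equation:

```latex
% Assumed marginal reward of adding example x to the selected set A:
% a \lambda-weighted combination of uncertainty u(x) and distance to the
% nearest already-selected example (\lambda is a hypothetical parameter).
R(A \cup \{x\}) - R(A) \;=\; \lambda\, u(x) + (1 - \lambda) \min_{y \in A} d(x, y)
```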

Submodular Objective Function
[Slide content (an image) is not recoverable from the transcript.]

More formally, based on the proof above, we obtain the following theorem.
[The theorem statement (an image) is not recoverable from the transcript.]
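Given the Introduction's claim that the objective is submodular, the theorem presumably invokes the classic guarantee of Nemhauser, Wolsey, and Fisher for greedy maximization of a monotone submodular function under a cardinality constraint; the statement below is supplied from the standard literature, not recovered from the slide:

```latex
R(A_{\mathrm{greedy}}) \;\ge\; \left(1 - \tfrac{1}{e}\right) \max_{A \subseteq N,\; |A| \le K} R(A)
```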

Lazy Active Learning Algorithm
The greedy algorithm first selects the example with the largest uncertainty, then calculates the diversity of the remaining examples and selects the example with the largest combination score. However, the total complexity of the greedy algorithm is O(KN) when we select a subset of K examples from a pool of N candidate examples.

Lazy Active Learning Algorithm
We further exploit the submodularity of the objective function to reduce the number of pairwise distance calculations. We first find the example with the largest (possibly stale) marginal reward. If the distances between this example and any of the previously selected examples have not yet been calculated, we update its diversity by calculating those distances. If the updated marginal reward of this example is still the largest, we select it.

Lazy Active Learning Algorithm
[The algorithm's pseudocode (an image) is not recoverable from the transcript; a sketch follows.]
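A minimal sketch of the lazy (CELF-style) selection loop described on the previous slides. The heap-based bookkeeping, the lam parameter, the cosine distance, and the assumption of row-normalized feature vectors are illustrative choices, not the paper's exact pseudocode:

```python
import heapq
import numpy as np

def lazy_batch_select(margins, X, k, lam=0.5):
    """Lazy greedy batch selection exploiting submodularity: an example's
    marginal reward can only shrink as the selected set grows, so stale
    heap entries are valid upper bounds and rarely need recomputation."""
    uncertainty = -np.abs(margins)
    first = int(np.argmax(uncertainty))
    selected = [first]
    # Max-heap of (negated upper bound, index, #selected when last scored).
    # Initial diversity bound of 1.0 assumes non-negative row-normalized X.
    heap = [(-(lam * uncertainty[i] + (1 - lam) * 1.0), i, 0)
            for i in range(len(margins)) if i != first]
    heapq.heapify(heap)
    while len(selected) < k and heap:
        neg_bound, i, seen = heapq.heappop(heap)
        if seen == len(selected):
            selected.append(i)          # bound is current: greedily take it
            continue
        # Recompute diversity against the current selected set only when
        # this example reaches the top of the heap, then push it back.
        diversity = 1.0 - float((X[i] @ X[selected].T).max())
        score = lam * uncertainty[i] + (1 - lam) * diversity
        heapq.heappush(heap, (-score, i, len(selected)))
    return selected
```

The lazy evaluation leaves the selected batch identical to the plain greedy result; only the number of distance calculations changes.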

IV. EXPERIMENTS
The algorithm behaves differently on different datasets, so we selected three datasets for our experiments to cover a wide range of properties.
1. We consider the binary text classification task CCAT from the Reuters RCV1 collection [13]. This task has an almost balanced class ratio.
2. We use the C11 category from the RCV1 collection [13], since it has an unbalanced class ratio.
3. We include topic 103 from the TREC Legal 2008 interactive task [1]. This task models a real-world e-discovery task, which aims to find information relevant to a legal subject matter.
The final TREC Legal dataset we use contains 6421 labeled documents, of which 3440 are non-relevant and 2981 are relevant. We randomly sample 3000 documents as the test set and use the remaining 3421 documents as the training set. We use the three text-only fields (text body, title, brand).

IV. EXPERIMENTS
[Slide content (an image) is not recoverable from the transcript.]

How Effective is the Active Learning Algorithm?
We compare the running time of our lazy active learning algorithm with two versions of the greedy algorithm: a greedy algorithm using an inverted index, and a greedy algorithm using pairwise cosine distance calculation.

How Effective is the Active Learning Algorithm?
We fixed the number of feedback documents at 100 and varied the total number of training documents in the pool. For all three datasets, we use 12.5%, 25%, 50%, and 100% of the training data as the sampling pool, and compare the speed of lazy active learning, inverted-index greedy active learning, and pairwise greedy active learning.

V. CONCLUSIONS
To summarize, the major contributions of this paper are:
1. We propose a generalized objective function for batch mode active learning, which is shown to be submodular. Based on the submodularity of the objective function, we propose an extremely fast algorithm: the lazy active learning algorithm.
2. We extensively evaluate our new approach on several real-world text classification tasks in terms of classification accuracy and computational efficiency.