Selective Sampling on Probabilistic Labels
Peng Peng, Raymond Chi-Wing Wong
CSE, HKUST

Outline
- Introduction
- Motivation
- Contributions
- Methodologies
- Theoretical Results
- Experiments
- Conclusion

Introduction
- Binary classification:
  - Learn a classifier based on a set of labeled instances.
  - Predict the class of an unobserved instance using that classifier.
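
For readers who want this workflow in code, a minimal sketch (scikit-learn and the toy data are illustrative choices, not part of the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A toy labeled training set: 2-D instances with binary labels.
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]])
y_train = np.array([0, 0, 1, 1])

# Learn a classifier from the labeled instances...
clf = LogisticRegression().fit(X_train, y_train)

# ...then predict the class of an unobserved instance.
print(clf.predict(np.array([[0.85, 0.9]])))
```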

Introduction
- Question: how do we obtain such a training dataset?
- By sampling and labeling!
- Labeling an instance takes time and effort.
- Because the labeling budget is limited, we aim to get a high-quality dataset with a dedicated sampling strategy.

Introduction
- Random sampling:
  - The unlabeled instances are observed sequentially.
  - Every observed instance is sampled for labeling.

Introduction
- Selective sampling:
  - The data are observed sequentially.
  - Each observed instance is sampled for labeling with some probability.
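
To make the contrast with random sampling concrete, here is a minimal sketch of the two regimes (the stream of (instance, label-query) pairs and the per-instance sampling_prob function are illustrative assumptions, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sampling(stream):
    """Random sampling: request a label for every observed instance."""
    return [(x, query()) for x, query in stream]

def selective_sampling(stream, sampling_prob):
    """Selective sampling: request a label only with some probability."""
    labeled = []
    for x, query in stream:
        if rng.random() < sampling_prob(x):  # biased coin flip per instance
            labeled.append((x, query()))     # the labeling cost is paid only here
    return labeled
```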

Introduction
- What is the advantage of classification with selective sampling?
  - It saves the budget for labeling instances.
  - Compared with random sampling, selective sampling needs a much lower label complexity to achieve the same accuracy.

Introduction
- We aim to learn a classifier by selectively sampling instances and labeling them with probabilistic labels.

Motivation
- In many real scenarios, probabilistic labels are available:
  - Crowdsourcing
  - Medical diagnosis
  - Pattern recognition
  - Natural language processing

Motivation
- Crowdsourcing: labelers may disagree with one another, so a deterministic label is not accessible, but a probabilistic label is available for each instance.
- Medical diagnosis: diagnostic labels are normally not deterministic; a domain expert (e.g., a doctor) can give the probability that a patient suffers from a disease.
- Pattern recognition: it is sometimes hard to label a low-resolution image (e.g., an astronomical image).
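
As a concrete illustration of the crowdsourcing case, a minimal sketch (the majority-fraction aggregation shown here is one simple choice, not necessarily the aggregation used in the paper):

```python
def probabilistic_label(votes):
    """Aggregate binary crowd votes into a probabilistic label:
    the fraction of labelers who voted for the positive class."""
    return sum(votes) / len(votes)

# 7 of 10 labelers say "positive", so the instance gets the soft label
# p = 0.7 instead of a hard 0/1 decision.
print(probabilistic_label([1, 1, 0, 1, 0, 1, 1, 1, 0, 1]))  # 0.7
```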

Contributions
- We propose a strategy for selectively sampling instances and labeling them with probabilistic labels.
- We state and prove an upper bound on the label complexity of our method in the probabilistic-label setting.
- We show the superior performance of our proposed method in the experiments.
- Significance of our work: it gives an example of how the learning problem with probabilistic labels can be analyzed theoretically.

Methodologies
- Importance-weighted sampling strategy (in each round):
  - Compute a weight (in [0, 1]) for the newly observed unlabeled instance.
  - Flip a coin biased by that weight to decide whether to request the instance's label.
  - If we decide to label the instance, add the newly labeled instance to the training dataset and call a passive learner (i.e., an ordinary classifier) to learn from the updated training dataset.
A sketch of this loop is given below.
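
A minimal Python sketch of this per-round loop. The transcript does not preserve the paper's weighting formula, so the sketch treats it as a black-box weight_fn (assumed to handle a not-yet-fit learner); the stream format, the scikit-learn passive learner, and the use of hard 0/1 labels are all illustrative assumptions. The 1/p importance weight is the standard correction that keeps training unbiased under selective sampling:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def selective_learning(stream, weight_fn, seed=0):
    """One pass over an unlabeled stream: compute a weight, flip a biased
    coin, query the label only on heads, then retrain the passive learner
    on all labeled points collected so far."""
    rng = np.random.default_rng(seed)
    learner = LogisticRegression()
    X, y, w = [], [], []
    for x, query_label in stream:         # query_label() returns the label on demand
        p = weight_fn(x, learner)         # sampling weight in [0, 1]
        if rng.random() < p:              # coin flip biased by the weight
            X.append(x)
            y.append(query_label())
            w.append(1.0 / p)             # importance weight corrects the sampling bias
            if len(set(y)) > 1:           # this sklearn learner needs two classes seen
                learner.fit(np.asarray(X), np.asarray(y),
                            sample_weight=np.asarray(w))
    return learner
```

In the paper's setting, query_label() would return a probabilistic label and the passive learner would be one able to exploit it; the structure of the round is unchanged.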

Methodologies (slides 14-22: the formal details and a worked example, presented as equations and figures that were not transcribed)

Theoretical Results (slides 23-24: the upper bound on label complexity, presented as equations that were not transcribed)

Experiments
- Datasets:
  - 1st type: several real regression datasets (breast-cancer, housing, wine-white, wine-red)
  - 2nd type: a movie review dataset (IMDb)
- Setup: 10-fold cross-validation
- Measurements:
  - Average accuracy
  - p-value of a paired t-test
- Algorithms compared:
  - Passive (the passive learner we call in each round)
  - Active (the original importance-weighted active learning algorithm)
  - FSAL (our method)
The evaluation protocol is sketched below.
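
A minimal sketch of this evaluation protocol (the per-fold accuracies below are made-up illustrative numbers, not results from the paper):

```python
import numpy as np
from scipy import stats

# Illustrative per-fold accuracies from a 10-fold cross-validation
# (made-up numbers, not the paper's results).
acc_fsal    = np.array([0.91, 0.89, 0.92, 0.90, 0.93, 0.88, 0.91, 0.90, 0.92, 0.89])
acc_passive = np.array([0.87, 0.86, 0.88, 0.85, 0.89, 0.84, 0.88, 0.86, 0.87, 0.85])

print("average accuracy:", acc_fsal.mean(), "vs", acc_passive.mean())

# Paired t-test over the matched folds ("FSAL vs Passive").
t_stat, p_value = stats.ttest_rel(acc_fsal, acc_passive)
print("p-value:", p_value)
```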

Experiments
- The breast-cancer dataset: the average accuracy of Passive, Active, and FSAL, and the p-values of two paired t-tests ("FSAL vs Passive" and "FSAL vs Active"). (Result charts not transcribed.)

Experiments
- The IMDb dataset: the average accuracy of Passive, Active, and FSAL, and the p-values of two paired t-tests ("FSAL vs Passive" and "FSAL vs Active"). (Result charts not transcribed.)

Conclusion
- We propose a selective sampling algorithm that learns from probabilistic labels.
- We prove that selective sampling based on probabilistic labels is more efficient than selective sampling based on deterministic labels.
- We give an extensive experimental study of our proposed learning algorithm.

THANK YOU!

Experiments
- The housing dataset: the average accuracy of Passive, Active, and FSAL, and the p-values of two paired t-tests ("FSAL vs Passive" and "FSAL vs Active"). (Result charts not transcribed.)

Experiments
- The wine-white dataset: the average accuracy of Passive, Active, and FSAL, and the p-values of two paired t-tests ("FSAL vs Passive" and "FSAL vs Active"). (Result charts not transcribed.)

Experiments
- The wine-red dataset: the average accuracy of Passive, Active, and FSAL, and the p-values of two paired t-tests ("FSAL vs Passive" and "FSAL vs Active"). (Result charts not transcribed.)