Paired Sampling in Density-Sensitive Active Learning
Pinar Donmez, joint work with Jaime G. Carbonell
Language Technologies Institute, School of Computer Science, Carnegie Mellon University

Outline
- Problem setting
- Motivation
- Our approach
- Experiments
- Conclusion

Setting
- X: feature space; label set Y = {-1, +1}
- Data D ~ X x Y, with D = T ∪ U
  - T: training (labeled) set
  - U: unlabeled set
- T is small initially; U is large
- Active learning: choose the most informative samples to label
  - Goal: high performance with the fewest labeling requests

Motivation
- Optimize the decision boundary placement
  - Sampling disproportionately on one side may not be optimal
  - Maximize the likelihood of straddling the boundary with paired samples
- Three factors affect sampling:
  - Local density
  - Conditional entropy maximization
  - Utility score

Illustrative Example
- Left figure: significant shift in the current hypothesis; large reduction in version space
- Right figure: small shift in the current hypothesis; small reduction in version space
(Figure panels: paired sampling vs. single-point sampling)

Density-Sensitive Distance
- Cluster hypothesis: the decision boundary should NOT cut clusters
  - squeeze distances in high-density regions
  - increase distances in low-density regions
- Solution: a density-sensitive distance
  - find the weakest link along each path in a graph G
  - a better way to avoid outliers (i.e., a very short edge in a long path)
- Chapelle & Zien (2005)
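A minimal sketch of one reading of the weakest-link idea: for every pair of points, take the minimum over all paths of the largest edge on that path (a minimax-path distance computed with a Floyd-Warshall-style recursion). The exact edge weighting used in the paper, which follows Chapelle & Zien (2005), may differ; the function name and the plain Euclidean edge lengths are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def weakest_link_distances(X):
    """Minimax-path ("weakest link") distances between all pairs of points.

    For every pair (i, j) we take the minimum, over all paths from i to j in
    the complete graph on the data, of the largest edge on the path.  Points
    connected through dense regions (many short hops) end up close together,
    while points separated by a low-density gap stay far apart.
    """
    d = squareform(pdist(X))      # Euclidean edge lengths (an assumption)
    n = d.shape[0]
    w = d.copy()
    # Floyd-Warshall-style relaxation with (max, min) instead of (+, min).
    for k in range(n):
        w = np.minimum(w, np.maximum(w[:, k][:, None], w[k, :][None, :]))
    return w
```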

Density-Sensitive Distance
- Apply MDS (multi-dimensional scaling) to the density-sensitive distances to obtain a Euclidean embedding
- Find the eigenvalues and eigenvectors of the resulting Gram matrix
- Pick the first p eigenvectors (those with the largest eigenvalues)
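A sketch of the MDS step above, assuming the standard classical (Torgerson) construction: double-center the squared distances, eigendecompose, keep the top p eigenvectors. The slide's exact criterion for choosing p is not recoverable from the transcript, so the sketch simply keeps the p largest eigenvalues.

```python
import numpy as np

def classical_mds(dist, p):
    """Classical (Torgerson) MDS: embed a distance matrix into R^p.

    dist : (n, n) symmetric matrix of density-sensitive distances
    p    : number of embedding dimensions to keep
    """
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j           # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)     # ascending order
    order = np.argsort(eigvals)[::-1][:p]    # indices of the p largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None) # guard against small negative values
    return eigvecs[:, order] * np.sqrt(lam)  # (n, p) embedding
```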

Active Sampling Procedure
Given a training set T in the MDS space:
1. Train a logistic regression classifier on T
2. For all candidate pairs of unlabeled points, compute the pairwise score S
3. Choose the pair with the maximum score (and request its labels)
4. Repeat steps 1-3
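A sketch of one sampling round, assuming scikit-learn's LogisticRegression as the classifier and that candidate pairs are drawn from the unlabeled pool; `score_fn` stands in for the scoring function S defined on the following slides.

```python
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def select_pair(Z, y, labeled, score_fn):
    """One round of the paired-sampling loop on the MDS-embedded data Z.

    Z        : (n, p) coordinates in the density-sensitive MDS space
    y        : labels in {-1, +1} (only entries in `labeled` are used)
    labeled  : indices of currently labeled points (the training set T)
    score_fn : pairwise scoring function S(i, j, clf, Z) -> float
    """
    clf = LogisticRegression().fit(Z[labeled], y[labeled])
    unlabeled = [i for i in range(len(Z)) if i not in set(labeled)]
    best = max(combinations(unlabeled, 2),
               key=lambda ij: score_fn(ij[0], ij[1], clf, Z))
    return best, clf   # the chosen pair is sent to the oracle for labeling
```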

Details of the Scoring Function S
Two components of S:
1. Likelihood of the pair having opposite labels (straddling the decision boundary)
2. Utility of the pair
- By the cluster assumption, the decision boundary should not cut clusters => points in different clusters are likely to have different labels
- In the transformed space, points in different clusters have low similarity (large distance)
- Thus, we can estimate the likelihood of opposite labels from the pairwise distance in the transformed space
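A hedged sketch of the score S. The transcript only names the two components, so both the surrogate for P(y_i != y_j) used here (distance normalized by the largest pairwise distance) and the product combination are illustrative assumptions, not the paper's exact formula.

```python
def pair_score(i, j, dist, utility):
    """Pairwise score S(i, j) sketched from this slide's two components.

    dist    : (n, n) density-sensitive distance matrix
    utility : callable U(i, j) -> float, defined on the later slides
    """
    p_opposite = dist[i, j] / dist.max()   # straddling-likelihood surrogate (assumption)
    return p_opposite * utility(i, j)      # product combination (assumption)
```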

An Analysis Justifying Our Claim
- Pairwise distances are divided into bins
- Pairs are assigned to bins according to their distances
- For each bin, the relative frequency of pairs with opposite class labels is computed
- The resulting graph shows (empirically) that the likelihood of two points having opposite labels increases monotonically with the pairwise distance between them
* The graph is plotted on the g50c dataset.
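The binning analysis on this slide is straightforward to reproduce; the sketch below assumes a symmetric pairwise-distance matrix `dist` and labels `y` in {-1, +1}, with the number of bins chosen arbitrarily.

```python
import numpy as np

def opposite_label_frequency(dist, y, n_bins=10):
    """Fraction of opposite-label pairs per distance bin (as on this slide)."""
    iu = np.triu_indices(len(y), k=1)                     # each pair counted once
    d = dist[iu]
    opposite = (y[iu[0]] != y[iu[1]]).astype(float)
    edges = np.linspace(d.min(), d.max(), n_bins + 1)
    bins = np.clip(np.digitize(d, edges) - 1, 0, n_bins - 1)
    return np.array([opposite[bins == b].mean() if np.any(bins == b) else np.nan
                     for b in range(n_bins)])
```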

Utility Function
Two components:
- Local density: depends on the number of close neighbors and their proximity
- Conditional entropy: for binary problems, this is the binary entropy of the classifier's posterior P(y = +1 | x)

Uncertainty-Weighted Density
Captures:
- the density of a given point
- the information content of its neighbors
Novelty:
- each neighbor's contribution is weighted by its uncertainty
- this reduces the effect of highly certain neighbors
- dense points with highly uncertain neighbors become important
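A sketch of an uncertainty-weighted density under stated assumptions: uncertainty is the binary entropy of the classifier's posterior, proximity is inverse distance, and the neighborhood is the k nearest points. None of these exact choices are given in the transcript.

```python
import numpy as np

def binary_entropy(p):
    """Entropy of a Bernoulli posterior (in nats); 0 when p is 0 or 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def uncertainty_weighted_density(i, dist, posteriors, k=10):
    """Uncertainty-weighted density of point i, sketched from this slide.

    dist       : (n, n) density-sensitive distance matrix
    posteriors : (n,) array of P(y = +1 | x) from the current classifier
    """
    neighbors = np.argsort(dist[i])[1:k + 1]           # skip the point itself
    weights = 1.0 / (dist[i, neighbors] + 1e-12)       # proximity weights (assumption)
    return np.sum(weights * binary_entropy(posteriors[neighbors]))
```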

Utility Function
The utility of a pair regularizes (trades off):
- the information content (entropy) of the pair itself
- the proximity-weighted information content of its neighbors
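A sketch of the pair utility, reusing `binary_entropy` and `uncertainty_weighted_density` from the previous sketch. The additive trade-off with weight `beta` is an assumption; the transcript only says the two terms are balanced against each other.

```python
def pair_utility(i, j, dist, posteriors, beta=0.5, k=10):
    """Utility of a candidate pair, sketched from this slide."""
    own = binary_entropy(posteriors[i]) + binary_entropy(posteriors[j])
    neighborhood = (uncertainty_weighted_density(i, dist, posteriors, k) +
                    uncertainty_weighted_density(j, dist, posteriors, k))
    return beta * own + (1 - beta) * neighborhood   # trade-off weight is an assumption
```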

Experimental Data
Six binary datasets
(Figure annotation: pair with maximum score selected)

Experiment Setting
For each dataset:
- start with 2 labeled data points (1 positive, 1 negative)
- run each method for 20 iterations
- average results over 10 runs
Baselines:
- Uncertainty sampling
- Density-only sampling
- Representative sampling (Xu et al., 2003)
- Random sampling
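A sketch of this evaluation protocol, assuming `sampler` is any of the compared strategies with the interface of `select_pair` above (e.g., with the score function bound via functools.partial), and that error is measured over the whole pool; the paper's exact evaluation split is not stated in the transcript.

```python
import numpy as np

def run_experiment(Z, y, sampler, n_iterations=20, n_runs=10, seed=0):
    """Start with one labeled point per class, sample for 20 rounds, average 10 runs."""
    rng = np.random.default_rng(seed)
    errors = np.zeros((n_runs, n_iterations))
    for r in range(n_runs):
        pos = rng.choice(np.flatnonzero(y == 1))
        neg = rng.choice(np.flatnonzero(y == -1))
        labeled = [pos, neg]
        for t in range(n_iterations):
            new_indices, clf = sampler(Z, y, labeled)      # query the oracle for a pair
            labeled.extend(new_indices)
            errors[r, t] = np.mean(clf.predict(Z) != y)    # pool error (an assumption)
    return errors.mean(axis=0)
```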

Results

Conclusion
Our contributions:
- combine uncertainty, density, and dissimilarity across the decision boundary
- proximity-weighted conditional entropy selection is effective for active learning
Results show:
- our method significantly outperforms the baselines in error reduction
- it needs fewer labeling requests than the others to reach the same performance

Thank You!