Semi-Supervised Classification by Low Density Separation
Olivier Chapelle, Alexander Zien
Student presenter: Ran Chang


Introduction
Goal of semi-supervised classification: use unlabeled data to improve generalization.
Cluster assumption: the decision boundary should not cross high-density regions, but instead should lie in low-density regions.

Algorithm
Setup: n labeled data points x_1, ..., x_n with labels y_1, ..., y_n in {-1, +1}, and m unlabeled data points x_{n+1}, ..., x_{n+m}.

Algorithm (cont): Graph-based similarities

Graph-based similarities (cont)
Principle: assign low similarities to pairs of points that lie in different clusters.
If two points are in the same cluster, there exists a continuous connecting curve that only passes through regions of high density.
If two points are in different clusters, any such curve has to traverse a density valley.
Definition of the similarity of two points: the maximum, over all continuous connecting curves, of the minimum density along the curve (formalized in the sketch below).
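One way to write this down (a hedged formalization, not necessarily the paper's exact notation; p denotes the data density and gamma ranges over continuous paths from x_i to x_j):

```latex
\mathrm{sim}(x_i, x_j) \;=\; \sup_{\gamma:\ \gamma(0)=x_i,\ \gamma(1)=x_j} \;\; \min_{t \in [0,1]} \; p\bigl(\gamma(t)\bigr)
```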

Graph-based similarities (cont)
1. Build a nearest-neighbor graph G from all (labeled and unlabeled) data.
2. Compute the n x (n + m) distance matrix D^ρ of minimal ρ-path distances in G from all labeled points to all points.

Graph-based similarities (cont)
3. Perform a non-linear transformation on D^ρ to obtain the kernel K.
4. Train an SVM with K and predict.

Graph-based similarities (cont)
Role of the softening parameter ρ: the accuracy of approximating the ρ-path distances on the nearest-neighbor graph (rather than on the fully connected graph) depends on ρ. For ρ -> 0, the direct connection is always the shortest, so every deletion of an edge can cause the corresponding distance to increase; for ρ -> infinity, shortest paths almost never contain any long edge, so edges can safely be deleted. For large values of ρ, the distances between points in the same cluster are decreased; in contrast, the distances between points from different clusters are still dominated by the gaps between the clusters.
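A minimal Python sketch of steps 1-4. It assumes the softened path distance takes the common form D^ρ_{ij} = (1/ρ) log(1 + min over connecting paths of Σ_edges (e^{ρ d_e} - 1)), which matches the limiting behavior described above; the RBF-style transform to a kernel and the bandwidth sigma are illustrative choices, not necessarily the paper's exact ones.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph
from sklearn.svm import SVC

def rho_path_distances(X, n_labeled, rho=4.0, k=10):
    """n x (n+m) matrix of minimal rho-path distances from labeled points.

    Edge weights e^(rho*d) - 1 are non-negative and increasing in d, so an
    ordinary shortest-path search on the transformed weights finds the
    minimal rho-path; the outer (1/rho)*log(1 + .) then gives D^rho.
    """
    # step 1: symmetric k-nearest-neighbor graph over all points
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')
    G = G.maximum(G.T)
    G.data = np.expm1(rho * G.data)                  # e^(rho*d) - 1 per edge
    # step 2: shortest transformed paths from the n labeled source points
    S = dijkstra(G, directed=False, indices=np.arange(n_labeled))
    return np.log1p(S) / rho

# Hypothetical usage: X stacks the n labeled rows first, then the m unlabeled
# rows; y holds the n labels in {-1, +1}.
# D = rho_path_distances(X, n_labeled=len(y))
# step 3: one plausible non-linear transform to a similarity matrix
# K_train = np.exp(-D[:, :len(y)] ** 2 / (2 * sigma ** 2))
# step 4: train an SVM on the precomputed kernel
# SVC(kernel='precomputed').fit(K_train, y)
```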

Transductive Support Vector Machine (TSVM)
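For reference, the standard TSVM objective, written in the notation of the earlier slides with decision function f(x) = w · x + b (a hedged reconstruction, not necessarily the exact form on the original slide): hinge losses on the labeled points plus a symmetric hinge on the unlabeled points, weighted by C and C* respectively.

```latex
\min_{w,\,b} \;\; \frac{1}{2}\lVert w \rVert^2
  \;+\; C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i f(x_i)\bigr)
  \;+\; C^{*} \sum_{i=n+1}^{n+m} \max\bigl(0,\; 1 - \lvert f(x_i) \rvert\bigr),
  \qquad f(x) = w \cdot x + b
```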

Gradient TSVM
The last term (the loss over the unlabeled points) makes the problem non-convex and non-differentiable. It is therefore replaced by a smooth, differentiable surrogate: a Gaussian-shaped function of the unlabeled margin, of the form exp(-s f(x_i)^2) for some s > 0, so that the objective can be minimized by gradient descent.

Gradient TSVM (cont)

Annealing: C* is initially set to a small value and increased exponentially up to C. The choice of setting the final value of C* to C is somewhat arbitrary; ideally, this final value would be treated as a free parameter of the algorithm.
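A minimal sketch of the gradient-descent TSVM idea with C* annealing, for a linear decision function. The exponential surrogate for the unlabeled loss, the doubling annealing schedule, and the plain gradient-descent optimizer are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def grad_tsvm(X_lab, y, X_unl, C=10.0, C_star_final=10.0, s=3.0,
              lr=1e-3, n_anneal=10, n_steps=200):
    """Linear TSVM trained by gradient descent with C* annealing (sketch).

    Objective: 0.5*||w||^2 + C * sum hinge(y_i f(x_i))
               + C* * sum exp(-s * f(x_j)^2) over unlabeled x_j,
    where f(x) = w.x + b and C* is annealed from a small value up to C.
    """
    d = X_lab.shape[1]
    w, b = np.zeros(d), 0.0
    for C_star in C_star_final * 2.0 ** np.arange(-n_anneal + 1, 1):
        for _ in range(n_steps):
            f_lab = X_lab @ w + b
            f_unl = X_unl @ w + b
            # subgradient of the hinge loss on labeled points
            viol = (y * f_lab) < 1.0
            g_w = w - C * (y[viol][:, None] * X_lab[viol]).sum(axis=0)
            g_b = -C * y[viol].sum()
            # gradient of the smooth exp(-s f^2) loss on unlabeled points
            coef = C_star * (-2.0 * s) * f_unl * np.exp(-s * f_unl ** 2)
            g_w += (coef[:, None] * X_unl).sum(axis=0)
            g_b += coef.sum()
            w -= lr * g_w
            b -= lr * g_b
    return w, b

# Hypothetical usage on the MDS embedding Z introduced on the next slide:
# w, b = grad_tsvm(Z[:n], y, Z[n:])
# predictions = np.sign(Z[n:] @ w + b)
```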

Multidimensional Scaling (MDS)
Reason: the derived kernel is not positive definite.
Goal: find a Euclidean embedding of the graph distances before applying Gradient TSVM.
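A minimal sketch of classical MDS applied to the square matrix of ρ-path distances over all points: double-center the squared distances, eigendecompose, and keep only the directions with positive eigenvalues so the embedding is Euclidean. Using this plain classical-MDS recipe and this eigenvalue cut-off is an assumption of the sketch, not necessarily the paper's exact procedure.

```python
import numpy as np

def classical_mds(D, eps=1e-9):
    """Euclidean embedding of a (possibly non-Euclidean) distance matrix D.

    Double-center -0.5 * D^2, keep the eigen-directions with positive
    eigenvalues, and scale by sqrt(eigenvalue) to get coordinates Z whose
    pairwise distances approximate D.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram-like matrix
    vals, vecs = np.linalg.eigh(B)
    keep = vals > eps                        # drop negative/zero eigenvalues
    return vecs[:, keep] * np.sqrt(vals[keep])

# Hypothetical usage: D_full is the (n+m) x (n+m) matrix of rho-path
# distances over all labeled + unlabeled points; Z feeds the Gradient TSVM.
# Z = classical_mds(D_full)
```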

Parameters

Low Density Separation (LDS)

Experiment Data Sets
g50c and g10n are drawn from two standard multivariate Gaussians.
g50c: the labels correspond to the Gaussians, and the means are located in 50-dimensional space such that the Bayes error is 5%.
g10n: constructed similarly, but in 10 dimensions.
Coil20: gray-scale images of 20 different objects, each photographed from different angles in steps of 5 degrees.
Text: the classes mac and mswindows of the Newsgroup20 dataset, preprocessed.
Uspst: the test part of the well-known USPS handwritten digit recognition data.

Experiment parameters and results

Appendix (Dijkstra's algorithm)
Dijkstra's algorithm is a standard method for finding shortest paths from a source vertex s, as used above for the ρ-path distances on the nearest-neighbor graph.
1. Set i = 0, S_0 = {u_0 = s}, L(u_0) = 0, and L(v) = infinity for all v != u_0. If |V| = 1, stop; otherwise go to step 2.
2. For each v in V \ S_i, replace L(v) by min{L(v), L(u_i) + d(u_i, v)}. If L(v) is replaced, put a label (L(v), u_i) on v.
3. Find a vertex v that minimizes {L(v) : v in V \ S_i}; call it u_{i+1}.
4. Let S_{i+1} = S_i ∪ {u_{i+1}}.
5. Replace i by i + 1. If i = |V| - 1, stop; otherwise go to step 2.
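A small, self-contained Python sketch of the steps above (a hypothetical helper; it uses a binary heap in place of the linear scan of step 3, and represents the graph as an adjacency dict {vertex: {neighbor: edge_weight}}):

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path lengths from `source` in a graph given as
    {vertex: {neighbor: edge_weight}} with non-negative weights."""
    L = {v: float('inf') for v in graph}   # tentative distances L(v)
    L[source] = 0.0
    settled = set()                        # the set S of settled vertices
    heap = [(0.0, source)]
    while heap:
        dist, u = heapq.heappop(heap)      # step 3: closest unsettled vertex
        if u in settled:
            continue
        settled.add(u)                     # step 4: S <- S + {u}
        for v, w in graph[u].items():      # step 2: relax edges out of u
            if dist + w < L[v]:
                L[v] = dist + w
                heapq.heappush(heap, (L[v], v))
    return L

# Example:
# dijkstra({'a': {'b': 1.0}, 'b': {'a': 1.0, 'c': 2.0}, 'c': {'b': 2.0}}, 'a')
# -> {'a': 0.0, 'b': 1.0, 'c': 3.0}
```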