Advanced Machine Learning & Perception. Instructor: Tony Jebara, Columbia University.

Topic 12: Graphs in Machine Learning. Outline: Graph Min Cut, Ratio Cut, Normalized Cut; Spectral Clustering; Stability and Eigengap; Matching, B-Matching and k-regular graphs; B-Matching for Spectral Clustering; B-Matching for Embedding.

Graphs in Machine Learning. Many learning scenarios use graphs. Classification: k-nearest neighbors. Clustering: normalized cut spectral clustering. Inference: Bayesian networks and belief propagation.

Normalized Cut Clustering. Better than k-means, EM, linkage, etc.: no local minima or parametric assumptions. Given a graph (V,E) with weight matrix A, the normalized cut of a subset B is NCut(B) = cut(B, V\B)/vol(B) + cut(B, V\B)/vol(V\B), where cut(B, V\B) is the total weight of edges leaving B and vol(B) is the total degree of the nodes in B. We can fill in A using pairwise similarities/kernels. But minimizing the normalized cut is a hard combinatorial problem, so we need a relaxation…

Spectral Clustering. Typically we use EM or k-means to cluster N data points. Instead, imagine clustering the data points using only an NxN matrix capturing their proximity information: this is spectral clustering. Again compute a Gram matrix K using, e.g., an RBF kernel. Example: take N pixels from an image, each represented as x = [xcoord, ycoord, intensity]. The eigenvectors of the K matrix (or a slight variant of it) seem to capture a segmentation or clustering of the data points! This is a nonparametric form of clustering, since we did not assume a Gaussian distribution…
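
A minimal sketch of this idea in Python (the image here is a random stand-in and the kernel width sigma is an illustrative choice, not a value from the slides):

```python
import numpy as np

def pixel_affinity(img, sigma=0.5):
    """Build an N x N RBF (Gaussian) affinity matrix from pixel features.

    Each pixel becomes a feature vector [xcoord, ycoord, intensity].
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    X = np.column_stack([xs.ravel() / w, ys.ravel() / h, img.ravel()])
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma ** 2)

# Leading eigenvectors of the Gram matrix K tend to reveal a grouping of pixels.
img = np.random.rand(8, 8)              # stand-in for a small grayscale image
K = pixel_affinity(img)
eigvals, eigvecs = np.linalg.eigh(K)    # eigenvalues in ascending order
leading = eigvecs[:, -2:]               # inspect/threshold these for a segmentation
```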

Spectral Clustering. Convert the data to a graph and cut it. Given a graph (V,E) with weight matrix A, finding the best normalized cut B* is NP-hard. Define: the diagonal degree matrix D with D_ii = sum_j A_ij; the volume vol(B) = sum_{i in B} D_ii; the cut cut(B, V\B) = sum_{i in B, j not in B} A_ij; and the unnormalized Laplacian L = D - A. The combinatorial (NP-hard) problem is to minimize NCut(B) over subsets B. Relax to a continuous indicator vector y (Shi & Malik): minimize y'Ly / y'Dy subject to y'D1 = 0, and solve for y as the 2nd smallest generalized eigenvector of (D - A) y = lambda D y.
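
A minimal sketch of the relaxed two-way solve (assuming A is a symmetric nonnegative affinity matrix with no isolated nodes, so D is invertible; names are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def two_way_normalized_cut(A):
    """Shi-Malik relaxation: split a graph with affinity matrix A into two clusters.

    Solves the generalized eigenproblem (D - A) y = lambda * D * y and thresholds
    the 2nd-smallest generalized eigenvector at zero.
    """
    D = np.diag(A.sum(axis=1))
    L = D - A                       # unnormalized graph Laplacian
    eigvals, eigvecs = eigh(L, D)   # generalized symmetric eigenproblem, ascending
    y = eigvecs[:, 1]               # 2nd smallest generalized eigenvector
    return (y > 0).astype(int)      # simple zero-threshold rounding into {0, 1}
```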

Stability in Spectral Clustering. A standard problem when computing and using eigenvectors: small changes in the data can cause the eigenvectors to change wildly. Ensure the eigenvectors we keep are distinct and stable by looking at the eigengap… Some algorithms ensure the retained eigenvectors have a safe eigengap. Use the normalized Laplacian. (Figure: two eigenvalue spectra, one where keeping 3 eigenvectors is unsafe because the eigengap is small, one where it is safe because the gap is large.)
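
A small sketch of the eigengap readout (a heuristic reading of the slide, not the exact criterion used):

```python
import numpy as np

def eigengap(L_norm, k):
    """Gap between the k-th and (k+1)-th smallest eigenvalues of a normalized
    Laplacian; a large gap suggests keeping k eigenvectors is 'safe'/stable.
    """
    eigvals = np.linalg.eigvalsh(L_norm)   # ascending
    return eigvals[k] - eigvals[k - 1]
```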

Stabilized Spectral Clustering. The stabilized spectral clustering algorithm (see the sketch below):
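
A minimal sketch of one common stabilized recipe (normalized Laplacian, eigengap readout, k-means on the row-normalized spectral embedding); this follows the standard Ng-Jordan-Weiss style pipeline and is not claimed to be the exact algorithm from the slide:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def stabilized_spectral_clustering(A, k):
    """Spectral clustering on the normalized Laplacian with an eigengap readout.

    A: symmetric nonnegative affinity matrix; k: number of clusters.
    Returns cluster labels and the eigengap after the k-th eigenvalue.
    """
    d = np.maximum(A.sum(axis=1), 1e-12)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian
    eigvals, eigvecs = eigh(L_norm)                          # ascending eigenvalues
    gap = eigvals[k] - eigvals[k - 1]                        # larger gap = more stable
    U = eigvecs[:, :k]                                       # k smallest eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(U)
    return labels, gap
```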

Stabilized Spectral Clustering. Example results compared to other clustering algorithms (traditional k-means, unstable spectral clustering, connected components).

Matching and B-Matching. A matching (or perfect matching) = a permutation = an assignment. Maximum weight matching = the linear assignment problem: given a weight matrix, find the best permutation matrix, solvable in O(N^3) with the Kuhn-Munkres (Hungarian) algorithm. B-Matching generalizes this to multi-matchings ("Mormon" matchings in the slide's wife/husband illustration), solvable in O(bN^3).
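
For the ordinary b=1 case, the linear assignment problem can be solved directly with SciPy; a small sketch (the weight matrix here is random, purely for illustration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

W = np.random.rand(5, 5)                       # weight of pairing node i with node j
rows, cols = linear_sum_assignment(W, maximize=True)
P = np.zeros_like(W)
P[rows, cols] = 1.0                            # permutation matrix of the max-weight matching
print("total matched weight:", W[rows, cols].sum())
```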

Matching and B-Matching. Multi-matchings or b-matchings are also known as k-regular graphs (as opposed to k-nearest neighbor graphs). (Figure: example graphs that are 0-regular, 1-regular, 2-regular, and 3-regular.)

Matching and B-Matching. B-matchings are balanced versions of k-nearest neighbor graphs. (Figure: side-by-side comparison of matchings vs. nearest-neighbor graphs.)

B-Matched Spectral Clustering. Try to improve spectral clustering using B-Matching. Assume w.l.o.g. two clusters of roughly equal size. If we knew the labeling y = [+1 … +1 -1 … -1], the "ideal" affinity matrix A would be block diagonal, with A_ij = 1 when i and j share a label and 0 otherwise; in other words, spectral clustering and eigendecomposition of this A are perfect. The "ideal" affinity is a B-Matching with b = N/2. So, stabilize the affinity by finding the closest B-Matching to it; then spectral cluster the B-Matching graph itself, or use it to prune A. Also, instead of B-Matching, we can use kNN (a lazy strawman).
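
A small sketch illustrating the "ideal affinity" claim (the labeling and matrix below are constructed for illustration):

```python
import numpy as np

N = 10
y = np.array([1] * (N // 2) + [-1] * (N // 2))   # two equal-size clusters
A_ideal = (np.outer(y, y) > 0).astype(float)     # ones within a cluster, zeros across: a b-matching with b = N/2

# The two leading eigenvectors of the ideal affinity are piecewise constant on
# the clusters, so the spectral embedding separates the clusters perfectly.
vals, vecs = np.linalg.eigh(A_ideal)
U = vecs[:, -2:]                                  # leading 2 eigenvectors (eigenvalue N/2)
distinct_rows = {tuple(np.round(row, 6)) for row in U}
print(len(distinct_rows))                         # 2: one embedding point per cluster
```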

B-Matched Spectral Clustering. Synthetic experiment: two S-shaped clusters, exploring different spreads. Affinity A_ij = exp(-||X_i - X_j||^2 / sigma^2). Do spectral clustering and evaluate the cluster labeling accuracy.

B-Matched Spectral Clustering. Clustering images from real video containing 2 scenes. Accuracy is how well we classify both scenes (10-fold); also evaluated with kNN. Only adjacent frames have high affinity; B-Matching does best since it boosts connections to far-away frames.

B-Matched Spectral Clustering. Clustering images from the same video but with 2 other scenes.

B-Matched Spectral Clustering. Unlabeled classification via clustering of the UCI Optdigits data.

B-Matched Spectral Clustering. Unlabeled classification via clustering of the UCI Vote dataset.

B-Matched Spectral Clustering. Classification accuracy via clustering of the UCI Pendigits data. Here, using the B-Matching just to prune A is better; kNN always seems a little worse…

B-Matched Spectral Clustering. Applied the method to the KDD 2005 Challenge: predict authorship in the anonymized BioBase publications database. The challenge gives many splits of N~100 documents. For each split, find an NxN matrix saying whether documents i and j were authored by the same person (1) or different people (0). Documents have 8 fields. Compute an affinity A for each field via a text-frequency kernel, find the B-Matching P, get the spectral clustering labels y, and compute ½(yy' + 1).
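
A one-step sketch of the final expression: with cluster labels y in {-1, +1}, ½(yy' + 1) is exactly the same-author indicator matrix (a tiny numeric illustration, not the challenge pipeline):

```python
import numpy as np

y = np.array([1, 1, -1, 1, -1])        # ±1 cluster labels for 5 documents
S = 0.5 * (np.outer(y, y) + 1)         # S[i, j] = 1 iff documents i and j share a cluster
print(S.astype(int))
```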

B-Matched Spectral Clustering. Accuracy is evaluated using the labeled true same-author and not-same-author matrices. We explored many processings of the A matrices; the best accuracy came from spectrally clustering the P matrix found via B-Matching.

B-Matched Spectral Clustering. Merge all the 3x8 matrices into a single hypothesis using an SVM with a quadratic kernel. The SVM is trained on labeled data (same-author and not-same-author matrices). For each split, we get a single same-author/not-same-author matrix, which was uploaded to the KDD Challenge anonymously.
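
A hedged sketch of such a merge step; the feature construction here is hypothetical (each document pair becomes a vector of the 3x8 = 24 per-field, per-processing scores) and the data are random stand-ins:

```python
import numpy as np
from sklearn.svm import SVC

# X: one row per document pair, 24 columns = the 3 x 8 per-field scores (hypothetical layout).
# y: 1 if the pair is same-author, 0 otherwise, from the labeled training splits.
rng = np.random.default_rng(0)
X = rng.random((200, 24))                         # stand-in features
y = rng.integers(0, 2, size=200)                  # stand-in labels

svm = SVC(kernel="poly", degree=2)                # quadratic kernel, as on the slide
svm.fit(X, y)
same_author_scores = svm.decision_function(X)     # merged single hypothesis per pair
```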

B-Matched Spectral Clustering. A total of 7 funded teams attempted this KDD Challenge task and were evaluated by a rank error, with double-blind submission and evaluation. Our method had the lowest average error.

B-Matched Embedding. Replace k-nearest-neighbor graphs in machine learning algorithms. Example: Semidefinite Embedding (Weinberger & Saul). 1) Get a matrix A by computing affinities between all pairs of points. 2) Find the k-nearest-neighbors graph. 3) Use an SDP to find a positive semidefinite matrix K which preserves distances on the graph yet is as stretched as possible. Eigendecomposition of K then finds an embedding of the points in low dimension that preserves the distance structure of the high-dimensional data. Maximize Tr(K) subject to K ⪰ 0, Σ_ij K_ij = 0, and for all i, j such that h_ij = 1 or [h'h]_ij = 1: K_ii + K_jj - K_ij - K_ji = A_ii + A_jj - A_ij - A_ji.
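
A minimal cvxpy sketch of this SDP (assuming A is the input Gram/affinity matrix and h the 0/1 neighbor-graph adjacency; practical only for small n, and it needs an SDP-capable solver such as SCS installed):

```python
import numpy as np
import cvxpy as cp

def sde_embedding(A, h, dim=2):
    """Semidefinite Embedding sketch: maximize Tr(K) subject to centering,
    positive semidefiniteness, and distance preservation on the neighbor graph.
    """
    n = A.shape[0]
    connected = (h + h.T + h.T @ h) > 0          # neighbor pairs or common-neighbor pairs
    K = cp.Variable((n, n), PSD=True)
    constraints = [cp.sum(K) == 0]               # center the embedding
    for i in range(n):
        for j in range(i + 1, n):
            if connected[i, j]:
                constraints.append(K[i, i] + K[j, j] - 2 * K[i, j]
                                   == A[i, i] + A[j, j] - 2 * A[i, j])
    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()
    # Eigendecomposition of K gives the low-dimensional embedding coordinates.
    vals, vecs = np.linalg.eigh(K.value)
    return vecs[:, -dim:] * np.sqrt(np.maximum(vals[-dim:], 0))
```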

B-Matched Embedding. Visualization example: images of a rotating teapot. Get affinity A_ij = exp(-||X_i - X_j||^2) between pairs of images. We should get a ring, but noisy images confuse kNN, which greedily connects nearby images without balancing in-degree and out-degree: we get a folded-over ring even for various values of k. B-Matching gives a clean ring for many values of b.

B-Matched Embedding. AOL data with B-Matching: growing the initial connectivity using the b-matching algorithm instead of k-nearest neighbor. (Figure: embeddings built from the k-nearest-neighbors graph vs. the b-matching graph.)