Multi-view Clustering via Canonical Correlation Analysis
Kamalika Chaudhuri et al., ICML 2009
Presented by Wanchen Lu, 2/25/2013

Introduction

Assumption in Multi-View problems
The input variable (a real vector) can be partitioned into two different views, where it is assumed that either view of the input is sufficient to make accurate predictions; this is essentially the co-training assumption. Examples:
- Identity recognition with one view being a video stream and the other an audio stream;
- Web page classification where one view is the text and the other the hyperlink structure;
- Object recognition with pictures from different camera angles;
- A bilingual parallel corpus, with each view presented in one language.

Intuition in Multi-View problems
Many multi-view learning algorithms force agreement between the predictors based on either view (usually by requiring the predictor based on view 1 to equal the predictor based on view 2). The complexity of the learning problem is reduced by eliminating hypotheses from each view that do not agree with each other.

Background

Canonical correlation analysis
CCA is a way of measuring the linear relationship between two multidimensional variables: find two basis vectors, one for x and one for y, such that the correlation between the projections of the variables onto these basis vectors is maximized.

x = \mathbf{x}^T \hat{\mathbf{w}}_x, \qquad y = \mathbf{y}^T \hat{\mathbf{w}}_y

\rho = \frac{E[xy]}{\sqrt{E[x^2]\,E[y^2]}}
    = \frac{E[\hat{\mathbf{w}}_x^T \mathbf{x}\mathbf{y}^T \hat{\mathbf{w}}_y]}{\sqrt{E[\hat{\mathbf{w}}_x^T \mathbf{x}\mathbf{x}^T \hat{\mathbf{w}}_x]\,E[\hat{\mathbf{w}}_y^T \mathbf{y}\mathbf{y}^T \hat{\mathbf{w}}_y]}}
    = \frac{\hat{\mathbf{w}}_x^T \mathbf{C}_{xy} \hat{\mathbf{w}}_y}{\sqrt{\hat{\mathbf{w}}_x^T \mathbf{C}_{xx} \hat{\mathbf{w}}_x \;\hat{\mathbf{w}}_y^T \mathbf{C}_{yy} \hat{\mathbf{w}}_y}}

Calculating canonical correlations
Consider the total covariance matrix of the zero-mean random variables x and y:

\mathbf{C} = \begin{pmatrix} \mathbf{C}_{xx} & \mathbf{C}_{xy} \\ \mathbf{C}_{yx} & \mathbf{C}_{yy} \end{pmatrix}
    = E\left[ \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix}^T \right]

The canonical correlations between x and y can be found by solving the eigenvalue equations

\mathbf{C}_{xx}^{-1} \mathbf{C}_{xy} \mathbf{C}_{yy}^{-1} \mathbf{C}_{yx} \, \hat{\mathbf{w}}_x = \rho^2 \, \hat{\mathbf{w}}_x, \qquad
\mathbf{C}_{yy}^{-1} \mathbf{C}_{yx} \mathbf{C}_{xx}^{-1} \mathbf{C}_{xy} \, \hat{\mathbf{w}}_y = \rho^2 \, \hat{\mathbf{w}}_y,

where the eigenvalues \rho^2 are the squared canonical correlations and the eigenvectors \hat{\mathbf{w}}_x, \hat{\mathbf{w}}_y are the canonical correlation basis vectors.
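As a concrete illustration (not from the original slides), here is a minimal NumPy sketch that computes canonical correlations by whitening each view and taking an SVD, which is equivalent to the eigenvalue formulation above; the ridge term reg is our own addition for numerical stability.

import numpy as np

def cca(X, Y, reg=1e-6):
    """Canonical correlations of two views X (n x p) and Y (n x q)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = len(Xc)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    # Whiten each view with its Cholesky factor, then take the SVD of the
    # whitened cross-covariance: the singular values are the canonical
    # correlations rho, and the singular vectors map back to the directions w.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, np.linalg.solve(Ly, Cxy.T).T)
    U, rho, Vt = np.linalg.svd(M, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U)      # canonical directions for x
    Wy = np.linalg.solve(Ly.T, Vt.T)   # canonical directions for y
    return rho, Wx, Wy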

Relation to other linear subspace methods
Like CCA, these methods can each be formulated as a single eigenvalue equation.

Principal component analysis
The principal components are the eigenvectors of the covariance matrix. The projection of the data onto the principal components is an orthogonal transformation that diagonalizes the covariance matrix.
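A minimal NumPy sketch of this view of PCA, added for illustration (function and variable names are ours):

import numpy as np

def pca(X, d):
    Xc = X - X.mean(axis=0)            # center the data
    C = Xc.T @ Xc / len(Xc)            # sample covariance matrix
    evals, evecs = np.linalg.eigh(C)   # eigh since C is symmetric
    order = np.argsort(evals)[::-1]    # sort by decreasing variance
    W = evecs[:, order[:d]]            # top-d principal components
    return Xc @ W, W                   # projected data and the components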

Partial least squares
PLS is essentially the singular value decomposition (SVD) of a between-sets covariance matrix. In PLS regression, the singular vectors corresponding to the largest singular values are used as a basis; a regression of y onto x is then performed in this basis.
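A minimal sketch of the "SVD of the between-sets covariance" step, added for illustration; it returns the paired directions only and omits the regression step:

import numpy as np

def pls_directions(X, Y, d):
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxy = Xc.T @ Yc / len(Xc)          # between-sets covariance
    U, s, Vt = np.linalg.svd(Cxy, full_matrices=False)
    # Columns of U (resp. rows of Vt) with the largest singular values give
    # the paired directions used as the basis for the subsequent regression.
    return U[:, :d], Vt[:d].T, s[:d]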

ALGORITHM

The basic idea
Use CCA to project the data down to the subspace spanned by the cluster means to get an easier clustering problem, then apply standard clustering algorithms in this space. When the data in at least one of the views is well separated, this algorithm clusters correctly with high probability.

Algorithm
Input: a set of samples S, the number of clusters k.
1. Randomly partition S into two subsets A and B of equal size.
2. Let C_12(A) be the covariance matrix between views 1 and 2, computed from the set A.
3. Compute the top k-1 left singular vectors of C_12(A), and project the samples in B onto the subspace spanned by these vectors.
4. Apply a clustering algorithm (single-linkage clustering, k-means) to the projected examples in view 1.
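A minimal sketch of these steps, assuming NumPy arrays X1 and X2 holding the view-1 and view-2 features with one sample per row (an illustration under those assumptions, not the authors' implementation; k-means is used for the clustering step):

import numpy as np
from sklearn.cluster import KMeans

def multiview_cluster(X1, X2, k, seed=0):
    n = X1.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    A, B = idx[: n // 2], idx[n // 2:]

    # Cross-covariance between the two views, estimated on split A only.
    X1A = X1[A] - X1[A].mean(axis=0)
    X2A = X2[A] - X2[A].mean(axis=0)
    C12 = X1A.T @ X2A / len(A)

    # Top k-1 left singular vectors of C_12(A) span the projection subspace.
    U, _, _ = np.linalg.svd(C12, full_matrices=False)
    P = U[:, : k - 1]

    # Project the held-out split B (view 1) and cluster the projections.
    Z = (X1[B] - X1[A].mean(axis=0)) @ P
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)
    return B, labels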

Experiments

Speaker identification
Dataset: 41 speakers, speaking 10 sentences each.
Audio features: 1584 dimensions. Video features: 2394 dimensions.
Method 1: project into 40 dimensions using PCA.
Method 2: use CCA (after PCA to 100 dimensions for images and 1000 dimensions for audio).
Cluster into 82 clusters (2 per speaker) using k-means.

Speaker identification: evaluation
Conditional perplexity: the mean number of speakers corresponding to each cluster (lower is better).
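For reference, one standard way to compute conditional perplexity is 2 raised to the conditional entropy H(speaker | cluster); this sketch is our reading of the metric, not code from the paper:

import numpy as np
from collections import Counter

def conditional_perplexity(speakers, clusters):
    """Effective number of speakers per cluster, 2 ** H(speaker | cluster)."""
    total = len(speakers)
    H_cond = 0.0
    for c in set(clusters):
        members = [s for s, cl in zip(speakers, clusters) if cl == c]
        p = np.array(list(Counter(members).values()), dtype=float)
        p /= p.sum()
        # Weight each cluster's speaker entropy by the cluster's size.
        H_cond += (len(members) / total) * -(p * np.log2(p)).sum()
    return 2.0 ** H_cond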

Clustering Wikipedia articles
Dataset: 128K Wikipedia articles, evaluated on the 73K articles that belong to the 500 most frequent categories.
Link-structure feature: L is a concatenation of "to" and "from" vectors, where L(i) is the number of times the current article links to/from article i.
Text feature: a bag-of-words vector.
Methods: compared PCA and CCA, using a hierarchical clustering procedure: iteratively pick the largest cluster, reduce its dimensionality using PCA or CCA, and use k-means to break the cluster into smaller ones, until reaching the total desired number of clusters (sketched below).
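A rough sketch of this hierarchical procedure as we read it from the slide (our reconstruction; reduce_dim is a stand-in for the PCA or CCA projection step, and the two-way split is an assumption):

import numpy as np
from sklearn.cluster import KMeans

def hierarchical_split(X, reduce_dim, n_clusters, split_k=2, seed=0):
    clusters = [np.arange(len(X))]           # start with one big cluster
    while len(clusters) < n_clusters:
        clusters.sort(key=len)
        big = clusters.pop()                 # pick the largest cluster
        Z = reduce_dim(X[big])               # PCA- or CCA-reduced features
        labels = KMeans(n_clusters=split_k, n_init=10,
                        random_state=seed).fit_predict(Z)
        clusters += [big[labels == j] for j in range(split_k)]
    return clusters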

Results

Thank you

APPENDIX: A note on correlation
The correlation between x_i and x_j is the covariance normalized by the geometric mean of the variances of x_i and x_j.
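Written out, with \mathbf{C} denoting the covariance matrix (a standard identity, added here for completeness):

\rho_{ij} = \frac{\operatorname{cov}(x_i, x_j)}{\sqrt{\operatorname{var}(x_i)\,\operatorname{var}(x_j)}} = \frac{\mathbf{C}_{ij}}{\sqrt{\mathbf{C}_{ii}\,\mathbf{C}_{jj}}}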

Affine transformations
An affine transformation is a map F: \mathbb{R}^n \rightarrow \mathbb{R}^n of the form

F(\mathbf{p}) = \mathbf{A}\mathbf{p} + \mathbf{q}, \qquad \forall \mathbf{p} \in \mathbb{R}^n,

where \mathbf{A} is a linear transformation of \mathbb{R}^n and \mathbf{q} is a translation vector in \mathbb{R}^n.
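For concreteness, a simple 2D instance (our own illustrative example): a rotation by an angle \theta followed by a translation by \mathbf{q} = (q_1, q_2)^T:

F(\mathbf{p}) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \mathbf{p} + \begin{pmatrix} q_1 \\ q_2 \end{pmatrix}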