Ranking Projection
Zhi-Sheng Chen
2010/02/03
Multi-Media Information Lab, NTHU

Introduction
- Ranking is everywhere: retrieval for music, images, video, sound, etc.; scoring for speech, multimedia, etc.
- Goal: find a projection that preserves the given order of the data and reduces the dimensionality of the data.

The Basic Criteria of Linear Ranking Projection
- Given the ranking order (c1, c2, c3, c4), we impose criteria on the projection space so that this order is preserved (a sketch follows below).
- Here d(., .) is a distance measure between two classes; in our case we use the difference of the class means.
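The criteria themselves appeared as an equation image on the original slide and are lost in this transcript. A plausible reconstruction, assuming the order is enforced through pairwise distances of the projected class means (the symbols m_i are my own notation), is

    d(c_i, c_j) < d(c_i, c_k) \quad \text{whenever } |i - j| < |i - k|, \qquad d(c_i, c_j) = \lVert m_i - m_j \rVert,

where m_i is the mean of class c_i in the projection space; classes that are adjacent in the ranking must stay closer together than classes that are far apart in the ranking.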

The Basic Criteria of Linear Ranking Projection
- Let w be the projection vector; the previous criteria can then be rewritten in terms of the projected class means (a sketch follows below).
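The rewritten criteria are also missing from the transcript. Assuming w is the projection vector and \mu_i denotes the mean of class c_i in the original space (my notation), the distance between projected class means becomes

    d(c_i, c_j) = \left| w^\top (\mu_i - \mu_j) \right|,

and the ordering constraints of the previous slide are then imposed on these scalar quantities.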

The Ordinal Weights
- Roughly speaking, these distance measures have different importance according to their position in the ranking order; for some pairs the relative importance is obvious, for others it is not.
- Instead of finding precise rules for the ordinal weights, we use a rough ordinal weighting rule.

The Ordinal Weights
- Given a ranking order, we assign a score to each term; the largest and smallest scores indicate the top and bottom terms of the order.
- We simply define an ordinal weight function from these scores, and the weighted criteria follow (a sketch is given below).
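Neither the score assignment nor the weight function survives in the transcript. The following is only one plausible instantiation, assuming the score of a class is its reversed rank index and a class pair is weighted by a simple combination of its scores (all symbols are my notation):

    s_i = K - i + 1 \; \text{for class } c_i \text{ at rank } i \text{ of } K, \qquad \omega_{ij} = f(s_i, s_j),

with f a simple monotone combination such as s_i + s_j, so pairs involving top-ranked classes contribute more; the weighted criteria multiply each pairwise distance term d(c_i, c_j) by \omega_{ij}.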

Some Results for Weighted Criteria
- Order: (c1, c2, c3, c4) (result figure).

Some Results for Weighted Criteria
- Order: (c3, c1, c4, c2) (result figure).
- For a projection onto more than one dimension, the solution becomes selecting the k eigenvectors associated with the k smallest eigenvalues (a sketch of this step follows below).
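The slides state only that the multi-dimensional solution consists of the eigenvectors with the smallest eigenvalues. Below is a minimal sketch of that step, assuming the weighted criteria have already been reduced to a symmetric matrix S (a quadratic form in the projection direction built from weighted differences of class means; S is my placeholder, not a quantity defined on the slides):

    import numpy as np

    def ranking_directions(S, k):
        # Eigen-decompose the symmetric criterion matrix; np.linalg.eigh
        # returns eigenvalues in ascending order, so the first k columns
        # are the eigenvectors with the k smallest eigenvalues.
        eigvals, eigvecs = np.linalg.eigh(S)
        return eigvecs[:, :k]

    # Hypothetical usage: project data X (n_samples x d) onto 2 dimensions.
    # W = ranking_directions(S, k=2)
    # Y = X @ W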

Classes with Several Groups
- We may not care about the order of some groups of data points within a class.

Grouped Classes
- For the above case, let the order be (c1, c2, c3); the criteria then change to account for the groups (a hedged sketch follows below).
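The grouped-class criteria are again missing from the transcript. One plausible reading, given that the order among groups within a class is ignored, is to apply the ordering constraints to every pair of group means across different classes while leaving groups of the same class unconstrained (my notation):

    d\big(m_i^{(g)}, m_j^{(h)}\big) < d\big(m_i^{(g)}, m_k^{(l)}\big) \quad \text{whenever } |i - j| < |i - k|,

where m_i^{(g)} is the projected mean of group g of class c_i.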

Grouped Classes
- Result (figure).

Reweighting function
- Consider the case shown on the slide: the criteria yield one projection, but the proper projection is a different one. We have a problem here.

Reweighting function
- Solved by reweighting: every group in the same class is weighted by its distance from the class mean, so farther groups get larger weights.
- The modified criteria follow (a hedged sketch is given below).
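The exact re-weighting function is not preserved in the transcript. A minimal sketch consistent with the slide's description (farther groups get larger weights), using my own notation, is

    \rho_i^{(g)} = \big\lVert \mu_i^{(g)} - \mu_i \big\rVert,

where \mu_i^{(g)} is the mean of group g of class c_i and \mu_i is the overall mean of class c_i; each group's terms in the criteria are then multiplied by \rho_i^{(g)} (or a monotone function of it).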

Reweighting function (illustration slide).

Non-linear Ranking Projection
- It is impossible to find a linear projection that yields the order (c3, c2, c1, c4).

General Idea of Kernels
- Transform the data into a high-dimensional feature space through a mapping and do the ranking projection in that space.
- The projection algorithm can be carried out using only dot products, so we can define the kernel as the dot product of mapped points; the matrix of pairwise kernel values is called the Gram matrix (the discussion of kernel validity is skipped here).
- Several kernels: polynomial kernel, Gaussian kernel, radial basis kernel, etc.
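As a concrete illustration of the Gram matrix for two of the kernels listed above, here is a small sketch (the function names and default parameters are my own, not from the slides):

    import numpy as np

    def polynomial_kernel(X, Y, degree=2, c=1.0):
        # k(x, y) = (x . y + c)^degree
        return (X @ Y.T + c) ** degree

    def gaussian_kernel(X, Y, sigma=1.0):
        # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
        sq_dists = (
            np.sum(X**2, axis=1)[:, None]
            + np.sum(Y**2, axis=1)[None, :]
            - 2.0 * X @ Y.T
        )
        return np.exp(-sq_dists / (2.0 * sigma**2))

    # Gram matrix of the training data: K[i, j] = k(x_i, x_j)
    # K = gaussian_kernel(X, X, sigma=1.0)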

Non-linear Ranking Projection
- Use a "kernelized" approach to find a non-linear projection.
- Consider the criteria of the basic linear case; similar to kernelized LDA (KDA), we can expand the projection vector in terms of the mapped training points (a sketch follows below).
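The expansion itself is missing from the transcript; by analogy with kernelized LDA, as the slide suggests, the projection vector in feature space is assumed to lie in the span of the mapped training points:

    w = \sum_{n=1}^{N} \alpha_n \, \phi(x_n),

so the projected value of any point x is w^\top \phi(x) = \sum_n \alpha_n \, k(x_n, x), which depends on the data only through kernel evaluations.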

Non-linear Ranking Projection
- Substituting this expansion, the projected value of a point and the projected class means can be written entirely in terms of the Gram matrix (a hedged sketch follows below).
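The "Then ... Thus" derivation is lost in the transcript; a hedged sketch of the standard consequence of the expansion, in my notation, is

    w^\top m_i^\phi = \frac{1}{N_i} \sum_{x \in c_i} \sum_{n=1}^{N} \alpha_n k(x_n, x) = \alpha^\top M_i, \qquad (M_i)_n = \frac{1}{N_i} \sum_{x \in c_i} k(x_n, x),

so the class distance becomes d(c_i, c_j) = |\alpha^\top (M_i - M_j)|: the linear criteria with \alpha in place of w and M_i in place of the class means.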

Non-linear Ranking Projection
- The kernelized criteria then take the same form as the linear ones, with the coefficient vector in place of the projection vector.
- Extending them to ordinal weighting and grouped classes is straightforward.
- Extending them to re-weighting is more delicate.

Results
- Experiment 1, order (c3, c1, c4, c2): polynomial kernel, degree 2; polynomial kernel, degree 3 (result figures).

Results
- Order (c3, c1, c4, c2): Gaussian kernel (result figure).

Results
- Experiment 2, order (c3, c2, c1): Gaussian kernel; polynomial kernel, degree 2 (result figures).

Results
- Experiment 3, order (c3, c2, c1): polynomial kernel, degree 2; Gaussian kernel (result figures).

Results
- Experiment 4, order (c3, c2, c1): polynomial kernel, degree 2; Gaussian kernel (result figures).

Results
- Airplane dataset: 214 data points, feature dimension 13, scores from 1 to 7.

Results
- Linear ranking projection (result figure).

Results
- Polynomial kernel, degree 2; degree 5; degree 10 (result figures).

Results
- Gaussian kernel: the data points are all projected onto nearly the same value because of limited computer precision, although the order is preserved well (result figure).

Future Work
- Work still to be done:
  - Grouped classes are time-consuming; we can use "kernelized" k-means clustering to reduce the number of data points.
  - The re-weighting function in the high-dimensional space (kernel approach) has not been worked out yet.
  - The precision problem in the kernelized approach.
- Potential work:
  - Derive a probabilistic model?
  - How to cope with "missing" data (i.e., some feature dimensions are missing)?
  - Which kernel is appropriate?

Questions?