Extending metric multidimensional scaling with Bregman divergences. Jigang Sun; supervisor: Prof. Colin Fyfe. Nov 2009.

Multidimensional Scaling (MDS): a family of information visualisation methods that project data from a high dimensional space to a low dimensional space, often two or three dimensions, keeping inter-point dissimilarities (e.g. distances) in the low dimensional space as close as possible to the original dissimilarities in the high dimensional space. When Euclidean distances are used, this is metric MDS.

Basic MDS: an example. [Figure: a mapping from the high dimensional space (data space / input space) to the low dimensional space (latent space / output space).]

Basic MDS: we minimise the stress function

$$ E_{\text{MDS}} = \sum_{i<j} \left( D_{ij} - L_{ij} \right)^2, $$

where $D_{ij}$ is the distance between points $i$ and $j$ in data space and $L_{ij}$ is the corresponding distance in latent space.
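
As a concrete illustration, here is a minimal sketch of basic metric MDS by gradient descent on this stress function. It is not the presentation's implementation; the function names, learning rate, and iteration count are assumptions for illustration.

```python
import numpy as np

def pairwise_dists(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def basic_mds(X, dim=2, lr=0.01, iters=2000, seed=0):
    """Minimise E = sum_{i<j} (D_ij - L_ij)^2 by gradient descent."""
    rng = np.random.default_rng(seed)
    D = pairwise_dists(X)                    # distances in data space
    Y = rng.normal(size=(X.shape[0], dim))   # random initial latent points
    for _ in range(iters):
        L = pairwise_dists(Y) + 1e-12        # guard against division by zero
        # dE/dY_i = sum_j 2 (L_ij - D_ij) (Y_i - Y_j) / L_ij
        W = (L - D) / L
        Y -= lr * 2 * (W[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
    return Y
```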

Sammon Mapping (1969): focuses on small distances. For the same absolute error, a smaller distance is given a bigger stress, so on average small distances are mapped more accurately than long distances; small neighbourhoods are well preserved.
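
For reference, Sammon's stress in its standard form, where each pair is weighted by the inverse of its data-space distance:

$$ E_{\text{Sammon}} = \frac{1}{\sum_{i<j} D_{ij}} \sum_{i<j} \frac{\left( D_{ij} - L_{ij} \right)^{2}}{D_{ij}}. $$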

Bregman divergence: $d_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle$ is the Bregman divergence between $p$ and $q$ based on a strictly convex function $F$. Intuitively, it is the difference between the value of $F$ at point $p$ and the value of the first-order Taylor expansion of $F$ around point $q$, evaluated at point $p$.

When $F$ is a function of one variable, the Bregman divergence is the remainder of a truncated Taylor series: $d_F(p, q) = F(p) - F(q) - F'(q)(p - q)$. A useful property for MDS is non-negativity: $d_F(p, q) \ge 0$, with equality exactly when $p = q$; so viewed as a function of $p$, the divergence drives $p$ towards $q$ when it is minimised.
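
A small runnable sketch of the one-variable divergence (the helper names and test values are mine, chosen for illustration):

```python
import numpy as np

def bregman(p, q, F, dF):
    """One-variable Bregman divergence d_F(p, q) = F(p) - F(q) - F'(q)(p - q)."""
    return F(p) - F(q) - dF(q) * (p - q)

sq = lambda x: x ** 2            # F(x) = x^2
xlogx = lambda x: x * np.log(x)  # F(x) = x log x

print(bregman(5.0, 3.0, sq, lambda x: 2 * x))             # (5 - 3)^2 = 4.0
print(bregman(5.0, 3.0, xlogx, lambda x: np.log(x) + 1))  # 5 ln(5/3) - 2 ~ 0.554
print(bregman(3.0, 3.0, xlogx, lambda x: np.log(x) + 1))  # 0.0: minimised at p == q
```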

MDS using Bregman divergences (Bregmanised MDS, BMMDS): we minimise

$$ E_{\text{BMMDS}} = \sum_{i<j} d_F\!\left( L_{ij}, D_{ij} \right). $$

An equivalent expression is the residual of the Taylor series of $F$ around $D_{ij}$:

$$ d_F\!\left( L_{ij}, D_{ij} \right) = \sum_{k \ge 2} \frac{F^{(k)}(D_{ij})}{k!} \left( L_{ij} - D_{ij} \right)^{k}. $$
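
Using the `bregman` helper from the previous sketch, the BMMDS objective can be written as follows (again a hedged sketch, not the authors' code):

```python
def bmmds_stress(D, L, F, dF):
    """E = sum_{i<j} d_F(L_ij, D_ij), with latent distances as first argument."""
    i, j = np.triu_indices_from(D, k=1)   # each pair counted once
    return bregman(L[i, j], D[i, j], F, dF).sum()
```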

Basic MDS is a special case of BMMDS. The base convex function is chosen as $F(x) = x^2$, and the higher order derivatives are $F'(x) = 2x$, $F''(x) = 2$, and $F^{(k)}(x) = 0$ for $k \ge 3$. So the divergence is derived as $d_F(L_{ij}, D_{ij}) = (L_{ij} - D_{ij})^2$, recovering the basic MDS stress.
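
Expanding the definition confirms the reduction directly:

$$ d_F\!\left( L_{ij}, D_{ij} \right) = L_{ij}^{2} - D_{ij}^{2} - 2D_{ij}\left( L_{ij} - D_{ij} \right) = \left( L_{ij} - D_{ij} \right)^{2}. $$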

Example 2: Extended Sammon. The base convex function is $F(x) = x \ln x$. This is equivalent to the generalised Kullback-Leibler divergence

$$ d_F\!\left( L_{ij}, D_{ij} \right) = L_{ij} \ln \frac{L_{ij}}{D_{ij}} - L_{ij} + D_{ij}. $$

The Sammon mapping can be rewritten, up to its normalising constant, as $\sum_{i<j} (D_{ij} - L_{ij})^{2} / (2 D_{ij})$, which is exactly the second-order term of this divergence.
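
In terms of the earlier sketches, the Extended Sammon stress is then simply (illustrative, not the authors' code):

```python
# Extended Sammon as a BMMDS instance with F(x) = x log x.
ext_sammon = lambda D, L: bmmds_stress(D, L, xlogx, lambda x: np.log(x) + 1)
```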

Sammon and Extended Sammon share the common term $(D_{ij} - L_{ij})^{2} / (2 D_{ij})$. The Sammon mapping can therefore be considered an approximation to the Extended Sammon mapping that keeps only this common term; the Extended Sammon mapping makes further adjustments on the basis of the higher order terms.
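
Concretely, expanding the Extended Sammon divergence around $D_{ij}$ (writing $D$ for $D_{ij}$ and $L$ for $L_{ij}$):

$$ d_F(L, D) = \frac{(L - D)^{2}}{2D} - \frac{(L - D)^{3}}{6D^{2}} + \frac{(L - D)^{4}}{12D^{3}} - \cdots $$

The first term is the common (Sammon) term; the rest are the higher-order adjustments.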

An experiment on the Swiss roll data set

At a glance: basic MDS captures the global curve but poorly differentiates local points that share the same X and Y coordinates but differ in their Z coordinate. The Sammon mapping does better than basic MDS, and the Extended Sammon mapping is the best.

Distance preservation

Horizontal axis: mean distances in data space, grouped into 40 sets. Vertical axis: relative mean distances in latent space. Sammon is better than basic MDS, and Extended Sammon is better than Sammon: small distances are mapped closer to their original values in data space, while long distances are mapped longer.

Relative standard deviation

On short distances, Sammon has smaller variance than basic MDS, and Extended Sammon has smaller variance than Sammon; that is, control of small distances is progressively enhanced. Large distances are given correspondingly more freedom, in the same order.

LCMC: local continuity meta-criterion (L. Chen, 2006). A common measure for assessing the projection quality of different MDS methods in terms of neighbourhood preservation. Values lie between 0 and 1; the higher, the better.
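
A hedged sketch of LCMC for a neighbourhood size K, following the usual overlap-minus-baseline form (function and variable names are mine, for illustration):

```python
import numpy as np

def knn_sets(dist, K):
    """Indices of the K nearest neighbours of each point (self excluded)."""
    order = np.argsort(dist, axis=1)
    return [set(row[1:K + 1]) for row in order]

def lcmc(D, L, K):
    """Mean K-NN overlap between data and latent space, baseline-adjusted."""
    n = D.shape[0]
    overlap = sum(len(a & b)
                  for a, b in zip(knn_sets(D, K), knn_sets(L, K)))
    return overlap / (n * K) - K / (n - 1)
```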

Quality assessed by LCMC

Stress comparison between Sammon and Extended Sammon

For the Extended Sammon mapping, an under-estimated distance in latent space (e.g. $D_{ij} - L_{ij} = 2$) is penalised more heavily than an over-estimated distance of the same magnitude (e.g. $D_{ij} - L_{ij} = -2$).
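
A quick numeric illustration with the generalised KL divergence, taking $D_{ij} = 3$ (values chosen for illustration):

$$ d_F(1, 3) = 1 \cdot \ln\tfrac{1}{3} - 1 + 3 \approx 0.90, \qquad d_F(5, 3) = 5 \ln\tfrac{5}{3} - 5 + 3 \approx 0.55, $$

so the under-estimated latent distance incurs the larger penalty.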

Stress formation by items

Stress formation by terms: the stress coming from the Sammon (second-order) term is the largest; it forms the main part of the stress. However, for small distances, the contribution from the other terms is not negligible.

The Open Box data set: Sammon and the first group of divergences

The second group of divergences on the Open Box data set

Future work: combining the two opposite strategies for choosing base convex functions. MDS with right Bregman divergences gives one kind of Curvilinear Component Analysis (CCA).

Conclusion: we applied Bregman divergences to multidimensional scaling; showed that basic metric MDS is a special case and that the Sammon mapping approximates a BMMDS; improved upon both with two families of divergences; and showed results on two artificial data sets.