Multidimensional Scaling By Marc Sobel

The Goal  We observe (possibly non-euclidean) proximity data. For each pair of objects number ‘i’ and ‘j’ we observe their ‘proximity’ δ[i,j]. We would like to ‘project’ the data into a low-dimensional space. As an example, suppose we observe the correlations between the times at which different people go to work. We would like to create a map of the people in e.g., 2 dimensions which correctly position people relative to one another.

Methodology  To achieve this map, we assign unknown parametric points to i=2,…,n to each data point (except for the first). We regress the proximities on the parametric points via,

Simplest Possible Framework: The simplest possible framework treats the proximities as noisy 'copies' of the distances themselves and measures the fit by a least-squares 'stress'. Minimizing this stress leads to a Newton-Raphson update.
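Under the assumption that the lost slide equations take the standard least-squares form, the raw stress, its gradient, and the resulting Newton-Raphson step are:

```latex
\mathrm{Stress}(X) = \sum_{i<j} \big(\delta_{ij} - d_{ij}(X)\big)^{2},
\qquad
\nabla_{x_i}\,\mathrm{Stress}(X)
= -2 \sum_{j \ne i} \big(\delta_{ij} - d_{ij}(X)\big)\,\frac{x_i - x_j}{d_{ij}(X)},
\qquad
x_i^{(t+1)} = x_i^{(t)}
- \big[\nabla^{2}_{x_i}\,\mathrm{Stress}(X^{(t)})\big]^{-1}
\nabla_{x_i}\,\mathrm{Stress}(X^{(t)}).
```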

A Newton-Raphson based update

Controlled Newton-Raphson: Suppose we update with Newton-Raphson but accept a move only if it makes the 'stress' smaller. That is, after each proposed update we move only if the new stress is lower than the current one.
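A minimal numpy sketch of this control step; the proposal below uses a gradient step standing in for the full Newton-Raphson step, and the function names, step size, and halving schedule are illustrative assumptions:

```python
import numpy as np

def stress(X, D):
    """Raw least-squares stress between observed proximities D (n x n,
    symmetric) and the Euclidean distances of the configuration X (n x p)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    iu = np.triu_indices(len(X), k=1)
    return ((D[iu] - d[iu]) ** 2).sum()

def stress_grad(X, D, eps=1e-12):
    """Analytic gradient of the raw stress with respect to every point."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1)) + eps
    coef = -2.0 * (D - d) / d                       # pairwise weights
    np.fill_diagonal(coef, 0.0)
    return (coef[:, :, None] * diff).sum(axis=1)    # n x p gradient

def controlled_update(X, D, step=0.05, max_halvings=10):
    """One 'controlled' iteration: propose a step and accept it only if the
    stress decreases, otherwise halve the step and try again."""
    s0 = stress(X, D)
    g = stress_grad(X, D)
    for _ in range(max_halvings):
        X_new = X - step * g
        if stress(X_new, D) < s0:    # the control: only move downhill
            return X_new
        step *= 0.5
    return X                         # no improving move found
```

Halving the step on rejection is one simple way to guarantee that every accepted move lowers the stress.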

Controlled MDS (better by a factor of 10)

Reducing the dimensionality of manifolds: Identifying and drawing inferences about high-dimensional data sets play a fundamental role in statistical inference. Typically, what we would like to ascertain are elements of similarity between different (parts of the) data set(s) [for example, satellite data]. A famous example is the Swiss roll data (pictured in the next slide), where we would like to find neighboring points on the Swiss roll manifold.

The Swiss Roll

For the Swiss Roll data sets (see "A Global Geometric Framework for Nonlinear Dimensionality Reduction" by Tenenbaum et al.): We represent points X lying in a high-dimensional space by trying to predict each point from its neighbors (with suitable weights W). Having chosen the W's optimally, we treat them as fixed and optimize the resulting embedding criterion, as sketched below.
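A minimal numpy sketch of this two-step neighbor-reconstruction embedding (essentially the locally linear embedding recipe: solve for reconstruction weights, then fix them and solve for low-dimensional points). The neighbor count, regularization, and output dimension below are illustrative assumptions:

```python
import numpy as np

def neighbor_embedding(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Step 1: find weights W that best reconstruct each point from its
    neighbors.  Step 2: fix W and find low-dimensional points Y minimizing
    the same reconstruction error."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                    # exclude the point itself
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                       # neighbors centered at x_i
        C = Z @ Z.T                                 # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)  # regularize for stability
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()                 # weights sum to one

    # Step 2: minimize ||Y - W Y||^2 under normalization constraints,
    # i.e. take the bottom eigenvectors of M = (I - W)^T (I - W).
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]              # skip the constant eigenvector
```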

Applications of the Swiss Roll Algorithm: For high-dimensional data, we'd like to learn the global structure or 'data shape'. Using the Swiss roll algorithm we can approximate geodesic distances between points on the underlying data manifold (see "A Global Geometric Framework for Nonlinear Dimensionality Reduction" by Tenenbaum et al.).
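One standard way to carry this out, sketched here under the assumption of a k-nearest-neighbor graph construction as in the cited paper, is to take shortest-path distances through the neighborhood graph and then embed them with classical MDS. The neighbor count and output dimension are illustrative:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def geodesic_mds(X, n_neighbors=8, n_components=2):
    """Approximate geodesic distances on the data manifold via a k-NN graph
    (assumed connected), then embed them with classical MDS."""
    n = X.shape[0]
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

    # keep only edges to each point's k nearest neighbors
    graph = np.full((n, n), np.inf)
    nbrs = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        graph[i, nbrs[i]] = d[i, nbrs[i]]
    graph = np.minimum(graph, graph.T)          # symmetrize

    # geodesic distances = shortest paths through the neighborhood graph
    G = shortest_path(graph, directed=False)

    # classical MDS: double-center the squared geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```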

Scaled Framework: The scaled framework again treats the proximities as 'copies' of the distances themselves, but measures the fit by a 'relative error' rather than a raw squared error. This again leads to a Newton-Raphson update.
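The slide's relative-error criterion is not reproduced; one common choice, stated here as an assumption, divides each squared residual by the corresponding squared proximity (a Sammon-type weighting instead divides by the proximity itself):

```latex
\mathrm{Stress}_{\mathrm{rel}}(X)
= \sum_{i<j} \left(\frac{\delta_{ij} - d_{ij}(X)}{\delta_{ij}}\right)^{2}.
```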

Hybrid Bayes MDS: We can replace the Newton-Raphson updates with hybrid (Hamiltonian) Monte Carlo updates, where 'p' is the momentum variable whose kinetic energy enters the Hamiltonian (see MCMC Part II).
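A minimal sketch of one such hybrid (Hamiltonian) Monte Carlo update, treating the stress as the potential energy. The leapfrog step size and count are illustrative assumptions, and potential / grad_potential can be the stress and stress_grad helpers from the controlled Newton-Raphson sketch above:

```python
import numpy as np

def hmc_step(X, D, potential, grad_potential, step=0.01, n_leapfrog=20, rng=None):
    """One hybrid Monte Carlo update of the configuration X, with the MDS
    stress playing the role of the potential energy U(X) and 'p' the
    auxiliary momentum whose kinetic energy enters the Hamiltonian."""
    rng = np.random.default_rng() if rng is None else rng
    p = rng.standard_normal(X.shape)              # fresh momentum draw
    x, p_new = X.copy(), p.copy()

    # leapfrog integration of Hamilton's equations
    p_new -= 0.5 * step * grad_potential(x, D)
    for _ in range(n_leapfrog - 1):
        x += step * p_new
        p_new -= step * grad_potential(x, D)
    x += step * p_new
    p_new -= 0.5 * step * grad_potential(x, D)

    # Metropolis accept/reject on the total energy H = U + kinetic
    h_old = potential(X, D) + 0.5 * (p ** 2).sum()
    h_new = potential(x, D) + 0.5 * (p_new ** 2).sum()
    if np.log(rng.uniform()) < h_old - h_new:
        return x                                  # accept the proposal
    return X                                      # reject: keep current X
```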

A Full Bayesian Treatment: Assume a likelihood for the observed proximities given the configuration X, together with priors on X and on the error parameters.
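The slide's model is not reproduced; one illustrative specification of this kind, stated purely as an assumption, is a Gaussian likelihood on the proximities with conjugate-style priors:

```latex
\delta_{ij} \mid X, \sigma^{2} \sim \mathrm{N}\!\big(d_{ij}(X),\, \sigma^{2}\big) \;\; (i<j),
\qquad
x_i \sim \mathrm{N}_p(0,\, \tau^{2} I),
\qquad
\sigma^{2} \sim \mathrm{Inv}\text{-}\mathrm{Gamma}(a, b).
```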

Full Bayes Treatment: Form the posterior distribution by multiplying the terms in the last slide. Simulate from it using hybrid sampling (for X) and Gibbs sampling for the remaining parameters.
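Under the illustrative Normal/Inverse-Gamma model above, the error variance has a conjugate update, so one full sweep can alternate the hmc_step sketch for X with a direct Gibbs draw of the variance. The hyperparameters below are assumptions:

```python
import numpy as np

def gibbs_sigma2(X, D, a=2.0, b=1.0, rng=None):
    """Conjugate Gibbs draw of the error variance under the illustrative
    Normal/Inverse-Gamma model sketched on the previous slide."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    iu = np.triu_indices(len(X), k=1)
    resid2 = ((D[iu] - d[iu]) ** 2).sum()   # sum of squared residuals
    m = len(iu[0])                           # number of observed pairs
    # posterior is Inverse-Gamma(a + m/2, b + resid2/2); draw via 1/Gamma
    return 1.0 / rng.gamma(shape=a + 0.5 * m, scale=1.0 / (b + 0.5 * resid2))
```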

Globally Scaled Framework: The globally scaled framework again treats the proximities as 'copies' of the distances themselves, but normalizes the error globally, giving a 'relative error' across all pairs.
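A globally normalized criterion of this kind, written here as an assumption about the intended formula, divides the total squared residual by the total squared proximity:

```latex
\mathrm{Stress}_{\mathrm{global}}(X)
= \sqrt{\frac{\sum_{i<j}\big(\delta_{ij} - d_{ij}(X)\big)^{2}}
             {\sum_{i<j} \delta_{ij}^{\,2}}}.
```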