Diffusion Geometries in Document Spaces: Multiscale Harmonic Analysis. R.R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner, S. Zucker.


Diffusion Geometries in Document Spaces: Multiscale Harmonic Analysis. R.R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner, S. Zucker. Mathematics Department and Program of Applied Mathematics, Yale University.

Our goal is to report on mathematical tools used in machine learning, document and web browsing, bioinformatics, and many other data-mining activities. The remarkable observation is that basic geometric harmonic analysis of empirical Markov processes provides a unified mathematical structure which encapsulates most successful methods in these areas. These methods enable global descriptions of objects verifying microscopic relations (like calculus). In particular, we relate the spectral properties of Laplace operators (on discrete data) with the corresponding intrinsic multiscale folder structure induced by the diffusion geometry of the data (a generalized Heisenberg principle).

This calculus with digital data provides a first step in addressing and setting up many of the issues mentioned above, and much more, including multidimensional document rankings extending Google, information navigation, heterogeneous material modeling, multiscale complex structure organization, etc. Remarkably, this can be achieved with algorithms which scale linearly with the number of samples. The methods described below are known as nonlinear principal component analysis, kernel methods, support vector machines, spectral graph theory, and many more. They are documented in literally hundreds of papers in various communities. A simple description is given through diffusion geometries. We will now provide a sketch of the basic ideas and their potential applicability.

Diffusions between A and B have to go through the bottleneck, while C is easily reachable from B. The Markov matrix defining a diffusion could be given by a kernel, or by inference between neighboring nodes. The diffusion distance accounts for the preponderance of inference. The shortest path between A and C is roughly the same as between B and C; the diffusion distance, however, is larger, since the diffusion occurs through a bottleneck.
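This bottleneck effect is easy to check numerically. The sketch below is illustrative and not from the talk (the dumbbell graph, the time t = 4, and all names are our own choices): two dense clusters joined by a single edge, with the diffusion distance across the bottleneck compared to the distance inside a cluster.

```python
import numpy as np

# Two 10-node dense clusters joined by one bottleneck edge (0 <-> n).
n = 10
W = np.zeros((2 * n, 2 * n))
W[:n, :n] = 1.0            # cluster containing A
W[n:, n:] = 1.0            # cluster containing B and C
np.fill_diagonal(W, 0.0)
W[0, n] = W[n, 0] = 1.0    # the bottleneck edge

P = W / W.sum(axis=1, keepdims=True)   # Markov matrix of the random walk
pi = W.sum(axis=1) / W.sum()           # its stationary distribution
Pt = np.linalg.matrix_power(P, 4)      # diffusion for t = 4 steps

def diffusion_dist(i, j):
    # D_t(i,j)^2 = sum_z (P^t[i,z] - P^t[j,z])^2 / pi[z]
    return np.sqrt(np.sum((Pt[i] - Pt[j]) ** 2 / pi))

A, B, C = 5, n + 2, n + 7
print(diffusion_dist(A, C))   # large: diffusion must cross the bottleneck
print(diffusion_dist(B, C))   # small: B and C sit in the same cluster
```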

Diffusion as a search mechanism. Starting with a few labeled points in two classes, the remaining points are identified by the "preponderance of evidence" (Szummer, Slonim, Tishby…).
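A minimal sketch of this search mechanism in the spirit of the cited work (our own code and parameter names, not the authors'): seed a few labeled points, diffuse the label mass through the Markov matrix for t steps, and assign each point to the class with the most accumulated evidence.

```python
import numpy as np

def diffusion_classify(P, labeled_idx, labels, t=10, n_classes=2):
    """Label every node by t-step diffusion from a few labeled seeds.

    P           : (n, n) row-stochastic Markov matrix on the data graph
    labeled_idx : indices of the labeled points
    labels      : class (0 .. n_classes-1) of each labeled point
    """
    n = P.shape[0]
    F = np.zeros((n, n_classes))
    F[labeled_idx, labels] = 1.0   # one indicator column per class
    for _ in range(t):
        F = P @ F                  # diffuse the evidence one step
    return F.argmax(axis=1)        # preponderance of evidence
```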

Conventional nearest-neighbor search compared with a diffusion search. The data is a pathology slide; each pixel is a digital document (the spectrum for each class is shown below).

Another simple empirical diffusion matrix $A$ can be constructed as follows. Let $x_1, \ldots, x_n$ represent the normalized data; we "soft truncate" the covariance matrix as
$$K_{ij} = \exp\big(-\|x_i - x_j\|^2 / \varepsilon\big)$$
(for normalized data this is a monotone function of the correlation $\langle x_i, x_j \rangle$). $A$ is a renormalized Markov version of this matrix,
$$A = D^{-1} K, \qquad D_{ii} = \sum_j K_{ij}.$$
The eigenvectors $\phi_k$ of this matrix (with eigenvalues $\lambda_k$) provide a local nonlinear principal component analysis of the data; these are also the eigenfunctions of the discrete graph Laplace operator. The map
$$\Phi_t(x) = \big(\lambda_1^t \phi_1(x),\ \lambda_2^t \phi_2(x),\ \ldots\big),$$
whose entries are the diffusion coordinates, is a diffusion (at time $t$) embedding into Euclidean space.
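The construction on this slide is a few lines of linear algebra. The sketch below is illustrative (not the authors' code): it forms the soft-truncated kernel, renormalizes it into a Markov matrix, and returns the diffusion coordinates at time t.

```python
import numpy as np

def diffusion_map(X, eps, t=1, n_coords=2):
    """Embed the rows of X by the leading nontrivial diffusion coordinates.

    X   : (n, d) array of normalized data points
    eps : scale of the soft truncation
    t   : diffusion time
    """
    # Soft-truncated kernel K_ij = exp(-||x_i - x_j||^2 / eps).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / eps)
    # Renormalized Markov version A = D^{-1} K.
    A = K / K.sum(axis=1, keepdims=True)
    # A is conjugate to a symmetric matrix, so its spectrum is real.
    vals, vecs = np.linalg.eig(A)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the constant eigenvector (eigenvalue 1); scale by lambda^t.
    return vecs[:, 1:n_coords + 1] * vals[1:n_coords + 1] ** t
```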

As seen above in the spectra of various powers of a diffusion operator $A$, the numerical rank of the powers is reduced. This corresponds to a natural multiresolution wavelet, or Littlewood-Paley, analysis on the set. Orthonormal scaling functions and corresponding wavelets can be constructed (even in the non-symmetric case).
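The rank compression is easy to observe. In the sketch below (a stand-in data set of our choosing; the tolerance is an arbitrary cutoff), the numerical rank of the dyadic powers $A, A^2, A^4, \ldots$ drops rapidly with the scale.

```python
import numpy as np

def numerical_rank(M, tol=1e-6):
    """Count singular values above a relative tolerance."""
    s = np.linalg.svd(M, compute_uv=False)
    return int((s > tol * s[0]).sum())

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                        # stand-in data set
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 0.5)
A = K / K.sum(axis=1, keepdims=True)                 # Markov matrix

M = A.copy()
for j in range(6):
    print(f"numerical rank of A^(2^{j}):", numerical_rank(M))
    M = M @ M                                        # square to the next scale
```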

A simple application of this diffusion to data, or to data filters, is feature-based diffusion, sometimes called collaborative filtering. Given an image, associate with each pixel $p$ a vector $v(p)$ of features: for example a spectrum, or the 5x5 subimage centered at the pixel, or any combination of features. Define a Markov filter as
$$A_{pq} = \frac{\exp\big(-\|v(p) - v(q)\|^2 / \varepsilon\big)}{\sum_{q'} \exp\big(-\|v(p) - v(q')\|^2 / \varepsilon\big)}.$$
The various powers of $A$, or polynomials in $A$, provide filters which account for feature similarity between pixels.
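A direct transcription of this filter for small grayscale images (our reading of the slide; eps, t, and the patch radius r are illustrative choices):

```python
import numpy as np

def feature_diffusion_filter(img, eps=0.5, t=2, r=2):
    """Filter a small 2-D image by t steps of patch-feature diffusion."""
    h, w = img.shape
    pad = np.pad(img, r, mode="reflect")
    # v(p): the (2r+1) x (2r+1) patch around each pixel, flattened.
    V = np.array([pad[i:i + 2 * r + 1, j:j + 2 * r + 1].ravel()
                  for i in range(h) for j in range(w)])
    sq = ((V[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / eps)
    A /= A.sum(axis=1, keepdims=True)   # Markov normalization over pixels
    out = img.ravel().astype(float)
    for _ in range(t):
        out = A @ out                   # one feature-diffusion step
    return out.reshape(h, w)
```

Note that the dense matrix makes this quadratic in the number of pixels; practical implementations keep only nearest-neighbor affinities.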

Feature diffusion filtering (by A. Szlam) of the noisy Lenna image is achieved by associating with each pixel a feature vector (say, the 5x5 subimage centered at the pixel). This defines a Markov diffusion matrix which is used to filter the image, as was done for the spiral in the preceding slide.

The long-term diffusion of heterogeneous material is remapped below. The left side has a higher proportion of heat-conducting material, thereby reducing the diffusion distance among points; the bottleneck increases that distance.

Diffusion map into 3D of the heterogeneous graph. The distance between two points measures the diffusion between them.

The first two eigenfunctions organize the small images, which were provided in random order.

Organization of documents using diffusion geometry

We claim that the self-organization provided through the diffusion coordinates of the data is mathematically equivalent to a multiscale "folder" structure on the data: a structure that can be obtained directly through basic multiscale diffusion "bookkeeping". The characteristic functions of the folders can be used to define diffusion wavelets or filters. (A detailed wavelet analysis is provided by M. Maggioni in his talk.)

A very simple way to build a hierarchical multiscale folder structure is as follows. Using the diffusion distance at time $t$,
$$D_t^2(x,y) = \sum_{k \ge 1} \lambda_k^{2t}\,\big(\phi_k(x) - \phi_k(y)\big)^2,$$
we define the diffusion distance between two subsets $E$ and $F$ as
$$D_t(E,F) = \min_{x \in E,\ y \in F} D_t(x,y).$$

To build a multiscale hierarchy of folders, we start with a cover of the "document graph" by disjoint sets of rough diameter 1 at scale 1. We then organize this metric space into a disjoint collection of folders whose diffusion diameter at scale 2 is roughly 1. Each such collection of folders is a parent folder; we repeat on the parent folders, using the diffusion distance at scale 4 and rough diameter 1, to combine them into grandparents, etc. This construction extends the usual binary coordinates on the line. It does not build clusters; it merely organizes the data.
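A schematic version of this bookkeeping (our own interpretation and helper names, not the authors' algorithm; it uses the equivalent kernel form of the diffusion distance):

```python
import numpy as np

def diffusion_distances(P, pi, t):
    """All pairwise diffusion distances at time t (dense; small n only)."""
    Pt = np.linalg.matrix_power(P, t)
    diff = Pt[:, None, :] - Pt[None, :, :]
    # D_t(i,j)^2 = sum_z (P^t[i,z] - P^t[j,z])^2 / pi[z]
    return np.sqrt((diff ** 2 / pi).sum(-1))

def cover(points, D, radius=1.0):
    """Greedy disjoint cover of `points` by diffusion balls of given radius."""
    folders, remaining = [], list(points)
    while remaining:
        center = remaining[0]
        folder = [p for p in remaining if D[center, p] <= radius]
        folders.append(folder)
        remaining = [p for p in remaining if p not in folder]
    return folders

def multiscale_folders(P, pi, n_scales=3):
    """One partition per dyadic scale: folders, parents, grandparents, ..."""
    tree, points = [], list(range(P.shape[0]))
    for j in range(n_scales):
        D = diffusion_distances(P, pi, t=2 ** j)
        folders = cover(points, D, radius=1.0)
        tree.append(folders)
        points = [f[0] for f in folders]   # one representative per folder
    return tree
```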

In general, given a data matrix such as a word-frequency matrix on a body of documents, there are two folder structures: one on the columns (the document graph), the other on the rows (the word graph). On the document graph, folders correspond to affinity between documents, while on the word graph, folders are meta-words or conceptual functional groups (as seen in the documents). In the image below, our "body of documents" consists of all 8x8 subimages of a simple image of a white disk on a black background. The documents are labeled by their central pixel. The folders at different diffusion scales are the geometric features derived from this data set. The only input into the construction is the infinitesimal affinity between patches.
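Both graphs come from the same matrix. A small sketch (hypothetical names; a random matrix stands in for real word frequencies): document affinities are computed between columns, word affinities between rows, and either Markov matrix can feed the multiscale_folders sketch above.

```python
import numpy as np

def markov_from_rows(V, eps=0.5):
    """Gaussian affinity between the rows of V, renormalized to Markov."""
    sq = ((V[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / eps)
    return A / A.sum(axis=1, keepdims=True)

M = np.random.default_rng(2).random((50, 30))   # 50 words x 30 documents
A_words = markov_from_rows(M)      # word graph: rows as points
A_docs = markov_from_rows(M.T)     # document graph: columns as points
```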

EEG graphs. Green = most visited state, blue = no state, red = the 3 remaining states. States are defined via the pattern of the frontal electrodes (F7, Fp1, Fp2, F8). Three graphs use the graph-Laplacian normalization and three the Laplace-Beltrami normalization: one using only the frontal electrodes, one using a mix (indicated in the figure), and one using all electrodes.

10-20 System of Electrode Placement for EEG