Path-integral distance for the data analysis


Dmitry Volchenkov, Project FP7-ICT-318723 MATHEMACS

The big challenges of big data (May 22, 2013): a full 90% of all the data in the world has been generated over the last two years.

All possible paths are taken into account in the "path integral" distance, although some paths are preferable to others.

Data interpretation = equivalence partition. The data classification & interpretation is based on an equivalence partition on the set of walks over a database.

Data interpretation = equivalence partition. Example: classification as a walk over a table of morphological taxa; if two walks end at the same point, the species belong to the same class (Linnaeus, Systema Naturæ, 1735).

Data interpretation = equivalence partition. Another partition is generated by the nearest-neighbor random walks.
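The nearest-neighbor random walk can be sketched in a few lines; the 4-node graph below is a hypothetical example, not one from the talk:

```python
import numpy as np

# Hypothetical 4-node undirected graph, given by its adjacency matrix.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Nearest-neighbor random walk: from node i, step to a uniformly
# chosen neighbor, so T[i, j] = A[i, j] / deg(i).
deg = A.sum(axis=1)
T = A / deg[:, None]

# Every row of T is a probability distribution over the neighbors.
print(T.sum(axis=1))
```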

Data interpretation = equivalence partition. Interpretation does not necessarily reveal a "true meaning" of the data; rather, it represents a self-consistent point of view on them. An "astrological" equivalence partition: walks of a given length n starting at the same node are equivalent (people born on the same day share a similar personality).

Equivalent paths are taken as equiprobable. Given an equivalence relation on the set of walks and a positive function defined on the equivalence classes, we can always normalize it to a probability function under which all "equivalent" walks are equiprobable.
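A minimal sketch of this normalization; the walks, classes, and the weight function f below are all hypothetical:

```python
from collections import defaultdict

# Hypothetical walks labeled by their equivalence class, together
# with a positive weight function f defined on the classes.
walks = {"w1": "classA", "w2": "classA", "w3": "classB"}
f = {"classA": 2.0, "classB": 6.0}

# Group the walks by class.
members = defaultdict(list)
for w, c in walks.items():
    members[c].append(w)

# Normalize f to a probability function over walks: each class gets
# probability proportional to f, shared equally among its members,
# so all "equivalent" walks are equiprobable.
total = sum(f[c] for c in members)
prob = {w: f[c] / total / len(members[c]) for w, c in walks.items()}
```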


Random walks of different scales. Time is introduced as powers of transition matrices: the nth power of the transition matrix gives the n-step transition probabilities, so each power defines a random walk at its own time scale. [Figure captions: "Still far from stationary distribution!" vs. "Stationary distribution is already reached!"; "Defect insensitive." vs. "Low centrality (defect) repelling."]
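The scaling by matrix powers can be sketched numerically, again on the hypothetical 4-node graph:

```python
import numpy as np

# Nearest-neighbor walk on the hypothetical 4-node graph used earlier.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
T = A / deg[:, None]

# The stationary distribution of this walk is proportional to degree.
pi = deg / deg.sum()

# T**n contains the n-step transition probabilities; as the time
# scale n grows, every row approaches the stationary distribution.
for n in (1, 4, 32):
    Tn = np.linalg.matrix_power(T, n)
    print(n, np.abs(Tn - pi).max())
```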

Structure reinforces order. Maximal-entropy RWs ("theories"): blind to defects & boundaries (repelling). Maximal-complexity RWs ("empiricism"): localization within small-scale structures. The complexity-entropy diagram shows how information is stored, organized, and transformed across the different scales of the structure.

"Probabilistic graph theory", "probabilistic geometry", "probabilistic differential geometry": the "path integral" distance weighted by scale-dependent random walks. The Hessian function characterizing the local curvature of the probabilistic manifold at x has directions of both positive and negative curvature.

"Probabilistic graph theory". The determinants of the s-th order minors define an orthonormal basis in the space of contravariant forms. Example: the probability of finding a random walker within a given subgraph during the transient processes at the given time scales.

Path integral in finite dimensions. The path integral is an analytic continuation of the random-walk summation: a single classical trajectory is replaced with a sum over an infinity of possible trajectories to compute a propagator. The propagator is the Green's function of the diffusion operator (the Schrödinger equation is a diffusion equation with an imaginary diffusion constant). Removal of ambiguities: the Laplace operator is not invertible, so its Green function is not unique; the Drazin generalized inverse (the group inverse w.r.t. matrix multiplication) preserves the symmetries of the Laplace operator. From the path integral to Riemannian geometry: given two distributions x, y, the generalized inverse defines their scalar product, the (squared) norm of a distribution, and the Euclidean distance between two distributions. Feynman path integral: removal of point-loop ambiguities through finite-part renormalization; transition to self-avoiding random walks ("no loops").
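A numerical sketch of the group inverse (the Drazin inverse of index 1), computed via the fundamental-matrix identity L# = (L + Pi)^(-1) - Pi for an ergodic chain; the graph is the same hypothetical 4-node example:

```python
import numpy as np

# Nearest-neighbor walk on the hypothetical 4-node graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
T = A / deg[:, None]
pi = deg / deg.sum()

L = np.eye(4) - T              # Laplace operator of the walk
Pi = np.outer(np.ones(4), pi)  # projector onto the stationary state

# Group inverse of L via the fundamental matrix Z = (L + Pi)^(-1).
Z = np.linalg.inv(L + Pi)
Ld = Z - Pi

# Ld satisfies the defining identities of the group inverse.
print(np.allclose(L @ Ld @ L, L),
      np.allclose(Ld @ L @ Ld, Ld),
      np.allclose(L @ Ld, Ld @ L))
```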

Probabilistic geometry of graphs by nearest-neighbor random walks: the first-passage time and the commute time.
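Both quantities can be sketched via the Kemeny-Snell fundamental matrix, using m(i,j) = (Z[j,j] - Z[i,j]) / pi[j], on the same hypothetical graph; for a nearest-neighbor walk, the commute time between two nodes equals 2|E| times the effective resistance between them:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
T = A / deg[:, None]
pi = deg / deg.sum()
n = len(pi)

# Kemeny-Snell fundamental matrix of the ergodic chain.
Z = np.linalg.inv(np.eye(n) - T + np.outer(np.ones(n), pi))

# Mean first-passage times: M[i, j] = (Z[j, j] - Z[i, j]) / pi[j].
M = (np.diag(Z)[None, :] - Z) / pi[None, :]

# Commute time = first-passage time there plus first-passage time back.
C = M + M.T

# For this graph (|E| = 5), the effective resistance between nodes 0
# and 3 is 1, so their commute time is 2 * 5 * 1 = 10 steps.
print(C[0, 3])
```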

Can we hear first-passage times? Examples: F. Liszt, Consolation No. 1; W. A. Mozart, Eine kleine Nachtmusik; J. S. Bach, Prelude BWV 999; R. Wagner, Das Rheingold (Entrance of the Gods); P. Tchaikovsky, Danse Napolitaine.

Can we hear first-passage times? Recurrence time, first-passage time, the hierarchy of harmonic intervals, and the tonality of Western music. The basic pitches for the E minor scale are "E", "F#", "G", "A", "B". [Figure: the recurrence time vs. the first-passage time over 804 compositions by 29 Western composers.]

Can we see the first-passage times? [Figure: (mean) first-passage times in the city graph of Manhattan vs. the tax assessment value of land ($), Manhattan 2005; marked locations include Federal Hall, SoHo, East Village, Bowery, and East Harlem.]

Why are mosques located close to railways? Neubeckum: social isolation follows from structural isolation.

Principal components by random walks. Representations of graphs & databases in the probabilistic geometric space are essentially multidimensional: a 1000 × 1000 data table (or a connected graph of 1000 nodes) is embedded into a 999-dimensional space, and the dimensions are unequal. This is analogous to kernel principal component analysis (KPCA) with the kernel G.
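A sketch of the embedding, assuming for illustration that the kernel is taken to be the group inverse of the Laplace operator, symmetrized with respect to the stationary distribution:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
T = A / deg[:, None]
pi = deg / deg.sum()
n = len(pi)

# Group inverse of L = I - T via the fundamental matrix.
Pi = np.outer(np.ones(n), pi)
Ld = np.linalg.inv(np.eye(n) - T + Pi) - Pi

# Symmetrize w.r.t. the stationary distribution: for a reversible
# chain, G is symmetric and plays the role of a KPCA kernel.
G = np.diag(np.sqrt(pi)) @ Ld @ np.diag(1.0 / np.sqrt(pi))

# Eigenvectors of G give the principal components of the embedding;
# an n-node graph is embedded into an (n-1)-dimensional space, since
# the direction of the stationary state carries a zero eigenvalue.
vals, vecs = np.linalg.eigh(G)
print(vals)
```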

Nonlinear principal components by random walks. [Figure example: the German word "MILCH" mapped to the English "MILK".] In contrast to the covariance matrix, which best explains the variance in the data with respect to the mean, the kernel G traces out all higher-order dependencies among data entries.

Nonlinear principal components by random walks. [Figure panels: Fermi-Dirac statistics; Maxwell-Boltzmann statistics; Gaussian statistics.]

First attaining times manifold. The first-passage time can be calculated as the mean of all first hitting times with respect to the stationary distribution of the random walks. For any starting distribution that differs from the stationary one, we can calculate the analogous quantity; we call it the first attaining time to the node j by the random walks starting at the distribution ϕ1.
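With M the mean-first-passage-time matrix from before, the first-passage time to j is the pi-weighted mean of the hitting times of j; replacing pi by another starting distribution gives the first attaining time (the distribution phi below is a hypothetical choice):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
T = A / deg[:, None]
pi = deg / deg.sum()
n = len(pi)

Z = np.linalg.inv(np.eye(n) - T + np.outer(np.ones(n), pi))
M = (np.diag(Z)[None, :] - Z) / pi[None, :]  # mean first hitting times

# First-passage times: mean of the hitting times of each node with
# respect to the stationary distribution of the walk.
f = pi @ M

# First attaining times: the same average taken with respect to a
# starting distribution phi that differs from the stationary one.
phi = np.array([1.0, 0.0, 0.0, 0.0])  # hypothetical start at node 0
f_phi = phi @ M
print(f, f_phi)
```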

First attaining times manifold. The ek are the direction cosines; the first attaining times form a manifold locally homeomorphic to Euclidean space.

First attaining times manifold: the Morse theory. Each node j is a critical point of the manifold of first attaining times, and the first-passage times fj are the corresponding critical values.

First attaining times manifold. Following the ideas of Morse theory, we can perform the standard classification of the critical points, introducing the index gj of the critical point j as the number of negative eigenvalues of the Hessian at j. (The index of a critical point is the dimension of the largest subspace of the tangent space to the manifold at j on which the Hessian is negative definite.)

First attaining times manifold: the Morse theory. The Euler characteristic χ is an intrinsic property of a manifold that describes the shape of its topological space regardless of the way it is bent. It is known that the Euler characteristic can be calculated as the alternating sum of Cg, the numbers of critical points of index g of the Hessian function.
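The alternating sum can be sketched directly; the critical-point counts below are the textbook Morse data for a height function on the torus, not values from the talk:

```python
# Counts C_g of critical points by Morse index g: for the standard
# height function on a torus, one minimum, two saddles, one maximum.
C = {0: 1, 1: 2, 2: 1}

# Euler characteristic as the alternating sum over Morse indices.
chi = sum((-1) ** g * c for g, c in C.items())
print(chi)  # the torus has Euler characteristic 0
```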

First attaining times manifold: the Morse theory. Amsterdam (57 canals), Venice (96 canals). The negative Euler characteristic could come either from a pattern of symmetry in the hyperbolic surfaces or from a manifold homeomorphic to multiple tori. The large positive value of the Euler characteristic can arise from the well-known product property of Euler characteristics for a product space M × N or, more generally, from a fibration, when one topological space (called a fiber) is "parameterized" by another topological space (called a base).

Conclusions. Markov chains are the stochastic automorphisms of graphs & databases. Nonlinear (kernel) principal component analysis. The method for summing up all RWs ("path integral") leads to a probabilistic geometry. RWs formalize the process of data interpretation.

Some references:
D. Volchenkov, Ph. Blanchard, Introduction to Random Walks on Graphs and Databases, Springer Series in Synergetics, Vol. 10, Berlin/Heidelberg, ISBN 978-3-642-19591-4 (2011).
D. Volchenkov, Ph. Blanchard, Mathematical Analysis of Urban Spatial Networks, Springer Series Understanding Complex Systems, Berlin/Heidelberg, ISBN 978-3-540-87828-5, 181 pages (2009).
D. Volchenkov, "Markov Chain Scaffolding of Real World Data", Discontinuity, Nonlinearity, and Complexity 2(3), 289-299 (2013), DOI: 10.5890/DNC.2013.08.005.
D. Volchenkov, J.-R. Dawin, "Musical Markov Chains", International Journal of Modern Physics: Conference Series 16(1), 116-135 (2012), DOI: 10.1142/S2010194512007829.
D. Volchenkov, Ph. Blanchard, J.-R. Dawin, "Markov Chains or the Game of Structure and Chance. From Complex Networks, to Language Evolution, to Musical Compositions", The European Physical Journal Special Topics 184, 1-82 (2010).
D. Volchenkov, "Random Walks and Flights over Connected Graphs and Complex Networks", Communications in Nonlinear Science and Numerical Simulation 16, 21-55 (2011), DOI: 10.1016/j.cnsns.2010.02.016.