Random Walks and Diffusions on Networks and Databases Dimitri Volchenkov (Bielefeld University)

What is the problem with databases/networks? Complexity: there is no direct ordering of nodes/entities; the data are incomplete; they can include information about processes evolving at different spatio-temporal scales. → Lack of a global, intuitive geometric structure! (Binary relations, i.e. comparison, instead of geometry.)

Intuitive ideas. The data may “live” on some geometric manifold. Missing parts of the data might not be that important for the process of data interpretation. We need a manifold learning strategy.

A network/relational database is any method of sharing information between systems consisting of many individual units V: a measurable pattern of relationships between entities. A walk of length $n$ is a succession of $n$ adjacent edges $e_1 \times e_2 \times \dots \times e_n$ connecting a series of vertices in the graph model. Data interpretation, classification, and judgment are always based on the introduction of equivalence relations (binary relations) on the set of walks over the database.

Linnaeus, Systema Naturæ (1735). The Linnaean classes for plants (Carl Linnaeus):
Classis 1. Monandria: flowers with 1 stamen
Classis 2. Diandria: flowers with 2 stamens
Classis 3. Triandria: flowers with 3 stamens
Classis 4. Tetrandria: flowers with 4 stamens
Classis 5. Pentandria: flowers with 5 stamens
Classis 6. Hexandria: flowers with 6 stamens
… etc.
Classis 12. Icosandria: flowers with 20 (or more) stamens (~“countable”)
Classis 13. Polyandria: flowers with many stamens, inserted on the receptacle (~“uncountable”)
Classification, too, is based on an equivalence relation, with a finite depth of the classification process…

Given an equivalence relation on the set of walks, we partition the walks into equivalence classes and assign a utility function to each class. We can always normalize such a function to be a probability function under which all “equivalent” walks are equiprobable; this yields a random-walk transition operator between the equivalence classes. In short: an equivalence partition of walks defines a random walk.

From a random walk to geometry. The shortest-path distance is insensitive to the structure of the graph. A distance defined as “a Feynman path integral” over walks is sensitive to the global structure of the graph: systems of weights are related to each other in a geometric fashion.

We proceed in two steps.
Step 1: “Probabilistic graph theory”. Nodes, subgraphs (sets of nodes), and graphs are described by probability distributions and characteristic times with respect to different Markov chains.
Step 2: “Geometrization of data manifolds”. We establish geometric relations between those probability distributions whenever possible:
1. coarse-graining/reduction/geodesic PCA for networks and databases → data analysis; sensitivity to assorted data variations;
2. transport optimization (Monge-Kantorovich type problems) → distances between distributions;
3. “Ricci flows” across different scales.

A variety of random walks at different scales. An example of an equivalence relation: walks of a given length n starting at the same node are equivalent, so all such walks are equiprobable. This defines a family of stochastic matrices, one for each n. Their left eigenvectors (λ = 1) provide centrality measures; for n = 1 this is the “stationary distribution” of the nearest-neighbor RW.
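A minimal sketch of this construction in Python with NumPy, under stated assumptions: the graph is undirected and given as a dense adjacency matrix A without isolated nodes, and “walks of length n starting at i are equiprobable” is read as: the probability of a first step i → j equals the fraction of those walks that begin with the edge (i, j). For n = 1 this reduces to the nearest-neighbor walk T = D^{-1}A, whose left eigenvector for λ = 1 is π_i = deg(i)/2M. The function names and the example graph are hypothetical.

```python
import numpy as np

def equiprobable_walk_transitions(A: np.ndarray, n: int) -> np.ndarray:
    """First-step transition matrix T^(n) when all length-n walks from a node are equiprobable.

    T^(n)_ij = A_ij * w_{n-1}(j) / w_n(i), where w_m(i) counts the walks of length m
    starting at node i. Assumes no isolated nodes (every w_n(i) > 0).
    """
    ones = np.ones(A.shape[0])
    w_prev = np.linalg.matrix_power(A, n - 1) @ ones  # walks of length n-1 from each node
    w_n = A @ w_prev                                   # walks of length n from each node
    return A * w_prev[np.newaxis, :] / w_n[:, np.newaxis]

def stationary_distribution(T: np.ndarray) -> np.ndarray:
    """Left eigenvector of T for the eigenvalue 1, normalized to a probability vector."""
    vals, vecs = np.linalg.eig(T.T)
    k = int(np.argmin(np.abs(vals - 1.0)))
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

# Hypothetical example graph ("kite"): a triangle 1-2-3 with a pendant node 0 attached to node 1.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

T1 = equiprobable_walk_transitions(A, 1)  # n = 1: nearest-neighbor walk, D^{-1} A
T3 = equiprobable_walk_transitions(A, 3)  # n = 3: first step weighted by walks of length 2 ahead
pi1 = stationary_distribution(T1)         # equals deg(i) / 2M = [1, 3, 2, 2] / 8
```

Larger n favors neighbors from which many long walks start, which is one way to obtain random walks at different scales from a single equivalence relation.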

Random walks of different scales. Time is introduced as powers of the transition matrices: a distribution $x_0$ evolves to $x_t = x_0 T^t$ after $t$ steps.

Random walks of different scales (time is introduced as powers of the transition matrices). In one case the stationary distribution is already reached and the low-centrality node (defect) is repelling; in the other the walk is still far from the stationary distribution and is insensitive to the defect.
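To see time as powers of the transition matrix concretely, the following sketch (Python/NumPy; the kite-shaped graph and helper names are hypothetical) propagates a walker that starts at a low-centrality node, $x_t = x_0 T^t$, and records the total-variation distance to the stationary distribution π at each step t.

```python
import numpy as np

# Hypothetical example graph ("kite"): triangle 1-2-3 plus a pendant node 0 attached to node 1.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
T = A / deg[:, None]   # nearest-neighbor random walk, T = D^{-1} A
pi = deg / deg.sum()   # its stationary distribution, deg(i) / 2M

def distribution_after(x0: np.ndarray, T: np.ndarray, t: int) -> np.ndarray:
    """Row vector x0 propagated for t steps: x_t = x0 T^t."""
    return x0 @ np.linalg.matrix_power(T, t)

def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    """Total-variation distance between two distributions."""
    return 0.5 * float(np.abs(p - q).sum())

x0 = np.array([1.0, 0.0, 0.0, 0.0])  # walker starts at the pendant (low-centrality) node 0
tv = [total_variation(distribution_after(x0, T, t), pi) for t in range(51)]
```

Because T is stochastic and πT = π, this distance never increases with t; on a connected, non-bipartite graph such as this one it decays to zero at a rate set by the second-largest eigenvalue modulus of T.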

Step 1: “Probabilistic graph theory”. As soon as we define an equivalence relation, the graph, its subgraphs (subsets of nodes), and its nodes can be characterized at different time scales: |det T|, the probability that the RW revisits the initial node in N steps; Tr T, the probability that the RW stays at the initial node in 1 step (summed over all initial nodes); probabilistic graph invariants, i.e. the t-step recurrence probabilities quantifying the chance to return in t steps; … centrality measures (stationary distributions); return times to a node (1/π_i for node i); “wave functions” (Slater determinants) of transients (traversing nodes and subgraphs within the characteristic scales), which return the probability amplitudes whose squared modulus represents the probability density over the subgraphs; return times to the subgraphs within transients = 1/Pr{ … }; the random target time; mixing times over subgraphs (times until the Markov chain is “close” to the steady-state distribution).
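Two of these quantities can be checked numerically. The sketch below (Python/NumPy; hypothetical example graph and helper names) solves the linear first-passage equations of the nearest-neighbor walk, then confirms that the mean return time to node i equals 1/π_i (Kac's formula) and that the random target time Σ_j π_j m_ij is the same for every starting node i (Kemeny's constant), matching Σ_{k≥2} 1/(1 − λ_k) over the non-unit eigenvalues of T. It uses the convention m_ii = 0 and is an illustration, not the author's code.

```python
import numpy as np

# Hypothetical example graph ("kite"): triangle 1-2-3 plus a pendant node 0 attached to node 1.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
T = A / deg[:, None]   # nearest-neighbor random walk
pi = deg / deg.sum()   # stationary distribution

def mean_first_passage_times(T: np.ndarray) -> np.ndarray:
    """M[i, j] = expected number of steps to reach j for the first time from i (M[j, j] = 0)."""
    N = T.shape[0]
    M = np.zeros((N, N))
    for j in range(N):
        others = [k for k in range(N) if k != j]
        # For i != j: m_i = 1 + sum_{k != j} T_ik m_k, i.e. (I - T_sub) m = 1
        T_sub = T[np.ix_(others, others)]
        M[others, j] = np.linalg.solve(np.eye(N - 1) - T_sub, np.ones(N - 1))
    return M

M = mean_first_passage_times(T)
# Mean return time to j: one step away from j, then the first passage back to j.
return_times = 1.0 + np.einsum("jk,kj->j", T, M)
# Random target time from each start i: sum_j pi_j M[i, j].
random_target = M @ pi
# Spectral form of Kemeny's constant: sum of 1 / (1 - lambda_k) over the non-unit eigenvalues.
eigs = np.sort(np.linalg.eigvals(T).real)[::-1]
kemeny = float(np.sum(1.0 / (1.0 - eigs[1:])))
trace_T = float(np.trace(T))  # sum of one-step "stay" probabilities; 0 here (no self-loops)
```

On this graph the return times are 1/π = (8, 8/3, 4, 4) and the random target time is 61/24 from every node.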

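Step 2: “Geometrization of Data Manifolds”. As soon as we get probability distributions… Given T and L ≡ 1 − T, the linear operators acting on distributions, the Green function is the natural way to find the relation between two distributions within the diffusion process. Since L has a zero eigenvalue (the stationary distribution is invariant under T), it is not invertible, and the Green function is given by Drazin's generalized inverse; for the symmetric transition operator $\hat T$ with eigenvalues $1 = \mu_1 > \mu_2 \ge \dots \ge \mu_N$ and orthonormal eigenvectors $\psi_k$,
$$L^{\sharp} = \sum_{k=2}^{N} \frac{\psi_k \psi_k^{\top}}{1-\mu_k}.$$
Given two distributions x, y over the set of nodes, we can define a scalar product, the (squared) norm of a vector, an angle, and the Euclidean distance:
$$(x,y)_T = (x, L^{\sharp} y), \qquad \|x\|_T^2 = (x, L^{\sharp} x), \qquad \cos\angle(x,y) = \frac{(x,y)_T}{\|x\|_T\,\|y\|_T}, \qquad \|x-y\|_T^2 = \|x\|_T^2 + \|y\|_T^2 - 2\,(x,y)_T .$$
Transport problems of the Monge-Kantorovich type: the “first-passage transportation” from x to y is in general not symmetric, W(x→y) ≠ W(y→x).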
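A sketch of this geometry for the nearest-neighbor walk on an undirected graph, in Python/NumPy. Assumptions: the graph is connected, L is built from the symmetric operator T̂ = D^{-1/2} A D^{-1/2} (for a symmetric matrix the Drazin inverse coincides with the Moore-Penrose pseudoinverse), and the forms are applied to vectors in that symmetric representation. The example graph and helper names are hypothetical.

```python
import numpy as np

# Hypothetical example graph ("kite"): triangle 1-2-3 plus a pendant node 0 attached to node 1.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
N = len(deg)
pi = deg / deg.sum()
T_hat = A / np.sqrt(np.outer(deg, deg))  # symmetric operator D^{-1/2} A D^{-1/2}
L = np.eye(N) - T_hat

mu, psi = np.linalg.eigh(T_hat)          # eigenvalues in ascending order
mu, psi = mu[::-1], psi[:, ::-1]         # now mu[0] = 1 and psi[:, 0] is proportional to sqrt(pi)
# Generalized inverse: drop the stationary mode, invert 1 - mu_k on the remaining modes.
L_sharp = sum(np.outer(psi[:, k], psi[:, k]) / (1.0 - mu[k]) for k in range(1, N))

def inner(x: np.ndarray, y: np.ndarray) -> float:
    """Scalar product (x, L# y)."""
    return float(x @ L_sharp @ y)

def norm(x: np.ndarray) -> float:
    return float(np.sqrt(inner(x, x)))

def angle(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.arccos(np.clip(inner(x, y) / (norm(x) * norm(y)), -1.0, 1.0)))

def distance(x: np.ndarray, y: np.ndarray) -> float:
    return norm(x - y)

# Usage: node basis vectors rescaled by 1/sqrt(pi_i).
e = np.eye(N)
u0, u1 = e[0] / np.sqrt(pi[0]), e[1] / np.sqrt(pi[1])
commute_01 = distance(u0, u1) ** 2  # edge 0-1 is a bridge of resistance 1, so 2M * 1 = 8
```

The result satisfies the Drazin identities L L♯ L = L, L♯ L L♯ = L♯ and L L♯ = L♯ L. The squared distance between e_0/√π_0 and e_1/√π_1 equals 8, the commute time across the bridge 0-1, which ties this geometry to first-passage quantities.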

Transport problems of the Monge-Kantorovich type. The quantities obtained in this way appear under many names: (mean) first-passage time, commute time, electric potential, effective resistance distance, tax assessment land price in cities, musical diatonic scale degree, … (musical tonality scale).

Example 1: Nearest-neighbor random walks on undirected graphs 

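The commute time, the expected number of steps required for a random walker starting at i ∈ V to visit j ∈ V and then to return back to i, has the spectral representation
$$K_{ij} = \sum_{k=2}^{N} \frac{1}{1-\mu_k}\left(\frac{\psi_{k,i}}{\sqrt{\pi_i}} - \frac{\psi_{k,j}}{\sqrt{\pi_j}}\right)^2 .$$
The (mean) first-passage time, the expected number of steps required to reach node i for the first time starting from a node chosen at random according to the stationary distribution π, has the spectral representation
$$f_i = \frac{1}{\pi_i}\sum_{k=2}^{N} \frac{\psi_{k,i}^2}{1-\mu_k} .$$
Here $1 = \mu_1 > \mu_2 \ge \dots \ge \mu_N$ are the eigenvalues and $\psi_k$ the orthonormal eigenvectors of the symmetric transition operator $\hat T = D^{-1/2} A D^{-1/2}$ of a connected graph, with $\psi_{1,i} = \sqrt{\pi_i}$ and $\pi_i = \deg(i)/2M$, where M is the number of edges.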
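The two spectral formulas can be cross-checked against a brute-force computation. The sketch below (Python/NumPy; hypothetical graph and helper names) evaluates f_i and K_ij from the eigenpairs of T̂ and compares them with first-passage times obtained by solving the linear hitting-time equations, using f_i = Σ_j π_j m_ji and K_ij = m_ij + m_ji.

```python
import numpy as np

# Hypothetical example graph ("kite"): triangle 1-2-3 plus a pendant node 0 attached to node 1.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
N = len(deg)
pi = deg / deg.sum()                     # stationary distribution, deg(i) / 2M
T = A / deg[:, None]                     # nearest-neighbor walk D^{-1} A
T_hat = A / np.sqrt(np.outer(deg, deg))  # symmetric form D^{-1/2} A D^{-1/2}

mu, psi = np.linalg.eigh(T_hat)
mu, psi = mu[::-1], psi[:, ::-1]         # mu[0] = 1; column k holds the (k+1)-th eigenvector
w = 1.0 / (1.0 - mu[1:])                 # 1 / (1 - mu_k) for k = 2..N

# f_i = (1 / pi_i) * sum_k psi_{k,i}^2 / (1 - mu_k)
f_spectral = (psi[:, 1:] ** 2 @ w) / pi
# K_ij = sum_k (psi_{k,i}/sqrt(pi_i) - psi_{k,j}/sqrt(pi_j))^2 / (1 - mu_k)
Phi = psi[:, 1:] / np.sqrt(pi)[:, None]
K_spectral = ((Phi[:, None, :] - Phi[None, :, :]) ** 2) @ w

def mean_first_passage_times(T: np.ndarray) -> np.ndarray:
    """M[i, j] = expected number of steps to reach j for the first time from i (M[j, j] = 0)."""
    n = T.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        others = [k for k in range(n) if k != j]
        T_sub = T[np.ix_(others, others)]
        M[others, j] = np.linalg.solve(np.eye(n - 1) - T_sub, np.ones(n - 1))
    return M

M = mean_first_passage_times(T)
f_direct = pi @ M   # start drawn from pi: f_i = sum_j pi_j M[j, i]
K_direct = M + M.T  # go from i to j, then back to i
```

Both routes agree; the spectral one needs a single eigendecomposition instead of one linear solve per target node.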

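Example 2: Electric resistance networks, resistance distance. An electrical network is considered as an interconnection of resistors with conductances $c_{ij}$. The currents are described by Kirchhoff's circuit law:
$$\sum_{j} c_{ij}\,(v_i - v_j) = I_i .$$
Given an electric current of 1 A from a to b ($I_a = 1$, $I_b = -1$, all other $I_i = 0$), the effective resistance of the network is the potential difference between a and b, $\Omega_{ab} = v_a - v_b$. The effective resistance allows for the spectral representation
$$\Omega_{ab} = \sum_{k=2}^{N} \frac{(u_{k,a} - u_{k,b})^2}{\lambda_k},$$
where $0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_N$ and $u_k$ are the eigenvalues and orthonormal eigenvectors of the Kirchhoff (Laplacian) matrix of a connected network. The commute time of the corresponding random walk is $K_{ab} = 2M\,\Omega_{ab}$, with $2M = \sum_{i,j} c_{ij}$.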
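Both descriptions of the effective resistance can be sketched in Python/NumPy: solving Kirchhoff's current law with node b grounded, and the spectral formula over the nonzero eigenpairs of the Kirchhoff matrix. The function names and the two toy networks (two 1 Ω resistors in series; a triangle of 1 Ω resistors) are hypothetical; conductances are in siemens, so resistances come out in ohms.

```python
import numpy as np

def kirchhoff_matrix(C: np.ndarray) -> np.ndarray:
    """Laplacian (Kirchhoff) matrix of a network with symmetric conductance matrix C."""
    return np.diag(C.sum(axis=1)) - C

def resistance_by_kirchhoff(C: np.ndarray, a: int, b: int) -> float:
    """Inject 1 A at a, ground b (v_b = 0), solve Kirchhoff's current law, return v_a - v_b."""
    L = kirchhoff_matrix(C)
    n = len(C)
    keep = [k for k in range(n) if k != b]  # the grounded Laplacian is invertible if connected
    current = np.zeros(n)
    current[a] = 1.0
    v = np.zeros(n)
    v[keep] = np.linalg.solve(L[np.ix_(keep, keep)], current[keep])
    return float(v[a] - v[b])

def resistance_spectral(C: np.ndarray, a: int, b: int) -> float:
    """Omega_ab = sum over the nonzero Laplacian modes of (u_{k,a} - u_{k,b})^2 / lambda_k."""
    lam, u = np.linalg.eigh(kirchhoff_matrix(C))  # lam[0] = 0 for a connected network
    return float(sum((u[a, k] - u[b, k]) ** 2 / lam[k] for k in range(1, len(lam))))

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # two 1-ohm resistors in series
triangle = np.ones((3, 3)) - np.eye(3)                             # 1-ohm resistors on a triangle
```

The series pair gives 2 Ω between its ends and the triangle gives 2/3 Ω between any two corners (1 Ω in parallel with 2 Ω); the two methods agree.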

Impedance networks: The two-point impedance and LC resonances

Example 3: First-passage times in cities. Some places in urban environments are easily accessible, others are not; well-accessible places are more favorable to the public, while isolated places are either abandoned or misused. In the long run, inequality in accessibility results in a disparity of land prices: the more isolated a place is, the lower its price. Over time, structural isolation would cause social isolation, as a host society occupies the structural focus of urban environments, while a guest society would typically reside in the outskirts, where land is relatively cheap. (Figure: (mean) first-passage time vs. tax assessment value of land ($); Manhattan, 2005, and Neubeckum, Germany, 2012.)

Figure: (mean) first-passage times in the city graph of Manhattan; labeled locations: Federal Hall, Times Square, SoHo, East Village, Bowery, East Harlem.

PCA based on geodesics. Small data variations give rise to small changes in the eigenvectors (rotations) and eigenvalues of the symmetric transition operator, so we can consider the image of the database as a “probabilistic manifold” in $P\mathbb{R}^{N-1}$. Geodesics on the sphere are great circles. PCA is performed in the tangent space, and the “principal directions” are then projected onto geodesics. The result is an ordered sum of assorted data variations.
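The tangent-space procedure can be sketched as follows in Python/NumPy. This is tangent-space PCA, a common approximation to geodesic PCA, and not necessarily the author's exact method. Assumptions: data points are unit vectors; the base point is the normalized Euclidean mean rather than the intrinsic (Fréchet) mean; the log map sends each point to the tangent space, ordinary PCA runs there, and the exp map carries a principal direction back onto a great circle. The data and function names are hypothetical.

```python
import numpy as np

def log_map(p: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Tangent vector at p pointing along the great circle to x; its length is the arc length."""
    c = float(np.clip(x @ p, -1.0, 1.0))
    v = x - c * p                       # component of x orthogonal to p
    nv = np.linalg.norm(v)
    return np.zeros_like(p) if nv < 1e-12 else np.arccos(c) * v / nv

def exp_map(p: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Follow the great circle from p in direction v for arc length |v|."""
    nv = np.linalg.norm(v)
    return p.copy() if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def tangent_pca(X: np.ndarray):
    """PCA of unit vectors X (rows) in the tangent space at their normalized Euclidean mean."""
    base = X.mean(axis=0)
    base /= np.linalg.norm(base)        # crude base point, not the intrinsic Frechet mean
    V = np.array([log_map(base, x) for x in X])
    V -= V.mean(axis=0)
    _, s, Wt = np.linalg.svd(V, full_matrices=False)
    return base, s ** 2 / len(X), Wt    # base point, variances, principal directions (rows)

# Hypothetical data: 11 points on the great circle through (1, 0, 0) and (0, 1, 0).
angles = np.linspace(-0.5, 0.5, 11)
X = np.stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)], axis=1)
base, variances, directions = tangent_pca(X)
# First principal geodesic: the great circle through base along directions[0].
geodesic = np.array([exp_map(base, t * directions[0]) for t in np.linspace(-1.0, 1.0, 5)])
```

For points spread along one great circle, all the variance lands in the first component and the recovered principal geodesic is that circle.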

Geodesic paths of language evolution. The Levenshtein distance (edit distance) is a measure of the similarity between two strings: the number of deletions, insertions, or substitutions required to transform one string into another. For example, MILCH → MILK takes one substitution (C → K) and one deletion (H). The normalized edit distance between the orthographic realizations of two words can be interpreted as the probability of mismatch between two characters picked from the words at random.
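A standard dynamic-programming sketch of the Levenshtein distance in Python. Normalizing by the length of the longer word is an assumption here (one common choice); the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of deletions, insertions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))      # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                       # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if the characters match)
        prev = cur
    return prev[-1]

def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer word (assumed normalization)."""
    return levenshtein(a, b) / max(len(a), len(b), 1)
```

For MILCH and MILK the distance is 2, giving a normalized distance of 2/5 = 0.4.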

1. The four well-separated monophyletic spines represent the four biggest traditional IE language groups: Romance & Celtic, Germanic, Balto-Slavic, and Indo-Iranian;
2. The Greek, Romance, Celtic, and Germanic languages form a class characterized by approximately the same azimuth angle (they belong to one plane);
3. The Indo-Iranian, Balto-Slavic, Armenian, and Albanian languages form another class, with respect to the zenith angle.

The systematic sound correspondences between Swadesh's words across the different languages perfectly coincide with the well-known centum-satem isogloss of the IE family (reflecting the IE numeral ‘100’), related to the evolution of the phonetically unstable palatovelar order.

Probing the components for a sample of 50 Austronesian (AU) languages immediately uncovers both the Formosan (F) and the Malayo-Polynesian (MP) branches of the entire language family. (Figure label: “Headhunters”.)

Traps and landmarks (recurrence time vs. first-passage time). Traps, “confusing environments”: they can take a long time to reach, but are often revisited. Landmarks, “guiding structures”: they are reached first and seldom revisited.

A “guiding structure”: tonality scales in Western music. Examples: W. A. Mozart, Eine kleine Nachtmusik; R. Wagner, Das Rheingold (Entrance of the Gods). The first-passage time increases with the harmonic interval. Figure: the recurrence time vs. the first-passage time over 804 compositions of 29 Western composers.

Network geometry at different scales (figure: first-passage time and recurrence times vs. the scale of the RW, …). First-passage time: the node belongs to a network “core”, consolidating with other central nodes. Recurrence times: the node belongs to a “cluster”, loosely connected with the rest of the network.

Ricci flows and photo resolution

Possible analogy with Ricci flows (figure: first-passage time and recurrence times vs. the scale of the RW, …): “densification” of the network of “positive curvature”; “contraction” of a “probabilistic manifold”; a “collapse” of the network of “negative curvature”.

References
D. Volchenkov, Ph. Blanchard, Introduction to Random Walks on Graphs and Databases, Springer Series in Synergetics, Vol. 10, Berlin/Heidelberg (2011).
D. Volchenkov, Ph. Blanchard, Mathematical Analysis of Urban Spatial Networks, Springer Series Understanding Complex Systems, Berlin/Heidelberg, 181 pages (2009).