Random Walks for Data Analysis. Dima Volchenkov (Bielefeld University). Discrete and Continuous Models in the Theory of Networks.

Data come to us in the form of data tables (binary relations). Classes of tasks: 1. Data interpretation; 2. Data validation & network stability analysis; 3. Data modeling.

Data interpretation. Only local information is available at a time; there is no global, intuitive geometric structure (binary relations/comparisons instead of geometry). Intuitive idea: the data may “live” on some geometric manifold, so we need a manifold-learning strategy. Data geometrization.

Example: Data interpretation. Nature as a data network. Linnaeus, Systema Naturae (1735). The Linnaean classes for plants: Classis 1. Monandria: flowers with 1 stamen; Classis 2. Diandria: flowers with 2 stamens; Classis 3. Triandria: flowers with 3 stamens; Classis 4. Tetrandria: flowers with 4 stamens; Classis 5. Pentandria: flowers with 5 stamens; Classis 6. Hexandria: flowers with 6 stamens; etc. The data classification/judgment is always based on the introduction of equivalence relations on the set of walks over the database. Theory of evolution: Lamarck, Darwin.

Data validation & network stability analysis. BRAESS'S PARADOX: adding extra capacity to a network can in some cases reduce overall performance. Do the data have an “internal logic” that could help to select proper values? Is there an “internal network dynamics”? Can the structure cause changes in itself?

Data modeling. The algorithm of doing data science, by a physicist: the apparent units/nodes are not “natural”; there are too many degrees of freedom for any reasonable equation; only a few main traits can be modeled. The system is rather complex: collective variables, complexity reduction.

The data classification is always based on the introduction of equivalence relations on the set of walks over the database. Equivalence partitions of walks ⇒ random walks.

Examples:
R_x: walks of a given length n starting at the same node x are equivalent;
R_y: walks of a given length n ending at the same node y are equivalent;
R_x ∧ R_y: walks of a given length n between the nodes x and y are equivalent.

Given an equivalence relation on the set of walks and a utility function for each equivalence class, we can always normalize it to be a probability function: all “equivalent” walks are equiprobable. The partition into equivalence classes of walks thus yields a random-walk transition operator between the classes.
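As a concrete illustration (a minimal sketch of my own, not code from the talk), the coarsest such relation — all one-step walks out of a node are equivalent, hence equiprobable — reproduces the nearest-neighbor random walk:

```python
import numpy as np

# The equivalence relation R_x "all walks of length 1 starting at x are
# equivalent" makes every edge out of x equiprobable: this is exactly the
# nearest-neighbor random walk T = D^{-1} A.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # toy undirected graph (assumption)

deg = A.sum(axis=1)
T = A / deg[:, None]                         # stochastic normalization
assert np.allclose(T.sum(axis=1), 1.0)       # rows sum to 1
```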

On equiprobable configurations: a classification ⇔ all “equivalent” walks are equiprobable. Maxwell–Boltzmann statistics: P_config = p_1 · p_2 · p_3 · p_4 · p_5 (the Maxwell–Boltzmann distribution). Bose–Einstein statistics: P_1 = P_2 = … = P_N. Gibrat's law: the probability of a new occurrence is proportional to the number of times it has occurred previously, giving Pareto–Lévy distributions (“fat tails”).
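A hedged illustration of the Gibrat mechanism (my own toy simulation; the 1% new-class rate is an arbitrary assumption):

```python
import numpy as np

# Gibrat's law as a preferential-attachment urn: each new ball joins class i
# with probability proportional to the current count n_i, producing
# fat-tailed class sizes, in contrast to dropping balls uniformly.
rng = np.random.default_rng(0)
counts = [1]
for _ in range(10_000):
    total = sum(counts)
    if rng.random() < 0.01:                  # occasionally open a new class
        counts.append(1)
    else:                                    # otherwise grow an old one per Gibrat
        i = rng.choice(len(counts), p=np.array(counts) / total)
        counts[i] += 1

counts = np.sort(counts)[::-1]
print(counts[:10])                           # a few classes dominate: the "fat tail"
```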

We proceed in three steps.
Step 0: Given an equivalence relation between paths, any transition can be characterized by a probability to belong to an equivalence class. Different equivalence relations ⇒ different equivalence classes ⇒ different probabilities.
Step 1: “Probabilistic graph theory”. Nodes of a graph, subgraphs (sets of nodes) of the graph, and the whole graph are described by probability distributions & characteristic times w.r.t. different Markov chains.
Step 2: “Geometrization of data manifolds”. Establish geometric relations between those probability distributions whenever possible: 1. coarse-graining/reduction of networks & databases → data analysis; sensitivity to assorted data variations; 2. Monge-Kantorovich type problems, optimal transport → distances between distributions.

Step 0. A variety of random walks at different scales. An example of an equivalence relation, R_x: walks of a given length n starting at the same node x are equivalent. Equiprobable walks: the nearest-neighbor random walk; stochastic normalization then gives the probability of an n-walk. “Structure learning”: note that normalizing the n-step connectivity and iterating the one-step normalized matrix are, in general, different operations (≠).

What is a neighbourhood? Who are my neighbours in a given classification? 1. Neighbours are next to me… 2. Neighbours are 2 steps apart from me… n. Neighbours are n steps apart from me… My neighbours are those whom I can visit equiprobably (w.r.t. a chosen equivalence of paths)…

Step 0. A variety of random walks at different scales. The equivalence relation R_x (walks of a given length n starting at the same node x are equivalent) and the equiprobable-walk condition define a family of stochastic matrices. Their left eigenvectors (λ = 1) are centrality measures: the “stationary distribution” of the nearest-neighbor random walk.
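A minimal numerical check (my sketch; a toy 4-node undirected graph is assumed):

```python
import numpy as np

# The left eigenvector of T = D^{-1} A with eigenvalue 1 is the stationary
# distribution; for the nearest-neighbor walk on an undirected graph it is
# proportional to the node degrees: pi_i = deg(i) / 2|E|.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
T = A / deg[:, None]

vals, vecs = np.linalg.eig(T.T)                 # left eigenvectors of T
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

assert np.allclose(pi, deg / deg.sum())         # degree centrality
assert np.allclose(pi @ T, pi)                  # stationarity
```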

Random walks of different scales. Time is introduced as powers of transition matrices. At small times the walk is still far from the stationary distribution: it is defect insensitive. At large times the stationary distribution is already reached: low-centrality (defect) repelling.
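A small sketch of this scale dependence (my illustration; the 5-node graph with a weakly tied node standing in for a “defect” is an assumption):

```python
import numpy as np

# Rows of T^n interpolate between local structure (small n, insensitive to a
# remote "defect") and the stationary distribution (large n, where
# low-centrality nodes are visited rarely).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)   # node 4 is a weakly tied "defect"
T = A / A.sum(axis=1, keepdims=True)
pi = A.sum(axis=1) / A.sum()

for n in (1, 2, 4, 16, 64):
    row = np.linalg.matrix_power(T, n)[0]       # distribution after n steps from node 0
    print(n, np.round(row, 3), "distance to pi:", round(np.abs(row - pi).sum(), 4))
```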

Random walks for different equivalence relations: the nearest neighbor RW vs. the “maximal entropy” RW (J. K. Ochab, Z. Burda).
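A hedged sketch of the maximal-entropy random walk of Ochab & Burda (the rule t_ij = (a_ij/λ)·ψ_j/ψ_i, with ψ the principal eigenvector of the adjacency matrix, is their construction; the toy graph is mine):

```python
import numpy as np

# MERW: all paths of equal length between fixed endpoints become
# equiprobable; the stationary distribution is psi_i^2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

vals, vecs = np.linalg.eigh(A)
lam, psi = vals[-1], np.abs(vecs[:, -1])        # Perron eigenpair of A

T_merw = A * psi[None, :] / (lam * psi[:, None])
assert np.allclose(T_merw.sum(axis=1), 1.0)     # stochastic

pi_merw = psi**2 / (psi**2).sum()
assert np.allclose(pi_merw @ T_merw, pi_merw)   # stationary distribution psi^2
```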

Step 1: “Probabilistic graph theory”. As soon as we define an equivalence relation… the shortest-path distance is insensitive to the structure of the graph; the path-integral distance (“a Feynman path integral”) is sensitive to the global structure of the graph. Systems of weights are related to each other in a geometric fashion.

Step 1: “Probabilistic graph theory”. As soon as we define an equivalence relation, each scope of the graph acquires its own probabilistic description at each time scale:
- Node: centrality measures (stationary distributions); return times to a node.
- Subgraph (a subset of nodes): “wave functions” (Slater determinants) of transients (traversing nodes and subgraphs within the characteristic scales) return the probability amplitudes whose modulus squared represents the probability density over the subgraphs; return times to the subgraphs within transients = 1/Pr{…}; mixing times over subgraphs (times until the Markov chain is “close” to the steady-state distribution).
- Graph: probabilistic graph invariants = the t-step recurrence probabilities quantifying the chance to return in t steps: |Tr T| is the probability that the RW stays at the initial node in 1 step; |det T| is the probability that the RW revisits the initial node in N steps; random target time.

Recurrence probabilities as principal invariants of the graph. By the Cayley–Hamilton theorem and the Kolmogorov–Chapman equation: |I_1| = |Tr T| is the probability that a random walker stays at a node in one time step; |I_N| = |det T| expresses the probability that the random walk revisits an initial node in N steps; in general, |I_k| are the k-step recurrence probabilities quantifying the chance to return in k steps. The roots λ of the characteristic polynomial are the eigenvalues of T, and {I_k}, k = 1, …, N, are its principal invariants, with I_0 = 1.
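The equation itself is an image in the transcript; a standard reconstruction (in my notation) of the characteristic polynomial it refers to:

```latex
\det(\lambda \mathbf{1} - T)
  \;=\; \sum_{k=0}^{N} (-1)^{k} I_{k}\,\lambda^{N-k}
  \;=\; \prod_{k=1}^{N} (\lambda - \lambda_{k}), \qquad I_{0} = 1,
\qquad\Longrightarrow\qquad
\sum_{k=0}^{N} (-1)^{k} I_{k}\, T^{\,N-k} \;=\; 0,
```

which, together with the Kolmogorov–Chapman equation T^{n+m} = T^n T^m, ties the powers of the transition matrix to the invariants I_k.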


Analogy with fermionic systems. The determinants of the minors of the k-th order of Ψ define an orthonormal basis in the k-th exterior power of the state space. The squares of these determinants define the probability distributions over the ordered sets of k indexes, satisfying the natural normalization condition, and describe currents of random walkers. The simplest example is the stationary distribution of random walks.
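A quick numerical check of the normalization claim (my sketch; a random orthogonal matrix stands in for the eigenvector matrix Ψ):

```python
import numpy as np
from itertools import combinations

# For orthonormal columns, the squared k x k minors over ordered row subsets
# sum to 1 (Cauchy-Binet), so they define a probability distribution over
# sets of k node indexes -- the "Slater determinant" analogy.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random orthogonal matrix
k = 2
cols = Q[:, :k]                                 # first k "occupied orbitals"

probs = {rows: np.linalg.det(cols[list(rows), :])**2
         for rows in combinations(range(5), k)}
print(round(sum(probs.values()), 12))           # -> 1.0
```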

Step 2: “Geometrization of data manifolds”. As soon as we get probability distributions… Given T, the operator L ≡ 1 − T acts on distributions, and the Green function is the natural way to find the relation between two distributions within the diffusion process: Drazin's generalized inverse of L. Given two distributions x, y over the set of nodes, we can define a scalar product, the (squared) norm of a vector, an angle, and the Euclidean distance. Transport problems of the Monge-Kantorovich type: the “first-passage transportation” from x to y is asymmetric, W(x→y) ≠ W(y→x). Related quantities: (mean) first-passage time; commute time; electric potential; effective resistance distance; tax-assessment land price in cities; the musical tonality scale (diatonic scale degree)…
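A sketch of the computation (following Kemeny & Snell's fundamental matrix; my code, not the authors'):

```python
import numpy as np

# For an ergodic chain, the Drazin (group) inverse of L = I - T follows from
# the fundamental matrix Z = (I - T + Pi)^{-1}, Pi = 1 pi^T, as L^# = Z - Pi.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
T = A / A.sum(axis=1, keepdims=True)
pi = A.sum(axis=1) / A.sum()

N = len(pi)
Pi = np.outer(np.ones(N), pi)
Z = np.linalg.inv(np.eye(N) - T + Pi)
L_drazin = Z - Pi

L = np.eye(N) - T
assert np.allclose(L @ L_drazin @ L, L)              # group-inverse identities
assert np.allclose(L_drazin @ L @ L_drazin, L_drazin)

# Mean first-passage times H(i -> j) = (Z_jj - Z_ij) / pi_j  (Kemeny & Snell)
H = (np.diag(Z)[None, :] - Z) / pi[None, :]
print(np.round(H, 2))
```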

Example 1: Nearest-neighbor random walks on undirected graphs 

The commute time: the expected number of steps required for a random walker starting at i ∈ V to visit j ∈ V and then to return back to i. The spectral representation of the (mean) first-passage time: the expected number of steps required to reach the node i for the first time, starting from a node chosen randomly among all nodes of the graph according to the stationary distribution π.
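The formulas on this slide are images in the transcript; a standard reconstruction (following Lovász's survey, in the notation of the symmetrized operator T̂ = D^{-1/2} A D^{-1/2} with eigenvalues 1 = λ_1 > λ_2 ≥ … ≥ λ_N and orthonormal eigenvectors ψ_k, where ψ_{1,i}² = deg i / 2|E|):

```latex
f_i \;=\; \sum_{k=2}^{N} \frac{1}{1-\lambda_k}\,\frac{\psi_{k,i}^{2}}{\psi_{1,i}^{2}},
\qquad
K(i,j) \;=\; 2|E| \sum_{k=2}^{N} \frac{1}{1-\lambda_k}
\left(\frac{\psi_{k,i}}{\sqrt{\deg i}}-\frac{\psi_{k,j}}{\sqrt{\deg j}}\right)^{\!2},
```

with f_i the (mean) first-passage time to node i from the stationary distribution and K(i,j) the commute time.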

Example 2: First-passage times in cities. Some places in urban environments are easily accessible, others are not; well-accessible places are more favorable to the public, while isolated places are either abandoned or misused. Over the long run, inequality in accessibility results in disparity of land prices: the more isolated a place is, the lower its price. In the course of time, structural isolation causes social isolation, as a host society occupies the structural focus of the urban environment, while the guest society typically resides in the outskirts, where land is relatively cheap. [Scatter plots: (mean) first-passage time vs. tax-assessment value of land ($); Manhattan, 2005, and Neubeckum, Germany, 2012.]

(Mean) first-passage times in the city graph of Manhattan. [Map labels: Federal Hall, Times Square, SoHo, East Village, Bowery, East Harlem.]

Where could we make jogging trails?

First-passage time (expected random steps).

[Table: mobility (Beweglichkeit) of adults (Erwachsene), children (Kinder), and pensioners (Rentner) for the houses on Carlmeyerstraße; the numeric values are not recoverable from the transcript.]

The most isolated places vs. the most integrated places. Typical directions of movement are indicated by the blue arrows.

The physically shortest path; the path for meeting as many people as possible; the path for meeting as few people as possible. Concept: arrange seats for sport along the less-used paths in the neighborhood. Reasons: 1. To separate the public activity (business and needs) from the private activity (sport); 2. To prevent social misuse of isolated places in the neighborhood.

Example 3: Electric resistance networks; the resistance distance. An electrical network is considered as an interconnection of resistors; the currents are described by the Kirchhoff circuit law. Given an electric current from a to b of amount 1 A, the effective resistance of the network is the potential difference between a and b. The effective resistance allows for a spectral representation.
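A minimal sketch (the standard Laplacian-pseudoinverse construction, not code from the talk) of the effective resistance and its link to commute times:

```python
import numpy as np

# Effective resistance between a and b: R_eff = (e_a - e_b)^T L^+ (e_a - e_b),
# where L = D - A is the graph Laplacian and L^+ its Moore-Penrose
# pseudoinverse; the commute time satisfies K(a,b) = 2|E| * R_eff(a,b).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)     # unit resistors on every edge
L = np.diag(A.sum(axis=1)) - A
L_plus = np.linalg.pinv(L)

def effective_resistance(a: int, b: int) -> float:
    e = np.zeros(len(A)); e[a], e[b] = 1.0, -1.0
    return float(e @ L_plus @ e)

print(effective_resistance(0, 3))             # R_eff between nodes 0 and 3
```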

Impedance networks: The two-point impedance and LC resonances

Resonances

(Complexity reduction) PCA based on geodesics in PR^{N−1}. Small data variations give rise to small changes of the eigenvectors (rotations) and eigenvalues of the symmetric transition operator, so that we can consider the image of the database as a “probabilistic manifold” in the projective space PR^{N−1}. Geodesics on the sphere are great circles. PCA is performed in the tangent space, and the “principal directions” are then projected onto geodesics. The result is an ordered sum of assorted data variations.
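A hedged sketch of the tangent-space PCA step (my reading of the slide; the square-root embedding of distributions onto the sphere and the spherical log map are assumptions, not the talk's stated procedure):

```python
import numpy as np

# Embed distributions as unit vectors s = sqrt(p) on the sphere, map the
# sample to the tangent space at the mean via the spherical log map, run
# ordinary PCA there; principal directions follow great circles (geodesics).
rng = np.random.default_rng(2)
P = rng.dirichlet(np.full(4, 5.0), size=50)    # toy set of distributions
S = np.sqrt(P)                                  # points on the unit sphere

mu = S.mean(axis=0); mu /= np.linalg.norm(mu)   # extrinsic spherical mean

def log_map(x, base):
    """Tangent vector at `base` pointing to `x` along the great circle."""
    c = np.clip(x @ base, -1.0, 1.0)
    v = x - c * base
    n = np.linalg.norm(v)
    return np.zeros_like(x) if n < 1e-12 else np.arccos(c) * v / n

V = np.array([log_map(s, mu) for s in S])       # tangent-space coordinates
_, sing, _ = np.linalg.svd(V - V.mean(axis=0), full_matrices=False)
print("ordered data variations:", np.round(sing**2 / len(V), 4))
```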

Geodesic paths of language evolution. Levenshtein's distance (edit distance) is a measure of the similarity between two strings: the number of deletions, insertions, or substitutions required to transform one string into the other, e.g. MILCH → MILK. The normalized edit distance between the orthographic realizations of two words can be interpreted as the probability of mismatch between two characters picked from the words at random.
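A minimal sketch (the standard dynamic program, not the authors' code) of the edit distance and its normalization:

```python
# Levenshtein distance and its length-normalized variant; normalizing by the
# longer word makes it read as a per-character mismatch probability.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def normalized_edit_distance(a: str, b: str) -> float:
    return levenshtein(a, b) / max(len(a), len(b))

print(levenshtein("MILCH", "MILK"))               # 2 (substitute C->K, drop H)
print(normalized_edit_distance("MILCH", "MILK"))  # 0.4
```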

1. The four well-separated monophyletic spines represent the four biggest traditional IE language groups: Romance & Celtic, Germanic, Balto-Slavic, and Indo-Iranian; 2. The Greek, Romance, Celtic, and Germanic languages form a class characterized by approximately the same azimuth angle (they belong to one plane); 3. The Indo-Iranian, Balto-Slavic, Armenian, and Albanian languages form another class, with respect to the zenith angle.

The systematic sound correspondences between the Swadesh words across the different languages perfectly coincide with the well-known centum–satem isogloss of the IE family (reflecting the evolution of the IE numeral ‘100’), related to the evolution in the phonetically unstable palatovelar order.

The normal probability plots fit the distances r of language points from the ‘center of mass’ to univariate normality: the data points were ranked and then plotted against their expected values under normality, so that departures from linearity signify departures from normality.
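A sketch of the diagnostic (generic, with synthetic data standing in for the language distances r):

```python
import numpy as np
from scipy import stats

# A normal probability plot ranks the sample against expected normal order
# statistics; a straight line (correlation near 1) supports normality.
rng = np.random.default_rng(3)
r = rng.normal(loc=1.0, scale=0.3, size=200)    # stand-in for distances r

(osm, osr), (slope, intercept, corr) = stats.probplot(r, dist="norm")
print("probability-plot correlation:", round(corr, 4))
```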

The univariate normal distribution is closely related to the time evolution of a mass-density function under homogeneous diffusion in one dimension, in which the mean value μ is interpreted as the coordinate of a point where all mass was initially concentrated, and the variance σ² ∝ t grows linearly with time. The values of the variance σ² give a statistically consistent estimate of age for each language group. Anchor events: 1. the last Celtic migration (to the Balkans and Asia Minor) (300 BC); 2. the division of the Roman Empire (500 AD); 3. the migration of German tribes to the Danube River (100 AD); 4. the establishment of the Avar Khaganate (590 AD), overspreading Slavic people who did the bulk of the fighting across Europe.
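A hedged sketch of the calibration step (all numbers below are hypothetical placeholders, NOT the study's data):

```python
import numpy as np

# With sigma^2 proportional to t, a least-squares fit through the origin of
# known anchor ages against measured variances yields the time-variance
# ratio, which can then date unanchored break-up events.
ages = np.array([2300, 1500, 1900, 1420])   # years before present (hypothetical)
var = np.array([0.23, 0.15, 0.19, 0.14])    # measured sigma^2 (hypothetical)

rate = (var @ ages) / (var @ var)           # slope of t = rate * sigma^2
sigma2_new = 0.50                           # variance of an undated group (hypothetical)
print("estimated age:", round(rate * sigma2_new), "years BP")
```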

From the time–variance ratio we can retrieve the probable dates for:
- The break-up of the Proto-Indo-Iranian continuum: by 2,400 BC (the migration from the early Andronovo archaeological horizon; Bryant, 2001).
- The end of common Balto-Slavic history: before 1,400 BC (the archaeological dating of the Trzciniec–Komarov culture).
- The separation of Indo-Aryans from Indo-Iranians: probably, as a result of the Aryan migration across India to Ceylon, as early as 483 BC (McLeod, 2002).
- The division of the Persian polity into a number of Iranian tribes: before 400 BC, after the end of the Greco-Persian wars (Green, 1996).

The Kurgan scenario postulates the IE origin among the people of the “Kurgan culture” (early 4th millennium BC) in the Pontic steppe (Gimbutas, 1982). The Anatolian hypothesis suggests the origin in Neolithic Anatolia and associates the expansion with the Neolithic agricultural revolution in the 8th and 6th millennia BC (Renfrew, 1987). [Illustration: einkorn wheat.] The graphical test checking three-variate normality of the distribution of the distances of the five proto-languages from a statistically determined central point is presented by extending the notion of the normal probability plot. The χ² distribution is used to test the goodness of fit of the observed distribution: departures from three-variate normality are indicated by departures from linearity. The use of the previously determined time–variance ratio then dates the initial break-up of the Proto-Indo-Europeans back to 7,400 BC, pointing at an early Neolithic date.

By 550 AD, and pretty well before 600–1200 AD, descendants from Melanesia settled in the distant apices of the Polynesian triangle, as evidenced by archaeological records (Kirch, 2000; Anderson and Sinoto, 2002; Hurles et al., 2003). An interaction sphere had existed encompassing the whole region.

Mystery of the Tower of Babel: nonliterate languages evolve EXPONENTIALLY FAST without extensive contacts with the remaining population. Isolation does not preserve a nonliterate language! Languages spoken in the islands of East Polynesia and those of the Atayal group seem to evolve without extensive contacts with Melanesian populations, perhaps because of a rapid movement of the ancestors of the Polynesians from South-East Asia, as suggested by the ‘express train’ model (Diamond, 1988), consistent with the multiple evidences of comparatively reduced genetic variation among human groups in Remote Oceania. [Illustration: headhunters.]

Recurrence time vs. first-passage time: traps and landmarks. Traps, “confusing environments”: can take long to reach, but are often revisited. Landmarks, “guiding structures”: reached first, seldom revisited.
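A small sketch of the diagnostic (my illustration, reusing Kemeny & Snell's fundamental-matrix formulas):

```python
import numpy as np

# Compare each node's recurrence time 1/pi_i (a local property) with its mean
# first-passage time from the stationary distribution (a global property).
# Landmarks: short first-passage, long recurrence; traps: the opposite.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
T = A / A.sum(axis=1, keepdims=True)
pi = A.sum(axis=1) / A.sum()

N = len(pi)
Z = np.linalg.inv(np.eye(N) - T + np.outer(np.ones(N), pi))
recurrence = 1.0 / pi
first_passage = (np.diag(Z) - pi) / pi          # H(pi -> j), Kemeny & Snell

for i in range(N):
    print(i, round(recurrence[i], 2), round(first_passage[i], 2))
```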

(*) Musical Dice Game. The relations between notes in (*) are described in terms of probabilities and expected numbers of random steps rather than by physical time. Thus the actual length N of a composition is formally put to N → ∞, or: as long as you keep rolling the dice.

F. Liszt, Consolation No. 1; J. S. Bach, Prelude BWV 999; W. A. Mozart, Eine kleine Nachtmusik; R. Wagner, Das Rheingold (Entrance of the Gods).

A “guiding structure”: tonality scales in Western music. [Plot: the recurrence time vs. the first-passage time over 804 compositions of 29 Western composers; axes: recurrence time, first-passage time; the first-passage time grows with the harmonic interval.]

Network geometry at different scales. Depending on the scale of the random walk, the node belongs to a network “core”, consolidating with other central nodes, or to a “cluster”, loosely connected with the rest of the network.

First-passage time at different scales of the RW: a possible analogy with Ricci flows. “Densification” of a network of “positive curvature”; “contraction” of a “probabilistic manifold”; a “collapse” of a network of “negative curvature”.

Ricci flows and photo resolution

Intelligibility of a network/database. The first-passage time is a property of a node w.r.t. a global structure; the recurrence time is a property of a node w.r.t. a local structure. As n → ∞, after enough learning, any structure becomes intelligible!

References
D. Volchenkov, Ph. Blanchard, Introduction to Random Walks on Graphs and Databases, Springer Series in Synergetics, Vol. 10, Berlin/Heidelberg (2011).
D. Volchenkov, Ph. Blanchard, Mathematical Analysis of Urban Spatial Networks, Springer Series in Understanding Complex Systems, Berlin/Heidelberg, 181 pages (2009).