Random Walks and Diffusions on Networks and Databases Dimitri Volchenkov (Bielefeld University)

1 Random Walks and Diffusions on Networks and Databases Dimitri Volchenkov (Bielefeld University)

2 What is the problem with databases/networks? Complexity: no direct ordering of nodes/entities; incompleteness; can include information about processes evolving at different spatio-temporal scales. → Lack of a global intuitive geometric structure! (Binary relations, i.e. comparison, instead of geometry.)

3 Intuitive ideas. The data may “live” on some geometric manifold. Missing parts of the data might not be that important for the process of data interpretation. We need a manifold learning strategy.

4 A network/relational database is any method of sharing information between systems consisting of many individual units V, a measurable pattern of relationships between entities. A walk is a succession of n adjacent edges e_1 × e_2 × ... e_{n−1} connecting a series of vertices in the graph model. The data interpretation/classification/judgment is always based on the introduction of equivalence relations on the set of walks over the database. Binary relations:

5 Carl Linnaeus, Systema Naturæ (1735). The data interpretation/classification/judgment is always based on the introduction of equivalence relations on the set of walks over the database. The Linnaean classes for plants:
Classis 1. Monandria: flowers with 1 stamen
Classis 2. Diandria: flowers with 2 stamens
Classis 3. Triandria: flowers with 3 stamens
Classis 4. Tetrandria: flowers with 4 stamens
Classis 5. Pentandria: flowers with 5 stamens
Classis 6. Hexandria: flowers with 6 stamens
… etc.
Classis 12. Icosandria: flowers with 20 (or more) stamens (~“countable”)
Classis 13. Polyandria: flowers with many stamens, inserted on the receptacle (~“uncountable”)
A finite depth of the classification process…

6 Equivalence partition of walks => random walk. Given an equivalence relation on the set of walks and a function such that we can always normalize it to be a probability function: all “equivalent” walks are equiprobable. Partition into equivalence classes of walks; the utility function for each equivalence class; a random walk transition operator between equivalence classes.

7 A random walk to geometry. The shortest-path distance: insensitive to the structure of the graph. The distance = “a Feynman path integral”: sensitive to the global structure of the graph. Systems of weights are related to each other in a geometric fashion.

8 We proceed in two steps:
Step 1: “Probabilistic graph theory”. Nodes, subgraphs (sets of nodes), and graphs are described by probability distributions and characteristic times with respect to different Markov chains.
Step 2: “Geometrization of Data Manifolds”. Establish geometric relations between those probability distributions whenever possible:
1. Coarse-graining/reduction/geodesic PCA for networks/databases → data analysis; sensitivity to assorted data variations;
2. Transport optimization (Monge-Kantorovich type problems) → distances between distributions;
3. “Ricci flows” across different scales.

11 A variety of random walks at different scales. An example of an equivalence relation: walks of the given length n starting at the same node are equivalent. Equiprobable walks: … Stochastic matrices: … Centrality measures: left eigenvectors (eigenvalue 1) of the stochastic matrices. The “stationary distribution” of the nearest-neighbor RW.
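The construction on this slide can be illustrated for the simplest case, the nearest-neighbor random walk. The sketch below is only an illustration under stated assumptions: the 4-node graph and the helper names `transition_matrix` and `stationary_distribution` are hypothetical and do not come from the talk. It builds the row-stochastic matrix T = D^{-1} A and reads the stationary distribution off the left eigenvector for eigenvalue 1.

```python
import numpy as np

# Hypothetical example graph (not from the talk): a triangle 0-1-2
# with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def transition_matrix(adjacency):
    """Row-stochastic matrix of the nearest-neighbor random walk, T = D^{-1} A."""
    degrees = adjacency.sum(axis=1)
    return adjacency / degrees[:, None]

def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalized to sum to 1."""
    eigenvalues, eigenvectors = np.linalg.eig(T.T)  # left eigenvectors of T
    k = np.argmin(np.abs(eigenvalues - 1.0))        # pick the eigenvalue closest to 1
    pi = np.real(eigenvectors[:, k])
    return pi / pi.sum()                            # also fixes an overall sign

T = transition_matrix(A)
pi = stationary_distribution(T)
print(pi)
```

For an undirected connected graph the result should coincide with the degree of each node divided by the sum of all degrees, which is why the stationary distribution can be read as a centrality measure.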

12 Random walks of different scales. Time is introduced as powers of transition matrices.

31 Random walks of different scales. Time is introduced as powers of transition matrices. Figure labels: “Still far from stationary distribution! Defect insensitive.” and “Stationary distribution is already reached! Low centrality (defect) repelling.”
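As a rough illustration of time entering through powers of the transition matrix, the sketch below propagates an initial distribution t steps and measures how far it still is from the stationary distribution. The example graph, the starting node, and the function names are assumptions made for this illustration; the total variation distance is one common choice of measure and is not necessarily the one used in the talk.

```python
import numpy as np

# Hypothetical example graph (not from the talk): a triangle 0-1-2
# with a pendant node 3 attached to node 2. It is connected and
# non-bipartite, so powers of T converge.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

T = A / A.sum(axis=1, keepdims=True)   # nearest-neighbor random walk
pi = A.sum(axis=1) / A.sum()           # stationary distribution: degree / (sum of degrees)

def distribution_after(p0, T, t):
    """Distribution of the walker after t steps: p0 T^t."""
    return p0 @ np.linalg.matrix_power(T, t)

def total_variation(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * np.abs(p - q).sum()

p0 = np.array([0.0, 0.0, 0.0, 1.0])    # walker starts at the pendant node
for t in (1, 2, 8, 64):
    print(t, total_variation(distribution_after(p0, T, t), pi))
```

Small powers keep a strong memory of the starting node, while large powers are practically indistinguishable from the stationary distribution.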

32 Step 1: “Probabilistic graph theory”. As soon as we define an equivalence relation …
Table columns: Graph; Subgraph (a subset of nodes); Node; Time scale.
|det T|: the probability that the RW revisits the initial node in N steps.
Tr T: the probability that the RW stays at the initial node in 1 step.
Probabilistic graph invariants = the t-step recurrence probabilities quantifying the chance to return in t steps. …
Centrality measures (stationary distributions).
Return times to a node.
“Wave functions” (Slater determinants) of transients (traversing nodes and subgraphs within the characteristic scales) return the probability amplitudes whose modulus squared represents the probability density over the subgraphs.
Return times to the subgraphs within transients = 1/Pr{ … }.
Random target time.
Mixing times over subgraphs (times until the Markov chain is “close” to the steady-state distribution).

35 Step 2: “Geometrization of Data Manifolds” (build-up slide). Transport problems of the Monge-Kantorovich type: “first-passage transportation” from x to y, with W(x→y) ≠ W(y→x).

36 Step 2: “Geometrization of Data Manifolds”. As soon as we get probability distributions… Given T, L ≡ 1 − T, the linear operators acting on distributions. The Green function is the natural way to find the relation between two distributions within the diffusion process. Drazin’s generalized inverse: … Given two distributions x, y over the set of nodes, we can define a scalar product, the (squared) norm of a vector, an angle, and the Euclidean distance. Transport problems of the Monge-Kantorovich type.
Related quantities: (mean) first-passage time; commute time; electric potential; effective resistance distance; tax assessment land price in cities; musical diatonic scale degree; musical tonality scale; …
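The formulas on this slide did not survive in the transcript, so the following is only a sketch of one way such a geometry can be set up, not a reproduction of the author's expressions. It assumes an undirected graph, uses the symmetrized operator D^{-1/2} A D^{-1/2} in place of T, and relies on the fact that for a symmetric matrix the Drazin (group) inverse coincides with the Moore-Penrose pseudoinverse. The graph and the function names are hypothetical.

```python
import numpy as np

# Hypothetical example graph (not from the talk): a triangle 0-1-2
# with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
pi = deg / deg.sum()                         # stationary distribution

# Assumption: work with the symmetrized transition operator D^{-1/2} A D^{-1/2}.
T_sym = A / np.sqrt(np.outer(deg, deg))
L = np.eye(len(A)) - T_sym                   # L = 1 - T

# Green function as a generalized inverse of L. L is symmetric, so its
# Drazin (group) inverse equals the Moore-Penrose pseudoinverse.
G = np.linalg.pinv(L, rcond=1e-10, hermitian=True)

def inner(x, y):
    """Scalar product of two vectors with respect to the Green function."""
    return float(x @ G @ y)

def norm_sq(x):
    """Squared norm induced by the scalar product."""
    return inner(x, x)

def distance(x, y):
    """Euclidean distance induced by the scalar product."""
    return float(np.sqrt(norm_sq(x - y)))

def angle(x, y):
    """Angle between two vectors, in radians."""
    c = inner(x, y) / np.sqrt(norm_sq(x) * norm_sq(y))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

e = np.eye(len(A))                           # node indicator vectors
# Squared norm of a node divided by its stationary weight: the standard
# expression for the mean hitting time of that node from a pi-distributed start.
first_passage = np.array([norm_sq(e[i]) / pi[i] for i in range(len(A))])
print(first_passage)
```

In this setup the squared norm of a node's indicator vector, divided by its stationary weight, equals the mean hitting time of that node when the walk starts from the stationary distribution, which links the geometry back to the first-passage times listed on the slide.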

37 Example 1: Nearest-neighbor random walks on undirected graphs

38 The commute time: the expected number of steps required for a random walker starting at i ∈ V to visit j ∈ V and then return to i. The spectral representation of the (mean) first-passage time: the expected number of steps required to reach node i for the first time, starting from a node chosen at random among all nodes of the graph according to the stationary distribution π.
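The spectral formulas of this slide are not preserved in the transcript. As a cross-check of the two definitions, the sketch below computes the same quantities by a different, standard route: solving the linear system for expected hitting times directly. The example graph and all function names are assumptions made for this illustration.

```python
import numpy as np

# Hypothetical example graph (not from the talk): a triangle 0-1-2
# with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
T = A / A.sum(axis=1, keepdims=True)   # nearest-neighbor random walk
pi = A.sum(axis=1) / A.sum()           # stationary distribution

def hitting_times(T, target):
    """Expected number of steps to reach `target` from every node.

    Solves (I - Q) h = 1, where Q is T with the target row and column removed.
    The entry for the target itself is 0.
    """
    n = T.shape[0]
    others = [k for k in range(n) if k != target]
    Q = T[np.ix_(others, others)]
    h = np.zeros(n)
    h[others] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return h

def commute_time(T, i, j):
    """Expected steps to go from i to j and back to i."""
    return hitting_times(T, j)[i] + hitting_times(T, i)[j]

def mean_first_passage_time(T, pi, i):
    """Expected steps to reach i from a start drawn from pi."""
    return float(pi @ hitting_times(T, i))

print(commute_time(T, 2, 3))
print([mean_first_passage_time(T, pi, i) for i in range(len(A))])
```

On this small graph the pendant node has the largest first-passage time, in line with the idea that poorly connected nodes are hard to reach.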

40 Example 2: Electric Resistance Networks, Resistance distance. An electrical network is considered as an interconnection of resistors. The currents are described by the Kirchhoff circuit law: … Given an electric current from a to b of amount 1 A, the effective resistance of a network is the potential difference between a and b. The effective resistance allows for the spectral representation: …
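The definition above can be tried out numerically. The sketch below is a minimal illustration, assuming unit resistors on every edge of a hypothetical 4-node graph: it injects a current of 1 A at a, extracts it at b, solves Kirchhoff's equations through the pseudoinverse of the graph Laplacian, and returns the potential difference. The function name `effective_resistance` is chosen for this example only.

```python
import numpy as np

# Hypothetical example network (not from the talk): a triangle 0-1-2 with a
# pendant node 3 attached to node 2; every edge is a 1-ohm resistor.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def effective_resistance(conductances, a, b):
    """Potential difference between a and b for a unit current from a to b."""
    laplacian = np.diag(conductances.sum(axis=1)) - conductances
    current = np.zeros(len(conductances))
    current[a], current[b] = 1.0, -1.0      # 1 A in at a, 1 A out at b
    # Kirchhoff's law in matrix form: laplacian @ potential = current.
    # The Laplacian is singular (potentials are defined up to a constant),
    # so the pseudoinverse picks one valid solution.
    potential = np.linalg.pinv(laplacian, rcond=1e-10, hermitian=True) @ current
    return float(potential[a] - potential[b])

print(effective_resistance(A, 2, 3))   # single bridge edge
print(effective_resistance(A, 0, 1))   # one edge in parallel with a two-edge path
```

The pseudoinverse is used here only because it is a short way to solve the singular system; the spectral representation mentioned on the slide is an eigen-decomposition of the same operator.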

41 Impedance networks: The two-point impedance and LC resonances

42 Example 3: First-passage times in cities. Some places in urban environments are easily accessible, others are not; easily accessible places are more attractive to the public, while isolated places are either abandoned or misused. In the long run, inequality in accessibility results in disparity of land prices: the more isolated a place is, the lower its price. Over time, structural isolation causes social isolation, as the host society occupies the structural focus of the urban environment, while the guest society typically resides on the outskirts, where land is relatively cheap. Figure labels: (mean) first-passage time; tax assessment value of land ($); Manhattan, 2005; Neubeckum, Germany, 2012.

43 (Mean) first-passage times in the city graph of Manhattan. Labeled locations: Federal Hall, Times Square, SoHo, East Village, Bowery, East Harlem.

44 PCA based on geodesics. Small data variations give rise to small changes in the eigenvectors (rotations) and eigenvalues of the symmetric transition operator, so that we can consider the image of the database as a “probabilistic manifold” in the projective space PR^(N−1). Geodesics on the sphere are great circles. PCA is performed in the tangent space, then the “principal directions” are projected onto geodesics. The result is an ordered sum of assorted data variations.

45 Geodesic paths of language evolution. Levenshtein’s distance (edit distance) is a measure of the similarity between two strings: the minimum number of deletions, insertions, or substitutions required to transform one string into another. MILCHK = MILK. The normalized edit distance between the orthographic realizations of two words can be interpreted as the probability of mismatch between two characters picked from the words at random.
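A minimal sketch of the edit distance described above, using the standard dynamic-programming recurrence. The slide does not say how the distance is normalized; dividing by the length of the longer word is an assumption made here, and the function names are chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of deletions, insertions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                           # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + substitution_cost))  # substitution
        previous = current
    return previous[-1]

def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

print(levenshtein("MILCH", "MILK"))
print(normalized_edit_distance("MILCH", "MILK"))
```

With this normalization the value always lies between 0 (identical words) and 1 (no characters in common positions), which is what makes a probabilistic reading possible.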

46
1. The four well-separated monophyletic spines represent the four biggest traditional IE language groups: Romance & Celtic, Germanic, Balto-Slavic, and Indo-Iranian;
2. The Greek, Romance, Celtic, and Germanic languages form a class characterized by approximately the same azimuth angle (they belong to one plane);
3. The Indo-Iranian, Balto-Slavic, Armenian, and Albanian languages form another class, with respect to the zenith angle.

47 The systematic sound correspondences between the Swadesh words across the different languages coincide perfectly with the well-known centum-satem isogloss of the IE family (reflecting the IE numeral ‘100’), related to the evolution of the phonetically unstable palatovelar order.

48 The components probe for a sample of 50 AU languages immediately uncovers both the Formosan (F) and Malayo-Polynesian (MP) branches of the entire language family. Headhunters

50 Traps and landmarks. Figure labels: recurrence time; first-passage time. Traps, “confusing environments”: can take long to reach, but are often revisited. Landmarks, “guiding structures”: reached first, seldom revisited.

51 A “guiding structure”: tonality scales in Western music. The recurrence time vs. the first-passage time over 804 compositions of 29 Western composers. Figure labels: W.A. Mozart, Eine kleine Nachtmusik; R. Wagner, Das Rheingold (Entrance of the Gods); increase of harmonic interval / first-passage time.

52 Network geometry at different scales. Figure labels: first-passage time; recurrence times; scale of RW. Annotations: “The node belongs to a network ‘core’, consolidating with other central nodes.” “The node belongs to a ‘cluster’, loosely connected with the rest of the network.”

53 Ricci flows and photo resolution

54 Possible analogy with Ricci flows. Figure labels: first-passage time; recurrence times; scale of RW. Annotations: “Densification” of the network of “positive curvature”; “contraction” of a “probabilistic manifold”; a “collapse” of the network of “negative curvature”.

55 References
D.V., Ph. Blanchard, “Introduction to Random Walks on Graphs and Databases”, Springer Series in Synergetics, Vol. 10, Berlin / Heidelberg, ISBN 978-3-642-19591-4 (2011).
D.V., Ph. Blanchard, “Mathematical Analysis of Urban Spatial Networks”, Springer Series Understanding Complex Systems, Berlin / Heidelberg, ISBN 978-3-540-87828-5, 181 pages (2009).

