Random Walks for Data Analysis

Presentation on theme: "Random Walks for Data Analysis"— Presentation transcript:

1 Random Walks for Data Analysis
Dima Volchenkov (Bielefeld University) Discrete and Continuous Models in the Theory of Networks

4 Classes of tasks
Data come to us in the form of data tables: binary relations. Classes of tasks: data interpretation; data validation & network stability analysis; data modeling.

5 Data interpretation
Only local information is available at a time; there is no global, intuitive geometric structure (binary relations/comparisons instead of geometry). Intuitive idea: the data may “live” on some geometric manifold, so we need a manifold-learning strategy. Data geometrization.

8 Example: Data interpretation
Linnaeus, Systema Naturæ (1735). The Linnaean classes for plants: Classis 1. Monandria: flowers with 1 stamen; Classis 2. Diandria: flowers with 2 stamens; Classis 3. Triandria: flowers with 3 stamens; Classis 4. Tetrandria: flowers with 4 stamens; Classis 5. Pentandria: flowers with 5 stamens; Classis 6. Hexandria: flowers with 6 stamens; … etc. Data classification/judgment is always based on introducing equivalence relations on the set of walks over the database of attributes. Nature as a data-network; Lamarck, Darwin: theory of evolution.

9 Data validation & network stability analysis
Do the data have an “internal logic” that could help to select proper values? Is there an “internal network dynamics”? Can the structure cause changes in itself? Braess's paradox: adding extra capacity to a network can in some cases reduce overall performance.

10 Data modeling
Algorithm of doing data science by a “bad” physicist: Step 1: assign a dynamical variable to each node/data entry/entity. Step 2: write down a “Schrödinger equation” from the latest physics paper. Step 3: upload the new paper to arXiv. In reality, the system is rather complex: apparent units/nodes are not “natural”, and there are too many degrees of freedom for any reasonable equation; only a few main traits can be modeled. Collective variables → complexity reduction.

11 Equivalence partitions of walks => random walks
The data classification is always based on introducing equivalence relations on the set of walks over the database. Examples: Rx: walks of a given length n starting at the same node x are equivalent. Ry: walks of a given length n ending at the same node y are equivalent. Rx ∧ Ry: walks of a given length n between the nodes x and y are equivalent.

12 Example of equivalence partitions over databases …
Astrology has been dated to the 3rd millennium BCE: people born on the same day are supposed to inherit the same or a similar personality…

13 Equivalence partitions of walks => random walks
The data classification is always based on introducing equivalence relations on the set of walks over the database. Given an equivalence relation on the set of walks and a utility function for each equivalence class, we can always normalize the function to a probability: all “equivalent” walks are equiprobable. Partition into equivalence classes of walks → a utility function for each equivalence class → a random-walk transition operator between equivalence classes.

14 We proceed in three steps:
Step 0: Given an equivalence relation between paths, any transition can be characterized by the probability to belong to an equivalence class. Different equivalence relations → different equivalence classes → different probabilities. Step 1: “Probabilistic graph theory”: nodes of a graph, subgraphs (sets of nodes), and the whole graph are described by probability distributions and characteristic times w.r.t. different Markov chains. Step 2: “Geometrization of data manifolds”: establish geometric relations between those probability distributions whenever possible; 1. coarse-graining/reduction of networks & databases → data analysis, sensitivity to assorted data variations; 2. Monge-Kantorovich-type problems, optimal transport → distances between distributions.

15 An example of equivalence relation:
Step 0: A variety of random walks at different scales. An example of an equivalence relation, Rx: walks of a given length n starting at the same node x are equivalent. Equiprobable walks: the nearest-neighbor random walk. Stochastic normalization.

18 An example of equivalence relation:
Step 0: A variety of random walks at different scales. Rx: walks of a given length n starting at the same node x are equivalent. Equiprobable walks; stochastic normalization; probability of an n-walk; “structure learning”.
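The stochastic normalization behind the nearest-neighbor random walk can be sketched in a few lines. A minimal illustration; the 4-node adjacency matrix is an assumption for the example, not taken from the slides:

```python
import numpy as np

# Hypothetical 4-node undirected graph (assumed for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Stochastic normalization: divide each row of A by the node degree.
# The result T = D^{-1} A is the nearest-neighbor random-walk operator:
# all walks leaving a node are equiprobable.
deg = A.sum(axis=1)
T = A / deg[:, None]

assert np.allclose(T.sum(axis=1), 1.0)  # every row is a probability law
print(T[0])  # node 0 hops to each of its two neighbors with probability 1/2
```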

19 What is a neighbourhood?
Who are my neighbours in a given classification? 1. Neighbours are next to me… 2. Neighbours are 2 steps apart from me… n. Neighbours are n steps apart from me… My neighbours are those whom I can visit equiprobably (w.r.t. a chosen equivalence of paths)…

21 A variety of random walks at different scales
Step 0: A variety of random walks at different scales. Rx: walks of a given length n starting at the same node x are equivalent. Equiprobable walks; stochastic matrices. Left eigenvectors (μ = 1) give centrality measures: the “stationary distribution” of the nearest-neighbor RW.
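The stationary distribution as a left eigenvector can be computed directly. A sketch on the same assumed 4-node toy graph; for the nearest-neighbor walk the result is just the normalized degree sequence:

```python
import numpy as np

# Same hypothetical 4-node graph as before (assumed for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
T = A / A.sum(axis=1, keepdims=True)

# The stationary distribution is the left eigenvector of T for the
# eigenvalue mu = 1 (the largest one), normalized to sum to one.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

# For the nearest-neighbor walk this centrality measure equals the
# degree distribution: pi_i = deg(i) / sum_j deg(j).
assert np.allclose(pi, A.sum(axis=1) / A.sum())
print(pi)
```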

22 Random walks of different scales
Time is introduced as powers of transition matrices
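Introducing time as powers of the transition matrix is easy to see numerically. A sketch on the assumed 4-node toy graph from the earlier examples; each row of T^t converges to the stationary distribution as t grows:

```python
import numpy as np

# Hypothetical 4-node graph (assumed for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
T = A / A.sum(axis=1, keepdims=True)

# "Time" enters only through powers of the transition matrix:
# (T^t)_{ij} is the probability of going from i to j in exactly t steps.
for t in (1, 4, 16, 64):
    row0 = np.linalg.matrix_power(T, t)[0]
    print(t, np.round(row0, 4))

# As t grows, every row approaches the stationary distribution,
# here the degree distribution [0.2, 0.3, 0.3, 0.2].
```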

41 Random walks of different scales
Time is introduced as powers of transition matrices. Still far from the stationary distribution! The stationary distribution is already reached! Defect-insensitive. Low-centrality (defect) repelling.

42 Random walks for different equivalence relations
Nearest-neighbor RW vs. “maximal-entropy” RW (J. K. Ochab, Z. Burda).
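The “maximal-entropy” walk of Ochab and Burda reweights steps by the principal eigenvector of the adjacency matrix, making all trajectories of a given length equiprobable. A minimal sketch on an assumed 4-node toy graph:

```python
import numpy as np

# Hypothetical 4-node graph (assumed for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Maximal-entropy RW: with lam, psi the Perron eigenvalue/eigenvector
# of A itself, the transition probabilities are
#     T_ij = (A_ij / lam) * psi_j / psi_i,
# so that all paths of a given length are equiprobable.
vals, vecs = np.linalg.eigh(A)
lam, psi = vals[-1], np.abs(vecs[:, -1])   # Perron eigenpair
T_merw = (A / lam) * psi[None, :] / psi[:, None]

assert np.allclose(T_merw.sum(axis=1), 1.0)   # row-stochastic
pi_merw = psi**2 / (psi**2).sum()             # its stationary distribution
print(np.round(pi_merw, 4))
```

Unlike the nearest-neighbor walk, whose stationary law follows the degrees, the maximal-entropy walk concentrates its stationary measure on nodes with large principal-eigenvector components.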

52 Step 1: “Probabilistic graph theory”
As soon as we define an equivalence relation… In classical graph theory, the shortest-path distance is insensitive to the structure of the graph. Here, the distance is “a Feynman path integral”, sensitive to the global structure of the graph. Systems of weights are related to each other in a geometric fashion.

53 Step 1: “Probabilistic graph theory”
As soon as we define an equivalence relation… Node, time scale: Tr T is the probability that the RW stays at the initial node in 1 step; |det T| is the probability that the RW revisits the initial node in N steps; probabilistic graph invariants = the t-step recurrence probabilities quantifying the chance to return in t steps; centrality measures (stationary distributions); return times to a node; random target time. Subgraph (a subset of nodes), graph: “wave functions” (Slater determinants) of transients (traversing nodes and subgraphs within the characteristic scales) return probability amplitudes whose modulus squared represents the probability density over the subgraphs; return times to the subgraphs within transients = 1/Pr{…}; mixing times over subgraphs (times until the Markov chain is “close” to the steady-state distribution).

55 Recurrence probabilities as principal invariants of the graph
The Cayley–Hamilton theorem: the roots μm are the eigenvalues of T, and {Ik}, k = 1, …, N, are its principal invariants, with I0 = 1. Kolmogorov–Chapman equation: |Ik| are the k-step recurrence probabilities quantifying the chance to return in k steps; |I1| = |Tr T| is the probability that a random walker stays at a node in one time step, and |IN| = |det T| expresses the probability that the random walk revisits an initial node in N steps.
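The principal invariants are just the coefficients of the characteristic polynomial of T, so they can be read off numerically. A sketch on the assumed 4-node toy graph used earlier:

```python
import numpy as np

# Hypothetical 4-node graph (assumed for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
T = A / A.sum(axis=1, keepdims=True)

# np.poly returns the coefficients of det(x*I - T):
#   x^N - I_1 x^{N-1} + I_2 x^{N-2} - ... + (-1)^N I_N,
# so the invariants are recovered with alternating signs.
c = np.poly(T)
N = len(T)
I = [(-1) ** k * c[k] for k in range(N + 1)]   # I[0] = 1, ..., I[N] = det T

assert np.isclose(I[0], 1.0)
assert np.isclose(I[1], np.trace(T))        # I_1 = Tr T
assert np.isclose(I[N], np.linalg.det(T))   # I_N = det T
print(np.round(I, 4))
```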

58 Analogy with fermionic systems
The determinants of minors of the kth order of Ψ define an orthonormal basis in the

59 Analogy with fermionic systems
The squares of these determinants define the probability distributions over the ordered sets of k indexes: satisfying the natural normalization condition,

60 Analogy with fermionic systems
The squares of these determinants define probability distributions over the ordered sets of k indexes (satisfying the natural normalization condition) and describe currents of random walkers. The simplest example is the stationary distribution of random walks:

63 Transport problems of the Monge-Kantorovich type
Step 2: “Geometrization of data manifolds”. As soon as we get probability distributions… Given T, the operator L ≡ 1 − T acts on distributions; the Green function, given by Drazin's generalized inverse, is the natural way to relate two distributions within the diffusion process. Given two distributions x, y over the set of nodes, we can define a scalar product, the (squared) norm of a vector, an angle, and the Euclidean distance. “First-passage transportation” from x to y: W(x→y), W(y→x).
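For an ergodic chain, Drazin's generalized inverse of L = 1 − T can be computed from the fundamental matrix Z = (1 − T + Π)⁻¹, where Π carries the stationary distribution in every row (Meyer's construction). A sketch on the assumed 4-node toy graph:

```python
import numpy as np

# Hypothetical 4-node graph (assumed for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
T = A / A.sum(axis=1, keepdims=True)
pi = A.sum(axis=1) / A.sum()     # stationary distribution of the NN walk
N = len(T)

# Drazin (group) inverse of L = 1 - T via the fundamental matrix:
#   Z = (1 - T + Pi)^{-1},  L^D = Z - Pi,  Pi = ones * pi.
Pi = np.tile(pi, (N, 1))
Z = np.linalg.inv(np.eye(N) - T + Pi)
LD = Z - Pi

# Defining properties of the generalized inverse:
L = np.eye(N) - T
assert np.allclose(L @ LD @ L, L)
assert np.allclose(LD @ L @ LD, LD)
assert np.allclose(L @ LD, LD @ L)
```

Scalar products, norms, angles, and Euclidean distances between distributions can then be built from the bilinear form x·L^D·y.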

64 Transport problems of the Monge-Kantorovich type
Step 2: “Geometrization of data manifolds”. Given two distributions x, y over the set of nodes, the scalar product defined through Drazin's generalized inverse of L ≡ 1 − T yields the (squared) norm of a vector, an angle, and the Euclidean distance. Applications: (mean) first-passage time; commute time; electric potential; effective-resistance distance; tax-assessed land price in cities; musical diatonic scale degree; musical tonality scale.

66 Example 1: Nearest-neighbor random walks on undirected graphs
The spectral representation of the (mean) first-passage time: the expected number of steps required to reach node i for the first time, starting from a node chosen at random among all nodes of the graph according to the stationary distribution π. The commute time: the expected number of steps required for a random walker starting at i ∈ V to visit j ∈ V and then return to i.
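Both quantities fall out of the fundamental matrix of the chain. A minimal sketch on the assumed 4-node toy graph, using the standard identity m_ij = (Z_jj − Z_ij)/π_j for mean first-passage times:

```python
import numpy as np

# Hypothetical 4-node graph (assumed for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
T = A / A.sum(axis=1, keepdims=True)
pi = A.sum(axis=1) / A.sum()
N = len(T)

Z = np.linalg.inv(np.eye(N) - T + np.tile(pi, (N, 1)))  # fundamental matrix

# Mean first-passage times: m_ij = (Z_jj - Z_ij) / pi_j  (diagonal = 0).
M = (np.diag(Z)[None, :] - Z) / pi[None, :]

# Commute time K_ij = m_ij + m_ji is symmetric although M is not.
K = M + M.T
assert np.allclose(K, K.T)

# "Random target time" (Kemeny's constant): sum_j pi_j m_ij is the
# same whatever the starting node i.
target = (M * pi[None, :]).sum(axis=1)
assert np.allclose(target, target[0])
print(np.round(M, 3))
```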

67 Example 2: First-passage times in cities
Manhattan, 2005; Neubeckum, Germany, 2012. Tax-assessed value of land ($) vs. (mean) first-passage time. Some places in urban environments are easily accessible, others are not; well-accessible places are more favorable to the public, while isolated places are either abandoned or misused. In the long run, inequality in accessibility results in disparity of land prices: the more isolated a place is, the lower its price. Over time, structural isolation causes social isolation, as the host society occupies the structural focus of the urban environment, while the guest society typically resides on the outskirts, where land is relatively cheap.

68 Around The City of Big Apple
(Mean) first-passage times in the city graph of Manhattan. Public places (Federal Hall, Times Square, SoHo) belong to the city core, with first-passage times of 10-100 steps; East Village lies at 500-1,000 steps; Bowery and East Harlem fall into the city-decay zone at 5,000-10,000 steps (“slum”).

70 Example 3: Electric Resistance Networks, Resistance distance
An electrical network is considered as an interconnection of resistors; the currents are described by Kirchhoff's circuit law. Given an electric current from a to b of amount 1 A, the effective resistance of the network is the potential difference between a and b. The effective resistance admits a spectral representation.
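The resistance distance can be computed from the pseudoinverse of the graph Laplacian. A sketch on the assumed 4-node toy graph, treating every edge as a 1-ohm resistor:

```python
import numpy as np

# Hypothetical 4-node graph (assumed for illustration); each edge
# is a 1-ohm resistor.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Effective resistance via the Moore-Penrose pseudoinverse of the
# graph Laplacian L = D - A:  R_ab = (e_a - e_b)^T L^+ (e_a - e_b).
L = np.diag(A.sum(axis=1)) - A
Lp = np.linalg.pinv(L)

def resistance(a: int, b: int) -> float:
    e = np.zeros(len(A))
    e[a], e[b] = 1.0, -1.0
    return float(e @ Lp @ e)

# Sanity check: by symmetry the link 1-2 carries no current between
# nodes 0 and 3, leaving two parallel 2-ohm paths, so R = 1 ohm.
assert np.isclose(resistance(0, 3), 1.0)
```

The commute time of the nearest-neighbor walk and the resistance distance agree up to the factor 2|E| (here 2|E| = 10), which is why both appear in the list of applications above.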

71 Impedance networks: The two-point impedance and LC resonances

72 Resonances

74 (Complexity reduction) PCA Based on Geodesics
Small data variations give rise to small changes of the eigenvectors (rotations) and eigenvalues of the symmetric transition operator, so we can consider the image of the database as a “probabilistic manifold” in PR^(N−1). Geodesics on the sphere are great circles. PCA is performed in the tangent space, then the “principal directions” are projected onto geodesics. The result is an ordered sum of assorted data variations.

75 Geodesic paths of language evolution
Levenshtein's distance (edit distance) is a measure of the similarity between two strings: the number of deletions, insertions, or substitutions required to transform one string into the other. MILCH → MILK. The normalized edit distance between the orthographic realizations of two words can be interpreted as the probability of a mismatch between two characters picked from the words at random.
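The edit distance has a standard dynamic-programming implementation; a short sketch, applied to the MILCH/MILK pair from the slide:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of deletions, insertions, and substitutions
    transforming string a into string b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

# German "MILCH" vs. English "MILK": substitute C -> K, delete H.
d = levenshtein("MILCH", "MILK")
print(d, d / max(len("MILCH"), len("MILK")))   # 2 0.4
```

Dividing by the length of the longer word gives the normalized edit distance used in the slides as a character-mismatch probability.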

76 The four well-separated monophyletic spines represent the four biggest traditional IE language groups: Romance & Celtic, Germanic, Balto-Slavic, and Indo-Iranian; The Greek, Romance, Celtic, and Germanic languages form a class characterized by approximately the same azimuth angle (belong to one plane); The Indo-Iranian, Balto-Slavic, Armenian, and Albanian languages form another class, with respect to the zenith angle.

77 The systematic sound correspondences between the Swadesh words across the different languages perfectly coincide with the well-known centum-satem isogloss of the IE family (reflecting the IE numeral ‘100’), related to the evolution of the phonetically unstable palatovelar order.

78 The normal probability plots fit the distances r of the language points from the ‘center of mass’ to univariate normality. The data points were ranked and then plotted against their expected values under normality, so that departures from linearity signify departures from normality.

79 The univariate normal distribution is closely related to the time evolution of a mass-density function under homogeneous diffusion in one dimension, in which the mean value μ is interpreted as the coordinate of the point where all mass was initially concentrated, and the variance σ² ∝ t grows linearly with time. The values of the variance σ² give a statistically consistent estimate of the age of each language group. Anchor events: the last Celtic migration (to the Balkans and Asia Minor) (300 BC); the division of the Roman Empire (500 AD); the migration of Germanic tribes to the Danube River (100 AD); the establishment of the Avar Khaganate (590 AD), overspreading the Slavic people who did the bulk of the fighting across Europe.

80 From the time–variance ratio we can retrieve the probable dates for:
The break-up of the Proto-Indo-Iranian continuum: by 2,400 BC (the migration from the early Andronovo archaeological horizon; Bryant, 2001). The end of common Balto-Slavic history: before 1,400 BC (the archaeological dating of the Trzciniec-Komarov culture). The separation of Indo-Aryans from Indo-Iranians: before 400 BC (probably as a result of the Aryan migration across India to Ceylon, as early as 483 BC; Mcleod, 2002). The division of the Persian polity into a number of Iranian tribes: before 400 BC (after the end of the Greco-Persian wars; Green, 1996).

81 Proto-Indo-Europeans?
The Kurgan scenario postulates the IE origin among the people of the “Kurgan culture” (early 4th millennium BC) in the Pontic steppe (Gimbutas, 1982). The Anatolian hypothesis suggests an origin in Neolithic Anatolia and associates the expansion with the Neolithic agricultural revolution in the 8th and 6th millennia BC (Renfrew, 1987). The graphical test checking three-variate normality of the distribution of the distances of the five proto-languages from a statistically determined central point extends the notion of the normal probability plot; the χ²-distribution is used to test the goodness of fit of the observed distribution, with departures from three-variate normality indicated by departures from linearity. The previously determined time–variance ratio then dates the initial break-up of the Proto-Indo-Europeans back to 7,400 BC, pointing at the early Neolithic date.

82 In search of Polynesian origins
The component probe for a sample of 50 Austronesian (AU) languages immediately uncovers both the Formosan (F) and the Malayo-Polynesian (MP) branches of the language family.

84 Mystery of the Tower of Babel
Nonliterate languages evolve exponentially fast without extensive contacts with the remaining population. Isolation does not preserve a nonliterate language!

85 I dress like everyone, I don’t care, I dress like no other
It is believed that fashion refers to a distinctive and often habitual trend in the style with which a person dresses. These trends differ across cultures and places, even within Europe (London, Wrocław). Is it possible to assess these trends quantitatively and to identify a personal strategy of self-representation in individuals?

86 Appearance assessment
Appearance is encoded by a string of attributes (symbols): 28 attributes for men, including age; 47 attributes for women, including age.

88 Appearance edit distance (Spot the Difference Game)
A string metric measures distance as the number of operations required to transform a string encoding one appearance into another. Appearance edit distance = 8

89 The matrix of appearance edit distances for 28 women in London
Analyzed databases: Wrocław — 85 men, 113 women; London — 14 men, 28 women.

90 Phylogenetic (neighbor joining) trees for human appearance in Wrocław
The matrix of appearance edit distances can be visualized by unrooted phylogenetic trees that illustrate the relatedness of appearances. Goodness of fit to the distance matrix: 62.2% and 78.7%. The simple relation of ancestry basic to a tree structure cannot grasp the complex, distinctive, and often habitual trends in style!

91 Phylogenetic (neighbor joining) trees for human appearance in London
The matrix of appearance edit distances can be visualized by unrooted phylogenetic trees that illustrate the relatedness of appearances. Goodness of fit to the distance matrix: 85.3% and 89.5%. The simple relation of ancestry basic to a tree structure cannot grasp the complex, distinctive, and often habitual trends in style!

92 Geometrization of appearance by random walks
The appearance data for 113 women of Wrocław shown in the coordinates of 3 main style traits calculated over the edit distance data matrix. The matrix of appearance edit distances can be considered as an adjacency matrix of a complete graph with weighted edges.

93 Geometrization of appearance data
The appearance data for 113 women of Wrocław shown in the coordinates of the 2 main style traits (the 1st and 2nd principal components, spanning the main data trend) calculated over the edit-distance data matrix.

94 THREE statistically different types of appearance
The appearance data for 113 women of Wrocław, in the coordinates of the 2 main style traits calculated over the edit-distance data matrix, show the distribution of points along the linear trend: there are THREE statistically different types of appearance.

95 THREE statistically different types of appearance
Distribution of points along the linear trend: “I dress like everyone” (Gaussian statistics); “I don't care how I look” (Maxwell-Boltzmann statistics); “I dress like no other” (Fermi-Dirac statistics).

96 I dress like everyone, I don’t care, I dress like no other
Women are more likely to follow a common style, to dress like everyone else.

98 Traps and landmarks: recurrence time vs. first-passage time
Landmarks, “guiding structures”: reached quickly, seldom revisited. Traps, “confusing environments”: can take long to reach, but are often revisited.

99 Musical Dice Game (*): the relations between notes in (*) are described in terms of probabilities and expected numbers of random steps rather than physical time. Thus the actual length N of a composition is formally taken as N → ∞, that is, as long as you keep rolling the dice.

100 F. Liszt, Consolation No. 1
J.S. Bach, Prelude BWV 999; R. Wagner, Das Rheingold (Entrance of the Gods); W.A. Mozart, Eine kleine Nachtmusik.

101 A “guiding structure”: Tonality scales in Western music
Increase of harmonic interval / first-passage time. The recurrence time vs. the first-passage time over 804 compositions of 29 Western composers.

102 Network geometry at different scales
First-passage time vs. the scale of the RW: either the node belongs to a network “core”, consolidating with other central nodes, or the node belongs to a “cluster”, loosely connected with the rest of the network.

103 Possible analogy with Ricci flows
First-passage time vs. the scale of the RW: “densification” of a network of “positive curvature” corresponds to a “contraction” of the “probabilistic manifold”, while a network of “negative curvature” undergoes a “collapse”.

104 Ricci flows and photo resolution

105 Intelligibility of a network/database
First-passage time: a property of a node w.r.t. the global structure. Recurrence time: a property of a node w.r.t. the local structure.

106 Intelligibility of a network/database
As n → ∞, the first-passage time approaches the recurrence time: after enough learning, any structure becomes intelligible! Recurrence time: a property of a node w.r.t. the local structure. First-passage time: a property of a node w.r.t. the global structure.

107 References
D. Volchenkov, Ph. Blanchard, “Introduction to Random Walks on Graphs and Databases”, Springer Series in Synergetics, Vol. 10, Berlin/Heidelberg (2011). D. Volchenkov, Ph. Blanchard, “Mathematical Analysis of Urban Spatial Networks”, Springer Series Understanding Complex Systems, Berlin/Heidelberg, 181 pages (2009).

