Real world data analysis and interpretation

1 Real world data analysis and interpretation
Dmitry Volchenkov Project FP7 – ICT MATHEMACS

2 Big challenges of big data
May 22, 2013 — A full 90% of all the data in the world has been generated over the last two years.

3

4 Such a “path integral” distance induces geometry (volumes)!
All possible paths are taken into account, but some paths are preferable to others.

5 Data interpretation = equivalence partition
Data interpretation is always based on an equivalence partition on the set of walks over a database. Interpretation: the evolution tree is the geometric image of the database under the equivalence partition suggested by Linnaeus in Systema Naturæ (1735). Species are identical “walks” over the database of morphological taxa.

6 Data interpretation = equivalence partition
Data interpretation is always based on an equivalence partition on the set of walks over a database. An interpretation does not necessarily reveal a "true meaning" of the data; rather, it represents a self-consistent point of view on it. “Astrological” equivalence partition: walks of a given length n starting at the same node are equivalent (people born on the same day have the same or a similar personality).

7 Plan of my talk
Random walks: what is that? Automated data interpretation. Geometry of data: the “path integral” distance. Examples. How can we save Europe? Analysis of the GDP, inequality, and polity data series.

8 Random Walks: What is that?
Physical model and mathematical meaning. Symmetry of route choice: equivalent paths are equiprobable. A random walk (RW) is a stochastic automorphism expressing structural symmetries: equivalent walks are equiprobable.
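As an illustration of "equivalent paths are equiprobable", here is a minimal sketch of the simplest such walk: from every node, each incident edge is chosen with equal probability. The slide does not give a formula, so the construction T = D^{-1} A, the function names, and the three-node path graph are all assumptions made for this example.

```python
import numpy as np

def transition_matrix(adjacency):
    """Row-stochastic matrix T = D^{-1} A: every edge leaving a node is equally likely."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    if np.any(degrees == 0):
        raise ValueError("every node needs at least one edge")
    return A / degrees[:, None]

def stationary_distribution(adjacency):
    """For a connected undirected graph, pi_i = degree_i / (sum of all degrees)."""
    degrees = np.asarray(adjacency, dtype=float).sum(axis=1)
    return degrees / degrees.sum()

# Hypothetical example: the path graph 0 - 1 - 2.
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
T_path = transition_matrix(A_path)
pi_path = stationary_distribution(A_path)
```

In this toy graph the middle node is visited twice as often as either end node, which is the kind of structural asymmetry the walk is meant to expose.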

9 When equivalent paths are taken as equiprobable

10 Geometry of Data & Graphs
The path integral sums over all RWs to compute a propagator. The propagator is the Green’s function of the diffusion operator. The Drazin generalized inverse (the group inverse with respect to matrix multiplication) preserves the symmetries of the Laplace operator. Given two distributions x and y, it defines their scalar product, the (squared) norm of a distribution, and the Euclidean distance between distributions.
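The formulas for this slide are not part of the transcript, so the following is only a sketch of one way to realize the construction, under stated assumptions: the Laplace operator is taken to be the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}, and its group inverse G is computed from the eigendecomposition (for a symmetric matrix the group inverse coincides with the Moore-Penrose pseudoinverse). The function names and the four-node path graph are hypothetical.

```python
import numpy as np

def normalized_laplacian(adjacency):
    """L = I - D^{-1/2} A D^{-1/2} for an undirected graph without isolated nodes."""
    A = np.asarray(adjacency, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def group_inverse_symmetric(L, tol=1e-10):
    """Group (Drazin) inverse of a symmetric matrix: invert only the non-zero eigenvalues."""
    w, V = np.linalg.eigh(L)
    nonzero = np.abs(w) > tol
    inv_w = np.zeros_like(w)
    inv_w[nonzero] = 1.0 / w[nonzero]
    return (V * inv_w) @ V.T

def inner(x, y, G):
    """Scalar product of two distributions induced by the kernel G."""
    return float(x @ G @ y)

def sq_norm(x, G):
    return inner(x, x, G)

def sq_distance(x, y, G):
    return sq_norm(x - y, G)

# Hypothetical example: the path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
L = normalized_laplacian(A)
G = group_inverse_symmetric(L)
x = np.array([1.0, 0.0, 0.0, 0.0])  # distribution concentrated on node 0
y = np.array([0.0, 0.0, 0.0, 1.0])  # distribution concentrated on node 3
d2 = sq_distance(x, y, G)
```

The three identities L G L = L, G L G = G and L G = G L are what make G the group inverse; the author's exact normalization of the scalar product may differ from this sketch.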

11 Probabilistic geometry of graphs
First-passage time and commute time. [Figure: first-passage time and commute time]
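A minimal sketch of how these two quantities can be computed for a small graph, using the standard linear system for expected hitting times rather than the geometric formulas of the slide (which are not in the transcript). Treating the first-passage time to a node as the hitting time averaged over stationary-distributed starting points is an assumption here; the function names and the path graph are hypothetical.

```python
import numpy as np

def hitting_times(T, target):
    """Expected number of steps to first reach `target` from each node (0 at the target)."""
    n = T.shape[0]
    others = [k for k in range(n) if k != target]
    Q = T[np.ix_(others, others)]  # walk restricted to the non-target nodes
    h = np.zeros(n)
    h[others] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return h

def commute_time(T, i, j):
    """Expected steps for the round trip i -> j -> i."""
    return hitting_times(T, j)[i] + hitting_times(T, i)[j]

def averaged_first_passage_time(T, pi, target):
    """Hitting time of `target`, averaged over a start drawn from pi (an assumption)."""
    return float(pi @ hitting_times(T, target))

# Hypothetical example: the path graph 0 - 1 - 2 with an unbiased walk.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
T = A / deg[:, None]
pi = deg / deg.sum()
```

On this graph the central node has the smaller averaged first-passage time, so it is the more accessible one, while the two end nodes are the most mutually remote in commute time.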

12 Can we hear first-passage times?
F. Liszt, Consolation No. 1; W.A. Mozart, Eine kleine Nachtmusik; J.S. Bach, Prelude BWV 999; R. Wagner, Das Rheingold (Entrance of the Gods); P. Tchaikovsky, Danse Napolitaine.

13 Can we hear first-passage times?
Recurrence time vs. first-passage time. Tonality of music: the hierarchy of harmonic intervals. The basic pitches for the E minor scale are "E", "F#", "G", "A", "B". The recurrence time vs. the first-passage time over 804 compositions of 29 Western composers.
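To make the recurrence time concrete, here is a sketch under the assumption that a composition is modelled as a first-order Markov chain over pitches, with transition probabilities estimated from consecutive notes. For an irreducible chain the mean recurrence time of a state is the reciprocal of its stationary probability. The toy note sequence and all function names are hypothetical; nothing here reproduces the author's data.

```python
import numpy as np

def transition_matrix_from_sequence(sequence):
    """Estimate a row-stochastic matrix from counts of consecutive pairs."""
    states = sorted(set(sequence))
    index = {s: k for k, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for a, b in zip(sequence, sequence[1:]):
        counts[index[a], index[b]] += 1
    row_sums = counts.sum(axis=1)
    if np.any(row_sums == 0):
        raise ValueError("a state has no outgoing transition")
    return states, counts / row_sums[:, None]

def stationary_distribution(T):
    """Solve pi T = pi together with sum(pi) = 1 in the least-squares sense."""
    n = T.shape[0]
    M = np.vstack([T.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return pi

def recurrence_times(sequence):
    """Mean recurrence time of each state: 1 / pi_state."""
    states, T = transition_matrix_from_sequence(sequence)
    pi = stationary_distribution(T)
    return {s: 1.0 / p for s, p in zip(states, pi)}

# Hypothetical toy melody cycling through three pitches of E minor.
toy_melody = ["E", "G", "B", "E", "G", "B", "E"]
rec = recurrence_times(toy_melody)
```

In this toy melody every pitch returns after exactly three notes, so all recurrence times equal 3.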

14 Can we see the first-passage times?
(Mean) first-passage times in the city graph of Manhattan, compared with the tax assessment value of land ($), Manhattan, 2005. Labelled locations: Federal Hall, SoHo, East Village, Bowery, East Harlem.

15 Why are mosques located close to railways?
NEUBECKUM: Social isolation vs. structural isolation

16 Principal components by random walks
Representations of graphs & databases in the probabilistic geometric space are essentially multidimensional! A 1000 × 1000 data table (or a connected graph of 1000 nodes) is embedded into a 999-dimensional space! The dimensions are unequal! This is similar to kernel principal component analysis (KPCA) with the kernel G.
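The dimension count can be checked on a small example. The sketch below assumes the kernel is the group inverse G of the symmetric normalized Laplacian (the slide's kernel formula is missing from the transcript) and reads coordinates off the eigendecomposition of G, as in kernel PCA without centering. A connected graph on N nodes then yields N - 1 non-trivial dimensions with unequal weights. Function names and the four-node cycle are hypothetical.

```python
import numpy as np

def random_walk_embedding(adjacency, tol=1e-10):
    """Return (coordinates, kernel eigenvalues, kernel G) for a connected undirected graph."""
    A = np.asarray(adjacency, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    # Group inverse of the symmetric Laplacian: invert only non-zero eigenvalues.
    w, V = np.linalg.eigh(L)
    keep = w > tol
    kernel_vals = 1.0 / w[keep]
    vecs = V[:, keep]
    G = (vecs * kernel_vals) @ vecs.T

    # Order dimensions from the largest kernel eigenvalue to the smallest.
    order = np.argsort(kernel_vals)[::-1]
    kernel_vals = kernel_vals[order]
    coords = vecs[:, order] * np.sqrt(kernel_vals)
    return coords, kernel_vals, G

# Hypothetical example: the cycle 0 - 1 - 2 - 3 - 0.
A_cycle = np.array([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]])
coords, kernel_vals, G = random_walk_embedding(A_cycle)
```

Each row of `coords` is one node; the Gram matrix of the rows reproduces G, so Euclidean distances between rows equal the kernel-induced distances.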

17 Nonlinear principal components by random walks
MILCH K = MILK. In contrast to the covariance matrix, which best explains the variance in the data with respect to the mean, the kernel G traces out all higher-order dependencies among data entries.

18 Crisis for Europe as trust hits record low
How can we save Europe? Could random walks help us to approach the problem?

19 … if we play the previous history.
No common trends for the EU … if we play the previous history. Maddison historical GDP data. Kalman filter based on GDP data + average over many evolution scenarios.

20 … if we play the previous history.
No common trends for the EU … if we play the previous history (Scenario #1, Scenario #2). Economic recovery after WWII came at different rates in different parts of Europe.

21 Traditional capital shelters thrive for larger variations
Maddison’s database retells the story of the recovery after WWII. Industrial countries have an edge on competitors as long as GDP variations are limited to ± $500/year at most. Traditional capital shelters thrive for larger variations. Fancy years are shown to elucidate tendencies.

22 Fancy years are shown to elucidate tendencies
Maddison’s database predicts bankruptcy for the countries that remained uninvolved in the global recovery process. Example shown: Iraq.

23 To catch up with new tendencies, we have to add more databases
Evolution of political regimes: democracy/autocracy indices. Inequality: top income shares.

24 Trends in Governance in 1810 and in 2012
Polity IV tells us that … “Political distance” is the minimal number of political changes (reforms) required to convert the political system of one country into that of another. Panels: Trends in Governance in 1810; Trends in Governance in 2012.

25 Polity IV tells us that
There is a positive feedback reinforcing the multiplication of the number of polities. We witness the very beginning of a chain reaction process (of atomization of the polity landscape).

26 The World Top Income database
There are many inequality metrics; I used the inverse Pareto-Lorenz coefficient (IPLC). The Pareto principle (Vilfredo Pareto): income follows a power-law probability distribution (axes: wealth vs. number of people). Rising inequality marks wars … and wars multiply states. Too poor to be rich, too rich to be poor. Parabolic fit (sic!).
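For readers unfamiliar with the IPLC, here is a small worked sketch. It assumes an exact Pareto tail with exponent a > 1, for which the income share of the top fraction p is p^{1/β} with β = a/(a - 1) the inverse Pareto-Lorenz coefficient. Two top shares then determine β. The function names and the choice of the top 1% and top 0.1% shares are assumptions made for illustration, not necessarily the procedure used in the talk.

```python
import math

def inverse_pareto_lorenz(share_large, share_small, p_large=0.01, p_small=0.001):
    """Estimate beta from two top income shares, assuming an exact Pareto tail.

    share_large is the income share of the top p_large fraction (e.g. top 1%),
    share_small the share of the top p_small fraction (e.g. top 0.1%).
    Under S(p) = p**(1/beta): beta = log(p_large/p_small) / log(share_large/share_small).
    """
    if not (0 < share_small < share_large < 1):
        raise ValueError("expected 0 < share_small < share_large < 1")
    return math.log(p_large / p_small) / math.log(share_large / share_small)

def pareto_exponent(beta):
    """Pareto exponent a corresponding to beta = a / (a - 1)."""
    return beta / (beta - 1.0)

# Hypothetical shares generated from a Pareto tail with a = 2 (so beta = 2).
top_1_percent = 0.01 ** 0.5
top_01_percent = 0.001 ** 0.5
beta = inverse_pareto_lorenz(top_1_percent, top_01_percent)
```

A larger β means a fatter upper tail, that is, a stronger concentration of income at the top.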

27 The World Top Income database
Rapidly rising inequality marks wars & conflicts … and wars multiply polities. Too poor to be rich, too rich to be poor. Parabolic fit (!). If the GDP gain substantially exceeds or lags below the mean (red line), it probably comes at the cost of increasing inequality.

28 7,563 governance configurations
Components: regulation of chief executive recruitment; openness of executive recruitment; competitiveness of executive recruitment; executive constraints; regulation of participation; competitiveness of participation. Category values shown on the slide: Unregulated, Closed, Unlimited Authority, Intermediate, Multiple identity, Repressed, Transitional, Dual executive designation, Selection, Slight to moderate limitations, Sectarian, Suppressed, Restricted, Factional, Dual executive election, Substantial limitations, Regulated, Open, Dual hereditary/competitive, Competitive, Executive Parity. Additional states: Interruption (foreign occupation), Interregnum (anarchy), Transitional.

29 232 configurations have been observed since 1800
Examples shown: "Tajikistan", 2013; foreign interruption; "Nepal", 1945; "Korea North", 2013; "Cuba", 2005; "Libya", 2010; "Thailand", 2013; "Korea South", 2013; "United States", 2013; "Czech Republic", 2013; "Estonia", 2013. New configurations arise from time to time …

30 Random walks on the graph of political regimes
Transition matrix between types of governance. Each political regime has its own dynamics for GDP and IPLC. The process starts from the actual data (GDPPC & IPLC) for 2013 and is averaged over all collected histories. Most transitions happen within the groups of authoritarian states and presidential republics, while liberal democracies and dictatorships are quite “sticky”.
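A minimal sketch of how such a transition matrix could be estimated and then "played": transitions between consecutive yearly regime labels are counted across all country histories, normalized per row, and used to generate random histories. The regime labels, the two toy histories, and the function names are hypothetical; the regime-specific GDP and IPLC dynamics mentioned on the slide are not modelled here.

```python
import random
from collections import Counter, defaultdict

def estimate_transitions(histories):
    """Row-normalized transition frequencies between consecutive regime labels."""
    counts = defaultdict(Counter)
    for history in histories:
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }

def simulate(transitions, start, steps, rng):
    """One random history of at most `steps` transitions, starting from `start`."""
    path = [start]
    for _ in range(steps):
        row = transitions.get(path[-1])
        if not row:  # regime never observed with a successor
            break
        states = list(row)
        weights = [row[s] for s in states]
        path.append(rng.choices(states, weights=weights, k=1)[0])
    return path

# Hypothetical toy histories (one label per year).
histories = [
    ["autocracy", "autocracy", "democracy"],
    ["democracy", "democracy", "democracy"],
]
transitions = estimate_transitions(histories)
sample_path = simulate(transitions, "democracy", 5, random.Random(0))
```

A "sticky" regime in the sense of the slide is one whose self-transition probability is close to 1, as for "democracy" in this toy data. Averaging a quantity over many simulated paths corresponds to the averaging over collected histories.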

31 Random walks on the graph of political regimes
A state insists on a common economic and political destiny of its citizens. However, the actual trends of different economic groups might be statistically inconsistent.

32 There can be a common European trend if …
Germany vs. Greece. … if the workforce is able to migrate freely, and polities are able to split without wars. Back to the city-states? A possible splitting of a country is visible as statistically inconsistent trends.

33 Polities proliferation score
Main factors resulting in multiplying scores: 1. Inequality (stretches the bandwidth of the boxes); 2. Authoritarian regimes are short-lived, quickly transforming into other modes of authoritarianism and provoking instability. Example: Greece vs. Russia (expected number of countries).

34 Strong inequality worsens prospects, authoritarian governance worsens prospects
USA vs. China. IPLC ~ O(GDPPC²)

35 Battle in Asia, concord in Europe
Germany (dark) vs. Austria (light); China (red) vs. Indonesia (blue).

36 Conclusions
RWs formalize the process of data interpretation, playing the role of stochastic automorphisms. RWs can be used to combine different (incomplete) databases. Kernel principal component analysis handles high-order dependencies in data. The method of summing up all RWs leads to probabilistic geometry.

37 Some references
D.V., Ph. Blanchard, “Introduction to Random Walks on Graphs and Databases”, © Springer Series in Synergetics, Vol. 10, Berlin / Heidelberg, ISBN (2011).
D.V., Ph. Blanchard, “Mathematical Analysis of Urban Spatial Networks”, © Springer Series Understanding Complex Systems, Berlin / Heidelberg, ISBN , 181 pages (2009).
Volchenkov, D., “Markov Chain Scaffolding of Real World Data”, Discontinuity, Nonlinearity, and Complexity 2(3) 289–299 (2013), DOI: /DNC
Volchenkov, D., Jean-René Dawin, “Musical Markov Chains”, International Journal of Modern Physics: Conference Series, 16 (1), (2012), DOI: /S
Volchenkov, D., Ph. Blanchard, J.-R. Dawin, “Markov Chains or the Game of Structure and Chance. From Complex Networks, to Language Evolution, to Musical Compositions”, The European Physical Journal - Special Topics 184, 1-82, © Springer Berlin / Heidelberg (2010).
Volchenkov, D., “Random Walks and Flights over Connected Graphs and Complex Networks”, Communications in Nonlinear Science and Numerical Simulation, 16 (2011) 21–55 (2010).

