Published by Jacob Hodges. Modified over 9 years ago.
2
Data and networks GIACS Conference Palermo 9-4-08
3
GIACS PALERMO 9-4-08 Networks
4
Networks as an instrument of data filtering
Correlation-based minimal spanning tree of 1071 stocks traded at the NYSE between 1987 and 1998. Different colours refer to different SIC sectors.
Correlation-based minimal spanning tree of an artificial market of 1071 stocks generated according to the one-factor model. Different colours refer to different SIC sectors.
Topology of correlation based minimal spanning trees in real and model markets, G. Bonanno, G. Caldarelli, F. Lillo, R. Mantegna, Physical Review E 68 046130 (2003).
Networks of equities in financial markets, G. Bonanno, GC, F. Lillo, S. Miccichè, N. Vandewalle, R. N. Mantegna, European Physical Journal B 38 363-372 (2004).
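A minimal sketch of how a correlation-based minimal spanning tree can be built. The slide does not state the metric, so the distance d_ij = sqrt(2 (1 − ρ_ij)) used here is an assumption (it is a common choice for this kind of filtering); the function name and the toy correlation matrix are hypothetical.

```python
import math

def correlation_mst(corr):
    """Return the MST edges (i, j, distance) with Prim's algorithm.

    Assumption: correlations are mapped to distances with
    d_ij = sqrt(2 * (1 - rho_ij)); the slides do not specify the metric.
    """
    n = len(corr)
    dist = [[math.sqrt(max(0.0, 2.0 * (1.0 - corr[i][j]))) for j in range(n)]
            for i in range(n)]
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known link from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # pick the closest vertex that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

# Hypothetical 4-stock correlation matrix (not data from the talk).
toy_corr = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
mst_edges = correlation_mst(toy_corr)
```

The tree keeps only the n − 1 strongest links needed to connect all stocks, which is what makes it usable as a filter on the full correlation matrix.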
5
The Cosin project
COSIN (official number IST-2001-33555) was a research project financed by the European Commission through the Fifth Framework Programme. COSIN was part of the actions taken by Future and Emerging Technologies (FET) in the Information Society Technologies (IST) priority research area (http://www.cordis.lu/IST/FET).
Documents at http://www.cosinproject.org
6
The Cosin project
COSIN involves 7 different nodes in 5 countries:
A. (Ph + CS) Roma, Italy
B. (Ph) Barcelona, Spain
C. (Ph) Lausanne, Switzerland
D. (Ph) ENS, Paris, France
E. (CS) Karlsruhe, Germany
F. (Ph) Upsud, Paris, France
Map legend: EU countries 2001; non-EU countries 2001; EU COSIN participant; non-EU COSIN participant.
7
Some of the Cosin people
G. Bonanno, G. Caldarelli, F. Colaiori, G. Di Battista, D. Donato, S. Leonardi, R. Mantegna, A. Marchetti-Spaccamela, M. Patrignani, L. Pietronero, V. Servedio
A. Arenas, M. Boguña, A. Díaz-Guilera, R. Ferrer i Cancho, M.A. Muñoz, M.A. Serrano, R. Pastor-Satorras
G. Bianconi, A. Capocci, P. De Los Rios, T. Erlebach, T. Petermann, Y.-C. Zhang
A. Barrat, S. Battiston, P. Nadal, A. Vespignani, G. Weisbuch, U. Brandes, M. Gaertler, M. Kaufmann, D. Wagner
8
The Cosin project
1. To develop a unified set of Complex Systems theoretical methodologies for the characterization of Complex Networks.
2. To develop statistical models for network growth and evolution.
3. To collect data, mainly for the Internet and the World Wide Web.
4. To extend the analysis to social and economic networks.
5. To develop visualization tools for large-scale systems.
6. To disseminate results through publications, conferences and the project web site.
9
A Cosin summary
1. After three years of activity we have a common ground of methodologies and tools, at least between computer scientists and physicists (and also some economists). Some more effort would be necessary to integrate social scientists.
2. We provided a class of models for network growth and evolution; moreover, we addressed the study of the statistical properties of weighted networks.
3. Data collection for the Internet and the World Wide Web proved much more difficult than expected. In the meantime, larger consortia have been funded specifically for this task. Thanks to external collaborations we still found the data to validate the models we produced.
10
4. In economic and financial networks, COSIN people are on the front line of this very new field of research. This new approach attracted the interest of the community at the level of Nobel laureates. The impact in social science has been less successful. The impact on biology (botany, zoology) has been unexpected and very successful.
5. The standard visualization problem is to keep the whole graph structure and present it suitably. Some progress has been made on this point; it is worth mentioning that several ideas are now under consideration for the visualization of "simplified graphs".
6. The project had a considerable impact on the scientific community in terms of citations, visibility, conferences, schools, books and data downloads from the site. Maybe some more work could be done for the general public.
11
The graph of scientific collaborations on scale-free networks in statistical physics. M.E.J. Newman, PRE 69 026113 (2004)
12
Dissemination
More than 150 refereed papers (some of them in Nature, PNAS, PRL, LNCS).
Lectures and talks at various international conferences (for physics: STATPHYS, APS Meetings) and invited talks at various institutions.
Books.
13
The Sitges Conference published the proceedings of the most interesting talks in a special volume: Statistical Mechanics of Complex Networks, Series: Lecture Notes in Physics, Vol. 625, Pastor-Satorras, Romualdo; Rubi, Miguel; Diaz-Guilera, Albert (Eds.), 2003, XII, 206 p., Hardcover, ISBN: 3-540-40372-8.
The Rome Conference published its proceedings in a special issue of the European Physical Journal B.
14
Web site
15
What about data?
Trivially, access to data was crucial for the project.
In some cases we found very good datasets and could work on them:
1. Internet (AS topology)
2. Wikipedia
With poor or no data, we obtained (of course) only partial results:
1. Liquidity shocks
2. River networks
16
STATISTICAL PROPERTIES OF THE WIKIGRAPH
L.S. Buriol, A. Capocci, F. Colaiori, D. Donato, S. Leonardi, F. Rao, V. Servedio, GC
Centro “E. Fermi”
1. Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia, A. Capocci, F. Rao, GC, Europhysics Letters 81 28008 (2008) (arXiv:0710.3058).
2. Preferential attachment in the growth of social networks: the Internet encyclopedia Wikipedia, A. Capocci, V.D.P. Servedio, F. Colaiori, L.S. Buriol, D. Donato, S. Leonardi, GC, Physical Review E 74 036116 (2006).
17
Wikipedia intro
18
Wikipedia intro
Wikipedia in other languages
You may read and edit articles in many different languages:
Wikipedia encyclopedia languages with over 100,000 articles: Deutsch (German) · Français (French) · Italiano (Italian) · (Japanese) · Nederlands (Dutch) · Polski (Polish) · Português (Portuguese) · Svenska (Swedish)
Wikipedia encyclopedia languages with over 10,000 articles: العربية (Arabic) · Български (Bulgarian) · Català (Catalan) · Česky (Czech) · Dansk (Danish) · Eesti (Estonian) · Español (Spanish) · Esperanto · Galego (Galician) · עברית (Hebrew) · Hrvatski (Croatian) · Ido · Bahasa Indonesia (Indonesian) · 한국어 (Korean) · Lietuvių (Lithuanian) · Magyar (Hungarian) · Bahasa Melayu (Malay) · Norsk bokmål (Norwegian) · Norsk nynorsk (Norwegian) · Română (Romanian) · Русский (Russian) · Slovenčina (Slovak) · Slovenščina (Slovenian) · Српски (Serbian) · Suomi (Finnish) · Türkçe (Turkish) · Українська (Ukrainian) · 中文 (Chinese)
Wikipedia encyclopedia languages with over 1,000 articles: Alemannisch (Alemannic) · Afrikaans · Aragonés (Aragonese) · Asturianu (Asturian) · Azərbaycan (Azerbaijani) · Bân-lâm-gú (Min Nan) · Беларуская (Belarusian) · Bosanski (Bosnian) · Brezhoneg (Breton) · Чăваш чěлхи (Chuvash) · Corsu (Corsican) · Cymraeg (Welsh) · Ελληνικά (Greek) · Euskara (Basque) · فارسی (Persian) · Føroyskt (Faroese) · Frysk (Western Frisian) · Gaeilge (Irish) · Gàidhlig (Scots Gaelic) · हिन्दी (Hindi) · Interlingua · Íslenska (Icelandic) · Basa Jawa (Javanese) · ქართული (Georgian) · ಕನ್ನಡ (Kannada) · Kurdî / كوردی (Kurdish) · Latina (Latin) · Latviešu (Latvian) · Lëtzebuergesch (Luxembourgish) · Limburgs (Limburgish) · Македонски (Macedonian) · मराठी (Marathi) · Napulitana (Neapolitan) · Occitan · Ирон (Ossetic) · Plattdüütsch (Low Saxon) · Scots · Sicilianu (Sicilian) · Simple English · Shqip (Albanian) · Sinugboanon (Cebuano) · Srpskohrvatski/Српскохрватски (Serbo–Croatian) · தமிழ் (Tamil) · Tagalog · ภาษาไทย (Thai) · Tatarça (Tatar) · తెలుగు (Telugu) · Tiếng Việt (Vietnamese) · Walon (Walloon)
19
Wikipedia intro
The datasets of each language are available as two self-extracting files for a MySQL database. The table cur contains the current on-line articles, whereas the table old contains all previous versions of each current article. Old versions of an article are identified by having the same title, not the same id. The dataset dumps are updated almost weekly, so the current graph is usually not more than a week old.
To generate a graph from the link structure of a dataset, each article is considered a node and each hyperlink between articles is a link in the graph. In the Wikipedia datasets, each web page is a single article. An article might also contain external links that point to pages outside the dataset. Usually Wikipedia articles have no external links, or just a few of them. Links of this kind are not considered when generating the wikigraphs, since we want to restrict the graph to pages within the set being analyzed.
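The graph construction described on this slide can be sketched as follows. This is only an illustration: the input format (a dictionary from article title to the list of link targets) and the function name are assumptions, not the format of the actual database dumps.

```python
def build_wikigraph(articles):
    """Build a directed graph from article links.

    articles: dict mapping an article title to the list of its link targets
    (hypothetical input format). Targets that are not articles of the same
    dataset are treated as external links and dropped.
    """
    nodes = set(articles)
    edges = set()
    for source, targets in articles.items():
        for target in targets:
            if target in nodes:          # keep only links inside the dataset
                edges.add((source, target))
    return nodes, edges

# Toy dataset with one external link (hypothetical content).
toy_articles = {
    "Graph": ["Network", "http://example.org/external"],
    "Network": ["Graph", "Tree"],
    "Tree": [],
}
wiki_nodes, wiki_edges = build_wikigraph(toy_articles)
```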
20
Wikipedia interests
Sociological reasons: the encyclopedia collects pages written by a number of independent and heterogeneous individuals. Each of them autonomously decides on the content of the articles, with the only constraint being a predefined layout. This autonomy is a common feature of content creation on the Web. The Wikipedia authors' community is formed by members whose only wish is to make available to the world concepts and topics that they consider meaningful. In some sense, tracing the evolution of the Wikipedia subsets should mirror the development of significant trends within each linguistic community.
Generation in time: Wikipedia provides time information associated with nodes. Moreover, it provides historical information: the times of creation and modification of each page in the dataset.
Independence from external links: Wikipedia articles link mainly to articles in the same dataset.
Variety of graph sizes: one graph can be collected per language, and graph sizes vary from a few hundred pages up to half a million pages.
21
Results
Summarizing:
We have the whole history of growth available, so that we can study the evolution.
We have an example of a “social” network of huge size.
We can compare the systems produced by users of different languages, thereby measuring the effect of different cultures.
We can study Wikipedia as a case study for the World Wide Web.
WE RECOVER A PREFERENTIAL ATTACHMENT MECHANISM FROM THE DATA.
DIFFERENT LANGUAGES PRODUCE SIMILAR STRUCTURES.
WE FIND A SYSTEM SIMILAR TO THE WWW EVEN IF THE MICROSCOPIC RULE OF GROWTH IS VERY DIFFERENT.
22
The Wiki graphs
We generated six wikigraphs, wikiEN, wikiDE, wikiFR, wikiES, wikiIT and wikiPT, from the English, German, French, Spanish, Italian and Portuguese datasets, respectively. The graphs were obtained from an old dump of June 13, 2004. We are not using the current data due to disk space restrictions: the English dataset of June 2005 is more than 36 GB compressed, that is, about 200 GB expanded. The most visited page was the main page for wikiEN, wikiDE, wikiFR and wikiES, while for the wikiIT and wikiPT datasets there were no visits associated with the pages.
23
The Bow-tie structure, found in the WWW (Broder et al. Comp. Net. 33, 309, 2000)
SCC (Strongly Connected Component) includes pages that are mutually reachable by travelling on the graph.
IN component is the region from which one can reach the SCC.
OUT component encompasses the pages reached from the SCC.
TENDRILS are pages reachable from the IN component that do not point to the SCC or the OUT region. TENDRILS also include those pages that point to the OUT region and do not belong to any of the other defined regions.
TUBES connect the IN and OUT regions directly.
DISCONNECTED regions are those isolated from the rest.
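The component definitions above can be turned into a small decomposition routine. The sketch below is a simplified illustration, not the procedure used for the wikigraphs: it finds the largest strongly connected component by intersecting forward and backward reachability (simple, but slow on large graphs), and all function and variable names are hypothetical.

```python
from collections import defaultdict

def _reach(start, adj):
    """Return all nodes reachable from the nodes in `start` following `adj`."""
    seen = set(start)
    stack = list(start)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def bow_tie(nodes, edges):
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    # Largest SCC: the SCC of u is (reachable from u) & (can reach u).
    remaining, scc = set(nodes), set()
    while remaining:
        u = next(iter(remaining))
        comp = _reach([u], succ) & _reach([u], pred)
        if len(comp) > len(scc):
            scc = comp
        remaining -= comp
    in_part = _reach(scc, pred) - scc      # can reach the SCC
    out_part = _reach(scc, succ) - scc     # reached from the SCC
    core = scc | in_part | out_part
    from_in = _reach(in_part, succ) - core   # hang off IN
    to_out = _reach(out_part, pred) - core   # point into OUT
    tubes = from_in & to_out
    tendrils = (from_in | to_out) - tubes
    disconnected = set(nodes) - core - from_in - to_out
    return {"SCC": scc, "IN": in_part, "OUT": out_part, "TENDRILS": tendrils,
            "TUBES": tubes, "DISCONNECTED": disconnected}

# Toy graph (hypothetical): a<->b is the SCC, i is IN, o is OUT.
toy_nodes = ["a", "b", "i", "o", "t", "u", "x", "d1", "d2"]
toy_edges = [("a", "b"), ("b", "a"), ("i", "a"), ("b", "o"), ("i", "t"),
             ("u", "o"), ("i", "x"), ("x", "o"), ("d1", "d2")]
parts = bow_tie(toy_nodes, toy_edges)
```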
24
The Wikigraphs
The percentage of the various components of the Wikigraph for the various languages.
The measure/size of the Wikigraph for the various languages.
25
Power laws (what else?)
In-degree (empty) and out-degree (filled) occurrence distributions for the Wikigraph in English (○) and Portuguese ( ).
The degree distribution shows fat tails that can be approximated by a power-law function of the kind P(k) ~ k^(-γ), where the exponent is the same for both in-degree and out-degree. In the case of the WWW, 2 ≤ γ_in ≤ 2.1.
26
Correlations
The average neighbours' in-degree, computed along incoming edges, as a function of the in-degree for the English (○) and Portuguese ( ) Wikigraphs.
As regards assortativity (as measured by the average degree of the neighbours of a vertex with degree k), there is no evidence of any assortative behaviour.
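A short sketch of how the average neighbours' in-degree along incoming edges could be computed from an edge list. The function name and the toy edge list are hypothetical; a flat curve as a function of k would correspond to the absence of assortative behaviour reported on the slide.

```python
from collections import defaultdict

def average_neighbor_in_degree(edges):
    """Map each in-degree k to the mean in-degree of the in-neighbours
    of vertices that have in-degree k (edges are (source, target) pairs)."""
    in_deg = defaultdict(int)
    preds = defaultdict(list)
    for source, target in edges:
        in_deg[target] += 1
        preds[target].append(source)
    per_k = defaultdict(list)
    for vertex, sources in preds.items():
        k = len(sources)
        # average in-degree of the vertices that point to `vertex`
        per_k[k].append(sum(in_deg.get(s, 0) for s in sources) / k)
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_k.items())}

# Toy example (hypothetical): a and b point to c, c points to d.
knn_in = average_neighbor_in_degree([("a", "c"), ("b", "c"), ("c", "d")])
```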
27
PageRank
The PageRank distribution for wikiEN is a power-law function with γ = 2.1. Previous measurements in webgraphs exhibit the same behaviour for the PageRank distribution. We list the number of visits of the top-ranked pages just to show that this value is not related to the PageRank values. We confirm that very little correlation was found between the link-analysis characteristics and the actual number of visits.
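For reference, a minimal power-iteration sketch of PageRank. The slides do not say which variant or parameters were used, so the damping factor of 0.85 and the uniform redistribution of the rank of pages without outgoing links are assumptions, and the function name is hypothetical.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph.

    Assumptions: damping = 0.85 and dangling nodes spread their rank
    uniformly over all nodes; neither is stated in the slides.
    """
    nodes = list(nodes)
    out_links = {u: [] for u in nodes}
    for u, v in edges:
        out_links[u].append(v)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        rank = new_rank
    return rank

# Toy star graph (hypothetical): three pages link to a hub.
toy_rank = pagerank(["a", "b", "c", "hub"],
                    [("a", "hub"), ("b", "hub"), ("c", "hub")])
```

Comparing such a ranking with visit counts, as done on the slide, only requires sorting pages by rank and looking at the visits of the top entries.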
28
Preferential attachment
Given the history of growth, one can verify the hypothesis of preferential attachment. This is done by means of the histogram Π(k), which gives the number of vertices (whose degree is k) acquiring new connections at time t. This quantity is weighted by the factor N(t)/n(k,t).
We find preferential attachment for both in-degree and out-degree.
English (○) and Portuguese ( ). White = in-degree, filled = out-degree.
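One way to read the weighted histogram is sketched below for the in-degree. The slide does not define the symbols, so the code assumes that N(t) is the number of vertices present at time t and n(k,t) the number of vertices with in-degree k at that time; the function name and the time-ordered edge list are hypothetical.

```python
from collections import defaultdict

def attachment_histogram(timed_edges):
    """Accumulate, for each in-degree k, the weight N(t)/n(k,t) every time
    a vertex of in-degree k receives a new incoming edge.

    timed_edges: (source, target) pairs in order of creation.
    Assumption: N(t) = vertices seen so far, n(k,t) = vertices with
    in-degree k at the moment the edge is added.
    """
    degree = {}
    count_by_degree = defaultdict(int)
    hist = defaultdict(float)

    def ensure(vertex):
        if vertex not in degree:
            degree[vertex] = 0
            count_by_degree[0] += 1

    for source, target in timed_edges:
        ensure(source)
        ensure(target)
        k = degree[target]
        hist[k] += len(degree) / count_by_degree[k]
        # move the target from degree class k to class k + 1
        count_by_degree[k] -= 1
        degree[target] = k + 1
        count_by_degree[k + 1] += 1
    return dict(hist)

toy_hist = attachment_histogram([("a", "b"), ("c", "b")])
```

With linear preferential attachment the accumulated weight would grow proportionally to k; a constant value would indicate uniform attachment.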
29
Preferential attachment
In our opinion this preferential attachment is an effective description rather than the real driving force of the phenomenon. In other words, the linear preferential attachment can originate from a copying procedure (new vertices are introduced by copying old ones and keeping most of the edges). We could also have a sort of fitness for the various entries (but in this case one has a multidimensional series of quantities describing the importance of one page). Apart from the interpretation, the data show a rather clear LINEAR PREFERENTIAL ATTACHMENT.
30
Updates’ statistics
Other power laws related to the dynamics need to be explained. For example, the number of updates also follows a power law. Each point represents the number of nodes (y axis) that were updated exactly x times.
31
Wikipedia growth model
We introduced an evolution rule, similar to other rewiring models already considered*. At each time step, a vertex is added to the network. It is connected to the existing vertices by M oriented edges; the direction of each edge is drawn at random:
with probability R_1, the edge leaves the new vertex and points to an existing one chosen with probability proportional to its in-degree;
with probability R_2, the edge points to the new vertex, and the source vertex is chosen with probability proportional to its out-degree;
finally, with probability R_3 = 1 − R_1 − R_2, the edge is added between existing vertices: the source vertex is chosen with probability proportional to its out-degree, while the destination vertex is chosen with probability proportional to its in-degree.
* See for example Krapivsky, Rodgers and Redner, PRL 86 5401 (2001)
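The evolution rule can be simulated in a few lines. This is a sketch under stated assumptions rather than the authors' code: the two-vertex seed graph, the function name and the parameter names are hypothetical, and self-loops or repeated edges are not filtered out. Picking the head (or tail) of a uniformly chosen existing edge selects a vertex with probability proportional to its in-degree (or out-degree).

```python
import random

def grow_network(steps, m, r1, r2, seed=0):
    """Simulate the growth rule: one new vertex and m oriented edges per step.

    Assumption: the process starts from two vertices linked in both
    directions, so that every degree-proportional choice is well defined.
    """
    rng = random.Random(seed)
    edges = [(0, 1), (1, 0)]
    n_vertices = 2
    for _ in range(steps):
        new = n_vertices
        n_vertices += 1
        existing = len(edges)          # only edges present before this step
        for _ in range(m):
            x = rng.random()
            if x < r1:
                # new vertex -> existing vertex chosen ~ in-degree
                target = edges[rng.randrange(existing)][1]
                edges.append((new, target))
            elif x < r1 + r2:
                # existing vertex chosen ~ out-degree -> new vertex
                source = edges[rng.randrange(existing)][0]
                edges.append((source, new))
            else:
                # edge between existing vertices (probability R_3)
                source = edges[rng.randrange(existing)][0]
                target = edges[rng.randrange(existing)][1]
                edges.append((source, target))
    return n_vertices, edges

n_sim, sim_edges = grow_network(steps=50, m=3, r1=0.026, r2=0.091)
```

Because attachment is purely proportional to degree, a vertex that never receives an edge of the relevant kind keeps degree zero forever in this sketch.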
32
Wikipedia growth model
Actually:
1) This network is oriented.
2) The preferential attachment in Wikipedia has a somewhat different nature. Here, most of the time, the edges are added between existing vertices, differently from the BA model. For instance, in the English version of Wikipedia a largely dominant fraction, 0.883, of new edges is created between two existing pages, while a smaller fraction of edges points to or leaves a newly added vertex (0.026 and 0.091 respectively).
From these data it seems that a model in the spirit of BA could reproduce most of the features of the system.
33
Wikipedia growth model
The model can be solved analytically:
P(k_in) ~ k_in^(-γ_in), with γ_in = 1 + 1/(1 − R_2)
P(k_out) ~ k_out^(-γ_out), with γ_out = 1 + 1/(1 − R_1)
For the model we can use the empirical values R_1 = 0.026, R_2 = 0.091, R_3 = 0.883 already measured for the English version of the Wikigraph.
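As a worked check of the exponents, the sketch below assumes the mean-field forms γ_in = 1 + 1/(1 − R_2) and γ_out = 1 + 1/(1 − R_1). These follow from the growth rule if the in-degree of a vertex grows as dk_in/dt = (1 − R_2) k_in / t, since a fraction R_1 + R_3 = 1 − R_2 of the M new edges per step chooses its destination proportionally to in-degree out of M t edges in total; the out-degree case is symmetric. The function name is hypothetical.

```python
def degree_exponents(r1, r2):
    """Mean-field exponents of the in- and out-degree distributions.

    Assumption: gamma_in = 1 + 1/(1 - R_2), gamma_out = 1 + 1/(1 - R_1).
    """
    gamma_in = 1.0 + 1.0 / (1.0 - r2)
    gamma_out = 1.0 + 1.0 / (1.0 - r1)
    return gamma_in, gamma_out

# Empirical values quoted for the English Wikigraph.
gamma_in, gamma_out = degree_exponents(r1=0.026, r2=0.091)
```

With these values the exponents come out at about 2.10 for the in-degree and about 2.03 for the out-degree.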
34
Wikipedia growth model
The model can be solved analytically:
K_nn^in(k_in) ~ M N^(1−R_1) R_1 R_2 / R_3   (R_3 ≠ 0)
K_nn^in(k_in) ~ M R_1 R_2 ln(N)   (R_3 = 0)
In both cases the result is constant, i.e. independent of k_in. The value of the constant also depends on the initial conditions. The two lines refer to two realizations of the model; in one of them the first 0.5% of the vertices has been removed.
35
Wikipedia growth model
We have a structure that resembles the bow-tie of the WWW.
We have a power-law decay for the degree distributions and also a power-law decay for the number of updates of a page.
Preferential attachment in the rewiring seems to be the driving force in the evolution of the system.
The microscopic structure of rewiring is very different from that of the WWW: in principle a user can change any series of edges and add as many pages as wanted. Still, most of the quantities are similar.
36
Wikipedia growth model
The finding that the PageRank of the pages is not related to the number of visits opens a very interesting scenario for further research. Since, by definition, PageRank should give us the visit time of a page, and since it is actually completely independent of the number of visits, we wonder whether PageRank is a good measure of the authoritativeness of pages in wikigraphs and which modifications should be introduced in order to tune its performance.
37
River Networks
38
River Networks
39
River Networks
40
River Networks
41
River Networks
From satellite images one gets Digital Elevation Models (DEM):
156.4  132.4  111.4
170.8  161.3  108.2
182.4  154.5  106.0
From the DEM a spanning tree is computed (via steepest descent).
From the spanning tree, the number of points uphill is computed:
2  3  4
1  1  6
1  2  9
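The two steps on this slide can be reproduced on the 3 × 3 example. The sketch assumes that steepest descent is taken among the four nearest neighbours (up, down, left, right) and that each cell counts itself among its uphill points; under these assumptions the elevation values shown on the slide give back the counts shown there. The function name is hypothetical.

```python
def drained_area(elevation):
    """Number of cells draining through each cell (the cell itself included).

    Assumption: each cell drains to its lowest 4-neighbour, if that
    neighbour is lower than the cell itself.
    """
    rows, cols = len(elevation), len(elevation[0])
    downhill = {}
    for r in range(rows):
        for c in range(cols):
            best, best_h = None, elevation[r][c]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and elevation[nr][nc] < best_h:
                    best, best_h = (nr, nc), elevation[nr][nc]
            downhill[(r, c)] = best      # None marks an outlet
    area = [[1] * cols for _ in range(rows)]
    # Visit cells from the highest to the lowest so that every cell has
    # collected all of its uphill points before passing them downstream.
    for r, c in sorted(downhill, key=lambda p: -elevation[p[0]][p[1]]):
        nxt = downhill[(r, c)]
        if nxt is not None:
            area[nxt[0]][nxt[1]] += area[r][c]
    return area

dem = [
    [156.4, 132.4, 111.4],
    [170.8, 161.3, 108.2],
    [182.4, 154.5, 106.0],
]
areas = drained_area(dem)
```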
42
River Networks
HACK’S LAW: L_∥ ~ A^h
43
River Networks
44
River Networks
Data on Mars topography were collected by the Mars Orbiter Laser Altimeter (MOLA).
45
River Networks
46
River Networks
47
River Networks
The results show that we can distinguish regions whose DEM networks have properties similar to those of river networks on Earth. For rivers on Earth, P(A) ~ A^(-1.43).
48
THE LIQUIDITY MARKET
Monetary Policy
Banks get liquidity from the ECB through auctions.
Monetary policy is carried out by the ECB to control interest rates.
BANKS MANAGE THEIR LIQUIDITY IN THE INTERBANK MARKET.
Diagram labels: Reserves, ECB.
49
The Market
Money Market
The EUROPEAN CENTRAL BANK provides LIQUIDITY to European banks through weekly auctions.
EVERY BANK must DEPOSIT with its NATIONAL CENTRAL BANK 2% of all deposits and debts issued in the last two years. These reserves are supposed to help in the case of liquidity shocks.
The 2% value fluctuates in time and is recomputed every month.
Banks sell and buy liquidity to adjust their liquidity needs and at the same time tend to reduce the value of the reserve.
50
The Market
Market Data
The interbank markets are basically managed by each European country. These markets are in almost all cases phone-based, meaning that each bank has brokers who carry out its transactions by phone. The only exception is the Italian market, which is fully screen-based: each bank's operator can see real-time quotes from all other banks and carry out transactions.
The recent paper by Boss et al. investigates the network of overall credit relationships in the Austrian interbank market. In their study the authors analyze all the liabilities among 900 banks for ten quarterly single-month periods between 2000 and 2003. They find a power-law distribution of contract sizes, and a power-law decay of the distribution of incoming and outgoing links (a link between two banks exists if the banks have an overall exposure to each other). Furthermore, they show that the most vulnerable vertices are those with the highest centrality (measured by the number of paths that go through them).
A different issue has been explored by Cocco et al., who investigated the nature of lending relationships in the fragmented Portuguese interbank market over the period 1997-2001. In fragmented markets the amount and the interest rate of each loan are agreed on a one-to-one basis between borrowing and lending institutions. Other banks do not have access to the same terms, and no public information regarding the loan is available. The authors showed that frequent and repeated interactions between the same banks appear with a probability higher than that expected for random matching. In addition, they found that during illiquid periods, and in particular during the Russian financial crisis, preferential lending relationships increased.
51
The Market
Market Data
Italian Interbank Money Market
Banks operating on the Italian market: this market has been fully electronic for interbank deposits since 1990 (e-Mid).
*) Daily volume: 18 billion euros
*) 200 participants
We report here the analysis of 196 Italian banks (plus 18 banks from abroad that interact with them), which made 85202 transactions in 2000.
52
INTRODUCTION
Time activity
Two time scales: the day and the one-month maintenance period.
53
Statistical Properties
Market Data
The network shows a rather peculiar architecture.
The banks form a disassortative network where large banks interact mostly with small ones.
54
Statistical Properties
Market Data
Actually the banks form different groups, roughly related to their “size” as measured by the average volume of money exchanged.
55
Statistical Properties
Degree Distributions
Using the latter quantity we can divide banks into four groups (the same number of classes as in the Bank of Italy classification):
Group 1: volume in the range 0-23 million euros per day
Group 2: volume in the range 23-70 million euros per day
Group 3: volume in the range 70-165 million euros per day
Group 4: volume over 165 million euros per day
In this way we find an overlap of more than 90% between the two classifications.
56
Communities
Separation of business
Two main communities emerge: many small banks and a few large banks.
Second eigenvector of the normal matrix
57
Modelling
Model of bank network
We assign to each of the N nodes (N is the size of the system) a value drawn from the previous distribution. The origin and destination vertices of an edge are chosen with a probability p_ij proportional to the sum of the respective sizes v_i and v_j. In formulas: p_ij ∝ v_i + v_j.
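A minimal sketch of the edge-drawing rule, in which each pair of banks is picked with probability proportional to the sum of their sizes. The sizes below are hypothetical placeholders (in the model they would be drawn from the empirical size distribution), the function name is an assumption, and pairs are sampled with replacement, so repeated edges between the same two banks are allowed.

```python
import random
from itertools import combinations

def sample_bank_network(sizes, n_edges, seed=0):
    """Draw n_edges pairs (i, j), i < j, with probability ~ v_i + v_j.

    Assumption: sampling is done with replacement, so the same pair of
    banks can be connected more than once.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(len(sizes)), 2))
    weights = [sizes[i] + sizes[j] for i, j in pairs]
    return rng.choices(pairs, weights=weights, k=n_edges)

toy_sizes = [1.0, 2.0, 10.0, 50.0]     # hypothetical bank sizes
toy_links = sample_bank_network(toy_sizes, n_edges=1000)
```

Because the weight is a sum rather than a product, even the smallest bank is linked to the largest one far more often than to another small bank, which is in line with the disassortative pattern described earlier.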
58
Modelling Market Data
59
MODELLING
Model and clustering
To quantify the agreement between experimental and simulated networks we also define an overlap parameter m specifying how well the model reproduces the observed clustering. We proceed in the following way. We define a matrix E, a weighted 4 × 4 matrix whose weights represent the number of connections between groups. In order to measure the overlap between the matrices obtained from the data and from the computer model, we define a distance based on the differences between the elements of the matrices.
60
MODELLING
Model and clustering
We can define a distance d between the numbers of intergroup edges in the experimental data and in the numerical simulation. The sum of all elements is equal to E_tot in both cases. Therefore the maximum possible difference is 2E_tot. This happens when all the links are between two groups in one case and between two other groups in the other. We use this maximum value to normalize the above expression and we then define the overlap parameter m:
m = 1 − d/(2E_tot)
WE HAVE AN OVERLAP m = 98%
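The overlap parameter can be sketched as follows. The exact form of the distance is not legible on the slide, so the code assumes that d is the sum of the absolute differences between corresponding elements, counted once per pair of groups; this choice is consistent with the stated maximum of 2E_tot. The function name and the toy matrices are hypothetical.

```python
def overlap(e_data, e_model):
    """Overlap m = 1 - d / (2 * E_tot) between two symmetric group matrices.

    Assumption: d = sum over group pairs (g <= k) of |E_data - E_model|.
    """
    groups = range(len(e_data))
    pairs = [(g, k) for g in groups for k in groups if g <= k]
    e_tot = sum(e_data[g][k] for g, k in pairs)
    if e_tot != sum(e_model[g][k] for g, k in pairs):
        raise ValueError("both matrices must contain the same number of edges")
    d = sum(abs(e_data[g][k] - e_model[g][k]) for g, k in pairs)
    return 1.0 - d / (2.0 * e_tot)

# Toy 2-group matrices (hypothetical counts).
same = overlap([[4, 6], [6, 0]], [[4, 6], [6, 0]])
opposite = overlap([[10, 0], [0, 0]], [[0, 0], [0, 10]])
```

Identical matrices give m = 1, while placing all edges in different group pairs gives m = 0, matching the normalisation described above.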
61
MODELLING
Model and clustering
To evaluate the relevance of the division into classes, we have to compare the value of E_g,k with the corresponding quantity E^null_g,k for a network where there is no division into classes (null hypothesis). The analytical expression for the null case is E^null_g,k = E_tot/10, where 10 is the number of possible couplings between the 4 groups. The comparison between the two networks shows that the division into groups emerges in the real case: the table reports the value of E_g,k/E_tot, in percent, for each possible combination of groups. In the null case, each element of the same matrix should be equal to 10.
Group   1   2   3   4
1       0   6   4   8
2       6   3   8  17
3       4   8   5  27
4       8  17  27  22
62
CONCLUSIONS
Market Data
Financial networks can help:
1. In distinguishing the behaviour of different markets
2. In visualizing important features such as the business role
3. In testing the validity of market models
They might be an example of scale-free networks even more general than those described by growth and preferential attachment.
63
CONCLUSIONS
Thanks to the two Giulias:
Giulia De Masi, Department of Economics, Università delle Marche, Italy
Giulia Iori, Department of Economics, School of Social Science, City University, London, UK