Networks of Companies from Stock Price Correlations J. Kertész 1,2, L. Kullmann 1, J.-P. Onnela 2, A. Chakraborti 2, K. Kaski 2, A. Kanto 3 1 Department.

Presentation transcript:

Networks of Companies from Stock Price Correlations
J. Kertész 1,2, L. Kullmann 1, J.-P. Onnela 2, A. Chakraborti 2, K. Kaski 2, A. Kanto 3
1 Department of Theoretical Physics, Budapest University of Technology and Economics, Hungary
2 Laboratory of Computational Engineering, Helsinki University of Technology, Finland
3 Dept of Quantitative Methods in Economics and Management Science, Helsinki School of Economics, Finland

Motivation
The financial market is a self-adaptive complex system: many interacting units, obvious networking.
Networks:
  Cooperation: most important and most difficult
  Activity, ownership
  Similarity
Temporal aspects:
  Networks generated by time dependencies
  Time-dependent networks
Revealing the network (NW) structure is crucial for understanding and also for pragmatic reasons (e.g., portfolio optimization).
Many groups are active: Palermo, Rome, Seoul, etc.

Outline
Classification by Minimum Spanning Trees (MST) (Mantegna)
Temporal evolution
Relation to portfolio optimization
Correlations vs. noise: Parametric aggregational classification
Temporal correlations: Directed NW of influence

Data: price and return
Daily price data for N=477 NYSE stocks (CRSP of U. of Chicago), such as GE, MOT, and KO
Time span S=5056 trading days: Jan 1980 – Dec 1999
Daily closure price of GE: $P_{GE}(t)$
Daily logarithmic price: $\ln P_{GE}(t)$
Daily logarithmic return: $r_{GE}(t) = \ln P_{GE}(t) - \ln P_{GE}(t-1)$
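The return definition above can be sketched in a few lines. This is only an illustration: the function name `log_returns` and the price values are hypothetical and are not taken from the CRSP data.

```python
import math

def log_returns(prices):
    """Daily logarithmic returns r(t) = ln P(t) - ln P(t-1)."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical closing prices, for illustration only.
prices = [100.0, 110.0, 99.0]
returns = log_returns(prices)
```

A series of S prices gives S-1 returns, and the returns add up to the log of the total price ratio.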

Time series of asset returns
Return matrix: the data are divided into M time windows of width T, each displaced by $\delta T$ from the previous one, thus giving M return matrices.
(Figure: time axis t showing window width T and step length $\delta T$.)
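A minimal sketch of the windowing, assuming windows are indexed by their start position and that only complete windows are kept. The helper names and the step of 21 trading days (roughly one month) are assumptions made for this example.

```python
def window_starts(S, T, dT):
    """Start indices of all complete windows of width T, stepped by dT, in S samples."""
    return list(range(0, S - T + 1, dT))

def split_into_windows(series, T, dT):
    """Cut one return series into overlapping windows of width T."""
    return [series[s:s + T] for s in window_starts(len(series), T, dT)]

# S and T follow the slides; dT = 21 trading days is an assumed stand-in for one month.
M = len(window_starts(5056, 1000, 21))

# Small toy series to show the overlap between consecutive windows.
toy_windows = split_into_windows(list(range(10)), 4, 2)
```

Applying `split_into_windows` to every stock with the same T and dT yields one return matrix per window.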

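Correlations and distances
For each window a correlation matrix is defined whose elements are the equal-time correlation coefficients
$$C_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left[\langle r_i^2 \rangle - \langle r_i \rangle^2\right]\left[\langle r_j^2 \rangle - \langle r_j \rangle^2\right]}}$$
where $r_i, r_j \in R^t$ and $\langle \ldots \rangle$ denotes a time average.
Transformation to a distance matrix with elements $d_{ij} = \sqrt{2(1 - C_{ij})}$.
Minimum spanning tree (MST): a graph linking the N vertices (stocks) with N-1 edges such that the sum of distances is minimum. Efficient algorithms exist.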
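The chain from returns to correlations, distances and the tree can be sketched as follows. The slides do not say which MST algorithm was used; Kruskal's algorithm is chosen here as one of the standard efficient options, and the distance is assumed to be the usual Mantegna transformation sqrt(2(1 - correlation)). All names and return values are hypothetical.

```python
import math
from itertools import combinations

def mean(xs):
    return sum(xs) / len(xs)

def correlation(ri, rj):
    """Equal-time correlation coefficient of two return series."""
    mi, mj = mean(ri), mean(rj)
    cov = mean([a * b for a, b in zip(ri, rj)]) - mi * mj
    vi = mean([a * a for a in ri]) - mi * mi
    vj = mean([b * b for b in rj]) - mj * mj
    return cov / math.sqrt(vi * vj)

def distance(c):
    """Assumed distance transformation; max() guards against rounding above 1."""
    return math.sqrt(2.0 * max(0.0, 1.0 - c))

def minimum_spanning_tree(returns):
    """Kruskal's algorithm on the complete distance graph.

    returns: dict mapping stock name -> return series.
    Result: list of (distance, a, b) tuples with N-1 entries.
    """
    names = sorted(returns)
    edges = sorted(
        (distance(correlation(returns[a], returns[b])), a, b)
        for a, b in combinations(names, 2)
    )
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = []
    for d, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # the edge joins two components, so it cannot close a cycle
            parent[ra] = rb
            tree.append((d, a, b))
    return tree

# Hypothetical returns; B is exactly 2 * A, so A and B are perfectly correlated.
returns = {
    "A": [0.01, -0.02, 0.03, 0.00],
    "B": [0.02, -0.04, 0.06, 0.00],
    "C": [-0.01, 0.02, -0.03, 0.01],
    "D": [0.00, 0.01, -0.01, 0.02],
}
tree = minimum_spanning_tree(returns)
```

In this toy example the perfectly correlated pair A, B has distance close to zero and is therefore the first edge accepted into the tree.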

Central vertex
To characterise the positions of companies in the tree, the concept of a central vertex is introduced: a reference vertex against which the locations of other vertices are measured, needed to extract further information from asset trees.
The central vertex should be a company whose price changes strongly affect the market; three possible criteria:
(1) Vertex degree criterion: vertex with the highest vertex degree, i.e., the number of incident edges; local.
(2) Weighted vertex degree criterion: vertex with the highest correlation-coefficient-weighted vertex degree; local.
(3) Center of mass criterion: vertex $v_i$ giving the minimum value of the mean occupation layer $l(t, v_i)$; global.
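The two local criteria can be illustrated with a short sketch. The tree below is hypothetical (the tickers appear in the slides, but the edges and correlation values are invented for the example), and the weighted degree is assumed to be the sum of correlation coefficients over incident edges.

```python
from collections import defaultdict

# Hypothetical asset tree: (vertex, vertex, correlation coefficient of the edge).
tree = [
    ("GE", "KO", 0.6),
    ("GE", "MOT", 0.5),
    ("GE", "XON", 0.4),
    ("XON", "ESV", 0.7),
]

def vertex_degrees(tree):
    """Criterion (1): number of incident edges per vertex."""
    deg = defaultdict(int)
    for a, b, _ in tree:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)

def weighted_vertex_degrees(tree):
    """Criterion (2): sum of correlation coefficients over incident edges (assumed form)."""
    wdeg = defaultdict(float)
    for a, b, c in tree:
        wdeg[a] += c
        wdeg[b] += c
    return dict(wdeg)

def central_vertex(scores):
    """Vertex with the highest score; ties are broken alphabetically."""
    return max(sorted(scores), key=scores.get)

by_degree = central_vertex(vertex_degrees(tree))
by_weighted_degree = central_vertex(weighted_vertex_degrees(tree))
```

Both criteria are local in the sense that they only look at the edges touching a vertex.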

Central vertex: comparison
(1) Vertex degree criterion (local): GE: 67.2%
(2) Weighted vertex degree criterion (local): GE: 65.6%
(3) Center of mass criterion (global): GE: 52.8%

Asset tree and clusters (figure: business sectors according to Forbes; Yahoo data)

Potts superparamagnetic clustering (Kullmann, JK, Mantegna): antiferromagnetic bonds

Asset tree clustering
Mismatch between tree clusters and business sectors?
1. Random price fluctuations introduce noise to the system
2. Business sector definitions vary by institution (Forbes…)
3. Historical data should be matched with a contemporary business sector definition
4. Classifications are ambiguous and less informative for highly diversified companies
5. The MST classification mechanism imposes constraints
6. Uniformity and strength of correlations vary by business sector (cf. Energy sector vs. Technology)

Mean occupation layer
In order to characterise the spread of vertices on the asset tree, the concept of the mean occupation layer is introduced:
$$l(t, v_c) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{lev}(v_i)$$
where $v_c$ is the central vertex and $\mathrm{lev}(v_i)$ denotes the level of vertex $v_i$, such that $\mathrm{lev}(v_c) = 0$.
Both static and dynamic central vertices may be used; they exhibit similar behaviour → robustness.
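As a sketch, the level of a vertex can be taken as its number of edges from the central vertex, found by breadth-first search, and the mean occupation layer as the average level over all vertices. The tree is hypothetical and the helper names are illustrative.

```python
from collections import defaultdict, deque

def levels(tree_edges, central):
    """Level of each vertex = number of edges on the path to the central vertex."""
    adj = defaultdict(list)
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    lev = {central: 0}
    queue = deque([central])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in lev:
                lev[u] = lev[v] + 1
                queue.append(u)
    return lev

def mean_occupation_layer(tree_edges, central):
    lev = levels(tree_edges, central)
    return sum(lev.values()) / len(lev)

def center_of_mass_vertex(tree_edges):
    """Criterion (3): vertex that minimises the mean occupation layer."""
    vertices = sorted({v for edge in tree_edges for v in edge})
    return min(vertices, key=lambda v: mean_occupation_layer(tree_edges, v))

# Hypothetical tree on five tickers.
edges = [("GE", "KO"), ("GE", "MOT"), ("GE", "XON"), ("XON", "ESV")]
layer_from_ge = mean_occupation_layer(edges, "GE")
layer_from_xon = mean_occupation_layer(edges, "XON")
```

A small mean occupation layer means the tree is compact around the chosen central vertex; a large one means the vertices are spread over many levels.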

Asset tree: topology change (figure: normal market topology vs. crash topology; Yahoo data)

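Robustness: single-step survival
The robustness of the dynamic asset tree topology is measured as the ratio of surviving connections when moving by one step.
Single-step survival ratio: $\sigma(t) = \frac{1}{N-1} \left| E^t \cap E^{t-1} \right|$, where $E^t$ is the set of edges of the tree at time t.
(Figure parameters: T = 4 years, $\delta T$ = 1 month.)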

Tree evolution: multi-step survival
Within the first region the decay is exponential.
After this there is a cross-over to power-law behaviour: $\sigma(t,k) \sim t^{-z}$, with $z \approx 1.2$.
Half-life: $t_{1/2} = 0.12\,T$.
(Figures: connections survived vs. time; half-life $t_{1/2}$ (y) vs. window width T (y).)
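Single-step and multi-step survival can be sketched with plain set intersections, assuming each tree is stored as a set of undirected edges and that the multi-step ratio counts edges present in all of the last k+1 trees. The three small trees are hypothetical.

```python
def edge(a, b):
    """Undirected edge, so that (a, b) and (b, a) compare equal."""
    return frozenset((a, b))

def survival_ratio(edge_sets, t, k=1):
    """Fraction of the N-1 tree edges common to the trees at steps t, t-1, ..., t-k."""
    if k > t:
        raise ValueError("need at least k earlier trees")
    common = set(edge_sets[t])
    for s in range(t - k, t):
        common &= edge_sets[s]
    return len(common) / len(edge_sets[t])

# Three hypothetical trees on the vertices A, B, C, D (3 edges each).
trees = [
    {edge("A", "B"), edge("B", "C"), edge("C", "D")},
    {edge("A", "B"), edge("B", "C"), edge("B", "D")},
    {edge("A", "B"), edge("A", "C"), edge("B", "D")},
]
single_step = survival_ratio(trees, 2)        # edges shared by trees 2 and 1
multi_step = survival_ratio(trees, 2, k=2)    # edges shared by trees 2, 1 and 0
```

With k=1 the function gives the single-step ratio; larger k gives the multi-step survival whose decay the slide describes.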

Evolution of graphs and trees
Overlap of edges in asset graph $G^t$ and asset tree $T^t$ as a function of time
Overlap of edges in asset graph $G^t$ and asset tree $T^t$ as a function of the normalized number of edges, averaged over time

(Figure frames: k = 0, 1, 2, 4, 6, 12, 24, 36, 48)

Distribution of vertex degrees
The topological nature of the network is studied by analysing the distribution of vertex degrees.
A power-law distribution would indicate scale-free topology, a feature not expected from random network models.
Vandewalle et al. find … for one-year data, while we found …
The power-law fit is ambiguous due to the limited range of the data.

Distribution of vertex degrees (figure: left, normal; right, crash)

Portfolio optimisation
In Markowitz portfolio optimisation theory, the risks of financial assets are characterised by the standard deviations of the assets' returns.
The aim is to optimise the asset weights $w_i$ so that the overall portfolio risk is minimised for a given portfolio return (the minimum risk portfolio is uniquely defined).
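As a hedged illustration of the minimum risk portfolio, the sketch below computes the global minimum-variance weights, which are proportional to the inverse covariance matrix applied to a vector of ones. It is a simplification: short-selling is allowed and no target return is imposed, so it is not the full constrained optimisation of the slide. The covariance matrix is hypothetical and the small linear solver is included only to keep the example self-contained.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def min_variance_weights(cov):
    """Weights w proportional to inverse(cov) * ones, normalised to sum to 1."""
    x = solve(cov, [1.0] * len(cov))
    total = sum(x)
    return [xi / total for xi in x]

def portfolio_variance(w, cov):
    n = len(w)
    return sum(w[i] * w[j] * cov[i][j] for i in range(n) for j in range(n))

# Hypothetical covariance matrix of two assets.
cov = [[0.04, 0.01],
       [0.01, 0.02]]
w = min_variance_weights(cov)
risk = portfolio_variance(w, cov)
```

In this example the portfolio variance is lower than the variance of either asset alone, which is the diversification effect the optimisation exploits.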

Weighted portfolio layer
How are the minimum risk portfolio assets located on the graph?
The weighted portfolio layer is defined by imposing no short-selling, i.e. $w_i \ge 0$, and it is compared with the mean occupation layer $l(t)$.
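A minimal sketch, assuming the weighted portfolio layer is the weight-averaged level of the portfolio assets on the tree. The levels and weights below are hypothetical.

```python
def weighted_portfolio_layer(weights, lev):
    """Sum of w_i * lev(v_i) over portfolio assets, with no short-selling (w_i >= 0)."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("short-selling is excluded: all weights must be >= 0")
    return sum(w * lev[name] for name, w in weights.items())

# Hypothetical vertex levels relative to the central vertex GE.
lev = {"GE": 0, "KO": 1, "MOT": 1, "XON": 1, "ESV": 2}
# Hypothetical minimum risk portfolio weights (they sum to 1).
weights = {"GE": 0.5, "KO": 0.25, "ESV": 0.25}

portfolio_layer = weighted_portfolio_layer(weights, lev)
mean_layer = sum(lev.values()) / len(lev)
```

Comparing `portfolio_layer` with `mean_layer` shows whether the portfolio assets sit closer to the centre of the tree or further out than the average vertex.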

Portfolio layer (figure panels: no short-selling, short-selling; curves: portfolio layer, mean occupation layer; static and dynamic central vertex)

Correlations vs. noise
The correlation matrix contains systematics and noise.
MST: non-parametric, unique classification scheme, but even for an uncorrelated random matrix the MST would lead to a classification…
Meaningful clustering and robustness already signal significance.
Different methods to separate noise from information:
  Eigenvalue spectra (Boston, Paris)
  Independent/principal component analysis (economists)

Here: Building up the FCG
The tree condition may ignore important correlations (general classification problem).
Visualization through Parametrized Aggregated Classification (PAC): add links one by one to the graph according to their rank, starting with the strongest and ending with a Fully Connected Graph (FCG).
Strongly correlated parts become interconnected early; the clustering coefficient becomes high.
Price time series data for a set of 477 companies. Window width T=1000 business days (4 years), located at the beginning of the 1980s.
Comparison with a random graph (obtained by shuffling the data).
Clustering coefficient: $C_i = (\text{number of triangles at node } i) / [k(k-1)/2]$, where k is the degree of node i.
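The construction can be sketched as follows: rank all pairs by correlation, keep the strongest `size` links, and evaluate the clustering coefficient of a node as the number of links among its neighbours divided by k(k-1)/2. The correlation values and function names are hypothetical.

```python
from itertools import combinations

def ranked_links(corr):
    """Pairs ordered from the strongest to the weakest correlation."""
    return sorted(corr, key=lambda pair: corr[pair], reverse=True)

def build_graph(links, size):
    """Graph made of the first `size` ranked links, as an adjacency dict of sets."""
    adj = {}
    for a, b in links[:size]:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def clustering_coefficient(adj, i):
    """Triangles at node i divided by k(k-1)/2, where k is the degree of i."""
    nbrs = adj.get(i, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    triangles = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
    return triangles / (k * (k - 1) / 2)

# Hypothetical correlation coefficients for four companies.
corr = {
    ("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.7,
    ("A", "D"): 0.3, ("B", "D"): 0.2, ("C", "D"): 0.1,
}
links = ranked_links(corr)
c_at_3 = clustering_coefficient(build_graph(links, 3), "A")   # A, B, C form a triangle
c_at_4 = clustering_coefficient(build_graph(links, 4), "A")   # D attached to A only
c_full = clustering_coefficient(build_graph(links, len(links)), "A")
```

The strongly correlated group A, B, C closes a triangle after only three links, which is the early high clustering the slide refers to; the coefficient returns to 1 once the graph is fully connected.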

(Figure frames: graph at size = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 300, 400, 500, 600, 700, 800, 900, 1000)

Elementary graph concepts
Graph size: number of edges in the graph (variable)
Graph order: number of vertices in the graph (constant)
Spanned graph order: number of vertices in the subgraph spanned by the edges, thus excluding the isolated vertices (variable)
These definitions can also be applied to clusters, of which there are two types: (1) edge cluster, (2) vertex cluster.
Edge clusters are more meaningful in the asset graph context.

Cluster growth
The growth patterns of clusters can be divided into four topologically different types:
(I) Create a new cluster (two nodes and the incident edge) when neither of the two end nodes is part of an existing edge cluster (spanned cluster order +2, size +1)
(II) Add a node and the incident edge to an already existing edge cluster (spanned cluster order +1, size +1)
(III) Merge two edge clusters by adding an edge between them (combined spanned cluster size +1)
(IV) Add an edge to an already existing edge cluster, thus creating a cycle in it (spanned cluster size +1)
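The four growth types can be told apart by checking which end nodes of a new link already belong to a cluster. The sketch below uses a simple node-to-cluster map; the function name and the link sequence are hypothetical.

```python
def classify_growth(links):
    """Label each added link with growth type I, II, III or IV."""
    cluster = {}      # node -> id of the edge cluster it belongs to
    next_id = 0
    types = []
    for a, b in links:
        ca, cb = cluster.get(a), cluster.get(b)
        if ca is None and cb is None:
            # (I) neither end node is in a cluster: create a new one
            cluster[a] = cluster[b] = next_id
            next_id += 1
            types.append("I")
        elif ca is None or cb is None:
            # (II) exactly one end node is new: attach it to the existing cluster
            cid = ca if ca is not None else cb
            cluster[a] = cluster[b] = cid
            types.append("II")
        elif ca != cb:
            # (III) the link joins two different clusters: merge them
            for node, cid in list(cluster.items()):
                if cid == cb:
                    cluster[node] = ca
            types.append("III")
        else:
            # (IV) both end nodes are already in the same cluster: a cycle is created
            types.append("IV")
    return types

# Hypothetical sequence of links, already ordered by rank.
links = [("A", "B"), ("C", "D"), ("B", "E"), ("B", "C"), ("A", "E")]
growth = classify_growth(links)
```

Counting the labels as the graph grows gives the kind of growth statistics that can be compared between empirical and shuffled data.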

Cluster growth (figure panels: empirical, random; N=477)

Spanned graph order (figure panels: empirical, random; N=477)

Number of vertex clusters (figure panels: empirical, random; N=477)

Cluster size for edge clusters (figure panels: empirical, random; N=477)

Vertex degree distribution (figure panels: empirical, random; p=0.01, N=477)

Vertex degree distribution (figure panels: empirical, random; p=0.25, N=477)

Clustering coefficient (figure panels: empirical, random; N=116)

Mean clustering coefficient (figure panels: empirical, random; N=116)

NO TIME REVERSAL SYMMETRY ON THE MARKETS
Physics close to equilibrium: Time reversal symmetry (TRS) → Detailed balance → Symmetric correlation functions, Fluctuation Dissipation Theorem (FDT)
No fundamental principle forces TRS on the market. In contrast, the elementary process, a transaction, is irreversible: though the price is set by equilibrating supply and demand, both parties (or at least one of them) feel that the transaction is to their advantage and would not agree to revert it.
Possibility of:
  Asymmetry in the cross-correlation functions
  Differences between the decay of spontaneous fluctuations and of the response to external perturbations

Time dependent cross correlations
Log return of stock A over a time interval $\Delta t$
Correlation function $C(\tau)$ between the returns of companies A and B
It depends on $\Delta t$ and $\tau$. Is it symmetric?
Difficulties:
  trades are not synchronized, frequencies are very different
  bad signal/noise ratio
Appropriate averaging is needed.
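A minimal sketch of a time-shifted cross-correlation, assuming two equally sampled return series and the convention that C(tau) correlates the return of A at time t with the return of B at time t + tau. The normalisation (a Pearson coefficient on the overlapping part), the sign convention and the series below are assumptions made for this example.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / math.sqrt(vx * vy)

def cross_correlation(ra, rb, tau):
    """Correlation of ra(t) with rb(t + tau); tau may be negative."""
    if len(ra) != len(rb):
        raise ValueError("series must have equal length")
    if tau >= 0:
        xs, ys = ra[:len(ra) - tau], rb[tau:]
    else:
        xs, ys = ra[-tau:], rb[:len(rb) + tau]
    return pearson(xs, ys)

# Hypothetical returns: B repeats A with a delay of 2 steps.
ra = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1]
rb = [0, 0] + ra[:-2]

tau_max = max(range(-3, 4), key=lambda tau: cross_correlation(ra, rb, tau))
```

With this convention a peak at positive tau means that A moves first and B follows; an asymmetric C(tau) is what allows a direction to be assigned to a link.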

Toy model to test the method
Persistent 1d random walk (increment $x = \pm 1$).
We take two such walks, which are correlated, with increments x and y.
The correlation function can be calculated.
We corrupt the data so that their quality is similar to that of the real data: only 1% of the data are kept.
($\tau_o$ = 200, … = 1000, … = 0.99)

The correlations measured on a finite data set depend on the averaging procedure (moving average).
The appropriate choice of $\Delta t$ lies between $\Delta t_{min}$ and $\tau_o$.
Data set: Trade And Quote, companies tick by tick, 54 days; 195 companies traded more than … times.
$\Delta t$ = 100 s, but results checked for … s.

Results
We measure $\tau_{max}$, $C(\tau_{max})$, and $R = C(\tau_{max})/\text{noise}$.
We consider $|\tau_{max}| > 100$, $C(\tau_{max}) > 0.04$, and $R > 6$ as an 'effect'.
Not all pairs of companies show the effect.
The peak is not only shifted but also asymmetric.
Large, frequently traded companies 'pull' the smaller ones.
Weak effect and short characteristic time (minutes).
(Figure example: XON: Exxon (oil); ESV: Ensco (oil wells).)

Directed network of influence
No chains
Many leaders for a follower
Many followers for a leader
Disconnected graph

Conclusions
Networks constructed from cross correlations of stock price time series (MST, PAC)
Though $C_{ij}$ is noisy, it has much information content, useful for portfolio optimization
MST: robust, reasonable classification, interesting dynamics at crash time
Clusters (branches) are not equally correlated; PAC reveals differences, separation of noise from information
Asymmetric time-dependent cross correlations lead to a directed network of influence