1 Hierarchically nested factor models Michele Tumminello, Fabrizio Lillo, R.N.M. University of Palermo (Italy) Rome, June 20, 2007 Observatory of Complex.


1 Hierarchically nested factor models
Michele Tumminello, Fabrizio Lillo, R.N.M.
University of Palermo (Italy)
Rome, June 20, 2007
Observatory of Complex Systems
EPL 78 (2007) 30006

2 Motivation
In many systems the dynamics of N elements is monitored by sampling each time series with T observations. One way of quantifying the interaction between elements is the correlation matrix. Since there are TN observations and the number of parameters in the correlation matrix is $O(N^2)$, the estimated correlation matrix unavoidably suffers from statistical uncertainty due to the finiteness of the sample.
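A quick numerical check of this point, assuming the simplest null case of N independent Gaussian series (so the true correlation matrix is the identity and every off-diagonal coefficient should be zero). With the sizes of the NYSE example used later (N = 100, T = 1011), the spurious coefficients scatter with a standard deviation close to $1/\sqrt{T}$.

```python
import numpy as np

N, T = 100, 1011                       # sizes borrowed from the NYSE example in these slides
rng = np.random.default_rng(42)
X = rng.standard_normal((N, T))        # independent series: true correlation matrix = identity

C = np.corrcoef(X)                     # N x N sample correlation matrix
off_diag = C[np.triu_indices(N, k=1)]  # the N(N-1)/2 distinct coefficients

n_params = off_diag.size               # number of estimated parameters
n_obs = N * T                          # number of observations
noise_std = off_diag.std()             # spread of coefficients that should all be 0

print(f"parameters: {n_params}, observations: {n_obs}")
print(f"std of spurious correlations: {noise_std:.4f} (1/sqrt(T) = {1 / np.sqrt(T):.4f})")
```

Here 4950 coefficients are estimated from 101,100 numbers, i.e. only about 20 observations per parameter.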

3 Questions
How can one process (filter) the correlation matrix in order to obtain a statistically reliable correlation matrix? How can one build a time series factor model that describes the dynamics of the system? How can one compare the characteristics of different correlation matrix filtering procedures?

4 A real example
As an example, we consider the time series of price returns of a set of stocks traded in a financial market.
[Figure: ln P(t) of the stock prices]
The similarity measure between stocks i and j is the correlation coefficient $\rho_{ij}$ of their returns, which defines the correlation matrix $C = (\rho_{ij})$.
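A minimal sketch of how C can be computed from prices, assuming returns are taken as log-price differences, $r_i(t) = \ln P_i(t) - \ln P_i(t-1)$ (the usual convention in this literature, assumed here). The function name `correlation_from_prices` and the synthetic prices are hypothetical, for illustration only.

```python
import numpy as np

def correlation_from_prices(P):
    """P: array of shape (N, T+1) of positive prices, one row per stock.
    Returns the log returns r (shape (N, T)) and their N x N Pearson
    correlation matrix C = (rho_ij)."""
    r = np.diff(np.log(P), axis=1)     # r_i(t) = ln P_i(t) - ln P_i(t-1)
    return r, np.corrcoef(r)

# Hypothetical usage: 3 synthetic stocks sharing one common shock (not real market data)
rng = np.random.default_rng(1)
common = rng.standard_normal(500)
shocks = 0.01 * (common + rng.standard_normal((3, 500)))   # true pairwise correlation 0.5
log_paths = np.cumsum(np.hstack([np.zeros((3, 1)), shocks]), axis=1)
P = 100.0 * np.exp(log_paths)                               # prices starting at 100
r, C = correlation_from_prices(P)
print(r.shape)
print(np.round(C, 2))
```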

5 Factor models
Factor models are simple and widespread models of multivariate time series. A general multifactor model for N variables $x_i(t)$ is
$$x_i(t) = \sum_{j=1}^{K} \gamma_{ij}\, f_j(t) + \eta_i\, \varepsilon_i(t),$$
where $\gamma_{ij}$ is a constant describing the weight of factor j in explaining the dynamics of the variable $x_i(t)$. The number of factors is K, and the factors are described by the time series $f_j(t)$. $\varepsilon_i(t)$ is a (Gaussian) zero-mean noise with unit variance, weighted by the constant $\eta_i$.
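The model above can be simulated directly. This sketch assumes i.i.d. standard Gaussian factors and noises, and uses hypothetical weights; choosing $\eta_i = \sqrt{1 - \sum_j \gamma_{ij}^2}$ gives every $x_i$ unit variance, so the implied covariance $\gamma\gamma^T + \mathrm{diag}(\eta^2)$ is also the correlation matrix.

```python
import numpy as np

def simulate_factor_model(gamma, eta, T, seed=0):
    """Simulate x_i(t) = sum_j gamma[i, j] f_j(t) + eta[i] eps_i(t) for t = 1..T.

    gamma: (N, K) factor weights; eta: (N,) weights of the idiosyncratic noise.
    Factors and noises are i.i.d. standard Gaussians (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    N, K = gamma.shape
    f = rng.standard_normal((K, T))      # factor time series f_j(t)
    eps = rng.standard_normal((N, T))    # zero-mean, unit-variance noise eps_i(t)
    return gamma @ f + eta[:, None] * eps

def model_covariance(gamma, eta):
    """Covariance implied by the model: gamma gamma^T + diag(eta^2)."""
    return gamma @ gamma.T + np.diag(eta ** 2)

# Hypothetical 3-variable, 2-factor example with unit-variance variables
gamma = np.array([[0.6, 0.0],
                  [0.6, 0.3],
                  [0.0, 0.7]])
eta = np.sqrt(1.0 - (gamma ** 2).sum(axis=1))
x = simulate_factor_model(gamma, eta, T=20000)
print(np.round(model_covariance(gamma, eta), 2))
print(np.round(np.corrcoef(x), 2))
```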

6 Factor models: examples
Multifactor models have been introduced to model a set of asset prices, generalizing the CAPM:
$$\mathbf{r}(t) = \boldsymbol{\alpha} + B\,\mathbf{f}(t) + \boldsymbol{\varepsilon}(t),$$
where now B is an (N×K) matrix and $\mathbf{f}(t)$ is a (K×1) vector of factors. The factors can be selected either on theoretical grounds (e.g. interest rates for bonds, inflation, industrial production growth, oil price) or on statistical grounds (e.g. by applying factor analysis methods). Examples of multifactor models are Arbitrage Pricing Theory (Ross 1976) and the Intertemporal CAPM (Merton 1973).
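When the factors are chosen on theoretical grounds, i.e. they are observed series, the loading matrix B can be estimated by ordinary least squares, one regression per asset. This is a sketch with synthetic data; the intercept vector alpha, the function name `estimate_loadings` and all numbers are assumptions for illustration.

```python
import numpy as np

def estimate_loadings(R, F):
    """OLS fit of R(t) = alpha + B f(t) + eps(t).

    R: (N, T) asset returns; F: (K, T) observed factor series.
    Returns alpha with shape (N,) and B with shape (N, K).
    """
    T = R.shape[1]
    design = np.vstack([np.ones((1, T)), F]).T             # (T, K+1): constant + factors
    coef, *_ = np.linalg.lstsq(design, R.T, rcond=None)   # (K+1, N), one column per asset
    return coef[0], coef[1:].T

# Synthetic check: recover known loadings from simulated returns
rng = np.random.default_rng(3)
N, K, T = 4, 2, 5000
B_true = rng.normal(0.0, 1.0, size=(N, K))
F = rng.standard_normal((K, T))
R = 0.01 + B_true @ F + 0.5 * rng.standard_normal((N, T))
alpha_hat, B_hat = estimate_loadings(R, F)
print(np.round(B_hat - B_true, 3))   # estimation errors, close to zero
```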

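7 Factor models and Principal Component Analysis (PCA)
A factor is associated with each relevant eigenvalue-eigenvector pair:
$$x_i(t) = \sum_{h=1}^{k} \sqrt{\lambda_h}\, v_i^{(h)} f_h(t) + \sqrt{1 - \sum_{h=1}^{k} \lambda_h \big(v_i^{(h)}\big)^2}\;\varepsilon_i(t),$$
where $f_h(t)$ is the h-th factor, the last term is the idiosyncratic term, k is the number of relevant eigenvalues, $v_i^{(h)}$ is the i-th component of the h-th eigenvector of C, and $\lambda_h$ is the h-th eigenvalue. How many eigenvalues should be included?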
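The correlation matrix implied by this k-factor PCA model can be sketched as follows: the loadings are $\sqrt{\lambda_h}\,v_i^{(h)}$ for the k largest eigenvalues, and the idiosyncratic term restores unit variance, which amounts to setting the diagonal to 1. The function name is hypothetical.

```python
import numpy as np

def pca_factor_correlation(C, k):
    """Correlation matrix of the k-factor PCA model built from C.

    Loadings: sqrt(lambda_h) * v_i^(h) for the k largest eigenvalues.
    The idiosyncratic variance 1 - sum_h lambda_h (v_i^(h))^2 puts 1 on the diagonal.
    """
    lam, V = np.linalg.eigh(C)                 # ascending eigenvalues, eigenvectors in columns
    lam, V = lam[::-1][:k], V[:, ::-1][:, :k]  # keep the k largest
    loadings = V * np.sqrt(lam)                # (N, k) matrix of sqrt(lambda_h) v_i^(h)
    C_k = loadings @ loadings.T
    np.fill_diagonal(C_k, 1.0)
    return C_k, loadings

# Example: 4 variables with all pairwise correlations equal to 0.5
C = np.full((4, 4), 0.5)
np.fill_diagonal(C, 1.0)
C_1, L1 = pca_factor_correlation(C, 1)
print(np.round(C_1, 3))
```

Even for this one-factor matrix, the one-eigenvalue model gives off-diagonal entries of 0.625 (= 2.5 × 1/4) rather than 0.5, while keeping all k = N eigenvalues reproduces C exactly.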

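8 Random Matrix Theory
The idea is to compare the properties of an empirical correlation matrix C with the null hypothesis of a random matrix. For N independent series of length T with variance $\sigma^2$, in the limit $N, T \to \infty$ with $Q = T/N \ge 1$ fixed, the density of eigenvalues of the correlation matrix is
$$\rho(\lambda) = \frac{Q}{2\pi\sigma^2}\,\frac{\sqrt{(\lambda_{\max}-\lambda)(\lambda-\lambda_{\min})}}{\lambda}, \qquad \lambda_{\max,\min} = \sigma^2\left(1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}\right),$$
for $\lambda_{\min} \le \lambda \le \lambda_{\max}$.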
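The density above (the Marchenko-Pastur law) is easy to evaluate numerically. This sketch takes $\sigma^2 = 1$ and the sizes of the NYSE example; the function name is hypothetical. The integral of the density over the noise band should be 1.

```python
import numpy as np

def marchenko_pastur(N, T, sigma2=1.0, num=400):
    """Eigenvalue density of the correlation matrix of N independent series of
    length T (T >= N), in the limit N, T -> infinity with Q = T/N fixed."""
    Q = T / N
    lam_min = sigma2 * (1 + 1 / Q - 2 * np.sqrt(1 / Q))
    lam_max = sigma2 * (1 + 1 / Q + 2 * np.sqrt(1 / Q))
    lam = np.linspace(lam_min, lam_max, num)
    # clip guards against tiny negative values at the band edges
    band = np.clip((lam_max - lam) * (lam - lam_min), 0.0, None)
    rho = Q / (2 * np.pi * sigma2) * np.sqrt(band) / lam
    return lam, rho, lam_min, lam_max

lam, rho, lam_min, lam_max = marchenko_pastur(N=100, T=1011)
area = np.sum(0.5 * (rho[1:] + rho[:-1]) * np.diff(lam))   # trapezoidal rule
print(f"noise band: [{lam_min:.3f}, {lam_max:.3f}], integral of density: {area:.3f}")
```

For $\sigma^2 = 1$ the upper edge can also be written as $\lambda_{\max} = (1 + \sqrt{N/T})^2$.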

9 Random Matrix Theory
Random Matrix Theory helps to select the relevant eigenvalues (L. Laloux et al., PRL 83, 1468 (1999)).
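A sketch of the selection step: keep the eigenvalues of the sample correlation matrix that lie above the random-matrix upper edge $\lambda_{\max}$. For simplicity this uses $\sigma^2 = 1$ (an assumption; refinements that rescale $\sigma^2$ exist). The test data come from a hypothetical one-factor model, so exactly one eigenvalue should be selected.

```python
import numpy as np

def relevant_eigenvalues(X):
    """Eigenvalues of the sample correlation matrix of X (shape N x T) lying above
    the Marchenko-Pastur upper edge, i.e. outside the random-matrix noise band."""
    N, T = X.shape
    lam = np.linalg.eigvalsh(np.corrcoef(X))[::-1]   # descending order
    lam_max = (1 + np.sqrt(N / T)) ** 2              # upper edge for sigma^2 = 1
    return lam[lam > lam_max], lam_max

# Hypothetical one-factor data: every series loads 0.5 on a common factor
rng = np.random.default_rng(5)
N, T = 100, 1011
f = rng.standard_normal(T)
X = 0.5 * f + np.sqrt(0.75) * rng.standard_normal((N, T))
sel, lam_max = relevant_eigenvalues(X)
print(f"lambda_max = {lam_max:.3f}, selected eigenvalues: {np.round(sel, 2)}")
```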

10 A simple (hierarchical) model
[Figure: the model correlation matrix C]

11 Spectral analysis
The model correlation matrix has 2 large eigenvalues and 2 corresponding eigenvectors, although the model has 3 factors. PCA is not able to reconstruct the true model or to give insights into its hierarchical features.
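This effect can be reproduced with a toy three-node hierarchical model (the values below are hypothetical, not those of the slide): 100 elements split into two clusters of 50, with $\rho = 0.5$ inside cluster A, $\rho = 0.4$ inside cluster B, and $\rho = 0.2$ between clusters (the root node). Because the "all elements" direction is the sum of the two cluster directions, three nested factors produce only two large eigenvalues.

```python
import numpy as np

# Hypothetical 3-node hierarchical model (illustrative values)
N = 100
A, B = np.arange(50), np.arange(50, 100)
C = np.full((N, N), 0.2)          # root node: correlation between the two clusters
C[np.ix_(A, A)] = 0.5             # node of cluster A
C[np.ix_(B, B)] = 0.4             # node of cluster B
np.fill_diagonal(C, 1.0)

lam = np.sort(np.linalg.eigvalsh(C))[::-1]
print(np.round(lam[:3], 2))       # two large eigenvalues, then the bulk
```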

12 Hierarchical organization
Many natural and artificial systems are intrinsically organized in a hierarchical structure. This means that the elements of the system can be partitioned into clusters, which in turn can be partitioned into subclusters, and so on up to a certain level.
– How is it possible to detect the hierarchical structure of the system?
– How is it possible to model the time series dynamics of the system?

13 Clustering algorithms
The natural answer to the first question is the use of clustering algorithms.
» Clustering algorithms are data analysis techniques that allow one to extract a hierarchical partitioning of the data.
» We are mainly interested in hierarchical clustering methods, which display the hierarchical structure by means of a dendrogram.
» We focus our attention on two widely used clustering methods:
-) the single linkage cluster analysis (SLCA)
-) the average linkage cluster analysis (ALCA)
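Both methods are available in SciPy. This sketch assumes the distance $d_{ij} = 1 - \rho_{ij}$ (a choice made here, not stated in the slides): with it, average linkage merges clusters by their average correlation, and each merge height equals 1 minus the correlation of the new node. The 4×4 matrix is a hypothetical example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def hierarchical_clustering(C, method):
    """Dendrogram (SciPy linkage matrix) of a correlation matrix C.
    method='single' gives SLCA, method='average' gives ALCA; d_ij = 1 - rho_ij."""
    D = 1.0 - C
    np.fill_diagonal(D, 0.0)
    return linkage(squareform(D, checks=False), method=method)

C = np.array([[1.0, 0.8, 0.3, 0.2],
              [0.8, 1.0, 0.25, 0.35],
              [0.3, 0.25, 1.0, 0.6],
              [0.2, 0.35, 0.6, 1.0]])
Z_single = hierarchical_clustering(C, "single")
Z_average = hierarchical_clustering(C, "average")
print(Z_single)    # rows: [cluster a, cluster b, merge distance, size]
print(Z_average)
```

The two methods agree on the first merges ({0, 1} at d = 0.2, {2, 3} at d = 0.4) but join the two clusters at different heights: SLCA uses the largest cross correlation (d = 0.65), ALCA the average one (d = 0.725).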

14 How is it possible to extract a time series model for the stocks that takes into account the structure of the dendrogram?
Daily returns of 100 stocks traded at the NYSE in the time period 1/1995-12/1998 (T=1011).
[Figure: ALCA dendrogram; sectors: Energy, Technology, Financial, Healthcare, Basic Material, Services, Utilities]

15 Hierarchical clustering approach
Dendrograms obtained by hierarchical clustering are naturally associated with a correlation matrix $C^<$ given by
$$\rho^<_{ij} = \rho_{\alpha_{ij}},$$
where $\alpha_{ij}$ is the first node where elements i and j merge together and $\rho_{\alpha}$ is the correlation that the clustering algorithm assigns to node $\alpha$. We propose to use as a model of the system the factor model whose correlation matrix is $C^<$. The motivations are:
– the hierarchical structure is revealed by the dendrogram;
– the algorithm often filters robust information from the time series.
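A sketch of how $C^<$ can be obtained with SciPy, again assuming $d_{ij} = 1 - \rho_{ij}$: the height of the first node where i and j merge is their cophenetic distance, so $\rho^<_{ij} = 1 - d^{\mathrm{coph}}_{ij}$. The function name and the 4×4 matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

def subdominant_correlation(C, method="average"):
    """C^<: rho^<_ij is the correlation of the node where i and j first merge.
    With d_ij = 1 - rho_ij, that node's height is the cophenetic distance."""
    D = 1.0 - C
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method=method)
    # squareform of the condensed cophenetic distances has a zero diagonal,
    # so the diagonal of C^< is 1 - 0 = 1
    return 1.0 - squareform(cophenet(Z))

C = np.array([[1.0, 0.8, 0.3, 0.2],
              [0.8, 1.0, 0.25, 0.35],
              [0.3, 0.25, 1.0, 0.6],
              [0.2, 0.35, 0.6, 1.0]])
C_avg = subdominant_correlation(C, "average")   # ALCA
C_sgl = subdominant_correlation(C, "single")    # SLCA
print(np.round(C_avg, 3))
print(np.round(C_sgl, 3))
```

$C^<$ has only N − 1 distinct off-diagonal values, one per node: here the four cross-cluster coefficients are all replaced by 0.275 (ALCA) or 0.35 (SLCA).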

16 Hierarchical Clustering (HC)
Applying either the ALCA or the SLCA to C reveals the hierarchical structure of the model. Is it possible to recover the 3-factor model starting from such a dendrogram?

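17 Hierarchically Nested Factor Model (HNFM)
A factor is associated with each node $\alpha_h$:
$$x_i(t) = \sum_{h=1}^{N_i} \gamma_{\alpha_h}\, f_{\alpha_h}(t) + \eta_i\, \varepsilon_i(t),$$
where $f_{\alpha_h}(t)$ is the h-th factor, the sum runs over the $N_i$ nodes $\alpha_1, \dots, \alpha_{N_i}$ on the path from element i to the root, and $\eta_i\,\varepsilon_i(t)$ is the idiosyncratic term. The weights are $\gamma_{\alpha_h} = \sqrt{\rho_{\alpha_h} - \rho_{g(\alpha_h)}}$, where $g(\alpha_h)$ is the parent of node $\alpha_h$ (with $\rho_{g(\text{root})} = 0$), and $\eta_i = \sqrt{1 - \rho_{\alpha_1}}$. The model explains the correlation matrix $C^<$: $\langle x_i(t)\, x_j(t) \rangle = \rho^<_{ij}$.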
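A sketch of the construction for an ALCA dendrogram (with $d_{ij} = 1 - \rho_{ij}$, as assumed in the earlier sketches): each internal node gets weight $\sqrt{\rho_\alpha - \rho_{g(\alpha)}}$, and each element gets $\eta_i = \sqrt{1 - \rho_{\alpha_1}}$. The weights telescope along the path to the root, so $G G^T + \mathrm{diag}(\eta^2)$ reproduces $C^<$. It assumes all node correlations are non-negative; function names and data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def hnfm_loadings(C):
    """Loadings of the HNFM associated with the ALCA dendrogram of C.
    Returns G (N x (N-1)), with G[i, k] = gamma of node k if i lies below node k,
    and eta (N,). Assumes all node correlations are >= 0."""
    N = C.shape[0]
    D = 1.0 - C
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    rho = 1.0 - Z[:, 2]                   # correlation of each internal node
    parent_rho = np.zeros(N - 1)          # rho of the parent node (0 for the root)
    first_node = np.empty(N, dtype=int)   # node where element i first merges
    members = [[i] for i in range(N)]
    for k, (a, b, _, _) in enumerate(Z):
        a, b = int(a), int(b)
        for child in (a, b):
            if child >= N:
                parent_rho[child - N] = rho[k]
            else:
                first_node[child] = k
        members.append(members[a] + members[b])
    G = np.zeros((N, N - 1))
    for k in range(N - 1):
        G[members[N + k], k] = np.sqrt(rho[k] - parent_rho[k])
    eta = np.sqrt(1.0 - rho[first_node])
    return G, eta

def simulate_hnfm(G, eta, T, seed=0):
    """x(t) = G f(t) + eta * eps(t), with one standard Gaussian factor per node."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal((G.shape[1], T))
    eps = rng.standard_normal((G.shape[0], T))
    return G @ f + eta[:, None] * eps

C = np.array([[1.0, 0.8, 0.3, 0.2],
              [0.8, 1.0, 0.25, 0.35],
              [0.3, 0.25, 1.0, 0.6],
              [0.2, 0.35, 0.6, 1.0]])
G, eta = hnfm_loadings(C)
implied = G @ G.T + np.diag(eta ** 2)   # equals the ALCA matrix C^<
x = simulate_hnfm(G, eta, T=50000)
print(np.round(implied, 3))
print(np.round(np.corrcoef(x), 2))
```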

18 We have shown that it is possible to associate a factor model with a dendrogram. If the system has a hierarchical structure and the clustering algorithm is able to detect it, the factor model is likely to describe the hierarchical features of the system. If the system has N elements, the factor model has $N-1$ factors, one per node of the dendrogram.
– How is it possible to reduce the dimensionality of the model?
– Principal Component Analysis prescribes using the k largest eigenvalues (and the corresponding eigenvectors) to build a k-factor model.

19 Statistical uncertainty and the necessity of node reduction
[Figure: dendrogram of the model, 3 nodes (factors); dendrogram from a realization of finite length, 99 nodes (factors)]

20 Bootstrap procedure
– HC is applied to the data set. The result is the dendrogram D.
– HC is applied to the surrogate data matrices, giving a set of surrogate dendrograms.
– For each node α of D, the bootstrap value b(α) is computed as the percentage of surrogate dendrograms in which node α is preserved.
A node is preserved in the bootstrap if it identifies a branch composed of the same elements as in the real-data dendrogram.
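A sketch of the bootstrap values for ALCA nodes. The slide does not say how the surrogate matrices are built; here we assume the standard bootstrap, resampling the T time points with replacement. A node is identified with the set of elements below it, so "preserved" means that the same set appears as a node of the surrogate dendrogram. Function names and data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def alca(X):
    """ALCA dendrogram of the rows of X (shape N x T), with d_ij = 1 - rho_ij."""
    D = 1.0 - np.corrcoef(X)
    np.fill_diagonal(D, 0.0)
    return linkage(squareform(D, checks=False), method="average")

def node_members(Z, n):
    """For each internal node of linkage Z, the frozenset of elements below it."""
    members = [frozenset([i]) for i in range(n)]
    for a, b, _, _ in Z:
        members.append(members[int(a)] | members[int(b)])
    return members[n:]

def bootstrap_values(X, n_boot=200, seed=0):
    """Percentage of surrogate dendrograms preserving each node of the data dendrogram.
    Surrogates resample the T time points with replacement (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n, T = X.shape
    nodes = node_members(alca(X), n)
    hits = np.zeros(len(nodes))
    for _ in range(n_boot):
        cols = rng.integers(0, T, size=T)
        surrogate_nodes = set(node_members(alca(X[:, cols]), n))
        hits += np.array([node in surrogate_nodes for node in nodes], dtype=float)
    return nodes, 100.0 * hits / n_boot

# Hypothetical data: 20 series from a two-cluster nested model
n = 20
C = np.full((n, n), 0.2)
C[:10, :10] = 0.5
C[10:, 10:] = 0.4
np.fill_diagonal(C, 1.0)
rng = np.random.default_rng(7)
X = np.linalg.cholesky(C) @ rng.standard_normal((n, 500))
nodes, boot = bootstrap_values(X)
for node, value in zip(nodes, boot):
    if len(node) >= 10:
        print(sorted(node), f"{value:.0f}%")
```

The root is always preserved (bootstrap value 100%); the small nodes inside each cluster, which only reflect noise, typically get much lower values.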

21 Example
Daily returns of 100 stocks traded at the NYSE in the time period 1/1995-12/1998 (T=1011).
[Figure: ALCA; bootstrap value distribution]

22 Node-factor reduction
Select a bootstrap value threshold $b_t$. For any node α with bootstrap value $b(\alpha) < b_t$, merge node α with its first ancestor $\alpha_q$ on the path to the root such that $b(\alpha_q) \ge b_t$. We do not choose the value of $b_t$ a priori; instead we infer the optimal value from the data in a self-consistent way (cf. Hillis and Bull, Syst. Biol. 1993).
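The merging rule for a fixed threshold can be sketched on a SciPy linkage matrix, where internal node k has index N + k and the root is node 2N − 2. The function maps every node to the node it is merged into (itself if kept). The self-consistent choice of $b_t$ is not reproduced here; the function name and the toy dendrogram are hypothetical.

```python
import numpy as np

def reduce_nodes(Z, boot, b_t):
    """Map each internal node (index n + k) to the node it is merged into.
    A node with bootstrap value < b_t is merged with its first ancestor whose
    value is >= b_t; the root is always kept."""
    n = Z.shape[0] + 1
    parent = {}
    for k, (a, b, _, _) in enumerate(Z):
        parent[int(a)] = parent[int(b)] = n + k
    root = 2 * n - 2
    target = {}
    for k in range(n - 1):
        anc = n + k
        while boot[anc - n] < b_t and anc != root:   # climb toward the root
            anc = parent[anc]
        target[n + k] = anc
    return target

# Toy dendrogram of 4 elements: node 4 = {0,1}, node 5 = {2,3}, node 6 = root
Z = np.array([[0, 1, 0.1, 2],
              [2, 3, 0.2, 2],
              [4, 5, 0.5, 4]], dtype=float)
boot = np.array([40.0, 90.0, 100.0])   # hypothetical bootstrap values (%)
print(reduce_nodes(Z, boot, b_t=50))
```

With $b_t = 50$, node 4 is merged into the root, while node 5 survives, leaving a 2-factor model.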

23 Empirical application: node reduction
Daily returns of 100 stocks traded at the NYSE in the time period 1/1995-12/1998 (T=1011).
[Figure: reduced dendrograms with 23, 19 and 9 nodes; the 23-node model]
E1 = oil well and services, E2 = oil and gas integrated, S1 = communication services, S2 = retail, H = major drugs, U = electric utilities

24 Meaning of factors in the HNFM
HNFM associated with the reduced dendrogram with 23 nodes: equations for stocks belonging to the Technology and Financial sectors.
[Figure: model equations; labels: Technology factor, Financial factor]

25 Comparing filtering procedures
A filtering procedure is a recipe to replace a sample correlation matrix with another one that is supposed to better describe the system. How can we compare different filtering procedures? A good filtering procedure should be able to
– remove the right amount of noise from the matrix to reveal the underlying model;
– be statistically robust to different realizations of the process.

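26 Kullback-Leibler distance
The Kullback-Leibler distance (mutual information) between two probability density functions p and q is
$$K(p,q) = E_p\!\left[\ln\frac{p}{q}\right].$$
For multivariate normally distributed random variables with zero mean and correlation matrices $\Sigma_1$ and $\Sigma_2$ we have
$$K(\Sigma_1,\Sigma_2) = \frac{1}{2}\left[\ln\frac{|\Sigma_2|}{|\Sigma_1|} + \mathrm{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right) - N\right].$$
Minimizing the Kullback-Leibler distance is equivalent to maximizing the likelihood in maximum likelihood factor analysis (MLFA). We propose to use the Kullback-Leibler distance to quantify the performance of different filtering procedures of the correlation matrix.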
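The Gaussian formula translates directly into NumPy. This sketch uses `slogdet` and `solve` rather than explicit determinants and inverses for numerical stability; the function name is hypothetical.

```python
import numpy as np

def kl_gaussian(S1, S2):
    """K(p, q) for p = N(0, S1) and q = N(0, S2):
    0.5 * [ln(det S2 / det S1) + tr(S2^{-1} S1) - N]."""
    N = S1.shape[0]
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    trace_term = np.trace(np.linalg.solve(S2, S1))   # tr(S2^{-1} S1)
    return 0.5 * (logdet2 - logdet1 + trace_term - N)

I2 = np.eye(2)
print(kl_gaussian(I2, 2 * I2), kl_gaussian(2 * I2, I2))   # not symmetric
```

The distance vanishes only when the two matrices coincide, and it is not symmetric in its arguments, which is why the order of $\Sigma$ and the sample matrices matters in the expectations that follow.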

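27 By applying the theory of Wishart matrices it is possible to show that
$$E[K(\Sigma, S_1)] = \frac{1}{2}\left[N\ln\frac{2}{T} + \sum_{p=T-N+1}^{T}\psi\!\left(\frac{p}{2}\right)\right] + \frac{N(N+1)}{2(T-N-1)},$$
$$E[K(S_1, \Sigma)] = \frac{1}{2}\left[N\ln\frac{T}{2} - \sum_{p=T-N+1}^{T}\psi\!\left(\frac{p}{2}\right)\right],$$
$$E[K(S_1, S_2)] = \frac{N(N+1)}{2(T-N-1)},$$
where ψ is the digamma function, Σ is the model correlation matrix of the system, and $S_1$ and $S_2$ are two sample correlation matrices obtained from two independent realizations, each of length T. The three expectation values are independent of Σ, i.e. they do not depend on the underlying model.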
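A Monte Carlo sanity check of these expectations. It assumes the sample matrices are Wishart, $S = X X^T/T$ with X drawn from $N(0, \Sigma)$ and a known zero mean, and that $T > N + 1$; for sample correlation matrices (estimated mean, unit diagonal) the formulas are only an approximation. The closed forms below are a derivation under that assumption, and Σ is an arbitrary hypothetical choice.

```python
import numpy as np
from scipy.special import digamma

def expected_kl(N, T):
    """E[K(Sigma,S1)], E[K(S1,Sigma)], E[K(S1,S2)] for Wishart S = X X^T / T,
    X of shape (N, T) drawn from N(0, Sigma), T > N + 1. None depends on Sigma."""
    psi_sum = digamma(np.arange(T - N + 1, T + 1) / 2.0).sum()
    e_s1_s2 = N * (N + 1) / (2.0 * (T - N - 1))
    e_sigma_s1 = 0.5 * (N * np.log(2.0 / T) + psi_sum) + e_s1_s2
    e_s1_sigma = 0.5 * (N * np.log(T / 2.0) - psi_sum)
    return e_sigma_s1, e_s1_sigma, e_s1_s2

def kl_gaussian(S1, S2):
    """0.5 * [ln(det S2 / det S1) + tr(S2^{-1} S1) - N]."""
    N = S1.shape[0]
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    return 0.5 * (logdet2 - logdet1 + np.trace(np.linalg.solve(S2, S1)) - N)

rng = np.random.default_rng(0)
N, T, reps = 5, 50, 2000
Sigma = 0.3 * np.ones((N, N)) + 0.7 * np.eye(N)   # arbitrary model matrix
L = np.linalg.cholesky(Sigma)
mc = np.zeros(3)
for _ in range(reps):
    X1 = L @ rng.standard_normal((N, T))
    X2 = L @ rng.standard_normal((N, T))
    S1, S2 = X1 @ X1.T / T, X2 @ X2.T / T
    mc += [kl_gaussian(Sigma, S1), kl_gaussian(S1, Sigma), kl_gaussian(S1, S2)]
mc /= reps
theory = np.array(expected_kl(N, T))
print(np.round(mc, 3), np.round(theory, 3))
```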

28 Filtered correlation matrices
We consider two filtered correlation matrices, both obtained by comparing the eigenvalues of the empirical correlation matrix with the expectations of Random Matrix Theory. We also consider two filtered correlation matrices obtained by applying the ALCA and the SLCA, respectively, to the empirical correlation matrix.

29 Filtered correlation matrix (1)
M. Potters, J.-P. Bouchaud & L. Laloux, Acta Phys. Pol. B 36 (9), pp. 2767-2784 (2005).
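A sketch of eigenvalue clipping, the RMT filter commonly associated with this line of work: eigenvalues below the random-matrix edge $\lambda_{\max}$ (computed here with $\sigma^2 = 1$) are replaced by their average, which preserves the trace, and the matrix is rebuilt from the original eigenvectors. Details may differ from the cited reference; in particular, no diagonal renormalization is applied, so the diagonal is only approximately 1. Function name and data are hypothetical.

```python
import numpy as np

def clipped_correlation(X):
    """Eigenvalue clipping (sketch): eigenvalues of the sample correlation matrix of X
    (shape N x T) below the Marchenko-Pastur edge are replaced by their average."""
    N, T = X.shape
    lam, V = np.linalg.eigh(np.corrcoef(X))
    lam_max = (1 + np.sqrt(N / T)) ** 2
    noise = lam < lam_max
    lam_f = lam.copy()
    if noise.any():
        lam_f[noise] = lam[noise].mean()   # flatten the noise band, keep the trace
    return (V * lam_f) @ V.T

# Hypothetical one-factor data
rng = np.random.default_rng(11)
N, T = 50, 500
X = 0.5 * rng.standard_normal(T) + np.sqrt(0.75) * rng.standard_normal((N, T))
C_f = clipped_correlation(X)
print(np.round(np.sort(np.linalg.eigvalsh(C_f))[-3:], 3))   # one large eigenvalue, flat bulk
```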

30 Filtered correlation matrix (2)
B. Rosenow, V. Plerou, P. Gopikrishnan & H.E. Stanley, Europhys. Lett. 59 (4), pp. 500-506 (2002)
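A sketch of the second RMT-based filter as it is usually described: eigenvalues inside the noise band are set to zero, the matrix is rebuilt from the eigenvalues above $\lambda_{\max}$ only, and the diagonal is then reset to 1. This coincides with the PCA factor model of slide 7 with k chosen by Random Matrix Theory. It is a reading of the method, not a verified reproduction of the cited paper; function name and data are hypothetical.

```python
import numpy as np

def rmt_factor_correlation(X):
    """Keep only the eigenvalues above the Marchenko-Pastur edge (sigma^2 = 1),
    rebuild the matrix from them, and reset the diagonal to 1."""
    N, T = X.shape
    lam, V = np.linalg.eigh(np.corrcoef(X))
    lam_max = (1 + np.sqrt(N / T)) ** 2
    keep = lam > lam_max
    C_f = (V[:, keep] * lam[keep]) @ V[:, keep].T
    np.fill_diagonal(C_f, 1.0)
    return C_f, int(keep.sum())

# Hypothetical one-factor data with true off-diagonal correlation 0.25
rng = np.random.default_rng(13)
N, T = 100, 1000
X = 0.5 * rng.standard_normal(T) + np.sqrt(0.75) * rng.standard_normal((N, T))
C_f, k = rmt_factor_correlation(X)
print(k, round(C_f[np.triu_indices(N, 1)].mean(), 3))
```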

31 Comparison of filtered correlation matrices
Block diagonal model with 12 factors, N=100, T=748, Gaussian random variables.

32 Comparison of filtered correlation matrices
Block diagonal model with 12 factors, N=100, T=748, Gaussian random variables.

33 Comparison of filtered correlation matrices

34 Conclusions
– It is possible to associate a time series factor model with a dendrogram, the output of a hierarchical clustering algorithm.
– The robustness of the factors with respect to statistical uncertainty can be determined by using the bootstrap technique.
– The Kullback-Leibler distance allows one to compare the characteristics of different filtering procedures, also taking into account the noise due to the finite length of the time series.
This suggests the existence of a tradeoff between information and stability.

