1 Hierarchically nested factor models. Michele Tumminello, Fabrizio Lillo, R.N. Mantegna, Observatory of Complex Systems, University of Palermo (Italy). Rome, June 20, 2007. EPL 78 (2007) 30006.

2 Motivation. In many systems the dynamics of N elements is monitored by sampling each time series with T observations. One way of quantifying the interaction between elements is the correlation matrix. Since there are T·N observations and the number of parameters in the correlation matrix is O(N^2), the estimated correlation matrix unavoidably suffers from statistical uncertainty due to the finiteness of the sample.

3 Questions. How can one process (filter) the correlation matrix in order to get a statistically reliable correlation matrix? How can one build a time series factor model which describes the dynamics of the system? How can one compare the characteristics of different correlation matrix filtering procedures?

4 A real example. As an example we consider the time series of price returns of a set of stocks traded in a financial market (the slide shows ln P(t) for the stocks). The similarity measure between stocks i and j is the correlation coefficient ρ_ij, and the correlation matrix is C = (ρ_ij).

5 Factor models. Factor models are simple and widespread models of multivariate time series. A general multifactor model for N variables x_i(t) is x_i(t) = Σ_{j=1..K} γ_ij f_j(t) + ε_i(t), where γ_ij is a constant describing the weight of factor j in explaining the dynamics of the variable x_i(t), the number of factors is K and they are described by the time series f_j(t), and ε_i(t) is a (Gaussian) zero-mean noise with unit variance.
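As an illustration, a minimal Python sketch of such a multifactor model, assuming Gaussian factors and arbitrary loadings (the values of N, K, T and the loadings below are made up for the example, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 100, 3, 1000                  # variables, factors, observations (illustrative)

gamma = 0.5 * rng.normal(size=(N, K))   # loadings gamma_ij (arbitrary values)
f = rng.normal(size=(K, T))             # factor time series f_j(t)
eps = rng.normal(size=(N, T))           # idiosyncratic zero-mean, unit-variance noise

x = gamma @ f + eps                     # x_i(t) = sum_j gamma_ij f_j(t) + eps_i(t)
C = np.corrcoef(x)                      # N x N sample correlation matrix
```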

6 Factor models: examples. Multifactor models have been introduced to model a set of asset prices, generalizing the CAPM to x(t) = a + B f(t) + ε(t), where B is an (N×K) matrix of loadings and f(t) is a (K×1) vector of factors. The factors can be selected either on theoretical grounds (e.g. interest rates for bonds, inflation, industrial production growth, oil price, etc.) or on statistical grounds (i.e. by applying factor analysis methods, etc.). Examples of multifactor models are the Arbitrage Pricing Theory (Ross 1976) and the Intertemporal CAPM (Merton 1973).

7 Factor models and Principal Component Analysis (PCA). A factor is associated to each relevant eigenvalue-eigenvector pair: x_i(t) = Σ_{h=1..K} √λ_h v_h(i) f_h(t) + ε_i(t), where λ_h is the h-th eigenvalue of C, v_h(i) is the i-th component of the h-th eigenvector of C, f_h(t) is the h-th factor, ε_i(t) is the idiosyncratic term, and K is the number of relevant eigenvalues. How many eigenvalues should be included?
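One common way to turn the leading eigenpairs into a filtered correlation matrix is sketched below; this is a generic recipe, not necessarily the exact prescription used in the slides:

```python
import numpy as np

def pca_filtered_correlation(C, k):
    """Keep the k largest eigenvalue/eigenvector pairs of the correlation
    matrix C, sum lambda_h * v_h v_h^T over them, and restore the unit
    diagonal; the restored diagonal plays the role of the idiosyncratic term."""
    eigval, eigvec = np.linalg.eigh(C)      # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:k]      # indices of the k largest eigenvalues
    V, L = eigvec[:, idx], eigval[idx]
    C_k = (V * L) @ V.T                     # sum_h lambda_h v_h v_h^T
    np.fill_diagonal(C_k, 1.0)
    return C_k
```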

8 Random Matrix Theory. The idea is to compare the properties of an empirical correlation matrix C with the null hypothesis of a random matrix. For N, T → ∞ with Q = T/N fixed, the density of eigenvalues of a random correlation matrix is ρ(λ) = Q/(2πσ²) √((λ_max − λ)(λ − λ_min))/λ for λ_min ≤ λ ≤ λ_max, with λ_max/min = σ²(1 ± 1/√Q)² and σ² = 1 for standardized variables.

9 Random Matrix Theory. L. Laloux et al., PRL 83, 1468 (1999). Random Matrix Theory helps to select the relevant eigenvalues: eigenvalues of the empirical correlation matrix lying above λ_max cannot be explained by the random null hypothesis.
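A sketch of this selection rule (unit variance assumed):

```python
import numpy as np

def eigenvalues_above_rmt_bulk(C, T):
    """Return the eigenvalues of the sample correlation matrix C that exceed
    the Marchenko-Pastur upper edge lambda_max = (1 + 1/sqrt(Q))^2, Q = T/N."""
    N = C.shape[0]
    Q = T / N
    lam_max = (1.0 + 1.0 / np.sqrt(Q)) ** 2
    eigval = np.linalg.eigvalsh(C)
    return eigval[eigval > lam_max]
```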

10 A simple (hierarchical) model. The slide shows the correlation matrix C of a simple hierarchical model: a nested block structure.
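A toy construction of such a nested block correlation matrix is sketched below; the block sizes and correlation levels are invented for illustration and are not the values used in the slides:

```python
import numpy as np

def nested_block_correlation(sizes=(30, 30, 40), rho_common=0.1, rho_block=0.4):
    """Hierarchical toy model: every pair of elements shares a weak common
    correlation rho_common, and pairs inside the same block get an extra
    rho_block on top of it (all parameter values are illustrative)."""
    N = sum(sizes)
    C = np.full((N, N), rho_common)
    start = 0
    for n in sizes:
        C[start:start + n, start:start + n] = rho_common + rho_block
        start += n
    np.fill_diagonal(C, 1.0)
    return C
```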

11 Spectral Analysis. The spectrum of C has 2 large eigenvalues and 2 corresponding structured eigenvectors. PCA is not able to reconstruct the true model and/or to give insights about its hierarchical features.

12 Hierarchical organization. Many natural and artificial systems are intrinsically organized in a hierarchical structure. This means that the elements of the system can be partitioned into clusters, which in turn can be partitioned into subclusters, and so on up to a certain level. –How is it possible to detect the hierarchical structure of the system? –How is it possible to model the time series dynamics of the system?

13 Clustering algorithms. The natural answer to the first question is the use of clustering algorithms. »Clustering algorithms are data analysis techniques that allow one to extract a hierarchical partitioning of the data. »We are mainly interested in hierarchical clustering methods, which allow one to display the hierarchical structure by means of a dendrogram. »We focus our attention on two widely used clustering methods: -) the single linkage cluster analysis (SLCA); -) the average linkage cluster analysis (ALCA).
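Both methods are available in SciPy; a minimal sketch applying them to a correlation matrix through the usual distance d_ij = √(2 (1 − ρ_ij)) is shown below:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

def correlation_linkage(C, method="average"):
    """Hierarchical clustering of a correlation matrix C.
    method='average' gives the ALCA, method='single' the SLCA."""
    D = np.sqrt(2.0 * np.clip(1.0 - C, 0.0, None))   # correlation -> distance
    np.fill_diagonal(D, 0.0)
    return linkage(squareform(D, checks=False), method=method)

# The resulting linkage matrix Z can be drawn with dendrogram(Z).
```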

14 How is it possible to extract a time series model for the stocks which takes into account the structure of the dendrogram? Daily returns of 100 stocks traded at NYSE in the time period 1/ /1998 (T = 1011). The slide shows the ALCA dendrogram, with the sectors Energy, Technology, Financial, Healthcare, Basic Materials, Services and Utilities.

15 Hierarchical clustering approach. Dendrograms obtained by hierarchical clustering are naturally associated with a correlation matrix C^< given by ρ^<_ij = ρ(α_{i∨j}), where α_{i∨j} is the first node where elements i and j merge together and ρ(α) is the correlation level associated with node α. We propose to use as a model of the system the factor model whose correlation matrix is C^<. The motivations are: the hierarchical structure is revealed by the dendrogram; the algorithm often filters robust information of the time series.
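One possible implementation of C^< uses the cophenetic distances of the dendrogram and maps them back to correlations; this is a sketch of that idea, not necessarily the authors' exact construction:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

def hierarchically_filtered_correlation(C, method="average"):
    """C^<: entry (i, j) is the correlation level of the first node of the
    dendrogram where elements i and j merge, obtained here by converting the
    cophenetic distances back through rho = 1 - d^2 / 2."""
    D = np.sqrt(2.0 * np.clip(1.0 - C, 0.0, None))
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method=method)
    C_filtered = 1.0 - squareform(cophenet(Z)) ** 2 / 2.0
    np.fill_diagonal(C_filtered, 1.0)
    return C_filtered
```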

16 Hierarchical Clustering (HC). The application of both the ALCA and the SLCA to C reveals the hierarchical structure of the model. Is it possible to recover the 3-factor model starting from such a dendrogram?

17 Hierarchically Nested Factor Model (HNFM). A factor is associated to each node α_h of the dendrogram: x_i(t) = Σ_h γ_{i,h} f_h(t) + ε_i(t), where the sum runs over the nodes on the path connecting the leaf i to the root, f_h(t) is the factor of node α_h, and ε_i(t) is the idiosyncratic term. By construction the model explains the filtered correlation matrix C^<.

18 We have shown that it is possible to associate a factor model to a dendrogram. If the system has a hierarchical structure and the clustering algorithm is able to detect it, it is likely that the factor model describes the hierarchical features of the system. If the system has N elements, the factor model has N factors. –How is it possible to reduce the dimensionality of the model? –Principal Component Analysis prescribes to use the k largest eigenvalues (and the corresponding eigenvectors) to build a k-factor model.

19 Statistical uncertainty and necessity of node reduction. The dendrogram of the model has 3 nodes (factors), while the dendrogram obtained from a realization of finite length has 99 nodes (factors).

20 Bootstrap procedure. HC is applied to the data set; the result is the dendrogram D. Surrogate data matrices are generated by bootstrapping the observations, and HC is applied to each of them, giving the set of surrogate dendrograms. For each node α of D, the bootstrap value is computed as the percentage of surrogate dendrograms in which the node α is preserved. A node is preserved in the bootstrap if it identifies a branch composed of the same elements as in the real data dendrogram.
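A sketch of the bootstrap evaluation of node reliability; resampling of time points and the use of 100 replicas are standard choices assumed here, not details given in the slide:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def _node_leaf_sets(Z, N):
    """Set of leaves below each internal node of a SciPy linkage matrix Z."""
    sets = {i: frozenset([i]) for i in range(N)}
    for k, row in enumerate(Z):
        sets[N + k] = sets[int(row[0])] | sets[int(row[1])]
    return {s for key, s in sets.items() if key >= N}

def bootstrap_values(x, n_boot=100, method="average", seed=0):
    """For each node of the real-data dendrogram, the fraction of surrogate
    dendrograms (time points resampled with replacement) containing a node
    with exactly the same set of elements. x has shape (N, T)."""
    rng = np.random.default_rng(seed)
    N, T = x.shape

    def nodes_of(data):
        C = np.corrcoef(data)
        D = np.sqrt(2.0 * np.clip(1.0 - C, 0.0, None))
        np.fill_diagonal(D, 0.0)
        Z = linkage(squareform(D, checks=False), method=method)
        return _node_leaf_sets(Z, N)

    real_nodes = nodes_of(x)
    counts = {node: 0 for node in real_nodes}
    for _ in range(n_boot):
        surrogate_nodes = nodes_of(x[:, rng.integers(0, T, size=T)])
        for node in real_nodes:
            counts[node] += node in surrogate_nodes
    return {node: c / n_boot for node, c in counts.items()}
```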

21 Example. Daily returns of 100 stocks traded at NYSE in the time period 1/ /1998 (T = 1011). The slide shows the ALCA dendrogram with bootstrap values and the distribution of bootstrap values.

22 Node-factor reduction. Select a bootstrap value threshold b_t. For any node α with bootstrap value b(α) < b_t, merge the node α with its first ancestor α_q in the path to the root such that b(α_q) ≥ b_t. We do not choose a priori the value of b_t, but we infer the optimal value from the data in a self-consistent way (cf. Hillis and Bull, Syst. Biol. 1993).
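A sketch of the merging step, assuming a simple tree representation in which parent[n] is the parent of node n (None for the root) and boot[n] its bootstrap value; both names are purely illustrative:

```python
def reduce_nodes(parent, boot, threshold):
    """Map each node to the node it survives as: a node whose bootstrap value
    is below `threshold` is merged into its first ancestor on the path to the
    root whose bootstrap value is at least `threshold` (the root is never removed)."""
    merged_into = {}
    for node in boot:
        target = node
        while parent.get(target) is not None and boot[target] < threshold:
            target = parent[target]
        merged_into[node] = target
    return merged_into
```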

23 Empirical application: node reduction. Daily returns of 100 stocks traded at NYSE in the time period 1/ /1998 (T = 1011). The node reduction leads to a 23-node model. E1 = oil well and services, E2 = oil and gas integrated, S1 = communication services, S2 = retail, H = major drugs, U = electric utilities.

24 Meaning of factors in the HNFM. HNFM associated with the reduced dendrogram with 23 nodes. The slide shows the equations for stocks belonging to the Technology and Financial sectors, highlighting the Technology factor and the Financial factor.

25 Comparing filtering procedures. A filtering procedure is a recipe to replace a sample correlation matrix with another one which is supposed to better describe the system. How can we compare different filtering procedures? A good filtering procedure should be able to –remove the right amount of noise from the matrix to reveal the underlying model, –be statistically robust to different realizations of the process.

26 Kullback-Leibler distance. The Kullback-Leibler distance between two pdfs p and q is K(p, q) = E_p[log(p/q)]; the mutual information is a special case of this quantity. For multivariate normally distributed random variables with correlation matrices A and B we have K(A, B) = (1/2) [log(det B / det A) + tr(B⁻¹A) − N]. Minimizing the Kullback-Leibler distance is equivalent to maximizing the likelihood in maximum likelihood factor analysis (MLFA). We propose to use the Kullback-Leibler distance to quantify the performance of different filtering procedures of the correlation matrix.
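The Gaussian form of the distance translates directly into a few lines of NumPy; a minimal sketch:

```python
import numpy as np

def kl_distance(A, B):
    """Kullback-Leibler distance between two zero-mean multivariate normals
    with correlation matrices A and B:
    K(A, B) = 0.5 * (log det B - log det A + tr(B^-1 A) - N)."""
    N = A.shape[0]
    logdet_a = np.linalg.slogdet(A)[1]
    logdet_b = np.linalg.slogdet(B)[1]
    return 0.5 * (logdet_b - logdet_a + np.trace(np.linalg.solve(B, A)) - N)
```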

27 By applying the theory of Wishart matrices it is possible to compute the expectation values E[K(Σ, S_1)], E[K(S_1, Σ)] and E[K(S_1, S_2)], where Σ is the model correlation matrix of the system while S_1 and S_2 are two sample correlation matrices obtained from two independent realizations, each of length T. The three expectation values are independent of Σ, i.e. they do not depend on the underlying model.

28 Filtered correlation matrices. We consider two filtered correlation matrices obtained by comparing the eigenvalues of the empirical correlation matrix with the expectations of Random Matrix Theory, and two filtered correlation matrices obtained by applying the ALCA and the SLCA, respectively, to the empirical correlation matrix.

29 Filtered correlation matrix (1). M. Potters, J.-P. Bouchaud & L. Laloux, Acta Phys. Pol. B 36 (9) (2005).

30 Filtered correlation matrix (2). B. Rosenow, V. Plerou, P. Gopikrishnan & H.E. Stanley, Europhys. Lett. 59 (4) (2002).

31 Comparison of filtered correlation matrices. Block diagonal model with 12 factors, N = 100, T = 748, Gaussian random variables.

32 Comparison of filtered correlation matrices. Block diagonal model with 12 factors, N = 100, T = 748, Gaussian random variables.
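An end-to-end sketch of this kind of comparison, combining the pieces from the earlier sketches (block-diagonal model, ALCA filtering via cophenetic distances, Gaussian Kullback-Leibler distance); the intra-block correlation value 0.3 is an illustrative assumption, not a value given in the slides:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
N, T, n_blocks = 100, 748, 12

# Block-diagonal model correlation matrix Sigma with 12 equicorrelated blocks.
sizes = np.full(n_blocks, N // n_blocks)
sizes[: N % n_blocks] += 1
Sigma = np.zeros((N, N))
start = 0
for n in sizes:
    Sigma[start:start + n, start:start + n] = 0.3
    start += n
np.fill_diagonal(Sigma, 1.0)

# One Gaussian realization of length T and its sample correlation matrix.
x = np.linalg.cholesky(Sigma) @ rng.normal(size=(N, T))
S = np.corrcoef(x)

# ALCA-filtered correlation matrix via cophenetic distances.
D = np.sqrt(2.0 * np.clip(1.0 - S, 0.0, None))
np.fill_diagonal(D, 0.0)
Z = linkage(squareform(D, checks=False), method="average")
S_alca = 1.0 - squareform(cophenet(Z)) ** 2 / 2.0
np.fill_diagonal(S_alca, 1.0)

def kl(A, B):
    """K(A, B) = 0.5 * (log det B - log det A + tr(B^-1 A) - N)."""
    return 0.5 * (np.linalg.slogdet(B)[1] - np.linalg.slogdet(A)[1]
                  + np.trace(np.linalg.solve(B, A)) - A.shape[0])

print("K(Sigma, S)      =", kl(Sigma, S))       # unfiltered sample matrix
print("K(Sigma, S_ALCA) =", kl(Sigma, S_alca))  # ALCA-filtered matrix
```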

33 Comparison of filtered correlation matrices

34 Conclusions. It is possible to associate a time series factor model to a dendrogram, the output of a hierarchical clustering algorithm. The robustness of the factors with respect to statistical uncertainty can be determined by using the bootstrap technique. The Kullback-Leibler distance makes it possible to compare the characteristics of different filtering procedures, taking into account also the noise due to the finiteness of the time series. This suggests the existence of a tradeoff between information and stability.