M. A. Miceli “SVD and LS” - Stats in the Château - August 31 - September
SVD and LS
M. A. Miceli, University of Rome I
Stats in the Château, Jouy-en-Josas, August 31 - September
Motivations
Problems of high dimensionality in estimation:
- Rank < actual dimension of the data sets (inverse problems)
- Thresholds for accepting variables ease on every dimension as the number of variables/dimensions increases (e.g. Wald test).
How the SVD helps in extracting robust correlations between dependent and independent variables: automatic choice of “model”. Why?
Some evidence in predicting US CPI indexes
Some issues about normalizations
Motivations
Given a simultaneous linear system of equations:
1. Collapsing the dimensionality of the system to its minimum rank = min [rank(Y), rank(X)].
2. Advantages of SVD w.r.t. Principal Components: PC requires a square matrix, e.g. the autocorrelation matrix, and ranks the dimensions within that single matrix; SVD ranks the correlations between the X and Y dimensions.
3. Discretionary possibility of getting rid of some dimensions believed negligible: we are interested in getting rid of those dimensions that can be generated by a totally random system of the same dimensions (Marchenko-Pastur conditions adapted to a rectangular matrix).
Definition of SVD of a matrix product
SVD definition: having two matrices X (T × M) and Y (T × N), one can write X’Y = Uxy Sxy Vxy’ and therefore (X Uxy)’ (Y Vxy) = Sxy.
If T << max(M,N)? No problems.
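A minimal numerical sketch of the SVD of the product X’Y, using NumPy. The sizes and the random data are illustrative assumptions, chosen with T smaller than max(M, N) to show that the decomposition is still well defined in that case.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, N = 5, 8, 3                     # hypothetical sizes with T < max(M, N)
X = rng.standard_normal((T, M))       # placeholder explanatory variables
Y = rng.standard_normal((T, N))       # placeholder dependent variables

C = X.T @ Y                           # M x N product X'Y
U, s, Vt = np.linalg.svd(C, full_matrices=False)   # C = U diag(s) Vt

# Singular values come back sorted in decreasing order; there are min(M, N) of them.
reconstruction = U @ np.diag(s) @ Vt

# Splitting the product: the rotated variables X U and Y V have a diagonal cross-product.
split = (X @ U).T @ (Y @ Vt.T)

print(U.shape, s.shape, Vt.shape)
print(np.allclose(reconstruction, C), np.allclose(split, np.diag(s)))
```

The second check is the "splitting" view: linear combinations of the X columns and of the Y columns whose cross-products are exactly the singular values.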
Diagonalizing the LS estimator
Consider regressing every column y of Y over the set of explanatory variables X: we write B = Inv(X’X) X’Y.
We diagonalize both matrices, (X’X) and (X’Y):
- X’X = Uxx Dxx Uxx’ (square)
- X’Y = Uxy Sxy Vxy’ (rectangular)
- NB. The SVD of a symmetric positive semi-definite matrix such as X’X IS the same as the diagonalisation.
We will write B = Uxx Inv(Dxx) Uxx’ Uxy Sxy Vxy’.
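The remark that SVD and diagonalisation coincide can be checked numerically. The sketch below uses random placeholder data and shows that the statement holds for a symmetric positive semi-definite matrix such as X’X, while a generic square matrix (here a rotation, chosen only as a counterexample) behaves differently.

```python
import numpy as np

rng = np.random.default_rng(4)
T, M = 50, 4                                  # hypothetical sizes
X = rng.standard_normal((T, M))
G = X.T @ X                                   # square, symmetric, positive semi-definite

eigvals = np.linalg.eigvalsh(G)[::-1]         # eigvalsh returns ascending order; reverse it
singvals = np.linalg.svd(G, compute_uv=False) # singular values, descending order
print(np.allclose(eigvals, singvals))

A = np.array([[0.0, 1.0], [-1.0, 0.0]])       # square but not symmetric (a 90-degree rotation)
sv_A = np.linalg.svd(A, compute_uv=False)     # both singular values equal 1
ev_A = np.linalg.eigvals(A)                   # eigenvalues are the complex pair +i, -i
print(sv_A, ev_A)
```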
SVD of the covariance matrix [diagram: (X’Y) decomposed into Uxy, Sxy (with a zero block) and Vxy]
SVD mapping from column basis to row basis [diagram: X’Y, Vxy, Uxy, Sxy, 0]
SVD: splitting the product X’Y [diagram: Y Vxy = Y linear combinations; X Uxy = X linear combinations; Sxy]
Adding diagonalisation of both X and Y matrices
Returning to the original variables [diagram: Y, X, Uxx, Uxy, Inv(Dxx), Sxy, Vxy’, Vyy’]
Replacing the old “B”: any advantage?
We may cancel factors: any criterion?
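One way to read "cancelling factors" is to keep only the k largest singular directions of X’Y when rebuilding the coefficient matrix. The sketch below is an assumption about how that could be done, not the author's code: it uses synthetic data, diagonalises X’X only (the Vyy step of the diagram is left out), and the helper name `ls_from_factors` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, M, N = 100, 6, 3                      # hypothetical sizes
X = rng.standard_normal((T, M))          # synthetic regressors
Y = X @ rng.standard_normal((M, N)) + 0.1 * rng.standard_normal((T, N))

# Diagonalise the symmetric matrix X'X and take the SVD of the rectangular X'Y.
d_xx, U_xx = np.linalg.eigh(X.T @ X)
U_xy, s_xy, Vt_xy = np.linalg.svd(X.T @ Y, full_matrices=False)

def ls_from_factors(k):
    """Coefficients rebuilt from the k largest singular directions of X'Y."""
    xy_k = U_xy[:, :k] @ np.diag(s_xy[:k]) @ Vt_xy[:k, :]
    return U_xx @ np.diag(1.0 / d_xx) @ U_xx.T @ xy_k

B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]   # ordinary LS for comparison
B_full = ls_from_factors(len(s_xy))            # all factors kept
B_two = ls_from_factors(2)                     # hypothetical choice: keep 2 factors

print(np.allclose(B_full, B_ols))
print(B_two.shape)
```

With all factors kept the result coincides with ordinary LS, so any gain has to come from the choice of k, which is the question the slide raises.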
RMT
1. The Marchenko-Pastur conditions give the singular value density and interval limits for square matrices. Bouchaud, Miceli et al (2005) derive them for rectangular matrices.
2. We run exactly the same experiment with purely randomly generated matrices “many times”: the limits and densities replicate the theory.
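The random-matrix experiment can be sketched as a small Monte Carlo loop. Everything specific here is an assumption made for illustration: i.i.d. standard normal entries, the 1/T scaling of X’Y, the sizes, and the number of draws. The analytic limits themselves are not reproduced.

```python
import numpy as np

def random_singular_values(T, M, N, n_draws, seed=0):
    """Singular values of X'Y / T for independent Gaussian X (T x M) and Y (T x N)."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_draws, min(M, N)))
    for i in range(n_draws):
        X = rng.standard_normal((T, M))
        Y = rng.standard_normal((T, N))
        out[i] = np.linalg.svd(X.T @ Y / T, compute_uv=False)
    return out

# Hypothetical sizes; each row holds one draw's singular values in decreasing order.
sv = random_singular_values(T=100, M=20, N=5, n_draws=200)
lower, upper = sv.min(), sv.max()     # empirical band spanned by pure noise
print(sv.shape, lower, upper)
```

Singular values estimated on real data that fall inside the empirical band [lower, upper] are the ones a purely random system of the same dimensions could have produced.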
Marchenko-Pastur limits and density
RMT
1. Density and limits do change depending on whether we use raw or already diagonalized data.
2. Is this “double diagonalization” worthwhile? Singular values are homogeneous of degree 0 (HD0) in standardization; eigenvectors are NOT.
Diagonalized “LS estimator”
We may approach the same problem in different ways:
1. raw data
2. normalized factors
3. non-normalized factors
“Unfortunately” 3. works best. Why? Is it because factor normalization changes the ranking of the SVD singular values, and this eventually affects the factor selection? NO! Answer at the end. Very disturbing.
Example: Forecasting US CPI Indexes
Time series are month-on-month (mom) % changes:
- Y := 9 CPI indexes, Aug 83 - Apr 07
- X := 77 macroeconomic series, Nov 83 - Apr 07, including 3 lags of the Ys
T = 282, N = 9, M = 77, rolling window W = 100 (or other lengths). n = N/W, m = M/W (n = 0.09, m = 0.77 for W = 100).
CPIs
Xs
Estimation by Model III
Singular values: Model I, randomly generated DATA
Singular values for SVD on raw and random DATA
Estimation by Model II
Factors are divided by their own eigenvalue
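A minimal sketch of the factor normalization, assuming the factors are the principal components of X’X / T and using synthetic data. Two variants are shown because the exact convention is not recoverable from the slide: the literal reading (divide each factor by its eigenvalue) and the common alternative of dividing by the square root of the eigenvalue, which yields unit-variance factors.

```python
import numpy as np

rng = np.random.default_rng(3)
T, M = 100, 5                                    # hypothetical sizes
X = rng.standard_normal((T, M)) * np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = X - X.mean(axis=0)                           # centre each synthetic series

eigvals, eigvecs = np.linalg.eigh(X.T @ X / T)   # diagonalise the covariance matrix
factors = X @ eigvecs                            # non-normalized factors

factors_by_eigval = factors / eigvals            # literal reading of the slide
factors_unit_var = factors / np.sqrt(eigvals)    # assumption: unit-variance alternative

cov_raw = factors.T @ factors / T                # diagonal, equal to the eigenvalues
cov_unit = factors_unit_var.T @ factors_unit_var / T   # identity matrix
print(np.allclose(cov_raw, np.diag(eigvals)), np.allclose(cov_unit, np.eye(M)))
```

Non-normalized factors keep the variance ranking of the original series, while unit-variance factors put every direction on the same scale before the SVD step.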
Singular values: Model II, NORMALIZED FACTORS from data (lambda max = …, lambda min = 0.608)
Singular values: Model II, randomly generated NORMALIZED FACTORS (lambda max = …, lambda min = 0.608). Randomly generated singular values don’t look very different …
Singular values for SVD on raw and random FACTORS
Let’s see estimations by Model III
P&L Model III - Factors on raw data
P&L Model III - CPI Indexes (model of non-normalized factors), in sample [panels: with ALL SVD factors; with 2 SVD factors]
Let’s see estimations by Model II (normalized factors)
P&L Model II (normalized factors) - Factors
P&L Model II (normalized factors) - CPIs
Example of CPI_comdty estimation [panels: normalized factors; non-normalized factors]
OUT OF SAMPLE
Estimation on t = 1, …, 120
Forecast at fixed coefficients for t = 121, …, 282
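The out-of-sample design can be sketched as a fixed split: coefficients are estimated once on the first 120 observations and then frozen. The data below are synthetic placeholders, M and N are reduced for brevity, and plain LS stands in for the SVD-based estimator, so this illustrates the protocol only.

```python
import numpy as np

rng = np.random.default_rng(2)
T, T_fit = 282, 120                      # sample length and split point from the slide
M, N = 5, 2                              # reduced, hypothetical dimensions
X = rng.standard_normal((T, M))          # synthetic placeholder regressors
Y = X @ rng.standard_normal((M, N)) + rng.standard_normal((T, N))

X_fit, Y_fit = X[:T_fit], Y[:T_fit]      # estimation sample, t = 1, ..., 120
X_out, Y_out = X[T_fit:], Y[T_fit:]      # forecast sample, t = 121, ..., 282

B = np.linalg.lstsq(X_fit, Y_fit, rcond=None)[0]   # estimated once, never updated
Y_hat = X_out @ B                                  # forecasts at fixed coefficients
rmse = np.sqrt(np.mean((Y_out - Y_hat) ** 2, axis=0))
print(Y_hat.shape, rmse)
```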
P&L: Factors (Model II)
Forecast on CPIs [panels: all factors; 2 factors only]. Easier to predict: 1. medical care (since stable), 2. commodities (oil), 3. transports
Forecasts on CPI Comdty
Conclusions 1
Conclusions on the example