Statistical Arbitrage. Ying Chen, Leonardo Bachega, Yandong Guo, Xing Liu. February 2010
Outline: Overview of the project; Improvements in the last week (speed up the data access; improve the PCA algorithm; use adjusted prices in the PNL calculation; take trading volume into account); Future work
Framework (flowchart): Raw historical data from WRDS; adjusted stock price series + indices; market model (PCA eigenportfolios / ETFs for industry sectors; 60-day and 252-day returns); residual process model (residuals as increments of an AR process); current stock prices; compute s-scores; signal trade orders. Data pre-processing: Python scripts. Back-testing simulations: MATLAB scripts.
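The "residuals as increments of an AR process" and "compute s-scores" steps can be sketched as follows. This is a minimal illustration, not the project's MATLAB code: the function name `s_score`, the AR(1) form, and the definition of the s-score as the distance from the equilibrium mean in units of the equilibrium standard deviation are assumptions made for this example.

```python
import numpy as np

def s_score(residuals):
    """Hypothetical helper: s-score of the process whose increments are `residuals`.

    Assumes an AR(1) model x[t+1] = a + b * x[t] + noise on the cumulative
    residuals. Returns None when the fit shows no mean reversion.
    """
    x = np.cumsum(residuals)                      # residuals are the increments
    b, a = np.polyfit(x[:-1], x[1:], 1)           # slope first, then intercept
    if not 0.0 < b < 1.0:
        return None                               # not mean reverting
    noise = x[1:] - (a + b * x[:-1])              # AR(1) fit residuals
    mean_eq = a / (1.0 - b)                       # equilibrium level
    sigma_eq = np.sqrt(np.var(noise) / (1.0 - b ** 2))  # equilibrium std. dev.
    return (x[-1] - mean_eq) / sigma_eq
```

A trading rule would then compare the returned value with entry and exit thresholds; those thresholds are not specified in the slides and are left out here.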
Code Speedup. Data access tradeoff: always reading from disk is very slow; keeping everything in memory is not robust and can be slow; solution: cache parts of the dataset in memory. Before vs. after: fast code: same; total speedup > 16 times.
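One way to cache parts of the dataset in memory is a bounded least-recently-used cache in front of the disk reader. The sketch below is illustrative only: the class name `SeriesCache`, the `loader` callback, and the default capacity are hypothetical and do not come from the project's scripts.

```python
from collections import OrderedDict

class SeriesCache:
    """Hypothetical LRU cache: keeps at most `capacity` price series in memory."""

    def __init__(self, loader, capacity=256):
        self._loader = loader          # function ticker -> series (reads from disk)
        self._capacity = capacity
        self._store = OrderedDict()    # ordered from least to most recently used
        self.disk_reads = 0            # counter, useful for measuring the benefit

    def get(self, ticker):
        if ticker in self._store:
            self._store.move_to_end(ticker)    # mark as most recently used
            return self._store[ticker]
        series = self._loader(ticker)          # cache miss: go to disk
        self.disk_reads += 1
        self._store[ticker] = series
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)    # evict least recently used
        return series
```

The capacity bounds memory use, which addresses the robustness concern of holding everything in memory, while repeated reads of the same series avoid the disk.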
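PCA amelioration (1/4) Suppose X is an n x p matrix of n samples and p features. Original algorithm: compute the eigendecomposition of the correlation matrix, $X^{T}X = Q D Q^{T}$ (columns of X standardized; the constant factor 1/(n-1) is omitted). The columns of the matrix Q are the eigenvectors of the correlation matrix, and D is the diagonal matrix of eigenvalues.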
PCA amelioration (2/4) Suppose X is an n x p matrix of n samples and p features. Replacement algorithm: use the singular value decomposition (SVD), $X = U S V^{T}$, to get the eigenvectors. The columns of V are then the eigenvectors of the correlation matrix. This reduces the computational cost by around 80%.
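PCA amelioration (3/4) Proof: with the SVD $X = U S V^{T}$, $X^{T}X = V S^{T} U^{T} U S V^{T} = V (S^{T}S) V^{T}$. Since U and V are orthogonal, the columns of V are the eigenvectors of the correlation matrix, and $S^{T}S$ equals the diagonal matrix D.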
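The equivalence of the two routes can be checked numerically. The NumPy sketch below assumes the columns of the return matrix are standardized before either decomposition; the function names are hypothetical and the code is not the project's MATLAB implementation.

```python
import numpy as np

def standardize(returns):
    """Center each column and scale it to unit sample standard deviation."""
    centered = returns - returns.mean(axis=0)
    return centered / centered.std(axis=0, ddof=1)

def pca_eig(returns):
    """Eigenvalues and eigenvectors from the explicit p x p correlation matrix."""
    z = standardize(returns)
    corr = z.T @ z / (z.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    return eigvals[order], eigvecs[:, order]

def pca_svd(returns):
    """Same quantities from the SVD of the standardized data matrix."""
    z = standardize(returns)
    _, s, vt = np.linalg.svd(z, full_matrices=False)   # s is already descending
    eigvals = s ** 2 / (z.shape[0] - 1)         # squared singular values, rescaled
    return eigvals, vt.T                        # columns of V are the eigenvectors
```

When n < p (for example a one-year window over a large stock universe), the thin SVD returns only min(n, p) components, which are the ones with nonzero eigenvalues. Eigenvectors from the two routes agree only up to sign.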
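PCA amelioration (4/4) Notice: if p is an eigenvector of the correlation matrix C, then -p is also an eigenvector, since if $Cp = \lambda p$, then $C(-p) = \lambda(-p)$. The effect of the sign flip is removed by the estimation: the fitted coefficient on that eigenvector changes sign as well, so the fit itself is unchanged.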
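The claim that the sign of an eigenvector does not matter after estimation can be illustrated with an ordinary least-squares regression. This sketch assumes the estimation step is a regression of stock returns on factor returns with an intercept; the helper `regress` is hypothetical.

```python
import numpy as np

def regress(stock_returns, factor_returns):
    """OLS with intercept. Returns (coefficients, residuals)."""
    design = np.column_stack([np.ones(len(factor_returns)), factor_returns])
    coeffs, *_ = np.linalg.lstsq(design, stock_returns, rcond=None)
    residuals = stock_returns - design @ coeffs
    return coeffs, residuals

# Flipping the sign of the factors flips the slopes but not the residuals.
rng = np.random.default_rng(0)
factors = rng.standard_normal((60, 2))
stock = 0.5 * factors[:, 0] - 0.2 * factors[:, 1] + 0.01 * rng.standard_normal(60)
coeffs_pos, resid_pos = regress(stock, factors)
coeffs_neg, resid_neg = regress(stock, -factors)
```

Here `resid_pos` and `resid_neg` coincide up to floating-point error, while the slope coefficients have opposite signs.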
Experiment result (Fig. 1) Top 50 eigenvalues of the correlation matrix of market returns computed on May estimated using a 1-year window and a universe of 1590 stocks
Experiment result 2 Value of the first eigenvector
Experiment result 2 Value of the second eigenvector
Experiment result 2 Value of the third eigenvector
Preliminary PNL Experiment Dec Feb
After correction
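Taking trading volume into account. Problem: mean-reversion strategies are sensitive to the trading volume immediately before the signal is triggered. Modified returns: $\bar{R}_t = R_t \, \langle \delta V \rangle / \delta V_t$, where $R_t$ is the return on day t, $\delta V_t$ is the volume traded on day t, and $\langle \delta V \rangle$ is the average daily trading volume over a given trading window. Experiments: PCA/ETF with actual price vs. using trading volume.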
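A volume-modified return of this kind can be sketched as below, assuming the modification scales each daily return by the ratio of the average daily volume to that day's volume. The function name, the 10-day default window, and the choice of a trailing window that excludes the current day are assumptions for illustration.

```python
import numpy as np

def volume_adjusted_returns(returns, volumes, window=10):
    """Scale each daily return by (trailing average volume) / (that day's volume).

    The first `window` entries are NaN because no full trailing window exists.
    Days with zero volume would need separate handling (division by zero).
    """
    returns = np.asarray(returns, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    adjusted = np.full_like(returns, np.nan)
    for t in range(window, len(returns)):
        avg_volume = volumes[t - window:t].mean()   # trailing window, excludes day t
        adjusted[t] = returns[t] * avg_volume / volumes[t]
    return adjusted
```

With this scaling, a price move on unusually heavy volume is shrunk and a move on light volume is amplified, which corresponds to measuring returns in trading time instead of calendar time.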
Top 50 eigenvalues of the correlation matrix (trading time): Top 50 eigenvalues of the correlation matrix of market returns computed on May estimated using a 1-year window and a universe of 1590 stocks
Top 50 eigenvalues of the correlation matrix (calendar time): Top 50 eigenvalues of the correlation matrix of market returns computed on May estimated using a 1-year window and a universe of 1590 stocks
Value of the first eigenvector
Future work: Experiment on ETFs (associate each stock with one ETF; compare ETF with PCA). Take into account transaction fees, interest, and dividends. Calculate PCA using trading-time modified returns.
THANK YOU