Data-intensive Computing Case Study Area 2: Financial Engineering B. Ramamurthy 6/26/20151B. Ramamurthy & Abhishek Agarwal
Modern Portfolio Theory
Modern Portfolio Theory (MPT) is a theory of investment that tries to maximize return and minimize risk by analytically choosing among different financial assets. MPT was introduced by Harry Markowitz in 1952, and he later received the Nobel Prize in Economics. He is currently a professor of finance at the Rady School of Management at UC San Diego (83 years old). One of his influences was John von Neumann, the inventor of the stored-program computer.
The Big Picture
Stock market portfolio context: given an amount A, a set of stocks {w, x, y, z}, and the historical performance of those stocks, what percentage of the amount should be allocated to each stock so that returns are maximized and risks are minimized? Example: $10000 and stocks {C, F, Q, T}; what is the recommended split among these to get the best returns with the least risk? Some quantitative assumptions are made about return and risk. The above is my simple interpretation of the complex problem.
Reference
Application of Hadoop MapReduce to Modern Portfolio Theory, by Abhishek Agarwal and Bina Ramamurthy, paper submitted to ICSA. Also work by Ross Goddard (UB grad at Drexel), Mohit Vora and Neeraj Mahajan (Yahoo.com, alumni).
Markowitz Model
How is it data intensive?
– The number of assets in the financial world is quite large.
– Historical data for each of these assets will easily overwhelm traditional databases.
Would like real data on the volume of this problem. Any guess? We are currently working with assets.
MPT
The fundamental assumption of MPT is that the assets in an investment portfolio cannot be selected individually. We need to consider how a change in the value of every other asset can affect the given asset. Thus MPT is a mathematical formulation of diversification in investing, with the objective of selecting a collection of investment assets that collectively has lower risk than any individual asset. In theory this is possible because different types of assets change value in opposite directions, for example stocks vs. bonds or tech vs. commodities. Thus the assets in a collection mitigate each other’s risks.
Technical Details
An asset’s return is modeled as a normally distributed random variable. Risk is defined as the standard deviation of return. The return of a portfolio is a weighted combination of the returns of its assets. By combining different assets whose returns are not perfectly positively correlated, MPT reduces the total variance of the portfolio.
Expected Return and Variance
If $E(R_i)$ is the expected return on asset $i$ and $w_i$ is the weight of asset $i$, then the total expected return on the portfolio is
$$E(R_p) = \sum_i w_i \, E(R_i)$$
The portfolio return variance can be written as
$$\sigma_p^2 = \sum_i \sum_j w_i w_j \sigma_i \sigma_j \rho_{ij}$$
where $\sigma_i$ is the standard deviation of the return on asset $i$ and $\rho_{ij}$ is the correlation between assets $i$ and $j$. For $i = j$, $\rho_{ij} = 1$.
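To make the expected-return and variance formulas concrete, here is a minimal Python sketch. The two-asset numbers are hypothetical and chosen only for illustration; the correlation matrix has 1 on its diagonal (an asset is perfectly correlated with itself).

```python
from math import sqrt

def portfolio_return(weights, expected_returns):
    """E(R_p) = sum_i w_i * E(R_i)."""
    return sum(w * r for w, r in zip(weights, expected_returns))

def portfolio_variance(weights, sigmas, corr):
    """sigma_p^2 = sum_i sum_j w_i * w_j * sigma_i * sigma_j * rho_ij."""
    n = len(weights)
    return sum(
        weights[i] * weights[j] * sigmas[i] * sigmas[j] * corr[i][j]
        for i in range(n)
        for j in range(n)
    )

# Hypothetical inputs: two assets with negatively correlated returns.
weights = [0.5, 0.5]
expected_returns = [0.10, 0.06]
sigmas = [0.20, 0.10]          # per-asset risk (standard deviation)
corr = [[1.0, -0.5],
        [-0.5, 1.0]]           # rho_ii = 1 on the diagonal

ret = portfolio_return(weights, expected_returns)
var = portfolio_variance(weights, sigmas, corr)
risk = sqrt(var)
```

With these illustrative inputs the portfolio's standard deviation comes out below that of either asset, which is the diversification effect MPT relies on.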
MPT explained
Portfolio Combination
Compute the expected return and variance of each candidate portfolio and plot them on a graph. The hyperbola derived from the plot represents the efficient frontier. A portfolio on the efficient frontier is a combination offering the best possible expected return for a given risk level. Matrices are used to calculate the efficient frontier.
Efficient Frontier
In matrix form, for a given risk tolerance $q$, the efficient frontier is found by minimizing the expression
$$w^T \Sigma w - q \, R^T w$$
– $w$ is the vector of asset weights in the portfolio ($\sum_i w_i = 1$)
– $\Sigma$ is the covariance matrix
– $R$ is the vector of expected returns
– $q$ is the risk tolerance, $q \in [0, \infty)$; $q = 0$ gives the minimum-variance portfolio
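As a rough illustration of how the risk tolerance q moves a portfolio along the frontier, the sketch below evaluates the objective for two assets and finds the best weights by a simple grid search. The grid search, the function names, and the covariance and return numbers are assumptions made for this example; they are not the method used in the paper.

```python
def objective(w, cov, returns, q):
    """Value of w^T * Sigma * w - q * R^T * w for a weight vector w."""
    n = len(w)
    risk = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    reward = sum(returns[i] * w[i] for i in range(n))
    return risk - q * reward

def best_two_asset_weights(cov, returns, q, steps=1000):
    """Grid search over w = [a, 1 - a] with 0 <= a <= 1 (weights sum to 1)."""
    candidates = ([k / steps, 1 - k / steps] for k in range(steps + 1))
    return min(candidates, key=lambda w: objective(w, cov, returns, q))

# Hypothetical covariance matrix and expected returns for two assets.
cov = [[0.04, -0.01],
       [-0.01, 0.01]]
returns = [0.10, 0.06]

low_tolerance = best_two_asset_weights(cov, returns, q=0.0)   # minimum variance
high_tolerance = best_two_asset_weights(cov, returns, q=1.0)  # accepts more risk
```

Raising q shifts weight toward the asset with the higher expected return (and higher variance), tracing out different points on the frontier.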
What’s new?
Parallel processing using MapReduce. Covariance computation: running time goes from $O(n^2)$ to $O(n)$ when the per-asset computations run in parallel.
Covariance Matrix
We calculate how an asset varies in response to variations in every other asset. We use the means of the monthly returns of both assets for this purpose. This operation has to be done in turn for each asset. In a traditional environment this is done via nested loops, but the calculation is intrinsically parallel. Each mapper i can calculate how asset i varies in response to variations in all of the other assets. The input to each mapper is the current asset and the list of all assets. Its output is a vector containing the variations of that asset with respect to the other assets. The reducer just inserts the result of the map operation into the covariance matrix table. As all the mappers execute in parallel, this gives us a run time of O(n).
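The mapper/reducer split described here can be sketched in plain Python, without Hadoop. The function names and the monthly return figures are hypothetical, and the sample covariance (dividing by the number of observations minus one) is an assumption; the slides do not say which estimator is used.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance of two equally long return series (assumed estimator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def map_asset(asset, returns_by_asset):
    """Mapper i: emit (asset, row of covariances of that asset against every asset)."""
    row = {other: covariance(returns_by_asset[asset], series)
           for other, series in returns_by_asset.items()}
    return asset, row

def reduce_rows(mapped):
    """Reducer: insert each mapper's row into the covariance table."""
    return {asset: row for asset, row in mapped}

# Hypothetical monthly returns for three assets.
returns_by_asset = {
    "C": [0.01, 0.03, -0.02, 0.04],
    "F": [0.02, 0.01, 0.00, 0.03],
    "Q": [-0.01, 0.00, 0.02, -0.03],
}

# Each map_asset call is independent of the others, so the n calls could run
# on n mappers at once; here they simply run one after another.
cov_table = reduce_rows(map_asset(a, returns_by_asset) for a in returns_by_asset)
```

Each mapper does O(n) work for its own row, which is where the O(n) wall-clock figure comes from once all n mappers run concurrently.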
Inverse of the covariance matrix
1. Using first principles: $A^{-1} = \operatorname{adj}(A) \cdot \frac{1}{\det(A)}$. This method requires calculating the transpose, determinant, cofactors, adjoint, and upper-triangular form of a matrix. Some of these operations, like the transpose, can be easily implemented on Hadoop. Others, like the determinant and the upper-triangular form, have data dependencies and are therefore not very suitable for a Hadoop-like environment.
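For reference, a single-machine sketch of the first-principles inverse is shown below. It uses recursive cofactor expansion for the determinant, which is only practical for small matrices and makes the data dependencies mentioned above visible (every cofactor needs a determinant of a submatrix). Function names are illustrative.

```python
def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def determinant(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * determinant(minor(m, 0, j))
               for j in range(len(m)))

def adjugate(m):
    """Adjoint (adjugate): the transpose of the cofactor matrix."""
    n = len(m)
    if n == 1:
        return [[1.0]]
    cof = [[(-1) ** (i + j) * determinant(minor(m, i, j)) for j in range(n)]
           for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]

def inverse_by_adjugate(m):
    """A^-1 = adj(A) / det(A); raises if the matrix is singular."""
    det = determinant(m)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[x / det for x in row] for row in adjugate(m)]

inv = inverse_by_adjugate([[4, 7], [2, 6]])
```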
Inverse of the covariance matrix
2. Gauss-Jordan elimination: Gaussian elimination that puts zeroes both above and below each pivot element as it goes from the top row of the given matrix to the bottom. Row-reducing the augmented matrix yields the inverse: $A^{-1} [A \mid I] = [I \mid A^{-1}]$. Gauss-Jordan elimination has a runtime complexity of $O(n^3)$. For MapReduce it requires two sets in tandem, resulting in poor performance.
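A minimal sequential sketch of Gauss-Jordan inversion follows. It builds the augmented matrix [A | I] and reduces the left half to the identity; the partial pivoting step is an addition for numerical safety and is not mentioned in the slides.

```python
def inverse_gauss_jordan(a):
    """Invert a square matrix by reducing [A | I] to [I | A^-1]."""
    n = len(a)
    # Augment each row of A with the matching row of the identity matrix.
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting (assumption): pick the largest remaining pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot element becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Put zeroes above and below the pivot.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

inv = inverse_gauss_jordan([[4, 7], [2, 6]])
```

Each pivot step depends on the result of the previous one, which is why this method does not map cleanly onto independent mappers.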
Inverse of a square covariance matrix
The third approach is to use singular value decomposition (SVD). We use this approach for our implementation. By the singular value decomposition theorem, the matrix $A$ can be written as $A = V \Sigma U^T$, where $U$ and $V$ are unitary matrices and $\Sigma$ is a rectangular diagonal matrix with the same size as $A$ (here $\Sigma$ holds the singular values; it is not the covariance matrix). Using the Jacobi eigenvalue algorithm, if $A$ is square and invertible, then the inverse of $A$ is given by $A^{-1} = U \Sigma^{-1} V^T$.
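The sketch below shows the idea on a single machine for the special case of a symmetric positive-definite matrix, which is an assumption made here because it fits an invertible covariance matrix. In that case the SVD coincides with the eigendecomposition A = Q Λ Q^T (so U = V = Q), and the inverse is Q Λ^-1 Q^T. The cyclic Jacobi rotation scheme and all names are illustrative; this is not the Hama implementation.

```python
from math import atan2, cos, sin

def jacobi_eigen(a, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of q) of a symmetric matrix."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if abs(a[p][r]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, r) entry.
                theta = 0.5 * atan2(2 * a[p][r], a[r][r] - a[p][p])
                c, s = cos(theta), sin(theta)
                for k in range(n):   # A <- A * J (update columns p and r)
                    akp, akr = a[k][p], a[k][r]
                    a[k][p], a[k][r] = c * akp - s * akr, s * akp + c * akr
                for k in range(n):   # A <- J^T * A (update rows p and r)
                    apk, ark = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * apk - s * ark, s * apk + c * ark
                for k in range(n):   # Q <- Q * J (accumulate eigenvectors)
                    qkp, qkr = q[k][p], q[k][r]
                    q[k][p], q[k][r] = c * qkp - s * qkr, s * qkp + c * qkr
    return [a[i][i] for i in range(n)], q

def inverse_symmetric(a):
    """A^-1 = Q * diag(1 / lambda) * Q^T for a symmetric invertible matrix."""
    lam, q = jacobi_eigen(a)
    n = len(a)
    return [[sum(q[i][k] * q[j][k] / lam[k] for k in range(n)) for j in range(n)]
            for i in range(n)]

inv = inverse_symmetric([[4, 1], [1, 3]])
```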
Inverse of the covariance matrix using MR
We need two MapReduce tasks to implement this on Hadoop. In the first task, each map task receives a row as its key and a vector of all the other rows as its value. This map emits pairs of block id and sub-vector. The reduce task merges the block structures based on the block id. In the second task, each mapper receives a block id as its key and two sub-matrices A and B as its value. The mapper multiplies the two matrices. As A is a symmetric matrix, $A^T A = A A^T$. The reducer computes the sum of all the blocks.
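The two tasks can be imitated in plain Python to show the data flow: the first splits rows into blocks keyed by block id, the second multiplies matching blocks and sums the partial products. This is a sketch of blocked multiplication under assumed names and an assumed block size, not the Hadoop or Hama code.

```python
from collections import defaultdict

def split_into_blocks(matrix, size):
    """Task 1: map rows to (block id, sub-vector) pairs, then merge by block id."""
    blocks = defaultdict(dict)
    for i, row in enumerate(matrix):                 # map: one row at a time
        for start in range(0, len(row), size):
            block_id = (i // size, start // size)
            blocks[block_id][i % size] = row[start:start + size]
    # reduce: assemble the sub-vectors of each block in row order
    return {bid: [rows[k] for k in sorted(rows)] for bid, rows in blocks.items()}

def multiply(a, b):
    """Plain matrix product, used on individual blocks."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def block_multiply(a, b, size):
    """Task 2: multiply matching blocks (map) and sum the products (reduce)."""
    a_blocks, b_blocks = split_into_blocks(a, size), split_into_blocks(b, size)
    partial = defaultdict(list)
    for (i, k), a_blk in a_blocks.items():           # map: block (i,k) * block (k,j)
        for (k2, j), b_blk in b_blocks.items():
            if k == k2:
                partial[(i, j)].append(multiply(a_blk, b_blk))
    out = [[0.0] * len(b[0]) for _ in range(len(a))]
    for (bi, bj), prods in partial.items():          # reduce: sum partial products
        for r, rows in enumerate(zip(*prods)):
            for c, vals in enumerate(zip(*rows)):
                out[bi * size + r][bj * size + c] = sum(vals)
    return out

# Hypothetical symmetric 4x4 matrix, multiplied by itself in 2x2 blocks.
A = [[2, 1, 0, 1],
     [1, 3, 1, 0],
     [0, 1, 4, 1],
     [1, 0, 1, 5]]
product = block_multiply(A, A, 2)
```

Every block product in the map step is independent of the others, so each can be assigned to a different mapper.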
Expected Returns Matrix Using MR
The expected returns matrix can be easily built on the Hadoop platform. Each mapper computes the expected return of a particular asset. All these mappers can run in parallel, giving us a run time of O(1) with respect to the number of assets, as opposed to the O(n) run time we would get in a traditional environment.
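A small sketch of the per-asset map step is given below, with a thread pool standing in for the Hadoop mappers. Taking the expected return to be the mean of historical monthly returns is an assumption for illustration, as are the data and function names.

```python
from concurrent.futures import ThreadPoolExecutor

def expected_return(item):
    """Mapper for one asset: emit (asset, mean of its monthly returns)."""
    asset, monthly_returns = item
    return asset, sum(monthly_returns) / len(monthly_returns)

# Hypothetical monthly return history per asset.
history = {
    "C": [0.01, 0.03, -0.02, 0.04],
    "F": [0.02, 0.01, 0.00, 0.03],
    "Q": [-0.01, 0.00, 0.02, -0.03],
}

# Each asset is handled independently, so the work can be spread over as many
# workers as there are assets.
with ThreadPoolExecutor() as pool:
    expected = dict(pool.map(expected_return, history.items()))
```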
Multiply covariance inverse with returns matrix
Use the block multiplication algorithm in the MR framework. The next step is making each negative entry of a row positive: MR turns an $O(n^2)$ algorithm into an $O(n)$ algorithm. Then sort the entries in the row, once again using MR.
Simulation using Hadoop Framework
Besides the standard Hadoop package we also used two other packages: HBase [3] [10] and Hama [4]. Hama is a parallel matrix computation package based on Hadoop MapReduce. Hama proposes the use of the 3-dimensional Row and Column (Qualifier), Time space and multi-dimensional Column families of HBase, and utilizes 2D blocked algorithms. We used HBase for storing the matrices. We used the Hama package for matrix multiplication, matrix transpose, and the Jacobi eigenvalue algorithm. Computationally, the simulation showed that running time did not increase linearly with the size of the data.