Lecture 4: Estimating the Covariance Matrix
What we will learn in this lecture: Fundamental methods of deriving the covariance matrix from datasets; Dealing with non-stationary datasets; Extensions to the basic estimation model: factor models
Estimating the Covariance Matrix Up until now we have treated the covariance matrix as something we simply happen to know. When we build a stochastic model we frequently have to estimate the covariance matrix from sampled data. The basic methods of estimation are straightforward, but before we introduce them we need to make sure that the covariance matrix is meaningful when applied to a data set.
What we are trying to measure We use the covariance matrix and the expected return vector to describe the behaviour of random variables. But does it make sense to apply them to any random variable? As we will shortly see, the answer is no: for some random variables they have little or no meaning. We therefore cannot use the covariance matrix to describe some datasets.
Observation 1: We Need A Random Variable With A Central Tendency The basis of our covariance matrix is movement about the expected value. Both variance and covariance are based on the idea of movement about a central point. If the random variable does not have a central tendency then our method of measuring movement about a centre is meaningless. It is also possible to adjust a series for any predictable trends.
Observation 2: We Need A Random Variable That Has Pattern To Its Behaviour There is little point in trying to describe the behaviour of something that has no consistent behaviour! Are the random variables drawn from the same underlying world of causality? Is there a constant set of causal factors influencing the observations we see in the dataset, or are the rules changing? If the rules are changing, are they changing gradually?
The Concept Of Covariance Stationarity A set of random variables is said to be Covariance Stationary if its mean (or central tendency), variance and covariance are constant over time. A set of random variables is said to be Strictly Stationary if its joint distribution is stationary. Covariance Stationary and Strictly Stationary coincide when we are dealing with the multivariate normal distribution.
Our ‘Imaginary’ Hypothesis We assume that the probability of observing a set of random variables $(A, B, \ldots, Z)$ is described by a multivariate normal distribution. We have a finite set of observations of these random variables: $(A_1, B_1, \ldots, Z_1), (A_2, B_2, \ldots, Z_2), \ldots$ We want to know which multivariate normal distribution, out of all the possible distributions, would most likely produce the data set we observed. This approach is called maximum likelihood estimation.
Maximum Likelihood [Diagram: our data sample alongside the normal distribution most likely to produce it, the maximum likelihood distribution.]
The maximum likelihood For the multivariate normal distribution the maximum likelihood parameters can be calculated by averaging over the observations:
$\hat{E}(A) = \frac{1}{N}\sum_{i=1}^{N} A_i$
$\widehat{\mathrm{Var}}(A) = \frac{1}{N-1}\sum_{i=1}^{N}\bigl(A_i - \hat{E}(A)\bigr)^2$
$\widehat{\mathrm{Cov}}(A,B) = \frac{1}{N-1}\sum_{i=1}^{N}\bigl(A_i - \hat{E}(A)\bigr)\bigl(B_i - \hat{E}(B)\bigr)$
where $A_i$ and $B_i$ are the $i$th observations. (Strictly, the maximum likelihood estimators divide by $N$; dividing by $N-1$ gives the usual unbiased estimators, and the two converge as $N$ grows.)
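A minimal sketch of these estimators in Python using numpy; the dataset here is hypothetical, standing in for 250 observations of three series:

```python
import numpy as np

# Hypothetical dataset: 250 observations (rows) of 3 series A, B, C (columns)
rng = np.random.default_rng(42)
data = rng.normal(size=(250, 3))

mean_est = data.mean(axis=0)            # Est. E(.) for each series
cov_est = np.cov(data, rowvar=False)    # covariance matrix with the (N-1) denominator

print("Estimated means:", mean_est)
print("Estimated covariance matrix:\n", cov_est)
```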
Filling in the Covariance Matrix
$$\begin{pmatrix} \widehat{\mathrm{Var}}(A) & \widehat{\mathrm{Cov}}(A,B) & \widehat{\mathrm{Cov}}(A,C) \\ \widehat{\mathrm{Cov}}(B,A) & \widehat{\mathrm{Var}}(B) & \widehat{\mathrm{Cov}}(B,C) \\ \widehat{\mathrm{Cov}}(C,A) & \widehat{\mathrm{Cov}}(C,B) & \widehat{\mathrm{Var}}(C) \end{pmatrix}$$
Excel’s Support for the Maximum Likelihood Estimate Excel has support for calculating the covariance matrix under Tools -> Data Analysis -> Covariance. [Screenshot: the Covariance dialog, with an input range for the data sample, a 'Labels' option, and an output range for the resulting covariance matrix.]
What To Do If The Data Does Not Have A Central Tendency When the dataset we are dealing with does not have a central tendency, it is normally possible to transform it into one that does. This is one of the reasons why we use returns rather than prices. The level of the FTSE 100 Index does not have a central tendency, but we could say that the proportional rate of growth in the index does. The absolute level might not have a central tendency, but the rate of growth in that level might.
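A short sketch of this transformation, assuming a hypothetical series of index levels:

```python
import numpy as np

# Hypothetical index levels (e.g. daily closes)
prices = np.array([6500.0, 6552.0, 6519.8, 6584.9, 6611.2])

simple_returns = prices[1:] / prices[:-1] - 1   # proportional rate of growth
log_returns = np.diff(np.log(prices))           # continuously compounded version

print(simple_returns)
print(log_returns)
```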
What If The Behaviour Of The Variables Changes Across Time? This is a more fundamental problem: the nature of the thing we are measuring is changing across time. However, it might be reasonable to assume that it is changing slowly; the mean, variance and covariance might drift gradually. Intuitively we can deal with this type of non-stationarity by giving more recent observations more weight. We do not weight all observations equally, because more recent observations are more informative.
Variable Weight Equations Let $o_{ij}$ be the $i$th observation of the $j$th series and $w_i$ the weight attached to the $i$th observation; then we make our estimates thus:
$\hat{E}(o_j) = \frac{\sum_i w_i\, o_{ij}}{\sum_i w_i}$
$\widehat{\mathrm{Var}}(o_j) = \frac{\sum_i w_i\,\bigl(o_{ij} - \hat{E}(o_j)\bigr)^2}{\sum_i w_i}$
$\widehat{\mathrm{Cov}}(o_j, o_k) = \frac{\sum_i w_i\,\bigl(o_{ij} - \hat{E}(o_j)\bigr)\bigl(o_{ik} - \hat{E}(o_k)\bigr)}{\sum_i w_i}$
Choosing The Weights The selection of the weights is subjective but should reflect the basic idea that more recent observations are given greater importance. The window over which we take our estimates is also subjective. A formula used by some investment banks is $w_i = \frac{1}{i+1}$, which produces the series 0.5, 0.33, 0.25, …, decaying steadily as observations age.
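A minimal sketch of a recency-weighted covariance estimate using the $w_i = 1/(i+1)$ weights above; the return series is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(size=(250, 3))    # hypothetical return series, newest row last

n = len(returns)
offsets = np.arange(n, 0, -1)          # newest row gets offset 1, oldest gets n
weights = 1.0 / (offsets + 1)          # w_i = 1/(i+1): 0.5, 0.33, 0.25, ...
weights /= weights.sum()               # normalise so the weights sum to 1

mean_w = weights @ returns                           # weighted means
demeaned = returns - mean_w
cov_w = (weights[:, None] * demeaned).T @ demeaned   # weighted covariance matrix

print(cov_w)
```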
Decaying Weights [Chart: observation weight against time offset; the further in the past the observation, the smaller its weight in our calculation.]
Problems with the Max Likelihood Method The maximum likelihood method requires us to estimate a large number of covariances: $(N^2 - N)/2$ of them for $N$ series. Even if we have a large data set, spurious correlations can enter our matrix by chance. Methods such as efficient frontier calculations, which seek to exploit idiosyncratic behaviour, will compound these spurious results! Direct estimation of the covariances also does not explain the 'cause' of the link or covariance.
Factor Models One of the main techniques used to overcome the problems experienced with the maximum likelihood method is factor modelling. Factor models impose structure on the relationships between the various elements of the covariance matrix. This structure allows us to greatly reduce the number of parameters we need to estimate. It also provides us with an explanation and breakdown of the covariances, not just their magnitude.
Factor Model Formula The factor model seeks to describe an observed outcome in terms of some underlying factors. We will be dealing with linear factor models of the form: $o_i = b_{i1} f_1 + b_{i2} f_2 + \cdots + b_{iN} f_N + e_i$ where $o_i$ is the $i$th observed outcome, $b_{iN}$ is the sensitivity of the $i$th observed outcome to the $N$th factor, $f_N$ is the $N$th factor, and $e_i$ is the unexplained random component of the $i$th observation.
Factor Model Diagram [Diagram: input factors $f_1, f_2, f_3$ enter the factor model; the model output $b_{A1} f_1 + b_{A2} f_2 + b_{A3} f_3$ is the part of the observation explained by the factors; adding the unexplained noise $e_A$ gives the actual observation $O_A$.]
Assumptions made by factor models The correlations between the error term ($e_i$) and the factors ($f_1$, $f_2$, etc.) are 0 (this is guaranteed if we use regression to estimate the factor model). The error terms of any two observations $i$ and $j$ explained by the factor model are uncorrelated ($\mathrm{Cov}(e_i, e_j) = 0$). This uncorrelated-error assumption is vital if we are to use factor models to estimate the covariance matrix accurately, since it states that all the covariance is described by the factors. Uncorrelated errors are only guaranteed if we do not leave important factors out of our model!
Estimating Factor Models Once the factors have been selected, standard linear regression techniques can be used to estimate the factor sensitivities. The problem reduces to finding the best-fit line or plane between the observations and the factors; the slopes of that line or plane are the sensitivities ($b_1$, $b_2$).
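A minimal sketch of estimating sensitivities by least squares; the factor and observation series are hypothetical, simulated from known sensitivities so the fit can be checked:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 250
factors = rng.normal(size=(T, 2))                    # two hypothetical factors
true_b = np.array([0.8, -0.3])
obs = factors @ true_b + 0.1 * rng.normal(size=T)    # one observed series

# Regress the observations on the factors (with an intercept column)
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, obs, rcond=None)
intercept, b1, b2 = coef
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")               # slopes are the sensitivities
```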
Explaining Covariance with Factor Models The relationships between different observations can be explained in terms of their relationships to the underlying factors. If observation A and observation B are correlated, then their correlation can be explained in terms of their shared underlying factors. By quantifying relationships in terms of a factor model we greatly reduce the number of parameters we need to estimate.
Diagram of Factor Model Method Vs Maximum Likelihood [Diagram: under a factor model, all observations are indirectly related through one or more underlying factors; under maximum likelihood, all observations are directly related to each other.]
Deriving the Observation Covariances from the Factor Model Once we have defined the factor model for a series of observations, we can generate the implied variances and covariances for those observations. For a 1-factor model: $\mathrm{Var}(O_1) = b_{11}^2\,\mathrm{Var}(f_1) + \mathrm{Var}(e_1)$ and $\mathrm{Cov}(O_1, O_2) = b_{11}\, b_{21}\,\mathrm{Var}(f_1)$.
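A short sketch of these identities in code, for hypothetical sensitivities and variances:

```python
import numpy as np

# Hypothetical 1-factor model parameters
b = np.array([1.1, 0.7, 0.9])         # sensitivities b11, b21, b31
var_f = 0.04                          # Var(f1)
var_e = np.array([0.02, 0.01, 0.03])  # residual variances Var(e_i)

# Implied covariance matrix: off-diagonals are b_i * b_j * Var(f1);
# diagonals are b_i^2 * Var(f1) + Var(e_i)
cov = np.outer(b, b) * var_f + np.diag(var_e)
print(cov)
```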
Proof for the Variance of the Observation from the Factors The 1-factor model: $O_1 = b_{11} f_1 + e_1$. The expectation of the observation: $E(O_1) = E(b_{11} f_1 + e_1) = b_{11} E(f_1) + E(e_1) = b_{11} E(f_1)$, since $E(e_1) = 0$. The variance of the observation: $\mathrm{Var}(O_1) = E\bigl(b_{11} f_1 + e_1 - b_{11} E(f_1)\bigr)^2 = b_{11}^2\,\mathrm{Var}(f_1) + \mathrm{Var}(e_1)$, where the cross term vanishes because $e_1$ is uncorrelated with $f_1$. In general we do not use the factor model to estimate the variance of an observation; we estimate it directly from the dataset, since the factor model does not simplify the task of estimating variances in isolation.
Proof for the Covariance of the Observations from the Factors The 1-factor model: $O_1 = b_{11} f_1 + e_1$ and $O_2 = b_{21} f_1 + e_2$. Then:
$\mathrm{Cov}(O_1, O_2) = \mathrm{Cov}(b_{11} f_1 + e_1,\; b_{21} f_1 + e_2)$
$= E\bigl((b_{11} f_1 - b_{11} E(f_1) + e_1)(b_{21} f_1 - b_{21} E(f_1) + e_2)\bigr)$
$= E\bigl(b_{11} b_{21} (f_1 - E(f_1))^2\bigr) + E\bigl(b_{11}(f_1 - E(f_1))\, e_2\bigr) + E\bigl(b_{21}(f_1 - E(f_1))\, e_1\bigr) + E(e_1 e_2)$
$= b_{11} b_{21}\,\mathrm{Var}(f_1) + 0 + 0 + 0 = b_{11} b_{21}\,\mathrm{Var}(f_1)$
Intuitive Explanation Of The Covariance Equation The link between observations $O_1$ and $O_2$ runs via factor $f_1$. The stronger their links with the underlying factor that relates them (i.e. the larger $b_{11}$ and $b_{21}$), the stronger the covariance between $O_1$ and $O_2$. The larger the variance of the underlying factor, the more it moves the related observations $O_1$ and $O_2$, and in turn the larger their covariance. If $O_1$ and $O_2$ are strongly related to a factor with a very low variance they will have a low covariance! (The factor never moves, so the observations never move in unison as a result.)
Diagram Of The Derived Covariance [Diagram: a factor with sensitivities $b_1$ and $b_2$ driving Observation 1 and Observation 2; the larger the variance in the factor, the more it moves the observations, and covariance measures this shared movement.]
Advantages and Disadvantages of Factor Models The main advantage of a factor model is that it greatly reduces the number of parameters we need to estimate:

Number of series | 2-Factor Model | Max Likelihood
5                | 10             | 10
20               | 40             | 190
100              | 200            | 4950

The disadvantage is that we need some insight into the mechanics of the observations to produce the factor model.
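These counts follow from the formulas already given: maximum likelihood needs $(N^2 - N)/2$ covariances, while a 2-factor model needs $2N$ sensitivities. For example, with $N = 100$ series:

$$\frac{N^2 - N}{2} = \frac{10000 - 100}{2} = 4950 \qquad \text{versus} \qquad 2N = 200.$$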
Sharpe’s Single Factor Model Sharpe’s single factor model (also called the market model) seeks to explain variations in the returns on the $i$th asset in terms of variations in the market. The market is represented by an appropriate stock market index (e.g. the FTSE 100). The model is closely related to the CAPM model widely used in finance.
Formulation of the Sharpe Model The Sharpe factor model states that returns are systematically related only to the market index, and that all other movement is noise unique to the stock: $R_{it} = a_i + b_i R_{Mt} + e_{it}$ where $R_{it}$ is the return on the $i$th asset at time $t$, $R_{Mt}$ is the return on the market index at time $t$, $e_{it}$ is the noise on the $i$th asset at time $t$, $a_i$ is the intercept (the part of the $i$th asset's expected return not explained by the market index), and $b_i$ is the sensitivity of the return on the $i$th asset to changes in the return on the market index.
Calculating the Sharpe Factor Model Parameters The Sharpe factor model is calibrated by linear regression of the returns on the $i$th asset against the returns on the market portfolio. Once we have parameterised the Sharpe model we can use it to obtain an estimate of the covariance matrix. We can estimate $b_i$ as $\mathrm{Cov}(R_{it}, R_{Mt}) / \mathrm{Var}(R_{Mt})$, which is the same as the OLS estimate of the linear relationship between the return on the asset and the return on the market.
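A minimal sketch of this beta estimate; the asset and market return series are hypothetical, simulated with a known beta of 1.2:

```python
import numpy as np

rng = np.random.default_rng(2)
market = rng.normal(0.0005, 0.01, size=500)           # hypothetical market returns
asset = 0.0002 + 1.2 * market + rng.normal(0, 0.008, size=500)

# b_i = Cov(R_i, R_M) / Var(R_M), identical to the OLS slope
beta = np.cov(asset, market)[0, 1] / np.var(market, ddof=1)
alpha = asset.mean() - beta * market.mean()           # intercept a_i
print(f"beta = {beta:.3f}, alpha = {alpha:.5f}")
```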
Generating a 2 by 2 Covariance Matrix Estimate $b_1$ and $b_2$ from the data set. Estimate the variance of the market portfolio returns, $\mathrm{Var}(R_M)$, from the data set. Estimate the variances of the returns on assets 1 and 2, $\mathrm{Var}(R_1)$ and $\mathrm{Var}(R_2)$, from the data set. Then the estimated matrix is:
$$\begin{pmatrix} \mathrm{Var}(R_1) & b_1 b_2\,\mathrm{Var}(R_M) \\ b_2 b_1\,\mathrm{Var}(R_M) & \mathrm{Var}(R_2) \end{pmatrix}$$
3 by 3 Covariance From Factor Model
$$\begin{pmatrix} \mathrm{Var}(R_1) & b_1 b_2\,\mathrm{Var}(R_M) & b_1 b_3\,\mathrm{Var}(R_M) \\ b_2 b_1\,\mathrm{Var}(R_M) & \mathrm{Var}(R_2) & b_2 b_3\,\mathrm{Var}(R_M) \\ b_3 b_1\,\mathrm{Var}(R_M) & b_3 b_2\,\mathrm{Var}(R_M) & \mathrm{Var}(R_3) \end{pmatrix}$$
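A sketch assembling this matrix in code, with hypothetical betas and variance estimates; the diagonal uses the directly estimated asset variances, as recommended earlier:

```python
import numpy as np

# Hypothetical Sharpe-model estimates
betas = np.array([1.2, 0.8, 1.0])                 # b1, b2, b3
var_market = 0.01 ** 2                            # Var(R_M)
var_assets = np.array([2.1e-4, 1.4e-4, 1.8e-4])   # Var(R_i), estimated directly

# Off-diagonal entries: b_i * b_j * Var(R_M); diagonal: direct variance estimates
cov = np.outer(betas, betas) * var_market
np.fill_diagonal(cov, var_assets)
print(cov)
```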