Materials for Lecture 12
Chapter 7 – Study this closely
Chapter 16 Sections and 4.3
Lecture 12 Multivariate Empirical Dist.xls
Lecture 12 Multivariate Normal Dist.xls
Multivariate Probability Distributions
Definition: Multivariate (MV) Distribution: two or more random variables that are correlated
In an MV distribution you have 1 distribution with 2 or more random variables
With univariate distributions you have many distributions (one for each random variable)
Parameter Estimation for MV Dist.
Data were generated contemporaneously
  –Output observed each year or month
  –Prices observed each year for related commodities
Corn and sorghum used interchangeably for animal feed
Steer and heifer prices related
Fed steer price and Feeder steer prices related
  –Supply and demand forces affect prices similarly, bear market or bull market; prices move together
Prices for tech stocks move together
Prices for an industry or sector's stocks move together
Different MV Distributions
Multivariate Normal distribution – MVN
Multivariate Empirical – MVE
Multivariate Mixed, where each variable is distributed differently, such as
  –X ~ Uniform
  –Y ~ Normal
  –Z ~ Empirical
  –R ~ Beta
  –S ~ Gamma
Sim MV Distribution as Independent
If correlation is ignored when random variables are correlated, results are biased:
If Z = Ỹ_1 + Ỹ_2 OR Z = Ỹ_1 * Ỹ_2 and the model is simulated without correlation
  –If the true ρ_1,2 > 0, the model will understate the risk for Z
  –If the true ρ_1,2 < 0, the model will overstate the risk for Z
If Z = Ỹ_1 * Ỹ_2
  –The mean of Z is biased as well
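The direction of the bias can be checked against the standard identities for two correlated random variables. In this sketch μ_k and σ_k denote the mean and standard deviation of Ỹ_k; these symbols are introduced here for illustration and are not on the slide.

```latex
\begin{aligned}
\operatorname{Var}(\tilde{Y}_1 + \tilde{Y}_2) &= \sigma_1^2 + \sigma_2^2 + 2\,\rho_{1,2}\,\sigma_1\sigma_2 \\
E[\tilde{Y}_1 \tilde{Y}_2] &= \mu_1\mu_2 + \rho_{1,2}\,\sigma_1\sigma_2
\end{aligned}
```

Simulating the variables as independent drops the ρ_1,2 term in both lines. For the sum, the simulated variance is then too small when ρ_1,2 > 0 and too large when ρ_1,2 < 0. For the product, the simulated mean is off by ρ_1,2 σ_1 σ_2.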
Parameters for a MVN Distribution
Deterministic component
  –Ŷ_ij: a vector of means or predicted values for the period i to simulate all of the j variables, for example: Ŷ_ij = ĉ_0 + ĉ_1 X_1 + ĉ_2 X_2
Stochastic component
  –ê_ji: a matrix of residuals from the predicted or mean values for each (j) of the M random variables, ê_ji = Y_ij - Ŷ_ij, and the Std Dev of the residuals σ_êj
Multivariate component
  –Covariance matrix (Σ) for all M random variables in the distribution: an MxM covariance matrix (in the general case use the correlation matrix)
  –Estimate the covariance (or correlation) matrix using residuals about the forecast (or the deterministic component)

\[
\Sigma =
\begin{bmatrix}
\sigma^2_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
 & \sigma^2_{22} & \sigma_{23} & \sigma_{24} \\
 & & \sigma^2_{33} & \sigma_{34} \\
 & & & \sigma^2_{44}
\end{bmatrix}
\quad \text{OR} \quad
\mathrm{P} =
\begin{bmatrix}
1 & \rho_{12} & \rho_{13} & \rho_{14} \\
 & 1 & \rho_{23} & \rho_{24} \\
 & & 1 & \rho_{34} \\
 & & & 1
\end{bmatrix}
\]
3 Variable MVN Distribution
Deterministic component for three random variables
  –Ĉ_i = a + b_1 C_i-1
  –Ŵ_i = a + b_1 T_i + b_2 W_i-1
  –Ŝ_i = a + b_1 T_i
Stochastic component
  –ê_Ci = C_i - Ĉ_i
  –ê_Wi = W_i - Ŵ_i
  –ê_Si = S_i - Ŝ_i
Multivariate component

\[
\Sigma =
\begin{bmatrix}
\sigma^2_{cc} & \sigma_{cw} & \sigma_{cs} \\
 & \sigma^2_{ww} & \sigma_{ws} \\
 & & \sigma^2_{ss}
\end{bmatrix}
\]
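The three components can be estimated outside the spreadsheet as well. The sketch below uses made-up data and assumes a simple linear trend as the deterministic component for every variable, which is simpler than the lagged equations on the slide; all variable names are hypothetical.

```python
import numpy as np

# Hypothetical history: 10 years (rows) of 3 correlated series (columns).
rng = np.random.default_rng(0)
years = np.arange(10)
common = rng.normal(size=10)  # shared shock that induces correlation
Y = np.column_stack([
    3.0 + 0.10 * years + 0.30 * common + 0.10 * rng.normal(size=10),
    5.0 + 0.05 * years + 0.40 * common + 0.20 * rng.normal(size=10),
    2.5 + 0.08 * years + 0.25 * common + 0.10 * rng.normal(size=10),
])

# Deterministic component: a linear trend per variable (an assumption made here).
X = np.column_stack([np.ones_like(years, dtype=float), years])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)  # 2 x 3: intercepts and slopes
Y_hat = X @ coef

# Stochastic component: residuals about the forecast and their std devs.
resid = Y - Y_hat
sigma_e = resid.std(axis=0, ddof=1)

# Multivariate component: covariance and correlation of the residuals.
Sigma = np.cov(resid, rowvar=False)
P = np.corrcoef(resid, rowvar=False)
```

Both matrices are computed from the residuals rather than from the raw series, so the trend does not leak into the estimated correlation.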
Simulating MVN in Simetar
One Step procedure for a 4 variable MVN
Highlight 4 cells if the distribution is for 4 variables, type
=MVNORM( 4x1 Means Vector, 4x4 Covariance Matrix)
=MVNORM( A1:A4, B1:E4)
and end with Control Shift Enter
  where: the 4 means or forecasted values are in column A rows 1-4, and the covariance matrix is in columns B-E, rows 1-4
If you use the historical means, the MVN will validate perfectly, but it only forecasts (simulates) the future if the data are stationary.
If you use forecasts rather than means, the validation test fails for the mean vector.
  –The CV will differ inversely from the historical CV as the means increase or decrease relative to history
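For readers who want to see what a one step MVN draw involves, here is a minimal sketch using a Cholesky factor of the covariance matrix. The function name `mvnorm_draws` and all numbers are hypothetical; the slides do not describe how Simetar's MVNORM is implemented, so this is only one standard way to get the same kind of result.

```python
import numpy as np

def mvnorm_draws(means, cov, n_iter, rng):
    """Draw n_iter correlated normal vectors using a Cholesky factor of cov."""
    means = np.asarray(means, dtype=float)
    L = np.linalg.cholesky(np.asarray(cov, dtype=float))  # cov = L @ L.T
    z = rng.standard_normal((n_iter, means.size))          # independent N(0, 1)
    return means + z @ L.T

# Hypothetical parameters for 4 variables.
means = [4.0, 3.5, 6.0, 90.0]
sd = np.array([0.5, 0.4, 0.8, 10.0])
corr = np.array([
    [1.0, 0.6, 0.4, 0.2],
    [0.6, 1.0, 0.5, 0.3],
    [0.4, 0.5, 1.0, 0.1],
    [0.2, 0.3, 0.1, 1.0],
])
cov = np.outer(sd, sd) * corr  # covariance from std devs and correlations

rng = np.random.default_rng(1)
draws = mvnorm_draws(means, cov, 20000, rng)  # one row per iteration
```

Replacing `means` with forecasts shifts the simulated means while leaving the standard deviations unchanged, which is why the simulated CV moves inversely to the mean.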
Example of Mean vs. Y-Hat Problem for Validation
Simulating MVN in Simetar
Two Step procedure for a 4 variable MVN
Highlight 4 cells if the distribution is for 4 variables, and type
=CUSD (Location of Correlation Matrix)
and end with Control Shift Enter
=CUSD (B1:E4) for a 4x4 correlation matrix in cells B1:E4
Next use the individual CUSDs to calculate the random values, using the Simetar NORM function:
  For Ỹ_1 = NORM( Mean_1, σ_1, CUSD_1 )
  For Ỹ_2 = NORM( Mean_2, σ_2, CUSD_2 )
  For Ỹ_3 = NORM( Mean_3, σ_3, CUSD_3 )
  For Ỹ_4 = NORM( Mean_4, σ_4, CUSD_4 )
Use Two Step if you want more control of the process
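The two steps can be sketched as follows. The helper names `cusd` and `norm` mirror the Simetar function names for readability only; their bodies are an assumption (correlated standard normals pushed through the normal CDF, then inverse-transformed), not a description of Simetar's internals. All parameter values are hypothetical.

```python
import numpy as np
from math import erf, sqrt
from statistics import NormalDist

def cusd(corr, rng):
    """Step 1: one vector of correlated uniform(0, 1) deviates."""
    L = np.linalg.cholesky(np.asarray(corr, dtype=float))
    z = L @ rng.standard_normal(L.shape[0])  # correlated N(0, 1) deviates
    return np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])  # normal CDF

def norm(mean, sd, u):
    """Step 2: inverse-transform a uniform deviate into a N(mean, sd) draw."""
    return NormalDist(mean, sd).inv_cdf(u)

# Hypothetical parameters for 4 variables.
means = [4.0, 3.5, 6.0, 90.0]
sds = [0.5, 0.4, 0.8, 10.0]
corr = np.array([
    [1.0, 0.6, 0.4, 0.2],
    [0.6, 1.0, 0.5, 0.3],
    [0.4, 0.5, 1.0, 0.1],
    [0.2, 0.3, 0.1, 1.0],
])

rng = np.random.default_rng(2)
draws = np.array([
    [norm(m, s, u) for m, s, u in zip(means, sds, cusd(corr, rng))]
    for _ in range(5000)
])
```

Keeping the uniform deviates as a separate step is what gives the extra control: the same CUSD vector can feed any inverse-transform function, not only the normal.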
Example of MVN Distribution
Demonstrate MVN for a distribution with 3 variables
One step procedure in line 63
Means in row 55 and covariance matrix in B58:D60
Validation test shows the random variables maintained historical covariance
Two Step MVN Distribution
Review Steps for MVN
Develop parameters
  –Calculate averages (and standard deviations, used for the Two Step procedure)
  –Calculate Covariance matrix
  –Calculate Correlation matrix (used for the Two Step procedure and for validation of the One Step procedure)
One Step MVN procedure is easier
Use Two Step MVN procedure for more control of the process
Validate simulated MVN values vs. historical series
  –If you use different means than in history, the validation test for the means vector WILL fail
Parameters for MV Empirical
Step I Deterministic component for three random variables
  –Ĉ_i = a + b_1 C_i-1
  –Ŵ_i = a + b_1 T_i + b_2 W_i-1
  –Ŝ_i = a + b_1 T_i
Step II Stochastic component will be calculated from residuals
  –ê_Ci = C_i - Ĉ_i
  –ê_Wi = W_i - Ŵ_i
  –ê_Si = S_i - Ŝ_i
Step III Calculate the stochastic empirical distribution parameters
  –S_Ci = Sorted (ê_Ci / Ĉ_i)
  –S_Wi = Sorted (ê_Wi / Ŵ_i)
  –S_Si = Sorted (ê_Si / Ŝ_i)
Step IV Multivariate component is a correlation matrix calculated using the unsorted residuals from Step II
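Steps II through IV can be sketched in a few lines. The data below are made up, the historical mean stands in for Ŷ (Step I) to keep the example short, and the equally spaced probabilities used for F(S_i) are an assumption, since the slides do not say how the cumulative probabilities are assigned.

```python
import numpy as np

# Hypothetical history: 10 observations (rows) of 3 correlated series (columns).
rng = np.random.default_rng(3)
common = rng.normal(size=10)
Y = np.column_stack([
    4.0 + 0.50 * common + 0.20 * rng.normal(size=10),
    6.0 + 0.60 * common + 0.30 * rng.normal(size=10),
    90.0 + 8.0 * common + 4.0 * rng.normal(size=10),
])

# Step I (simplified): use the historical means as the deterministic component.
Y_hat = np.tile(Y.mean(axis=0), (len(Y), 1))

# Step II: residuals about the deterministic component.
resid = Y - Y_hat

# Step III: sorted fractional deviations, one column per variable,
# with assumed cumulative probabilities F(S_i) from 0 to 1.
S = np.sort(resid / Y_hat, axis=0)
F = np.linspace(0.0, 1.0, len(S))

# Step IV: correlation matrix from the UNSORTED residuals.
P = np.corrcoef(resid, rowvar=False)
```

Sorting each column independently destroys the pairing across variables, which is why the correlation matrix has to come from the unsorted residuals.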
Simulating MVE in Simetar
One Step procedure for a 4 variable MVE
Highlight 4 cells if the distribution is for 4 variables, then type
=MVEMP( Location Actual Data,,,, Location Y-Hats, Option)
  Option = 0 use actual data
  Option = 1 use Percent deviations from Mean
  Option = 2 use Percent deviations from Trend
  Option = 3 use Differences from Mean
End this function with Control Shift Enter
=MVEMP(B5:D14,,,, G7:I6, 2)
Where the 10 observations for the 3 random variables are in rows 5-14 of columns B-D, simulated as percent deviations from trend
Two Step MVE
Two Step procedure for a 4 variable MVE
Highlight 4 cells if the distribution is for 4 variables, type
=CUSD( Location of Correlation Matrix)
and end with Control Shift Enter
=CUSD( A12:A15)
Next use the CUSDs to calculate the random values (Mean here could also be Ŷ)
  For Ỹ_1 = Mean_1 + Mean_1 * Empirical(S_1, F(S_i), CUSD_1 )
  For Ỹ_2 = Mean_2 + Mean_2 * Empirical(S_2, F(S_i), CUSD_2 )
  For Ỹ_3 = Mean_3 + Mean_3 * Empirical(S_3, F(S_i), CUSD_3 )
  For Ỹ_4 = Mean_4 + Mean_4 * Empirical(S_4, F(S_i), CUSD_4 )
Use Two Step if you want more control of the process
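A rough sketch of the two step MVE calculation follows, for three variables rather than four to keep it short. It assumes the empirical inverse transform is a linear interpolation between the sorted deviations, and it generates the CUSDs from correlated normals; neither choice is confirmed by the slides. The helper `cusd` and all numbers are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def cusd(corr, n_iter, rng):
    """n_iter rows of correlated uniform(0, 1) deviates, one column per variable."""
    L = np.linalg.cholesky(np.asarray(corr, dtype=float))
    z = rng.standard_normal((n_iter, L.shape[0])) @ L.T  # correlated N(0, 1)
    return 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))  # normal CDF

# Hypothetical parameters: means, sorted fractional deviations S_k, and F(S_i).
means = np.array([4.0, 6.0, 90.0])
S = np.array([
    [-0.20, -0.12, -0.05, -0.01, 0.02, 0.06, 0.11, 0.19],
    [-0.15, -0.09, -0.04, 0.00, 0.01, 0.05, 0.08, 0.14],
    [-0.25, -0.10, -0.06, -0.02, 0.03, 0.07, 0.12, 0.21],
])
F = np.linspace(0.0, 1.0, S.shape[1])
corr = np.array([
    [1.0, 0.7, 0.4],
    [0.7, 1.0, 0.5],
    [0.4, 0.5, 1.0],
])

rng = np.random.default_rng(4)
u = cusd(corr, 5000, rng)

# Y_k = Mean_k + Mean_k * Empirical(S_k, F, CUSD_k), using linear interpolation.
draws = np.column_stack([
    means[k] * (1.0 + np.interp(u[:, k], F, S[k])) for k in range(len(means))
])
```

Every simulated value stays between Mean_k * (1 + min S_k) and Mean_k * (1 + max S_k), so the empirical distribution never extrapolates beyond the observed deviations.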
Parameter Estimation for MVE
Simulate a MVE Distribution
Validation of MV Distributions
Simulate the model and specify the random variables as the KOVs, then test the simulated random values
Perform the following tests
  –Use the Compare Two Series Tab in HoHi to:
     Test means for the historical series or the forecasted means vs. the simulated means
     Test means and covariance for historical series vs. simulated
  –Use the Check Correlation Tab to test the correlation matrix used as input for the MV model vs. the implied correlation in the simulated random variables
Null hypothesis (Ho) is: Simulated correlation_ij = Historical correlation coefficient_ij
Critical t statistic is 1.98 for 100 iterations; if the calculated t statistics are less than 1.98, you fail to reject Ho
Use caution on means tests if your forecasted Ŷ is different from the historical Ῡ
Validation of MV Distributions
Test Correlation for MV Distributions
Test simulated values for MVE and MVN distributions to ensure the historical correlation matrix is reproduced in simulation
  –Data Series is the simulated values for all random variables in the MV distribution
  –The original correlation matrix used to simulate the MVE or MVN distribution
Validation Tests in Simetar
Student t Test is used to test whether each simulated correlation coefficient differs significantly from the historical correlation coefficient
  –You want the calculated t statistic to be less than the Critical Value
  –If the calculated t statistic is larger than the Critical Value, it is shown in bold
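The slides do not give the formula behind Simetar's correlation test, so the sketch below swaps in a different, well-known technique: the Fisher z-transform test of a sample correlation against a hypothesized value. It is meant only to illustrate the logic of comparing each simulated correlation with its historical counterpart against a critical value. The function name `corr_test` and the data are hypothetical; the 1.98 critical value is the one quoted on the slides for 100 iterations.

```python
import numpy as np

def corr_test(sim, hist_corr, crit=1.98):
    """Fisher z statistics for H0: simulated rho_ij == historical rho_ij.

    Returns the absolute statistic for each pair i < j and a flag that is
    True where the statistic exceeds the critical value (reject H0).
    """
    n = sim.shape[0]
    r = np.corrcoef(sim, rowvar=False)
    iu = np.triu_indices_from(r, k=1)  # off-diagonal pairs only
    diff = np.arctanh(r[iu]) - np.arctanh(np.asarray(hist_corr)[iu])
    stat = np.abs(diff) * np.sqrt(n - 3)
    return stat, stat > crit

# Hypothetical check: 100 iterations simulated from a known correlation matrix.
rng = np.random.default_rng(5)
hist_corr = np.array([
    [1.0, 0.7, 0.4],
    [0.7, 1.0, 0.5],
    [0.4, 0.5, 1.0],
])
sim = rng.standard_normal((100, 3)) @ np.linalg.cholesky(hist_corr).T
stat, flagged = corr_test(sim, hist_corr)
```

As on the slide, small statistics are the desired outcome: a flagged pair means the simulated correlation differs from the historical one by more than sampling error would explain.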
MV Mixed Distributions
What if you need to simulate a MV distribution made up of variables that are not all Normal or all Empirical? For example:
  –X is ~ Normal
  –Y is ~ Beta
  –T is ~ Gamma
  –Z is ~ Empirical
Develop parameters for each variable
Estimate the correlation matrix for the random variables in the distribution
MV Mixed Distributions
Simulate a vector of Correlated Uniform Standard Deviates using the =CUSD() function
=CUSD( correlation matrix ) is an array function, so highlight the number of cells that matches the number of variables in the distribution
Use the CUSD_i values in the appropriate Simetar functions for each random variable
  =NORM(Mean, Std Dev, CUSD_1 )
  =BETAINV(CUSD_2, Alpha, Beta)
  =GAMMAINV(CUSD_3, P1, P2)
  =Mean*(1+EMP(S_i, F(S_i), CUSD_4 ))
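The same pattern can be sketched for a mixed distribution. To stay within NumPy and the standard library, this example mixes Normal, Uniform, and Empirical marginals; Beta and Gamma marginals would work the same way but need an inverse CDF from a statistics library such as SciPy. All parameters are hypothetical, and generating the CUSDs from correlated normals is an assumption.

```python
import numpy as np
from math import erf, sqrt
from statistics import NormalDist

rng = np.random.default_rng(6)

# Hypothetical correlation matrix for three variables X, Y, Z.
corr = np.array([
    [1.0, 0.5, -0.3],
    [0.5, 1.0, 0.2],
    [-0.3, 0.2, 1.0],
])

# Correlated uniform deviates: correlated normals pushed through the normal CDF.
L = np.linalg.cholesky(corr)
z = rng.standard_normal((5000, 3)) @ L.T
u = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))

# X ~ Normal(10, 2): inverse normal CDF of the first CUSD.
x = np.array([NormalDist(10.0, 2.0).inv_cdf(p) for p in u[:, 0]])

# Y ~ Uniform(5, 9): linear rescaling of the second CUSD.
y = 5.0 + (9.0 - 5.0) * u[:, 1]

# Z ~ Empirical: Mean * (1 + interpolated sorted fractional deviation).
S = np.array([-0.25, -0.10, -0.06, -0.02, 0.03, 0.07, 0.12, 0.21])
F = np.linspace(0.0, 1.0, len(S))
z_emp = 50.0 * (1.0 + np.interp(u[:, 2], F, S))
```

Because each marginal is a monotone transform of its own CUSD, the ordering of the draws is shared across variables, so the signs and rough sizes of the input correlations carry over to the simulated values.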