Tutorial Financial Econometrics/Statistics 2005 SAMSI program on Financial Mathematics, Statistics, and Econometrics
Goal
At the index level
Part I: Modeling ... in which we see what basic properties of stock prices/indices we want to capture
Contents Returns and their (static) properties Pricing models Time series properties of returns
Why returns? Prices are generally found to be non-stationary Makes life difficult (or simpler...) Traditional statistics prefers stationary data Returns are found to be stationary
Which returns? Two types of returns can be defined Discrete compounding Continuous compounding
Discrete compounding Simple return: R_t = P_t / P_{t-1} - 1 If you make 10% on half of your money and 5% on the other half, you make 7.5% in total Discrete compounding is additive over portfolio formation
Continuous compounding Log return: r_t = log(P_t / P_{t-1}) If you made 3% during the first half of the year and 2% during the second half, you made (exactly) 5% in total Continuous compounding is additive over time
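A minimal numerical sketch of the two conventions and their additivity properties (numpy assumed; the prices and portfolio weights are purely illustrative):

    import numpy as np

    # Hypothetical daily closing prices
    prices = np.array([100.0, 102.0, 101.0, 104.0])

    # Discrete compounding (simple returns): R_t = P_t / P_{t-1} - 1
    simple_returns = prices[1:] / prices[:-1] - 1

    # Continuous compounding (log returns): r_t = log(P_t / P_{t-1})
    log_returns = np.diff(np.log(prices))

    # Log returns are additive over time: their sum is the full-period log return
    print(np.isclose(log_returns.sum(), np.log(prices[-1] / prices[0])))  # True

    # Simple returns are additive over portfolio formation:
    # 10% on half of the money and 5% on the other half gives 7.5%
    weights = np.array([0.5, 0.5])
    asset_returns = np.array([0.10, 0.05])
    print(weights @ asset_returns)  # 0.075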
Empirical properties of returns

                  Mean     St.dev.   Ann. vol.   Skewness   Kurtosis   Min       Max
    IBM           -0.0%    2.46%     39.03%      -23.51     1124.61    -138%     12.4%
    IBM (corr)     0.0%    1.64%     26.02%      -0.28      15.56      -26.1%
    S&P                    0.95%     15.01%      -1.4       39.86      -22.9%    8.7%

Data period: July 1962 - December 2004; daily frequency
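A sketch of how such summary statistics can be computed from a daily return series (numpy/scipy assumed; the function name and the 252-day annualization are illustrative choices, and the original data are not reproduced here):

    import numpy as np
    from scipy import stats

    def return_summary(returns):
        """Mean, st.dev., annualized volatility, skewness, kurtosis, min, max."""
        r = np.asarray(returns, dtype=float)
        return {
            "mean": r.mean(),
            "st.dev.": r.std(ddof=1),
            "annualized vol": r.std(ddof=1) * np.sqrt(252),  # ~252 trading days/year
            "skewness": stats.skew(r),
            "kurtosis": stats.kurtosis(r, fisher=False),     # non-excess kurtosis
            "min": r.min(),
            "max": r.max(),
        }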
Stylized facts Expected returns difficult to assess What’s the ‘equity premium’? Index volatility < individual stock volatility Negative skewness Crash risk Large kurtosis Fat tails (thus EVT analysis?)
Pricing models Finance considers the final value of an asset to be 'known' as a random variable, that is, the time-T payoff V_T is a given random variable In such a setting, finding the price P_0 of an asset is equivalent to finding its expected return: with R = V_T / P_0 - 1, we have E[R] = E[V_T] / P_0 - 1, i.e. P_0 = E[V_T] / (1 + E[R])
Pricing models 2 As a result, pricing models model expected returns ... ... in terms of known quantities or a few ‘almost known’ quantities
Capital Asset Pricing Model One of the best known pricing models The theorem/model states: E[R_i] - r_f = β_i (E[R_m] - r_f), with β_i = Cov(R_i, R_m) / Var(R_m), R_m the market return and r_f the risk-free rate
Black-Scholes Black-Scholes is also a pricing model An (exact) contemporaneous relation between asset prices/returns: the option price is a deterministic function of the underlying price
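For concreteness, a sketch of the standard Black-Scholes call price as an exact contemporaneous function of the underlying price (scipy assumed; the parameter values in the example call are illustrative):

    import numpy as np
    from scipy.stats import norm

    def bs_call(S, K, r, sigma, T):
        """Black-Scholes price of a European call: underlying price S, strike K,
        risk-free rate r, volatility sigma, maturity T (in years)."""
        d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
        d2 = d1 - sigma * np.sqrt(T)
        return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

    print(bs_call(S=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0))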
Time series properties of returns Traditionally a model fitting exercise without much finance mostly univariate time series and, thus, less scope for the 'traditional' cross-sectional pricing models lately more finance theory is integrated Focuses on the dynamics/dependence in returns
Random walk hypothesis Standard paradigm in the 1960s-1970s Prices follow a random walk Returns are i.i.d. Normality often imposed as well Compare the Black-Scholes assumptions
Box-Jenkins analysis
Linear time series analysis Box-Jenkins analysis generally identifies a white noise This has long been taken as support for the random walk hypothesis Recent developments Some autocorrelation effects in 'momentum' Some (linear) predictability Largely academic discussion
Higher moments and risk
Risk predictability There is strong evidence for autocorrelation in squared returns also holds for other powers 'volatility clustering' While the direction of change is difficult to predict, the (absolute) size of the change is: risk is predictable
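A minimal sketch of how this can be checked on a return series (numpy assumed; `returns` stands for the user's own data and is therefore left undefined, with the usage shown in comments):

    import numpy as np

    def acf(x, lag):
        """Sample autocorrelation of x at the given lag."""
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        return np.sum(x[lag:] * x[:-lag]) / np.sum(x * x)

    # Typical finding: autocorrelations of returns are close to zero,
    # while those of squared (or absolute) returns are clearly positive.
    # for lag in range(1, 11):
    #     print(lag, acf(returns, lag), acf(returns**2, lag))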
The ARCH model First model to capture this effect: in its simplest (first-order) form r_t = σ_t ε_t, σ_t² = ω + α r_{t-1}², with ε_t i.i.d. zero-mean, unit-variance innovations No mean effects for simplicity (mean effects can be added, e.g., ARCH in mean)
ARCH properties Uncorrelated (martingale difference) returns Correlated squared returns, with a limited set of possible patterns Symmetric return distribution if the innovations are symmetric Fat-tailed return distribution, even if the innovations are not fat-tailed
The GARCH model Generalized ARCH: σ_t² = ω + α r_{t-1}² + β σ_{t-1}² Beware of time indices: conventions differ as to whether the conditional variance of r_t carries index t or t+1 ...
GARCH model Parsimonious way to describe various correlation patterns for squared returns Higher-order extension trivial Math-stat analysis not that trivial See inference section later
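A minimal simulation sketch of a GARCH(1,1) with Gaussian innovations, illustrating how the recursion generates volatility clustering (numpy assumed; the parameter values are purely illustrative):

    import numpy as np

    def simulate_garch(T, omega=1e-6, alpha=0.08, beta=0.90, seed=0):
        """Simulate r_t = sigma_t * eps_t with
        sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
        rng = np.random.default_rng(seed)
        eps = rng.standard_normal(T)
        r = np.zeros(T)
        sigma2 = np.zeros(T)
        sigma2[0] = omega / (1.0 - alpha - beta)   # unconditional variance
        r[0] = np.sqrt(sigma2[0]) * eps[0]
        for t in range(1, T):
            sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
            r[t] = np.sqrt(sigma2[t]) * eps[t]
        return r, sigma2

    r, sigma2 = simulate_garch(5000)
    # r alternates between calm and turbulent periods;
    # its squared values are positively autocorrelated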
Stochastic volatility models Use a latent volatility process, e.g. r_t = σ_t ε_t with log σ_t² = ω + β log σ_{t-1}² + η_t, where η_t is an unobserved noise process separate from ε_t
Stochastic volatility models SV models also lead to volatility clustering Leverage Negative innovation correlation means that volatility increases and price decreases go together Negative return/volatility correlation (One) structural story: default risk
Continuous time modeling Mathematical finance uses continuous time, mainly for ‘simplicity’ Compare asymptotic statistics as approximation theory Empirical finance (at least originally) focused on discrete time models
Consistency The volatility clustering and other empirical evidence are consistent with appropriate continuous time models A simple continuous time stochastic volatility model: for instance a square-root (Heston-type) specification dS_t / S_t = μ dt + σ_t dW_t, dσ_t² = κ(v - σ_t²) dt + γ σ_t dB_t, with v the long-run variance
Approximation theory There is a large literature that deals with the approximation of continuous time stochastic volatility models with discrete time models Important applications Inference Simulation Pricing
Other asset classes So far we only discussed stocks (indices) Stock derivatives can be studied using derivative pricing models Financial econometrics also deals with many other asset classes Term structure (including credit risk) Commodities Mutual funds Energy markets ...
Term structure modeling Model a complete curve at a single point in time There exist models in discrete/continuous time descriptive/pricing for standard interest rates/derivatives ...
Part II: Inference
Contents Parametric inference for ARCH-type models Rank based inference
Analogy principle The classical approach to estimation is based on the analogy principle if you want to estimate an expectation, take an average if you want to estimate a probability, take a frequency ...
Moment estimation (GMM) Consider an ARCH-type model r_t = σ_t(θ) ε_t We suppose that σ_t(θ) can be calculated on the basis of the observations if θ is known Moment condition: E[r_t² - σ_t²(θ)] = 0 (or, conditionally, E[r_t² - σ_t²(θ) | F_{t-1}] = 0)
Moment estimation - 2 The estimator is now taken to solve the sample analogue, e.g. (1/T) Σ_t (r_t² - σ_t²(θ)) = 0 In case of "underidentification": use instruments z_t, i.e. solve (1/T) Σ_t z_t (r_t² - σ_t²(θ)) = 0 In case of "overidentification": minimize the distance-to-zero of the stacked moment conditions
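As a simple illustration of the analogy/moment principle, a sketch of moment estimates for the ARCH(1) model, using the unconditional variance and the first-order autocorrelation of squared returns as moments (this particular choice of moments, and the function name, are illustrative assumptions):

    import numpy as np

    def arch1_moment_estimates(r):
        """Moment estimates for r_t = sigma_t eps_t, sigma_t^2 = omega + alpha r_{t-1}^2.
        Uses Var(r_t) = omega / (1 - alpha) and, when fourth moments exist,
        Corr(r_t^2, r_{t-1}^2) = alpha."""
        r = np.asarray(r, dtype=float)
        r2 = r**2 - np.mean(r**2)
        alpha_hat = np.sum(r2[1:] * r2[:-1]) / np.sum(r2 * r2)
        omega_hat = np.var(r) * (1.0 - alpha_hat)
        return omega_hat, alpha_hat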
Likelihood estimation In case the density of the innovations is known, say it is f, one can write down the density/likelihood of the observed returns, with log-likelihood Σ_t [ log f(r_t / σ_t(θ)) - log σ_t(θ) ] Estimator: maximize this over θ
Doing the math ... Maximizing the log-likelihood boils down to solving Σ_t (1/σ_t(θ)) ∂σ_t(θ)/∂θ [ ε_t(θ) ψ_f(ε_t(θ)) - 1 ] = 0, with ε_t(θ) = r_t / σ_t(θ) and ψ_f(ε) = -f'(ε)/f(ε) the location score of f
Efficiency consideration Which of the above estimators is "better"? Analysis using the Hájek-Le Cam theory of asymptotic statistics Approximate a complicated statistical experiment with very simple ones Something which works well in the approximating experiment will also do well in the original one
Quasi MLE In order for maximum likelihood to work, one needs the density of the innovations If this is not known, one can guess a density (e.g., the normal) This is known as ML under non-standard conditions (Huber) Quasi maximum likelihood Pseudo maximum likelihood
Will it work? For ARCH-type models, postulating the Gaussian density can be shown to lead to consistent estimates There is a large theory on when this works or not We say “for ARCH-type models the Gaussian distribution has the QMLE property”
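A sketch of Gaussian quasi maximum likelihood for a GARCH(1,1), maximizing the Gaussian log-likelihood numerically (scipy assumed; the starting values, the variance initialization, and the crude stationarity check are illustrative choices):

    import numpy as np
    from scipy.optimize import minimize

    def neg_gaussian_loglik(params, r):
        """Negative Gaussian (quasi) log-likelihood of GARCH(1,1) returns
        (additive constant omitted)."""
        omega, alpha, beta = params
        if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
            return np.inf
        T = len(r)
        sigma2 = np.empty(T)
        sigma2[0] = np.var(r)                     # simple initialization
        for t in range(1, T):
            sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
        return 0.5 * np.sum(np.log(sigma2) + r**2 / sigma2)

    # r: an observed (demeaned) return series, e.g. the simulated series above
    # res = minimize(neg_gaussian_loglik, x0=[1e-6, 0.05, 0.90], args=(r,),
    #                method="Nelder-Mead")
    # omega_hat, alpha_hat, beta_hat = res.x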
The QMLE pitfall One often sees people referring to Gaussian MLE Then, they remark that we know financial innovations are fat-tailed ... ... and they switch to t-distributions The t-distribution does not possess the QMLE property (but, see later)
How to deal with SV models? The SV models look the same But now, the volatility σ_t is a latent process and hence not observed Likelihood estimation still works "in principle", but the unobserved variances have to be integrated out
Inference for continuous time models Continuous time inference can, in theory, be based on continuous record observations or on discretely sampled observations Essentially all known approaches are based on approximating discrete time models
... in which we discuss the main ideas of rank based inference
The statistical model Consider a model where 'somewhere' there exist i.i.d. random errors ε_1, ..., ε_T The observations are X_1, ..., X_T The parameter of interest is some θ ∈ Θ We denote the density of the errors by f
Formal model We have an outcome space R^(T×n), with T the number of observations and n the dimension of X_t Take standard Borel sigma-fields Model for sample size T: the family of distributions {P^(T)_(θ,f) : θ ∈ Θ, f ∈ F} Asymptotics refer to T → ∞
Example: Linear regression Linear regression model y_t = x_t'θ + ε_t (with observations (y_t, x_t)) Innovation density f and cdf F
Example: ARCH(1) Consider the standard ARCH(1) model r_t = σ_t ε_t, σ_t² = ω + α r_{t-1}² Innovation density f and cdf F
Maintained hypothesis For given θ and sample size T, the innovations ε_t(θ) can be calculated from the observations For cross-sectional models one may even often write ε_t(θ) = ε_t(θ, X_t) Latent variable (e.g., SV) models ...
Innovation ranks The ranks R_1, ..., R_T are the ranks of the innovations ε_1, ..., ε_T We also write R_t(θ) for the ranks of the innovations ε_t(θ) based on a value θ for the parameter of interest Ranks of the observations themselves are generally not very useful
Basic properties The distribution of the ranks R_t(θ_0), at the true parameter value θ_0, does not depend on the innovation density f: the rank vector is uniformly distributed over the permutations of {1, ..., T} This is (fortunately) not true for R_t(θ) at other values θ, at least 'essentially'
Invariance Suppose we generate the innovations as the transformation ε_t = F^{-1}(U_t) with U_t i.i.d. standard uniform Now, the ranks are even invariant with respect to the choice of F: the ranks of the ε_t equal the ranks of the U_t
Reconstruction For large sample size T we have R_t / (T + 1) ≈ F(ε_t) and, thus, ε_t ≈ F^{-1}(R_t / (T + 1))
Rank based statistics The idea is to apply whatever procedure you have that uses the innovations on the innovations reconstructed from the ranks This makes the procedure robust to distributional changes Efficiency loss due to the '≈' in the reconstruction?
Rank based autocorrelations Time-series properties can be studied using rank based autocorrelations These can be interpreted as 'standard' autocorrelations, rank based for a given reference density, and distribution free
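A sketch of the reconstruction and of a rank based autocorrelation, using the standard normal as reference density (van der Waerden-type scores); scipy is assumed, and the reference density and function names are the analyst's/illustrative choices:

    import numpy as np
    from scipy.stats import norm, rankdata

    def rank_reconstruct(eps, ref_quantile=norm.ppf):
        """Reconstruct innovations from their ranks: eps_hat_t = G^{-1}(R_t / (T+1))
        for a chosen reference distribution G (here: standard normal)."""
        eps = np.asarray(eps, dtype=float)
        T = len(eps)
        return ref_quantile(rankdata(eps) / (T + 1))

    def rank_autocorrelation(eps, lag, ref_quantile=norm.ppf):
        """'Standard' autocorrelation computed from the rank-reconstructed innovations."""
        z = rank_reconstruct(eps, ref_quantile)
        z = z - z.mean()
        return np.sum(z[lag:] * z[:-lag]) / np.sum(z * z)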
Robustness An important property of rank based statistics is the distributional invariance As a result: a rank based estimator is consistent for any reference density All densities satisfy the QMLE property when using rank based inference
Limiting distribution The limiting distribution of a rank based estimator depends on both the chosen reference density and the actual underlying density The optimal choice for the reference density is the actual density How 'efficient' is this estimator? Semiparametrically efficient
Remark All procedures are distribution free with respect to the innovation density They are, clearly, not distribution free with respect to the parameter of interest
Signs and ranks
Why ranks? So far, we have been considering ‘completely’ unrestricted sets of innovation densities For this class of densities ranks are ‘maximal invariant’ This is crucial for proving semiparametric efficiency
Alternatives Alternative specifications may impose zero-median innovations symmetric innovations zero-mean innovations This is generally a bad idea ...
Zero-median innovations The maximal invariant now becomes the ranks and signs of the innovations The ideas remain the same, but with a more precise reconstruction Split the sample of innovations into a positive and a negative part and treat those separately
But ranks are still ... Yes, the ranks are still invariant ... and the previous results go through But the efficiency bound has now changed and rank based procedures are no longer semiparametrically efficient ... but sign-and-rank based procedures are
Symmetric innovations In the symmetric case, the signed ranks become the maximal invariant: the signs of the innovations together with the ranks of their absolute values The reconstruction now becomes still more precise (and efficient)
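A sketch of the signed-rank reconstruction under symmetry: keep the sign of each innovation and reconstruct its absolute value from the rank of |eps_t|, using the distribution of |N(0,1)| as reference for the absolute values (the reference choice and the function name are illustrative):

    import numpy as np
    from scipy.stats import norm, rankdata

    def signed_rank_reconstruct(eps):
        """eps_hat_t = sign(eps_t) * G_+^{-1}(R_t^+ / (T+1)), with R_t^+ the rank of
        |eps_t| and G_+ the cdf of |N(0,1)|, i.e. G_+^{-1}(u) = Phi^{-1}((u+1)/2)."""
        eps = np.asarray(eps, dtype=float)
        T = len(eps)
        ranks_abs = rankdata(np.abs(eps))
        abs_part = norm.ppf((ranks_abs / (T + 1) + 1.0) / 2.0)
        return np.sign(eps) * abs_part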
Semiparametric efficiency
General result Using the maximal invariant to reconstitute the central sequence leads to semiparametrically efficient inference in the model for which this maximal invariant is derived In general: use the conditional expectation of the central sequence given the maximal invariant, e.g. E[Δ_f(θ) | R(θ)] in the pure-rank case
Proof The proof is non-trivial, but some intuition can be given using tangent spaces