Risk Analysis & Modelling Lecture 10: Extreme Value Theory

What we will learn in this lecture
We will look at a method for dealing specifically with infrequent, extreme events: Extreme Value Theory (EVT). EVT can be used to describe the tails of almost any distribution, and as a method of calculating a largely distribution-independent measure of VaR. We will also cover the limitations of EVT and some more advanced programming techniques in VBA!

The Tails Of A Distribution & Risk
In our look at Value at Risk we estimated the likely loss by describing the complete behaviour of a random variable. From this distribution we took the 5% or 1% lower tail as our estimate of a serious but possible loss. We had to make some strict assumptions about the distribution in order to estimate the position of these losses. Since we are only interested in the tails of the distribution, can't we make fewer assumptions and just focus on the tails? The answer is yes, and the method is EVT!

Tail Risks From VaR
We make a lot of strict assumptions about the random variable in order to estimate the whole distribution, yet all we are interested in is the lower tail. We do not need a complete description of the random variable's distribution, just its lower tail. Isn't estimating the whole distribution inefficient?

Probability Distribution Recap
Before we discuss EVT we will recap some statistics. Imagine we have a probability distribution F which describes a continuous random variable. The definition of a probability distribution is that it describes the chance of observing a value equal to or less than a given level; it is sometimes called the Cumulative Distribution Function. So F(0.5) would give the probability that our random variable will take a value equal to or less than 0.5. We will assume we are dealing with variables that can lie between -infinity and +infinity, so F(-infinity) is 0 and F(+infinity) is 1: if the upper bound is +infinity then we are certain the random variable will be below it!
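In symbols (the standard textbook definition, written out here as a reminder):

F(c) = P(X <= c),   with F(-infinity) = 0 and F(+infinity) = 1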

Probability Distribution Example For Random Variable X
[Chart: the probability P(X <= C) plotted against the level C.]

Peak Over Threshold Distribution
The Peak Over Threshold Distribution (POTD) describes the distribution of a random variable given that we know it has exceeded a given boundary. An example of such a distribution would be the distribution describing daily returns greater than 5%. This distribution is obviously related to the distribution describing the complete behaviour of the random variable. More specifically, the POTD is the probability of observing a value of X that is less than or equal to y+u, given that X is above u.

If we know the probability distribution F describing a random variable X, then we can express the POTD in terms of F, u and y. F(y+u) is the probability that X will be less than or equal to y+u. F(u) is the probability that X will be less than or equal to u. F(y+u) - F(u) is therefore the probability that X will be greater than u but less than or equal to y+u. 1 - F(u) is the probability that X will be greater than u (since F(+infinity) = 1). Because our probability distribution is conditional on the fact that we are above the boundary u, we divide by the probability 1 - F(u) (i.e. we rescale our probabilities).
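Putting these pieces together gives the standard expression for the POTD:

F_u(y) = P(X - u <= y | X > u) = (F(y + u) - F(u)) / (1 - F(u))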

POTD Interpretation
[Chart: the density of X with the threshold u and the level u+y marked; the shaded area above u is the probability that X is greater than the threshold value u.]
The POTD measures the probability that X will be greater than u and less than or equal to u + y, given that we know X is greater than u.

A Very Important Result
It can be shown that, regardless of the probability distribution F of the variable, the POTD approaches a fixed limiting distribution as the threshold u increases, where G is a Generalised Pareto Distribution (GPD). ξ is the shape parameter and β is a scaling parameter (much as σ scales the normal distribution). When ξ > 0 we say the GPD has 'heavy' tails.
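In its standard form the GPD's cumulative distribution function is (for excesses y >= 0):

G(y) = 1 - (1 + ξ·y/β)^(-1/ξ)   when ξ ≠ 0
G(y) = 1 - exp(-y/β)            when ξ = 0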

Use Of POTD Limit
Re-expressing this result, we can say that for all values of x greater than u we can estimate the probability distribution F(x) for the tail (x > u) in terms of the Generalised Pareto Distribution. All we have to do is find a way of estimating G's parameters!
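In symbols, the standard re-expression is, for x > u:

F(x) ≈ F(u) + (1 - F(u)) · G(x - u)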

Peak Over Threshold Approach
If we have observations of a random variable X, then EVT tells us that if we set a high enough boundary (u), the distribution describing the random points above this boundary will have a Generalised Pareto Distribution.
[Chart: observations of X with the threshold u marked; the random points over the threshold u are described by the GPD.]

We can describe the tail of the distribution by fitting a GPD to the data points above the boundary. If 10% of the points are above the boundary then we say that F(u) is 90% (i.e. 90% of the points are below the boundary u; this is our estimate for the specific value F(u), remember we do not know what F looks like!). We estimate F(u) from the number of points above the boundary (N_u) relative to the total number of points N in our data set: F(u) = 1 - (N_u/N). For this boundary, the relationship between F(x) and G(x) gives the tail estimator written out below; F(x) describes the probability distribution for x above the boundary u. We can estimate the parameters of G (ξ, β) using maximum likelihood: what values of ξ and β are most likely to have produced the points we observe above the boundary?
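Substituting the empirical estimate F(u) = 1 - N_u/N into the earlier re-expression gives the usual tail estimator, valid for x > u:

F(x) ≈ 1 - (N_u/N) · (1 + ξ·(x - u)/β)^(-1/ξ)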

Maximum Likelihood Estimator
We have a set of data points S that we observe above the boundary u we set (A_1, A_2, A_3, ...). What is the probability of observing this set of results? Given that they are independent, it is the product of the probabilities of observing each one individually. We want to select the GPD which maximises P(S). We can also express this problem as selecting the GPD that maximises the log-likelihood, which is often a simpler problem to solve.
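In symbols, with g denoting the GPD probability density function (introduced on the next slide), the standard likelihood and log-likelihood are:

P(S) = product over i of g(A_i - u)
ln P(S) = sum over i of ln g(A_i - u)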

We still have to work out how to calculate the probability of observing a point above the boundary. This is given by the probability density function (pdf) of the GPD, which is just the derivative of G with respect to y, and by its logarithm.
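In standard form (for ξ ≠ 0) these are:

g(y) = (1/β) · (1 + ξ·y/β)^(-1/ξ - 1)
ln g(y) = -ln(β) - (1/ξ + 1) · ln(1 + ξ·y/β)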

So, to find the GPD that is most likely to have produced the data points we have observed above the boundary, we simply have to find the values of ξ and β that maximise the log-likelihood, where the A_i are the observations above the boundary we set. We cannot use Solver to find these values; we need to use a grid-searching algorithm because the likelihood can have multiple peaks. Once we have ξ and β we have a GPD tail distribution that we can use to calculate Value at Risk! (Maximise ln(P(S)) by changing ξ and β.)
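A minimal sketch of such a grid search in VBA is shown below. The routine names, the assumption that the excesses over the threshold (A_i - u) are already stored in a 1-based array Excess(), and the grid bounds and step sizes are all illustrative choices, not part of the lecture.

' Log-likelihood of the GPD for the excesses over the threshold
Function GPDLogLikelihood(xi As Double, beta As Double, Excess() As Double, n As Integer) As Double
    Dim i As Integer
    Dim LL As Double
    Dim term As Double
    LL = 0
    For i = 1 To n
        term = 1 + xi * Excess(i) / beta
        If term <= 0 Then
            ' These parameter values cannot have produced this observation
            GPDLogLikelihood = -1E+300
            Exit Function
        End If
        LL = LL - Log(beta) - (1 / xi + 1) * Log(term)
    Next i
    GPDLogLikelihood = LL
End Function

' Simple grid search for the (xi, beta) pair with the highest log-likelihood
Sub GridSearchGPD(Excess() As Double, n As Integer, BestXi As Double, BestBeta As Double)
    Dim xi As Double, beta As Double
    Dim LL As Double, BestLL As Double
    BestLL = -1E+300
    ' Illustrative grid: xi from 0.01 to 1, beta from 0.001 to 0.1
    For xi = 0.01 To 1 Step 0.01
        For beta = 0.001 To 0.1 Step 0.001
            LL = GPDLogLikelihood(xi, beta, Excess, n)
            If LL > BestLL Then
                BestLL = LL
                BestXi = xi
                BestBeta = beta
            End If
        Next beta
    Next xi
End Sub

In practice you would fill Excess() from the worksheet (the losses above u, minus u), call GridSearchGPD, and read off BestXi and BestBeta.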

Using EVT To Calculate VaR
VaR tells us how far into the tail of a distribution we have to go to be sure that only 5% or 1% of possible outcomes will be beyond that point. Since EVT describes the tails, we should be able to use it to calculate VaR. We want to use the tail distribution to ask: at what level of loss can we say that only X% of losses will be greater than that level? There are three problems that must be solved before we can calculate VaR using EVT.

Problem 1: EVT Deals With The Upper Tail
The EVT model we have looked at deals exclusively with the upper tail of the distribution, while VaR deals with the lower tail. The solution is fairly simple: instead of measuring returns on the portfolio we measure losses. A positive loss (L) is a negative return (R), and a positive return is a negative loss. Using this definition, the problem of finding the maximum loss becomes one of finding the upper tail of the distribution describing L.

Problem 2: Selecting The Upper Boundary
To use the Peak Over Threshold approach we need to set an upper boundary on the level of loss and only look at points above that line. Since the tail distribution is only valid above this threshold, we must select a threshold which is not above the level of VaR we wish to calculate. For example, if we set our threshold so that only 3% of the distribution is above it, we cannot then use this tail distribution to estimate the 5% tail.

Diagram of the relationship between the threshold and the VaR level we can estimate
[Chart: the loss distribution with the threshold u marked.] The level of the threshold determines how much of the tail our GPD estimates. The VaR confidence level must be contained within our tail estimate, since we only estimate the distribution above our boundary.

Problem 3: Inverting The Tail Estimator
Our tail estimator F(x) tells us the probability of observing a loss less than or equal to x. We are interested in finding the level of loss x for which there is only a probability P of observing losses greater than x; for example, we want to find the loss level that only 1% of losses will exceed. We need to rearrange the tail estimator to get x in terms of the probability, rather than the probability in terms of x.

This comes down to rearranging the tail estimator. We observe that F(x) measures the probability that the loss L is less than or equal to some level x (L <= x). We want V(x), the probability that the loss is greater than some level x, which is simply V(x) = 1 - F(x). Setting V(x) equal to our chosen tail probability and solving for x, after some work, gives the result below.
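Carrying out this rearrangement on the tail estimator gives the standard EVT VaR expression:

x = u + (β/ξ) · ( ((N/N_u) · p)^(-ξ) - 1 )

where p = V(x) is the chosen tail probability (for example 5% or 1%).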

The Terms Of EVT VaR
V(x) is the VaR confidence level, such as 5% for 5% VaR. x is the level of loss that we expect to be exceeded only V(x)% of the time. u is the level of the threshold (the loss level) we set in order to estimate the tail. N is the total number of loss observations in our dataset and N_u is the number above the threshold, so our estimate of 1 - F(u) is N_u/N. ξ and β are the parameters we estimate using maximum likelihood from the points over the threshold in our dataset of losses. This is only valid for risks estimated above u: our tail estimator is only valid above the threshold we set!
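As a sketch, the expression above could be wrapped in a small VBA function; the function name and argument list are illustrative, not from the lecture.

' Loss level exceeded with probability p, given the fitted GPD tail
' Only valid for p <= Nu / N (i.e. the VaR level lies above the threshold u)
Function EVTVaR(u As Double, xi As Double, beta As Double, Nu As Long, N As Long, p As Double) As Double
    EVTVaR = u + (beta / xi) * (((N / Nu) * p) ^ (-xi) - 1)
End Function

For example, =EVTVaR(0.05, 0.2, 0.01, 50, 1000, 0.01) would return the loss level exceeded by only 1% of losses under those (hypothetical) parameter values.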

The Advantages Of EVT
The advantage of EVT is that it focuses only on the tail of the distribution. Using the GPD, this method can capture the fat-tailed losses we observe in financial instruments and insurance liabilities. The calculation is not excessively complex or computationally intensive.

The Problem With EVT
The problem with EVT is that we need a lot of data to estimate the GPD of the tail. The higher we set the boundary for the Peak Over Threshold, the closer the distribution of the excesses will be to the GPD; unfortunately, the higher we set the boundary, the less data we have to estimate the GPD. EVT is still under development, so we will have to wait and see what people come up with!

Part 2: Arrays, Objects & More VBA Tricks

A Recap Of Last Week
Last week we introduced the key concepts of variables and statements. Variables were like boxes that store a single piece of data; statements were instructions to the computer. We looked at If statements and loops. This week we will look at a special type of variable called an array, and at how to create our own variable types with objects!

An Array
An array is a variable that can store more than one value. Last week we looked at variables as boxes that only contain one value; an array is a variable that can contain many values. Arrays are important because we often want to store lists or blocks of things of variable length (such as a list of all the students in the class). Each element in the array is identified by a number.

Creating An Array
Let us say we want to create an array of 10 strings; we would write:

Dim StudentNames(10) As String

If we wanted to create an array of 500 daily returns on a stock:

Dim DailyReturns(500) As Double

If we wanted to create a list of student ages, where ClassSize is an integer variable, we would write:

Dim StudentAges(ClassSize) As Integer

If the variable ClassSize contained 35 then the array StudentAges would hold 35 values (strictly speaking, VBA also creates an element 0 unless Option Base 1 is set, so the declaration covers indices 0 to 35).

Accessing The Elements Of An Array
Just like a variable, an array is initially blank. Let us say we wanted to assign the name Frank Bloggs to the first element of the StudentNames array; we would write:

StudentNames(1) = "Frank Bloggs"

StudentNames(1) would now contain the string "Frank Bloggs".

The For Loop
The For loop is a different type of loop, designed especially for the case where we use an integer variable to count the number of iterations. If we wanted to count how many of the first 100 cells in column A have a value greater than 0.5 we would write:

Dim I As Integer
Dim CellCount As Integer
Dim CellValue As Double
For I = 1 To 100
    CellValue = Cells(I, 1)
    If CellValue > 0.5 Then
        CellCount = CellCount + 1
    End If
Next I

If..Then..Else..End If
Last week we looked at If Then blocks. There is an extension to this called the If Then Else block:

If MyNumber > 0.5 Then
    Call MsgBox("MyNumber is Greater Than 0.5")
Else
    Call MsgBox("MyNumber is Not Greater Than 0.5")
End If

This is useful when we want to say to the computer: "if this is true then do this, else do something else", rather than just "if this is true then do this".

Introduction To Objects
Using objects we can create our own variable types, so we could say:

Dim TopStudent As New Student

This is very useful and is the basis of object-oriented programming (OO). The type Student is known as a class (the type of the variable) and TopStudent is an object of type Student (an instance of that class).

Class Modules
To create our own variable types or "classes" we have to create a Class Module. The name we give the Class Module is the name of the new variable type we create. Suppose there is a class module called Student containing the following:

Public StudentName As String
Public StudentAge As Integer
Public StudentGrade As Double

Every object of type Student will have three sub-variables or members: StudentName, StudentAge and StudentGrade. They are declared as Public so we can access them outside the class module.

Using Objects
Here is some example code using an object of type Student:

Dim TopStudent As New Student
TopStudent.StudentName = "Frank Bloggs"
TopStudent.StudentAge = 32
TopStudent.StudentGrade = 72.1

Notice how we access the members of the object using a '.'. Objects make the code more readable. Objects have many uses in advanced programming techniques, but those are for you to discover!

THE END