jrothenb – 1 Joerg Rothenbuehler
jrothenb – 2 The distribution of the Maximum:
jrothenb – 3 Fisher-Tippett Theorem
jrothenb – 4 The Extreme Value Distributions
jrothenb – 5 Generalized Extreme Value Distribution (GEV): The three EVDs can be represented by a single three-parameter distribution, called the generalized EVD (GEV).
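The formula on this slide did not survive extraction. As a minimal sketch, assuming the standard parametrization with shape xi, location mu and scale sigma (the parameter names used in the fit tables later in the deck), the GEV distribution function can be written as follows. The function name `gev_cdf` is hypothetical.

```python
import math


def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """CDF of the GEV, assuming the standard (xi, mu, sigma) parametrization."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        # Gumbel type, obtained as the limit xi -> 0
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0
        # (Frechet type), above the upper endpoint when xi < 0 (Weibull type)
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

The sign of xi selects the type: xi > 0 gives the Frechet type, xi = 0 the Gumbel type, and xi < 0 the Weibull type.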
jrothenb – 6 The function of the parameters
jrothenb – 7 Excesses over high thresholds
jrothenb – 8 Generalized Pareto Distribution (GPD)
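The GPD formula is also missing from the transcript. The sketch below assumes the usual two-parameter form with shape xi and scale beta (the names used in the GPD fit slide), and adds the inverse so that quantiles can be computed directly. Both function names are hypothetical.

```python
import math


def gpd_cdf(x, xi, beta=1.0):
    """CDF of the GPD with shape xi and scale beta > 0 (assumed standard form)."""
    if beta <= 0.0:
        raise ValueError("beta must be positive")
    if x <= 0.0:
        return 0.0
    if xi == 0.0:
        # Exponential distribution, the limit xi -> 0
        return 1.0 - math.exp(-x / beta)
    t = 1.0 + xi * x / beta
    if t <= 0.0:
        # Only reachable for xi < 0: x lies beyond the upper endpoint -beta/xi
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def gpd_quantile(p, xi, beta=1.0):
    """Inverse of gpd_cdf for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if xi == 0.0:
        return -beta * math.log(1.0 - p)
    return beta / xi * ((1.0 - p) ** (-xi) - 1.0)
```

Feeding uniform random numbers through `gpd_quantile` is one simple way to simulate GPD excesses.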
jrothenb – 9 Properties of GPD
jrothenb – 10 The Empirical Mean Excess Function The empirical mean excess function of a GPD with
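A minimal sketch of the empirical mean excess function: for a threshold u it averages the excesses x - u over all observations above u. For a GPD with xi < 1 the theoretical mean excess function is (beta + xi * u) / (1 - xi), which is linear in u, so a roughly linear stretch in the plot suggests a range of thresholds where a GPD may fit. The function names are hypothetical.

```python
def mean_excess(data, u):
    """Average excess over threshold u, using only observations strictly above u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)


def mean_excess_curve(data, min_exceedances=1):
    """Points (u, e_n(u)) with u running over the distinct observed values.

    Thresholds with fewer than min_exceedances observations above them are
    skipped, because the average is very noisy there.
    """
    points = []
    for u in sorted(set(data)):
        excesses = [x - u for x in data if x > u]
        if len(excesses) >= min_exceedances:
            points.append((u, sum(excesses) / len(excesses)))
    return points
```

This direct version is quadratic in the sample size; it is meant to show the definition, not to be efficient on a week of probe data.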
jrothenb – 11 Modeling Extreme Events: Exceedances of a high threshold occur according to a Poisson process (iid exponentially distributed interarrival times). Excesses over a high threshold can be modeled by a GPD. An appropriate value of the high threshold can be found by plotting the empirical mean excess function. The distribution of the maximum of a Poisson number of iid excesses over a high threshold is a GEV with the same shape parameter as the corresponding GPD.
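The last statement can be checked numerically. If N is Poisson with mean lam and the excesses are iid GPD(xi, beta), then P(max <= x) = exp(-lam * (1 - G(x))) for x >= 0, and this equals a GEV distribution function with the same xi, scale beta * lam**xi and location beta * (lam**xi - 1) / xi. The sketch below covers only xi != 0 and assumes the standard GPD and GEV parametrizations; all function names are hypothetical.

```python
import math


def gpd_sf(x, xi, beta):
    """Survival function 1 - G(x) of a GPD with shape xi != 0, for x >= 0."""
    t = 1.0 + xi * x / beta
    return t ** (-1.0 / xi) if t > 0.0 else 0.0


def max_of_poisson_excesses_cdf(x, lam, xi, beta):
    """P(max of N excesses <= x) for N ~ Poisson(lam), excesses iid GPD, x >= 0."""
    return math.exp(-lam * gpd_sf(x, xi, beta))


def matching_gev_params(lam, xi, beta):
    """GEV (xi, mu, sigma) that reproduces the distribution of the maximum."""
    sigma = beta * lam ** xi
    mu = beta * (lam ** xi - 1.0) / xi
    return xi, mu, sigma


def gev_cdf(x, xi, mu, sigma):
    """GEV distribution function for xi != 0."""
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

At x = 0 both expressions give exp(-lam), the probability that no exceedance occurs at all.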
jrothenb – 12 Extremal Index of a Stationary Time Series: The extremal index θ (0 < θ ≤ 1) measures the dependence of the data in the tails. 1/θ can be interpreted as the average cluster size in the tails: high values appear in clusters of size 1/θ, and θ = 1 means there is no clustering in the tails. If the data do not show strong long-range dependence but have extremal index θ, their maxima have distribution H^θ, where H is the GEV of iid data with the same marginal distribution. GPD analysis may not be appropriate for data with θ < 1.
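One common way to estimate the extremal index is the runs estimator: count clusters of exceedances, where a new cluster starts only after at least r consecutive non-exceedances, and divide by the number of exceedances. The slides do not say which estimator was used, so this is an illustrative sketch rather than the author's procedure; the function name and the run length parameter `r` are assumptions.

```python
def runs_extremal_index(series, u, r):
    """Runs estimator: (number of clusters) / (number of exceedances of u)."""
    if r < 1:
        raise ValueError("run length r must be at least 1")
    n_exceed = 0
    n_clusters = 0
    gap = r  # treat the start of the series as if a full gap had just passed
    for x in series:
        if x > u:
            n_exceed += 1
            if gap >= r:
                # Enough quiet observations since the last exceedance:
                # this exceedance opens a new cluster
                n_clusters += 1
            gap = 0
        else:
            gap += 1
    if n_exceed == 0:
        raise ValueError("no observations exceed the threshold")
    return n_clusters / n_exceed
```

The reciprocal of the returned value is the average cluster size at that threshold; in practice the estimate is examined over several thresholds and run lengths.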
jrothenb – 13 The Data: Surveyor Project. One-way delays of probe packets during one week. Packets are sent according to a Poisson process with a rate of 2/sec. Each packet is time-stamped to measure its delay. If the delay exceeds 10 sec, the packet is assumed lost and is discarded. Saturday and Sunday are excluded from the analysis. More details:
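To make the measurement setup concrete, the sketch below generates send times from a Poisson process by summing exponential interarrival times, and applies the loss rule for delays above 10 sec. It is a simulation of the described scheme, not the Surveyor software; the function names and the `cutoff` parameter are hypothetical.

```python
import random


def poisson_send_times(rate, duration, rng):
    """Send times in [0, duration) of a Poisson process with the given rate."""
    times = []
    t = 0.0
    while True:
        # Interarrival times of a Poisson process are iid exponential(rate)
        t += rng.expovariate(rate)
        if t >= duration:
            return times
        times.append(t)


def drop_lost(delays, cutoff=10.0):
    """Keep only delays not exceeding the cutoff; larger ones count as lost."""
    return [d for d in delays if d <= cutoff]
```

For example, `poisson_send_times(2.0, 3600.0, random.Random(0))` yields about 7200 send times for one simulated hour at the rate quoted on the slide.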
jrothenb – 14 Time-Series Plot Colorado-Harvard Monday 12:00am - Friday 8:00 pm
jrothenb – 15 ACF and Ex. Index Estimation
jrothenb – 16 Empirical Mean Excess Function
jrothenb – 17 Estimation of the Shape Parameter as a function of the threshold used (GPD fit)
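The idea of tracking the shape estimate across thresholds can be sketched as follows. The slides do not state the fitting method (maximum likelihood is the usual choice); to stay dependency-free this sketch swaps in the method of moments, which uses mean = beta / (1 - xi) and variance = beta**2 / ((1 - xi)**2 * (1 - 2 * xi)) and is therefore only meaningful for xi < 1/2. All names are hypothetical.

```python
import statistics


def gpd_fit_moments(excesses):
    """Method-of-moments estimates (xi, beta) for a GPD; a stand-in for MLE."""
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v  # equals 1 - 2 * xi for an exact GPD
    xi = 0.5 * (1.0 - ratio)
    beta = 0.5 * m * (1.0 + ratio)
    return xi, beta


def shape_by_threshold(data, exceedance_counts):
    """Fit the k largest excesses for each k; return rows (u, k, xi, beta)."""
    xs = sorted(data, reverse=True)
    rows = []
    for k in exceedance_counts:
        if not 2 <= k < len(xs):
            raise ValueError("need 2 <= k < number of observations")
        u = xs[k]  # threshold: the (k+1)-th largest observation
        excesses = [x - u for x in xs[:k]]
        xi, beta = gpd_fit_moments(excesses)
        rows.append((u, k, xi, beta))
    return rows
```

A threshold range over which the shape estimate stays roughly constant is the usual sign that the GPD approximation has taken hold.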
jrothenb – 18 Result of the GPD Fit
jrothenb – 19 Fit of a GPD-Distr. for Colorado-Harvard threshold = Quantile of threshold = Number of exceedances = 500 Parameter estimates and Standard Errors xi beta
jrothenb – 20 Estimates based on the GPD Fit [Table columns: p; quantile; sfall; empirical quantile]
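The table values did not survive extraction, but the quantities can be sketched. Assuming the standard tail estimator built on a GPD fit above threshold u, with n observations of which n_exceed lie above u, the p-quantile is u + (beta / xi) * (((n / n_exceed) * (1 - p))**(-xi) - 1). Reading the column name "sfall" as expected shortfall is an assumption; for xi < 1 it is quantile / (1 - xi) + (beta - xi * u) / (1 - xi). Function names are hypothetical.

```python
import math


def gpd_tail_quantile(p, u, xi, beta, n, n_exceed):
    """p-quantile from a GPD fit above u; valid only for quantiles above u."""
    tail_frac = n_exceed / n
    if not 1.0 - tail_frac <= p < 1.0:
        raise ValueError("p must satisfy 1 - n_exceed/n <= p < 1")
    ratio = (1.0 - p) / tail_frac
    if xi == 0.0:
        return u - beta * math.log(ratio)
    return u + beta / xi * (ratio ** (-xi) - 1.0)


def gpd_expected_shortfall(p, u, xi, beta, n, n_exceed):
    """Mean of the distribution beyond the p-quantile; requires xi < 1."""
    if xi >= 1.0:
        raise ValueError("expected shortfall is infinite for xi >= 1")
    q = gpd_tail_quantile(p, u, xi, beta, n, n_exceed)
    return q / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)
```

Comparing the model quantile with the empirical quantile at levels still inside the data range is a simple check of the fit before extrapolating further out.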
jrothenb – 21 Quantile estimation as a function of the threshold Empirical quantile % Estimate
jrothenb – 22 Fitting a GEV to block-wise maxima [Figure: the time series split into Block 1 to Block 5]
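Extracting the block-wise maxima is the first step of this approach. A minimal sketch, assuming the series is an in-order list of delays and that an incomplete final block is dropped (a choice made here, not stated in the slides); the function name is hypothetical.

```python
def block_maxima(series, block_size):
    """Maximum of each consecutive full block of block_size observations."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    n_blocks = len(series) // block_size  # an incomplete last block is dropped
    return [
        max(series[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
```

With probes sent at 2/sec, a block of 7200 observations corresponds to roughly one hour of data at the nominal rate.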
jrothenb – 23 GEV-Fit Results for different Block sizes Block size = 7200 : 108 Blocks xi sigma mu Estimation Std. Error Block size = : 54 Blocks xi sigma mu Estimation Std. Error
jrothenb – 24 High Level Estimation. Level exceeded during 1 of 50 hours [table columns: Block size, Lower, Estimate, Upper; rows: 1h h h]. Level exceeded during 1 of 100 hours [table columns: Block size, Lower, Estimate, Upper; rows: 1h h h].
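Given fitted GEV parameters for the block maxima, the level exceeded on average in 1 of k blocks is the (1 - 1/k)-quantile of the GEV. The sketch below assumes the standard (xi, mu, sigma) parametrization and gives only the point estimate; the lower and upper bounds in the table need a confidence interval method that the slides do not describe. The function name is hypothetical.

```python
import math


def gev_return_level(k, xi, mu, sigma):
    """Level exceeded by the block maximum in 1 of k blocks on average."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    y = -math.log(1.0 - 1.0 / k)  # solves H(x) = 1 - 1/k for the GEV
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)
```

With one-hour blocks, k = 50 gives the level exceeded during 1 of 50 hours; with blocks of h hours, k = 50 / h answers the same question.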
jrothenb – 25 Does GPD always work? The Army-Lab. – Univ. of Virginia dataset [Figure panels: Time Series Plot; ACF Plot, Lags: 5-1000; PACF Plot, Lags: ; ACF Plot, Lags: 1-1000]
jrothenb – 26 What goes wrong beyond the LRD: Empirical Mean Excess Function Shape Parameter
jrothenb – 27 Non-Stationarity: Harvard to Army-Lab. Time Series Plot: Monday 12 am – Friday 8 pm
jrothenb – 28 Pick a few hours per day! Mean Excess Plot 11am – 4pm Mon - Fri Empirical Tail Distr. Shape Parameter Estimation
jrothenb – 29 Single Outlier: Virginia - Harvard Empirical Tail Distr. ACF, Lag Estimation of Extremal Index Monday 12am – Friday 8pm
jrothenb – 30 The effect of the outlier on GEV Fit Without outlier: Block size = Blocks xi sigma mu x1= Fit With outlier Block size = Blocks xi sigma mu x1=
jrothenb – 31 The effect of the single outlier on GPD: Analysis with outlier Analysis without outlier
jrothenb – 32 Conclusions: The GPD is a model that can be fitted to the tails of a distribution. The quality of the fit can be checked with various methods. From the model, we can obtain quantile estimates at the edge of or outside the data range. However, a good fit is often not possible. The GEV provides a model for the distribution of block-wise maxima. Its use is supported by EVT for stationary time series without strong LRD, while the GPD is only supported in the iid case. The quality of the fit can be checked with tools similar to those used for the GPD model. Certain problems remain, and reliable quantile estimates are not available.
jrothenb – 33 Acknowledgements: Applied Research Group at Telcordia: – E. van den Berg – K. Krishnan – J. Jerkins – A. Neidhardt – Y. Chandramouli Cornell University: – Prof. G. Samorodnitsky