Reserve Ranges, Confidence Intervals and Prediction Intervals
Glen Barnett, Insureware; David Odell, Insureware; Ben Zehnwirth, Insureware


Summary
- Uncertainty and variability are distinct concepts: this is the fundamental difference between a confidence interval and a prediction interval.
- When finding a CI or PI, the assumptions should be explicit, interpretable and testable, and related to the volatility in the past.
- The differences between CI and PI are explained via simple examples and then using loss triangles in the PTF modelling framework, where the loss triangle is regarded as a sample path from the fitted probabilistic model.
- An identified optimal model in the PTF framework describes the trend structure and the volatility about it succinctly (the "four pictures"). The model predicts lognormal distributions for each cell, plus their correlations, conditional on explicit, interpretable assumptions related to past volatility.
- Immediate benefits include percentiles and tables for the total reserve and aggregates, by calendar year and accident year.

Variability and Uncertainty
Different concepts; not interchangeable.
"Variability is a phenomenon in the physical world to be measured, analyzed and where appropriate explained. By contrast, uncertainty is an aspect of knowledge." — Sir David Cox

Variability and uncertainty
Process variability is a measure of how much the process varies about its mean, e.g. σ² (or σ).
Parameter uncertainty is how much uncertainty there is in a parameter estimate (e.g. Var(µ̂) or s.e.(µ̂)) or in a function of parameter estimates (say, for a forecast mean: the "uncertainty in the estimate").
Predictive variability is (for most models used) the sum of the process variance and the parameter uncertainty.

Example: Coin vs Roulette Wheel
Coin: 100 tosses of a fair coin (number of heads). Mean = 50, Std Dev = 5, CI for the mean = [50, 50] (the mean is known exactly).
"Roulette wheel": a number from 0, 1, …, 100, equally likely. Mean = 50, Std Dev ≈ 29, CI for the mean = [50, 50].
In 95% of experiments with the coin, the number of heads will be in the interval [40, 60]. In 95% of experiments with the wheel, the observed number will be in the interval [2, 97]. Where do you need more risk capital?
Introduce uncertainty into our knowledge: if the coin or the roulette wheel were mutilated, then conclusions could be drawn only on the basis of observed data.
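A minimal sketch checking these numbers, assuming SciPy is available: the two processes have the same mean but very different process variability, hence very different 95% intervals for the outcome.

```python
from scipy import stats

coin = stats.binom(n=100, p=0.5)   # number of heads in 100 fair tosses
wheel = stats.randint(0, 101)      # equally likely number from 0..100

for name, dist in [("coin", coin), ("wheel", wheel)]:
    lo, hi = dist.ppf(0.025), dist.ppf(0.975)
    print(f"{name}: mean={dist.mean():.0f}, sd={dist.std():.1f}, "
          f"central 95% interval=[{lo:.0f}, {hi:.0f}]")
# coin:  mean=50, sd=5.0,  interval = [40, 60]
# wheel: mean=50, sd=29.2, interval ≈ [2, 98] (the slide quotes [2, 97];
#        discrete-percentile conventions differ slightly)
```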

Example: Coin vs Roulette Wheel, parameters unknown
Coin: 100 tosses, Mean = ?, Std Dev = ?, CI = [?, ?]. We can toss the coin 10 times first (5 heads → estimated mean 50).
"Roulette wheel": a number from 0, 1, …, 100. Mean = ?, Std Dev = ?, CI = [?, ?]. A similar thing can be done with the wheel (more complex).
Parameter uncertainty increases the width of the prediction interval.
Process variability cannot be controlled, but it can be measured.

A basic forecasting problem
Consider the following simple example: n observations
Y_1, …, Y_n ~ iid N(µ, σ²), i.e. Y_i = µ + ε_i, ε_i ~ N(0, σ²).
Now we want to forecast another observation. (Actually, we don't really need normality for most of the exposition, but it's a handy starting point.)

A basic forecasting problem
Y_{n+1} = µ + ε_{n+1}, and the forecast is Ŷ_{n+1} = µ̂ + ε̂_{n+1}, where ε̂_{n+1} = 0 is the forecast of the error term.
µ known: µ̂ = µ, so Ŷ_{n+1} = µ + 0.
Variance of the forecast is Var(µ̂) + Var(ε_{n+1}) = 0 + σ².

A basic forecasting problem
The next observation might lie down here, or up here. Similarly for future losses: they may be high or low. The risk to your business is not simply from the uncertainty in the mean; it is related to the amount you will pay, not its mean. (You're forecasting a random quantity.)

A basic forecasting problem
- Even when the mean is known exactly, there is still underlying process uncertainty (with 100 tosses of a fair coin, you might get 46 heads, or 57 heads, etc.).
- When modelling losses, though, you design a model to describe what's going on with the data.
- It doesn't really make sense to talk about a mean (or any other aspect of the distribution) in the absence of a probabilistic model. [If you don't have a distribution, what distribution is this "mean" the mean of?]

- Assumptions need to be explicit so you can test that the distribution is related to what's going on in the data.
- You don't want to use the coin model if your data are actually coming from a roulette wheel! It's even worse if you don't get the mean right.
Of course, in practice we don't know the mean; we only have a sample to tell us about it.

Let's make a different assumption from before: now we don't know µ.
- We'll have an estimate of the future mean based, through our model, on past values.
- But the estimate of the future mean is not exactly the actual mean, even with a great model (random variation).
- The estimate is uncertain (we can estimate that uncertainty, if the model is a good description).

- So we will have an estimate of the mean, and we'll also have a confidence interval for the mean.
- The interval is designed so that if we were able to rerun history (retoss our coin, respin our roulette wheel) many times, the intervals we generate would include the unknown mean a given fraction of the time.
But that probability relies on the model... if the model doesn't describe the data, the confidence interval is useless (it won't have close to the required probability coverage).

Confidence Interval for the Mean of the Coin Model (100 tosses), with a small sample (10)
10 tosses, 5 heads ⇒ p̂ = ½, so µ̂ = 100p̂ = 50.
Var(µ̂) = 100² Var(p̂) = 100² × (½ × ½)/10 = 250, so s.e.(µ̂) ≈ 15.8.
95% CI for µ: 50 ± 1.96 × 15.8 ≈ (19, 81).
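A minimal sketch reproducing this small-sample CI: estimate p from the 10 pre-trial tosses, scale up to the mean of 100 tosses, and form a normal-theory 95% interval.

```python
import math

n_pre, heads = 10, 5
p_hat = heads / n_pre                 # estimated probability of heads
mu_hat = 100 * p_hat                  # estimated mean heads in 100 tosses
se_mu = 100 * math.sqrt(p_hat * (1 - p_hat) / n_pre)  # s.e. of mu_hat
lo, hi = mu_hat - 1.96 * se_mu, mu_hat + 1.96 * se_mu
print(f"mu_hat = {mu_hat:.0f}, s.e. = {se_mu:.1f}, 95% CI ≈ ({lo:.0f}, {hi:.0f})")
# mu_hat = 50, s.e. = 15.8, 95% CI ≈ (19, 81)
```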

Let's look at some real long-tail data. It has been inflation-adjusted and then normalized for a measure of exposure (the exposures are fairly stable, so it looks similar).

- No trends in the accident year direction.
- Calendar years are sufficiently stable for our current purpose (one year is a bit low; it could be omitted if preferred, but we will keep it).
Note a tendency to "clump" just below the mean, with more values below the mean: skewness.

On the log scale this disappears, and the variance is pretty stable across years (there are many other reasons to take logs). The skewness is removed; the values are much more symmetric about the center.

Consider a single development year, say DY 3. ML estimates of µ and σ (note that the MLE of σ is s_n, with the n denominator, not the more common s_{n-1}).
[Table of normalized values and their logs: the figures did not survive extraction.]

We don't know the true mean. Assuming a random sample from a process with constant mean, we predict the mean for the next value as the sample mean ȳ. Without some indication of its accuracy this is not very helpful. 95% confidence interval for µ: (7.034, 7.345).

Now we want to look at a prediction. We want to predict a random outcome where we don't know the mean (we assume the variance is known here, but generally this doesn't change prediction intervals a great deal).
Note that in the coin tossing / roulette wheel experiment, you almost never pay the mean (50).
- You don't pay the mean; you pay the random outcome.
- The risk posed to the business is NOT just the uncertainty in the mean.

To understand your business you need to understand the actual risk of the process, not just the risk in the estimate of the mean. We want to forecast another observation, so let's revisit the simple model:
Ŷ_{n+1} = µ̂ + ε̂_{n+1}, where ε̂_{n+1} = 0 is the forecast of the error term.

µ UNKNOWN: So Ŷ_{n+1} = µ̂ + 0.
Variance of the forecast = Var(µ̂) + Var(ε_{n+1}) = σ²/n + σ²
(in practice you replace σ² by its estimate).

So again, imagine that the distributions are normal. The next observation might lie down here, or up here (implications for risk capital).
[Figure labels: distribution of µ̂, used for the CI (relates to a "range" for the mean); fitted distribution; prediction interval for µ unknown (relates to a "range" for the future observed value).]

Predictive variance = Var(µ̂ + ε̂_{n+1}) = Var(µ̂) + Var(ε_{n+1}) (= parameter uncertainty + process variance).
(NB: variances add if the errors are independent and additive.)
Alternative way to look at it: prediction error = Y_{n+1} − Ŷ_{n+1}.
Predictive variance = Var(prediction error) = Var(Y_{n+1} − Ŷ_{n+1}) = σ² + Var(µ̂).

Returning to the example: the 95% prediction interval for Y_{n+1} is (6.75, 7.63).
[Figure: the PI shown alongside the CI.]
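A minimal sketch of the CI-vs-PI calculation on the log scale. The seven logged values below are hypothetical stand-ins (the slide's data table did not survive extraction), chosen so the output lands near the reported intervals.

```python
import numpy as np
from scipy import stats

# hypothetical normalized DY 3 values, then logged
y = np.log([1100, 1300, 1040, 1700, 1150, 1250, 1600])
n, ybar, s = len(y), y.mean(), y.std(ddof=1)
t = stats.t.ppf(0.975, df=n - 1)

ci = (ybar - t * s / np.sqrt(n), ybar + t * s / np.sqrt(n))              # CI for the mean
pi = (ybar - t * s * np.sqrt(1 + 1/n), ybar + t * s * np.sqrt(1 + 1/n))  # PI for the next value
print(f"mean {ybar:.3f}; 95% CI ({ci[0]:.3f}, {ci[1]:.3f}); "
      f"95% PI ({pi[0]:.3f}, {pi[1]:.3f})")
# The PI is wider than the CI by the extra process-variance term.
```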

Parameter uncertainty can be reduced: more data reduces parameter uncertainty (more than 10 tosses of the coin in the pre-trial). In some cases you can go back and get more loss data (e.g. dig up an older year), or eventually you'll have another year of data.
But process variability doesn't reduce with more data; it is an aspect of the process, not of knowledge.

Note that nearby developments are related: if DY 3 were missing entirely, you could take a fair guess at where it was! So in this case we do have much more data!

We need a model to take full advantage of this. Even just fitting a line through DY 2-4 has a big effect on the width of the confidence interval, yet it only changes the prediction interval by ~2%, so the calculated PI hardly changes.
[Figure: log-normalized values vs development year, with the PI and CI shown.]
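A minimal sketch of this contrast, with hypothetical logged values for DY 2-4: regressing on development year narrows the CI for the mean at DY 3, while the PI barely moves. statsmodels' `get_prediction` returns both interval types.

```python
import numpy as np
import statsmodels.api as sm

dy = np.array([2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0])        # hypothetical dev years
logy = np.array([7.6, 7.5, 7.2, 7.1, 7.3, 6.8, 6.9])      # hypothetical logged values
fit = sm.OLS(logy, sm.add_constant(dy)).fit()

x_new = np.array([[1.0, 3.0]])                 # intercept and DY 3
frame = fit.get_prediction(x_new).summary_frame(alpha=0.05)
print(frame[["mean", "mean_ci_lower", "mean_ci_upper",     # CI for the mean
             "obs_ci_lower", "obs_ci_upper"]])             # PI for a new observation
```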

But that prediction interval is on the log scale. To take a prediction interval back to the normalized-$ scale, just back-transform the endpoints of the PI. Producing a confidence interval for the mean on the normalized-$ scale is harder (you can't just back-transform the limits of the CI: that gives an interval for the median). It's not a particularly enlightening bit of algebra, so we leave the derivation out here.
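A minimal sketch of the back-transform. The PI endpoints exponentiate directly; a point estimate of the lognormal mean needs the σ²/2 correction, since exponentiating the log-scale mean estimates the median. The µ and σ values here are hypothetical.

```python
import math

mu_hat, sigma_hat = 7.19, 0.20        # hypothetical log-scale estimates
pi_log = (6.75, 7.63)                 # PI from the slides, log scale

pi_dollars = tuple(math.exp(v) for v in pi_log)      # valid PI on the $ scale
median_dollars = math.exp(mu_hat)                    # back-transformed centre = median
mean_dollars = math.exp(mu_hat + sigma_hat**2 / 2)   # lognormal mean (> median)
print(pi_dollars, median_dollars, mean_dollars)
```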

There are some companies around for whom (for some lines of business) the process variance is very large; some have a coefficient of variation near 0.6 [so the standard deviation is more than 60% of the mean]. That's just a feature of the data. You may not be able to control it, but you certainly need to know it.

Why take logs?
- Tends to stabilize the variance.
- Multiplicative effects (including economic effects, such as inflation) become additive (percentage increases or decreases are multiplicative).
- Exponential growth or decay → linear.
- Skewness is often eliminated; distributions tend to look near normal.
- Using logs is a familiar way of dealing with many of these issues (standard in finance).
NB: for these to work, you have to take logs of the incremental paid, not the cumulative paid. A quick demonstration follows.
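A small demonstration of the first two bullets: constant 10% inflation is multiplicative on the original scale but becomes a constant additive step on the log scale.

```python
import numpy as np

base = 100.0
payments = base * 1.10 ** np.arange(6)  # 10% inflation each period
print(np.diff(payments))                # differences grow: not additive
print(np.diff(np.log(payments)))        # constant ≈ 0.0953 = log(1.10)
```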

Probabilistic Modelling of trends
If we graph the data for an accident year against development year, we can see two trends, e.g. trends in the development year direction.
[Figure: points for one accident year plotted against development year.]

Probabilistic Modelling
We could put a line through the points using a ruler, or we could do something formal, using regression.
[Figure: fitted trend line through the points, with the residual variance indicated.]

Introduction to Probabilistic Modelling: Models Include More Than The Trends
The model is not just the trends in the mean, but also the distribution about the mean (Data = Trends + Random Fluctuations).
[Figure: residuals (y − ŷ) about the fitted trend.]

Introduction to Probabilistic Modelling: Simulating the Same "Features" in the Data
- Simulate "new" observations based on the trends and standard errors.
- The simulated data should be indistinguishable from the real data.
[Figure: simulated points (o) overlaid on the real points (x).]

What does it mean to say a model gives a good fit? (e.g. a lognormal fit to a claim size distribution)
- Real sample: x_1, …, x_n.
- Fitted distribution: the fitted lognormal.
- Random sample from the fitted distribution: y_1, …, y_n.
A good fit means the y's look like the x's: the model has probabilistic mechanisms that can reproduce the data. It does not mean we think the model generated the data.
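A minimal sketch of this idea: fit a lognormal to a sample, simulate a new sample of the same size from the fit, and compare summary quantiles. The "claim sizes" here are simulated stand-ins, not real data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.lognormal(mean=8.0, sigma=0.7, size=200)   # stand-in "real" claim sizes

shape, loc, scale = stats.lognorm.fit(x, floc=0)   # fit lognormal, location fixed at 0
y = stats.lognorm.rvs(shape, loc=loc, scale=scale, size=len(x), random_state=2)

qs = [0.25, 0.5, 0.75, 0.95]
print("x quantiles:", np.quantile(x, qs).round(0))
print("y quantiles:", np.quantile(y, qs).round(0))  # the y's should look like the x's
```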

[Diagram: a probabilistic model (trends + variation about trends) generating the real data and simulated triangles S1, S2, S3; contrasted with models based on ratios.]
Simulated triangles cannot be distinguished from the real data: similar trends, trend changes in the same periods, and the same amount of random variation about the trends. Models project past volatility into the future.

Trends in three directions, plus the volatility about them: the "picture" of the model.

Testing a ratio model
Since the correctness of an interval ("range") depends on the model, it's necessary to test a model's appropriateness. Simple diagnostic model checks exist. Ratio models can be embedded in regression models, so we can do more rigorous testing, extending them to a diagnostic model.

ELRF (Extended Link Ratio Family)
Let x be the cumulative at development j−1 and y the cumulative at development j. Link ratios are a comparison of columns j−1 and j: we can graph the ratios y/x. Is a line through the origin appropriate? Using ratios means assuming E(Y | x) = b·x.

Mack (1993): the chain ladder ratio is the volume-weighted average, b̂ = Σy / Σx; compare the arithmetic average of the individual ratios, b̂ = (1/n) Σ(y/x).

Intercept (Murphy (1994)): y = a + b·x.
Since y already includes x, write y = x + p, where p is the incremental at j and x is the cumulative at j−1, so E(p | x) = a + (b − 1)x. Is b − 1 significant? (Venter (1996))
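A minimal sketch (hypothetical columns) of the regression view of ratios: weighted least squares of y on x with weights 1/x reproduces the chain ladder (volume-weighted) ratio, and regressing the incremental on x with an intercept lets you test whether b − 1 is significant.

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1000.0, 1500.0, 1200.0, 900.0])   # cumulative at dev j-1 (hypothetical)
y = np.array([1800.0, 2600.0, 2300.0, 1500.0])  # cumulative at dev j   (hypothetical)

print("chain ladder ratio:", y.sum() / x.sum())
wls = sm.WLS(y, x, weights=1.0 / x).fit()       # y = b*x with Var ∝ x
print("WLS slope (identical):", wls.params[0])

ols = sm.OLS(y - x, sm.add_constant(x)).fit()   # p = a + (b-1)x  (Murphy/Venter)
print(ols.summary().tables[1])                  # is (b-1) significantly non-zero?
```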

Use link ratios for projection? Abandon ratios when they have no predictive power.
[Figure: cumulative at j−1 (x) vs incremental at j (p).]

Is the assumption E(p | x) = a + (b − 1)x tenable?
Note: if corr(x, p) = 0, then corr((b − 1)x, p) = 0. If x and p are uncorrelated, no ratio has predictive power, and ratio selection by actuarial judgement can't overcome zero correlation.
The correlation is often close to 0; sometimes it is not. Does that imply ratios are a good model? What about ranges?

With two associated variables, it is tempting to think X causes changes in Y. However, another variable may be impacting both X and Y in a similar way, causing them to move together. Here X is a noisy proxy for N; N is a better predictor of Y (e.g. superimposed inflation, exposures).
[Diagram: X → Y, versus N driving both X and Y.]

[Figures for Condition 1 and Condition 2: cumulative at j−1 (x) vs incremental at j (p), by accident year w.]

Now introduce a trend parameter for the incrementals p: p versus accident year, not versus the previous cumulative. NB: this is a diagnostic model, not a predictive model.

Review of the three conditions for the incrementals:
- Condition 1: zero trend.
- Condition 2: constant trend, positive or negative.
- Condition 3: non-constant trend.
A regression sketch of this diagnostic follows.
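A minimal sketch (hypothetical data) of the diagnostic regression: model the incremental p with an intercept, the previous cumulative x, and an accident-year trend w. The significance pattern of the coefficients distinguishes the three conditions.

```python
import numpy as np
import statsmodels.api as sm

w = np.array([1, 2, 3, 4, 5, 6], dtype=float)        # accident year index (hypothetical)
x = np.array([900, 1000, 1150, 1100, 1300, 1400.0])  # cumulative at dev j-1 (hypothetical)
p = np.array([400, 430, 500, 470, 560, 600.0])       # incremental at dev j (hypothetical)

X = sm.add_constant(np.column_stack([x, w]))         # p = a0 + (b-1)x + a1*w + error
fit = sm.OLS(p, X).fit()
print(fit.summary().tables[1])  # does the accident-year trend beat the ratio term?
```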

Probabilistic Modelling
Trends occur in three directions: development year (d), accident year (w), and calendar year (t = w + d).
[Diagram: the triangle, with its past and future regions.]

M3IR5 Data
PAID LOSS = exp(α − 0.2d): a constant trend of −0.2 in the development direction, linear on the log scale.
[Figure: paid losses vs d, and log paid losses vs d with slope −0.2; the value of α did not survive extraction.]
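A minimal sketch generating data with the stated structure: exponential decay exp(α − 0.2d) is exactly linear on the log scale with slope −0.2. The slide's value of α did not survive extraction, so the 11.5 here is an assumption.

```python
import numpy as np

alpha = 11.5                      # assumed; the slide's value is missing
d = np.arange(0, 10)
paid = np.exp(alpha - 0.2 * d)    # exponential decay in development
slope = np.polyfit(d, np.log(paid), 1)[0]
print(f"log-scale slope: {slope:.2f}")   # -0.20
```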

Probabilistic Modelling: Axiomatic Properties of Trends

Sales Figures

Resultant development year trends

Discussion of some interactive analysis examples (ELRF vs PTF). Summary of highlights: two triangles, ABC and LR High.

ABC summary: the chain ladder PIs for next year seem far too low; the model doesn't even predict the last diagonal:

ABC: one-step-ahead prediction of the last diagonal (DYs 1-9). All values are well above the mean prediction, most above the 97½ percentile, i.e. outside the 95% prediction interval and at the high end of the "range". Why is this?

ABC chain ladder fit, residuals vs calendar year: a change in calendar year trends! (a missing feature). Can we use ratios from just the last few years? No good: recent ratios still include old incrementals.

ABC PTF fit: model the calendar year trends. Can we predict the last year (not fitted)? A little high, but most points are well inside the PI. (What inflation next year?)

A further example (ELRF vs PTF): LR High (chain ladder PIs much too high).