Hierarchical Bayesian Modelling of the Spatial Dependence of Insurance Risk
László Márkus and Miklós Arató, Eötvös Loránd University, Budapest, Hungary

The basis for locally dependent premiums  Companies apply spatially dependent premiums for various types of insurance. Riskier customers should pay more, but how do we determine the dependence of risk on location?  We analyse third-party liability motor insurance data from a company in Hungary. Only claim frequency is considered in this talk; claim size needs different models. So the occurrence of claims constitutes the risk for the present talk.  An insurance company may not want its premium rating to change from locality to locality, but it has to know how much discrepancy results from smoothing, i.e. aggregating over larger regions – customers are very sensitive to “unjustly set” rates.

Information from the neighbourhood to be used  Only the capital, Budapest, is large enough for reliable direct risk estimation.  In a village with 2 contracts, 1 occurring claim dramatically increases the estimated risk, but not the true one.  What to do with localities with no contract at all?  The spatial risk component cannot be estimated from local experience alone; in addition, the information available in the neighbourhood has to be accounted for. But what counts as a neighbourhood?  Aware of its shortcomings, we choose all localities within 15 km aerial distance to be neighbours of a given locality.

The inhomogeneous spatial Poisson process  Suppose the claim frequency Z_j of the j-th individual contract is distributed by Poisson law.  Its Poisson intensity parameter depends on the exposure time τ_j (the time spent in risk), which is known to us as data.  Furthermore, the intensity depends on other risk factors characterising the contract (such as car type, age, etc.).  Finally, the intensity parameter depends on the location the contract belongs to.  Suppose in addition that interdependence among claim frequencies is created solely through the intensity parameters, i.e. the Z_j's are conditionally independent given the values of the intensities.  Our final assumption is that the effects of the exposure time, risk factors and location are multiplicative on the intensity.

Contract-level model  So we end up with Z_j distributed as Poisson(λ·β_j·τ_j·e^{θ_i}), with λ the average intensity or common claim frequency, β_j the risk factor effect, τ_j the exposure and e^{θ_i} the spatial risk parameter.  The additional risk factors are  car type (30 levels)  gender (3: male, female, company)  age group (6)  population size (10)  For the first instance, suppose λ and all the e^{θ_i}'s are equal to 1. Then the β_j's are easily estimable by a generalised linear model (a sketch follows below).  Introducing now β_j·τ_j as the modified exposure (denoted by τ_j*), we can build a model for the claim frequencies at locations and estimate the spatial risk parameter e^{θ_i}.
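As an illustration, a minimal R sketch of this first stage (hypothetical variable and column names; not the authors' actual code):

    # First-stage Poisson GLM for the risk factor effects, with lambda and all
    # spatial terms held at 1; log(exposure) enters as an offset.
    # Assumed data frame `d` with columns: claims, exposure, cartype, gender,
    # agegroup, popsize (the last four coded as factors).
    fit <- glm(claims ~ cartype + gender + agegroup + popsize,
               family = poisson(link = "log"),
               offset = log(exposure), data = d)
    # With lambda = 1 and theta = 0, the fitted means equal beta_j * tau_j,
    # i.e. the modified exposures tau*:
    d$tau_star <- fitted(fit)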

Location-level model  By virtue of the conditional independence, the claim frequency Y_i at the i-th location will be distributed as Poisson(λ·Στ_j*·e^{θ_i}), where the summation goes over all contracts belonging to location i.  In this model we consider Στ_j* as given (“observed” data), even though it contains estimated components, and denote it by t_i.  After estimating e^{θ_i} it is possible to return to the contract level, re-estimate the effects of the risk factors, and iterate this procedure (the aggregation step is sketched below).  Remarkably, stability is reached within a few steps.
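A minimal R sketch of the locality-level aggregation feeding this model (hypothetical names, continuing the data frame `d` from above):

    # Aggregate the modified exposures and observed claims to locality level.
    t_i <- tapply(d$tau_star, d$locality, sum)   # t_i = sum of tau* over location i
    y_i <- tapply(d$claims,   d$locality, sum)   # observed claim counts per location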

The hierarchical Bayesian model  Let us introduce some further notation:  Y_i: number of claims, t_i: modified exposure time, θ_i: spatial risk factor at the i-th location, i = 1, 2, …, N, λ: common claim frequency  A: neighbourhood matrix  ρ: parameter of the covariance  p, q, α, β: Bayesian parameters  The claim frequency follows a non-homogeneous Poisson process. That is, the Y_i's are independent Poisson(λ·t_i·e^{θ_i}) distributed random variables, given λ and the θ_i's.  On the second level of the model hierarchy, suppose the spatial parameters θ_i to be normally distributed with covariance matrix Σ = (I − ρA)^{-1}, depending on the neighbourhood matrix A.
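In compact form the hierarchy reads as follows (the zero prior mean for Θ is our assumption; the slides do not state it explicitly):

    Y_i | λ, Θ  ~  Poisson(λ·t_i·e^{θ_i}),  conditionally independent, i = 1, …, N
    Θ | ρ       ~  N(0, Σ),   Σ = (I − ρA)^{-1}
    ρ           ~  a prior supported where I − ρA is positive definite (next slide)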

 We must keep the covariance matrix Σ positive definite; therefore we suppose a Beta-type prior on the parameter ρ.  The proposal parameters p and q are conditional on ρ, prescribing the expectation and variance of the Beta(p, q) distribution as p/(p+q) = ρ and ρ(1−ρ)/(p+q+1) = σ².  We have to take care with the update of ρ, since this distribution is not symmetric: R's dbeta function helps to compute the Hastings correction for the posterior ratio (a sketch follows below).  Under these assumptions the posterior can be computed as
π(Θ, ρ, λ | Y) ∝ ∏_i e^{−λ·t_i·e^{θ_i}} · (λ·t_i·e^{θ_i})^{Y_i} · |I − ρA|^{1/2} · exp(−½·Θᵀ(I − ρA)Θ) · π(ρ)
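A hypothetical R sketch of this asymmetric Metropolis–Hastings step (assuming ρ has been rescaled to (0, 1); `logpost` denotes the log-posterior as a function of ρ with everything else held fixed):

    # Solve for Beta(p, q) parameters with mean rho and variance s2
    # (requires s2 < rho * (1 - rho), which holds for small proposal variances).
    beta_pars <- function(rho, s2) {
      p <- rho * (rho * (1 - rho) / s2 - 1)
      c(p = p, q = p * (1 - rho) / rho)
    }

    update_rho <- function(rho, logpost, s2 = 0.01) {
      cur  <- beta_pars(rho, s2)
      prop <- rbeta(1, cur["p"], cur["q"])      # propose from a Beta centred at rho
      pp   <- beta_pars(prop, s2)
      # Hastings correction for the asymmetric proposal, on the log scale
      log_corr <- dbeta(rho, pp["p"], pp["q"], log = TRUE) -
                  dbeta(prop, cur["p"], cur["q"], log = TRUE)
      if (log(runif(1)) < logpost(prop) - logpost(rho) + log_corr) prop else rho
    }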

 From here we have the form for the log-posterior as
log π(Θ, ρ, λ | Y) = Σ_i [Y_i·(log λ + log t_i + θ_i) − λ·t_i·e^{θ_i}] + ½·log|I − ρA| − ½·Θᵀ(I − ρA)Θ + log π(ρ) + const.
 For λ the computation of the maximum likelihood estimator, conditional on ρ and Θ, is possible, as
λ̂ = Σ_i Y_i / Σ_i t_i·e^{θ_i}.
 For ρ and Θ a Metropolis–Hastings update is needed.
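A one-line R sketch of this conditional update (hypothetical names, as before):

    # Conditional ML update for lambda given theta: total claims divided by
    # total spatially adjusted exposure.
    update_lambda <- function(y_i, t_i, theta) sum(y_i) / sum(t_i * exp(theta))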

 The problem is that Θ is a 3111-element vector.  Naively, updating the posterior requires computing a quadratic form with a 3111×3111 matrix at each of the 3111 coordinates.  This is clearly a paralysing step even on a very fast computer, even when trying to factorise the matrix into a full-rank diagonal plus a sparse matrix.  We used the following updating rule (sketched below):  Propose in all coordinates, one by one, and compute the increment between the present log-posterior and the one-coordinate update. (By not updating the log-posterior in between, we can use vector operations instead of loops, which is a lot faster.)  On this basis, determine the coordinates where the proposal is accepted.  Update the log-posterior.  Update the other parameters.  In these steps the log-posterior is updated sequentially.  With this scheme, an update of Θ (with ca. 80% acceptance) and updates for ρ and λ are possible in about 2 hours of running time on a PC.
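A hypothetical R sketch of this vectorised single-coordinate update (assumed names: Q = I − ρA stored as a sparse matrix, lam the current λ; not the authors' code):

    library(Matrix)

    update_theta <- function(theta, y_i, t_i, lam, Q, sd_prop = 0.1) {
      N    <- length(theta)
      prop <- theta + rnorm(N, 0, sd_prop)      # symmetric random-walk proposals
      Qth  <- as.numeric(Q %*% theta)           # computed once, reused for all coordinates
      d    <- prop - theta
      # Poisson log-likelihood increment if only coordinate i moved
      dlik   <- y_i * d - lam * t_i * (exp(prop) - exp(theta))
      # change in -1/2 * theta' Q theta when only coordinate i moves
      dprior <- -(d * Qth + 0.5 * d^2 * diag(Q))
      accept <- log(runif(N)) < dlik + dprior   # coordinate-wise accept/reject
      theta[accept] <- prop[accept]
      theta                                     # recompute the full log-posterior afterwards
    }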

Convergence of the parameters  [Figure: trace plots. Acceptance ratios – λ: 34.4%, ρ: 18.6%, Θ: 74.1%.]

Updates of the log-posteriors  [Figure.]

 By estimating λ we can compare the expected number of claims to the observed ones.  There are risk factors other than location that have to be accounted for, but suppose the opposite for a moment.  Comparing expected with observed, compute the probability of sample domination, P(Y_j ≤ y_j), under the fitted Poisson law.  Plot these probabilities on a map – this is the so-called probability map, measuring the inhomogeneity of the Poisson process (a sketch follows below).
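A minimal R sketch of the probability map values (hypothetical names, continuing from above):

    # Expected claim count per locality under the fitted model.
    mu <- lam * t_i * exp(theta)
    # Probability of sample domination P(Y_i <= y_i); values near 0 or 1
    # flag localities where the homogeneous Poisson fit is implausible.
    p_map <- ppois(y_i, mu)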

Probability map of claims, based on exposure time  [Figure.]

Other risk factors  There are further risk factors, like the age of the policyholder, car type (engine displacement, ccm), or the population size of the locality.  A simple generalised linear model can be used to adjust for these risk factors, but even then a probability map clearly shows spatial inhomogeneity in the remaining risks.

Comparison of observed and expected  The expected claim frequency has to be compared to the observed one,  and the probability map can be drawn.  Clearly, the residuals are now almost equally likely everywhere – the fitted spatial component has absorbed the inhomogeneity.

Premium clustering based on the spatial structure