USE OF BASIC STATISTICS IN INSURANCE PRICING AND RISK ASSESSMENT


1 USE OF BASIC STATISTICS IN INSURANCE PRICING AND RISK ASSESSMENT
March 2017

2 Agenda Statistical concepts relating to insurance: (1) Risk data (2) Presentation of risk data (3) Statistical measurement (4) Deriving probability (5) Probability distributions (6) Regression and correlation Practical examples to illustrate the above concepts

3 (1) Risk data

4 Risk data
Database: use an existing database, or create a database by: (i) using published data (ii) gathering data (direct observation, interviews, experiments, questionnaires)

5 Risk data Compare: Population Stratified sampling Random sampling

6 (2) Presentation of risk data

7 Presentation of risk data
TABLES Simple tables Compound tables Relative figures Frequency and severity Effect of mix change

8 Presentation of risk data
Claims Analysis: a comparison of frequency and severity
Columns: Cost (£), Frequency, %, Severity, %
Severity by cost band, lowest band first: £3.00m, £4.25m, £3.10m, £1.75m, £1.45m
Total severity: £13.55m

9 Presentation of risk data
Effect of mix change It is important to look at the elements which make up a total, not just the total itself

10 Presentation of risk data
GRAPHS Single line graphs Multiple line graphs X-Y graphs

11 Presentation of risk data
BAR CHARTS Simple bar charts Multiple bar charts Component bar charts

12 Presentation of risk data
PIE CHARTS Simple pie charts Enhanced pie charts PICTOGRAMS

13 Presentation of risk data
FREQUENCY DISTRIBUTIONS Unordered array Ordered array Frequency distribution Grouped frequency distribution

14 Presentation of risk data
To prepare a grouped frequency distribution from raw data: (1) find the highest and lowest values (the range) (2) decide upon the number of classes (usually 4, 5 or 6) (3) divide the entire range by the number of classes to get the class width (4) decide how to display the data (continuous or discrete; size of classes) (5) allocate the individual values to the classes
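The steps above can be sketched in Python as follows. This is only an illustration: the function name `grouped_frequency` and the claim values are hypothetical, and equal-width continuous classes (lower<upper) are assumed, with the highest value placed in the last class.

```python
def grouped_frequency(values, n_classes):
    """Allocate raw values to n_classes equal-width classes of the form lower<upper."""
    lowest, highest = min(values), max(values)      # step 1: the range
    width = (highest - lowest) / n_classes          # step 3: class width
    counts = [0] * n_classes
    for v in values:                                # step 5: allocate each value
        # The highest value would fall just outside the last class, so cap the index.
        index = min(int((v - lowest) / width), n_classes - 1)
        counts[index] += 1
    limits = [(lowest + i * width, lowest + (i + 1) * width) for i in range(n_classes)]
    return limits, counts


# Hypothetical claim amounts (not taken from the presentation).
claims = [12, 45, 78, 23, 56, 89, 34, 67, 90, 10, 55, 41, 73, 28, 62, 50]
limits, counts = grouped_frequency(claims, 4)       # step 2: four classes

for (lower, upper), count in zip(limits, counts):
    print(f"{lower:g}<{upper:g}: {count}")
```

With these sixteen values the range is 10 to 90, so each of the four classes is 20 wide.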

15 Presentation of risk data
Discrete variables can only take whole-number values; fractions are impossible. They can therefore be represented by class limits of 0-9, 10-19 and so on. Continuous variables can take any value, including fractions, and are represented by class limits 0<10, 10<20 and so on.

16 Presentation of risk data
Relative frequency: the number of recorded figures in each class, expressed as a percentage of the total number of raw data items. Cumulative frequency: the frequency of each class added to the total of its predecessors. Compare “less than” and “more than” cumulative frequencies. Histogram: the area of each rectangle represents the frequency of its class. The horizontal axis (continuous scale) records class limits; the vertical axis records frequency values.
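As a small illustration of these definitions, the sketch below derives relative and cumulative frequencies from a list of class frequencies. The frequencies are hypothetical, and the "more than" column is taken here to mean the count of items at or above each class's lower limit.

```python
from itertools import accumulate

# Hypothetical class frequencies, lowest class first (not from the presentation).
frequencies = [4, 3, 5, 4]
total = sum(frequencies)

# Relative frequency: each class as a percentage of all items.
relative = [100 * f / total for f in frequencies]

# "Less than" cumulative frequency: each class added to the total of its predecessors.
cumulative_less_than = list(accumulate(frequencies))

# "More than" cumulative frequency: items at or above each class's lower limit.
cumulative_more_than = [total - below for below in [0] + cumulative_less_than[:-1]]

for row in zip(frequencies, relative, cumulative_less_than, cumulative_more_than):
    print(row)
```

Plotting either cumulative column against the class limits would give an ogive.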

17 Presentation of risk data
Relative and Cumulative Frequencies. Table columns: Cost, Frequency, Relative Frequency, Cumulative Frequency. Cost classes begin at 0, 150, 300, 450 and 600, followed by a Total row.

18 Presentation of risk data
Ogives are graphs of cumulative frequency distributions: the horizontal axis represents class limits and the vertical axis represents cumulative frequency. Frequency polygon: (1) add a class interval at each end of the histogram (2) mark the mid-points at the tops of each of the distribution rectangles (3) join the mid-points with straight lines. The area of a frequency polygon = area of the histogram. Frequency curve: a free-hand drawing of the shape of the curve.

19 (3) Statistical measurement

20 Statistical measurement
Location - where the data is placed in the whole span of possible data variables available Dispersion - the extent to which individual items in a set of values differ from each other in magnitude Skew - the clustering of the data around the mean, median and mode

21 Statistical measurement
LOCATION Arithmetic Mean Geometric Mean Median Mode

22 Statistical measurement
Arithmetic mean for a list of values: add up all the data items and divide by the number of items. Arithmetic mean $\bar{x} = \frac{\sum x}{n}$

23 Statistical measurement
Arithmetic mean for a frequency distribution, e.g. the time it takes to settle a claim in ABC Insurance plc. Table columns: time in days (x), number of claims (f), fx. Totals: $\sum f = 180$, $\sum fx = 4{,}140$. Arithmetic mean $= \frac{\sum fx}{\sum f} = \frac{4{,}140}{180} = 23$ days

24 Statistical measurement
Arithmetic mean for grouped data
Travel claims costs (£)   Mid-point x   No. f   fx
150<160                   155           10      1,550
160<170                   165           20      3,300
170<180                   175           15      2,625
180<190                   185           12      2,220
Total                                   57      9,695
So $\bar{x} = \frac{\sum fx}{\sum f} = \frac{9{,}695}{57} \approx 170.09$, i.e. £170.09
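The grouped-data calculation can be checked with a few lines of Python. The class limits and frequencies are the travel claims figures from this slide; the variable names are illustrative, and each class is represented by its mid-point.

```python
# Travel claims from the slide: (lower limit, upper limit, number of claims).
classes = [(150, 160, 10), (160, 170, 20), (170, 180, 15), (180, 190, 12)]

# Each class is represented by its mid-point x.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
frequencies = [f for _, _, f in classes]

sum_f = sum(frequencies)                                       # sum of f
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))    # sum of f * x
mean = sum_fx / sum_f

print(f"sum f = {sum_f}, sum fx = {sum_fx:,.0f}, mean = £{mean:.2f}")
```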

25 Statistical measurement
Important features of the arithmetic mean: Most commonly used form of average. Involves all values in a distribution. Advantage: completeness. Disadvantage: easily distorted by extreme values; to avoid this disadvantage, use the median or the mode. Used to work out the standard deviation. Can produce ‘impossible values’ in the case of discrete data (example: average of 2.1 children in a family). Not applicable where relative changes are being averaged; in such cases the geometric mean must be used.

26 Statistical measurement
Geometric mean Used when relative changes in one variable are being averaged
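A short sketch of why the geometric mean suits relative changes, assuming hypothetical year-on-year changes in average claim cost of +10%, +20% and -5% (these figures are not from the presentation). The geometric mean of n growth factors is the n-th root of their product.

```python
import math

# Hypothetical growth factors: +10%, +20%, -5% (illustrative only).
growth_factors = [1.10, 1.20, 0.95]
n = len(growth_factors)

# Geometric mean: n-th root of the product of the factors.
geometric = math.prod(growth_factors) ** (1 / n)

# Arithmetic mean of the same factors, for comparison.
arithmetic = sum(growth_factors) / n

print(f"geometric mean factor:  {geometric:.4f}")
print(f"arithmetic mean factor: {arithmetic:.4f}")
```

Applying the geometric mean factor three times reproduces the overall change exactly, which the arithmetic mean does not.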

27 Statistical measurement
Median for list of values (1) Arrange the data values in numerical order (2) For odd-numbered array of values, take the middle number (3) For even-numbered array of values, take the two middle values and calculate the arithmetic mean
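The three steps can be written out directly. The function name `median_of_list` and the sample values are hypothetical.

```python
def median_of_list(values):
    """Median of a list, following the three steps on the slide."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)            # (1) arrange in numerical order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                      # (2) odd count: take the middle value
        return ordered[middle]
    # (3) even count: arithmetic mean of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_result = median_of_list([7, 3, 9, 1, 5])
even_result = median_of_list([7, 3, 9, 1])
print(odd_result, even_result)
```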

28 Statistical measurement
Median for a frequency distribution. Table columns: No. of clerks, Frequency, Cumulative Frequency. Median number of clerks is 29

29 Statistical measurement
Median for grouped data Claims costs (£) Frequency Cumulative Frequency 100

30 Statistical measurement
Finding the median for grouped data: (1) Find the class having the median value in it (2) Decide on the class boundaries (3) Find out how far into the class you must go (consult the cumulative frequency distribution) (4) Determine the class interval and multiply this by the fraction of the class you need to move into (5) Add this to the lower boundary of the class. See formula
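One common way to write this procedure is median = L + ((n/2 - F) / f) x c, where L is the lower boundary of the median class, F the cumulative frequency before it, f its own frequency, c the class interval and n the total frequency. The sketch below assumes that formula; it may differ in detail from the one shown on the original slide. The data are the travel claims classes from the arithmetic-mean slide, and the function name is hypothetical.

```python
def grouped_median(classes):
    """classes: list of (lower, upper, frequency) tuples in ascending order."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0                      # F: cumulative frequency before the class
    for lower, upper, f in classes:
        if cumulative + f >= half:      # this class contains the median
            fraction_into_class = (half - cumulative) / f
            return lower + fraction_into_class * (upper - lower)
        cumulative += f


# Travel claims classes reused from the arithmetic-mean slide.
travel_claims = [(150, 160, 10), (160, 170, 20), (170, 180, 15), (180, 190, 12)]
median_cost = grouped_median(travel_claims)
print(f"median = £{median_cost:.2f}")
```

Here n/2 = 28.5 falls in the 160<170 class, 18.5 claims into its 20, so the median lies 9.25 above the lower boundary of 160.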

31 Statistical measurement
Mode for a list of values: the most frequently occurring value. Mode for grouped data (an approximation for moderately skewed distributions): Mode ≈ mean - 3(mean - median)

32 Statistical measurement
DISPERSION It is important to know the spread, or variance, of the data values around the location. Range: the difference between the highest and lowest values. Quartile deviation: the difference between the upper quartile (the value below which three quarters of the values lie) and the lower quartile (the value below which one quarter lie) is the inter-quartile range. Divide by 2 to get the quartile deviation.

33 Statistical measurement
Standard deviation: the most satisfactory measure of dispersion, as it uses all the values. Coefficient of variation: the standard deviation divided by the mean, expressed as a percentage; used to compare the relative variability, or dispersion, of two or more sets of figures.

34 Statistical measurement
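Standard deviation. For a simple list: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$. For grouped data (adapted for midpoints): $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$, or equivalently $s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$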
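As a check that the two grouped-data formulas agree, the sketch below applies both to the travel claims classes from the arithmetic-mean slide, using the class mid-points 155, 165, 175 and 185. It also computes a coefficient of variation as standard deviation divided by mean. Variable names are illustrative.

```python
import math

# (mid-point x, frequency f) for the travel claims classes used earlier in the deck.
classes = [(155, 10), (165, 20), (175, 15), (185, 12)]

sum_f = sum(f for _, f in classes)
sum_fx = sum(f * x for x, f in classes)
sum_fx2 = sum(f * x * x for x, f in classes)
mean = sum_fx / sum_f

# Definition form: root of the frequency-weighted mean squared deviation.
s_definition = math.sqrt(sum(f * (x - mean) ** 2 for x, f in classes) / sum_f)

# Shortcut form: root of (mean of squares minus square of mean).
s_shortcut = math.sqrt(sum_fx2 / sum_f - mean ** 2)

# Coefficient of variation: standard deviation as a percentage of the mean.
coefficient_of_variation = 100 * s_shortcut / mean

print(f"s = {s_definition:.2f} (definition), {s_shortcut:.2f} (shortcut)")
print(f"coefficient of variation = {coefficient_of_variation:.1f}%")
```

Both forms divide by the total frequency, matching the formulas on the slide, rather than by one less than the total as sample estimates often do.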

35 Statistical measurement
SKEW Positively skewed distribution: a distribution with many small values (example: claims under household policies). Symmetrical distribution. Negatively skewed distribution: a distribution with many large values (example: claims under aviation insurance). The difference between mean and median can be used to measure skew: $\text{Skew} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
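A minimal sketch of the skew measure, using Python's `statistics` module. The function name `pearson_skew` and both data sets are hypothetical; the population standard deviation (`pstdev`, dividing by n) is assumed.

```python
from statistics import mean, median, pstdev


def pearson_skew(values):
    """Skew = 3 * (mean - median) / standard deviation."""
    spread = pstdev(values)             # population standard deviation
    if spread == 0:
        raise ValueError("skew is undefined when all values are equal")
    return 3 * (mean(values) - median(values)) / spread


# Hypothetical claim amounts: many small values and one large one.
household_like = [100, 120, 130, 150, 900]
# Hypothetical symmetrical data.
symmetrical = [10, 20, 30, 40, 50]

print(f"skewed data:      {pearson_skew(household_like):.2f}")
print(f"symmetrical data: {pearson_skew(symmetrical):.2f}")
```

The first result is positive because the single large claim drags the mean above the median; the second is zero because mean and median coincide.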

36 Statistical measurement
Normal distribution: a frequency distribution which shows a symmetrical curve peaking at the centre; mean, median and mode coincide. Positive distribution: the peak lies to the left of centre; the mean is dragged to the right by high outliers; the mode is the peak of the distribution. Negative distribution: the peak lies to the right of centre; the mean is dragged to the left by very low outliers. Pearson’s Coefficient of Skewness measures the degree of skew, i.e. how far the mean is from the mode; the closer the calculated value is to zero, the nearer the data is to a symmetrical (normal-shaped) distribution; the sign of the result automatically gives the direction of skew.

37 (4) Deriving probability

38 Deriving probability PROBABILITY
The contributions of the many will be used to pay for the losses of the few To know how much each insured must contribute to the pool, we must know what the likely losses are going to be Probability theory is a formal mechanism for measuring likelihood

39 Deriving probability Deriving probabilities
A priori probability: one that applies when all the possible outcomes of the event are known before the event occurs. Relative frequency: based upon empirical (historical) information of what has happened in the past. Subjective probability: deals with occurrences that have not happened before, or have happened only very infrequently.

40 Deriving probability Probability rules
Alternative Events P(A or B) – “additions” rule (1) Events mutually exclusive (2) Events not mutually exclusive - need to remove the aspect of double counting (best visualised by Venn Diagrams) Joint Events P(A and B) – “multiplication” rule (1) Independent events (2) Dependent events

41 Deriving probability Probability rules
Alternative events: Mutually exclusive: P(A or B) = P(A) + P(B). Not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B). Joint events: Independent: P(A and B) = P(A) x P(B). Dependent: P(A and B) = P(A) x P(B|A)
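The four rules can be illustrated with a standard 52-card pack and a pair of dice. These examples are not from the presentation; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

p_ace = Fraction(4, 52)
p_king = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_ace_of_hearts = Fraction(1, 52)       # P(ace and heart)

# Alternative events, mutually exclusive: a card cannot be both an ace and a king.
p_ace_or_king = p_ace + p_king

# Alternative events, not mutually exclusive: remove the double-counted ace of hearts.
p_ace_or_heart = p_ace + p_heart - p_ace_of_hearts

# Joint events, independent: a six on each of two dice.
p_two_sixes = Fraction(1, 6) * Fraction(1, 6)

# Joint events, dependent: two aces drawn without replacement,
# P(first ace) x P(second ace | first ace).
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

print(p_ace_or_king, p_ace_or_heart, p_two_sixes, p_two_aces)
```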

42 (5) Probability distributions

43 Probability distributions
Expected value: when tossing a coin 100 times you would expect to get 50 heads because ½ times 100 = 50. When rolling a die 60 times you would expect 10 sixes because 1/6 times 60 = 10. Expected value: $E = P(x) \cdot x$. In frequency distributions the expected values of the various outcomes have to be added to find the total: $E = \sum P(x) \cdot x$

44 Probability distributions
Expected Frequency. Table columns: Number of thefts (x), Number of shops, Probability distribution P(x), Expected number of thefts per shop P(x).x

45 Probability distributions
Law of large numbers The actual number of events occurring will tend towards the expected number where there are a large number of similar situations
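A small simulation can illustrate the tendency described here. It reuses the die-rolling example from the expected-value slide; the function name, the seed and the numbers of rolls are arbitrary choices, and the exact proportions printed depend on the seed.

```python
import random


def proportion_of_sixes(n_rolls, rng):
    """Roll a fair die n_rolls times and return the observed share of sixes."""
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls


rng = random.Random(2017)               # fixed seed so the run is repeatable
expected = 1 / 6

for n_rolls in (60, 6_000, 100_000):
    observed = proportion_of_sixes(n_rolls, rng)
    print(f"{n_rolls:>7} rolls: observed {observed:.4f}, expected {expected:.4f}")
```

The gap between observed and expected proportions typically narrows as the number of rolls grows, which is why an insurer needs a large number of similar risks before relying on expected values.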

46 Probability distributions
Expected severity. Table columns: Cost per theft (£), Frequency, Probability distribution P(x), Midpoint cost x, Expected loss per theft P(x).x. Cost classes begin at 0, 300, 600, 900 and 1,200.

47 Probability distributions
Premium calculation: Expected number of thefts per shop: 0.45 Expected loss per theft: £315.00 Pure premium per shop: £315 x 0.45 = £141.75
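The calculation combines an expected frequency with an expected severity, each found as the sum of P(x).x. The sketch below shows the mechanics. Both probability distributions are hypothetical, chosen only so that the totals match the 0.45 thefts per shop and £315.00 per theft quoted on this slide; the original table values are not available. The mid-points assume cost classes 300 wide.

```python
def expected_value(distribution):
    """Sum of P(x) * x over (x, P(x)) pairs."""
    return sum(p * x for x, p in distribution)


# HYPOTHETICAL frequency distribution: (number of thefts, probability).
thefts_per_shop = [(0, 0.70), (1, 0.20), (2, 0.05), (3, 0.05)]

# HYPOTHETICAL severity distribution: (class mid-point in £, probability).
cost_per_theft = [(150, 0.63), (450, 0.25), (750, 0.07), (1050, 0.04), (1350, 0.01)]

expected_frequency = expected_value(thefts_per_shop)
expected_severity = expected_value(cost_per_theft)

# Pure premium: expected number of thefts x expected loss per theft.
pure_premium = expected_frequency * expected_severity

print(f"expected thefts per shop: {expected_frequency:.2f}")
print(f"expected loss per theft:  £{expected_severity:.2f}")
print(f"pure premium per shop:    £{pure_premium:.2f}")
```

The pure premium covers expected losses only; expenses and profit loadings are outside this calculation.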

48 Probability distributions
NORMAL DISTRIBUTION Symmetrical bell shaped curve Mean under the apex, coincides with mode and median Tails never touch the horizontal Specific areas around the mean can be measured

49 Probability distributions
The width of the bell depends upon the actual spread of the distribution of data, measured by the standard deviation. Data with the same mean but different standard deviations will produce different curves. Knowing the mean and standard deviation for different groups of data enables comparisons to be made between them and a normal distribution curve.

50 Probability distributions
Percentages under the normal curve (distances from the mean in standard deviations): between the mean and 1 on either side: 34.13% each; between 1 and 2: 13.59% each; between 2 and 3: 2.15% each; beyond 3: 0.13% each. Within -1 to +1: 68.26%; within -2 to +2: 95.44%; within -3 to +3: 99.74%.
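These areas can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The helper name `share_within` is illustrative. Computed values may differ from the slide's figures in the last decimal place because of rounding in printed tables.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {100 * share_within(k):.2f}%")
```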

51 (6) Regression and correlation

52 Regression and correlation
Regression indicates what kind of relationship there is between dependent and independent variables. Correlation measures the strength of that relationship

53 Regression and correlation
y = a + bx, where y is the dependent variable; a is where the line crosses the vertical axis (the intercept); b is the gradient of the line (regression coefficient); x is the independent variable; a and b are constants

54 Regression and correlation
Regression formulae. Line of best fit: $y = a + bx$, where $b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$ and $a = \frac{\sum y - b\sum x}{n}$

55 Regression and correlation
The equation y = a + bx is true for all straight lines. By calculating values of ‘a’ and ‘b’ from the data provided for ‘x’ and ‘y’, the line that minimises the total of the squared deviations of all points from it will be established. This is the line calculated according to the least squares method. The regression coefficient, the value of ‘b’, illustrates the nature of the relationship: a positive value for ‘b’ shows that as ‘x’ gets bigger so does ‘y’; a negative value shows that as ‘x’ gets bigger ‘y’ gets smaller. Values for ‘y’ can be predicted only from values of ‘x’ that lie within the range of ‘x’ provided in the data.
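A minimal sketch of the least squares calculation, using the usual formulas b = (sum of xy - n x mean of x x mean of y) / (sum of x squared - n x mean of x squared) and a = (sum of y - b x sum of x) / n. The function names and the five data points are hypothetical, and the prediction helper refuses values of x outside the observed range, in line with the caution above.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line of best fit y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = (sum(ys) - b * sum(xs)) / n
    return a, b


def predict(x, a, b, xs):
    """Predict y only for x inside the range of the observed data."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x lies outside the range of the data")
    return a + b * x


# Hypothetical observations (not from the presentation).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
print(f"predicted y at x = 3.5: {predict(3.5, a, b, xs):.2f}")
```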

56 Regression and correlation
Correlation = a measure of the strength of the relationship between two variables. Coefficient of determination: $r^2 = \frac{\text{explained variation}}{\text{total variation}}$, expressed as a percentage. Correlation coefficient: r = square root of r², taking the same sign as the regression coefficient b and lying between +1 and -1.

57 Regression and correlation
Coefficient of determination = how much of the variation in the predicted variable is due to its relationship with the other variable

58 Regression and correlation
Correlation coefficient (r): $-1 \le r \le 1$, i.e. r is between -1 and 1

59 Regression and correlation
Coefficient of determination: measures how much of the variation in the predicted variable is due to its relationship with the other variable. Expressed as a percentage. Low values suggest that the relationship is so weak that it is not worth considering. Correlation coefficient: assessed on a scale from +1 through zero to -1. The closer ‘r’ is to +1 or -1, the closer the relationship between the variables, but it does NOT mean causation. A score of +1 represents a perfect positive correlation: increases in ‘x’ are matched by increases in ‘y’. A score of -1 represents a perfect negative correlation: increases in ‘x’ are matched by decreases in ‘y’. A score of 0 means there is no linear correlation: changes in one variable are not associated with changes in the other.
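The correlation coefficient and coefficient of determination can be sketched as below, using the product-moment form r = S_xy / sqrt(S_xx x S_yy), where each S is a sum of products of deviations from the means. This form is an assumption, since the formula on the original slide is not available; the function name and data are hypothetical.

```python
import math


def correlation(xs, ys):
    """Product-moment correlation coefficient r for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical observations (not from the presentation).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_squared = r ** 2

print(f"r = {r:.3f}")
print(f"r squared = {100 * r_squared:.0f}% of the variation in y explained")
```

For these points about 60% of the variation in y is explained by the fitted line, which says nothing about whether x causes y.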

60 Regression and correlation

