Chapter 4 Prediction and Bayesian Inference 4.1 Estimators versus predictors 4.2 Prediction for one-way ANOVA models –Shrinkage estimation, types of predictions.

Chapter 4 Prediction and Bayesian Inference
4.1 Estimators versus predictors
4.2 Prediction for one-way ANOVA models
–Shrinkage estimation, types of predictions
4.3 Best linear unbiased predictors (BLUPs)
4.4 Mixed model predictors
4.5 Bayesian inference
4.6 Case study: Forecasting lottery sales
4.7 Credibility Theory
Appendix 4A Linear unbiased predictors

4.1 Estimators versus predictors
In the longitudinal data model, $y_{it} = z_{it}'\alpha_i + x_{it}'\beta + \varepsilon_{it}$, the variables $\{\alpha_i\}$ describe subject-specific effects.
Given the data $\{y_{it}, z_{it}, x_{it}\}$, in some problems it is of interest to “summarize” subject effects.
–We have discussed how to estimate fixed, unknown parameters.
–It is also of interest to summarize subject-specific effects, such as those described by the random variable $\alpha_i$.
Predictors are “estimators” of random variables.
–Like estimators, predictors are said to be linear if they are formed from a linear combination of the responses $y$.

Applications of prediction
–In animal and plant breeding, one wishes to predict the milk production of cows based on (1) their lineage (random) and (2) herds (fixed).
–In credibility theory, one wishes to predict expected claims for a policyholder given exposure to several risk factors.
–In sample surveys, one wishes to predict the size of a specific age-sex-race cohort within a small geographical area (known as “small area estimation”).
–In a survey article, Robinson (1991) also cites (1) ore reserve estimation in geological surveys, (2) measuring the quality of a production plan and (3) ranking baseball players' abilities.

4.2 Prediction for one-way ANOVA models
Consider the traditional one-way random effects ANOVA (analysis of variance) model: $y_{it} = \mu + \alpha_i + \varepsilon_{it}$.
–Suppose that we wish to summarize the subject-specific conditional mean, $\mu + \alpha_i$.
For contrast, first consider using the fixed effects model with $\mu = 0$.
–Here, $\bar{y}_i$ is the “best” (Gauss-Markov) estimate of $\alpha_i$.
–This estimate is unbiased, that is, $E\,\bar{y}_i = \alpha_i$.
–This estimate has minimum variance among all linear unbiased estimators (BLUE).

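Shrinkage estimator
Now use the one-way random effects model.
–Consider an “estimator” of $\mu + \alpha_i$ that is a linear combination of $\bar{y}_i$ and $\bar{y}$, that is, $c_1\bar{y}_i + c_2\bar{y}$ for constants $c_1$ and $c_2$. Here, $\bar{y} = N^{-1}\sum_{i=1}^n T_i\bar{y}_i$ is the overall mean and $N = \sum_{i=1}^n T_i$.
Calculations show that the values of $c_1$ and $c_2$ that minimize $E\left(c_1\bar{y}_i + c_2\bar{y} - (\mu + \alpha_i)\right)^2$ are $c_2 = 1 - c_1$ and
$$c_1 = \frac{T_i^*\sigma_\alpha^2}{\sigma^2 + T_i^*\sigma_\alpha^2}, \qquad T_i^* = \frac{1 - 2T_i/N + N^{-2}\sum_{j=1}^n T_j^2}{T_i^{-1} - N^{-1}}.$$
For large $n$, $T_i^* \approx T_i$, and we obtain the shrinkage estimator, or predictor, of $\mu + \alpha_i$:
$$\bar{y} + \zeta_i(\bar{y}_i - \bar{y}), \qquad \text{where } \zeta_i = \frac{T_i}{T_i + \sigma^2/\sigma_\alpha^2}.$$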
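To see how close the exact optimal weight is to its large-n limit, here is a minimal Python sketch. It assumes the variance components $\sigma^2$ and $\sigma_\alpha^2$ are known and that $\bar{y}$ is the overall mean of all $N$ responses; the helper names `optimal_c1` and `zeta` are hypothetical. The exact weight comes from minimizing the mean square error directly, with the algebra condensed into a quantity $T_i^*$.

```python
import numpy as np

def optimal_c1(T, i, sigma2, sigma2_alpha):
    """Exact MSE-minimizing weight c_1 on ybar_i in c_1*ybar_i + (1 - c_1)*ybar.

    T holds T_1, ..., T_n; ybar is the overall mean of all N = sum(T) responses.
    """
    T = np.asarray(T, dtype=float)
    N = T.sum()
    T_star = (1.0 - 2.0 * T[i] / N + np.sum(T**2) / N**2) / (1.0 / T[i] - 1.0 / N)
    return T_star * sigma2_alpha / (sigma2 + T_star * sigma2_alpha)

def zeta(T_i, sigma2, sigma2_alpha):
    """Large-n shrinkage factor zeta_i = T_i / (T_i + sigma^2 / sigma_alpha^2)."""
    return T_i / (T_i + sigma2 / sigma2_alpha)

# Unbalanced panel with many subjects: the exact weight is close to zeta_i.
T = [2, 5] * 500
print(optimal_c1(T, 0, 1.0, 0.5), zeta(2, 1.0, 0.5))
```

A side observation from the algebra: in a balanced design ($T_j = T$ for all $j$) the quantity $T_i^*$ equals $T$ exactly, so the exact weight coincides with $\zeta_i$ even for small $n$.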

Example of shrinkage estimator
Hypothetical Run Times for Three Machines
Machine | Run Times | Average Run Time
1 | 14, 12, 10, 12 | $\bar{y}_1 = 12$
2 | 9, 16, 15, 12 | $\bar{y}_2 = 13$
3 | 8, 10, 7, 7 | $\bar{y}_3 = 8$
–Notation: $y_{ij}$ means the $j$th run from the $i$th machine.
–For example, $y_{21} = 9$ and $y_{23} = 15$.
Are there real differences among machines?

Example (continued)
To see the “shrinkage” effect, consider Figure 4.1.
[Figure 4.1: Comparison of Subject-Specific Means to Shrinkage Estimators. Machine averages 8, 12 and 13 (overall mean 11) shrink to 8.525, 11.825 and 12.650.]
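The shrinkage estimates in the figure can be reproduced with a short Python sketch. It assumes the variance components are estimated by the usual balanced one-way ANOVA method of moments (truncated at zero); the slides do not state which estimator was used.

```python
import numpy as np

# Hypothetical run times (rows = machines, columns = runs) from the example table
y = np.array([[14, 12, 10, 12],
              [ 9, 16, 15, 12],
              [ 8, 10,  7,  7]], dtype=float)
n, T = y.shape

ybar_i = y.mean(axis=1)   # machine averages: 12, 13, 8
ybar = y.mean()           # overall mean: 11

# Method-of-moments variance components for a balanced one-way design (assumption)
mse = ((y - ybar_i[:, None]) ** 2).sum() / (n * (T - 1))   # estimates sigma^2
msb = T * ((ybar_i - ybar) ** 2).sum() / (n - 1)           # between-machine mean square
sigma2_alpha = max((msb - mse) / T, 0.0)                   # estimates sigma_alpha^2

zeta = T / (T + mse / sigma2_alpha)
shrunk = ybar + zeta * (ybar_i - ybar)
print(round(zeta, 4), np.round(shrunk, 3))
```

With these data the sketch gives $\zeta = 52/63 \approx 0.825$ and estimates of about 11.825, 12.651 and 8.524; the figure's 12.650 and 8.525 are consistent with $\zeta$ having been rounded to 0.825 before shrinking.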

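More on shrinkage estimators
Under the random effects model, $\bar{y}_i$ is an unbiased predictor of $\mu + \alpha_i$ in the sense that $E\left(\bar{y}_i - (\mu + \alpha_i)\right) = 0$.
–However, $\bar{y}_i$ is inefficient in the sense that $\bar{y} + \zeta_i(\bar{y}_i - \bar{y})$ has a smaller mean square error than $\bar{y}_i$.
–Here, $\bar{y}_i$ has been “shrunk” towards the stable estimator $\bar{y}$.
–The “estimator” $\bar{y} + \zeta_i(\bar{y}_i - \bar{y})$ is said to “borrow strength” from the stable estimator $\bar{y}$.
Recall $\zeta_i = T_i/(T_i + \sigma^2/\sigma_\alpha^2)$. Note that $\zeta_i \to 1$ as either (i) $T_i \to \infty$ or (ii) $\sigma_\alpha^2/\sigma^2 \to \infty$.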
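The efficiency claim can be checked with a small Monte Carlo sketch in Python. The simulation settings ($n = 50$, $T = 4$, $\mu = 10$, $\sigma_\alpha = 1$, $\sigma = 2$) are arbitrary assumptions, and the variance components are treated as known when forming $\zeta$.

```python
import numpy as np

rng = np.random.default_rng(2024)
n, T = 50, 4                          # subjects and observations per subject (assumed)
mu, sigma_alpha, sigma = 10.0, 1.0, 2.0
reps = 2000

zeta = T / (T + sigma**2 / sigma_alpha**2)   # known variance components assumed
se_raw = se_shrunk = 0.0
for _ in range(reps):
    alpha = rng.normal(0.0, sigma_alpha, size=n)
    y = mu + alpha[:, None] + rng.normal(0.0, sigma, size=(n, T))
    target = mu + alpha                      # the random quantity being predicted
    ybar_i = y.mean(axis=1)
    ybar = y.mean()
    shrunk = ybar + zeta * (ybar_i - ybar)
    se_raw += np.mean((ybar_i - target) ** 2)
    se_shrunk += np.mean((shrunk - target) ** 2)

mse_raw, mse_shrunk = se_raw / reps, se_shrunk / reps
print(mse_raw, mse_shrunk)
```

Under these settings the mean square error of $\bar{y}_i$ should be close to $\sigma^2/T = 1$, while the shrinkage predictor's should come out at roughly half that.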

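Best predictors
From Section 3.1, it is easy to check that the generalized least squares estimator of $\mu$ is
$$m_{\alpha,GLS} = \frac{\sum_{i=1}^n \zeta_i\bar{y}_i}{\sum_{i=1}^n \zeta_i}.$$
The linear unbiased predictor of $\mu + \alpha_i$ with minimum mean square error is $(\mu + \alpha_i)_{BLUP} = \zeta_i\bar{y}_i + (1 - \zeta_i)m_{\alpha,GLS}$.
–Here, the acronym BLUP stands for best linear unbiased predictor.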
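A minimal Python sketch of these two formulas for unbalanced data, treating $\sigma^2$ and $\sigma_\alpha^2$ as known; the function name `one_way_blup` and the toy data are hypothetical.

```python
import numpy as np

def one_way_blup(groups, sigma2, sigma2_alpha):
    """BLUPs of mu + alpha_i and the GLS estimate of mu in the one-way
    random effects model, treating sigma2 and sigma2_alpha as known."""
    T = np.array([len(g) for g in groups], dtype=float)
    ybar_i = np.array([np.mean(g) for g in groups])
    zeta = T / (T + sigma2 / sigma2_alpha)
    m_gls = np.sum(zeta * ybar_i) / np.sum(zeta)   # zeta-weighted mean of the ybar_i
    return zeta * ybar_i + (1.0 - zeta) * m_gls, m_gls

groups = [[1.0, 2.0, 3.0], [4.0, 5.0], [7.0]]
blups, m = one_way_blup(groups, sigma2=1.0, sigma2_alpha=2.0)
print(m, blups)
```

The weights $\zeta_i$ are proportional to $1/\mathrm{Var}\,\bar{y}_i = 1/(\sigma_\alpha^2 + \sigma^2/T_i)$, which is why the $\zeta$-weighted mean agrees with the matrix form of GLS.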

Types of Predictors
We have now introduced the BLUP of $\mu + \alpha_i$. This quantity is a linear combination of global parameters and subject-specific effects. Two other types of predictors are of interest.
–Residuals. Here, we wish to “predict” $\varepsilon_{it}$. The BLUP residual turns out to be $e_{it,BLUP} = y_{it} - \left(\zeta_i\bar{y}_i + (1 - \zeta_i)m_{\alpha,GLS}\right)$.
–Forecasts. Here, we wish to predict, for $L$ lead time units into the future, $y_{i,T_i+L} = \mu + \alpha_i + \varepsilon_{i,T_i+L}$.
–Without serial correlation, the predictor is the same as the predictor of $\mu + \alpha_i$. However, we will see that the mean square error is larger.

4.3 Best linear unbiased predictors
This section develops best linear unbiased predictors in the context of mixed linear models, then specializes to longitudinal data mixed models.
BLUPs are developed by examining the minimum mean square error predictor of a random variable, $w$.
–We give a development due to Harville (1976).
–The argument is originally due to Goldberger (1962), who coined the phrase best linear unbiased predictor.
–The acronym was first used by Henderson (1973).
BLUPs can also be developed as conditional expectations using multivariate normality.
BLUPs can also be developed in a Bayesian context.

Mixed linear models
Suppose that we observe an $N \times 1$ random vector $y$ with mean $E\,y = X\beta$ and variance $\mathrm{Var}\,y = V$.
–We wish to predict a random variable $w$ that has mean $E\,w = \lambda'\beta$ and variance $\mathrm{Var}\,w = \sigma_w^2$.
–Denote the covariance between $w$ and $y$ as the $1 \times N$ vector $\mathrm{Cov}(w, y) = \mathrm{cov}_{wy}$.
Assuming known regression parameters $\beta$, the best linear (in $y$) predictor of $w$ is
$$w^* = E\,w + \mathrm{cov}_{wy}V^{-1}(y - E\,y) = \lambda'\beta + \mathrm{cov}_{wy}V^{-1}(y - X\beta).$$
–If $w$ and $y$ are multivariate normal, then $w^*$ equals $E(w \mid y)$ and hence is a minimum mean square predictor of $w$.
–Without the assumption of normality, $w^*$ is still the minimum mean square predictor of $w$ among predictors that are linear in $y$. See Appendix 4A.1.
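A minimal Python sketch of $w^*$ with $\beta$ known; the function name `best_linear_predictor` and the toy numbers are hypothetical. The usage predicts $w = \alpha_1$ in a one-way model with two subjects, where the formula should collapse to $\zeta(\bar{y}_1 - \mu)$.

```python
import numpy as np

def best_linear_predictor(lam, beta, cov_wy, V, X, y):
    """w* = lambda'beta + cov_wy V^{-1}(y - X beta), with beta treated as known."""
    resid = y - X @ beta
    return lam @ beta + cov_wy @ np.linalg.solve(V, resid)   # solve avoids forming V^{-1}

# One-way model, two subjects with T = 3 each; predict w = alpha_1 (so lambda = 0)
sigma2, sigma2_alpha, mu = 1.5, 0.5, 2.0
y = np.array([3.0, 2.5, 4.0, 1.0, 0.5, 2.0])
X = np.ones((6, 1))
V = sigma2 * np.eye(6)
V[:3, :3] += sigma2_alpha
V[3:, 3:] += sigma2_alpha
cov_wy = np.r_[np.full(3, sigma2_alpha), np.zeros(3)]   # Cov(alpha_1, y)
w_star = best_linear_predictor(np.array([0.0]), np.array([mu]), cov_wy, V, X, y)
print(w_star)
```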

BLUPs as predictors
To develop the BLUP,
–define $b_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}y$ to be the generalized least squares (GLS) estimator of $\beta$.
–This is the best linear unbiased estimator (BLUE).
–Replace $\beta$ by $b_{GLS}$ in the definition of $w^*$ to get the BLUP
$$w_{BLUP} = \lambda' b_{GLS} + \mathrm{cov}_{wy}V^{-1}(y - Xb_{GLS}) = (\lambda' - \mathrm{cov}_{wy}V^{-1}X)b_{GLS} + \mathrm{cov}_{wy}V^{-1}y.$$
–See Appendix 4A.2 for a check, establishing $w_{BLUP}$ as the best linear unbiased predictor of $w$.
From Appendix 4A.3, we also have the form for the minimum mean square error:
$$\mathrm{Var}(w_{BLUP} - w) = (\lambda' - \mathrm{cov}_{wy}V^{-1}X)(X'V^{-1}X)^{-1}(\lambda' - \mathrm{cov}_{wy}V^{-1}X)' - \mathrm{cov}_{wy}V^{-1}\mathrm{cov}_{wy}' + \sigma_w^2.$$
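The BLUP and its mean square error translate directly into a few lines of NumPy. This is a sketch assuming $V$ is known and nonsingular; the function name `blup` is hypothetical. As a usage example it predicts $w = \mu + \alpha_1$ for the run-times data, plugging in $\sigma^2 = 44/9$ and $\sigma_\alpha^2 = 52/9$ (the balanced ANOVA moment estimates, an assumption about how the variances would be estimated).

```python
import numpy as np

def blup(lam, cov_wy, sigma2_w, V, X, y):
    """BLUP of w, the GLS estimate b, and Var(w_BLUP - w) for a mixed linear model."""
    Vinv_X = np.linalg.solve(V, X)
    XtVinvX = X.T @ Vinv_X
    b_gls = np.linalg.solve(XtVinvX, X.T @ np.linalg.solve(V, y))
    w_blup = lam @ b_gls + cov_wy @ np.linalg.solve(V, y - X @ b_gls)
    g = lam - cov_wy @ Vinv_X                     # lambda' - cov_wy V^{-1} X
    mse = (g @ np.linalg.solve(XtVinvX, g)
           - cov_wy @ np.linalg.solve(V, cov_wy) + sigma2_w)
    return w_blup, b_gls, mse

# Run-times data stacked by machine; predict w = mu + alpha_1
n, T, sigma2, sigma2_alpha = 3, 4, 44 / 9, 52 / 9
y = np.array([14, 12, 10, 12, 9, 16, 15, 12, 8, 10, 7, 7], dtype=float)
X = np.ones((n * T, 1))
V = sigma2 * np.eye(n * T) + sigma2_alpha * np.kron(np.eye(n), np.ones((T, T)))
cov_wy = np.r_[np.full(T, sigma2_alpha), np.zeros((n - 1) * T)]
w, b, mse = blup(np.array([1.0]), cov_wy, sigma2_alpha, V, X, y)
print(w, b, mse)
```

In this one-way case the general formula should reproduce $\zeta_1\bar{y}_1 + (1 - \zeta_1)m_{\alpha,GLS}$, and the mean square error works out to $\sigma_\alpha^2(1 - \zeta_1)\left(1 + (1 - \zeta_1)/\sum_j \zeta_j\right)$.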

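Example: One-way model
Recall $y_{it} = \mu + \alpha_i + \varepsilon_{it}$.
–Thus, $y_i = 1_i(\mu + \alpha_i) + \varepsilon_i$, so that $X_i = 1_i$ and $V_i = \sigma_\alpha^2 J_i + \sigma^2 I_i$, where $J_i = 1_i 1_i'$.
–With this, we note that $V_i^{-1}(y_i - X_i b_{GLS}) = \sigma^{-2}\left(y_i - 1_i(\zeta_i\bar{y}_i + (1 - \zeta_i)m_{\alpha,GLS})\right)$, where here $b_{GLS} = m_{\alpha,GLS}$.
–Thus, for predicting $w = \mu + \alpha_i$ we have $\lambda = 1$ and $\mathrm{Cov}(w, y_j) = \sigma_\alpha^2 1_i'$ for $j = i$ (the $i$th subject), 0 otherwise. Thus,
$$w_{BLUP} = m_{\alpha,GLS} + \sigma_\alpha^2 1_i' V_i^{-1}(y_i - 1_i m_{\alpha,GLS}) = \zeta_i\bar{y}_i + (1 - \zeta_i)m_{\alpha,GLS}.$$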

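Random effects ANOVA model
For predicting residuals $\varepsilon_{it}$ we have $\lambda = 0$ and $\mathrm{Cov}(w, y_j) = \sigma^2 1_{it}'$ for $j = i$ (the $i$th subject, $t$th time period), 0 otherwise. Here, $1_{it}$ is a $T_i \times 1$ vector with a 1 in the $t$th position and 0 otherwise. Thus,
$$e_{it,BLUP} = \sigma^2 1_{it}' V_i^{-1}(y_i - X_i b_{GLS}) = y_{it} - \left(\zeta_i\bar{y}_i + (1 - \zeta_i)m_{\alpha,GLS}\right)$$
is our BLUP residual.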
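A quick numerical check of the BLUP residual identity, written as a Python sketch. It uses machine 2 from the run-times example with $m_{\alpha,GLS} = 11$ (the overall mean, since the design is balanced) and $\sigma^2 = 44/9$, $\sigma_\alpha^2 = 52/9$, which are the balanced ANOVA moment estimates and an assumption about how the variances would be estimated; the function name is hypothetical.

```python
import numpy as np

def one_way_blup_residuals(y_i, m_gls, sigma2, sigma2_alpha):
    """Stacked e_it,BLUP = sigma^2 1_it' V_i^{-1}(y_i - 1_i m_gls), t = 1..T_i,
    with V_i = sigma^2 I + sigma_alpha^2 J (one-way random effects model)."""
    y_i = np.asarray(y_i, dtype=float)
    T = len(y_i)
    V_i = sigma2 * np.eye(T) + sigma2_alpha * np.ones((T, T))
    return sigma2 * np.linalg.solve(V_i, y_i - m_gls)

# Machine 2 from the run-times example
y2 = np.array([9.0, 16.0, 15.0, 12.0])
e = one_way_blup_residuals(y2, 11.0, 44 / 9, 52 / 9)
zeta = 52 / 63                                  # T / (T + sigma^2 / sigma_alpha^2)
closed_form = y2 - (zeta * y2.mean() + (1 - zeta) * 11.0)
print(e, closed_form)
```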

4.4 Mixed model predictors
Recall the longitudinal data mixed model $y_i = Z_i\alpha_i + X_i\beta + \varepsilon_i$.
As described in Section 3.3, this is a special case of the mixed linear model. We use $V = \mathrm{blockdiag}(V_1, \ldots, V_n)$, where $V_i = Z_i D Z_i' + R_i$, and $X = (X_1', \ldots, X_n')'$.
For BLUP calculations, note that $\mathrm{cov}_{wy} = \left(\mathrm{Cov}(w, y_1), \ldots, \mathrm{Cov}(w, y_n)\right)$.

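Longitudinal data mixed model BLUP
Recall that the random variable $w$ has mean $E\,w = \lambda'\beta$ and variance $\mathrm{Var}\,w = \sigma_w^2$. Because $V$ is block diagonal, the BLUP is
$$w_{BLUP} = \lambda' b_{GLS} + \sum_{i=1}^n \mathrm{Cov}(w, y_i)V_i^{-1}(y_i - X_i b_{GLS}).$$
The mean square error is
$$\mathrm{Var}(w_{BLUP} - w) = \Big(\lambda' - \sum_{i=1}^n \mathrm{Cov}(w, y_i)V_i^{-1}X_i\Big)\Big(\sum_{i=1}^n X_i'V_i^{-1}X_i\Big)^{-1}\Big(\lambda' - \sum_{i=1}^n \mathrm{Cov}(w, y_i)V_i^{-1}X_i\Big)' - \sum_{i=1}^n \mathrm{Cov}(w, y_i)V_i^{-1}\mathrm{Cov}(w, y_i)' + \sigma_w^2.$$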

BLUP special cases
Global parameters and subject-specific effects.
–Suppose that the interest is in predicting linear combinations of global parameters $\beta$ and subject-specific effects $\alpha_i$.
–Consider linear combinations of the form $w = c_1'\alpha_i + c_2'\beta$.
Residuals. Here, $w = \varepsilon_{it}$.
Forecasts. Suppose that the $i$th subject is included in the data set; predict $y_{i,T_i+L} = z_{i,T_i+L}'\alpha_i + x_{i,T_i+L}'\beta + \varepsilon_{i,T_i+L}$
–for $L$ lead time units in the future.

Predicting global parameters and subject-specific effects
Consider linear combinations of the form $w = c_1'\alpha_i + c_2'\beta$. Straightforward calculations show that
–$E\,w = c_2'\beta$, so that $\lambda = c_2$,
–$\mathrm{Cov}(w, y_j) = c_1' D Z_i'$ for $j = i$,
–$\mathrm{Cov}(w, y_j) = 0$ for $j \neq i$.
Thus, $w_{BLUP} = c_2' b_{GLS} + c_1' D Z_i' V_i^{-1}(y_i - X_i b_{GLS})$.

Special case 1
Take $c_2 = 0$. Because the mean and variance expressions are true for all vectors $c_1$, we may write this in vector notation to get the BLUP of $\alpha_i$, the vector
$$a_{i,BLUP} = D Z_i' V_i^{-1}(y_i - X_i b_{GLS}).$$
This is unbiased in the sense that $E(a_{i,BLUP} - \alpha_i) = 0$. This predictor has minimum variance among all linear unbiased predictors (BLUP). In the case of the error components model ($z_{it} = 1$, $D = \sigma_\alpha^2$, $R_i = \sigma^2 I_i$), this reduces to
$$a_{i,BLUP} = \zeta_i(\bar{y}_i - \bar{x}_i' b_{GLS}).$$
For comparison, recall the fixed effects parameter estimate, $a_i = \bar{y}_i - \bar{x}_i' b$.
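A minimal Python sketch of $b_{GLS}$ and $a_{i,BLUP}$ that exploits the block-diagonal $V$ by accumulating $X_i'V_i^{-1}X_i$ one subject at a time. It assumes $R_i = \sigma^2 I_i$ and known $D$ and $\sigma^2$; the function name `mixed_model_blup` is hypothetical. The usage checks the error components reduction to $\zeta_i(\bar{y}_i - \bar{x}_i' b_{GLS})$ on simulated data.

```python
import numpy as np

def mixed_model_blup(ys, Xs, Zs, D, sigma2):
    """GLS estimate b and BLUPs a_i = D Z_i' V_i^{-1}(y_i - X_i b),
    assuming R_i = sigma2 * I (serially uncorrelated errors)."""
    p = Xs[0].shape[1]
    XtVX, XtVy, Vs = np.zeros((p, p)), np.zeros(p), []
    for y, X, Z in zip(ys, Xs, Zs):
        V = Z @ D @ Z.T + sigma2 * np.eye(len(y))   # V_i = Z_i D Z_i' + R_i
        Vs.append(V)
        XtVX += X.T @ np.linalg.solve(V, X)
        XtVy += X.T @ np.linalg.solve(V, y)
    b = np.linalg.solve(XtVX, XtVy)
    a = [D @ Z.T @ np.linalg.solve(V, y - X @ b)
         for y, X, Z, V in zip(ys, Xs, Zs, Vs)]
    return b, a

# Error components example: z_it = 1, D = sigma_alpha^2 (simulated, unbalanced)
rng = np.random.default_rng(0)
Ts = [3, 5, 4]
Xs = [np.column_stack([np.ones(t), rng.normal(size=t)]) for t in Ts]
Zs = [np.ones((t, 1)) for t in Ts]
ys = [rng.normal(size=t) for t in Ts]
sigma2_alpha, sigma2 = 0.7, 1.3
b, a = mixed_model_blup(ys, Xs, Zs, np.array([[sigma2_alpha]]), sigma2)
print(b, [ai[0] for ai in a])
```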

Motivating BLUPs
We can also motivate BLUPs using normal theory:
–Consider the case where $\alpha_i$ and $\varepsilon_i$ are multivariate normally distributed.
–Then, it can be shown that $E(\alpha_i \mid y_i) = D Z_i' V_i^{-1}(y_i - X_i\beta)$.
–To motivate this, consider asking the question: what realization of $\alpha_i$ could be associated with $y_i$? The expectation!
–The BLUP is the BLUE of $E(\alpha_i \mid y_i)$. (That is, replace $\beta$ by $b_{GLS}$.)

Special case 2 As another example, it is of interest to predict Choose and This yields This predictor is of interest in actuarial science, where it is known as the credibility estimator.

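BLUP Residuals
Here, $w = \varepsilon_{it}$. Because $E\,w = 0$, it follows that $\lambda = 0$. Straightforward calculations show that
–$\mathrm{Cov}(w, y_j) = \sigma^2 1_{it}'$ for $j = i$ and
–$\mathrm{Cov}(w, y_j) = 0$ for $j \neq i$.
–Here, the symbol $1_{it}$ denotes a $T_i \times 1$ vector that has a “one” in the $t$th position and is zero otherwise.
Thus, $e_{it,BLUP} = \sigma^2 1_{it}' V_i^{-1}(y_i - X_i b_{GLS})$. Because $\sigma^2 V_i^{-1} = I_i - Z_i D Z_i' V_i^{-1}$ when $R_i = \sigma^2 I_i$, this can also be expressed as
$$e_{it,BLUP} = y_{it} - \left(z_{it}' a_{i,BLUP} + x_{it}' b_{GLS}\right).$$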
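The two forms of the BLUP residual can be compared numerically. This Python sketch assumes $R_i = \sigma^2 I_i$; the function name `blup_residuals`, the design matrices and the coefficient vector are hypothetical. The identity holds for any plugged-in $b$, not only $b_{GLS}$, so the sketch uses an arbitrary one.

```python
import numpy as np

def blup_residuals(y, X, Z, D, sigma2, b):
    """Two equivalent forms of the BLUP residuals for one subject (R_i = sigma2 I)."""
    V = Z @ D @ Z.T + sigma2 * np.eye(len(y))
    r = y - X @ b
    e_direct = sigma2 * np.linalg.solve(V, r)     # sigma^2 V_i^{-1}(y_i - X_i b)
    a = D @ Z.T @ np.linalg.solve(V, r)           # a_i,BLUP
    e_fitted = y - X @ b - Z @ a                  # y_it - (z_it' a_i + x_it' b)
    return e_direct, e_fitted

rng = np.random.default_rng(3)
T = 5
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Z = np.column_stack([np.ones(T), np.arange(T, dtype=float)])   # random intercept and slope
D = np.array([[1.0, 0.3], [0.3, 0.5]])
y = rng.normal(size=T)
e_direct, e_fitted = blup_residuals(y, X, Z, D, 0.8, np.array([0.1, 0.4]))
print(e_direct, e_fitted)
```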

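Predicting future observations
Suppose that the $i$th subject is included in the data set; predict $y_{i,T_i+L} = z_{i,T_i+L}'\alpha_i + x_{i,T_i+L}'\beta + \varepsilon_{i,T_i+L}$
–for $L$ lead time units in the future.
We will assume that $z_{i,T_i+L}$ and $x_{i,T_i+L}$ are known. It follows that $\lambda = x_{i,T_i+L}$. Straightforward calculations show that $\mathrm{Cov}(w, y_i) = z_{i,T_i+L}' D Z_i' + \mathrm{Cov}(\varepsilon_{i,T_i+L}, \varepsilon_i)$ and $\mathrm{Cov}(w, y_j) = 0$ for $j \neq i$. Thus, the forecast of $y_{i,T_i+L}$ is
$$\hat{y}_{i,T_i+L} = x_{i,T_i+L}' b_{GLS} + z_{i,T_i+L}' a_{i,BLUP} + \mathrm{Cov}(\varepsilon_{i,T_i+L}, \varepsilon_i)R_i^{-1}e_{i,BLUP},$$
where $e_{i,BLUP} = y_i - X_i b_{GLS} - Z_i a_{i,BLUP}$. Thus, the forecast is the estimate of the conditional mean plus the serial correlation correction factor $\mathrm{Cov}(\varepsilon_{i,T_i+L}, \varepsilon_i)R_i^{-1}e_{i,BLUP}$.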

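Predicting future observations
To illustrate, consider the special case of autoregressive of order 1 (AR(1)), serially correlated errors. Thus, we have $\varepsilon_{it} = \rho\,\varepsilon_{i,t-1} + \eta_{it}$, where the $\eta_{it}$ are i.i.d. with mean zero. After some algebra, the $L$-step forecast is
$$\hat{y}_{i,T_i+L} = x_{i,T_i+L}' b_{GLS} + z_{i,T_i+L}' a_{i,BLUP} + \rho^L e_{iT_i,BLUP}.$$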
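The "some algebra" behind the AR(1) case can be checked numerically: for stationary AR(1) errors observed at $t = 1, \ldots, T$, the row vector $\mathrm{Cov}(\varepsilon_{i,T+L}, \varepsilon_i)R_i^{-1}$ should be zero except for $\rho^L$ in the last position, so the serial correlation correction factor reduces to $\rho^L$ times the last BLUP residual. This Python sketch assumes stationarity and equally spaced times; the function names are hypothetical.

```python
import numpy as np

def ar1_cov(T, rho, sigma2_eta=1.0):
    """Stationary AR(1) covariance: Cov(eps_t, eps_s) = sigma2_eta rho^|t-s| / (1 - rho^2)."""
    idx = np.arange(T)
    return sigma2_eta * rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho**2)

def serial_correction_weights(T, L, rho):
    """Cov(eps_{T+L}, eps) R^{-1} for one subject with AR(1) errors at times 1..T."""
    R = ar1_cov(T, rho)
    c = rho ** (T - 1 + L - np.arange(T)) / (1 - rho**2)   # Cov(eps_{T+L}, eps_t)
    return np.linalg.solve(R, c)   # R is symmetric, so R^{-1} c equals (c' R^{-1})'

weights = serial_correction_weights(T=6, L=3, rho=0.6)
print(np.round(weights, 10))
```

The solve is exact because $\mathrm{Cov}(\varepsilon_{i,T+L}, \varepsilon_i)$ is $\rho^L$ times the last column of $R_i$.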

4.5 Bayesian Inference
With Bayesian statistical models, one views both the model parameters and the data as random variables.
–We assume distributions for each type of random variable.
Given the parameters $\beta$ and $\alpha$, the response model is $y = Z\alpha + X\beta + \varepsilon$.
–Specifically, we assume that the responses $y$ conditional on $\alpha$ and $\beta$ are normally distributed and that $E(y \mid \alpha, \beta) = Z\alpha + X\beta$ and $\mathrm{Var}(y \mid \alpha, \beta) = R$.
Assume that $\alpha$ is distributed normally with mean $\mu_\alpha$ and variance $D$ and that $\beta$ is distributed normally with mean $\mu_\beta$ and variance $\Sigma_\beta$, each independent of the other.

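Distributions
The joint distribution of $(\alpha, \beta)$ is known as the prior distribution. To summarize, the joint distribution of $(\alpha, \beta, y)$ is normal:
$$\begin{pmatrix}\alpha \\ \beta \\ y\end{pmatrix} \sim N\left(\begin{pmatrix}\mu_\alpha \\ \mu_\beta \\ Z\mu_\alpha + X\mu_\beta\end{pmatrix}, \begin{pmatrix} D & 0 & DZ' \\ 0 & \Sigma_\beta & \Sigma_\beta X' \\ ZD & X\Sigma_\beta & V + X\Sigma_\beta X'\end{pmatrix}\right),$$
where $V = R + ZDZ'$.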

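Posterior Distribution
The distribution of parameters given the data is known as the posterior distribution. The posterior distribution of $(\alpha, \beta)$ given $y$ is normal. The conditional moments are
$$E(\alpha \mid y) = \mu_\alpha + DZ'(V + X\Sigma_\beta X')^{-1}(y - Z\mu_\alpha - X\mu_\beta),$$
$$E(\beta \mid y) = \mu_\beta + \Sigma_\beta X'(V + X\Sigma_\beta X')^{-1}(y - Z\mu_\alpha - X\mu_\beta),$$
$$\mathrm{Var}(\alpha \mid y) = D - DZ'(V + X\Sigma_\beta X')^{-1}ZD, \qquad \mathrm{Var}(\beta \mid y) = \Sigma_\beta - \Sigma_\beta X'(V + X\Sigma_\beta X')^{-1}X\Sigma_\beta,$$
$$\mathrm{Cov}(\alpha, \beta \mid y) = -DZ'(V + X\Sigma_\beta X')^{-1}X\Sigma_\beta.$$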

Relation with BLUPs
In longitudinal data applications, one typically has more information about the global parameters $\beta$ than the subject-specific parameters $\alpha$. Consider first the case $\Sigma_\beta = 0$, so that $\beta = \mu_\beta$ with probability one.
–Intuitively, this means that $\beta$ is precisely known, generally from collateral information.
–Assuming that $\mu_\alpha = 0$, it is easy to check that the best linear unbiased estimator (BLUE) of $E(\alpha \mid y)$ is $a_{BLUP} = DZ'V^{-1}(y - Xb_{GLS})$.
–Recall from equation (4.11) that $a_{BLUP}$ is also the best linear unbiased predictor in the frequentist (non-Bayesian) model framework.

Relation with BLUPs
Consider second the case where $\Sigma_\beta^{-1} = 0$.
–In this case, prior information about the parameter $\beta$ is vague; this is known as using a diffuse prior.
–Assuming $\mu_\alpha = 0$, one can show that $E(\alpha \mid y) = a_{BLUP}$.
It is interesting that in both extreme cases, we arrive at the statistic $a_{BLUP}$ as a predictor of $\alpha$.
–This analysis assumes $D$ and $R$ are matrices of fixed parameters.
–It is also possible to assume distributions for these parameters; typically, independent Wishart distributions are used for $D^{-1}$ and $R^{-1}$ as these are conjugate priors.
–The general strategy of substituting point estimates for certain parameters in a posterior distribution is called empirical Bayes estimation.
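Both extreme cases can be illustrated numerically. With $\mu_\alpha = 0$ and jointly normal $\alpha$ and $y$, the posterior mean is $E(\alpha \mid y) = DZ'(V + X\Sigma_\beta X')^{-1}(y - X\mu_\beta)$. This Python sketch uses simulated matrices and a very large $\Sigma_\beta$ as a numerical stand-in for $\Sigma_\beta^{-1} = 0$; that substitution, the data and the function names are all assumptions for illustration.

```python
import numpy as np

def posterior_mean_alpha(y, X, Z, D, R, mu_beta, Sigma_beta):
    """E(alpha | y) with mu_alpha = 0: D Z'(V + X Sigma_beta X')^{-1}(y - X mu_beta), V = R + Z D Z'."""
    V = R + Z @ D @ Z.T
    return D @ Z.T @ np.linalg.solve(V + X @ Sigma_beta @ X.T, y - X @ mu_beta)

def a_blup(y, X, Z, D, R):
    """a_BLUP = D Z' V^{-1}(y - X b_GLS)."""
    V = R + Z @ D @ Z.T
    b_gls = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, y))
    return D @ Z.T @ np.linalg.solve(V, y - X @ b_gls)

rng = np.random.default_rng(1)
N, p, q = 8, 2, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Z = rng.normal(size=(N, q))
D, R = np.diag([0.5, 0.8]), np.eye(N)
y = rng.normal(size=N)
mu_beta = np.array([0.3, -0.2])

# Diffuse prior: large Sigma_beta approximates Sigma_beta^{-1} = 0
diffuse = posterior_mean_alpha(y, X, Z, D, R, mu_beta, 1e6 * np.eye(p))
# Precise prior: Sigma_beta = 0, so beta = mu_beta with probability one
precise = posterior_mean_alpha(y, X, Z, D, R, mu_beta, np.zeros((p, p)))
print(diffuse, a_blup(y, X, Z, D, R), precise)
```

In the precise case the posterior mean uses $\mu_\beta$ where $a_{BLUP}$ uses $b_{GLS}$; in the diffuse case the two agree up to the approximation error from using a finite $\Sigma_\beta$.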

Example – One-way random effects ANOVA model The posterior means turn out to be where Note that   measures the precision of knowledge about . Specifically, we see that   approaches one as   2 , and approaches zero as   2  0.

4.6 Wisconsin Lottery Sales
$T = 40$ weeks of sales from $n = 50$ zip codes

Lottery Sales Data Analysis
Cross-sectional analysis shows that population size heavily influences sales, with Kenosha as an outlier.
Multiple time series plots
–show the effect of jackpots that is common to all postal codes,
–show the heterogeneity among postal codes (reaffirmed by a pooling test), and
–show the heteroscedasticity that is accommodated through a logarithmic transformation.

Lottery Sales Model Selection
In-sample results show that
–the one-way error components model dominates pooled cross-sectional models;
–an AR(1) error specification significantly improves the fit;
–the best model is probably the two-way error components model with an AR(1) error specification (not yet documented).
Out-of-sample analysis suggests that
–logarithmic sales is the preferred choice of response; it outperforms sales and percentage change.

4.7 What is Credibility?
Hickman’s (1975) Analogy
–In politics, leaders begin with a reservoir of credibility, which decreases as executive experience is compiled.
–Insurance behaves in the reverse fashion!
–Here, credibility increases as experience increases.

Credibility Theory
Credibility is a technique for predicting future expected claims for a risk class, given past claims of that and related risk classes.
Importance
–Credibility is widely used for pricing property and casualty, workers’ compensation and health care coverages.
–According to Rodermund (1989), “the concept of credibility has been the casualty actuaries’ most important and enduring contribution to casualty actuarial science.”

History
Mowbray (1914, PCAS)
–Asked the question, “how extensive is an exposure necessary to give a dependable pure premium?”
–This approach is now known as “limited fluctuation” or “American” credibility.
Question 1: do we have enough exposure to give full weight to the risk class under consideration?
Question 2: if not, how can we combine information from this and related risk classes?

More History
Whitney (1918, PCAS)
–introduced the idea of using a weighted average of average claims of (1) a given risk class and (2) all risk classes.
–The weight is known as the credibility factor.
–It is of the form New Premium = $Z \times$ Claims Experience + $(1 - Z) \times$ Old Premium.

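Example: Balanced Bühlmann
Consider the model $y_{it} = \mu + \alpha_i + \varepsilon_{it}$ with $T$ observations per risk class. The credibility factor is
$$Z = \frac{T}{T + \sigma^2/\sigma_\alpha^2}.$$
The traditional credibility estimator is $Z\,\bar{y}_i + (1 - Z)\,\bar{y}$.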

Example
Hypothetical Claims for Three Towns
Town | Claims | Average Claim
1 | 14, 12, 10, 12 | $\bar{y}_1 = 12$
2 | 9, 16, 15, 12 | $\bar{y}_2 = 13$
3 | 8, 10, 7, 7 | $\bar{y}_3 = 8$
Are there real differences among towns?
Mowbray: does Town 3 have enough data to support its own estimator of pure premiums?
Whitney: how can I use the information in Towns 1 and 2 to help determine my rate for Town 3?

Response to Whitney
Known as the “shrinkage” effect.
[Figure: Comparison of Subject-Specific Means to Credibility Estimators. Town averages 8, 12 and 13 (overall mean 11) shrink to credibility estimators 8.525, 11.825 and 12.650.]

Why study credibility theory?
Long history of applications: “a business necessity”
–More recently, many theoretical advances with fewer innovative applications.
Credibility techniques required in legal statutes and standards of practice
–Standard of Practice 25 by the Actuarial Standards Board of the American Academy of Actuaries
–Wisconsin statutes on credibility insurance and disability income
Advanced techniques are critical for keeping up with competition (health insurance: health economists).
Innovative techniques enhance the “credibility” of the profession.
