Analysis of multivariate transformations
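Transformation of the response in regression The normalized power transformation is: $z(\lambda) = (y^{\lambda} - 1) / (\lambda \dot{y}^{\lambda - 1})$ for $\lambda \neq 0$ and $z(\lambda) = \dot{y} \log y$ for $\lambda = 0$, where $\dot{y}$ is the geometric mean of the observations. The purpose is to find an estimate of $\lambda$ for which the errors in $z(\lambda)$ are approximately normally distributed with constant variance.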
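A minimal sketch of the normalized power (Box-Cox) transformation described on this slide, written with NumPy. The function name `normalized_boxcox` and the toy data are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np

def normalized_boxcox(y, lam):
    """Normalized Box-Cox transformation z(lambda) of a positive vector y."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("the power transformation needs strictly positive data")
    gm = np.exp(np.mean(np.log(y)))            # geometric mean of the observations
    if lam == 0:
        return gm * np.log(y)                  # limiting case lambda = 0
    return (y**lam - 1.0) / (lam * gm ** (lam - 1.0))

y = np.array([1.0, 2.0, 4.0])                  # toy data, geometric mean 2
z1 = normalized_boxcox(y, 1.0)                 # lambda = 1 gives y - 1
z0 = normalized_boxcox(y, 0.0)                 # lambda = 0 gives gm * log(y)
print(z1, z0)
```

Dividing by the power of the geometric mean keeps the Jacobian of the transformation equal to one, so residual sums of squares for different values of lambda can be compared directly.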
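Score test for transformation The score test $T_{SC}$ of the hypothesis $\lambda = \lambda_0$ is the t-statistic on the constructed variable $w(\lambda_0)$, where $w(\lambda) = \partial z(\lambda) / \partial \lambda$.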
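Multivariate transformations In this case $y_i$ is a $v \times 1$ vector of responses at observation $i$, with $y_{ij}$ the observation on response $j$. The normalized transformation of $y_{ij}$ is given by: $z_{ij}(\lambda_j) = (y_{ij}^{\lambda_j} - 1) / (\lambda_j \dot{y}_j^{\lambda_j - 1})$ for $\lambda_j \neq 0$ and $z_{ij}(\lambda_j) = \dot{y}_j \log y_{ij}$ for $\lambda_j = 0$, where $\dot{y}_j$ is the geometric mean of the jth response.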
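Multivariate transformations We assume a multivariate linear regression model of the form $z_i(\lambda) = \mu_i + \epsilon_i$ with $\mu_i = B^T x_i$, where $x_i$ is the vector of explanatory variables for observation $i$ and $B$ is a matrix of regression coefficients.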
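Mult. transformations to normality If the transformed observations are normally distributed with mean $\mu_i$ and covariance matrix $\Sigma$, the loglikelihood to be maximized is given by $L(\lambda) = \text{const} - \frac{n}{2} \log|\Sigma| - \frac{1}{2} \sum_{i=1}^{n} \{z_i(\lambda) - \mu_i\}^T \Sigma^{-1} \{z_i(\lambda) - \mu_i\}$.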
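Mult. transformations to normality If the explanatory variables are the same for all responses, the maximum likelihood estimator of $\Sigma$ is given by $\hat{\Sigma}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} e_i(\lambda) e_i(\lambda)^T$, where $e_i(\lambda)$ is a $v \times 1$ vector of residuals for observation $i$ for some value of $\lambda$.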
The profile loglikelihood (i.e. maximized over $\mu$ and $\Sigma$) is $L_{\max}(\lambda) = \text{const} - \frac{n}{2} \log|\hat{\Sigma}(\lambda)|$.
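Multivariate likelihood ratio test The multivariate generalization of $T_{SC}$ is the likelihood ratio test, given by: $T_{LR} = n \log\{|\hat{\Sigma}(\lambda_0)| / |\hat{\Sigma}(\hat{\lambda})|\}$. This statistic must be compared with a $\chi^2$ distribution with $v$ degrees of freedom.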
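The profile loglikelihood and the likelihood ratio statistic can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the data are simulated log-normal responses with only a constant term in the model, and a search over the five standard values per response stands in for the true maximum likelihood estimate of lambda.

```python
from itertools import product
import numpy as np

def normalized_boxcox(y, lam):
    """Normalized Box-Cox transformation of one positive response column."""
    gm = np.exp(np.mean(np.log(y)))             # geometric mean of the column
    if abs(lam) < 1e-12:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm ** (lam - 1.0))

def profile_loglik(Y, X, lam):
    """Profile loglikelihood up to a constant: -(n/2) log|Sigma_hat(lambda)|."""
    n, v = Y.shape
    Z = np.column_stack([normalized_boxcox(Y[:, j], lam[j]) for j in range(v)])
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)   # same X for every response
    E = Z - X @ B                               # n x v matrix of residuals
    _, logdet = np.linalg.slogdet(E.T @ E / n)  # log|Sigma_hat(lambda)|
    return -0.5 * n * logdet

def lr_statistic(Y, X, lam0, lam_hat):
    """n log{|Sigma_hat(lam0)| / |Sigma_hat(lam_hat)|}."""
    return 2.0 * (profile_loglik(Y, X, lam_hat) - profile_loglik(Y, X, lam0))

rng = np.random.default_rng(0)
n, v = 100, 2
Y = np.exp(rng.normal(size=(n, v)))             # simulated log-normal responses
X = np.ones((n, 1))                             # constant term only (assumption)

standard = (-1.0, -0.5, 0.0, 0.5, 1.0)
# Grid maximizer over the standard values, used here in place of the MLE.
lam_grid_max = max(product(standard, repeat=v),
                   key=lambda lam: profile_loglik(Y, X, lam))
t_lr = lr_statistic(Y, X, (1.0, 1.0), lam_grid_max)
print(lam_grid_max, t_lr)
```

The statistic for the hypothesis of no transformation would then be compared with a chi-squared distribution on v = 2 degrees of freedom, whose 5% point is about 5.99. Because the grid maximizer is not the exact MLE, the value obtained is a lower bound on the true likelihood ratio statistic.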
Swiss heads: monitoring the likelihood ratio test for transformation, $H_0: \lambda = 1$. The last two units to enter (104 and 111) provide all the evidence for a transformation.
Boxplots of the 6 variables with univariate outliers labelled
Swiss heads The marginal distribution of $y_4$ had the two outliers (units 104 and 111). We want to test whether all the evidence for a transformation is due to $y_4$. We recalculate the likelihood ratio, but now testing whether $\lambda_4$ is equal to 1.
Forward plot of the likelihood ratio test for $H_0: \lambda_4 = 1$. The last two units to enter provide all the evidence for a transformation.
Mussels data 82 observations on horse mussels from New Zealand. Five variables are measured on each mussel. Purpose: to see whether multivariate normality can be obtained by joint transformation of all 5 variables.
Mussels data: scatterplot matrix
Forward likelihood ratio for $H_0: \lambda = 1$
Finding a multivariate transformation with the forward search With just one variable for transformation, it is extremely easy to use the fan plot from the forward search to find satisfactory transformations and the observations which are influential. With $v$ variables there are $5^v$ combinations of the 5 standard values $\lambda = (-1, -0.5, 0, 0.5, 1)$, too many to explore one by one.
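To see how quickly the number of candidate transformations grows, the combinations of the five standard values can be enumerated directly. The sketch below uses v = 5, the number of variables in the mussels data; the variable names are illustrative.

```python
from itertools import product

standard = (-1.0, -0.5, 0.0, 0.5, 1.0)   # the 5 standard values of lambda
v = 5                                     # number of responses (mussels data)

grid = list(product(standard, repeat=v))  # every combination of standard values
print(len(grid))                          # 5**5 = 3125 candidate vectors
```

Running a separate forward search for each of these vectors is impractical, which motivates the stepwise procedure on the next slide.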
Suggested procedure for finding multivariate transformations Run the forward search (FS) through the untransformed data, ordering the observations at each subset size $m$ by the Mahalanobis distances (MD) calculated from the untransformed observations. Estimate $\lambda$ at each step. Select a preliminary set of transformation parameters.
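The ordering step of the procedure can be sketched as follows. This is a simplified forward search, not the authors' code: the starting subset (units closest to the coordinatewise median) is an assumption chosen for brevity, the data are simulated with two planted outliers, and the estimation of lambda at each step is left out; it would be computed from the units in `subset` inside the loop.

```python
import numpy as np

def mahalanobis_sq(Y, subset):
    """Squared Mahalanobis distances of all units, using mean and
    covariance estimated from the units in `subset` only."""
    mu = Y[subset].mean(axis=0)
    S = np.cov(Y[subset], rowvar=False)
    D = Y - mu
    return np.einsum("ij,jk,ik->i", D, np.linalg.inv(S), D)

def forward_search(Y, m0):
    """Return, for each step, (new subset size, units entering,
    minimum MD among units outside the current subset)."""
    n, _ = Y.shape
    # Assumed start: the m0 units closest to the coordinatewise median.
    med = np.median(Y, axis=0)
    mad = np.median(np.abs(Y - med), axis=0)
    start = np.sum(((Y - med) / mad) ** 2, axis=1)
    subset = np.argsort(start)[:m0]
    history = []
    for m in range(m0, n):
        d2 = mahalanobis_sq(Y, subset)
        outside = np.ones(n, dtype=bool)
        outside[subset] = False
        min_md = float(np.sqrt(d2[outside].min()))
        new_subset = np.argsort(d2)[: m + 1]   # the m + 1 smallest distances
        entered = np.setdiff1d(new_subset, subset)
        history.append((m + 1, entered.tolist(), min_md))
        subset = new_subset
    return history

rng = np.random.default_rng(1)
Y = rng.normal(size=(50, 2))                   # simulated bivariate data
Y[48] = [10.0, 10.0]                           # two planted outliers
Y[49] = [12.0, 12.0]

history = forward_search(Y, m0=10)
for size, entered, min_md in history[-3:]:
    print(size, entered, round(min_md, 2))
```

In this toy example the planted outliers are expected to enter at the end of the search, which is the pattern the forward plots in these slides are designed to reveal.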
Monitoring of MLE of $\lambda$: $H_0: \lambda = 1$; $H_0: \lambda = (0.5, 0, 0.5, 0, 0)$
Monitoring of MLE of $\lambda$: $H_0: \lambda = (0.5, 0, 0.5, 0, 0)$
Forward likelihood ratio for $H_0: \lambda = (0.5, 0, 0.5, 0, 0)$
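Validation of the transformation In univariate analysis the likelihood ratio test is $T_{LR} = 2\{L_{\max}(\hat{\lambda}) - L_{\max}(\lambda_0)\}$, where $L_{\max}(\lambda)$ is the profile loglikelihood. Asymptotically the null distribution of $T_{LR}$ is chi-squared on one degree of freedom.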
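Signed square root of $T_{LR}$: $\mathrm{sign}(\hat{\lambda} - \lambda_0) \sqrt{T_{LR}}$. Asymptotically this test has a N(0,1) distribution. Including the sign of the difference between $\hat{\lambda}$ and $\lambda_0$ gives an indication of the direction of any departure from the hypothesised value.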
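As a small worked example, the signed square root attaches the sign of the difference between the estimate and the hypothesised value to the square root of the likelihood ratio statistic. The function name and the numbers below are made up for illustration.

```python
import math

def signed_sqrt_lr(t_lr, lam_hat, lam0):
    """sign(lam_hat - lam0) * sqrt(T_LR); zero when the estimate equals lam0."""
    if lam_hat == lam0:
        return 0.0
    sign = 1.0 if lam_hat > lam0 else -1.0
    return sign * math.sqrt(max(t_lr, 0.0))    # guard against rounding below 0

# Hypothetical values: T_LR = 6.25 with an estimate below the tested value.
stat = signed_sqrt_lr(6.25, lam_hat=0.2, lam0=1.0)
print(stat)  # -2.5
```

A value of -2.5 lies outside the central 95% of the standard normal distribution (beyond -1.96) and the negative sign indicates that the data favour a value of lambda smaller than the one tested.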
Multivariate version of the signed square root likelihood ratio We test just one component of $\lambda$ when all others are kept at some specified value. We calculate a set of tests by varying each component of $\lambda$ about $\lambda_0$.
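Example: mussels data, validation of $\lambda_0 = (0.5, 0, 0.5, 0, 0)$ Purpose: to validate in a multivariate way $\lambda_1 = 0.5$ for the first variable. To form the likelihood ratio test we need an estimator $\hat{\lambda}$ of $\lambda = (\lambda_1, \ldots, \lambda_v)$ found by maximization only over $\lambda_1$. The other parameters keep their values in $\lambda_0$ (in this example $(0, 0.5, 0, 0)$). Under the hypothesis being tested, $\lambda_1$ takes in turn each of the 5 standard values of $\lambda$: $(-1, -0.5, 0, 0.5, 1)$.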
Example: validation of $\lambda_1$ We perform 5 independent forward searches with $\lambda_0 = (-1, 0, 0.5, 0, 0)$, $\lambda_0 = (-0.5, 0, 0.5, 0, 0)$, $\lambda_0 = (0, 0, 0.5, 0, 0)$, $\lambda_0 = (0.5, 0, 0.5, 0, 0)$ and $\lambda_0 = (1, 0, 0.5, 0, 0)$. For each search we monitor the signed square root likelihood ratio test.
Version for multivariate data of the signed square root LR test $\lambda_j$ is the parameter under test. $\lambda_S$ is one of the 5 standard values of $\lambda$. $\lambda_{0j}$ is the vector of parameter values in which $\lambda_j$ takes one of the 5 standard values $\lambda_S$ while the other parameters keep their value in $\lambda_0$. One plot for each $j$, $j = 1, \ldots, v$.
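The component-wise test can be sketched as below for the full data set; in the forward search the same quantity would be recomputed on each subset. This is an illustration under assumptions: the function names are hypothetical, the model has a constant mean only, the data are simulated log-normal responses, and a one-dimensional grid search over the tested component replaces a proper numerical maximizer.

```python
import numpy as np

def normalized_boxcox(y, lam):
    gm = np.exp(np.mean(np.log(y)))            # geometric mean of the column
    if abs(lam) < 1e-12:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm ** (lam - 1.0))

def profile_loglik(Y, lam):
    """-(n/2) log|Sigma_hat(lambda)| for a model with a constant mean only."""
    n, v = Y.shape
    Z = np.column_stack([normalized_boxcox(Y[:, j], lam[j]) for j in range(v)])
    E = Z - Z.mean(axis=0)                     # residuals about the mean
    _, logdet = np.linalg.slogdet(E.T @ E / n)
    return -0.5 * n * logdet

def signed_sqrt_lr(Y, lam0, j, lam_s, grid=None):
    """Signed sqrt LR test of lambda_j = lam_s, others fixed at lam0."""
    grid = np.linspace(-2.0, 2.0, 81) if grid is None else np.asarray(grid)
    lam_null = np.array(lam0, dtype=float)
    lam_null[j] = lam_s                        # the vector lambda_0j

    def loglik_at(value):
        lam = lam_null.copy()
        lam[j] = value                         # vary component j only
        return profile_loglik(Y, lam)

    values = np.array([loglik_at(g) for g in grid])
    k = int(np.argmax(values))                 # grid maximizer over lambda_j
    t_lr = max(2.0 * (values[k] - loglik_at(lam_s)), 0.0)
    return float(np.sign(grid[k] - lam_s) * np.sqrt(t_lr))

rng = np.random.default_rng(0)
Y = np.exp(rng.normal(size=(100, 2)))          # simulated: log is the right scale
lam0 = (0.0, 0.0)
stats = {s: signed_sqrt_lr(Y, lam0, 0, s) for s in (-1.0, -0.5, 0.0, 0.5, 1.0)}
print(stats)
```

For these simulated data the statistic should be positive when too small a value of lambda is tested and negative when too large a value is tested, with the value for the log transformation lying closest to zero. Plotting the five statistics against subset size gives one panel of the validation plot for each response.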
Mussels data: validation of $\lambda_0 = (0.5, 0, 0.5, 0, 0)$
Forward likelihood ratio for $H_0: \lambda = (1/3, 1/3, 1/3, 0, 0)$
Mussels data: scatterplot matrix (transformed observations)
Monitoring MD before transforming
Monitoring MD after transforming
Minimum MD before and after transforming The transformation has separated the outliers from the bulk of the data.
Gap before and after transforming
Conclusions This was an example of our approach to finding a multivariate transformation in the presence of potentially influential observations and outliers. Procedure: start the search with untransformed data to suggest a transformation, then repeat the analysis until you find an acceptable transformation. In this example only 3 searches were necessary to find a transformation which is stable throughout the search, any changes being at the end.
Exercises
Exercise 1 The next slide gives two sets of bivariate data. Which of the two has to be transformed to achieve bivariate normality? Consider a forward search in which you monitor the likelihood ratio test for the hypothesis of no transformation. Describe the plot you would expect to get for each of the two sets of data.
Two sets of simulated bivariate data