Analysis of multivariate transformations

Transformation of the response in regression The normalized power transformation is

z(λ) = (y^λ − 1) / (λ ẏ^(λ−1))  for λ ≠ 0,   z(0) = ẏ log y,

where ẏ is the geometric mean of the observations. The purpose is to find an estimate of λ for which the errors in z(λ) are approximately normally distributed with constant variance.
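As an illustrative sketch of the transformation above (Python with NumPy; the function name is my own), the normalized power transformation can be computed as:

```python
import numpy as np

def normalized_power_transform(y, lam):
    """Normalized Box-Cox power transformation z(lambda).

    y   : positive observations
    lam : transformation parameter lambda
    """
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))   # geometric mean y-dot of the observations
    if lam == 0:
        return gm * np.log(y)         # limiting log case
    return (y**lam - 1.0) / (lam * gm**(lam - 1.0))
```

Note that for λ = 1 the transformation reduces to y − 1, so the untransformed scale is recovered up to a shift.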

Score test for transformation The score test T_sc(λ = λ₀) is the t-statistic on the constructed variable w(λ₀).

Multivariate transformations In this case y_i is a v × 1 vector of responses at observation i, with y_ij the observation on response j. The normalized transformation of y_ij is given by

z_ij(λ_j) = (y_ij^(λ_j) − 1) / (λ_j ẏ_j^(λ_j − 1))  for λ_j ≠ 0,   z_ij(0) = ẏ_j log y_ij,

where ẏ_j is the geometric mean of the jth response.
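A column-wise sketch of this multivariate transformation (Python with NumPy assumed; the function name is mine):

```python
import numpy as np

def normalized_power_transform_cols(Y, lams):
    """Apply the normalized power transformation response by response.

    Y    : (n, v) matrix of positive observations; column j = response j
    lams : length-v sequence with the parameter lambda_j for column j
    """
    Y = np.asarray(Y, dtype=float)
    Z = np.empty_like(Y)
    for j, lam in enumerate(lams):
        y = Y[:, j]
        gm = np.exp(np.mean(np.log(y)))      # geometric mean of the jth response
        if lam == 0:
            Z[:, j] = gm * np.log(y)         # limiting log case
        else:
            Z[:, j] = (y**lam - 1.0) / (lam * gm**(lam - 1.0))
    return Z
```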

Multivariate transformations We assume a multivariate linear regression model of the form

z_ij(λ_j) = x_ij′ β_j + ε_ij,

in which each transformed response follows a linear model in its explanatory variables.

Mult. transformations to normality If the transformed observations are normally distributed with mean μ_i and covariance matrix Σ, the maximized loglikelihood is given by

L(μ, Σ; λ) = const − (n/2) log |Σ| − (1/2) Σ_{i=1}^{n} {z_i(λ) − μ_i}′ Σ^(−1) {z_i(λ) − μ_i},

where the Jacobian of the transformation is absorbed by the normalization.

Mult. transformations to normality If the explanatory variables are the same for all responses, the maximum likelihood estimator of Σ is given by

Σ̂(λ) = (1/n) Σ_{i=1}^{n} e_i(λ) e_i(λ)′,

where e_i(λ) is the v × 1 vector of least squares residuals for observation i for some value of λ.

The profile loglikelihood (i.e. maximized over μ and Σ) is

L_max(λ) = const − (n/2) log |Σ̂(λ)|.

Multivariate likelihood ratio test The multivariate generalization of T_SC is given by:

T_LR = n log{ |Σ̂(λ₀)| / |Σ̂(λ̂)| }.

This statistic must be compared with a χ² distribution with v degrees of freedom.
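A minimal sketch of this statistic (Python with NumPy/SciPy; function names are my own, and ordinary least squares on a common design matrix X is assumed, matching the "same explanatory variables" case above):

```python
import numpy as np
from scipy import stats

def _norm_power(y, lam):
    # normalized power transformation of one response (lam = 0 -> log case)
    gm = np.exp(np.mean(np.log(y)))
    return gm * np.log(y) if lam == 0 else (y**lam - 1.0) / (lam * gm**(lam - 1.0))

def lr_test_transformation(Y, X, lam0, lam_hat):
    """T_LR = n * log(|Sigma_hat(lam0)| / |Sigma_hat(lam_hat)|), ~ chi2 on v df.

    Y       : (n, v) matrix of positive responses
    X       : (n, p) common design matrix (including intercept)
    lam0    : hypothesised vector of transformation parameters
    lam_hat : maximum likelihood estimate of the parameters
    """
    Y = np.asarray(Y, float); X = np.asarray(X, float)
    n, v = Y.shape
    def sigma_hat(lams):
        Z = np.column_stack([_norm_power(Y[:, j], l) for j, l in enumerate(lams)])
        B, *_ = np.linalg.lstsq(X, Z, rcond=None)   # least squares fit
        E = Z - X @ B                               # (n, v) residual matrix
        return E.T @ E / n
    t_lr = n * np.log(np.linalg.det(sigma_hat(lam0)) /
                      np.linalg.det(sigma_hat(lam_hat)))
    return t_lr, stats.chi2.sf(t_lr, df=v)
```

In practice `lam_hat` comes from numerical maximization of the profile loglikelihood; here it is simply passed in.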

Swiss heads: monitoring the likelihood ratio test for transformation, H₀: λ = 1. The last two units to enter (104 and 111) provide all the evidence for a transformation.

Boxplots of the 6 variables, with univariate outliers labelled

Swiss heads The marginal distribution of y₄ contains the two outliers (units 104 and 111). We want to test whether all the evidence for a transformation is due to y₄, so we recalculate the likelihood ratio, now testing whether λ₄ is equal to 1.

Forward plot of the likelihood ratio test of H₀: λ₄ = 1. The last two units to enter provide all the evidence for a transformation.

Mussels data 82 observations on horse mussels (cozze) from New Zealand, with five variables. Purpose: to see whether multivariate normality can be obtained by a joint transformation of all 5 variables.

Mussels data: scatterplot matrix

Forward likelihood ratio test for H₀: λ = 1

Finding a multivariate transformation with the forward search With just one variable for transformation it is extremely easy to use the fan plot from the forward search to find satisfactory transformations and the observations which are influential. With v variables there are 5^v combinations of the 5 standard values λ ∈ (−1, −0.5, 0, 0.5, 1).

Suggested procedure for finding multivariate transformations Run the FS through the untransformed data, ordering the observations at each subset size m by Mahalanobis distances calculated from the untransformed observations. Estimate λ at each step. Select a preliminary set of transformation parameters.
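The ordering step above can be sketched as follows (Python with NumPy; an illustrative, naive version in which the robust starting subset is replaced by a simple median-based start, and the distances are computed from the untransformed data):

```python
import numpy as np

def forward_search_order(Y, m0=None):
    """Sketch of the forward search ordering of multivariate observations.

    At each step, compute squared Mahalanobis distances of all units from
    the mean and covariance of the current subset, then grow the subset to
    the m + 1 units with the smallest distances.
    """
    Y = np.asarray(Y, dtype=float)
    n, v = Y.shape
    if m0 is None:
        m0 = v + 1                         # smallest subset giving a covariance
    # naive start: units closest to the coordinatewise median (illustration only)
    d0 = np.sum((Y - np.median(Y, axis=0))**2, axis=1)
    subset = np.argsort(d0)[:m0]
    order = [list(subset)]
    for m in range(m0, n):
        mu = Y[subset].mean(axis=0)
        S = np.cov(Y[subset], rowvar=False)
        diff = Y - mu
        md2 = np.einsum('ij,jk,ik->i', diff, np.linalg.pinv(S), diff)
        subset = np.argsort(md2)[:m + 1]   # subset may change, not just grow
        order.append(list(subset))
    return order
```

At each subset size the MLE of λ would then be recomputed from the units currently in the subset.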

Monitoring of the MLE of λ. H₀: λ = 1; the suggested preliminary value is λ = (0.5, 0, 0.5, 0, 0).

Monitoring of the MLE of λ. H₀: λ = (0.5, 0, 0.5, 0, 0)

Forward likelihood ratio test for H₀: λ = (0.5, 0, 0.5, 0, 0)

Validation of the transformation In univariate analysis the likelihood ratio test is

T_LR = 2 { L_max(λ̂) − L_max(λ₀) }.

Asymptotically the null distribution of T_LR is chi-squared on one degree of freedom.

Signed square root of T_LR

T_N(λ₀) = sign(λ̂ − λ₀) √T_LR.

This test is asymptotically N(0, 1). Including the sign of the difference between the estimate and the hypothesised value gives an indication of the direction of any departure from that value.
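A minimal sketch of this statistic (Python with NumPy; the function name and argument layout are mine), taking the two maximized loglikelihoods as inputs:

```python
import numpy as np

def signed_sqrt_lr(loglik_hat, loglik_0, lam_hat, lam0):
    """Signed square root of the likelihood-ratio statistic:
    sign(lam_hat - lam0) * sqrt(T_LR), asymptotically N(0, 1).

    loglik_hat : profile loglikelihood at the MLE lam_hat
    loglik_0   : profile loglikelihood at the hypothesised value lam0
    """
    t_lr = 2.0 * (loglik_hat - loglik_0)
    return np.sign(lam_hat - lam0) * np.sqrt(max(t_lr, 0.0))
```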

Multivariate version of the signed sqrt likelihood ratio We test just one component λ_j of λ while all the others are kept at some specified value. We calculate a set of tests by varying each component of λ about λ₀.

Example: mussels data, validation of λ₀ = (0.5, 0, 0.5, 0, 0) The purpose is to validate, in a multivariate way, λ₁ = 0.5 for the first variable. To form the likelihood ratio test we need an estimator found by maximization only over λ₁; the other parameters keep their values in λ₀ (in this example 0, 0.5, 0, 0). λ₁ takes the 5 standard values (−1, −0.5, 0, 0.5, 1).

Example: validation of λ₁ We perform 5 independent forward searches with λ₀ = (−1, 0, 0.5, 0, 0), λ₀ = (−0.5, 0, 0.5, 0, 0), λ₀ = (0, 0, 0.5, 0, 0), λ₀ = (0.5, 0, 0.5, 0, 0) and λ₀ = (1, 0, 0.5, 0, 0). For each search we monitor the signed square root likelihood ratio test.

Version for multivariate data of the signed sqrt LR test λ_j is the parameter under test; λ_S is one of the 5 standard values; λ₀ⱼ is the vector of parameter values in which λ_j takes the standard value λ_S while the other parameters keep their value in λ₀. One plot is produced for each λ_j, j = 1, …, v.
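Constructing the vectors λ₀ⱼ described above is mechanical; a small sketch (Python, function name mine) that builds, for each component j, the 5 null vectors to be used in the 5 forward searches:

```python
def validation_vectors(lam0, standard=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """For each component j of lam0, build the vectors lam_0j in which
    lambda_j is replaced by each of the 5 standard values while the other
    components keep their value in lam0 (one forward search, and one curve
    in the plot, per vector)."""
    out = {}
    for j in range(len(lam0)):
        out[j] = []
        for s in standard:
            v = list(lam0)
            v[j] = s               # only component j is varied
            out[j].append(v)
    return out
```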

Mussels data: validation of λ₀ = (0.5, 0, 0.5, 0, 0)

Forward likelihood ratio test for H₀: λ = (1/3, 1/3, 1/3, 0, 0)

Mussels data: scatterplot matrix of the transformed observations

Monitoring MD before transforming

Monitoring MD after transforming

Minimum Mahalanobis distance before and after transforming The transformation has separated the outliers from the bulk of the data.

Gap before and after transforming

Conclusions This was an example of our approach to finding a multivariate transformation in the presence of potentially influential observations and outliers. Procedure: start the search with untransformed data to suggest a transformation, then repeat the analysis until an acceptable transformation is found. In this example only 3 searches were necessary to find a transformation which is stable throughout the search, any changes occurring only at the end.

Exercises

Exercise 1 The next slide gives two sets of bivariate data. Which of the two has to be transformed to achieve bivariate normality? Consider a forward search in which you monitor the likelihood ratio test for the hypothesis of no transformation. Describe the plot you would expect to get for each of the two sets of data.

Two sets of simulated bivariate data