Appendix A of: Symmetry and Lattice Conditional Independence in a Multivariate Normal Distribution, by Andersson & Madsen. Presented by Shaun Deaton.




Let a random vector in ℝ⁶ have a multivariate normal distribution N(0, Σ), with zero mean and covariance matrix Σ. Unrestricted, this model has dof = 21, the number of free parameters in a symmetric 6×6 covariance matrix. (A quick check follows below.)
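
As a quick check of the parameter count, a minimal sketch in Python; only I = 6 comes from this slide, the rest is arithmetic:

    # An unrestricted I x I covariance matrix is symmetric, so its free
    # parameters are the entries on and above the diagonal: I*(I+1)/2.
    I = 6
    dof = I * (I + 1) // 2
    print(dof)  # 21, matching the dof quoted on this slide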

Introduce symmetry constraints, i.e., invertible linear transformations mapping the observation space ℝ⁶ into itself. Consider, for example, sample vectors having components paired as shown below. This can be likened to measurements across three controlled populations, or to measurements on three attributes of an individual (many variations can be chosen to suit the situation). In this case: let x₁, x₃, and x₅ all represent the same observable, with each restricted to a distinct object, and let the remaining variables x₂, x₄, and x₆ represent a second observable, the same property possessed by each object. Run the experiment and collect data by measuring the two observable properties of each object, obtaining N ≥ I samples from ℝ⁶ to ensure the MLE exists; then perform the usual multivariate analysis to obtain the empirical covariance. (Diagram: components paired as Object A = (x₁, x₂), Object B = (x₃, x₄), Object C = (x₅, x₆); a data-simulation sketch follows below.)
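
A hypothetical data-collection sketch in Python; the random numbers merely stand in for real measurements of the two observables on each of the three objects:

    import numpy as np

    rng = np.random.default_rng(0)
    N, I = 50, 6                      # need N >= I samples for the MLE to exist
    # Columns: (x1, x2) = Object A, (x3, x4) = Object B, (x5, x6) = Object C,
    # with odd columns one observable and even columns the other.
    X = rng.standard_normal((N, I))   # stand-in for the measured data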

Let G1 = { Id, swap(A,B) }, where Id is the identity transformation and swap(A,B) is a permutation interchanging objects A and B. Introducing the symmetry constraints from G1, a set of symmetries, requires a block form for the covariance matrix: rewriting Σ in block form using 2×2 symmetric positive definite matrices, the diagonal blocks for A and B must be equal, the A-B cross block must equal its own transpose, and the blocks linking A and B with C must coincide. In general, Σ is G1-symmetric precisely when g Σ gᵀ = Σ for every g in G1. This restricted model has dof = 12. (A sketch checking the block invariance numerically follows below.)
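
A minimal numerical check, assuming swap(A,B) exchanges the (x₁, x₂) and (x₃, x₄) coordinate pairs; the block values V, W, U, T are made-up numbers chosen only to satisfy the constraints, not values from the paper:

    import numpy as np

    # Permutation matrix for swap(A, B): exchanges the (x1, x2) block with
    # the (x3, x4) block and fixes (x5, x6).
    P = np.eye(6)[[2, 3, 0, 1, 4, 5]]

    # Covariance with the G1 block pattern (hypothetical values): equal
    # diagonal blocks for A and B, a symmetric A-B cross block, and equal
    # blocks linking A and B with C.
    V = np.array([[2.0, 0.5], [0.5, 2.0]])   # shared diagonal block for A, B
    W = np.array([[0.3, 0.1], [0.1, 0.3]])   # A-B cross block (symmetric)
    U = np.array([[0.2, 0.0], [0.0, 0.2]])   # A-C and B-C cross block
    T = np.array([[1.5, 0.4], [0.4, 1.5]])   # diagonal block for C
    Sigma = np.block([[V, W, U],
                      [W, V, U],
                      [U.T, U.T, T]])

    # G1-symmetry is exactly invariance under conjugation by P.
    assert np.allclose(P @ Sigma @ P.T, Sigma)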

Let G2 = { Id, swap(A,B), swap(A,C), swap(B,C), (A,B)(B,C) = CAB, (A,C)(B,C) = BCA }. Another set of symmetry constraints that could be imposed on the analysis is given by G2: the two symmetries from before, plus four others, which together can be seen as all symmetries of an equilateral triangle with vertices A, B, C. Rewriting the covariance matrix using 2×2 blocks as before, every diagonal block must equal one common block and every off-diagonal block must equal one common symmetric block, since Σ must satisfy g Σ gᵀ = Σ for every g in G2. This model has dof = 6. (A sketch constructing all six symmetries follows below.)
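
A sketch generating all six G2 symmetries as block-permutation matrices and checking a covariance with the constrained pattern; the block values are again made up:

    import numpy as np
    from itertools import permutations

    def block_perm(sigma):
        """6x6 matrix sending block i of (A, B, C) to block sigma[i]."""
        P = np.zeros((6, 6))
        for i, j in enumerate(sigma):
            P[2*i:2*i+2, 2*j:2*j+2] = np.eye(2)
        return P

    G2 = [block_perm(s) for s in permutations(range(3))]
    print(len(G2))  # 6 symmetries, as listed on this slide

    # G2-invariance forces one common diagonal block and one common
    # (symmetric) off-diagonal block: 3 + 3 = 6 free parameters (dof = 6).
    V = np.array([[2.0, 0.5], [0.5, 2.0]])
    W = np.array([[0.3, 0.1], [0.1, 0.3]])
    Sigma = np.block([[V, W, W], [W, V, W], [W, W, V]])
    assert all(np.allclose(P @ Sigma @ P.T, Sigma) for P in G2)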

Note that symmetries can manifest in many different contexts (algebraic symmetry by permuting variables): x₁ + x₂ + x₃ + x₄ + x₅ + x₆ is symmetric under all actions of G2. x₁x₂ + x₃x₄ + x₅x₆ is symmetric under all actions of G1 (swap(A,B) acting as swap(1,3)swap(2,4)), and it has more symmetries than this. Then there is x₁²x₂ + x₃²x₄ + x₅²x₆, which is symmetric under permutations of the pairs but not under swaps within a pair. (A symbolic check follows below.)
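
These invariances can be verified symbolically; a small sketch with sympy, where the helper act is an ad hoc implementation of the pair-permutation action:

    import sympy as sp
    from itertools import permutations

    x = sp.symbols('x1:7')                        # x1, ..., x6

    def act(expr, sigma):
        """Permute the (x1,x2), (x3,x4), (x5,x6) blocks by sigma."""
        sub = {x[2*i]: x[2*j] for i, j in enumerate(sigma)}
        sub.update({x[2*i+1]: x[2*j+1] for i, j in enumerate(sigma)})
        return expr.subs(sub, simultaneous=True)

    p1 = sum(x)                                   # x1 + ... + x6
    p2 = x[0]*x[1] + x[2]*x[3] + x[4]*x[5]        # x1 x2 + x3 x4 + x5 x6
    p3 = x[0]**2*x[1] + x[2]**2*x[3] + x[4]**2*x[5]

    for s in permutations(range(3)):              # all pair permutations
        assert sp.expand(act(p1, s) - p1) == 0
        assert sp.expand(act(p2, s) - p2) == 0
        assert sp.expand(act(p3, s) - p3) == 0
    # p3 is not invariant under a within-pair swap x1 <-> x2:
    q = p3.subs({x[0]: x[1], x[1]: x[0]}, simultaneous=True)
    assert sp.expand(q - p3) != 0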

The measured sample vectors can be arranged into an N × 6 matrix in this example; generally, an N × I matrix for I variables per sample. Let this data matrix be denoted by X (written on the slide with a double arrow, for being a vector of vectors); this notation is convenient when one wants to refer to all of the vector samples at once, or to just one of the I × 1 sample vectors. Under the multivariate normal model with zero mean, the empirical covariance matrix is calculated as Σ̂ = (1/N) XᵀX, which is an (I × N)(N × I) = I × I matrix. (A minimal computation follows below.)
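
A minimal computation of the empirical covariance, assuming only the zero-mean model from this slide; random numbers stand in for data:

    import numpy as np

    rng = np.random.default_rng(0)
    N, I = 100, 6
    X = rng.standard_normal((N, I))    # N x I data matrix

    # Zero-mean MLE: (1/N) * X^T X, an (I x N)(N x I) = I x I matrix.
    Sigma_hat = X.T @ X / N
    print(Sigma_hat.shape)             # (6, 6)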

Hypothesis testing works in the same fashion, by looking at the ratio of maximized likelihood functions under each hypothesis, then inferring rejection via a chi-square (or more general) distribution. The factor of I appearing in the exponential of the maximized likelihood comes from the trace tr(Σ̂⁻¹Σ̂) = tr(Id) = I, where Σ̂ = (1/N)XᵀX is the MLE for the larger-dof (degree-of-freedom), 'big' hypothesis. For the smaller-dof ('null') hypothesis, the maximization gives a restricted MLE Σ̂₀; a transformed N × 6 data matrix would have to be such that its empirical covariance equals Σ̂₀. At least, this is what is being tested: whether the covariance of the data has this symmetry or not. (A minimal worked test follows below.)
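
A minimal worked likelihood-ratio test, a sketch under these definitions; the G1 symmetrization anticipates the group-average formula given a few slides below, and the chi-square degrees of freedom use the dof values quoted on these slides:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    N = 100
    X = rng.standard_normal((N, 6))
    S = X.T @ X / N                          # unrestricted MLE (dof = 21)

    # Restricted MLE under H0 (G1-symmetry): the group average of S.
    P = np.eye(6)[[2, 3, 0, 1, 4, 5]]        # swap(A, B)
    S0 = (S + P @ S @ P.T) / 2               # dof = 12 per the earlier slide

    # With zero mean, tr(MLE^{-1} MLE) = I in both maximized likelihoods,
    # so only the log-determinants survive in -2 log(likelihood ratio).
    stat = N * (np.linalg.slogdet(S0)[1] - np.linalg.slogdet(S)[1])
    p_value = stats.chi2.sf(stat, df=21 - 12)
    print(stat, p_value)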

Testing the data for this covariance symmetry is achieved by operating on the sample covariance matrix, transforming it into a new matrix to use in the likelihood ratio test. The N × I data matrix itself is not transformed into a new 'symmetrized' data set; although such a relationship exists, it is more complicated and involves extra calculation, rather than just 'symmetrizing' the sample covariance. One can now view the null hypothesis, that the covariance matrix has some group symmetry, at the level of the sample vectors: it comes down to saying that any and all samples may be transformed into new samples giving the same statistical information as far as the model is concerned. This idea is illustrated on the next slide, where a sample vector is transformed by some invertible linear transformation; the possibilities for this transformation are generated from those belonging to the group G.

The covariance matrix Σ must remain unchanged by all symmetries under consideration, and all symmetries must form a group. Now consider the variable used by the exp() function in N(0, Σ): if ρ is an invertible linear transformation representing one of the symmetry operations, e.g. the swap(A,B) symmetry applied to a general matrix of 2×2 block matrices, then letting y = ρx gives yᵀΣ⁻¹y = xᵀρᵀΣ⁻¹ρx = xᵀΣ⁻¹x, since ρΣρᵀ = Σ and ρ is orthogonal. This shows that the transformation of a sample vector by a symmetry does not change the shape of the underlying distribution. This is another way to view the situation: the probability distribution over the sample space has the symmetry, like the geometric shapes for the distribution's graphical form and like the polynomials above for the functional form of the distribution, if there is one. (A numerical check follows below.)
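
A numerical check that a G-invariant Σ leaves the Gaussian exponent unchanged under a symmetry; the matrix is the same hypothetical G1-symmetric Σ from the earlier sketch:

    import numpy as np

    P = np.eye(6)[[2, 3, 0, 1, 4, 5]]        # swap(A, B), orthogonal
    V = np.array([[2.0, 0.5], [0.5, 2.0]])
    W = np.array([[0.3, 0.1], [0.1, 0.3]])
    U = np.array([[0.2, 0.0], [0.0, 0.2]])
    T = np.array([[1.5, 0.4], [0.4, 1.5]])
    Sigma = np.block([[V, W, U], [W, V, U], [U.T, U.T, T]])

    # Since P Sigma P^T = Sigma and P^T = P^{-1}, the exponent satisfies
    # (P x)^T Sigma^{-1} (P x) = x^T Sigma^{-1} x for every sample x.
    rng = np.random.default_rng(1)
    x = rng.standard_normal(6)
    Si = np.linalg.inv(Sigma)
    assert np.isclose((P @ x) @ Si @ (P @ x), x @ Si @ x)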

There is a valid invertible transformation for every symmetry restriction, giving |G| transformed sample data sets in total, including the original data, since the identity transformation is always present (essentially everything has the identity symmetry: just do nothing). We would still like the data matrix X to be invariant (i.e. unchanged) under all symmetry actions. But it is not; instead, a group action (used here interchangeably with symmetry action) sends X to another element of the same form, so the effect of applying all group actions is to permute the set of all transformed versions of the measured data. That is, the |G| total N × 6 data matrices are swapped around in their entirety. The only group action that fixes X is the identity action, so X is asymmetric, and its symmetry must be increased to the whole group. (A small sketch of this permutation action follows below.)
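
A small sketch of the permutation action on the set of transformed data matrices, using G1 for brevity:

    import numpy as np

    # G1 = {Id, swap(A,B)} acting on the rows of the N x 6 data matrix.
    P = np.eye(6)[[2, 3, 0, 1, 4, 5]]
    G1 = [np.eye(6), P]

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 6))
    versions = [X @ g.T for g in G1]     # the |G| transformed data sets

    # Applying a further symmetry h maps X g^T to X (h g)^T, and h g is
    # again in G1 (closure), so the set of versions is merely permuted.
    after = [Y @ P.T for Y in versions]
    assert all(np.allclose(a, b) for a, b in zip(after, versions[::-1]))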

Creating symmetric objects: given any non-symmetric shape, i.e. one invariant only under the identity action, turn it into a shape that has a non-trivial symmetry. Here, overlapping six copies of the shape, each rotated by a multiple of 60°, generates a new shape having the rotational symmetries of a regular hexagon, denoted by C₆. From this shape another can be generated by copying and reflecting about the vertical line shown. Now all symmetries of a hexagon are possessed by this final shape, denoted by D₆; i.e. all the ways it can be flipped and rotated while remaining unchanged afterwards, of which there are 12. The C₆ and D₆ symbols are just standard group names specifying a collection of symmetries.

Likewise, a similar idea is applied to the empirical covariance matrix, increasing the number of matrices to a total of |G| by applying all symmetry operations. The 'overlapping' in this case is replaced by matrix addition, which is then averaged by dividing by |G|. The formula for the transformation is Σ̂_G = (1/|G|) Σ_{g∈G} ρ(g) Σ̂ ρ(g)ᵀ, and the resulting matrix has the G-invariance property ρ(h) Σ̂_G ρ(h)ᵀ = Σ̂_G for every h in G. This MLE is then used in a likelihood ratio test to determine whether the assumed symmetry structure is present in the objects under experiment. (A sketch follows below.)
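
A sketch of the group-average symmetrization and its G-invariance, reusing the hypothetical block-permutation representation of G2 from the earlier sketch:

    import numpy as np
    from itertools import permutations

    def block_perm(sigma):
        """6x6 matrix sending block i of (A, B, C) to block sigma[i]."""
        P = np.zeros((6, 6))
        for i, j in enumerate(sigma):
            P[2*i:2*i+2, 2*j:2*j+2] = np.eye(2)
        return P

    G2 = [block_perm(s) for s in permutations(range(3))]

    def symmetrize(S, group):
        """Group average: (1/|G|) * sum over g of g S g^T."""
        return sum(P @ S @ P.T for P in group) / len(group)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 6))
    S = X.T @ X / 100                     # plain empirical covariance
    S_G = symmetrize(S, G2)

    # G-invariance: every group element fixes the symmetrized matrix.
    assert all(np.allclose(P @ S_G @ P.T, S_G) for P in G2)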

Let H be a subgroup of G. Then the covariance matrix can be symmetrized under each group structure and tested. Since a larger group means more symmetry restriction, there will be a decrease in the dof of the model, giving an inverse relationship between subgroups of groups and sub-hypotheses of hypotheses. H = H_Id: here the identity just gives the usual empirical covariance. H₀ = H_H: matrix symmetry under H. H₀₀ = H_G: matrix symmetry under G. More to come…?

Definition: A group is a set with a closed binary operation defined on it, denoted by '*', such that the following hold (where closed means the operation does not give elements outside of the set):
1) Associative Law: (a*b)*c = a*(b*c).
2) Identity Law: there is a unique element Id, where Id*g = g*Id = g, for all elements g in the group.
3) Inverse Law: every group element has a unique inverse; i.e. given any element g, one can find an element h where g*h = h*g = Id.
Additionally, a group representation is a mapping of the group G into the group of invertible linear transformations GL(V) on some n-dimensional vector space V. In our case V is the I-dimensional sample space ℝᴵ, and G is mapped to a special subgroup of GL(V) composed of all orthogonal transformations, O(V); i.e., from the matrix view, those where the transpose of a matrix is its inverse. This is the reason for the transpose operator in some of the group operations: it is the inverse under the orthogonal representation. (A small check follows below.)
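
A quick check that permutation matrices form an orthogonal representation (transpose = inverse, products stay in the set); the 3×3 case stands in for the block matrices used above:

    import numpy as np
    from itertools import permutations

    reps = [np.eye(3)[list(s)] for s in permutations(range(3))]

    for P in reps:
        # Orthogonality: the transpose is the inverse, so P^T represents g^{-1}.
        assert np.allclose(P.T @ P, np.eye(3))
    # Closure: the product of any two permutation matrices is again one.
    for P in reps:
        for Q in reps:
            assert any(np.allclose(P @ Q, R) for R in reps)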

Note: this slide is an aside from the previous material; it looks more into applying the symmetrization to the underlying data, not the covariance, i.e. calculating the symmetrized covariance matrix straight from the data (which is not the way to go, especially for efficiency). The action on the sample vectors turns into a similar action on the N × I data matrix. Expanding the covariance of the summed, transformed data matrices gives |G|² terms: the |G| diagonal terms give the correct symmetrization, needing only a change of variables to put the transposition in the right place when transforming the initial sample vectors, which yields the usual conjugation form ρ(g) Σ̂ ρ(g)ᵀ. However, the |G|² − |G| extra off-diagonal terms do not cancel: summed pairwise with their transpose-related terms, they form |G|(|G| − 1)/2 symmetric matrices. These "cross-conjugation" terms are interesting; I am still going to look into the problem from this perspective, though I will not spend too much time on it.