Matrix Algebra and Linear Models. Matrices: An array of elements Vectors Column vector Row vector Square matrix Dimensionality of a matrix: r x c (rows.

Slides:



Advertisements
Similar presentations
SJS SDI_21 Design of Statistical Investigations Stephen Senn 2 Background Stats.
Advertisements

A. The Basic Principle We consider the multivariate extension of multiple linear regression – modeling the relationship between m responses Y 1,…,Y m and.
The Simple Regression Model
Lecture 3: A brief background to multivariate statistics
1 Chapter 4 Experiments with Blocking Factors The Randomized Complete Block Design Nuisance factor: a design factor that probably has an effect.
Chapter 4 Randomized Blocks, Latin Squares, and Related Designs
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Matrix Algebra Matrix algebra is a means of expressing large numbers of calculations made upon ordered sets of numbers. Often referred to as Linear Algebra.
Matrix Algebra Matrix algebra is a means of expressing large numbers of calculations made upon ordered sets of numbers. Often referred to as Linear Algebra.
Chapter 2 Basic Linear Algebra
1 Chapter 3 Multiple Linear Regression Ray-Bing Chen Institute of Statistics National University of Kaohsiung.
Matrices and Systems of Equations
Ch 7.3: Systems of Linear Equations, Linear Independence, Eigenvalues
Review of Matrix Algebra
Chapter 11 Multiple Regression.
Economics 2301 Matrices Lecture 13.
MOHAMMAD IMRAN DEPARTMENT OF APPLIED SCIENCES JAHANGIRABAD EDUCATIONAL GROUP OF INSTITUTES.
Basics of regression analysis
Boot Camp in Linear Algebra Joel Barajas Karla L Caballero University of California Silicon Valley Center October 8th, 2008.
Linear regression models in matrix terms. The regression function in matrix terms.
INDR 262 INTRODUCTION TO OPTIMIZATION METHODS LINEAR ALGEBRA INDR 262 Metin Türkay 1.
Lecture 7: Matrix-Vector Product; Matrix of a Linear Transformation; Matrix-Matrix Product Sections 2.1, 2.2.1,
Stats & Linear Models.
Lecture 10A: Matrix Algebra. Matrices: An array of elements Vectors Column vector Row vector Square matrix Dimensionality of a matrix: r x c (rows x columns)
1 Operations with Matrice 2 Properties of Matrix Operations
Separate multivariate observations
5  Systems of Linear Equations: ✦ An Introduction ✦ Unique Solutions ✦ Underdetermined and Overdetermined Systems  Matrices  Multiplication of Matrices.
Objectives of Multiple Regression
Chapter 10 Review: Matrix Algebra
Compiled By Raj G. Tiwari
Sundermeyer MAR 550 Spring Laboratory in Oceanography: Data and Methods MAR550, Spring 2013 Miles A. Sundermeyer Linear Algebra & Calculus Review.
ECON 1150 Matrix Operations Special Matrices
BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra.
Matrix Algebra. Quick Review Quick Review Solutions.
Chap. 2 Matrices 2.1 Operations with Matrices
Presentation by: H. Sarper
Some matrix stuff.
Digital Image Processing, 3rd ed. © 1992–2008 R. C. Gonzalez & R. E. Woods Gonzalez & Woods Matrices and Vectors Objective.
Chapter 12 Multiple Linear Regression Doing it with more variables! More is better. Chapter 12A.
Statistics and Linear Algebra (the real thing). Vector A vector is a rectangular arrangement of number in several rows and one column. A vector is denoted.
1 C ollege A lgebra Systems and Matrices (Chapter5) 1.
Matrices CHAPTER 8.1 ~ 8.8. Ch _2 Contents  8.1 Matrix Algebra 8.1 Matrix Algebra  8.2 Systems of Linear Algebra Equations 8.2 Systems of Linear.
Multivariate Statistics Matrix Algebra I W. M. van der Veld University of Amsterdam.
BIOL 582 Supplemental Material Matrices, Matrix calculations, GLM using matrix algebra.
MGS3100_04.ppt/Sep 29, 2015/Page 1 Georgia State University - Confidential MGS 3100 Business Analysis Regression Sep 29 and 30, 2015.
Chapter 3 Determinants Linear Algebra. Ch03_2 3.1 Introduction to Determinants Definition The determinant of a 2  2 matrix A is denoted |A| and is given.
Chapter Three TWO-VARIABLEREGRESSION MODEL: THE PROBLEM OF ESTIMATION
1 Overview of Experimental Design. 2 3 Examples of Experimental Designs.
Lecture 1: Basic Statistical Tools. A random variable (RV) = outcome (realization) not a set value, but rather drawn from some probability distribution.
Matrices and Determinants
Linear Algebra Chapter 2 Matrices.
Significance Tests for Regression Analysis. A. Testing the Significance of Regression Models The first important significance test is for the regression.
1 G Lect 4W Multiple regression in matrix terms Exploring Regression Examples G Multiple Regression Week 4 (Wednesday)
Université d’Ottawa / University of Ottawa 2001 Bio 8100s Applied Multivariate Biostatistics L11.1 Lecture 11: Canonical correlation analysis (CANCOR)
MathematicalMarketing Slide 5.1 OLS Chapter 5: Ordinary Least Square Regression We will be discussing  The Linear Regression Model  Estimation of the.
Unsupervised Learning II Feature Extraction
Boot Camp in Linear Algebra TIM 209 Prof. Ram Akella.
1 Objective To provide background material in support of topics in Digital Image Processing that are based on matrices and/or vectors. Review Matrices.
Econometrics III Evgeniya Anatolievna Kolomak, Professor.
Matrices Introduction.
Regression.
Matrices and Vectors Review Objective
Evgeniya Anatolievna Kolomak, Professor
Systems of First Order Linear Equations
OVERVIEW OF LINEAR MODELS
Simple Linear Regression
Objective To provide background material in support of topics in Digital Image Processing that are based on matrices and/or vectors.
OVERVIEW OF LINEAR MODELS
MGS 3100 Business Analysis Regression Feb 18, 2016
Presentation transcript:

Matrix Algebra and Linear Models

Matrices: An array of elements Vectors Column vector Row vector Square matrix Dimensionality of a matrix: r x c (rows x columns) think of Railroad Car (3 x 1) (1 x 4) (3 x 3) (3 x 2) A matrix is defined by a list of its elements. B has ij-th element B ij -- the element in row i and column j Usually written in bold lowercase General Matrices Usually written in bold uppercase a= A b=(20521) C= A D= A

Addition and Subtraction of Matrices If two matrices have the same dimension (same number of rows and columns), then matrix addition and subtraction simply follows by adding (or subtracting) on an element by element basis Matrix addition: (A+B) ij = A ij + B ij Matrix subtraction: (A-B) ij = A ij - B ij Examples: A= µ ∂ andB= µ ∂ C=A+B= µ ∂ andD=A°B= µ 2°2 °11 ∂ - - -

Partitioned Matrices It will often prove useful to divide (or partition) the elements of a matrix into a matrix whose elements are itself matrices. One can partition a matrix in many ways A row vector whose elements are column vectors A column vector whose elements are row vectors C= A = 0 B B B ¢¢¢¢¢¢¢¢¢¢¢¢ C C C C A = µ ab dB ∂ … … … a=(3);b=(12);d= µ 2 1 ∂ ;B= µ ∂

Towards Matrix Multiplication: dot products The dot (or inner) product of two vectors (both of length n) is defined as follows: Example: a. b = 1*4 + 2*5 + 3*7 + 4*9 = 60 a¢b= n X i =1 a i b i. a= C A andb=(4579)

Towards Matrix Multiplication: Systems of Equations Matrix multiplication is defined in such a way so as to compactly represent and solve linear systems of equations. This can be more compactly written in matrix form as y=π+Ø 1 x 1 +¢¢¢Ø n x n … The least-squares solution for the linear model yields the following system of equations for the  i

Matrix Multiplication: Quick overview The order in which matrices are multiplied effects the matrix product, e.g. AB = BA For the product of two matrices to exist, the matrices must conform. For AB, the number of columns of A must equal the number of rows of B. The matrix C = AB has the same number of rows as A and the same number of columns as B. C (rxc) = A (rxk) B (kxc) Inner indices must match columns of A = rows of B Outer indices given dimensions of resulting matrix, with r rows (A) and c columns (B) Elements in the ith row of matrix A Elements in the jth column of B  ij-th element of C is given by C ij = k X l =1 A il B lj

More formally, consider the product L = MN M= 0 B m 1 m 2... m r 1 C C A wherem i =(M i 1 M i 2 ¢¢¢M ic ) … Express the matrix M as a column vector of row vectors N=(n 1 n 2 ¢¢¢n b )wheren j = 0 B N 1 j N 2 j... N cj 1 C C A … Likewise express N as a row vector of column vectors The ij-th element of L is the inner product of M’s row i with N’s column j L= B m 1 ¢n 1 m 1 ¢n 2 ¢¢¢m 1 ¢n b m 2 ¢n 1 m 2 ¢n 2 ¢¢¢m 2 ¢n b m r ¢n 1 m r ¢n 2 ¢¢¢m r ¢n b C C A 01 … … …

Examples AB= µ ab cd ∂µ ef gh ∂ = µ ae+bgaf+bh ce+dgcf+dh ∂ ( ( ( ) )) BA= µ ae+cfeb+df ga+chgd+dh ∂ ( ) Likewise

The Transpose of a Matrix The transpose of a matrix exchanges the rows and columns, A T ij = A ji Useful identities (AB) T = B T A T (ABC) T = C T B T A T Inner product = a T b = a T (1 X n) b (n X 1) Indices match, matrices conform Dimension of resulting product is 1 X 1 (i.e. a scalar) Outer product = ab T = a (n X 1) b T (1 X n) Resulting product is an nxn matrix a= a 1... a n 1 A b= b 1... b n 1 A a 1... a n 1 A (b 1 ¢¢¢b n )=ab T = 0 B a 1 b 1 a 1 b 2 ¢¢¢a 1 b n a 2 b 1 a 2 b 2 ¢¢¢a 2 b n a n b 1 a n b 2 ¢¢¢a n b n 1 C C A … (a 1 ¢¢¢a n ) b 1... b n 1 A =a T b= n X i =1 a i b i … Note that  b T a = (a T b) T = a T b

The Identity Matrix, I The identity matrix serves the role of the number 1 in matrix multiplication : AI =A, IA = A I is a  diagonal matrix, with all diagonal elements being one, all off-diagonal elements zero. I ij = 1 for i = j 0 otherwise

The Inverse Matrix, A -1 ( A= µ ab cd ∂ For For a square matrix A, define the Inverse of A, A -1, as the matrix satisfying A -1 A=AA -1 =I A ° 1 = 1 ad°bc µ d°b °ca ∂ If this quantity (the determinant) is zero, the inverse does not exist. If det(A) is not zero, A -1 exists and A is said to be non-singular. If det(A) = 0, A is singular, and no unique inverse exists (generalized inverses do)

Useful identities (AB) -1 = B -1 A -1 (A T ) -1 = (A -1 ) T Key use of inverses --- solving systems of equations Ax = c Matrix of known constants Vector of known constants Vector of unknowns we wish to solve for A -1 Ax = A -1 c. Note that A -1 Ax = Ix = x Hence, x = A -1 c V  = c --->  = V -1 c

Example: solve the OLS for  in y =  +  1 z 1 +  2 z 2 + e c= æ(y;z 1 ) æ(y;z 2 ) 1 A V= æ 2 (z 1 )æ(z 1 ;z 2 ) æ(z 1 ;z 2 )æ 2 (z 2 ) 1 A  = V -1 c æ(z 1 ;z 2 )=Ω 12 æ(z 1 )æ(z 2 ) It is more compact to use V ° 1 = 1 æ 2 (z 1 )æ 2 (z 2 )(1°Ω 2 12 ) æ 2 (z 2 )°æ(z 1 ;z 2 ) °æ(z 1 ;z 2 )æ 2 (z 1 ) 1 A Ø 1 Ø 2 1 A = 1 æ 2 (z 1 )æ 2 (z 2 )(1°Ω 2 12 ) æ 2 (z 2 )°æ(z 1 ;z 2 ) °æ(z 1 ;z 2 )æ 2 (z 1 ) 1 A æ(y;z 1 ) æ(y;z 2 ) 1 A -

If  12 = 0, these reduce to the two univariate slopes, Ø 1 = æ(y;z 1 ) æ 2 (z 1 ) andØ 2 = æ(y;z 2 ) æ 2 (z 2 )

Quadratic and Bilinear Forms Quadratic product: for A n x n and x n x 1 x T Ax= n X i =1 n X j =1 a ij x i x j Scalar (1 x 1) Bilinear Form (generalization of quadratic product) for A m x n, a n x 1, b m x1 their bilinear form is b T 1 x m A m x n a n x 1 Note that b T A a = a T A T b b T Aa= m X i =1 n X j =1 A ij b i a j

Covariance Matrices for Transformed Variables æ 2 ° c T x ¢ =æ 2 √ n X i =1 c i x i ! =æ n X i =1 c i x i ; n X j =1 c j x j 1 A = n X i =1 n X j =1 æ(c i x i ;c j x j )= n X i =1 n X j =1 c i c j æ(x i ;x j ) =c T Vc ( ) ( What is the variance of the linear combination, c 1 x 1 + c 2 x 2 + … + c n x n ? (note this is a scalar) Likewise, the covariance between two linear combinations can be expressed as a bilinear form, æ(a T x;b T x)=a T Vb

Now suppose we transform one vector of random variables into another vector of random variables Transform x into (i) y k x 1 = A k x n x n x 1 (ii) z m x 1 = B m x n x n x 1 The covariance between the elements of these two transformed vectors of the original is a k x m covariance matrix = AVB T For example, the covariance between y i and y j is given by the ij-th element of AVA T Likewise, the covariance between y i and z j is given by the ij-th element of AVB T

The Multivariate Normal Distribution (MVN) Consider the pdf for n independent normal random variables, the ith of which has mean  i and variance  2 i This can be expressed more compactly in matrix form p(x)= n Y i =1 (2º) ° 1 = 2 æ ° 1 i exp µ ° (x i °π i ) 2 2æ 2 i ∂ =(2º) °n= 2 √ n Y i =1 æ i ! ° 1 exp √ ° n X i =1 (x i °π i ) 2 2æ 2 i !

Define the covariance matrix V for the vector x of the n normal random variable by Fun fact! For a diagonal matrix D, the Det = | D | = product of the diagonal elements Define the mean vector  by Hence in matrix from the MVN pdf becomes Now notice that this holds for any vector  and symmetric (positive-definite) matrix V (pd -> a T Va = Var(a T x) > 0 for all nonzero a) jVj= n Y i =1 æ 2 i V= 0 B æ 2 1 0¢¢¢0 0æ 2 2 ¢¢¢ ¢¢¢¢¢¢æ 2 n 1 C C A … … n - X i =1 (x i °π i ) 2 æ 2 i =(x°π) T V ° 1 (x°π) π= 0 B π 1 π 2... π n 1 C C A p(x)=(2º) °n= 2 jVj ° 1 = 2 exp ∑ ° 1 2 (x°π) T V ° 1 (x°π) ∏

Properties of the MVN - I 1) If x is MVN, any subset of the variables in x is also MVN 2) If x is MVN, any linear combination of the elements of x is also MVN. If x is MVN( ,V) fory=x+a;yisMVN n (π+a;V) fory=a T x= n X k =1 a i x i ;yisN(a T π;a T Va) fory=Ax;yisMVN m ≥ Aπ;A T VA ¥

Properties of the MVN - II 3) Conditional distributions are also MVN. Partition x into two components, x 1 (m dimensional column vector) and x 2 ( n-m dimensional column vector) x 1 | x 2 is MVN with m-dimensional mean vector and m x m covariance matrix x= µ x 1 x 2 ∂ ( ) π= µ π 1 π 2 ∂ andV= V x 1 x 1 V x 1 x 2 V T x 1 x 2 V x 2 x 2 1 A ( ) π x 1 j x 2 =π 1 +V x 1 x 2 V ° 1 x 2 x 2 (x 2 °π 2 ) V x 1 j x 2 =V x 1 x 1 °V x 1 x 2 V ° 1 x 2 x 2 V T x 1 x 2

Properties of the MVN - III 4) If x is MVN, the regression of any subset of X on another subset is linear and homoscedastic Where e is MVN with mean vector 0 and Variance-covariance matrix The regression is linear because it is a linear function of x 2 linear The regression is homoscedastic because the variance is a constant that does not change with the particular value of the random vector x 2 x 1 =π x 1 x 2 +e j =π 1 +V x 1 x 2 V ° 1 x 2 x 2 (x 2 °π 2 )+e - -

Example: Regression of Offspring value on Parental values Assume the vector of offspring value and the values of both its parents is MVN. Then from the correlations among (outbred) relatives, Hence, Where e is normal with mean zero and variance z o z s z d 1 A ªMVN 2 4 π o π s π d 1 A ;æ 2 z 1h 2 =2h 2 =2 h 2 =210 h 2 =201 1 A 3 5 is Let x 1 =(z o );x 2 = µ z s z d ∂ ( ) ( ) V x 1 ; x 1 =æ 2 z ;V x 1 ; x 2 = h 2 æ 2 z 2 (11);V x 2 ; x 2 =æ 2 z æ 2 e =æ 2 z ° h 2 æ 2 z 2 (11)æ ° 2 z h 2 æ 2 z =æ 2 z µ 1° h 4 2 ∂ (( z o =π o + h 2 æ 2 z 2 (11)æ ° 2 z µ ∂µ z s °π s z d °π d ∂ +e =π o + h 2 2 (z s °π s )+ h 2 2 (z d °π d )+e - ( ( =π 1 +V x 1 x 2 V ° 1 x 2 x 2 (x 2 °π 2 )+e - -

Hence, the regression of offspring trait value given the trait values of its parents is z o =  o + h 2 /2(z s -  s ) + h 2 /2(z d -  d ) + e where the residual e is normal with mean zero and Var(e) =  z 2 (1-h 4 /2) Similar logic gives the regression of offspring breeding value on parental breeding value as A o =  o + (A s -  s )/2 + (A d -  d )/2 + e = A s /2 + A d /2 + e where the residual e is normal with mean zero and Var(e) =  A 2 /2

Linear Models

Quick Review of the Major Points The general linear model can be written as y = X  + e Vector of observed dependent values Design matrix --- observations of the variables in the assumed linear model Vector of unknown parents to estimate Vector of residuals. Usually assumed MVN with mean zero If the residuals are homoscedastic and uncorrelated, so that we can write the cov matrix of e as Cov(e) =  2 I then use the OLS estimate, OLS(  (X T X) -1 X T y If the residuals are heteroscedastic and/or dependent, Cov(e) = V and we must use GLS solution for , GLS(  ) = (X T V -1 X) -1 V -1 X T y

y=π+Ø 1 x 1 +Ø 2 x 2 +¢¢¢+Ø n x x +e … Linear Models One tries to explain a dependent variable y as a linear function of a number of independent (or predictor) variables. A multiple regression is a typical linear model, Here e is the residual, or deviation between the true value observed and the value predicted by the linear model. As with a univariate regression (y =a + bx + e), the model parameters are typically chosen by least squares, wherein they are chosen to minimize the sum of squared residuals,  e i 2 The (partial) regression coefficients are interpreted as follows. A unit change in x i while holding all other variables constant results in a change of  i in y

Predictor and Indicator Variables In a regression, the predictor variables are typically continuous, although they need not be. Suppose we measuring the offspring of p sires. One linear model would be y ij =  + s i + e ij Trait value of offspring j from sire i Overall mean. This term in included to give the s i terms a mean value of zero, i.e., they are expressed as deviations from the mean The effect for sire i (the mean of its offspring). Recall that variance in the s i estimates Cov(half sibs) = V A /2 The deviation of the jth offspring from the family mean of sire i. The variance of the e’s estimates the within-family variance. Note that the predictor variables here are the s i, something that we are trying to estimate We can write this in linear model form by using indicator variables, Giving the above model as x ik = Ω 1ifsirek=i 0otherwise

Models consisting entirely of indicator variables are typically called ANOVA, or analysis of variance models Models that contain no indicator variables (other than for the mean), but rather consist of observed value of continuous or discrete values are typically called regression models Both are special cases of the General Linear Model (or GLM) y ijk =  + s i + d ij +  x ijk + e ijk ANOVA model Regression model Example: Nested half sib/full sib design with an age correction  on the trait

Linear Models in Matrix Form Suppose we have 3 variables in a multiple regression, with four (y,x) vectors of observations. In matrix form, y=XØ+e y= 0 y 1 y 2 y 3 y 4 1 C A Ø= 0 B π Ø 1 Ø 2 Ø 3 1 e= 0 e 1 e 2 e 3 e 4 1 C A X= 0 1x 11 x 12 x 13 1x 21 x 22 x 23 1x 31 x 32 x 33 1x 41 x 42 x 43 1 C A Vector of parameters to estimate The design matrix X. Details of both the experimental design and the observed values of the predictor variables all reside solely in X y i =π+Ø 1 x i 1 +Ø 2 x i 2 +Ø 2 x i 2 +e i

The Sire Model in Matrix Form The model here is y ij =  + s i + e ij Consider three sires. Sire 1 has 2 offspring, sire 2 has one and sire 3 has three. The GLM form is y= 0 B B B B B y 11 y 12 y 21 y 31 y 32 y 33 1 C C C C C ;X= 0 B B B B B C C C C C Ø= 0 π s 1 s 2 s 3 1 C A ;ande= 0 B B B B e 11 e 12 e 21 e 31 e 32 e 33 1 C C C C C A

Ordinary Least Squares (OLS) When the covariance structure of the residuals has a certain form, we solve for the vector  using OLS If the residuals are homoscedastic and uncorrelated,  2 (e i ) =  e 2,  (e i,e j ) = 0. Hence, each residual is equally weighted, Sum of squared residuals can be written as Taking (matrix) derivatives shows this is minimized by This is the OLS estimate of the vector  The variance-covariance estimate for the sample estimates is Predicted value of the y’s If residuals follow a MVN distribution, OLS = ML solution - V b =(X T X) ° 1 æ 2 e - n X i =1 b e 2 i = b e T b e=(y°Xb) T (y°Xb) - - b=(X T X) ° 1 X T y -

Example: Regression Through the Origin y i =  x i + e i Here X= 0 B x 1 x 2... x n 1 C C A y= 0 B y 1 y 2... y n 1 C C A Ø=(Ø) X T X= n X i =1 x 2 i X T y= n X i =1 x i y i P b= ≥ X T X ¥ ° 1 X T y= P x i y i x 2 i ( ) - P P æ 2 (b)= 1 n°1 (y i °bx i ) 2 x 2 i - - æ 2 (b)= ≥ X T X ¥ ° 1 æ 2 e = æ 2 e P x 2 i ( ) - X æ 2 e = 1 n1 (y i bx i ) 2 - -

Polynomial Regressions and Interaction Effects GLM can easily handle any function of the observed predictor variables, provided the parameters to estimate are still linear, e.g. Y =  +  1 f(x) +  1 g(x) + … + e Quadratic regression: Interaction terms (e.g. sex x age) are handled similarly With x 1 held constant, a unit change in x 2 changes y by  2 +  3 x 1 Likewise, a unit change in x 1 changes y by  1 +  3 x 2 y i =Æ+Ø 1 x i +Ø 2 x 2 i +e i Ø= Æ Ø 1 Ø 2 1 A X= 0 B 1x 1 x 2 1 1x 2 x x n x 2 n 1 C C A y i =Æ+Ø 1 x i 1 +Ø 2 x i 2 +Ø 3 x i 1 x i 2 +e i Ø= 0 Æ Ø 1 Ø 2 Ø 3 1 C A X= 0 B 1x 11 x 12 x 11 x 12 1x 21 x 22 x 21 x x n 1 x n 2 x n 1 x n 2 1 C C A

Fixed vs. Random Effects In linear models are are trying to accomplish two goals: estimation the values of model parameters and estimate any appropriate variances. For example in the simplest regression model, y =  +  x + e, we estimate the values for  and  and also the variance of e. We, of course, can also estimate the e i = y i - (  +  x i ) Note that  are fixed constants are we trying to estimate (fixed factors or fixed effects), while the e i values are drawn from some probability distribution (typically Normal with mean 0, variance  2 e ). The e i are random effects.

“Mixed” models contain both fixed and random factors This distinction between fixed and random effects is extremely important in terms of how we analyzed a model. If a parameter is a fixed constant we wish to estimate, it is a fixed effect. If a parameter are drawn from some probability distribution and we are trying to make inference on either the distribution and/or specific realizations from this distribution, it is a random effect. We generally speak of estimating fixed factors and predicting random effects.

Example: Sire model y ij =  + s i + e ij Fixed effectRandom effect It depends. If we have (say) 10 sires, if we are ONLY interested in the values of these particular 10 sires and don’t care to make any other inferences about the population from which the sires are drawn, then we can treat them as fixed effects. In the case, the model is fully specified the covariance structure for the residuals. Thus, we need to estimate , s 1 to s 10 and  2 e, and we write the model as y ij =  + s i + e ij,  2 (e ij ) =  2 e. Is the sire effect fixed or random? Conversely, if we are not only interested in these 10 particular sires but also wish to make some inference about the population from which they were drawn (such as the additive variance, since  2 A = 4  2 s, ), then the s i are random effects. In this case we wish to estimate  and the variances  2 s and  2 w. Since 2s i also estimates (or predicts) the breeding value for sire i, we also wish to estimate (predict) these as well. Under a Random-effects interpretation, we write the model as y ij =  + s i + e ij,  2 (e ij ) =  2 e,  2 (s i ) =  2 s

Generalized Least Squares (GLS) Suppose the residuals no longer have the same variance (i.e., display heteroscedasticity). Clearly we do not wish to minimize the unweighted sum of squared residuals, because those residuals with smaller variance should receive more weight. Likewise in the event the residuals are correlated, we also wish to take this into account (i.e., perform a suitable transformation to remove the correlations) before minimizing the sum of squares. Either of the above settings leads to a GLS solution in place of an OLS solution.

In the GLS setting, the covariance matrix for the vector e of residuals is written as R where R ij =  (e i,e j ) The linear model becomes y = X  + e, cov(e) = R The GLS solution for  is The variance-covariance of the estimated model parameters is given by b=X T R ° 1 X ° 1 X T R ° 1 y () - - () V b = ≥ X T R ° 1 X ¥ ° 1 æ 2 e

Example One common setting is where the residuals are uncorrelated but heteroscedastic, with  2 (e i ) =  2 e /w i. For example, sample i is the mean value of n i individuals, with  2 (e i ) =  e 2 /n i. Here w i = n i Consider the model y i =  +  x i R=Diag(w ° 1 1 ;w ° 1 2 ;:::;w ° 1 n ) Ø= µ Æ Ø ∂ X= 1x x n 1 A

Here giving where - R ° 1 =Diag(w 1 ;w 2 ;:::;w n ) X T R ° 1 y=w y w xy w 1 A - - X T R ° 1 X=w 1x w x w x 2 w 1 A - w= n X i =1 w i ;x w = n X i =1 w i x i w ;x 2 w = n X i =1 w i x 2 i w y w = n X i =1 w i y i w ;xy w = n X i =1 w i x i y i w

° This gives the GLS estimators of  and  as Likewise, the resulting variances and covariance for these estimators is a=y w °bx w - b= xy w °x w y w x 2 w x 2 w - - ° æ 2 (b)= æ 2 e w(x 2 w °x 2 w ) æ(a;b)= °æ 2 e x w w(x 2 w °x 2 w ) æ 2 (a)= æ 2 e ¢x 2 w w(x 2 w x 2 w )

Chi-square and F distributions Let U i be a unit normal The sum U U … + U n 2 is a chi-square random variable with k degrees of freedom Under appropriate normality assumptions, the sums of squares that appear in linear models are also chi-square distributed. In particular, The ratio of two chi-squares is an F distribution n X i=1 (x i °x)ª¬ 2 n°1 - ~ - 2

¬ 2 k =k ¬ 2 n =n ªF k;n ~ In particular, an F distribution with k numerator degrees of freedom, and n denominator degrees of freedom is given by Thus, F distributions frequently arrive in tests of linear models, as these usually involve ratios of sums of squares.

Sums of Squares in linear models The total sums of squares SST of a linear model can be written as the sum of the error (or residual) sum of squares and the model (or regression) sum of squares SS T = SS M + SS E X (y i °y) 2 - b X (y i °y i ) 2 - X ( b y i °y) 2 ° -- r 2, the coefficient of determination, is the fraction of variation accounted for by the model r 2 = SS M SS T = 1 - SS E SS T

bb SS E = n X i=1 (y i °y i ) 2 = n X i e 2 i - SS T = n X i=1 (y i °y) 2 = n X i y 2 i °y 2 = n X i y 2 i ° 1 n 2 √ n X i y i ! ( SS T =y T y° 1 n y T Jy=y T µ I° 1 n J ∂ y - - ( ) E =y T µ I°X ≥ X T X ¥ °1 X T ∂ y - - ( ()) ) M = T ° E =y T µ X ≥ X T X ¥ °1 X T ° 1 n J ∂ y -- - ( () ) Sums of Squares are quadratic products We can write this as a quadratic product as Where J is a matrix all of whose elements are 1’s

Hypothesis testing Provided the residual errors in the model are MVN, then for a model with n observations and p estimated parameters, SS E æ 2 e ª¬ 2 n°p ~ - Consider the comparison of a full (p parameters) and reduced (q < p) models µ SS E r ° E f p°q ∂¡µ E f n°p ∂ = µ n°p p°q ∂µ E r E f °1 ∂ ( ) The difference in the error sum of squares for the full and reduced model provided a test for whether the model fit is the same This ratio follows an F p-q,n-p distribution

Does our model account for a significant fraction of the variation? µ n°p p°1 ∂µ SS T E f °1 ∂ = µ n°p p°1 ∂µ r 2 1°r 2 ∂ * ) ( Here the reduced model is just y i = u + e i In this case, the error sum of squares for the reduced model is just the total sum of squares and the F test ratio becomes This ratio follows an F p-1,n-p distribution