OVERVIEW OF LINEAR MODELS

OVERVIEW OF LINEAR MODELS
Consider the following linear model:
y = X𝜷 + e
y = vector of observations
𝜷 = vector of parameters of the fixed effects
e = vector of random residuals
X = design (incidence) matrix that relates observations to fixed effects

ORDINARY LEAST SQUARES (OLS)
Assumes the residuals are homoscedastic and uncorrelated: Var(e_i) = σ_e² for all i and Cov(e_i, e_j) = 0 for all i ≠ j. The covariance matrix for the vector of residuals is then Var(e) = σ_e² I. The OLS estimate of 𝜷 is the vector b that minimizes the residual sum of squares
RSS(b) = (y − Xb)'(y − Xb),
i.e., the unweighted sum of squared residuals is minimized.

ORDINARY LEAST SQUARES (OLS)
Taking derivatives and setting them to zero yields the normal equations X'Xb = X'y, so the estimates are
b = (X'X)⁻¹X'y.
If the residuals follow a multivariate normal distribution with e ~ MVN(0, σ_e² I), then the OLS estimates are also the maximum likelihood (ML) estimates.
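
A minimal numpy sketch of the closed form above (the data and dimensions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
beta_true = np.array([2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)      # y = Xb + e with iid residuals

# Normal equations: b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # should be close to beta_true
```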

ORDINARY LEAST SQUARES (OLS)
If X'X is singular, those estimates still hold when a generalized inverse (X'X)⁻ is used. However, only certain linear combinations of the fixed effects (the estimable functions) can then be estimated.
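
A small sketch of the singular case, using numpy's pseudoinverse as one choice of generalized inverse (the two-group design is an assumed example):

```python
import numpy as np

# Intercept plus a full set of group dummies makes X'X singular.
X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])
y = np.array([3.0, 3.2, 5.1, 4.9])

b = np.linalg.pinv(X.T @ X) @ X.T @ y
print(X @ b)  # fitted values (estimable) are unique
print(b)      # individual coefficients depend on the g-inverse chosen
```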

GENERALIZED LEAST SQUARES (GLS)
When the residuals are heteroscedastic and/or correlated, OLS estimates of the regression parameters and their standard errors are potentially biased. A more general regression analysis takes the covariance matrix of the vector of residuals to be Var(e) = R ≠ σ_e² I. Lack of independence of the residuals is indicated by the presence of non-zero off-diagonal elements of R, and heteroscedasticity is indicated by differences among the diagonal elements of R.

GENERALIZED LEAST SQUARES (GLS)
Weighted (generalized) least squares takes these complications into account by minimizing (y − Xb)'R⁻¹(y − Xb), giving
b = (X'R⁻¹X)⁻¹X'R⁻¹y.
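
A numpy sketch of the GLS estimate above, with an assumed diagonal (heteroscedastic) R:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([2.0, 0.5])

R = np.diag(rng.uniform(0.1, 2.0, size=n))   # heteroscedastic residual covariance
y = X @ beta_true + rng.multivariate_normal(np.zeros(n), R)

Rinv = np.linalg.inv(R)
b_gls = np.linalg.solve(X.T @ Rinv @ X, X.T @ Rinv @ y)
print(b_gls)
```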

Best Linear Unbiased Predictor (BLUP)
BLP assumes that the fixed effects are known, when in practice they are never known and must be estimated from the data. Examples of fixed effects in plant and animal breeding are nuisance effects associated with blocks, locations, years, treatments, etc. However, some genetic effects, such as selection generation, variety, and seed source, may also be treated as fixed. BLUP simultaneously estimates the fixed effects and predicts the breeding values (random effects).

Best Linear Unbiased Predictor
Henderson (1949) developed the theory of BLUP, by which fixed effects and breeding values can be estimated simultaneously. The properties of BLUP are similar to those of BLP, and BLUP reduces to BLP when no adjustment for environmental factors is needed. The properties of BLUP are incorporated in the name, as follows.

Best Linear Unbiased Predictor
Best – it maximizes the correlation between the true (a) and predicted (â) breeding value.
Linear – the predictor is a linear function of the observations.
Unbiased – E(â) = E(a).
Predictor – it predicts the true breeding value.

THE GENERAL MIXED MODEL
Consider the following linear mixed model:
y = X𝜷 + Zu + e
y = vector of observations
𝜷 = vector of levels of fixed effects
u = vector of levels of random effects
e = vector of random residuals
X = design (incidence) matrix that relates observations to fixed effects
Z = design (incidence) matrix that relates observations to random effects

Best Linear Unbiased Predictor
Expectation of u and e: by definition, E(u) = E(e) = 0, and hence E(y) = X𝜷.
Variance of u and e:
Var(e) = Iσ_e² = R, assumed i.i.d.; the residuals include random environmental and non-additive genetic effects.
Var(u) = Aσ_a² = G, where A is the numerator relationship matrix.
Covariance between u and e: Cov(u, e) = Cov(e, u) = 0.

Best Linear Unbiased Predictor
Expectation of y: E(y) = X𝜷.
Variance of y:
Var(y) = V = Var(Zu + e) = ZVar(u)Z' + Var(e) + Cov(Zu, e) + Cov(e, Zu)
= ZGZ' + R + ZCov(u, e) + Cov(e, u)Z'
Since Cov(e, u) = Cov(u, e) = 0,
V = ZGZ' + R.

Best Linear Unbiased Predictor
Covariance between (y, u) and (y, e):
Cov(y, u) = Cov(Zu + e, u) = Cov(Zu, u) + Cov(e, u) = ZCov(u, u) = ZG
Cov(y, e) = Cov(Zu + e, e) = Cov(Zu, e) + Cov(e, e) = ZCov(u, e) + Cov(e, e) = R

Best Linear Unbiased Predictor
The problem with y = X𝜷 + Zu + e is to predict a linear function of 𝜷 and u. The predictor is selected such that it is unbiased and has minimum prediction error variance (PEV).

Best Linear Unbiased Predictor
This minimization leads to the BLUP of u,
û = GZ'V⁻¹(y − X𝜷̂),
where 𝜷̂ is the BLUE, the generalized least squares (GLS) solution
𝜷̂ = (X'V⁻¹X)⁻¹X'V⁻¹y.
The BLUE is the estimate of the linear functions of the fixed effects that has minimum sampling variance, is unbiased, and is a linear function of the data.
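
A direct numpy sketch of these two closed forms (a bare-bones function, assuming V and X'V⁻¹X are nonsingular):

```python
import numpy as np

def blue_blup_direct(y, X, Z, G, R):
    """BLUE of beta and BLUP of u via
    beta_hat = (X' V^-1 X)^-1 X' V^-1 y  and  u_hat = G Z' V^-1 (y - X beta_hat),
    where V = Z G Z' + R."""
    V = Z @ G @ Z.T + R
    Vinv = np.linalg.inv(V)
    beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    u_hat = G @ Z.T @ Vinv @ (y - X @ beta_hat)
    return beta_hat, u_hat
```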

Best Linear Unbiased Predictor
The BLUP is similar to the conditional expectation of u given y under the assumption of multivariate normality, E(u|y) = GZ'V⁻¹(y − X𝜷). As noted, the practical application of the expressions for the BLUE and the BLUP requires that the variance components be known. Thus, prior to the BLUP analysis the variance components need to be estimated by ML or REML.

Best Linear Unbiased Predictor
Note that the solutions for the BLUE and the BLUP require the inverse of the covariance matrix V. When y has many thousands of observations, as is commonly the case in animal and plant breeding, the computation of V⁻¹ can be very difficult. Henderson offered a solution by proposing a more compact method for jointly obtaining 𝜷̂ and û, the mixed model equations (MME):
[X'R⁻¹X  X'R⁻¹Z; Z'R⁻¹X  Z'R⁻¹Z + G⁻¹] [𝜷̂; û] = [X'R⁻¹y; Z'R⁻¹y]
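
A numpy sketch of solving these equations directly (toy dimensions; R and G are assumed known and diagonal for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 30, 2, 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed-effects design
Z = rng.integers(0, 2, size=(n, q)).astype(float)       # random-effects incidence
R = 0.5 * np.eye(n)                                     # Var(e), assumed known
G = 1.5 * np.eye(q)                                     # Var(u), assumed known
u = rng.multivariate_normal(np.zeros(q), G)
y = X @ np.array([1.0, 0.3]) + Z @ u \
    + rng.multivariate_normal(np.zeros(n), R)

Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
C = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
sol = np.linalg.solve(C, rhs)
beta_hat, u_hat = sol[:p], sol[p:]
# Matches blue_blup_direct(y, X, Z, G, R) without forming the n-by-n matrix V.
```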

MME Advantages
R and G, the matrices to invert, are trivial to invert if they are diagonal, so the submatrices in the MME are easier to compute than V⁻¹.
The coefficient matrix on the left-hand side, needed to get the solutions, is of much smaller dimension than V.

MME
If R⁻¹ is an identity matrix, it can be factorized from both sides of the MME, such that
[X'X  X'Z; Z'X  Z'Z + G⁻¹] [𝜷̂; û] = [X'y; Z'y]
The MME may not be of full rank due to dependencies in the matrix for fixed environmental effects. When there is a dependency, it may be necessary to set some levels of the fixed effects to zero to obtain solutions to the MME.
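
A sketch of handling such a dependency by zeroing one fixed-effect level (the toy design and values are assumptions):

```python
import numpy as np

# Intercept plus both group dummies: X is rank-deficient, so C below is singular.
X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])
Z = np.eye(4)
Ginv = 2.0 * np.eye(4)          # stand-in for G^{-1}
y = np.array([3.0, 3.2, 5.1, 4.9])

C = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, Z.T @ Z + Ginv]])
rhs = np.concatenate([X.T @ y, Z.T @ y])

# Set the second group's fixed effect to zero by deleting its row and column.
keep = np.array([0, 1, 3, 4, 5, 6])
sol = np.zeros(len(rhs))
sol[keep] = np.linalg.solve(C[np.ix_(keep, keep)], rhs[keep])
print(sol)
```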

Assumptions for y = X𝜷 + Zu + e
The distributions of y, u, and e are MVN, implying that the traits are determined by many additive genes of infinitesimal effect at infinitely many unlinked loci (the infinitesimal model).
The variance-covariance matrices R and G for the base population are assumed to be known. In practice they are never known, but under the infinitesimal model they can be estimated by REML.
The MME can account for selection.

Accuracy of evaluation: sampling variance of 𝜷̂ and prediction error of û
The generalized inverse of the coefficient matrix of the MME provides information on:
the sampling variance of 𝜷̂
the prediction error variance of û

Accuracy of evaluation: sampling variance of 𝜷̂ and prediction error of û
Sampling variance of 𝜷̂: Var(𝜷̂ − 𝜷) = C11 σ_e²
Variance of the prediction error (PEV) of û: Var(û − u) = C22 σ_e²
where C11 and C22 are the diagonal blocks of the generalized inverse of the MME coefficient matrix corresponding to the fixed and random effects, respectively.
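
Continuing the MME sketch above (reusing C, p, and G from that block), these blocks can be read off a generalized inverse of the coefficient matrix; the accuracy formula at the end is the common r = sqrt(1 − PEV/σ_a²) definition, an assumption here rather than something stated on the slide:

```python
import numpy as np

Cinv = np.linalg.pinv(C)     # generalized inverse of the MME coefficient matrix
var_beta = Cinv[:p, :p]      # C11 block: sampling (co)variances of beta-hat
pev_u = np.diag(Cinv)[p:]    # C22 diagonal: prediction error variances of u-hat
# Note: R and G above are on the data scale, so no extra sigma_e^2 factor is
# needed; the sigma_e^2 on the slide appears when R = I sigma_e^2 is factored
# out of the MME.

accuracy = np.sqrt(np.maximum(0.0, 1.0 - pev_u / np.diag(G)))
print(accuracy)
```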