Summarizing Variation
Michael C. Neale, PhD
Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University
Overview
- Mean
- Variance
- Covariance
- Not always necessary/desirable
Computing the Mean
Formula: $\mu = \sum_i x_i / N$
- Can compute with
  - Pencil
  - Calculator
  - SAS
  - SPSS
  - Mx
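A minimal sketch of the same computation in Python (the data values are invented for illustration):

```python
# Mean: sum of observations divided by N
x = [3.2, 4.1, 5.0, 4.4, 3.8]   # hypothetical observations
mean_x = sum(x) / len(x)         # mu = sum(x_i) / N
print(mean_x)                    # 4.1
```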
One coin toss: 2 outcomes (Heads, Tails). [Bar chart of outcome vs. probability]
Two coin tosses: 3 outcomes (HH, HT/TH, TT). [Bar chart of outcome vs. probability]
Four coin tosses: 5 outcomes (HHHH, HHHT, HHTT, HTTT, TTTT). [Bar chart of outcome vs. probability]
Ten coin tosses: 11 outcomes (0 to 10 heads). [Bar chart of outcome vs. probability]
Pascal's Triangle
Pascal's friend Chevalier de Mere, 1654; Huygens, 1657; Cardan
Each row of the triangle gives the outcome frequencies; dividing by the row total (1, 2, 4, 8, 16, 32, 64, 128, ...) converts frequency to probability.
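As an illustrative sketch (not from the slides), the coin-toss distributions above can be generated from binomial coefficients; dividing each row of Pascal's triangle by its total (2^n) gives the probabilities:

```python
from math import comb

# Probability of k heads in n fair coin tosses: C(n, k) / 2**n
for n in (1, 2, 4, 10):
    freqs = [comb(n, k) for k in range(n + 1)]   # a row of Pascal's triangle
    probs = [f / 2**n for f in freqs]            # divide by the row total
    print(n, freqs, [round(p, 4) for p in probs])
```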
Fort Knox Toss (Heads - Tails): infinite outcomes; Gauss 1827. [Plot of the limiting normal distribution]
Variance
- Measure of spread
- Easily calculated
- Individual differences
Average Squared Deviation
[Normal distribution figure: deviation $d_i = x_i - \mu$ for observation $x_i$]
Variance $= \sum_i d_i^2 / N$
Measuring Variation: Weighs & Means
- Absolute differences?
- Squared differences?
- Absolute cubed?
- Squared squared?
Measuring Variation: Ways & Means
- Squared differences
- Fisher (1922): the squared-deviation measure has minimum variance under the normal distribution
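A minimal numeric sketch of the variance as the average squared deviation (hypothetical data; note the slides use the N denominator here and N-1 in the later summary formulae):

```python
x = [3.2, 4.1, 5.0, 4.4, 3.8]                   # hypothetical observations
mu = sum(x) / len(x)                             # mean
d = [xi - mu for xi in x]                        # deviations d_i = x_i - mu
var_n = sum(di**2 for di in d) / len(x)          # population form: sum(d_i^2) / N
var_n1 = sum(di**2 for di in d) / (len(x) - 1)   # sample form: sum(d_i^2) / (N - 1)
print(var_n, var_n1)
```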
Covariance
- Measure of association between two variables
- Closely related to variance
- Useful to partition variance
Deviations in Two Dimensions
[Scatterplot with the means $\mu_x$ and $\mu_y$ marked]
[Scatterplot showing the deviations $d_x$ and $d_y$ of a point from $\mu_x$ and $\mu_y$]
Measuring Covariation: Area of a Rectangle
- A square with perimeter 4 (sides 1 x 1)
- Area = 1
Measuring Covariation: Area of a Rectangle
- A skinny rectangle with perimeter 4 (sides 0.25 x 1.75)
- Area = 0.25 * 1.75 = 0.4375
Measuring Covariation: Area of a Rectangle
- Points can contribute negatively
- Area = -0.25 * 1.75 = -0.4375
Measuring Covariation: Covariance Formula
$\sigma_{xy} = \sum_i (x_i - \mu_x)(y_i - \mu_y) / (N - 1)$
Correlation
- Standardized covariance
- Lies between -1 and 1
$r = \sigma_{xy} / \sqrt{\sigma_x^2 \, \sigma_y^2}$
Summary Formulae
$\mu = (\sum_i x_i)/N$
$\sigma_x^2 = \sum_i (x_i - \mu)^2/(N-1)$
$\sigma_{xy} = \sum_i (x_i - \mu_x)(y_i - \mu_y)/(N-1)$
$r = \sigma_{xy} / \sqrt{\sigma_x^2 \, \sigma_y^2}$
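A short numeric sketch of the covariance and correlation formulae (hypothetical paired data; numpy is used only to cross-check the hand computation):

```python
import numpy as np

x = [3.2, 4.1, 5.0, 4.4, 3.8]   # hypothetical variable x
y = [2.0, 2.9, 3.8, 3.1, 2.5]   # hypothetical variable y
n = len(x)
mx, my = sum(x) / n, sum(y) / n

cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)
r = cov_xy / (var_x * var_y) ** 0.5     # r = sigma_xy / sqrt(sigma_x^2 * sigma_y^2)

print(round(r, 4), round(np.corrcoef(x, y)[0, 1], 4))   # the two should agree
```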
Variance-Covariance Matrix (several variables)

    Var(X)    Cov(X,Y)  Cov(X,Z)
    Cov(X,Y)  Var(Y)    Cov(Y,Z)
    Cov(X,Z)  Cov(Y,Z)  Var(Z)
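For several variables, numpy can assemble the same symmetric matrix directly (illustrative data only):

```python
import numpy as np

# Rows = variables X, Y, Z; columns = observations (hypothetical values)
data = np.array([[3.2, 4.1, 5.0, 4.4, 3.8],
                 [2.0, 2.9, 3.8, 3.1, 2.5],
                 [1.1, 0.9, 1.6, 1.3, 1.0]])

S = np.cov(data)   # 3x3: variances on the diagonal, covariances off it
print(S)
```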
Conclusion
- Means and covariances
- Conceptual underpinning
- Easy to compute
- Can use raw data instead
Biometrical Model of a QTL
[Genotypic values on a line: -a, d, +a, measured from the midpoint m]
Biometrical Model for a QTL
Diallelic locus A/a, with p as the frequency of allele a
Classical Twin Studies: Information and Analysis
- Summary statistics: rMZ and rDZ
- Basic model: A, C, E
- rMZ = A + C
- rDZ = 0.5A + C
- var = A + C + E
- Solve the equations
Contributions to Variance: Single Genetic Locus
- Additive QTL variance: $V_A = 2p(1-p)\,[\,a - d(2p-1)\,]^2$
- Dominance QTL variance: $V_D = 4p^2(1-p)^2 d^2$
- Total genetic variance due to the locus: $V_Q = V_A + V_D$
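A small sketch evaluating these locus-level variance components for illustrative parameter values (the choices of p, a, and d are arbitrary):

```python
def qtl_variance(p, a, d):
    """Additive and dominance variance for a diallelic QTL (formulas from the slide)."""
    va = 2 * p * (1 - p) * (a - d * (2 * p - 1)) ** 2   # V_A
    vd = 4 * p**2 * (1 - p)**2 * d**2                   # V_D
    return va, vd, va + vd                              # V_A, V_D, V_Q

print(qtl_variance(p=0.5, a=1.0, d=0.5))   # at p = 0.5 the d term drops out of V_A
```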
Origin of Expectations: Regression Model
- P = aA + cC + eE
- Standardize A, C, E
- $V_P = a^2 + c^2 + e^2$
- Assumes A, C, E are independent
Path Analysis: Elements of a Path Diagram
- Two sorts of variable
  - Observed, in boxes
  - Latent, in circles
- Two sorts of path
  - Causal (regression), one-headed arrow
  - Correlational, two-headed arrow
Rules of Path Analysis
- Trace path chains between variables
- Chains are traced backwards, then forwards, with one change of direction at a double-headed arrow
- The predicted covariance due to a chain is the product of its paths
- The predicted total covariance is the sum of the covariances due to all possible chains (see the numeric sketch after the ACE model slides below)
ACE model MZ twins reared together
ACE model DZ twins reared together
ACE model DZ twins reared apart
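Applying the path-tracing rules to these diagrams gives the expectations already listed (cross-twin covariance a^2 + c^2 for MZ, 0.5a^2 + c^2 for DZ reared together, total variance a^2 + c^2 + e^2). A minimal sketch of the implied twin covariance matrices, with illustrative path coefficients:

```python
import numpy as np

def twin_cov(a, c, e, r_a):
    """2x2 expected covariance matrix for a twin pair; r_a = 1 (MZ) or 0.5 (DZ)."""
    v = a**2 + c**2 + e**2       # phenotypic variance of each twin
    cov = r_a * a**2 + c**2      # cross-twin covariance via the A and C chains
    return np.array([[v, cov], [cov, v]])

a, c, e = 0.7, 0.4, 0.5                  # hypothetical path coefficients
print(twin_cov(a, c, e, r_a=1.0))        # MZ twins reared together
print(twin_cov(a, c, e, r_a=0.5))        # DZ twins reared together
```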
Model Fitting
- Takes care of replicate statistics
- Maximum likelihood estimates
- Confidence intervals on parameters
- Overall fit of model
- Comparison of nested models
Fitting Models to Covariance Matrices
- MZ covariances: 3 statistics (V1, CMZ, V2)
- DZ covariances: 3 statistics (V1, CDZ, V2)
- Parameters: a, c, e
- df = nstat - npar = 6 - 3 = 3
Model Fitting to Covariance Matrices
- Inherently compares fit to the saturated model
- The difference in fit between the ACE model and the AE model gives a likelihood ratio test with df = difference in number of parameters
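A hedged sketch of this kind of fit in Python with scipy; the observed covariance matrices and sample sizes are invented, and the discrepancy used is the standard normal-theory ML fit function (this illustrates the idea, not Mx itself):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

# Hypothetical observed 2x2 twin covariance matrices and sample sizes
S_mz = np.array([[1.00, 0.65], [0.65, 1.00]]);  n_mz = 200
S_dz = np.array([[1.00, 0.40], [0.40, 1.00]]);  n_dz = 200

def expected(a, c, e, r_a):
    v, cov = a*a + c*c + e*e, r_a*a*a + c*c
    return np.array([[v, cov], [cov, v]])

def ml_fit(params, use_c=True):
    a, c, e = params if use_c else (params[0], 0.0, params[1])
    loss = 0.0
    for S, n, r_a in ((S_mz, n_mz, 1.0), (S_dz, n_dz, 0.5)):
        Sigma = expected(a, c, e, r_a)
        # Normal-theory ML discrepancy: ln|Sigma| - ln|S| + tr(S Sigma^-1) - p
        f = (np.log(np.linalg.det(Sigma)) - np.log(np.linalg.det(S))
             + np.trace(S @ np.linalg.inv(Sigma)) - 2)
        loss += (n - 1) * f
    return loss

ace = minimize(ml_fit, x0=[0.6, 0.4, 0.6], args=(True,), method="Nelder-Mead")
ae  = minimize(ml_fit, x0=[0.6, 0.6], args=(False,), method="Nelder-Mead")
lrt = ae.fun - ace.fun                      # likelihood ratio statistic, 1 df
print(ace.x, ae.x, lrt, chi2.sf(lrt, df=1))
```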
Confidence Intervals
- Two basic forms
  - Covariance matrix of parameters
  - Likelihood curve
- Likelihood-based CIs have some nice properties; squaring the CI limits on a gives the CI on a^2
- Meeker & Escobar, 1995; Neale & Miller, Behav Genet, 1997
Multivariate Analysis
- Comorbidity
  - Partition into relevant components
  - Explicit models
  - One disorder, or two, or three
- Longitudinal data analysis
  - Partition into new/old
  - Explicit models
  - Markov
  - Growth curves
Cholesky Decomposition: Not a Model
- Provides a way to model covariance matrices
- Always fits perfectly
- Doesn't predict much else
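A quick numpy sketch of the point that a Cholesky factorization reproduces any positive definite covariance matrix exactly, so it always "fits" (the example matrix is arbitrary):

```python
import numpy as np

S = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])   # arbitrary positive definite covariance matrix

L = np.linalg.cholesky(S)         # lower-triangular factor (the "paths")
print(np.allclose(L @ L.T, S))    # True: L L' reconstructs S perfectly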
Perverse Universe
[Path diagram: A and E influence P, with a path of .7] NOT!
Perverse Universe
[Path diagram: A and E influence X and Y such that r(X,Y) = 0]
A problem for almost any multivariate method
Analysis of Raw Data
- Awesome treatment of missing values
- More flexible modeling
  - Moderator variables
  - Correction for ascertainment
  - Modeling of means
- QTL analysis
Technicolor Likelihood Function (for raw data in Mx)
$\ln L_i = f_i \ln \left[ \sum_{j=1}^{m} w_j \, g(\mathbf{x}_i, \mu_{ij}, \Sigma_{ij}) \right]$
- $\mathbf{x}_i$: vector of observed scores on n subjects
- $\mu_{ij}$: vector of predicted means
- $\Sigma_{ij}$: matrix of predicted covariances
- Means and covariances are functions of the parameters
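A minimal sketch of evaluating a raw-data (FIML-style) log-likelihood in Python, using only the non-missing variables for each case; this illustrates the idea behind the formula, not Mx's own implementation (data and parameter values are invented):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])                    # predicted means (assumed values)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])   # predicted covariances (assumed values)

data = np.array([[ 0.3,  0.1],
                 [-1.2, np.nan],             # second variable missing for this case
                 [ 0.8,  1.1]])

logL = 0.0
for row in data:
    obs = ~np.isnan(row)                     # use whichever variables are observed
    logL += multivariate_normal.logpdf(row[obs], mu[obs], Sigma[np.ix_(obs, obs)])
print(logL)
```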
Pi-hat ($\hat{\pi}$) Linkage Model for Siblings
Each sib pair i has a different COVARIANCE
Mixture Distribution Model
Each sib pair i has a different set of WEIGHTS:
- p(IBD=2) x P(LDL1 & LDL2 | rQ = 1)
- p(IBD=1) x P(LDL1 & LDL2 | rQ = 0.5)
- p(IBD=0) x P(LDL1 & LDL2 | rQ = 0)
Each pair's likelihood is the sum of weight_j x likelihood under model j (rQ = 1, 0.5, 0); the total likelihood is the product of these weighted likelihoods over pairs.
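A hedged sketch of the per-pair mixture likelihood: each sib pair's likelihood is the IBD-probability-weighted sum of bivariate normal likelihoods under QTL correlations of 0, 0.5, and 1. The IBD probabilities, trait values, and variance components below are invented for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

def pair_likelihood(x, ibd_probs, vq=0.4, vr=0.6, r_resid=0.3):
    """Weighted-sum likelihood of one sib pair's trait values x = (x1, x2).

    ibd_probs = (p(IBD=0), p(IBD=1), p(IBD=2)), matched to rQ = 0, 0.5, 1.
    vq = QTL variance, vr = residual variance, r_resid = residual sib correlation.
    """
    lik = 0.0
    for p_ibd, r_q in zip(ibd_probs, (0.0, 0.5, 1.0)):
        cov = r_q * vq + r_resid * vr        # sib covariance under this IBD state
        var = vq + vr
        Sigma = np.array([[var, cov], [cov, var]])
        lik += p_ibd * multivariate_normal.pdf(x, mean=[0, 0], cov=Sigma)
    return lik

# Total log-likelihood = sum over pairs of the log of each pair's weighted likelihood
pairs = [([0.5, 0.8], (0.2, 0.5, 0.3)), ([-1.0, 0.2], (0.3, 0.5, 0.2))]
print(sum(np.log(pair_likelihood(np.array(x), w)) for x, w in pairs))
```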
Conclusion
- Model fitting has a number of advantages
- Raw data can be analysed with greater flexibility
- Not limited to continuous, normally distributed variables
Conclusion II
- Data analysis requires creative application of methods
- Canned analyses are of limited use
- Try to answer the question!