Regression Models for Linkage: Merlin Regress

Regression Models for Linkage: Merlin Regress Pak Sham, Shaun Purcell, Stacey Cherny, Goncalo Abecasis

Problem with VC linkage analysis: Maximum likelihood variance components (VC) linkage analysis is powerful but prone to give false positives in selected samples or with non-normal traits. Ascertainment correction may allow VC to be applied to selected samples. Conditioning on trait values deals with both selection and non-normality, but is computationally intensive in large pedigrees.

Simple regression-based method (HE-SD): $(X - Y)^2 = 2(1 - r) - 2Q(\hat{\pi} - 0.5) + \varepsilon$, where $(X - Y)^2$ is the squared pair trait difference and $\hat{\pi}$ is the estimated proportion of alleles shared identical by descent. Suitable for selected samples, and robust to non-normality. BUT: less powerful than VC linkage analysis.

Why is HE regression less powerful? Wright (1997), Drigalenko (1998): the phenotypic difference discards sib-pair QTL linkage information; the squared pair trait sum (mean-corrected) provides extra information for linkage. (HE-SS): $(X + Y)^2 = 2(1 + r) + 2Q(\hat{\pi} - 0.5) + \varepsilon$

New dependent variable to increase power: the cross-product (mean-corrected) (HE-CP). But this was found to be less powerful than the original HE when the sib correlation is high.
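The cross-product regression can be related to the two squared regressions through the identity $4XY = (X + Y)^2 - (X - Y)^2$. The following is a sketch, assuming standardized, mean-corrected traits and the HE-SD and HE-SS expectations $2(1 - r) - 2Q(\hat{\pi} - 0.5)$ and $2(1 + r) + 2Q(\hat{\pi} - 0.5)$; the HE-CP form is derived here, not quoted from the slide.

```latex
\begin{align*}
XY &= \tfrac{1}{4}\left[(X+Y)^2 - (X-Y)^2\right] \\
E[XY \mid \hat{\pi}] &= \tfrac{1}{4}\left[2(1+r) + 2Q(\hat{\pi}-0.5) - 2(1-r) + 2Q(\hat{\pi}-0.5)\right] \\
&= r + Q(\hat{\pi} - 0.5)
\end{align*}
```

Under these assumptions the cross-product regression has intercept $r$ and slope $Q$ on $\hat{\pi} - 0.5$.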

Clarify the relative efficiencies of existing HE methods. Demonstrate equivalence between a new HE method and variance components methods. Show application to the selection and analysis of extreme, selected samples.

NCPs for H-E regressions (NCP per sibpair)

Dependent      Variance
$(X - Y)^2$    $8(1 - r)^2$
$(X + Y)^2$    $8(1 + r)^2$
$XY$           $1 + r^2$
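The variances above can be turned into approximate per-pair NCPs. The sketch below assumes a small-effect approximation, NCP = slope² × Var(π̂) / Var(dependent), with slopes −2Q, +2Q and Q for the squared difference, squared sum and cross-product. The function name and the default `var_pi = 1/8` (complete marker information for sib pairs) are illustrative choices, not taken from the slides.

```python
def he_ncp_per_pair(q, r, var_pi=0.125):
    """Approximate NCP per sib pair for three Haseman-Elston regressions.

    q      : QTL variance as a proportion of trait variance (Q)
    r      : sibling trait correlation
    var_pi : variance of the estimated IBD proportion (assumed 1/8 by default)
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # Regression slopes on (pi_hat - 0.5); assumed forms, see lead-in.
    slopes = {"SD": -2.0 * q, "SS": 2.0 * q, "CP": q}
    # Variances of the dependent variables, as tabulated on the slide.
    variances = {
        "SD": 8.0 * (1.0 - r) ** 2,
        "SS": 8.0 * (1.0 + r) ** 2,
        "CP": 1.0 + r ** 2,
    }
    return {name: slopes[name] ** 2 * var_pi / variances[name] for name in slopes}


if __name__ == "__main__":
    for r in (0.05, 0.60):
        print(r, he_ncp_per_pair(q=0.2, r=r))
```

Under this approximation the cross-product regression has the larger NCP at low sib correlation, and the squared-difference regression overtakes it once r exceeds about 0.27.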

Combining into one regression: The new dependent variable is a linear combination of the squared sum and the squared difference, inversely weighted by their variances.
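One way to write such a combination is sketched below, assuming standardized traits, variances proportional to (1 + r)² and (1 − r)², and a minus sign on the difference term because its slope on IBD sharing is negative. The function names are hypothetical and the exact form shown on the slide is not preserved.

```python
def weighted_he_dv(x, y, r):
    """Combine the squared sum and squared difference of a sib pair's
    standardized traits, each divided by a term proportional to its
    variance: (1 + r)^2 for the sum and (1 - r)^2 for the difference."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    squared_sum = (x + y) ** 2
    squared_diff = (x - y) ** 2
    # The difference enters negatively: its regression slope is -2Q,
    # while the slope for the squared sum is +2Q.
    return squared_sum / (1.0 + r) ** 2 - squared_diff / (1.0 - r) ** 2


def expected_weighted_he_dv(r):
    """Expected value of the combined variable ignoring IBD information,
    using E[(X+Y)^2] = 2(1 + r) and E[(X-Y)^2] = 2(1 - r)."""
    return 2.0 / (1.0 + r) - 2.0 / (1.0 - r)


if __name__ == "__main__":
    print(weighted_he_dv(1.5, -0.5, r=0.3), expected_weighted_he_dv(0.3))
```

Subtracting `expected_weighted_he_dv(r)` gives the mean-corrected version of the combined variable.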

Weighted H-E: A function of the square of the QTL variance, marker informativeness (complete information: $\mathrm{Var}(\hat{\pi}) = 1/8$) and the sibling correlation. Equivalent to variance components to second-order approximation (Rijsdijk et al, 2000).
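One way to see the dependence on these three quantities is to add the per-pair NCPs of the squared-difference and squared-sum regressions. This is a sketch under two assumptions: each NCP is approximately slope² × Var(π̂) divided by the variance of the dependent variable ($8(1-r)^2$ and $8(1+r)^2$), and the two regressions contribute independently, which holds for bivariate normal traits with equal variances because $X + Y$ and $X - Y$ are then independent.

```latex
\begin{align*}
\mathrm{NCP}_{W} &\approx \frac{(2Q)^2\,\mathrm{Var}(\hat{\pi})}{8(1-r)^2}
                  + \frac{(2Q)^2\,\mathrm{Var}(\hat{\pi})}{8(1+r)^2} \\
&= \frac{Q^2\,\mathrm{Var}(\hat{\pi})}{2}\cdot\frac{(1+r)^2 + (1-r)^2}{(1-r^2)^2} \\
&= Q^2\,\mathrm{Var}(\hat{\pi})\,\frac{1+r^2}{(1-r^2)^2}
\end{align*}
```

With complete marker information, $\mathrm{Var}(\hat{\pi}) = 1/8$, this gives $Q^2 (1 + r^2) / \left[8 (1 - r^2)^2\right]$ per sib pair.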

Sample selection: A sib pair's squared mean-corrected DV is proportional to its expected NCP. Equivalent to the variance-components-based selection scheme of Purcell et al (2000).
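The selection rule can be sketched as a ranking of pairs by their squared mean-corrected dependent variable. The code assumes standardized traits and the combined variable $(X+Y)^2/(1+r)^2 - (X-Y)^2/(1-r)^2$; the function names and the 5% default are illustrative.

```python
import math


def informativeness(x, y, r):
    """Squared mean-corrected combined variable for one sib pair
    (assumed proportional to the pair's expected NCP)."""
    dv = (x + y) ** 2 / (1.0 + r) ** 2 - (x - y) ** 2 / (1.0 - r) ** 2
    expected = 2.0 / (1.0 + r) - 2.0 / (1.0 - r)
    return (dv - expected) ** 2


def select_most_informative(pairs, r, fraction=0.05):
    """Return the top `fraction` of (x, y) trait pairs, most informative first.
    At least one pair is always returned when the input is non-empty."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not pairs:
        return []
    n_keep = max(1, math.ceil(fraction * len(pairs)))
    ranked = sorted(pairs, key=lambda p: informativeness(p[0], p[1], r), reverse=True)
    return ranked[:n_keep]


if __name__ == "__main__":
    sample = [(0.1, 0.2), (3.0, -3.0), (0.0, 0.0), (2.5, 2.5)]
    print(select_most_informative(sample, r=0.3, fraction=0.5))
```

In this toy sample with a positive sib correlation, the extreme discordant pair ranks first and the extreme concordant pair second.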

Information for linkage [Figure: sibship NCP as a function of Sib 1 trait and Sib 2 trait]

Selected samples: 5% most informative pairs selected [Figure panels: r = 0.05 and r = 0.60]

Extension to General Pedigrees: Multivariate regression model. Weighted least squares estimation. Weight matrix based on IBD information.

Switching Variables: To obtain unbiased estimates in selected samples, the dependent variables are the IBD sharing and the independent variables are the trait values.

Dependent Variables: Estimated IBD sharing of all pairs of relatives.

Independent Variables: Squares and cross-products (mean-corrected) of the trait values (equivalent to the non-redundant squared sums and differences).
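For a single family, the trait-based variables can be sketched as follows. The function name and return layout are hypothetical; traits are standardized with a specified mean and variance, and the dependent vector would hold one estimated IBD proportion per pair in the same pair order.

```python
from itertools import combinations


def pair_variables(traits, mean, variance):
    """Build trait-based regression variables for one family.

    Returns a dict with the list of member-index pairs, the squared sums and
    squared differences for every pair, and the equivalent squares and
    cross-products of the standardized trait values."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    sd = variance ** 0.5
    z = [(t - mean) / sd for t in traits]           # mean-corrected, standardized
    pairs = list(combinations(range(len(z)), 2))     # all pairs of relatives
    return {
        "pairs": pairs,
        "squared_sums": [(z[i] + z[j]) ** 2 for i, j in pairs],
        "squared_diffs": [(z[i] - z[j]) ** 2 for i, j in pairs],
        "squares": [v * v for v in z],
        "cross_products": [z[i] * z[j] for i, j in pairs],
    }


if __name__ == "__main__":
    print(pair_variables([12.0, 8.0, 10.0], mean=10.0, variance=4.0))
```

The two representations carry the same information: for each pair, the squared sum minus the squared difference equals four times the cross-product, and their total equals twice the sum of the two squares.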

Covariance Matrices (Dependent): Obtained from the prior (p) and posterior (q) IBD distributions given marker genotypes. Handles imperfect marker information.

Covariance Matrices (Independent): Obtained from properties of the multivariate normal distribution, under specified mean, variance and correlations.

Estimation: A regression model is fitted family by family. Estimate Q by weighted least squares and obtain its sampling variance for each family. Combine the estimates across families, inversely weighted by their variances, to give an overall estimate and its sampling variance.
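The final combining step is standard inverse-variance weighting and can be sketched as below. The function name is hypothetical, the per-family estimates and variances are assumed to come from the family-level regressions, and the ratio of the squared estimate to its variance is shown as one plausible test statistic, not as the program's documented output.

```python
def combine_family_estimates(estimates, variances):
    """Pool per-family estimates of Q by inverse-variance weighting.

    Returns (pooled estimate, sampling variance of the pooled estimate,
    squared estimate divided by its variance)."""
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need equal-length, non-empty inputs")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * q for w, q in zip(weights, estimates)) / total_weight
    pooled_variance = 1.0 / total_weight
    statistic = pooled ** 2 / pooled_variance
    return pooled, pooled_variance, statistic


if __name__ == "__main__":
    print(combine_family_estimates([0.2, 0.4], [0.01, 0.04]))
```

Families with small sampling variance, that is, informative families, dominate the pooled estimate.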

Implementation: MERLIN-REGRESS. Requires pedigree (.ped), data (.dat) and map (.map) files as input. Key parameters: --mean and --variance, used to standardize the trait; --heritability, used to predict correlations between relatives.
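A small helper for assembling such a command line is sketched below. The long options are those listed above; the short options `-p`, `-d` and `-m` for the input files follow general Merlin usage and are an assumption here, as are the function name and the file names in the example. The helper only builds the argument list and does not run the program.

```python
def build_merlin_regress_command(ped, dat, map_file, mean, variance, heritability):
    """Return an argument list for a merlin-regress run (not executed here).

    -p/-d/-m are assumed short options for the pedigree, data and map files."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must be between 0 and 1")
    return [
        "merlin-regress",
        "-p", ped,
        "-d", dat,
        "-m", map_file,
        "--mean", str(mean),
        "--variance", str(variance),
        "--heritability", str(heritability),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_merlin_regress_command("study.ped", "study.dat", "study.map", 0.0, 1.0, 0.5)
    print(" ".join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to pass to a process launcher without shell quoting.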

Mis-specification of the mean: 500 random sib pairs, 20% QTL [Figure: average chi-square against specified mean; series "Not linked, full"]

MERLIN-REGRESS Features: Identifies informative families (--rankFamilies). Provides a measure of marker information content at each location. Option for analyzing repeated measurements (--testRetest). Customizing models for each trait (-t models.tbl), with TRAIT, MEAN, VARIANCE, HERITABILITY in each row. Convenient options for unselected samples: --randomSample, --useCovariates, --inverseNormal.

The End