Regression-Based Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.

Slides:



Advertisements
Similar presentations
A. The Basic Principle We consider the multivariate extension of multiple linear regression – modeling the relationship between m responses Y 1,…,Y m and.
Advertisements

Genetic research designs in the real world Vishwajit L Nimgaonkar MD, PhD University of Pittsburgh
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
The General Linear Model. The Simple Linear Model Linear Regression.
. Parametric and Non-Parametric analysis of complex diseases Lecture #6 Based on: Chapter 25 & 26 in Terwilliger and Ott’s Handbook of Human Genetic Linkage.
Linkage Analysis: An Introduction Pak Sham Twin Workshop 2001.
Power in QTL linkage: single and multilocus analysis Shaun Purcell 1,2 & Pak Sham 1 1 SGDP, IoP, London, UK 2 Whitehead Institute, MIT, Cambridge, MA,
Chapter 10 Simple Regression.
Quantitative Genetics
Different chi-squares Ulf H. Olsson Professor of Statistics.
Modern Navigation Thomas Herring
Linkage Analysis in Merlin
Statistical Power Calculations Boulder, 2007 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
Shaun Purcell & Pak Sham Advanced Workshop Boulder, CO, 2003
Copy the folder… Faculty/Sarah/Tues_merlin to the C Drive C:/Tues_merlin.
Introduction to plausible values National Research Coordinators Meeting Madrid, February 2010.
Linkage and LOD score Egmond, 2006 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
Quantitative Trait Loci, QTL An introduction to quantitative genetics and common methods for mapping of loci underlying continuous traits:
Introduction to QTL analysis Peter Visscher University of Edinburgh
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 15 Multiple Regression n Multiple Regression Model n Least Squares Method n Multiple.
Calculation of IBD State Probabilities Gonçalo Abecasis University of Michigan.
Gene Mapping Quantitative Traits using IBD sharing References: Introduction to Quantitative Genetics, by D.S. Falconer and T. F.C. Mackay (1996) Longman.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Linkage in selected samples Manuel Ferreira QIMR Boulder Advanced Course 2005.
Type 1 Error and Power Calculation for Association Analysis Pak Sham & Shaun Purcell Advanced Workshop Boulder, CO, 2005.
Power of linkage analysis Egmond, 2006 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
Tutorial #10 by Ma’ayan Fishelson. Classical Method of Linkage Analysis The classical method was parametric linkage analysis  the Lod-score method. This.
1 B-b B-B B-b b-b Lecture 2 - Segregation Analysis 1/15/04 Biomath 207B / Biostat 237 / HG 207B.
Lecture 24: Quantitative Traits IV Date: 11/14/02  Sources of genetic variation additive dominance epistatic.
A Statistical Method for Adjusting Covariates in Linkage Analysis With Sib Pairs Colin O. Wu, Gang Zheng, JingPing Lin, Eric Leifer and Dean Follmann Office.
Linkage and association Sarah Medland. Genotypic similarity between relatives IBS Alleles shared Identical By State “look the same”, may have the same.
Simulation Study for Longitudinal Data with Nonignorable Missing Data Rong Liu, PhD Candidate Dr. Ramakrishnan, Advisor Department of Biostatistics Virginia.
Errors in Genetic Data Gonçalo Abecasis. Errors in Genetic Data Pedigree Errors Genotyping Errors Phenotyping Errors.
Epistasis / Multi-locus Modelling Shaun Purcell, Pak Sham SGDP, IoP, London, UK.
Practical With Merlin Gonçalo Abecasis. MERLIN Website Reference FAQ Source.
Machine Learning 5. Parametric Methods.
Tutorial I: Missing Value Analysis
Mx modeling of methylation data: twin correlations [means, SD, correlation] ACE / ADE latent factor model regression [sex and age] genetic association.
Lecture 22: Quantitative Traits II
Powerful Regression-based Quantitative Trait Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
David M. Evans Multivariate QTL Linkage Analysis Queensland Institute of Medical Research Brisbane Australia Twin Workshop Boulder 2003.
Using Merlin in Rheumatoid Arthritis Analyses Wei V. Chen 05/05/2004.
Introduction to Genetic Theory
Summary of the Statistics used in Multiple Regression.
Rerandomization to Improve Covariate Balance in Randomized Experiments Kari Lock Harvard Statistics Advisor: Don Rubin 4/28/11.
Parameter Estimation. Statistics Probability specified inferred Steam engine pump “prediction” “estimation”
Efficient calculation of empirical p- values for genome wide linkage through weighted mixtures Sarah E Medland, Eric J Schmitt, Bradley T Webb, Po-Hsiu.
Biometrical Genetics Shaun Purcell Twin Workshop, March 2004.
Øyvind Langsrud New Challenges for Statistical Software - The Use of R in Official Statistics, Bucharest, Romania, 7-8 April 1 A variance estimation R.
Statistics 350 Lecture 2. Today Last Day: Section Today: Section 1.6 Homework #1: Chapter 1 Problems (page 33-38): 2, 5, 6, 7, 22, 26, 33, 34,
QTL Mapping Using Mx Michael C Neale Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University.
Association Mapping in Families Gonçalo Abecasis University of Oxford.
Introduction We consider the data of ~1800 phenotype measurements Each mouse has a given probability distribution of descending from one of 8 possible.
I. Statistical Methods for Genome-Enabled Prediction of Complex Traits OUTLINE THE CHALLENGES OF PREDICTING COMPLEX TRAITS ORDINARY LEAST SQUARES (OLS)
Power in QTL linkage analysis
Regression Models for Linkage: Merlin Regress
Probability Theory and Parameter Estimation I
Genome Wide Association Studies using SNP
QTL Mapping in Humans Lon Cardon, Stacey Cherny Goncalo Abecasis
I Have the Power in QTL linkage: single and multilocus analysis
Regression-based linkage analysis
Linkage in Selected Samples
Statistical Assumptions for SLR
OVERVIEW OF LINEAR MODELS
What are BLUP? and why they are useful?
OVERVIEW OF LINEAR MODELS
Lecture 9: QTL Mapping II: Outbred Populations
Univariate Linkage in Mx
Power Calculation for QTL Association
Presentation transcript:

Regression-Based Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis

This Session Quantitative Trait Linkage Analysis Variance Components Haseman-Elston An improved regression based method General pedigrees Non-normal data Example application PEDSTATS MERLIN-REGRESS

Simple regression-based method squared pair trait difference proportion of alleles shared identical by descent (HE-SD) (X – Y) 2 = 2(1 – r) – 2Q(  – 0.5) +  ^

Haseman-Elston regression IBD (X - Y) 2  = -2Q 210

Sums versus differences Wright (1997), Drigalenko (1998) phenotypic difference discards sib-pair QTL linkage information squared pair trait sum provides extra information for linkage independent of information from HE-SD (HE-SS) (X + Y) 2 = 2(1 + r) + 2Q(  – 0.5) +  ^

New dependent variable to increase power mean corrected cross-product (HE-CP) But this was found to be less powerful than original HE when sib correlation is high

Variance Components Analysis

Likelihood function

Linkage

No Linkage

The Problem Maximum likelihood variance components linkage analysis Powerful (Fulker & Cherny 1996) but Not robust in selected samples or non-normal traits Conditioning on trait values (Sham et al 2000) improves robustness but is computationally challenging Haseman-Elston regression More robust but Less powerful Applicable only to sib pairs

Aim To develop a regression-based method that Has same power as maximum likelihood variance components, for sib pair data Will generalise to general pedigrees

Extension to General Pedigrees Multivariate Regression Model Weighted Least Squares Estimation Weight matrix based on IBD information

Switching Variables To obtain unbiased estimates in selected samples Dependent variables = IBD Independent variables = Trait

Dependent Variables Estimated IBD sharing of all pairs of relatives Example:

Independent Variables Squares and cross-products (equivalent to non-redundant squared sums and differences) Example

Covariance Matrices Dependent Obtained from prior (p) and posterior (q) IBD distribution given marker genotypes

Covariance Matrices Independent Obtained from properties of multivariate normal distribution, under specified mean, variance and correlations Assuming the trait has mean zero and variance one. Calculating this matrix requires the correlation between the different relative pairs to be known.

Estimation For a family, regression model is Estimate Q by weighted least squares, and obtain sampling variance, family by family Combine estimates across families, inversely weighted by their variance, to give overall estimate, and its sampling variance

Average chi-squared statistics: fully informative marker NOT linked to 20% QTL Average chi-square Sibship size N=1000 individuals Heritability=0.5 10,000 simulations

Average chi-squared statistics: fully informative marker linked to 20% QTL Average chi-square Sibship size N=1000 individuals Heritability= simulations

Average chi-squared statistics: poorly informative marker NOT linked to 20% QTL Average chi-square Sibship size N=1000 individuals Heritability=0.5 10,000 simulations

Average chi-squared statistics: poorly informative marker linked to 20% QTL Average chi-square Sibship size N=1000 individuals Heritability= simulations

Average chi-squares: selected sib pairs, NOT linked to 20% QTL Selection scheme Average chi-square 20,000 simulations 10% of 5,000 sib pairs selected

Average chi-squares: selected sib pairs, linkage to 20% QTL Selection scheme Average chi-square 2,000 simulations 10% of 5,000 sib pairs selected

Mis-specification of the mean, 2000 random sib quads, 20% QTL ="Not linked, full"

Mis-specification of the covariance, 2000 random sib quads, 20% QTL ="Not linked, full"

Mis-specification of the variance, 2000 random sib quads, 20% QTL ="Not linked, full"

Cousin pedigree

Average chi-squares for 200 cousin pedigrees, 20% QTL Poor marker informationFull marker information REGVCREGVC Not linked Linked

Conclusion The regression approach can be extended to general pedigrees is slightly more powerful than maximum likelihood variance components in large sibships can handle imperfect IBD information is easily applicable to selected samples provides unbiased estimate of QTL variance provides simple measure of family informativeness is robust to minor deviation from normality But assumes knowledge of mean, variance and covariances of trait distribution in population

Example Application: Angiotensin Converting Enzyme British population Circulating ACE levels Normalized separately for males / females 10 di-allelic polymorphisms 26 kb Common In strong linkage disequilibrium Keavney et al, HMG, 1998

Check The Data The input data is in three files: keavney.dat keavney.ped keavney.map These are text files, so you can peek at their contents, using more or notepad A better way is to used pedstats …

Pedstats Checks contents of pedigree and data files pedstats –d keavney.dat –p keavney.ped Useful options: --pairStatisticsInformation about relative pairs --pdfProduce graphical summary --hardyWeinbergCheck markers for HWE --minGenos 1Focus on genotyped individuals What did you learn about the sample?

Regression Analysis MERLIN-REGRESS Requires pedigree (.ped), data (.dat) and map (.map) file as input Key parameters: --mean, --variance Used to standardize trait --heritability Use to predicted correlation between relatives Heritability for ACE levels is about 0.60

MERLIN-REGRESS Identify informative families --rankFamilies Customizing models for each trait -t models.tbl TRAIT, MEAN, VARIANCE, HERITABILITY in each row Convenient options for unselected samples: --randomSample --useCovariates --inverseNormal

The End