Introduction to QTL analysis Peter Visscher University of Edinburgh



Overview
– Principles of QTL mapping
– QTL mapping using sibpairs
– IBD estimation from marker data
– Improving power
  – ML variance components
  – Selective genotyping
  – Large(r) pedigrees

[Fisher, Wright] Quantitative Trait Locus = a segment of DNA that affects a quantitative trait

Mapping QTL
– Determining the position in the genome of a locus causing variation in a trait
– Estimating the effect of the alleles and their mode of action

Why map QTL?
– To provide knowledge towards a fundamental understanding of individual gene actions and interactions
– To enable positional cloning of the gene
– To improve breeding value estimation and selection response through marker assisted selection (plants, animals)
Science; Medicine; Agriculture

Principles of QTL mapping
Co-segregation of QTL alleles and linked marker alleles in pedigrees
[Figure: pair of chromosomes carrying unobserved QTL alleles (Q, q) and observed marker alleles (M, m)]

Linkage = Co-segregation
[Figure: pedigree with marker genotypes drawn from alleles A1 to A4]
Marker allele A1 cosegregates with dominant disease

Recombination
[Figure: parental genotypes at the marker (A1, A2) and the QTL (Q1, Q2); likely gametes are non-recombinants, unlikely gametes are recombinants]
“Linkage analysis = counting recombinants”

Map distance
Map distance between two loci (Morgans) = expected number of crossovers per meiosis
Note: Map distances are additive. Recombination frequencies are not.
1 Morgan = 100 cM; 1 cM ~ 1 Mb

Recombination & map distance Haldane (1919) Map Function
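The map function itself did not survive in this transcript. As a reminder, Haldane's function (which assumes no crossover interference) relates map distance d in Morgans to the recombination fraction r by r = ½(1 − e^(−2d)). A minimal Python sketch follows; the function names are illustrative and not taken from the slides.

```python
import math

def haldane_r(d_morgans):
    """Recombination fraction from map distance (Morgans), assuming no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def haldane_d(r):
    """Inverse: map distance (Morgans) from a recombination fraction r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)

# Map distances add up along a chromosome, recombination fractions do not:
r_10cM = haldane_r(0.10)
r_20cM = haldane_r(0.20)
print(round(r_10cM, 4), round(r_20cM, 4))
```

The recombination fraction for 20 cM comes out smaller than twice that for 10 cM, which is the non-additivity noted on the map distance slide.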

Principles of QTL mapping
Co-segregation of phenotypes and genotypes in pedigrees
– Genetic markers give information on IBD sharing between relatives [genotypes]
– Association between phenotypes and genotypes gives information on QTL location and effect [linkage]
Need informative mapping population

Mapping populations

Informative pig pedigree
[Figure © Roslin Institute: cross (QQ × qq) with descendant genotypes QQ, Qq, qq]

Line cross
– Only two QTL alleles segregating
– QTL effect can be estimated as the mean difference between genotype groups
– Power depends on sample size & effect of QTL
– Ascertain divergent lines
– Resolution of QTL map is low: ~10-40 Mb

[Figure: $\alpha = 0.0001$, power = 90%, F2 population]

Outbred populations: Complications
– Markers not fully informative (segregating in the parental generation)
– QTL not segregating in all families
  – (All F1 segregate in inbred line cross)
– Association between marker and QTL at the family rather than population level
  – (i.e. linkage phase differs between families)
– Additional variance between families due to other loci

Line cross vs. outbred population
                        Cross    Outbred
# QTL alleles           2        ≥ 2
# Generations           3        ≥ 2
Required sample size    100s     1000s
QTL estimation          Mean     Variance

QTL as a random effect
$y_i = \mu + Q_i + A_i + E_i$
$Q_i$ = QTL genotype contribution for chromosome segment
$A_i$ = contribution from rest of genome
$var(y) = \sigma_q^2 + \sigma_a^2 + \sigma_e^2$

Logical extension of linear models used during the course
– This week: partitioning (co)variances into (causal) components
– QTL mapping: partitioning genetic variance into underlying components
  – Linkage analysis: dissecting within-family genetic variation

Genetic covariance between relatives
$cov(y_i, y_j) = \pi_{ij}\sigma_q^2 + a_{ij}\sigma_a^2$
$a_{ij}$ = average proportion of alleles shared in the genome (kinship matrix)
$\pi_{ij}$ = proportion of alleles IBD at QTL (0, ½ or 1)
$E(\pi_{ij}) = a_{ij}$

$\pi_{ij}$ = Pr(2 alleles IBD) + ½ Pr(1 allele IBD) = proportion of alleles IBD in non-inbred pedigree
Estimate $\pi_{ij}$ with genetic markers

Fully informative marker
Determine IBD sharing between sibpairs unambiguously
Example: Dad = 1/2, Mum = 3/4
– Transmitted allele from Dad is either 1 or 2
– Transmitted allele from Mum is either 3 or 4

Sibpairs & fully informative marker
# Alleles IBD    $\pi$    Pr.
0                0        ¼
1                ½        ½
2                1        ¼
$E(\pi) = \sum \pi \Pr(\pi) = ½$
$E(\pi^2) = \sum \pi^2 \Pr(\pi) = 3/8$
$var(\pi) = E(\pi^2) - E(\pi)^2 = 1/8$
$CV = 0.5\sqrt{2} \approx 70\%$
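The moments on this slide can be checked with exact arithmetic. The sketch below assumes only the IBD distribution for full sibs given in the table; variable names are illustrative.

```python
from fractions import Fraction as F
import math

# IBD proportion pi and its probability for full sibs at a fully informative marker
ibd_dist = {F(0): F(1, 4), F(1, 2): F(1, 2), F(1): F(1, 4)}

mean_pi = sum(pi * p for pi, p in ibd_dist.items())
mean_pi_sq = sum(pi**2 * p for pi, p in ibd_dist.items())
var_pi = mean_pi_sq - mean_pi**2
cv_pi = math.sqrt(var_pi) / float(mean_pi)  # coefficient of variation = sd / mean

print(mean_pi, mean_pi_sq, var_pi, round(cv_pi, 3))
```

The coefficient of variation is the standard deviation divided by the mean, which is where the 70% figure comes from.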

Haseman-Elston (1972) “The more alleles pairs of relatives share at a QTL, the greater their phenotypic similarity” or “The more alleles they share IBD, the smaller the difference in their phenotype”

Population sib-pair trait distribution

No linkage

Under linkage

Sib pair (or DZ twins) design to map QTL
– Multiple ‘families’ of two (or more) sibs
– Phenotypes on sibs
– Marker genotypes on sibs (& parents)
– Correlate phenotypes and genotypes of sibs

Data structure is simple
Pair    Phenotypes            Prop. alleles IBD
1       $y_{11}$  $y_{12}$    $\pi_1$
2       $y_{21}$  $y_{22}$    $\pi_2$
...
n       $y_{n1}$  $y_{n2}$    $\pi_n$
$\pi$ = 0, ½ or 1 for fully informative markers

Notation (Y)
$D = (y_1 - y_2)$
$D^2 = (y_1 - y_2)^2$
$S = [(y_1 - \mu) + (y_2 - \mu)]$
$S^2 = [(y_1 - \mu) + (y_2 - \mu)]^2$
$CP = (y_1 - \mu)(y_2 - \mu)$

Proposed analysis…
Data                Method        Reference
$y_1$ & $y_2$       ML ‘LOD’      Parametric linkage analysis
$D^2$               Regression    Haseman & Elston (1972)
$D^2$ & $S^2$       Regression    Drigalenko (1998); Xu et al. (2000); Sham & Purcell (2001); Forrest (2001)
CP                  Regression    Elston et al. (2000)
$y_1$ & $y_2$       ML VC         Goldgar (1990); Schork (1993)
D                   ML            Kruglyak & Lander (1995)
D & S               ML VC         Fulker & Cherny (1996); Wright (1997)

Properties of squared differences
$E[(Y_1 - Y_2)^2] = var(Y_1 - Y_2) + (E(Y_1 - Y_2))^2$
$var(Y_1 - Y_2) = var(Y_1) + var(Y_2) - 2cov(Y_1, Y_2)$
If $E(Y_i) = 0$ and $var(Y_1) = var(Y_2) = var(Y)$, then $E[(Y_1 - Y_2)^2] = 2(1 - r)var(Y)$, where r is the correlation between $Y_1$ and $Y_2$

Haseman-Elston method
Phenotype on relative pair j: $Y_j = (y_{1j} - y_{2j})^2$
$E(Y_j) = E[(Q_{1j} - Q_{2j} + A_{1j} - A_{2j} + e_{1j} - e_{2j})^2]$
$= E[(Q_{1j} - Q_{2j})^2] + \{2(1 - a_{ij})\sigma_a^2 + 2\sigma_e^2\}$
$= 2[\sigma_q^2 - cov(Q_{1j}, Q_{2j})] + \{\sigma_\varepsilon^2\}$
$= (2\sigma_q^2 + \sigma_\varepsilon^2) - 2\pi_{jt}\sigma_q^2$
$\pi_{jt}$ = proportion of alleles IBD at QTL (trait, t) for relative pair j

Conditional expectation
$E(Y_j | \pi_{jt}) = (2\sigma_q^2 + \sigma_\varepsilon^2) - 2\pi_{jt}\sigma_q^2$
– negative slope of Y on $\pi$ if $\sigma_q^2 > 0$
– estimate $\pi_{jt}$ from marker data ($\pi_{jm}$)
– use simple linear regression to detect QTL: $E(Y_j | \pi_{jm}) = \alpha + \beta\pi_{jm}$

A significant negative slope indicates linkage to a QTL
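To make the regression concrete, here is a small simulation sketch in plain Python. It is not the authors' analysis: it assumes a fully informative marker located at the QTL, residuals that are independent between sibs (so the expected intercept is $2\sigma_q^2 + 2\sigma_e^2$), and hypothetical variance values. All function names are illustrative.

```python
import random

def simulate_sib_pairs(n_pairs, var_q, var_e, seed=1):
    """Simulate (squared difference, pi) for sib pairs with a marker at the QTL.

    Assumption: residuals are independent between sibs (no polygenic or
    shared-environment correlation), which keeps the sketch short.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_pairs):
        pi = rng.choice([0.0, 0.5, 0.5, 1.0])  # Pr = 1/4, 1/2, 1/4
        shared = rng.gauss(0.0, var_q ** 0.5)
        # QTL effects built so that var(Q) = var_q and cov(Q1, Q2) = pi * var_q
        q1 = pi ** 0.5 * shared + (1 - pi) ** 0.5 * rng.gauss(0.0, var_q ** 0.5)
        q2 = pi ** 0.5 * shared + (1 - pi) ** 0.5 * rng.gauss(0.0, var_q ** 0.5)
        y1 = q1 + rng.gauss(0.0, var_e ** 0.5)
        y2 = q2 + rng.gauss(0.0, var_e ** 0.5)
        data.append(((y1 - y2) ** 2, pi))
    return data

def ols_slope_intercept(data):
    """Simple linear regression of Y (first element) on pi (second element)."""
    n = len(data)
    mean_y = sum(y for y, _ in data) / n
    mean_x = sum(x for _, x in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in data)
    sxx = sum((x - mean_x) ** 2 for _, x in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

pairs = simulate_sib_pairs(n_pairs=20000, var_q=0.5, var_e=0.5)
beta, alpha = ols_slope_intercept(pairs)
print(f"slope = {beta:.2f} (expected about {-2 * 0.5:.1f}), intercept = {alpha:.2f}")
```

With these settings the fitted slope should sit near $-2\sigma_q^2 = -1$; rerunning with `var_q=0.0` gives a slope near zero, the no-linkage case.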

Single fully informative marker
$\beta = -2(1 - 2r)^2\sigma_q^2$
– the $(1 - 2r)^2\sigma_q^2$ term is analogous to variance explained by a single marker in a backcross/F2 design
$\alpha = 2[1 - 2(1 - r)r]\sigma_q^2 + \sigma_\varepsilon^2$
r = recombination fraction between marker & QTL
Statistical test: $\beta = 0$ versus $\beta < 0$
Disadvantage of method
– not powerful
– confounding between QTL location and effect
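The confounding between location and effect can be seen directly from the expected regression coefficients. The sketch below simply evaluates the two formulas from this slide; the function names and the numerical values are illustrative.

```python
def he_slope(r, var_q):
    """Expected Haseman-Elston slope at a marker with recombination fraction r to the QTL."""
    return -2.0 * (1.0 - 2.0 * r) ** 2 * var_q

def he_intercept(r, var_q, var_eps):
    """Expected intercept: 2[1 - 2(1 - r)r] * var_q + var_eps."""
    return 2.0 * (1.0 - 2.0 * (1.0 - r) * r) * var_q + var_eps

# A larger QTL some distance from the marker and a smaller QTL at the marker
# give the same expected slope, so one marker cannot separate effect from location.
print(he_slope(r=0.1, var_q=1.0), he_slope(r=0.0, var_q=0.64))
```

At r = 0.5 (an unlinked marker) the slope is zero whatever the size of the QTL.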

Interval mapping for sibpair analysis (Fulker & Cardon, 1994)
– Estimate $\pi_{jt}$ from IBD status at flanking markers
– Allows genome screen, separating effect & location
  – regression with largest $R^2$ indicates map position of QTL

Example from Cardon et al. (1994) [Lynch & Walsh, page 520]

Calculating $\pi_{jt} | \pi_{jm}$
For $\pi_{jt}$ midway between two flanking markers:
$\pi_{jt} \approx r^2/c + ½[(1 - 2r)/c]\pi_{jm1} + ½[(1 - 2r)/c]\pi_{jm2}$
$c = 1 - 2r + 2r^2$
r = recombination fraction between markers
$\pi_{jmk}$ = $\pi_{jm}$ at flanking marker k
Assumption: flanking markers are fully informative

Examples
r      c        $\pi_{jt}$
1/5    17/25    (2/34) + (15/34)$\pi_{jm1}$ + (15/34)$\pi_{jm2}$
[if $\pi_{jm1}$ and $\pi_{jm2}$ are 1, $\pi_{jt}$ = 32/34 < 1]
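The midpoint formula is easy to evaluate exactly. The sketch below assumes fully informative flanking markers, as on the slide; the function name is illustrative. The coefficients 2/34 and 15/34 in the example correspond to r = 1/5.

```python
from fractions import Fraction as F

def pi_at_midpoint(r, pi_m1, pi_m2):
    """Expected IBD proportion midway between two fully informative flanking markers.

    r is the recombination fraction between the two markers.
    """
    c = 1 - 2 * r + 2 * r**2
    weight = (1 - 2 * r) / c / 2  # coefficient on each flanking marker
    return r**2 / c + weight * pi_m1 + weight * pi_m2

# r = 1/5 gives c = 17/25, intercept 2/34 and weights 15/34
print(pi_at_midpoint(F(1, 5), 1, 1))  # both markers share 2 alleles IBD
print(pi_at_midpoint(F(1, 5), 0, 0))  # both markers share 0 alleles IBD
```

Even when both flanking markers show complete sharing, the estimate at the midpoint stays below 1 because a double recombinant between the markers cannot be ruled out.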

Exercise
Calculate $\pi_{jt}$ for a location midway between two markers that are 30 cM apart, when the proportions of alleles shared at the flanking markers are 1.0 and 0.5. Use the Haldane mapping function to calculate the recombination rate between the markers.
$\pi_{jm1} = 1$, $\pi_{jm2} = 0.5$

Extensions to Haseman-Elston method
– Interval mapping
– Alternative models
  – QTL with dominance
– Other methods to estimate $\pi_{jt}$
  – Using all markers on a chromosome (Merlin)
  – Monte Carlo sampling methods
  – Using both marker info & phenotypic info
– Add linkage information from:
  – $Z_j = [(y_{1j} - \mu) + (y_{2j} - \mu)]^2$

[Figure: power = 90%, type-I error = $10^{-5}$]

Estimating $\pi$ when marker is not fully informative
Using:
– Mendelian segregation rules
– Marker allele frequencies in the population

IBD can be trivial…
[Figure: pedigree with marker genotypes; IBD = 0]

Two Other Simple Cases…
[Figure: two pedigrees with marker genotypes; IBD = 2]

A little more complicated…
[Figure: pedigree with genotypes 1/2, 2/2, 1/2, 1/2; IBD = 1 (50% chance) or IBD = 2 (50% chance)]

And even more complicated…
[Figure: genotypes 1/1 and 1/1; IBD = ?]

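Bayes Theorem for IBD Probabilities
$P(IBD = i | G) = \dfrac{P(IBD = i)\,P(G | IBD = i)}{\sum_j P(IBD = j)\,P(G | IBD = j)}$
where $P(IBD = i)$ is the prior, the denominator is Prob(data), and $P(IBD = i | G)$ is the posterior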

P(Marker Genotype|IBD State) [Assumes Hardy-Weinberg proportions of genotypes in the population]

Worked Example
[Figure: genotypes 1/1 and 1/1]
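The numbers from the worked example are not preserved in this transcript, so the following is only a sketch of the calculation under stated assumptions: a full-sib pair in which both sibs have genotype 1/1, untyped parents, Hardy-Weinberg proportions, and a hypothetical frequency p for allele 1. Priors are ¼, ½, ¼ for IBD = 0, 1, 2; the function name is illustrative.

```python
from fractions import Fraction as F

def ibd_posterior_both_11(p):
    """Posterior Pr(IBD = 0, 1, 2) for full sibs that are both genotype 1/1.

    p is the population frequency of allele 1 (an assumed input); parents are
    taken to be untyped and genotypes to be in Hardy-Weinberg proportions.
    """
    prior = [F(1, 4), F(1, 2), F(1, 4)]
    # Pr(both sibs 1/1 | IBD state): 4, 3 or 2 independent copies of allele 1
    likelihood = [p**4, p**3, p**2]
    joint = [pr * lik for pr, lik in zip(prior, likelihood)]
    total = sum(joint)  # Prob(data)
    return [j / total for j in joint]

post = ibd_posterior_both_11(F(1, 2))
pi_hat = post[2] + post[1] / 2  # Pr(2 IBD) + 1/2 Pr(1 IBD)
print(post, pi_hat)
```

With p = ½ the posterior probabilities are 1/9, 4/9 and 4/9, so the estimated proportion of alleles IBD rises from the prior ½ to 2/3. The rarer allele 1 is, the more the posterior shifts towards IBD = 2.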

Exercise
[Figure: genotypes 1/2 and 1/2]

Using multiple markers
– Mendelian segregation rules
– Marker allele frequencies in the population
– Linkage between markers
Efficient multi-marker (multi-point) algorithms available (e.g., Merlin, Genehunter)

Software for QTL analysis of sibpairs
– Mx
– Merlin
– Genehunter
– S.A.G.E. ($)
– QTL Express (regression)
– Solar (complex pedigrees)
– Lots of others…

QTL Express: User-friendly web-based software to map QTL in outbred populations
George Seaton, Sara Knott, Chris Haley, Peter Visscher
Roslin Institute; University of Edinburgh

Conclusions (sibpairs)
Power of sib pair design is low
– more relative pairs needed
  more contrasts, e.g. extended pedigrees
selective genotyping
– extreme phenotypes are most informative for linkage
– more powerful analysis methods
  ML variance component analysis

Maximum likelihood for sibpairs (assuming bivariate normality | $\pi$ & fully informative marker)
Full model:
$-2\ln(L) = \sum_\pi n_\pi \ln|V_\pi| + \sum (y - \mu)' V_\pi^{-1} (y - \mu)$
$V_\pi = \begin{pmatrix} f^2 + q^2 + r^2 & f^2 + \pi q^2 \\ f^2 + \pi q^2 & f^2 + q^2 + r^2 \end{pmatrix}$

Maximum likelihood
Reduced model:
$-2\ln(L) = n \ln|V| + \sum (y - \mu)' V^{-1} (y - \mu)$
$V = \begin{pmatrix} f^2 + r^2 & f^2 \\ f^2 & f^2 + r^2 \end{pmatrix}$
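As an illustration of how the full and reduced models differ, the sketch below evaluates −2 ln L (constant term omitted) for sib pairs at given parameter values. It does not maximise anything; in practice the variance components would be estimated by numerical optimisation in a package such as those listed earlier. The function name and the toy data are hypothetical.

```python
import math

def neg2_loglik(pairs, mu, f2, q2, r2):
    """-2 ln L (constant omitted) for sib pairs under bivariate normality.

    pairs: iterable of (y1, y2, pi) with pi the IBD proportion at the QTL.
    Setting q2 = 0 gives the reduced model.
    """
    total = 0.0
    for y1, y2, pi in pairs:
        var = f2 + q2 + r2   # diagonal element of V_pi
        cov = f2 + pi * q2   # off-diagonal element of V_pi
        det = var * var - cov * cov
        d1, d2 = y1 - mu, y2 - mu
        # quadratic form (y - mu)' V^-1 (y - mu) for a 2 x 2 matrix
        quad = (var * (d1 * d1 + d2 * d2) - 2.0 * cov * d1 * d2) / det
        total += math.log(det) + quad
    return total

# Hypothetical data: (y1, y2, pi) for three sib pairs
toy_pairs = [(0.8, 1.1, 1.0), (-0.5, 0.9, 0.0), (0.2, -0.1, 0.5)]
full = neg2_loglik(toy_pairs, mu=0.0, f2=0.2, q2=0.3, r2=0.5)
reduced = neg2_loglik(toy_pairs, mu=0.0, f2=0.2, q2=0.0, r2=0.8)
print(round(full, 3), round(reduced, 3))
```

Under the full model the covariance between sibs depends on $\pi$, so pairs sharing more alleles IBD are expected to be more alike; under the reduced model every pair has the same covariance matrix.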

Test statistic
$LRT = 2\ln(ML_{full}) - 2\ln(ML_{reduced})$
$H_0$ ($q^2 = 0$): $LRT \sim ½\chi^2(1) + ½(0)$
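Because the null value $q^2 = 0$ lies on the boundary of the parameter space, the p-value is half the usual $\chi^2(1)$ tail probability. A small sketch using only the standard library follows; the function names are illustrative, and the LOD conversion (LRT divided by 2 ln 10) is the standard relationship rather than something stated on the slide.

```python
import math

def lrt_p_value(lrt):
    """P-value under the 50:50 mixture of chi-square(1) and a point mass at zero."""
    if lrt <= 0:
        return 1.0
    chi2_1_tail = math.erfc(math.sqrt(lrt / 2.0))  # Pr(chi-square(1) > lrt)
    return 0.5 * chi2_1_tail

def lrt_to_lod(lrt):
    """Convert a likelihood-ratio statistic to a LOD score (log base 10)."""
    return lrt / (2.0 * math.log(10.0))

print(round(lrt_p_value(3.84), 3), round(lrt_to_lod(13.8), 2))
```

An LRT of 3.84, which gives p = 0.05 against a plain $\chi^2(1)$, corresponds to p of about 0.025 under the mixture.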

Example: QTL analysis for dyslexia on chromosome 6p using sib-pairs [Fisher et al. 1999]
– Phenotype: Irregular word test
– 181 sib-pairs
– ~15 Mb

Expectation or distribution approach in analysis?
– Expectation approach: use $\hat{\pi}$, the expected proportion of alleles IBD
– Distribution approach: use IBD probabilities and mixture distribution

Selective genotyping & sibpairs
Concordant pairs
– both sibs in upper or lower tail of the phenotypic distribution
Discordant pairs
– one sib in upper tail, other in lower tail
Powerful design
– requires many (cheap) phenotypes

Anxiety QTLs [Fullerton et al. 2003] Selection from ~30,000 sibpairs

Results [Fullerton et al. 2003] ~5 QTLs detected

Variance component analysis in complex pedigrees
Partition observed variation in quantitative traits into causal components, e.g.,
– Polygenic
– Common environment (‘household’)
– QTL
– Residual, including measurement error
IBD proportions ($\pi$) estimated from multiple markers
“ACEQ” model

[Blackwood et al. 1996] Bipolar pedigree

Blackwood et al. (1996) data

Example: QTL analysis for BMI using a complex pedigree [Deng et al. 2002]