Linkage Analysis: An Introduction Pak Sham Twin Workshop 2001
Linkage Mapping: Compares the inheritance pattern of a trait with the inheritance pattern of chromosomal regions. First gene mapping in 1913 (Sturtevant). Uses naturally occurring DNA variation (polymorphisms) as genetic markers. More than 400 Mendelian (single-gene) disorders mapped. Current challenge is to map QTLs.
Linkage = Co-segregation. [Figure: pedigree with marker genotypes A2A4, A3A4, A1A3, A1A2, A2A3, A1A2, A1A4, A3A4, A3A2.] Marker allele A1 cosegregates with the dominant disease.
Recombination. [Figure: parental genotypes with alleles A1, A2 and Q1, Q2 at two linked loci, the likely gametes (non-recombinants) and the unlikely gametes (recombinants).]
Recombination of three linked loci: with recombination fractions $\theta_1$ and $\theta_2$ in the two adjacent intervals, the four gamete classes have probabilities $(1-\theta_1)(1-\theta_2)$, $\theta_1\theta_2$, $(1-\theta_1)\theta_2$ and $\theta_1(1-\theta_2)$.
Map distance: The map distance between two loci (in Morgans) = expected number of crossovers per meiosis. Note: map distances are additive.
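Recombination & map distance. Haldane map function: $\theta = \tfrac{1}{2}\left(1 - e^{-2m}\right)$, where $\theta$ is the recombination fraction and $m$ is the map distance in Morgans.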
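The relationship between recombination fractions and additive map distances can be checked numerically. The sketch below assumes Haldane's map function, theta = (1 - exp(-2m))/2 with m in Morgans, and no crossover interference; the function names and the example distances are illustrative, not taken from the slides.

```python
import math

def haldane_theta(m):
    """Recombination fraction for map distance m (Morgans), Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_distance(theta):
    """Inverse map function: map distance (Morgans) for a recombination fraction < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

def theta_outer(theta1, theta2):
    """Recombination fraction between the two outer loci of three linked loci.

    Assumes no interference: the outer loci recombine when exactly one
    of the two intervals is recombinant.
    """
    return theta1 * (1.0 - theta2) + (1.0 - theta1) * theta2

# Hypothetical example: two adjacent intervals of 0.10 and 0.20 Morgans.
m1, m2 = 0.10, 0.20
t1, t2 = haldane_theta(m1), haldane_theta(m2)
t13 = theta_outer(t1, t2)
d13 = haldane_distance(t13)
print(f"theta1={t1:.4f} theta2={t2:.4f} theta13={t13:.4f} distance13={d13:.4f}")
```

Recombination fractions are not additive (theta13 is smaller than theta1 + theta2), but converting theta13 back to a map distance recovers m1 + m2.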
Methods of Linkage Analysis. Model-based: lod scores (assumes an explicit trait model). Model-free: allele-sharing methods (affected sib pairs, affected pedigree members). Quantitative trait loci: variance-components models.
Double Backcross: Fully Informative Gametes. [Figure: AABB × aabb gives AaBb; the backcross AaBb × aabb gives offspring AaBb and aabb (non-recombinant) and Aabb and aaBb (recombinant).]
Linkage Analysis: Fully Informative Gametes. Count data: recombinant gametes $R$, non-recombinant gametes $N$. Parameter: recombination fraction $\theta$. Likelihood: $L(\theta) = \theta^{R}(1-\theta)^{N}$. Parameter estimate: $\hat{\theta} = R/(N+R)$. Test: chi-square.
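As a worked example of the fully informative case, the sketch below evaluates the binomial likelihood L(theta) = theta^R (1 - theta)^N, its maximum-likelihood estimate R/(R+N), and the lod score against theta = 1/2. The counts are made up for illustration, and the likelihood-ratio chi-square (2 ln 10 times the lod) is an assumption about which test is intended; the slide's own statistic was not recoverable.

```python
import math

def log10_likelihood(theta, R, N):
    """log10 of L(theta) = theta^R * (1 - theta)^N, treating 0 * log(0) as 0."""
    total = 0.0
    for count, prob in ((R, theta), (N, 1.0 - theta)):
        if count > 0:
            if prob <= 0.0:
                return float("-inf")
            total += count * math.log10(prob)
    return total

def lod(theta, R, N):
    """Lod score: log10 likelihood ratio of theta against free recombination (1/2)."""
    return log10_likelihood(theta, R, N) - log10_likelihood(0.5, R, N)

# Hypothetical counts: 2 recombinant and 8 non-recombinant gametes.
R, N = 2, 8
theta_hat = min(R / (R + N), 0.5)   # estimate restricted to theta <= 1/2 (assumption)
max_lod = lod(theta_hat, R, N)
lr_chi_square = 2.0 * math.log(10.0) * max_lod  # likelihood-ratio statistic (assumption)
print(f"theta_hat={theta_hat:.3f} lod={max_lod:.3f} chi-square={lr_chi_square:.3f}")
```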
Phase Unknown Meioses. [Figure: cross AaBb × aabb with offspring AaBb, aabb, Aabb and aaBb. Either AaBb and aabb are non-recombinant and Aabb and aaBb recombinant, or the reverse.]
Linkage Analysis: Phase-unknown Meioses. Count data: either recombinant gametes $X$ and non-recombinant gametes $Y$, or recombinant gametes $Y$ and non-recombinant gametes $X$. Likelihood: $L(\theta) = \theta^{X}(1-\theta)^{Y} + \theta^{Y}(1-\theta)^{X}$. An example of incomplete data: mixture distribution likelihood function.
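The cost of not knowing phase can be illustrated by maximising the two-term mixture likelihood theta^X (1 - theta)^Y + theta^Y (1 - theta)^X on a grid and comparing the resulting lod with the phase-known lod for the same counts. This is a minimal sketch: the counts, the grid resolution and the function names are assumptions made for the example.

```python
import math

def phase_unknown_likelihood(theta, X, Y):
    """Sum over the two possible phases; the constant factor 1/2 per phase is dropped."""
    return theta**X * (1 - theta)**Y + theta**Y * (1 - theta)**X

def phase_unknown_lod(theta, X, Y):
    """log10 likelihood ratio against theta = 1/2 for phase-unknown data."""
    return math.log10(phase_unknown_likelihood(theta, X, Y)
                      / phase_unknown_likelihood(0.5, X, Y))

# Hypothetical counts: the offspring split 1 versus 7 between the two classes.
X, Y = 1, 7
grid = [i / 1000 for i in range(0, 501)]          # theta in [0, 0.5]
theta_hat = max(grid, key=lambda t: phase_unknown_likelihood(t, X, Y))
lod_unknown = phase_unknown_lod(theta_hat, X, Y)

# Lod at the same theta if phase were known (X recombinants, Y non-recombinants).
lod_known = math.log10(theta_hat**X * (1 - theta_hat)**Y / 0.5**(X + Y))
print(f"theta_hat={theta_hat:.3f} lod_unknown={lod_unknown:.3f} lod_known={lod_known:.3f}")
```

With strongly unbalanced counts the second mixture term is negligible, so the phase-unknown lod falls short of the phase-known lod by roughly log10(2).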
Parental genotypes unknown: Likelihood will be a function of allele frequencies (population parameters) and $\theta$ (transmission parameter). [Figure: offspring AaBb, aabb, Aabb, aaBb with parental genotypes unknown.]
Trait phenotypes: penetrance parameters. P(Disease | genotype): AA: $f_2$, Aa: $f_1$, aa: $f_0$. P(Normal | genotype): AA: $1-f_2$, Aa: $1-f_1$, aa: $1-f_0$. Each phenotype is compatible with multiple genotypes.
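To see why each phenotype is compatible with multiple genotypes, the sketch below applies Bayes' rule to obtain genotype probabilities given affection status. It assumes Hardy-Weinberg genotype frequencies with disease allele frequency p, which the slide does not state, and the penetrance values are invented for illustration.

```python
def genotype_frequencies(p):
    """Hardy-Weinberg frequencies for AA, Aa, aa (assumption), A having frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}

def genotype_posterior(p, penetrance, affected=True):
    """P(genotype | phenotype) and P(phenotype) from penetrances f2, f1, f0."""
    prior = genotype_frequencies(p)
    joint = {}
    for g, freq in prior.items():
        f = penetrance[g]
        joint[g] = freq * (f if affected else 1.0 - f)
    total = sum(joint.values())            # P(phenotype)
    return {g: v / total for g, v in joint.items()}, total

# Hypothetical dominant model with phenocopies: f2 = f1 = 0.9, f0 = 0.01.
penetrance = {"AA": 0.9, "Aa": 0.9, "aa": 0.01}
posterior_affected, prevalence = genotype_posterior(0.01, penetrance, affected=True)
print(f"prevalence={prevalence:.6f}")
for g, prob in posterior_affected.items():
    print(f"P({g} | affected) = {prob:.4f}")
```

Under these made-up values a substantial share of affected individuals are aa phenocopies, which is why the pedigree likelihood has to sum over genotypes rather than read them off the phenotype.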
General Pedigree Likelihood: Likelihood is a sum of products (mixture distribution likelihood). Number of terms $= (m_1 m_2 \cdots m_k)^{2n}$, where $m_j$ is the number of alleles at locus $j$ and $n$ is the number of pedigree members.
Elston-Stewart algorithm: Reduces computations by peeling. Step 1: condition likelihoods of family 1 on genotype of X. Step 2: joint likelihood of families 2 and 1. [Figure: pedigree in which individual X links nuclear families 1 and 2.]
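Lod Score: Morton (1955). $\mathrm{lod}(\theta) = \log_{10}\left[L(\theta)/L(\theta = 1/2)\right]$. Lod > 3: conclude linkage. Lod < -2: exclude linkage. Prior odds × likelihood ratio = posterior odds: 1:50 × 1000 = 20:1.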
Linkage Analysis: Admixture Test. Model: probability of linkage in a family $= \alpha$. Likelihood: $L(\alpha, \theta) = \alpha L(\theta) + (1-\alpha) L(\theta = 1/2)$.
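A minimal sketch of the admixture model follows. It assumes each family contributes fully informative gamete counts (R recombinants, N non-recombinants), so that the per-family likelihood is theta^R (1 - theta)^N, and that families are independent, so the per-family mixtures multiply. The family data, grid resolution and function names are hypothetical.

```python
import math

def family_likelihood(theta, R, N):
    """Likelihood of one family's gamete counts at recombination fraction theta."""
    return theta**R * (1.0 - theta)**N

def admixture_lod(alpha, theta, families):
    """log10 of prod_f [alpha L_f(theta) + (1 - alpha) L_f(1/2)] / L_f(1/2)."""
    total = 0.0
    for R, N in families:
        l_null = family_likelihood(0.5, R, N)
        mix = alpha * family_likelihood(theta, R, N) + (1.0 - alpha) * l_null
        if mix <= 0.0:
            return float("-inf")
        total += math.log10(mix / l_null)
    return total

def grid_search(families, alphas):
    """Maximise the admixture lod over a grid of alpha and theta in [0, 0.5]."""
    thetas = [j / 100 for j in range(0, 51)]
    return max((admixture_lod(a, t, families), a, t) for a in alphas for t in thetas)

# Hypothetical data: two families look linked, two look unlinked.
families = [(0, 6), (1, 7), (3, 3), (4, 4)]
best_admixture = grid_search(families, [i / 20 for i in range(0, 21)])
best_homogeneity = grid_search(families, [1.0])   # alpha fixed at 1
print("admixture   (lod, alpha, theta):", best_admixture)
print("homogeneity (lod, alpha, theta):", best_homogeneity)
```

Because alpha = 1 is one point of the admixture grid, the admixture maximum can never be below the homogeneity maximum; the difference between the two is the evidence for heterogeneity.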
Allele sharing (non-parametric) methods. Penrose (1935): sib pair linkage. For a rare disease: [Table: IBD sharing for concordant affected, concordant normal and discordant sib pairs.] Therefore: affected sib pair design. Test $H_0$: proportion of alleles IBD = 1/2.
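One simple way to test H0: proportion of alleles IBD = 1/2 is a "means" test on the average IBD proportion across affected sib pairs. The sketch below assumes IBD status is known without error for every pair; under H0 a sib pair shares 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2, 1/4, so the per-pair proportion has mean 1/2 and variance 1/8. The counts and the function name are illustrative.

```python
import math

def asp_mean_test(n0, n1, n2):
    """Means test for affected sib pairs with known IBD counts.

    n0, n1, n2: number of pairs sharing 0, 1, 2 alleles IBD.
    Returns the mean IBD proportion, a z statistic and a one-sided p-value.
    """
    n = n0 + n1 + n2
    pi_bar = (0.5 * n1 + 1.0 * n2) / n
    # Under H0 the per-pair proportion has variance 1/8, so Var(pi_bar) = 1/(8n).
    z = (pi_bar - 0.5) * math.sqrt(8.0 * n)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z >= z) for standard normal
    return pi_bar, z, p_one_sided

# Hypothetical sample of 100 affected sib pairs.
pi_bar, z, p_value = asp_mean_test(15, 50, 35)
print(f"mean IBD proportion={pi_bar:.3f} z={z:.3f} one-sided p={p_value:.5f}")
```

The test is one-sided because linkage can only increase sharing among affected pairs.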
Affected sib pairs: incomplete marker information. Parameters: IBD sharing probabilities $Z = (z_0, z_1, z_2)$. Marker genotype data: $M$. Finite mixture likelihood: $L(Z) = \sum_{i=0}^{2} z_i \, P(M \mid \mathrm{IBD} = i)$. Programs: SPLINK, ASPEX.
Joint distribution of Pedigree IBD: IBD of relative pairs are not independent, e.g. if IBD(1,2) = 2 and IBD(1,3) = 2 then IBD(2,3) = 2. The inheritance vector gives the joint IBD distribution. Each element indicates whether the paternally inherited allele is transmitted (1) or the maternally inherited allele is transmitted (0). Vector of 2N elements (N = number of non-founders).
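The dependence between pairwise IBD values can be verified by enumerating inheritance vectors for three full sibs. In this sketch the vector has 2N = 6 elements, and the layout (two consecutive bits per sib, the first for the father's meiosis and the second for the mother's) is a convention chosen for the example, not something specified on the slide.

```python
from itertools import product

N_SIBS = 3  # three non-founders (full sibs) with two founder parents

def sib_ibd(vector, i, j):
    """Number of alleles sibs i and j share IBD under a given inheritance vector.

    Bit 2*k is sib k's paternal meiosis, bit 2*k + 1 the maternal meiosis;
    two sibs share a parental allele IBD when the corresponding bits agree.
    """
    paternal = int(vector[2 * i] == vector[2 * j])
    maternal = int(vector[2 * i + 1] == vector[2 * j + 1])
    return paternal + maternal

# All 2^(2N) equally likely inheritance vectors.
vectors = list(product((0, 1), repeat=2 * N_SIBS))

# Marginal probability that sibs 2 and 3 share both alleles IBD.
marginal = sum(sib_ibd(v, 1, 2) == 2 for v in vectors) / len(vectors)

# Condition on IBD(1,2) = 2 and IBD(1,3) = 2.
conditioned = [v for v in vectors if sib_ibd(v, 0, 1) == 2 and sib_ibd(v, 0, 2) == 2]
conditional = sum(sib_ibd(v, 1, 2) == 2 for v in conditioned) / len(conditioned)

print(f"P(IBD(2,3)=2) = {marginal}")
print(f"P(IBD(2,3)=2 | IBD(1,2)=2, IBD(1,3)=2) = {conditional}")
```

The conditional probability differs from the marginal one, so pairwise IBD values cannot be treated as independent and the joint distribution has to be tracked through the inheritance vector.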
Pedigree allele-sharing methods (method: problem). APM (Affected Pedigree Members): uses IBS. ERPA (Extended Relative Pairs Analysis): dodgy statistic. Genehunter NPL (Non-Parametric Linkage): conservative. Genehunter-PLUS: likelihood (“tilting”). All these methods consider affected members only.
Convergence of parametric and non-parametric methods. Curtis and Sham (1995), MFLINK: treats penetrance as a parameter. Terwilliger et al (2000): complex recombination fractions. Parameters with no simple biological interpretation.
Quantitative Sib Pair Linkage. $X$, $Y$ standardised to mean 0, variance 1; $r$ = sib correlation; $V_A$ = additive QTL variance; $\hat{\pi}$ = estimated proportion of alleles IBD. Haseman-Elston Regression (1972): $(X-Y)^2 = 2(1-r) - 2V_A(\hat{\pi} - 0.5) + \varepsilon$. Haseman-Elston Revisited (2000): $XY = r + V_A(\hat{\pi} - 0.5) + \varepsilon$.
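Both regressions can be checked on simulated data. The sketch below draws sib pairs whose trait correlation is r + V_A (pi - 0.5), then regresses the squared difference and the cross product on pi; the expected slopes are -2 V_A and V_A respectively. It assumes the true IBD proportion is observed (fully informative markers) and bivariate normal traits; the parameter values, sample size and function names are arbitrary choices for the example.

```python
import math
import random

def simulate_sib_pairs(n_pairs, r, v_a, seed=0):
    """Simulate (pi, x, y) with corr(x, y | pi) = r + v_a * (pi - 0.5)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # IBD proportion 0, 1/2, 1 with probabilities 1/4, 1/2, 1/4.
        pi = rng.choices((0.0, 0.5, 1.0), weights=(1, 2, 1))[0]
        rho = r + v_a * (pi - 0.5)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        pairs.append((pi, x, y))
    return pairs

def ols_slope(xs, ys):
    """Slope of the least-squares regression of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return sxy / sxx

R_SIB, V_A = 0.4, 0.3            # hypothetical sib correlation and QTL variance
pairs = simulate_sib_pairs(100_000, R_SIB, V_A)
pis = [p for p, _, _ in pairs]
slope_squared_diff = ols_slope(pis, [(x - y) ** 2 for _, x, y in pairs])
slope_cross_product = ols_slope(pis, [x * y for _, x, y in pairs])
print(f"slope of (X-Y)^2 on pi: {slope_squared_diff:.3f} (expected {-2 * V_A})")
print(f"slope of XY on pi:      {slope_cross_product:.3f} (expected {V_A})")
```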
Improved Haseman-Elston: Sham and Purcell (2001). Use a linear combination of the squared sum $(X+Y)^2$ and the squared difference $(X-Y)^2$ as the dependent variable. Gives equivalent power to the variance components model for sib pair data.
Variance components linkage: Models trait values of pedigree members jointly. Assumes multivariate normality conditional on IBD. Covariance between relative pairs $= Vr + V_A[\hat{\pi} - E(\hat{\pi})]$, where $V$ = trait variance, $r$ = correlation (depends on relationship), $V_A$ = QTL additive variance, $E(\hat{\pi})$ = expected proportion IBD.
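For a single sib pair the variance-components model reduces to a bivariate normal with covariance V r + V_A (pi_hat - E(pi_hat)). The sketch below builds that 2x2 covariance matrix and compares the log-likelihood of one pair with and without a QTL effect. It assumes zero trait means and E(pi_hat) = 0.5 for sibs; the trait values and parameter values are invented for illustration.

```python
import math

def sib_pair_covariance(V, r, V_A, pi_hat, expected_pi=0.5):
    """2x2 trait covariance matrix for a sib pair given estimated IBD sharing."""
    cov = V * r + V_A * (pi_hat - expected_pi)
    return [[V, cov], [cov, V]]

def bivariate_normal_loglik(x, y, sigma):
    """Log density of a zero-mean bivariate normal with covariance matrix sigma."""
    (a, b), (_, d) = sigma
    det = a * d - b * b
    quad = (d * x * x - 2.0 * b * x * y + a * y * y) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

# Hypothetical concordant pair sharing both alleles IBD at the test position.
x, y, pi_hat = 1.5, 1.2, 1.0
V, r = 1.0, 0.4
loglik_linkage = bivariate_normal_loglik(x, y, sib_pair_covariance(V, r, 0.3, pi_hat))
loglik_null = bivariate_normal_loglik(x, y, sib_pair_covariance(V, r, 0.0, pi_hat))
pair_lod = (loglik_linkage - loglik_null) / math.log(10.0)
print(f"log-lik (V_A=0.3)={loglik_linkage:.4f} log-lik (V_A=0)={loglik_null:.4f} lod={pair_lod:.4f}")
```

In a real analysis the log-likelihoods would be summed over all pedigrees and maximised over V_A and the other variance parameters; here the parameters are simply fixed to show how IBD sharing enters the covariance.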
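QTL linkage model for sib-pair data. [Path diagram: sib phenotypes P_T1 and P_T2, each influenced by latent factors Q, S and N through paths q, s and n; the S factors are correlated 1 between sibs and the Q factors are correlated 0, 0.5 or 1.]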
No linkage
Under linkage
Incomplete Marker Information: IBD sharing cannot be deduced from marker genotypes with certainty. Obtain probabilities of all possible IBD values. Two approaches: finite mixture likelihood; pi-hat likelihood.
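QTL linkage model for sib-pair data. [Path diagram: sib phenotypes P_T1 and P_T2, each influenced by latent factors Q, S and N through paths q, s and n; the S factors are correlated 1 between sibs.]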
Conditioning on Trait Values: usual test and conditional test, where $Z_i$ = IBD probability estimated from marker genotypes and $P_i$ = IBD probability given relationship.
QTL linkage: some problems. Sensitivity to misspecification of marker allele frequencies and positions. Sensitivity to non-normality and phenotypic selection. Heavy computational demand for large pedigrees or many marker loci. Sensitivity to marker genotype and relationship errors. Low power and poor localisation for minor QTL.