IBD Estimation in Pedigrees Gonçalo Abecasis University of Oxford
3 Stages of Genetic Mapping Are there genes influencing this trait? Epidemiological studies Where are those genes? Linkage analysis What are those genes? Association analysis
Relationship Checking
Where are those genes?
Tracing Chromosomes
Sometimes it is easy…
Sharing, or Not?
Data and Task. Data: polymorphic markers (e.g., microsatellite repeats, SNPs), with allele frequency and location. Task: phase markers, place recombinants.
Complexity of the Problem. For each meiosis: in a pedigree with n non-founders, there are 2n meioses, each with 2 possible outcomes. For each location: one for each of m markers. Up to $4^{nm}$ distinct outcomes.
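To give a feel for how quickly this bound grows, here is a minimal sketch that evaluates it for a few pedigree sizes. The function name `outcome_count` is an illustrative choice, not something taken from the slides.

```python
def outcome_count(n_nonfounders: int, n_markers: int) -> int:
    """Upper bound on distinct gene-flow outcomes.

    Each non-founder contributes 2 meioses, each meiosis has 2 outcomes,
    and this repeats at every marker: 2**(2*n*m) == 4**(n*m).
    """
    meioses = 2 * n_nonfounders
    return 2 ** (meioses * n_markers)


for n, m in [(2, 1), (2, 5), (10, 10)]:
    print(f"n={n:2d}  m={m:2d}  outcomes={outcome_count(n, m):.3e}")
```

Even a modest pedigree with a handful of markers is far beyond brute-force enumeration, which is why the factorized algorithms on the following slides matter.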
Elston-Stewart Algorithm. Factorize likelihood by individual: each step assigns phase for all markers for one individual. Complexity $n \cdot 4^m$. Suited to a small number of markers and large pedigrees with little inbreeding.
Lander-Green Algorithm. Factorize likelihood by marker: each step assigns phase for one marker, for all individuals in the pedigree. Complexity $m \cdot 4^n$. Suited to a large number of markers and relatively small pedigrees. Assumes no interference.
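The two scaling terms, n · 4^m for Elston-Stewart and m · 4^n for Lander-Green, can be compared directly. The sketch below does only that: it ignores constant factors and implementation details, and the function names are hypothetical.

```python
def elston_stewart_cost(n: int, m: int) -> int:
    """Scaling term n * 4**m: linear in pedigree size, exponential in markers."""
    return n * 4 ** m


def lander_green_cost(n: int, m: int) -> int:
    """Scaling term m * 4**n: linear in markers, exponential in pedigree size."""
    return m * 4 ** n


def cheaper_algorithm(n: int, m: int) -> str:
    """Pick the algorithm with the smaller scaling term (ties go to Lander-Green)."""
    if elston_stewart_cost(n, m) < lander_green_cost(n, m):
        return "Elston-Stewart"
    return "Lander-Green"


print(cheaper_algorithm(n=50, m=3))    # large pedigree, few markers
print(cheaper_algorithm(n=5, m=100))   # small pedigree, many markers
```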
Markov-Chain Monte-Carlo. Approximate solutions: explore only most likely outcomes. Remove restrictions on pedigree size, number of markers, and inbreeding. Assuming no interference. Computationally intensive.
Popular Packages. Elston-Stewart Algorithm: LINKAGE / FASTLINK (Lathrop et al, 1985), VITESSE (O’Connell and Weeks, 1995). Lander-Green Algorithm: Genehunter (Kruglyak et al, 1995), Allegro (Gudbjartsson et al, 2000). MCMC: Simwalk2 (Sobel et al, 1996), LOKI (Heath, 1998).
1. Enumerate Possibilities Enumerate gene-flow patterns Gene-flow pattern: Sets transmitted allele for each meiosis Implies founder allele for each individual
2. Founder Allele Sets. For each gene flow pattern v, enumerate the set A(G,v) of all allele states $a = [a_1, \ldots, a_{2f}]$ compatible with both the gene flow v and the genotypes G. The likelihood is $L(v \mid G) = 2^{-2n} \sum_{a \in A(G,v)} \prod_i f(a_i)$, where $f(a_i)$ is the frequency of allele $a_i$.
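A brute-force sketch of this step for the simplest case, a nuclear family with two founders and n children, is shown here. It assumes each gene-flow pattern is given as one (paternal bit, maternal bit) pair per child, that genotypes are unordered allele pairs or `None` when untyped, and that the likelihood is 2^(-2n) times the sum over compatible allele states of the product of allele frequencies. All names are illustrative, and real implementations avoid this exhaustive enumeration.

```python
from itertools import product
from math import prod


def matches(pair, genotype):
    """True if an ordered allele pair agrees with an unordered genotype (None = untyped)."""
    return genotype is None or sorted(pair) == sorted(genotype)


def gene_flow_likelihood(flow, parent_genotypes, child_genotypes, freqs):
    """L(v|G) for a nuclear family by enumerating founder allele states.

    flow: one (paternal_bit, maternal_bit) per child; each bit selects which
          of that parent's two alleles was transmitted.
    freqs: dict mapping allele -> population frequency.
    """
    n = len(flow)
    total = 0.0
    # a = (father allele 0, father allele 1, mother allele 0, mother allele 1)
    for a in product(list(freqs), repeat=4):
        father, mother = a[0:2], a[2:4]
        if not matches(father, parent_genotypes[0]):
            continue
        if not matches(mother, parent_genotypes[1]):
            continue
        if all(matches((father[p], mother[q]), g)
               for (p, q), g in zip(flow, child_genotypes)):
            total += prod(freqs[x] for x in a)
    return 2.0 ** (-2 * n) * total


freqs = {1: 0.3, 2: 0.7}
# Untyped parents, two children both typed 1/1. The children receive
# different paternal alleles but the same maternal allele, so three
# founder alleles are forced to be allele 1.
lik = gene_flow_likelihood(
    flow=[(0, 0), (1, 0)],
    parent_genotypes=[None, None],
    child_genotypes=[(1, 1), (1, 1)],
    freqs=freqs,
)
print(lik)
```

In this example there are four meioses and three founder alleles that must be allele 1, so the result has the form (1/2)^4 · f(1)^3.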
For example ... Genotypes, Gene Flow, Founder Alleles. Four meioses. Three copies of allele 1 required. Likelihood = $(1/2)^4 \, f(a_1)^3$
Single Marker Probabilities. We now have a likelihood for each gene flow pattern, conditional on genotypes, conditional on allele frequencies, and conditional on a single marker. Probability for each gene-flow pattern: $P(v) = L(v) / \sum_{v} L(v)$
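The normalization is a one-liner in code. A minimal sketch, assuming the likelihoods for all gene-flow patterns are held in a plain list (the function name is illustrative):

```python
def gene_flow_probabilities(likelihoods):
    """Convert per-pattern likelihoods L(v) into probabilities P(v)."""
    total = sum(likelihoods)
    if total == 0:
        # Every pattern has zero likelihood: the genotypes are
        # incompatible with Mendelian inheritance in this pedigree.
        raise ValueError("genotypes are incompatible with every gene-flow pattern")
    return [lik / total for lik in likelihoods]


print(gene_flow_probabilities([1.0, 3.0]))
```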
3. Allowing for Recombination. Transition probability: $T(v_a \to v_b, \theta) = (1-\theta)^{n - r(v_a, v_b)} \, \theta^{r(v_a, v_b)}$. Transition matrix between Location A and Location B.
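The transition probability can be sketched in code under two assumptions that the slide does not spell out: each gene-flow pattern is encoded as an integer with one bit per meiosis, and the exponent n counts those meioses, so that r(v_a, v_b) is simply the number of bits that differ. Function names are illustrative.

```python
def recombinant_count(va: int, vb: int) -> int:
    """r(va, vb): number of meioses whose outcome differs between two patterns."""
    return bin(va ^ vb).count("1")


def transition_probability(va: int, vb: int, theta: float, n_meioses: int) -> float:
    """(1 - theta)**(n - r) * theta**r for recombination fraction theta."""
    r = recombinant_count(va, vb)
    return (1 - theta) ** (n_meioses - r) * theta ** r


def transition_matrix(theta: float, n_meioses: int):
    """Full k x k matrix with k = 2**n_meioses (only feasible for small pedigrees)."""
    k = 1 << n_meioses
    return [[transition_probability(a, b, theta, n_meioses) for b in range(k)]
            for a in range(k)]


T = transition_matrix(theta=0.1, n_meioses=3)
print(T[0b000][0b011])  # two recombinants, one non-recombinant
```

Each row of the matrix sums to 1, since every starting pattern must move to some pattern at the next location.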
Moving along chromosome. Input: vector v of likelihoods at location A; matrix T of transition probabilities A → B. Output: vector v’ of likelihoods at location B, conditional on likelihoods at A. For k vectors, requires $k^2$ operations.
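A direct implementation of this step makes the quadratic cost visible. The sketch below assumes gene-flow patterns are indexed as integers with one bit per meiosis, so the vector length k is a power of two; `move_naive` is a hypothetical name.

```python
def move_naive(likelihoods, theta):
    """Propagate a likelihood vector from location A to B in O(k**2) steps."""
    k = len(likelihoods)
    n_bits = k.bit_length() - 1
    if k == 0 or k != 1 << n_bits:
        raise ValueError("vector length must be a power of two")
    out = [0.0] * k
    for b in range(k):          # pattern at location B
        for a in range(k):      # pattern at location A
            r = bin(a ^ b).count("1")  # meioses that recombined
            out[b] += likelihoods[a] * (1 - theta) ** (n_bits - r) * theta ** r
    return out


print(move_naive([1.0, 0.0, 0.0, 0.0], theta=0.1))
```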
Elston and Idury Algorithm. Requires $k \log_2 k$ operations.
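The saving comes from the fact that each meiosis recombines independently, so the transition can be applied one meiosis at a time instead of through the full k × k matrix. The sketch below illustrates that idea; it is not necessarily identical to the published formulation, and it assumes patterns are indexed as integers with one bit per meiosis.

```python
def move_fast(likelihoods, theta):
    """Propagate a likelihood vector using k * log2(k) updates.

    For each meiosis (bit), every entry is mixed with the entry that
    differs only in that bit: stay with weight (1 - theta), switch with
    weight theta. Doing this for all bits in turn is equivalent to
    multiplying by the full transition matrix.
    """
    k = len(likelihoods)
    if k == 0 or k & (k - 1):
        raise ValueError("vector length must be a power of two")
    out = list(likelihoods)
    bit = 1
    while bit < k:
        out = [(1 - theta) * out[v] + theta * out[v ^ bit] for v in range(k)]
        bit <<= 1
    return out


print(move_fast([1.0, 0.0, 0.0, 0.0], theta=0.1))
```

With 2n = 20 meioses, k is about one million, so the difference between k^2 and k log2 k operations is roughly a factor of 50,000.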
Moving Along Chromosome
Markov-Chains Single Marker Left Conditional Right Conditional Full Likelihood
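One way to read these four quantities is as a forward-backward pass along the chromosome: the left conditional at a marker summarizes all markers to its left, the right conditional all markers to its right, and the full likelihood is their elementwise product with the single-marker likelihood. The sketch below assumes a uniform (unnormalized) prior over gene-flow patterns and one bit per meiosis; function names are hypothetical.

```python
def step(vec, theta):
    """Apply the recombination transition one meiosis (bit) at a time."""
    k = len(vec)
    out = list(vec)
    bit = 1
    while bit < k:
        out = [(1 - theta) * out[v] + theta * out[v ^ bit] for v in range(k)]
        bit <<= 1
    return out


def multipoint(single, thetas):
    """Left conditional, right conditional and full likelihood per marker.

    single: list of m single-marker likelihood vectors (each of length k).
    thetas: m - 1 recombination fractions between adjacent markers.
    """
    m, k = len(single), len(single[0])
    left = [[1.0] * k for _ in range(m)]
    right = [[1.0] * k for _ in range(m)]
    for j in range(1, m):                      # sweep left to right
        prev = [l * s for l, s in zip(left[j - 1], single[j - 1])]
        left[j] = step(prev, thetas[j - 1])
    for j in range(m - 2, -1, -1):             # sweep right to left
        nxt = [r * s for r, s in zip(right[j + 1], single[j + 1])]
        right[j] = step(nxt, thetas[j])
    full = [[l * s * r for l, s, r in zip(left[j], single[j], right[j])]
            for j in range(m)]
    return left, right, full


single = [[0.2, 0.0, 0.1, 0.3], [0.5, 0.5, 0.0, 0.1], [0.0, 0.4, 0.4, 0.1]]
left, right, full = multipoint(single, thetas=[0.1, 0.2])
print([sum(f) for f in full])  # the same total at every marker
```

The sum of the full likelihood vector is the same at every marker, which is a useful sanity check on an implementation.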
MERLIN Fast multipoint calculations Non-parametric linkage analyses Error detection e.g., unlikely obligate recombinants Haplotyping most likely, exhaustive lists, sampling
Sparse Gene Flow Trees
Dense maps. Computational challenge: require more memory; require Lander-Green algorithm; limited pedigree size. Computational advantages: reduced recombination between markers; approximate solutions possible if steps with many recombinants are ignored.
MERLIN: Example Pedigrees
MERLIN: Timings
MERLIN: Memory Usage
Command Line Options
Effect of Genotyping Error Modest levels are likely Up to 1% may be typical Mendelian inheritance checks Detect up to 30% of errors for SNPs Effect on power Linkage vs. Association SNPs vs. Microsatellites
Affected Sib Pair Sample
Unselected Sample
Association Analysis
Error Detection. Genotype errors can introduce unlikely recombinants and change the likelihood: replace (1-θ) with θ. Test sensitivity of likelihood to each genotype. Detects errors that have largest effect on linkage.
Practical Exercise Lon Cardon Stacey Cherny