Gonçalo Abecasis and Janis Wigginton University of Michigan, Ann Arbor

Handling Marker-Marker Linkage Disequilibrium: Pedigree Analysis with Clustered Markers
Gonçalo Abecasis and Janis Wigginton University of Michigan, Ann Arbor (in American Journal of Human Genetics, 2005)

Motivation
Traditional linkage analysis algorithms assume that markers are independent, i.e., that they lie relatively far apart from each other on the chromosome.
We want linkage analysis based on single-nucleotide polymorphism (SNP) markers. SNP markers are spaced so closely that alleles at neighboring markers are often correlated, a phenomenon known as linkage disequilibrium (LD).
Solution: extend the Lander-Green algorithm to incorporate marker-marker LD.

Problem Statement
INPUT: A pedigree with f founders (i.e., individuals whose parents are unknown) and n descendants, together with genotype data at a series of genetic markers for one or more individuals in the pedigree (some markers can be in LD).
OUTPUT: The inheritance information, e.g., LOD scores, maximum-likelihood haplotypes, etc.

Assumptions
Markers can be organized into non-overlapping clusters such that:
markers in the same cluster may be in LD;
markers in different clusters exhibit at most low levels of LD, which is ignored;
the recombination rate within each cluster is extremely low and is set to 0, i.e., θ = 0 inside a cluster.

Lander-Green Algorithm
Hidden Markov Model
The G variables represent the observed genotypes (similar to Dan Geiger's X variables).
The V variables represent the "inheritance vectors" (similar to Dan Geiger's selector variables).
The model is used, e.g., to compute P(G1,…,GK|θ), which is needed for LOD scores.

Lander-Green Algorithm (I)
Step 1: Enumerate all possible "inheritance vectors" in the input pedigree. Given n non-founders, the inheritance vector vi for marker Mi is a vector of length 2n that records, for each meiosis, whether the paternal or the maternal allele was transmitted (i.e., the selector variables in Geiger's model). There are up to 2^(2n) inheritance vectors (Lander & Green, 1987).
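The enumeration in Step 1 can be sketched in a few lines. This is only an illustration: the function name `inheritance_vectors` and the 0/1 labelling of transmitted alleles are assumptions made here, not notation from the slides.

```python
from itertools import product

def inheritance_vectors(n_nonfounders):
    """Return an iterator over all inheritance vectors of a pedigree.

    Each vector has 2n bits, one per meiosis (a paternal and a maternal
    meiosis for every non-founder). The meaning attached to 0 and 1 is a
    labelling convention assumed for this sketch.
    """
    return product((0, 1), repeat=2 * n_nonfounders)

# One non-founder gives 2^(2*1) = 4 vectors.
vectors = list(inheritance_vectors(1))
print(vectors)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The count grows as 2^(2n), so explicit enumeration like this is only practical for small pedigrees.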

Lander-Green Algorithm (II)
Step 2: Iterate over inheritance vectors and markers to calculate, for each marker, the probability of the observed genotypes conditioned on a particular inheritance vector: P(Gi|vi). This is done using the "genetic descendant graph" (see Geiger slides).

[Figure: Bayesian network model for locus 2, with founder allele variables L21m, L21f, L22m, L22f, selector variables S23m = 1 and S23f = 0, non-founder allele variables L23m, L23f, and genotype variables X21, X22, X23, where X23 = {A1,A2} is observed.]

Assume only individual 3 is genotyped. For the inheritance vector (0,1), the founder alleles L21m and L22f are not restricted by the data, while (L21f, L22m) have only two possible joint assignments, (A1,A2) or (A2,A1):
p(x21, x22, x23 | s23m=1, s23f=0) = p(A1)p(A2) + p(A2)p(A1)

In general, every inheritance vector defines a subgraph of the Bayesian network (indicated by black lines in the figure). Construct a founder graph whose vertices are the founder variables and where there is an edge between two vertices if they have a common typed descendant. The label of an edge is the constraint dictated by the common typed descendant. Then find all consistent assignments for every connected component.

[Figure: founder graph for locus 2, with vertices L21m, L21f, L22m, L22f and an edge labeled {A1,A2} between L21f and L22m.]
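The trio example can be checked numerically by brute force. The sketch below enumerates every assignment of the four founder alleles and keeps those consistent with the child's genotype; it is not the founder-graph procedure itself, which exists precisely to avoid this exhaustive enumeration. The function name, the allele frequencies, and the ordering (s_f, s_m) of the inheritance vector are assumptions chosen to match the example.

```python
from itertools import product

def genotype_prob(allele_freq, v, child_genotype):
    """P(child genotype | inheritance vector v) in a trio, child typed only.

    Founder alleles are ordered (father paternal, father maternal,
    mother paternal, mother maternal). v = (s_f, s_m), where 0 selects
    the parent's paternal allele and 1 the parent's maternal allele
    (an assumed convention for this sketch).
    """
    s_f, s_m = v
    total = 0.0
    for founders in product(sorted(allele_freq), repeat=4):
        # Alleles the child receives under this inheritance vector.
        child = (founders[0 + s_f], founders[2 + s_m])
        if sorted(child) == sorted(child_genotype):
            prob = 1.0
            for allele in founders:
                prob *= allele_freq[allele]
            total += prob
    return total

freq = {"A1": 0.3, "A2": 0.7}  # illustrative frequencies, not from the slides
p = genotype_prob(freq, (0, 1), ("A1", "A2"))
print(p)  # equals p(A1)p(A2) + p(A2)p(A1) = 2 * 0.3 * 0.7, up to rounding
```

The two unconstrained founder alleles sum out to 1, which is why only the two constrained alleles appear in the closed-form expression.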

Lander-Green Algorithm (III)
Step 3: Compute the transition probabilities between inheritance vectors at consecutive markers: P(vi+1|vi), then do the Markov-chain calculations in a standard way
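The "standard" Markov-chain calculation is the forward recursion of a hidden Markov model. The following is a minimal sketch under stated assumptions: inheritance vectors are encoded as integers whose bits are the meioses, the prior over vectors at the first marker is uniform, and the transition probability between two vectors is θ^d (1 − θ)^(2n − d), where d is the number of meioses in which they differ. The names `likelihood`, `emissions`, and `thetas` are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions (meioses) in which two vectors differ."""
    return bin(a ^ b).count("1")

def likelihood(emissions, thetas):
    """Forward algorithm for P(G1,...,GK | theta).

    emissions[i][v] is P(G_i | v) for inheritance vector v (an integer).
    thetas[i] is the recombination fraction between markers i and i+1.
    """
    size = len(emissions[0])
    m = size.bit_length() - 1  # number of meioses, 2n
    # Uniform prior over inheritance vectors at the first marker.
    alpha = [e / size for e in emissions[0]]
    for i in range(1, len(emissions)):
        theta = thetas[i - 1]
        alpha = [
            emissions[i][w] * sum(
                alpha[v]
                * theta ** hamming(v, w)
                * (1 - theta) ** (m - hamming(v, w))
                for v in range(size)
            )
            for w in range(size)
        ]
    return sum(alpha)

# Two markers, one non-founder (4 inheritance vectors), theta = 0.1.
lik = likelihood([[1, 0, 0, 0], [0, 1, 0, 0]], [0.1])
print(lik)
```

This dense version costs O(2^(4n)) per marker interval and is meant only to make the recursion explicit.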

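The transition matrix
Recall that each selector variable changes value between consecutive markers with probability θ (a recombination) and keeps its value with probability 1 − θ. Note that θ depends on the marker interval i, but this dependence is omitted from the notation. In our example, where we have one non-founder (n = 1), the transition probability table has size 4 × 4 = 2^(2n) × 2^(2n), encoding the four combinations of recombination/non-recombination for the two parental meioses; it is the Kronecker product of the two single-meiosis 2 × 2 tables. For n non-founders, the transition matrix is the n-fold Kronecker product of this 4 × 4 table.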
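The Kronecker structure can be made concrete with a short sketch. It builds the full matrix as the 2n-fold Kronecker product of the single-meiosis table [[1 − θ, θ], [θ, 1 − θ]], which is equivalent to the n-fold product of the 4 × 4 table. The helper names `kron` and `transition_matrix` are introduced here for illustration, and matrices are plain lists of lists.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]

def transition_matrix(theta, n_nonfounders):
    """Transition matrix over inheritance vectors for one marker interval."""
    single = [[1 - theta, theta], [theta, 1 - theta]]  # one meiosis
    t = [[1.0]]
    for _ in range(2 * n_nonfounders):  # one factor per meiosis
        t = kron(t, single)
    return t

T = transition_matrix(0.1, 1)
for row in T:
    print(row)
```

Entry (v, w) of the result equals θ^d (1 − θ)^(2n − d), where d is the number of meioses in which v and w differ, so each row sums to 1.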

Efficient Product
So, if we start with a matrix of size 2^(2n), we will need 2^(2n) multiplications if we have the matrix A in hand. Continuing recursively, at most 2n times, yields a complexity of O(2n · 2^(2n)), far less than the O(2^(4n)) needed for regular multiplication. With n = 10 non-founders, we drop from an infeasible region to a feasible one.
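One way to realize this saving is to apply the single-meiosis 2 × 2 table to one bit of the inheritance vector at a time, instead of forming the full matrix. The sketch below is an illustration of that idea under the assumption that vectors are encoded as integers with one bit per meiosis; `apply_transition` and `naive_apply` are hypothetical names, and the naive version is included only to check the result.

```python
def apply_transition(vec, theta):
    """Multiply a vector over inheritance vectors by the transition matrix.

    Processes one meiosis (bit) per pass: 2n passes over 2^(2n) entries,
    i.e. O(2n * 2^(2n)) work. len(vec) must be a power of two.
    """
    size = len(vec)
    out = list(vec)
    bit = 1
    while bit < size:
        # Mix each entry with its partner that differs only in this bit.
        out = [
            (1 - theta) * out[i] + theta * out[i ^ bit]
            for i in range(size)
        ]
        bit <<= 1
    return out

def naive_apply(vec, theta):
    """Reference O(2^(4n)) product using the explicit matrix entries."""
    size = len(vec)
    m = size.bit_length() - 1
    result = []
    for w in range(size):
        acc = 0.0
        for v in range(size):
            d = bin(v ^ w).count("1")  # meioses that recombined
            acc += vec[v] * theta ** d * (1 - theta) ** (m - d)
        result.append(acc)
    return result

vec = [float(i) for i in range(16)]  # n = 2 non-founders
fast = apply_transition(vec, 0.1)
slow = naive_apply(vec, 0.1)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

For n = 10 the vector has 2^20 entries, so the bitwise version needs on the order of 20 · 2^20 operations per interval, whereas the explicit matrix would have 2^40 entries.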

Summary
Quantities that we need for the Lander-Green algorithm to work:
Inheritance vectors vi for each marker Mi
Genotype probabilities: P(Gi|vi)
Transition probabilities: P(vi+1|vi)

Lander-Green with LD Markers
Steps 1 and 3 remain unchanged.
Step 2 now needs to compute the joint probability of all genotypes in a cluster, P(G1,G2…GM|vcluster).

Probability of Observed Genotypes within a Cluster: P(G1,G2…GM|vcluster)
INPUT:
G1,G2…GM: the observed genotypes at the M markers of the cluster
h distinct haplotypes in the population
p1,…,ph: their frequencies
Hi: state of founder haplotype i (where i = 1…2f)
OUTPUT: for each inheritance vector v, compute P(G1…GM|p1…ph,v)

The probability of the observed genotypes given v and a particular configuration of founder haplotypes H1,…,H2f is either 1, if the implied haplotypes for each individual are compatible with the observed genotypes, or 0 otherwise. S(…) is the set of founder haplotype configurations compatible with the inheritance vector v and the observed genotype data G1, …, GM (see the paper for a full explanation of S).
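Read this way, the cluster likelihood is a sum, over the compatible founder haplotype configurations in S, of the product of the founder haplotype frequencies. The sketch below computes that sum by brute-force enumeration; it is an illustration of the definition, not the algorithm used in the paper. The function names, the `descent` mapping (which founder haplotypes each person carries under v), and the example frequencies are all assumptions.

```python
from itertools import product

def compatible(hap_a, hap_b, observed):
    """True if a haplotype pair matches the observed genotypes.

    observed is None for an untyped person, else one unordered allele
    pair (or None for a missing marker) per marker in the cluster.
    """
    if observed is None:
        return True
    for a, b, g in zip(hap_a, hap_b, observed):
        if g is not None and sorted((a, b)) != sorted(g):
            return False
    return True

def cluster_likelihood(hap_freq, descent, genotypes):
    """P(G1..GM | p1..ph, v) by summing over founder haplotype states.

    descent maps each person to the indices of the two founder
    haplotypes carried under inheritance vector v (theta = 0 within the
    cluster, so haplotypes are transmitted whole).
    """
    n_haps = 1 + max(k for pair in descent.values() for k in pair)
    total = 0.0
    for config in product(list(hap_freq), repeat=n_haps):
        if all(compatible(config[i], config[j], genotypes.get(person))
               for person, (i, j) in descent.items()):
            prob = 1.0
            for hap in config:
                prob *= hap_freq[hap]
            total += prob
    return total

# Two SNPs, four haplotypes (illustrative frequencies).
hap_freq = {("A", "C"): 0.5, ("A", "T"): 0.2, ("G", "C"): 0.1, ("G", "T"): 0.2}
descent = {"father": (0, 1), "mother": (2, 3), "child": (0, 3)}
genotypes = {"child": [("A", "G"), ("C", "T")]}  # only the child is typed
lik = cluster_likelihood(hap_freq, descent, genotypes)
print(lik)
```

In this example the doubly heterozygous child can carry AC with GT or AT with GC, in either order, so the sum is 2(0.5 · 0.2) + 2(0.2 · 0.1), up to rounding. The enumeration costs h^(2f) and is only feasible for very small h and f.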

Estimation of Haplotype Frequencies in General Pedigrees
Founder haplotype frequencies for each cluster are generally unknown.
A gene-counting EM algorithm is used to estimate the haplotype frequencies in each cluster.
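To convey the flavor of gene counting, here is a deliberately simplified sketch for unrelated individuals only. The pedigree version described in the paper must also account for inheritance vectors, which this example ignores. The function name, the fixed iteration count, and the toy data are assumptions.

```python
from collections import defaultdict
from itertools import product

def em_haplotype_freqs(genotypes, haplotypes, n_iter=50):
    """Gene-counting EM for haplotype frequencies in unrelated individuals.

    genotypes: one entry per person, each a tuple of unordered allele
    pairs (one pair per marker). haplotypes: candidate haplotypes, each a
    tuple with one allele per marker.
    """
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        counts = defaultdict(float)
        for g in genotypes:
            # E-step: weight every ordered haplotype pair consistent with g.
            pairs = [
                (a, b) for a, b in product(haplotypes, repeat=2)
                if all(sorted((x, y)) == sorted(obs)
                       for x, y, obs in zip(a, b, g))
            ]
            weights = [freq[a] * freq[b] for a, b in pairs]
            norm = sum(weights)
            if norm == 0:
                continue  # genotype impossible under current estimates
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / norm
                counts[b] += w / norm
        # M-step: new frequencies are the normalized expected counts.
        total = sum(counts.values())
        freq = {h: counts[h] / total for h in haplotypes}
    return freq

haps = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")]
data = [(("A", "A"), ("C", "C"))] * 3 + [(("G", "G"), ("T", "T"))]
est = em_haplotype_freqs(data, haps)
print(est)
```

With these unambiguous homozygous genotypes the estimates reduce to simple counting: six AC haplotypes and two GT haplotypes out of eight.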

Experiments
Software package: MERLIN
Synthetic dataset: 500 sibships, each with three affected siblings and one genotyped parent
Real dataset: Psoriasis (Stuart et al. 2005), with 3,158 individuals in 274 families in Germany and the USA, of whom 2,598 were genotyped
