Gonçalo Abecasis and Janis Wigginton University of Michigan, Ann Arbor

Handling Marker-Marker Linkage Disequilibrium: Pedigree Analysis with Clustered Markers Gonçalo Abecasis and Janis Wigginton University of Michigan, Ann Arbor (in American Journal of Human Genetics, 2005)

Motivation
Traditional linkage analysis algorithms assume that markers are independent of one another, i.e., that they are relatively far apart on the chromosome. We want linkage analysis based on single-nucleotide polymorphism (SNP) markers, which are spaced so closely that neighboring markers are often in linkage disequilibrium (LD). Solution: extend the Lander-Green algorithm to incorporate marker-marker LD.

Problem Statement
INPUT: A pedigree with f founders (i.e., individuals with unknown parents) and n descendants, together with genotype data at a series of genetic markers for one or more individuals in the pedigree (some markers can be in LD).
OUTPUT: The inheritance information extracted from these data, e.g., LOD scores, maximum-likelihood haplotypes, etc.

Assumptions
Markers can be organized into non-overlapping clusters such that (a) markers in the same cluster may be in LD; (b) markers in different clusters exhibit only low levels of LD, so LD between clusters is ignored; and (c) the recombination rate within each cluster is extremely low and is set to zero, i.e., θ = 0 inside a cluster.

Lander-Green Algorithm
The model is a hidden Markov model. The G variables represent the observed genotypes (similar to Dan Geiger’s X variables), and the V variables represent the “inheritance vectors” (similar to Dan Geiger’s selector variables). The goal is, e.g., to compute $P(G_1, \dots, G_K \mid \theta)$, which is needed for LOD scores.
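Written out, the quantity needed for LOD scores is the usual hidden Markov model likelihood. The following is a sketch, assuming a prior $P(v_1)$ on the first inheritance vector (uniform in the standard Lander-Green setup) and using the per-marker terms $P(G_i \mid v_i)$ and transition terms $P(v_{i+1} \mid v_i)$ that the later steps define:

```latex
P(G_1,\dots,G_K \mid \theta)
  = \sum_{v_1}\cdots\sum_{v_K}
    P(v_1)\,
    \prod_{i=1}^{K-1} P(v_{i+1} \mid v_i)\,
    \prod_{i=1}^{K} P(G_i \mid v_i)
```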

Lander-Green Algorithm (I)
Step 1: Enumerate all possible “inheritance vectors” in the input pedigree. Given n non-founders, the inheritance vector $v_i$ for marker $M_i$ is a vector of length 2n that records, for each meiosis, whether the paternal or the maternal allele was transmitted (i.e., the selector variables in Geiger’s model). There are up to $2^{2n}$ inheritance vectors (Lander & Green 1987).
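The enumeration in Step 1 can be sketched in a few lines of Python. The function name and the 0/1 encoding of the transmitted allele are assumptions made for illustration only.

```python
from itertools import product

def inheritance_vectors(n_nonfounders):
    """Return an iterator over every inheritance vector of a pedigree.

    Each vector has 2n bits, one per meiosis (two meioses per non-founder).
    Which grandparental allele a 0 or a 1 stands for is a convention
    assumed here for illustration.
    """
    return product((0, 1), repeat=2 * n_nonfounders)

vectors = list(inheritance_vectors(2))
print(len(vectors))  # 16, i.e. 2**(2*2)
```

Because the count grows as $2^{2n}$, explicit enumeration like this is only practical for small pedigrees.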

Lander-Green Algorithm (II)
Step 2: Iterate over inheritance vectors and markers to calculate, for each marker, the probability of the observed genotypes conditioned on a particular inheritance vector, $P(G_i \mid v_i)$. This is done using the “genetic descendant graph” (see Geiger’s slides).

Model for locus 2
[Figure: Bayesian network for locus 2 with founder allele variables L21m, L21f, L22m, L22f, selector variables S23m = 1 and S23f = 0, genotype variables X21, X22, and the observed genotype {A1,A2}.]
Assume only individual 3 is genotyped. For the inheritance vector (0,1), the founder alleles L21m and L22f are not restricted by the data, while (L21f, L22m) have only two possible joint assignments, (A1,A2) or (A2,A1):
p(x21, x22, x23 | s23m=1, s23f=0) = p(A1)p(A2) + p(A2)p(A1)
In general, every inheritance vector defines a subgraph of the Bayesian network above. We build a founder graph.

Model for locus 2
[Figure: the Bayesian network for locus 2 with S23m = 1, S23f = 0, founder allele variables L21m, L21f, L22m, L22f, the allele variables L23m and L23f of individual 3, and the observed genotype X23 = {A1,A2}.]
In general, every inheritance vector defines a subgraph, as indicated by the black lines in the figure. Construct a founder graph whose vertices are the founder variables, with an edge between two vertices if they have a common typed descendant. The label of an edge is the constraint dictated by the common typed descendant. Now find all consistent assignments for every connected component.
[Figure: founder graph on L21m, L21f, L22m, L22f with an edge labeled {A1,A2}.]
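The locus-2 example can be checked by brute force. The sketch below does not build the founder graph; it simply enumerates every assignment of alleles to the founder variables and keeps the consistent ones, which gives the same sum for small cases. The function name, the index layout, and the allele frequencies are hypothetical.

```python
from itertools import product

def genotype_likelihood(n_founder_alleles, carried, observed, freqs):
    """Brute-force P(G | v) at one locus.

    carried[ind]  = indices of the two founder alleles that individual `ind`
                    inherits under the inheritance vector v
    observed[ind] = unordered genotype of a typed individual
    freqs         = population allele frequencies
    """
    alleles = list(freqs)
    total = 0.0
    for assignment in product(alleles, repeat=n_founder_alleles):
        # An assignment is consistent if every typed individual's two
        # inherited alleles reproduce its observed genotype.
        consistent = all(
            sorted((assignment[a], assignment[b])) == sorted(observed[ind])
            for ind, (a, b) in carried.items() if ind in observed
        )
        if consistent:
            prob = 1.0
            for allele in assignment:
                prob *= freqs[allele]
            total += prob
    return total

# Founder alleles indexed 0..3 for L21m, L21f, L22m, L22f.
# In the example, individual 3 carries L21f and L22m (indices 1 and 2).
freqs = {"A1": 0.3, "A2": 0.7}  # hypothetical frequencies
lik = genotype_likelihood(4, {3: (1, 2)}, {3: ("A1", "A2")}, freqs)
print(lik)  # approximately 2 * p(A1) * p(A2)
```

The unrestricted founder alleles sum out to 1, which is why only p(A1)p(A2) + p(A2)p(A1) remains. The founder graph reaches the same value without enumerating the unrestricted alleles.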

Lander-Green Algorithm (III)
Step 3: Compute the transition probabilities between inheritance vectors at consecutive markers, $P(v_{i+1} \mid v_i)$, then carry out the Markov-chain calculations in the standard way.

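The transition matrix
Recall that, for a single meiosis, the inheritance bit changes between consecutive markers with probability θ (a recombination) and stays the same with probability 1 − θ. Note that θ depends on the marker interval i, but this dependence is omitted. In our example, with one non-founder (n = 1), the transition probability table has size 4 × 4 = $2^{2n} \times 2^{2n}$, encoding the four combinations of recombination/non-recombination for the two parental meioses; it is the Kronecker product of the two single-meiosis 2 × 2 matrices. For n non-founders, the transition matrix is the n-fold Kronecker product of this 4 × 4 table (equivalently, the 2n-fold Kronecker product of the 2 × 2 matrix).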
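A sketch of the matrices this slide refers to, assuming the standard Lander-Green model in which each meiosis recombines independently with probability θ. The symbol $T_\theta$ and the state ordering 00, 01, 10, 11 are notation chosen here; $d$ is the number of positions in which $v_i$ and $v_{i+1}$ differ.

```latex
T_\theta = \begin{pmatrix} 1-\theta & \theta \\ \theta & 1-\theta \end{pmatrix},
\qquad
T_\theta \otimes T_\theta =
\begin{pmatrix}
(1-\theta)^2     & \theta(1-\theta) & \theta(1-\theta) & \theta^2 \\
\theta(1-\theta) & (1-\theta)^2     & \theta^2         & \theta(1-\theta) \\
\theta(1-\theta) & \theta^2         & (1-\theta)^2     & \theta(1-\theta) \\
\theta^2         & \theta(1-\theta) & \theta(1-\theta) & (1-\theta)^2
\end{pmatrix}

P(v_{i+1} \mid v_i) = \bigl(T_\theta^{\otimes 2n}\bigr)_{v_i,\,v_{i+1}}
                    = \theta^{d}\,(1-\theta)^{2n-d}
```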

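Efficient Product
So, if we start with a transition matrix of size $2^{2n} \times 2^{2n}$, one step needs about $2^{2n}$ multiplications, provided the products with the half-size matrix A are already in hand. Continuing recursively, at most 2n times, yields a complexity of $O(2n \cdot 2^{2n})$, far less than the $O(2^{4n})$ needed for regular multiplication. With n = 10 non-founders this is roughly $2 \times 10^{7}$ operations instead of roughly $10^{12}$, moving from an infeasible region to a feasible one.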
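One way to realize this saving is to apply the 2 × 2 single-meiosis matrix along one bit of the inheritance vector at a time. The sketch below is an iterative variant of the recursive idea on the slide, written in plain Python, with a dense reference version for comparison; both function names are hypothetical.

```python
def apply_transition(vec, theta):
    """Multiply a vector over inheritance vectors by the Kronecker-power
    transition matrix, one meiosis (bit) at a time.

    vec has length 2**m, where m = 2n is the number of meioses.
    Cost is O(m * 2**m) instead of O(4**m) for a dense product.
    """
    size = len(vec)
    out = list(vec)
    stride = 1
    while stride < size:
        nxt = out[:]
        for idx in range(size):
            partner = idx ^ stride  # same state with this one bit flipped
            nxt[idx] = (1 - theta) * out[idx] + theta * out[partner]
        out = nxt
        stride *= 2
    return out

def apply_transition_dense(vec, theta):
    """Reference version that uses every entry of the full matrix."""
    size = len(vec)
    m = size.bit_length() - 1
    result = []
    for j in range(size):
        total = 0.0
        for i in range(size):
            d = bin(i ^ j).count("1")  # number of recombinant meioses
            total += vec[i] * theta ** d * (1 - theta) ** (m - d)
        result.append(total)
    return result

vec = [0.1, 0.2, 0.3, 0.4]  # n = 1 non-founder, 4 states
fast = apply_transition(vec, 0.1)
dense = apply_transition_dense(vec, 0.1)
print(max(abs(a - b) for a, b in zip(fast, dense)))  # difference is rounding error only
```

The transition matrix is symmetric, so the same routine serves for both left and right multiplication.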

Summary
Quantities that the Lander-Green algorithm needs: inheritance vectors $v_i$ for each marker $M_i$; genotype probabilities $P(G_i \mid v_i)$; transition probabilities $P(v_{i+1} \mid v_i)$.
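These three quantities are exactly what a forward pass over the hidden Markov model consumes. A minimal sketch, assuming the states are indexed 0, 1, … and that the emission and transition tables have already been filled in; the toy numbers are hypothetical and do not come from a real pedigree.

```python
def forward_likelihood(emissions, transitions, prior):
    """Return P(G_1, ..., G_K) for a hidden Markov model.

    emissions[i][v]      = P(G_i | v_i = v)
    transitions[i][v][w] = P(v_{i+1} = w | v_i = v)
    prior[v]             = P(v_1 = v)
    """
    n_states = len(prior)
    alpha = [prior[v] * emissions[0][v] for v in range(n_states)]
    for i in range(1, len(emissions)):
        alpha = [
            emissions[i][w] * sum(alpha[v] * transitions[i - 1][v][w]
                                  for v in range(n_states))
            for w in range(n_states)
        ]
    return sum(alpha)

# Toy numbers (hypothetical): two hidden states, two markers.
theta = 0.1
T = [[1 - theta, theta], [theta, 1 - theta]]
emissions = [[0.5, 0.0], [0.25, 0.5]]
likelihood = forward_likelihood(emissions, [T], [0.5, 0.5])
print(likelihood)
```

In a real analysis the inner sum over v would be replaced by a fast Kronecker-structured product rather than an explicit loop over all state pairs.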

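Lander-Green with LD Markers
Steps 1 and 3 remain unchanged. Step 2 now needs to compute $P(G_1, G_2, \dots, G_M \mid v_{\text{cluster}})$, the joint probability of the genotypes at all M markers of a cluster given the single inheritance vector shared by that cluster.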

Probability of Observed Genotypes within a Cluster: $P(G_1, G_2, \dots, G_M \mid v_{\text{cluster}})$
INPUT: the genotypes $G_1, G_2, \dots, G_M$ at the markers in the cluster; the h distinct haplotypes in the population, with frequencies $p_1, \dots, p_h$; and $H_i$, the state of founder haplotype i (where i = 1, …, 2f).
OUTPUT: for each inheritance vector v, compute $P(G_1, \dots, G_M \mid p_1, \dots, p_h, v)$.

The probability of the genotypes given both the inheritance vector v and a configuration of founder haplotypes $H_1, \dots, H_{2f}$ is 1 if the haplotypes implied for each individual are compatible with the observed genotypes, and 0 otherwise. S(…) is the set of founder haplotype configurations compatible with the inheritance vector v and the observed genotype data $G_1, \dots, G_M$ (see the paper for details on S).
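Put together, the cluster probability is a sum, over compatible founder haplotype configurations, of the product of the founder haplotype frequencies. The sketch below computes that sum by brute force over all $h^{2f}$ configurations rather than by constructing S directly. The function name, the data layout, and the frequencies are hypothetical, and missing genotypes are not handled.

```python
from itertools import product

def cluster_likelihood(n_founder_haps, carried, observed, hap_freqs):
    """Brute-force P(G_1..G_M | p_1..p_h, v) for one cluster.

    carried[ind]  = indices of the two founder haplotypes individual `ind`
                    carries under inheritance vector v (no recombination
                    inside the cluster, so haplotypes pass down intact)
    observed[ind] = list of unordered genotypes, one per marker
    hap_freqs     = {haplotype string: population frequency}
    """
    haps = list(hap_freqs)
    total = 0.0
    for config in product(haps, repeat=n_founder_haps):
        # Indicator term: 1 if every typed individual's two haplotypes
        # reproduce its genotype at every marker, 0 otherwise.
        compatible = all(
            sorted((config[a][m], config[b][m])) == sorted(geno)
            for ind, (a, b) in carried.items() if ind in observed
            for m, geno in enumerate(observed[ind])
        )
        if compatible:
            prob = 1.0
            for hap in config:
                prob *= hap_freqs[hap]
            total += prob
    return total

# Two founders (four founder haplotypes), two markers, one typed child.
hap_freqs = {"AB": 0.5, "ab": 0.3, "Ab": 0.2}  # hypothetical frequencies
carried = {"child": (1, 2)}
observed = {"child": [("A", "a"), ("B", "b")]}
lik_cluster = cluster_likelihood(4, carried, observed, hap_freqs)
print(lik_cluster)  # approximately 2 * 0.5 * 0.3
```

Only the pairs (AB, ab) and (ab, AB) are compatible here, because the haplotype aB is absent from the assumed population.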

Estimation of Haplotype Frequencies in General Pedigrees
Founder haplotype frequencies for each cluster are generally unknown. A gene-counting EM algorithm is used to estimate the haplotype frequencies in each cluster.
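To illustrate the gene-counting idea, here is a deliberately simplified sketch for unrelated individuals. It is not the pedigree version described in the paper, which also has to account for inheritance vectors; the function names and the toy data are hypothetical.

```python
from itertools import product

def compatible_pairs(genotype, haplotypes):
    """Ordered haplotype pairs whose alleles match an unphased genotype."""
    return [
        (h1, h2) for h1, h2 in product(haplotypes, repeat=2)
        if all(sorted((h1[m], h2[m])) == sorted(g)
               for m, g in enumerate(genotype))
    ]

def em_haplotype_frequencies(genotypes, haplotypes, n_iter=50):
    """Gene-counting EM for unrelated individuals (simplified setting)."""
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haplotypes}
        for genotype in genotypes:
            pairs = compatible_pairs(genotype, haplotypes)
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            norm = sum(weights)
            # E-step: expected haplotype counts given current frequencies
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / norm
                counts[h2] += w / norm
        # M-step: renormalize expected counts into frequencies
        total = sum(counts.values())
        freqs = {h: c / total for h, c in counts.items()}
    return freqs

haplotypes = ["AB", "Ab", "aB", "ab"]
homozygotes = [[("A", "A"), ("B", "B")]] * 3 + [[("a", "a"), ("b", "b")]]
double_het = [[("A", "a"), ("B", "b")]]
est = em_haplotype_frequencies(homozygotes + double_het, haplotypes)
print(est)
```

The double heterozygote is phase-ambiguous, and the EM iterations shift its expected counts toward the AB/ab phase because the homozygous individuals support those two haplotypes.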

Experiments
Software package: MERLIN. Synthetic dataset: 500 sibships, each with three affected siblings and one genotyped parent. Real dataset: psoriasis (Stuart et al. 2005), with 3,158 individuals in 274 families from Germany and the USA, of whom 2,598 were genotyped.