Tutorial #6 by Ma’ayan Fishelson Based on notes by Terry Speed.

Background – Lander & Green’s HMM. Recombinations across successive intervals are independent ⇒ sequential computation across loci using the forward-backward algorithm is enabled. Complexity: linear in the number of loci and in the number of founders; exponential in the number of non-founders. The algorithm computing the probability of the data given an inheritance vector is linear in the number of founders, but we need to sum over all possible inheritance vectors, and their number is exponential in the number of non-founders.

Goal. Compute Pr[m_l | v_l] at locus l, where m_l is the marker data at this locus (the evidence) and v_l is a certain inheritance vector.

References. The algorithm presented herein was introduced by Sobel and Lange [1] and Kruglyak et al. [2].
1. E. Sobel and K. Lange. Descent graphs in pedigree analysis: applications to haplotyping, location score, and marker-sharing statistics. Am. J. Hum. Genet., 58, 1996.
2. L. Kruglyak, M.J. Daly, M.P. Reeve-Daly, and E.S. Lander. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am. J. Hum. Genet., 58, 1996.

Main Idea. Let a = (a_1, …, a_2f) be a vector of alleles assigned to the founder genes of the pedigree (f is the number of founders). We want to represent by a graph the restrictions that the observed marker genotypes impose on the vectors a that can be assigned to the founder genes. The algorithm extracts from the graph only the vectors a compatible with the marker data. Pr[m|v] is obtained via a sum over all compatible vectors a.

Example – marker data on a pedigree. [Figure: pedigree annotated with the observed marker genotypes a/b, a/b, b/d, and a/c.]

Descent Graph. Corresponds to a specific inheritance vector. Vertices: the individuals’ genes (2 genes for each individual in the pedigree). Edges: represent the gene flow specified by the inheritance vector. A child’s gene is connected by an edge to the parent’s gene from which it flowed.

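Example – Descent Graph (vertices). Assume that the descent graph vertices below represent the pedigree on the left. [Figure: the pedigree with marker genotypes a/b, a/b, b/d, a/c on the left; the descent graph vertices, with genotype labels (a,b), (a,c), (b,d), (a,b), below.]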

Example – Descent Graph (cont.). [Figure: descent graph with genotype labels (a,b), (a,c), (b,d), (a,b).]
1. Assume that paternally inherited genes are on the left.
2. Assume that non-founders are placed in increasing order.
3. A ‘1’ (‘0’) is used to denote a paternally (maternally) originated gene.
⇒ The gene flow above corresponds to the inheritance vector v = (1,1; 0,0; 1,1; 1,1; 1,1; 0,0).
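The bit convention can be illustrated with a short sketch. This is not the tutorial's implementation: the two-founder pedigree, the individual labels, the function name `trace_founder_genes`, and the choice to read the first bit of each pair as the paternal meiosis are all assumptions made for illustration. Founder genes are simply numbered in the order the founders appear.

```python
def trace_founder_genes(parents, vector, order):
    """Follow the gene flow of one inheritance vector.

    parents: dict child -> (father, mother); founders have no entry.
    vector:  dict child -> (paternal_bit, maternal_bit), where 1 means the
             transmitted gene is the parent's paternally originated gene
             and 0 means the parent's maternally originated gene.
    order:   all individuals, parents listed before their children.
    Returns dict individual -> (founder gene in paternal slot,
                                founder gene in maternal slot).
    """
    carried = {}
    next_gene = 0
    for person in order:
        if person not in parents:
            # A founder carries two fresh founder genes.
            carried[person] = (next_gene, next_gene + 1)
            next_gene += 2
        else:
            father, mother = parents[person]
            pat_bit, mat_bit = vector[person]
            # Slot 0 holds a parent's paternal gene, slot 1 the maternal one.
            from_father = carried[father][0 if pat_bit == 1 else 1]
            from_mother = carried[mother][0 if mat_bit == 1 else 1]
            carried[person] = (from_father, from_mother)
    return carried


# Hypothetical pedigree: founders 1 and 2, children 3 and 4.
parents = {3: (1, 2), 4: (1, 2)}
vector = {3: (1, 1), 4: (0, 0)}
carried = trace_founder_genes(parents, vector, order=[1, 2, 3, 4])
print(carried)
```

Each non-founder ends up labeled with the two founder genes it carries, which is the information the descent graph encodes through its edges.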

Founder Graph. Vertices: the founder genes. Edges: connect the genes appearing together in a genotyped individual for the gene flow specified by the inheritance vector v. Note: the edges are labeled with the genotype of the corresponding individuals.
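A minimal sketch of this construction follows. It assumes the gene flow has already been traced, so that for every individual we know which two founder genes it carries; the mapping `carried`, the genotype data, and the function name `build_founder_graph` are hypothetical.

```python
def build_founder_graph(carried, genotypes):
    """Return the labeled edges of the founder graph.

    carried:   dict individual -> (founder gene, founder gene) implied by
               the inheritance vector.
    genotypes: dict genotyped individual -> (allele, allele).
    Each genotyped individual contributes one edge between the two founder
    genes it carries, labeled with its observed genotype.
    """
    edges = []
    for person, genotype in genotypes.items():
        gene_1, gene_2 = carried[person]
        edges.append((gene_1, gene_2, frozenset(genotype)))
    return edges


# Hypothetical data: founders 1 and 2 carry genes 0..3; only the
# children 3 and 4 are genotyped.
carried = {1: (0, 1), 2: (2, 3), 3: (0, 2), 4: (1, 3)}
genotypes = {3: ("a", "b"), 4: ("a", "c")}
edges = build_founder_graph(carried, genotypes)
print(edges)
```

Storing the label as a set means a homozygous genotype such as a/a becomes the one-element label {a}.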

Example – Founder Graph. [Figure: the descent graph, with genotype labels (a,b), (a,c), (b,d), (a,b), next to the founder graph derived from it, with edges labeled (b,d), (a,b), (a,c), (a,b).]

Founder Graph. Includes m connected components, C_1, …, C_m. The founder genes assigned to different components appear in different genotyped individuals, by construction. Under random mating and Hardy-Weinberg equilibrium, the vectors of alleles assigned to different components are independent ⇒ each component can be processed individually.
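Splitting the founder graph into its components is a standard graph traversal. The sketch below uses an iterative depth-first search; the edge list and the function name `connected_components` are made up for illustration, and founder genes are assumed to be numbered 0 to 2f-1.

```python
def connected_components(num_genes, edges):
    """Return the connected components of the founder graph.

    num_genes: number of founder genes (vertices 0 .. num_genes - 1).
    edges:     iterable of (gene, gene, genotype_label) triples.
    Vertices without any edge come out as singleton components.
    """
    adjacency = {gene: set() for gene in range(num_genes)}
    for u, v, _label in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in range(num_genes):
        if start in seen:
            continue
        # Depth-first search from an unvisited vertex.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            gene = stack.pop()
            component.append(gene)
            for neighbor in adjacency[gene]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical founder graph on 6 founder genes.
edges = [
    (0, 2, frozenset({"a", "b"})),
    (2, 4, frozenset({"a", "b"})),
    (1, 3, frozenset({"a", "c"})),
]
components = connected_components(6, edges)
print(components)
```

In this made-up example gene 5 never passes through a genotyped individual, so it forms a component of its own.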

Singleton Components. The vertices corresponding to genes that never passed through genotyped individuals form singleton components. Any allele type can be assigned to a singleton component. [Figure: founder graph with edges labeled (b,d), (a,b), (a,c), (a,b) and one singleton component marked.]

Singleton Components (cont.). [Figure: descent graph with genotype labels (a,b), (a,c), (b,d), (a,b).]

Find compatible allelic assignments for non-singleton components. 1. Identify the set of compatible alleles for each vertex. This is the intersection of the genotypes attached to the edges incident to the vertex. Examples from the founder graph with edges labeled (b,d), (a,b), (a,c), (a,b): {a,b} ∩ {a,b} = {a,b} and {a,b} ∩ {b,d} = {b}.
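Step 1 amounts to one set intersection per vertex. The following sketch reproduces the two intersections quoted on the slide; the vertex numbering, the path-shaped edge list, and the function name `compatible_alleles` are assumptions chosen for illustration.

```python
def compatible_alleles(edges):
    """Intersect, for every vertex, the genotype labels of its incident edges.

    edges: iterable of (gene, gene, genotype_label) triples, where the
           label is a set of one or two alleles.
    Returns dict vertex -> set of alleles compatible with all its edges.
    Vertices that appear in no edge (singletons) are not included.
    """
    compatible = {}
    for u, v, genotype in edges:
        for vertex in (u, v):
            if vertex in compatible:
                compatible[vertex] = compatible[vertex] & genotype
            else:
                compatible[vertex] = set(genotype)
    return compatible


# Hypothetical path 0 - 1 - 2 - 3 with labels (a,b), (a,b), (b,d).
edges = [
    (0, 1, frozenset({"a", "b"})),
    (1, 2, frozenset({"a", "b"})),
    (2, 3, frozenset({"b", "d"})),
]
compat = compatible_alleles(edges)
print(compat[1])  # {a,b} ∩ {a,b}
print(compat[2])  # {a,b} ∩ {b,d}
```

An empty intersection at any vertex already shows that the component has no compatible assignment.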

Find compatible allelic assignments for non-singleton components (cont.). 2. Utilize the whole structure of the component to find allelic assignments compatible with the observed genotypes for the component.
I. Pick an arbitrary vertex in the component.
II. If the set of compatible alleles for that vertex contains one element ⇒ select that allele type. Otherwise, repeat step III for each of the 2 allele types.
III. Traverse the graph and record the alleles assigned to each vertex to obtain a compatible allelic assignment (once one allele type is selected, the allele types of the adjacent vertices are determined, and so on).
IV. If an incompatibility is encountered at some point ⇒ there is no compatible assignment for the allele type we started from.
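Steps I to IV can be sketched as a propagation over one non-singleton component. This is an illustrative reading of the steps, not the tutorial's code: the function name `component_assignments` and the example components are hypothetical, and the sketch assumes every vertex in `vertices` has at least one incident edge.

```python
def component_assignments(vertices, edges):
    """Enumerate the allelic assignments compatible with one component.

    vertices: list of founder genes in a non-singleton component.
    edges:    (gene, gene, genotype_label) triples inside the component.
    Returns a list of tuples, one allele per vertex in `vertices` order.
    """
    adjacency = {v: [] for v in vertices}
    for u, v, genotype in edges:
        adjacency[u].append((v, genotype))
        adjacency[v].append((u, genotype))

    start = vertices[0]  # step I: an arbitrary vertex
    # Step II: candidate alleles are the intersection of incident labels.
    candidates = set.intersection(*(set(g) for _, g in adjacency[start]))

    results = []
    for allele in sorted(candidates):
        assignment = {start: allele}
        stack = [start]
        consistent = True
        while stack and consistent:  # step III: traverse and propagate
            u = stack.pop()
            for v, genotype in adjacency[u]:
                x = assignment[u]
                if x not in genotype:
                    consistent = False  # step IV: incompatibility
                    break
                rest = set(genotype) - {x}
                # Heterozygous label forces the other allele on v;
                # a homozygous label forces the same allele.
                y = rest.pop() if rest else x
                if v not in assignment:
                    assignment[v] = y
                    stack.append(v)
                elif assignment[v] != y:
                    consistent = False  # step IV: incompatibility
                    break
        if consistent:
            results.append(tuple(assignment[v] for v in vertices))
    return results


ab = frozenset({"a", "b"})
# Path 1 - 3 - 5 with both edges labeled (a,b).
path_result = component_assignments([1, 3, 5], [(1, 3, ab), (3, 5, ab)])
# Triangle with all edges labeled (a,b): no assignment can satisfy it.
triangle_result = component_assignments(
    [0, 1, 2], [(0, 1, ab), (1, 2, ab), (0, 2, ab)]
)
print(path_result, triangle_result)
```

Because fixing one allele determines all the others, the loop over candidates runs at most twice, which is why a component yields 0, 1, or 2 assignments.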

Possible Allelic Assignments (example). [Figure: founder graph with edges labeled (b,d), (a,b), (a,c), (a,b) and vertex sets of compatible alleles {a,b}, {a,c}, {a}, {b}, {b,d}, {a,b,c,d}.]
Graph component    Allelic assignments
(2)                (a), (b), (c), (d)
(1,3,5)            (a,b,a), (b,a,b)
(4,6,7,8)          (a,b,c,d)

Compatible Allelic Assignments. Denote by A_1, …, A_m the sets of compatible allelic assignments obtained for the connected components at the end of the algorithm. Except for singleton components, each A_i contains 0, 1, or 2 assignments. If A_i is empty for some i ⇒ Pr[m|v] = 0. The compatible assignments are those in the Cartesian product A_1 × … × A_m.

Computing Pr[m|v]. The probability of singleton components is 1 ⇒ we can ignore them. Let a_hi be an element of A_i (a vector of alleles assigned to the vertices of component C_i).
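The formula on this slide did not survive in the transcript. A reading consistent with the surrounding text is that Pr[m|v] is the product, over the non-singleton components, of the sum over a_hi in A_i of Pr[a_hi], where Pr[a_hi] is the product of the population frequencies of the alleles in a_hi. The sketch below computes that quantity under this assumption; the allele frequencies, the assignment sets, and the function name `likelihood_given_vector` are hypothetical.

```python
from math import prod


def likelihood_given_vector(assignment_sets, freq):
    """Compute Pr[m|v] from the per-component compatible assignments.

    assignment_sets: list with one entry per non-singleton component;
                     each entry is a list of allele tuples (the set A_i).
    freq:            dict allele -> population frequency.
    An empty A_i contributes a factor of 0, so the result is 0.
    """
    likelihood = 1.0
    for assignments in assignment_sets:
        # Sum over the (at most two) compatible assignments of a component,
        # each weighted by the product of its allele frequencies.
        likelihood *= sum(
            prod(freq[allele] for allele in assignment)
            for assignment in assignments
        )
    return likelihood


# Hypothetical allele frequencies and assignment sets.
freq = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
assignment_sets = [
    [("a", "b", "a"), ("b", "a", "b")],  # e.g. component (1,3,5)
    [("a", "b", "c", "d")],              # e.g. component (4,6,7,8)
]
result = likelihood_given_vector(assignment_sets, freq)
print(result)
```

Singleton components are left out of `assignment_sets` because summing the frequencies of all alleles gives 1.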

Computing Pr[m|v] – Complexity. The summation contains at most 2 terms. The product is over 2f elements. The maximum number of operations is 4f. The computation scales linearly with the number of founders.