Tutorial #6 by Ma’ayan Fishelson


Computing the Probability of Marker Genotypes Given an Inheritance Vector
Tutorial #6 by Ma’ayan Fishelson, based on notes by Terry Speed.
References:
Kenneth Lange, “Mathematical and Statistical Methods for Genetic Analysis”
Jurg Ott, “Analysis of Human Genetic Linkage”
http://www.accessexcellence.com/AB/GG/
http://www.nhgri.nih.gov/DIR/VIP/Glossary/index.html
http://www.tokyo-med.ac.jp/genet/index-e.htm

Background – Lander & Green’s HMM
Recombinations across successive intervals are independent, which enables sequential computation across loci using the forward-backward algorithm.
Complexity: linear in the number of loci and in the number of founders; exponential in the number of non-founders.
The algorithm that computes the probability of the data given an inheritance vector is linear in the number of founders.
The exponential cost arises because we need to sum over all possible inheritance vectors, whose number is exponential in the number of non-founders.

Goal
Compute Pr[m_l | v_l] at locus l, where v_l is a certain inheritance vector and m_l is the marker data at this locus (the evidence).

References
The algorithm presented here was introduced by Sobel and Lange [2] and by Kruglyak et al. [1].
[1] L. Kruglyak, M.J. Daly, M.P. Reeve-Daly, and E.S. Lander. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am. J. Hum. Genet., 58:1347-1363, 1996.
[2] E. Sobel and K. Lange. Descent graphs in pedigree analysis: applications to haplotyping, location score, and marker-sharing statistics. Am. J. Hum. Genet., 58:1323-1337, 1996.

Main Idea
Let a = (a_1, …, a_2f) be a vector of alleles assigned to the founder genes of the pedigree, where f is the number of founders.
We want to represent by a graph the restrictions that the observed marker genotypes impose on the vectors a that can be assigned to the founder genes.
The algorithm extracts from the graph only the vectors a that are compatible with the marker data.
Pr[m|v] is obtained as a sum over all compatible vectors a.

Example – marker data on a pedigree
[Figure: pedigree with individuals 1, 2, 11, 12, 13, 14, 21, 22, 23, 24; marker genotypes shown: a/b, b/d, a/c.]

Descent Graph
Corresponds to a specific inheritance vector.
Vertices: the individuals’ genes (2 genes for each individual in the pedigree).
Edges: represent the gene flow specified by the inheritance vector. A child’s gene is connected by an edge to the parent’s gene from which it flowed.

Example – Descent Graph (vertices)
Assume that the descent graph vertices in the figure represent the example pedigree.
[Figure: the pedigree next to its descent graph; vertices in the top row are numbered 1 to 8, and the genotype labels shown are (a,b), (a,b), (a,b), (a,b), (a,c), (b,d).]

Example – Descent Graph (cont.)
[Figure: descent graph with gene-flow edges; vertices numbered 1 to 8 and genotype labels (a,b), (a,b), (a,b), (a,b), (a,c), (b,d).]
Assume that paternally inherited genes are on the left and that non-founders are placed in increasing order.
A ‘1’ (‘0’) is used to denote a paternally (maternally) originated gene.
The gene flow in the figure corresponds to the inheritance vector v = (1,1; 0,0; 1,1; 1,1; 1,1; 0,0).

Founder Graph
Vertices: the founder genes.
Edges: connect the genes appearing together in a genotyped individual for the gene flow specified by the inheritance vector v.
Note: the edges are labeled with the genotype of the corresponding individuals.
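The construction can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation from the cited papers: the three-person pedigree, the names `founder_genes`, `parents`, `source_gene` and `founder_graph`, and the per-person layout of the inheritance vector are all assumptions made for the example. It follows the slides' convention that a '1' marks a paternally originated gene.

```python
# Hypothetical pedigree (not the one on the slides): two founders, one child.
# Every person carries two genes: slot 0 = paternal, slot 1 = maternal.
# Founder genes are numbered 1..2f; here f = 2.
founder_genes = {"father": (1, 2), "mother": (3, 4)}
parents = {"child": ("father", "mother")}  # non-founder -> (father, mother)

def source_gene(person, slot, v):
    """Follow the gene flow from (person, slot) back to a founder gene.

    v[person] holds one bit per gene of a non-founder: bit 1 means the
    parent transmitted its own paternal gene, bit 0 its maternal gene.
    """
    while person not in founder_genes:
        bit = v[person][slot]
        person = parents[person][slot]   # slot 0 -> father, slot 1 -> mother
        slot = 0 if bit == 1 else 1      # which of the parent's genes flowed
    return founder_genes[person][slot]

def founder_graph(genotypes, v):
    """Return labeled edges (gene_u, gene_w, genotype), one per genotyped person."""
    edges = []
    for person, genotype in genotypes.items():
        u = source_gene(person, 0, v)
        w = source_gene(person, 1, v)
        edges.append((u, w, frozenset(genotype)))
    return edges

v = {"child": (1, 0)}                                   # assumed inheritance vector
genotypes = {"father": ("a", "b"), "child": ("a", "c")}  # mother is untyped
edges = founder_graph(genotypes, v)
print(edges)
```

In this toy case the father contributes the edge between genes 1 and 2, and the child contributes the edge between gene 1 (the father's paternal gene) and gene 4 (the mother's maternal gene). Gene 3 is touched by no edge.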

Example – Founder Graph
[Figure: the descent graph and the founder graph derived from it. The founder graph has vertices 1 to 8 and edges labeled (a,b), (a,b), (a,b), (b,d), (a,c).]

Founder Graph
The founder graph consists of m connected components, C_1, …, C_m.
By construction, founder genes assigned to different components appear in different genotyped individuals.
Under random mating and Hardy-Weinberg equilibrium, the vectors of alleles assigned to different components are independent, so each component can be processed individually.
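Splitting the founder graph into components is an ordinary graph traversal. The sketch below uses a depth-first search; the edge list `EDGES` is an assumption, chosen so that its components match the ones listed in the slides' example, (2), (1,3,5) and (4,6,7,8), because the figure itself is not reproduced in the text.

```python
from collections import defaultdict

def connected_components(vertices, edges):
    """Return the connected components as sorted lists of founder genes."""
    adj = defaultdict(set)
    for u, w, _genotype in edges:
        adj[u].add(w)
        adj[w].add(u)
    seen, components = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                      # iterative depth-first search
            x = stack.pop()
            component.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        components.append(sorted(component))
    return components

# Assumed labeled edges (gene, gene, genotype) for founder genes 1..8.
EDGES = [(1, 3, "ab"), (3, 5, "ab"), (4, 6, "ab"), (6, 8, "bd"), (4, 7, "ac")]
components = connected_components(range(1, 9), EDGES)
print(components)
```

Gene 2 has no incident edge, so it ends up alone in its own component.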

Singleton Components
The vertices corresponding to genes that never passed through genotyped individuals form singleton components.
Any allele type can be assigned to a singleton component.
[Figure: founder graph on vertices 1 to 8 with edge labels (b,d), (a,b), (a,c); a singleton component is marked.]

Singleton Components (cont.)
[Figure: descent graph on vertices 1 to 8 with genotype labels (a,b), (a,b), (a,b), (a,b), (a,c), (b,d).]

Find compatible allelic assignments for non-singleton components
Identify the set of compatible alleles for each vertex. This is the intersection of the genotypes attached to the edges incident to the vertex.
Examples: {a,b} ∩ {a,b} = {a,b}; {a,b} ∩ {b,d} = {b}.
[Figure: founder graph on vertices 1 to 8 with edge labels (b,d), (a,b), (a,c).]
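The per-vertex intersection is a one-pass computation over the labeled edges. In the sketch below the edge list `EDGES` is again an assumption, consistent with the two intersections quoted above, and `compatible_alleles` is a hypothetical helper name.

```python
def compatible_alleles(edges):
    """Map each non-singleton vertex to the intersection of its edge genotypes."""
    allowed = {}
    for u, w, genotype in edges:
        alleles = set(genotype)
        for vertex in (u, w):
            if vertex in allowed:
                allowed[vertex] &= alleles     # intersect with earlier edges
            else:
                allowed[vertex] = set(alleles)
    return allowed

# Assumed labeled edges (gene, gene, genotype) for founder genes 1..8.
EDGES = [(1, 3, "ab"), (3, 5, "ab"), (4, 6, "ab"), (6, 8, "bd"), (4, 7, "ac")]
allowed = compatible_alleles(EDGES)
print(allowed[3], allowed[6])
```

Vertex 3 lies on two (a,b) edges and keeps both alleles, while vertex 6 lies on an (a,b) edge and a (b,d) edge and is forced to b.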

Find compatible allelic assignments for non-singleton components (cont.)
Utilize the whole structure of the component to find the allelic assignments compatible with the observed genotypes for the component:
Pick an arbitrary vertex in the component. If the set of compatible alleles for that vertex contains one element, select that allele type. Otherwise, repeat the traversal step for each of the 2 allele types.
Traverse the graph and record the allele assigned to each vertex to obtain a compatible allelic assignment (once one allele type is selected, the allele types of the adjacent vertices are determined, and so on).
If an incompatibility is encountered at some point, there is no compatible assignment for the allele type we started from.
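The traversal can be sketched as follows. This is an illustrative version under stated assumptions, not the code of the cited papers: genotypes are strings such as "ab" (a homozygous genotype would be "aa"), `component_assignments` is a hypothetical name, and the edge list is assumed so that the results agree with the assignments listed in the slides' example.

```python
from collections import defaultdict

def component_assignments(component, edges):
    """List the allelic assignments of one non-singleton component that are
    compatible with every edge genotype (0, 1 or 2 tuples, ordered like
    `component`)."""
    adj = defaultdict(list)
    for u, w, genotype in edges:
        if u in component:
            adj[u].append((w, set(genotype)))
            adj[w].append((u, set(genotype)))
    start = component[0]                      # arbitrary starting vertex
    candidates = set.intersection(*(g for _, g in adj[start]))
    results = []
    for allele in sorted(candidates):         # at most 2 allele types
        assign, stack, ok = {start: allele}, [start], True
        while stack and ok:
            x = stack.pop()
            for y, g in adj[x]:
                if assign[x] not in g:        # x's allele violates this edge
                    ok = False
                    break
                rest = g - {assign[x]}
                needed = rest.pop() if rest else assign[x]  # homozygous case
                if y not in assign:
                    assign[y] = needed        # neighbour's allele is forced
                    stack.append(y)
                elif assign[y] != needed:     # clash with an earlier choice
                    ok = False
                    break
        if ok:
            results.append(tuple(assign[x] for x in component))
    return results

# Assumed labeled edges (gene, gene, genotype) for founder genes 1..8.
EDGES = [(1, 3, "ab"), (3, 5, "ab"), (4, 6, "ab"), (6, 8, "bd"), (4, 7, "ac")]
print(component_assignments([1, 3, 5], EDGES))
print(component_assignments([4, 6, 7, 8], EDGES))
```

Starting from a single vertex is enough because each edge genotype fixes the neighbour's allele once one endpoint is known, which is why a component never yields more than two assignments.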

Possible Allelic Assignments (example)
[Figure: founder graph on vertices 1 to 8 with the set of compatible alleles shown next to each vertex.]
Graph Component    Allelic Assignments
(2)                (a), (b), (c), (d)
(1,3,5)            (a,b,a), (b,a,b)
(4,6,7,8)          (a,b,c,d)

Compatible Allelic Assignments
Denote by A_1, …, A_m the sets of compatible allelic assignments obtained for the connected components at the end of the algorithm.
Except for singleton components, each A_i contains 0, 1, or 2 assignments.
If A_i is empty for some i, then Pr[m|v] = 0.
The compatible assignments are those in the Cartesian product A_1 × … × A_m.

Computing Pr[m|v]
The probability of a singleton component is 1 (any allele type may be assigned, and the allele frequencies sum to 1), so singleton components can be ignored.
Let a_hi be an element of A_i (a vector of alleles assigned to the vertices of component C_i).
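Written out, one form consistent with these definitions and with the independence of components is given below. It is a reconstruction under the assumption that founder alleles are drawn independently according to population allele frequencies; the symbol $q(\cdot)$ for an allele frequency and the index $j$ over the vertices of $C_i$ are notation introduced here.

```latex
\Pr[m \mid v] \;=\; \prod_{i=1}^{m} \; \sum_{a_{h_i} \in A_i} \Pr[a_{h_i}],
\qquad
\Pr[a_{h_i}] \;=\; \prod_{j \in C_i} q\!\left(a_{h_i}^{(j)}\right)
```

Here $a_{h_i}^{(j)}$ is the allele that assignment $a_{h_i}$ places on vertex $j$. An empty $A_i$ contributes an empty sum, which gives $\Pr[m \mid v] = 0$.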

Computing Pr[m|v] – Complexity
The product is over 2f elements.
The summation contains at most 2 terms.
The maximum number of operations is 4f.
The computation scales linearly with the number of founders.
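A short numerical sketch of this computation follows. It assumes that the probability of an assignment is the product of the population frequencies of its alleles (founder alleles drawn independently); the function name `prob_markers_given_v` and the frequencies in `freq` are hypothetical, and the assignment sets are those of the two non-singleton components in the example.

```python
from math import prod

def prob_markers_given_v(assignment_sets, freq):
    """Product over components of the summed probabilities of the
    compatible assignments; singleton components are simply left out."""
    total = 1.0
    for assignments in assignment_sets:       # one set A_i per component
        total *= sum(prod(freq[allele] for allele in assignment)
                     for assignment in assignments)
    return total

freq = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}   # assumed allele frequencies
A = [
    [("a", "b", "a"), ("b", "a", "b")],           # component (1,3,5)
    [("a", "b", "c", "d")],                       # component (4,6,7,8)
]
p = prob_markers_given_v(A, freq)
print(p)
```

Each component contributes at most two terms, and each term multiplies one frequency per founder gene in the component, so the work grows with the number of founder genes. An empty assignment set makes the whole product zero.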