Allegro, a new computer program for linkage analysis. Guy Grebla.


2 Overview
- What is Allegro
- Allegro vs. Genehunter
- Reduced inheritance vectors
- Founder couple reduction
- Fast tree traversal
  - Formalization
  - Calculation of $S_{\text{pairs}}$
- Single locus probability calculation (if time permits)

3 What is Allegro
Allegro is based on Genehunter. It runs faster than Genehunter thanks to algorithmic improvements.

4 Allegro vs. Genehunter (1)
Allegro runs much faster than Genehunter: typically the speedup is fold, and in many cases as high as 100-fold.
If necessary, Allegro can, at a cost of 10-30% in run time, cut down the memory requirements by a factor of compared with Genehunter.

5 Allegro vs. Genehunter (2)
Recall that Genehunter's time complexity is exponential in the pedigree size, so running it on large pedigrees is infeasible.
Thanks to its algorithmic improvements, Allegro can handle significantly larger pedigrees, even though its time complexity is still exponential in the pedigree size.

6 Reduced inheritance vectors - the idea
The idea is based on the symmetry that exists between the two alleles of a founder.
[Figure: pedigree with founder alleles n1 and n2, showing the example vectors V=(0,1,1,0) and V=(1,1,0,0).]

7 Reduced inheritance vectors
For a male (female) founder, the paternal (maternal) bit of his (her) first child is set to 0 and is not expressed in the reduced vector; such a bit is called hidden.
Result: with m non-founders and f founders, the vector size is reduced from 2m to 2m-f.

8 Reduced inheritance vectors (Cont.)
[Figure: three-generation pedigree (generations I, II, III) with founder alleles n1 to n6; genotyped individuals are labeled a/b, a/b, a/b, a/c and b/c, and hidden bits are shown in brackets.]

9 Founder couple reduction
Consider a founder couple in which:
- the couple has at least one grandchild;
- neither founder is genotyped;
- neither founder is married twice.

10 Founder couple reduction (Cont.)
v* is like v, except that:
- the corresponding bit of each of the grandchildren is inverted;
- the paternal and maternal bits of each child are switched.
[Figure: pedigree with founder alleles n1 to n4, marking the corresponding bits.]
v and v* have the same probability.

11 Founder couple reduction - results
With the founder couple reduction, the effective number of bits is 2m-f-c, where c is the number of founder couples satisfying the stated conditions.
We have therefore improved by a factor of $2^c$ over the previous reduction.
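The bit counts from the two reductions can be checked with a few lines of Python. This is only an illustrative sketch: the function names `effective_bits` and `num_vectors` and the pedigree sizes in the usage lines are assumptions made for the example, not part of Allegro.

```python
def effective_bits(m, f, c=0):
    """Effective vector length after the reductions on the slides.

    m: number of non-founders, f: number of founders,
    c: number of founder couples that qualify for the couple reduction.
    """
    return 2 * m - f - c


def num_vectors(m, f, c=0):
    """Number of inheritance vectors that still have to be enumerated."""
    return 2 ** effective_bits(m, f, c)


# Hypothetical pedigree: 5 non-founders, 3 founders, 1 qualifying couple.
full = 2 ** (2 * 5)               # no reduction: 2m bits
reduced = num_vectors(5, 3)       # founder reduction only: 2m-f bits
coupled = num_vectors(5, 3, c=1)  # founder and couple reduction: 2m-f-c bits
print(full, reduced, coupled)
```

Each qualifying founder couple halves the number of vectors once more, which is the factor of $2^c$ stated above.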

12 Fast tree traversal
The algorithms implemented in the Genehunter program loop over inheritance vectors in the outermost loop and over the people in the pedigree in an inner loop.
Drawback: for vectors that differ only in some branches of the pedigree, part of the calculation is duplicated.

13 Fast tree traversal (Cont.)
Idea: change the order of looping to avoid the repeated calculations.

14 Fast tree traversal - naïve example
Say we want to calculate, for each vector v of length n, the number of 1's in v.
- "Genehunter" method: for each vector, calculate the number of 1's by adding each bit of the vector to the sum.
- "Allegro" method: pass over the vectors and reuse calculations along the way.

15 Naïve example - Allegro method
Fewer additions!
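The saving in the naïve example can be made concrete with a small Python sketch. It is an illustration of the counting argument only, not code from either program; the function names and the choice n = 4 are assumptions for the example.

```python
def count_ones_naive(n):
    """'Genehunter'-style: for every vector, add up all n of its bits."""
    counts, additions = {}, 0
    for v in range(2 ** n):
        total = 0
        for k in range(n):
            total += (v >> k) & 1
            additions += 1
        counts[v] = total
    return counts, additions


def count_ones_tree(n):
    """'Allegro'-style: walk the binary tree of prefixes, reusing partial sums."""
    counts, additions = {}, 0

    def descend(prefix, depth, ones):
        nonlocal additions
        if depth == n:
            counts[prefix] = ones  # leaf: a complete vector
            return
        for b in (0, 1):
            additions += 1         # one addition per tree edge
            descend((prefix << 1) | b, depth + 1, ones + b)

    descend(0, 0, 0)
    return counts, additions


naive_counts, naive_adds = count_ones_naive(4)
tree_counts, tree_adds = count_ones_tree(4)
print(naive_adds, tree_adds)
```

The naive loop performs n·2^n additions, while the tree walk performs one addition per tree edge, 2^(n+1) - 2 in total, because vectors that share a prefix also share the partial sum for that prefix.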

16 Fast tree traversal - formalization
For each inheritance vector v, S(v) is known.
We traverse the pedigree from the top down. When a child is born:
- if it has i hidden bits, there are $2^{2-i}$ possibilities for its bits;
- for each possibility, the inheritance vector is updated accordingly and the branch is descended.
Notation:
- we add a bit b to update vector v to $v^+$;
- D(v) is a collection of data;
- $N = 2^{2m-f}$ is the number of possible inheritance vectors.

17 Fast tree traversal - formalization (2)
Recursive algorithm:
addbit(v, D, b):
    for b = 0, 1 do
        set v+ = (v, b) and calculate D+ = D(v+)
        if there are more bits, addbit(v+, D+, next bit)
        else D+ contains the data for S(v+)
If the calculations of D+ and S are both O(1), then the total time complexity of the calculation is O(N).
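The recursion above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Allegro's implementation: the names `traverse`, `update` and `score` are hypothetical, hidden bits are ignored, and the data D is whatever the caller chooses to carry down the tree.

```python
def traverse(num_bits, init_data, update, score):
    """Enumerate all bit vectors depth-first, carrying data D down the tree.

    update(data, position, bit) returns D+ for the extended vector v+.
    score(data) turns the data held at a leaf into the statistic S(v).
    Both are assumed to run in O(1), so the whole walk is O(N), N = 2**num_bits.
    """
    results = {}

    def addbit(v, data, position):
        for b in (0, 1):
            v_plus = v + (b,)
            d_plus = update(data, position, b)
            if position + 1 < num_bits:
                addbit(v_plus, d_plus, position + 1)
            else:
                results[v_plus] = score(d_plus)

    addbit((), init_data, 0)
    return results


# Usage: a toy statistic, the sum of the 1-based positions holding a 1.
weights = traverse(3, 0, lambda d, pos, b: d + b * (pos + 1), lambda d: d)
print(weights[(1, 0, 1)])
```

Passing `update` and `score` as functions keeps the traversal independent of the statistic being computed, which mirrors the role of D on the slide.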

18 Example - calculation of $S_{\text{pairs}}$
- $\phi_{ij}(p,q) = 1$ if allele i of p and allele j of q are IBD, and 0 otherwise
- $S_{pq}(v) = \sum_{i=0}^{1} \sum_{j=0}^{1} \phi_{ij}(p,q)$
- $S_{\text{pairs}}(v) = \sum_{(p,q)} S_{pq}(v)$, where the sum runs over all pairs (p,q) of affecteds
- $k_i$: the number of times founder allele i turns up among the affected
- s: the value of $S_{\text{pairs}}$ for the traversed portion
- $D = (s, k_1, k_2, \ldots, k_{2f})$

19 Example (Cont.)
When an unaffected person is added, do nothing: $s^+ = s$, $k_i^+ = k_i$, $k_j^+ = k_j$.
When an affected person carrying founder alleles i and j is added, perform:
- $s^+ \leftarrow s + k_i + k_j$
- $k_i^+ \leftarrow k_i + 1$
- $k_j^+ \leftarrow k_j + 1$

20 Example (Cont.)
[Figure: three-generation pedigree (generations I, II, III) with founder alleles n1 to n6; genotyped individuals are labeled a/b, a/b, a/b, a/c and b/c, and hidden bits are shown in brackets.]
V=(0,1,1,1,1)
- Init (no vector bits): s=1, $k_1$=1, $k_3$=2, $k_4$=1
- III 1 is added: s=2, $k_1$=1, $k_3$=2, $k_4$=2, $k_5$=1
- III 2 is added: s=4, $k_1$=1, $k_3$=2, $k_4$=3, $k_5$=1, $k_6$=1
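The running update can be compared against the pairwise definition in a short Python sketch. The allele pairs below are an assumption: they were chosen so that the counters reproduce the values on the slide, and the slide itself does not state which person carries which founder alleles.

```python
from collections import defaultdict
from itertools import combinations


def s_pairs_naive(affected):
    """Sum S_pq over all affected pairs: count matching founder alleles."""
    total = 0
    for p, q in combinations(affected, 2):
        total += sum(1 for x in p for y in q if x == y)
    return total


def s_pairs_incremental(affected):
    """Running update: s <- s + k_i + k_j, then k_i <- k_i + 1, k_j <- k_j + 1."""
    s = 0
    k = defaultdict(int)  # k[i]: copies of founder allele i seen so far
    for i, j in affected:
        s += k[i] + k[j]
        k[i] += 1
        k[j] += 1
    return s, dict(k)


# Assumed founder alleles of the affecteds, in the order they are added;
# the last two entries play the roles of III 1 and III 2.
affected = [(1, 3), (3, 4), (4, 5), (4, 6)]
s, k = s_pairs_incremental(affected)
print(s, k, s_pairs_naive(affected))
```

The incremental version does constant work per affected person added, whereas the pairwise version touches every affected pair.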

21 $S_{\text{pairs}}$ calculation - Genehunter vs. Allegro
Genehunter calculates $S_{\text{pairs}}$ by calculating $S_{pq}$ for each affected pair and adding it to $S_{\text{pairs}}$.
This process requires $O(N\alpha^2)$ time, where $\alpha$ is the number of affecteds.
We saved a factor of $\alpha^2$ (!)

22 Additional improvements
Allegro uses the FFT for matrix multiplication. Some classical computational techniques have been used to speed up the FFT by a factor of three or four.

23 References
- Daniel F. Gudbjartsson, Kristjan Jonasson, Michael L. Frigge, Augustine Kong. "Fast multipoint linkage analysis and the program Allegro".
- Gudbjartsson DF, Jonasson K, Frigge ML, Kong A. "Allegro, a new computer program for linkage analysis". Nat Genet May;25(1):12-3.

24 BACKUP

25 Single locus probability calculation
Goal: compute $\Pr[m_l \mid v_l]$ at locus l for every vector $v_l$, where:
- $m_l$ is the marker data at this locus (the evidence);
- $v_l$ is a certain inheritance vector.

26 Single locus probability calculation (Cont.)
In general: $p(m_l \mid v_l) = \sum_{a \in P} \prod_{i=1}^{2f} p(a_i)$, where P is the set of possible allele assignments $a = (a_1, \ldots, a_{2f})$ to $(n_1, \ldots, n_{2f})$.
This probability may be calculated for each $v_l$ using fast tree traversal.
Denote $p(m_l \mid v_l)$ by q(v).
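For very small cases the sum over allele assignments can be evaluated by brute force, which is useful as a reference when checking a faster method. The Python sketch below is an assumption-laden illustration, not the Allegro algorithm: the function name, the 0-based founder allele indices, the allele frequencies and the `observed` encoding (which founder alleles each genotyped person inherits under the given vector, plus that person's unordered genotype) are all hypothetical.

```python
from itertools import product


def single_locus_prob(num_founder_alleles, freqs, observed):
    """Brute-force p(m | v): sum over founder allele assignments.

    freqs: dict mapping each allele to its population frequency.
    observed: list of ((i, j), (x, y)) entries, one per genotyped person,
        where i and j index the founder alleles the person inherits under
        the vector v and {x, y} is the person's unordered genotype.
    """
    total = 0.0
    for a in product(list(freqs), repeat=num_founder_alleles):
        # Keep only assignments compatible with every observed genotype.
        if all(sorted((a[i], a[j])) == sorted(g) for (i, j), g in observed):
            p = 1.0
            for allele in a:
                p *= freqs[allele]
            total += p
    return total


# One founder (two founder alleles) whose genotype a/b is observed directly.
q_one = single_locus_prob(2, {"a": 0.5, "b": 0.5}, [((0, 1), ("a", "b"))])

# Four founder alleles, two genotyped people sharing founder allele 0.
freqs = {"a": 0.2, "b": 0.3, "c": 0.5}
q_two = single_locus_prob(
    4, freqs, [((0, 2), ("a", "b")), ((0, 3), ("a", "c"))]
)
print(q_one, q_two)
```

In the first case both orderings of a/b are compatible, giving 2·p(a)·p(b). In the second, the shared founder allele must be a, and the unconstrained founder allele sums out to 1.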

27 Single locus probability - notations
[Figure: three-generation pedigree (generations I, II, III) with the founder nodes n1 to n6 marked; genotyped individuals are labeled a/b, a/b, a/b, a/c and b/c.]
Assume the founder nodes are numbered: node $n_i$ is numbered i.

28 Single locus probability - notations (2)
Founder nodes are classified into 3 disjoint sets:
- A: assigned nodes.
- E: contains edges; each edge is labeled with 2 distinct alleles.
- U: unassigned nodes.
$a_i$ is the allele assigned to i ($i \in A$).

29 Single locus probability - initialization
Init:
- E ← nodes of genotyped founders (edges)
- U ← the rest of the founder nodes
- A ← nil (empty)
- q(v) ← 0
Goal: build a founder graph. From the graph we can calculate q(v).

30 Single locus probability - algorithm
When a person genotyped a/b is added:
- The value of v (so far) determines the sources of the person's alleles among the founders.
- Denote the corresponding founders by i and j, and consider the edge (i,j).

31 Single locus probability - algorithm (2)
There are 6 options for edge (i,j).
[Figure: the six possible cases for edge (i,j), according to whether each of i and j lies in A, U or E.]

32 Single locus probability - case by case
- Case 1: put (i,j) in E and remove i and j from U.
- Case 2: check whether $\{a, b\} = \{a_i, a_j\}$.
- Case 3: check whether $a_i$ is one of a and b; if it is, assign the other to $a_j$ and move j from U to A.

33 Single locus probability - case by case (2)
- Case 4: check whether $a_i$ is one of a and b. Then check whether the other one is consistent with the labeling of an edge (j,k) in E; if it is consistent, force the assignment.
- Cases 5, 6: may need another loop.
  - Set $a_i = a$, $a_j = b$; check and handle consistency.
  - Set $a_i = b$, $a_j = a$; check and handle consistency.

34 Single locus probability - algorithm (3)
After the last bit of the vector has been added, the probability calculation needs a product over the edges in E.
Let $(a_e, b_e) \in E$ be the alleles labeling edge e. Then q(v) is updated by adding to it:
$\prod_{i \in A} p(a_i) \prod_{e \in E} 2\,p(a_e)\,p(b_e)$