Allegro, a new computer program for linkage analysis. Guy Grebla.


1 Allegro, a new computer program for linkage analysis
Guy Grebla

2 Overview
- What is Allegro
- Allegro vs. Genehunter
- Reduced inheritance vectors
- Founder couple reduction
- Fast tree traversal
  - Formalization
  - Calculation of S_pairs
  - Single locus probability calculation (if time permits)

3 What is Allegro
Allegro is based on Genehunter. It runs faster than Genehunter thanks to algorithmic improvements.

4 Allegro vs. Genehunter (1)
Allegro runs much faster than Genehunter: the speedup is typically 20- to 40-fold, and in many cases as high as 100-fold.
If necessary, Allegro can cut memory requirements by a factor of 20-60 compared with Genehunter, at a cost of 10-30% in run time.

5 Allegro vs. Genehunter (2)
Recall that the time complexity of Genehunter is exponential in the pedigree size, so running Genehunter on large pedigrees is infeasible.
Thanks to its algorithmic improvements, Allegro can handle significantly larger pedigrees, even though its time complexity is still exponential in the pedigree size.

6 Reduced inheritance vectors: the idea
The idea is based on the symmetry that exists between the two alleles of a founder.
[Figure: two pedigree drawings of a founder with alleles n1 and n2, one labeled V=(0,1,1,0) and the other V=(1,1,0,0).]

7 Reduced inheritance vectors
For a male (female) founder, the paternal (maternal) bit of his (her) first child is fixed at 0 and left out of the reduced vector; such a bit is called hidden.
Result: with m non-founders and f founders, the vector length is reduced to 2m-f.

8 Reduced inheritance vectors (Cont.)
[Figure: three-generation pedigree (I, II, III) with founder alleles n1-n6 and genotypes a/b, a/b, a/b, a/c and b/c; hidden bits are shown in brackets, e.g. [0 0] and 1 [0].]

9 Founder couple reduction
Consider a founder couple such that:
- they have at least one grandchild;
- neither of them is genotyped;
- neither of them has married twice.

10 Founder couple reduction (Cont.)
v* is like v, except that:
- the corresponding bit of each of the grandchildren is inverted;
- the paternal and maternal bits of each child are switched.
v and v* have the same probability.
[Figure: pedigree fragment with founder alleles n1-n4 and a person genotyped a/c, with the corresponding bit marked.]

11 Founder couple reduction: results
With the founder couple reduction, the effective number of bits is 2m-f-c, where c is the number of founder couples satisfying the stated conditions.
This improves on the previous reduction by a factor of $2^c$.
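The bit counts from slides 7 and 11 can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the example pedigree size (m=3, f=3, c=1) are assumptions made here, not values taken from the talk.

```python
def vector_bits(m: int, f: int, c: int = 0) -> dict:
    """Bit counts for a pedigree with m non-founders, f founders and
    c founder couples that qualify for the founder couple reduction."""
    full = 2 * m           # one paternal and one maternal bit per non-founder
    reduced = full - f     # one hidden bit per founder (slide 7)
    coupled = reduced - c  # one more bit saved per qualifying couple (slide 11)
    return {"full": full, "reduced": reduced, "coupled": coupled}


def vector_count(bits: int) -> int:
    """Number of inheritance vectors that must be enumerated."""
    return 2 ** bits


# Hypothetical pedigree: 3 non-founders, 3 founders, 1 qualifying couple.
bits = vector_bits(m=3, f=3, c=1)
counts = {name: vector_count(b) for name, b in bits.items()}
print(bits)
print(counts)
```

Each saved bit halves the number of vectors, which is where the factor of $2^c$ comes from.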

12 Fast tree traversal
The algorithms implemented in the Genehunter program loop over inheritance vectors in the outermost loop and over the people in the pedigree in an inner loop.
Drawback: for vectors that differ only in some branches of the pedigree, part of the calculation is duplicated.

13 Fast tree traversal (Cont.)
Idea: change the order of looping to avoid the repeated calculations.

14 Fast tree traversal: naïve example
Say we want to calculate, for each vector v of length n, the number of 1's in v.
- "Genehunter" method: for each vector separately, add each bit of the vector to the sum.
- "Allegro" method: pass over the vectors as a tree and reuse the partial sums along the way.

15 Naïve example: Allegro method
[Figure: binary tree over the vector bits; each node holds the running count of 1's (0, 1 or 2) for the bits chosen so far.]
Fewer additions!
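The naïve example can be sketched in Python to count the additions each method performs. The two function names are made up for this illustration; the "flat" version stands in for the vector-by-vector loop and the "tree" version for the shared-prefix traversal.

```python
from itertools import product


def ones_flat(n):
    """Vector-by-vector method: sum the bits of every vector from scratch."""
    counts, additions = {}, 0
    for v in product((0, 1), repeat=n):
        total = 0
        for bit in v:
            total += bit
            additions += 1
        counts[v] = total
    return counts, additions


def ones_tree(n):
    """Tree method: extend a prefix one bit at a time and reuse its partial sum."""
    counts = {}
    additions = 0

    def descend(prefix, total):
        nonlocal additions
        if len(prefix) == n:
            counts[prefix] = total
            return
        for bit in (0, 1):
            additions += 1  # one addition per tree node
            descend(prefix + (bit,), total + bit)

    descend((), 0)
    return counts, additions


flat_counts, flat_adds = ones_flat(3)
tree_counts, tree_adds = ones_tree(3)
print(flat_adds, tree_adds)
```

Both methods produce the same counts, but the flat loop needs n·2^n additions while the tree needs one per node, 2^(n+1) - 2 in total.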

16 Fast tree traversal: formalization
For each inheritance vector v, S(v) is known.
We traverse the pedigree from the top down. When a child is born:
- if it has i hidden bits, there are $2^{2-i}$ possibilities for its bits;
- for each possibility, the inheritance vector is updated appropriately and the branch is descended.
Notation:
- adding a bit b updates the vector v to v+;
- D(v) is a collection of data;
- $N = 2^{2m-f}$ is the number of possible inheritance vectors.

17 Fast tree traversal: formalization (2)
Recursive algorithm:
    addbit(v, D, b):
        for b = 0, 1 do
            set v+ = (v, b) and calculate D+ = D(v+)
            if there are more bits, addbit(v+, D+, next bit)
            else D+ contains the data for S(v+)
If the calculations of D+ and of S are both O(1), the total time complexity of the calculation is O(N).
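A minimal Python sketch of the recursive scheme follows. It is a generic skeleton rather than Allegro's actual code: the `update` and `emit` callbacks, the tuple representation of v, and the bit-counting usage example are all assumptions introduced here.

```python
def addbit(v, data, n_bits, update, emit):
    """Extend the partial vector v by one bit at a time.

    update(data, v, b) returns the data D+ for the extended vector (v, b);
    emit(v, data) is called once for every complete vector.
    """
    if len(v) == n_bits:
        emit(v, data)  # D now holds everything needed for S(v)
        return
    for b in (0, 1):
        addbit(v + (b,), update(data, v, b), n_bits, update, emit)


# Usage: D is simply the running number of 1 bits in the vector.
results = {}
addbit(
    v=(),
    data=0,
    n_bits=3,
    update=lambda d, v, b: d + b,
    emit=lambda v, d: results.__setitem__(v, d),
)
print(results[(1, 0, 1)])
```

The tree has fewer than 2N nodes for N complete vectors, so if `update` and `emit` each cost O(1), the whole traversal is O(N), as the slide states.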

18 Example: calculation of S_pairs
$\phi_{ij}(p,q) = 1$ if allele i of p and allele j of q are IBD, and 0 otherwise.
$S_{pq}(v) = \sum_{i=0}^{1} \sum_{j=0}^{1} \phi_{ij}(p,q)$
$S_{\text{pairs}}(v) = \sum_{(p,q)} S_{pq}(v)$, where the sum runs over the pairs (p,q) of affecteds.
$k_i$: the number of times founder allele i turns up among the affecteds.
$s$: the value of $S_{\text{pairs}}$ for the traversed portion of the pedigree.
$D = (s, k_1, k_2, \ldots, k_{2f})$

19 Example (Cont.)
When an unaffected person is added, do nothing: $s^+ = s$, $k_i^+ = k_i$, $k_j^+ = k_j$.
When an affected person carrying founder alleles i and j is added, perform:
$s^+ \leftarrow s + k_i + k_j$
$k_i^+ \leftarrow k_i + 1$
$k_j^+ \leftarrow k_j + 1$

20 Example (Cont.)
[Figure: three-generation pedigree (I, II, III) with founder alleles n1-n6 and genotypes a/b, a/b, a/b, a/c and b/c; V=(0,1,1,1,1).]
Init (no vector bits): $s=1$, $k_1=1$, $k_3=2$, $k_4=1$
III1 is added: $s=2$, $k_1=1$, $k_3=2$, $k_4=2$, $k_5=1$
III2 is added: $s=4$, $k_1=1$, $k_3=2$, $k_4=3$, $k_5=1$, $k_6=1$
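The update rule for S_pairs can be sketched in Python and checked against the direct pairwise definition. The function names are invented for this sketch, and the allele labels in the example are an assumption: they were chosen so that the running values of s and k match the numbers on slide 20, since the pedigree figure itself is not available.

```python
from itertools import combinations


def s_pairs_incremental(affected):
    """affected: one (i, j) pair of founder-allele labels per affected person.
    Returns the final (s, k) and the running values of s."""
    s, k, trace = 0, {}, []
    for i, j in affected:
        s += k.get(i, 0) + k.get(j, 0)  # IBD matches with earlier affecteds
        k[i] = k.get(i, 0) + 1
        k[j] = k.get(j, 0) + 1
        trace.append(s)
    return s, k, trace


def s_pairs_pairwise(affected):
    """Direct definition: sum S_pq over all pairs of affecteds."""
    return sum(
        sum(1 for a in p for b in q if a == b)
        for p, q in combinations(affected, 2)
    )


# Assumed founder-allele labels, picked to reproduce the slide's counts.
affected = [(1, 3), (3, 4), (4, 5), (4, 6)]
s, k, trace = s_pairs_incremental(affected)
print(s, k, trace)
print(s_pairs_pairwise(affected))
```

The incremental version does constant work per affected person, whereas the pairwise version revisits every pair of affecteds for each vector.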

21 S_pairs calculation: Genehunter vs. Allegro
Genehunter calculates $S_{\text{pairs}}$ by calculating $S_{pq}$ for each affected pair and adding it to $S_{\text{pairs}}$.
This process requires $O(N\alpha^2)$ time, where $\alpha$ is the number of affecteds.
We saved a factor of $\alpha^2$ (!)

22 Additional improvements
Allegro uses the FFT for matrix multiplication, and some classical computational techniques have been used to speed up the FFT by a factor of three or four.

23 References
- Gudbjartsson DF, Jonasson K, Frigge ML, Kong A. "Fast multipoint linkage analysis and the program Allegro".
- Gudbjartsson DF, Jonasson K, Frigge ML, Kong A. "Allegro, a new computer program for linkage analysis". Nat Genet. 2000 May;25(1):12-3.

24 BACKUP

25 Single locus probability calculation
Goal: compute $\Pr[m_l \mid v_l]$ at locus l for every vector $v_l$, where:
- $m_l$ is the marker data at this locus (the evidence);
- $v_l$ is a certain inheritance vector.

26 Single locus probability calculation (Cont.)
In general:
$p(m_l \mid v_l) = \sum_{a \in P} \prod_{i=1}^{2f} p(a_i)$
where P is the set of possible allele assignments $a = (a_1, \ldots, a_{2f})$ to $(n_1, \ldots, n_{2f})$.
This probability may be calculated for each $v_l$ using fast tree traversal.
Denote $p(m_l \mid v_l)$ by q(v).

27 Single locus probability: notations
[Figure: three-generation pedigree (I, II, III) with founder alleles n1-n6 and genotypes a/b, a/b, a/b, a/c and b/c; the founder nodes are marked.]
Assume the founder nodes are numbered, so that node $n_i$ is numbered i.

28 Single locus probability: notations (2)
Founder nodes are classified into 3 disjoint sets:
- A: assigned nodes.
- E: contains edges; each edge is labeled with 2 distinct alleles.
- U: unassigned nodes.
$a_i$: the allele assigned to node i ($i \in A$).

29 Single locus probability: initialization
Init:
- E ← nodes of genotyped founders (edges).
- U ← the rest of the founder nodes.
- A ← nil (empty).
- q(v) ← 0.
Goal: build a founder graph. From the graph we can calculate q(v).

30 Single locus probability: algorithm
When a person genotyped a/b is added:
- The value of v (so far) determines which founder alleles are the sources of the person's alleles.
- Denote the corresponding founder nodes by i and j, and consider the edge (i,j).

31 Single locus probability: algorithm (2)
There are 6 options for the edge (i,j).
[Figure: the six cases, numbered 1-6, showing where the nodes i and j lie relative to the sets A, U and E.]

32 Single locus probability: case by case
Case 1:
- Put (i,j) in E and remove i and j from U.
Case 2:
- Check whether $\{a,b\} = \{a_i, a_j\}$.
Case 3:
- Check whether $a_i$ is one of a and b; if it is, assign the other allele to $a_j$ and move j from U to A.

33 Single locus probability: case by case (2)
Case 4:
- Check whether $a_i$ is one of a and b.
- Check whether the other allele is consistent with the labeling of an edge (j,k) in E; if it is, force the assignment.
Cases 5, 6:
- May need another loop.
- Set $a_i = a$, $a_j = b$; check and handle consistency.
- Set $a_i = b$, $a_j = a$; check and handle consistency.

34 Single locus probability: algorithm (3)
After the last bit of the vector has been added, the probability calculation needs a product over the edges in E.
Let $(a_e, b_e)$ be the label of edge $e \in E$. q(v) is updated by adding to it:
$\prod_{i \in A} p(a_i) \prod_{e \in E} 2\,p(a_e)\,p(b_e)$
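The final product can be sketched in Python as follows. This is a hedged illustration, not the program's code: it assumes that each assigned node i in A contributes a factor p(a_i) and that each edge in E contributes 2·p(a_e)·p(b_e), and the function name, data layout and allele frequencies are hypothetical.

```python
import math


def founder_graph_term(freq, assigned, edges):
    """Term added to q(v) once the founder graph is complete.

    freq:     allele -> population frequency p(allele)
    assigned: founder node -> allele, for the nodes in A
    edges:    one (a_e, b_e) label of two distinct alleles per edge in E
    """
    term = 1.0
    for allele in assigned.values():
        term *= freq[allele]  # assumed factor p(a_i) for each i in A
    for a_e, b_e in edges:
        term *= 2.0 * freq[a_e] * freq[b_e]  # two ways to place the edge's alleles
    return term


# Hypothetical allele frequencies and graph state.
freq = {"a": 0.5, "b": 0.3, "c": 0.2}
term = founder_graph_term(freq, assigned={1: "a"}, edges=[("b", "c")])
print(term)
```

Unassigned nodes in U contribute a factor of 1, because summing p over all alleles for such a node gives 1.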

