Published by Alyson Allison. Modified over 8 years ago.
1
Fast Elimination of Redundant Linear Equations and Reconstruction of Recombination-free Mendelian Inheritance on a Pedigree
Authors: Lan Liu & Tao Jiang, Univ. California, Riverside
Jing Xiao, Lirong Xia, Tsinghua Univ., China
2
Outline
Introduction and problem definition
A new system of linear equations for ZRHC
An O(mn^3) time algorithm for ZRHC
An improved algorithm for ZRHC
Conclusion
3
Pedigree
An example: the British Royal Family
[Figure: pedigree diagram.]
4
Biological Background
Basic concepts
Mendelian law: one haplotype comes from the father and the other comes from the mother.
Example (Mendelian experiment): genotype 12 is heterozygous; genotypes 11 and 22 are homozygous.
[Figure: paternal and maternal haplotypes, with phasings 2|1 and 1|2 of the heterozygous genotype.]
5
Notations and Recombinant
[Figure: a genotype and a matching haplotype configuration, followed by two Father/Mother/Child trios, one with 0 recombinants and one with 1 recombinant (the recombinant haplotype is marked in the child).]
6
Haplotype Configuration Reconstruction
Haplotypes: useful, but expensive to obtain.
Genotypes: less informative, but cheaper to obtain.
In biological applications, genotypes rather than haplotypes are collected.
How to reconstruct haplotypes from genotypes? Use the recombination-free assumption.
[Figure: example pedigree, panel (b).]
7
The ZRHC problem (zero-recombinant haplotype configuration)
Problem definition: Given a pedigree and the genotype information for each member, find a recombination-free haplotype configuration for each member that obeys the Mendelian law of inheritance.
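The problem definition can be made concrete with a small checker. The sketch below is only an illustration, not part of the presented algorithm: the data layout (`parents`, `genotypes`, `config`) and the function name `is_valid_zrhc` are assumptions introduced here. It tests whether a given haplotype configuration explains every genotype and passes each parental haplotype to the child unbroken.

```python
from typing import Dict, List, Tuple

Haplotype = Tuple[int, ...]                       # one allele (1 or 2) per locus
Config = Dict[str, Tuple[Haplotype, Haplotype]]   # member -> (paternal, maternal)


def is_valid_zrhc(parents: Dict[str, Tuple[str, str]],
                  genotypes: Dict[str, List[Tuple[int, int]]],
                  config: Config) -> bool:
    """Check a candidate configuration (hypothetical helper, assumed layout).

    parents maps each non-founder to (father, mother); genotypes maps each
    member to an unordered allele pair per locus.
    """
    for member, (pat, mat) in config.items():
        # The two haplotypes must reproduce the genotype at every locus.
        for locus, alleles in enumerate(genotypes[member]):
            if sorted((pat[locus], mat[locus])) != sorted(alleles):
                return False
        if member in parents:
            father, mother = parents[member]
            # Recombination-free: each inherited haplotype must equal one of
            # the parent's two haplotypes in full.
            if pat not in config[father] or mat not in config[mother]:
                return False
    return True
```

Checking a configuration is easy; the difficulty addressed in the talk is finding one, or a description of all of them, efficiently.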
8
Previous Work
Li and Jiang introduced a system of linear equations over F[2] and presented an O(m^3 n^3) time algorithm for ZRHC [LJ03], where m is #loci and n is #members in the pedigree.
Several attempts have been made recently, but the authors failed to prove the correctness of their algorithms in all cases, especially when the input pedigree has mating loops [CZ04] [LCL06].
Recently, Chan et al. proposed a linear-time algorithm in [CCC+06], which only works for pedigrees without mating loops.
9
Related work
Methods based on fast matrix multiplication algorithms could achieve an asymptotic speed of O(k^2.376) on k equations with k unknowns.
The Lanczos and conjugate gradient algorithms are only heuristics [GV96].
The Wiedemann algorithm has expected quadratic running time [W86].
10
Our Result
We present a much faster algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n).
[Diagram: Ax = b, then transformation, then redundancy elimination; labels O(n log^2 n log log n) and O(n).]
11
Outline
Introduction and problem definition
A new system of linear equations for ZRHC
An O(mn^3) time algorithm for ZRHC
An improved algorithm for ZRHC
Conclusion
[Diagram: Ax = b.]
12
The New Linear System
Parameters: m is #loci; n is #members in the pedigree.
Unknowns:
p_j: the paternal haplotype vector of a member j.
h_{j_1,j}: the scalar encoding the inheritance information between a parent j_1 and a child j.
13
The New Linear System
[Figure: a father j_1, a mother j_2 and their child j with binary-encoded genotypes. Each member's two haplotypes are written p_j and p_j + w_j, the parent-child edges carry h_{j_1,j} and h_{j_2,j}, and homozygous loci fix entries such as p_{j_1,2} = 1 and p_{j_1,3} = 0.]
14
The Linear System
O(mn) equations on O(mn) unknowns.
Given a homozygous locus i on a member j (with a child j_1), p_j[i] and p_{j_1}[i] are pre-determined.
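The equations themselves did not survive as text, so the block below is a reconstruction under stated assumptions rather than a quotation of the slide. It assumes alleles are encoded as bits, all arithmetic is over F[2], w_j is the known vector with w_j[i] = 1 exactly when member j is heterozygous at locus i (so the maternal haplotype of j is p_j + w_j), and h = 0 means the parent passes on its paternal haplotype.

```latex
% Assumed notation: w_j[i] = 1 iff member j is heterozygous at locus i; arithmetic over F_2.
\begin{align*}
p_{j_1} + h_{j_1,j}\, w_{j_1} &= p_j       && \text{(father } j_1 \text{ of child } j\text{)} \\
p_{j_2} + h_{j_2,j}\, w_{j_2} &= p_j + w_j && \text{(mother } j_2 \text{ of child } j\text{)}
\end{align*}
```

Each parent-child pair contributes one scalar equation per locus and there are at most 2n such pairs, which is consistent with the O(mn) equation count stated above.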
15
Pedigree Graph
[Figure: a pedigree on members 1 to 9 with their genotypes, and the corresponding pedigree graph G.]
#edges ≤ 2n
16
Locus Graph
Locus graph G_i = (V, E_i), where E_i = {(k, j) | k is a parent of j, w_k[i] = 1}.
Example: locus graph for the 3rd locus.
[Figure: (a) genotype info on the pedigree with members 1 to 9; (b) the locus graph, with pre-determined vertices labelled 0 or 1, undetermined vertices labelled ?, zero-weight edges marked, and edges h_{1,4}, h_{4,9}, h_{8,9}, h_{6,8}.]
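The definition of E_i translates directly into code. The following is a minimal sketch, assuming genotypes are stored as unordered allele pairs per locus and the pedigree as a child-to-parents map; the names `heterozygosity` and `locus_graph_edges` are hypothetical and not taken from the talk.

```python
from typing import Dict, Hashable, List, Tuple


def heterozygosity(genotype: List[Tuple[int, int]]) -> List[int]:
    """w_j: 1 at each locus where the two alleles differ, else 0."""
    return [1 if a != b else 0 for a, b in genotype]


def locus_graph_edges(parents: Dict[Hashable, Tuple[Hashable, Hashable]],
                      genotypes: Dict[Hashable, List[Tuple[int, int]]],
                      locus: int) -> List[Tuple[Hashable, Hashable]]:
    """E_i = {(k, j) | k is a parent of j and w_k[i] = 1} (0-based locus index)."""
    w = {member: heterozygosity(g) for member, g in genotypes.items()}
    edges = []
    for child, (father, mother) in parents.items():
        for parent in (father, mother):
            # Only a heterozygous parent leaves the inherited allele undetermined.
            if w[parent][locus] == 1:
                edges.append((parent, child))
    return edges
```

Edges from homozygous parents are dropped because the allele passed on is then known regardless of h.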
17
Outline
Introduction and problem definition
A new system of linear equations for ZRHC
An O(mn^3) time algorithm for ZRHC
An improved algorithm for ZRHC
Conclusion
[Diagram: Ax = b, then transformation; labels O(n) and O(mn).]
18
An Observation
For any cycle, or any path connecting two pre-determined vertices, in a locus graph, the summation of h-variables along the path is a constant.
We can use paths to denote constraints!
(proof sketch) Assume the path in locus graph G_i connects two pre-determined vertices j_0 and j_k. Each edge on the path gives one equation, where each d is a known constant:
p_{j_0}[i] + d_{j_0,j_1} + h_{j_0,j_1} = p_{j_1}[i]
p_{j_1}[i] + d_{j_1,j_2} + h_{j_1,j_2} = p_{j_2}[i]
p_{j_2}[i] + d_{j_2,j_3} + h_{j_2,j_3} = p_{j_3}[i]
...
p_{j_{k-1}}[i] + d_{j_{k-1},j_k} + h_{j_{k-1},j_k} = p_{j_k}[i]
Adding all k equations over F[2] cancels the intermediate p-variables:
h_{j_0,j_1} + h_{j_1,j_2} + ... + h_{j_{k-1},j_k} = p_{j_0}[i] + p_{j_k}[i] + d_{j_0,j_1} + d_{j_1,j_2} + ... + d_{j_{k-1},j_k}, a constant.
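The observation can be checked by brute force on a short path. This is a sketch under assumptions: the function names are hypothetical, and the d values and endpoint bits are supplied by the caller rather than derived from a pedigree. It computes the constant from the endpoints and the d values, then enumerates every h assignment that propagates the start value to the required end value.

```python
from itertools import product
from typing import List, Sequence, Tuple


def path_constant(p_start: int, p_end: int, d_values: Sequence[int]) -> int:
    """Right-hand side c of the constraint h_1 + ... + h_k = c over F[2]."""
    return (p_start + p_end + sum(d_values)) % 2


def end_value(p_start: int, d_values: Sequence[int], h_values: Sequence[int]) -> int:
    """Propagate p along the path with p_next = p + d + h (mod 2) on each edge."""
    p = p_start
    for d, h in zip(d_values, h_values):
        p = (p + d + h) % 2
    return p


def consistent_h_assignments(p_start: int, p_end: int,
                             d_values: Sequence[int]) -> List[Tuple[int, ...]]:
    """All h assignments on the path that reach the pre-determined end value."""
    return [h for h in product((0, 1), repeat=len(d_values))
            if end_value(p_start, d_values, h) == p_end]
```

Every assignment returned by `consistent_h_assignments` has the same parity, equal to `path_constant`, which is exactly the single linear constraint the path contributes.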
19
Examples of Linear Constraints
(a) 1st locus graph: h_{6,8} + h_{8,9} = 1
(b) 2nd locus graph: h_{3,5} + h_{3,6} + h_{2,5} + h_{2,6} = 0
(c) 3rd locus graph: h_{4,9} + h_{2,4} + h_{2,5} + h_{3,5} + h_{3,6} + h_{6,8} = 0
[Figure: the three locus graphs on members 1 to 9, with pre-determined vertices labelled 0 or 1 and undetermined vertices labelled ?.]
20
Linear Constraints
Obviously, the linear constraints are necessary.
We can also show that these constraints are sufficient.
Moreover, we can upper bound #constraints in each locus graph by O(n), while the trivial analysis gives an upper bound of O(n^2).
Total #constraints = O(mn).
21
The ZRHC-PHASE algorithm
Algorithm ZRHC_PHASE
input: a pedigree G = (V, E) and genotypes {g_j}
output: a general solution of {p_j}
begin
  Step 1. Preprocessing
  Step 2. Linear constraint generation on h-variables
  Step 3. Solve the h-variables by Gaussian elimination
  Step 4. Solve the p-variables by propagation from pre-determined p-variables to the others
end
Our method: solve h-variables and p-variables separately. O(mn) linear equations on O(n) h-variables.
Traditional method: solve h-variables and p-variables together. O(mn) equations on O(mn) unknowns: O(mn) p-variables and O(n) h-variables.
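Step 3 is ordinary Gaussian elimination over F[2]. The sketch below is a generic textbook version, not the authors' implementation; it assumes each equation is given as an integer bitmask over the h-variables plus a right-hand-side bit, and `solve_gf2` is a hypothetical name. It returns one particular solution with free variables set to 0, not the general solution the algorithm outputs.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def solve_gf2(rows: Sequence[int], rhs: Sequence[int],
              num_vars: int) -> Optional[List[int]]:
    """Solve a linear system over F[2].

    rows[r] is a bitmask: bit v is set when variable v occurs in equation r.
    Returns one solution (free variables = 0), or None if inconsistent.
    """
    # pivot variable -> (row mask, rhs bit); no pivot row contains another pivot.
    pivots: Dict[int, Tuple[int, int]] = {}
    for mask, bit in zip(rows, rhs):
        # Eliminate every known pivot variable from the incoming equation.
        for var, (pmask, pbit) in pivots.items():
            if (mask >> var) & 1:
                mask ^= pmask
                bit ^= pbit
        if mask == 0:
            if bit:
                return None      # reduced to 0 = 1: inconsistent
            continue             # reduced to 0 = 0: redundant equation
        var = mask.bit_length() - 1          # highest remaining variable
        # Remove the new pivot variable from the existing pivot rows.
        for v, (pmask, pbit) in list(pivots.items()):
            if (pmask >> var) & 1:
                pivots[v] = (pmask ^ mask, pbit ^ bit)
        pivots[var] = (mask, bit)
    solution = [0] * num_vars
    for var, (_, pbit) in pivots.items():
        solution[var] = pbit     # free variables stay 0
    return solution
```

Redundant equations are discarded only after being reduced here; the improved algorithm later in the talk aims to avoid generating most of them in the first place.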
22
Outline
Introduction and problem definition
A new system of linear equations for ZRHC
An O(mn^3) time algorithm for ZRHC
An improved algorithm for ZRHC
Conclusion
[Diagram: Ax = b, then transformation, then redundancy elimination; labels O(n log^2 n log log n), O(n) and O(mn).]
23
Redundant Equation Elimination
An observation (key lemma): Given a cycle j_0, j_1, j_2, ..., j_{k-2}, j_{k-1}, j_k, assume that there are constraints among each pair of vertices.
Originally, there are O(k^2) constraints. Notice that they are not independent.
However, we can replace the original constraints by an equivalent set of constraints with size O(k).
Remove the redundant equations without solving them!
[Figure: the cycle, with the constraints j_0 ~ j_2, j_2 ~ j_{k-1} and j_0 ~ j_{k-1} marked.]
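The dependence among pairwise constraints can be seen numerically. The sketch below simplifies the setting to a path j_0, ..., j_k instead of a cycle, which is an assumption made here for brevity, and uses hypothetical function names. It lists the left-hand side of the constraint for every vertex pair as a bitmask over the k edge variables and measures the rank over F[2].

```python
from typing import Dict, Iterable, List


def gf2_rank(masks: Iterable[int]) -> int:
    """Rank over F[2] of the vectors given as integer bitmasks."""
    basis: Dict[int, int] = {}   # highest set bit -> basis vector
    for mask in masks:
        while mask:
            top = mask.bit_length() - 1
            if top not in basis:
                basis[top] = mask
                break
            mask ^= basis[top]   # cancel the leading bit and keep reducing
    return len(basis)


def pairwise_path_constraints(k: int) -> List[int]:
    """Left-hand sides of the constraints between all vertex pairs (a, b), a < b,
    on the path j_0, ..., j_k. Edge t joins j_t and j_{t+1}; the constraint for
    (a, b) sums the h-variables of edges a, ..., b-1."""
    masks = []
    for a in range(k + 1):
        for b in range(a + 1, k + 1):
            masks.append(((1 << b) - 1) ^ ((1 << a) - 1))   # bits a .. b-1
    return masks
```

For k = 6 there are 21 pairwise constraints but their rank is only 6, since each one is a sum of single-edge constraints. The key lemma goes further than this check: it identifies an equivalent O(k)-size set directly, without running any elimination.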
24
Redundant Equation Elimination
Given a spanning tree, the stretch of an edge (k, j) is defined as the length of the unique path between k and j on the tree.
Elkin, Emek, Spielman and Teng show that every graph has a spanning tree with average stretch O(log^2 n log log n).
The number of irredundant constraints can be bounded by the sum of cycle lengths, which is further bounded by the sum of stretches, O(n log^2 n log log n).
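The definition of stretch is easy to evaluate for a given tree. The sketch below only measures stretch with a breadth-first search; it does not construct a low-stretch spanning tree, which is the hard part handled by the Elkin, Emek, Spielman and Teng result. The function names and edge-list input format are assumptions for illustration.

```python
from collections import deque
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def tree_distance(tree_adj: Dict[Hashable, List[Hashable]],
                  source: Hashable, target: Hashable) -> int:
    """Length of the unique tree path from source to target (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in tree_adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("target is not reachable in the tree")


def average_stretch(tree_edges: Sequence[Edge], graph_edges: Sequence[Edge]) -> float:
    """Average, over all graph edges (k, j), of the tree distance between k and j."""
    tree_adj: Dict[Hashable, List[Hashable]] = {}
    for u, v in tree_edges:
        tree_adj.setdefault(u, []).append(v)
        tree_adj.setdefault(v, []).append(u)
    stretches = [tree_distance(tree_adj, u, v) for u, v in graph_edges]
    return sum(stretches) / len(stretches)
```

On a 4-cycle with a path as spanning tree, the three tree edges have stretch 1 and the remaining edge has stretch 3, so the average is 1.5.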
25
Conclusion
We present an efficient algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n).
It remains an interesting open question whether the time complexity for ZRHC on general pedigrees can be improved to O(mn^2 + n^3) or lower.
Another open question is how to use the algorithm to get haplotype configurations on pedigrees that require only a small (constant) number of recombinants.
26
Thanks for your time and attention!