1
Linear-Time Reconstruction of Zero-Recombinant Mendelian Inheritance on Pedigrees without Mating Loops
Authors: Lan Liu, Tao Jiang
Univ. California, Riverside, USA
2
Outline
Introduction and problem definition
The linear system for ZRHC
A linear-time algorithm for Loop-free ZRHC
Conclusion
3
Pedigree
An example: British Royal Family
4
Biological Background
Basic concepts
Mendelian Law: one haplotype comes from the father and the other comes from the mother.
Example: Mendelian experiment
Genotype 12: heterozygous; genotypes 11 and 22: homozygous
Haplotype pair (paternal | maternal) 2|1: ps-value 1; 1|2: ps-value 0
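The ps-value convention on this slide can be sketched in Python. This is a minimal illustration, assuming alleles are coded 1 and 2 and that the left allele of a|b is the paternal one; the function name `phase` is hypothetical and not from the talk.

```python
def phase(genotype, ps):
    """Turn an unordered genotype and a ps-value into an ordered (paternal, maternal) pair.

    Assumption: ps-value 0 means 1|2 and ps-value 1 means 2|1, as on the slide.
    """
    low, high = sorted(genotype)
    if low == high:
        # Homozygous locus: both orders coincide, so the ps-value is irrelevant.
        return (low, high)
    return (low, high) if ps == 0 else (high, low)


print(phase((1, 2), 0))
print(phase((1, 2), 1))
print(phase((2, 2), 1))
```

Only heterozygous loci carry a real choice, which is why the reconstruction problem concentrates on them.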
5
Notations and Recombinant
[Figure: a genotype and a corresponding haplotype configuration over eight loci, followed by two mother-father-child trios, one with 0 recombinants and one with 1 recombinant.]
6
Haplotype Configuration Reconstruction
Haplotypes: useful, but expensive to obtain
Genotypes: not so informative, but cheaper to obtain
In biological applications, genotypes instead of haplotypes are collected.
How to reconstruct haplotypes from genotypes? Under the recombination-free assumption.
[Figure (b): a small pedigree whose members are labelled 12, 12, 12, 21, 12, 12.]
7
The Loop-free ZRHC problem
Problem definition
Given a loop-free pedigree and the genotype information for each member, find a recombination-free haplotype configuration for each member that obeys the Mendelian law of inheritance.
8
Solutions to the ZRHC problem
A particular solution: any single numerical assignment that solves the problem
A general solution: the span of a basis of the solution space of the associated homogeneous system, offset from the origin by a vector, namely by any particular solution.
9
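An Example
Input genotype: all six members have genotype 12.
ps-value encoding: 0 means 1|2, 1 means 2|1.
A general solution: x, x+z, y+z, y, x+z+w, y+z+w
A particular solution: set x=0, y=1, z=0, w=1.
[Figure: the pedigree drawn three times, showing the input genotypes, the general solution, and the haplotypes of the particular solution.]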
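The relation between the general and the particular solution in this example can be sketched in Python. The six expressions and the assignment x=0, y=1, z=0, w=1 are taken from the slide; the order in which they map onto pedigree members is an assumption (the slide's layout is not recoverable here), and the names `GENERAL` and `instantiate` are hypothetical.

```python
from itertools import product

# General solution from the slide, one expression per member, as sets of free variables.
# Assumption: this listing order is arbitrary, not the pedigree layout of the figure.
GENERAL = [("x",), ("x", "z"), ("y", "z"), ("y",), ("x", "z", "w"), ("y", "z", "w")]


def instantiate(values):
    """Evaluate every expression over F_2 for one choice of the free variables."""
    return [sum(values[v] for v in term) % 2 for term in GENERAL]


# The slide's choice of free variables gives one particular solution.
particular = instantiate({"x": 0, "y": 1, "z": 0, "w": 1})

# Enumerating all 2^4 choices lists every solution described by the general solution.
all_solutions = {
    tuple(instantiate(dict(zip("xyzw", bits)))) for bits in product((0, 1), repeat=4)
}

print(particular)
print(len(all_solutions))
```

Four free variables yield sixteen distinct ps-value assignments, which is why a general solution is more informative than a single particular one.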
10
Previous Work and Our Progress
(m: #loci, n: #members in the pedigree)
ZRHC
Li and Jiang introduced a system of linear equations over F_2 and presented an O(m^3 n^3) time algorithm for ZRHC [LJ03].
Xiao et al. presented a much faster algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n) to generate a general solution and O(mn + n^3 log^2 n log log n) to produce a particular solution [XLX+07].
Loop-free ZRHC
Xiao et al.'s algorithm has running time O(mn^2 + n^3) to produce a general solution and O(mn + n^3) to generate a particular solution [XLX+07].
Chan et al. proposed a linear-time (i.e. O(mn) time) algorithm to find a particular solution [CCC+06].
We present a novel algorithm with running time O(mn^2) to produce a general solution and O(mn) to generate a particular solution.
11
Related work
Methods based on fast matrix multiplication algorithms could achieve an asymptotic speed of O(k^2.376) on k equations with k unknowns.
The Lanczos and conjugate gradient algorithms are only heuristics [GV96].
The Wiedemann algorithm has expected quadratic running time [W86].
12
Outline
Introduction and problem definition
The linear system for ZRHC
A linear-time algorithm for Loop-free ZRHC
Conclusion
13
The New Linear System
Parameters: m, n (m: #loci, n: #members in pedigree)
Unknowns
p_j: the paternal haplotype vector of a member j.
h_{j1,j}: the scalar describing the inheritance information between a parent j1 and a child j.
14
The New Linear System
[Figure: a trio with parents j1 and j2 and child j over four loci. The members are labelled with the vector pairs p_{j1} and p_{j1} + w_{j1}, p_{j2} and p_{j2} + w_{j2}, p_j and p_j + w_j; the parent-child edges are labelled h_{j1,j} and h_{j2,j}. Pre-determined entries shown: p_{j1,2} = 1, p_{j1,3} = 0.]
15
The Linear System
Ax = b: O(mn) equations on O(mn) unknowns.
Given a homozygous locus i on a member j (with a child j1), p_j[i] and p_{j1}[i] are pre-determined.
16
Pedigree Graph
[Figure: a pedigree with genotypes on members 1, 2, 4, 6, 7, 8, 9, and the corresponding pedigree graph G.]
#edges ≤ 2n
17
Locus Graph
Locus graph G_i = (V, E_i), where E_i = {(k, j) | k is a parent of j, w_k[i] = 1}
Example: locus graph for the 3rd locus
[Figure: (a) genotype info at this locus for members 1, 2, 4, 6, 7, 8, 9; (b) the locus graph, with edges labelled h_{1,4}, h_{4,9}, h_{8,9}, h_{6,8}, pre-determined vertex values, and zero-weight marks.]
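The edge-set definition of the locus graph translates directly into a set comprehension. The sketch below is illustrative only: the `parents` table and the `w` vectors are made-up data chosen so that the result matches the four h-labelled edges of the slide's figure, and the function name `locus_graph_edges` is hypothetical.

```python
def locus_graph_edges(parents, w, i):
    """E_i = {(k, j) | k is a parent of j and w[k][i] == 1}."""
    return {(k, j) for j, ks in parents.items() for k in ks if w[k][i] == 1}


# Hypothetical pedigree: child -> (parent, parent). Founders have no entry.
parents = {4: (1, 2), 8: (6, 7), 9: (4, 8)}

# Hypothetical w-vectors with a single locus (index 0).
w = {1: [1], 2: [0], 4: [1], 6: [1], 7: [0], 8: [1], 9: [1]}

print(sorted(locus_graph_edges(parents, w, 0)))
```

Because every member has at most two parents, the comprehension touches at most 2n parent-child pairs per locus.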
18
An Observation
For any path in a locus graph connecting two pre-determined vertices, the summation of h-variables along the path is a constant.
We can use paths to denote constraints!
(proof sketch) Assume a path j0, j1, ..., jk in locus graph G_i connecting two pre-determined vertices j0 and jk. Then
p_{j0}[i] = p_{j1}[i] + h_{j0,j1}
p_{j1}[i] + d_{j1,j2} + h_{j1,j2} = p_{j2}[i]
p_{j2}[i] + d_{j2,j3} + h_{j2,j3} = p_{j3}[i]
...
p_{jk-1}[i] + d_{jk-1,jk} + h_{jk-1,jk} = p_{jk}[i]
Summing these equations over F_2 cancels the intermediate p-values and leaves h_{j0,j1} + ... + h_{jk-1,jk} equal to a constant.
[Figure: the path j0, j1, ..., jk with vertices labelled by their p-values at locus i and edges labelled by their h- and d-values.]
19
Examples of Linear Constraints
[Figure (a): 1st locus graph on members 1, 2, 4, 6, 7, 8, 9, with edges h_{6,8} and h_{8,9} on a path between two pre-determined vertices.]
h_{6,8} + h_{8,9} = 1
20
Linear Constraints
Obviously, the linear constraints are necessary. We can also show that these constraints are sufficient.
Moreover, we can upper bound #constraints in each locus graph as O(n), while the trivial analysis gives an upper bound O(n^2).
Total #constraints = O(mn).
The linear constraints only contain h-variables.
[Diagram: Ax = b, via transformation, becomes O(mn) constraints on O(n) h-variables.]
21
Outline
Introduction and problem definition
The linear equations for ZRHC
A linear-time algorithm for ZRHC
Conclusion
22
The Loop-free ZRHC-PHASE algorithm
Algorithm Loop-free ZRHC_PHASE
input: a pedigree G = (V, E) and genotype {g_j}
output: a general solution of {p_j}
begin
Step 1. Preprocessing
Step 2. Linear constraint generation on h-variables
Step 3. Solve h-variables by redundant equation elimination and a novel mapping method
Step 4. Solve the p-variables by propagation from pre-determined p-variables to others.
end
Our method: solve h-variables and p-variables separately. O(mn) linear equations on O(n) h-variables.
Traditional method: solve h-variables and p-variables together. O(mn) equations on O(mn) unknowns: O(mn) p-variables and O(n) h-variables.
23
Redundant Equation Elimination
An observation
Given a path P = j0, ..., jk, assume that there are constraints among each pair of vertices. Originally, there are O(k^2) constraints. Notice that they are not independent. However, we can replace the original constraints by an equivalent set of constraints with size O(k).
[Figure: the path j0, j1, j2, ..., jk-2, jk-1, jk with the constraints j0 ~ j2, j0 ~ jk-1 and j2 ~ jk-1 marked.]
Remove the redundant equations without solving them!
Key lemma
Given a set S of constraints on a tree pedigree T, we can reduce S to an equivalent constraint set of size at most n in time O(mn).
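The redundancy on a single path can be illustrated in Python. This sketch covers only the path observation, not the tree reduction of the key lemma. It assumes consistent constraints and keeps only those anchored at the first vertex j0, a choice suggested by the slide's j0 ~ j2 and j0 ~ jk-1 labels; all function names are hypothetical.

```python
from itertools import combinations


def pairwise_constraints(h):
    """All O(k^2) constraints on a path: (a, b) -> sum of h on edges a..b-1 (mod 2)."""
    k = len(h)
    return {(a, b): sum(h[a:b]) % 2 for a, b in combinations(range(k + 1), 2)}


def reduce_to_anchored(constraints):
    """Keep only the k constraints that start at vertex 0."""
    return {b: c for (a, b), c in constraints.items() if a == 0}


def implied(anchored, a, b):
    """Recover the constraint a ~ b by adding 0 ~ a and 0 ~ b over F_2."""
    left = anchored[a] if a != 0 else 0
    return left ^ anchored[b]


h = [1, 0, 1, 1]                    # made-up h-values on a path with 4 edges
full = pairwise_constraints(h)      # 10 constraints
anchored = reduce_to_anchored(full) # 4 constraints
print(len(full), len(anchored))
print(all(implied(anchored, a, b) == c for (a, b), c in full.items()))
```

The reduced set is found by selection alone, so no equation is ever solved during elimination.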
24
[Diagram: Ax = b, via transformation and then redundancy elimination, becomes O(n) constraints.]
25
Solving h-variables
In order to obtain a linear-time algorithm, we want to avoid the Gaussian elimination method.
An observation
Given a constraint along a path j0, j1, ..., jk-1, jk:
h_{j0,j1} + h_{j1,j2} + ... + h_{jk-1,jk} = b
We can solve the constraint in the following way:
Assign the h-variables on edges (j0, j1), (j1, j2), ..., (jk-2, jk-1) arbitrarily.
Assign the h-variable on the last edge (jk-1, jk) a fixed value to satisfy the constraint:
h_{jk-1,jk} = h_{j0,j1} + ... + h_{jk-2,jk-1} + b.
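Solving one path constraint this way needs no elimination at all, as the following Python sketch shows. It assumes a path with k edges and arithmetic over F_2; the function name `solve_path_constraint` and the use of a random generator for the arbitrary choices are illustrative assumptions.

```python
import random


def solve_path_constraint(k, b, rng=random):
    """Return h-values for k path edges whose sum over F_2 equals b.

    The first k-1 values are arbitrary; the last one is forced.
    """
    h = [rng.randint(0, 1) for _ in range(k - 1)]
    # Over F_2 subtraction equals addition, so the last value is sum(h) + b.
    h.append((sum(h) + b) % 2)
    return h


h = solve_path_constraint(5, 1, random.Random(0))
print(sum(h) % 2 == 1)
```

The work is proportional to the path length, which is the property the linear-time bound relies on.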
26
Solving h-variables Based on the Mapping f
We have constructed the injective mapping f: S -> E, where S is the constraint set and E is the edge set.
h-variables can be solved by a single BFS traversal.
We solve h-variables as follows:
For each h-variable corresponding to an edge e not in f(S), assign an arbitrary value.
For each h-variable corresponding to an edge e in f(S), assign a fixed value based on the constraint f^-1(e), such that the constraint is satisfied.
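The two assignment rules can be sketched in Python under a simplifying assumption: the constraints are listed in an order in which each one involves, besides its own mapped edge, only free edges or mapped edges of earlier constraints. The talk obtains a suitable order from a BFS traversal, which is not reproduced here. The constraint h_{6,8} + h_{8,9} = 1 comes from the slides; the second constraint, the choice of mapped edges, and all function names are hypothetical.

```python
def solve_with_mapping(edges, constraints, mapped_edges, free_value=0):
    """Assign h-values given constraints and the edge f maps each constraint to.

    constraints[t] is (list of edges, b); mapped_edges[t] plays the role of f(constraints[t]).
    """
    mapped = set(mapped_edges)
    if len(mapped) != len(mapped_edges):
        raise ValueError("f must be injective: two constraints share a mapped edge")
    # Edges outside f(S) are free; any value works, here a fixed one for simplicity.
    h = {e: free_value for e in edges if e not in mapped}
    # Edges in f(S) are forced by their constraint (KeyError if the order assumption fails).
    for (path_edges, b), e in zip(constraints, mapped_edges):
        h[e] = (sum(h[x] for x in path_edges if x != e) + b) % 2
    return h


def satisfied(h, constraints):
    """Check every constraint: sum of h over its edges equals b (mod 2)."""
    return all(sum(h[e] for e in path_edges) % 2 == b for path_edges, b in constraints)


edges = [(1, 4), (4, 9), (6, 8), (8, 9)]
constraints = [([(6, 8), (8, 9)], 1), ([(1, 4), (4, 9)], 0)]
mapped_edges = [(8, 9), (4, 9)]

h = solve_with_mapping(edges, constraints, mapped_edges)
print(h[(8, 9)], h[(4, 9)], satisfied(h, constraints))
```

Injectivity guarantees that no edge is forced by two different constraints, so each mapped edge is written exactly once.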
27
Conclusion
We present an efficient algorithm for Loop-free ZRHC with running time O(mn) to generate a particular solution and O(mn^2) to generate a general solution.
28
Thanks for your time and attention!