Fast Elimination of Redundant Linear Equations and Reconstruction of Recombination-free Mendelian Inheritance on a Pedigree Authors: Lan Liu & Tao Jiang,

Presentation transcript:

Fast Elimination of Redundant Linear Equations and Reconstruction of Recombination-free Mendelian Inheritance on a Pedigree. Authors: Lan Liu & Tao Jiang (Univ. California, Riverside); Jing Xiao & Lirong Xia (Tsinghua Univ., China)

Outline
Introduction and problem definition (current section)
A new system of linear equations for ZRHC
An O(mn^3) time algorithm for ZRHC
An improved algorithm for ZRHC
Conclusion

Pedigree. An example: the British royal family. [Figure: pedigree diagram.]

Biological Background. Basic concepts. Mendelian law: one haplotype comes from the father and the other comes from the mother. Example (Mendelian experiment): genotype 12 is heterozygous; genotypes 11 and 22 are homozygous. Haplotype pairs are written paternal|maternal, e.g. 2|1 and 1|2.

Notations and Recombinant. [Figure: father-mother-child trios shown with their genotypes and haplotype configurations; one panel is labeled "0 recombinant" and the other marks the recombinants.]

Haplotype Configuration Reconstruction. Haplotypes: useful, but expensive to obtain. Genotypes: less informative, but cheaper to obtain. In biological applications, genotypes rather than haplotypes are collected. How can haplotypes be reconstructed from genotypes? Here, under the recombination-free assumption.

The ZRHC problem. Problem definition: given a pedigree and the genotype information for each member, find a recombination-free haplotype configuration for each member that obeys the Mendelian law of inheritance.
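To make the recombination-free condition concrete, here is a minimal Python sketch for a single father-mother-child trio. The representation is an assumption made for illustration and is not taken from the slides: each member is a (paternal, maternal) pair of haplotypes, each haplotype is a tuple of alleles coded 1 or 2 as in the Mendelian example, and the function names are hypothetical.

```python
def is_recombination_free(father, mother, child):
    """Check one father-mother-child trio.

    Each argument is a (paternal, maternal) pair of haplotypes, and each
    haplotype is a tuple of alleles, one per locus (assumed encoding).
    """
    child_paternal, child_maternal = child
    # Without recombination, a transmitted haplotype is an exact copy of
    # one of the two haplotypes of the transmitting parent.
    return child_paternal in father and child_maternal in mother


def genotype(haplotype_pair):
    """Unordered allele pair at each locus, i.e. what genotyping observes."""
    paternal, maternal = haplotype_pair
    return [tuple(sorted(pair)) for pair in zip(paternal, maternal)]


father = ((1, 2), (2, 1))
mother = ((1, 1), (2, 2))
copied_child = ((1, 2), (2, 2))  # both haplotypes copied whole from a parent
mixed_child = ((1, 1), (2, 2))   # paternal (1, 1) mixes the father's two haplotypes

print(is_recombination_free(father, mother, copied_child))
print(is_recombination_free(father, mother, mixed_child))
print(genotype(father))
```

A full ZRHC solution would additionally require that `genotype` of every member's haplotype pair matches the observed genotype, and that the check holds for every trio in the pedigree.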

Previous Work. Li and Jiang introduced a system of linear equations over F[2] and presented an O(m^3 n^3) time algorithm for ZRHC [LJ03], where m is the number of loci and n is the number of members in the pedigree. Several attempts have been made recently to improve on this, but the authors did not prove the correctness of their algorithms in all cases, especially when the input pedigree has mating loops [CZ04] [LCL06]. Recently, Chan et al. proposed a linear-time algorithm [CCC+06], which works only for pedigrees without mating loops.

Related work. Methods based on fast matrix multiplication algorithms can achieve an asymptotic running time of O(k^2.376) on k equations with k unknowns. The Lanczos and conjugate gradient algorithms are only heuristics [GV96]. The Wiedemann algorithm has expected quadratic running time [W86].

Our Result. We present a much faster algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n). [Diagram: Ax=b -> transformation -> redundancy elimination, with labels O(n) and O(n log^2 n log log n).]

Outline (current section): A new system of linear equations for ZRHC. [Diagram: Ax=b.]

The New Linear System. Parameters: m is the number of loci and n is the number of members in the pedigree. Unknowns: p_j, the paternal haplotype vector of a member j; h_{j1,j}, the scalar encoding inheritance information between a parent j1 and a child j.

The New Linear System. [Figure: a trio with parents j1 and j2 and child j over four loci. Each member k has haplotypes p_k = (p_{k,1}, p_{k,2}, p_{k,3}, p_{k,4}) and p_k + w_k, with w_{j1} = (1, 0, 0, 1), w_{j2} = (0, 1, 1, 1) and w_j = (1, 1, 0, 0). The parent-child edges are labeled h_{j1,j} and h_{j2,j}. In the example, p_{j1,2} = 1 and p_{j1,3} = 0.]

The Linear System. There are O(mn) equations on O(mn) unknowns. Given a homozygous locus i on a member j (with a child j1), p_j[i] and p_{j1}[i] are pre-determined.

Pedigree Graph. [Figure: a pedigree with genotypes and its pedigree graph G.] #edges ≤ 2n.

Locus Graph. The locus graph for locus i is G_i = (V, E_i), where E_i = {(k, j) | k is a parent of j, w_k[i] = 1}. Example: the locus graph for the 3rd locus. [Figure: (a) genotype information; (b) the locus graph, with edges labeled h_{1,4}, h_{4,9}, h_{8,9}, h_{6,8} and a legend entry "Zero-weight".]
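The definition of E_i can be read directly as a filter over parent-child pairs. The Python sketch below assumes a hypothetical representation that is not in the slides: `parents` maps each child to its parents, `w` maps each member to a 0/1 list, and loci are indexed from 0. The sample data reuses the w-vectors that appear in the trio figure of "The New Linear System".

```python
def locus_graph_edges(parents, w, i):
    """Return E_i = {(k, j) | k is a parent of j and w[k][i] == 1}.

    parents: dict mapping a child to a tuple of its parents (assumed layout).
    w:       dict mapping a member to its 0/1 vector, one entry per locus.
    i:       0-based locus index.
    """
    return [
        (k, j)
        for j, parent_pair in sorted(parents.items())
        for k in parent_pair
        if w[k][i] == 1
    ]


# Trio from the figure: parents j1 and j2, child j, four loci.
parents = {"j": ("j1", "j2")}
w = {
    "j1": [1, 0, 0, 1],
    "j2": [0, 1, 1, 1],
    "j": [1, 1, 0, 0],
}

for locus in range(4):
    print(locus, locus_graph_edges(parents, w, locus))
```

Only the parent's entry w_k[i] decides whether the edge (k, j) is kept, which matches the set-builder definition; the child's own vector matters only for edges to its own children.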

Outline (current section): An O(mn^3) time algorithm for ZRHC. [Diagram: Ax=b -> transformation, with labels O(mn) and O(n).]

An Observation. For any cycle in a locus graph, or any path in it connecting two pre-determined vertices, the sum of the h-variables along it is a constant. We can therefore use paths to denote constraints!
Proof sketch: assume a path j_0, j_1, ..., j_k in locus graph G_i connecting two pre-determined vertices j_0 and j_k. Each edge of the path contributes one equation, in which the d-terms are constants:
p_{j_0}[i] + d_{j_0,j_1} + h_{j_0,j_1} = p_{j_1}[i]
p_{j_1}[i] + d_{j_1,j_2} + h_{j_1,j_2} = p_{j_2}[i]
p_{j_2}[i] + d_{j_2,j_3} + h_{j_2,j_3} = p_{j_3}[i]
...
p_{j_{k-1}}[i] + d_{j_{k-1},j_k} + h_{j_{k-1},j_k} = p_{j_k}[i]
Summing these equations over F[2] leaves the sum of the h-variables equal to a constant.
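The summation step in the proof sketch can be written out explicitly. The derivation below is a reconstruction under the assumption that every edge equation has the form p_{j_l}[i] + d_{j_l,j_{l+1}} + h_{j_l,j_{l+1}} = p_{j_{l+1}}[i] and that all arithmetic is over F[2], where addition and subtraction coincide.

```latex
\begin{align*}
\sum_{l=0}^{k-1} \bigl( p_{j_l}[i] + d_{j_l,j_{l+1}} + h_{j_l,j_{l+1}} \bigr)
  &= \sum_{l=0}^{k-1} p_{j_{l+1}}[i] \\
\sum_{l=0}^{k-1} h_{j_l,j_{l+1}}
  &= p_{j_0}[i] + p_{j_k}[i] + \sum_{l=0}^{k-1} d_{j_l,j_{l+1}}
\end{align*}
```

Each interior term p_{j_1}[i], ..., p_{j_{k-1}}[i] occurs once on each side and cancels, so only the two endpoint values remain. They are pre-determined, and the d-terms are constants, so the right-hand side is a constant. For a cycle, j_0 = j_k and the endpoint terms cancel as well.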

Examples of Linear Constraints. [Figure: three locus graphs with edges labeled by h-variables.]
(a) 1st locus graph: h_{6,8} + h_{8,9} = a constant (value not legible in the transcript)
(b) 2nd locus graph: h_{3,5} + h_{3,6} + h_{2,5} + h_{2,6} = 0
(c) 3rd locus graph: h_{4,9} + h_{2,4} + h_{2,5} + h_{3,5} + h_{3,6} + h_{6,8} = 0

Linear Constraints. Obviously, the linear constraints are necessary. We can also show that these constraints are sufficient. Moreover, we can bound the number of constraints in each locus graph by O(n), whereas the trivial analysis gives an upper bound of O(n^2). Total #constraints = O(mn).

The ZRHC-PHASE algorithm
Algorithm ZRHC_PHASE
input: a pedigree G = (V, E) and genotypes {g_j}
output: a general solution of {p_j}
begin
  Step 1. Preprocessing
  Step 2. Linear constraint generation on h-variables
  Step 3. Solve the h-variables by Gaussian elimination
  Step 4. Solve the p-variables by propagation from pre-determined p-variables to the others
end
Our method: solve h-variables and p-variables separately; O(mn) linear equations on O(n) h-variables.
Traditional method: solve h-variables and p-variables together; O(mn) equations on O(mn) unknowns (O(mn) p-variables and O(n) h-variables).
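Step 3 relies on Gaussian elimination over F[2]. As a rough illustration of that step only, here is a generic Python sketch, not the authors' implementation. It assumes each equation is given as a pair (mask, rhs), where bit v of the integer `mask` is set when variable v appears; this encoding and the name `solve_gf2` are hypothetical.

```python
def solve_gf2(equations, num_vars):
    """Gauss-Jordan elimination over GF(2).

    equations: iterable of (mask, rhs); bit v of mask marks variable v,
               rhs is 0 or 1.
    Returns (solution, free_vars) with all free variables set to 0,
    or None if the system is inconsistent.
    """
    pivots = {}  # pivot variable -> fully reduced (mask, rhs)
    for mask, rhs in equations:
        # Remove every existing pivot variable from the incoming row.
        for var, (p_mask, p_rhs) in pivots.items():
            if (mask >> var) & 1:
                mask ^= p_mask
                rhs ^= p_rhs
        if mask == 0:
            if rhs:
                return None  # reduced to 0 = 1
            continue         # redundant equation, drop it
        var = mask.bit_length() - 1  # highest remaining variable is the pivot
        # Keep earlier pivot rows free of the new pivot variable.
        for other, (p_mask, p_rhs) in list(pivots.items()):
            if (p_mask >> var) & 1:
                pivots[other] = (p_mask ^ mask, p_rhs ^ rhs)
        pivots[var] = (mask, rhs)
    free_vars = [v for v in range(num_vars) if v not in pivots]
    solution = [pivots[v][1] if v in pivots else 0 for v in range(num_vars)]
    return solution, free_vars


# x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (the third equation is redundant)
system = [(0b011, 1), (0b110, 0), (0b101, 1)]
print(solve_gf2(system, 3))
```

The returned particular solution together with the list of free variables describes the general solution, which is the kind of output the algorithm asks for. Plain elimination like this is the baseline that the redundancy-elimination step is meant to speed up.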

Outline (current section): An improved algorithm for ZRHC. [Diagram: Ax=b -> transformation -> redundancy elimination, with labels O(mn), O(n) and O(n log^2 n log log n).]

Redundant Equation Elimination. An observation: given a cycle j_0, j_1, j_2, ..., j_{k-1}, j_k, assume that there are constraints between each pair of vertices. Originally, there are O(k^2) constraints. Notice that they are not independent: we can replace the original constraints by an equivalent set of constraints of size O(k). [Figure: the cycle, with the constraints j_0 ~ j_2, j_2 ~ j_{k-1} and j_0 ~ j_{k-1} marked.] Key lemma: remove the redundant equations without solving them!
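The dependence among the pairwise constraints can be checked numerically on a small cycle. The Python sketch below is an illustration under stated assumptions, not the key lemma itself: the cycle has k vertices j_0, ..., j_{k-1}, edge l joins j_l to j_{(l+1) mod k}, every pair of vertices contributes one constraint for each of the two arcs between them, and only the left-hand sides (sets of h-variables) are compared. All function names are hypothetical.

```python
from itertools import combinations


def cycle_constraints(k):
    """Left-hand sides of all pairwise path constraints on a k-cycle.

    Each constraint is a bitmask over the k edge variables; bit l stands
    for the h-variable on the edge between j_l and j_{(l+1) mod k}.
    """
    full_cycle = (1 << k) - 1
    masks = []
    for a, b in combinations(range(k), 2):
        arc = sum(1 << l for l in range(a, b))  # path j_a, j_{a+1}, ..., j_b
        masks.append(arc)
        masks.append(full_cycle ^ arc)          # the other way around the cycle
    return masks


def gf2_rank(masks):
    """Rank over GF(2) of a list of bitmask row vectors (XOR basis)."""
    basis = {}  # highest set bit -> basis vector
    for m in masks:
        while m:
            top = m.bit_length() - 1
            if top not in basis:
                basis[top] = m
                break
            m ^= basis[top]
    return len(basis)


k = 6
constraints = cycle_constraints(k)
print(len(constraints), gf2_rank(constraints))
```

For k = 6 there are 30 pairwise constraints but their rank is only 6, so a consistent system of this kind is equivalent to one with O(k) equations. The point of the lemma on the slide is to identify such a small equivalent set directly, without running an elimination like `gf2_rank`.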

Redundant Equation Elimination. Given a spanning tree, the stretch of an edge (k, j) is defined as the length of the unique path between k and j on the tree. Elkin, Emek, Spielman and Teng show that every graph has a low-stretch spanning tree with average stretch O(log^2 n log log n). The number of non-redundant constraints can be bounded by the sum of cycle lengths, which is in turn bounded by the sum of stretches, O(n log^2 n log log n).
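The definition of stretch is easy to compute for a given tree. The Python sketch below only measures stretch for a spanning tree supplied by the caller; it does not construct a low-stretch tree, which is the hard part of the cited result. The edge-list representation and function names are assumptions for illustration.

```python
from collections import deque


def tree_path_length(tree_edges, source, target):
    """Number of edges on the unique tree path from source to target (BFS)."""
    adjacency = {}
    for u, v in tree_edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return distance[u]
        for v in adjacency.get(u, []):
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    raise ValueError("target is not connected to source in the tree")


def average_stretch(tree_edges, graph_edges):
    """Mean stretch of the graph's edges with respect to the given tree."""
    stretches = [tree_path_length(tree_edges, u, v) for u, v in graph_edges]
    return sum(stretches) / len(stretches)


# A 4-cycle 0-1-2-3-0 with the path 0-1-2-3 as its spanning tree.
tree = [(0, 1), (1, 2), (2, 3)]
graph = tree + [(3, 0)]
print(tree_path_length(tree, 3, 0), average_stretch(tree, graph))
```

Each non-tree edge closes a cycle whose length is its stretch plus one, which is how a bound on the total stretch turns into a bound on the total cycle length used on the slide.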

Conclusion. We present an efficient algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n). It remains open whether the time complexity for ZRHC on general pedigrees can be improved to O(mn^2 + n^3) or lower. Another open question is how to use the algorithm to obtain haplotype configurations on pedigrees that require only a small (constant) number of recombinants.

Thanks for your time and attention!