Huei-Hun E Tseng and Martin Tompa. Algorithms for locating extremely conserved elements in multiple sequence alignments. BMC Bioinformatics, 2009. Presenter: Seyed Ali Rokni.



Introduction
- Hundreds of long genomic sequences are extraordinarily conserved across human, mouse, and rat.
- Ultraconserved element: at least 200 consecutive alignment columns that are 100% perfectly conserved in human, mouse, and rat.
- There are 481 such elements across the human genome; some fraction of them are also well conserved in dog and in chicken.
- Other species sets and conservation measures listed on the slide: human-mouse-dog, human-chicken, percent of perfectly conserved columns, phylogeny, phylogenetic hidden Markov model.

Phylogenetic tree

Multiple Sequence Alignment

Dynamic Programming
- For three sequences, cell (i,j,k) is computed from its seven predecessor cells: (i-1,j-1,k-1), (i,j-1,k-1), (i-1,j,k-1), (i-1,j-1,k), (i,j,k-1), (i,j-1,k), and (i-1,j,k).
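The seven-predecessor recurrence can be made concrete with a small sketch. It is not taken from the slides: the sum-of-pairs scoring, the score values (match +1, mismatch -1, gap -1), and the function names `column_score` and `align3_score` are all assumptions chosen for illustration.

```python
from itertools import product

def column_score(a, b, c, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of one alignment column (assumed scoring scheme)."""
    total = 0
    for x, y in ((a, b), (a, c), (b, c)):
        if x == '-' and y == '-':
            continue                      # gap-gap pairs score nothing
        if x == '-' or y == '-':
            total += gap
        else:
            total += match if x == y else mismatch
    return total

def align3_score(x, y, z):
    """Best sum-of-pairs score of a global alignment of three strings."""
    n1, n2, n3 = len(x), len(y), len(z)
    neg_inf = float('-inf')
    D = [[[neg_inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    D[0][0][0] = 0
    # The seven moves: every non-zero choice of (di, dj, dk) in {0, 1}^3.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i, j, k in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            col = (x[i - 1] if di else '-',
                   y[j - 1] if dj else '-',
                   z[k - 1] if dk else '-')
            cand = D[pi][pj][pk] + column_score(*col)
            if cand > D[i][j][k]:
                D[i][j][k] = cand
    return D[n1][n2][n3]
```

The table has (n1+1)(n2+1)(n3+1) cells, which is why this exact approach does not scale beyond a handful of short sequences.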

Generalization of Ultraconserved Elements
- Earlier definitions were limited to 2 or 3 species.
- Input: the current 44-vertebrate whole-genome alignment from the UCSC Genome Browser (approximately 250 GB).
- Goal: find long regions of this alignment that are extraordinarily well conserved across all or most of the 44 species.
- Example parameters:
  - Minimum length: at least 100 consecutive alignment columns
  - Minimum subset size: some subset S of at least 40 of the 44 species
  - Minimum percentage: at least 80% of the columns are perfectly conserved

Problem Formal Definition
- Inputs: an m × n alignment matrix M with entries from {A, C, G, T, -}, an integer s ≤ m, an integer t ≤ n, and a real number 0 < c ≤ 1.
- Problem: determine whether M has
  - a subset S of rows, |S| ≥ s,
  - a subset T of consecutive columns (ignoring columns that contain the gap character "-" in every row of S), |T| ≥ t,
  - a subset U of T, |U| ≥ c|T|,
  such that in M restricted to S × U, every column is perfectly conserved.
- Example: m = 44, n ≈ 3.8 × 10^9, s = 40, t = 100, and c = 0.8.
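As a reading aid, the definition can be turned into a checker for one candidate pair (S, T). This is a sketch under stated assumptions: the function name `is_extremely_conserved`, the half-open column range, and the representation of M as a list of equal-length strings are illustrative choices, and U is taken to be all perfectly conserved columns of T.

```python
from fractions import Fraction

def is_extremely_conserved(M, S, start, end, s, t, c):
    """Check one candidate: rows S and columns start..end-1 of alignment M.

    Columns that are '-' in every row of S are ignored, following the
    definition. c is converted to an exact Fraction, so pass it as a
    string ("0.8") or Fraction rather than a float.
    """
    c = Fraction(c)
    if len(S) < s:
        return False
    kept = 0        # |T| after dropping all-gap columns
    conserved = 0   # |U|: perfectly conserved columns among the kept ones
    for col in range(start, end):
        chars = {M[r][col] for r in S}
        if chars == {'-'}:
            continue
        kept += 1
        if len(chars) == 1:
            conserved += 1
    return kept >= t and conserved >= c * kept
```

Checking a candidate is cheap; the difficulty of the problem lies in choosing S, since there are exponentially many row subsets.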

Theorem 1
- If s = m, the Extremely Conserved Element problem can be solved in time O(mn).
- Proof: Assume that no column contains the gap character "-" in every row. For 1 ≤ i ≤ n, let q_i = 1 if column i is perfectly conserved, and q_i = 0 otherwise. The result then follows from Theorem 2.
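The reduction step in this proof is a single pass over the columns. A minimal sketch, assuming the alignment is given as a list of equal-length strings and that all-gap columns have already been removed; the name `conserved_indicator` is illustrative.

```python
def conserved_indicator(rows):
    """Return q, where q[i] = 1 if column i is perfectly conserved, else 0.

    Assumes no column is '-' in every row; a column with the same
    non-gap character in all rows counts as perfectly conserved.
    """
    if not rows:
        return []
    return [1 if len(set(col)) == 1 and col[0] != '-' else 0
            for col in zip(*rows)]
```

Building q takes O(mn) time, which dominates the O(n) interval search that follows.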

Proof

Proof (Cont.)

Proof (Cont.)
- X and Y are non-increasing sequences.
- Merge X and Y: when Y_j and X_i are adjacent in the merged order, q_{i+1} ... q_j is a maximal interval.
- The maximum interval can be found during merging.
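The merging idea can be sketched with a standard linear-time technique based on prefix minima and suffix maxima. This is an illustration of the approach, not necessarily the exact construction in the paper; the function name `longest_dense_interval` and the use of exact fractions are assumptions.

```python
from fractions import Fraction

def longest_dense_interval(q, c):
    """Longest half-open interval (i, j) with sum(q[i:j]) >= c * (j - i).

    Returns None if no non-empty interval qualifies. Pass c as a string
    or Fraction ("0.8", Fraction(4, 5)) to avoid float rounding.
    """
    c = Fraction(c)
    n = len(q)
    # d[k] = sum of (q[t] - c) for t < k; the interval condition is d[j] >= d[i].
    d = [Fraction(0)]
    for bit in q:
        d.append(d[-1] + bit - c)
    lmin = d[:]                      # lmin[i] = min(d[0..i]), non-increasing
    for k in range(1, n + 1):
        lmin[k] = min(lmin[k - 1], d[k])
    rmax = d[:]                      # rmax[j] = max(d[j..n]), non-increasing
    for k in range(n - 1, -1, -1):
        rmax[k] = max(rmax[k + 1], d[k])
    best = None
    i = j = 0
    while i <= n and j <= n:         # two-pointer merge of the two sequences
        if lmin[i] <= rmax[j]:
            if j > i and (best is None or j - i > best[1] - best[0]):
                best = (i, j)
            j += 1
        else:
            i += 1
    return best
```

Each pointer only moves forward, so the merge takes O(n) steps after the O(n) preprocessing.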

Dual of Theorem 2
- The dual of Theorem 2, maximizing c subject to a lower bound on j - i, can also be solved in time O(n).
- For s = m:
  - the maximum value of c can be determined in time O(mn);
  - the maximum value of t can be determined in time O(mn).
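For comparison, the dual problem has a simple reference solution that is easy to verify by hand. The sketch below is not the O(n) algorithm the slide refers to; it runs in O(nL) time for minimum length L, relying on the fact that some optimal interval has length below 2L (a longer interval splits into two parts of length at least L, one of which is at least as dense). The name `max_density` is illustrative.

```python
from fractions import Fraction

def max_density(q, min_len):
    """Return (density, i, j) maximizing sum(q[i:j]) / (j - i) with j - i >= min_len.

    Reference O(n * min_len) version, not the linear-time algorithm.
    Returns None if min_len is out of range.
    """
    n = len(q)
    if min_len < 1 or min_len > n:
        return None
    prefix = [0]
    for bit in q:
        prefix.append(prefix[-1] + bit)
    best = None
    for i in range(n - min_len + 1):
        # Lengths min_len .. 2*min_len - 1 suffice to contain an optimum.
        longest = min(2 * min_len - 1, n - i)
        for length in range(min_len, longest + 1):
            density = Fraction(prefix[i + length] - prefix[i], length)
            if best is None or density > best[0]:
                best = (density, i, i + length)
    return best
```

Exact fractions keep the comparison between densities free of rounding error.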

Theorem 3
- If c = 1, the Extremely Conserved Element problem can be solved in polynomial time. In fact, the maximum value of s can be determined in this time.
- Proof: For every choice T of at least t consecutive columns, sort the rows of M restricted to T lexicographically, then find s identical rows with at least t non-gap characters.
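The proof translates almost directly into code. A minimal, unoptimized sketch, assuming the alignment is a list of equal-length strings; the function name `max_rows_fully_conserved` is illustrative, and the loops trade efficiency for clarity.

```python
from itertools import groupby

def max_rows_fully_conserved(rows, t):
    """Largest s such that some s rows agree on a window with >= t non-gap columns.

    Enumerates every window of consecutive columns, sorts the row
    segments lexicographically, and counts runs of identical segments.
    """
    n = len(rows[0]) if rows else 0
    best = 0
    for start in range(n):
        for end in range(start + t, n + 1):
            segments = sorted(row[start:end] for row in rows)
            for segment, run in groupby(segments):
                count = sum(1 for _ in run)
                non_gap = len(segment) - segment.count('-')
                if non_gap >= t and count > best:
                    best = count
    return best
```

There are O(n^2) windows and each is processed in polynomial time, so the whole procedure is polynomial, though far too slow for a genome-scale alignment without further pruning.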

Theorem 4
- The general Extremely Conserved Element problem is NP-hard.
- Idea (reduction): we want a solution to problem A, given the ability to solve problem B. Transform A into B (A → B), then solve B. If A is known to be NP-hard, it follows that B is NP-hard.

Results
- 177 EC(40, 100, 0.8) elements were found.
- Partially coding: an element that overlaps a human coding exon.
- The longest element is 355 columns long and is perfectly conserved in 80% of its columns across 41 of the species, missing only gorilla, shrew, and lamprey.

- Lamprey: missing from 170
- Gorilla: missing from 41
- Cat: missing from 35
- Zebrafish: missing from 30
- Fugu: missing from only 4
- Zebra finch: missing from only 3
- Chicken: missing from only 2
- Lizard: missing from only 1

Questions