Extending Alignments: material based on Chapter 13 of the book by Dan Gusfield, Algorithms on Strings, Trees and Sequences, Cambridge University Press.

Parametric alignment with the use of scoring matrices
Definition: For any alignment A of two strings, let $smt_A$ and $sms_A$ denote, respectively, the total score (obtained from the scoring matrix) for the specific matches in A and the total score for the specific mismatches in A. As before, $id_A$ and $gp_A$ denote the number of indels and gaps contained in A. Using scoring matrices, the parametric value of alignment A is $\alpha \times smt_A + \beta \times sms_A - \gamma \times id_A - \delta \times gp_A$.

Efficient algorithms for computing a polygonal decomposition
Ray-search problem: Given an alignment A, a point p where A is optimal, and a ray h in (γ, δ) space starting at p, find the furthest point from p on ray h (call it r*) where A remains optimal. If A remains optimal until h reaches a border of the parameter space, then r* is that border point on h. It is also possible that r* = p.

Newton's ray-search algorithm
Set r to the (γ, δ) point where h intersects a border of the parameter space.
While A is not an optimal alignment at point r do begin
  Find an optimal alignment A* at point r.
  Set r to be the unique point on h where the value of A equals the value of A*.
end;
Set r* to r.
Lemma:
1) Newton's ray-search algorithm finds r* exactly.
2) Unless A is optimal at the initial setting of r, the last computed alignment A* is cooptimal with A at r* and is also optimal on h for some nonzero distance beyond r*.
3) When Newton's ray-search algorithm computes an alignment at a point r on h, none of the alignments computed previously (in this execution of Newton's algorithm) are optimal at r.
It follows that if r* = p, then Newton's method discovers this and returns an alignment A* that is optimal at p and also optimal for some nonzero distance along h. For any polygon P intersected by h, a single ray search computes alignments at no more than two points of P.
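The loop above can be sketched in Python under some simplifying assumptions that are not on the slide. Each alignment is summarized as a hypothetical triple (score, indels, gaps) with value score - γ × indels - δ × gaps, so both γ and δ are treated as penalties. A finite list `candidates` stands in for the dynamic-programming step that finds an optimal alignment at a point. The ray is parameterized as p + t × d for t in [0, t_max], and all inputs are integers so that exact rational arithmetic can be used.

```python
from fractions import Fraction

def value(aln, gamma, delta):
    """Parametric value of an alignment summarized as (score, indels, gaps)."""
    score, indels, gaps = aln
    return score - gamma * indels - delta * gaps

def newton_ray_search(A, candidates, p, d, t_max):
    """Return (t_star, last_A_star) for the ray p + t*d, 0 <= t <= t_max.

    A must be optimal at p and must appear in `candidates`, which is a
    stand-in (assumption) for an optimal-alignment oracle.
    """
    def along(aln, t):
        # Value of an alignment at the ray point with parameter t.
        return value(aln, p[0] + t * d[0], p[1] + t * d[1])

    t = Fraction(t_max)          # start where h meets the border
    last = None
    while True:
        best = max(candidates, key=lambda aln: along(aln, t))
        if along(A, t) >= along(best, t):
            return t, last       # A is optimal at r, so r* = r
        last = best
        # Both values are linear in t; move to the point where they are equal.
        a0, b0 = along(A, 0), along(best, 0)
        a1, b1 = along(A, 1) - a0, along(best, 1) - b0
        t = Fraction(a0 - b0, b1 - a1)

A, B, C = (10, 0, 0), (14, 3, 1), (12, 1, 0)
t_star, a_star = newton_ray_search(A, [A, B, C], (3, 3), (-1, -1), 3)
print(t_star, a_star)
```

In this toy run the search first meets B at the border, backs up to the crossing of A and B, finds C better there, and stops at the crossing of A and C. The returned t gives r* as the point p + t × d.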

Uses for parametric alignment
▫ Sensitivity analysis: check how sensitive the alignment is to changes in the parameters
▫ Efficient computation of all cooptimal alignments

Computing suboptimal alignments
Optimal alignment, even with a wide range of models and parameter choices, does not always identify the biological phenomena that it is intended to reflect:
▫ The available objective functions might not reflect the full range of biological forces that cause differences between strings.
▫ The objective functions might not induce the optimal alignment to form the desired shape.
▫ The data might contain errors that confound the algorithms.
▫ There may be ties for the optimal alignment.
▫ There may be many nearly optimal alignments that are biologically more significant than any optimal one.

δ-near-optimal alignments
Theorem: For any s-to-t path R, $\delta(R) = \sum_{e \in R} \epsilon(e)$.
Corollary: Consider a path R' from s to u and let δ denote $\sum_{e \in R'} \epsilon(e)$. Then the s-to-t path R consisting of path R' followed by the longest u-to-t path is a δ-near-optimal path.
Proof: By definition of ε(e), ε(e) = 0 for any edge e on the longest u-to-t path. Hence δ(R) = δ by the previous Theorem.
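A small numerical check may make the theorem and corollary concrete. The sketch below assumes definitions that this slide does not spell out: ℓ(v) is the length of the longest v-to-t path, ε(u, v) = ℓ(u) - w(u, v) - ℓ(v) for an edge of weight w(u, v), and δ(R) = ℓ(s) - w(R). The toy graph `EDGES` is hypothetical.

```python
from functools import lru_cache

# Hypothetical toy DAG: EDGES[u] is a list of (v, weight) pairs.
EDGES = {
    "s": [("a", 3), ("b", 1)],
    "a": [("b", 2), ("t", 6)],
    "b": [("t", 3)],
    "t": [],
}

@lru_cache(maxsize=None)
def longest(v):
    """Length of the longest v-to-t path (assumed meaning of l(v))."""
    if v == "t":
        return 0
    return max(w + longest(u) for u, w in EDGES[v])

def weight(u, v):
    return dict(EDGES[u])[v]

def eps(u, v):
    """Assumed edge deviation: how much edge (u, v) loses against l(u)."""
    return longest(u) - weight(u, v) - longest(v)

def deviation_by_edges(path):
    """Sum of eps over the edges of the path (right side of the theorem)."""
    return sum(eps(u, v) for u, v in zip(path, path[1:]))

def deviation_direct(path):
    """l(s) minus the total weight of the path (left side of the theorem)."""
    return longest("s") - sum(weight(u, v) for u, v in zip(path, path[1:]))

for path in (["s", "a", "t"], ["s", "a", "b", "t"], ["s", "b", "t"]):
    print(path, deviation_direct(path), deviation_by_edges(path))
```

The two deviation columns agree for every path. The path s, b, t also illustrates the corollary: the prefix s-to-b accumulates 5, and finishing along the longest b-to-t path adds nothing.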

Counting and enumerating near-optimal paths - How to count
Definition: Let N(v, δ) be the number of δ-near-optimal s-to-t paths that go through node v. For a given value Δ, the number of s-to-t paths whose deviation from R* is at most Δ is a sum of N values over δ ≤ Δ [formula missing from the transcript]. We compute that sum by evaluating a recurrence [also missing from the transcript] for each node v and for each "needed" value of δ.
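Since the slide's recurrence is not preserved, the following is only one plausible way to organize such a count, not necessarily the one intended. It assumes ε(u, v) = ℓ(u) - w(u, v) - ℓ(v), with ℓ(v) the longest v-to-t path length, and counts s-to-v partial paths by their accumulated ε. The counts at t then give the number of s-to-t paths for each deviation. The graph `EDGES` and all function names are hypothetical.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical toy DAG: EDGES[u] is a list of (v, weight) pairs.
EDGES = {
    "s": [("a", 3), ("b", 1)],
    "a": [("b", 2), ("t", 6)],
    "b": [("t", 3)],
    "t": [],
}

# Reverse adjacency: PREDS[v] is a list of (u, weight) for edges u -> v.
PREDS = {v: [] for v in EDGES}
for u, outs in EDGES.items():
    for v, w in outs:
        PREDS[v].append((u, w))

@lru_cache(maxsize=None)
def longest(v):
    """Length of the longest v-to-t path."""
    if v == "t":
        return 0
    return max(w + longest(u) for u, w in EDGES[v])

@lru_cache(maxsize=None)
def prefix_counts(v):
    """Map d -> number of s-to-v paths whose edge deviations sum to d."""
    if v == "s":
        return Counter({0: 1})
    counts = Counter()
    for u, w in PREDS[v]:
        e = longest(u) - w - longest(v)      # assumed eps(u, v)
        for d, n in prefix_counts(u).items():
            counts[d + e] += n               # extend each s-to-u path by (u, v)
    return counts

def count_near_optimal(delta_max):
    """Number of s-to-t paths with deviation at most delta_max."""
    return sum(n for d, n in prefix_counts("t").items() if d <= delta_max)

print(count_near_optimal(1))
```

A practical version would discard accumulated values above Δ as it goes, which is presumably what restricting to the "needed" values of δ achieves.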

Counting and enumerating near-optimal paths - Enumeration
The δ-near-optimal paths can be enumerated in order of increasing δ, and the enumeration can be terminated when δ = Δ or when some fixed number of paths has been found. A tree enumerating partial paths is maintained.
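One way to realize this enumeration is a best-first search over partial paths, sketched below. It is an assumption about the mechanics, not the slide's exact procedure: a heap holds the current leaves of the tree of partial paths, keyed by accumulated ε, where ε(u, v) = ℓ(u) - w(u, v) - ℓ(v) and ℓ(v) is the longest v-to-t path length. Because every ε is nonnegative, complete paths leave the heap in nondecreasing order of δ. The graph `EDGES` is hypothetical.

```python
import heapq
from functools import lru_cache

# Hypothetical toy DAG: EDGES[u] is a list of (v, weight) pairs.
EDGES = {
    "s": [("a", 3), ("b", 1)],
    "a": [("b", 2), ("t", 6)],
    "b": [("t", 3)],
    "t": [],
}

@lru_cache(maxsize=None)
def longest(v):
    """Length of the longest v-to-t path."""
    if v == "t":
        return 0
    return max(w + longest(u) for u, w in EDGES[v])

def enumerate_near_optimal(delta_max, limit=None):
    """Return (deviation, path) pairs in nondecreasing order of deviation."""
    heap = [(0, ["s"])]                  # partial paths keyed by accumulated eps
    found = []
    while heap:
        d, path = heapq.heappop(heap)
        if d > delta_max:
            break                        # everything left deviates even more
        u = path[-1]
        if u == "t":
            found.append((d, path))
            if limit is not None and len(found) >= limit:
                break
            continue
        for v, w in EDGES[u]:
            e = longest(u) - w - longest(v)   # assumed eps(u, v), never negative
            heapq.heappush(heap, (d + e, path + [v]))
    return found

for d, path in enumerate_near_optimal(5):
    print(d, path)
```

Stopping is controlled either by `delta_max` (the bound Δ) or by `limit` (a fixed number of paths), matching the two termination rules mentioned above.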

A one-dimensional chaining problem
Consider a set of r (possibly overlapping) intervals drawn on the line R, where each interval j has some associated value v(j). The problem is to select a subset of nonoverlapping intervals whose values sum to as large a number as possible.

One-dimensional algorithm
Let I be a list of all the 2r numbers representing the locations of the endpoints of the r intervals. Sort the numbers in I, and annotate each entry in I with the name of the interval it is part of and whether it is a left or a right endpoint. For convenience, let I be a one-dimensional array.
Set max to zero.
For i from 1 to 2r do begin
  If I[i] represents the left end of an interval, say interval j, then set V[j] to v(j) + max.
  If I[i] represents the right end of interval j, then set max to the maximum of max and V[j].
end.
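The sweep above translates almost directly into Python. This is a minimal sketch with two assumptions of its own: intervals are given as hypothetical (left, right, value) triples with nonnegative values, and when a left end and a right end share a coordinate the left end is processed first, so intervals that merely touch are treated as overlapping.

```python
def best_chain_value(intervals):
    """Maximum total value of a set of pairwise nonoverlapping intervals.

    intervals: list of (left, right, value) with left < right.
    """
    events = []
    for j, (left, right, _) in enumerate(intervals):
        events.append((left, 0, j))    # 0 marks a left end; sorts before a right end
        events.append((right, 1, j))   # 1 marks a right end
    events.sort()

    best = 0                           # the variable called max in the slide
    V = [0] * len(intervals)           # V[j]: best chain value ending with interval j
    for _, kind, j in events:
        if kind == 0:
            V[j] = intervals[j][2] + best
        else:
            best = max(best, V[j])
    return best

print(best_chain_value([(1, 4, 5), (3, 6, 6), (5, 8, 4), (7, 9, 3)]))
```

Sorting the 2r endpoints dominates the running time. Recovering the chosen intervals, and not only the value, would need a back-pointer stored alongside each V[j].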

The two-dimensional chain problem

Definition
A subset of the rectangles is called a chain if no horizontal or vertical line intersects more than one rectangle in the subset and if the rectangles can be ordered so that each one is below and to the right of its predecessor. The value of a chain is the sum of the values of the rectangles in the chain.
The Chain Problem: Find a chain with maximum value over all chains.

Two-dimensional chain algorithm
List L begins empty.
For i from 1 to 2r do begin
  If I[i] is the left end of a rectangle, say rectangle k, then begin
    Search L for the last triple where l_j is greater than h_k. That is, find the closest (in the y dimension) rectangle j with a triple in L whose lowest point is strictly above the highest point of rectangle k.
    Set V(k) to v(k) + V(j).
  end
  Else if I[i] is the right end of rectangle k, then begin
    Search L for the first triple where l_j is less than or equal to l_k.
    If l_j […] V(j), then insert the triple (l_k, V(k), k) into L, in the proper location to keep the triples sorted by their l values.
    Delete from L the triple for every rectangle j' where l_j' […] V(j').
  end
end.

Two-dimensional chain algorithm
Theorem: An optimal chain can be found in O(r log r) time.
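A left-to-right sweep in the spirit of the two-dimensional chain algorithm can be sketched as follows. The insert and delete tests here are a reconstruction based on dominance (a stored rectangle is useless if another stored rectangle has a low edge at least as high and a chain value at least as large). They are an assumption and may differ from the algorithm's exact conditions. Rectangles are hypothetical (x_left, x_right, y_low, y_high, value) tuples with y increasing upward and nonnegative values.

```python
from bisect import bisect_left, bisect_right

def best_chain_2d(rects):
    """Maximum value of a chain in which each rectangle lies strictly below
    and to the right of its predecessor."""
    events = []
    for k, (x_left, x_right, _, _, _) in enumerate(rects):
        events.append((x_left, 0, k))    # left ends sort first at equal x
        events.append((x_right, 1, k))
    events.sort()

    # Parallel lists playing the role of L: lows ascending, vals strictly descending.
    lows, vals = [], []
    V = [0] * len(rects)                 # V[k]: best chain value ending at rectangle k
    for _, kind, k in events:
        _, _, y_low, y_high, v = rects[k]
        if kind == 0:
            # Closest stored rectangle whose low edge is strictly above y_high.
            i = bisect_right(lows, y_high)
            V[k] = v + (vals[i] if i < len(lows) else 0)
        else:
            i = bisect_left(lows, y_low)
            if i < len(lows) and vals[i] >= V[k]:
                continue                 # k is dominated; do not store it
            if i < len(lows) and lows[i] == y_low:
                i += 1                   # same low edge, smaller value: replace it
            start = i
            while start > 0 and vals[start - 1] <= V[k]:
                start -= 1               # entries dominated by k
            lows[start:i] = [y_low]
            vals[start:i] = [V[k]]
    return max(V, default=0)

rects = [(0, 2, 8, 10, 5), (1, 3, 5, 7, 4), (3, 5, 4, 6, 6),
         (6, 8, 0, 2, 3), (6, 9, 6, 7, 10)]
print(best_chain_2d(rects))
```

Python list insertions and deletions take linear time, so this sketch does not by itself reach the O(r log r) bound of the theorem. Replacing the two lists with a balanced search tree would, since each rectangle is inserted and deleted at most once.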