Greedy Algorithms for the Shortest Common Superstring. Overview by Anton Nesterov, Saint Petersburg State University, Russia. Original paper by A. Frieze and W. Szpankowski.


Greedy Algorithms for the Shortest Common Superstring. Overview by Anton Nesterov, Saint Petersburg State University, Russia. Original paper by A. Frieze (Carnegie Mellon University, USA) and W. Szpankowski (Purdue University, USA).

Contents of the report: description of the problem, formulation of the main results, and ideas of the proof.

Shortest common superstring problem (SCS). Given a collection of strings $x_1, x_2, \dots, x_n$, we want to find a shortest superstring $S$ that contains each $x_i$ as a substring.

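Example. Three words: fear, nature, arena. Superstring: fearenature. Here fear and arena overlap in "ar", and arena and nature overlap in "na", so 11 symbols suffice instead of 4 + 6 + 5 = 15.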
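A quick check of the example, as a small Python sketch. `is_superstring` is a hypothetical helper name, not from the paper, and it only tests containment, not minimality.

```python
def is_superstring(s: str, words: list[str]) -> bool:
    """Return True if every word occurs in s as a contiguous substring."""
    return all(w in s for w in words)


words = ["fear", "nature", "arena"]
print(is_superstring("fearenature", words))      # True
print(len("fearenature"), sum(map(len, words)))  # 11 15
```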

Algorithms. The problem is known to be NP-hard, so it is of great interest to develop good approximation algorithms. Below we present several greedy algorithms.

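Definitions. Let $\mathcal{A}$ be the alphabet, with $|\mathcal{A}| = M$ letters. For two strings $x$ and $y$ we define the overlap $o(x, y)$, the length of the longest suffix of $x$ that is also a prefix of $y$ (shorter than both strings), and the special sum $x \oplus y = x\, y_{o(x,y)+1} \cdots y_{|y|}$, i.e. $y$ appended to $x$ with the overlapping part written only once.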

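Example. For the words above, o(fear, arena) = 2 and fear ⊕ arena = fearena; then o(fearena, nature) = 2 and fearena ⊕ nature = fearenature.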
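The two definitions translate directly into code. This is a minimal sketch; `overlap` and `oplus` are illustrative names, and it assumes the usual convention that overlaps are proper (shorter than both strings), which is harmless when no input string is a substring of another.

```python
def overlap(x: str, y: str) -> int:
    """o(x, y): length of the longest proper suffix of x that is a prefix of y."""
    for k in range(min(len(x), len(y)) - 1, 0, -1):
        if x[-k:] == y[:k]:
            return k
    return 0


def oplus(x: str, y: str) -> str:
    """x ⊕ y: y appended to x with the overlapping part written only once."""
    return x + y[overlap(x, y):]


print(overlap("fear", "arena"))                 # 2
print(oplus("fear", "arena"))                   # fearena
print(oplus(oplus("fear", "arena"), "nature"))  # fearenature
```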

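Total optimal overlap. Let $x_1, \dots, x_n$ be a set of $n$ strings, and assume we have solved the SCS problem; let $S$ be the minimal superstring. The total optimal overlap is $O_n^{OPT} = \sum_{i=1}^{n} |x_i| - |S|$, so minimizing $|S|$ is the same as maximizing the total overlap.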
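For very small instances the total optimal overlap can be found by brute force over all orderings of the strings. This sketch assumes a substring-free set, so that $|S|$ equals the total length minus the overlaps of consecutive strings in the best order. It takes time exponential in n and only illustrates the definition; the function names are hypothetical.

```python
from itertools import permutations


def overlap(x: str, y: str) -> int:
    """Longest proper suffix of x that is a prefix of y."""
    for k in range(min(len(x), len(y)) - 1, 0, -1):
        if x[-k:] == y[:k]:
            return k
    return 0


def optimal_total_overlap(strings: list[str]) -> int:
    """O_n^OPT by brute force: best sum of consecutive overlaps over all orderings."""
    best = 0
    for order in permutations(strings):
        best = max(best, sum(overlap(a, b) for a, b in zip(order, order[1:])))
    return best


words = ["fear", "nature", "arena"]
o_opt = optimal_total_overlap(words)
print(o_opt, sum(map(len, words)) - o_opt)  # 4 11
```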

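Descriptions of algorithms. Generic greedy algorithm: starting from the set of input strings, repeat the following step until a single string remains: (*) choose two distinct strings x and y from the current set, remove them, and insert z = x ⊕ y. GREEDY: in step (*) we choose x, y so as to maximize o(x, y). RGREEDY: in step (*), x is the string z produced in the previous iteration. This means that we have only one long string, which grows by appending strings at its right-hand end.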
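A direct, unoptimized sketch of both variants. Two assumptions are not fixed by the description: ties in step (*) are broken by taking the first maximizing pair, and RGREEDY starts from the first input string (the for RGREEDY on the very first iteration is otherwise unspecified).

```python
def overlap(x: str, y: str) -> int:
    """o(x, y): longest proper suffix of x that is a prefix of y."""
    for k in range(min(len(x), len(y)) - 1, 0, -1):
        if x[-k:] == y[:k]:
            return k
    return 0


def greedy(strings: list[str]) -> str:
    """GREEDY: step (*) merges the ordered pair x != y with the largest o(x, y)."""
    pool = list(strings)
    while len(pool) > 1:
        pairs = [(i, j) for i in range(len(pool)) for j in range(len(pool)) if i != j]
        i, j = max(pairs, key=lambda p: overlap(pool[p[0]], pool[p[1]]))
        z = pool[i] + pool[j][overlap(pool[i], pool[j]):]  # z = x ⊕ y
        pool = [s for k, s in enumerate(pool) if k not in (i, j)] + [z]
    return pool[0]


def rgreedy(strings: list[str]) -> str:
    """RGREEDY: x is always the string z built so far; it only grows at its right end."""
    z, rest = strings[0], list(strings[1:])  # assumption: start from the first string
    while rest:
        y = max(rest, key=lambda s: overlap(z, s))
        z = z + y[overlap(z, y):]  # z = z ⊕ y
        rest.remove(y)
    return z


words = ["fear", "nature", "arena"]
print(greedy(words), rgreedy(words))  # fearenature fearenature
```

GREEDY as written is cubic in the number of strings per run; it is meant only to make step (*) concrete.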

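Bernoulli model. All strings have the same length ℓ, and each symbol is drawn independently of the previous ones: the i-th letter of the alphabet appears with probability $p_i$, $i = 1, \dots, M$. Let $H = -\sum_{i=1}^{M} p_i \log p_i$ be the associated entropy of the Bernoulli model.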
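A sketch of sampling under the Bernoulli model and computing H. The function names are hypothetical. Natural logarithms are used, so H must be paired with the natural logarithm of n wherever the two appear together (any fixed base works if used consistently).

```python
import math
import random


def entropy(probs: list[float]) -> float:
    """Shannon entropy H = -sum p_i log p_i of the letter distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def bernoulli_strings(n: int, length: int, alphabet: str, probs: list[float],
                      seed: int = 0) -> list[str]:
    """n strings of equal length; every symbol is drawn independently with weights probs."""
    rng = random.Random(seed)
    return ["".join(rng.choices(alphabet, weights=probs, k=length)) for _ in range(n)]


print(entropy([0.25] * 4))  # ln 4, about 1.386
print(bernoulli_strings(3, 10, "ACGT", [0.25] * 4, seed=0))
```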

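Main result. Theorem. Consider the SCS problem under the Bernoulli model, and let $O_n^{OPT}$ denote the optimal total overlap and $O_n^{GR}$ the total overlap produced by GREEDY or RGREEDY. Then, with high probability, $O_n^{OPT} \sim O_n^{GR} \sim \frac{1}{H}\, n \log n$ as $n \to \infty$, provided the common length ℓ of the strings is not too short compared with $\log n$; here $H$ is the entropy of the Bernoulli model.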
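An illustrative experiment only, not a proof: it runs RGREEDY on random binary Bernoulli strings and reports the achieved overlap divided by $n \log n / H$. It assumes that the normalizing constant in the main result is $1/H$; convergence in n is slow, so a finite run need not give a ratio close to 1. The parameters (n = 200, ℓ = 40, seed 1) and the RGREEDY starting string are arbitrary choices.

```python
import math
import random


def overlap(x: str, y: str) -> int:
    """Longest proper suffix of x that is a prefix of y."""
    for k in range(min(len(x), len(y)) - 1, 0, -1):
        if x[-k:] == y[:k]:
            return k
    return 0


def rgreedy_total_overlap(strings: list[str]) -> int:
    """Total overlap collected by RGREEDY, starting (by assumption) from strings[0]."""
    z, rest, total = strings[0], list(strings[1:]), 0
    while rest:
        y = max(rest, key=lambda s: overlap(z, s))
        k = overlap(z, y)
        z, total = z + y[k:], total + k
        rest.remove(y)
    return total


# Arbitrary experiment parameters: binary alphabet with p = (1/2, 1/2).
n, length, probs = 200, 40, [0.5, 0.5]
rng = random.Random(1)
strings = ["".join(rng.choices("01", weights=probs, k=length)) for _ in range(n)]
H = -sum(p * math.log(p) for p in probs)  # natural log, matching math.log(n) below
ratio = rgreedy_total_overlap(strings) / (n * math.log(n) / H)
print(f"overlap / (n log n / H) = {ratio:.3f}")
```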

Other models. Markovian model: each symbol depends only on the previous one. Mixing model: when SCS is used to solve more complicated problems, the two previous models are too unrealistic, and a mixing model is needed. Its main idea is as follows: in a string, the farther away a symbol is, the weaker the dependence on it.
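A sketch of generating one string under the Markovian model, where the next symbol is drawn from a distribution that depends only on the previous symbol. The transition-table format (a dict from symbol to weights over the alphabet) and the name `markov_string` are assumptions for illustration; the mixing model is not sketched here.

```python
import random


def markov_string(length: int, alphabet: str, start_probs: list[float],
                  trans: dict[str, list[float]], seed: int = 0) -> str:
    """One string from a first-order Markov chain over `alphabet`.

    trans[a] holds the weights of the next symbol given that the previous symbol is a.
    """
    rng = random.Random(seed)
    s = rng.choices(alphabet, weights=start_probs)  # first symbol
    while len(s) < length:
        s += rng.choices(alphabet, weights=trans[s[-1]])  # depends only on s[-1]
    return "".join(s)


# Hypothetical "sticky" chain: a symbol tends to repeat.
print(markov_string(20, "ab", [0.5, 0.5], {"a": [0.9, 0.1], "b": [0.1, 0.9]}, seed=3))
```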

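Compression. SCS can be used to compress strings. Instead of storing all strings, of total length nℓ, we can store the SCS together with n pointers indicating where each original string begins, plus the lengths of all strings. The compression ratio is therefore governed by the total overlap, since the SCS has length nℓ minus the total overlap. But when the string length ℓ grows faster than log n, the compression ratio tends to 1: the total overlap, of order $n \log n$, becomes negligible compared with nℓ.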
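A sketch of the pointer scheme. For simplicity it builds the superstring by appending the strings in input order with maximal overlap (an RGREEDY-like pass without reordering), so it does not produce the shortest superstring in general; `compress` and `decompress` are hypothetical names.

```python
def overlap(x: str, y: str) -> int:
    """Longest proper suffix of x that is a prefix of y."""
    for k in range(min(len(x), len(y)) - 1, 0, -1):
        if x[-k:] == y[:k]:
            return k
    return 0


def compress(strings: list[str]) -> tuple[str, list[tuple[int, int]]]:
    """Return a common superstring and one (start, length) pointer per input string."""
    s = strings[0]
    for w in strings[1:]:
        if w not in s:  # a string already covered needs no new symbols
            s += w[overlap(s, w):]  # append at the right end, reusing the overlap
    pointers = [(s.find(w), len(w)) for w in strings]
    return s, pointers


def decompress(s: str, pointers: list[tuple[int, int]]) -> list[str]:
    """Recover the original strings by slicing the superstring."""
    return [s[i:i + k] for i, k in pointers]


words = ["fear", "arena", "nature"]
s, pointers = compress(words)
print(s, pointers)                       # fearenature [(0, 4), (2, 5), (5, 6)]
print(decompress(s, pointers) == words)  # True
```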

Ideas of the proof. First, we show that it is unlikely that there is a pair of strings whose overlap exceeds half of their length. Let E denote the event that there is no such pair; under a suitable condition on ℓ and n, the probability of E tends to 1.
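A minimal sketch of why E is likely, via a standard union bound (not necessarily the estimate used in the paper). Assume at least two letters have positive probability and put $P = \sum_{i=1}^{M} p_i^2 < 1$, the probability that two independent symbols coincide. For a fixed ordered pair of distinct strings and a fixed $k$, the last $k$ symbols of the first string agree with the first $k$ symbols of the second with probability $P^k$, since the $k$ pairs of positions are independent. Summing over fewer than $n^2$ ordered pairs and all $k > \ell/2$:

$$\Pr(\overline{E}) \le n^2 \sum_{k > \ell/2} P^k \le \frac{n^2 P^{\ell/2}}{1 - P} = \frac{1}{1 - P}\exp\Big(2\log n - \frac{\ell}{2}\log\frac{1}{P}\Big).$$

Hence $\Pr(E) \to 1$ whenever $\ell - \frac{4\log n}{\log(1/P)} \to \infty$, for example when $\ell \ge c\log n$ for a constant $c > 4/\log(1/P)$.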

Two halves of a string. Consider a part of our superstring: ...ancneanaosaasunanssana... On the original slide, the overlaps of neighbouring strings are marked in red. Each string has two marked overlaps, the "tail" and the "nose". Since such overlaps practically never exceed half of the string, we can divide every string into two parts, the first half and the second half, each of length ℓ/2. Then, to compute the overlap o(a, b) of two strings a and b (a suffix of a against a prefix of b), we only need the second half of a and the first half of b.
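A small check of the halving argument, assuming all strings have the same even length ℓ: whenever o(a, b) ≤ ℓ/2, the overlap can be recovered from the second half of a and the first half of b alone. The helper names are illustrative; the second example shows what goes wrong when the overlap exceeds ℓ/2.

```python
def longest_suffix_prefix(x: str, y: str, proper: bool = True) -> int:
    """Largest k with x[-k:] == y[:k]; if proper, k must also be < min(|x|, |y|)."""
    top = min(len(x), len(y)) - (1 if proper else 0)
    for k in range(top, 0, -1):
        if x[-k:] == y[:k]:
            return k
    return 0


def overlap_from_halves(a: str, b: str) -> int:
    """o(a, b) computed from the second half of a and the first half of b only.

    Equals o(a, b) whenever o(a, b) <= l/2 (assumes |a| = |b| = l with l even).
    """
    h = len(a) // 2
    return longest_suffix_prefix(a[h:], b[:h], proper=False)


print(longest_suffix_prefix("xxab", "abyy"), overlap_from_halves("xxab", "abyy"))  # 2 2
print(longest_suffix_prefix("abab", "baba"), overlap_from_halves("abab", "baba"))  # 3 1
```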

RGREEDY and trees. We now consider a tree process that is equivalent to RGREEDY. The tree T is an infinite rooted M-ary tree, where M is the size of the alphabet: the M edges leading down from each vertex are labeled with the M letters of the alphabet. Thus each vertex of depth d is identified with a string of length d, the sequence of labels on the path from the root.

Modeling RGREEDY. For each string, we mark each vertex with the number of strings that have the prefix associated with this vertex: a label k means that there are k strings starting with this prefix.
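A sketch of the vertex labels of T restricted to a finite depth, storing only vertices with positive labels (a `Counter` returns 0 for all others). A vertex is represented by its prefix string, following the identification of vertices with strings; the function name is hypothetical.

```python
from collections import Counter


def prefix_labels(strings: list[str], depth: int) -> Counter:
    """Label of every vertex of T down to `depth`: how many strings start with it.

    A vertex of depth d is represented by its string of length d; the root is "".
    """
    labels: Counter = Counter()
    for s in strings:
        for d in range(min(depth, len(s)) + 1):
            labels[s[:d]] += 1
    return labels


labels = prefix_labels(["ab", "ac", "bc"], depth=2)
print(labels[""], labels["a"], labels["ab"], labels["ca"])  # 3 2 1 0
```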

Example.
1. We climb down the tree, following the letters of the second half of the first string, to depth ℓ/2.
2. We stop at the deepest vertex that still has a positive label; call this label t.
3. So we find t strings whose prefixes coincide with the letters we have followed.
4. We treat one of these t strings in the same way (see step 1), and repeat this procedure n times.
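Steps 1 and 2 can be sketched as a walk down the labeled tree. This covers only the descent for one string; building the labels from the other strings only, and leaving open which of the t strings is chosen in step 4, are assumptions of this sketch, and the names are hypothetical.

```python
from collections import Counter


def prefix_labels(strings: list[str], depth: int) -> Counter:
    """Number of strings starting with each prefix of length <= depth."""
    labels: Counter = Counter()
    for s in strings:
        for d in range(min(depth, len(s)) + 1):
            labels[s[:d]] += 1
    return labels


def descend(labels: Counter, path: str) -> tuple[int, int]:
    """Steps 1-2: follow the letters of `path` from the root while labels stay positive.

    Returns (d, t): the depth of the deepest positive vertex on the path and its label.
    """
    d = 0
    for step in range(1, len(path) + 1):
        if labels[path[:step]] == 0:
            break
        d = step
    return d, labels[path[:d]]


first, others = "xxbc", ["bcxy", "bczz", "cdab"]
labels = prefix_labels(others, depth=len(first) // 2)
second_half = first[len(first) // 2:]
print(descend(labels, second_half))  # (2, 2): "bcxy" and "bczz" start with "bc"
```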

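GREEDY. Let $D = (V, A)$ be the complete digraph whose vertices are the strings $x_1, \dots, x_n$ and whose edges $(x_i, x_j)$, $i \ne j$, have weights $w(i, j) = o(x_i, x_j)$. We sort the edges of $A$ into a sequence $e_1, e_2, \dots, e_{n(n-1)}$ so that $w(e_1) \ge w(e_2) \ge \dots \ge w(e_{n(n-1)})$. Thus the problem is to find a path along the edges of D, passing through all vertices, that has maximum total weight.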
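A sketch of GREEDY in this digraph formulation: scan the sorted edges and keep an edge (i, j) unless i already has an outgoing edge, j already has an incoming edge, or the edge would close a cycle (checked with a union-find structure). Keeping zero-weight edges guarantees that n - 1 edges, i.e. a path through all vertices, are selected. The stable ordering of equal-weight edges and the function names are assumptions.

```python
def overlap(x: str, y: str) -> int:
    """w(i, j) = o(x_i, x_j): longest proper suffix of x_i that is a prefix of x_j."""
    for k in range(min(len(x), len(y)) - 1, 0, -1):
        if x[-k:] == y[:k]:
            return k
    return 0


def greedy_path(strings: list[str]) -> tuple[list[tuple[int, int, int]], int]:
    """Edge-greedy selection on the complete digraph D; returns (edges, total weight)."""
    n = len(strings)
    edges = [(overlap(strings[i], strings[j]), i, j)
             for i in range(n) for j in range(n) if i != j]
    edges.sort(key=lambda e: -e[0])  # e_1, e_2, ... with non-increasing weight (stable)
    parent = list(range(n))  # union-find over the path fragments built so far

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    has_out, has_in = [False] * n, [False] * n
    chosen: list[tuple[int, int, int]] = []
    for w, i, j in edges:
        if has_out[i] or has_in[j] or find(i) == find(j):
            continue  # i already has a successor, j a predecessor, or a cycle would close
        has_out[i] = has_in[j] = True
        parent[find(i)] = find(j)
        chosen.append((i, j, w))
        if len(chosen) == n - 1:
            break
    return chosen, sum(w for _, _, w in chosen)


def path_to_superstring(strings: list[str], chosen: list[tuple[int, int, int]]) -> str:
    """Walk the selected path and merge consecutive strings with x ⊕ y."""
    nxt = {i: j for i, j, _ in chosen}
    (start,) = set(range(len(strings))) - set(nxt.values())
    s, v = strings[start], start
    while v in nxt:
        u, v = v, nxt[v]
        s += strings[v][overlap(strings[u], strings[v]):]
    return s


words = ["fear", "nature", "arena"]
chosen, total = greedy_path(words)
print(total, path_to_superstring(words, chosen))  # 4 fearenature
```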