Efficient Algorithms for Some Variants of the Farthest String Problem Chih Huai Cheng, Ching Chiang Huang, Shu Yu Hu, Kun-Mao Chao.

Abstract Given k strings of the same length L and an integer d, find a string s such that the Hamming distance between s and each of the k strings is at least d. The problem is NP-hard when the distance d is not given but must be maximized. An efficient algorithm is provided for fixed L and d.

Let’s Begin Input: Strings s1, s2, ..., sk over alphabet Σ, each of length L, and a nonnegative integer d. Question: Is there a string s of length L such that d_H(s, s_i) ≥ d for all i = 1, ..., k? FARTHEST STRING can be solved in O(kL(|Σ|(L-d))^(L-d)) time, yielding a bounded search tree algorithm for fixed parameters L and d.

Definitions s: a string of length L. S: a set of strings of length L. d_H(s1, s2): the Hamming distance between the two strings s1 and s2, that is, the number of positions in which they differ.

Key Observation Given a set of binary strings S = {s1, s2, ..., sk} and a positive integer d. If there are i, j ∈ {1, ..., k} with d_H(s_i, s_j) > x, then there is no string s with min_{i=1,...,k} d_H(s, s_i) > L - x/2. Reason: s_i and s_j agree in fewer than L - x positions; there s can differ from both, contributing at most L - x to each distance. In the more than x positions where they differ, s agrees with one of the two (binary alphabet), so the smaller of the two distances gains at most x/2 there. Hence the minimum distance is at most (L - x) + x/2 = L - x/2.
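The bound above can be checked numerically on a small binary instance. The sketch below is ours, not from the slides: it exhaustively computes max over all s of min(d_H(s, s_i), d_H(s, s_j)) for one pair of strings and compares it with L - x/2.

```python
# Numerical check of the key observation (an illustrative sketch, not part
# of the original slides): for two binary strings si, sj with
# d_H(si, sj) = x, no string s achieves
# min(d_H(s, si), d_H(s, sj)) > L - x/2.
import itertools

def d_h(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

L = 8
si = "00000000"
sj = "00001111"          # d_H(si, sj) = 4, so the bound is L - 4/2 = 6
x = d_h(si, sj)

# Exhaustive search over all 2^L binary strings of length L.
best = max(min(d_h(s, si), d_h(s, sj))
           for s in ("".join(t) for t in itertools.product("01", repeat=L)))
assert best <= L - x / 2  # the observation's bound holds
```

Here the bound is tight: the string "11111100" differs from si in 6 positions and from sj in 6 positions, so the maximum of the minimum distance is exactly L - x/2 = 6.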

Key Observation Applying the observation with x = 2(L - d): if there are i, j ∈ {1, ..., k} with d_H(s_i, s_j) > 2(L - d), then no string s with min_{i=1,...,k} d_H(s, s_i) ≥ d exists. This can be used to discard some instances immediately.

The Idea Of The Algorithm Choose a "candidate string" first. For each string s_i, i = 2, ..., k, that matches the candidate string in more than L - d positions (that is, d_H(candidate, s_i) < d), we recursively try several ways to move the candidate string "away from" s_i. Stop either if the candidate has moved "too far away" from its starting point (more than L - d steps) or if we find a solution. By a careful selection of subcases, we can limit the size of this search tree to O((L-d)^(L-d)).

Algorithm By Pseudo-Code Let c denote the candidate string; the initial call is FSD(c, L-d), so the second argument Δ counts how many moves remain.
1  if Δ < 0 then return "not found"     (we moved more than L - d steps; stop, by the previous observation)
2  if d_H(c, s_i) ≥ d for all i then return c     (found the answer)
3  choose an unsatisfied string s_i with d_H(c, s_i) < d
4  P := the set of positions where c and the unsatisfied string s_i have the same letter
5  choose a subset P' of P with L - d + 1 positions
6  for each position p in P':     (change one position at a time)
7      c' := c with the letter at position p changed
8      ret := FSD(c', Δ - 1)     (recursive call)
9      if ret ≠ "not found" then return ret
10 return "not found"
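The pseudo-code can be turned into a short runnable sketch. Two assumptions here are not spelled out on the slides and are ours: the alphabet is binary, and the initial candidate is the complement of s1 (which is already at distance L from s1); the names fsd, budget, and farthest_string are also ours.

```python
# A runnable sketch of the bounded-search-tree algorithm FSD for binary
# FARTHEST STRING, under the assumptions stated above.

def d_h(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def fsd(candidate, strings, d, budget):
    """Return a string at Hamming distance >= d from every input string,
    or None if none is reachable within `budget` moves."""
    if budget < 0:                       # moved more than L - d steps: give up
        return None
    unsatisfied = [s for s in strings if d_h(candidate, s) < d]
    if not unsatisfied:                  # every string is far enough
        return candidate
    s_i = unsatisfied[0]                 # pick one unsatisfied string
    L = len(s_i)
    # Positions where candidate and s_i agree; there are at least L - d + 1
    # of them because d_H(candidate, s_i) < d. Branch on L - d + 1 of them.
    agree = [p for p in range(L) if candidate[p] == s_i[p]]
    for p in agree[: L - d + 1]:
        flipped = candidate[:p] + ("1" if candidate[p] == "0" else "0") + candidate[p + 1:]
        result = fsd(flipped, strings, d, budget - 1)
        if result is not None:
            return result
    return None

def farthest_string(strings, d):
    s1 = strings[0]
    start = "".join("1" if c == "0" else "0" for c in s1)  # complement of s1
    return fsd(start, strings, d, len(s1) - d)

# Usage: a string at distance >= 3 from both inputs of length 4.
ans = farthest_string(["0000", "0011"], d=3)
if ans is not None:
    assert all(d_h(ans, s) >= 3 for s in ["0000", "0011"])
```

The recursion depth is bounded by L - d and each call branches into L - d + 1 subcases, matching the search tree described on the slides.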

Illustration By Graph (Figure: a search tree whose branch nodes each split into L - d + 1 subcases and whose height is at most L - d. In this example, L = 7 and d = 5, so every node has 3 branches and the tree has height 2.)

Pseudo-Code & Time Complexity Each step of the algorithm takes O(1) time except the distance checks, which take O(kL) time. There are O((L-d)^(L-d)) recursive calls, so the total running time is O(kL(L-d)^(L-d)).

Correctness We have to show that Algorithm FSD will find a string s with min_{i=1,...,k} d_H(s, s_i) ≥ d, if such an s exists. Case 1: the candidate string itself satisfies min_{i=1,...,k} d_H(candidate, s_i) ≥ d, and the algorithm returns it. Case 2: the candidate string is not a solution, but there exists a string s that satisfies min_{i=1,...,k} d_H(s, s_i) ≥ d. Then there is a string s_i, i = 2, ..., k, such that d_H(candidate, s_i) < d. We will explain why the algorithm creates L - d + 1 subcases and prove that it reaches the correct answer.

Correctness – Case 2 Take the first recursive call as an example: 1. Since s is a solution, d_H(s, s_1) ≥ d: there are at most L - d positions where s and s_1 have the same letter. 2. Likewise d_H(s, s_i) ≥ d: there are at most L - d positions where s and s_i have the same letter. 3. We choose L - d + 1 positions where the candidate string and s_i have the same letter. 4. Because s and s_i agree in at most L - d positions, by the pigeonhole principle at least one of the chosen positions is a position where s and s_i differ. 5. Changing the candidate string at that position moves it closer to the farthest string s. 6. After at most L - d such steps, the farthest string is reached.

Correctness – Case 2 (Figure: an example with L = 9 and d = 4, marking the positions where the strings are the same and where they differ.)

Farthest String by Maximum Hamming Distance Sum Input: Strings s1, s2, ..., sk over alphabet Σ, each of length L. Question: Find a string s ∈ {s1, s2, ..., sk} that maximizes Σ_{i=1}^{k} d_H(s, s_i).

Naïve Approach Approach: in every column, select the letter that occurs the fewest times; this is the so-called minority vote. For example, if the number of 1's in the first column is the smallest, we choose 1 as the minority vote; if the number of 0's in the second column is the smallest, the minority vote there is 0. But this still doesn't work: the minority-vote string maximizes the distance sum over all possible strings, while the problem requires s to be one of the input strings.

The concept of weighted sum We therefore want to determine how much each letter in a column contributes to the total Hamming distance, and then compute the sum of Hamming distances for every string. To achieve this, we use an array that records the number of times each letter occurs in every column.

Key Observation Definition: num(α, i) is the number of times the letter α appears in the i-th column. We have to prove that, for any string s, the total Hamming distance sum equals Σ_{i=1}^{L} (k - num(s[i], i)): in each column, the number of strings minus the number of occurrences of s's letter. Once this is proven, the input string with the maximum weighted sum is our answer.
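The identity can be checked directly on a small instance. The snippet below is ours, not from the slides; the instance strings are arbitrary.

```python
# Check the identity  sum_j d_H(s, s_j) = sum_p (k - num(s[p], p))
# on a small hand-made instance (an illustrative sketch).
strings = ["acb", "abb", "bcb", "aca"]
k, L = len(strings), len(strings[0])

# num[p][a] = number of times letter a appears in column p
num = [{} for _ in range(L)]
for s in strings:
    for p, a in enumerate(s):
        num[p][a] = num[p].get(a, 0) + 1

def d_h(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

for s in strings:
    direct = sum(d_h(s, t) for t in strings)                 # O(kL) per string
    weighted = sum(k - num[p][s[p]] for p in range(L))       # O(L) per string
    assert direct == weighted
```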

Pseudo-code & Time Complexity
1  for p = 1 to L
2      for i = 1 to k
3          num[p][s_i[p]] += 1
4  farthest = 0
5  dis = 0
6  for i = 1 to k
7      temp_dis = 0
8      for p = 1 to L
9          temp_dis += k - num[p][s_i[p]]
10     if temp_dis > dis
11         dis = temp_dis
12         farthest = i
13 return s_farthest
Filling the two-dimensional array num[][], the number of times each letter occurs in each column, takes O(kL) time. Calculating the weighted sum of one string then takes O(L) time, and the total time is therefore O(kL).

Thank You