Interchange and Weighted-Interchange Rearrangement Distances in Strings Joint work of: Amihood Amir, Tzvika Hartman, Oren Kapah and Avivit Levy.


Motivation
Genome rearrangements carry phylogenetic information. Common assumption: one copy of each gene, so genomes are permutations. In practice this assumption is not defensible, and general strings need to be considered. General strings are usually complicated to handle and call for simplifying assumptions.

Our work
Consider general strings but simplify the rearrangement operation. Study the simple interchange rearrangement, which exchanges the elements at two positions of a string. [Figure: an interchange applied to the string abacbacabb.] Use two cost models: the unit-cost model and the length-weighted cost model. Results: the interchange distance is NP-hard to compute for general strings in the unit-cost model BUT polynomial-time computable in the length-weighted model.

Unit-cost model
The interchange distance is the number of interchanges. Thm: Computing the interchange distance between two general strings is NP-hard. Proof in two steps: (1) show equivalence to maximum edge-disjoint cycle decomposition of digraphs (the max-DCD problem); (2) prove that the max-DCD problem is NP-hard.

Equivalence to max-DCD
Fact: [Amir et al., SODA06] The interchange distance of a permutation π of length m (to the identity permutation) is m - c(π), where c(π) is the number of permutation cycles of π. Example: Consider a permutation of length 7 with 3 permutation cycles: (1 4 3) (2 6) (5 7). So, its distance is 7 - 3 = 4. Note: the cycle decomposition of a permutation's digraph is unique.
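The fact above can be checked with a short cycle-counting sketch. The function names and the 1-based one-line input convention (entry i holds the image of i) are assumptions made for illustration; fixed points count as cycles of length 1.

```python
def count_cycles(perm):
    """Count permutation cycles; perm[i-1] is the image of element i (1-based values)."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if seen[start]:
            continue
        cycles += 1
        cur = start
        # Follow the cycle containing `start` until it closes.
        while not seen[cur]:
            seen[cur] = True
            cur = perm[cur] - 1
    return cycles


def interchange_distance(perm):
    """Unit-cost interchange distance to the identity: m - c(perm)."""
    return len(perm) - count_cycles(perm)


# One permutation whose cycles are (1 4 3) (2 6) (5 7), written in one-line form.
pi = [4, 6, 1, 3, 7, 2, 5]
print(interchange_distance(pi))  # 4
```

Each interchange inside a cycle splits it into two, which is why m - c(π) interchanges suffice for a permutation.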

Equivalence to max-DCD…
What happens in general strings? Example: s1 = a b a c b c, s2 = b a c b c a. [Figure: the digraph on vertices a, b, c defined by s1 and s2, drawn twice with two different cycle decompositions.] Note: the cycle decomposition of the digraph is not unique. Which is better? We want the maximum number of cycles: the max-DCD problem.
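For very short strings the unit-cost interchange distance can be found by exhaustive search, which helps to sanity-check small examples. The sketch below is a plain breadth-first search over rearrangements, not the reduction discussed in the talk; the function name is hypothetical and the running time is exponential in the string length.

```python
from collections import deque
from itertools import combinations


def interchange_distance_bruteforce(s1, s2):
    """Minimum number of interchanges turning s1 into s2 (tiny inputs only)."""
    if sorted(s1) != sorted(s2):
        raise ValueError("strings must be rearrangements of each other")
    start, goal = tuple(s1), tuple(s2)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        cur, dist = queue.popleft()
        for i, j in combinations(range(len(cur)), 2):
            if cur[i] == cur[j]:
                continue  # swapping equal symbols changes nothing
            nxt = list(cur)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            nxt = tuple(nxt)
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise AssertionError("unreachable: the goal is always a rearrangement of the start")


print(interchange_distance_bruteforce("abacbc", "bacbca"))  # 3
```

For the example strings, the digraph has six edges that split either into two directed triangles or into three cycles of length 2. The second decomposition is the maximum one, and 6 - 3 = 3 matches the search result.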

The max-DCD problem
What do we know about the max-DCD problem? For directed graphs: consider only graphs with no cycles of length 2. The undirected version is NP-hard [Caprara, ’99].

The max-DCD problem…
Lemma: In digraphs with no cycles of length 2, the problem of finding a decomposition into directed triangles is polynomially reducible to max-DCD. Proof: Let G be a digraph with no cycles of length 2. Every cycle of G then has at least 3 edges, so |max-DCD(G)| ≤ |E|/3. Clearly, if |max-DCD(G)| < |E|/3 there is no decomposition into triangles. If |max-DCD(G)| = |E|/3, an optimal decomposition must be a decomposition into triangles. What do we know about triangle decomposition? The undirected version is NP-hard: a corollary of the NP-hardness of edge partition into cliques of size k [Holyer, ’81].

The max-DCD problem…
[Holyer, ’81] shows a reduction from 3-SAT to edge partition into cliques of size k, using a construction for general k. [Figure: Holyer’s construction for k=3 (undirected triangles), a grid of vertices labeled by integer coordinate triples from (0,0,0) to (6,-3,-3).]

The max-DCD problem…
We show that Holyer’s proof also works for directed triangles. Idea: add directions to the construction while preserving its basic properties. [Figure: the same grid of vertices labeled by integer coordinate triples, with edge directions added.] This concludes the proof of hardness in the unit-cost model.

Length-weighted cost model
The weighted-interchange (WI) distance is the minimum total weight of a sequence of interchanges transforming one string into the other. The weight of an interchange of the elements in positions i,j is |i-j|. Thm: The WI-distance between two general strings is polynomial-time computable. Proof in two steps: (1) prove the result for permutations; (2) show how to apply it to general strings.

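WI-distance in permutations
Definition: The L1-distance is min_π Σ_j |j - π(j)|, where π ranges over the pairings of each position j in x with a position π(j) of an equal element in y (for permutations the pairing is unique). Lemma: Let x,y be permutations of length m. Then, WI-distance(x,y) ≥ L1-distance(x,y)/2. Proof: Best situation example: an interchange of weight |i-j| moves two elements by |i-j| each, so it reduces the L1-distance by at most 2|i-j|. Lemma: Let x,y be permutations of length m. Then, WI-distance(x,y) ≤ L1-distance(x,y)/2. Proof: Consider the following algorithm: while there are unsorted pairs in x, find a good pair i,j and interchange elements i and j.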

WI-distance in permutations…
What is a good pair? Elements i,j such that interchanging them is “useful” for both (i.e., each of the two moves toward its target position without passing it). Example: [permutation not preserved in the transcript] …,1 or 3,1 are good pairs BUT 4,2 is not a good pair. Note: The cost of interchanging a good pair never exceeds half of the L1-cost it removes. Claim: Every unsorted permutation has a good pair. Thm: Let x,y be permutations of length m. Then, WI-distance(x,y) = L1-distance(x,y)/2.
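The good-pair algorithm can be sketched as follows. The exact condition used here is an assumption of this sketch: positions i < j form a good pair when the element at i belongs at position j or beyond and the element at j belongs at position i or before, so the interchange moves both toward their targets without overshooting. Positions are 0-based, perm[p] is the target position of the element currently at p, and all function names are hypothetical.

```python
def l1_distance(perm):
    """Sum over positions of the distance from each element to its target."""
    return sum(abs(pos - target) for pos, target in enumerate(perm))


def find_good_pair(perm):
    """Return positions (i, j), i < j, forming a good pair, or None if sorted."""
    n = len(perm)
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed good-pair condition: both elements move toward their
            # targets and neither passes its target.
            if perm[i] >= j and perm[j] <= i:
                return i, j
    return None


def sort_by_good_pairs(perm):
    """Sort by repeatedly interchanging good pairs; return (cost, sorted list)."""
    perm = list(perm)
    cost = 0
    while True:
        pair = find_good_pair(perm)
        if pair is None:
            return cost, perm
        i, j = pair
        perm[i], perm[j] = perm[j], perm[i]
        cost += j - i  # length-weighted cost of this interchange


x = [3, 5, 0, 2, 6, 1, 4]
cost, result = sort_by_good_pairs(x)
print(cost, l1_distance(x) // 2)  # 9 9
```

Under this condition every interchange of weight j - i lowers the L1-distance by exactly 2(j - i), so the total cost equals half of the initial L1-distance. The quadratic scan for a good pair is chosen for clarity, not speed.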

WI-distance in permutations…
Fact: [Amir et al., SODA06] The L1-distance between two general strings can be computed in polynomial time. So, computing the L1-distance in polynomial time gives the result for permutations. What about general strings? How do we pair the symbols? Do we need to try all pairings of same letters? Example: Text: ABCBAABBC, Pattern: CCAABABBB.

WI-distance in general strings
Fact: [Amir et al., SODA06] For the L1-distance we know an optimal pairing. Example: [Figure: an optimal pairing of Text: ABCBAABBC with Pattern: CCAABABBB.] Thm: Let x,y be general strings of length m. Then, WI-distance(x,y) = L1-distance(x,y)/2. Proof: Consider all pairings; each defines a permutation for which the result holds; use the L1-optimal pairing. Result: The WI-distance is polynomial-time computable for general strings.
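A minimal sketch of the resulting computation follows. The slides only say that an optimal pairing is known; the sketch assumes it is the order-preserving one (the k-th occurrence of each letter in the text is paired with its k-th occurrence in the pattern), which is optimal for matching points on a line under the L1 cost. Function names are hypothetical.

```python
from collections import defaultdict


def l1_distance(text, pattern):
    """L1-distance under the order-preserving pairing of equal letters."""
    if sorted(text) != sorted(pattern):
        raise ValueError("strings must be rearrangements of each other")
    # Positions of each letter in the pattern, in increasing order.
    positions = defaultdict(list)
    for j, ch in enumerate(pattern):
        positions[ch].append(j)
    used = defaultdict(int)  # occurrences of each letter already paired
    total = 0
    for i, ch in enumerate(text):
        j = positions[ch][used[ch]]
        used[ch] += 1
        total += abs(i - j)
    return total


def wi_distance(text, pattern):
    """Weighted-interchange distance, using WI-distance = L1-distance / 2."""
    return l1_distance(text, pattern) // 2


print(l1_distance("ABCBAABBC", "CCAABABBB"))  # 20
print(wi_distance("ABCBAABBC", "CCAABABBB"))  # 10
```

The L1-distance of two rearrangements is always even, because the signed displacements sum to zero, so the integer division loses nothing.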

Conclusions
The general strings situation is probably difficult in the unit-cost model for all well-known rearrangement operations. Possible direction: the length-weighted model. Note: Length-weighted cost models are considered biologically meaningful by some researchers (e.g. [Bender et al., ’04]). So, this direction might be applicable as well as computable.