1 A 2½-Approximation Algorithm for Shortest Superstring Speaker: Chuang-Chieh Lin Advisor: R. C. T. Lee National Chi-Nan University Sweedyk, Z. SIAM Journal.

Presentation transcript:

1 A 2½-Approximation Algorithm for Shortest Superstring Speaker: Chuang-Chieh Lin Advisor: R. C. T. Lee National Chi-Nan University Sweedyk, Z. SIAM Journal on Computing, Vol. 29, No. 3, 1999, pp

2 Outline Introduction Basic definitions String functions The approximation algorithm The upper bound The lower bound Conclusion


4 Introduction Let S = {s_1, s_2, …, s_n} be a set of strings. A superstring of S is a string containing each s_i ∈ S as a contiguous substring. The shortest superstring problem is to find a minimum-length superstring of the input set S. This problem has important applications in computational biology and in data compression.

5 For example, if S = {ab, bcd, de, abc}, then abcde is a superstring of S of length 5 and abcabcde is a superstring of S of length 8.
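The definition can be checked mechanically. The following is a minimal sketch in Python; the function name is_superstring is an illustrative assumption, not something from the slides.

```python
def is_superstring(w, strings):
    """Return True if every string in `strings` occurs in w as a contiguous substring."""
    return all(s in w for s in strings)


S = ["ab", "bcd", "de", "abc"]
short_candidate = "abcde"      # superstring of length 5 from the slide
long_candidate = "abcabcde"    # superstring of length 8 from the slide

print(is_superstring(short_candidate, S), len(short_candidate))
print(is_superstring(long_candidate, S), len(long_candidate))
print(is_superstring("abcd", S))  # misses "de", so it is not a superstring
```

Both candidates from the slide pass the check, while a string that misses any member of S fails it.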

6 Outline Introduction Basic definitions String functions The approximation algorithm The upper bound The lower bound Conclusion

7 Basic definitions Let’s introduce some basic definitions.

8 Overlap Let s and t be two strings. If a suffix f of s and a prefix p of t are the same string, we call f (or p) an overlap of s with respect to t. For example, for s = cabab and t = babcba, bab is an overlap of s with respect to t.

9 OV (s, t) OV(s, t) is the set of overlaps of s with respect to t. For example, for s = cabab and t = bababa: OV(s, t) = {ε, b, bab}, OV(s, s) = {ε}, OV(t, t) = {ε, ba, baba}, OV(t, s) = {ε}.
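A small Python sketch of OV(s, t) follows. It assumes, as the examples OV(s, s) = {ε} and OV(t, t) = {ε, ba, baba} suggest, that an overlap must be strictly shorter than both strings; the function name overlaps is hypothetical.

```python
def overlaps(s, t):
    """Set of overlaps of s with respect to t.

    Assumption: an overlap is a suffix of s equal to a prefix of t and
    strictly shorter than both strings. The empty string plays the role of ε.
    """
    limit = min(len(s), len(t))
    return {t[:k] for k in range(limit) if s[len(s) - k:] == t[:k]}


s, t = "cabab", "bababa"
print(overlaps(s, t))
print(overlaps(s, s))
print(overlaps(t, t))
print(overlaps(t, s))
```

The four calls reproduce the four sets listed on the slide.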

10 ov (s, t), pref (s, t) and suff (s, t) We use ov(s, t) to denote the longest string in OV(s, t); pref(s, t) and suff(s, t) denote the prefix of s and the suffix of t that remain after removing ov(s, t), so that s = pref(s, t) ov(s, t) and t = ov(s, t) suff(s, t). Furthermore, we use δ_s to denote pref(s, s). For example, for u_1 = cabab and u_2 = bababa: ov(u_1, u_2) = bab, pref(u_1, u_2) = ca, suff(u_1, u_2) = aba.
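The three functions can be sketched in a few lines of Python. This is an illustration only: it assumes overlaps are strictly shorter than both strings, and the Python names ov, pref and suff simply mirror the slide notation.

```python
def ov(s, t):
    """Longest overlap of s with respect to t (assumed strictly shorter than both)."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[len(s) - k:] == t[:k]:
            return t[:k]
    return ""


def pref(s, t):
    """Part of s that precedes ov(s, t)."""
    return s[:len(s) - len(ov(s, t))]


def suff(s, t):
    """Part of t that follows ov(s, t)."""
    return t[len(ov(s, t)):]


u1, u2 = "cabab", "bababa"
print(ov(u1, u2), pref(u1, u2), suff(u1, u2))
# pref + ov + suff spells the merge of the two strings
print(pref(u1, u2) + ov(u1, u2) + suff(u1, u2))
# delta_s = pref(s, s), here for s = u2
print(pref(u2, u2))
```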

11 Distance/overlap graph Let S be a set of strings. The distance/overlap graph G_S is a complete digraph with vertex set S; each edge of the graph is assigned a positive length as follows: the edge e from s to t has length |e| = |pref(s, t)|.

12 For example, S = {u_0, u_1, u_2}, where u_0 = ababc, u_1 = cabab, u_2 = bababa. [Figure: the distance/overlap graph G_S on the vertices u_0, u_1, u_2.]

13 The distance/overlap multigraph g_S We define the overlap of an edge e from s to t in G_S as ov(e) = ov(s, t). The distance/overlap multigraph g_S for S is constructed out of the distance/overlap graph: for every pair s, t ∈ S and every v ∈ OV(s, t), g_S has an edge from s to t with length |s| − |v| and overlap |v|.

14 For example, S = {u_0, u_1, u_2}, u_0 = ababc, u_1 = cabab, u_2 = bababa. We use “m, n” to denote the length and the overlap of an edge. [Figure: graph on u_0, u_1, u_2 with edge labels 4, 1; 1, 4; 5, 0; 3, 3; 2, 3; 2, 4; 6, 0; 5, 0.]
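The “length, overlap” labels of the longest-overlap edges can be recomputed directly from the strings. The sketch below is an illustration under the assumption that overlaps are strictly shorter than both strings; ov_len and edges are hypothetical names.

```python
def ov_len(s, t):
    """Length of the longest overlap of s with respect to t (strictly shorter than both)."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[len(s) - k:] == t[:k]:
            return k
    return 0


S = {"u0": "ababc", "u1": "cabab", "u2": "bababa"}

# One edge per ordered pair (self-loops included), labeled (length, overlap),
# where length = |pref(s, t)| = |s| - |ov(s, t)|.
edges = {
    (a, b): (len(S[a]) - ov_len(S[a], S[b]), ov_len(S[a], S[b]))
    for a in S
    for b in S
}

for (a, b), (length, overlap) in sorted(edges.items()):
    print(f"{a} -> {b}: {length}, {overlap}")
```

Each printed label pairs an edge with its length and overlap, which is the information the figure conveyed.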

15 Why is the above graph useful? Consider the Hamiltonian path u_0-u_1-u_2. Its total overlap is 1 + 3 = 4. The corresponding superstring is ababcabababa (length 12). Consider the Hamiltonian path u_1-u_2-u_0. Its total overlap is 3 + 3 = 6. Its corresponding superstring is cababababc (length 10), which is the optimal solution.
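The correspondence between Hamiltonian paths and superstrings can be sketched as follows. The helper names ov_len, merge_path and total_overlap are assumptions for illustration; overlaps are taken to be strictly shorter than both strings.

```python
def ov_len(s, t):
    """Length of the longest overlap of s with respect to t (strictly shorter than both)."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[len(s) - k:] == t[:k]:
            return k
    return 0


def merge_path(path):
    """Superstring spelled by visiting the strings in order, merging maximal overlaps."""
    result = path[0]
    for prev, nxt in zip(path, path[1:]):
        result += nxt[ov_len(prev, nxt):]
    return result


def total_overlap(path):
    """Sum of the overlaps along consecutive strings of the path."""
    return sum(ov_len(a, b) for a, b in zip(path, path[1:]))


u0, u1, u2 = "ababc", "cabab", "bababa"
for path in ([u0, u1, u2], [u1, u2, u0]):
    w = merge_path(path)
    print(total_overlap(path), w, len(w))
```

The superstring length equals the total length of the strings (16 here) minus the total overlap of the path, so a larger total overlap gives a shorter superstring.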

16 Roughly speaking, we are interested in a cycle which covers all vertices with the largest sum of overlaps, or the smallest sum of lengths.

17 We have oversimplified the problem, because there may well be more than one cycle in the cycle cover. In this case, we have to combine cycles.

18 Cycle cover A cycle cover of G_S is a set of simple cycles that cover all the vertices of the graph.

19 The cycle c = (u_0, u_1, u_2) is a cycle cover of G_S, where S = {u_0, u_1, u_2}, u_0 = ababc, u_1 = cabab, u_2 = bababa. [Figure: the cycle c with edge labels 4, 1; 2, 3; 3, 3.]

20 The following cycles also form a cycle cover of G_S, where S = {u_0, u_1, u_2}, u_0 = ababc, u_1 = cabab, u_2 = bababa. [Figure: a cycle on u_0 and u_1 with edge labels 4, 1 and 1, 4, and a self-loop on u_2 with label 2, 4.]

21 The following red and blue cycles also form a cycle cover. [Figure: graph on the vertices v_0, …, v_4 with labeled edges; the red and blue cycles are highlighted.]

22 A minimum-length cycle cover C S * is a cycle cover of G S with minimum sum of lengths of edges. The greedy algorithm can be used to construct C S *.

23 Since each cycle cover corresponds to several superstrings, the minimum cycle cover somehow corresponds to a rather short superstring.

24 For example, let S = {v_0, v_1, v_2, v_3, v_4}, where v_0 = aggtt, v_1 = gttaag, v_2 = taagc, v_3 = gcata, v_4 = tacc. Then g_S is as follows. [Figure: g_S on the vertices v_0, …, v_4; each edge is labeled with its length and overlap.]

25 We now run the greedy algorithm to construct C_S*, with v_0 = aggtt, v_1 = gttaag, v_2 = taagc, v_3 = gcata, v_4 = tacc. [Figures, slides 25 to 30: successive steps of the greedy algorithm on g_S.]

31 Now, the following graph is C_S*, consisting of the cycles c_1, c_2 and c_3, with v_0 = aggtt, v_1 = gttaag, v_2 = taagc, v_3 = gcata, v_4 = tacc. [Figure: C_S* on the vertices v_0, …, v_4 with edge labels 3, 2; 2, 3; 4, 2; 3, 2; 4, 0.]

32 The superstrings corresponding to the cycles of this cycle cover are as follows: v_0-v_1: aggttaag; v_2-v_3: taagcata; v_4: tacc. The superstring aggttaagtaagcatacc can be obtained by joining these three strings, merging the overlap ta between the last two. Here v_0 = aggtt, v_1 = gttaag, v_2 = taagc, v_3 = gcata, v_4 = tacc.

33 Why do we use “cycles”?

34 Open Let c = (s_0, s_1, …, s_{j−1}, s_0) be a cycle of G_S. For any l, the string pref(s_l, s_{l+1}) pref(s_{l+1}, s_{l+2}) … pref(s_{l+j−2}, s_{l+j−1}) s_{l+j−1}, where the indices are taken modulo j, is called an open of c.

35 A cycle c may have many opens. We can regard opens as local superstrings.

36 For example, u_0 = ababc, u_1 = cabab, u_2 = bababa, c_1 = (u_2, u_2), c_2 = (u_0, u_1, u_0). Let x_1 = bababa, x_21 = ababcabab, x_22 = cababc. x_1 is an open of c_1; x_21 and x_22 are opens of c_2. [Figure: the cycles c_1 and c_2 on the vertices u_0, u_1, u_2.]

37 For any cycle c, an open is a Hamiltonian path of this cycle.

38 For c ∈ C_S*, we denote by OP(c) the set of opens of c, and U_S* = ∪_{c ∈ C_S*} OP(c).

39 For example, u_0 = ababc, u_1 = cabab, u_2 = bababa, c_1 = (u_2, u_2), c_2 = (u_0, u_1, u_0). OP(c_1) = {bababa}, OP(c_2) = {ababcabab, cababc}. [Figure: the cycles c_1 and c_2 on the vertices u_0, u_1, u_2.]
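The opens of a cycle are the superstrings spelled by starting at each vertex and walking once around the cycle. A minimal sketch, where a cycle is given as a list of its distinct vertices and the names ov_len, merge_path, opens and sop-style selection via min are illustrative assumptions:

```python
def ov_len(s, t):
    """Length of the longest overlap of s with respect to t (strictly shorter than both)."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[len(s) - k:] == t[:k]:
            return k
    return 0


def merge_path(path):
    """Superstring spelled by visiting the strings in order, merging maximal overlaps."""
    result = path[0]
    for prev, nxt in zip(path, path[1:]):
        result += nxt[ov_len(prev, nxt):]
    return result


def opens(cycle):
    """OP(c): one open per starting vertex l, with indices taken modulo the cycle size."""
    j = len(cycle)
    return {merge_path([cycle[(l + i) % j] for i in range(j)]) for l in range(j)}


u0, u1, u2 = "ababc", "cabab", "bababa"
op_c1 = opens([u2])       # c_1 = (u_2, u_2)
op_c2 = opens([u0, u1])   # c_2 = (u_0, u_1, u_0)
print(op_c1, op_c2)
print(min(op_c2, key=len))  # a shortest open of c_2
```

Taking the union of the sets over all cycles of C_S* gives U_S* for this example.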

40 The first and last vertices on the path that an open x spells out are called, respectively, x_first and x_last, and the edge ⟨x_last, x_first⟩ is called the opening edge of x. An opening edge of x is an edge whose removal creates the open x. For example, ⟨u_2, u_2⟩ is the opening edge of x_1 and ⟨u_1, u_0⟩ is the opening edge of x_21.

41 Lemma 2.12 Let c be a cycle. We denote sop (c) to be the shortest open of c. If the minimum length cycle cover C S * consists of a single cycle c, sop (c) is a shortest superstring of S.

42 For example, for S = {u_0, u_1} with u_0 = ababc and u_1 = cabab, the cycle cover consisting of c_2 = (u_0, u_1, u_0) is a minimum-length cycle cover, and it consists of just one cycle. OP(c_2) = {ababcabab, cababc}. So sop(c_2) = cababc is a shortest superstring of u_0 = ababc and u_1 = cabab. [Figure: the cycle c_2 with edge labels 4, 1 and 1, 4.]

43 Outline Introduction Basic definitions String functions The approximation algorithm The upper bound The lower bound Conclusion

44 String functions and lemmas First, we should understand what the expansion of a cycle or an edge means.

45 Expansion e = and are versions of each other and if, we say that e is an expansion of For example, s = bbcabba, t = abbabab bbcabba bbcabba abbabab abbabab Let e =,. Therefore, e is an expansion of.

46 1-expansion is an expansion of c if every edge of is an expansion of an edge in c. An edge is tight if k = |ov (s, t)| and loose otherwise. We call a cycle of g S a 1-expansion of if is an expansion of c and it has only one loose edge.

47 When we refer to a 1-expansion of c x for, we mean that the only possible loose edge is. For example, is a 1-expansion of. u0u0 u1u1 4, 1 1, 4 u0u0 u1u1 4, 1 3, 2 u 1 = cabab u 1 = cabab u 0 = ababc u 0 = ababc

48 Let’s take a look at an example with 3 strings in which the superstring of two of the strings should be expanded so that the final superstring covering all three strings becomes shorter.

49 y_1 = abcd, y_2 = cdba, y_3 = cdcdbaba. Case 1, without expansion: y_12 = abcdba (y_1 and y_2 merged on their overlap cd), and y_123 = cdcdbababcdba (length 13). Case 2, with expansion: y_12 = abcdcdba (y_1 and y_2 joined with overlap 0), and y_123 = abcdcdbaba (length 10).
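The two cases can be replayed with a short Python sketch. The helper names ov_len and glue are assumptions for illustration; glue merges two strings on their longest overlap, while plain concatenation models the expanded (overlap 0) join of y_1 and y_2.

```python
def ov_len(s, t):
    """Length of the longest overlap of s with respect to t (strictly shorter than both)."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[len(s) - k:] == t[:k]:
            return k
    return 0


def glue(a, b):
    """Merge a and b on their longest overlap."""
    return a + b[ov_len(a, b):]


y1, y2, y3 = "abcd", "cdba", "cdcdbaba"

# Case 1: merge y1 and y2 as tightly as possible, then attach y3.
tight = glue(y1, y2)
case1 = min(glue(y3, tight), glue(tight, y3), key=len)

# Case 2: give up the overlap between y1 and y2 (an expansion), then attach y3.
loose = y1 + y2
case2 = min(glue(y3, loose), glue(loose, y3), key=len)

print(tight, case1, len(case1))
print(loose, case2, len(case2))
```

Giving up 2 characters of overlap between y_1 and y_2 allows an overlap of 6 with y_3, so the expanded version wins overall.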

50 The above example shows we have to consider some string functions to improve our solutions.

51 Pseudolength Let x be a string in U S * and let be an expansion of e x. We denote the 1-expansion of c x corresponding to as, where The quantity d |c x | is called the pseudolength of the edge and d is called the normalized pseudolength of the edge.

52 Actually, the pseudolength d |c x | measures the losing length after connecting to the other string y.

53 For example, u 0 = ababc, u 1 = cabab, c 2 = = (u 0, u 1, u 0 ), so. Let x 0 = ababcabab an open of c 2, =, =, so | x 0 | = 9 and ov ( ) = 2. u 1 = cabab u 0 = ababc

54 Fact 3.5 Let x be a string in U S *. The 1-expansion exists for some d if and only if there is an expansion of e x with pseudolength d |c x |. If is an expansion of e x with pseudolength d |c x |, then d ≥ 1 with equality if and only if.

55 There exist certain 1-expansions of a cycle c_x based on the string functions, lemmas and corollaries. These string functions allow us to identify the expansions of c_x. The string functions describe the ways in which any two strings can overlap.

56 We omit the details of the string functions and just give one example to illustrate what they do.

57 For example, let’s take a look at the string function trade-off : Let x be a string in U S *, c x ≠ c y. The trade-off of x with respect to y, denoted tr (x, y), is defined as

58 For example, x 21 = ababcabab, x 1 = bababa ov max (x 1, x 21 ) = 3 x 1 = bababa x 1 = bababa | | = 2, | x 1 | = 6. x 1 = bababa x 21 = ababcabab x 21 = ababcabab x 1 = bababa u 0 = ababc, u 1 = cabab, u 2 =bababa x1x1 x 21 ov max (x 1, x 21 )

59 From a lemma, a 1-expansion of c x corresponding to ) with pseudolength = exists. For example, x 1 = bababa

60 Outline Introduction Basic definitions String functions and lemmas The approximation algorithm The upper bound The lower bound Conclusion

61 The approximation algorithm Before proceeding to the algorithm, we should understand the important idea: edge exchange.

62 Edge exchange and winning edge Let C be a cycle cover and let e = ⟨s, t⟩ be an edge of G_S. Assume e_1 = ⟨s, t'⟩ and e_2 = ⟨s', t⟩ are, respectively, the out-edge of s and the in-edge of t in C. The edge exchange of e in C is the cycle cover (C \ {e_1, e_2}) ∪ {e, e_3}, where e_3 = ⟨s', t'⟩. And e is a winning edge if

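63 For example, with u_0 = ababc, u_1 = cabab, u_2 = bababa, let C be the cycle (u_0, u_1, u_2) with edge labels 4, 1; 2, 3; 3, 3. The cycle length is 9. The winning edge is the edge from u_1 to u_0 with label 1, 4. After the edge exchange, the cycle cover has edge labels 4, 1; 1, 4; 2, 4, and the cycle length is 7. [Figure: the cycle cover before and after the edge exchange.]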
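The edge exchange in this example can be sketched in Python, representing a cycle cover as a successor map (each vertex maps to the head of its out-edge). The names ov_len, cover_length and edge_exchange are hypothetical, and the sketch always gives the replacement edge its longest overlap, so it models the plain exchange rather than the parsimonious one.

```python
def ov_len(s, t):
    """Length of the longest overlap of s with respect to t (strictly shorter than both)."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[len(s) - k:] == t[:k]:
            return k
    return 0


def cover_length(strings, succ):
    """Sum of edge lengths |s| - |ov(s, t)| over the edges s -> t of the cover."""
    return sum(len(strings[a]) - ov_len(strings[a], strings[b]) for a, b in succ.items())


def edge_exchange(succ, s, t):
    """Insert the edge s -> t, drop the old out-edge of s and the old in-edge of t,
    and reconnect their free endpoints with one new edge."""
    new = dict(succ)
    old_head = succ[s]                                    # e_1 = s -> old_head
    old_tail = next(a for a, b in succ.items() if b == t)  # e_2 = old_tail -> t
    new[s] = t                                            # e
    new[old_tail] = old_head                              # e_3
    return new


S = {"u0": "ababc", "u1": "cabab", "u2": "bababa"}
C = {"u0": "u1", "u1": "u2", "u2": "u0"}   # the single cycle (u_0, u_1, u_2)
C2 = edge_exchange(C, "u1", "u0")           # exchange on the edge u_1 -> u_0

print(cover_length(S, C), C2, cover_length(S, C2))
```

The exchange splits the single cycle into the cycle on u_0 and u_1 plus the self-loop on u_2, and the total length drops from 9 to 7 as on the slide.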

64 Another example, with v_0 = aggtt, v_1 = gttaag, v_2 = taagc, v_3 = gcata, v_4 = tacc. [Figure: g_S on the vertices v_0, …, v_4; each edge is labeled with its length and overlap.]

65 [Figure: a cycle cover on v_0, …, v_4 with edge labels 3, 2; 4, 0; 5, 0; 2, 3; 6, 0.] The cycle length is 20.

66 [Figures, slides 66 to 68: the steps of the edge exchange.]

69 The cycle length before edge exchange: 20. The cycle length after edge exchange: 18. Therefore, we reduced the cycle length.

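70 Parsimonious edge exchange and losing edge Let C be a cycle cover and let e = ⟨s, t⟩ be an edge of G_S. Assume e_1 = ⟨s, t'⟩ and e_2 = ⟨s', t⟩ are, respectively, the out-edge of s and the in-edge of t in C. The parsimonious edge exchange of e in C is the cycle cover (C \ {e_1, e_2}) ∪ {e, e_3}, where e_3 is the edge of g_S from s' to t' with overlap max(0, |ov(e_1)| + |ov(e_2)| − |ov(e)|). And e_3 is called a losing edge.

71 For example, with S = {u_0, u_1, u_2}, u_0 = ababc, u_1 = cabab, u_2 = bababa, let C be the cycle (u_0, u_1, u_2) with edge labels 4, 1; 2, 3; 3, 3. The cycle length is 9. The winning edge is the edge from u_1 to u_0 with label 1, 4, and the losing edge is the self-loop on u_2 with label 4, 2. [Figure: the cycle cover before and after the parsimonious edge exchange.]

72 [Figure: the cycle cover on v_0, …, v_4 with the winning edge and the losing edge marked.]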

73 Lemma 2.2 Let s, t, u and v be strings. If ov_k(s, t), ov_l(s, u), and ov_j(v, t) exist for k ≥ max(j, l), then ov_m(v, u) exists for m = max(0, j + l − k). [Figure: the strings v, s, t and u aligned, with the overlaps of lengths l, j, k and j + l − k marked.]
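The lemma can be sanity-checked by brute force on short strings. The sketch below reads "ov_k(s, t) exists" as "the last k characters of s equal the first k characters of t, with k smaller than both lengths"; that reading and the names has_overlap and lemma_holds are assumptions made for illustration.

```python
from itertools import product


def has_overlap(s, t, k):
    """True if s and t have an overlap of length exactly k (strictly shorter than both)."""
    return 0 <= k < min(len(s), len(t)) and s[len(s) - k:] == t[:k]


def lemma_holds(s, t, u, v):
    """Check the implication of Lemma 2.2 for every admissible triple (k, l, j)."""
    for k in range(len(s)):
        if not has_overlap(s, t, k):
            continue
        for l in range(k + 1):
            if not has_overlap(s, u, l):
                continue
            for j in range(k + 1):
                if has_overlap(v, t, j) and not has_overlap(v, u, max(0, j + l - k)):
                    return False
    return True


# All strings over {a, b} of length 1 to 3.
strings = ["".join(p) for n in range(1, 4) for p in product("ab", repeat=n)]
counterexamples = [q for q in product(strings, repeat=4) if not lemma_holds(*q)]
print(len(strings), len(counterexamples))
```

The intuition is that both the length-j suffix of v and the length-l prefix of u sit inside the length-k overlap of s and t, one at its start and one at its end, so they must share j + l − k characters whenever that number is positive.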

74 The approximation algorithm
1. Construct G_S and find C_S*. Compute U_S* and the string functions.
2. Build the set of merging edges W.
3. Let C = C_S*. While W is nonempty do: let e = ⟨s, t⟩ be a minimum-overlap edge in W; if s and t are in different cycles of C, then C = χ(C, e); W = W \ {e}.
4. Set AOPT_S to the concatenation of sop(c), c ∈ C.

75 For example, S = {u_0, u_1, u_2}, where u_0 = ababc, u_1 = cabab, u_2 = bababa. The following graph is g_S. [Figure: graph on u_0, u_1, u_2 with edge labels 4, 1; 1, 4; 5, 0; 3, 3; 2, 3; 2, 4; 6, 0; 5, 0.]

76 C_S* is as follows: c_1 = (u_2, u_2), c_2 = (u_0, u_1, u_0). OP(c_1) = {bababa}, OP(c_2) = {ababcabab, cababc}, U_S* = {bababa, ababcabab, cababc}. Let x_1 = bababa, x_21 = ababcabab, x_22 = cababc. x_1 is an open of c_1; x_21 and x_22 are opens of c_2. [Figure: C_S* with edge labels 4, 1; 1, 4; 2, 4.]

77 We begin the coloring action from the minimum-length cycle. [Figure: C_S* with the cycles c_1 = (u_2, u_2) and c_2 = (u_0, u_1, u_0).]

78 Now, we choose merging edges to merge the cycles c_1 = (u_2, u_2) and c_2 = (u_0, u_1, u_0). According to the construction algorithm of W, we choose the edge from u_1 to u_2 (label 2, 3) to merge c_1 and c_2.

79 [Figures, slides 79 to 81: the edge exchange on this edge, which adds the edge from u_2 to u_0 with label 3, 3.]

82 Let the resulting cycle, with edge labels 4, 1; 2, 3; 3, 3, be c_final.

83 Finally, we find sop(c_final). OP(c_final) = {ababcabababa (12), cababababc (10), babababcabab (12)}. Therefore, sop(c_final) = cababababc. [Figure: c_final on u_0 = ababc, u_1 = cabab, u_2 = bababa with edge labels 4, 1; 2, 3; 3, 3.]

84 In fact, the optimal solution is exactly cababababc, with length 10. The approximation algorithm finds the optimal solution in this case.
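For an input this small, the optimum can be confirmed by exhaustive search. The sketch below tries every ordering of the strings and merges neighbors on their longest overlap; it assumes that no input string is a substring of another, which holds here. This is only a check for tiny inputs and is not the approximation algorithm; the helper names are hypothetical.

```python
from itertools import permutations


def ov_len(s, t):
    """Length of the longest overlap of s with respect to t (strictly shorter than both)."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[len(s) - k:] == t[:k]:
            return k
    return 0


def merge_path(path):
    """Superstring spelled by visiting the strings in order, merging maximal overlaps."""
    result = path[0]
    for prev, nxt in zip(path, path[1:]):
        result += nxt[ov_len(prev, nxt):]
    return result


def shortest_superstring(strings):
    """Exhaustive search over all orderings; exponential, for tiny inputs only."""
    return min((merge_path(list(p)) for p in permutations(strings)), key=len)


best = shortest_superstring(["ababc", "cabab", "bababa"])
print(best, len(best))
```

The search returns the same string as sop(c_final), which confirms that the algorithm's output is optimal for this instance.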

85 Outline Introduction Basic definitions String functions and lemmas The approximation algorithm The upper bound The lower bound Conclusion

86 Since the formal analyses of the lower bound and the upper bound on the optimal solution are too complicated and difficult to follow here, we describe the general strategy using simpler examples.

87 The upper bound Let S = {u_0, u_1, u_2}, where u_0 = ababc, u_1 = cabab, u_2 = baba. [Figure: g_S on u_0, u_1, u_2 with edge labels 4, 1; 1, 4; 5, 0; 1, 3; 2, 3; 2, 2; 4, 0; 5, 0.]

88 C_S* = {c_1, c_2}, where c_1 = (u_2, u_2), c_2 = (u_0, u_1, u_0). Let x_0 = ababcabab, x_1 = cababc, x_2 = baba. x_2 is an open of c_1; x_0 and x_1 are opens of c_2. |C_S*| = 4 + 1 + 2 = 7. [Figure: C_S* with edge labels 4, 1; 1, 4; 2, 2.]

89 From the algorithm, we obtain AOPT_S = ababcabab ∙ baba = ababcababa, so |AOPT_S| = 10. However, the optimal solution is OPT_S = cabababc, with |OPT_S| = 8.

90 Now, we make an expansion C_U of C_S*, where u_0 = ababc, u_1 = cabab, u_2 = baba. [Figure: C_U on u_0, u_1, u_2 with edge labels 5, 0; 3, 2; 4, 0.]

91 And we make a parsimonious edge exchange for C_U. [Figure: C_U before the exchange, and the result after the exchange with edge labels 5, 0; 2, 3; 4, 0.]

92 The opens of the resulting cycle c_1 are {ababccababa (11), cababaababc (11), babaababccabab (14)}, so the shortest opens are ababccababa and cababaababc.

93 So we obtain that: |C S * | ≤ | AOPT S | ≤

94 Outline Introduction Basic definitions String functions and lemmas The approximation algorithm The upper bound The lower bound Conclusion

95 The lower bound Let S = {u_0, u_1, u_2}, where u_0 = abc, u_1 = cab, u_2 = bababa; then g_S is constructed as follows. [Figure: g_S on u_0, u_1, u_2 with edge labels 2, 1; 1, 2; 3, 0; 5, 1; 2, 1; 2, 4; 6, 0; 3, 0.]

96 Then we find a Hamiltonian cycle c = u_0-u_1-u_2 of g_S, with edge labels 2, 1; 2, 1; 5, 1. Clearly, c doesn’t contain the self-loop on u_2.

97 We find that the self-loop on u_2 with label 4, 2 is a winning edge for c. Let e be this edge. We can make a cycle cover by a parsimonious edge exchange.

98 [Figure: the resulting cycle cover: c_1 on u_0 and u_1 with edge labels 2, 1 and 3, 0, and c_2, the self-loop on u_2 with label 4, 2.]

99 The length of the local superstring of u_1 to u_0 is ov(u_1, u_0). Thus the cycle length 2 + 3 = 5 is a lower bound of the local superstring of u_1 to u_0. The global superstring has to consider the connection between u_0 and u_2. We may ignore this when we calculate the lower bound.

100 Therefore, |C_L| = 2 + 3 + 4 = 9. [Figure: C_L with the cycles c_1 and c_2 and edge labels 2, 1; 3, 0; 4, 2.]

101 However, the optimal solution is cababababc, which has length 10, so | C L | = |OPT S | − 1.

102 Outline Introduction Basic definitions String functions and lemmas The approximation algorithm The upper bound The lower bound Conclusion

103 Conclusion Probably the most interesting open question in superstring study is whether the greedy method yields a 2-approximation. Of course, the other important question in this area is whether OPT S can be approximated within a factor of 2 by any algorithm.

104 We conjecture that our algorithm can be modified slightly and the analysis improved to prove a 2⅓ bound. Unfortunately, the analysis becomes even more complicated and, perhaps worse, the algorithm becomes extremely complex.

105 Actually, when I looked up related research, I found that the ratio had not been improved since this paper was published.

106 Thank you.

107 Happy Teacher’s Day

108 Greedy-cover algorithm Let C_S* = ∅. Order the edges of G_S as e_1, e_2, …, e_{n^2} so that |ov(e_1)| ≥ |ov(e_2)| ≥ … ≥ |ov(e_{n^2})|. For i = 1, …, n^2: add e_i = ⟨s, t⟩ to C_S* if s doesn’t have an out-edge and t doesn’t have an in-edge in C_S*.
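A minimal Python sketch of a greedy cover of this kind follows. It assumes that edges are processed in order of non-increasing overlap, with ties broken arbitrarily, and represents the cover as a successor map; the names ov_len, greedy_cycle_cover and cover_length are hypothetical.

```python
def ov_len(s, t):
    """Length of the longest overlap of s with respect to t (strictly shorter than both)."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[len(s) - k:] == t[:k]:
            return k
    return 0


def greedy_cycle_cover(strings):
    """Scan all n^2 edges by non-increasing overlap; keep an edge s -> t when
    s has no out-edge yet and t has no in-edge yet."""
    names = list(strings)
    edges = sorted(
        ((a, b) for a in names for b in names),
        key=lambda e: -ov_len(strings[e[0]], strings[e[1]]),
    )
    succ, has_in = {}, set()
    for a, b in edges:
        if a not in succ and b not in has_in:
            succ[a] = b
            has_in.add(b)
    return succ


def cover_length(strings, succ):
    """Sum of edge lengths |s| - |ov(s, t)| over the chosen edges."""
    return sum(len(strings[a]) - ov_len(strings[a], strings[b]) for a, b in succ.items())


S = {"u0": "ababc", "u1": "cabab", "u2": "bababa"}
cover = greedy_cycle_cover(S)
print(cover, cover_length(S, cover))
```

On the running example the sketch picks the edge from u_1 to u_0 and the self-loop on u_2 first (overlap 4 each), then the edge from u_0 to u_1, giving the two cycles c_1 and c_2 with total length 7.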