
ON THE EFFICIENCY OF THE HAMMING C-CENTERSTRING PROBLEMS Amihood Amir Liam Roditty Jessica Ficler Oren Sar Shalom.


1 ON THE EFFICIENCY OF THE HAMMING C-CENTERSTRING PROBLEMS Amihood Amir Liam Roditty Jessica Ficler Oren Sar Shalom

2 Motivation – the Conference Location Problem

3 Consensus String Problem Input: points in space. Output: a point whose maximum distance from all input points is smallest.

4 Hamming Distance

5 Consensus String Problem (1-HRC)
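A minimal sketch of the objective behind the consensus string problem, assuming strings of equal length. The function names `hamming` and `radius` are illustrative and do not come from the slides.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(center: str, strings: list[str]) -> int:
    """Largest Hamming distance from `center` to any input string."""
    return max(hamming(center, s) for s in strings)


# Hypothetical input: the consensus string is a center that minimises radius().
strings = ["0000", "0011", "1100"]
print(radius("0000", strings))
print(radius("0001", strings))
```

The consensus string need not be one of the input strings; any string of the same length is a candidate center.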

6 History: Frances and Litman [1997]: Problem is NP-complete even for binary alphabets. Therefore: 3 directions. 1. Solution for small k. 2. Fixed parameter tractability. 3. Approximation algorithms.

7 History: Solution for small k: Gramm, Niedermeier, and Rossmanith [2001] (k=3); Boucher, Brown, and Durocher [2008] (k=4, binary); A., Landau, Na, Park, Park, and Sim [2009] (k=3, radius and distance sum optimization); A., Paryenty, and Roditty [2012] (k=5 binary: l^2; for all k: l^k)

8 History: Fixed Parameter Tractability for all Parameters: Fixed l: Ben-Dor, Lancia, Perone, and Ravi [1997]. Fixed k: Gramm, Niedermeier, and Rossmanith [2003]. Fixed d: Stojanovic, Berman, Gumucio, Hardison, and Miller [1997]; Lanctot, Li, Ma, Wang, and Zhang [1999]; Sze, Lu, and Chen [2004].

9 History: Approximations: PTAS: Li, Ma, and Wang [2002] (not practical). Rounded LP: Ben-Dor, Lancia, Perone, and Ravi [1997]: large number of variables: |Σ|l. Chimani, Woste, and Bocker [2011]: can be reduced to |Σ|(l-1). A., Paryenty, and Roditty [2011]: |T(S)||Σ| (T(S) = set of column types).

10 Another Motivation: Clustering. The C-CenterStrings problem Input: 1. Points in space 2. Number c 3. Objective function f. Output: Divide the points into c sets such that, for the c consensus strings c_1, c_2, …, c_c, f(c_1, c_2, …, c_c) is maximum/minimum.

11 Three Types of Objective functions: Let HRC (Hamming Radius Clustering) be the consensus string problem defined before. 1. c-HRC: partition into c sets, each of which has a center with radius at most d. 2. c-HRLC: partition into c sets, each of which has a center with radius at most d, but the center is part of the input set. 3. c-HRSC: partition into c sets, each of which has a center, and the sum of the radii does not exceed d.
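The three objectives can be compared by checking one candidate solution against each of them. The sketch below is only an illustration: the function names are hypothetical, and "center is part of input set" is read here as membership in the whole input, which is an assumption.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_radius(center, cluster):
    """Largest distance from the center to a string of its cluster."""
    return max(hamming(center, s) for s in cluster)


def satisfies_hrc(clusters, centers, d):
    """c-HRC: every cluster lies within radius d of its center."""
    return all(cluster_radius(c, cl) <= d for c, cl in zip(centers, clusters))


def satisfies_hrlc(clusters, centers, d):
    """c-HRLC: as c-HRC, but every center must be an input string (assumed reading)."""
    inputs = {s for cl in clusters for s in cl}
    return all(c in inputs for c in centers) and satisfies_hrc(clusters, centers, d)


def satisfies_hrsc(clusters, centers, d):
    """c-HRSC: the radii of all clusters sum to at most d."""
    return sum(cluster_radius(c, cl) for c, cl in zip(centers, clusters)) <= d


# Hypothetical instance with c = 2 clusters, each of radius 1.
clusters = [["000", "001", "010"], ["111", "110"]]
centers = ["000", "111"]
print(satisfies_hrc(clusters, centers, 1))
print(satisfies_hrlc(clusters, centers, 1))
print(satisfies_hrsc(clusters, centers, 1))
```

With both radii equal to 1, the instance meets the radius bound d=1 but needs d=2 under the sum objective.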

12 The Hamming radius c-clustering problem (c-HRC) Example: For the following strings and d=1, we show it belongs to 2-HRC.

13 The Hamming radius local c-clustering problem (c-HRLC) Example: For the following strings and d=2, we show it belongs to 2-HRLC. Does it belong to 2-HRLC when d=1?

14 The Hamming radius c-clustering sum problem (c-HRSC) Example: For the following strings and d=2, we show it belongs to 2-HRSC.

15 In this Paper: We consider: 1. Parameterized Complexity, and 2. Approximations. Small k is not too meaningful in the context of clustering.

16 C-CenterString Parameterized Complexity
Fixed parameter (columns): c | k | d (d=1) | d/l and c | l (l=2)
HRC: NPC, polynomial time, NPC, polynomial time, ?
HRLC: polynomial time, ?, NPC
HRSC: NPC, polynomial time, ?, ?

17 Theorem: HRC, HRLC and HRSC can be solved in polynomial time for fixed k. If k ≤ c, each input string can be its own center, so d = 0. Otherwise c < k, and there are at most c^k < k^k options for partitioning the k strings into c sets.
- For each set, find the consensus center in polynomial time.
- The partition that gives the best result is the optimal solution.
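The enumeration in the theorem can be sketched for c-HRC as follows. This is an illustration under stated assumptions, not the authors' implementation: `best_center` is a hypothetical stand-in that tries every string of length l, so it is exponential in l, whereas the theorem relies on a polynomial-time consensus routine for fixed k.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_center(cluster, alphabet):
    """Stand-in 1-HRC solver: exhaustive search over all strings of length l."""
    length = len(cluster[0])
    best = None
    for symbols in product(alphabet, repeat=length):
        cand = "".join(symbols)
        r = max(hamming(cand, s) for s in cluster)
        if best is None or r < best[0]:
            best = (r, cand)
    return best  # (radius, center)


def solve_c_hrc(strings, c, alphabet="01"):
    """Try all c^k assignments of the k strings to c sets; minimise the largest radius."""
    k = len(strings)
    if k <= c:
        # Every string is its own center, so the radius is 0.
        return 0, [[s] for s in strings]
    best = None
    for labels in product(range(c), repeat=k):
        clusters = [[s for s, lab in zip(strings, labels) if lab == j] for j in range(c)]
        clusters = [cl for cl in clusters if cl]  # drop unused sets
        worst = max(best_center(cl, alphabet)[0] for cl in clusters)
        if best is None or worst < best[0]:
            best = (worst, clusters)
    return best  # (smallest achievable radius, one optimal partition)


# Hypothetical input: four binary strings, two clusters.
d, partition = solve_c_hrc(["0000", "0001", "1110", "1111"], 2)
print(d, partition)
```

For HRLC the candidate centers would be restricted to input strings, and for HRSC the radii would be summed instead of maximised.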

18 C-CenterString Parameterized Complexity
Fixed parameter (columns): c | k | d (d=1) | d/l and c | l (l=2)
HRC: NPC, polynomial time, NPC, polynomial time, ?
HRLC: polynomial time, ?, NPC
HRSC: NPC, polynomial time, ?, ?

19 Theorem: HRC is NP-complete even if the radius is fixed to d = 1 and the alphabet is binary. By reduction from Vertex Cover for triangle-free graphs. Our input: G, a triangle-free graph; t, the size of the vertex cover.

20 The construction: The c parameter is t. The distance parameter d is 1. Encode edges as bit strings of length |V|. Set the bits of the vertices on the two sides of the edge. [Figure: graph on vertices 1-7] Edge strings (columns 1234567): 1001000, 0110000, 0101000, 1010000, 0010100, 0001010, 0000101, 0000011
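One direction of this reduction can be checked mechanically: a vertex cover of size t yields t clusters of radius 1. The sketch below is a hedged illustration. The edge list is read off the bit strings on the slide, the cover {3, 4, 7} is chosen here for illustration, and the function names are hypothetical. The center of each cluster is the string with a single 1 at the cover vertex.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def encode_edge(u, v, n):
    """Bit string of length n with ones at the two endpoints (1-based)."""
    bits = ["0"] * n
    bits[u - 1] = "1"
    bits[v - 1] = "1"
    return "".join(bits)


def unit_string(v, n):
    """Bit string of length n with a single one at vertex v (1-based)."""
    bits = ["0"] * n
    bits[v - 1] = "1"
    return "".join(bits)


def clusters_from_cover(edges, cover, n):
    """Assign every edge string to the center of a cover vertex it touches."""
    clusters = {unit_string(v, n): [] for v in sorted(cover)}
    for u, v in edges:
        if u in cover:
            w = u
        elif v in cover:
            w = v
        else:
            raise ValueError(f"edge ({u}, {v}) is not covered")
        clusters[unit_string(w, n)].append(encode_edge(u, v, n))
    return clusters


# Edges read off the slide's bit strings; the cover is an illustrative choice.
edges = [(1, 4), (2, 3), (2, 4), (1, 3), (3, 5), (4, 6), (5, 7), (6, 7)]
clusters = clusters_from_cover(edges, cover={3, 4, 7}, n=7)
for center, members in clusters.items():
    print(center, members, [hamming(center, s) for s in members])
```

Every edge string has exactly two ones and its center has one of them, so each distance is exactly 1. The converse direction, where triangle-freeness is needed, is not covered by this sketch.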

21 [Figure: graph on vertices 1-7, vertices 3, 4 and 7 highlighted] Strings shown: 0110000, 1010000, 0010100, 0010000; 0001000, 1001000, 0101000, 0001010; 0000001, 0000101, 0000011

22 0110000 1010000 0010100 0010000

23 000???? 0000101 0001010

24 [Figure: triangle on vertices 1, 2, 3] Edge strings (columns 1234567): 0110000, 1010000, 1100000

25 C-CenterString Parameterized Complexity
Fixed parameter (columns): c | k | d (d=1) | d/l and c | l (l=2)
HRC: NPC, polynomial time, NPC, polynomial time, ?
HRLC: polynomial time, ?, NPC
HRSC: NPC, polynomial time, ?, ?

26 Theorem: HRLC is NP-complete even if the length is fixed to l = 2. We prove this by reduction from Minimum Maximal Matching for bipartite graphs. Our input: G, a bipartite graph; t, the size of the smallest set of edges that is a maximal matching. [Figure: a maximal matching and a minimum maximal matching]

27 The construction: The c parameter is t. The distance parameter d is 1. [Figure: bipartite graph on vertices 1-5] Edge strings: 12, 14, 32, 34, 54
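The forward direction of this construction can be illustrated with the edge strings from the slide. In the sketch below, each edge becomes a 2-symbol string, and the edges of a maximal matching serve as centers taken from the input. The matching {32, 54} and the function names are assumptions made for illustration, and vertex labels are assumed to be single symbols.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def edge_string(u, v):
    """Edge with left endpoint u and right endpoint v becomes the string 'uv'."""
    return f"{u}{v}"


def is_maximal_matching(matching, edges):
    """True if no two matching edges share a vertex and every edge touches one."""
    used = [x for e in matching for x in e]
    if len(used) != len(set(used)):
        return False  # two matching edges share a vertex
    return all(u in used or v in used for u, v in edges)


# Edges read off the slide's strings; the matching is an illustrative choice.
edges = [(1, 2), (1, 4), (3, 2), (3, 4), (5, 4)]
matching = [(3, 2), (5, 4)]

strings = [edge_string(u, v) for u, v in edges]
centers = [edge_string(u, v) for u, v in matching]

# Distance from every edge string to its nearest center.
nearest = {s: min(hamming(s, c) for c in centers) for s in strings}
print(is_maximal_matching(matching, edges), nearest)
```

Because the graph is bipartite, an edge that shares an endpoint with a matching edge agrees with it in one of the two positions, which gives distance at most 1.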

28 1 4 2 3 5 12 32 34 32 14 54 54

29

30 3254

31 1213 62 12 14 52 12 13 53 13 62 52 12 12 14 13 53 13

32 [Figure: strings 62, 52, 12, 12, 14, 13, 53, 13, later joined by 67, 67] Move strings [6,2] and [5,2] if there are centers beginning with 5 or 6. Change the center to one of the remaining strings. We keep going until no two centers share a common symbol.

33 Approximation Algorithms 1. A linear-time 4-Approximation for the 2-HRSC problem. 2. A polynomial time 3-Approximation for the 2-HRSC problem. 3. Special case PTAS – by computing the clusters and doing 1-HRC approximation on each cluster.

34 >2d Lemma

35 Proof center

36 If we had a representative from each cluster, we could associate the rest of the strings with the appropriate group. Now use a known approximation algorithm for 1-HRC to find the consensus string of each cluster. [Figure: clusters separated by distance > 2d]
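A minimal sketch of the association step, assuming the clusters are well separated: strings in the same cluster are at distance at most 2d from each other, and strings in different clusters are more than 2d apart. Under that assumption the nearest representative is the correct one. The function name and the sample strings are hypothetical, and the 1-HRC approximation that would follow on each cluster is not shown.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_to_representatives(strings, representatives):
    """Group every string with its nearest representative."""
    clusters = {r: [] for r in representatives}
    for s in strings:
        nearest = min(representatives, key=lambda r: hamming(r, s))
        clusters[nearest].append(s)
    return clusters


# Hypothetical input with d = 1: the two groups are more than 2d apart.
representatives = ["000000", "111111"]
rest = ["000001", "100000", "111110", "011111"]
clusters = assign_to_representatives(rest, representatives)
print(clusters)
```

Each resulting group, together with its representative, would then be handed to a 1-HRC approximation algorithm.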

37 >4d Lemma Cluster c-center

38 Proof ≤d

39 0000000000000000000 1111111111111000000 1110000000000000000 1110001111111000000

40 0000000000000000000 1111111111111000000 1110000000000000000 1110001111111000000

41 Polynomial time approximation algorithm for 2-HRSC problem

42

43

44

45

46 Future work 1. We presented a heuristic algorithm that did very well in practice. What is its approximation ratio? 2. There are some gaps in the parameterized complexity table: a. What happens in the HRLC/HRSC cases for fixed d? b. What happens in the HRC/HRSC cases for fixed l? 3. Is there a PTAS for c-HRC? 4. Can we approximate c-HRC using LP? SDP?

