1
Implicit Hitting Set Problems Richard M. Karp Harvard University August 29, 2011
2
Worst-case Analysis of NP-Hard Problems Exact solution methods: exponential running time in worst case. Polynomial-time approximation algorithms for optimization problems. Approximation ratios are usually unrealistically high. Parametrized complexity: polynomial-time complexity for instances with fixed parameter, but dependence on parameter is usually adverse.
3
Probabilistic Analysis and Heuristics In probabilistic analysis problem instances are drawn from simple probability distributions. Often one can prove excellent performance on the average. However, the probability distributions may not correspond to real-life instances. Heuristics are often “unreasonably effective,” for reasons not well understood. We seek systematic methods for tuning heuristics and validating them by empirical testing on training sets of representative instances.
4
Unreasonably Effective Heuristics Large traveling-salesman problems can be solved by quick tour construction methods, local improvement methods or cutting plane methods. Local improvement methods find near-optimal solutions to graph bisection problems. Huge satisfiability problems are routinely solved rapidly by branch-and-bound methods. The greedy set cover algorithm typically gives solutions within a few percent of optimal.
5
Implicit Optimization Problems Set of constraints defined implicitly by a generation algorithm rather than by an explicit list. -- Linear and convex programming: equivalence of separation and optimization -- Integer programming: cutting-plane methods -- Linear programming: column generation
6
Hitting Set Problem Ground set V. For every v in V, a positive weight c(v). C*: a collection of subsets of V (circuits). Goal: find a minimum-weight subset of V that hits (has a nonempty intersection with) every set in C*. Equivalent to the set cover problem.
7
Complexity of the Hitting Set Problem NP-hard and hard to approximate within ratio o(log |C*|). Greedy algorithm achieves approximation ratio O(log |C*|): Repeat until no sets remain: choose the element v in V that minimizes the ratio of c(v) to the number of remaining sets hit by v; add v to the solution and delete the sets hit by v.
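The greedy rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the names `greedy_hitting_set`, `weights`, and `circuits` are assumptions, and ties are broken by element name only to make the result deterministic.

```python
def greedy_hitting_set(weights, circuits):
    """Greedy weighted hitting set (illustrative sketch).

    weights:  dict mapping each element v to its positive weight c(v)
    circuits: iterable of sets of elements (an explicit collection)
    """
    remaining = [set(c) for c in circuits]
    if any(not c for c in remaining):
        raise ValueError("an empty circuit cannot be hit")
    chosen = set()
    while remaining:
        # Count how many still-unhit circuits each element would hit.
        counts = {}
        for c in remaining:
            for v in c:
                counts[v] = counts.get(v, 0) + 1
        # Choose the element with the smallest weight per newly hit circuit
        # (ties broken by name, an arbitrary choice made for this sketch).
        best = min(counts, key=lambda v: (weights[v] / counts[v], str(v)))
        chosen.add(best)
        # Delete the circuits hit by the chosen element.
        remaining = [c for c in remaining if best not in c]
    return chosen
```

With unit weights and circuits {a, b} and {b, c}, the sketch picks b alone, since b hits both circuits at ratio 1/2.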
8
Hitting Set Problem in Practice Greedy algorithm gives good approximate solutions. CPLEX integer programming algorithm often gives optimal solutions rapidly.
9
Implicit Hitting Set Problem The collection of circuits C* has a compact implicit description. There is a polynomial-time separation oracle which, given a subset H of the ground set, either determines that H is a hitting set or produces a circuit that H does not hit. Example: in the feedback vertex set problem, the separation oracle produces the vertex set of a shortest cycle in the subgraph induced by V\H, or reports that this subgraph is acyclic.
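For the feedback vertex set example, one possible separation oracle for a simple undirected graph is sketched below. It is an assumption-laden illustration rather than the oracle used in the talk: it runs a breadth-first search from every surviving vertex and keeps the smallest cycle closed by a non-tree edge. The names `shortest_cycle_oracle`, `_tree_cycle`, and `adj` are hypothetical.

```python
from collections import deque


def _tree_cycle(parent, depth, u, w):
    """Vertices on the cycle formed by edge (u, w) and the BFS-tree paths
    from u and w up to their lowest common ancestor."""
    cycle = {u, w}
    while u != w:
        if depth[u] >= depth[w]:
            u = parent[u]
        else:
            w = parent[w]
        cycle.add(u)
        cycle.add(w)
    return cycle


def shortest_cycle_oracle(adj, H):
    """Return the vertex set of a shortest cycle avoiding H, or None.

    adj: dict mapping each vertex to the set of its neighbours
         (simple undirected graph, assumed symmetric)
    H:   candidate hitting set, i.e. the vertices to delete
    """
    alive = set(adj) - set(H)
    best = None
    for root in alive:
        parent = {root: None}
        depth = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in alive:
                    continue
                if w not in parent:
                    parent[w] = u
                    depth[w] = depth[u] + 1
                    queue.append(w)
                elif parent[u] != w and parent[w] != u:
                    # Non-tree edge: it closes a cycle with the tree paths.
                    cycle = _tree_cycle(parent, depth, u, w)
                    if best is None or len(cycle) < len(best):
                        best = cycle
    return best
```

Returning `None` plays the role of "H is a hitting set". The running time is O(|V| |E|), which is polynomial as the slide requires.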
10
Examples
Feedback vertex set in a graph or digraph: vertex sets of cycles
Feedback edge set in a digraph: edge sets of cycles
Max cut: edge sets of odd cycles
Steiner tree: edge sets of cycles that partition the required vertices
Maximum 2-sat: minimal contradictory sets of 2-element clauses
Intersection of k matroids: circuits of each matroid
Maximal feasible subset of a set of linear inequalities: minimal infeasible subsets
11
Naïve Algorithm for Solving Implicit Hitting Set Problem
Start with C, a subset of C* (possibly empty). Repeat until a feasible hitting set H is found:
(1) Find a minimum-weight hitting set H for C.
(2) Using the separation oracle, find a minimum-cardinality circuit c not hit by H; if none exists, H is feasible.
(3) Add c to C.
Return H.
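The loop above can be sketched as follows. This is only an illustration under stated assumptions: the exact solver is a brute-force enumeration standing in for an integer programming solver, so it is usable on tiny ground sets only, and `oracle` is a hypothetical callable that returns an unhit circuit or `None`.

```python
from itertools import combinations


def min_weight_hitting_set(weights, circuits):
    """Exact minimum-weight hitting set by exhaustive search.

    Exponential in the size of the ground set; a stand-in for a real
    integer programming solver in this sketch.
    """
    elements = sorted(weights, key=str)
    best, best_cost = None, float("inf")
    for r in range(len(elements) + 1):
        for subset in combinations(elements, r):
            s = set(subset)
            if all(s & c for c in circuits):
                cost = sum(weights[v] for v in s)
                if cost < best_cost:
                    best, best_cost = s, cost
    return best


def implicit_hitting_set(weights, oracle):
    """Alternate between solving the current explicit instance and asking
    the oracle for a violated circuit, until the oracle finds none."""
    circuits = []
    while True:
        H = min_weight_hitting_set(weights, circuits)
        c = oracle(H)          # None means H hits every circuit in C*
        if c is None:
            return H, circuits
        circuits.append(set(c))
```

Because every generated circuit belongs to C*, the final H is optimal for a relaxation of the full problem and feasible for the full problem, hence optimal. Each returned circuit is unhit by the current H, so no circuit is generated twice and the loop terminates.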
12
Circuit-Finding Subroutine Input: C, a set of circuits, and H, a hitting set for C. Repeat until H hits every circuit in C*: find a circuit c not hit by H and choose an element x in c; add c to C and add x to H.
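A minimal sketch of this subroutine follows. The slide does not say how the element x is chosen, so the `choose` parameter (defaulting to the smallest element) is an assumption made for illustration, as is the `oracle` callable that returns an unhit circuit or `None`.

```python
def find_circuits(C, H, oracle, choose=min):
    """Grow C and H until H hits every circuit known to the oracle.

    C:      iterable of circuits already generated
    H:      a hitting set for C
    oracle: callable taking H and returning an unhit circuit or None
    choose: rule for picking one element of the new circuit
            (hypothetical; the slide leaves this choice open)
    """
    C = [set(c) for c in C]
    H = set(H)
    while True:
        c = oracle(H)
        if c is None:
            return C, H
        C.append(set(c))
        # c is disjoint from H, so this always adds a new element.
        H.add(choose(c))
```

The returned H is feasible but generally not minimum-weight; the point of the subroutine is to collect several circuits cheaply between calls to an exact solver.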
13
Refined Algorithm
Input: set of circuits C and hitting set H for C.
(1) Execute the circuit-finding subroutine.
(2) Repeat until k iterations yield no new circuits: construct a greedy hitting set H for C and execute the circuit-finding subroutine.
(3) Using CPLEX, construct an optimal hitting set H for C. If H does not hit every circuit in C*, go to (1).
Return H.
14
Metrics Number of circuits generated, number of calls to solver, running time of generator.
15
Application: Multi-Genome Alignment Highly similar sequences in two genomes constitute an anchor pair. The individual sequences are called anchors. A genome is a linearly ordered sequence of anchors. An alignment is a matrix with a row for each genome, and an assignment of each anchor to a column, respecting the linear orders. An anchor pair is synchronized if its two anchors lie in the same column. Goal: maximize the sum of the weights of the synchronized anchor pairs.
17
Complexity Bounds The 2-genome problem is equivalent to the maximum-weight increasing subsequence problem and is solvable in time O(n log n), where n is the cardinality of the ground set. The k-genome problem can be solved in time O(n^k) by dynamic programming.
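One standard way to reach O(n log n) for the maximum-weight increasing subsequence problem is a Fenwick tree that stores prefix maxima. The sketch below assumes positive weights and a strictly increasing subsequence; the function name and its arguments are illustrative and not taken from the talk.

```python
def max_weight_increasing_subsequence(values, weights):
    """Maximum total weight of a strictly increasing subsequence.

    values:  sequence of comparable keys (e.g. anchor positions in the
             second genome, listed in the order of the first genome)
    weights: positive weight of each item
    """
    # Compress keys to ranks 1..m so they can index the tree.
    ranks = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    tree = [0] * (len(ranks) + 1)  # Fenwick tree of prefix maxima

    def query(i):
        """Best total weight of a subsequence ending at a rank <= i."""
        best = 0
        while i > 0:
            best = max(best, tree[i])
            i -= i & -i
        return best

    def update(i, value):
        """Record that a subsequence ending at rank i has weight value."""
        while i < len(tree):
            tree[i] = max(tree[i], value)
            i += i & -i

    answer = 0
    for v, w in zip(values, weights):
        r = ranks[v]
        total = query(r - 1) + w  # extend the best strictly smaller ending
        update(r, total)
        answer = max(answer, total)
    return answer
```

Each item costs one query and one update, both O(log n), after an O(n log n) sort for rank compression.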
18
Alignment as a Hitting Set Problem Ground set: anchor pairs. Goal: delete a minimum-weight set of anchor pairs such that the remaining anchor pairs can be simultaneously synchronized. Directed edge (u,v): u precedes v. Undirected edge (u,v): u and v are an anchor pair. Mixed cycle: contains directed and undirected edges, but at least one directed edge. An edge must be deleted from the set of undirected edges of each mixed cycle (Kececioglu).
20
Solving the Alignment Problem Run the generic implicit hitting set algorithm, with the anchor pairs as elements and the undirected edge sets of mixed cycles as circuits. Separation oracles: given a putative hitting set H, search for a mixed cycle in the graph induced by the edges not in H. Two methods: (1) a variant of depth-first search; (2) attempt to align the remaining edges until blocked by the occurrence of a mixed cycle.
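A simple separation oracle for mixed cycles is sketched below. It is neither of the two methods named on the slide: as a plainly simpler substitute, it runs one breadth-first search per directed edge (a, b), looking for a path from b back to a that follows directed edges forward and undirected edges in either direction. The function name, the edge-list inputs, and the representation of anchor pairs as frozensets are all assumptions, and the directed edges are assumed to contain no self-loops or purely directed cycles.

```python
from collections import deque


def mixed_cycle_oracle(directed, undirected, H):
    """Return the anchor pairs on some mixed cycle avoiding H, or None.

    directed:   list of (u, v) edges meaning u precedes v
    undirected: list of (u, v) anchor pairs
    H:          set of deleted anchor pairs, each a frozenset({u, v})
    """
    # Build successor lists; each entry carries the anchor pair used,
    # or None for a directed (precedence) edge.
    succ = {}
    for u, v in directed:
        succ.setdefault(u, []).append((v, None))
    for u, v in undirected:
        pair = frozenset((u, v))
        if pair in H:
            continue
        succ.setdefault(u, []).append((v, pair))
        succ.setdefault(v, []).append((u, pair))

    for a, b in directed:
        # Breadth-first search from b; reaching a closes a mixed cycle
        # through the directed edge (a, b).
        prev = {b: None}
        queue = deque([b])
        while queue and a not in prev:
            x = queue.popleft()
            for y, pair in succ.get(x, []):
                if y not in prev:
                    prev[y] = (x, pair)
                    queue.append(y)
        if a in prev:
            # Walk back from a to b, collecting the undirected edges used.
            circuit = set()
            node = a
            while prev[node] is not None:
                node, pair = prev[node]
                if pair is not None:
                    circuit.add(pair)
            return circuit
    return None
```

For two genomes with anchors a1 before a2 and b1 before b2, the crossing pairs (a1, b2) and (a2, b1) form a mixed cycle, and the sketch returns both pairs as the circuit.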
21
Performance on 4085 Problems of Aligning Five Worm Genomes
Time (sec.)     # solved    # edges
0 to 0.01       1311        (1; 52; 399)
0.01 to 0.1     764         (20; 203; 549)
0.1 to 1        1086        (26; 450; 1837)
1 to 10         632         (44; 1104; 4645)
10 to 60        151         (65; 1351; 12313)
60 to 600       75          (103; 1136; 14690)
600 to 3600     36          (166; 1236; 13916)
22
Tuning the Algorithm Within the general algorithmic strategy there are many possible choices of the separation oracle, the greedy algorithm, the version of CPLEX, parameter values, etc. By tuning these choices on a training set of real-world examples we improved the performance by a factor of several hundred.
23
Acknowledgment This is joint work with Erick Moreno Centeno.