Implicit Hitting Set Problems Richard M. Karp Erick Moreno Centeno DIMACS 20 th Anniversary.

Slides:



Advertisements
Similar presentations
Covers, Dominations, Independent Sets and Matchings AmirHossein Bayegan Amirkabir University of Technology.
Advertisements

1 LP Duality Lecture 13: Feb Min-Max Theorems In bipartite graph, Maximum matching = Minimum Vertex Cover In every graph, Maximum Flow = Minimum.
1 Matching Polytope x1 x2 x3 Lecture 12: Feb 22 x1 x2 x3.
Approximation Algorithms Chapter 14: Rounding Applied to Set Cover.
5-1 Chapter 5 Tree Searching Strategies. 5-2 Satisfiability problem Tree representation of 8 assignments. If there are n variables x 1, x 2, …,x n, then.
How should we define corner points? Under any reasonable definition, point x should be considered a corner point x What is a corner point?
Basic Feasible Solutions: Recap MS&E 211. WILL FOLLOW A CELEBRATED INTELLECTUAL TEACHING TRADITION.
Effective Heuristics for NP-Hard Problems Arising in Molecular Biology Richard M. Karp Bangalore, January 5, 2011.
Chapter 3 The Greedy Method 3.
Complexity 16-1 Complexity Andrei Bulatov Non-Approximability.
Computability and Complexity 23-1 Computability and Complexity Andrei Bulatov Search and Optimization.
Approximation Algorithms
3 -1 Chapter 3 The Greedy Method 3 -2 The greedy method Suppose that a problem can be solved by a sequence of decisions. The greedy method has that each.
1 Optimization problems such as MAXSAT, MIN NODE COVER, MAX INDEPENDENT SET, MAX CLIQUE, MIN SET COVER, TSP, KNAPSACK, BINPACKING do not have a polynomial.
Implicit Hitting Set Problems Richard M. Karp Harvard University August 29, 2011.
Approximation Algorithms
2-Layer Crossing Minimisation Johan van Rooij. Overview Problem definitions NP-Hardness proof Heuristics & Performance Practical Computation One layer:
Computability and Complexity 24-1 Computability and Complexity Andrei Bulatov Approximation.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
Integer Programming Difference from linear programming –Variables x i must take on integral values, not real values Lots of interesting problems can be.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
Backtracking Reading Material: Chapter 13, Sections 1, 2, 4, and 5.
1 Introduction to Approximation Algorithms Lecture 15: Mar 5.
Approximation Algorithms Motivation and Definitions TSP Vertex Cover Scheduling.
1 Robustness by cutting planes and the Uncertain Set Covering Problem AIRO 2008, Ischia, 10 September 2008 (work supported my MiUR and EU) Matteo Fischetti.
Approximation Algorithms: Bristol Summer School 2008 Seffi Naor Computer Science Dept. Technion Haifa, Israel TexPoint fonts used in EMF. Read the TexPoint.
Daniel Kroening and Ofer Strichman Decision Procedures An Algorithmic Point of View Deciding ILPs with Branch & Bound ILP References: ‘Integer Programming’
Decision Procedures An Algorithmic Point of View
1.3 Modeling with exponentially many constr.  Some strong formulations (or even formulation itself) may involve exponentially many constraints (cutting.
© The McGraw-Hill Companies, Inc., Chapter 3 The Greedy Method.
Nattee Niparnan. Easy & Hard Problem What is “difficulty” of problem? Difficult for computer scientist to derive algorithm for the problem? Difficult.
Design Techniques for Approximation Algorithms and Approximation Classes.
Approximating Minimum Bounded Degree Spanning Tree (MBDST) Mohit Singh and Lap Chi Lau “Approximating Minimum Bounded DegreeApproximating Minimum Bounded.
1 Introduction to Approximation Algorithms. 2 NP-completeness Do your best then.
Advanced Algorithm Design and Analysis (Lecture 13) SW5 fall 2004 Simonas Šaltenis E1-215b
The Complexity of Optimization Problems. Summary -Complexity of algorithms and problems -Complexity classes: P and NP -Reducibility -Karp reducibility.
Approximation Algorithms
Chapter 1. Formulations 1. Integer Programming  Mixed Integer Optimization Problem (or (Linear) Mixed Integer Program, MIP) min c’x + d’y Ax +
CSCI 3160 Design and Analysis of Algorithms Chengyu Lin.
Greedy Algorithms and Matroids Andreas Klappenecker.
CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.
Branch-and-Cut Valid inequality: an inequality satisfied by all feasible solutions Cut: a valid inequality that is not part of the current formulation.
Approximation Algorithms Department of Mathematics and Computer Science Drexel University.
1  Problem: Consider a two class task with ω 1, ω 2   LINEAR CLASSIFIERS.
1 CS612 Algorithms for Electronic Design Automation CS 612 – Lecture 8 Lecture 8 Network Flow Based Modeling Mustafa Ozdal Computer Engineering Department,
1 Approximation algorithms Algorithms and Networks 2015/2016 Hans L. Bodlaender Johan M. M. van Rooij TexPoint fonts used in EMF. Read the TexPoint manual.
Chapter 13 Backtracking Introduction The 3-coloring problem
C&O 355 Lecture 19 N. Harvey TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A.
TU/e Algorithms (2IL15) – Lecture 12 1 Linear Programming.
Introduction to NP Instructor: Neelima Gupta 1.
Approximation Algorithms based on linear programming.
TU/e Algorithms (2IL15) – Lecture 12 1 Linear Programming.
Approximation algorithms
CS4234 Optimiz(s)ation Algorithms L2 – Linear Programming.
The minimum cost flow problem
Approximation algorithms
Courtsey & Copyright: DESIGN AND ANALYSIS OF ALGORITHMS Courtsey & Copyright:
6.5 Stochastic Prog. and Benders’ decomposition
Computability and Complexity
1.3 Modeling with exponentially many constr.
Chapter 6. Large Scale Optimization
Integer Programming (정수계획법)
Chapter 6. Large Scale Optimization
1.3 Modeling with exponentially many constr.
Integer Programming (정수계획법)
6.5 Stochastic Prog. and Benders’ decomposition
Chapter 1. Formulations.
Chapter 6. Large Scale Optimization
Discrete Optimization
Presentation transcript:

Implicit Hitting Set Problems Richard M. Karp Erick Moreno Centeno DIMACS 20 th Anniversary

Implicit Optimization Problems Set of constraints defined implicitly by a generation algorithm rather than by an explicit list. -- Linear and convex programming: equivalence of separation and optimization -- Integer programming: cutting-plane methods -- Linear programming: column generation

Hitting Set Problem Input: a finite ground set U, a positive weight for each element, and a family  * of subsets of U. Hitting set: a set H having a non-empty intersection with each set in  * Problem: find a hitting set of minimum weight.

Complexity of the Hitting Set Problem Equivalent to the weighted set cover problem NP-hard and hard to approximate within ratio o(log |  *|). Greedy algorithm approximates within O(log |  *|). In practice greedy algorithm gives good approximate solutions and optimal solutions can be computed fairly efficiently by a branch-and-cut algorithm.

Implicit Hitting Set Problem The collection of sets  * has a compact implicit description A separation oracle is available which, given a subset H of the ground set, either determines that H is a hitting set or returns a circuit not hit by H.

Examples Feedback vertex set in a graph or digraph Feedback edge set in a digraph Max cut Intersection of k matroids Largest consistent subset of a set of linear inequalities

Separation Oracles Efficient separation oracles exist for minimum feedback vertex set, minimum feedback arc set, max cut, k-matroid intersection.

Implicit Hitting Set Algorithm The implicit hitting set algorithm is a “dialogue” between a hitting set solver and a generator of circuits. Approach: build up a small set of critical circuits, solve the resulting explicit hitting set problem, and demonstrate that its optimal solution is feasible (and hence optimal) for the implicit hitting set problem.

Solving the Implicit Hitting Set Problem  : collection of circuits; H: optimal or near –optimal hitting set for    empty set Repeat: H  near-optimal hitting set for  ; if H hits every circuit in  * then H  optimal hitting set for  ; if H hits every circuit in  * then return H and halt; Add to  one or more circuits not hit by H

Conjecture (R.Karp snd S. Vempala) For the feedback vertex set problem on a random graph with n vertices and n 3/2 edges, there is a constant c such that, with high probability, a minimum set of vertices that hits all cycles of length up to c also hits all longer cycles.

Fractional Hitting Set Problem Min c’x subject to Ax  1, x  0 Rows of A are incidence vectors of sets in  x represents a fractional hitting set. Solvable efficiently by combination of linear programming with Lagrangian relaxation (Plotkin, Shmoys, Tardos)

Multi-Genome Alignment Input: the genomes of several closely related species. Anchor: a segment of one genome that matches a segment of another given genome. Each matching pair of anchors is assigned a weight. Alignment: assignment of a positive integer (column ) to each anchor, such that, across each genome, the assignments strictly increase from left to right. Anchors are aligned if they are assigned the same integer. Problem: Find an alignment that maximizes the sum of the weights of the aligned anchor pairs.

Complexity Bounds The 2-chain problem is equivalent to the maximum-weight increasing subsequence problem and is solvable in time O(n log n), where n is the cardinality of the ground set. The k-chain problem can be solved in time O(n k ) by dynamic programming.

Graph-Theoretic Formulation A mixed graph with a vertex for each anchor. A directed edge from each anchor to the immediately following anchor in its genome. An undirected edge between two matching anchors, with a weight indicating the significance of the match.

Mixed Cycle A cycle is mixed if it contains at least one directed edge. A set of edges can be aligned if and only if it does not include all the undirected edges of any mixed cycle (Kececioglu). This leads to an integer programming formulation and a cutting-plane algorithm for the alignment problem (Kececioglu et al).

Alignment as an Implicit Hitting Set Problem Ground set: set of undirected edges of the mixed graph (similar pairs of anchors). Hitting set: contains at least one undirected edge from each mixed cycle. The problem is implicit, since we don’t wish to list all the mixed cycles. Generator of violated constraints: given a set H, we can either determine that it is a hitting set, or find a mixed cycle that H does not hit.

Solving the Alignment Problem H : subset of ground set;  : collection of circuits (undirected edge sets of mixed cycles)   empty set Repeat: H  near-optimal hitting set for  ; if H hits every circuit in  * (every mixed cycle) then H  optimal hitting set for  ; if H hits every circuit in  * (every mixed cycle) then return H and halt; Add to  one or more mixed cycles not hit by H

Algorithmic Details A greedy algorithm is used to find a near-optimal hitting set for . The CPLEX integer programming code is used to find an optimal hitting set for . Finding mixed cycles not hit by H: Delete edges in H. Method 1: depth-first search; Method 2: Attempt to construct an alignment column by column; whenever the construction is blocked, a mixed cycle is obtained.

Lower Bound For any set of circuits , solve the LP relaxation of the explicit hitting set problem determined by . The optimal value is a lower bound on the optimal value of the implicit hitting set problem.

4096 Problems Derived from 5 Worm Genomes Provably optimal solutions were obtained for 4055 of the 4096 problems. Summary: (min time (sec.), max time, # probs., median #edges, Max # edges) (0, 0.01, 1311, 52, 399) (0.01, 0.1, 764, 203, 549) (0, 1, 1086, 450, 1387) (1, , 1104, 4645) (10, 60, 151, 1351, 12313) (60, 600, 75, 1136, 14690) (600, 3600, 36, 1236, 13916)

Tuning the Algorithm In the multi-genome alignment problem there are many choices of methods for choosing a near-optimal or optimal hitting set H given , and for generating a set of mixed cycles not hit by H. A systematic hillclimbing search through the “space” of these choices was required to obtain good performance.

Cutting Plane Methods for Integer Programming Minimize c’x subject to Ax <= b, x integer Repeat: Solve LP relaxation. If optimal point x is fractional, add a constraint satisfied by all integer solutions but violated by x.

Column Generation in LP Variables (columns of LP) not explicitly given; number of variables may be huge. Auxiliary algorithm generates new column to enter basis. Example: cutting-stock problem. Auxiliary algorithm solves a knapsack problem. Row generation: auxiliary algorithm generates rows (constraints) on demand.

Layering a Partially Ordered Set Input: partially ordered set (V, R) and graph G = (V,E) A layering of the poset is an assignment of and integer f(u) to each vertex u such that, if u R v then f(u) < f(v). An edge is synchronized if its end points are in the same layer. Problem: find a layering that maximizes the sum of the weights of the synchronized edges.

Alignment as a Hitting Set Problem Ground set: edges (similar pairs of anchors) Goal: delete a minimum- weight set of edges such that the remaining edges can be simultaneously synchronized Directed edge (u,v): u less than v in partial order; undirected edge (u,v) : u and v are similar, Mixed cycle: contains directed and undirected edges, but at least one directed edge. An edge must be deleted from the set of undirected edges of each mixed cycle (Kececioglu)

Multiple Alignment An alignment is an assignment of a positive integer label to each anchor (indicating its column in the alignment) such that the sequence of labels of anchors along each genome is increasing. Two anchors with the same label are aligned. Optimization problem: maximize the sum of the weights of the aligned pairs of labels.

Equivalence of Separation and Optimization (GLS) Minimize c’x subject to x in convex set K K described by a separation oracle: given point x, oracle either asserts that x in K or returns hyperplane separating x from K. This suffices for polynomial-time algorithm.