Connections in Networks: Hardness of Feasibility vs. Optimality
Jon Conrad, Carla P. Gomes, Willem-Jan van Hoeve, Ashish Sabharwal, Jordan Suter
Cornell University
CP-AI-OR Conference, May 2007, Brussels, Belgium

May 25, 2007, CP-AI-OR

Feasibility Testing & Optimization
Constraint satisfaction work often focuses on pure feasibility testing: Is there a solution? Find me one!
  In principle, feasibility testing can be used for optimization as well
  Worst-case complexity classes are well understood
  Finer-grained typical-case hardness is often also known (easy-hard-easy patterns, phase transitions)
How does the picture change when problems combine both feasibility and optimization components?
  We study this in the context of connection networks
  Many positive results; some surprising ones!

Outline of the Talk
Worst-case vs. typical-case hardness
  Easy-hard-easy patterns; phase transition
The Connection Subgraph Problem
  Motivation: economics and social networks
  Combining feasibility and optimality components
  Theoretical results (NP-hardness of approximation)
Empirical study
  Easy-hard-easy patterns for pure optimality
  Phase transition
  Feasibility testing vs. optimization: a clear winner?

Typical-Case Complexity
E.g., consider SAT, the Boolean Satisfiability Problem: Does a given formula have a satisfying truth assignment?
Worst-case complexity: NP-complete
  Unless P = NP, cannot solve all instances in poly-time
  Of course, need solutions in practice anyway
Typical-case complexity: a more detailed picture
  What about a majority of the instances?
  How about instances w.r.t. certain interesting parameters? E.g., for SAT: clause-to-variable ratio
  Are some regimes easier than others?
  Can such parameters characterize feasibility?

Random 3-SAT: Easy-Hard-Easy
Computational hardness as a function of a key problem parameter [Mitchell, Selman, and Levesque ’92; …]
Key parameter: ratio #constraints / #variables
  Easy for very low and very high ratios
  Hard in the intermediate region
  Complexity peaks at ratio ~ 4.26
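
The clause-to-variable ratio can be explored on toy instances. The sketch below is illustrative only and is not the setup of the cited experiments: the function names `random_3sat`, `is_satisfiable`, and `satisfiable_fraction` are hypothetical, and the brute-force check only scales to a handful of variables.

```python
import itertools
import random


def random_3sat(num_vars, ratio, rng):
    """Draw round(ratio * num_vars) clauses, each over 3 distinct variables.

    A literal is a nonzero int: +v means variable v, -v means its negation.
    """
    clauses = []
    for _ in range(round(ratio * num_vars)):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def is_satisfiable(num_vars, clauses):
    """Brute-force check over all 2^num_vars assignments (tiny instances only)."""
    for bits in itertools.product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


def satisfiable_fraction(num_vars, ratio, trials, seed=0):
    """Fraction of random instances at the given ratio that are satisfiable."""
    rng = random.Random(seed)
    hits = sum(is_satisfiable(num_vars, random_3sat(num_vars, ratio, rng))
               for _ in range(trials))
    return hits / trials
```

Calling `satisfiable_fraction` over a sweep of ratios gives a rough picture of the satisfiable-to-unsatisfiable shift; with so few variables the transition is much less sharp than in the large-instance experiments the slide refers to.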

Coinciding Phase Transition
Before critical ratio: almost all formulas satisfiable
After critical ratio: almost all formulas unsatisfiable
Very sharp transition!
[Figure: Random 3-SAT phase transition, from satisfiable to unsatisfiable]

Typical-Case Complexity
Is similar behavior observed in pure optimization problems?
How about problems that combine feasibility and optimization components?
Goal: Obtain further insights into the problem.
Note: having very few constraints, for example, makes a problem easy to solve but not necessarily easy to optimize!

Typical-Case Complexity
Known: a few results for pure optimization problems
  Traveling salesperson problem (TSP) under specialized cost functions like log-normal [Gent, Walsh ’96; Zhang, Korf ’96]
We look at the connection subgraph problem
  Motivated by resource environment economics and social networks (more on this next)
  A generalized variant of the Steiner tree problem
  Combines feasibility and optimization components
    A budget constraint on vertex costs
    A utility function to be maximized

Connection Subgraph: Motivation
Motivation 1: Resource environment economics
Conservation corridors (a.k.a. movement or wildlife corridors) [Simberloff et al. ’97; Ando et al. ’98; Camm et al. ’02]
  Preserve wildlife against land fragmentation
  Link zones of biological significance (“reserves”) by purchasing continuous protected land parcels
  Limited budget; must maximize environmental benefits/utility
[Figure: map showing reserves and land parcels]

Connection Subgraph: Motivation
Real problem data:
  Goal: preserve grizzly bear population in the U.S.A. by creating movement corridors
  3637 land parcels (6x6 miles) connecting 3 reserves in Wyoming, Montana, and Idaho
  Reserves include, e.g., Yellowstone National Park
  Budget: ~ $2B

Connection Subgraph: Motivation
Motivation 2: Social networks
What characterizes the connection between two individuals?
  The shortest path?
  Size of the connected component?
  A “good” connected subgraph? [Faloutsos, McCurley, Tompkins ’04]
If a person is infected with a disease, who else is likely to be?
Which people have unexpected ties to any members of a list of other individuals?
Vertices in graph: people; edges: know each other or not

The Connection Subgraph Problem
Given
  An undirected graph $G = (V, E)$
  Terminal vertices $T \subseteq V$
  Vertex cost function: $c(v)$; utility function: $u(v)$
  Cost bound / budget $C$; desired utility $U$
Is there a subgraph $H$ of $G$ such that
  $H$ is connected
  $\mathrm{cost}(H) \le C$; $\mathrm{utility}(H) \ge U$?
Cost optimization version: given $U$, minimize cost
Utility optimization version: given $C$, maximize utility
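
The decision version can be made concrete with a small checker for a candidate vertex set. This is a sketch under stated assumptions: `is_connection_subgraph` is a hypothetical name, H is taken to be the subgraph induced by the chosen vertices, H is required to contain every terminal (the role the terminals play in the later slides), and an empty H is rejected.

```python
from collections import deque


def is_connection_subgraph(edges, terminals, cost, utility, H, C, U):
    """Check a candidate vertex set H against the problem's conditions.

    edges: iterable of undirected edges (a, b)
    cost, utility: dicts mapping vertex -> number
    C: cost bound (budget); U: desired utility
    """
    H = set(H)
    # Assumption: every terminal must be included in H.
    if not H or not set(terminals) <= H:
        return False

    # Adjacency restricted to the vertices of H (induced subgraph).
    adj = {v: set() for v in H}
    for a, b in edges:
        if a in H and b in H:
            adj[a].add(b)
            adj[b].add(a)

    # Breadth-first search to test connectedness of H.
    start = next(iter(H))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    if seen != H:
        return False

    # Budget and utility conditions.
    return (sum(cost[v] for v in H) <= C
            and sum(utility[v] for v in H) >= U)
```

Verifying a candidate is cheap; the hardness discussed in the talk lies in finding a suitable H, not in checking one.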

Main Results
Worst-case complexity of the connection subgraph problem: NP-hard even to approximate
Typical-case complexity w.r.t. increasing budget fraction
  1. Without terminals: pure optimization version, always feasible, still a computational easy-hard-easy pattern
  2. With terminals:
     a) Phase transition: problem turns from mostly infeasible to mostly feasible at budget fraction ~ 0.13
     b) Computational easy-hard-easy pattern coinciding with the phase transition
     c) Surprisingly, proving optimality can be substantially easier than proving infeasibility in the phase transition region

Theoretical Results: 1
NP-completeness: reduction from the Steiner Tree problem, preserving the cost function
Idea:
  Steiner tree problem already very similar
  Simulate edge costs with node costs
  Simulate terminal vertices with utility function
NP-complete even without any terminals
  Recall: Steiner tree problem is poly-time solvable with a constant number of terminals
Also holds for planar graphs
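
The slide does not spell out how edge costs are simulated with node costs. One standard construction, shown here purely as an assumption and not necessarily the one used in the reduction, subdivides each edge with a new vertex that carries the edge's cost, while the original vertices cost nothing. The helper name `edge_costs_to_node_costs` is hypothetical.

```python
def edge_costs_to_node_costs(vertices, edge_cost):
    """Turn an edge-weighted graph into a vertex-weighted one by subdivision.

    vertices: iterable of original vertices
    edge_cost: dict mapping undirected edge (a, b) -> cost
    Returns (node_cost, new_edges) for the subdivided graph.
    """
    # Original vertices are free in the new graph.
    node_cost = {v: 0 for v in vertices}
    new_edges = []
    for (a, b), w in edge_cost.items():
        mid = ("edge", a, b)      # new vertex standing in for edge (a, b)
        node_cost[mid] = w        # it carries the original edge cost
        new_edges.append((a, mid))
        new_edges.append((mid, b))
    return node_cost, new_edges
```

Under this construction, any connected subgraph that uses an original edge must include its subdivision vertex, so the total node cost equals the total cost of the edges used.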

Theoretical Results: 2
NP-hardness of approximating cost optimization (factor 1.36): reduction from the Vertex Cover problem
Reduction motivated by Steiner tree work [Bern, Plassmann ’89]
Vertex cover of size k iff connection subgraph with cost bound C = k and utility U = m
[Figure: reduction graph on vertices v1, v2, v3, …, vn]

Experimental Setup
Study parameter: budget fraction (budget as a fraction of the sum of all node costs)
  How are problem feasibility and hardness affected as the budget fraction is varied?
Algorithm: CPLEX on a Mixed Integer Programming (MIP) model

The MIP Model
Variables: $x_i \in \{0, 1\}$ for each vertex $i$ (included or not)
Cost constraint: $\sum_i c_i x_i \le C$
Utility optimization function: maximize $\sum_i u_i x_i$
Connectedness: use a network flow encoding

The MIP Model: Connectedness
New source vertex 0, connected to an arbitrary terminal $t$ (slightly different construction when there are no terminals)
Initial flow sent from 0 equals the number of vertices
New variables $y_{i,j} \in \mathbb{Z}^{+}$ for each directed edge $(i, j)$ (flow from $i$ to $j$)
Flow passes through $i$ iff $v_i$ retains 1 unit of flow
Each terminal $t$ retains 1 unit of flow
Conservation of flow constraints
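
Putting the pieces together, the model might be written roughly as follows. This is a reconstruction under assumptions rather than the exact model from the talk: $n = |V|$, $t_0$ is the arbitrarily chosen terminal, $A$ contains both orientations of every edge of $G$ plus the arc $(0, t_0)$, the residual variable $r$ (flow the source keeps) and the coupling constraint $y_{i,j} \le n\,x_j$ are illustrative choices, and $\mathbb{Z}^{+}$ is taken to include 0.

```latex
\begin{align*}
\text{maximize}\quad & \sum_{i \in V} u_i x_i \\
\text{subject to}\quad
 & \sum_{i \in V} c_i x_i \le C, \\
 & x_t = 1 && \forall t \in T, \\
 & r + y_{0,t_0} = n, \quad 0 \le r \le n, \\
 & y_{i,j} \le n \, x_j && \forall (i,j) \in A, \\
 & \sum_{i : (i,j) \in A} y_{i,j} = x_j + \sum_{k : (j,k) \in A} y_{j,k} && \forall j \in V, \\
 & x_i \in \{0,1\}, \quad y_{i,j} \in \mathbb{Z}^{+}.
\end{align*}
```

In this sketch every included vertex consumes one unit of flow, and flow can only enter included vertices, so each included vertex must be reachable from the source through other included vertices. That is what forces the selected set to be connected.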

Graphs for Evaluation
Problem evaluated on semi-structured graphs
  m x m lattice / grid graph with k terminals
  Inspired by the conservation corridors problem
Place one terminal each at top-left and bottom-right
  Maximizes grid use
Place remaining terminals randomly
Assign uniform random costs and utilities from {0, 1, …, 10}
[Figure: example grid with m = 4, k = 4]
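
An instance generator along these lines could look like the sketch below. The names `make_grid_instance` and `budget_from_fraction` are hypothetical, as is the handling of k < 2 (the slide only describes placing the two corner terminals first); the budget helper applies the budget-fraction definition from the experimental setup.

```python
import random


def make_grid_instance(m, k, seed=0):
    """Build an m x m grid graph with k terminals and random costs/utilities."""
    if k > m * m:
        raise ValueError("more terminals than vertices")
    rng = random.Random(seed)

    # Vertices are (row, column) pairs; edges join horizontal/vertical neighbours.
    vertices = [(r, c) for r in range(m) for c in range(m)]
    edges = []
    for r in range(m):
        for c in range(m):
            if c + 1 < m:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < m:
                edges.append(((r, c), (r + 1, c)))

    # Corner terminals first, then the remaining ones at random positions.
    terminals = []
    if k >= 1:
        terminals.append((0, 0))
    if k >= 2:
        terminals.append((m - 1, m - 1))
    others = [v for v in vertices if v not in terminals]
    terminals += rng.sample(others, max(0, k - 2))

    # Uniform random integer costs and utilities from {0, 1, ..., 10}.
    cost = {v: rng.randint(0, 10) for v in vertices}
    utility = {v: rng.randint(0, 10) for v in vertices}
    return vertices, edges, terminals, cost, utility


def budget_from_fraction(cost, fraction):
    """Budget expressed as a fraction of the sum of all node costs."""
    return fraction * sum(cost.values())
```

For example, `make_grid_instance(4, 4)` yields 16 vertices, 24 edges, and 4 distinct terminals, matching the m = 4, k = 4 case.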

Results: without terminals
No terminals: “find the connected component that maximizes the utility within the given budget”
  Pure optimization problem; always feasible
  Still NP-hard
A clear easy-hard-easy pattern with uniform random costs & utilities
[Figure: runtime (log scale) vs. budget fraction; curves for 6 x 6, 8 x 8, and 10 x 10 grids]
Note 1: plot in log scale for better viewing of the sharp transitions
Note 2: each data point is the median over 100+ random instances
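
On very small graphs the no-terminal problem can be solved by plain enumeration, which helps make the objective concrete. This is an illustrative sketch, not the CPLEX-based method used in the experiments; `best_connected_subset` is a hypothetical name, and the empty set (utility 0) is assumed to be the fallback that makes the problem always feasible.

```python
from itertools import combinations


def best_connected_subset(vertices, edges, cost, utility, budget):
    """Enumerate all vertex subsets; exponential, so tiny graphs only."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    def connected(subset):
        # Depth-first search restricted to the subset.
        subset = set(subset)
        start = next(iter(subset))
        seen = {start}
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v] & subset:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen == subset

    best_utility, best_set = 0, frozenset()
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            if sum(cost[v] for v in subset) > budget:
                continue
            if not connected(subset):
                continue
            total = sum(utility[v] for v in subset)
            if total > best_utility:
                best_utility, best_set = total, frozenset(subset)
    return best_utility, best_set
```

With a very small budget few subsets are affordable, and with a very large one the whole graph is; the intermediate budgets are where the choice among subsets becomes nontrivial.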

Results: with terminals
Easy-hard-easy pattern, peaking at budget fraction ~ 0.13
Sharp phase transition near 0.13: from infeasible to feasible
Note: not in log scale

Results: feasibility vs. optimization
Split instances into feasible and infeasible; plot median runtime
  For feasible ones: computation involves proving optimality
  For infeasible ones: computation involves proving infeasibility
Infeasible instances take much longer than the feasible ones!

With 10 Terminals
The results are even more striking.
Median times:
  Hardest instances: 1,200 sec
  Hardest feasible instances: 200 sec
  Hardest infeasible instances: 30,000 sec (150x)

With 20 Terminals
The phenomenon is still clearly present.
Instances are a bit easier than for 10 terminals.
Median times:
  Hardest instances: 340 sec
  Hardest feasible instances: 60 sec
  Hardest infeasible instances: 7,000 sec (110x)

Other Observations
Peak for the pure optimality component without terminals (~0.2) is slightly to the right of the peak for the feasibility component (~0.13)
Easy-hard-easy pattern also w.r.t. number of terminals
  3 terminals: easy; 10: hard; 20: easy again
  Intuitively, more terminals are harder to connect but also leave fewer choices for other vertices to include
  Competing constraints lead to a hard intermediate region

Could Other Models / Solvers Significantly Change the Picture?
Perhaps, although some other natural options appear unlikely to.
Within CPLEX, first check for feasibility, then apply optimization
  Problem: checking feasibility of the cost constraint is equivalent to the metric Steiner tree problem; solvable in $O(n^{k+1})$, which grows quickly with #terminals. Also, unlikely to be Fixed Parameter Tractable (FPT) [cf. Promel, Steger ’02]
Constraint Programming (CP) model more promising for feasibility?
  Problem: appears promising only as a global constraint, but hard to filter efficiently (unlikely to be FPT). Also, a weighted sum is not easy to optimize with CP.

Summary
Combining feasibility and optimization components can result in intriguing typical-case properties
Connection subgraphs:
  NP-hard to approximate
  Clear easy-hard-easy patterns and phase transitions
  Feasibility testing can be much harder than optimization