Sharlee Climer Department of Computer Science and Engineering

Presentation transcript:

A Formalization of the Use of Bounds with Applications in Biology and Engineering Sharlee Climer Department of Computer Science and Engineering Department of Biology Washington University in St. Louis This research was funded in part by NDSEG and Olin Fellowships, and by NSF grants IIS-0196057, ITR/EIA-0113618, and IIS-0535257.

Overview: Introduction; Limit crossing; Cut-and-solve; TSP; Haplotyping. 11/27/2018 Washington University in St. Louis

The use of bounds: Bounds are used in a number of search strategies as well as in a large number of algorithms for particular problems. Many of these algorithms use bounds implicitly, and it is never stated that bounds have been used. [Figure: the optimal solution lies between a lower bound and an upper bound.]

Use of bounds: Bounds have been extensively studied in both computer science and operations research. They serve as pruning rules in branch-and-bound search. Previous efforts focused on relaxations, yet there is a vast number of ways in which bounds can be produced.

Formulation and notation: The techniques presented can be applied to a variety of optimization problems. We'll use integer linear programs (IPs) as the basic problem structure. Without loss of generality, we consider only minimization problems.

Integer Linear Programs: IPs model a great number of research and engineering problems. CS applications: Traveling Salesman Problem; Constraint Satisfaction Problem; robotic motion problems; clustering; multiple sequence alignment; haplotype inferencing; VLSI circuit design; computer disk read head scheduling; derivation of physical structures of programs; Delay-Tolerant Network routing; cellular radio network base station locations; minimum-energy multicast problem in wireless ad hoc networks. In addition to STRIPS-style problems, IPs have been used to model a number of additional AI problems such as… Speaker notes: defend the use of IPs for the model; remind the audience that the general ideas presented should be applicable in other domains.

Integer Linear Programs: Minimize $Z = \sum_i c_i x_i$ (objective function), subject to a set of linear constraints and $x_i$ integer. If the integrality constraints on $x_i$ are omitted, the result is a linear program (LP). Both minimize a linear expression. LPs are easy to solve; IPs usually are not. Speaker note: define objective function, decision variables, and solution space.

Linear program example: Minimize Z = -11x + 4y, subject to: 3x + 8y <= 40; 11x - 8y <= 16; x, y >= 0. Integrality is not required, so the problem is easily solved using the simplex method.

Linear program example: Minimize Z = -11x + 4y. Rearranged, y = (11/4)x + Z/4, which is a family of parallel lines with slope 11/4 and unknown y-intercept Z/4.

Linear program example: Optimal solution: x = 4, y = 7/2, Z = -30. When an optimal solution exists, one can always be found on a vertex or edge of the feasible region.

Integer linear program: Minimize Z = -11x + 4y, subject to: 3x + 8y <= 40; 11x - 8y <= 16; x, y >= 0; x, y integer. Optimal solution: x = 3, y = 3, Z = -21. Relaxing integrality yields a lower bound: the LP optimum of -30 cannot exceed the IP optimum of -21. The LP is easy to solve; the IP may be NP-hard.
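The relationship between the two example problems can be checked numerically. The following is a minimal sketch, not the presenter's code: it finds the LP optimum by enumerating vertices (adequate only for a small, bounded, two-variable problem) and the IP optimum by brute force. The function names and the search box size are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import combinations

# Constraints written as a*x + b*y <= c; non-negativity becomes -x <= 0, -y <= 0.
CONSTRAINTS = [
    (F(3), F(8), F(40)),    # 3x + 8y <= 40
    (F(11), F(-8), F(16)),  # 11x - 8y <= 16
    (F(-1), F(0), F(0)),    # x >= 0
    (F(0), F(-1), F(0)),    # y >= 0
]


def objective(x, y):
    return -11 * x + 4 * y


def feasible(x, y):
    return all(a * x + b * y <= c for a, b, c in CONSTRAINTS)


def lp_optimum():
    """Evaluate every feasible intersection of two constraint lines."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines never meet
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if feasible(x, y) and (best is None or objective(x, y) < best[0]):
            best = (objective(x, y), x, y)
    return best


def ip_optimum(box=20):
    """Brute-force all integer points in a box containing the feasible region."""
    return min((objective(x, y), x, y)
               for x in range(box) for y in range(box) if feasible(x, y))


lp_value, lp_x, lp_y = lp_optimum()
ip_value, ip_x, ip_y = ip_optimum()
print("LP relaxation:", lp_value, "at", (lp_x, lp_y))
print("IP optimum:   ", ip_value, "at", (ip_x, ip_y))
```

The LP value is a lower bound on the IP value because every integer-feasible point is also feasible for the relaxation.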

The Traveling Salesman Problem (TSP) is the problem of finding a minimum-cost complete tour of a set of cities. It is NP-hard.

[Figure: optimal solution for a 49-city TSP.]

The Traveling Salesman Problem, as an IP over the set of cities V = {1, …, n}:
Minimize $Z = \sum_{i} \sum_{j} c_{ij} x_{ij}$
subject to:
$\sum_{i} x_{ij} = 1$ for j = 1, …, n
$\sum_{j} x_{ij} = 1$ for i = 1, …, n
$\sum_{i \in W} \sum_{j \in W} x_{ij} \le |W| - 1$ for all proper non-empty subsets W of V
$x_{ij} \in \{0, 1\}$

Branch-and-bound search: Branching rules determine the structure of the search tree. Relaxations (a lower-bounding modification). Pruning. Heuristics to guide the search.
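The ingredients listed above can be sketched for a small TSP. This is an illustrative depth-first branch-and-bound, not the solver used in the talk: the lower bound (sum of the cheapest arc leaving every city that still needs a successor) is an assumption chosen for brevity, as are the function name and the 4-city cost matrix.

```python
import math

# Hypothetical asymmetric cost matrix; EXAMPLE_COST[i][j] is the cost of arc i -> j.
EXAMPLE_COST = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]


def tsp_branch_and_bound(cost):
    n = len(cost)
    # Cheapest arc leaving each city. Summing these over the cities that still
    # need an outgoing arc never overestimates the cost of completing a tour.
    min_out = [min(cost[i][j] for j in range(n) if j != i) for i in range(n)]
    best = {"cost": math.inf, "tour": None}

    def search(tour, visited, cost_so_far):
        current = tour[-1]
        if len(tour) == n:
            total = cost_so_far + cost[current][0]  # close the tour
            if total < best["cost"]:
                best["cost"], best["tour"] = total, tour[:]  # new incumbent
            return
        unvisited = [c for c in range(n) if c not in visited]
        bound = cost_so_far + min_out[current] + sum(min_out[c] for c in unvisited)
        if bound >= best["cost"]:
            return  # pruning: this subtree cannot beat the incumbent
        # Heuristic to guide the search: try the cheapest next arc first.
        for nxt in sorted(unvisited, key=lambda c: cost[current][c]):
            visited.add(nxt)
            tour.append(nxt)
            search(tour, visited, cost_so_far + cost[current][nxt])
            tour.pop()
            visited.remove(nxt)

    search([0], {0}, 0)
    return best["cost"], best["tour"]


print(tsp_branch_and_bound(EXAMPLE_COST))
```

Branching here fixes the next city on the tour; a tighter relaxation, such as the assignment problem, would prune more aggressively at a higher cost per node.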

TSP: omit subtour elimination constraints. Dropping the constraints $\sum_{i \in W} \sum_{j \in W} x_{ij} \le |W| - 1$ leaves:
Minimize $Z = \sum_{i} \sum_{j} c_{ij} x_{ij}$
subject to:
$\sum_{i} x_{ij} = 1$ for j = 1, …, n
$\sum_{j} x_{ij} = 1$ for i = 1, …, n
$x_{ij} \in \{0, 1\}$
This is the assignment problem, which can be solved in polynomial time.
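To see why the assignment problem is a relaxation, one can compare the two optima on a tiny instance. The sketch below enumerates successor permutations by brute force rather than using a polynomial-time assignment algorithm; the cost matrix and function names are made up for illustration.

```python
from itertools import permutations

# Hypothetical symmetric instance in which cities 0,1 and cities 2,3 are close pairs.
EXAMPLE_COST = [
    [0, 1, 8, 9],
    [1, 0, 9, 8],
    [8, 9, 0, 1],
    [9, 8, 1, 0],
]


def cycles(successor):
    """Split a successor map (city i -> successor[i]) into its cycles."""
    seen, result = set(), []
    for start in range(len(successor)):
        if start in seen:
            continue
        cycle, city = [], start
        while city not in seen:
            seen.add(city)
            cycle.append(city)
            city = successor[city]
        result.append(cycle)
    return result


def assignment_and_tsp(cost):
    """Return (best assignment, best tour), each as (value, list of cycles)."""
    n = len(cost)
    best_ap = best_tsp = None
    for succ in permutations(range(n)):
        if any(succ[i] == i for i in range(n)):
            continue  # a city may not be its own successor
        value = sum(cost[i][succ[i]] for i in range(n))
        parts = cycles(succ)
        if best_ap is None or value < best_ap[0]:
            best_ap = (value, parts)
        # Subtour elimination: a tour is an assignment with exactly one cycle.
        if len(parts) == 1 and (best_tsp is None or value < best_tsp[0]):
            best_tsp = (value, parts)
    return best_ap, best_tsp


ap, tsp = assignment_and_tsp(EXAMPLE_COST)
print("assignment problem:", ap)
print("TSP:               ", tsp)
```

On this instance the assignment solution splits into two subtours, so its value is strictly below the optimal tour cost.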

[Figure: TSP with subtour elimination constraints omitted.]

TSP: relax integrality constraints. Replacing $x_{ij} \in \{0, 1\}$ with $0 \le x_{ij} \le 1$ gives:
Minimize $Z = \sum_{i} \sum_{j} c_{ij} x_{ij}$
subject to:
$\sum_{i} x_{ij} = 1$ for j = 1, …, n
$\sum_{j} x_{ij} = 1$ for i = 1, …, n
$\sum_{i \in W} \sum_{j \in W} x_{ij} \le |W| - 1$ for all proper non-empty subsets W of V
$0 \le x_{ij} \le 1$
This is the linear program (LP) relaxation, which can be solved in polynomial time.

[Figure: TSP with integrality constraints relaxed.]

Branch-and-bound search: incumbent solution

Limit crossing: A 2-step procedure for exploring the use of bounds. It has been implicitly used in a number of algorithms and search strategies but, to our knowledge, hasn't been formalized. It broadens the focus beyond traditional search.

Limit crossing, 2 steps:
(1) Find a simple upper or lower bound.
(2) Combine upper-bounding and lower-bounding modifications and solve.
If the solution of the doubly-modified problem exceeds the simple upper bound, the upper-bounding modification in step (2) is invalid.
If the solution of the doubly-modified problem is less than the simple lower bound, the lower-bounding modification in step (2) is invalid.
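The two steps can be illustrated on a small TSP. In this sketch, which is an assumed example rather than the presenter's implementation, the simple upper bound is the cost of a known tour, the lower-bounding modification is the assignment relaxation, and the upper-bounding modification forces one arc into the solution. If the doubly-modified problem costs more than the upper bound, forcing that arc is invalid, so the arc cannot belong to an optimal tour. The cost matrix and function names are hypothetical, and the assignment problem is solved by brute force for brevity.

```python
from itertools import permutations

# Hypothetical symmetric instance; the tour 0 -> 1 -> 3 -> 2 -> 0 costs 18.
EXAMPLE_COST = [
    [0, 1, 8, 9],
    [1, 0, 9, 8],
    [8, 9, 0, 1],
    [9, 8, 1, 0],
]


def tour_cost(cost, tour):
    """Step 1: any complete tour gives a simple upper bound."""
    return sum(cost[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def assignment_value(cost, forced_arc):
    """Step 2: assignment relaxation with one arc forced into the solution."""
    n = len(cost)
    i, j = forced_arc
    values = [
        sum(cost[k][succ[k]] for k in range(n))
        for succ in permutations(range(n))
        if succ[i] == j and all(succ[k] != k for k in range(n))
    ]
    return min(values) if values else None


def arcs_ruled_out(cost, upper_bound):
    """Arcs whose doubly-modified problem crosses the upper bound."""
    n = len(cost)
    ruled_out = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            value = assignment_value(cost, (i, j))
            if value is None or value > upper_bound:
                ruled_out.append((i, j))
    return ruled_out


upper = tour_cost(EXAMPLE_COST, [0, 1, 3, 2])
print("upper bound:", upper)
print("arcs that cannot be in an optimal tour:", arcs_ruled_out(EXAMPLE_COST, upper))
```

Neither the TSP with a forced arc nor the plain TSP is easy, but the combination of the forced arc with the relaxation is, which is the point of the doubly-modified problem.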

Limit crossing: Find a simple upper or lower bound that is tight. Systematically apply modifications to produce doubly-modified problems. Either modification alone can be difficult to solve; only the combination of the two needs to be relatively easy. We do not limit ourselves to setting variable values as the upper-bounding modification of the doubly-modified problem.

Modifications to obtain bounds: Many possibilities for obtaining bounds have previously been overlooked. Examine every aspect of the problem description. Modifications of IPs that produce bounds: relaxing or tightening constraints; modifying the objective function; adding or deleting decision variables. Speaker note: use a simple example problem to demonstrate.

Limit crossing strategies: Cut-and-solve [Climer and Zhang, Artificial Intelligence, to appear], an iterative search strategy useful for general combinatorial optimization problems. Backbone and fat identifier [Climer and Zhang, AAAI-02], used to identify characteristic variables.

Cut-and-solve: For each iteration: Step 1: a chunk of the solution space is cut away and solved. Step 2: a relaxed solution is found for the remaining solution space. Iterate until the relaxed solution is greater than or equal to the incumbent; the incumbent is then guaranteed to be optimal.

Example: x >= 0; y <= 3; y + (13/6)x <= 9; y - (5/13)x >= 1/14; y + (3/5)x >= 6/5; x, y integers.

Optimal solution: Minimize Z = y - (4/5)x. x = 2, y = 1, Z = -0.6.

Iteration 1, first step: Cut away a chunk of the solution space, y - (17/3)x <= -14, and solve the sparse problem.

Iteration 1, first step: x = 3, y = 2, Z = -0.4. The incumbent solution is -0.4.

Iteration 1, second step: Add the new constraint y - (17/3)x >= -14 to cut off the solved chunk of the solution space. Relax integrality and solve.

Iteration 1, second step: x = 2.6, y = 1.0, Z = -1.1. The relaxed value is below the incumbent of -0.4, so another iteration is needed.

Iteration 2, first step: Cut away a chunk of the solution space and solve the sparse problem.

Iteration 2, first step: x = 2, y = 1, Z = -0.6. This solution is less than the incumbent, so the incumbent becomes -0.6.

Iteration 2, second step: Add a constraint to cut off the solved chunk. Relax integrality and solve.

Iteration 2, second step: x = 1.0, y = 0.6, Z = -0.2. Incumbent value: Z = -0.6. The relaxed solution is greater than the incumbent, so the incumbent must be optimal.
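The iterations of the example can be reproduced with a short script. This is a simplified sketch under stated assumptions, not the published cut-and-solve implementation: the chunks are cut with vertical lines (x >= 3 first, then x >= 2) instead of the slanted cut used on the slides, the sparse problems are solved by enumerating integer points, and the relaxations are solved by vertex enumeration, which only works for small bounded two-variable problems. All function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

# Example constraints, each written as a*x + b*y <= c.
BASE = [
    (F(-1), F(0), F(0)),           # x >= 0
    (F(0), F(1), F(3)),            # y <= 3
    (F(13, 6), F(1), F(9)),        # y + 13/6 x <= 9
    (F(5, 13), F(-1), F(-1, 14)),  # y - 5/13 x >= 1/14
    (F(-3, 5), F(-1), F(-6, 5)),   # y + 3/5 x >= 6/5
]


def objective(x, y):
    return y - F(4, 5) * x


def feasible(cons, x, y):
    return all(a * x + b * y <= c for a, b, c in cons)


def solve_relaxed(cons):
    """LP relaxation by vertex enumeration; returns None if infeasible."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if feasible(cons, x, y) and (best is None or objective(x, y) < best[0]):
            best = (objective(x, y), x, y)
    return best


def solve_sparse(cons, box=10):
    """Exact solution of a small chunk by enumerating its integer points."""
    points = [(objective(x, y), x, y)
              for x in range(box) for y in range(box) if feasible(cons, x, y)]
    return min(points) if points else None


def cut_and_solve(first_cut=3):
    remaining, incumbent, cut = list(BASE), None, first_cut
    while True:
        # Step 1: cut away the chunk x >= cut and solve it exactly.
        sparse = solve_sparse(remaining + [(F(-1), F(0), F(-cut))])
        if sparse is not None and (incumbent is None or sparse[0] < incumbent[0]):
            incumbent = sparse
        # Step 2: remove the chunk (x is integer, so x <= cut - 1 remains),
        # relax integrality, and solve the remaining problem.
        remaining.append((F(1), F(0), F(cut - 1)))
        relaxed = solve_relaxed(remaining)
        if relaxed is None or (incumbent is not None and relaxed[0] >= incumbent[0]):
            return incumbent  # nothing left can beat the incumbent
        cut -= 1


print(cut_and_solve())
```

Only the accumulated cuts and the incumbent are carried between iterations. Because the cuts differ from those on the slides, the intermediate relaxed values need not match the slide values, although the final incumbent does.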

Cut-and-solve properties: Nominal memory requirements: only the new constraints and the incumbent solution are kept from one iteration to the next. No subtrees in which to get lost. Can be used as a complete anytime solver. Can use parallel processing.

Cut-and-solve: Same as the two steps of limit crossing. A small chunk is solved to provide a simple upper bound. Doubly-modified problem: piercing cuts; relaxation. Unusual upper-bounding modification.

Cut-and-solve: We used a generic algorithm for the TSP [Artificial Intelligence, to appear] on 7 real-world problem classes [Cirasella, Johnson, McGeoch, Zhang, Lecture Notes in Computer Science, 2000], with 500 instances solved for each class and size. Comparisons with: CDT [Carpaneto, Dell’Amico, and Toth, ACM Trans. on Math. Software, 1995]; Concorde [Applegate et al., www.tsp.gatech.edu]; Cplex [ILOG, www.ilog.com]. STSPs are hard if very large, and our code is not designed for very large problems (arc lengths are computed on the fly). It is a simple implementation, yet it outperforms state-of-the-art solvers on difficult instances.

[Figures, one per problem class: shortest common superstring; tilted drilling machine (additive norm); tilted drilling machine (sup norm); stacker crane; computer disk read head; pay phone coin collection; no-wait flow shop.]

[Figure: largest problem size solved by each method.]

Moving beyond traditional tree search: cut-and-solve; backbone and fat identifier.

Haplotype inferencing: What are haplotypes? Why should we care about them? How can we infer haplotypes?

Haplotype inferencing:
…TGGCACTTCCGAACTTTG…
…TGGTACTTCCGAACATTG…
…TGGCACTGCCGAACATTG…
…TGGCACTGCCGAACTTTG…

Haplotype inferencing, keeping only the sites that vary:
…C T T…
…T T A…
…C G A…
…C G T…

Haplotype inferencing, with each haplotype followed by its binary encoding:
…C T T…   …0 0 1…
…T T A…   …1 0 0…
…C G A…   …0 1 0…
…C G T…   …0 1 1…

Haplotype inferencing, with pairs of haplotypes combined into genotypes (a 2 marks a site where the two haplotypes differ):
…0 0 1… and …1 0 0… give genotype …2 0 2…
…0 1 0… and …0 1 1… give genotype …0 1 2…
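The encoding shown on these slides can be expressed in a few lines. This is a minimal sketch that assumes haplotypes are given as strings of 0 and 1; the function name is hypothetical.

```python
def genotype(h1, h2):
    """Combine two binary haplotypes into a genotype string.

    A site where both haplotypes agree keeps that value (0 or 1);
    a site where they differ is heterozygous and is written as 2.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of sites")
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


print(genotype("001", "100"))
print(genotype("010", "011"))
```

The genotype discards which haplotype carried which value at a heterozygous site, which is the information that haplotype inference tries to recover.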

Haplotype inferencing: If a site on a genotype is the product of two different nucleotides, it is heterozygous; otherwise it is homozygous. There are $2^{k-1}$ feasible resolutions for a genotype with k heterozygous sites.

Haplotype inferencing example:
g1: 2 1 0 1 2
g2: 1 1 0 2 1
g3: 0 1 1 2 0
g4: 1 2 0 0 1
g5: 2 2 1 0 2
g6: 0 1 1 0 0
g7: 1 1 1 0 2
g8: 0 2 2 1 1

Feasible resolutions of each genotype:
g1: 2 1 0 1 2 resolves to 01010 , 11011 or 01011 , 11010
g2: 1 1 0 2 1 resolves to 11011 , 11001
g3: 0 1 1 2 0 resolves to 01110 , 01100
g4: 1 2 0 0 1 resolves to 11001 , 10001
g5: 2 2 1 0 2 resolves to 01100 , 10101 or 01101 , 10100 or 00100 , 11101 or 00101 , 11100
g6: 0 1 1 0 0 resolves to 01100 , 01100
g7: 1 1 1 0 2 resolves to 11100 , 11101
g8: 0 2 2 1 1 resolves to 01011 , 00111 or 01111 , 00011
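The resolutions of a genotype can be enumerated mechanically. The sketch below is one possible way to do it, assuming genotypes are strings over 0, 1, and 2; fixing the first heterozygous site of the first haplotype to 0 ensures that each unordered pair is produced once, which gives 2^(k-1) pairs for k heterozygous sites. The function name is hypothetical.

```python
from itertools import product


def resolutions(genotype):
    """Return every unordered haplotype pair that explains the genotype."""
    het = [i for i, site in enumerate(genotype) if site == "2"]
    pairs = []
    # The first heterozygous site of h1 is fixed to "0"; the rest vary freely.
    for bits in product("01", repeat=max(len(het) - 1, 0)):
        h1, h2 = list(genotype), list(genotype)
        for site, bit in zip(het, ("0",) + bits):
            h1[site] = bit
            h2[site] = "1" if bit == "0" else "0"
        pairs.append(("".join(h1), "".join(h2)))
    return pairs


for name, g in [("g1", "21012"), ("g5", "22102"), ("g6", "01100")]:
    print(name, g, resolutions(g))
```

A genotype with no heterozygous sites has a single resolution consisting of two copies of the same haplotype.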


Why do we care? Genetic association studies use haplotypes to identify relationships between genes and diseases. The International HapMap Consortium identifies genotypes and uses PHASE [Stephens and Donnelly, Am. J. of Hum. Gen., 2003] to obtain “haplotypes of extremely high quality” [The International HapMap Consortium, Nature, 2005].

How can we infer haplotypes? Consider genotypes from a population. Different objectives have been proposed: pure parsimony; PHASE.

Pure parsimony: Find the minimum number of haplotypes that will resolve the set of genotypes. Exponential time (worst case). Gusfield cast the problem as an IP [CPM 2003] and solved some instances with 30 sites and 50 individuals. Doesn’t consider similarities of haplotypes.

12 parsimonious solutions: 11 haplotypes
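For an instance as small as the eight-genotype example, the pure parsimony objective can be evaluated by exhaustive search. The sketch below tries every combination of resolutions and keeps a smallest set of distinct haplotypes; it is an illustration of the objective only, not Gusfield's IP formulation, and it would not scale beyond toy inputs. The function names and string encoding are assumptions.

```python
from itertools import product

# The eight example genotypes g1..g8, written as strings.
GENOTYPES = ["21012", "11021", "01120", "12001",
             "22102", "01100", "11102", "02211"]


def resolutions(genotype):
    """Return every unordered haplotype pair that explains the genotype."""
    het = [i for i, site in enumerate(genotype) if site == "2"]
    pairs = []
    for bits in product("01", repeat=max(len(het) - 1, 0)):
        h1, h2 = list(genotype), list(genotype)
        for site, bit in zip(het, ("0",) + bits):
            h1[site] = bit
            h2[site] = "1" if bit == "0" else "0"
        pairs.append(("".join(h1), "".join(h2)))
    return pairs


def pure_parsimony(genotypes):
    """Exhaustively find a smallest haplotype set resolving all genotypes."""
    best = None
    for choice in product(*(resolutions(g) for g in genotypes)):
        haplotypes = {h for pair in choice for h in pair}
        if best is None or len(haplotypes) < len(best):
            best = haplotypes
    return best


solution = pure_parsimony(GENOTYPES)
print(len(solution), sorted(solution))
```

The number of combinations is the product of the per-genotype resolution counts, which grows exponentially with the number of heterozygous sites.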

PHASE: Weights are used to select haplotype pairs that have one haplotype already in the set. Weights are also given to haplotypes that are “similar” to those in the set. Divide-and-conquer.

PHASE solution: 11 haplotypes

PHASE solution: $\sum d_{ij} = 7$

Haplotype inferencing: Recent study by Andres, Clark, Hixson, Boerwinkle, and Sing of computational methods including PHASE: poor performance; degree of uncertainty; “highly error prone”.

Three challenges: find a biologically meaningful model; space complexity; time complexity.

Haplotype inferencing: PHASE favors reduced cardinality and favors increased similarities. Our method favors reduced cardinality and increased similarities, takes a combinatorial approach, and uses a single parameter d.

11 haplotypes

$\sum d_{ij} = 6$

Summary: Limit crossing: a 2-step procedure for using bounds; explore every facet of the model. Cut-and-solve: a generic algorithm for IPs. TSP: outperformed other solvers for 5 out of 7 problem classes. Haplotyping.

Future work: Haplotyping: customized limit crossing approach; accommodate multi-allelic data; automatically reduce trio data; accept phased data. Genome-wide association testing. Combinatorial approaches to biological problems.