Solving the Protein Threading Problem in Parallel. Paper by Nicola Yanev and Rumen Andonov. Presented by Indrajit Bhattacharya, CMSC 838T Presentation.



Motivation
- Problem the paper is trying to solve
  - 3D structure prediction using threading
  - Is a given target sequence likely to fold to a 3D template core?
  - Find the alignment that minimizes some score function
  - NP-complete: no polynomial-time exact algorithm is known
  - MAX-SNP-hard: no polynomial-time approximation scheme unless P = NP
- Why do we care
  - 3D structure determines the biological function of a protein
  - Amino acid sequence (almost) uniquely determines 3D structure
  - Threading is usually less accurate than comparative modeling but easier to solve

Talk Overview
- Overview of talk
  - Motivation
  - Techniques
  - Evaluation
  - Related work
  - Observations

Techniques
- Approach
  - Reduce the problem to some known theoretical problem of interest
    - In this case, network flow
  - Use existing tools for solving the theoretical problem efficiently
    - CPLEX
  - Explore possibilities for parallelizing the problem
  - Investigate the intrinsic hardness for real biological examples

Mathematical Formulation
Figure panels: two structurally similar proteins; spatial adjacencies (interactions); a possible threading with a sequence; the objective function.
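To make the objective concrete, here is a minimal brute-force sketch in Python. It assumes a threading is a nondecreasing tuple of positions, one per segment, and that the score is a sum of local terms plus pairwise interaction terms. The names `local`, `pair`, `score` and `best_threading` are hypothetical and do not come from the paper; the enumeration is only practical for tiny instances.

```python
from itertools import combinations_with_replacement


def score(threading, local, pair):
    """Score of one threading (assumed form: local terms plus pairwise terms).

    threading[i] is the position given to segment i.
    local[i][t] is the cost of placing segment i at position t.
    pair[(i, j)][t][u] is the interaction cost when segment i sits at
    position t and segment j sits at position u.
    """
    total = sum(local[i][t] for i, t in enumerate(threading))
    for (i, j), table in pair.items():
        total += table[threading[i]][threading[j]]
    return total


def best_threading(m, n, local, pair):
    """Try every nondecreasing assignment of m segments to n positions."""
    # combinations_with_replacement yields exactly the nondecreasing tuples.
    candidates = combinations_with_replacement(range(n), m)
    return min(candidates, key=lambda th: score(th, local, pair))
```

The number of candidates grows combinatorially with the number of segments and positions, which is why the talk turns to a network flow formulation and an LP/IP solver.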

Reduction to Network Flow: An Example

Reduction to Network Flow: Variables and Constraints
- Standard network flow
  - Variable x_{i,t} for each segment-to-position assignment
  - Restricted to [0, 1]
  - With standard flow conservation constraints
- Additional cost for non-local interactions
  - Variable z_{i,t,i',t'} for each non-local interaction
  - Restricted to {0, 1}
  - Constrained to sum to 1 for each non-local pair (i, i')
  - Upper bounded by the flow entering (i, t) and leaving (i', t')
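If the non-local z variables are dropped, a unit flow through the layered graph is just a source-to-sink path, so the remaining problem is a shortest path that a simple dynamic program solves. The Python sketch below illustrates that special case only; it assumes nodes (i, t) with arcs from (i, t) to (i+1, t') for t' >= t, and the function name and the `local` cost table are hypothetical.

```python
def shortest_path_threading(local):
    """Cheapest nondecreasing threading when only local costs are present.

    local[i][t] is the cost of node (segment i, position t).
    Returns (cost, positions), where positions[i] is the position of segment i.
    """
    m, n = len(local), len(local[0])
    best = list(local[0])   # best[t]: cheapest path ending at node (0, t)
    back = []               # back[i-1][t]: position of segment i-1 on that path
    for i in range(1, m):
        # Prefix minima over positions <= t give the cheapest predecessor.
        pref_cost, pref_arg = [], []
        cur, arg = float("inf"), -1
        for t in range(n):
            if best[t] < cur:
                cur, arg = best[t], t
            pref_cost.append(cur)
            pref_arg.append(arg)
        back.append(pref_arg)
        best = [pref_cost[t] + local[i][t] for t in range(n)]
    # Pick the cheapest end node and walk the predecessors backwards.
    t = min(range(n), key=best.__getitem__)
    cost, path = best[t], [t]
    for pref_arg in reversed(back):
        t = pref_arg[t]
        path.append(t)
    path.reverse()
    return cost, path
```

The non-local interaction costs are what break this structure: they couple nodes in non-adjacent layers, which is where the binary z variables and the integer program come in.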

Drawbacks of Approach
- Integer programming is hard to solve!
  - Relax to a linear program with variables in [0, 1]
  - Approximate an integer solution using standard heuristics
  - Existing tools like CPLEX
- Huge number of variables
  - Even for 36 segments and 81 positions, the IP problem has a very large number of rows, columns and non-zeros!
  - Need to reduce the number of variables and constraints
  - Calls for parallelization if possible

Parallel Solution
- Utilize special flow constraints
  - Split into sub-problems that can be solved in parallel
  - Split the k-th layer of the graph into r intervals
  - Force the path for a sub-problem to pass through a particular interval in the layer
  - Pass the best bound on the objective function found so far as a parameter to each sub-problem
  - A sub-task aborts when its dual objective function exceeds the current best bound
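The splitting idea can be sketched in Python on a toy scale. This is an illustration under stated assumptions, not the paper's CPLEX-based implementation: each sub-problem is solved by brute force instead of an LP solver, bound passing is left out, and `split_layer`, `solve_subproblem` and `solve_split` are hypothetical names. Threads in CPython also do not give real CPU parallelism for pure-Python work; the pool only shows how the sub-problems are independent.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations_with_replacement


def score(threading, local, pair):
    """Local costs plus pairwise interaction costs (assumed score form)."""
    total = sum(local[i][t] for i, t in enumerate(threading))
    for (i, j), table in pair.items():
        total += table[threading[i]][threading[j]]
    return total


def split_layer(n, r):
    """Split positions 0..n-1 into at most r contiguous intervals [lo, hi)."""
    bounds = [j * n // r for j in range(r + 1)]
    return [(bounds[j], bounds[j + 1]) for j in range(r)
            if bounds[j] < bounds[j + 1]]


def solve_subproblem(m, n, local, pair, k, interval):
    """Best threading whose k-th segment falls inside the given interval."""
    lo, hi = interval
    best = (float("inf"), None)
    for th in combinations_with_replacement(range(n), m):
        if lo <= th[k] < hi:
            s = score(th, local, pair)
            if s < best[0]:
                best = (s, th)
    return best


def solve_split(m, n, local, pair, k, r):
    """Solve all interval sub-problems independently and keep the best."""
    intervals = split_layer(n, r)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda iv: solve_subproblem(m, n, local, pair, k, iv), intervals))
    return min(results, key=lambda res: res[0])
```

Because every path crosses layer k in exactly one interval, the intervals partition the search space, so the minimum over the sub-problems equals the global optimum.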

Improving Parallel Solution
- Drawback: the hardest sub-problem dominates!
  - The parallel strategy was found to be slower than the sequential one!
  - Sub-problems can potentially become harder to solve
  - Many more difficult sub-problems than easy ones
- Solution:
  - Break the atomicity of the tasks
  - Each sub-task periodically checks the current best bound and updates its cut-off
  - Extra overhead is still small compared to task granularity
  - Now the easiest executing sub-task dominates!
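A minimal Python sketch of the periodic bound check follows. It is an assumption-laden toy: each sub-task runs a depth-first branch and bound with a simple combinatorial lower bound, standing in for the LP solver and dual objective used in the paper. `SharedBound`, `run_subtask`, `solve_with_shared_bound` and the `check_every` parameter are all hypothetical names, and pair keys (a, b) are assumed to satisfy a < b.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedBound:
    """Best objective value found so far, visible to every sub-task."""

    def __init__(self):
        self.value = float("inf")
        self.solution = None
        self._lock = threading.Lock()

    def offer(self, value, solution):
        with self._lock:
            if value < self.value:
                self.value, self.solution = value, solution


def run_subtask(local, pair, k, interval, shared, check_every=64):
    """Search threadings whose k-th segment lies in interval [lo, hi)."""
    m, n = len(local), len(local[0])
    lo, hi = interval
    # rest[i]: lower bound on the cost still to be paid from segment i onwards.
    rest = [0.0] * (m + 1)
    for i in range(m - 1, -1, -1):
        rest[i] = rest[i + 1] + min(local[i])
        rest[i] += sum(min(map(min, tbl))
                       for (a, b), tbl in pair.items() if b == i)
    cutoff, nodes, assigned = shared.value, 0, []

    def visit(i, start, cost):
        nonlocal cutoff, nodes
        nodes += 1
        if nodes % check_every == 0:
            # Periodically refresh the cut-off from the other sub-tasks.
            cutoff = min(cutoff, shared.value)
        if cost + rest[i] >= cutoff:
            return  # cannot beat the best known solution
        if i == m:
            cutoff = cost
            shared.offer(cost, tuple(assigned))
            return
        first, last = (max(start, lo), hi) if i == k else (start, n)
        for t in range(first, last):
            step = local[i][t] + sum(tbl[assigned[a]][t]
                                     for (a, b), tbl in pair.items() if b == i)
            assigned.append(t)
            visit(i + 1, t, cost + step)
            assigned.pop()

    visit(0, 0, 0.0)
    return nodes


def solve_with_shared_bound(local, pair, k, intervals):
    shared = SharedBound()
    with ThreadPoolExecutor(max_workers=len(intervals)) as pool:
        futures = [pool.submit(run_subtask, local, pair, k, iv, shared)
                   for iv in intervals]
        for future in futures:
            future.result()
    return shared.value, shared.solution
```

The design point mirrors the slide: once an easy sub-task publishes a good value, the harder sub-tasks pick it up at their next check and prune most of their search instead of running to completion with a stale cut-off.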

Evaluation
- Experimental environment
  - Real protein sequences
  - ILOG CPLEX Callable Library
  - Sun UltraSPARC II, 450 MHz
  - Objective function coefficients generated from FROST
  - Maximum of 7 processors and 29 sub-problems
- Evaluation results
  - Sequential version much faster than previous branch-and-bound results for the same problem formulation
  - Time taken comparable to PROSPECT
  - Splitting and parallelization significantly improve turnaround
  - Really tiny gap between relaxed LP and ILP solutions
  - Mostly integer solutions even for the relaxed LP!

Result Tables: Comparison with branch and bound algorithm
Comment: Self-threading results in significantly lower scores (as it should).

Result Tables: Gap between relaxed LP and ILP
Comment: Tiny relaxation gap. (Significance?)

Result Tables: Size of the LP formulation
Comment: LP problem size is still too large.

Result Tables: Performance with parallel sub-tasks
Comment: Longer times with more sub-problems??

Related Work
- Similar / previous approaches
  - Lathrop and Smith, 1998
    - Uses the same cost function
    - Branch and bound algorithm for searching the space of threadings
  - Xu, Xu and Uberbacher, 1998
    - Divide and conquer algorithm
  - Xu, Li, Lin, Kim and Xu, 2003
    - Linear programming formulation
    - Solved using a branch and bound algorithm
- None of the above suggests a parallelization scheme

Observations
- Points of interest
  - Mapping to a known problem of interest
  - Nicely utilizes particular constraints to break into independent subtasks
  - Threading of real amino acid sequences seems possible
  - Raises interesting questions about real-life protein threading being in P
  - Solver tailored for this particular problem may yield better results

Observations
- Criticism
  - Not enough experiments with a large number of subtasks and processors to show scaling
  - Prohibitively large number of variables and constraints
  - How accurate are the objective function coefficients?
  - What is the resolution of the objective function?
  - Threading onto multiple sequences for prediction still looks daunting
  - Not clear how to extend the idea to 3-way and more complex interactions
- Improvements
  - Seems possible to break up the sub-tasks recursively

Thank you!