May 25, 2004. 2004 GSU Biotech Symposium. Minimum PCR Primer Set Selection with Amplification Length and Uniqueness Constraints. Ion Mandoiu University of.

Presentation transcript:

May 25, GSU Biotech Symposium
Minimum PCR Primer Set Selection with Amplification Length and Uniqueness Constraints
Ion Mandoiu
University of Connecticut CS&E Department

Combinatorial Optimization Applications in Bioinformatics
Fast-growing number of applications
– Dynamic Programming & Integer Programming in sequence alignment
– TSP and Euler paths in DNA sequencing
– Integer Programming in haplotype inference
– Integer Programming & approximation algorithms for efficient pathogen identification (string barcoding)
– …

High-Throughput Assay Design
New source of combinatorial problems
– Microarray probe selection
– Mask design for Affy arrays
– Universal tag arrays
– Self-assembling microarrays
– Quality control
– …
– This talk: Multiplex PCR primer set selection
Optimization goals
– Improved speed
– High reliability
– Reduced COST

Outline
- Motivation and problem formulations
- Greedy algorithm for primer set selection with amplification length constraints
- LP-rounding algorithm for primer set selection with uniqueness constraints
- Experimental results
- Conclusions

Uniplex PCR …

Primer Pair Selection Problem
[Figure labels: forward primer, reverse primer, amplification locus, 5' and 3' strand ends, distance L]
Given:
- Genomic sequence around amplification locus
- Primer length k
- Amplification length upper bound L
Find: Forward and reverse primers of length k that hybridize within a distance of L of each other and optimize amplification efficiency (melting temperatures, secondary structure, cross hybridization, etc.)

Motivation for Primer Set Selection (1)
Spotted microarray synthesis [Fernandes and Skiena’02]
– Need unique pair for each amplification product, but primers can be reused to minimize cost
– Potential to reduce #primers from O(n) to O(n^(1/2)) for n products

Motivation for Primer Set Selection (2)
SNP Genotyping
– Thousands of SNPs that must be genotyped using hybridization-based methods (e.g., SBE)
– Selective PCR amplification needed to improve accuracy of detection steps (whole-genome amplification not appropriate)
– No need for unique amplification!
– Primer minimization is critical
  - Fewer primers to buy
  - Fewer multiplex PCR reactions

Primer Set Selection Problem
Given:
- Genomic sequences around each amplification locus
- Primer length k
- Amplification length upper bound L
Find: Minimum size set of primers S of length k such that, for each amplification locus, there are two primers in S hybridizing to the forward and reverse sequences within a distance of L of each other
For some applications: S should contain a unique pair of primers amplifying each locus
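
The feasibility condition in this problem statement can be sketched in code. The following is a minimal illustration, not the authors' implementation, and it rests on several assumptions that are not in the slides: hybridization is approximated by exact substring match, each locus is given as a (forward, reverse) pair of strings oriented so that index 0 is the base next to the locus, a primer's distance is its offset plus k, and the amplification length is the sum of the two distances. The names `primer_distances` and `is_feasible` are hypothetical.

```python
def primer_distances(seq, primer):
    """Distances (offset + primer length) at which primer occurs in seq.

    Assumption: seq is oriented so that index 0 is adjacent to the locus,
    and exact matching stands in for hybridization.
    """
    k = len(primer)
    return [i + k for i in range(len(seq) - k + 1) if seq[i:i + k] == primer]


def is_feasible(primers, loci, L):
    """True if every locus has a forward and a reverse hit within total length L."""
    for fwd, rev in loci:
        # The closest hit on each side gives the shortest possible amplicon.
        best_f = min((d for p in primers for d in primer_distances(fwd, p)), default=None)
        best_r = min((d for p in primers for d in primer_distances(rev, p)), default=None)
        if best_f is None or best_r is None or best_f + best_r > L:
            return False
    return True


loci = [("ACGTAC", "TTGACA")]
print(is_feasible(["ACG", "TTG"], loci, L=6))  # both primers sit at distance 3
print(is_feasible(["TAC", "ACA"], loci, L=6))  # both primers sit at distance 6
```

Checking only the closest hit on each side is enough here because, under the additive length model assumed above, a feasible pair exists exactly when the two minimum distances sum to at most L.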

Previous Work (1)
[Pearson et al. 96][Linhart&Shamir’02][Souvenir et al.’03]
- Separately select forward and reverse primers
- To enforce bound of L on amplification length, select only primers that are within a distance of L/2 of the target SNP
  - Ignores half of the feasible primer pairs
  - Solution can increase by a factor of O(n) by ignoring them!
Greedy set cover algorithm gives O(ln n) approximation factor for this formulation
Cannot approximate better unless P=NP

Previous Work (2)
[Fernandes&Skiena’02] model primer selection as a minimum multicolored subgraph problem:
- Vertices of the graph correspond to candidate primers
- There is an edge colored by color i between primers u and v if they hybridize to i-th forward and reverse sequences within a distance of L
- Goal is to find minimum size set of vertices inducing edges of all colors
No non-trivial approximation factor known previously
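
To make the minimum multicolored subgraph objective concrete, here is a brute-force sketch for tiny instances. It is only an illustration of the problem definition, not an algorithm from the talk; the function name `min_multicolored_subgraph` and the `(u, v, color)` edge encoding are assumptions made for this example, and the running time is exponential in the number of vertices.

```python
from itertools import combinations


def min_multicolored_subgraph(edges):
    """Smallest vertex set whose induced edges include every color.

    edges: list of (u, v, color) tuples; vertices are candidate primers.
    Tries subsets in order of increasing size, so the first hit is minimum.
    """
    colors = {c for _, _, c in edges}
    vertices = sorted({x for u, v, _ in edges for x in (u, v)})
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            induced = {c for u, v, c in edges if u in chosen and v in chosen}
            if induced == colors:
                return list(subset)


edges = [("a", "b", 0), ("a", "c", 1), ("b", "c", 2), ("d", "e", 0), ("f", "g", 1)]
print(min_multicolored_subgraph(edges))  # three primers a, b, c serve all three colors
```

In this toy instance three primers cover three amplification products, whereas picking a separate pair per product would use six.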

Selection w/o Uniqueness Constraints
Can be seen as a “simultaneous set covering” problem:
- The ground set is partitioned into n disjoint sets, each with 2L elements
- The goal is to select a minimum number of sets (i.e., primers) that cover at least half of the elements in each partition
Naïve modifications of the greedy set cover algorithm do not work
Key idea: use a potential function Φ as a measure of infeasibility: for a partial solution P, Φ(P) = minimum number of elements that still need to be covered
Initially, Φ = nL
For feasible solutions, Φ = 0

Potential-Function Driven Greedy
1. Select a primer that decreases the potential function Φ by the largest amount (breaking ties arbitrarily)
2. Repeat until feasibility is achieved
Lemma: Each greedy selection reduces Φ to at most (1-1/OPT) times its previous value
Theorem: The number of primers selected by the greedy algorithm is at most a factor of ln(nL) larger than the optimum
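
The two greedy steps above can be sketched as follows. This is a hedged illustration under assumptions that go beyond the slides: hybridization is modeled as exact substring match, each locus is a (forward, reverse) pair of strings with index 0 next to the locus, and the potential (written Φ here) is instantiated as the sum over loci of max(0, l_f + l_r - L), where l_f and l_r are the distances of the closest selected primers on each side, capped at L when none is selected. That choice gives Φ = nL for the empty set and Φ = 0 exactly for feasible sets, matching the slides, but it is one possible concretization. All function names are hypothetical.

```python
def kmer_distances(seq, k):
    """Map each k-mer of seq to its smallest distance from the locus (offset + k)."""
    best = {}
    for i in range(len(seq) - k + 1):
        best.setdefault(seq[i:i + k], i + k)  # the first occurrence is the closest
    return best


def potential(chosen, fwd_maps, rev_maps, L):
    """Assumed potential: sum over loci of max(0, l_f + l_r - L)."""
    total = 0
    for f, r in zip(fwd_maps, rev_maps):
        lf = min((f[p] for p in chosen if p in f), default=L)
        lr = min((r[p] for p in chosen if p in r), default=L)
        total += max(0, min(lf, L) + min(lr, L) - L)
    return total


def greedy_primers(loci, k, L):
    """Repeatedly add the primer that lowers the potential the most."""
    fwd_maps = [kmer_distances(f, k) for f, _ in loci]
    rev_maps = [kmer_distances(r, k) for _, r in loci]
    candidates = sorted(set().union(*fwd_maps, *rev_maps))
    chosen = []
    phi = potential(chosen, fwd_maps, rev_maps, L)  # equals len(loci) * L
    while phi > 0:
        scored = [(potential(chosen + [p], fwd_maps, rev_maps, L), p) for p in candidates]
        new_phi, best = min(scored, default=(phi, None))  # ties: smallest primer string
        if new_phi >= phi:
            raise ValueError("no feasible primer set under this model")
        chosen.append(best)
        phi = new_phi
    return chosen


loci = [("ACGTT", "GGCAT"), ("ACGAA", "TTGGC")]
print(greedy_primers(loci, k=3, L=6))
print(greedy_primers(loci, k=3, L=8))
```

With the tighter bound L = 6 each locus needs both of its closest primers, while the looser bound L = 8 lets one reverse primer be shared across both loci, so fewer primers are selected.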

Selection w/ Uniqueness Constraints
Can be modeled as minimum multicolored subgraph problem: add edge colored by color i between two primers if they amplify the i-th SNP and do not amplify any other SNP
Trivial approximation algorithm: select 2 primers for each SNP
- O(n^(1/2)) approximation since at least n^(1/2) primers required by every solution
Non-trivial approximation?
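
One way to see the ratio claimed for the trivial algorithm, written out as a short derivation. Here ALG denotes the number of primers the trivial algorithm selects, OPT the optimum, and p the size of an arbitrary feasible solution; these symbols are introduced only for this sketch. A set of p primers forms at most p^2 pairs, and each of the n SNPs needs its own pair.

```latex
\[
\mathrm{ALG} \le 2n,
\qquad
n \le p^{2} \;\Longrightarrow\; \mathrm{OPT} \ge \sqrt{n},
\qquad
\frac{\mathrm{ALG}}{\mathrm{OPT}} \le \frac{2n}{\sqrt{n}} = 2\sqrt{n} = O\!\left(n^{1/2}\right).
\]
```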

Integer Program Formulation
Variable x_u for every vertex (candidate primer) u
- x_u set to 1 if u is selected, and to 0 otherwise
Variable y_e for every edge e
- y_e set to 1 if corresponding primer pair selected to amplify one of the SNPs
Objective: minimize sum of x_u’s
Constraints:
- for each i, sum of {y_e : e amplifying SNP i} ≥ 1
- y_e ≤ x_u for every e incident to u

LP-Rounding Algorithm
1. Solve linear programming relaxation
2. Select node u with probability x_u
Theorem: With probability of at least 1/3, the number of selected nodes is within a factor of O(m^(1/2) ln n) of the optimum, where m is the maximum number of edges sharing the same color.
For primer selection, m ≤ L^2, so the approximation factor is O(L ln n)
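
Step 2 of the algorithm above can be sketched in isolation. The code below assumes a fractional solution `x` (a dict from vertex to a value in [0, 1]) has already been obtained from some LP solver, which is not shown; it then performs one round of independent randomized rounding and checks whether the selected vertices induce an edge of every color. The names `round_once` and `covers_all_colors` are hypothetical, and any probability scaling or repetition that the analysis behind the theorem may require is not given in the slides and is not reproduced here.

```python
import random


def round_once(x, rng):
    """Select each vertex u independently with probability x[u]."""
    # rng.random() lies in [0, 1), so x[u] = 1 always selects and x[u] = 0 never does.
    return {u for u, xu in x.items() if rng.random() < xu}


def covers_all_colors(selected, edges, n_colors):
    """True if the selected vertices induce at least one edge of each color."""
    covered = {c for u, v, c in edges if u in selected and v in selected}
    return len(covered) == n_colors


# Hypothetical fractional LP solution for a two-color toy instance.
x = {"a": 1.0, "b": 0.5, "c": 0.5}
edges = [("a", "b", 0), ("a", "c", 1)]
rng = random.Random(0)  # fixed seed so runs are repeatable
picked = round_once(x, rng)
print(sorted(picked), covers_all_colors(picked, edges, n_colors=2))
```

A single round may leave some color uncovered, which is why the feasibility check is kept separate from the rounding step.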

Experimental Setting
SNP sets extracted from NCBI databases + randomly generated
C/C++ code run on a 2.8 GHz Dell PowerEdge running Linux
Compared algorithms
- G-FIX: greedy primer cover algorithm of Pearson et al.
  - Primers restricted to be within L/2 of amplified SNPs
- G-VAR: naïve modification of G-FIX
  - For each SNP, first selected primer can be L bases away from SNP
  - If first selected primer is L_1 bases away from the SNP, opposite sequence is truncated to a length of L - L_1
- G-POT: potential function driven greedy algorithm
- MIPS-PT: iterative beam-search heuristic of Souvenir et al. (WABI’03)

Experimental Results, NCBI tests

Experimental Results, k=8

Experimental Results, k=10

Experimental Results, k=12

Runtime, k=10

Conclusions
- New combinatorial optimization problems arising in the area of high-throughput assay design
- Theoretical insights (such as approximation results) give algorithms with significant practical improvements
- Choosing the proper problem model is critical to solution efficiency

Ongoing Work & Open Problems
- Allow degenerate primers
- Incorporate more biochemical constraints into the model (melting temperature, secondary structure, cross hybridization, etc.)
- Close gap between O(ln n) inapproximability bound and O(L ln n) approximation factor for minimum multi-colored subgraph problem
- Approximation algorithms for partition into multiple multiplexed PCR reactions (Aumann et al. WABI’03)

Acknowledgments
- Kishori Konwar
- Alex Russell
- Alex Shvartsman
Financial support from UCONN Research Foundation