APBC 2005: Improved Algorithms for Multiplex PCR Primer Set Selection with Amplification Length Constraints. Kishori M. Konwar, Ion I. Mandoiu, Alexander C. Russell, Alexander A. Shvartsman


Improved Algorithms for Multiplex PCR Primer Set Selection with Amplification Length Constraints
Kishori M. Konwar, Ion I. Mandoiu, Alexander C. Russell, Alexander A. Shvartsman
CS&E Dept., Univ. of Connecticut

Combinatorial Optimization in Bioinformatics
Fast growing number of applications
- Sequence alignment
- DNA sequencing
- Haplotype inference
- Pathogen identification
- …
- High-throughput assay design
  - Microarray probe selection
  - Microarray quality control
  - Universal tag arrays
  - …
This talk: Multiplex PCR primer set selection

Outline
- Background and problem formulation
- “Potential function” greedy algorithm
- Approximation guarantee
- Experimental results
- Conclusions

The Polymerase Chain Reaction
[Figure: target sequence, polymerase, primers (primer 1 and primer 2), repeat cycles]

Primer Pair Selection Problem
Given:
- Genomic sequence around amplification locus
- Primer length k
- Amplification upper bound L
Find: Forward and reverse primers of length k that hybridize within a distance of L of each other and optimize amplification efficiency (melting temperature, secondary structure, mis-priming, etc.)
[Figure: forward primer and reverse primer on the 5'/3' strands, within distance L on either side of the amplification locus]

PCR for SNP Genotyping
- Thousands of SNPs to be genotyped using hybridization methods (e.g., SBE)
- Selective PCR amplification needed to improve accuracy of detection steps
  - whole-genome amplification not appropriate
- Simultaneous amplification OK ⇒ Multiplex PCR

Multiplex PCR
How it works
- Multiple DNA fragments amplified simultaneously
- Each amplified fragment still defined by two primers
- A primer may participate in amplification of multiple targets
Primer set selection
- Currently done by time-consuming trial and error
- An important objective is to minimize number of primers
  - Reduced assay cost
  - Higher effective concentration of primers ⇒ higher amplification efficiency
  - Reduced unintended amplification

Primer Set Selection Problem
Given:
- Genomic sequences around n amplification loci
- Primer length k
- Amplification upper bound L
Find: Minimum size set S of primers of length k such that, for each amplification locus, there are two primers in S hybridizing with the forward and reverse genomic sequences within a distance of L of each other

Previous Work on Primer Selection
Well-studied problem: [Pearson et al. 96], [Linhart & Shamir’02], [Souvenir et al.’03], etc.
Almost all problem formulations decouple selection of forward and reverse primers
- To enforce bound of L on amplification length, select only primers that hybridize within L/2 bases of desired target
- In worst case, this method can increase the number of primers by a factor of O(n) compared to the optimum
[Pearson et al. 96] Greedy set cover algorithm gives O(ln n) approximation factor for the “decoupled” formulation

Previous Work (2)
[Fernandes&Skiena’02] study primer set selection with uniqueness constraints
Minimum Multi-Colored Subgraph Problem:
- Vertices correspond to candidate primers
- Edge colored by color i between u and v iff corresponding primers hybridize within a distance of L of each other around i-th amplification locus
- Goal is to find minimum size set of vertices inducing edges of all colors
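The Minimum Multi-Colored Subgraph formulation can be made concrete with a small sketch. The code below is an illustrative brute-force search, not the method of any cited paper; the function names and the toy instance (primers `p1`..`p4`, colors 0..2) are hypothetical. It only shows what "inducing edges of all colors" means and is exponential in the number of vertices.

```python
from itertools import combinations


def induced_colors(selected, colored_edges):
    """Colors of the edges whose two endpoints are both in `selected`."""
    chosen = set(selected)
    return {color for (u, v, color) in colored_edges if u in chosen and v in chosen}


def min_multicolored_subgraph(vertices, colored_edges):
    """Smallest vertex set inducing at least one edge of every color (brute force)."""
    all_colors = {color for (_, _, color) in colored_edges}
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if induced_colors(subset, colored_edges) == all_colors:
                return set(subset)
    return None  # unreachable: the full vertex set always induces every color


# Hypothetical instance: an edge (u, v, i) means primers u and v hybridize
# within distance L of each other around amplification locus i.
edges = [
    ("p1", "p2", 0),
    ("p2", "p3", 1),
    ("p1", "p4", 1),
    ("p1", "p3", 2),
    ("p3", "p4", 2),
]
best = min_multicolored_subgraph(["p1", "p2", "p3", "p4"], edges)
print(sorted(best))  # ['p1', 'p2', 'p3']
```

Color 0 appears only on the edge between `p1` and `p2`, so both must be selected, and adding `p3` picks up colors 1 and 2.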

The Set Cover Problem
Given:
- Universal set U with n elements
- Family of sets (S_x, x ∈ X) covering all elements of U
Find:
- Minimum size subset X’ of X s.t. (S_x, x ∈ X’) covers all elements of U

Selection w/ Length Constraints
“Simultaneous set covering” problem:
- Ground set partitioned into n disjoint sets S_i (one for each target), each with 2L elements
- Goal is to select minimum number of sets (i.e., primers) covering at least 1/2 of the elements in each partition
[Figure: a window of L bases on each side of SNP i]

Greedy Setcover Algorithm
Classical result (Johnson’74, Lovasz’75, Chvatal’79): the greedy setcover algorithm has an approximation factor of H(n) = 1 + 1/2 + 1/3 + … + 1/n < 1 + ln(n)
- The approximation factor is tight
- Cannot be approximated within a factor of (1-ε)ln(n) unless NP ⊆ DTIME(n^O(loglog n))
Greedy Algorithm:
- Repeatedly pick the set with most uncovered elements
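As a minimal sketch of the greedy rule stated on this slide, the following code repeatedly picks the set with the most uncovered elements. The function names and the toy family `A`..`D` are hypothetical; ties are broken by dictionary order, which is an arbitrary choice of this sketch.

```python
import math


def greedy_set_cover(universe, sets):
    """Repeatedly pick the set covering the most still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda name: len(sets[name] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the family does not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


def harmonic(n):
    """H(n) = 1 + 1/2 + ... + 1/n, the greedy approximation factor."""
    return sum(1.0 / i for i in range(1, n + 1))


# Hypothetical instance with n = 6 elements.
family = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 5},
    "C": {3, 4, 6},
    "D": {5, 6},
}
cover = greedy_set_cover(range(1, 7), family)
print(cover)                              # ['A', 'D']
print(harmonic(6) < 1 + math.log(6))      # True
```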

Potential Functions
Set cover
- Φ = #uncovered elements
- Initially, Φ = n
- For feasible solutions, Φ = 0
Primer selection with length constraints
- Φ = minimum number of elements that must be covered = Σ_i max{0, L - #covered elements in S_i}
- Initially, Φ = nL
- For feasible solutions, Φ = 0
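The primer-selection potential can be evaluated directly on a toy instance. This sketch assumes the form Φ = Σ_i max{0, L - (number of covered elements of S_i)}, which is the reading that gives Φ = nL for the empty selection; the data layout (elements as `(target, position)` pairs, a `covers` map from primer to elements) and the primers `a`, `b`, `c` are hypothetical.

```python
def potential(selected, covers, partitions, L):
    """Phi = sum over targets i of max(0, L - number of covered elements of S_i)."""
    covered = set()
    for primer in selected:
        covered |= covers[primer]
    return sum(max(0, L - len(covered & part)) for part in partitions)


# Hypothetical instance: n = 2 targets, L = 2, so each S_i has 2L = 4 elements.
L = 2
partitions = [{(i, j) for j in range(2 * L)} for i in range(2)]
covers = {
    "a": {(0, 0), (0, 1), (1, 0)},
    "b": {(1, 1), (1, 2)},
    "c": {(0, 2)},
}

phi_empty = potential([], covers, partitions, L)        # n * L = 4
phi_a = potential(["a"], covers, partitions, L)         # target 0 done, target 1 needs 1 more
phi_ab = potential(["a", "b"], covers, partitions, L)   # feasible, so 0
print(phi_empty, phi_a, phi_ab)  # 4 1 0
```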

General setting
- Potential function Φ(X’) ≥ 0
- Φ({}) = Φmax
- Φ(X’) = 0 for all feasible solutions
- X’’ ⊆ X’ ⇒ Φ(X’’) ≥ Φ(X’)
- If Φ(X’) > 0, then there exists x s.t. Φ(X’+x) < Φ(X’)
- X’’ ⊆ X’ ⇒ ∆(x,X’’) ≥ ∆(x,X’) for every x, where ∆(x,X’) := Φ(X’) - Φ(X’+x)
- Objective: find minimum size set X’ with Φ(X’) = 0

Generic Greedy Algorithm
Theorem: The generic greedy algorithm has an approximation factor of 1 + ln ∆max
Corollary: 1 + ln(nL) approximation for PCR primer selection
Algorithm:
  X’ ← {}
  While Φ(X’) > 0
    Find x with maximum ∆(x,X’)
    X’ ← X’ + x
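The generic greedy loop can be sketched for any potential function passed in as a callable. The code below is an illustration under stated assumptions, not the authors' implementation: `generic_greedy`, `make_primer_potential`, and the toy primers `a`..`d` are hypothetical names, and the potential uses the covered-elements form Σ_i max{0, L - #covered elements of S_i}.

```python
def generic_greedy(candidates, phi):
    """X' <- {}; while phi(X') > 0, add the x with maximum drop Delta(x, X')."""
    chosen = []
    while phi(chosen) > 0:
        current = phi(chosen)
        best = max(candidates, key=lambda x: current - phi(chosen + [x]))
        if phi(chosen + [best]) >= current:
            raise ValueError("no candidate decreases the potential")
        chosen.append(best)
    return chosen


def make_primer_potential(covers, partitions, L):
    """Build phi(selected) = sum_i max(0, L - #covered elements of S_i)."""
    def phi(selected):
        covered = set()
        for primer in selected:
            covered |= covers[primer]
        return sum(max(0, L - len(covered & part)) for part in partitions)
    return phi


# Hypothetical instance: n = 2 targets, L = 2, elements are (target, position).
L = 2
partitions = [{(i, j) for j in range(2 * L)} for i in range(2)]
covers = {
    "a": {(0, 0), (0, 1), (1, 0)},
    "b": {(1, 1), (1, 2)},
    "c": {(0, 2), (0, 3)},
    "d": {(1, 0)},
}
phi = make_primer_potential(covers, partitions, L)
solution = generic_greedy(list(covers), phi)
print(solution, phi(solution))  # ['a', 'b'] 0
```

Here the first step drops the potential from 4 to 1 by picking `a`, and `b` is then the only candidate that still decreases it.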

Proof Sketch (1)
Let x_1, x_2, …, x_g be the elements selected by greedy, in the order in which they are chosen
Let x*_1, x*_2, …, x*_k be the elements of an optimum solution
Charging scheme: x_i charges to x*_j a cost of [formula not preserved in the transcript], where δ_ij = ∆(x_i, {x_1,…,x_(i-1)} ∪ {x*_1,…,x*_j})
Fact 1: Each x*_j gets charged a total cost of at most 1 + ln ∆max

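Proof Sketch (2)
Fact 2: Each x_i charges at least 1 unit of cost
Combining Facts 1 and 2: g ≤ total cost charged ≤ k(1 + ln ∆max), which gives the approximation factor of 1 + ln ∆max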

Experimental Setting
Datasets extracted from NCBI databases, L=1000
Dell PowerEdge, 2.8GHz Xeon
Compared algorithms
- G-FIX: greedy primer cover algorithm [Pearson et al.]
- MIPS-PT: iterative beam-search heuristic [Souvenir et al.]
  - Restrict primers to L/2 bases around amplification locus
- G-VAR: naïve modification of G-FIX
  - First selected primer can be up to L bases away
  - Opposite sequence truncated after selecting first primer
- G-POT: potential function driven greedy algorithm

Experimental Results, NCBI tests
[Table; values not preserved in the transcript. Columns: # Targets, k, then #Primers and CPU sec for each of G-FIX (Pearson et al.), G-VAR (G-FIX with dynamic truncation), MIPS-PT (Souvenir et al.), and G-POT (potential-function greedy)]

[Figure: #primers, as percentage of 2n, against n (l=8)]

[Figure: #primers, as percentage of 2n, against n (l=10)]

[Figure: #primers, as percentage of 2n, against n (l=12)]

[Figure: CPU seconds against n (l=10)]

Conclusions
- Numerous combinatorial optimization problems arise in the area of high-throughput assay design
- Theoretical insights such as approximation results can lead to significant practical improvements
- Choosing the proper problem model is critical to solution efficiency

Ongoing Work & Open Problems
- Degenerate primers
- Accurate hybridization model (melting temperature, secondary structure, cross hybridization, …)
  - In-silico MP-PCR simulator
- Partition into multiple multiplexed PCR reactions (Aumann et al., WABI’03)

Acknowledgments
Financial support from UCONN’s Research Foundation

Integer Program Formulation
- 0/1 variable x_u for every vertex u
- 0/1 variable y_e for every edge e

LP-Rounding Algorithm
(1) Solve linear programming relaxation
(2) Select node u with probability x_u
(3) Repeat step 2 O(ln(n)) times and return selected nodes
- Theorem [Konwar et al.’04]: The LP-rounding algorithm finds a feasible solution at most O(m^(1/2) ln n) times larger than the optimum, where m is the maximum color class size, and n is the number of nodes
- For primer selection, m ≤ L^2 ⇒ approximation factor is O(L ln n)
- Better approximation?
  - Unlikely for minimum multi-colored subgraph problem
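Steps (2) and (3) of the rounding procedure can be sketched as follows. This is an illustration only: it assumes the fractional values x_u have already been obtained from an LP solver (step (1) is not shown), the function name `round_lp_solution` and the sample values are hypothetical, and the number of rounds is taken as ceil(ln n) as one possible reading of "O(ln(n)) times". The returned set would still need to be checked for feasibility.

```python
import math
import random


def round_lp_solution(x, n_rounds, rng):
    """Union over n_rounds of independent draws that pick node u with probability x[u]."""
    selected = set()
    for _ in range(n_rounds):
        for node, value in x.items():
            if rng.random() < value:
                selected.add(node)
    return selected


# Hypothetical fractional LP solution over n = 3 nodes.
x = {"u": 1.0, "v": 0.0, "w": 0.5}
rounds = max(1, math.ceil(math.log(len(x))))
picked = round_lp_solution(x, rounds, random.Random(0))

# Nodes with x_u = 1 are always picked and nodes with x_u = 0 never are.
print("u" in picked, "v" in picked)  # True False
```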