Exact and Approximation Algorithms for DNA Tag Set Design Ion Mandoiu and Dragos Trinca Computer Science & Engineering Department University of Connecticut.

Presentation transcript:

Overview
- Background on universal tag arrays
- Tag set design problem
  - c-h code formulation
  - Integer program and LP approximation
  - Cycle packing formulation
- Experimental results
- Conclusions

Watson-Crick Complementarity
- Four nucleotide types: A, C, G, T
- A's paired with T's (2 hydrogen bonds)
- C's paired with G's (3 hydrogen bonds)
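The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes strands are written 5' to 3', so the complementary strand is read in reverse.

```python
# Watson-Crick pairing: A<->T (2 hydrogen bonds), C<->G (3 hydrogen bonds).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq`, read 5' to 3' (assumed convention)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hydrogen_bonds(seq: str) -> int:
    """Count hydrogen bonds in the duplex formed by `seq` and its complement."""
    return sum(2 if base in "AT" else 3 for base in seq)
```

For instance, `reverse_complement("CCAGATT")` pairs each base of the tag used later in the slides with its partner on the opposite strand.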

DNA Microarrays
- Exploit Watson-Crick complementarity to simultaneously perform a large number of substring tests
- Used in a variety of genomic analyses
  - Transcription (gene expression) analysis
  - Single Nucleotide Polymorphism (SNP) genotyping
  - Genomic-based microorganism identification
- Common microarray formats involve direct hybridization between a labeled DNA/RNA sample and DNA probes attached to a glass slide

Direct Hybridization Experiment
- Labeled DNA/RNA sample hybridized to array of probes
- Laser activation of fluorescent labels
- Optical scanning used to identify probes with complements in the mixture
Images courtesy of Affymetrix.

Limitations of Common Array Formats
- Arrays of cDNAs
  - Probes obtained by reverse transcription from Expressed Sequence Tags (ESTs)
  - Inexpensive, but can only be used for transcription analysis
- Oligonucleotide arrays
  - Short (20-60bp) synthetic DNA probes
  - Flexible, but expensive unless produced in large quantities

Universal Tag Arrays [Brenner 97, Morris et al. 98]
- "Programmable" oligonucleotide arrays
  - Array consists of application-independent oligonucleotides called tags
  - Two-part "reporter" probes: application-specific primers ligated to antitags
  - Detection carried out by a sequence of reactions separately involving the primer and the antitag part of reporter probes

Universal Tag Array Experiment
- Mix reporter probes with genomic DNA
- Solution phase hybridization
- Single-base extension
- Solid phase hybridization

Universal Tag Array Advantages
- Cost effective
  - Same array used in many analyses → economies of scale
- Easy to customize
  - Only need to synthesize new set of reporter probes
- Reliable
  - Solution phase hybridization better understood than hybridization on solid support

Tag Hybridization Constraints
- (H1) Antitags hybridize strongly to complementary tags
- (H2) No antitag hybridizes to a non-complementary tag
- (H3) Antitags do not cross-hybridize to each other
Tag Set Design Problem: Find a maximum cardinality set of tags satisfying (H1)-(H3)

More Hybridization Constraints…
- Enforced during tag assignment by
  - Leaving some tags unassigned and distributing primers to multiple arrays [Ben-Dor et al. 03]
  - Exploiting availability of multiple primer candidates [MPT05]

Hybridization Models: Stability
- Melting temperature models, e.g., nearest neighbor [SantaLucia 96]
- Tag weight ≥ h [Ben-Dor et al. 00]
  - wt(A)=wt(T)=1, wt(C)=wt(G)=2
  - Equivalent to "2-4 rule" melting temperature model
- Tag length = l [Affymetrix]
  - Combined with additional constraints on GC-content, etc.
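A short sketch of the weight model may help make the equivalence with the 2-4 rule concrete. The 2-4 rule estimates melting temperature as 2 °C per A/T plus 4 °C per C/G, which is exactly twice the tag weight defined above. The function names here are illustrative, not taken from the cited work.

```python
# Nucleotide weights from the slide: weak bases (A, T) weigh 1, strong bases (C, G) weigh 2.
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(seq: str) -> int:
    """Total weight of a DNA string under the weight model."""
    return sum(WEIGHT[base] for base in seq)


def tm_2_4_rule(seq: str) -> int:
    """Melting temperature estimate in degrees Celsius under the 2-4 rule."""
    weak = sum(base in "AT" for base in seq)
    strong = sum(base in "CG" for base in seq)
    return 2 * weak + 4 * strong


def is_stable(seq: str, h: int) -> bool:
    """Stability constraint of the weight model: tag weight at least h."""
    return weight(seq) >= h
```

Because `tm_2_4_rule(seq) == 2 * weight(seq)` for every string, a lower bound on weight is the same constraint as a lower bound on the 2-4 rule temperature.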

Hybridization Models: Non-Interaction
- Hamming distance model, e.g. [Marathe et al. 01]
  - Models rigid DNA strands
- LCS/edit distance model, e.g., [Torney et al. 03]
  - Models infinitely elastic DNA strands
- c-token model [Ben-Dor et al. 00]
  - Derived from nucleation complex theory: duplex formation requires formation of nucleation complex between perfectly complementary substrings
  - Nucleation complex must have weight ≥ c
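The first two measures can be sketched as follows. The helper names are hypothetical, and the LCS routine is the textbook dynamic program rather than anything specific to the cited papers.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strands."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Hamming distance compares strands position by position (rigid strands), while the LCS length allows arbitrary gaps (elastic strands), which is why the two models sit at opposite extremes.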

c-h Code Problem
- c-token: left-minimal DNA string of weight ≥ c, i.e.,
  - w(x) ≥ c
  - w(x') < c for every proper suffix x' of x
- A set of tags is called a c-h code if
  - (C1) Every tag has weight ≥ h
  - (C2) Every c-token is used at most once
- c-h code problem: given c and h, find a maximum cardinality c-h code
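The definitions above translate almost directly into code. The sketch below checks (C1) and (C2) by brute force over all substrings; the names are illustrative, and it reads (C2) strictly, so a c-token repeated inside a single tag also counts as a violation.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(seq: str) -> int:
    return sum(WEIGHT[base] for base in seq)


def is_c_token(x: str, c: int) -> bool:
    """w(x) >= c and every proper suffix is lighter than c.

    Weights are positive, so it is enough to test the longest proper suffix x[1:].
    """
    return weight(x) >= c and weight(x[1:]) < c


def tokens_in(tag: str, c: int) -> list:
    """All c-token occurrences in `tag`, ordered by end position."""
    return [tag[i:j]
            for j in range(1, len(tag) + 1)
            for i in range(j)
            if is_c_token(tag[i:j], c)]


def is_c_h_code(tags, c: int, h: int) -> bool:
    """Check (C1) every tag has weight >= h and (C2) no c-token is used twice."""
    seen = set()
    for tag in tags:
        if weight(tag) < h:
            return False
        for token in tokens_in(tag, c):
            if token in seen:
                return False
            seen.add(token)
    return True
```

The quadratic substring scan is chosen for clarity only; each end position carries at most one c-token, so a linear scan is possible.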

Previous Work
- [Ben-Dor et al. 00]
  - Constructive upper bound on c-h code size based on token tail-weight
  - Approximation algorithm based on De Bruijn sequences
- [MPT05]
  - Upper bound for c-h codes including antitag-to-antitag hybridization constraints
  - Simple alphabetic tree search heuristic

Token Content of a Tag (c=4)
Tag: CCAGATT
Tag → sequence of c-tokens
End pos:  2    3     4     5     6     7
c-token:  CC → CCA → CAG → AGA → GAT → GATT
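The decomposition of the example tag can be reproduced with a single backward scan per end position. This is a minimal sketch with a hypothetical function name; it relies only on the weights and the c-token definition given earlier in the talk.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def c_tokens_with_ends(tag: str, c: int) -> list:
    """Return (end position, c-token) pairs, with 1-based end positions.

    For each end position, extend leftwards until the accumulated weight
    reaches c; the resulting substring is the unique c-token ending there.
    """
    result = []
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        while start > 0 and w < c:
            start -= 1
            w += WEIGHT[tag[start]]
        if w >= c:  # prefixes lighter than c end no c-token
            result.append((end, tag[start:end]))
    return result


print(c_tokens_with_ends("CCAGATT", 4))
```

Position 1 yields no token because the prefix "C" has weight 2, which is below c = 4.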

Layered c-token graph [figure: source s, sink t, c-tokens c_1 … c_N, layers c/2, (c/2)+1, …, l-1, l]

Integer Program Formulation
- Maximum integer flow problem with set capacity constraints
- O(lN) / O(hN) constraints and variables, where N = number of c-tokens

Number of c-tokens [table: c vs. number of c-tokens]
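The number N of c-tokens for a given c can be recomputed by direct enumeration. The sketch below is an illustration with hypothetical names: it uses the fact that a c-token is one nucleotide followed by a string lighter than c, such that the total weight reaches c.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def strings_below(c: int) -> list:
    """All strings over {A, C, G, T} with weight strictly less than c (incl. empty)."""
    found = [""]
    frontier = [("", 0)]
    while frontier:
        s, w = frontier.pop()
        for base, wb in WEIGHT.items():
            if w + wb < c:
                found.append(s + base)
                frontier.append((s + base, w + wb))
    return found


def all_c_tokens(c: int) -> list:
    """Every c-token: first base + lighter-than-c remainder, total weight >= c."""
    tokens = []
    for rest in strings_below(c):
        rest_weight = sum(WEIGHT[b] for b in rest)
        for base, wb in WEIGHT.items():
            if wb + rest_weight >= c:
                tokens.append(base + rest)
    return sorted(tokens)


for c in range(2, 9):
    print(c, len(all_c_tokens(c)))
```

The count grows exponentially in c, which is what drives the size of the integer program on the previous slide.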

Packing ILP Formulation

Garg-Konemann Algorithm
1. x ← 0; y ← δ   // y_i are variables of the dual LP
2. Find min weight s-t path p, where weight(v) = y_i for every v ∈ V_i
3. While weight(p) < 1 do
     M ← max_i |p ∩ V_i|
     x_p ← x_p + 1/M
     For every i, y_i ← y_i (1 + ε |p ∩ V_i| / M)
     Find min weight s-t path p, where weight(v) = y_i for v ∈ V_i
4. For every p, x_p ← x_p / (1 − log_{1+ε} δ)
[GK98] The algorithm computes a factor (1−ε)^2 approximation to the optimal LP solution with (N/ε) log_{1+ε} N shortest path computations
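The loop above can be sketched on a toy instance. Two simplifications are assumptions of this sketch rather than part of the slides: the shortest-path oracle is replaced by a minimum over an explicit list of candidate paths, and δ is set to (1+ε)((1+ε)m)^(-1/ε) for m constraint sets, a commonly used choice that the slide does not state. Each path is given as a dict mapping a set index i to |p ∩ V_i|.

```python
import math


def garg_konemann(paths, n_sets, eps=0.1):
    """Approximate max sum(x_p) subject to sum_p |p ∩ V_i| * x_p <= 1 for every i."""
    delta = (1 + eps) * ((1 + eps) * n_sets) ** (-1.0 / eps)  # assumed choice of delta
    y = [delta] * n_sets          # dual variables, one per set V_i
    x = [0.0] * len(paths)        # primal variables, one per path

    def path_weight(p):
        return sum(mult * y[i] for i, mult in paths[p].items())

    while True:
        # Stand-in for the min weight s-t path computation.
        p = min(range(len(paths)), key=path_weight)
        if path_weight(p) >= 1:
            break
        m = max(paths[p].values())
        x[p] += 1.0 / m
        for i, mult in paths[p].items():
            y[i] *= 1 + eps * mult / m

    scale = 1 - math.log(delta, 1 + eps)  # step 4: rescale to restore feasibility
    return [xp / scale for xp in x]


# Toy instance: three paths over four sets; paths 0 and 2 are disjoint.
toy_paths = [{0: 1, 1: 1}, {1: 1, 2: 1}, {2: 1, 3: 1}]
solution = garg_konemann(toy_paths, n_sets=4, eps=0.1)
print(solution, sum(solution))
```

On this instance the optimal LP value is 2 (take paths 0 and 2), and the scaled solution returned by the sketch stays within every set capacity.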

LP Based c-h Code Construction
1. Run Garg-Konemann and store the minimum weight paths in a list
2. Traversing the list in reverse order, pick tags corresponding to paths if they are feasible and do not share c-tokens with already selected tags
3. Mark used c-tokens and run the alphabetic tree search algorithm to select additional tags
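Step 2 of this construction is a greedy filter, which the following sketch illustrates. It assumes the stored paths have already been converted to tag strings and interprets "feasible" as having weight at least h; both are assumptions of this example, as are the function names. The tree search of step 3 is not shown.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def c_tokens(tag: str, c: int) -> set:
    """Set of c-tokens occurring in `tag`."""
    tokens = set()
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        while start > 0 and w < c:
            start -= 1
            w += WEIGHT[tag[start]]
        if w >= c:
            tokens.add(tag[start:end])
    return tokens


def select_tags(candidates, c: int, h: int) -> list:
    """Scan candidates in reverse order, keeping tags that share no c-token."""
    used, selected = set(), []
    for tag in reversed(candidates):
        if sum(WEIGHT[b] for b in tag) < h:   # assumed feasibility test
            continue
        tokens = c_tokens(tag, c)
        if tokens & used:
            continue
        selected.append(tag)
        used |= tokens                         # mark c-tokens as used
    return selected


print(select_tags(["CCAGATT", "GGAGATT", "ACACGTC"], c=4, h=10))
```

Here "CCAGATT" is rejected because it shares the tokens AGA, GAT and GATT with the already selected "GGAGATT".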

Experimental Results (h=15)

Experimental Results (h=28)

Periodic Tags
- Key observation: c-token uniqueness constraint in c-h code formulation is too strong
  - A c-token should not appear in two different tags, but can be repeated within a tag!
- A tag t is called periodic if it is the prefix of (α)^ω for some α
  - Periodic strings make better use of c-tokens (t uses at most |α| c-tokens)
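A small sketch shows why periodic tags are economical with c-tokens. The helper names are hypothetical; the tag is taken to be the shortest prefix of the repeated period whose weight reaches h, which is an assumption of this example.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def periodic_tag(alpha: str, h: int) -> str:
    """Shortest prefix of alpha repeated indefinitely with weight at least h."""
    tag, w, i = [], 0, 0
    while w < h:
        base = alpha[i % len(alpha)]
        tag.append(base)
        w += WEIGHT[base]
        i += 1
    return "".join(tag)


def c_token_occurrences(tag: str, c: int) -> list:
    """c-tokens of `tag` in order of end position, repeats included."""
    found = []
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        while start > 0 and w < c:
            start -= 1
            w += WEIGHT[tag[start]]
        if w >= c:
            found.append(tag[start:end])
    return found


tag = periodic_tag("ACG", h=15)
occurrences = c_token_occurrences(tag, c=4)
print(tag, occurrences, sorted(set(occurrences)))
```

With period "ACG" the tag contains seven c-token occurrences but only three distinct c-tokens, matching the bound of |α| distinct tokens.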

c-token factor graph, c=4 (incomplete) [figure: vertices include CC, AAG, AAC, AAAA, AAAT]

Cycle Packing Problem
- Vertex-Disjoint Cycle Packing Problem: Given directed graph G, find maximum number of vertex-disjoint directed cycles in G
- Theorem 1: APX-hard even for regular directed graphs with in-degree and out-degree 2
- [Salavatipour and Verstraete 05] For general graphs:
  - Quasi-NP-hard to approximate within Ω(log^{1−ε} n)
  - O(n^{1/2}) approximation algorithm

Tag Set Design Algorithm
1. Construct c-token factor graph G
2. T ← {}
3. For all cycles C defining periodic tags, in increasing order of cycle length, do
     Add to T the tag defined by C
     Remove C from G
4. Perform an alphabetic tree search and add to T tags with no c-tokens in common with T
5. Return T
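Step 3 can be approximated without building the factor graph explicitly. The sketch below is a simplification and not the authors' implementation: it enumerates candidate periods by increasing length instead of enumerating cycles, and keeps a periodic tag whenever its c-tokens are still unused. The names and the `max_period` cutoff are assumptions of this example, and the tree search of step 4 is omitted.

```python
from itertools import product

WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def periodic_tag(alpha: str, h: int) -> str:
    """Shortest prefix of alpha repeated indefinitely with weight at least h."""
    tag, w, i = [], 0, 0
    while w < h:
        base = alpha[i % len(alpha)]
        tag.append(base)
        w += WEIGHT[base]
        i += 1
    return "".join(tag)


def c_tokens(tag: str, c: int) -> set:
    """Set of distinct c-tokens occurring in `tag`."""
    tokens = set()
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        while start > 0 and w < c:
            start -= 1
            w += WEIGHT[tag[start]]
        if w >= c:
            tokens.add(tag[start:end])
    return tokens


def greedy_periodic_tags(c: int, h: int, max_period: int) -> list:
    """Greedily collect periodic tags whose c-token sets are pairwise disjoint."""
    used, tags = set(), []
    for k in range(1, max_period + 1):            # shorter periods first
        for alpha in map("".join, product("ACGT", repeat=k)):
            tag = periodic_tag(alpha, h)
            tokens = c_tokens(tag, c)
            if tokens and not (tokens & used):
                tags.append(tag)
                used |= tokens                    # plays the role of removing C from G
    return tags


print(greedy_periodic_tags(c=4, h=10, max_period=3))
```

Repeated or rotated periods such as "AA" or "CA" reproduce c-tokens that are already used, so they are rejected by the disjointness test without any special handling.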

Experimental Results [table indexed by h]

Antitag-to-Antitag Hybridization
- Formalization in c-token hybridization model:
  - (C3) No two (anti)tags contain complementary substrings of weight ≥ c
- Cycle packing and tree search extend easily
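Constraint (C3) can be tested pairwise with the sketch below. It assumes "complementary" means reverse complement (antiparallel Watson-Crick pairing); the function names are illustrative. It is enough to test c-tokens: any substring of weight at least c ends in a c-token, and complementation preserves weight, so a heavier complementary pair always contains a complementary c-token pair.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def c_tokens(tag: str, c: int) -> set:
    """Set of distinct c-tokens occurring in `tag`."""
    tokens = set()
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        while start > 0 and w < c:
            start -= 1
            w += WEIGHT[tag[start]]
        if w >= c:
            tokens.add(tag[start:end])
    return tokens


def violates_c3(tag1: str, tag2: str, c: int) -> bool:
    """True if tag1 and tag2 contain complementary substrings of weight >= c."""
    return any(reverse_complement(token) in tag2 for token in c_tokens(tag1, c))


print(violates_c3("CCAGATT", "ATGGAAC", 4))   # "CCA" pairs with "TGG"
print(violates_c3("CCAGATT", "ACACGTC", 4))
```

The same test applied to a tag and itself would flag self-complementary substrings, which this example does not address.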

Results w/ Extended Constraints

Herpes B Gene Expression Assay [table comparing GenFlex tags and periodic tags; columns: Tm, # pools, pool size, and # arrays and % utilization for 500, 1000, and 2000 tags]

Conclusions
- Use of periodic tags yields significant increase in tag set size and enables higher multiplexing rates in tag assignment
- Algorithms extend to more accurate hybridization models
  - Monotonic melting temperature → c-tokens → factor graph
- Other applications of non-interacting DNA tag sets
  - Lab-on-chip, DNA-mediated assembly, DNA computing [Brenneman&Condon 02]
- Open problems
  - Settle approximation complexity of disjoint cycle packing
  - Improved tag set design algorithms?

Acknowledgments UCONN Research Foundation