Download presentation
Presentation is loading. Please wait.
1
Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays Ion Mandoiu Claudia Prajescu Dragos Trinca Computer Science & Engineering Department University of Connecticut
2
Overview Universal DNA Tag Arrays Tag Set Design Problem Tag Assignment Problem Conclusions
3
Watson-Crick Complementarity Four nucleotide types: A,C,T,G A’s paired with T’s (2 hydrogen bonds) C’s paired with G’s (3 hydrogen bonds)
4
Microarray Applications Gene expression (transcription analysis) Genomic-based microorganism identification Single Nucleotide Polymorphism (SNP) genotyping Molecular diagnosis/susceptibility to disease
5
Microarray Technologies Arrays of cDNAs –Obtained by reverse transcription from Expressed Sequence Tags (ESTs) Oligonucleotide arrays –Short (20-60bp) synthetic DNA strands Limitations –cDNA arrays are inexpensive, but limited to measuring gene expression –Oligonucleotide arrays are flexible, but expensive unless produced in large quantities
6
Universal DNA Tag Arrays [Brenner 97, Morris et al. 98] “Programmable” oligonucleotide arrays –Array consisting of application independent oligonucleotides called tags –Two-part “reporter” probes: aplication specific primers ligated to antitags –Detection carried by a sequence of reactions separately involving the primer and the antitag part of reporter probes
7
Universal Tag Array Experiment +
8
Universal Tag Array Advantages Cost effective –Same array used in many analyses economies of scale Easy to customize – Only need to synthesize new set of reporter probes Reliable –Solution phase hybridization better understood than hybridization on solid support
9
Overview Universal DNA Tag Arrays Tag Set Design Problem Tag Assignment Problem Conclusions
10
Tag Set Requirements Hybridization constraints (H1) Antitags hybridize strongly to complementary tags (H2) No antitag hybridezes to a non-complementary tag (H3) Antitags do not cross-hybridize to each other t1 t2 t1t2t1
11
Complementary Oligo Hybridization Melting temperature Tm: temperature at which 50% of duplexes are in hybridized state 2-4 rule Tm = 2 #(As and Ts) + 4 #(Cs and Gs) More accurate models exist, e.g., the near- neighbor model
12
Non-Complementary Oligo Hybridization Models Hamming distance –Assumes DNA is rigid strand Longest Common Subsequence/Edit Distance –Assumes DNA is infinitely elastic Near-neighbor with mismatches –Resulting thermodynamic alignment problem is NP-Hard
13
C-token Hybridization Model Proposed by [Ben-Dor et al. 00] Based on nucleation complex theory –Duplex formation between non-complementary oligos starts with the formation of a nucleation complex between perfectly complementary substrings –The nucleation complex must be sufficiently stable, i.e., must have weight c, where weight(A)=weight(T)=1, weight(C)=weight(G)=2
14
c-h Code Problem c-token: left-minimal DNA string of weight c, i.e., –w(x) c –w(x’) < c for every proper suffix x’ of x A set of tags is called c-h code if (C1) Every tag has weight h (C2) Every c-token is used at most once c-h code problem: given c and h, find largest c-h code –[Ben-Dor et al.00] algorithm based on DeBruijn sequences
15
Extended c-h Codes Antitag-to-antitag hybridization –Formalization in c-token hybridization model: (C3) No two tags contain complementary substrings of weight c Tag length constraints –Industry designs (e.g. Affymetrix GenFlex Arrays) require that tags have fixed length (20 nucleotides)
16
c-Token Counting W=weight 1 nucleotide, S=weight 2 nucleotide G n = #strings of weight n:G 1 = 2; G 2 = 6; G n = 2G n-2 + 2G n-1 [Ben-Dor et al.00] c-tokens in a tag have tail weight h-c+1 At most ( 2G c-1 +6G c-2 +8G c-3 )/(h-c+1) tags in a valid c-h code Token typeMax #tokensMax Tail Weight S2 G c-2 4 G c-2 S 4 G c-3 8 G c-3 W2 G c-1 S W2 G c-2 Total2G c-1 +4G c-2 +4G c-3 2G c-1 +6G c-2 +8G c-3
17
Upperbounds for Extended c-h Codes Theorem: The number of tags in a feasible extended c-h-l code is at most for odd c, and at most for even c
18
Algorithms Alphabetic tree search algorithm - Enumerate candidate tags in lexicographic order, save tags whose c-tokens are not used by previously selected tags - Easily modified to handle various combinations of constraints [MT 05] Optimum c-h codes can be computed in practical time for small values of c by using integer programming Maximum integer flow problem w/ set capacity constraints
19
Token Content of a Tag c=4 CCAGATT CC CCA CAG AGA GAT GATT Tag sequence of c-tokens End pos: 2 3 4 5 6 7 c-token: CC CCA CAG AGA GAT GATT
20
Layered c-token graph for length-l tags s t c1c1 cNcN ll-1 c/2(c/2)+1…
21
Integer Program Formulation O( hN) constraints & variables, where N = #c-tokens
22
Experimental Results
23
Periodic Tags c-token uniqueness constraint in c-h code formulation is too strong –c-tokens should not appear in different tags, but is OK to repeat a c-token in the same tag –Periodic tags make best use of available c-tokens [MT05] Tag set design can be cast as a maximum vertex-disjoint cycle packing problem Allowing periodic tags yields ~40% more tags
24
c-token factor graph, c=4 (incomplete) CC AAG AAC AAAA AAAT
25
Vertex-disjoint Cycle Packing Problem Given directed graph G, find maximum number of vertex disjoint directed cycles in G [MT 05] APX-hard even for regular directed graphs with in-degree and out-degree 2 –h-c/2+1 approximation factor for tag set design problem [Salavatipour and Verstraete 05] –Quasi-NP-hard to approximate within (log 1- n) –O(n 1/2 ) approximation algorithm
26
Tag Set Design Algorithm 1.Construct c-token factor graph G 2.T {} 3.For all cycles C defining periodic tags, in increasing order of cycle length, do Add to T the tag defined by C Remove C from G 4.Perform an alphabetic tree search and add to T tags with no c-tokens in common with T 5.Return T
27
Experimental Results h
28
Results w/ Extended Constraints h
29
Overview Background on DNA Microarrays Universal DNA Tag Arrays Tag Set Design Problem Tag Assignment Problem Conclusions
30
More Possible Mis-Hybridizations What can be done: Leave some tags unassigned Distribute primers over multiple arrays Here we focus on avoiding case (a), primer-to-tag hybridization
31
Constraints on tag assignment If primer p hybridizes with tag t, then either p or t must be left un-assigned, unless p is assigned to t p t t’ p’
32
Characterization of Assignable Sets [Ben-Dor 04] Set P is assignable to T iff X+Y |P|, where, in the hybridization graph induced by P+T –X = number of primers incident to a degree 1 tag –Y = number of degree 0 tags Y=2 X=1
33
MAPS Problem Maximum Assignable Primer Set (MAPS) Problem: given primer set P and tag set T, find maximum size assignable subset of P [Ben-Dor 04] Greedy deletion heuristic: repeatedly delete primer of maximum weight from P until it becomes assignable, where –Potential of tag t is 2 -|P(t)| –Potential of primer p is sum of potentials of conflicting tags
34
Universal Array Multiplexing Problem Multiplexing Problem: given primer set P and tag set T, find partition of P into minimum number of assignable sets [Ben-Dor 04] Repeatedly find approximate MAPS
35
Integration with Probe Selection In practice, several primer candidates with equivalent functionality –In SNP genotyping, can pick primer from either forward and reverse strand –In gene expression/identification applications, many primers have desired length, Tm, etc.
36
Pooled Array Multiplexing Problem Given set of primer pools P and tag set T, find a primer from each pool and a partition of selected primers into minimum number of assignable sets
37
X+Y Characterization no Longer Holds
38
Pooled Multiplexing Algorithms 1.Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04]
39
Pooled Multiplexing Algorithms 2.Primer-Del+ = same, but never delete last primer from pool unless no other choice 3.Min-Pot = select primer with min potential from each pool, then run Primer-Del 4.Min-Deg = select primer with min conflict degree, then run Primer-Del 1.Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04]
40
Results: GenFlex Tags, c=8
41
Results: GenFlex Tags, c=7
42
Herpes B Gene Expression Assay TmTm # pools Pool size 500 tags1000 tags2000 tags # arrays% Util.# arrays% Util.# arrays% Util. 601446 1494.06297.20172.30 5496.132100.00172.30 671560 1496.53298.70178.00 5498.00299.90178.00 701522 1496.73298.90176.10 5497.80299.80176.10 TmTm # pools Pool size 500 tags1000 tags2000 tags # arrays% Util.# arrays% Util.# arrays% Util. 601446 1482.26365.35257.05 5488.26370.95263.55 671560 1486.33369.70261.15 5491.86376.00267.20 701522 1488.46373.65265.40 5492.26291.10270.30 GenFlex Tags Periodic Tags
43
Overview Background on DNA Microarrays Universal DNA Tag Arrays Tag Set Design Problem Tag Assignment Problem Conclusions
44
Conclusions New techniques for tag set design and tag assignment lead to significantly improved multiplexing rates and more reliable assays Other applications of universal tags: –Lab-on-chip, DNA-mediated assembly (e.g., carbon nano- tubes), DNA computing [Brenneman&Condon 02] Other applications of partition/assignment techniques: –Genotyping by mass-spectroscopy [Aumann et al 05] –Genotyping using l-mer arrays
45
Acknowledgments UCONN Research Foundation
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.