Presentation is loading. Please wait.

Presentation is loading. Please wait.

Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays Ion Mandoiu Claudia Prajescu Dragos Trinca Computer Science & Engineering Department.

Similar presentations


Presentation on theme: "Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays Ion Mandoiu Claudia Prajescu Dragos Trinca Computer Science & Engineering Department."— Presentation transcript:

1 Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays Ion Mandoiu Claudia Prajescu Dragos Trinca Computer Science & Engineering Department University of Connecticut

2 Overview  Universal DNA Tag Arrays  Tag Set Design Problem  Tag Assignment Problem  Conclusions

3 Watson-Crick Complementarity Four nucleotide types: A,C,T,G A’s paired with T’s (2 hydrogen bonds) C’s paired with G’s (3 hydrogen bonds)

4 Microarray Applications Gene expression (transcription analysis) Genomic-based microorganism identification Single Nucleotide Polymorphism (SNP) genotyping Molecular diagnosis/susceptibility to disease

5 Microarray Technologies Arrays of cDNAs –Obtained by reverse transcription from Expressed Sequence Tags (ESTs) Oligonucleotide arrays –Short (20-60bp) synthetic DNA strands Limitations –cDNA arrays are inexpensive, but limited to measuring gene expression –Oligonucleotide arrays are flexible, but expensive unless produced in large quantities

6 Universal DNA Tag Arrays [Brenner 97, Morris et al. 98] “Programmable” oligonucleotide arrays –Array consisting of application independent oligonucleotides called tags –Two-part “reporter” probes: aplication specific primers ligated to antitags –Detection carried by a sequence of reactions separately involving the primer and the antitag part of reporter probes

7 Universal Tag Array Experiment +

8 Universal Tag Array Advantages Cost effective –Same array used in many analyses  economies of scale Easy to customize – Only need to synthesize new set of reporter probes Reliable –Solution phase hybridization better understood than hybridization on solid support

9 Overview  Universal DNA Tag Arrays  Tag Set Design Problem  Tag Assignment Problem  Conclusions

10 Tag Set Requirements Hybridization constraints (H1) Antitags hybridize strongly to complementary tags (H2) No antitag hybridezes to a non-complementary tag (H3) Antitags do not cross-hybridize to each other t1 t2 t1t2t1

11 Complementary Oligo Hybridization Melting temperature Tm: temperature at which 50% of duplexes are in hybridized state 2-4 rule Tm = 2 #(As and Ts) + 4 #(Cs and Gs) More accurate models exist, e.g., the near- neighbor model

12 Non-Complementary Oligo Hybridization Models Hamming distance –Assumes DNA is rigid strand Longest Common Subsequence/Edit Distance –Assumes DNA is infinitely elastic Near-neighbor with mismatches –Resulting thermodynamic alignment problem is NP-Hard

13 C-token Hybridization Model Proposed by [Ben-Dor et al. 00] Based on nucleation complex theory –Duplex formation between non-complementary oligos starts with the formation of a nucleation complex between perfectly complementary substrings –The nucleation complex must be sufficiently stable, i.e., must have weight  c, where weight(A)=weight(T)=1, weight(C)=weight(G)=2

14 c-h Code Problem c-token: left-minimal DNA string of weight  c, i.e., –w(x)  c –w(x’) < c for every proper suffix x’ of x A set of tags is called c-h code if (C1) Every tag has weight  h (C2) Every c-token is used at most once c-h code problem: given c and h, find largest c-h code –[Ben-Dor et al.00] algorithm based on DeBruijn sequences

15 Extended c-h Codes Antitag-to-antitag hybridization –Formalization in c-token hybridization model: (C3) No two tags contain complementary substrings of weight  c Tag length constraints –Industry designs (e.g. Affymetrix GenFlex Arrays) require that tags have fixed length (20 nucleotides)

16 c-Token Counting W=weight 1 nucleotide, S=weight 2 nucleotide G n = #strings of weight n:G 1 = 2; G 2 = 6; G n = 2G n-2 + 2G n-1 [Ben-Dor et al.00] c-tokens in a tag have tail weight  h-c+1  At most ( 2G c-1 +6G c-2 +8G c-3 )/(h-c+1) tags in a valid c-h code Token typeMax #tokensMax Tail Weight S2 G c-2 4 G c-2 S 4 G c-3 8 G c-3 W2 G c-1 S W2 G c-2 Total2G c-1 +4G c-2 +4G c-3 2G c-1 +6G c-2 +8G c-3

17 Upperbounds for Extended c-h Codes Theorem: The number of tags in a feasible extended c-h-l code is at most for odd c, and at most for even c

18 Algorithms Alphabetic tree search algorithm - Enumerate candidate tags in lexicographic order, save tags whose c-tokens are not used by previously selected tags - Easily modified to handle various combinations of constraints [MT 05] Optimum c-h codes can be computed in practical time for small values of c by using integer programming Maximum integer flow problem w/ set capacity constraints

19 Token Content of a Tag c=4 CCAGATT CC CCA CAG AGA GAT GATT Tag  sequence of c-tokens End pos: 2 3 4 5 6 7 c-token: CC  CCA  CAG  AGA  GAT  GATT

20 Layered c-token graph for length-l tags s t c1c1 cNcN ll-1 c/2(c/2)+1…

21 Integer Program Formulation O( hN) constraints & variables, where N = #c-tokens

22 Experimental Results

23 Periodic Tags c-token uniqueness constraint in c-h code formulation is too strong –c-tokens should not appear in different tags, but is OK to repeat a c-token in the same tag –Periodic tags make best use of available c-tokens [MT05] Tag set design can be cast as a maximum vertex-disjoint cycle packing problem Allowing periodic tags yields ~40% more tags

24 c-token factor graph, c=4 (incomplete) CC AAG AAC AAAA AAAT

25 Vertex-disjoint Cycle Packing Problem Given directed graph G, find maximum number of vertex disjoint directed cycles in G [MT 05] APX-hard even for regular directed graphs with in-degree and out-degree 2 –h-c/2+1 approximation factor for tag set design problem [Salavatipour and Verstraete 05] –Quasi-NP-hard to approximate within  (log 1-  n) –O(n 1/2 ) approximation algorithm

26 Tag Set Design Algorithm 1.Construct c-token factor graph G 2.T  {} 3.For all cycles C defining periodic tags, in increasing order of cycle length, do Add to T the tag defined by C Remove C from G 4.Perform an alphabetic tree search and add to T tags with no c-tokens in common with T 5.Return T

27 Experimental Results h

28 Results w/ Extended Constraints h

29 Overview  Background on DNA Microarrays  Universal DNA Tag Arrays  Tag Set Design Problem  Tag Assignment Problem  Conclusions

30 More Possible Mis-Hybridizations What can be done: Leave some tags unassigned Distribute primers over multiple arrays Here we focus on avoiding case (a), primer-to-tag hybridization

31 Constraints on tag assignment If primer p hybridizes with tag t, then either p or t must be left un-assigned, unless p is assigned to t p t t’ p’

32 Characterization of Assignable Sets [Ben-Dor 04] Set P is assignable to T iff X+Y  |P|, where, in the hybridization graph induced by P+T –X = number of primers incident to a degree 1 tag –Y = number of degree 0 tags Y=2 X=1

33 MAPS Problem Maximum Assignable Primer Set (MAPS) Problem: given primer set P and tag set T, find maximum size assignable subset of P [Ben-Dor 04] Greedy deletion heuristic: repeatedly delete primer of maximum weight from P until it becomes assignable, where –Potential of tag t is 2 -|P(t)| –Potential of primer p is sum of potentials of conflicting tags

34 Universal Array Multiplexing Problem Multiplexing Problem: given primer set P and tag set T, find partition of P into minimum number of assignable sets [Ben-Dor 04] Repeatedly find approximate MAPS

35 Integration with Probe Selection In practice, several primer candidates with equivalent functionality –In SNP genotyping, can pick primer from either forward and reverse strand –In gene expression/identification applications, many primers have desired length, Tm, etc.

36 Pooled Array Multiplexing Problem Given set of primer pools P and tag set T, find a primer from each pool and a partition of selected primers into minimum number of assignable sets

37 X+Y Characterization no Longer Holds

38 Pooled Multiplexing Algorithms 1.Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04]

39 Pooled Multiplexing Algorithms 2.Primer-Del+ = same, but never delete last primer from pool unless no other choice 3.Min-Pot = select primer with min potential from each pool, then run Primer-Del 4.Min-Deg = select primer with min conflict degree, then run Primer-Del 1.Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04]

40 Results: GenFlex Tags, c=8

41 Results: GenFlex Tags, c=7

42 Herpes B Gene Expression Assay TmTm # pools Pool size 500 tags1000 tags2000 tags # arrays% Util.# arrays% Util.# arrays% Util. 601446 1494.06297.20172.30 5496.132100.00172.30 671560 1496.53298.70178.00 5498.00299.90178.00 701522 1496.73298.90176.10 5497.80299.80176.10 TmTm # pools Pool size 500 tags1000 tags2000 tags # arrays% Util.# arrays% Util.# arrays% Util. 601446 1482.26365.35257.05 5488.26370.95263.55 671560 1486.33369.70261.15 5491.86376.00267.20 701522 1488.46373.65265.40 5492.26291.10270.30 GenFlex Tags Periodic Tags

43 Overview  Background on DNA Microarrays  Universal DNA Tag Arrays  Tag Set Design Problem  Tag Assignment Problem  Conclusions

44 Conclusions New techniques for tag set design and tag assignment lead to significantly improved multiplexing rates and more reliable assays Other applications of universal tags: –Lab-on-chip, DNA-mediated assembly (e.g., carbon nano- tubes), DNA computing [Brenneman&Condon 02] Other applications of partition/assignment techniques: –Genotyping by mass-spectroscopy [Aumann et al 05] –Genotyping using l-mer arrays

45 Acknowledgments UCONN Research Foundation


Download ppt "Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays Ion Mandoiu Claudia Prajescu Dragos Trinca Computer Science & Engineering Department."

Similar presentations


Ads by Google