Design and Optimization of Universal DNA Arrays


1 Design and Optimization of Universal DNA Arrays
Ion Mandoiu Computer Science & Engineering Department University of Connecticut

2 Overview Background on DNA Microarrays DNA Tag Arrays
- Tag Set Design - Tag Assignment Problem New SBE/SBH Assay - Decoding and Multiplexing Algorithms Conclusions

3 Watson-Crick Complementarity
Four nucleotide types: A,C,T,G A’s paired with T’s (2 hydrogen bonds) C’s paired with G’s (3 hydrogen bonds)
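A minimal Python sketch of the pairing rule above. The names `COMPLEMENT`, `HYDROGEN_BONDS`, `reverse_complement` and `duplex_hydrogen_bonds` are illustrative and do not come from the slides; the sketch assumes strands pair antiparallel, so the complementary strand is the reverse complement.

```python
# Watson-Crick pairing: A<->T (2 hydrogen bonds), C<->G (3 hydrogen bonds).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(seq):
    """Return the strand that pairs with seq, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def duplex_hydrogen_bonds(seq):
    """Count hydrogen bonds in the perfect duplex formed by seq and its complement."""
    return sum(HYDROGEN_BONDS[base] for base in seq)


if __name__ == "__main__":
    print(reverse_complement("AACG"))     # CGTT
    print(duplex_hydrogen_bonds("AACG"))  # 2 + 2 + 3 + 3 = 10
```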

4 DNA Microarrays Exploit Watson-Crick complementarity to simultaneously perform a large number of substring tests Used in a variety of high-throughput genomic analyses Transcription (gene expression) analysis Single Nucleotide Polymorphism (SNP) genotyping Genomic-based microorganism identification Point-of-service diagnosis Alternative splicing, ChIP-on-chip, tiling arrays,… Common microarray formats involve direct hybridization between labeled DNA/RNA sample and DNA probes attached to a glass slide

5 SNP Genotyping Genome variation: 0.1% of the DNA different from one individual to another 80% of the variation is represented by Single Nucleotide Polymorphisms (SNPs) 2 possible nucleotides (alleles) for each SNP SNP genotyping = determining the alleles present at the SNP sites Highest throughput for SNP genotyping is achieved by high-density DNA microarrays based on direct hybridization

6 SNP genotyping via direct hybridization
SNP1 with alleles T/G, SNP2 with alleles A/G; 2 probes per SNP. Solid phase hybridization; optical scanning used to identify probes with complements in the mixture

7 Universal DNA Arrays Limitations of direct hybridization formats:
Arrays of cDNAs: inexpensive, but can only be used for transcription analysis Oligonucleotide arrays: flexible, but expensive unless produced in large quantities Universal DNA arrays: “programmable” arrays Array consists of application independent oligonucleotides Detection carried out by a sequence of reactions involving application specific primers Flexible AND cost effective Universal array architectures: DNA tag arrays, APEX arrays, SBE/SBH arrays

8 Overview Background on DNA Microarrays DNA Tag Arrays
- Tag Set Design - Tag Assignment Problem New SBE/SBH Assay - Decoding and Multiplexing Algorithms Conclusions

9 DNA Tag Arrays “Programmable” array format [Brenner 97, Morris et al. 98] Array consists of application independent array probes called tags The complements of the tags are called antitags Detection carried out by a sequence of reactions involving application specific primers and antitags

10 DNA Tag Arrays
1. Mix antitag+primer reporter probes with genomic DNA 2. Solution phase hybridization 3. Single-Base Extension (SBE): add labeled dideoxynucleotides and DNA polymerase 4. Solid phase hybridization (antitag to tag)

11 Universal Tag Array Advantages
Cost effective: Same array used in many analyses → can be mass produced Fast to customize: Only need to synthesize new set of reporter probes Reliable: Solution phase hybridization better understood than hybridization on solid support

12 Tag Set Design Problem (H1) Tags hybridize strongly to complementary antitags (H2) No tag hybridizes to a non-complementary antitag (H3) Tags do not cross-hybridize to each other Tag Set Design Problem: Find a maximum cardinality set of tags satisfying (H1)-(H3)

13 Hybridization Models Hamming distance model, e.g., [Marathe et al. 01]
Models rigid DNA strands LCS/edit distance model, e.g., [Torney et al. 03] Models infinitely elastic DNA strands c-token model [Ben-Dor et al. 00]: Duplex formation requires formation of nucleation complex between perfectly complementary substrings Nucleation complex must have weight ≥ c, where wt(A)=wt(T)=1, wt(C)=wt(G)=2 (2-4 rule)

14 c-h Code Problem c-token: left-minimal DNA string of weight ≥ c, i.e.,
w(x) ≥ c and w(x’) < c for every proper suffix x’ of x A set of tags is a c-h code if (C1) Every tag has weight ≥ h (C2) Every c-token is used at most once c-h Code Problem [Ben-Dor et al.00] Given c and h, find maximum cardinality c-h code [Ben-Dor et al.00] give approximation algorithm based on DeBruijn sequences
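The definitions above can be checked mechanically. The following Python sketch is an illustration only: the helper names (`weight`, `is_c_token`, `c_tokens`, `is_c_h_code`) are hypothetical, and it reads (C2) as "no c-token occurs more than once across the whole tag set".

```python
# 2-4 rule weights: weak bases (A, T) weigh 1, strong bases (C, G) weigh 2.
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s):
    return sum(WEIGHT[base] for base in s)


def is_c_token(x, c):
    """w(x) >= c and every proper suffix has weight < c.

    The longest proper suffix x[1:] is the heaviest one, so testing it suffices.
    """
    return weight(x) >= c and weight(x[1:]) < c


def c_tokens(tag, c):
    """List the c-token ending at each position of tag (where one exists)."""
    tokens = []
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        # Extend leftwards until the weight first reaches c.
        while start > 0 and w < c:
            start -= 1
            w += WEIGHT[tag[start]]
        if w >= c:
            tokens.append(tag[start:end])
    return tokens


def is_c_h_code(tags, c, h):
    """(C1) every tag has weight >= h; (C2) every c-token is used at most once."""
    seen = set()
    for tag in tags:
        if weight(tag) < h:
            return False
        for tok in c_tokens(tag, c):
            if tok in seen:
                return False
            seen.add(tok)
    return True


if __name__ == "__main__":
    print(c_tokens("AACGT", 4))                 # ['AAC', 'CG', 'CGT']
    print(is_c_h_code(["AACG", "TAAC"], 4, 5))  # False: both tags contain AAC
```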

15 Periodic Tags [MT05] Key observation: c-token uniqueness constraint in c-h code formulation is too strong A c-token should not appear in two different tags, but can be repeated in a tag Periodic tags use fewer c-tokens! → Tag set design can be cast as a cycle packing problem
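To illustrate the key observation, the Python sketch below compares the strict constraint (a c-token occurs at most once overall) with the relaxed one (a c-token may repeat inside a tag but not across tags). The function names and the two example tags are made up for illustration; they are not taken from [MT05].

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def c_tokens(tag, c):
    """List the c-token ending at each position of tag (where one exists)."""
    tokens = []
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        while start > 0 and w < c:
            start -= 1
            w += WEIGHT[tag[start]]
        if w >= c:
            tokens.append(tag[start:end])
    return tokens


def valid_strict(tags, c):
    """Strict reading: no c-token occurs twice anywhere in the tag set."""
    all_tokens = [tok for tag in tags for tok in c_tokens(tag, c)]
    return len(all_tokens) == len(set(all_tokens))


def valid_relaxed(tags, c):
    """Relaxed reading: a c-token may repeat within a tag, not across tags."""
    owner = {}
    for i, tag in enumerate(tags):
        for tok in set(c_tokens(tag, c)):
            if owner.setdefault(tok, i) != i:
                return False
    return True


if __name__ == "__main__":
    periodic = ["ACACACAC", "GTGTGTGT"]
    print(sorted(set(c_tokens(periodic[0], 4))))  # ['ACA', 'CAC']
    print(valid_strict(periodic, 4))              # False
    print(valid_relaxed(periodic, 4))             # True
```

Each periodic tag here consumes only two distinct 4-tokens, which is why relaxing the constraint leaves more c-tokens available for other tags.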

16 Vertex-disjoint Cycle Packing Problem
Given directed graph G, find maximum number of vertex disjoint directed cycles in G
[MT 05] APX-hard even for regular directed graphs with in-degree and out-degree 2; h-c/2+1 approximation factor for tag set design problem
[Salavatipour and Verstraete 05] Quasi-NP-hard to approximate within Ω(log^(1-ε) n); O(n^(1/2)) approximation algorithm

17 c-token factor graph, c=4 (incomplete)
[Figure: part of the c-token factor graph for c=4; c-tokens shown include CC, AAG, AAC, AAAA, AAAT]

18 Cycle Packing Algorithm
1. Construct c-token factor graph G
2. T ← {}
3. For all cycles C defining periodic tags, in increasing order of cycle length: add to T the tag defined by C; remove C from G
4. Perform an alphabetic tree search and add to T tags consisting of unused c-tokens
5. Return T
Gives an increase of over 40% in the number of tags compared to previous methods

19 Experimental Results h

20 Antitag-to-Antitag Hybridization
Additional practical constraint (ignored by Ben-Dor et al): antitags do not cross-hybridize, including self Formalization in c-token hybridization model: (C3) No two (anti)tags contain complementary substrings of weight ≥ c Cycle packing and tree search extend easily
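A rough Python check for (C3), offered as a sketch under two assumptions: "complementary" is taken to mean reverse complement, and a tag is allowed to conflict with itself. Because every substring of weight ≥ c ends in a c-token, it is enough to test whether the reverse complement of some c-token occurs in any tag. The function names are hypothetical.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def c_tokens(tag, c):
    """List the c-token ending at each position of tag (where one exists)."""
    tokens = []
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        while start > 0 and w < c:
            start -= 1
            w += WEIGHT[tag[start]]
        if w >= c:
            tokens.append(tag[start:end])
    return tokens


def violates_c3(tags, c):
    """True if two tags (possibly the same one) contain complementary
    substrings of weight >= c."""
    for t1 in tags:
        for tok in c_tokens(t1, c):
            rc = reverse_complement(tok)
            if any(rc in t2 for t2 in tags):
                return True
    return False


if __name__ == "__main__":
    print(violates_c3(["AACCA"], 4))         # False
    print(violates_c3(["ACGT"], 4))          # True: CG is its own reverse complement
    print(violates_c3(["AACC", "GGTT"], 4))  # True: AAC pairs with GTT
```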

21 Results w/ Extended Constraints
h

22 More Hybridization Constraints…
Enforced during tag assignment by - Leaving some tags unassigned and distributing primers across multiple arrays [Ben-Dor et al. 03] - Exploiting availability of multiple primer candidates [MPT05]

23 Assignable Primers If primer p hybridizes to tag t’, at most one of the assignments (p,t’), (p,t) and (p’,t’) can be made. Set P of primers is assignable to a set T of tags if the condition above is satisfied for every p,p’ and t,t’

24 Finding Assignable Primer Sets
Multiplexing Problem: given primer set P and tag set T, find partition of P into minimum number of assignable sets Maximum Assignable Primer Set Problem: given primer set P and tag set T, find a maximum size assignable subset of P Both problems are NP-hard [Ben-Dor 04]

25 Integration with Primer Selection
In practice, several primer candidates with equivalent functionality In SNP genotyping, can pick primer from either forward or reverse strand In gene expression/identification applications, many primers have desired length, Tm, etc.

26 Pooled Array Multiplexing Problem
Pooled Multiplexing Problem: Given set of primer pools P and tag set T, find a primer from each pool and a partition of selected primers into minimum number of assignable sets

27 Pooled Multiplexing Algorithms
Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04] Repeatedly delete primer of maximum potential until X+Y ≥ #pools, where Potential of tag t is 2^(-deg(t)) Potential of primer p is sum of potentials of conflicting tags Subtract ½ if primer adjacent to a tag of degree 1
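The potential function can be sketched in Python as follows. This assumes the tag potential is 2 raised to the power -deg(t), where deg(t) is the number of primers conflicting with tag t; the input format (a dict from primer to its set of conflicting tags) and the name `primer_potentials` are illustrative choices. The stopping rule and pool handling of Primer-Del are not reproduced here.

```python
from fractions import Fraction


def primer_potentials(conflicts):
    """conflicts maps each primer to the set of tags it hybridizes with.

    Returns a dict primer -> potential, using exact fractions.
    """
    # deg(t) = number of primers that conflict with tag t
    deg = {}
    for tags in conflicts.values():
        for t in tags:
            deg[t] = deg.get(t, 0) + 1

    potentials = {}
    for p, tags in conflicts.items():
        # Sum of tag potentials 2^(-deg(t)) over the conflicting tags.
        pot = sum((Fraction(1, 2 ** deg[t]) for t in tags), Fraction(0))
        # Subtract 1/2 if the primer is adjacent to a tag of degree 1.
        if any(deg[t] == 1 for t in tags):
            pot -= Fraction(1, 2)
        potentials[p] = pot
    return potentials


if __name__ == "__main__":
    conflicts = {"p1": {"t1", "t2"}, "p2": {"t2"}, "p3": set()}
    for primer, pot in sorted(primer_potentials(conflicts).items()):
        print(primer, pot)  # p1 1/4, p2 1/4, p3 0
```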

28 Pooled Multiplexing Algorithms
Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04] Primer-Del+ = same but never delete last primer from pool unless no other choice Min-Pot = select primer with min potential from each pool, then run Primer-Del Min-Deg = select primer with min degree, then run Primer-Del Iterative ILP = iteratively find a maximum assignable pool set using integer linear program

29 Results: 213 [MPT05] Tags, c=7

30 Herpes B Gene Expression Assay
GenFlex Tags
Tm # pools Pool size 500 tags 1000 tags 2000 tags # arrays % Util.
60 1446 1 4 82.26 3 65.35 2 57.05 5 88.26 70.95 63.55
67 1560 86.33 69.70 61.15 91.86 76.00 67.20
70 1522 88.46 73.65 65.40 92.26 91.10 70.30
Periodic Tags
Tm # pools Pool size 500 tags 1000 tags 2000 tags # arrays % Util.
60 1446 1 4 94.06 2 97.20 72.30 5 96.13 100.00
67 1560 96.53 98.70 78.00 98.00 99.90
70 1522 96.73 98.90 76.10 97.80 99.80

31 Overview Background on DNA Microarrays DNA Tag Arrays
- Tag Set Design - Tag Assignment Problem New SBE/SBH Assay - Decoding and Multiplexing Algorithms Conclusions

32 New SBE/SBH Assay
[Figure: primers (TTGCA, CCATT, GATAA) undergo single-base extension (SBE), followed by hybridization on a 2-mer array (SBH) with probes AA, AC, CC, CA, AT, AG, CG, CT, TT, TG, GG, GT, TA, TC, GC, GA]

33 Some notations P set of primers, X set of probes
Ep ⊆ {A,C,T,G} the set of possible extensions for primer p The spectrum of primer p, SpecX(p), is the set of probes hybridizing with p The extended spectrum of primer p with extension set Ep is denoted SpecX(p,Ep)

34 Decodable primer sets Four parallel single-color SBE/SBH experiments → one type of extension in each SBE experiment P is weakly decodable with respect to extension e if for every primer p One SBE/SBH experiment with 4 colors (4 extensions) P is weakly decodable if for every primer p and every extension e ∈ Ep

35 Strongly r-decodable primer sets
Hybridization involving labeled nucleotide is unreliable Informative probes should not rely on it Signal from one SNP may obscure signal from another when read at the same probe due to differences in DNA amplification efficiency Informative probes cannot be shared between SNPs P is strongly r-decodable if for every primer p where r = redundancy parameter

36 MPPP A set of primer pools P ={P1,…,Pn } is strongly r-decodable iff there is a primer pi in each pool Pi such that {p1,…,pn} is strongly r-decodable. Minimum Pool Partitioning Problem (MPPP) Given: primer pools set P and extensions sets Ep, for every primer p probe set X redundancy r Find: partition of P into the min number of strongly r-decodable subsets

37 MDPSP Maximum r-Decodable Pool Subset Problem (MDPSP)
Given: primer pools set P and extensions sets Ep, for every primer p probe set X redundancy r Find: strongly r-decodable subset of P of maximum size

38 Min-Greedy Algorithm for Maximum Induced Matching in General Graphs
Pick a vertex u of min degree Pick a vertex v of min degree from among u’s neighbors Add edge (u,v) to the matching Delete all neighbors of u and v from the graph Repeat the above steps until the graph becomes empty [Duckworth 05] d-1 approximation factor for d-regular graphs
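The greedy steps above can be sketched in Python as below. This is an illustration, not the implementation from [Duckworth 05]: the adjacency-dict input, the alphabetical tie-breaking and the up-front removal of isolated vertices (which can never be matched) are assumptions added to make the sketch runnable.

```python
def min_greedy_induced_matching(adj):
    """adj maps each vertex to the set of its neighbours (undirected graph).

    Returns a list of edges forming an induced matching.
    """
    g = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    matching = []

    def remove(vertices):
        for x in vertices:
            for y in g.pop(x, set()):
                if y in g:
                    g[y].discard(x)

    while True:
        # Assumption: isolated vertices cannot be matched, so drop them.
        remove([u for u in g if not g[u]])
        if not g:
            break
        # Vertex of minimum degree, then its minimum-degree neighbour.
        u = min(g, key=lambda x: (len(g[x]), x))
        v = min(g[u], key=lambda x: (len(g[x]), x))
        matching.append((u, v))
        # Neighbours of u include v and vice versa; {u, v} is listed for clarity.
        remove({u, v} | g[u] | g[v])
    return matching


if __name__ == "__main__":
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
            "d": {"c", "e"}, "e": {"d", "f"}, "f": {"e"}}
    print(min_greedy_induced_matching(path))  # [('a', 'b'), ('d', 'e')]
```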

39 Min-Greedy Algorithms for MDPSP
Bipartite hybridization graph G: Primers in left side, probes in right side Two types of edges: N+(p)=SpecX(p) N-(p)=SpecX(p,Ep) \ SpecX(p) Two algorithm variants: MinPrimerGreedy: pick primer first MinProbeGreedy: pick probe first Delete primer/probe if N+ degree drops below r (primer) / 1 (probe)

40 Experimental results for k-mers

41 Experimental Results for k-mers

42 Experimental results for c-tokens

43 Experimental Results c=13

44 Overview Background on DNA Microarrays DNA Tag Arrays
- Tag Set Design - Tag Assignment Problem New SBE/SBH Assay - Decoding and Multiplexing Algorithms Conclusions

45 Conclusions and Ongoing Work
Combinatorial algorithms yield significant increases in multiplexing rates of universal DNA arrays New SBE/SBH architecture particularly promising based on preliminary simulation results Ongoing work: Extend methods to more accurate hybridization models, e.g., use NN melting temperature models More complex (e.g., temperature dependent) DNA tag set non-interaction requirements for DNA self/mediated assembly Probabilistic decoding in presence of hybridization errors

46 Acknowledgments Claudia Prajescu and Dragos Trinca
Funding by NSF (CAREER Award IIS ) and UCONN Research Foundation

47 Backup Slides Microarray Technologies Number of c-tokens
Characterization of assignable sets Integer program for MAPS

48 Microarray Technologies
Arrays of cDNAs Obtained by reverse transcription from Expressed Sequence Tags (ESTs) Oligonucleotide arrays Short (20-60bp) synthetic DNA strands

49 Robotic cDNA Arrayers Pin Technology Quill Pen Technology
Ink jet Technology Pin Ring Technology

50 In-Place Oligonucleotide Synthesis
CG AC ACG AG G C Probes to be synthesized A

51 In-Place Oligonucleotide Synthesis
CG AC ACG AG G C Probes to be synthesized C A

52 In-Place Oligonucleotide Synthesis
CG AC ACG AG G C Probes to be synthesized C G G G A

53 Number of c-tokens
Token type    Num tokens
<c-2>S        2 G_{c-2}
<c-1>W        2 G_{c-1}
S<c-2>W       4 G_{c-2}
Total         G_c + 2 G_{c-1}
W = A or T, S = C or G
G_n = #strings of weight n ⇒ G_1 = 2; G_2 = 6; G_n = 2G_{n-2} + 2G_{n-1}

54 Number of c-tokens
c     Num c-tokens
5     208
6     568
7     1552
8     4240
9     11584
10    31648
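The counts in this table can be reproduced from the recurrence on the previous slide. The Python sketch below computes G_n iteratively (starting from the conventional values G_0 = 1 and G_{-1} = 0, an assumption that reproduces G_1 = 2 and G_2 = 6), evaluates the total G_c + 2 G_{c-1}, and cross-checks small cases by brute-force enumeration. The function names are illustrative.

```python
from itertools import product

WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def num_strings_of_weight(n):
    """G_n: number of DNA strings of weight exactly n (G_1 = 2, G_2 = 6)."""
    prev, cur = 0, 1  # G_{-1}, G_0
    for _ in range(n):
        prev, cur = cur, 2 * cur + 2 * prev
    return cur


def num_c_tokens(c):
    """Total number of c-tokens: G_c + 2 G_{c-1}."""
    return num_strings_of_weight(c) + 2 * num_strings_of_weight(c - 1)


def brute_force_c_tokens(c):
    """Count strings x with w(x) >= c whose longest proper suffix weighs < c."""
    count = 0
    for length in range(1, c + 1):  # a c-token has at most c bases
        for s in product("ACGT", repeat=length):
            w = sum(WEIGHT[base] for base in s)
            if w >= c and w - WEIGHT[s[0]] < c:
                count += 1
    return count


if __name__ == "__main__":
    for c in range(5, 11):
        print(c, num_c_tokens(c))
```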

55 Characterization of Assignable Sets
conflict graph: G=(T ∪ P,E), where (t,p) ∈ E if t hybridizes with p X = number of primers adjacent to a degree 1 tag Y = number of degree 0 tags (figure example: X=1, Y=2) [Ben-Dor 04] Set P is assignable to T iff X+Y ≥ |P|
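A small Python sketch of the X+Y test, assuming the conflict graph is given as a list of (tag, primer) edges. The function names and the example graph are hypothetical; the example merely happens to give X=1 and Y=2.

```python
def x_and_y(tags, edges):
    """Return (X, Y) for a conflict graph given as (tag, primer) edges.

    X = number of primers adjacent to a degree-1 tag
    Y = number of degree-0 tags
    """
    deg = {t: 0 for t in tags}
    for t, _p in edges:
        deg[t] += 1
    x = len({p for t, p in edges if deg[t] == 1})
    y = sum(1 for t in tags if deg[t] == 0)
    return x, y


def is_assignable(primers, tags, edges):
    """Characterization from the slide: P is assignable to T iff X + Y >= |P|."""
    x, y = x_and_y(tags, edges)
    return x + y >= len(primers)


if __name__ == "__main__":
    primers = ["p1", "p2", "p3"]
    tags = ["t1", "t2", "t3", "t4"]
    edges = [("t1", "p1"), ("t2", "p1"), ("t2", "p2")]
    print(x_and_y(tags, edges))                 # (1, 2)
    print(is_assignable(primers, tags, edges))  # True, since 1 + 2 >= 3
```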

56 X+Y Characterization Fails for Pools

57 Integer Linear Program for MAPS
where zpt = 1 iff primer p is assigned to tag t

