1
Bioinformatics, Vol.17 Suppl.1 (ISMB 2001)
Probe Selection Algorithms with Applications in the Analysis of Microbial Communities
James Borneman et al.
Summarized by Sun Kim
2
Overview
Goal: minimize the number of oligonucleotide probes needed to analyze populations of rRNA clones by hybridization experiments on DNA microarrays.
Two heuristics based on optimization techniques are proposed:
- simulated annealing
- Lagrangian relaxation
3
Introduction
Analysis using rRNA genes:
- DGGE (Denaturing Gradient Gel Electrophoresis)
- T-RFLP (Terminal Restriction Fragment Length Polymorphisms)
Goal: to develop a high-throughput approach for the examination of microbial communities.
Adopted strategy: oligonucleotide fingerprinting.
4
Oligonucleotide fingerprinting
1. rDNA clone libraries are constructed.
2. The clones are classified into clone types, or OTUs (Operational Taxonomic Units), by a series of individual hybridization experiments on DNA microarrays with short DNA oligonucleotide probes.
3. The nucleotide sequences of representative clones from each OTU can then be obtained by DNA sequencing.
5
Work on Probe Selection
Oligonucleotide fingerprinting produces, for each clone, a binary vector (called a fingerprint) that describes which probes occur in the clone.
The experiments provide a linear fluorescence response over a range of 0-4 occurrences of a probe sequence per clone. However, this response is not consistent enough to provide statistically reliable information.
Nevertheless, a non-binary model is also adopted in the strategy. Two models are considered:
- binary membership
- frequency of occurrences, up to 4
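The two models can be made concrete with a minimal Python sketch. It assumes clones are plain nucleotide strings and that probe occurrences may overlap. A cap of 1 gives the binary membership model, and a cap of 4 gives the frequency model. The names `count_occurrences`, `fingerprint`, and `group_into_otus` are hypothetical. Grouping clones with identical fingerprints into OTUs is an illustrative simplification of the classification step, not the paper's procedure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Fingerprint = Tuple[int, ...]


def count_occurrences(probe: str, clone: str) -> int:
    """Number of (possibly overlapping) occurrences of probe in clone."""
    return sum(1 for i in range(len(clone) - len(probe) + 1)
               if clone.startswith(probe, i))


def fingerprint(clone: str, probes: List[str], cap: Optional[int] = None) -> Fingerprint:
    """Fingerprint over an ordered probe list; counts above `cap` are clipped.

    cap=1 -> binary membership model, cap=4 -> frequency model (0..4).
    """
    counts = [count_occurrences(p, clone) for p in probes]
    if cap is not None:
        counts = [min(n, cap) for n in counts]
    return tuple(counts)


def group_into_otus(clones: Dict[str, str], probes: List[str],
                    cap: Optional[int] = None) -> Dict[Fingerprint, List[str]]:
    """Group clone names by fingerprint: identical fingerprints -> same clone type."""
    groups: Dict[Fingerprint, List[str]] = defaultdict(list)
    for name, seq in clones.items():
        groups[fingerprint(seq, probes, cap)].append(name)
    return dict(groups)


if __name__ == "__main__":
    clones = {"a": "ACGTACGT", "b": "ACGTTTTT", "c": "TTTTTTTT"}
    print(group_into_otus(clones, ["TTTT"], cap=1))  # prints {(0,): ['a'], (1,): ['b', 'c']}
    print(group_into_otus(clones, ["TTTT"], cap=4))  # prints {(0,): ['a'], (2,): ['b'], (4,): ['c']}
```

With the single probe TTTT, the binary model puts clones b and c into one group. The frequency model capped at 4 separates all three clones.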
6
Basic Probe Selection Problem
A population C of m unknown rDNA clones. To analyze C, we need to choose a set S of oligonucleotide probes of length l.
Clone length: approximately 1500 nucleotides
l: between 6 and 10
A probe p distinguishes a pair of clones c and d if p is a substring of exactly one of c and d.
Goal: find a smallest set S of length-l probes such that any two distinct clones c and d from C are distinguished by at least one probe in S.
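The definition can be checked by brute force on toy data. The Python sketch below uses the binary model from this slide. It restricts candidates to length-l substrings that actually occur in some clone, because a probe absent from every clone cannot distinguish anything. It then tries subsets in order of increasing size. All function names are hypothetical. The search is exponential in the number of candidates, which illustrates why exact computation does not scale.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def distinguishes(p: str, c: str, d: str) -> bool:
    """Binary model: p is a substring of exactly one of c and d."""
    return (p in c) != (p in d)


def candidate_probes(clones: Sequence[str], l: int) -> List[str]:
    """All length-l substrings occurring in at least one clone, sorted.
    A probe that occurs in no clone cannot distinguish any pair."""
    return sorted({c[i:i + l] for c in clones for i in range(len(c) - l + 1)})


def separates_all(S: Sequence[str], clones: Sequence[str]) -> bool:
    """True if every pair of distinct clones is distinguished by some probe in S."""
    return all(any(distinguishes(p, c, d) for p in S)
               for c, d in combinations(clones, 2) if c != d)


def smallest_probe_set(clones: Sequence[str], l: int) -> Optional[Tuple[str, ...]]:
    """Exhaustive search by increasing size; exponential in the number of candidates."""
    candidates = candidate_probes(clones, l)
    for k in range(len(candidates) + 1):
        for S in combinations(candidates, k):
            if separates_all(S, clones):
                return S
    return None  # some pair of distinct clones has the same set of length-l substrings


if __name__ == "__main__":
    print(smallest_probe_set(["ACGTAC", "ACGTTT", "TTGTAC"], 3))  # prints ('ACG', 'GTA')
```

In the binary model, each probe splits the clones into two groups. As a result, at least ⌈log₂ m⌉ probes are needed for m clones. For the three toy clones this bound is 2, and the search attains it.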
7
Difficulties
We do not know the rDNA sequences in the population, so how can we compute a minimal probe set?
Even if we had the complete sequences of these clones, computing optimal probe sets for large data sets is computationally infeasible.
A two-step approach is proposed to overcome these difficulties.
8
Two-step Approach
Step 1: Choose a random subset C′ of t rDNA clones from the given population, where t is a parameter chosen by empirical study, and sequence the clones in C′.
Step 2: Compute an optimal, or near-optimal, probe set S for C′.
S is then used to analyze the whole clone population.
Intuition: if the random subset C′ is large enough, the computed probe set S will be close to optimal for the whole population.
C′ may be augmented with known rDNA sequences available in databases (GenBank, Ribosomal Database Project).
This paper focuses on Step 2.
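The workflow can be sketched in Python under the binary model. First, t clones are sampled and assumed to be already sequenced. Next, a probe set is computed for the sample alone. Finally, the sketch counts the pairs in the whole population that this probe set fails to separate. For illustration only, a plain greedy set-cover heuristic stands in for the probe-selection step; it is not the paper's SA or LR method. The names `two_step`, `greedy_probe_set`, and `undistinguished_pairs` are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def distinguishes(p: str, c: str, d: str) -> bool:
    """Binary model: p occurs in exactly one of the two clones."""
    return (p in c) != (p in d)


def greedy_probe_set(sample: Sequence[str], l: int) -> List[str]:
    """Hypothetical stand-in for probe selection (plain greedy set cover):
    repeatedly add the probe that separates the most still-unseparated pairs."""
    candidates = sorted({c[i:i + l] for c in sample for i in range(len(c) - l + 1)})
    if not candidates:
        return []
    uncovered = set(combinations(sorted(set(sample)), 2))
    chosen: List[str] = []
    while uncovered:
        best = max(candidates,
                   key=lambda p: sum(distinguishes(p, c, d) for c, d in uncovered))
        gained = {(c, d) for c, d in uncovered if distinguishes(best, c, d)}
        if not gained:
            break  # the remaining pairs share all length-l substrings
        chosen.append(best)
        uncovered -= gained
    return chosen


def undistinguished_pairs(S: Sequence[str], population: Sequence[str]) -> List[Tuple[str, str]]:
    """Pairs of distinct clone sequences in the whole population that S fails to separate."""
    return [(c, d) for c, d in combinations(sorted(set(population)), 2)
            if not any(distinguishes(p, c, d) for p in S)]


def two_step(population: Sequence[str], t: int, l: int, seed: int = 0):
    rng = random.Random(seed)
    sample = rng.sample(list(population), t)        # random subset C' (assumed sequenced)
    S = greedy_probe_set(sample, l)                 # probe set computed for C' only
    return S, undistinguished_pairs(S, population)  # how well S carries over to all clones
```

Running this for several values of t on sequences held out from the sample is one way to explore the empirical choice of t mentioned on the slide.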
9
Formulations of Probe Selection
MCPS (Minimum Cost Probe Set): find a minimum number of probes that distinguish all given clones. Approach: Lagrangian relaxation.
MDPS (Maximum Distinguishing Probe Set): for a given k, find a set of k probes that maximizes the number of distinguished pairs of clones. Approach: simulated annealing.
Both are variants of the combinatorial optimization problem SET COVER [Hochbaum, 1997].
10
Previous Work
Earlier probe selection criteria:
- G+C content of the oligomers combined with their expected frequency [Fu et al., 1992; Cuticchia et al., 1993]
- frequencies of the oligomers in the clones [Drmanac et al., 1996]
- free energy and melting temperature [Li and Stormo, 2000]
- information theory (entropy maximization) [Herwig et al., 2000]
This paper gives the first formulation of probe selection as an explicit optimization problem.
11
Formulations of Probe Selection and Optimization Techniques
Notation
$C$: set of clones
$P$: set of preselected length-$l$ probes
$f(p, c)$: number of occurrences of probe $p$ in clone $c$
Given a set $S \subseteq P$ of probes, the S-fingerprint of $c$ is the vector of values $f_S(c) = (f(p, c))_{p \in S}$
A set $S$ distinguishes two clones $c$ and $d$ if $f_S(c) \neq f_S(d)$
$D(S)$: the set of pairs of clones that are distinguished by $S$
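The notation translates into Python as follows. The sketch writes f(p, c) for the number of occurrences of probe p in clone c, compares S-fingerprints, and returns D(S) as index pairs so that clones with identical sequences are still counted as separate clones. It uses raw occurrence counts. Clipping the counts at 1 or 4 would give the binary or frequency model. The function names mirror the symbols but are otherwise hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def f(p: str, c: str) -> int:
    """f(p, c): number of (possibly overlapping) occurrences of probe p in clone c."""
    return sum(1 for i in range(len(c) - len(p) + 1) if c.startswith(p, i))


def s_fingerprint(S: Sequence[str], c: str) -> Tuple[int, ...]:
    """S-fingerprint of c: the vector of values f(p, c) for p in S (fixed order)."""
    return tuple(f(p, c) for p in S)


def D(S: Sequence[str], clones: List[str]) -> Set[Tuple[int, int]]:
    """Index pairs (i, j), i < j, of clones whose S-fingerprints differ."""
    fps = [s_fingerprint(S, c) for c in clones]
    return {(i, j) for i, j in combinations(range(len(clones)), 2) if fps[i] != fps[j]}


if __name__ == "__main__":
    clones = ["ACGTACGT", "ACGTTTTT", "TTTTTTTT"]
    S = ["ACGT", "TTTT"]
    print([s_fingerprint(S, c) for c in clones])  # prints [(2, 0), (1, 2), (0, 5)]
    # A pair is separated by S exactly when some single probe in S separates it:
    assert D(S, clones) == D(["ACGT"], clones) | D(["TTTT"], clones)
```

The final assertion checks the property D(S) = ⋃_{p∈S} D({p}) on this example. Two fingerprints differ exactly when at least one coordinate differs.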
12
Formulations
MCPS is a special case of SET COVER [Hochbaum, 1997].
MDPS is a special case of MAXIMUM COVERAGE [Hochbaum, 1997].
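The reduction can be spelled out in LaTeX. Here f(p, c) is the number of occurrences of probe p in clone c, f_S(c) is the S-fingerprint, D(S) is the set of pairs S distinguishes, and U is the set of pairs of distinct clones (assumed all distinguishable by some candidate probe). Each probe then plays the role of one set, D_p, in the cover:

```latex
U = \{\{c, d\} : c, d \in C,\ c \neq d\}, \qquad
D_p = D(\{p\}) = \{\{c, d\} \in U : f(p, c) \neq f(p, d)\}

f_S(c) \neq f_S(d) \iff \exists\, p \in S : f(p, c) \neq f(p, d)
\quad\Longrightarrow\quad
D(S) = \bigcup_{p \in S} D_p

\text{MCPS (SET COVER):}\quad \min_{S \subseteq P} |S| \quad \text{s.t.} \quad \bigcup_{p \in S} D_p = U

\text{MDPS (MAXIMUM COVERAGE):}\quad \max_{S \subseteq P,\ |S| = k} \Big| \bigcup_{p \in S} D_p \Big|
```

The union identity in the second line holds in both the binary and the frequency model. It is what lets these problems be treated as covering problems over pairs of clones.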
13
The Simulated Annealing Algorithm for MDPS
Neighborhood: two sets of probes are neighbors if one can be obtained from the other by substituting exactly one probe.
Three variants, according to the objective function: SA+entropy, SA+pairs, SA+Largest.
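The following is a minimal simulated-annealing sketch for MDPS in Python. It uses the neighborhood defined above together with a pairs-counting objective in the spirit of SA+pairs. The entropy and largest-class objectives are not reproduced. The temperature schedule (`t0`, `cooling`, `steps`) and all function names are illustrative assumptions, not the paper's settings.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def occurrences(p: str, c: str) -> int:
    """Number of (possibly overlapping) occurrences of p in c."""
    return sum(1 for i in range(len(c) - len(p) + 1) if c.startswith(p, i))


def pairs_objective(S: Sequence[str], clones: Sequence[str]) -> int:
    """Pairs-style objective: number of clone pairs whose S-fingerprints differ."""
    fps = [tuple(occurrences(p, c) for p in S) for c in clones]
    return sum(1 for a, b in combinations(fps, 2) if a != b)


def anneal_mdps(clones: Sequence[str], candidates: List[str], k: int,
                steps: int = 2000, t0: float = 1.0, cooling: float = 0.995,
                seed: int = 0) -> Tuple[List[str], int]:
    """Simulated annealing over k-subsets of candidates (neighbors: swap one probe)."""
    if not 1 <= k <= len(candidates):
        raise ValueError("need 1 <= k <= number of candidates")
    rng = random.Random(seed)
    current = rng.sample(candidates, k)
    score = pairs_objective(current, clones)
    best, best_score = list(current), score
    temp = t0
    for _ in range(steps):
        outside = [p for p in candidates if p not in current]
        if not outside:
            break  # k == len(candidates): only one k-subset exists
        neighbor = list(current)
        neighbor[rng.randrange(k)] = rng.choice(outside)  # substitute exactly one probe
        new_score = pairs_objective(neighbor, clones)
        delta = new_score - score  # maximizing, so delta > 0 is an improvement
        # Metropolis rule: accept improvements, accept worse moves with prob exp(delta / T)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, score = neighbor, new_score
            if score > best_score:
                best, best_score = list(current), score
        temp = max(temp * cooling, 1e-6)  # geometric cooling, floored to avoid division by 0
    return best, best_score
```

The sketch tracks the best set seen so far, because the annealing walk may move away from it while the temperature is still high.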
14
The Lagrangian Relaxation Algorithm for MCPS
LRSOLUTION: compute an optimal solution to the Lagrangian relaxation for a given Lagrangian multiplier vector.
FEASIBLEEXTENSION: extend the solution obtained from LRSOLUTION to a feasible solution.
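The slide names only the two procedures. The standard Lagrangian relaxation of the set-cover integer program is shown below as an illustration; this is the textbook construction for SET COVER, and the paper's exact formulation may differ. Here x_p ∈ {0, 1} selects probe p, D_p is the set of clone pairs that p distinguishes, U is the set of all pairs to be distinguished, and u_e ≥ 0 is the multiplier for pair e.

```latex
\min \sum_{p \in P} x_p
\quad \text{s.t.} \quad \sum_{p \in P:\, e \in D_p} x_p \ge 1 \quad \forall e \in U,
\qquad x_p \in \{0, 1\}

L(u) = \min_{x \in \{0,1\}^P} \Big[ \sum_{p \in P} x_p
       + \sum_{e \in U} u_e \Big( 1 - \sum_{p:\, e \in D_p} x_p \Big) \Big]
     = \sum_{e \in U} u_e + \sum_{p \in P} \min\big(0,\ \tilde{c}_p(u)\big),
\qquad \tilde{c}_p(u) = 1 - \sum_{e \in D_p} u_e

x_p(u) = 1 \iff \tilde{c}_p(u) < 0,
\qquad L(u) \le \mathrm{OPT}_{\mathrm{MCPS}} \quad \text{for all } u \ge 0
```

For any u ≥ 0, L(u) is a lower bound on the optimum. The solution x(u) generally leaves some pairs uncovered, and this is the gap a FEASIBLEEXTENSION-type step closes by adding probes. Doing so yields a real probe set, which serves as an upper bound.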
15
Subgradient optimization
Finding a good multiplier vector
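The whole loop can be sketched in Python. The sketch assumes the textbook set-cover relaxation: one multiplier u_e per pair e, and a reduced cost of 1 − Σ_{e∈D_p} u_e for probe p. It also assumes a Polyak-type step λ(UB − L(u))/‖g‖², where g is the subgradient. The function names echo LRSOLUTION and FEASIBLEEXTENSION but are hypothetical. The greedy repair, the starting λ = 2, and its decay factor are illustrative assumptions, not the paper's settings.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def cover_sets(clones: List[str], candidates: List[str]) -> Dict[str, Set[Pair]]:
    """D_p for each candidate p: index pairs whose occurrence counts of p differ."""
    def occ(p: str, c: str) -> int:
        return sum(1 for i in range(len(c) - len(p) + 1) if c.startswith(p, i))
    counts = {p: [occ(p, c) for c in clones] for p in candidates}
    pairs = list(combinations(range(len(clones)), 2))
    return {p: {(i, j) for i, j in pairs if counts[p][i] != counts[p][j]} for p in candidates}


def lr_solution(u: Dict[Pair, float], D: Dict[str, Set[Pair]]):
    """LRSOLUTION-style step: x_p = 1 iff reduced cost 1 - sum_{e in D_p} u_e < 0.
    Returns the chosen probes and the Lagrangian lower bound L(u)."""
    reduced = {p: 1.0 - sum(u[e] for e in Dp) for p, Dp in D.items()}
    x = {p for p, rc in reduced.items() if rc < 0}
    bound = sum(u.values()) + sum(min(0.0, rc) for rc in reduced.values())
    return x, bound


def feasible_extension(x: Set[str], D: Dict[str, Set[Pair]], U: Set[Pair]) -> Set[str]:
    """FEASIBLEEXTENSION-style step (greedy repair, an assumption): cover all of U."""
    S = set(x)
    uncovered = U - set().union(*(D[p] for p in S))
    while uncovered:
        p = max(sorted(D), key=lambda q: len(D[q] & uncovered))
        if not D[p] & uncovered:
            raise ValueError("some pairs cannot be distinguished by any candidate")
        S.add(p)
        uncovered -= D[p]
    return S


def subgradient(clones: List[str], candidates: List[str],
                iters: int = 200, lam: float = 2.0) -> Tuple[Set[str], float]:
    """Subgradient optimization of the multipliers; returns (probe set, best lower bound)."""
    D = cover_sets(clones, candidates)
    U = set().union(*D.values())  # pairs that at least one candidate distinguishes
    u = {e: 0.0 for e in U}
    best_S = feasible_extension(set(), D, U)  # initial upper bound
    best_lb = -math.inf
    for _ in range(iters):
        x, lb = lr_solution(u, D)
        best_lb = max(best_lb, lb)
        S = feasible_extension(x, D, U)
        if len(S) < len(best_S):
            best_S = S
        if math.ceil(best_lb - 1e-9) >= len(best_S):
            break  # the objective is integral, so best_S is provably optimal
        g = {e: 1 - sum(1 for p in x if e in D[p]) for e in U}  # subgradient at u
        norm2 = sum(v * v for v in g.values())
        if norm2 == 0:
            break  # safety: x covers every pair exactly once
        step = lam * (len(best_S) - lb) / norm2
        u = {e: max(0.0, u[e] + step * g[e]) for e in U}  # project onto u >= 0
        lam *= 0.95  # arbitrary decay schedule
    return best_S, best_lb
```

The returned pair gives both a probe set and a certificate of its quality: the optimum lies between ⌈L⌉ and the size of the returned set.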
16
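LR algorithm: adapted because the constraint matrix is very large. It has one covering constraint (row) per pair of clones, i.e. m(m − 1)/2 rows for m clones (for example, 499,500 rows for m = 1000).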
17
Experimental Results: Data sets
- Data set 1: small-subunit ribosomal genes from GenBank
- Data set 2: large-subunit ribosomal genes from the Ribosomal Database Project II
- Data set 3: eubacteria samples
- Data set 4: eubacteria samples
18
Data set 1
19
Data set 3 Binary distinguishability
20
Data set 3 Non-binary distinguishability
21
Data set 4
22
Results of the LR algorithm on data sets 1 and 2
23
Conclusions
Two heuristics are presented: simulated annealing (SA) and Lagrangian relaxation (LR).
Both give promising results compared with the greedy algorithm [Herwig et al., 2000].
Future work:
- variants of the algorithms
- speeding up the LR algorithm