
Bioinformatics, Vol.17 Suppl.1 (ISMB 2001)


1 Bioinformatics, Vol.17 Suppl.1 (ISMB 2001)
Probe Selection Algorithms with Applications in the Analysis of Microbial Communities
James Borneman et al.
Bioinformatics, Vol.17 Suppl.1 (ISMB 2001)
Summarized by Sun Kim

2 Overview
Goal: minimizing the number of oligonucleotide probes needed to analyze populations of rRNA clones by hybridization experiments on DNA microarrays.
Proposes two heuristics based on optimization techniques: simulated annealing and Lagrangian relaxation.

3 Introduction
Analysis using rRNA genes: DGGE (Denaturing Gradient Gel Electrophoresis), T-RFLP (Terminal Restriction Fragment Length Polymorphisms)
Goal: to develop a high-throughput approach for the examination of microbial communities
Adopted strategy: oligonucleotide fingerprinting

4 Oligonucleotide fingerprinting
rDNA clone libraries are constructed.
The clones are classified into clone types, or OTUs (Operational Taxonomic Units), by individual hybridization experiments on DNA microarrays with a series of short DNA oligonucleotides.
The nucleotide sequence of representative clones from each OTU can then be obtained by DNA sequencing.
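As a rough illustration of how fingerprints sort clones into types, the Python sketch below computes a binary fingerprint for each clone and groups clones whose fingerprints are identical. It assumes that exact substring matching stands in for a hybridization signal (real signals are noisy, and this sketch ignores mismatches and the complementary strand); the names `fingerprint` and `group_into_otus` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def fingerprint(clone: str, probes: list[str]) -> tuple[int, ...]:
    """Binary fingerprint: entry i is 1 if probes[i] occurs in the clone, else 0."""
    return tuple(1 if p in clone else 0 for p in probes)


def group_into_otus(clones: dict[str, str], probes: list[str]) -> list[list[str]]:
    """Group clone IDs with identical fingerprints; each group is one putative clone type."""
    groups: dict[tuple[int, ...], list[str]] = defaultdict(list)
    for clone_id, seq in clones.items():
        groups[fingerprint(seq, probes)].append(clone_id)
    return list(groups.values())


# Toy data (hypothetical sequences, far shorter than real rDNA clones).
clones = {"c1": "ACGTACGGT", "c2": "ACGTTTGCA", "c3": "ACGTACGGA", "c4": "TTGCAAGT"}
probes = ["ACGTAC", "TTGCA", "CGGT"]
print(group_into_otus(clones, probes))  # → [['c1'], ['c2', 'c4'], ['c3']]
```

Clones c2 and c4 share a fingerprint, so with these probes they fall into the same type even though their sequences differ; choosing probes that avoid such collisions is exactly the probe selection problem.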

5 Work on Probe Selection
Oligonucleotide fingerprinting yields a binary vector (called a fingerprint) that describes which probes occur in a clone.
Hybridization provides a linear fluorescence response over a range of 0-4 occurrences of a probe sequence per clone, but this response is not consistent enough to provide statistically reliable information.
Nevertheless, the strategy also adopts a non-binary model.
Two models are considered: binary membership, and frequency of occurrences up to 4.
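The two models differ only in what each fingerprint entry records. A minimal sketch, assuming that overlapping occurrences of a probe count separately (the slides do not say) and using hypothetical helper names, computes both the binary fingerprint and the frequency fingerprint capped at 4:

```python
def count_occurrences(probe: str, clone: str) -> int:
    """Count occurrences of probe in clone, including overlapping ones (an assumption)."""
    count, start = 0, clone.find(probe)
    while start != -1:
        count += 1
        start = clone.find(probe, start + 1)
    return count


def fingerprint(clone: str, probes: list[str], binary: bool = True, cap: int = 4) -> tuple[int, ...]:
    """Binary model: 0/1 membership. Non-binary model: occurrence count capped at `cap`."""
    values = []
    for p in probes:
        n = count_occurrences(p, clone)
        values.append(min(n, 1) if binary else min(n, cap))
    return tuple(values)


clone = "GATCGATCGATC"
probes = ["GATC", "CGAT", "CCCC"]
print(fingerprint(clone, probes))                    # → (1, 1, 0)
print(fingerprint(clone, probes, binary=False))      # → (3, 2, 0)
print(fingerprint("A" * 9, ["AAAA"], binary=False))  # 6 overlapping hits, capped → (4,)
```

Python's `str.count` counts only non-overlapping matches (it would report 2 for the last example), which is why the sketch scans with `str.find`.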

6 Basic Probe Selection Problem
A population C of m unknown rDNA clones.
To analyze C, we need to choose a set S of oligonucleotide probes of length l.
Clone length: approximately 1500 nucleotides; l: between 6 and 10.
A probe p distinguishes a pair of clones c and d if p is a substring of exactly one of c and d.
Goal: find a smallest set S of length-l probes such that any two distinct clones c and d from C are distinguished by at least one probe in S.
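The distinguishing condition can be checked directly. The following sketch, with hypothetical function names and toy sequences, tests whether a probe set S distinguishes every pair of distinct clones under the binary (substring) model:

```python
from itertools import combinations


def distinguishes(p: str, c: str, d: str) -> bool:
    """Probe p distinguishes c and d if it is a substring of exactly one of them."""
    return (p in c) != (p in d)


def distinguishes_all(S: list[str], clones: list[str]) -> bool:
    """True if every pair of distinct clones is distinguished by some probe in S."""
    return all(
        any(distinguishes(p, c, d) for p in S)
        for c, d in combinations(clones, 2)
        if c != d  # identical sequences can never be told apart
    )


clones = ["ACGTACGGT", "ACGTTTGCA", "ACGTACGGA"]
print(distinguishes_all(["ACGTAC"], clones))          # → False
print(distinguishes_all(["ACGTAC", "CGGT"], clones))  # → True
```

The single probe ACGTAC occurs in both the first and third clone, so that pair needs a second probe such as CGGT.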

7 Difficulties
We do not know the rDNA sequences in the population, so how can we compute a minimal probe set?
Even if we did have the complete sequences of these clones, computing optimal probe sets for large data sets is computationally infeasible.
A two-step approach is proposed to overcome these difficulties.

8 Two-step Approach
Step 1: Choose a random subset C’ of t rDNA clones from the given population, where t is a parameter chosen by empirical study, and sequence the clones in C’.
Step 2: Compute an optimal, or near-optimal, probe set S for C’, and use S to analyze the whole clone population.
Intuition: if the random subset C’ is large enough, the computed probe set S will be close to optimal for the whole population.
C’ may be augmented with known rDNA sequences available in databases (GenBank, Ribosomal Database).
This paper focuses on Step 2.
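In a simulation where the whole population is known, one can check the intuition behind the two-step approach by computing S on the sample C’ only and then listing the pairs of the full population that S fails to separate. This is a sketch under stated assumptions: `select_probes` is a hypothetical placeholder for whatever Step 2 algorithm is used, the binary model is assumed, and in a real study the full population's sequences would not be available for this check.

```python
import random
from itertools import combinations
from typing import Callable


def undistinguished_pairs(S: list[str], population: list[str]) -> list[tuple[str, str]]:
    """Pairs of distinct clone sequences that no probe in S tells apart (binary model)."""
    distinct = sorted(set(population))
    return [(c, d) for c, d in combinations(distinct, 2)
            if not any((p in c) != (p in d) for p in S)]


def two_step(population: list[str], t: int,
             select_probes: Callable[[list[str]], list[str]], seed: int = 0):
    rng = random.Random(seed)
    sample = rng.sample(population, t)  # Step 1: random subset C' (to be sequenced)
    S = select_probes(sample)           # Step 2: (near-)optimal probe set for C'
    # Evaluate S, computed on C' only, against the whole population.
    return S, undistinguished_pairs(S, population)


population = ["AACG", "AACT", "GACG", "GACT"]
S, missed = two_step(population, 2, lambda sample: ["AAC"])  # deliberately weak selector
print(missed)  # → [('AACG', 'AACT'), ('GACG', 'GACT')]
```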

9 Formulations of Probe Selection
MCPS (Minimum Cost Probe Set): find the minimum number of probes that distinguish all given clones; approached with Lagrangian relaxation.
MDPS (Maximum Distinguishing Probe Set): given k, find a set of k probes that maximizes the number of distinguished pairs of clones; approached with simulated annealing.
Both are variants of the combinatorial optimization problem SET COVER [Hochbaum, 1997].
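Since MCPS is a SET COVER instance (each probe "covers" the clone pairs it distinguishes), the classical greedy set-cover heuristic gives a simple baseline that is known to stay within a logarithmic factor of the optimum. The sketch below is that generic greedy method under the binary model, not the paper's Lagrangian algorithm, and its names are hypothetical:

```python
from itertools import combinations


def pairs_distinguished_by(p: str, clones: list[str]) -> set[tuple[int, int]]:
    """Index pairs (i, j) of clones that probe p distinguishes (binary model)."""
    return {(i, j) for i, j in combinations(range(len(clones)), 2)
            if (p in clones[i]) != (p in clones[j])}


def greedy_probe_set(candidates: list[str], clones: list[str]) -> list[str]:
    """Greedy SET COVER: repeatedly add the probe covering the most uncovered pairs."""
    universe = {(i, j) for i, j in combinations(range(len(clones)), 2)
                if clones[i] != clones[j]}
    covers = {p: pairs_distinguished_by(p, clones) for p in candidates}
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(candidates, key=lambda p: len(covers[p] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            raise ValueError("candidate probes cannot distinguish all clones")
        chosen.append(best)
        uncovered -= gain
    return chosen


candidates = ["ACGTAC", "CGGT", "TTGCA", "ACGT"]
clones = ["ACGTACGGT", "ACGTTTGCA", "ACGTACGGA"]
print(greedy_probe_set(candidates, clones))  # → ['ACGTAC', 'CGGT']
```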

10 Previous Work
Selection criteria used so far:
G+C content of the oligomers combined with their expected frequency [Fu et al., 1992; Cutichia et al., 1993]
frequencies of the oligomers in the clones [Drmanac et al., 1996]
free energy and melting temperature [Li and Stormo, 2000]
information theory (entropy maximization) [Herwig et al., 2000]
This work: the first formulation of probe selection as an explicit optimization problem.

11 Formulations of Probe Selection and Optimization Techniques
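Notation:
$C$: the set of clones
$P$: the set of preselected length-$l$ probes
$f(p,c)$: the number of occurrences of probe $p$ in clone $c$ (the binary model only records whether $f(p,c) > 0$; the non-binary model records counts up to 4)
Given a set $S \subseteq P$ of probes, the $S$-fingerprint of $c$ is the vector of values $(f(p,c))_{p \in S}$
A set $S$ distinguishes two clones $c$ and $d$ if $f(p,c) \neq f(p,d)$ for some $p \in S$
$D(S)$: the set of pairs of clones that are distinguished by $S$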

12 Formulations
MCPS is a special case of SET COVER [Hochbaum, 1997].
MDPS is a special case of MAXIMUM COVERAGE [Hochbaum, 1997].
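Written as integer programs, the two reductions look as follows. This is a standard textbook formulation rather than one quoted from the paper. Here $x_p = 1$ if probe $p$ is selected, $P_{cd} \subseteq P$ is the (hypothetically named) set of probes that distinguish clones $c$ and $d$, and $y_{cd} = 1$ if the pair is counted as distinguished:

```latex
% MCPS as SET COVER: every pair of distinct clones must be covered
\min \sum_{p \in P} x_p
\quad \text{s.t.} \quad \sum_{p \in P_{cd}} x_p \ge 1 \;\; \forall \{c,d\} \subseteq C,\ c \ne d,
\qquad x_p \in \{0,1\}

% MDPS as MAXIMUM COVERAGE: choose k probes, count covered pairs
\max \sum_{\{c,d\}} y_{cd}
\quad \text{s.t.} \quad \sum_{p \in P} x_p = k,
\quad y_{cd} \le \sum_{p \in P_{cd}} x_p \;\; \forall \{c,d\},
\qquad x_p,\ y_{cd} \in \{0,1\}
```

The MCPS program has one covering constraint for every pair of distinct clones, so its size grows quadratically with the number of clones.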

13 The Simulated Annealing Algorithm for MDPS
Neighborhood: two sets of probes are neighbors if one can be obtained from the other by substituting exactly one probe.
Three variants, according to the objective function: SA+entropy, SA+pairs, SA+Largest.
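A compact sketch of the SA+pairs variant, using the neighborhood defined above (swap exactly one probe for a candidate outside the current set). The geometric cooling schedule, the parameter values and the function names are assumptions made for illustration, not settings from the paper; the binary model is assumed.

```python
import math
import random
from itertools import combinations


def distinguished_pairs(S: list[str], clones: list[str]) -> int:
    """SA+pairs objective: number of clone pairs distinguished by S (binary model)."""
    return sum(1 for c, d in combinations(clones, 2)
               if any((p in c) != (p in d) for p in S))


def anneal(candidates: list[str], clones: list[str], k: int,
           steps: int = 2000, t0: float = 2.0, alpha: float = 0.995, seed: int = 0):
    rng = random.Random(seed)
    current = rng.sample(candidates, k)
    score = distinguished_pairs(current, clones)
    best, best_score, temp = list(current), score, t0
    for _ in range(steps):
        # Neighbor: replace exactly one probe by a candidate not already in the set.
        outside = [p for p in candidates if p not in current]
        if not outside:
            break
        neighbor = list(current)
        neighbor[rng.randrange(k)] = rng.choice(outside)
        new_score = distinguished_pairs(neighbor, clones)
        delta = new_score - score  # we maximize, so delta > 0 is an improvement
        # Metropolis rule: always accept improvements, sometimes accept worse sets.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, score = neighbor, new_score
            if score > best_score:
                best, best_score = list(current), score
        temp *= alpha  # geometric cooling
    return best, best_score


clones = ["ACGTACGGT", "ACGTTTGCA", "ACGTACGGA"]
print(anneal(["ACGTAC", "CGGT", "TTGCA", "ACGT"], clones, k=2))
```

Swapping in an entropy or largest-class objective only changes the scoring function; the neighborhood and acceptance rule stay the same.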

14 The Lagrangian Relaxation Algorithm for MCPS
LRSOLUTION: compute an optimal solution to the Lagrangian relaxation for a given vector of Lagrangian multipliers.
FEASIBLEEXTENSION: extend the solution obtained from LRSOLUTION to a feasible solution.
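For a set-cover-type program such as MCPS, the standard Lagrangian relaxation moves the covering constraints into the objective with multipliers $u_{cd} \ge 0$. Assuming the paper relaxes MCPS in this usual way (the slide does not show the formula), LRSOLUTION has a closed form. Here $P_{cd}$ denotes the probes that distinguish $c$ and $d$, and $r_p(u)$ is the reduced cost of probe $p$:

```latex
% Relax the covering constraints of MCPS with multipliers u_{cd} >= 0
\begin{aligned}
L(u) &= \min_{x \in \{0,1\}^{P}} \; \sum_{p \in P} x_p
        + \sum_{\{c,d\}} u_{cd} \Bigl(1 - \sum_{p \in P_{cd}} x_p\Bigr) \\
     &= \sum_{\{c,d\}} u_{cd} + \min_{x \in \{0,1\}^{P}} \sum_{p \in P} r_p(u)\, x_p,
     \qquad r_p(u) = 1 - \sum_{\{c,d\}:\ p \in P_{cd}} u_{cd}
\end{aligned}
% LRSOLUTION: x_p = 1 iff r_p(u) < 0, hence
% L(u) = \sum_{\{c,d\}} u_{cd} + \sum_{p} \min(0, r_p(u)) \le \mathrm{OPT}_{\mathrm{MCPS}}
```

Because $L(u)$ is a lower bound for every $u \ge 0$, the relaxed solution usually violates some covering constraints, which is why a separate FEASIBLEEXTENSION step is needed to obtain an actual probe set.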

15 Subgradient optimization
Finding a good multiplier vector
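Putting the pieces together, the sketch below runs LRSOLUTION, a greedy FEASIBLEEXTENSION and a subgradient update $u_{cd} \leftarrow \max\{0,\ u_{cd} + \theta g_{cd}\}$ with $g_{cd} = 1 - \sum_{p \in P_{cd}} x_p$ and the commonly used step size $\theta = \lambda\,(\mathrm{UB} - L(u))/\lVert g \rVert^2$, where UB is the size of the best feasible set so far. The greedy extension, the step-size rule, the decay of $\lambda$ and all names are assumptions; this is a generic set-cover scheme under the binary model, not the authors' implementation.

```python
from itertools import combinations


def lagrangian_mcps(candidates: list[str], clones: list[str],
                    iters: int = 100, lam: float = 2.0):
    """Return (best feasible probe set found, best lower bound L(u))."""
    # One relaxed covering constraint per pair of distinct clones.
    pairs = [(i, j) for i, j in combinations(range(len(clones)), 2)
             if clones[i] != clones[j]]
    cover = {q: {p for p in candidates
                 if (p in clones[q[0]]) != (p in clones[q[1]])} for q in pairs}
    if any(not cover[q] for q in pairs):
        raise ValueError("some pair of clones cannot be distinguished")

    u = {q: 0.0 for q in pairs}  # Lagrangian multipliers, kept >= 0
    best_bound, best_set = float("-inf"), None
    for _ in range(iters):
        # LRSOLUTION: select exactly the probes with negative reduced cost.
        reduced = {p: 1.0 for p in candidates}
        for q in pairs:
            for p in cover[q]:
                reduced[p] -= u[q]
        x = {p for p in candidates if reduced[p] < 0}
        bound = sum(u.values()) + sum(reduced[p] for p in x)
        best_bound = max(best_bound, bound)

        # FEASIBLEEXTENSION (greedy, an assumption): add probes until all pairs are covered.
        chosen = set(x)
        uncovered = [q for q in pairs if not (cover[q] & chosen)]
        while uncovered:
            p = max(candidates, key=lambda c: sum(c in cover[q] for q in uncovered))
            chosen.add(p)
            uncovered = [q for q in uncovered if p not in cover[q]]
        if best_set is None or len(chosen) < len(best_set):
            best_set = chosen

        # Subgradient step on the relaxed constraints.
        g = {q: 1 - len(cover[q] & x) for q in pairs}
        norm = sum(v * v for v in g.values())
        if norm == 0:
            break  # zero subgradient: u already maximizes L(u)
        step = lam * (len(best_set) - bound) / norm
        u = {q: max(0.0, u[q] + step * g[q]) for q in pairs}
        lam *= 0.95  # slowly shrink the step-size factor
    return sorted(best_set), best_bound


probes, lower = lagrangian_mcps(["ACGTAC", "CGGT", "TTGCA", "ACGT"],
                                ["ACGTACGGT", "ACGTTTGCA", "ACGTACGGA"])
print(probes)  # → ['ACGTAC', 'CGGT']
```

The gap between the size of the returned set and the lower bound indicates how far the heuristic solution can be from optimal.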

16 LR algorithm
The constraint matrix is very large, because it has one constraint for every pair of clones.

17 Experimental Results
Data sets:
Small-subunit ribosomal genes from GenBank
Large-subunit ribosomal genes from the Ribosomal Database Project II
Two sets of eubacteria samples

18 Data set 1

19 Data set 3 Binary distinguishability

20 Data set 3 Non-binary distinguishability

21 Data set 4

22 Results of the LR algorithm on data sets 1 and 2

23 Conclusions
Two heuristics are presented: simulated annealing and Lagrangian relaxation.
Both give promising results compared with the greedy algorithm [Herwig et al., 2000].
Future work: further variants of the algorithms, and speeding up the LR algorithm.

