Bioinformatics, Vol.17 Suppl.1 (ISMB 2001)

Probe Selection Algorithms with Applications in the Analysis of Microbial Communities
James Borneman et al.
Bioinformatics, Vol. 17 Suppl. 1 (ISMB 2001)
Summarized by Sun Kim

Overview
The problem: minimizing the number of oligonucleotide probes needed for analyzing populations of rRNA gene (rDNA) clones by hybridization experiments on DNA microarrays.
Two heuristics based on optimization techniques are proposed: simulated annealing and Lagrangian relaxation.

Introduction
Analysis using rRNA genes: DGGE (Denaturing Gradient Gel Electrophoresis) and T-RFLP (Terminal Restriction Fragment Length Polymorphism).
Goal: to develop a high-throughput approach for the examination of microbial communities.
Adopted strategy: oligonucleotide fingerprinting.

Oligonucleotide fingerprinting
1. The rDNA clone libraries are constructed.
2. The clones are classified into clone types, or OTUs (Operational Taxonomic Units), by individual hybridization experiments on DNA microarrays with a series of short DNA oligonucleotides.
3. The nucleotide sequence of representative clones from each OTU can be obtained by DNA sequencing.

Work on Probe Selection
Oligonucleotide fingerprinting yields, for each clone, a binary vector (called a fingerprint) that describes which probes occur in the clone.
The experiments can provide a linear fluorescence response over a range of 0-4 occurrences of a probe sequence per clone, but this response is not consistent enough to provide statistically reliable information. Nevertheless, a non-binary model is also adopted in the strategy.
Two models are considered: binary membership, and frequency of occurrences up to 4.
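
The two fingerprint models can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the authors' code: the function names, the toy sequences, and the choice to count overlapping occurrences on a single strand are all hypothetical.

```python
def count_occurrences(probe: str, clone: str) -> int:
    """Count occurrences of probe in clone, overlapping ones included (assumption)."""
    count = 0
    start = clone.find(probe)
    while start != -1:
        count += 1
        start = clone.find(probe, start + 1)
    return count


def fingerprint(clone: str, probes: list[str], cap: int = 1) -> tuple[int, ...]:
    """Fingerprint of a clone: one value per probe, capped at `cap`.

    cap=1 gives the binary membership model; cap=4 gives the
    "frequency of occurrences up to 4" model.
    """
    return tuple(min(count_occurrences(p, clone), cap) for p in probes)


# Toy data, invented for illustration only.
clone = "ACGTACGTAC"
probes = ["ACG", "GTA", "TTT"]
binary = fingerprint(clone, probes, cap=1)
counts = fingerprint(clone, probes, cap=4)
print(binary, counts)
```

With a cap of 1 the vector only records presence or absence; a larger cap can separate clones that contain the same probes a different number of times.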

Basic Probe Selection Problem
Given a population C of m unknown rDNA clones, choose a set S of oligonucleotide probes of length l with which to analyze C.
In practice, the clone length is approximately 1500 and l is between 6 and 10.
A probe p distinguishes a pair of clones c and d if p is a substring of exactly one of c and d.
Goal: find a smallest set S of length-l probes such that any two distinct clones c and d from C are distinguished by at least one probe in S.
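
The definition of "distinguishes" translates directly into code. The following Python sketch is illustrative only: the helper names and toy clones are hypothetical, and it uses the binary substring model stated on this slide.

```python
from itertools import combinations


def distinguishes(probe: str, c: str, d: str) -> bool:
    """True if probe is a substring of exactly one of the clones c and d."""
    return (probe in c) != (probe in d)


def undistinguished_pairs(clones: list[str], probes: list[str]) -> list[tuple[int, int]]:
    """Index pairs of clones that no probe in the set tells apart."""
    return [
        (i, j)
        for (i, c), (j, d) in combinations(enumerate(clones), 2)
        if not any(distinguishes(p, c, d) for p in probes)
    ]


# Toy clones, invented for illustration.
clones = ["ACGTAC", "ACGGGT", "TTGTAC"]
one_probe = undistinguished_pairs(clones, ["GTA"])
two_probes = undistinguished_pairs(clones, ["GTA", "ACG"])
print(one_probe, two_probes)
```

A probe set S solves the basic problem for these clones exactly when `undistinguished_pairs` returns an empty list; the optimization question is how small S can be.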

Difficulties
We do not know the rDNA sequences in the population, so how can the minimal probe set be computed?
Even if we did have the complete sequences of these clones, computing optimal probe sets for large data sets is computationally infeasible.
A two-step approach is proposed to overcome these difficulties.

Two-step Approach
Step 1: Choose a random subset C' of t rDNA clones from the given population, where t is a parameter chosen by empirical study. Sequence the clones in C'.
Step 2: Compute an optimal, or near-optimal, probe set S for C'. Use S for analyzing the whole clone population.
Intuition: if the random subset C' is large enough, the computed probe set S will be close to optimal for the whole population.
C' may be augmented with known rDNA sequences available in databases (GenBank, Ribosomal Database Project).
This paper focuses on Step 2.
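
The surrounding pipeline can be sketched in Python to make the roles of C', t and S concrete. Everything here is a hypothetical illustration: the function names are invented, taking the candidate probes to be all length-l substrings of the sequenced sample is an assumption, and the optimization in Step 2 is deliberately left out.

```python
import random


def sample_clones(population: list[str], t: int, seed: int = 0) -> list[str]:
    """Step 1: draw a random subset C' of t clones (to be sequenced)."""
    rng = random.Random(seed)
    return rng.sample(population, t)


def candidate_probes(clones: list[str], length: int) -> list[str]:
    """All distinct length-`length` substrings of the sequenced clones (assumption)."""
    return sorted(
        {c[i:i + length] for c in clones for i in range(len(c) - length + 1)}
    )


def fraction_distinguished(clones: list[str], probes: list[str]) -> float:
    """Share of clone pairs whose binary fingerprints differ under `probes`."""
    fps = [tuple(p in c for p in probes) for c in clones]
    pairs = [(i, j) for i in range(len(fps)) for j in range(i + 1, len(fps))]
    if not pairs:
        return 1.0
    return sum(fps[i] != fps[j] for i, j in pairs) / len(pairs)


# Toy population, invented for illustration.
population = ["ACGT", "ACGA", "TTTT", "GGCA", "CATG"]
subset = sample_clones(population, t=3)
candidates = candidate_probes(subset, length=2)
# A probe set chosen on `subset` would then be scored on the whole population:
print(fraction_distinguished(population, candidates))
```

Scoring the selected probes on the full population, as in the last line, is one way to check the intuition that a large enough sample gives a probe set that is close to optimal overall.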

Formulations of Probe Selection
MCPS (Minimum Cost Probe Set): the minimum number of probes that distinguish all given clones. → Lagrangian relaxation
MDPS (Maximum Distinguishing Probe Set): a set of k probes, where k is given, that maximizes the number of distinguished pairs of clones. → Simulated annealing
Both are variants of the combinatorial optimization problem SET COVER [Hochbaum, 1997].

Previous Work
Selection criteria used in earlier work:
G+C content of the oligomers combined with their expected frequency [Fu et al., 1992; Cuticchia et al., 1993]
Frequencies of the oligomers in the clones [Drmanac et al., 1996]
Free energy and melting temperature [Li and Stormo, 2000]
Information theory (entropy maximization) [Herwig et al., 2000] → first formulation as an explicit optimization problem

Formulations of Probe Selection and Optimization Techniques
Notation:
The set of clones (C) and the set of preselected length-l probes.
The number of occurrences of a probe p in a clone c.
Given a set S of probes, the S-fingerprint of c is the vector of these occurrence values for the probes in S.
A set S distinguishes two clones c and d if their S-fingerprints differ.
The set of pairs of clones that are distinguished by S.

Formulations
MCPS is a special case of SET COVER [Hochbaum, 1997].
MDPS is a special case of MAXIMUM COVERAGE [Hochbaum, 1997].

The Simulated Annealing Algorithm for MDPS
Neighborhood: two sets of probes are neighbors if one can be obtained from the other by substituting exactly one of the probes.
Variants, named according to their objective functions: SA+entropy, SA+pairs, SA+Largest.
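
A minimal simulated annealing loop over this neighborhood might look as follows in Python. It is a sketch under assumptions: the objective is the number of distinguished clone pairs (the MDPS objective; whether this matches the slide's "SA+pairs" variant exactly is not confirmed by the source), and the geometric cooling schedule, parameter values and function names are all hypothetical.

```python
import math
import random
from itertools import combinations


def num_distinguished_pairs(clones: list[str], probes: list[str]) -> int:
    """Number of clone pairs whose binary fingerprints differ."""
    fps = [tuple(p in c for p in probes) for c in clones]
    return sum(a != b for a, b in combinations(fps, 2))


def anneal_probe_set(clones, candidates, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for k probes maximizing the number of distinguished pairs."""
    rng = random.Random(seed)
    current = rng.sample(candidates, k)
    cur_score = num_distinguished_pairs(clones, current)
    best, best_score = list(current), cur_score
    temp = t0
    for _ in range(steps):
        unused = [p for p in candidates if p not in current]
        if not unused:
            break  # no neighbor exists when every candidate is already used
        # Neighbor: substitute exactly one probe of the current set.
        neighbor = list(current)
        neighbor[rng.randrange(k)] = rng.choice(unused)
        score = num_distinguished_pairs(clones, neighbor)
        delta = score - cur_score
        # Always accept improvements; accept worse sets with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = neighbor, score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
        temp *= cooling  # geometric cooling (assumed schedule)
    return best, best_score


# Toy instance, invented for illustration.
clones = ["ACGTAC", "ACGGGT", "TTGTAC", "TTGGGT"]
candidates = ["ACG", "GTA", "GGG", "TTG"]
best, best_score = anneal_probe_set(clones, candidates, k=2)
print(best, best_score)
```

Recomputing the objective from scratch for every neighbor keeps the sketch short; a practical implementation would update the score incrementally, since only one probe changes per move.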

The Lagrangian Relaxation Algorithm for MCPS
LRSOLUTION: computes an optimal solution to the Lagrangian relaxation for a given Lagrangian multiplier vector.
FEASIBLEEXTENSION: extends the solution obtained from LRSOLUTION to a feasible solution.
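
For orientation, the textbook Lagrangian relaxation of SET COVER with unit costs can be written out as follows. The symbols are assumptions introduced here, not notation from the slides: row i ranges over the r pairs of clones, column j over the n candidate probes, a_{ij} = 1 if probe j distinguishes pair i, and u_i >= 0 is the multiplier of pair i.

```latex
\begin{align*}
\text{MCPS as SET COVER:}\quad
  & \min \sum_{j=1}^{n} x_j
    \quad \text{s.t.} \quad \sum_{j=1}^{n} a_{ij} x_j \ge 1 \;\; (i = 1, \dots, r),
    \quad x_j \in \{0, 1\} \\
\text{Lagrangian relaxation:}\quad
  & L(u) = \min_{x \in \{0,1\}^n}
    \Big[ \sum_{j=1}^{n} \Big( 1 - \sum_{i=1}^{r} u_i a_{ij} \Big) x_j
          + \sum_{i=1}^{r} u_i \Big],
    \quad u \ge 0 \\
\text{Relaxed optimum:}\quad
  & x_j = 1 \iff 1 - \sum_{i=1}^{r} u_i a_{ij} < 0
\end{align*}
```

For every u >= 0 the value L(u) is a lower bound on the size of a minimum probe set. The relaxed solution x may leave some pairs undistinguished, which is why an extension step to a feasible solution is needed.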

Subgradient optimization Finding a good multiplier vector
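
A compact Python sketch of subgradient optimization for the set-cover relaxation is given below. It follows the standard textbook scheme rather than the paper's implementation: the step-size rule (a scaled gap divided by the squared subgradient norm), the decay factor, the greedy extension step, and all function names are assumptions. Rows stand for clone pairs and columns for probes, each with unit cost.

```python
def lr_solution(cover, u):
    """Solve the relaxation for multipliers u; cover[j] = rows covered by column j."""
    reduced = [1.0 - sum(u[i] for i in rows) for rows in cover]
    x = [rc < 0 for rc in reduced]  # take exactly the negative reduced costs
    bound = sum(u) + sum(rc for rc in reduced if rc < 0)
    return x, bound


def feasible_extension(cover, num_rows, x):
    """Greedily add columns until every row is covered."""
    chosen = [j for j, on in enumerate(x) if on]
    uncovered = set(range(num_rows))
    for j in chosen:
        uncovered -= cover[j]
    while uncovered:
        j = max(range(len(cover)), key=lambda col: len(cover[col] & uncovered))
        if not cover[j] & uncovered:
            raise ValueError("some row cannot be covered by any column")
        chosen.append(j)
        uncovered -= cover[j]
    return chosen


def subgradient_optimize(cover, num_rows, iters=100, lam=1.0, decay=0.95):
    """Return (best lower bound, best feasible cover found)."""
    u = [0.0] * num_rows
    best_bound, best_cover = float("-inf"), None
    for _ in range(iters):
        x, bound = lr_solution(cover, u)
        best_bound = max(best_bound, bound)
        sol = feasible_extension(cover, num_rows, x)
        if best_cover is None or len(sol) < len(best_cover):
            best_cover = sol
        # Subgradient: g_i = 1 - (number of chosen columns covering row i).
        g = [1 - sum(1 for j, on in enumerate(x) if on and i in cover[j])
             for i in range(num_rows)]
        norm = sum(v * v for v in g)
        if len(best_cover) - best_bound < 1e-9 or norm == 0:
            break  # bounds meet, or the relaxed solution is already feasible
        step = lam * (len(best_cover) - bound) / norm
        u = [max(0.0, ui + step * gi) for ui, gi in zip(u, g)]
        lam *= decay
    return best_bound, best_cover


# Toy instance: four rows on a cycle, each column covers two adjacent rows.
cover = [{0, 1}, {2, 3}, {1, 2}, {0, 3}]
best_bound, best_cover = subgradient_optimize(cover, num_rows=4)
print(best_bound, best_cover)
```

When the lower bound meets the size of the best feasible cover, as it does on this toy instance, the cover is provably optimal; otherwise the gap between the two values bounds how far the heuristic solution can be from the optimum.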

LR algorithm
Difficulty: the constraint matrix is very large.

Experimental Results
Data sets:
1. 1158 small-subunit ribosomal genes from GenBank
2. 131 large-subunit ribosomal genes from the Ribosomal Database Project II
3. 5000 eubacteria samples
4. 2000 eubacteria samples

Data set 1

Data set 3 Binary distinguishability

Data set 3 Non-binary distinguishability

Data set 4

Results of the LR algorithm on data sets 1 and 2

Conclusions
Two heuristics are presented: simulated annealing (SA) and Lagrangian relaxation (LR).
They give promising results in comparison with the greedy algorithm [Herwig et al., 2000].
Future work: some variants of the algorithms; speeding up the LR algorithm.