An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure Jérôme Waldispühl, PhD School of Computer.

Slides:



Advertisements
Similar presentations
B. Knudsen and J. Hein Department of Genetics and Ecology
Advertisements

Bayesian Belief Propagation
6 - 1 Chapter 6 The Secondary Structure Prediction of RNA.
Optimal Design Laboratory | University of Michigan, Ann Arbor 2011 Design Preference Elicitation Using Efficient Global Optimization Yi Ren Panos Y. Papalambros.
RNA Structure Prediction
Spie98-1 Evolutionary Algorithms, Simulated Annealing, and Tabu Search: A Comparative Study H. Youssef, S. M. Sait, H. Adiche
Predicting RNA Structure and Function. Non coding DNA (98.5% human genome) Intergenic Repetitive elements Promoters Introns mRNA untranslated region (UTR)
Predicting RNA Structure and Function
Introduction to Bioinformatics - Tutorial no. 9 RNA Secondary Structure Prediction.
RNA Folding Xinyu Tang Bonnie Kirkpatrick. Overview Introduction to RNA Previous Work Problem Hofacker ’ s Paper Chen and Dill ’ s Paper Modeling RNA.
Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004.
Network Goodness and its Relation to Probability PDP Class Winter, 2010 January 13, 2010.
Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.
RNA Secondary Structure Prediction
Predicting RNA Structure and Function. Nobel prize 1989Nobel prize 2009 Ribozyme Ribosome RNA has many biological functions The function of the RNA molecule.
Discovery of RNA Structural Elements Using Evolutionary Computation Authors: G. Fogel, V. Porto, D. Weekes, D. Fogel, R. Griffey, J. McNeil, E. Lesnik,
CISC667, F05, Lec19, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) RNA secondary structure.
R AG P OOLS : RNA-As-Graph-Pools A Web Server to Assist the Design of Structured RNA Pools for In-Vitro Selection R AG P OOLS : RNA-As-Graph-Pools A Web.
Finding the optimal pairwise alignment We are interested in finding the alignment of two sequences that maximizes the similarity score given an arbitrary.
Fast identification and statistical evaluation of segmental homologies in comparative maps Peter Calabrese 1, Sugata Chakravarty 2 and Todd Vision 3 1.
Predicting RNA Structure and Function
Parameter estimate in IBM Models: Ling 572 Fei Xia Week ??
Predicting RNA Structure and Function. Nobel prize 1989 Nobel prize 2009 Ribozyme Ribosome.
RNA Secondary Structure Prediction Introduction RNA is a single-stranded chain of the nucleotides A, C, G, and U. The string of nucleotides specifies the.
Motif finding: Lecture 1 CS 498 CXZ. From DNA to Protein: In words 1.DNA = nucleotide sequence Alphabet size = 4 (A,C,G,T) 2.DNA  mRNA (single stranded)
Differential Evolution Hossein Talebi Hassan Nikoo 1.
Elements of the Heuristic Approach
Genetic Algorithms and Ant Colony Optimisation
Haplotype Blocks An Overview A. Polanski Department of Statistics Rice University.
FDA- A scalable evolutionary algorithm for the optimization of ADFs By Hossein Momeni.
Strand Design for Biomolecular Computation
Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,
CSC2535: Computation in Neural Networks Lecture 11: Conditional Random Fields Geoffrey Hinton.
Basic Monte Carlo (chapter 3) Algorithm Detailed Balance Other points.
From Structure to Function. Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
Outline Introduction Evolution Strategies Genetic Algorithm
Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction David PageSoumya Ray Department of Biostatistics and Medical Informatics Department.
CSCI 6900/4900 Special Topics in Computer Science Automata and Formal Grammars for Bioinformatics Bioinformatics problems sequence comparison pattern/structure.
RNA Secondary Structure Prediction. 16s rRNA RNA Secondary Structure Hairpin loop Junction (Multiloop)Bulge Single- Stranded Interior Loop Stem Image–
KIAS July 2006 RNA secondary structure Ground state and the glass transition of the RNA secondary structure RNA folding: specific versus nonspecific pairing.
Enumerating, Sampling and Counting: some illustrative cases in biology Enumerating discrete structures Sampling and searching Nested sampling for counting.
Analysis of biological networks Part III Shalev Itzkovitz Shalev Itzkovitz Uri Alon’s group Uri Alon’s group July 2005 July 2005.
RNA Structure Prediction
Roles of RNA mRNA (messenger) rRNA (ribosomal) tRNA (transfer) other ribonucleoproteins (e.g. spliceosome, signal recognition particle, ribonuclease P)
Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.
Expected accuracy sequence alignment Usman Roshan.
Chapter 9 Genetic Algorithms.  Based upon biological evolution  Generate successor hypothesis based upon repeated mutations  Acts as a randomized parallel.
August 30, 2004STDBM 2004 at Toronto Extracting Mobility Statistics from Indexed Spatio-Temporal Datasets Yoshiharu Ishikawa Yuichi Tsukamoto Hiroyuki.
Pre-mRNA secondary structures influence exon recognition Michael Hiller Bioinformatics Group University of Freiburg, Germany.
Application of the MCMC Method for the Calibration of DSMC Parameters James S. Strand and David B. Goldstein The University of Texas at Austin Sponsored.
Consensus Group Stable Feature Selection
Toward Quantitative Simulation of Germinal Center Dynamics Toward Quantitative Simulation of Germinal Center Dynamics Biological and Modeling Insights.
Fixed parameter algorithms for protein similarity search under mRNA structure constrains A joint work by: G. Blin, G. Fertin, D. Hermelin, and S. Vialette.
Heuristic Methods for the Single- Machine Problem Chapter 4 Elements of Sequencing and Scheduling by Kenneth R. Baker Byung-Hyun Ha R2.
Beyond ab initio modelling… Comparative and Boltzmann equilibrium Yann Ponty, CNRS/Ecole Polytechnique with invaluable help from Alain Denise, LRI/IGM,
Motif Search and RNA Structure Prediction Lesson 9.
MicroRNA Prediction with SCFG and MFE Structure Annotation Tim Shaw, Ying Zheng, and Bram Sebastian.
Computational Biology and Genomics at Boston College Biology Gabor T. Marth Department of Biology, Boston College
RNA Structure Prediction
Basic Monte Carlo (chapter 3) Algorithm Detailed Balance Other points non-Boltzmann sampling.
Lecture 14: Advanced Conformational Sampling Dr. Ronald M. Levy Statistical Thermodynamics.
Protein Structure Prediction: Threading and Rosetta BMI/CS 576 Colin Dewey Fall 2008.
Journal of Computational and Applied Mathematics Volume 253, 1 December 2013, Pages 14–25 Reporter : Zong-Dian Lee A hybrid quantum inspired harmony search.
An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure Jérôme Waldispühl, PhD School of Computer.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Intelligent Exploration for Genetic Algorithms Using Self-Organizing.
Vienna RNA web servers
Predicting RNA Structure and Function
RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010.
Xin-She Yang, Nature-Inspired Optimization Algorithms, Elsevier, 2014
CISC 467/667 Intro to Bioinformatics (Spring 2007) RNA secondary structure CISC667, S07, Lec19, Liao.
Presentation transcript:

An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure Jérôme Waldispühl, PhD School of Computer Science, McGill Centre for Bioinformatics, McGill University, Canada Yann Ponty, PhD Laboratoire d’informatique (LIX), École Polytechnique, France

Philippe Flajolet (1948 – 2011)

RNAmutants: Algorithms to explore the RNA mutational landscape Overview Understanding how mutations influence RNA secondary structures AND how structures influence mutations (Waldispühl et al., PLoS Comp Bio, 2008).

Sampling k-mutants CAGUGAUUGCAGUGCGAUGC (-1.20)..((.(((((...))))))) Classic: 0 mutation CAGUGAUUGCAGUGCGAUcC (-3.40)..(.((((((...))))))) CAGUGAUUGCAGUGCGgUGC (-0.30) ((.((....)).)) CAGUGAUcGCAGUGCGAUGC (-3.10).....(((((...))))).. RNAmutants: 1 mutation uAGcGccgGgAGacCGgcGC (-18.00)..(((((((....))))))) CccUGgccGCAagGCcAgGg (-20.40) ((((((((....)))))))) CcGUGgccGCgagGCcAcGg (-19.10) ((((((((....)))))))) RNAmutants: 10 mutations Seed Sample k mutations increasing the folding energy Consequence: it increases the C+G content

Objectives How to efficiently sample sequences at arbitrary C+G contents … without bias! C+G Content (%) Sample frequency Target C+G content

Outline Background: RNAmutants in a nutshell Algorithms to sample RNA secondary structures and mutations. Our approach: Adaptive sampling Uniformly shifting the distribution of samples. Results: Evolutionary studies Insights on the evolutionary pressure stemming from an optimization of the thermodynamical stability.

Outline Background: RNAmutants in a nutshell Algorithms to sample RNA secondary structures and mutations. Our approach: Adaptive sampling Uniformly shifting the distribution of samples. Results: Evolutionary studies Insights on the evolutionary pressure stemming from an optimization of the thermodynamical stability.

RNA secondary structure The secondary structure is the ensemble of base-pairs in the structure. Bracket notation: ((((((((…)))..((((….)))).(((…)))..)))))

Loop decomposition Stacking pairs

UUUACGGCUAGC Parameterization of the mutational landscape UCUGAAACCCGU UUUACGGCCAGC Sequence ensemble Structure ensemble CCUCAACGAAGC UCUACGGCCAGC UUUAAGGCCAGC 1-neighborhood (1 mutations)

Classical Recursions (Zuker & Stiegler, McCaskill) Enumerate all secondary structures

RNAmutants Generalize Classical Algorithms Enumerate all secondary structures over all mutants (Waldispuhl et al., ECCB, 2002)

Our approach  Explore the complete mutation landscape.  Polynomial time and space algorithm.  Compute the partition function for all sequences:  Sample by backtracking the dynamic prog. tables. RNAmutants (Waldispuhl et al., PLoS Comp Bio, 2008) RNAmutants: Single sequence:

Sampling k-mutants CAGUGAUUGCAGUGCGAUGC (-1.20)..((.(((((...))))))) Classic: 0 mutation CAGUGAUUGCAGUGCGAUcC (-3.40)..(.((((((...))))))) CAGUGAUUGCAGUGCGgUGC (-0.30) ((.((....)).)) CAGUGAUcGCAGUGCGAUGC (-3.10).....(((((...))))).. RNAmutants: 1 mutation uAGcGccgGgAGacCGgcGC (-18.00)..(((((((....))))))) CccUGgccGCAagGCcAgGg (-20.40) ((((((((....)))))))) CcGUGgccGCgagGCcAcGg (-19.10) ((((((((....)))))))) RNAmutants: 10 mutations Seed Sample k mutations increasing the folding energy Consequence: it increases the C+G content

Outline Background: RNAmutants in a nutshell Algorithms to sample RNA secondary structures and mutations. Our approach: Adaptive sampling Uniformly shifting the distribution of samples. Results: Evolutionary studies Insights on the evolutionary pressure stemming from an optimization of the thermodynamical stability.

UUUAAGGCUAGC Our approach: Weighting mutations UCUGAAACCCGU UUUAAGGCCAGC Sequence ensemble Structure ensemble CCUCAACGAAGC UAUAAGGCCAGC UUUAGGGCCAGC w -1 1 w Z w -1. Z C9U 1. Z U2A w. Z A5G Weighted by partition function value Promote A+U content Penalize C+G content No change

Weighting recursive equations ) × W(i,x) × W(j,y) ( × W(j,y)

C+G Content (%) Effect of weighted sampling Unweighted sampling weighted ( w =1/2) weighted ( w =2) Frequency of samples

Sampling pipe-line Keep all samples at the target C+G and reject others. Update w at each iteration using a bisection method. Stop when enough samples have been stored.

Some features After rejection, the weighted schema only impact the performance, not the probability. This is unbiased. Partition function can be written as a polynom: After n iterations we can to calculate all a i and inverse the polynom to compute the optimal weight w. Remark: In practice, less interations are necessary

Example: 40 nt., samples, 30 mutations, 70% C+G content Cumulative distribution

Outline Background: RNAmutants in a nutshell Algorithms to sample RNA secondary structures and mutations. Our approach: Adaptive sampling Uniformly shifting the distribution of samples. Results: Evolutionary studies Insights on the evolutionary pressure stemming from an optimization of the thermodynamical stability.

20 nucleotides40 nucleotides Low C+G-contents favor structural diversity Simulation at fixed G+C content from random seeds 10% 30% 50% 70% 90%

Low C+G contents favor internal loop insertion 10% 30% 50% 70% 90% Number of Internal Loops

20 nucleotides40 nucleotides High G+C-contents reduce evolutionary accessibility Simulation at fixed G+C content from random seeds 10% 30% 50% 70% 90%

Perspectives More studies of Sequence-Structure maps. Applications to RNA design. Same techniques can be applied to other parameters (e.g. number of base pairs). Can be generalized to multiple parameters.

Acknowledgments Ecole Polytechnique Jean-Marc Steyaert Boston College Peter Clote INRIA Philippe Flajolet MIT Bonnie Berger Srinivas Devadas Mieszko Lis Alex Levin Charles W. O’Donnell Google Inc. Behshad Behzadi Yann Ponty CNRS at LIX, École Polytechnique, France. University of Paris 6 Olivier Bodini University of Paris 11 Alain Denise

Would you like to know more?  O. Bodini, and Y. Ponty Multi-dimensional Boltzmann Sampling of Languages, Proceedings of AOFA'10, , 2010  J. Waldispühl, S. Devadas, B. Berger and P. Clote, Efficient Algorithms for Probing the RNA Mutation Landscape, Plos Computational Biology, 4(8):e ,