Proposal for the RNA Alignment Ontology Rob Knight Dept Chem & Biochem CU Boulder.

Presentation transcript:

Proposal for the RNA Alignment Ontology Rob Knight Dept Chem & Biochem CU Boulder

What do we want to do?
Represent detailed structural info and other metadata on the alignment
Avoid horizontal and vertical expansion
Explicitly annotate correspondences at the level where they occur

Homology is problematic…
Fundamental problem: systems that are homologous at one level are not necessarily homologous at other levels
E.g. bat wings and bird wings: homologous as pentadactyl limbs, but not homologous as wings
Homology is hierarchical and can partially overlap at any level (e.g. Griffiths 2006)
[Figure from Ridley, “Evolution”, 3rd ed. Labels: bat forelimbs, bird forelimbs, frog forelimbs, rodent forelimbs, mammal forelimbs, tetrapod forelimbs]

…and correspondence need not be homology at all!
Example from SELEX: hammerhead ribozymes independently evolved at least three times: in nature, and in Jack Szostak’s and Ron Breaker’s labs
However, we still want to be able to align the functionally equivalent sequences even though there is no evolutionary relationship

So what are we going to use the alignment ontology for?

Use case 1: aligning rRNA

Problem: have millions of fragments, want to align (incl. noncanonical pairs) + assign named regions

Solution
Use existing alignment, try to fit new seqs in
Would be improved if we could explicitly annotate helices, noncanonical pairs, etc. on the sequence overall
For display, need to easily show/hide groups of sequences and/or regions of the sequence

Use case 2: SELEX
From a large number of unaligned sequences, want to identify motifs like this (Majerfeld & Yarus 2005)

How is this currently done?
Find regions that are similar in more sequences than chance
Group these sequences centered on the “motif”
See if the parts of the motif can be related by helices
See if anything else is reliably found by the motif
Repeat for other families and see if there are relationships between them
Group these families together, then iterate
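The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the procedure used by the authors: the function names, the use of exact k-mers, and the fixed `min_fraction` cutoff (standing in for a real "more than chance" significance test) are all assumptions made for the example.

```python
from collections import defaultdict


def shared_kmers(sequences, k, min_fraction):
    """Return {k-mer: [indices of sequences containing it]} for k-mers that
    occur in at least min_fraction of the sequences.

    Assumption: a fixed fraction replaces a proper statistical test.
    """
    seqs_with_kmer = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            seqs_with_kmer[seq[start:start + k]].add(idx)
    threshold = min_fraction * len(sequences)
    return {
        kmer: sorted(ids)
        for kmer, ids in seqs_with_kmer.items()
        if len(ids) >= threshold
    }


def center_on_motif(sequences, motif):
    """Left-pad every sequence that contains the motif with '-' so that the
    first occurrence of the motif starts in the same column."""
    hits = [(seq.find(motif), seq) for seq in sequences if motif in seq]
    if not hits:
        return []
    max_offset = max(pos for pos, _ in hits)
    return ["-" * (max_offset - pos) + seq for pos, seq in hits]


# Hypothetical toy pool, not real SELEX data.
pool = ["AAGGCUGG", "CGGCUAAA", "UUUGGCUC", "ACACACAC"]
candidates = shared_kmers(pool, k=4, min_fraction=0.75)
for motif in candidates:
    for row in center_on_motif(pool, motif):
        print(row)
```

The later steps (relating motif parts by helices, comparing families) need structural information and are not covered by this sketch.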

E.g. here we discovered that an unpaired G is important

So how do we handle all this? A proposal
Entities:
sequence_region: a thing that defines a set of bases relative to some sequence (i.e. with indices for each base)
stem: two regions linked by pairs
unbroken_stem: two regions completely paired
base: region that consists of a single nucleotide
base_pair: region that consists of two paired bases
canonical_base_pair: base pair that is cis-WW
terminal_loop: contiguous sequence_region stretching from i to j such that i-1 and j+1 are a base pair
internal_loop: unpaired region that interrupts one unbroken_stem
junction: unpaired region that connects two or more stems
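As a rough illustration of how some of these entities could be represented, the sketch below models a sequence_region as a set of indices and checks the terminal_loop definition literally. The class and function names, the 0-based indexing, and the use of dot-bracket notation (nested pairs only) as input are assumptions for the example, not part of the proposal.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

Pair = Tuple[int, int]  # (5' index, 3' index), 0-based (assumed convention)


@dataclass(frozen=True)
class SequenceRegion:
    """A set of base indices relative to one sequence."""
    sequence_id: str
    indices: FrozenSet[int]

    def is_contiguous(self) -> bool:
        if not self.indices:
            return False
        return max(self.indices) - min(self.indices) + 1 == len(self.indices)


def pairs_from_dot_bracket(structure: str) -> Set[Pair]:
    """Collect nested base pairs from a balanced dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def is_terminal_loop(region: SequenceRegion, pairs: Set[Pair]) -> bool:
    """Contiguous region from i to j such that i-1 and j+1 form a base pair."""
    if not region.is_contiguous():
        return False
    i, j = min(region.indices), max(region.indices)
    return (i - 1, j + 1) in pairs


# Hypothetical hairpin: four pairs closing a four-base loop.
hairpin_pairs = pairs_from_dot_bracket("((((....))))")
loop = SequenceRegion("example_rna", frozenset(range(4, 8)))
print(is_terminal_loop(loop, hairpin_pairs))
```

Storing explicit indices rather than a start and end lets the same type describe discontinuous regions such as junctions.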

So how do we handle all this? A proposal (cont’d)
Relationships:
correspondence: relation among set of sequence_regions implying all share a feature (with metadata about how determined)
homology: correspondence implying continuous chain of descent preserving the relation
sequence_similarity: correspondence implying regions are similar in primary sequence
two_d_structure_similarity: correspondence implying regions are similar in 2D structure, i.e. nested canonical base pairs
secondary_structure_similarity: correspondence implying regions are similar in secondary structure, i.e. incl. pseudoknots/noncanonicals
tertiary_structure_similarity: correspondence implying regions are similar in 3D structure
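One possible way to encode the idea that homology and the similarity relations are all kinds of correspondence is a small class hierarchy. The sketch below is only an illustration: the class names mirror the terms above, but the `Region` tuple encoding, the `determined_by` field, and every identifier in the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical encoding of a sequence_region: (sequence_id, start, end).
Region = Tuple[str, int, int]


@dataclass(frozen=True)
class Correspondence:
    """Relation among a set of regions that all share some feature."""
    regions: FrozenSet[Region]
    determined_by: str  # metadata about how the relation was determined


@dataclass(frozen=True)
class Homology(Correspondence):
    """Correspondence implying a continuous chain of descent."""


@dataclass(frozen=True)
class SecondaryStructureSimilarity(Correspondence):
    """Correspondence implying similar secondary structure."""


def of_kind(annotations, kind):
    """Select the annotations that are instances of the given relation."""
    return [a for a in annotations if isinstance(a, kind)]


# Toy data: the region names and coordinates are made up.
natural = ("natural_hammerhead", 0, 40)
selected = ("selex_hammerhead", 5, 45)
rrna_a = ("rRNA_species_A", 100, 120)
rrna_b = ("rRNA_species_B", 98, 118)

annotations = [
    SecondaryStructureSimilarity(frozenset({natural, selected}), "manual structural alignment"),
    Homology(frozenset({rrna_a, rrna_b}), "phylogenetic analysis"),
]

print(len(of_kind(annotations, Correspondence)))  # every annotation is a correspondence
print(len(of_kind(annotations, Homology)))
```

With this layout the independently evolved hammerheads can be aligned through a structural correspondence without asserting homology between them.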

So how do we handle all this? A proposal (cont’d)
Relationships:
pairing: relation that asserts that two sequence_regions each have parts of at least one base_pair that connects them
stem_pairing: pairing that includes several base_pairs (not necessarily contiguous) between two sequence_regions
unbroken_stem_pairing: stem_pairing that includes no bases in the sequence_regions that are not paired with the other sequence_region, in order
base_pairing: pairing that connects exactly two bases, annotated with the Leontis-Westhof classification
More exotic uses for alignment:
microrna_target: pairing relation in which one member is a miRNA and the other is an mRNA according to SO
same_microrna_target: a relation among a set of sequences that have microrna_target relation to the same miRNA
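The distinctions among the pairing relations can be made concrete with a small classifier. This is a sketch under stated assumptions: `classify_pairing` is a hypothetical helper, "several" is read as two or more, "in order" is read as antiparallel order (first base of one region paired with the last base of the other), and the Leontis-Westhof annotation on base_pairing is not modelled.

```python
def classify_pairing(region_a, region_b, pairs):
    """Name the most specific pairing relation between two regions.

    region_a, region_b: iterables of base indices within the same sequence.
    pairs: iterable of (i, j) base pairs.
    Returns one of 'unbroken_stem_pairing', 'stem_pairing', 'base_pairing',
    'pairing', or None if no base pair connects the regions.
    """
    a, b = sorted(region_a), sorted(region_b)
    known = {tuple(sorted(p)) for p in pairs}
    connecting = {(i, j) for i in a for j in b if tuple(sorted((i, j))) in known}
    if not connecting:
        return None
    if len(a) == 1 and len(b) == 1:
        return "base_pairing"
    if len(connecting) < 2:
        return "pairing"
    # Unbroken: every base pairs with the other region, in antiparallel order.
    expected = set(zip(a, reversed(b)))
    if len(a) == len(b) and connecting == expected:
        return "unbroken_stem_pairing"
    return "stem_pairing"


# Hypothetical hairpin with four stacked pairs.
hairpin = {(0, 11), (1, 10), (2, 9), (3, 8)}
print(classify_pairing(range(0, 4), range(8, 12), hairpin))
print(classify_pairing(range(0, 5), range(8, 12), hairpin))
```

Returning the most specific name reflects the nesting in the definitions: every unbroken_stem_pairing is also a stem_pairing, and every stem_pairing is also a pairing.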

Definitions
Correspondence: A relation between regions of an RNA alignment, which can occur between molecules or within a molecule. These relations are reflexive, symmetric and transitive.
Region: Consists of a single RNA nucleotide or a set of RNA nucleotides. Regions can be continuous spans of nucleotides or discontinuous collections of contiguous spans. Single base pairs, terminal loops, junctions, etc. are all examples of regions.
Homology: A correspondence that implies descent from a common ancestor with evolutionary continuity.
Similarity: A correspondence that can be defined in terms of a quantitative measurement, typically at some structural level.
Sequence similarity: A similarity defined at the primary sequence level, e.g. 95% sequence identity.
Secondary structure similarity: A similarity defined at the secondary structure level, e.g. 50% of base pairs in common.
3D structure similarity: A similarity defined at the 3D structure level, e.g. 3 Angstrom RMSD.
Basepairing: A relation between two RNA nucleotides, defined by base-base hydrogen-bonding interactions.
Function: The properties of a biological entity for which it is maintained by evolutionary selection.
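The two simplest quantitative measurements mentioned above could be computed along these lines. The definitions do not fix the denominators, so the choices here are assumptions: identity is taken over columns where neither aligned sequence has a gap, and "base pairs in common" is taken as shared pairs divided by the union of both pair sets. The function names are hypothetical.

```python
def sequence_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of identical bases over columns where neither sequence has a gap.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b) if gap not in (x, y)]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def shared_base_pair_fraction(pairs_a, pairs_b):
    """Base pairs in common, relative to all pairs found in either structure.

    Pairs are (i, j) tuples in alignment-column coordinates so that the two
    structures are directly comparable.
    """
    a, b = set(pairs_a), set(pairs_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy inputs, not real data.
identity = sequence_identity("ACGU-A", "ACGA-A")
overlap = shared_base_pair_fraction({(0, 9), (1, 8), (2, 7)}, {(0, 9), (1, 8), (3, 6)})
print(identity, overlap)
```

A similarity correspondence would then be asserted when such a score passes a chosen cutoff, with the cutoff and the measure recorded as metadata.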

Acknowledgements
RNA Alignment Ontology working group: James W. Brown, Fabrice Jossinet, Rym Kachouri, B. Franz Lang, Neocles Leontis, Gerhard Steger, Jesse Stombaugh, Eric Westhof
Other coauthors: Amanda Birmingham, Paul Griffiths, Franz Lang
NSF RCN grant #
Knight Lab members: Cathy Lozupone, Micah Hamady, Chris Lauber, Jesse Zaneveld, Jeremy Widmann, Elizabeth Costello, Jens Reeder, Daniel McDonald, Anh Vu, Ryan Kennedy, Julia Goodrich, Meg Pirrung, Reece Gesumaria, Tony Walters, Bob Larsen
Trp project: Irene Majerfeld, Jana Chocholousova, Vikas Malaiya, Matthew Iyer, Mike Yarus