Published by Bennett Boyd. Modified over 9 years ago.
1
RNA Informatics
Unit 12, BIOL221T: Advanced Bioinformatics for Biotechnology
Irene Gabashvili, PhD
2
Non-coding DNA (98.5% of the human genome)
- Intergenic regions
- Repetitive elements
- Promoters
- Introns
- mRNA untranslated regions (UTRs)
3
RNA Molecules
- mRNA
- tRNA
- rRNA
- Other types of RNA
  - RNase P: trims the 5’ end of pre-tRNA
  - Telomerase RNA: maintains the chromosome ends
  - Xist RNA: inactivates the extra copy of the X chromosome
4
What are RNA and mRNA?
- Traditional role as messenger molecule (mRNA)
- RNA is a polymer of the nucleotides A, U, C, and G, transcribed from DNA
- Example: DNA GATTACA -> RNA GAUUACA
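The GATTACA example can be reproduced with a minimal sketch. It assumes the DNA string is the coding (sense) strand, so the RNA copy is the same sequence with T replaced by U. The function name `transcribe` is illustrative and not taken from the slides.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a DNA coding strand (every T becomes U)."""
    dna = coding_strand.upper()
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected DNA symbols: {sorted(unexpected)}")
    return dna.replace("T", "U")


print(transcribe("GATTACA"))  # GAUUACA
```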
5
Non-coding RNA (RNA genes)
- RNA enzymes: catalytic RNA
- Ribosomal RNA (rRNA)
- Transfer RNA (tRNA)
- RNAi: RNA-mediated gene regulation
  - Micro RNA (miRNA)
  - Short-interfering RNA (siRNA)
- Alternative splicing: small nuclear RNA (snRNA)
- Others: snoRNA, eRNA, srpRNA, tmRNA, gRNA
Structure is essential to function for many ncRNAs.
6
Some biological functions of ncRNA
- Nuclear export
- mRNA cellular localization
- Control of mRNA stability
- Control of translation
The function of an RNA molecule depends on its folded structure.
7
From Sequence to Structure
- Most biological molecules contain one-dimensional information called “sequence”, which can be treated as a “string” in computer science.
- Molecules with sequence: DNA, RNA, and proteins
- Molecules without much sequence information: lipids and polysaccharides
8
Can all the properties of a macromolecule be predicted from its sequence?
- Three-dimensional structures
- Alternative splicing
- Kinetic properties
- Etc.
9
RNA sequence hierarchy
- 1D: CCAUCUUCUCCUUGGAGAUUUGG
- 2D: [figure: secondary structure]
- 3D: [figure: three-dimensional structure]
10
Control of iron levels by mRNA secondary structure
[Figure: Iron Responsive Element (IRE) stem-loop drawn 5’ to 3’, with conserved loop bases and stem pairs labeled N-N’]
- The IRE is a conserved element.
- It is recognized by IRP1 and IRP2.
11
[Figure: IRP1/2 binding to the IRE on ferritin (F) mRNA and on transferrin receptor (TR) mRNA, each drawn 5’ to 3’]
- F: ferritin = iron storage
- TR: transferrin receptor = iron uptake
- Low iron: IRE-IRP binding inhibits translation of ferritin and inhibits degradation of TR mRNA
- High iron: IRE-IRP binding is off -> ferritin is translated and transferrin receptor mRNA is degraded
12
RNA Secondary Structure
[Figure: a strand running 5’ to 3’ folds back on itself; the paired region is labeled STEM and the unpaired turn is labeled LOOP]
- The RNA molecule folds on itself.
- Base pairs form through hydrogen bonds: G-C, A-U, and G-U.
13
RNA Secondary structure
[Figure: example structure drawn 5’ to 3’, with labels STEM, HAIRPIN LOOP, INTERNAL LOOP, BULGE, and DANGLING ENDS]
14
Examples of known interactions of RNA secondary structural elements
- Pseudoknot
- Kissing hairpins
- Hairpin-bulge contact
These patterns are excluded from the prediction schemes because computing them is too intensive.
15
What is RNA secondary structure/folding?
[Figure: folded RNA with the following elements labeled]
- helix (stem)
- hairpin loop
- bulge loop
- internal loop
- multi-branch loop
16
2D: mRNA regulatory elements
[Figure: Mini-Rose and Macro-Rose at 37 °C and 42 °C]
17
RNA secondary structure representation
[Figure: a legal structure and pseudoknots; the slide also showed alternative representations]
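One common text representation of secondary structure is dot-bracket notation, where "(" and ")" mark the two partners of a base pair and "." marks an unpaired base. The sketch below assumes that notation (the slide itself does not name it) and treats a structure as "legal" when no two base pairs cross; crossing pairs are the signature of a pseudoknot. The function names are illustrative.

```python
def pairs_from_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Parse dot-bracket notation into a sorted list of 0-based (i, j) pairs."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def has_pseudoknot(pairs: list[tuple[int, int]]) -> bool:
    """Return True if any two pairs (i, j) and (k, l) cross: i < k < j < l."""
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return True
    return False


hairpin = pairs_from_dot_bracket("((..))")
print(hairpin, has_pseudoknot(hairpin))
print(has_pseudoknot([(0, 4), (2, 6)]))  # these two pairs cross each other
```

Plain dot-bracket strings with a single bracket type can only encode nested pairs, which is why the crossing example is given directly as a pair list.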
18
RNA 2D structure in Matlab
19
3D motifs
20
16S rRNA: secondary (2°) structure can be predicted from primary (1°) structure
[Figure: structures for Bacteria, Archaea, and Eukarya]
21
How is RNA folding done? Simple Nussinov Folding Algorithm
- Only scores interactions between paired bases
- Useful for demonstrating the general structure of more complex folding algorithms
- The recurrence gives the score for the optimal structure from base i to base j; we want the highest-scoring fold
- δ(i, j) = score for a pairing between i and j
- Case: base i is unpaired; consider pairing between i+1 and j
- Case: base j is unpaired; consider pairing between i and j-1
22
Simple Nussinov Folding Algorithm (cont’d)
- Case: pair i and j; now consider pairing between i+1 and j-1
23
Simple Nussinov Folding Algorithm (cont’d)
- Case: i and j begin a bifurcation; consider every possible bifurcation point k and sum the scores of the two folded substructures
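The four cases described on these slides (i unpaired, j unpaired, i paired with j, and bifurcation at k) can be combined into one dynamic programming table. The following is a minimal sketch, not a production folder. It assumes δ(i, j) = 1 for G-C, A-U, and G-U pairs and 0 otherwise, imposes no minimum hairpin loop length, and returns only the best score without a traceback. The names `delta`, `nussinov_score`, and `ALLOWED_PAIRS` are illustrative.

```python
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"),
                 ("U", "A"), ("G", "U"), ("U", "G")}


def delta(a: str, b: str) -> int:
    """Pairing score: 1 if the two bases can pair, else 0 (an assumption)."""
    return 1 if (a, b) in ALLOWED_PAIRS else 0


def nussinov_score(seq: str) -> int:
    """Maximum total pairing score over all nested structures of seq."""
    n = len(seq)
    if n == 0:
        return 0
    # score[i][j] holds the best score for the subsequence seq[i..j].
    score = [[0] * n for _ in range(n)]
    for span in range(1, n):              # fill by increasing j - i
        for i in range(n - span):
            j = i + span
            best = max(score[i + 1][j],   # base i unpaired
                       score[i][j - 1])   # base j unpaired
            inner = score[i + 1][j - 1] if i + 1 <= j - 1 else 0
            best = max(best, inner + delta(seq[i], seq[j]))  # i pairs with j
            for k in range(i + 1, j):     # bifurcation at k
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    return score[0][n - 1]


print(nussinov_score("GGGAAACCC"))  # 3 (three G-C pairs)
```

The triple loop makes the running time grow with the cube of the sequence length. Energy-based folders keep the same table-filling structure but replace the simple pair count with loop and stacking energies.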
24
CONTRAfold
Problem: given an RNA sequence, predict the most likely secondary structure.
AUCCCCGUAUCGAUC AAAAUCCAUGGGUAC CCUAGUGAAAGUGUA UAUACGUGCUCUGAU UCUUUACUGAGGAGU CAGUGAACGAACUGA
25
How does CONTRAfold work?
CONTRAfold looks at features that indicate a good structure. For example:
- C-G base pairings
- A-U base pairings
- Helices of length 5
- Hairpin loops of size 9
- Bulge loops of size 2
- CG/GC base-pair stacking interactions
Features like these are called thermodynamic parameters when they represent free energy values; CONTRAfold learns their weights from known structures instead.
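The idea of scoring a structure by its features can be sketched as a weighted sum of feature counts. This is only an illustration of that idea, not CONTRAfold's actual feature set or parameters. It counts base-pair types from a dot-bracket string, the feature names and the integer weights in `HYPOTHETICAL_WEIGHTS` are invented for the example, and helix, loop, and stacking features are left out.

```python
from collections import Counter

# Invented weights for illustration only; real models learn or measure these.
HYPOTHETICAL_WEIGHTS = {"CG_pair": 3, "AU_pair": 2, "GU_pair": 1}


def pair_features(seq: str, structure: str) -> Counter:
    """Count base-pair types (e.g. 'CG_pair') in a dot-bracket structure."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    stack: list[int] = []
    counts: Counter = Counter()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            i = stack.pop()
            # Sort the two letters so that G-C and C-G share one feature name.
            counts["".join(sorted(seq[i] + seq[pos])) + "_pair"] += 1
    return counts


def structure_score(seq: str, structure: str, weights: dict) -> int:
    """Weighted sum of feature counts; unknown features score 0."""
    features = pair_features(seq, structure)
    return sum(weights.get(name, 0) * n for name, n in features.items())


print(pair_features("GGGAAAUCC", "(((...)))"))
print(structure_score("GGGAAAUCC", "(((...)))", HYPOTHETICAL_WEIGHTS))  # 7
```

Comparing this score across candidate structures for the same sequence is what lets a predictor prefer one fold over another.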
26
What is an RNA regulatory motif?
- Motif: a conserved sequence element
- A regulator binds to a regulatory motif
- RNA regulatory motif: a motif used to regulate translation
[Figure: RNA sequence GAUUACA... containing the regulatory motif AUUAC, shown with a regulatory protein and with a microRNA (UAAUG)]
27
What is an accessible motif?
- If a sequence is part of an intramolecular hybridization, it is unlikely to bind to regulators.
- We define a motif as “accessible” if none of its nucleotides is hybridized as part of the folding.
28
Accessible motifs cont’d
Therefore, only accessible sequences should be scanned for regulatory motifs.
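The accessibility definition translates directly into a scan over a predicted structure. The sketch below assumes the folding is given in dot-bracket notation, where "." marks an unpaired base, and reports only motif occurrences whose positions are all unpaired. The function name and the example sequences are illustrative.

```python
def accessible_motif_sites(seq: str, structure: str, motif: str) -> list[int]:
    """Return 0-based start positions where motif occurs fully unpaired."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    if not motif:
        raise ValueError("motif must not be empty")
    sites = []
    start = seq.find(motif)
    while start != -1:
        window = structure[start:start + len(motif)]
        if set(window) == {"."}:          # no nucleotide of the motif is paired
            sites.append(start)
        start = seq.find(motif, start + 1)
    return sites


# Motif AUUAC sits in the hairpin loop, so it is accessible.
print(accessible_motif_sites("GGGAUUACCCC", "(((.....)))", "AUUAC"))
# Motif AUUAC is paired inside the stem, so it is not accessible.
print(accessible_motif_sites("AUUACGGGGGUAAU", "(((((....)))))", "AUUAC"))
```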
30
Results: Degradation Related Motifs
31
Prediction Tools based on Energy Calculation
- Fold, Mfold
  - Zuker & Stiegler (1981) Nuc. Acids Res. 9:133-148
  - Zuker (1989) Science 244:48-52
- RNAfold (Vienna RNA secondary structure server)
  - Hofacker (2003) Nuc. Acids Res. 31:3429-3431
32
Links
- http://rna.tbi.univie.ac.at/
- http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi
- http://frontend.bioinfo.rpi.edu/applications/mfold/
- http://frontend.bioinfo.rpi.edu/applications/mfold/cgi-bin/rna-form1.cgi
- http://bioweb2.pasteur.fr/nucleic/intro-en.html#rna
- http://mobyle.pasteur.fr/cgi-bin/MobylePortal/portal.py?form=mfold
33
RNAalifold (Hofacker 2002)
- From the Vienna RNA package
- Predicts the consensus secondary structure for a set of aligned RNA sequences using a modified dynamic programming algorithm that adds a covariance term to the standard energy model
- Improves prediction accuracy
34
Other related programs
- COVE: RNA structure analysis using the covariance model (an implementation of the stochastic context-free grammar method)
- QRNA (Rivas and Eddy 2001): searching for conserved RNA structures
- tRNAscan-SE: tRNA detection in genome sequences
Sean Eddy’s Lab, WU: http://www.genetics.wustl.edu/eddy
35
RNA families
- Rfam: general non-coding RNA database (most of the data is taken from specific databases)
- http://www.sanger.ac.uk/Software/Rfam/
- Includes many families of non-coding RNAs and functional motifs, as well as their alignments and secondary structures
36
Rfam
- 379 different RNA families or functional motifs from mRNA UTRs etc.
[Figure: categories labeled GENE, INTRON, and Cis ELEMENTS]
37
Scopes of sequence analysis
- Sequences only
  - Sequences of DNA/RNA/proteins for defining transcription units and introns
- Sequence with other kinds of data
  - Microarray
  - 3D data such as comparative modeling
  - Metabolic data
38
Purposes of Sequence Analysis include
- Identification of coding regions
- Identification of regulatory elements
- Identifying events of genetic recombination
- Identifying the existence of selective pressures
- Searching for homologous sequences
- Identifying shared patterns of a group of sequences
- Modeling secondary or 3D structures (ab initio modeling)
39
What can you do with a single sequence?
- Information content in a sequence
- G/C content
- Codon usage
- Synonymous/non-synonymous mutations
- Periodicity
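Of the single-sequence measures listed above, G/C content is the simplest to compute: it is the fraction of bases that are G or C. A minimal sketch follows, with an illustrative function name.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


print(gc_content("AUGC"))     # 0.5
print(gc_content("GAUUACA"))  # 2 of 7 bases are G or C
```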
40
Based on 16S rRNA sequences and others, organisms can be divided into 3 superkingdoms.
41
Information contained in SSU rRNA genes
- SSU rRNA sequences are used for universal phylogenetic construction
- Length is ~1500 bp, or ~3000 bits of information
- Life began ~3.5-3.8 billion years ago
42
Maximum average change of information in SSU rRNA genes
- About 1 bit per million years of evolution
- In general, SSU rRNA may not be suitable for constructing a fine phylogenetic tree with branches separated by <1 million years.
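The figures on these two slides can be checked with a short calculation. It assumes that each base carries log2(4) = 2 bits (four equally likely nucleotides) and that the whole information content is spread evenly over the age of life. The slides do not state this derivation explicitly, so treat it as one plausible reading. The function names are illustrative.

```python
import math


def sequence_information_bits(length_bp: int, alphabet_size: int = 4) -> float:
    """Maximum information in a sequence: length * log2(alphabet size)."""
    return length_bp * math.log2(alphabet_size)


def average_bits_per_million_years(total_bits: float,
                                   age_billion_years: float) -> float:
    """Average change in bits per million years over the given time span."""
    return total_bits / (age_billion_years * 1000)


total = sequence_information_bits(1500)   # ~1500 bp SSU rRNA gene
print(f"total information: {total:.0f} bits")
for age in (3.5, 3.8):                    # age of life in billions of years
    rate = average_bits_per_million_years(total, age)
    print(f"{age} billion years: {rate:.2f} bits per million years")
```

Both ages give a rate a little under 1 bit per million years, which is consistent with the rounded value on the slide.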
43
Homology vs Similarity
- “Homology” implies common origin.
- Homologous sequences are generally recognized by similarity.
- Similar sequences may not be homologous; the similarity may be due to convergent evolution or just chance.
- Homology is qualitative, while similarity is quantitative.
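To illustrate why similarity is quantitative, here is a minimal sketch of one common similarity measure, percent identity between two already-aligned sequences. It assumes "-" marks a gap and skips any column containing a gap. Other definitions of percent identity exist, so the choice of denominator here is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns where both sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue                      # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


print(percent_identity("GAUUACA", "GAUUACA"))  # 100.0
print(percent_identity("GAUUACA", "GACUACA"))  # 6 of 7 columns match
```

A high value supports a hypothesis of homology but does not prove it, which is the distinction the slide draws.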
44
Orthologous vs Paralogous
- Orthologues usually refer to homologous genes with the same function in different organisms.
- Paralogues usually refer to homologous genes with different functions, usually in the same organism.
- Bioinformatic tools cannot differentiate between the two.