Lecture 8.2: RNA. Jennifer Gardy, Centre for Microbial Diseases and Immunity Research, University of British Columbia

1 Lecture 8.2: RNA. Jennifer Gardy, Centre for Microbial Diseases and Immunity Research, University of British Columbia. jennifer@cmdr.ubc.ca

2 Bad News Disclaimer: Jenn is not an RNA researcher

3 Good News Disclaimer
All new RNA lecture!
Now with less scary algorithms!
And no dynamic programming exercises!
AND NO MATH! NO, NO, NO, NO, NO, NO, NO MATH!

4 Outline
What is RNA?
RNA’s many roles in the cell
Levels of RNA structure
Secondary structure:
  – Elements
  – Predictive methods for single and multiple queries
Tertiary structure prediction
RNA databases
Finding functional RNAs in the genome

5 What is RNA?
Ribonucleic acid
Ribose sugar
  – vs DNA’s deoxyribose
  – Less stable
Uracil base
  – vs DNA’s thymine
  – “Cheap” to produce
Single-stranded
http://en.wikipedia.org/wiki/RNA

6 DNA vs RNA Structure
Single strand RNA base pairing: G-C, A-U, G-U and more
  – “Canonical”, “Watson-Crick”
  – “GU wobble”
  – “AU reverse Hoogsteen”
Single-stranded RNA will fold in on itself to form all sorts of structures!
Double strand DNA base pairing: G-C, A-T
  – “Canonical”, “Watson-Crick”
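The pairing rules on this slide can be written down as a small lookup. The sketch below is illustrative only: the function name `classify_pair` is made up for this example, and it covers just the Watson-Crick and GU wobble pairs. Pairings such as the AU reverse Hoogsteen depend on 3D geometry, so they cannot be told apart from the base letters alone and are not modelled here.

```python
# Base-pair types that can be recognised from the two letters alone.
# frozenset makes the lookup order-independent: G-C and C-G are the same pair.
WATSON_CRICK = {frozenset("GC"), frozenset("AU")}
WOBBLE = {frozenset("GU")}


def classify_pair(base_a, base_b):
    """Return the pairing type for two RNA bases, or None if they do not pair.

    Hypothetical helper for illustration; only canonical and GU wobble pairs
    are considered.
    """
    pair = frozenset((base_a.upper(), base_b.upper()))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "GU wobble"
    return None


if __name__ == "__main__":
    for a, b in [("G", "C"), ("A", "U"), ("G", "U"), ("A", "C")]:
        print(a, b, classify_pair(a, b))
```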

7 From Structure to Function
Single-stranded RNA can fold into a variety of secondary and tertiary structures
RNA is able to play a number of functional roles in the cell!
What are some of the types of RNA you’ve heard of?

8 Types of RNA: mRNA (Coding/Informational)
Messenger RNA is the middleman between gene and protein:
  – RNA pol transcribes a gene into mRNA
  – mRNA is processed and transported out of the nucleus
mRNA is translated into protein sequence at the ribosome
Used transcripts are degraded
Molecular Cell Biology, 4th ed.

9 Types of RNA: ncRNAs (Functional)
Non-coding RNAs are RNAs that carry out their function without ever being translated into protein:
  – tRNA (transfer RNA)
      – Anticodon recognizes triplet on mRNA
      – Opposite end charged with specific amino acid
  – rRNA (ribosomal RNA)
      – 80% of cell’s RNA
      – Together with certain proteins, forms the ribosome
Molecular Cell Biology, 4th ed.

10 Other families of ncRNAs
ncRNAs are involved in a variety of cellular processes:
Gene regulation (micro RNAs/miRNAs, riboswitches)
  – miRNAs bind the 3’ UTR of a specific gene’s mRNA to suppress translation
  – Riboswitches are cis-acting elements in the UTR (most often the 5’ UTR) of certain genes
      – Bind a target molecule; translation is up- or down-regulated when bound
Modification of RNAs (small RNAs: sn/sno/gRNAs)
Catalysis of reactions (ribozymes)
Protein trafficking (small cytosolic RNAs/scRNAs)
Interference with protein synthesis (antisense, RNAi)

11 Take Home Messages
The RNA world consists of MUCH more than mRNA
Non-coding, or functional, RNAs (ncRNAs) perform a number of important functions and are of significant biological interest
Virtually all RNA bioinformatics centres around the identification, analysis, and structure of ncRNAs

12 RNA Structure has Protein-like Hierarchy
Molecular Cell Biology, 4th ed.

13 Levels of RNA Structure
Primary = sequence itself
Single-stranded RNA
Bases want to pair with each other
  – Unusual pairings are possible
CGGUGCUCUUUCUGCCGCGCAGAAGAGCGGCGCGGAUCUGUUCGUUCUUGC
UGAGCACGGGCGUGGGAACCAGUGCCGGCUCGGGCGCCUCGACCUUGGCGC
UCUGUUUGAGGCUCAGGAAGCGCUUCUGCGCGACCAGCCGGUCUUCGGCCG
ACAAGGGUUCGACCGGUUCGCCGUCGAUGUUGUGGCGCAUGGAAUCGGGCU
GGGCGCUGGCGAAAUAAUAGCGCCGGCAAUGGACAAAGGCAGCGGUCGCGC
GCCGGAGCGUCGUUACACCGGCCUCGGGCUUCAAGAGCGGACGCAGCUCGU
UGAAGAGGCCGACGGCGAAGGGAAGAACGGGAUCGCCGGGCUUGGCCGGAA
GCACGCCGACCGGCCGGAUCAACAUGCUGUUGAUCGCGUUGGCCUUCUCCA
CGUCGAGCUCGGUCGCCGCAAUCGGCCCGCGGCUGAUCUUCCAGGGUUUGU
CCAUGCAUCCUCCAUAAAGGCCGACUGUGAUGGUAUCUUGACGGAUGGGGC
AAUAGCGGUGCGGCCUGCAUAUUGCUAGCCCCGCUGCAAGCCUGCGACGAG
ACGCCGUGUCCCGCGGAUAUAGGCCCGCCUUUCUCGGCCGGCGAUUCUCUA
UUCAAUCAAUUUUAUCGCCGAUUAUGCAUUGACCUCCAGCAUCGAUUACAC
UUCUUAUCCGCGCCCAAGAUCAAUGCCGGCCGCGGGGGAGGAUAUAUGCGG
GUUUUCUCGAGCAUCGAUGAGCUGCGCCACACGCUCGAUGCGCUCAAACGC

14 Levels of RNA Structure
Secondary structure = patterns formed by base pairing

15 Levels of RNA Structure
Tertiary = 3D structure formed by interaction of 2° elements

16 Secondary Structure is Interesting!
Easy to study:
  – Secondary structure usually determined before tertiary
  – Chemical modification assays chew up bases not involved in specific secondary structure interactions
ncRNA functions correlated to specific secondary structure elements:
  – E.g. hairpin loop shape involved in gene regulation
Therefore, most RNA resources/methods are focused on secondary structure:
  – Visualization
  – Prediction
  – Annotation of elements

17 Elements of Secondary Structure
Hairpin Loops: Backbone makes 180° bend

18 Elements of Secondary Structure
Internal Loops: Pairing of both strands interrupted equally

19 Elements of Secondary Structure
Bulge Loops: Pairing of one strand interrupted, unequal

20 Elements of Secondary Structure
Multibranch Loops: AKA helical junction, joins two or more stems with no bulge

21 Elements of Secondary Structure
Stems/Helices: Base pairs, e.g. G-C

22 Elements of Secondary Structure
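As a rough illustration of how one of these elements can be picked out computationally, the sketch below finds hairpin loops in a structure written in dot-bracket notation (brackets for paired bases, dots for unpaired ones). The function name `hairpin_loops` and the 0-based coordinates are choices made for this example, not part of any tool mentioned in the lecture. It relies on the fact that a hairpin loop is a run of unpaired bases directly enclosed by one opening and one closing bracket.

```python
import re


def hairpin_loops(structure):
    """Return (start, end) indices (0-based, inclusive) of each hairpin loop.

    A hairpin loop shows up in dot-bracket notation as "(" followed by one or
    more "." and then ")". Internal loops, bulges and multibranch loops are
    not detected by this simple pattern.
    """
    return [
        (match.start(1), match.end(1) - 1)
        for match in re.finditer(r"\((\.+)\)", structure)
    ]


if __name__ == "__main__":
    # One stem of four base pairs closing a six-base hairpin loop.
    print(hairpin_loops("((((......))))"))
```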

23 Visualizing 2° Structure
Dot-bracket:
  – () = paired bases (stems)
  – . = unpaired base (loops)
GUUUGGUUCAAAAC
((((......))))
Dot-plot:
  – Symmetrical
  – [marker] = paired bases
  – Axes on the slide: GUUUGGUUCAAAAC against CAAAACUUGGUUUG
Graphical: [figure]
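Dot-bracket strings are easy to turn back into a list of base pairs with a stack: push the index of every "(", and pop it when the matching ")" arrives. The sketch below applies this to the slide's example; `parse_dot_bracket` is a name invented for this illustration, and indices are 0-based.

```python
def parse_dot_bracket(structure):
    """Convert a dot-bracket string into a sorted list of (i, j) base pairs."""
    open_positions = []  # indices of "(" still waiting for their partner
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    sequence = "GUUUGGUUCAAAAC"
    structure = "((((......))))"
    for i, j in parse_dot_bracket(structure):
        print(f"{sequence[i]}{i + 1} pairs with {sequence[j]}{j + 1}")
```

For the example, the four pairs are G-C, U-A, U-A and U-A, with the six bases in between left unpaired as the loop.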

24 Secondary Structure Prediction
Two categories of 2° prediction methods:
  – Ones that take a single sequence as input
      – Mfold, Vienna, RNAStructure, Sfold
  – Ones that require multiple (aligned) input sequences
      – Infernal, ConStruct, Alifold, Pfold, FOLDALIGN, Dynalign
Important to both categories is the idea of free energy minimization:
Molecules fold to achieve the lowest energy state possible (minimum free energy, or MFE)
Given a set of potential structures, those with the lowest free energies are most stable and most likely to be found in a cell.

25 Calculating a 2° Structure’s Free Energy
Computed using nearest-neighbour parameters
Each possible pair of neighbouring structural elements has an associated free energy value (kcal/mol)
  – Negative values = good, found in stable, base-paired stems
  – Positive values = bad, found in loops and bulges
Sum all values over structure to get overall free energy
[Figure: example stem-loop annotated with per-element free energies]
-1.3 - 3.3 - 2.1 - 0.6 + 1.0 - 3.3 - 1.1 + 5.6 = -5.1 kcal/mol
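The summation on this slide can be reproduced directly. The sketch below uses only the eight values shown on the slide; which value belongs to which stack or loop in the figure is not recoverable from the transcript, so they are kept as a plain list, and the function name `total_free_energy` is an assumption made for this example.

```python
# Per-element free energy contributions from the slide, in kcal/mol.
# Negative entries are stabilising (stacked pairs); positive ones are loops.
CONTRIBUTIONS = [-1.3, -3.3, -2.1, -0.6, +1.0, -3.3, -1.1, +5.6]


def total_free_energy(contributions):
    """Sum nearest-neighbour contributions; rounded to 0.1 kcal/mol."""
    return round(sum(contributions), 1)


if __name__ == "__main__":
    stabilising = [value for value in CONTRIBUTIONS if value < 0]
    destabilising = [value for value in CONTRIBUTIONS if value > 0]
    print("stabilising terms:  ", total_free_energy(stabilising))
    print("destabilising terms:", total_free_energy(destabilising))
    print("overall free energy:", total_free_energy(CONTRIBUTIONS), "kcal/mol")
```

The overall value matches the slide's -5.1 kcal/mol: the stems contribute -11.7 and the two loop terms give back +6.6.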

26 2° Structure Prediction Approach 1: MFE
Used when you have a single sequence as input
Forms the basis of the tools Mfold, Vienna, RNAStructure and Sfold
Basic principle:
  – Generate a series of possible secondary structures
  – Calculate the free energy of each
  – Return the lowest energy structure(s)
Two possible implementations:
  – Naïve MFE
  – Dynamic programming MFE

27 Naïve MFE Prediction
Fold the query RNA into ALL possible secondary structures
Calculate free energy for each structure
Problem: A 50-base RNA can have over 5000 BILLION POSSIBLE STRUCTURES!!!
Naïve MFE prediction is virtually never used
Reminiscent of database searching problem
Solution: Heuristic approach that breaks down the RNA sequence
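To get a feel for why exhaustive enumeration blows up, one can count nested (pseudoknot-free) structures with a simple recurrence: the last base is either unpaired, or it pairs with some earlier base and splits the molecule into two independent parts. The sketch below is an illustration under stated assumptions: it ignores the actual sequence (any two positions are allowed to pair), so it counts structure shapes rather than the structures of one particular RNA, and it is not claimed to reproduce the figure quoted on the slide. The name `count_structures` and the `min_loop` parameter (minimum unpaired bases in a hairpin) are choices made for this example.

```python
def count_structures(length, min_loop=3):
    """Count nested secondary structures on `length` bases.

    counts[n] = counts[n - 1]                       (base n unpaired)
              + sum over partners j of base n of
                counts[j - 1] * counts[n - j - 1]   (outside * enclosed part)
    where at least `min_loop` bases must lie between j and n.
    """
    counts = [1] * (length + 1)  # the empty structure always counts once
    for n in range(1, length + 1):
        total = counts[n - 1]
        for j in range(1, n - min_loop):
            total += counts[j - 1] * counts[n - j - 1]
        counts[n] = total
    return counts[length]


if __name__ == "__main__":
    for n in (10, 20, 30, 40, 50):
        print(n, count_structures(n))
```

The count grows exponentially with length, which is the point of the slide: scoring every candidate one by one is not practical.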

28 Dynamic Programming MFE Prediction
Break RNA query into small subsequences
Generate possible secondary structures for subsequences, select lowest free energy substructure
Combine substructures into overall structure using dynamic programming
[Figure: S. Eddy, Nat. Biotech. 2004]
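The idea of solving small subsequences first and combining them can be sketched with a deliberately simplified scoring scheme. The code below is a Nussinov-style recursion that maximises the number of base pairs; it is a stand-in for illustration and is not the nearest-neighbour free energy model that Mfold, Vienna, RNAStructure or Sfold use. The function name `fold_max_pairs`, the allowed-pair table and the `min_loop` parameter are assumptions of this example.

```python
ALLOWED = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def fold_max_pairs(seq, min_loop=3):
    """Return (pair_count, dot_bracket) for one maximum-pairing structure."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = most pairs achievable within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):        # short subsequences first
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]             # option 1: base j unpaired
            for k in range(i, j - min_loop):   # option 2: base j pairs with k
                if (seq[k], seq[j]) in ALLOWED:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: replay the choices that produced best[0][n - 1].
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue                            # too short to hold a pair
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in ALLOWED:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        todo.append((i, k - 1))
                    todo.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(fold_max_pairs("GGGAAAUCC"))
```

Each table cell is filled from cells for shorter subsequences, so the work is polynomial in sequence length rather than exponential. A real MFE program keeps the same table-filling shape but replaces "+1 per pair" with stacking and loop energies.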

29 Dynamic Programming MFE Prediction
Typically yields one lowest energy secondary structure and a number of suboptimal structures
  – Not the lowest energy, but still energetically favourable
Benefits of DP MFE:
  – Only needs single sequence as input, fast
Pitfalls of DP MFE:
  – Correctly predicts structure of only 50-70% of bases in a given RNA
      – Thermodynamic parameters have 5-10% error rate
      – Many known secondary structures are not the lowest free energy: they may be within 5-10% of the minimum free energy
  – Lowest free energy structure not always biologically correct
  – Can improve structure predictions with constraint information
      – “This residue must base pair with this residue”

30 2° Structure Prediction Approach 2: Comparative Sequence Analysis
Used when you have multiple RELATED input RNAs
Two underlying principles:
  – Different RNA sequences (different primary structures) can fold into IDENTICAL secondary and tertiary structures. An ncRNA’s structure and function are maintained throughout evolution.
  – A mutation in one member of a pair of interacting residues necessitates a change in the other member of the pair. These are called compensatory base changes (CBCs).

31 The Two Principles in Graphical Form
Different RNA sequences (different primary structures) can fold into IDENTICAL secondary and tertiary structures.
GUUUGGUUCAAAAC
((((......))))
GGAUGGUUCAAUCC
((((......))))
A mutation in one member of a pair of interacting residues necessitates a change in the other member.
GUUUGGUUCAAAAC
((((......))))
GGAUGGUUCAAUCC
((((......))))
(** on the slide marks the compensating positions)

32 From Alignment to 2° Structure
Comparative sequence analysis (CSA) methods require an alignment of related RNAs as input
GUUUGG-UCAAAAC
GGAUGGUUCAAUCC
GCAGGGU-CAAUGC
Gaps and conserved columns are removed, leaving only variant columns
Variant residues in sequence #1 that might base pair are noted
Check for covariance: could these also base pair in seqs #2 and #3?
4/5 columns (marked * on the slide) contain residues that may pair
Provides constraint information
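The filtering steps on this slide can be tried out on its own three-sequence alignment. The sketch below is a simplified illustration, not the procedure of any named tool: it drops gapped and conserved columns, then reports pairs of variant columns whose residues could base pair (Watson-Crick or GU wobble) in every sequence. The function names and the 0-based column indices are assumptions of this example.

```python
from itertools import combinations

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

ALIGNMENT = [
    "GUUUGG-UCAAAAC",
    "GGAUGGUUCAAUCC",
    "GCAGGGU-CAAUGC",
]


def variant_columns(alignment):
    """Indices of columns that contain no gap and are not fully conserved."""
    keep = []
    for index, column in enumerate(zip(*alignment)):
        if "-" not in column and len(set(column)) > 1:
            keep.append(index)
    return keep


def covarying_pairs(alignment):
    """Pairs of variant columns that can base pair in every aligned sequence."""
    hits = []
    for i, j in combinations(variant_columns(alignment), 2):
        if all((seq[i], seq[j]) in PAIRS for seq in alignment):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    print("variant columns:", variant_columns(ALIGNMENT))
    print("covarying pairs:", covarying_pairs(ALIGNMENT))
```

On this alignment five variant columns survive, and four of them fall into two covarying pairs, which agrees with the slide's "4/5 columns" remark.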

33 CSA/Covariance Prediction Methods
Constraint information derived from covariance analysis is combined with energy minimization and dynamic programming to generate a final prediction
Disclaimer: this is a very simplified explanation. In reality, this type of analysis requires knowledge of concepts from information theory and math that are very, very scary indeed:
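One information-theory quantity commonly used to score covariation between two alignment columns is mutual information; that this is the concept the slide alludes to is an assumption, since the transcript does not show the formula. For columns i and j with observed base frequencies f, it is the sum over base combinations (a, b) of f_ij(a, b) * log2( f_ij(a, b) / (f_i(a) * f_j(b)) ). A minimal sketch, with `mutual_information` as a hypothetical helper name:

```python
from collections import Counter
from math import log2


def mutual_information(column_i, column_j):
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(column_i) != len(column_j) or not column_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(column_i)
    freq_i = Counter(column_i)
    freq_j = Counter(column_j)
    freq_ij = Counter(zip(column_i, column_j))
    score = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        score += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return score


if __name__ == "__main__":
    # Two columns that change together (U-A, G-C, C-G) versus a conserved one.
    print(mutual_information("UGC", "ACG"))
    print(mutual_information("GGG", "ACG"))
```

Columns that vary together score high, while a conserved column carries no covariation signal and scores zero regardless of its partner.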

34 Ha!

35 CSA/Covariance Methods: Scary but Useful
Infernal, ConStruct, Alifold, Pfold, FOLDALIGN, Dynalign do the scary stuff for you
Require multiple related sequences as input, but provide MUCH better predictions
  – Limited to 500-base input sequence
Secondary structures of as many as 97% of bases in a given RNA are correctly predicted using this method
  – Vs. 50-70% with basic MFE methods

36 Tertiary Structure Prediction
How do we go from 2° structure to 3° structure?
We don’t. Well, not easily, anyway.

37 Why is 3° Structure Prediction So Hard?
Relative lack of 3D RNA structures available
NMR:
  – Small loops
  – Practical limit of 50 base pairs
Few automated methods mean that 3° structure prediction requires lots of user guidance and knowledge about the field
Methods produce “coarse-grain” resolution structures: major features predicted correctly, finer atomic-level contacts incorrect

38 3° Structure Prediction: MC-SYM
Developed by Francois Major, U. Montreal
Uses information derived from known 3D RNA structures to build a series of models
  – Assemble bases into structures matching known elements
  – Avoid elements not found in known structures
  – Can incorporate constraint information
Requires complex input
  – MC-SYM script
Generates PDB files as output

39 Sample MC-SYM Script
sequence (r 31 ACUGAAGAU)
// Conformations -------------------------------------------
residue (
  31 { helix } 1
  39 { helix } 1
  32 38 { type_A } 15
)
// Relations -----------------------------------------------
connect (
  31 33 { stack } 20
  33 34 { ! stack } 20
  34 39 { stack } 20
)
pair (31 39 { wct } 1)
// Building ------------------------------------------------
anticodon = backtrack (
  (31 39)
  (39 38 37 36 35 34)
  (31 32 33)
)
// Constraint ----------------------------------------------
adjacency (anticodon 1.0 2.5)
res_clash (
  anticodon
  fixed_distance 1.0
  all no_hydrogen
)
// Exploration ---------------------------------------------
explore (
  anticodon
  rmsd (1.0 base_only no_hydrogen)
  file_pdb ("ANTI/anti-%04d.pdb" zipped)
)

40 RNA Databases
NAR Database Portal lists 51 RNA DBs
  – http://www3.oup.co.uk/nar/database/cat/2
RNA databases tend to be specialized

41 Rfam: RNA Families Database
http://www.sanger.ac.uk/Software/Rfam/

42 An Rfam Entry Represents 1 RNA Family
Class of ncRNA with specific function and 2° structure
Annotation
2° structure
CSA/Covariance alignments
Lit refs

43 Rfam: CSA/Covariance Structures
Colour blocks show which bases pair with each other
  – Red with red, blue with blue, green with green
Dot-bracket notation also helps in visualization

44 Searching Rfam
Keyword search
  – Name of RNA or any word in annotation (function, interactor)
  – E.g. Spot 42, spf, regulation, galactose, OxyS
EMBL ID search
BLAST search
  – < 2kb sequence allowed
Browse
  – Gene, cis-reg, intron
Genomes
  – Rfam ncRNAs identified in many genomes
  – How many families occur how many times in my genome?

45 SCOR: Structural Classification of RNA
http://scor.lbl.gov/scor.html

46 SCOR Structural Hierarchy

47 SCOR: Structural Classification of RNA
http://scor.lbl.gov/scor.html

48 SCOR Functional Hierarchy

49 RNABase: 3D Structures of RNA
http://www.rnabase.org

50 RNABase
Daily download of RNA structures from PDB and NDB
  – X-ray crystallography & NMR
Provides annotations to go with structures
  – Measures of structure quality
Can be searched/browsed by category, keyword, technique, resolution, structure quality

51 Finding ncRNAs in the Genome
Rfam currently contains 503 families. Why so few?
Rapid, accurate computational identification of ncRNAs from genome sequence is not a trivial task
  – Most available information was derived in the laboratory
Three approaches:
  – Similarity search
  – Transcription prediction
  – Comparative genome analysis

52 Similarity Searching
BLAST/FASTA primary sequence alignments don’t work very well:
  – 4-letter alphabet (low information content)
  – RNA structure is more important than sequence
How can we incorporate structure information into database similarity searching?
  – Need good secondary structure predictions
  – Need a good alignment scoring method that properly weights sequence and structural contributions
      – Stochastic context-free grammars
      – Conceptually related to HMMs, good over very long distances
  – Computationally intensive
      – Will need heuristics and improved computer power

53 Transcription Prediction
Gene-finding programs for DNA sequences look for signals indicating transcription initiation, termination and processing events. Could we do the same for RNA?
It would be difficult:
  – ncRNA signals are not as strong as gene signals
  – ncRNAs don’t show statistically significant biases in nucleotide composition
  – Some ncRNAs are not transcribed on their own at all; they are excised out of introns
  – Different ncRNAs are transcribed by different RNA polymerases

54 Comparative Genome Analysis
Most successful approach to date
Compare 2+ species
Identify regions that are conserved between the species and are not involved in protein-coding
Look for conserved secondary structure elements in these regions

55 Take Home Messages
RNA is a unique biomolecule and requires unique computational analysis methods
ncRNAs play a number of important roles in the cell and are an area of increasing research interest
ncRNA function depends primarily on 2° structure
Many methods for 2° structure prediction, and many 2° structures themselves, are available; however, few 3D RNA structures are available
Many specialized and general-interest RNA databases are available over the web
The identification of novel ncRNAs in the genome requires improved computational approaches

