Lecture 11. RNA Secondary Structure Prediction

Presentation transcript:

Lecture 11. RNA Secondary Structure Prediction The Chinese University of Hong Kong CSCI3220 Algorithms for Bioinformatics

Lecture outline: (1) From sequences to functions; (2) RNA secondary structures. Last update: 21-Nov-2015. CSCI3220 Algorithms for Bioinformatics | Kevin Yip-cse-cuhk | Fall 2015

Part 1: From Sequences to Functions

From sequences to functions. One of the biggest questions in biology: can one tell the function of a molecule (DNA/RNA/protein) from its sequence alone? Sometimes, but usually not. It is easier if we also know the structure. Common belief: sequence → structure → function. Of course, function also depends on the environment.

Molecular structures. Four levels: (1) primary structures: the sequence; (2) secondary structures: formed first, local; (3) tertiary structures: global, “folds”, “domains”; (4) quaternary structures: multiple molecules. Image credit: http://www.personal.psu.edu/jms5704/blogs/simmons/levels_of_protein_s_c_la_784.jpg

Primary structures. Connections (strong covalent bonds vs. weak hydrogen bonds): which molecules are connected, and which atoms are connected. These are first-level constraints on the possible structures. Example: molecules close in primary structure must also be close in secondary, tertiary and quaternary structures. Image credit: Wikibooks

Primary structures. Orientation: DNA and RNA run 5’ to 3’; amino acid chains run from the amino (N) terminus to the carboxyl (C) terminus. “Residue”: what remains after a water molecule is expelled. Image credit: http://bealbio.wikispaces.com/file/view/dsDNA.jpg, http://attentionmanagement.ca/userfiles/image/DNA-RNA%20directions.gif, http://www.phschool.com/science/biology_place/biocoach/images/translation/peptbond.gif, http://www.cystinuria.org/resources/education/aminoacids/peptide.gif

DNA secondary structures. Double helix. A-DNA (dehydrated samples): right-handed, 11 bp per turn. B-DNA (most common): 10.5 bp per turn. Z-DNA (some methylated DNA): left-handed, 12 bp per turn. Image credit: Wikipedia

DNA secondary structures. Figure: A-DNA, B-DNA and Z-DNA. Image credit: Wikipedia

RNA secondary structures. They can largely be projected onto a 2D plane. Elements labelled in the figure: stem/hairpin loop, stacking pairs, bulge, internal loop, multi-loop, exterior loop, dangling nucleotides, less stable pair, coaxial stacking. Image credit: http://www.clcbio.com/scienceimages/rna_prediction/RNA_structure_prediction_web.png

RNA secondary structures. Pseudoknots: complex structures. Image credit: Wikipedia; Sperschneider and Datta, RNA 14(4):630-640 (2008)

Protein secondary structures. Three main types: α-helices, β-sheets, and coils (connectors). Image credit: http://calcium.uhnres.utoronto.ca/cadherin/images/pub_pages/general/ribbon.jpg, http://www.mun.ca/biology/scarr/MGA2-03-25.jpg

DNA tertiary structures. DNA is wrapped around nucleosomes formed by histone proteins. It takes a condensed form at the beginning of mitosis and meiosis. Image credit: http://micro.magnet.fsu.edu/cells/nucleus/images/chromatinstructurefigure1.jpg, Wikipedia

RNA tertiary structures. The overall structure of an RNA. More studied for RNAs that are not translated into proteins (“non-coding” RNAs). Example: tRNA. Image credit: Wikipedia

Protein tertiary structures. Complex structures, mainly caused by weak forces (hydrogen bonds and hydrophobic interactions) and occasionally by stronger forces (disulfide bonds between cysteines). The CATH hierarchy: Class: composition of secondary structures. Architecture: overall shape. Topology: connection of secondary structures. Homologous superfamily: with a common ancestor. Image credit: CATH

Quaternary structures. Types: protein subunit-protein subunit, protein-protein, protein-DNA, protein-RNA, (protein-small molecule), RNA-RNA, ... Figures: protein-subunit interaction (hemoglobin); protein-DNA interaction. Image credit: Wikipedia, http://serrano.crg.es/images/protein_dna1.jpg

Structure and function. Why does function depend on structure? The structure itself can be the function (e.g., tubulins). Binding: complementarity of interacting structures; formation of special bonds. Image credit: http://www.nigms.nih.gov/NR/rdonlyres/54BEAC37-47A9-454A-BC4F-B94EA127FA1E/0/fig1a_large.jpg, http://upload.wikimedia.org/wikimedia/en-labs/7/7f/Protein_Protein_Docking.JPG

Structure and function. Why does function depend on structure? (cont’d) Functional groups (e.g., catalytic site). Determining localization (e.g., transporter membrane proteins). Image credit: http://www.catalysis-ed.org.uk/principles/images/enzyme_substrate.gif; Spudich, Science 288(5470):1358-1359, 2000

Part 2: RNA Secondary Structures

Important RNA classes. Coding: messenger RNAs (mRNAs), for translating into proteins. Non-coding: ribosomal RNAs (rRNAs), parts of the ribosome complex; transfer RNAs (tRNAs), delivering free amino acids during translation; microRNAs (miRNAs), binding mRNA targets to promote RNA degradation or repress translation; small nucleolar RNAs (snoRNAs), guiding chemical modifications of other RNAs; small nuclear RNAs (snRNAs), involved in mRNA splicing; long non-coding RNAs (lncRNAs), some involved in gene regulation; ... Image source: http://legacy.hopkinsville.kctcs.edu/sitecore/instructors/Jason-Arnold/VLI/Module%201/m1DNAfunction/m1DNAfunction3.html

Importance of RNA structures. Structure is important to many classes of RNA. Examples: tRNA, snoRNA. Image sources: http://www.bio.miami.edu/dana/pix/tRNA.jpg, http://lowelab.ucsc.edu/images/CDBox.jpg

Representing RNA secondary structures. Formats (see http://projects.binf.ku.dk/pgardner/bralibase/RNAformats.html): dot-bracket format, Stockholm format, ...

Dot-bracket format. On the slide, nucleotides 10, 20, 30, etc. of the sequence are marked in red. Image credit: Xihao Hu
Sequence:
GUGAAUGAUGAAUUUAAUUCUUUGGUCCGUGUUUAUGAUGGGAAGUAAGACCCCCGAUAUGAGUGACAAAAGAGAUGUGGUUGACUAUCACAGUAUCUGACG
Structure:
......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).)))).....
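A dot-bracket string can be converted back into an explicit list of base pairs with a stack: every "(" is pushed, and every ")" pairs with the most recently pushed "(". The following is a minimal Python sketch of that idea, not code from the lecture; the function name `parse_dot_bracket` is made up for illustration, and positions are 1-based to match the slides.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of 1-based base pairs (i, j) in a dot-bracket string."""
    open_positions = []  # stack of '(' positions that are not closed yet
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            open_positions.append(position)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((open_positions.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# The example structure shown on the slide.
EXAMPLE = "......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).)))))....."[:0] or \
    "......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).)))).....".strip()
example_pairs = parse_dot_bracket(EXAMPLE)
print(example_pairs[:3], len(example_pairs))
```

Because a stack always matches the innermost open bracket first, this sketch only handles structures without pseudoknots, which is exactly the case that plain dot-bracket notation can express.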

Predicting RNA secondary structures. A basic assumption in structure prediction: the real structure has the lowest free energy. In a simplified view, more stable bonds → lower free energy. In the case of RNA secondary structures: it is good to form more pairs (A-U, C-G, and sometimes G-U, a “wobble base pair”); it is good to form more stable pairs (C-G > A-U > G-U); it is good to have stable sub-structures (e.g., stacking pairs).
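If the model is simplified all the way down to "every allowed pair is equally good", minimizing free energy reduces to maximizing the number of nested base pairs. That simplified problem is solved by the classic Nussinov dynamic program, sketched below in Python. This is a swapped-in, simpler technique, not the energy-based algorithm of this lecture; the function name `nussinov` and the `min_loop` parameter (minimum number of unpaired bases inside a hairpin, assumed to be 3 here) are illustrative choices.

```python
ALLOWED = {frozenset("AU"), frozenset("CG"), frozenset("GU")}


def nussinov(sequence, min_loop=3):
    """Maximize the number of nested base pairs; return (count, sorted 1-based pairs)."""
    n = len(sequence)
    # best[i][j] = maximum number of pairs within sequence[i..j] (0-based, inclusive)
    best = [[0] * n for _ in range(n)]

    def options(i, j):
        """Yield (score, k) for every k in sequence[i..j] that may pair with j."""
        for k in range(i, j - min_loop):
            if frozenset((sequence[k], sequence[j])) in ALLOWED:
                left = best[i][k - 1] if k > i else 0
                yield left + 1 + best[k + 1][j - 1], k

    # Fill the table in order of increasing interval length.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best[i][j] = max([best[i][j - 1]] + [score for score, _ in options(i, j)])

    # Trace back one optimal set of pairs.
    pairs, todo = [], [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:  # j is unpaired
            todo.append((i, j - 1))
            continue
        for score, k in options(i, j):  # j pairs with some k
            if score == best[i][j]:
                pairs.append((k + 1, j + 1))
                todo.append((k + 1, j - 1))
                if k > i:
                    todo.append((i, k - 1))
                break
    return (best[0][n - 1] if n else 0), sorted(pairs)


count, pairs = nussinov("GGGAAAUCC")
print(count, pairs)
```

The sketch runs in O(n^3) time and O(n^2) space. It ignores the second and third criteria on the slide (pair stability and stable sub-structures), which is what the thermodynamic model later in the lecture adds.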

Predicting RNA secondary structures. We will assume there are no pseudoknots. With pseudoknots, there is currently no known algorithm that can find the optimal solution efficiently. We need two things: a thermodynamic model for computing the free energy of a structure, and a method for finding the structure with the minimum free energy. Does this setting sound familiar? Figure: a pseudoknot. Image credit: Wikipedia

Further assumptions. The free energy of a secondary structure is the sum of the free energies of its sub-structures. It is not the sum over individual bases or base pairs, as one base pair can participate in multiple sub-structures. The free energies of the sub-structures are independent.

Problem definition. Given an RNA sequence, find a set of base pairs so that each base is paired at most once (and, following the assumptions above, the total free energy is minimized). Example: input sequence: GUGAAUGAUGAAUUU...ACG; output set of base pairs: (7, 97), (8, 96), ..., (18, 74), ..., (81, 87). Image credit: Xihao Hu
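The constraint in the problem definition is easy to check mechanically. The Python sketch below validates a candidate set of base pairs: positions in range, each base used at most once, and, as an extra assumption taken from the pairing rules earlier in the lecture, only A-U, C-G and G-U pairs. The function name `check_pair_set` is hypothetical.

```python
ALLOWED = {frozenset("AU"), frozenset("CG"), frozenset("GU")}


def check_pair_set(sequence, pairs):
    """Return a list of problems found in a set of 1-based pairs; empty means valid."""
    problems = []
    used = set()  # positions that already take part in a pair
    n = len(sequence)
    for i, j in pairs:
        if not (1 <= i < j <= n):
            problems.append(f"pair ({i}, {j}) is out of range or not ordered")
            continue
        if frozenset((sequence[i - 1], sequence[j - 1])) not in ALLOWED:
            problems.append(f"bases at ({i}, {j}) cannot pair")
        for base in (i, j):
            if base in used:
                problems.append(f"base {base} is paired more than once")
            used.add(base)
    return problems


print(check_pair_set("GGGAAAUCC", [(1, 9), (2, 8), (3, 7)]))  # no problems
print(check_pair_set("GGGAAAUCC", [(1, 9), (1, 8)]))          # base 1 reused
```

The sketch does not test for pseudoknots; that requires comparing pairs with each other rather than looking at each pair alone.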

Linear view. Figure: the example sequence laid out on a line, with position numbers (..., 7 to 25, ..., 67 to 97, ...) and the corresponding dot-bracket symbol under each position.

Thermodynamic model. We will consider four types of sub-structures here. Stacking pairs: both (i, j) and (i+1, j-1) are in the set. Hairpin loop: there is a pair (i, j), where all bases from i+1 to j-1 are not paired. Bulge/internal loop: there are two pairs (i, j) and (i_1, j_1), where i < i_1 < j_1 < j, and all bases from i+1 to i_1-1 and from j_1+1 to j-1 are not paired. Multi-loop: there are pairs (i, j), (i_1, j_1), ..., (i_k, j_k), where i < i_1 < j_1 < ... < i_k < j_k < j, and all bases from i+1 to i_1-1, from j_1+1 to i_2-1, ..., from j_{k-1}+1 to i_k-1 and from j_k+1 to j-1 are unpaired. One base pair can participate in multiple structures.

Stacking pairs. Both (i, j) and (i+1, j-1) are in the set. E.g., i = 20, j = 72 in the example structure, where (20, 72) and (21, 71) are both pairs.

Hairpin loop. There is a pair (i, j), where all bases from i+1 to j-1 are not paired. E.g., i = 81, j = 87 in the example structure. Image source: http://img.ehowcdn.com/article-new/ds-photo/getty/article/151/226/87820768_XS.jpg

Bulge/internal loop. Internal loop: there are two pairs (i, j) and (i_1, j_1), where i < i_1 < j_1 < j, and all bases from i+1 to i_1-1 and from j_1+1 to j-1 are not paired. It is called a bulge if only one side has unpaired bases. E.g., i = 23, j = 69, i_1 = 25, j_1 = 67 in the example structure.

Multi-loop. There are pairs (i, j), (i_1, j_1), ..., (i_k, j_k), where i < i_1 < j_1 < ... < i_k < j_k < j, and all bases from i+1 to i_1-1, from j_1+1 to i_2-1, ..., from j_{k-1}+1 to i_k-1 and from j_k+1 to j-1 are unpaired. E.g., k = 2, i = 10, j = 94, i_1 = 18, j_1 = 74, i_2 = 76, j_2 = 92 in the example structure.
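The four definitions above amount to counting, for each pair (i, j), how many pairs it directly encloses: none gives a hairpin loop, exactly one gives a stacking pair, bulge or internal loop, and two or more give a multi-loop. The Python sketch below applies that rule to the example structure from the dot-bracket slide. It assumes there are no pseudoknots; `classify_loops` is an illustrative name, not something from the lecture.

```python
def classify_loops(pairs):
    """Map each 1-based pair (i, j) to the kind of sub-structure it closes."""
    partner = {i: j for i, j in pairs}  # opening position -> closing position
    kinds = {}
    for i, j in sorted(pairs):
        inner = []  # pairs directly enclosed by (i, j)
        position = i + 1
        while position < j:
            if position in partner:
                inner.append((position, partner[position]))
                position = partner[position] + 1  # skip everything inside that pair
            else:
                position += 1
        if not inner:
            kinds[(i, j)] = "hairpin loop"
        elif len(inner) >= 2:
            kinds[(i, j)] = "multi-loop"
        elif inner[0] == (i + 1, j - 1):
            kinds[(i, j)] = "stacking pair"
        else:
            i1, j1 = inner[0]
            one_sided = i1 == i + 1 or j1 == j - 1
            kinds[(i, j)] = "bulge" if one_sided else "internal loop"
    return kinds


# Parse the example structure from the dot-bracket slide with a stack.
structure = "......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).))))....."
stack, pairs = [], []
for position, symbol in enumerate(structure, start=1):
    if symbol == "(":
        stack.append(position)
    elif symbol == ")":
        pairs.append((stack.pop(), position))

kinds = classify_loops(pairs)
for pair in [(20, 72), (81, 87), (23, 69), (10, 94)]:
    print(pair, kinds[pair])
```

With this rule the (23, 69) example comes out as an internal loop rather than a bulge, because positions 24 and 68 are both unpaired.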

One possible thermodynamic model. Unpaired bases have 0 free energy, and all the terms below have negative free energy. eS(i, j): for the stacking pairs (i, j) and (i+1, j-1). eH(i, j): for the hairpin loop closed at (i, j). eBI(i, j, i_1, j_1): for a bulge or internal loop enclosed by the pairs (i, j) and (i_1, j_1). eM(i, j, i_1, j_1, ..., i_k, j_k): for a multi-loop that consists of the pairs (i, j), (i_1, j_1), ..., (i_k, j_k) and satisfies i < i_1 < j_1 < ... < i_k < j_k < j.

Finding the optimal structure. Method: dynamic programming. Let s be the RNA sequence with n nucleotides. Tables: V(j): free energy of the optimal structure for s[1..j]; the final answer is based on V(n). VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair. VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop. VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop.

Update formulas. V(j): free energy of the optimal structure for s[1..j]. For j > 1, there are two cases: either j is unpaired, which leaves the sub-problem s[1..j-1]; or j pairs with some i, which leaves the sub-problems s[1..i-1] and s[i..j].
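The formula itself was an image on the slide and is not in the transcript. A reconstruction that is consistent with the two cases and with the table definitions would be the following; the base cases V(0) = V(1) = 0 are an assumption, based on unpaired bases having zero free energy.

```latex
V(j) = \min\Bigl\{\, V(j-1),\ \min_{1 \le i < j} \bigl[\, V(i-1) + VP(i, j) \,\bigr] \Bigr\}
\quad \text{for } j > 1, \qquad V(0) = V(1) = 0
```

The first term is the case where j is unpaired; the inner minimum tries every possible partner i of j.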

Update formulas. VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair. We require that i < j. Cases shown in the figure: stacking pairs, where (i+1, j-1) is also a pair; and hairpin loop, where all bases between i and j are unpaired.
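The formula for VP(i, j) is also missing from the transcript. One reconstruction that matches the energy terms eH and eS and the two other tables is a four-way minimum. Treat it as a sketch: the inclusion of the VBI and VM terms is inferred from their definitions, and setting VP(i, j) to +∞ when s[i] and s[j] cannot pair is an assumed convention.

```latex
VP(i, j) = \min\bigl\{\, eH(i, j),\ \ eS(i, j) + VP(i+1, j-1),\ \ VBI(i, j),\ \ VM(i, j) \,\bigr\}
\quad \text{for } i < j
```

Each entry compares a constant number of terms, which agrees with the constant time per VP entry stated in the complexity analysis.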

Update formulas. VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop (i.e., i and j take the roles of i1 and j1). Figure: a bulge or internal loop between the pairs (i, j) and (i_1, j_1), with all bases between i and i_1 and all bases between j_1 and j unpaired.
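The VBI formula was an image and is not in the transcript. Based on the definition of eBI and on the figure, with (i, j) as the outer pair and (i_1, j_1) as the inner pair, a plausible reconstruction is shown below. Excluding (i_1, j_1) = (i+1, j-1) is an assumption made so that stacking pairs are not counted as loops.

```latex
VBI(i, j) = \min_{\substack{i < i_1 < j_1 < j \\ (i_1, j_1) \ne (i+1,\, j-1)}}
\bigl[\, eBI(i, j, i_1, j_1) + VP(i_1, j_1) \,\bigr]
```

The minimum ranges over O(n^2) choices of (i_1, j_1), which matches the O(n^2) time per VBI entry in the complexity analysis.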

Update formulas. VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop.
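The VM formula is not in the transcript either. Following the definition of eM, one reconstruction is to minimize over all choices of k inner pairs. The lower bound k ≥ 2 is an assumption that follows from a multi-loop enclosing at least two inner pairs.

```latex
VM(i, j) = \min_{\substack{k \ge 2 \\ i < i_1 < j_1 < \dots < i_k < j_k < j}}
\Bigl[\, eM(i, j, i_1, j_1, \dots, i_k, j_k) + \sum_{h=1}^{k} VP(i_h, j_h) \,\Bigr]
```

Choosing k inner pairs means choosing 2k positions, which is where the O(n^{2k}) time per VM entry comes from.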

Time and space requirements. Summary: V: n entries, each takes O(n) time. VP: O(n^2) entries, each takes constant time. VBI: O(n^2) entries, each takes O(n^2) time. VM: O(n^2) entries, each takes O(n^{2k}) time. Total: O(n^2) space, O(n^{2k+2}) time, which is exponential if k is unbounded. Some approximations could bring the time down to O(n^4), which is still huge for large n but feasible for small or medium n.
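The totals follow from multiplying the number of entries in each table by the time per entry and keeping the dominant term. Written out as a short derivation, assuming k ≥ 1:

```latex
\begin{aligned}
T(n) &= \underbrace{n \cdot O(n)}_{V}
      + \underbrace{O(n^2) \cdot O(1)}_{VP}
      + \underbrace{O(n^2) \cdot O(n^2)}_{VBI}
      + \underbrace{O(n^2) \cdot O(n^{2k})}_{VM} \\
     &= O(n^2) + O(n^2) + O(n^4) + O(n^{2k+2}) = O(n^{2k+2}), \\
S(n) &= O(n) + 3 \cdot O(n^2) = O(n^2).
\end{aligned}
```

The VM term dominates, so any practical speed-up has to target the multi-loop case first and the O(n^4) bulge/internal-loop term second.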

Some remarks. If we allow general pseudoknots, there is currently no known efficient way to find the RNA secondary structure with the minimum free energy. Other methods to predict RNA secondary structures: (1) conservation and covariation across aligned sequences; (2) experimental methods (e.g., RNA footprinting). Example alignment (columns numbered 1 to 5):
12345
ACGGU
ACUGU
CCAGG
UCCGA
High conservation: columns 2 and 4. Strong covariation: columns 1 and 5.
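The conservation and covariation signals in the small alignment can be found with a few lines of Python. This is a deliberately strict sketch, not a method from the lecture: a column counts as conserved only if all sequences agree, and two columns count as covarying only if both vary and every sequence has a complementary pair (A-U, C-G or G-U) at those two positions. The function names are illustrative.

```python
from itertools import combinations

ALLOWED = {frozenset("AU"), frozenset("CG"), frozenset("GU")}


def conserved_columns(alignment):
    """Return the 1-based columns in which all sequences have the same base."""
    width = len(alignment[0])
    return [c + 1 for c in range(width) if len({row[c] for row in alignment}) == 1]


def covarying_columns(alignment):
    """Return 1-based column pairs that vary but always stay complementary."""
    width = len(alignment[0])
    result = []
    for a, b in combinations(range(width), 2):
        column_a = [row[a] for row in alignment]
        column_b = [row[b] for row in alignment]
        both_vary = len(set(column_a)) > 1 and len(set(column_b)) > 1
        always_pair = all(frozenset((x, y)) in ALLOWED for x, y in zip(column_a, column_b))
        if both_vary and always_pair:
            result.append((a + 1, b + 1))
    return result


alignment = ["ACGGU", "ACUGU", "CCAGG", "UCCGA"]
print(conserved_columns(alignment))   # columns 2 and 4
print(covarying_columns(alignment))   # columns 1 and 5
```

Columns 2 and 4 are complementary in every sequence as well, but because they never change they carry no covariation evidence, which is why the sketch requires both columns to vary.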

Representing pseudoknots. Without pseudoknots, RNA secondary structures can be unambiguously represented by dots (unpaired bases) and brackets (base pairs). What if there are pseudoknots? More types of brackets are needed. Example:
GAAGUACAAUAUGUAACCG
.{.((((.....))}))..
Image source: http://ultrastudio.org/upload/RNAPseudoKnot-25005810.jpg
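With several bracket types, a parser needs one stack per type, and a pseudoknot shows up as two pairs that cross, i.e., pairs (i, j) and (k, l) with i < k < j < l. The Python sketch below parses the example above and lists its crossing pairs. The set of bracket types accepted here is an assumption for illustration; different tools use different conventions.

```python
CLOSING_TO_OPENING = {")": "(", "}": "{", "]": "[", ">": "<"}


def parse_extended(structure):
    """Return sorted 1-based base pairs from a dot-bracket string with several bracket types."""
    stacks = {opening: [] for opening in CLOSING_TO_OPENING.values()}
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol in stacks:
            stacks[symbol].append(position)
        elif symbol in CLOSING_TO_OPENING:
            stack = stacks[CLOSING_TO_OPENING[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {position}")
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return all ((i, j), (k, l)) with i < k < j < l, the signature of a pseudoknot."""
    return [(p, q) for p in pairs for q in pairs if p[0] < q[0] < p[1] < q[1]]


knot_pairs = parse_extended(".{.((((.....))}))..")
print(knot_pairs)
print(crossing_pairs(knot_pairs))
```

In the example, the pair written with curly brackets, (2, 15), crosses the two outermost round-bracket pairs, which is why a single bracket type cannot represent this structure.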

Epilogue: Case Study, Summary and Further Readings

Case study: Drug finding/design. Drugs are mostly chemicals with a specific structure that interacts with some biological objects. Examples: inhibiting the activities of an important protein of bacteria; blocking the interaction between a virus and the receptors of the host cell; simulating the production of a hormone.

Case study: Drug finding/design. To identify or design a chemical that targets a particular object (e.g., a protein), we need to make sure that the two bind tightly. This is examined through a process called docking. Image source: http://vds.cm.utexas.edu/

Case study: Drug finding/design. Computational problem: Input: a target protein and a list of chemicals. Goal: find a chemical that binds the target well; try different locations and orientations; binding depends on structure and chemistry. Output: one or more chemicals that bind the target well. Difficulties: computational complexity (large search space for each protein-chemical combination; many chemicals to try); need to ensure specificity (not targeting other proteins and causing side effects).

Case study: Drug finding/design. There is a game in which players try folding proteins, called FoldIt (http://fold.it/). Scores are based on free energy. Scores and ranks are updated in real time. Players can discuss and share solutions. It has resulted in some amazingly good folds as compared to automatic predictions by computer programs. Image source: http://fold.it/portal/site_files/theme/science/competition.png

Summary. Functions depend on structures. Different levels of structures: primary (sequence), secondary (local), tertiary (global), quaternary (interactions). RNA secondary structures can be predicted by dynamic programming based on a thermodynamic model. Important sub-structures: stacking pairs, hairpin loops, internal loops/bulges, multi-loops, pseudoknots.

Further readings. Chapter 11 of Algorithms in Bioinformatics: A Practical Introduction: speed-up of the algorithm; algorithm for RNA structure prediction with pseudoknots; free slides available. Parts VII and VIII of Fundamental Concepts of Bioinformatics: protein folding and protein structure prediction; docking.