Applied Bioinformatics Week 11. Topics: Protein Secondary Structure, RNA Secondary Structure.

Theory I

Recall: Domains. A domain is a functional region of a protein sequence. Proteins may have several domains, which are generally identified by multiple sequence alignment (MSA).

Domains: Domains convey function, and function derives from 3D structure. How do we determine the 3D structure of a protein? The first step is its secondary structure.

Four levels of protein structure

Structure

Secondary Structure: local three-dimensional structure. Elements: helix, sheet, coil.
– G = 3-turn helix (3₁₀ helix). Min length 3 residues.
– H = 4-turn helix (α helix). Min length 4 residues.
– I = 5-turn helix (π helix). Min length 5 residues.
– T = hydrogen-bonded turn (3, 4 or 5 turn)
– E = extended strand in parallel and/or anti-parallel β-sheet conformation. Min length 2 residues.
– B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation)
– S = bend (the only non-hydrogen-bond based assignment)

Secondary Structure: 8 different categories (DSSP):
– H: α-helix
– G: 3₁₀-helix
– I: π-helix (extremely rare)
– E: β-strand
– B: β-bridge
– T: turn
– S: bend
– L: the rest

Protein Secondary Structure [3]
– Alpha helix: the structure repeats itself every 5.4 Å along the helix axis. The main-chain C=O group of each residue is hydrogen bonded to the N-H group of the residue 4 positions further along the chain.
– Beta sheet: two or more strands run alongside each other and are linked by hydrogen bonds. The strands can come from one or more polypeptide chains.
(Yuchun Tang, Preeti Singh, Yanqing Zhang, Chung-Dar Lu and Irene Weber, Georgia State University)

Simplification: Instead of the 20 amino acids, use groups of amino acids with similar chemical properties (the grouping depends on the study). Instead of the 8 DSSP categories, use 3 secondary structures: helix, sheet and coil.
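The following is a minimal Python sketch of the 8-to-3 reduction. It assumes one commonly used mapping: H, G and I become helix, E and B become sheet, and everything else becomes coil. Published studies differ here (some map G and I to coil), so treat the table as an adjustable assumption. The name reduce_dssp is hypothetical. Following DSSP, E stands for sheet.

```python
# Assumed 8-to-3 mapping; other studies use slightly different tables.
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices -> helix
    "E": "E", "B": "E",             # strand, bridge -> sheet
    "T": "C", "S": "C", "L": "C",   # turn, bend, the rest -> coil
    " ": "C", "-": "C",             # blank/dash sometimes used for "the rest"
}

def reduce_dssp(states: str) -> str:
    """Map an 8-state DSSP string to 3 states: H (helix), E (sheet), C (coil)."""
    # Unknown symbols are treated as coil (an assumption).
    return "".join(DSSP_TO_3.get(s, "C") for s in states)

print(reduce_dssp("HHHGGTTEEBSL"))  # HHHHHCCEEECC
```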

Secondary Structure Prediction
– Sheet/helix-forming tendency of individual amino acids: up to 60% accurate
– MSA to exploit the neighborhood: words of several amino acids are formed and hydrophobicity is included: up to 80% accurate

Propensities

Generations of Prediction Methods
– 1st generation: single-residue statistics, based on single amino acid propensities
– 2nd generation: segment statistics, propensities for segments of 3-51 adjacent residues
– 3rd generation: evolution-based, using evolutionary information (evolutionary profiles) for better predictions
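A first-generation propensity can be computed from sequences with known (for example DSSP-derived) states. The propensity of amino acid a for state s is its frequency within state s divided by its overall frequency. Values above 1 mean the residue is enriched in that state. The following is a minimal Python sketch; the function name and the toy data are hypothetical.

```python
from collections import Counter

def propensities(seqs, labels):
    """Propensity P(a, s) = f(a | s) / f(a) from sequences with per-residue states."""
    pair_counts, aa_counts, state_counts = Counter(), Counter(), Counter()
    total = 0
    for seq, lab in zip(seqs, labels):
        if len(seq) != len(lab):
            raise ValueError("each sequence needs one state label per residue")
        for a, s in zip(seq, lab):
            pair_counts[(a, s)] += 1
            aa_counts[a] += 1
            state_counts[s] += 1
            total += 1
    # Only (amino acid, state) combinations that were observed get an entry.
    return {
        (a, s): (n / state_counts[s]) / (aa_counts[a] / total)
        for (a, s), n in pair_counts.items()
    }

p = propensities(["AAGG"], ["HHCC"])  # toy data
print(p[("A", "H")])  # 2.0: A is twice as common in helix as overall
```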

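Assignment to Structure: Slide a window of 7 amino acids along the sequence. Why 7? Seven residues cover about 2 turns of an α-helix (3.6 residues per turn). The middle amino acid is assigned the average propensity of the window, for helix and for sheet. Long stretches of similar assignments indicate a secondary structure element.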
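The window assignment can be sketched in a few lines of Python. The propensity values below are hypothetical illustration values, not published tables. A few further rules are assumptions of this sketch and do not come from the slide:
– Residues missing from the tables get 1.0.
– The end residues, where the window does not fit, are called coil.
– A residue is called helix if its average helix propensity exceeds 1.0 and is at least the sheet value; otherwise it is called sheet if the sheet value exceeds 1.0.

```python
# Hypothetical propensities for illustration only (NOT published values).
HELIX = {"A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6, "P": 0.6, "V": 1.0}
SHEET = {"A": 0.8, "E": 0.4, "L": 1.3, "G": 0.8, "P": 0.6, "V": 1.7}

def window_assign(seq, helix=HELIX, sheet=SHEET, w=7, default=1.0):
    """Assign H/E/C to each residue from the average propensity of a centered window."""
    half = w // 2
    out = []
    for i in range(len(seq)):
        if i < half or i >= len(seq) - half:
            out.append("C")  # window does not fit at the ends: call it coil
            continue
        win = seq[i - half:i + half + 1]
        h = sum(helix.get(a, default) for a in win) / w
        e = sum(sheet.get(a, default) for a in win) / w
        if h > 1.0 and h >= e:
            out.append("H")
        elif e > 1.0:
            out.append("E")
        else:
            out.append("C")
    return "".join(out)

print(window_assign("AAAAAAAAA"))  # CCCHHHCCC
```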

Example: Window. Consider a sequence x with secondary structure e, and a window of length 5 whose special position is the middle one. The first position of the window is:
x = A R N S T V V S T A A...
e = ? ? H H C C C E E E....
The window returns the instance: A R N S T → H

Example: Window (continued). The second position of the window is:
x = A R N S T V V S T A A...
e = ? ? H H C C C E E E....
The window returns the instance: R N S T V → H
The next instances are: N S T V V → C, S T V V S → C, T V V S T → C
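Here is a minimal Python sketch that turns a sequence x and its structure string e into the (window, label) training instances shown in the example. It assumes '?' marks an unknown state and skips windows whose middle label is unknown. The function name is hypothetical.

```python
def window_instances(x, e, w=5, unknown="?"):
    """Return (window, label-of-middle-residue) pairs, as in the example."""
    if len(x) != len(e):
        raise ValueError("x and e must have equal length")
    half = w // 2
    out = []
    for start in range(len(x) - w + 1):
        label = e[start + half]  # state of the special (middle) position
        if label != unknown:
            out.append((x[start:start + w], label))
    return out

for window, label in window_instances("ARNSTVVSTA", "??HHCCCEEE"):
    print(" ".join(window), "->", label)
# A R N S T -> H
# R N S T V -> H
# N S T V V -> C
# S T V V S -> C
# T V V S T -> C
# V V S T A -> E
```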

Practical Secondary Structure Prediction
– Can aid MSA: if the predicted structures are not more similar than the aligned sequences, there is a problem
– Is a step towards the three-dimensional structure
– Gives a clue about the architecture: there are 28 regular protein architectures

PSIPRED Example

Secondary structure prediction methods
– PSIPRED (PSI-BLAST profiles used for prediction; David Jones, Warwick)
– JPRED consensus prediction (includes many of the methods given below; Cuff & Barton, EBI)
– DSC (King & Sternberg)
– PREDATOR (Frishman & Argos, EMBL)
– PHD (Rost & Sander, EMBL, Germany)
– ZPRED server (Zvelebil et al., Ludwig, U.K.)
– nnPredict (Cohen et al., UCSF, USA)
– BMERC PSA Server (Boston University, USA)
– SSP, nearest-neighbor (Solovyev and Salamov, Baylor College, USA)
(Andrew CR Martin, UCL)

Consensus prediction method (figure legend: hydrophobic; highly conserved; b = buried, e = exposed) (Andrew CR Martin, UCL)

Consensus prediction method: JPRED (figure legend: hydrophobic; highly conserved; amphipathic; b = buried, e = exposed) (Andrew CR Martin, UCL)

Neural network prediction: PHD. A multiple alignment of the protein family is turned into a profile. The profile for a window of adjacent residues is then used to predict the secondary structure (SS). (Andrew CR Martin, UCL)
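The input side of such a network can be sketched in Python. First build a per-column amino-acid frequency profile from the family alignment, then concatenate the profile columns of a window around the residue of interest into one feature vector. This is a simplified illustration, not PHD's actual encoding. The window size of 13, zero padding at the chain ends, ignoring gap characters '-', and the function names are all assumptions.

```python
AA = "ACDEFGHIKLMNPQRSTVWY"

def column_profile(msa):
    """Per-column amino-acid frequencies of an alignment (gaps '-' ignored)."""
    profile = []
    for c in range(len(msa[0])):
        col = [s[c] for s in msa if s[c] in AA]
        n = len(col)
        profile.append([col.count(a) / n if n else 0.0 for a in AA])
    return profile

def window_features(profile, center, w=13):
    """Concatenate the profile columns of a window centered on `center`."""
    half = w // 2
    feats = []
    for c in range(center - half, center + half + 1):
        if 0 <= c < len(profile):
            feats.extend(profile[c])
        else:
            feats.extend([0.0] * len(AA))  # pad beyond the chain ends
    return feats

profile = column_profile(["AC", "AD"])
print(len(window_features(profile, 0, w=3)))  # 60 = 3 columns x 20 amino acids
```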

Hidden Markov Models: HMMSTR. Markov states represent an amino acid, a secondary structure element and its structural context, capturing recurrent local features of protein sequences. Accuracy of 74% (Bystroff et al., 2000). (Andrew CR Martin, UCL)

Consensus/Meta Prediction Method: uses more than one existing method, learns how to combine their results, and produces a result that is on average better than the single methods. E.g.:
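The simplest possible combiner is a per-residue majority vote. Unlike the meta methods described above, it does not learn anything, so it is only a baseline. The following is a minimal Python sketch. It assumes all predictions are equal-length strings over the same alphabet. Ties go to the state seen first in the column, which is how collections.Counter orders equal counts. The function name is hypothetical.

```python
from collections import Counter

def majority_consensus(predictions):
    """Per-residue majority vote over several equal-length predictions."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("predictions must have equal length")
    return "".join(
        Counter(column).most_common(1)[0][0]  # ties: first state encountered
        for column in zip(*predictions)
    )

print(majority_consensus(["HHHC", "HHCC", "HECC"]))  # HHCC
```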

Prediction Accuracy Assessment: the Protein Structure Prediction Center runs CASP (Critical Assessment of protein Structure Prediction).
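The slide does not name a measure. A classic per-residue score for three-state predictions is Q3: the percentage of residues whose predicted state matches the observed state. Figures such as "up to 60%" or "74%" are typically reported in this form. The following is a minimal Python sketch, assuming the predicted and observed strings are already aligned residue by residue.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(q3("HHCC", "HHCE"))  # 75.0
```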

Hydrophobicity

Assignment to Structure: Slide a window of 5-7 or … amino acids. Why? Otherwise, this is the same idea as for secondary-structure-forming propensities.
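A minimal Python sketch of a windowed hydrophobicity profile. The slide does not name a scale, so this uses the Kyte-Doolittle hydropathy scale as a stand-in. The default window of 7 follows the slide. Longer windows are typically used when scanning for transmembrane helices. Each value is reported at the window's middle residue. Residues outside the 20 standard amino acids raise a KeyError. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}

def hydropathy_profile(seq, w=7):
    """(middle index, average hydropathy) for every full window of length w."""
    half = w // 2
    return [(i, sum(KD[a] for a in seq[i - half:i + half + 1]) / w)
            for i in range(half, len(seq) - half)]

print(hydropathy_profile("IIIIIII"))  # [(3, 4.5)]
```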

End Theory I Mindmapping 10 min break

Practice I

Secondary Structure Prediction:
bin/npsa_automat.pl?page=/NPSA/npsa_phd.html
bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html

In-class assignment
– Choose a protein sequence (not too short!)
– Perform secondary structure predictions with as many tools as possible; use Google to find at least one more tool than given in the slides
– Retrieve the predictions and rewrite them in a three-letter alphabet (H, C, S: helix, coil, sheet), e.g. with the search-and-replace functionality of your word processor
– Make an MSA of the predicted secondary structures to compare the results: Are there gaps? Are they within the transition from one secondary structure to the next?
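The last question can also be checked programmatically. The following is a minimal Python sketch, assuming two predictions already rewritten into H/C/S and of equal length, so no gaps. It compares them residue by residue rather than building a full MSA. Each disagreement is flagged by whether it lies next to a state boundary in the reference prediction. "Next to" means within `margin` residues, with a default of 1, which is an arbitrary choice. The function names are hypothetical.

```python
def state_boundaries(pred):
    """Indices i where the state changes between residue i and i + 1."""
    return [i for i in range(len(pred) - 1) if pred[i] != pred[i + 1]]

def classify_disagreements(reference, other, margin=1):
    """List (position, near_transition) for each residue where the predictions differ."""
    if len(reference) != len(other):
        raise ValueError("predictions must have equal length (align them first)")
    bounds = state_boundaries(reference)
    result = []
    for i, (r, o) in enumerate(zip(reference, other)):
        if r != o:
            near = any(b + 1 - margin <= i <= b + margin for b in bounds)
            result.append((i, near))
    return result

print(classify_disagreements("HHHHCCCC", "HHHCCCCC"))  # [(3, True)]
```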

Try to predict TMDs (transmembrane domains)
– Find a protein with TMDs
– ExPASy will provide you with prediction methods:
   – DAS: Prediction of transmembrane regions in prokaryotes using the Dense Alignment Surface method (Stockholm University)
   – HMMTOP: Prediction of transmembrane helices and topology of proteins (Hungarian Academy of Sciences)
   – PredictProtein: Prediction of transmembrane helix location and topology (Columbia University)
   – SOSUI: Prediction of transmembrane regions (Nagoya University, Japan)
   – TMHMM: Prediction of transmembrane helices in proteins (CBS, Denmark)
   – TMpred: Prediction of transmembrane regions and protein orientation (EMBnet-CH)
   – TopPred: Topology prediction of membrane proteins (France)

End Practice I

Theory II

RNA
– Coding RNA: results in protein
– Non-coding RNA: structural, regulatory, catalytic, …

RNA Basics: transfer RNA (tRNA), messenger RNA (mRNA), ribosomal RNA (rRNA), small interfering RNA (siRNA), microRNA (miRNA), small nucleolar RNA (snoRNA)

RNA Secondary Structure: Just as amino acids interact to form a secondary structure, nucleotides do the same; here, base pairing is the driving force. The structure of an RNA molecule is generally projected onto 2 dimensions.

Chemical Structure of RNA: four base types; distinguishable (5′ and 3′) ends.

Partial Tertiary Structure One illustration

Yet Another Tertiary Structure Found via google

Our Final Tertiary Picture Very complex

A Partial RNA Secondary Structure

Pure Secondary Structure

RNA Folding
– Single-stranded RNA is unstable. It base pairs with complementary sequences, driven by base-pair stacking and favorable loop sizes.
– Highest stability: the lowest-energy model
– The folding process is not known in detail and is extremely fast

RNA Secondary Structure Prediction: Dynamic Programming Approaches (Sarah Aerni)

Outline: RNA folding; dynamic programming for RNA secondary structure prediction; covariance model for RNA structure prediction

RNA Secondary Structure (figure; image: Wuchty): stem, hairpin loop, bulge loop, interior loop, junction (multiloop), single-stranded region, pseudoknot
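Secondary structures like these are commonly written in dot-bracket notation, which ViennaRNA's RNAfold uses, for example. Matching parentheses mark a base pair and dots mark unpaired bases. Plain dot-bracket strings describe nested structures only, so the pseudoknot in the figure cannot be written this way.

The following is a minimal Python sketch that converts such a string into base-pair indices and finds the pairs that close hairpin loops. The function names are hypothetical.

```python
def dot_bracket_pairs(structure):
    """Convert a dot-bracket string into a sorted list of (i, j) base pairs."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return sorted(pairs)

def hairpin_closing_pairs(pairs):
    """A pair (i, j) closes a hairpin loop if no other pair starts inside it."""
    return [(i, j) for i, j in pairs if not any(i < k < j for k, _ in pairs)]

pairs = dot_bracket_pairs("((..((...))..))")
print(pairs)                         # [(0, 14), (1, 13), (4, 10), (5, 9)]
print(hairpin_closing_pairs(pairs))  # [(5, 9)]
```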

Sequence Alignment as a Method to Determine Structure: Bases pair to form stems (helices), which determine the secondary structure. Aligning bases according to their ability to pair with each other gives an algorithmic approach to determining the optimal structure.

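Base Pair Maximization: Dynamic Programming Algorithm. Simple example: maximizing base pairing (images: Sean Eddy). S(i,j) is the highest number of base pairs in any folding of the subsequence of the RNA strand from index i to index j. There are four cases: i and j form a base pair, i is unmatched, j is unmatched, or a bifurcation:
$$S(i,j) = \max\begin{cases} S(i+1,\,j-1) + 1 & \text{if bases } i \text{ and } j \text{ can pair}\\ S(i+1,\,j) & i \text{ unmatched}\\ S(i,\,j-1) & j \text{ unmatched}\\ \max_{i<k<j}\,\left[S(i,k) + S(k+1,\,j)\right] & \text{bifurcation}\end{cases}$$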

Base Pair Maximization: Dynamic Programming Algorithm (images: Sean Eddy). Alignment method: align the RNA strand to itself. The score increases for feasible base pairs, and each score is independent of the overall structure. The bifurcation adds an extra dimension. Initialize the first two diagonals to 0, then fill in the squares sweeping diagonally. The possible paths are:
– Bases cannot pair (similar to an unmatched position in an alignment): S(i+1, j) or S(i, j-1)
– Bases can pair (similar to a matched position in an alignment): S(i+1, j-1) + 1
– Bifurcation: add S(i, k) + S(k+1, j) for all k; in the figure's example cell, the bifurcation gives the maximum
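The following is a minimal Python sketch of the base-pair-maximization recurrence above, known as the Nussinov algorithm. It includes a traceback that returns one optimal structure in dot-bracket notation. It makes the following assumptions, none of which come from the slides:
– Allowed pairs are A-U, G-C and the G-U wobble pair.
– `min_loop` (default 3, matching the 3-base loop rule on the energy-minimization results slide) forbids pairs that would close a hairpin with fewer unpaired bases. Set `min_loop=0` for the plain recurrence.
– The function name `nussinov_fold` is hypothetical.

```python
# Assumed allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov_fold(seq, min_loop=3):
    """Return (max number of base pairs, one optimal dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # S[i][j] = max base pairs in seq[i..j]; cells with j - i <= min_loop stay 0.
    S = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):            # sweep diagonally, short spans first
        for i in range(n - span):
            j = i + span
            best = max(S[i + 1][j], S[i][j - 1])   # i unmatched / j unmatched
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, S[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j - 1):          # bifurcation: [i..k] + [k+1..j]
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best

    # Traceback from the full sequence to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j or S[i][j] == 0:
            continue
        if S[i][j] == S[i + 1][j]:
            stack.append((i + 1, j))
        elif S[i][j] == S[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and S[i][j] == S[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j - 1):
                if S[i][j] == S[i][k] + S[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return S[0][n - 1], "".join(structure)

print(nussinov_fold("GGGAAACCC"))  # (3, '(((...)))')
```

Like the slide's method, this maximizes the pair count only. It therefore shares the drawbacks listed next, and it ignores stacking energies entirely.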

Base Pair Maximization: Drawbacks. Maximizing base pairs does not necessarily lead to the most stable structure. It may create structures with many interior loops or hairpins that are energetically unfavorable. This is comparable to aligning sequences with scattered matches, which is not biologically reasonable.

Energy Minimization
– Thermodynamic stability is estimated using experimental techniques
– Theory: the most stable structure is the most likely
– No pseudoknots, due to algorithm limitations
– Uses the dynamic programming alignment technique and attempts to maximize the score while taking thermodynamics into account
– Tools: MFOLD and ViennaRNA

Energy Minimization Results (images: David Mount): The linear RNA strand folds back on itself to create the secondary structure. The circularized representation uses this requirement: arcs represent base pairs. All loops must contain at least 3 bases, which is equivalent to having at least 3 bases between the two ends of every arc. Exception: the location where the beginning and end of the RNA come together in the circularized representation.

Trouble with Pseudoknots: Pseudoknots break the dynamic programming algorithm. To allow a pseudoknot, checks must be made to ensure a base is not already paired, and this breaks the recurrence relations. (Images: David Mount)
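To make the pseudoknot problem concrete, here is a small Python check. It assumes base pairs are given as (i, j) index tuples with i < j. Two arcs (i, j) and (k, l) cross, forming a pseudoknot, when i < k < j < l. Nested structures have no crossing arcs, and they are the only kind the dynamic programming recurrences can produce. The function name is hypothetical.

```python
def crossing_pairs(pairs):
    """Return all pairs of arcs (i, j), (k, l) with i < k < j < l (pseudoknots)."""
    ps = sorted(pairs)
    crossings = []
    for a in range(len(ps)):
        i, j = ps[a]
        for b in range(a + 1, len(ps)):
            k, l = ps[b]
            if i < k < j < l:
                crossings.append((ps[a], ps[b]))
    return crossings

print(crossing_pairs([(0, 10), (5, 15)]))   # [((0, 10), (5, 15))] -> pseudoknot
print(crossing_pairs([(0, 14), (1, 13)]))   # [] -> nested
```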

Energy Minimization: Drawbacks
– Computes only one optimal structure
– The usual drawbacks of purely mathematical approaches
– Similar difficulties arise in other algorithms, e.g. protein structure prediction and exon finding

Alternative Algorithms: Covariation
– Incorporates a similarity-based method: evolution maintains sequences that are important
– Sequence changes coincide so that the structure is maintained through base pairs (covariance)
– Cross-species structure conservation example: tRNA. Areas of base pairing in tRNA are expected to covary between species, and base pairing creates the same stable tRNA structure in different organisms. A mutation in one base makes pairing impossible and breaks down the structure. Covariation ensures that the ability to base pair is maintained, so the RNA structure is conserved.
– Manual and automated approaches have been used to identify covarying base pairs
– Models for structure based on the results: ordered tree model, stochastic context-free grammar
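Covariation between two alignment columns is often quantified with mutual information. This measure is not named on the slide, so treat it as one possible choice. The following is a minimal Python sketch, assuming each column is given as a string of the bases observed in each aligned sequence; gaps are not handled. Perfectly covarying columns such as G/C, A/U, C/G, U/A give 2 bits. A conserved column gives 0 bits even if it pairs. Plain mutual information also does not check that the covarying bases can actually pair.

```python
import math
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns of equal length."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    fi, fj = Counter(col_i), Counter(col_j)
    fij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi

print(mutual_information("GACU", "CUGA"))  # 2.0 (perfect covariation)
print(mutual_information("GGGG", "CCCC"))  # 0.0 (conserved, no signal)
```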

Binary Tree Representation of RNA Secondary Structure (images: Eddy et al.)
– RNA structure is represented as a binary tree. A node represents a base pair if two bases are shown, or a loop if a base and a "gap" (dash) are shown.
– Pseudoknots are still not represented
– The tree does not permit varying sequences: mismatches, insertions and deletions

Covariance Model
– Like a profile HMM, it permits flexible alignment to an RNA structure, using emission and transition probabilities
– Models trees using a finite number of states:
   – Match states (the sequence conforms to the model): MATP, in which bases are paired in both the model and the sequence; MATL and MATR, in which a single unpaired base appears on the left or right side in both the sequence and the model
   – Deletion: a state in which there is a deletion in the sequence compared to the model
   – Insertion: a state in which there is an insertion in the sequence relative to the model
– Transitions have probabilities, which vary (entering an insertion, remaining in the current state, etc.)
– Bifurcation: has no probability; it describes the path

Covariance Model (CM) Training Algorithm
– S(i, j) = score at indices i and j of the RNA when aligned to the covariance model
– Single positions: the independent frequency of seeing each symbol (A, C, G, U) at location i or j
– Paired positions: the frequency of seeing each combination of symbols together at locations i and j
– Frequencies are obtained by aligning the model to "training data" consisting of sample sequences. They reflect the values that optimize the alignment of the sequences to the model.
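The two kinds of frequencies can be estimated by counting, once the training sequences have been aligned to the model. The following is a minimal Python sketch. It makes these assumptions:
– Sequences are already aligned, so column i means the same model position in every sequence.
– A pseudocount of 1 is added to every cell, so unseen symbols keep a small probability.
– Non-ACGU symbols are skipped.
– The function names are hypothetical.

This covers only the emission counting step, not full CM training.

```python
from itertools import product

BASES = "ACGU"

def singlet_emissions(training, i, pseudocount=1.0):
    """Probability of each base at aligned column i (single, unpaired position)."""
    counts = {b: pseudocount for b in BASES}
    for seq in training:
        if seq[i] in counts:
            counts[seq[i]] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

def pair_emissions(training, i, j, pseudocount=1.0):
    """Joint probability of each base combination at aligned columns i and j."""
    counts = {pair: pseudocount for pair in product(BASES, repeat=2)}
    for seq in training:
        if (seq[i], seq[j]) in counts:
            counts[(seq[i], seq[j])] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

training = ["GAAC"] * 4                      # toy training data
print(pair_emissions(training, 0, 3)[("G", "C")])  # 0.25 = (4 + 1) / (4 + 16)
print(singlet_emissions(training, 1)["A"])         # 0.625 = (4 + 1) / (4 + 4)
```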

Alignment to CM Algorithm (images: Eddy et al.)
– Calculate the probability score of aligning the RNA to the CM
– Three-dimensional matrix, O(n³)
– Align the sequence to given subtrees of the CM; subtrees arise from bifurcations
– For each subsequence, calculate all possible states
– For simplicity, the left singlet is the default

Alignment to CM Algorithm (continued; images: Eddy et al.): Each calculation takes into account the transition probability (T) to the next state and the emission probability (P) in the state, as determined by the training data. A bifurcation has no probability associated with the state. A deletion has no emission probability (P).

Covariance Model: Drawbacks
– Needs to be well trained
– Not suitable for searches of large RNAs: their structural complexity cannot be modeled, and runtime and memory requirements are high

End Theory II Mindmapping 10 min break

Practice II

RNA Secondary Structure Online: download RNAshapes and RNAfold. Get RNAs –