
CSCE555 Bioinformatics Lecture 18: Protein Bioinformatics and Protein Secondary Structure Prediction. Meeting: MW 4:00PM-5:15PM, SWGN2A21. Instructor: Dr. Jianjun Hu. Course page: University of South Carolina, Department of Computer Science and Engineering

Outline Understanding Protein Structures Protein bioinformatics: what and why? Protein Secondary Structure Prediction: problem & algorithm Summary

Proteins: Large organic compounds made of amino acids. Functions: proteins play a crucial role in virtually all biological processes, with a broad range of functions. Structure: the activity of an enzyme or the function of a protein is governed by its three-dimensional structure.

How Proteins Are Generated folding

Protein Bioinformatics Analysis and prediction of protein structures (Structural Bioinformatics) ◦ Protein Design: design a sequence that will fold into a designated structure Assist experimental biology in assigning functions or suggesting functional hypotheses for all known proteins.

Protein Bioinformatics [Figure: DNA → (transcription) → RNA → (translation) → protein → phenotype, with the associated data sources: genomic DNA databases; cDNA, ESTs, UniGene; gene expression database; protein sequence databases; protein structure databases]

TOP 10 Most Wanted solutions in protein bioinformatics 1. Protein sequence alignment 2. Predicting protein features from sequence 3. Function prediction 4. Protein structure prediction 5. Membrane proteins 6. Functional site identification 7. Protein-protein interaction 8. Protein-small molecule interaction (Docking) 9. Protein design 10. Protein engineering

Why Protein Bioinformatics? Function = ∑ interactions. Disease mechanism, gene regulation, drug design…

Relevance of Protein Structure in the Post-Genome Era: sequence → structure → function → medicine

Protein Structure Example [Figure: example protein structure with 2 chains; labels: beta sheet, helix, loop]

Protein Structure is Hierarchical [Figure labels: sequence; local folding; long-range folding; multimeric organization; single peptide chain; multiple peptide chains]

How to Obtain Protein Structures Experimental methods (>50,000 structures) ▪ X-ray crystallography or NMR (nuclear magnetic resonance) spectroscopy ▪ Limitations: protein size; crystallography requires crystallized proteins ▪ Membrane proteins are difficult to crystallize Computational methods (predictive methods) ▪ 2-D structure (secondary structure) ▪ 3-D structure (tertiary structure) ▪ CASP competition: Critical Assessment of Techniques for Protein Structure Prediction

Protein Structure Prediction Problem Given the amino acid sequence of a protein, what’s its shape in three-dimensional space? ◦ Sequence → secondary structure → 3D structure → function

Why Is Prediction Needed? The function of a protein is determined by its structure. Experimental methods to determine protein structure are time-consuming and expensive. There is a big gap between the number of available protein sequences and the number of available structures.

Growth of Protein Sequences and Structures [Figure: growth in the number of protein sequences and structures; legible label: 50,000]

What determines structures: Inter-atomic Forces Covalent bond (short range, very strong) ◦ Binds atoms into molecules / macromolecules Hydrogen bond (short range, strong) ◦ Binds two polar groups (hydrogen + electronegative atom) Disulfide bond / bridge (short range, very strong) ◦ Covalent bond between sulfhydryl (sulfur + hydrogen) groups Hydrophobic / hydrophilic interaction (weak) ◦ Hydrogen bonding w/ H2O in solution Van der Waals interaction (very weak) ◦ Nonspecific electrostatic attractive force Electrostatic forces: ◦ Oppositely charged side chains form salt bridges

Secondary Structure Prediction (2D) For each residue in a protein structure, there are three possible states: α (α-helix), β (β-strand), t (others). [Figure: amino acid sequence aligned with its secondary structure sequence] As of 2006, the accuracy of secondary structure prediction methods is about 80-82%. The theoretical upper limit is about 90%, because secondary structure assignment in real proteins is uncertain to about 10%. Secondary structure prediction can provide useful information to improve other sequence and structure analysis methods, such as sequence alignment and 3-D modeling.

PSSP: Protein Secondary Structure Prediction Three Generations (1) Based on statistical information about single amino acids. (2) Based on local amino acid interactions (segments); typically a segment contains […] amino acids. (3) Based on evolutionary information from homologous sequences.

Formulate PSSP as a machine learning classification problem Using a sliding window to move along the amino acid sequence ◦ Each window denotes an instance ◦ Each amino acid inside the window denotes an attribute ◦ The known secondary structure of the central amino acid is the class label
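The sliding-window formulation above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function name `sliding_windows`, the window size of 5, and the `-` padding symbol at the sequence ends are all assumptions made for the example.

```python
def sliding_windows(sequence, labels, window=5, pad="-"):
    """Return one (window_string, central_label) instance per residue.

    The window size and the padding symbol are illustrative assumptions.
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    # Pad both ends so that the first and last residues also get a full window.
    padded = pad * half + sequence + pad * half
    return [(padded[i:i + window], labels[i]) for i in range(len(sequence))]


# Toy data: a short sequence with a made-up h/e/o labelling of each residue.
instances = sliding_windows("ALHEASG", "hhhhhoo", window=5)
for attributes, label in instances:
    print(attributes, "->", label)
```

Each printed row is one training instance: the residues in the window are the attributes, and the label belongs to the residue in the middle of the window.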

How to formulate protein secondary structure prediction as a machine learning problem? A set of “examples” is generated from sequences with known secondary structures. The examples form a training set. Build a neural network classifier. Apply the classifier to a sequence with unknown secondary structure.

Introduction to Neural Networks What is an Artificial Neural Network? ◦ An extremely simplified model of the brain ▪ Essentially a function approximator ▪ Transforms inputs into outputs to the best of its ability

How Do Neural Networks Work? A neuron (perceptron) is a single-layer NN. The output of a neuron is a function of the weighted sum of the inputs plus a bias.

Activation Function Binary activation function ◦ f(x) = 1 if x >= 0 ◦ f(x) = 0 otherwise The most commonly used sigmoid function is the logistic function ◦ $f(x) = 1/(1 + e^{-x})$
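A single neuron with either activation function can be sketched as follows. The function names and the AND-gate weights are hypothetical choices for illustration; only the two activation formulas come from the slide.

```python
import math


def step(x):
    """Binary activation: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def logistic(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias, activation):
    """Apply the activation to the weighted sum of the inputs plus a bias."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


# Hypothetical weights that make a step neuron behave like a logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
for a in (0, 1):
    for b in (0, 1):
        hard = neuron_output([a, b], and_weights, and_bias, step)
        soft = neuron_output([a, b], and_weights, and_bias, logistic)
        print(a, b, hard, round(soft, 3))
```

The step neuron gives a hard 0/1 decision, while the logistic neuron gives a graded value between 0 and 1 for the same weighted sum, which is what makes gradient-based training possible.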

Multi-Layer Feedforward NN Example: the XOR problem (a multi-layer network is capable of nonlinear classification)
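To make the XOR example concrete, here is a small sketch of a two-layer feedforward network with hand-picked weights. The weights are an assumption chosen for illustration (one hidden unit acts as OR, the other as NAND, and the output unit as AND); the slide's own figure may use different values.

```python
def step(x):
    """Binary activation: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def neuron(inputs, weights, bias):
    """Step neuron: threshold the weighted sum of the inputs plus a bias."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(a, b):
    """Two hidden neurons feeding one output neuron (weights are illustrative)."""
    h_or = neuron([a, b], [1.0, 1.0], -0.5)      # fires if a OR b
    h_nand = neuron([a, b], [-1.0, -1.0], 1.5)   # fires unless a AND b
    return neuron([h_or, h_nand], [1.0, 1.0], -1.5)  # fires if both hidden units fire


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

No single step neuron can compute XOR, because the two classes are not linearly separable in the input space; the hidden layer re-represents the inputs so that the output neuron can separate them.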

Where Do The Weights Come From? The weights in a neural network are the most important factor in determining its function. Training is the act of presenting the network with some sample data and modifying the weights to better approximate the desired function (class labels) ◦ Supervised Training ▪ Supplies the neural network with inputs and the desired outputs ▪ Response of the network to the inputs is measured ▪ The weights are modified to reduce the difference between the actual and desired outputs

Training in Perceptron Neural Net Training a perceptron: find the weights W that minimize the error function, where P is the number of training data, $X_i$ are the training vectors, $F(W \cdot X_i)$ is the output of the perceptron, and $t(X_i)$ is the target value for $X_i$. Use steepest descent: compute the gradient, update the weight vector, and iterate (e: learning rate).
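The steepest-descent procedure can be sketched as below. The error formula itself is not preserved in this transcript, so the sketch assumes the common sum-of-squares form E(W) = 1/2 · Σ_i (F(W·X_i) − t(X_i))² over the P training vectors, with F taken to be the logistic function. The toy OR data set, the learning rate of 0.5, and the iteration count are also assumptions.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def output(W, X):
    """F(W . X): logistic output of the perceptron for input vector X."""
    return logistic(sum(w * x for w, x in zip(W, X)))


def error(W, data):
    """Assumed error function: half the sum of squared differences."""
    return 0.5 * sum((output(W, X) - t) ** 2 for X, t in data)


def gradient(W, data):
    """Gradient of the assumed error function with respect to W."""
    grad = [0.0] * len(W)
    for X, t in data:
        F = output(W, X)
        common = (F - t) * F * (1.0 - F)  # chain rule through the logistic
        for n, x in enumerate(X):
            grad[n] += common * x
    return grad


def train(data, learning_rate=0.5, iterations=2000):
    """Steepest descent: repeatedly move W against the gradient."""
    W = [0.0] * len(data[0][0])
    for _ in range(iterations):
        grad = gradient(W, data)
        W = [w - learning_rate * g for w, g in zip(W, grad)]
    return W


# Toy training set (an assumption): logical OR, with a constant 1 as bias input.
OR_DATA = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]
W = train(OR_DATA)
print("error before training:", error([0.0, 0.0, 0.0], OR_DATA))
print("error after training: ", error(W, OR_DATA))
```

Appending a constant 1 to every input vector is a common way to fold the bias into the weight vector, so that a single update rule covers both.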

Back-propagation algorithm For a multi-layer NN, the errors of the hidden layers are not known. The algorithm searches for weight values that minimize the total error of the network over the set of training examples ◦ Forward pass: Compute the outputs of all units in the network, and the error of the output layer. ◦ Backward pass: The network error is backpropagated for updating the weights (credit assignment problem).

9/12/2015 Copyright G. A. Tagliarini, PhD 28 Feedforward Network Training by Backpropagation: Process Summary Select an architecture Randomly initialize weights While error is too large ◦ Select training pattern and feedforward to find actual network output ◦ Calculate errors and backpropagate error signals ◦ Adjust weights Evaluate performance using the test set

NN for Protein Secondary Structure Prediction

How to Encode Each Amino Acid? Each amino acid is encoded as a 20-bit binary sequence, with one bit position per amino acid type (A, R, N, …, V).
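A 20-bit encoding of this kind can be sketched as follows. The slide only shows the columns A, R, N, …, V; the full ordering used here (alphabetical by three-letter name) is an assumption, as are the function names and the choice to encode any non-standard symbol, such as a padding character, as all zeros.

```python
# Assumed column order: A, R, N, ... V (alphabetical by three-letter name).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def encode_residue(residue):
    """Return a 20-bit list with a single 1 at the residue's position.

    Unknown symbols (for example a padding character) map to all zeros;
    that convention is an assumption of this sketch.
    """
    bits = [0] * len(AMINO_ACIDS)
    index = AMINO_ACIDS.find(residue)
    if index >= 0:
        bits[index] = 1
    return bits


def encode_window(window):
    """Concatenate the 20-bit codes of all residues in a window."""
    return [bit for residue in window for bit in encode_residue(residue)]


vector = encode_window("ALHEA")
print(len(vector), "inputs,", sum(vector), "of them set to 1")
```

With this encoding a window of w residues becomes 20·w binary inputs for the network, and no artificial ordering or distance between amino acids is implied.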

Evaluation of Performance: Accuracy (Q3)
Amino acid sequence: ALHEASGPSVILFGSDVTVPPASNAEQAK
Actual secondary structure: hhhhhooooeeeeoooeeeooooohhhhh
Predicted secondary structure: ohhhooooeeeeoooooeeeooohhhhhh
Q3 = 22/29 = 76%. Q3 for random prediction is 33%. Secondary structure assignment in real proteins is uncertain to about 10%; therefore, a “perfect” prediction would have Q3 = 90%.
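The Q3 figure for this example can be reproduced with a few lines of Python. The function name `q3` is hypothetical; the two structure strings are the ones shown on the slide, and Q3 is taken to be the fraction of residues whose state is the same in both strings.

```python
def q3(actual, predicted):
    """Fraction of residues whose predicted state matches the actual state."""
    if len(actual) != len(predicted):
        raise ValueError("both strings must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


structure_a = "hhhhhooooeeeeoooeeeooooohhhhh"
structure_b = "ohhhooooeeeeoooooeeeooohhhhhh"

matches = sum(1 for a, p in zip(structure_a, structure_b) if a == p)
score = q3(structure_a, structure_b)
print(f"Q3 = {matches}/{len(structure_a)} = {score:.0%}")  # Q3 = 22/29 = 76%
```

Because Q3 only counts matching positions, it does not matter which of the two strings is treated as the prediction.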

Performance (CASP) Table columns: CASP, Year, # of Targets, %, Group. The numeric entries are not preserved; the Group column lists, in order: Rost and Sander; Rost; Jones; Jones.

Summary Protein bioinformatics is a very important area with many interesting problems. Computational methods can have a big impact in medicine and molecular biology. Protein secondary structure prediction algorithms already reach about 80% accuracy.

Slides Acknowledgements: Jinbo Xu, University of Waterloo; Xingquan Zhu

Why predict structure: Can Label Proteins by Dominant Structure Protein classification, Structural Blasting

Amino Acids Side chain Each amino acid is identified by its side chain, which determines the properties of this amino acid.

Side Chain Properties The amino acid names are colored according to their type: positively charged, negatively charged, polar but not charged, aliphatic (nonpolar), and aromatic. Amino acids that are essential to mammals are marked with an asterisk (*).
Hydrophobic: V, L, I, M, F
Hydrophilic: N, E, Q, H, K, R, D
In-between: G, A, S, T, Y, W, C, P
Positively charged: R, H, K
Negatively charged: D, E
Polar but not charged: N, Q, S, T
Nonpolar: A, G, I, L, M, P, V
Aromatic: F, W, Y
Hydrophobic amino acids tend to stay in the interior of a protein, while hydrophilic ones tend to stay on the exterior. Oppositely charged amino acids can form salt bridges. Polar amino acids can participate in hydrogen bonding.

Alpha Helix Examples

Beta Sheet Examples Parallel beta sheet Anti-parallel beta sheet

Calculate Outputs For Each Neuron Based On The Pattern The output from neuron j for pattern p is $O_{pj} = f(\mathrm{net}_{pj})$, where $\mathrm{net}_{pj} = \sum_k W_{jk} x_{pk} + \mathrm{bias}$, $f$ is the logistic function, $x_{pk}$ is input k for pattern p, k ranges over the input indices, and $W_{jk}$ is the weight on the connection from input k to neuron j. [Figure: feedforward pass from inputs to outputs]

Calculate The Error Signal For Each Output Neuron The output neuron error signal $\delta_{pj}$ is given by $\delta_{pj} = (T_{pj} - O_{pj})\, O_{pj}\, (1 - O_{pj})$, where $T_{pj}$ is the target value of output neuron j for pattern p and $O_{pj}$ is the actual output value of output neuron j for pattern p.

Calculate The Error Signal For Each Hidden Neuron The hidden neuron error signal $\delta_{pj}$ is given by $\delta_{pj} = O_{pj}\, (1 - O_{pj}) \sum_k \delta_{pk} W_{kj}$, where $\delta_{pk}$ is the error signal of a post-synaptic neuron k and $W_{kj}$ is the weight of the connection from hidden neuron j to the post-synaptic neuron k.
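The error signals can be checked numerically on a tiny network. The sketch below assumes logistic units, the standard hidden-layer rule δ_j = O_j (1 − O_j) Σ_k δ_k W_kj, and a per-pattern error E = 1/2 · Σ (T − O)²; under those assumptions the derivative of E with respect to a hidden weight equals −δ_j times the corresponding input. The network size, the weight values, and all function names are hypothetical.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, W_hidden, W_out):
    """Feedforward pass; each weight row is [w_1, ..., w_n, bias]."""
    hidden = [logistic(sum(w * v for w, v in zip(row, x + [1.0]))) for row in W_hidden]
    outputs = [logistic(sum(w * v for w, v in zip(row, hidden + [1.0]))) for row in W_out]
    return hidden, outputs


def error_signals(x, target, W_hidden, W_out):
    """Output and hidden error signals for one training pattern."""
    hidden, outputs = forward(x, W_hidden, W_out)
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, outputs)]
    delta_hidden = [
        h * (1.0 - h) * sum(delta_out[k] * W_out[k][j] for k in range(len(W_out)))
        for j, h in enumerate(hidden)
    ]
    return delta_hidden, delta_out


def total_error(x, target, W_hidden, W_out):
    """Assumed per-pattern error: half the sum of squared differences."""
    _, outputs = forward(x, W_hidden, W_out)
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, outputs))


def numeric_gradient(x, target, W_hidden, W_out, j, k, eps=1e-6):
    """Central-difference estimate of dE/dW_hidden[j][k]."""
    plus = [row[:] for row in W_hidden]
    minus = [row[:] for row in W_hidden]
    plus[j][k] += eps
    minus[j][k] -= eps
    return (total_error(x, target, plus, W_out)
            - total_error(x, target, minus, W_out)) / (2 * eps)


# Hypothetical 2-input, 2-hidden, 1-output network and one training pattern.
W_hidden = [[0.1, -0.2, 0.05], [0.4, 0.3, -0.1]]
W_out = [[0.2, -0.5, 0.1]]
x, target = [1.0, 0.5], [1.0]

delta_hidden, delta_out = error_signals(x, target, W_hidden, W_out)
analytic = -delta_hidden[0] * x[1]
numeric = numeric_gradient(x, target, W_hidden, W_out, 0, 1)
print("analytic:", analytic, "numeric:", numeric)
```

Agreement between the two numbers is a quick sanity check that the backward pass assigns the error to the hidden neurons correctly before the weights are adjusted.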