COT 6930 HPC and Bioinformatics Protein Structure Prediction Xingquan Zhu Dept. of Computer Science and Engineering.

COT 6930 HPC and Bioinformatics Protein Structure Prediction Xingquan Zhu Dept. of Computer Science and Engineering

[Figure: flow of biological information and the associated databases. DNA → RNA (transcription) → protein (translation) → phenotype. Labels: genomic DNA databases; cDNA, ESTs, UniGene; gene expression database; protein sequence databases; protein structure databases.]

Outline Protein Structure Why structure How to predict protein structure Experimental methods Computational methods (predictive methods) Protein Structure Prediction Secondary structure prediction (2D) Machine learning methods for protein secondary structure prediction Tertiary structure prediction (3D) Ab initio Homology modeling

Proteins Proteins play a crucial role in virtually all biological processes and have a broad range of functions. The activity of an enzyme or the function of a protein is governed by its three-dimensional structure.

Protein Structure is Hierarchical Protein Structure Video m/watch?v=lijQ3a8yU YQ

Primary Structure: Sequence The primary structure of a protein is the amino acid sequence

Protein Structure Prediction Problem Protein structure prediction Predict protein 3D structure from (amino acid) sequence One step closer to useful biological knowledge Sequence → secondary structure → 3D structure → function

Why Predict Structure? Structure determines function Molecular function Structure is more conserved than sequence Goals: 1. Predict structure from sequence 2. Predict function based on structure 3. Predict function based on sequence

Why predict structure: Structure is more conserved than sequence 28% sequence identity

Why predict structure: Can Label Proteins by Dominant Structure SCOP: Structural Classification Of Proteins

Why predict structure: Large number of proteins vs. relatively small number of folds Small number of unique folds found in practice 90% of proteins fall into fewer than 1000 folds, with an estimated ~4000 folds in total As of 02/05/2008: 48,878 structures

Examples of Fold Classes

How to Predict Protein Structure A related biological question: what are the factors that determine a structure? Energy Kinematics How can we determine structure? Experimental methods: X-ray crystallography or NMR (nuclear magnetic resonance) spectroscopy. Limitations: protein size, requirement for crystallized proteins. Computational methods (predictive methods): 2-D structure (secondary structure), 3-D structure (tertiary structure)

Geometry of Protein Structure [Figure; visible label: rotatable]

Inter-atomic Forces Covalent bond (short range, very strong): binds atoms into molecules / macromolecules. Hydrogen bond (short range, strong): binds two polar groups (hydrogen + electronegative atom). Disulfide bond / bridge (short range, very strong): covalent bond between sulfhydryl (sulfur + hydrogen) groups. Hydrophobic / hydrophilic interaction (weak): hydrogen bonding with H2O in solution. Van der Waals interaction (very weak): nonspecific electrostatic attractive force.

Types of Inter-atomic Forces

Quick Overview of Energy
Bond                          Strength (kcal/mole)
H-bonds                       3-7
Ionic bonds                   10
Hydrophobic interactions      1-2
Van der Waals interactions    1
Disulfide bridge              51

Protein Folding Animation

Two Related Problems in Structure Prediction Directly predicting protein structure from the amino acid sequence has proved elusive Two sub-problems Secondary Structure Prediction Tertiary Structure Prediction

Secondary Structure Prediction (2D) For each residue in a protein structure, there are three possible states: α (α-helix), β (β-strand), t (others). Amino acid sequence → secondary structure sequence. As of 2000, the accuracy of secondary structure prediction methods is nearly 80%. Secondary structure prediction can provide useful information to improve other sequence and structure analysis methods, such as sequence alignment and 3-D modeling.

PSSP: Protein Secondary Structure Prediction Three generations: 1. Based on statistical information of single amino acids 2. Based on local amino acid interactions (segments of amino acids) 3. Based on evolutionary information from homologous sequences

Secondary Structure Preferences for Amino Acids The normalized frequencies for each conformation were calculated from the fraction of residues of each amino acid that occurred in that conformation, divided by this fraction for all residues. Random occurrence of a particular amino acid in a conformation would give a value of unity. A value greater than unity indicates a preference for a particular type of secondary structure.
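
The normalized frequency described above can be sketched in a few lines. This is a minimal illustration, not the procedure used to build the original table: the function name `propensities`, the state letters H/C, and the toy sequence are all assumptions made for the example.

```python
from collections import Counter

def propensities(sequence, states):
    """Normalized frequency of each (amino acid, state) pair.

    P(a, s) = (n_as / n_a) / (n_s / n): the fraction of residues of amino
    acid a found in state s, divided by the fraction of all residues in s.
    A value of 1.0 means no preference.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    n = len(sequence)
    n_a = Counter(sequence)               # residues per amino acid
    n_s = Counter(states)                 # residues per state
    n_as = Counter(zip(sequence, states)) # residues per (amino acid, state)
    return {
        (a, s): (n_as[(a, s)] / n_a[a]) / (n_s[s] / n)
        for a in n_a
        for s in n_s
    }

# Toy data (hypothetical): 10 residues with one state letter per residue.
seq = "AAAGGGAAGG"
sst = "HHHCCCHHCH"
table = propensities(seq, sst)
```

In this toy data every A sits in state H, so `table[("A", "H")]` is greater than one (a preference) and `table[("A", "C")]` is zero.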

Machine learning methods for Protein Secondary Structure Prediction Introduction to classification Generalize protein secondary structure prediction as a machine learning problem Introduction to Neural Network

Classification and Classifiers Given a database table DB with a set of attribute values and a special attribute C, called a class label. Example:
A1   A2   A3   A4   C
1    1    m    g    Tumor
0    1    v    g    Normal
1    0    m    b

Classification and Classifiers An algorithm is called a classification algorithm if it uses the data to build a set of patterns: decision rules, decision trees, etc. Those patterns are structured in such a way that we can use them to classify unknown objects (unknown records). Because of this goal, a classification algorithm is often simply called a classifier. Classifier Example

Classification and Classifiers Building a classifier consists of two phases: training and testing. In both phases we use data (a training data set and a disjoint test data set) for which the class labels are known for ALL of the records. The training data set is used to create patterns (rules, trees, or a trained neural network). The created patterns are then evaluated on the test data, whose classification is known. The measure of a trained classifier's accuracy is called predictive accuracy.

Predictive Accuracy Evaluation The main methods of predictive accuracy evaluation, given as (training set size ; test set size), are: Re-substitution (N ; N), Holdout (2N/3 ; N/3), x-fold cross-validation (N-N/x ; N/x), Leave-one-out (N-1 ; 1), where N is the number of instances in the dataset. The process of building and evaluating a classifier is also called supervised learning or, more recently when dealing with large databases, a classification method in data mining.
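
The (training ; test) sizes for x-fold cross-validation and leave-one-out can be checked with a small index splitter. This is a sketch under simple assumptions: the helper name `k_fold_splits` is hypothetical, and the folds are taken in order without shuffling, which a real evaluation would normally add.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each instance appears in exactly one test fold. With k == n this is
    leave-one-out. Indices are not shuffled (a simplification).
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    indices = list(range(n))
    # Spread the remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# N = 12 instances: 3-fold gives (N - N/3 ; N/3) = (8 ; 4) per round.
splits = list(k_fold_splits(12, 3))
# Leave-one-out is the special case k = N: (N - 1 ; 1) per round.
loo = list(k_fold_splits(12, 12))
```

Re-substitution corresponds to training and testing on the same N instances, so it needs no splitting at all.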

Classification Models: Different Classifiers Typical classification models: Decision Trees (ID3, C4.5), Nearest Neighbors, Support Vector Machines, Neural Networks. Most of the best classifiers for PSSP are based on the neural network model. Demonstration

How to generalize protein secondary structure prediction as a machine learning problem? Use a sliding window that moves along the amino acid sequence. Each window denotes an instance. Each amino acid inside the window denotes an attribute. The known secondary structure of the central amino acid is the class label.
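
The sliding-window encoding can be sketched as follows. The function name `window_instances`, the window width of 5, the `-` padding at the chain ends, and the toy sequence and labels are all assumptions for illustration; the slides do not specify how the ends of the chain are handled.

```python
def window_instances(sequence, structure, window=5, pad="-"):
    """Turn a labeled sequence into (attributes, class label) instances.

    Each instance holds the `window` residues centered on one position;
    the class label is the known secondary structure of the central residue.
    Positions near the ends are padded with `pad` (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    half = window // 2
    padded = pad * half + sequence + pad * half
    instances = []
    for i, label in enumerate(structure):
        attributes = tuple(padded[i:i + window])  # residue i is at the center
        instances.append((attributes, label))
    return instances

# Toy example (hypothetical sequence and labels, H = helix, E = strand, C = other).
instances = window_instances("MKVLAAG", "CCHHHEC", window=5)
```

Each residue yields one instance, so a sequence of length 7 gives 7 training examples regardless of the window width.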

How to generalize protein secondary structure prediction as a machine learning problem? A set of “examples” is generated from sequences with known secondary structures. The examples form a training set. Build a neural network classifier. Apply the classifier to a sequence with unknown secondary structure.

What is an artificial Neural Network? An extremely simplified model of the brain Essentially a function approximator Transforms inputs into outputs to the best of its ability

Introduction to Neural Network Composed of many “neurons” that co-operate to perform the desired function

How Do Neural Networks Work? A neuron (perceptron) is a single-layer NN. The output of a neuron is a function of the weighted sum of the inputs plus a bias.

Activation Function Binary activation function: $f(x) = 1$ if $x \ge 0$, $f(x) = 0$ otherwise. The most common sigmoid function used is the logistic function $f(x) = 1/(1 + e^{-x})$. The calculation of derivatives is important for neural networks, and the logistic function has a very nice derivative: $f'(x) = f(x)(1 - f(x))$.
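
A single neuron with either activation function can be written directly from the definitions above. The names `step`, `logistic`, `logistic_derivative` and `neuron_output`, and the example weights, are illustrative assumptions; the last lines compare the closed-form derivative with a finite-difference estimate.

```python
import math

def step(x):
    """Binary activation: 1 if x >= 0, otherwise 0."""
    return 1.0 if x >= 0 else 0.0

def logistic(x):
    """Logistic sigmoid f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_derivative(x):
    """Closed-form derivative f'(x) = f(x) * (1 - f(x))."""
    fx = logistic(x)
    return fx * (1.0 - fx)

def neuron_output(inputs, weights, bias, activation):
    """Activation applied to the weighted sum of the inputs plus a bias."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

# Hypothetical weights and bias, chosen only for illustration.
y_step = neuron_output([1, 0, 1], [0.5, -0.5, 0.25], -1.0, step)
y_soft = neuron_output([1, 0, 1], [0.5, -0.5, 0.25], -1.0, logistic)

# Central-difference check of the derivative formula at x = 0.3.
x, h = 0.3, 1e-6
numeric = (logistic(x + h) - logistic(x - h)) / (2 * h)
analytic = logistic_derivative(x)
```

The derivative needs only the already computed value of f(x), which is one reason the logistic function is convenient for gradient-based training.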

Where Do The Weights Come From? The weights in a neural network are the most important factor in determining its function. Training is the act of presenting the network with some sample data and modifying the weights to better approximate the desired function. Supervised training: supplies the neural network with inputs and the desired outputs; the response of the network to the inputs is measured; the weights are modified to reduce the difference between the actual and desired outputs.

Perceptron Example Simplest neural network with the ability to learn. Made up of only input neurons and output neurons. Output neurons use a simple threshold activation function. In basic form, can only solve linearly separable problems. Limited applications.

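Perceptron Example Perceptron weight updating: if the output is not correct, the weights are adjusted according to the formula $w_{\text{new}} = w_{\text{old}} + \alpha \cdot (\text{desired} - \text{output}) \cdot \text{input}$, where $\alpha$ is the learning rate. Assume the given instance is {(1,0,1), 0}: inputs (1,0,1) with desired output 0.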
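
One update on the instance {(1,0,1), 0} can be traced in code. The slide does not give the starting weights, so the values below are hypothetical, as are the function name `perceptron_update`, the zero threshold, and the value 0.5 for the factor multiplying (desired - output), which is treated here as a learning rate.

```python
def perceptron_update(weights, inputs, desired, rate=0.5, threshold=0.0):
    """Apply w_new = w_old + rate * (desired - output) * input once.

    Returns the new weights and the output computed before the update.
    """
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    output = 1 if weighted_sum >= threshold else 0
    error = desired - output  # 0 when the output is already correct
    new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, output

# Hypothetical starting weights; the instance is the one from the slide.
weights = [0.5, 0.2, 0.8]
new_weights, output = perceptron_update(weights, (1, 0, 1), desired=0)
```

With these starting weights the output is 1 instead of the desired 0, so the weights on the two active inputs decrease while the weight on the inactive input (value 0) is unchanged.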

Multi-Layer Feedforward NN An extension of the perceptron Multiple layers The addition of one or more “hidden” layers in between the input and output layers Activation function is not simply a threshold Usually a sigmoid function A general function approximator Not limited to linear problems Information flows in one direction The outputs of one layer act as inputs to the next layer

Multi-Layer Feedforward NN Example XOR problem
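
XOR is not linearly separable, but one hidden layer is enough to compute it. The sketch below wires the weights by hand rather than learning them, and uses threshold units instead of sigmoids to keep the arithmetic exact; the particular weights and the name `xor_network` are assumptions, and many other choices work.

```python
def step(x):
    """Threshold activation: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

def xor_network(x1, x2):
    """Two hidden units (OR and AND) feeding one output unit."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # Output fires when OR is on and AND is off, which is XOR.
    return step(h_or - h_and - 0.5)

truth_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
```

Each hidden unit draws one line in the input plane; the output unit combines the two half-planes, which a single perceptron cannot do.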

Back-propagation Searches for weight values that minimize the total error of the network over the set of training examples. Forward pass: compute the outputs of all units in the network, and the error of the output layer. Backward pass: the network error is used for updating the weights (credit assignment problem).
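
The forward and backward passes can be sketched for a tiny 2-2-1 network with logistic units. This is an illustration under stated assumptions, not the training setup used for PSSP: the squared-error measure, the learning rate of 0.5, the starting weights, the single training example, and the function names are all chosen for the example.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(params, x):
    """Forward pass: return the hidden activations and the network output."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = logistic(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def error(params, x, target):
    """Squared error E = 0.5 * (target - output)^2 for one example."""
    return 0.5 * (target - forward(params, x)[1]) ** 2

def backprop_step(params, x, target, rate=0.5):
    """One gradient-descent update of all weights for one example."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(params, x)
    # Backward pass: each delta uses f'(s) = f(s) * (1 - f(s)).
    delta_out = (output - target) * output * (1.0 - output)
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(w_out, hidden)]
    new_w_out = [w - rate * delta_out * h for w, h in zip(w_out, hidden)]
    new_b_out = b_out - rate * delta_out
    new_w_hidden = [[w - rate * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(w_hidden, delta_hidden)]
    new_b_hidden = [b - rate * d for b, d in zip(b_hidden, delta_hidden)]
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out

# Hypothetical starting weights and a single training example.
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.5, -0.5], 0.0)
x, target = (1.0, 0.0), 1.0
before = error(params, x, target)
for _ in range(200):
    params = backprop_step(params, x, target)
after = error(params, x, target)
```

The hidden-layer deltas are computed from the output delta and the old output weights, which is how the error is assigned back to units that have no target of their own.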

NN for Protein Secondary Structure Prediction

Ab initio Prediction Sampling the global conformation space Lattice models / Discrete-state models Molecular Dynamics Picking native conformations with an energy function Solvation model: how protein interacts with water Pair interactions between amino acids

Lattice String Folding HP model: main modeled force is hydrophobic attraction. Amino acids are classified into two types: Hydrophobic (H) or Polar (P). NP-hard in both 2-D square and 3-D cubic lattices. Constant-factor approximation algorithms exist. Not so relevant biologically.
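
Scoring one conformation in the HP model takes only a few lines. The sketch assumes the usual convention of -1 per pair of H residues that are lattice neighbors but not adjacent in the chain; the slide does not state the scoring, and the name `hp_energy` and the toy sequence are illustrative.

```python
def hp_energy(sequence, coords):
    """Energy of a 2-D square-lattice conformation in the HP model.

    Assumed scoring: -1 for every pair of H residues on neighboring lattice
    sites that are not consecutive in the chain; all other pairs score 0.
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation must be self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbors
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    energy -= 1
    return energy

# Toy chain (hypothetical): folding into a square brings the two H's together.
seq = "HPPH"
folded = [(0, 0), (1, 0), (1, 1), (0, 1)]
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
e_folded = hp_energy(seq, folded)
e_extended = hp_energy(seq, extended)
```

Evaluating a given conformation is cheap; the hardness lies in searching over all self-avoiding walks for the lowest-energy one.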

Lattice String Folding

Energy Minimization Many forces act on a protein. Hydrophobic: the inside of the protein wants to avoid water. Hydrophobic molecules associate with each other in water as if the water molecules repelled them, much like oil/water separation. Packing: atoms can't be too close, nor too far away (van der Waals interactions). Bond angle/length constraints. Long distance, e.g. electrostatics and hydrogen bonds. Disulfide bonds. Salt bridges. We can calculate all of these forces and minimize the energy. Intractable in the general case, but can be useful.

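Molecular Dynamics (MD) In molecular dynamics simulation, we simulate the motions of atoms as a function of time according to Newton's equation of motion. The equations for a system consisting of N atoms can be written as
$$m_i \frac{d^2 \mathbf{r}_i(t)}{dt^2} = \mathbf{F}_i(t), \quad i = 1, \dots, N \qquad (1)$$
Here, $\mathbf{r}_i$ and $m_i$ represent the position and mass of atom $i$, and $\mathbf{F}_i(t)$ is the force on atom $i$ at time $t$. $\mathbf{F}_i(t)$ is given by
$$\mathbf{F}_i(t) = -\nabla_i V(\mathbf{r}_1, \mathbf{r}_2, \dots, \mathbf{r}_N) \qquad (2)$$
where $V(\mathbf{r}_1, \mathbf{r}_2, \dots, \mathbf{r}_N)$ is the potential energy of the system, which depends on the positions of the N atoms in the system. $\nabla_i$ is
$$\nabla_i = \left( \frac{\partial}{\partial x_i}, \frac{\partial}{\partial y_i}, \frac{\partial}{\partial z_i} \right) \qquad (3)$$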
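
Newton's equation of motion is integrated numerically in small time steps. The sketch below uses the velocity Verlet scheme on a single particle in a one-dimensional harmonic potential; the choice of integrator, the potential, and all names and constants are assumptions for illustration, since the slides do not name a specific scheme.

```python
def velocity_verlet(position, velocity, mass, force, dt, steps):
    """Integrate m * d2x/dt2 = F(x) with the velocity Verlet scheme.

    Returns the list of (position, velocity) pairs, including the start.
    """
    f = force(position)
    trajectory = [(position, velocity)]
    for _ in range(steps):
        velocity_half = velocity + 0.5 * dt * f / mass   # half kick
        position = position + dt * velocity_half         # drift
        f = force(position)                              # force at new position
        velocity = velocity_half + 0.5 * dt * f / mass   # second half kick
        trajectory.append((position, velocity))
    return trajectory

# Toy system (hypothetical): V(x) = 0.5 * K * x^2, so F(x) = -dV/dx = -K * x.
K, MASS = 1.0, 1.0

def potential(x):
    return 0.5 * K * x * x

def force(x):
    return -K * x

def total_energy(x, v):
    return potential(x) + 0.5 * MASS * v * v

trajectory = velocity_verlet(1.0, 0.0, MASS, force, dt=0.01, steps=1000)
drift = abs(total_energy(*trajectory[-1]) - total_energy(*trajectory[0]))
```

In a real simulation the force call is where the potential energy terms are evaluated for every atom, so it dominates the cost of each step.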

Energy Functions used in Molecular Simulation [Figure: terms of the energy function: bond stretching term, angle bending term, dihedral term, van der Waals term, electrostatic term, H-bonding term, drawn with geometric labels r, Θ, Φ. Figure annotation: "The most time demanding part."]

Homology-based Prediction Align the query sequence with sequences of known structure, usually >30% similar. Superimpose the aligned sequence onto the structure template, according to the computed sequence alignment. Perform local refinement of the resulting structure in 3D. 90% of new structures submitted to PDB in the past three years have similar folds in PDB. The number of unique structural folds is small (possibly a few thousand).
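
The first step, checking whether a template is similar enough to the query, can be sketched with a simple score over an existing alignment. The example uses percent identity as a stand-in for the ">30% similar" criterion, which is an assumption; identity can be defined with different denominators, and the function name `percent_identity` and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns without a gap.

    Both inputs must come from the same pairwise alignment, so they have
    equal length. Columns with a gap in either sequence are ignored
    (one common convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Toy alignment (hypothetical query and template rows).
query    = "MKV-LAAG"
template = "MRVQLA-G"
identity = percent_identity(query, template)
usable_template = identity > 30.0
```

A template passing the threshold is only a candidate; the later steps (loop modeling, side chain placement, refinement) still determine the quality of the final model.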

Homology-based Prediction Raw model Loop modeling Side chain placement Refinement

Homology-based Prediction
