Bidirectional Dynamics for Protein Secondary Structure Prediction


Bidirectional Dynamics for Protein Secondary Structure Prediction
- P. Baldi et al., Sequence Learning, pp. 80-104, Springer, 2000
- G. Pollastri et al., ISMB, pp. 234-242, 2001
- P. Baldi et al., Hybrid Modeling, HMM/NN Architecture, and Protein Applications, Neural Computation 8(7), pp. 1541-1565, 1996
- S. Haykin, Neural Networks, Chapter 15, 1998
Summarized by O, Jangmin, BioIntelligence Lab

Contents
- Introduction
- IOHMMs and Bidirectional IOHMMs
- Bidirectional Recurrent Neural Networks
- Datasets
- Architecture Details and Experimental Results
- Conclusion

Introduction

Learning in Sequential Domains
- Connectionist models: dynamical systems whose hidden states store contextual information
- Adapt to variable time lags, enabling complex sequential mappings
- A harder case: sequence translation, where interpretation requires some delay
- Causality assumption: the output at time t is independent of future inputs
- Non-causal dynamics for finite sequences: information from both the past and the future influences the analysis at time t
- Examples: DNA and protein sequences

Protein Secondary Structure
- Polypeptide chains carry out most of the basic functions of life at the molecular level
- Linear sequences over a 20-letter amino acid alphabet fold into complex 3D structures
- Secondary structure: local folding regularities, often maintained by hydrogen bonds
- 3 classes: alpha helices, beta sheets, coils

Prediction of Protein 2nd Structure (1)
- Neural networks are the best approach to date (Rost & Sander, 1994)
- Qian & Sejnowski 1988
  - Local fixed window of size 13 around the residue to be predicted
  - Cascaded architecture
  - Q3 = 64.3%, C_alpha = 0.41, C_beta = 0.31, C_coil = 0.41
  - Larger windows lead to overfitting
- Rost & Sander 1993b
  - Started from Qian & Sejnowski 1988; added early stopping and ensemble averaging
  - Use of multiple alignments: secondary structure is more conserved than the primary sequence
  - Profiles (position-dependent frequency vectors) are used
  - Q3 = 72%; CASP2 Q3 = 74% (best to date)
(The two reported metrics, Q3 and the per-class correlation coefficients C, are illustrated in the sketch below.)
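A minimal sketch (not from the paper) of how the two metrics quoted above are computed: Q3 accuracy and the per-class Matthews correlation coefficient. Class labels, sequences, and function names are purely illustrative.

```python
# Illustrative sketch: Q3 and per-class Matthews correlation for 3-state predictions.
import numpy as np

CLASSES = ("H", "E", "C")  # alpha helix, beta sheet, coil

def q3(true_states, pred_states):
    """Fraction of residues whose 3-state label is predicted correctly."""
    return float(np.mean(np.asarray(true_states) == np.asarray(pred_states)))

def matthews(true_states, pred_states, cls):
    """Matthews correlation coefficient for one class vs. the rest."""
    t = np.asarray(true_states) == cls
    p = np.asarray(pred_states) == cls
    tp = np.sum(t & p); tn = np.sum(~t & ~p)
    fp = np.sum(~t & p); fn = np.sum(t & ~p)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return float((tp * tn - fp * fn) / denom) if denom > 0 else 0.0

true = list("HHHHCCEEEECCHHH")   # hypothetical true labels
pred = list("HHHCCCEEECCCHHH")   # hypothetical predictions
print("Q3 =", q3(true, pred))
for c in CLASSES:
    print(f"C_{c} =", matthews(true, pred, c))
```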

Prediction of Protein 2nd Structure (2)
- Riis & Krogh 1996
  - Adaptive encoding of amino acids
  - 3 different networks exploiting biological prior knowledge (e.g. helix periodicity)
  - Ensembles of networks and filtering
  - Multiple alignments with a weighting scheme: predict from single sequences first, then combine the predictions using the multiple alignments
  - Q3 = 71.3% (similar to Rost & Sander 1994)

Prediction of Protein 2nd Structure (3)
- Estimated accuracy upper bound of 70-75% when only local information is used
- Distant information is therefore needed
- Beta sheets: stabilizing bonds can form between amino acids that are far apart in the sequence

Approaches with Dynamics
- RNNs (recurrent neural networks) and IOHMMs (input-output hidden Markov models)
- State dynamics store contextual information
- Adapt to temporal dependencies of variable width
- Vanishing-gradients problem: prevents RNNs from capturing long-range information

IOHMMs and Bidirectional IOHMMs

Markovian Models for Sequence Processing (1)
- Markovian models (HMMs): state-of-the-art approaches for sequence learning
- Speech recognition, sequence analysis in molecular biology, time series prediction, pattern recognition, information extraction, ...
- Markovian property (reconstructed below)
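The formula for the Markovian property did not survive transcription; a standard first-order statement consistent with the surrounding slides (hidden states S, observations O) is:

```latex
% First-order Markov property (standard form; the slide's own formula was lost):
P(S_t \mid S_{t-1}, S_{t-2}, \dots, S_1) = P(S_t \mid S_{t-1})

% Resulting HMM factorization of hidden states and observations:
P(S_{1:T}, O_{1:T}) = \prod_{t=1}^{T} P(S_t \mid S_{t-1})\, P(O_t \mid S_t)
```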

Markovian Models for Sequence Processing (2)
[Figure 1]

Markovian Models for Sequence Processing (3)
- Factorial HMMs (Ghahramani & Jordan 1997)
- Supervised learning for HMMs: IOHMMs
  - Not ML estimation but MMI (maximum mutual information) estimation
  - Similar to a stochastic translating automaton
  - Estimates the conditional probability of the class given the input sequence, unlike the unsupervised case, which estimates the unconditional probability of the input (see the objectives below)
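A sketch of the two training criteria contrasted above, with notation assumed rather than taken from the slide (U: input sequence, Y: output/class sequence, theta: model parameters):

```latex
\text{ML (unsupervised HMM):}\quad
  \hat{\theta} = \arg\max_{\theta} \sum_{\text{sequences}} \log P_{\theta}(U)

\text{MMI (supervised IOHMM):}\quad
  \hat{\theta} = \arg\max_{\theta} \sum_{\text{sequences}} \log P_{\theta}(Y \mid U)
```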

The Bidirectional Architecture (1)
- Two Markov chains of hidden states
  - F: forward direction (F_t has a causal impact on F_{t+1})
  - B: backward direction (B_t has a causal impact on B_{t-1})
- Factorization (reconstructed below)
- Parameterization (Baldi & Chauvin, 1996): a neural network for each local conditional distribution
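A reconstruction of the factorization referred to above, assuming the local conditional distributions named in the later inference slides (output, forward-transition, and backward-transition distributions) and the boundary states F_0 and B_{T+1} used in the BRNN slides:

```latex
% Bidirectional IOHMM factorization (reconstruction; U = inputs, Y = outputs,
% F = forward hidden chain, B = backward hidden chain):
P(Y_{1:T}, F_{1:T}, B_{1:T} \mid U_{1:T}) =
  \prod_{t=1}^{T} P(Y_t \mid F_t, B_t, U_t)\,
                  P(F_t \mid F_{t-1}, U_t)\,
                  P(B_t \mid B_{t+1}, U_t)
```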

The Bidirectional Architecture (2)
[Figure 2]

Inference and Learning (1)
- Inference: general junction tree algorithm (Jensen 1990)
- Learning by EM
  - Complete data: Y, F, B, U; missing data: F, B
  - E-step: compute the expected sufficient statistics
  - M-step: optimize the parameters θ

Inference and Learning (2)
- Expected sufficient statistics
  - N^(f)_{j,l,u}: expected number of forward transitions from f_l to f_j when the input is u (j, l = 1, ..., n; u = 1, ..., K)
  - N^(b)_{k,l,u}: expected number of backward transitions from b_l to b_k when the input is u (k, l = 1, ..., m; u = 1, ..., K)
  - N^(y)_{i,j,k,u}: expected number of times output symbol i is emitted at a position t where the forward state is f_j, the backward state is b_k, and the input is u
- All can be calculated with the junction tree algorithm
- The local conditional probabilities are then easy to compute if modeled by multinomial distributions (M-step update sketched below)
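A sketch of the standard multinomial M-step consistent with these counts (not copied from the slide): each table of expected counts is simply normalized.

```latex
P(f_j \mid f_l, u) = \frac{N^{(f)}_{j,l,u}}{\sum_{j'} N^{(f)}_{j',l,u}}, \qquad
P(b_k \mid b_l, u) = \frac{N^{(b)}_{k,l,u}}{\sum_{k'} N^{(b)}_{k',l,u}}, \qquad
P(y_i \mid f_j, b_k, u) = \frac{N^{(y)}_{i,j,k,u}}{\sum_{i'} N^{(y)}_{i',j,k,u}}
```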

Inference and Learning (3)
- NN parameterization (case of P(Y_t | F_t, B_t, U_t))
  - a_{i,t}: activation of the i-th output unit of the network at time step t
  - Softmax output: z_{i,j,k,t} = exp(a_{i,t}) / Σ_l exp(a_{l,t})
  - z_{i,j,k,t} is the contribution of this sequence at position t to the expected sufficient statistics
  - The corresponding error contribution at position t is 0 whenever y_t ≠ i

Bidirectional Recurrent Neural Nets

The Architecture (1)
- Notation: F_t ∈ R^n, B_t ∈ R^m, U_t ∈ R^k
- State dynamics: a forward chain F_t computed from F_{t-1} and U_t, and a backward chain B_t computed from B_{t+1} and U_t (boundary conditions F_0 = B_{T+1} = 0)
- Output mapping: Y_t computed from F_t, B_t, and U_t
(A minimal sketch of the forward, backward, and output passes follows.)
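A minimal NumPy sketch of these passes. The single-hidden-layer tanh transitions, softmax output, layer sizes, and weight names are all assumptions for illustration, not the paper's exact parameterization.

```python
# Minimal BRNN forward pass (illustrative; sizes and nonlinearities assumed).
import numpy as np

rng = np.random.default_rng(0)
n, m, k, n_classes = 10, 10, 22, 3            # dims of F_t, B_t, U_t, and outputs
Wf = rng.normal(scale=0.1, size=(n, n + k))   # forward transition weights (assumed)
Wb = rng.normal(scale=0.1, size=(m, m + k))   # backward transition weights (assumed)
Wy = rng.normal(scale=0.1, size=(n_classes, n + m + k))  # output weights (assumed)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def brnn_forward(U):
    """U: array of shape (T, k). Returns class probabilities of shape (T, n_classes)."""
    T = U.shape[0]
    F = np.zeros((T + 1, n))   # F[0] holds the boundary condition F_0 = 0
    B = np.zeros((T + 2, m))   # B[T+1] holds the boundary condition B_{T+1} = 0
    for t in range(1, T + 1):          # forward chain: F_t from F_{t-1} and U_t
        F[t] = np.tanh(Wf @ np.concatenate([F[t - 1], U[t - 1]]))
    for t in range(T, 0, -1):          # backward chain: B_t from B_{t+1} and U_t
        B[t] = np.tanh(Wb @ np.concatenate([B[t + 1], U[t - 1]]))
    Y = np.stack([softmax(Wy @ np.concatenate([F[t], B[t], U[t - 1]]))
                  for t in range(1, T + 1)])   # output mapping Y_t
    return Y

U = rng.normal(size=(15, k))            # e.g. 15 residues of profile vectors
print(brnn_forward(U).shape)            # (15, 3)
```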

The Architecture (2)
[Figure 3]

Inference and Learning (1)
- Unroll the network over the input sequence (back-propagation through time)
- Same graphical model as the BIOHMM
- Interpretation as a Bayesian network: the relations are deterministic rather than probabilistic (Dirac-delta distributions)

Inference and Learning (2)
- Starting from F_0 and B_{T+1}, the predictions Y_t can be computed after one forward and one backward propagation
- More efficient inference than the BIOHMM
  - In the BRNN, F_t and B_t evolve independently: O(n^2)
  - In the BIOHMM, F_t and B_t become dependent once Y_t is given: the cliques contain triplets of state variables, O(n^3)
- Learning: cross-entropy cost function, with a non-causal version of back-propagation through time (weight sharing across time steps)

Inference and Learning (3)
- Back-propagation through time
  - Unroll the RNN over the sequence
  - Compute the error signal at the leaf nodes (Y_t)
  - Propagate the error over time, in both directions
  - Obtain the total gradients by summing the contributions associated with the different time steps (a sketch of this weight-sharing accumulation follows)
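A sketch of the weight-sharing gradient accumulation, shown for the shared output weights only (full BPTT would also propagate the errors through the transition weights in both directions). Names follow the earlier BRNN sketch and are assumptions, not the paper's implementation.

```python
# Sketch: accumulate dL/dWy for a cross-entropy loss, summing over all time steps.
import numpy as np

def output_layer_gradient(F, B, U, Y_prob, targets):
    """F, B: hidden-state arrays as built in brnn_forward; U: (T, k) inputs;
    Y_prob: (T, n_classes) softmax outputs; targets: length-T class indices."""
    T = U.shape[0]
    dWy = 0.0
    for t in range(1, T + 1):
        x = np.concatenate([F[t], B[t], U[t - 1]])   # input to the shared output network
        delta = Y_prob[t - 1].copy()                 # softmax + cross-entropy error signal
        delta[targets[t - 1]] -= 1.0
        dWy = dWy + np.outer(delta, x)               # contribution of time step t
    return dWy                                       # total gradient for the shared Wy
```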

Embedded Memories and Other Architectural Variants (1)
- Vanishing gradients (Bengio et al. 1994)
  - The network is unable to store information about inputs far in the past
  - Gradients vanish exponentially: propagating them requires repeated multiplication by the Jacobian of the transition function, whose eigenvalues are < 1 near a stable attractor (stable dynamics)
- Explicit delay line (embedded memories): let the state depend on several previous states rather than only the immediately preceding one
(A small numerical illustration of the shrinking Jacobian product follows.)
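A small numerical illustration (not from the paper) of why gradients vanish when the transition Jacobian has spectral radius below 1: the product of Jacobians over many time steps shrinks toward zero.

```python
# Product of Jacobians with spectral radius 0.9 shrinks exponentially with depth.
import numpy as np

rng = np.random.default_rng(0)
n = 10
J = rng.normal(size=(n, n))
J *= 0.9 / max(abs(np.linalg.eigvals(J)))    # rescale to spectral radius 0.9

P = np.eye(n)
for steps in range(1, 101):
    P = J @ P                                 # product of Jacobians over `steps` steps
    if steps in (1, 10, 50, 100):
        print(f"steps={steps:3d}  ||prod J||_2 = {np.linalg.norm(P, 2):.2e}")
```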

Datasets

Architecture Details and Experimental Results

Experimental Data 1
- 824 sequences (2/3 training, 1/3 test)
- Number of free parameters: 1,400-2,600
- Best feed-forward NN: Q3 = 67.2%, or 68.5% using adaptive input encoding and output filtering
[Table 1]

Experimental Data 2
- Same data set, but with ensembles formed by simple averaging over networks with different n, k, and numbers of hidden units
- Ensemble of 6 networks using profiles at the input level: Q3 = 75.1% (averaging sketched below)
- The BRNN is sensitive to information located within about 15 amino acids of the predicted residue (FNN: about 8)
- The embedded-memory architecture does not bring much improvement
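A sketch of the simple ensemble averaging mentioned above, assuming each network outputs per-residue class probabilities; the function and data here are illustrative only.

```python
# Average the per-residue class probabilities of several networks, then take the argmax.
import numpy as np

def ensemble_predict(prob_list):
    """prob_list: list of (T, 3) arrays of class probabilities, one per network."""
    avg = np.mean(np.stack(prob_list, axis=0), axis=0)   # (T, 3) averaged probabilities
    return avg.argmax(axis=1)                            # predicted class per residue

# e.g. six hypothetical networks' outputs for a 15-residue sequence
rng = np.random.default_rng(0)
outputs = [rng.dirichlet(np.ones(3), size=15) for _ in range(6)]
print(ensemble_predict(outputs))
```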

Experimental Data 3
- Official test sequences of the 1998 CASP3 competition (35 sequences)
- Winner: D. Jones (with two programs), Q3 = 77.6% per protein, 75.5% per residue (Jones 1999)
- Results obtained from Jones' prediction server on these sequences:
  - Jones: 76.2% per protein / 74.3% per residue
  - Authors: 74.6% / 73.0%
- Caveat: Jones builds on more recent profiles from the TrEMBL database

More...
- BIOHMMs
  - Best result obtained with w = 11, n = m = 10, 20 hidden units for the output network, and 6 hidden units for the F_t and B_t transition networks
  - ~10^5 parameters; severe computational demands
- BRNNs
  - 1,400-2,600 free parameters, so more complex architectures can be used
- Accuracies are nearly the same for the MLP (Riis & Krogh 1996) and the BRNNs

Conclusion
- Two novel architectures for sequence learning problems: the non-causal BIOHMM and BRNN models
- Performance very close to the best existing systems
- However, the profiles used are not as sophisticated as those of the best systems