Bidirectional Dynamics for Protein Secondary Structure Prediction


1 Bidirectional Dynamics for Protein Secondary Structure Prediction
Sources:
- P. Baldi et al., Sequence Learning, Springer, 2000
- G. Pollastri et al., ISMB, 2001
- P. Baldi et al., Hybrid Modeling, HMM/NN Architecture, and Protein Applications, Neural Computation 8(7), 1996
- S. Haykin, Neural Networks, Chapter 15, 1998
Summarized by Jangmin O, BioIntelligence Lab.

2 Contents
- Introduction
- IOHMMs and Bidirectional IOHMMs
- Bidirectional Recurrent Neural Networks
- Datasets
- Architecture Details and Experimental Results
- Conclusion

3 Introduction

4 Learning in Sequential domains
- Connectionist models: dynamical systems whose hidden states store contextual information; they adapt to variable time lags in complex sequential mappings.
- One discouraging area: sequential translation, where interpretation requires some delay.
- Causality assumption: the output at time t is independent of future inputs.
- Non-causal dynamics for finite sequences: information from both the past and the future influences the analysis at time t, e.g., in DNA and protein sequences.

5 Protein Secondary Structure
- Polypeptide chains carry out most of the basic functions of life at the molecular level.
- Linear sequences over a 20-letter amino acid alphabet fold into complex 3D structures.
- Secondary structure: local folding regularities, often maintained by hydrogen bonds.
- 3 classes: alpha helices, beta sheets, coils.

6 Prediction of Protein 2nd Structure (1)
- Neural networks are the best approach to date (Rost & Sander, 1994).
- Qian & Sejnowski 1988: local fixed window of size 13 around the residue being predicted; cascaded architecture; Q3 = 64.3%, Cα = 0.41, Cβ = 0.31, Ccoil = 0.41; issues with window size and overfitting.
- Rost & Sander 1993b: started from Qian & Sejnowski 1988, adding early stopping and ensemble averages; uses multiple alignments (secondary structure is more conserved than the primary sequence) in the form of profiles (position-dependent frequency vectors); Q3 = 72%, CASP2 Q3 = 74% (best to date).
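
A minimal sketch of the fixed-window encoding used in this line of work; the one-hot scheme and the all-zero "spacer" vectors at sequence boundaries are assumptions for illustration, not details taken from the slide:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20-letter amino acid alphabet
WINDOW = 13                            # window centered on the residue to predict

def encode_window(sequence: str, position: int) -> np.ndarray:
    """One-hot encode the residues in a window around `position`.
    Positions falling outside the sequence are encoded as all-zero vectors."""
    half = WINDOW // 2
    features = []
    for i in range(position - half, position + half + 1):
        vec = np.zeros(len(AMINO_ACIDS))
        if 0 <= i < len(sequence) and sequence[i] in AMINO_ACIDS:
            vec[AMINO_ACIDS.index(sequence[i])] = 1.0
        features.append(vec)
    return np.concatenate(features)    # 13 * 20 = 260 input units

# Example: input features for the 5th residue of a toy sequence
x = encode_window("MKTAYIAKQRQISFVK", 4)
```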

7 Prediction of Protein 2nd Structure (2)
- Riis & Krogh 1996: adaptive encoding of amino acids; 3 different networks built from biological prior knowledge (e.g., helix periodicity); ensembles of networks and filtering.
- Multiple alignments with a weighting scheme: predictions are made from single sequences first, then combined using the multiple alignments.
- Q3 = 71.3% (similar to Rost & Sander 1994).

8 Prediction of Protein 2nd Structure (3)
- Accuracy upper bound: 70–75% when based on local information only.
- Distant information is needed: in beta sheets, stabilizing bonds can form between amino acids that are far apart in the sequence.

9 Approach with dynamics
- RNNs (recurrent neural networks) and IOHMMs (input-output hidden Markov models).
- State dynamics store contextual information and adapt to temporal dependencies of variable width.
- Vanishing gradients problem: prevents RNNs from capturing long-range information.

10 IOHMMs and Bidirectional IOHMMs

11 Markovian Models for Sequence Processing (1)
- Markovian models (HMMs) are state-of-the-art approaches for sequence learning: speech recognition, sequence analysis in molecular biology, time series prediction, pattern recognition, information extraction, …
- Markovian property (shown below).
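
In symbols, the first-order Markov property referred to here, with St denoting the hidden state at position t (standard form, not taken verbatim from the slide):

```latex
P(S_t \mid S_{t-1}, S_{t-2}, \dots, S_1) = P(S_t \mid S_{t-1})
```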

12 Markovian Models for Sequence Processing (2)
[Figure 1]

13 Markovian Models for Sequence Processing (3)
- Factorial HMM (Ghahramani & Jordan 1997).
- Supervised learning for HMMs: IOHMMs use MMI (maximum mutual information) estimation rather than ML estimation; similar to a stochastic translating automaton.
- They estimate the conditional probability of the class given the input sequence (unlike the unsupervised case, which estimates an unconditional probability).

14 The Bidirectional Architecture (1)
- Two Markov chains of hidden states: F, the forward chain (Ft has a causal impact on Ft+1), and B, the backward chain (Bt has a causal impact on Bt-1).
- Factorization of the model into local conditional distributions (written out below).
- Parameterization (Baldi & Chauvin, 1996): a neural network for each local distribution.
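
Written out, the factorization over the output Y, forward chain F, backward chain B, and input U takes the following form, consistent with the local distributions listed on the later inference slides:

```latex
P(Y, F, B \mid U) \;=\; \prod_{t=1}^{T}
  P(Y_t \mid F_t, B_t, U_t)\,
  P(F_t \mid F_{t-1}, U_t)\,
  P(B_t \mid B_{t+1}, U_t)
```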

15 The Bidirectional Architecture (2)
[Figure 2]

16 Inference and Learning (1)
- Inference: the general junction tree algorithm (Jensen 1990).
- Learning by EM: complete data Y, F, B, U; missing data F, B.
- E-step: compute the expected sufficient statistics.
- M-step: optimize the parameters θ.

17 Inference and Learning (2)
- Sufficient statistics:
  - Nj,l,u(f): expected number of forward transitions from fl to fj when the input is u (j, l = 1, …, n; u = 1, …, K).
  - Nk,l,u(b): expected number of backward transitions from bl to bk when the input is u (k, l = 1, …, m; u = 1, …, K).
  - Ni,j,k,u(y): expected number of times output symbol i is emitted at a position t where the forward state is fj, the backward state is bk, and the input is u.
- All of these can be calculated with the junction tree algorithm.
- The local conditional probabilities are easy to compute if modeled by multinomial distributions (a sketch of the M-step follows).
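
A minimal sketch of the multinomial M-step for the forward-transition probabilities, assuming the expected counts have already been accumulated in the E-step; the array layout and function name are illustrative only:

```python
import numpy as np

def m_step_forward_transitions(expected_counts: np.ndarray) -> np.ndarray:
    """M-step for multinomial forward-transition probabilities.

    expected_counts[j, l, u] holds Nj,l,u(f): the expected number of forward
    transitions from state f_l to f_j under input symbol u, as computed by
    the junction tree algorithm in the E-step.

    Returns P(F_t = f_j | F_{t-1} = f_l, U_t = u), obtained by normalizing
    the counts over the destination state j.
    """
    totals = expected_counts.sum(axis=0, keepdims=True)   # sum over destinations j
    return expected_counts / np.maximum(totals, 1e-12)    # guard against division by zero
```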

18 Inference and Learning (3)
- NN approach (case of P(Yt | Ft, Bt, Ut)).
- ai,t: activation of the i-th output unit of the network at time step t.
- Softmax outputs: zi,j,k,t = exp(ai,t) / Σl exp(al,t).
- These give the contribution of this sequence, at position t, to the expected sufficient statistics.
- Error function: the contribution is 0 if yt ≠ yi.
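
A small sketch of the softmax normalization above; the max-subtraction is only for numerical stability and does not change the result:

```python
import numpy as np

def softmax_outputs(a_t: np.ndarray) -> np.ndarray:
    """z_{i,t} = exp(a_{i,t}) / sum_l exp(a_{l,t}) for one position t.
    a_t holds the activations of the output units of the emission network."""
    a_t = a_t - a_t.max()    # shift for numerical stability
    e = np.exp(a_t)
    return e / e.sum()
```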

19 Bidirectional Recurrent Neural Nets

20 The Architecture (1)
- Notation: Ft ∈ Rn, Bt ∈ Rm, Ut ∈ Rk.
- State dynamics: Ft = φ(Ft-1, Ut) and Bt = β(Bt+1, Ut), with boundary conditions F0 = BT+1 = 0.
- Output mapping: Yt = η(Ft, Bt, Ut).
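
A minimal sketch of the two state recursions and the output mapping, assuming single-layer tanh transition networks and a softmax output layer; the weight-matrix names are illustrative, and φ, β, η can in general be any feedforward networks:

```python
import numpy as np

def brnn_forward(U, Wf, Vf, Wb, Vb, Wy_f, Wy_b, Wy_u):
    """Run a BRNN over an input sequence U of shape (T, k).

    Forward chain:  F_t = tanh(Wf @ F_{t-1} + Vf @ U_t),  F_0 = 0
    Backward chain: B_t = tanh(Wb @ B_{t+1} + Vb @ U_t),  B_{T+1} = 0
    Output mapping: Y_t = softmax(Wy_f @ F_t + Wy_b @ B_t + Wy_u @ U_t)
    """
    T = U.shape[0]
    n, m = Wf.shape[0], Wb.shape[0]
    F = np.zeros((T + 1, n))    # F[0] is the boundary condition F_0 = 0
    B = np.zeros((T + 2, m))    # B[T+1] is the boundary condition B_{T+1} = 0
    for t in range(1, T + 1):            # left-to-right pass
        F[t] = np.tanh(Wf @ F[t - 1] + Vf @ U[t - 1])
    for t in range(T, 0, -1):            # right-to-left pass
        B[t] = np.tanh(Wb @ B[t + 1] + Vb @ U[t - 1])
    Y = np.empty((T, Wy_f.shape[0]))
    for t in range(1, T + 1):
        a = Wy_f @ F[t] + Wy_b @ B[t] + Wy_u @ U[t - 1]
        e = np.exp(a - a.max())
        Y[t - 1] = e / e.sum()           # class probabilities at position t
    return Y, F, B
```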

21 The Architecture (2)
[Figure 3]

22 Inference and Learning (1)
- Unroll the network on the input sequence (back-propagation through time).
- Same graphical model as the BIOHMM: the BRNN can be interpreted as a Bayesian network whose relations are deterministic rather than probabilistic (Dirac-delta distributions).

23 Inference and Learning (2)
- Starting from the boundary states F0 and BT+1, the predictions Yt can be computed after one forward and one backward propagation pass.
- Inference is more efficient than in the BIOHMM: Ft and Bt evolve independently, giving O(n²); in the BIOHMM, Ft and Bt become dependent once Yt is given, so the cliques contain triplets of state variables, giving O(n³).
- Learning: cross-entropy cost function (written below) with a non-causal version of back-propagation through time (weight sharing).
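
The cross-entropy cost for one sequence, written with target class indicators d and network outputs y over the three secondary-structure classes (the indicator notation is an assumption for illustration):

```latex
E = -\sum_{t=1}^{T} \sum_{i=1}^{3} d_{i,t} \,\log y_{i,t}
```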

24 Inference and Learning (3)
- Back-propagation through time: unroll the RNN; compute the error signal at the leaf nodes (Yt); propagate the error over time, in both directions; obtain the total gradients by summing the contributions associated with the different time steps (a sketch of the forward-chain part follows).
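
A simplified sketch of error propagation along the forward chain only, under the tanh parameterization assumed earlier. It propagates a single error signal arriving at the last forward state; in the full non-causal algorithm, error signals from every Yt are added in at each step and the backward chain is handled symmetrically:

```python
import numpy as np

def bptt_forward_chain(F, Wf, dF_last):
    """Propagate an error signal back along F_t = tanh(Wf @ F_{t-1} + ...).

    F holds the forward states from the forward pass (F[0] = 0) and dF_last
    is dE/dF_T. Because Wf is shared across positions, its total gradient is
    the sum of the contributions from every time step.
    """
    T = F.shape[0] - 1
    dWf = np.zeros_like(Wf)
    dF = dF_last
    for t in range(T, 0, -1):
        dA = dF * (1.0 - F[t] ** 2)      # back through the tanh nonlinearity
        dWf += np.outer(dA, F[t - 1])    # contribution of step t to the shared Wf
        dF = Wf.T @ dA                   # error signal passed on to F_{t-1}
    return dWf
```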

25 Embedded Memories and Other Architectural Variants (1)
- Vanishing gradients (Bengio 1994): the network is unable to store information about past inputs because gradients vanish exponentially.
- Error propagation requires repeated multiplication by the Jacobian of the transition function; for stable dynamics the eigenvalues of the attractor's Jacobian have magnitude < 1, so the products shrink (a numerical illustration follows).
- Remedy considered here: explicit delay lines (embedded memories).
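
A small numerical illustration of the effect under stable (contractive) linear dynamics; the dimension and the 0.9 spectral radius are arbitrary choices made only for this demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
W = rng.standard_normal((n, n))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # rescale so the spectral radius is 0.9

jacobian_product = np.eye(n)
for steps in range(1, 51):
    jacobian_product = W.T @ jacobian_product    # one more back-propagation step
    if steps % 10 == 0:
        # The norm shrinks roughly geometrically, so gradient contributions
        # from distant positions become negligible.
        print(steps, np.linalg.norm(jacobian_product))
```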

26 Datasets

27 Architecture Details and Experimental Results

28 Experimental Data 1
- 824 sequences (2/3 training, 1/3 test).
- Number of free parameters: 1400–2600.
- Best FNN: Q3 = 67.2%, rising to 68.5% with adaptive input encoding and output filtering.
[Table 1]

29 Experimental Data 2
- Same data set, but ensembles built by simple averaging of networks with different n, k, and numbers of hidden units.
- Ensemble of 6 networks using profiles at the input level: Q3 = 75.1%.
- The BRNN is sensitive to information located within about 15 amino acids (about 8 for the FNN).
- Embedded memory architectures do not bring much improvement.

30 Experimental Data 3
- Official test sequences of the 1998 CASP3 competition (35 sequences).
- Winner: D. Jones (with two programs), Q3 = 77.6% per protein, 75.5% per residue (Jones 1999).
- Comparison against Jones' prediction server (per protein / per residue): Jones 76.2% / 74.3%, the authors 74.6% / 73.0%.
- Caveat: Jones builds upon more recent profiles from the TrEMBL database.

31 More…
- BIOHMMs: the best result is obtained with w = 11, n = m = 10, 20 hidden units for the output network, and 6 hidden units for the Ft and Bt transition networks; 105 parameters and severe computational demands.
- BRNNs: 1400–2600 free parameters, so more complex architectures can be used.
- Accuracies are nearly the same for the MLP (Riis & Krogh 1996) and the BRNNs.

32 Conclusion
- Two novel architectures for sequence learning problems: the non-causal models BIOHMM and BRNN.
- Performance is very close to the best existing systems, even though the profiles used are not as sophisticated.

