Biological sequence analysis and information processing by artificial neural networks
Objectives: Input → Neural network → Output. The reputation of neural networks: a black box that no one can understand, and one that over-predicts performance.
Pairwise alignment
>carp Cyprinus carpio growth hormone 210 aa vs. >chicken Gallus gallus growth hormone 216 aa
Scoring matrix: BLOSUM50
Gap penalties: -12/
% identity:
Global alignment score:

carp     MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD
chicken  MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE

carp     YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN
chicken  TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G

carp     DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL
chicken  PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI
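The % identity figure can be computed directly from the aligned rows. This sketch copies the carp/chicken growth hormone alignment and counts identical residues over columns where neither sequence has a gap. That denominator is an assumption: alignment programs differ in how they treat gapped columns, so the result need not match what a particular program reports.

```python
# Aligned rows copied from the carp/chicken growth hormone alignment.
CARP = (
    "MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD"
    "YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN"
    "DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL"
)
CHICKEN = (
    "MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE"
    "TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G"
    "PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI"
)


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identical residues divided by the columns where neither row has a gap.

    Assumption: this is one common convention; programs differ in how
    they count gapped columns.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    paired = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not paired:
        return 0.0
    identical = sum(1 for x, y in paired if x == y)
    return 100.0 * identical / len(paired)


if __name__ == "__main__":
    print("carp residues:", len(CARP.replace("-", "")))        # 210
    print("chicken residues:", len(CHICKEN.replace("-", "")))  # 216
    print(f"identity over ungapped columns: {percent_identity(CARP, CHICKEN):.1f}%")
```

Stripping the gaps recovers the 210 and 216 residues stated in the header, which is a quick check that the rows were copied intact.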
Biological neural network
Biological neuron
Diversity of interactions in a network enables complex calculations. Similar in biological and artificial systems: excitatory (+) and inhibitory (-) relations between compute units.
Biological neuron structure
Transfer of biological principles to artificial neural network algorithms: non-linear relation between input and output; massively parallel information processing; data-driven construction of algorithms; ability to generalize to new data items.
Neural networks: neural networks can learn higher order correlations. XOR function: (0,0) => 0, (1,0) => 1, (0,1) => 1, (1,1) => 0. No linear function can separate the points.
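Error estimates. XOR: (0,0) => 0, (1,0) => 1, (0,1) => 1, (1,1) => 0. A linear prediction can get at most three of the four points right; for example, predicting 0 for (0,0) and 1 for the other three inputs gives errors 0, 0, 0 and 1. Mean error: 1/4.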
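A brute-force illustration of the claim, assuming "linear function" means a single threshold unit that outputs 1 if w1·x1 + w2·x2 + b > 0 and 0 otherwise. A search over a coarse grid of weights (an arbitrary range) finds no setting that gets all four XOR points right, and the best reaches a mean error of 1/4. The grid search only illustrates the point. The general result holds because XOR is not linearly separable.

```python
from itertools import product

# XOR truth table: (x1, x2) -> target
XOR = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}


def threshold_unit(w1: float, w2: float, b: float, x1: int, x2: int) -> int:
    """Linear function followed by a step: 1 if w1*x1 + w2*x2 + b > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0


def mean_error(w1: float, w2: float, b: float) -> float:
    """Fraction of the four XOR points that the unit gets wrong."""
    wrong = sum(threshold_unit(w1, w2, b, x1, x2) != t for (x1, x2), t in XOR.items())
    return wrong / len(XOR)


# Coarse grid from -5.0 to 5.0 in steps of 0.5 (an arbitrary search range).
grid = [i / 2 for i in range(-10, 11)]
best = min(mean_error(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(best)  # 0.25
```

The weights w1 = w2 = 1, b = -0.5 (the OR function) are one setting that reaches this minimum: they are wrong only on (1,1).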
Neural networks: inputs connected directly to the output through weights v1 and v2 give a linear function.
Neural networks: adding a hidden layer (weights w11, w12, w21, w22 from the inputs, v1 and v2 to the output) gives a higher order function.
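Neural networks. How does it work? Weights w11, w12, w21, w22 connect the two inputs to the two hidden units, and v1 and v2 connect the hidden units to the output. A bias input fixed at 1 connects to the hidden units through wt1 and wt2 and to the output through vt.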
Neural networks, input (0 0) plus bias input 1 (lowercase: weighted sum; uppercase: output after the threshold): o1 = -6, O1 = 0; o2 = -2, O2 = 0; y1 = -4.5, Y1 = 0
Neural networks, inputs (1 0) and (0 1) plus bias input 1: o1 = -2, O1 = 0; o2 = 4, O2 = 1; y1 = 4.5, Y1 = 1
Neural networks, input (1 1) plus bias input 1: o1 = 2, O1 = 1; o2 = 10, O2 = 1; y1 = -4.5, Y1 = 0
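The three XOR slides give the weighted sums (lowercase o, y) and the thresholded outputs (uppercase O, Y) but not the weights. This sketch assumes a step at 0 and the following naming: w_ij runs from input i to hidden unit j, wt1/wt2 and vt run from the bias input, and v1/v2 run from the hidden units to the output. Under those assumptions, the weights below are the only ones that reproduce the sums on the slides. They are inferred from those sums, not copied from the original figure.

```python
def step(z: float) -> int:
    """Threshold at 0: turns a weighted sum (o, y) into an output (O, Y)."""
    return 1 if z > 0 else 0


# Weights inferred from the slide activations (assumed naming, see lead-in).
W11, W21 = 4.0, 4.0    # input 1, input 2 -> hidden unit 1
W12, W22 = 6.0, 6.0    # input 1, input 2 -> hidden unit 2
WT1, WT2 = -6.0, -2.0  # bias input (always 1) -> hidden units
V1, V2 = -9.0, 9.0     # hidden units -> output
VT = -4.5              # bias input (always 1) -> output


def forward(x1: int, x2: int) -> dict:
    """One forward pass through the 2-2-1 threshold network."""
    o1 = W11 * x1 + W21 * x2 + WT1 * 1
    o2 = W12 * x1 + W22 * x2 + WT2 * 1
    O1, O2 = step(o1), step(o2)
    y1 = V1 * O1 + V2 * O2 + VT * 1
    return {"o1": o1, "O1": O1, "o2": o2, "O2": O2, "y1": y1, "Y1": step(y1)}


for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x, forward(*x))
```

Hidden unit 1 fires only for (1,1), and hidden unit 2 fires for any input containing a 1. The output fires when unit 2 is on and unit 1 is off, which is exactly XOR.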
What is going on? XOR function: the inputs, plus a bias input of 1, feed two hidden units, y1 and y2.
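What is going on? The hidden units map the input space (x1, x2), with points (0,0), (1,0), (0,1) and (1,1), to a new space (y1, y2): (0,0) goes to (0,0), (1,0) and (0,1) go to (1,0), and (1,1) goes to (2,2). In the new space a single line separates the two XOR classes.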
DEMO
Training and error reduction
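Training means adjusting the weights to reduce the error. The threshold units on the XOR slides have no useful gradient, so this minimal sketch swaps in sigmoid units and trains a 2-2-1 network on XOR. It uses plain full-batch gradient descent, with backpropagation of the squared error. The learning rate, epoch count and random seed are arbitrary choices. With only two hidden units, training can get stuck in a local minimum, so the sketch only records the error curve rather than promising a solved XOR.

```python
import math
import random

XOR = [((0.0, 0.0), 0.0), ((1.0, 0.0), 1.0), ((0.0, 1.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(epochs: int = 5000, lr: float = 0.5, seed: int = 1) -> list:
    """Gradient descent on E = 0.5 * sum(err^2) for a 2-2-1 sigmoid network.

    Returns the mean squared error measured before each epoch's update.
    """
    rng = random.Random(seed)
    # w[j] = [weight from x1, weight from x2, bias] for hidden unit j
    w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    v = [rng.uniform(-1, 1) for _ in range(3)]  # [from h1, from h2, bias]
    history = []
    for _ in range(epochs):
        gw = [[0.0] * 3 for _ in range(2)]
        gv = [0.0] * 3
        sse = 0.0
        for (x1, x2), t in XOR:
            # Forward pass
            h = [sigmoid(wj[0] * x1 + wj[1] * x2 + wj[2]) for wj in w]
            y = sigmoid(v[0] * h[0] + v[1] * h[1] + v[2])
            err = y - t
            sse += err * err
            # Backward pass: chain rule through the output and hidden sigmoids
            dy = err * y * (1 - y)
            for j in range(2):
                gv[j] += dy * h[j]
                dh = dy * v[j] * h[j] * (1 - h[j])
                gw[j][0] += dh * x1
                gw[j][1] += dh * x2
                gw[j][2] += dh
            gv[2] += dy
        history.append(sse / len(XOR))
        # Step every weight against its accumulated gradient
        for j in range(2):
            for k in range(3):
                w[j][k] -= lr * gw[j][k]
        for k in range(3):
            v[k] -= lr * gv[k]
    return history


if __name__ == "__main__":
    curve = train()
    print(f"MSE before training: {curve[0]:.3f}, after: {curve[-1]:.3f}")
```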
Transfer of biological principles to neural network algorithms: non-linear relation between input and output; massively parallel information processing; data-driven construction of algorithms.
A network contains a very large set of parameters: a network with 5 hidden neurons predicting binding for 9-mer peptides has 9x20x5 = 900 input-to-hidden weights. Overfitting is a problem, so stop training when test performance is optimal. (Figure: neural network training; axes: years, temperature.)
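The 900 on the slide counts the weights from the 9 x 20 sparse inputs to the 5 hidden neurons. A fully connected network with one output also has 5 hidden-to-output weights and, commonly, one bias per hidden and output neuron. This small helper (the name `count_parameters` is hypothetical) adds those up under that assumption.

```python
def count_parameters(window: int, alphabet: int, hidden: int,
                     outputs: int = 1, include_biases: bool = True) -> int:
    """Parameters of a fully connected network with one hidden layer."""
    inputs = window * alphabet                  # e.g. 9 positions x 20 amino acids
    params = inputs * hidden + hidden * outputs  # input->hidden + hidden->output
    if include_biases:
        params += hidden + outputs               # one bias per non-input neuron
    return params


if __name__ == "__main__":
    print(9 * 20 * 5)                                          # 900
    print(count_parameters(9, 20, 5, include_biases=False))    # 905
    print(count_parameters(9, 20, 5))                          # 911
```

The input-to-hidden weights dominate the count, which is why the parameter count grows quickly with the window length and the number of hidden neurons.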
Neural network training: cross validation. Train on 4/5 of the data and test on the remaining 1/5. Repeating this with each fifth as the test set produces 5 different neural networks, each with a different prediction focus.
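A minimal sketch of the five-way split. How items are assigned to folds is an assumption here: they are dealt out round-robin by index. The function name `cross_validation_splits` and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def cross_validation_splits(items: Sequence, k: int = 5) -> List[Tuple[list, list]]:
    """Deal items into k folds; split i trains on the other k-1 folds and tests on fold i."""
    folds = [list(items[i::k]) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    peptides = [f"pep{i}" for i in range(10)]  # hypothetical data set
    for n, (train, test) in enumerate(cross_validation_splits(peptides), start=1):
        print(f"network {n}: train on {len(train)} items, test on {test}")
```

Each item lands in exactly one test fold, so the five networks' test predictions together cover the whole data set once.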
Neural network training curve: the network is most capable of generalizing at the point of maximum test set performance.
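Early stopping can be sketched in a few lines: record test set performance after each training epoch and keep the network from the epoch where it peaks. The curve below is made up to show the typical shape (a rise, then a decline as the network overfits), and higher values are assumed to be better.

```python
from typing import Sequence, Tuple


def best_stopping_point(test_performance: Sequence[float]) -> Tuple[int, float]:
    """Return (epoch, value) where test performance peaks; ties go to the earliest epoch."""
    best_epoch = max(range(len(test_performance)), key=lambda i: test_performance[i])
    return best_epoch, test_performance[best_epoch]


if __name__ == "__main__":
    # Hypothetical test-set performance per epoch.
    curve = [0.40, 0.55, 0.63, 0.68, 0.70, 0.69, 0.66, 0.62]
    print(best_stopping_point(curve))  # (4, 0.7)
```

In practice the weights saved at that epoch are the ones kept. Training error usually keeps falling after this point, which is why the test curve, not the training curve, decides when to stop.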
Network training: encoding of sequence data. Sparse encoding; BLOSUM encoding; sequence profile encoding.
Sparse encoding of amino acid sequence windows
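Sparse encoding
AAcid \ Inp neuron  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
A                   1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
R                   0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
N                   0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D                   0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C                   0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Q                   0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E                   0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0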
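Sparse encoding can be sketched using the amino acid order A R N D C Q E G H I L K M F P S T W Y V, as listed for the BLOSUM50 matrix. Each residue switches on exactly one of 20 input neurons, so a 9-mer becomes 180 inputs. Any two different residues have a dot product of 0, which is the V·L = 0 case. The example peptide is hypothetical.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # order of the BLOSUM rows/columns


def sparse_encode(peptide: str) -> list:
    """One-hot encoding: 20 inputs per position, a single 1 per residue."""
    vector = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(residue)] = 1  # raises ValueError for unknown letters
        vector.extend(one_hot)
    return vector


def dot(a: list, b: list) -> int:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    print(sparse_encode("A")[:7])                       # [1, 0, 0, 0, 0, 0, 0]
    print(len(sparse_encode("ACDEFGHIK")))              # 180 inputs for a 9-mer
    print(dot(sparse_encode("V"), sparse_encode("L")))  # 0: V and L look unrelated
```

Because every pair of different residues is orthogonal, sparse encoding carries no notion of similarity between amino acids.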
BLOSUM encoding (BLOSUM50 matrix): each amino acid is encoded by its row of the 20 x 20 matrix, with rows and columns in the order A R N D C Q E G H I L K M F P S T W Y V.
Sequence encoding (continued). Sparse encoding: V·L = 0 (unrelated). BLOSUM encoding: V·L = 0.88 (highly related); V·R close to unrelated.
Applications of artificial neural networks: speech recognition; prediction of protein secondary structure; prediction of signal peptides; post-translational modifications (glycosylation, phosphorylation); proteasomal cleavage; MHC:peptide binding.
Higher order sequence correlations. Neural networks can learn higher order correlations! What does this mean? With S a small and L a large amino acid: S S => 0, L S => 1, S L => 1, L L => 0. Say that the peptide needs one and only one large amino acid in positions P3 and P4 to fill the binding cleft. How would you formulate this to test if a peptide can bind? => XOR function
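The binding rule can be written directly as an XOR of two per-position features. The slide does not define which residues count as large, so the set below (F, W, Y) is a hypothetical choice for illustration. A single linear threshold on the two features cannot express this rule, which is why a hidden layer is needed to learn it.

```python
# Hypothetical choice of "large" residues; the slide does not specify one.
LARGE = set("FWY")


def is_large(residue: str) -> bool:
    return residue in LARGE


def can_bind(peptide: str) -> bool:
    """Exactly one of P3 and P4 (1-based positions) must be large."""
    p3, p4 = peptide[2], peptide[3]
    return is_large(p3) != is_large(p4)  # != on two booleans is XOR


if __name__ == "__main__":
    for pep in ["AAWAAAAAA", "AAAYAAAAA", "AAWWAAAAA", "AAAAAAAAA"]:
        print(pep, can_bind(pep))
```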
What have we learned: neural networks are not as bad as their reputation; neural networks can deal with higher order correlations; be careful when training a neural network, and always use cross-validated training.