Artificial Neural Networks 1 Morten Nielsen Department of Systems Biology, DTU

Objectives
Input -> Neural network -> Output
Neural network:
–is a black box that no one can understand
–over-predicts performance
–Over-fitting: many thousand parameters fitted on few data

Pairwise alignment
>carp Cyprinus carpio growth hormone 210 aa vs. >chicken Gallus gallus growth hormone 216 aa
Scoring matrix: BLOSUM50, gap penalties: -12/…; percent identity and global alignment score as given on the slide
carp    MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD
chicken MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE
carp    YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN
chicken TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G
carp    DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL
chicken PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI

Biological Neural network

Biological neuron structure

Transfer of biological principles to artificial neural network algorithms
–Non-linear relation between input and output
–Massively parallel information processing
–Data-driven construction of algorithms
–Ability to generalize to new data items

How to predict
The effect on the binding affinity of having a given amino acid at one position can be influenced by the amino acids at other positions in the peptide (sequence correlations).
–Two adjacent amino acids may, for example, compete for the space in a pocket of the MHC molecule.
Artificial neural networks (ANNs) are ideally suited to take such correlations into account.

XOR function
Cannot be solved by linear functions

SLLPAIVEL YLLPAIVHI TLWVDPYEV GLVPFLVSV KLLEPVLLL LLDVPTAAV LLDVPTAAV LLDVPTAAV LLDVPTAAV VLFRGGPRG MVDGTLLLL YMNGTMSQV MLLSVPLLL SLLGLLVEV ALLPPINIL TLIKIQHTL HLIDYLVTS ILAPPVVKL ALFPQLVIL GILGFVFTL STNRQSGRQ GLDVLTAKV RILGAVAKV QVCERIPTI ILFGHENRV ILMEHIHKL ILDQKINEV SLAGGIIGV LLIENVASL FLLWATAEA SLPDFGISY KKREEAPSL LERPGGNEI ALSNLEVKL ALNELLQHV DLERKVESL FLGENISNF ALSDHHIYL GLSEFTEYL STAPPAHGV PLDGEYFTL GVLVGVALI RTLDKVLEV HLSTAFARV RLDSYVRSL YMNGTMSQV GILGFVFTL ILKEPVHGV ILGFVFTLT LLFGYPVYV GLSPTVWLS WLSLLVPFV FLPSDFFPS CLGGLLTMV FIAGNSAYE KLGEFYNQM KLVALGINA DLMGYIPLV RLVTLKDIV MLLAVLYCL AAGIGILTV YLEPGPVTA LLDGTATLR ITDQVPFSV KTWGQYWQV TITDQVPFS AFHHVAREL YLNKIQNSL MMRKLAILS AIMDKNIIL IMDKNIILK SMVGNWAKV SLLAPGAKQ KIFGSLAFL ELVSEFSRM KLTPLCVTL VLYRYGSFS YIGEVLVSV CINGVCWTV VMNILLQYV ILTVILGVL KVLEYVIKV FLWGPRALV GLSRYVARL FLLTRILTI HLGNVKYLV GIAGGLALL GLQDCTMLV TGAPVTYST VIYQYMDDL VLPDVFIRC VLPDVFIRC AVGIGIAVV LVVLGLLAV ALGLGLLPV GIGIGVLAA GAGIGVAVL IAGIGILAI LIVIGILIL LAGIGLIAA VDGIGILTI GAGIGVLTA AAGIGIIQI QAGIGILLA KARDPHSGH KACDPHSGH ACDPHSGHF SLYNTVATL RGPGRAFVT NLVPMVATV GLHCYEQLV PLKQHFQIV AVFDRKSDA LLDFVRFMG VLVKSPNHV GLAPPQHLI LLGRNSFEV PLTFGWCYK VLEWRFDSR TLNAWVKVV GLCTLVAML FIDSYICQV IISAVVGIL VMAGVGSPY LLWTLVVLL SVRDRLARL LLMDCSGSI CLTSTVQLV VLHDDLLEA LMWITQCFL SLLMWITQC QLSLLMWIT LLGATCMFV RLTRFLSRV YMDGTMSQV FLTPKKLQC ISNDVCAQV VKTDGNPPE SVYDFFVWL FLYGALLLA VLFSSDFRI LMWAKIGPV SLLLELEEV SLSRFSWGA YTAFTIPSI RLMKQDFSV RLPRIFCSC FLWGPRAYA RLLQETELV SLFEGIDFY SLDQSVVEL RLNMFTPYI NMFTPYIGV LMIIPLINV TLFIGSHVV SLVIVTTFV VLQWASLAV ILAKFLHWL STAPPHVNV LLLLTVLTV VVLGVVFGI ILHNGAYSL MIMVKCWMI MLGTHTMEV MLGTHTMEV SLADTNSLA LLWAARPRL GVALQTMKQ GLYDGMEHL KMVELVHFL YLQLVFGIE MLMAQEALA LMAQEALAF VYDGREHTV YLSGANLNL RMFPNAPYL EAAGIGILT TLDSQVMSL STPPPGTRV KVAELVHFL IMIGVLVGV ALCRWGLLL LLFAGVQCQ VLLCESTAV YLSTAFARV YLLEMLWRL SLDDYNHLV RTLDKVLEV GLPVEYLQV KLIANNTRV FIYAGSLSA KLVANNTRL FLDEFMEGV ALQPGTALL VLDGLDVLL SLYSFPEPE ALYVDSLFF SLLQHLIGL ELTLGEFLK MINAYLDKL AAGIGILTV FLPSDFFPS SVRDRLARL SLREWLLRI LLSAWILTA AAGIGILTV AVPDEIPPL FAYDGKDYI AAGIGILTV FLPSDFFPS AAGIGILTV FLPSDFFPS AAGIGILTV FLWGPRALV ETVSEQSNV ITLWQRPLV MHC peptide binding

Mutual information
How is mutual information calculated? The information content, calculated from the amino acid frequencies, gives the information in a single position. A similar relation gives the mutual information between two positions:
MI = Σ_{a,b} p_{ab} log( p_{ab} / (p_a · p_b) )

Mutual information. Example
ALWGFFPVA ILKEPVHGV ILGFVFTLT LLFGYPVYV GLSPTVWLS YMNGTMSQV GILGFVFTL WLSLLVPFV FLPSDFFPS
P(G1) = 2/9 = 0.22, …   P(V6) = 4/9 = 0.44, …
P(G1,V6) = 2/9 = 0.22;   P(G1)·P(V6) = 8/81 = 0.10
log(0.22/0.10) > 0
Knowing that you have G at P1 allows you to make an educated guess on what you will find at P6: P(V6) = 4/9, but P(V6|G1) = 1.0!
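Below is a minimal Python sketch of the mutual information calculation in this example; the peptide list and positions are taken from the slide above, while the choice of log base is an assumption (the slide does not state it).

from collections import Counter
from math import log2

# The nine example peptides from the slide
peptides = ["ALWGFFPVA", "ILKEPVHGV", "ILGFVFTLT", "LLFGYPVYV", "GLSPTVWLS",
            "YMNGTMSQV", "GILGFVFTL", "WLSLLVPFV", "FLPSDFFPS"]

def mutual_information(peptides, pos1, pos2):
    # Mutual information between two peptide positions (0-based indices)
    n = len(peptides)
    p1 = Counter(p[pos1] for p in peptides)                # single-position counts
    p2 = Counter(p[pos2] for p in peptides)
    p12 = Counter((p[pos1], p[pos2]) for p in peptides)    # pair counts
    mi = 0.0
    for (a, b), nab in p12.items():
        pab = nab / n
        mi += pab * log2(pab / ((p1[a] / n) * (p2[b] / n)))
    return mi

# P1 and P6 on the slide correspond to indices 0 and 5
print(mutual_information(peptides, 0, 5))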

Mutual information
[Mutual information plots for 313 binding peptides versus 313 random peptides]

Higher order sequence correlations
Neural networks can learn higher order correlations!
–What does this mean?
Say that the peptide needs one and only one large amino acid in positions P3 and P4 to fill the binding cleft:
S S => 0
L S => 1
S L => 1
L L => 0
How would you formulate this to test if a peptide can bind? => the XOR function

Neural networks
Neural networks can learn higher order correlations
XOR function:
0 0 => 0
0 1 => 1
1 0 => 1
1 1 => 0
[Plot: the four points (0,0), (0,1), (1,0), (1,1); no linear function can separate the points, unlike for OR and AND]

Error estimates
XOR:
0 0 => 0
0 1 => 1
1 0 => 1
1 1 => 0
[Plot: the four points (0,0), (0,1), (1,0), (1,1) with the best linear separation; predictions and errors per point are shown on the slide]
Mean error: 1/4

Neural networks
[Diagram: a single unit with weights v1 and v2, i.e. a linear function of its inputs]

Neural networks

Neural networks. How does it work?
[Diagram: a feed-forward network with two inputs plus a constant bias input of 1, input-to-hidden weights w11, w12, w21, w22, wt1, wt2, and hidden-to-output weights v1, v2, vt]

Neural networks (0 0)
[Worked example for input (0 0) plus bias: the hidden units receive o1 = -6 and o2 = -2, giving O1 = 0 and O2 = 0; the output unit receives y1 = -4.5, giving Y1 = 0, so the network output O = …]
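As an illustration, a minimal Python sketch of such a forward pass through a 2-2-1 network with a bias input and sigmoid activations; the weight values below are illustrative assumptions, not a transcription of the diagram.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, v, w):
    # Forward pass: inputs -> hidden layer (weights v) -> output (weights w).
    # A constant bias input of 1 is appended at each layer, as in the diagrams.
    x = list(inputs) + [1.0]                                # input plus bias
    h = [sigmoid(sum(xi * vji for xi, vji in zip(x, row)))  # hidden activations
         for row in v]
    h = h + [1.0]                                           # hidden plus bias
    return sigmoid(sum(hj * wj for hj, wj in zip(h, w)))    # output activation

# Illustrative weights: each hidden unit has weights for x1, x2 and the bias
v = [[4.0, 4.0, -6.0],   # hidden unit 1
     [4.0, 4.0, -2.0]]   # hidden unit 2
w = [-9.0, 9.0, -4.5]    # output weights for h1, h2 and the bias

print(forward((0, 0), v, w))   # a value close to 0 for the input (0 0)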

Neural networks. How does it work? (Handout)

What is going on?
XOR function:
0 0 => 0
0 1 => 1
1 0 => 1
1 1 => 0
[Diagram: the trained XOR network with bias input 1 and hidden-unit outputs y1 and y2]

What is going on?
[Plots: in input space (x1, x2) the points (0,0), (0,1), (1,0), (1,1) cannot be separated by a line; mapped through the hidden layer into (y1, y2) space they become (0,0), (1,0), (2,2), where a linear separation is possible]

Artificial Neural Networks 2 Morten Nielsen BioSys, DTU

Outline
Optimization procedures
–Gradient descent
Network training
–Sequence encoding
–Back propagation
–Cross-validation
–Over-fitting
–Examples

Neural network. Error estimate
[Diagram: a linear unit with inputs I1 and I2, weights w1 and w2, and output o]

Gradient descent (from Wikipedia)
Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if b = a - γ∇F(a) for γ > 0 a small enough number, then F(b) < F(a).
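A small illustration of that update rule in Python; the function F(x) = (x - 3)^2 and the step size γ are arbitrary choices for the example.

def gradient_descent(grad, a, gamma=0.1, steps=100):
    # Repeatedly step from a in the direction of the negative gradient
    for _ in range(steps):
        a = a - gamma * grad(a)   # b = a - gamma * gradF(a), so F(b) < F(a)
    return a

# F(x) = (x - 3)^2 has gradient 2*(x - 3) and its minimum at x = 3
print(gradient_descent(lambda x: 2 * (x - 3), a=0.0))   # converges towards 3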

Gradient descent (example)

Gradient descent. Example
Weights are changed in the opposite direction of the gradient of the error.
[Diagram: a linear unit with inputs I1 and I2, weights w1 and w2, and output o]

What about the hidden layer?

Hidden to output layer

Input to hidden layer

Summary

Or

The layer activations are stored in arrays: I_k = X[0][k] (input), H_j = X[1][j] (hidden), O_i = X[2][i] (output).
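The update formulas themselves are image-only on the slides and not reproduced in this transcript, so as a stand-in, here is a standard textbook back-propagation step in Python using the same X[0]/X[1]/X[2] layer convention, sigmoid activations, and a squared-error cost; treat it as a sketch of the idea rather than a transcription of the slides.

def backprop_updates(X, V, W, target, eta=0.05):
    # One back-propagation step for a three-layer network.
    # X[0] = input layer, X[1] = hidden layer, X[2] = output layer (slide convention).
    # V[j][k] connects input k to hidden unit j; W[i][j] connects hidden j to output i.
    # Output-layer deltas: (O_i - t_i) * O_i * (1 - O_i) for sigmoid outputs
    delta_o = [(X[2][i] - target[i]) * X[2][i] * (1 - X[2][i])
               for i in range(len(X[2]))]
    # Hidden-layer deltas: propagate the output deltas back through W
    delta_h = [X[1][j] * (1 - X[1][j]) *
               sum(delta_o[i] * W[i][j] for i in range(len(X[2])))
               for j in range(len(X[1]))]
    # Weight changes: gradient descent, i.e. opposite to the error gradient
    dW = [[-eta * delta_o[i] * X[1][j] for j in range(len(X[1]))]
          for i in range(len(X[2]))]
    dV = [[-eta * delta_h[j] * X[0][k] for k in range(len(X[0]))]
          for j in range(len(X[1]))]
    return dW, dV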

Can you do it yourself?
[Network: inputs I1 = 1 and I2 = 1; input-to-hidden weights v11 = 1, v12 = 1, v21 = -1, v22 = 1; hidden units h1/H1 and h2/H2; hidden-to-output weights w1 = -1, w2 = 1; output o/O]
What is the output (O) from the network?
What are the Δw_ij and Δv_jk values if the target value is 0 and η = 0.5?

Can you do it yourself (η = 0.5). Has the error decreased?
Before: I1 = 1, I2 = 1; v11 = 1, v12 = 1, v21 = -1, v22 = 1; w1 = -1, w2 = 1; h1/H1, h2/H2, o/O = ?
After: I1 = 1, I2 = 1; v11 = ?, v12 = ?, v21 = ?, v22 = ?; w1 = ?, w2 = ?; h1/H1, h2/H2, o/O = ?

Demo

Network training
Encoding of sequence data
–Sparse encoding
–Blosum encoding
–Sequence profile encoding

Sparse encoding
Each amino acid is represented by 20 input neurons, all set to 0 except the neuron corresponding to that amino acid:
A: 1 0 0 0 0 0 0 …
R: 0 1 0 0 0 0 0 …
N: 0 0 1 0 0 0 0 …
D: 0 0 0 1 0 0 0 …
C: 0 0 0 0 1 0 0 …
Q: 0 0 0 0 0 1 0 …
E: 0 0 0 0 0 0 1 …
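A minimal Python sketch of sparse encoding; the amino acid ordering follows the alphabet used on the Blosum50 slide below.

ALPHABET = "ARNDCQEGHILKMFPSTWYV"   # the 20 standard amino acids

def sparse_encode(peptide):
    # Each residue becomes 20 inputs: 1 for its own neuron, 0 for the rest
    encoding = []
    for aa in peptide:
        bits = [0.0] * len(ALPHABET)
        bits[ALPHABET.index(aa)] = 1.0
        encoding.extend(bits)
    return encoding

# A 9-mer peptide becomes 9 x 20 = 180 input values
print(len(sparse_encode("SLLPAIVEL")))   # 180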

BLOSUM encoding (Blosum50 matrix)
[Each amino acid is encoded as its row of Blosum50 scores against the 20 amino acids A R N D C Q E G H I L K M F P S T W Y V; the matrix values are not reproduced here]

Sequence encoding (continued)
Sparse encoding
–V: one-hot vector for V
–L: one-hot vector for L
–V·L = 0 (unrelated)
Blosum encoding
–V: Blosum row for V
–L: Blosum row for L
–V·L = 0.88 (highly related)
–V·R = … (close to unrelated)

The Wisdom of the Crowds
The Wisdom of Crowds. Why the Many are Smarter than the Few. James Surowiecki
One day in the fall of 1906, the British scientist Francis Galton left his home and headed for a country fair… He believed that only a very few people had the characteristics necessary to keep societies healthy. He had devoted much of his career to measuring those characteristics, in fact, in order to prove that the vast majority of people did not have them. … Galton came across a weight-judging competition… Eight hundred people tried their luck. They were a diverse lot: butchers, farmers, clerks and many other non-experts… The crowd had guessed … pounds; the ox weighed 1,198.

Network ensembles
No single network, with one particular architecture and sequence encoding scheme, will consistently perform the best.
Also for neural network predictions, enlightened despotism fails:
–For some peptides, BLOSUM encoding with a four-neuron hidden layer best predicts the peptide/MHC binding; for other peptides, a sparse-encoded network with zero hidden neurons performs the best.
–Wisdom of the Crowd
Never use just one neural network; use network ensembles.
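A minimal sketch of the ensemble idea in Python: average the outputs of several differently trained networks instead of trusting any single one. The networks list below is a toy placeholder for, e.g., sparse- and Blosum-encoded models.

def ensemble_predict(networks, peptide):
    # Average the prediction of every network in the ensemble (wisdom of the crowd)
    scores = [net(peptide) for net in networks]   # each network is a callable returning a score
    return sum(scores) / len(scores)

# Toy stand-ins for trained networks with different encodings/architectures
networks = [lambda p: 0.70, lambda p: 0.80, lambda p: 0.75]
print(ensemble_predict(networks, "SLLPAIVEL"))    # 0.75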

Evaluation of prediction accuracy
ENS: ensemble of neural networks trained using sparse and Blosum sequence encoding

Sequence encoding
The change in a weight is linearly dependent on the input value, so with "true" sparse (0/1) encoding the weights from inputs that are 0 receive no update; "true" sparse encoding is therefore highly inefficient.
Sparse input is most often encoded as
–+1/-1, or
–0.9/0.05

Training and error reduction 

Size matters

Neural network training
A network contains a very large set of parameters
–A network with 5 hidden neurons predicting binding for 9-meric peptides has more than 9x20x5 = 900 weights
Over-fitting is a problem; stop training when the test set performance is optimal
[Plot: temperature versus years]
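A hedged Python sketch of this early-stopping strategy: keep the weights from the epoch with the best test (stop) set performance. The train_one_epoch and evaluate callables are placeholders for the actual training and error functions.

import copy

def train_with_early_stopping(model, train_data, test_data,
                              train_one_epoch, evaluate, max_epochs=500):
    # Stop training when test set performance is optimal (lowest test error)
    best_error = float("inf")
    best_model = copy.deepcopy(model)
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)        # one pass of gradient descent
        test_error = evaluate(model, test_data)   # error on the held-out test set
        if test_error < best_error:               # remember the best model seen so far
            best_error = test_error
            best_model = copy.deepcopy(model)
    return best_model, best_error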

What is going on?
[Plot: temperature versus years]

Examples
Train on 500 A0201 and 60 A0101 binding data
Evaluate on 1266 A0201 peptides
–NH = 1: PCC = 0.77
–NH = 5: PCC = 0.72

Neural network training. Cross validation
Cross validation: train on 4/5 of the data, test on 1/5
=> Produces 5 different neural networks, each with a different prediction focus
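A minimal Python sketch of this 5 fold scheme: each fold in turn is the test set, and one network is trained per fold. The train_network callable is a placeholder for the actual training routine.

def cross_validate(data, train_network, n_folds=5):
    # Train one network per fold: train on 4/5 of the data, test on the remaining 1/5
    folds = [data[i::n_folds] for i in range(n_folds)]   # simple interleaved split
    networks = []
    for i in range(n_folds):
        test_set = folds[i]
        train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        networks.append(train_network(train_set, test_set))
    return networks   # 5 networks, each with a different prediction focus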

Neural network training curve
The point of maximum test set performance is where the network is most capable of generalizing.

5 fold training Which network to choose?

5 fold training

How many folds?
Cross validation is always good, but how many folds?
–Few folds -> small training data sets
–Many folds -> small test data sets
Example: 560 peptides for training
–50 fold (10 peptides per test set, few data to stop training)
–2 fold (280 peptides per test set, few data to train)
–5 fold (110 peptides per test set, 450 per training set)

Problems with 5 fold cross validation
The test set is used both to stop training and to evaluate the trained network
–Over-fitting? If the test set is small: yes. If the test set is large: no.
Confirm using “true” 5 fold cross validation
–1/5 for evaluation
–4/5 for 4 fold cross-validation

Conventional (fake) 5 fold cross validation

“True” 5 fold cross validation
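A hedged Python sketch of the "true" (nested) scheme: 1/5 of the data is held out purely for evaluation, and the remaining 4/5 are used for an inner 4 fold cross-validation whose test sets only steer early stopping. The train_network and evaluate callables are placeholders.

def true_cross_validation(data, train_network, evaluate, n_folds=5):
    # Nested ("true") cross-validation: the evaluation fold never influences training
    folds = [data[i::n_folds] for i in range(n_folds)]
    performances = []
    for i in range(n_folds):
        eval_set = folds[i]                                    # 1/5 used for evaluation only
        inner = [fold for j, fold in enumerate(folds) if j != i]
        ensemble = []
        for k in range(len(inner)):                            # 4 fold CV on the remaining 4/5
            stop_set = inner[k]                                # used only to stop training
            train_set = [x for j, fold in enumerate(inner) if j != k for x in fold]
            ensemble.append(train_network(train_set, stop_set))
        performances.append(evaluate(ensemble, eval_set))      # ensemble judged on the held-out fold
    return sum(performances) / len(performances)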

When to be careful
When data is scarce, the difference in performance obtained using “conventional” versus “true” cross validation can be very large.
When data is abundant the difference is small, and “true” cross validation might even give higher performance than “conventional” cross validation, due to the ensemble aspect of the “true” cross validation approach.

Do hidden neurons matter? The environment matters NetMHCpan

Context matters
FMIDWILDA  YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY  0.89  A0201
FMIDWILDA  YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY  0.08  A0101
DSDGSFFLY  YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY  0.08  A0201
DSDGSFFLY  YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY  0.85  A0101

Applications of artificial neural networks
Speech recognition
Prediction of protein secondary structure
Prediction of signal peptides
Post-translational modifications
–Glycosylation
–Phosphorylation
Proteasomal cleavage
MHC:peptide binding

Prediction of protein secondary structure
Benefits
–Generally applicable
–Can capture higher order correlations
–Can use inputs other than sequence information
Drawbacks
–Needs a lot of data (different solved structures). However, these do exist today (more than 2500 solved structures with low sequence identity/high resolution)
–Complex method with several pitfalls

Secondary Structure Elements
[Structure figure with labelled β-strand, helix, turn, and bend elements]

Sparse encoding of amino acid sequence windows

How is it done
One network (SEQ2STR) takes sequence (profiles) as input and predicts secondary structure
–Cannot deal with the structure of SS elements, e.g. that helices are normally formed by at least 5 consecutive amino acids
A second network (STR2STR) takes the predictions of the first network as input and predicts secondary structure
–Can correct for errors in SS elements, e.g. remove single-residue helix predictions or mixtures of strand and helix predictions

Architecture
[Diagram: a sequence window (I K E E H V I I Q A E) taken from IKEEHVIIQAEFYLNPDQSGEF… is fed through the input layer, weights, and hidden layer to an output layer with the three classes H, E, C]
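A minimal Python sketch of building sparse-encoded sequence windows for the first (SEQ2STR) network: one window per residue, padded at the termini. The window size and the padding symbol are illustrative assumptions.

ALPHABET = "ARNDCQEGHILKMFPSTWYV"

def encode_residue(aa):
    # Sparse encode one residue; an extra neuron marks positions outside the sequence
    bits = [0.0] * (len(ALPHABET) + 1)
    bits[ALPHABET.index(aa) if aa in ALPHABET else len(ALPHABET)] = 1.0
    return bits

def sequence_windows(sequence, window=11):
    # One input vector per residue: the residue plus its neighbours, padded with 'X'
    half = window // 2
    padded = "X" * half + sequence + "X" * half
    return [[b for aa in padded[i:i + window] for b in encode_residue(aa)]
            for i in range(len(sequence))]

windows = sequence_windows("IKEEHVIIQAEFYLNPDQSGEF")
print(len(windows), len(windows[0]))   # one window per residue, window * 21 inputs each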

Secondary networks (Structure-to-Structure)
[Diagram: a window of the first network's H/E/C predictions along IKEEHVIIQAEFYLNPDQSGEF… is fed through the input layer, weights, and hidden layer to an output layer with the three classes H, E, C]

Example
PITKEVEVEYLLRRLEE (Sequence)
HHHHHHHHHHHHTGGG. (DSSP)
ECCCHEEHHHHHHHCCC (SEQ2STR)
CCCCHHHHHHHHHHCCC (STR2STR)

Why so many networks?

Why not select the best?

And some more stuff for the long cold winter nights
Could it be made differently?

Predicting accuracy Can it be made differently? Reliability

Summary
Gradient descent is used to determine the updates for the synapses in the neural network
Some relatively simple math defines the gradients
–Networks without hidden layers can be solved on the back of an envelope
–Hidden layers are a bit more complex, but still ok
Always train networks using a test set to stop training
–Be careful when reporting predictive performance
–Use “true” cross-validation for small data sets
And hidden neurons do matter (sometimes)

What have we learned
Neural networks are not as bad as their reputation
Neural networks can deal with higher order correlations
Be careful when training a neural network
–Always use cross-validated training