Optimization methods Morten Nielsen Department of Systems Biology, DTU.



Outline
Optimization procedures
– Gradient descent
– Monte Carlo
Overfitting
– Cross-validation
Method evaluation

Linear methods. Error estimate. A linear function with inputs I1, I2, weights w1, w2 and output o.

Gradient descent (from Wikipedia). Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if b = a - ε∇F(a) for ε > 0 a small enough number, then F(b) < F(a).
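The iteration b = a - ε∇F(a) can be sketched in a few lines of Python. F(x) = x² (gradient 2x), the step size ε and the starting point are illustrative choices, not part of the slides:

```python
# Minimal gradient-descent sketch. F(x) = x^2 (gradient 2x); the step size
# epsilon and the starting point a are illustrative choices.
def gradient_descent(grad, a, epsilon=0.1, steps=100):
    for _ in range(steps):
        a = a - epsilon * grad(a)  # b = a - epsilon * grad F(a)
    return a

minimum = gradient_descent(lambda x: 2 * x, a=5.0)
print(minimum)  # approaches 0, the minimum of x^2
```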

Gradient descent (example)

Gradient descent

Weights are changed in the opposite direction of the gradient of the error

Gradient descent (linear function). Weights are changed in the opposite direction of the gradient of the error. A linear function with inputs I1, I2, weights w1, w2 and output o.

Gradient descent. Weights are changed in the opposite direction of the gradient of the error. A linear function with inputs I1, I2, weights w1, w2 and output o.

Gradient descent. Example. Weights are changed in the opposite direction of the gradient of the error. A linear function with inputs I1, I2, weights w1, w2 and output o.

Gradient descent. Example. Weights are changed in the opposite direction of the gradient of the error. A linear function with inputs I1, I2, weights w1, w2 and output o.

Gradient descent. Doing it yourself
Weights are changed in the opposite direction of the gradient of the error.
Inputs 1 and 0, weights w1 = 0.1 and w2 = 0.1, linear function output o.
What are the weights after 2 forward (calculate predictions) and backward (update weights) iterations with the given input, and has the error decreased (use ε = 0.1 and t = 1)?

Fill out the table
itr | W1  | W2  | O
0   | 0.1 | 0.1 |
1   |     |     |
2   |     |     |
What are the weights after 2 forward/backward iterations with the given input, and has the error decreased (use ε = 0.1, t = 1)?
Inputs 1 and 0, weights w1 = 0.1 and w2 = 0.1, linear function output o.
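A sketch of the exercise in Python, assuming inputs I = (1, 0), squared error E = ½(O - t)² and the gradient-descent update wᵢ ← wᵢ + ε(t - O)Iᵢ (these conventions are inferred from the slides, not stated in them):

```python
# Worked sketch of the exercise above, assuming inputs I = (1, 0), target
# t = 1, error E = 1/2 (O - t)^2 and update w_i <- w_i + eps * (t - O) * I_i.
I, t, eps = (1.0, 0.0), 1.0, 0.1
w = [0.1, 0.1]
for itr in (1, 2):
    O = w[0] * I[0] + w[1] * I[1]       # forward: calculate the prediction
    for i in range(2):                  # backward: update the weights
        w[i] += eps * (t - O) * I[i]
    print(itr, w, "error", 0.5 * (t - O) ** 2)
# w[1] never changes because its input is 0; the residual (t - O) shrinks
# each iteration, so the error has decreased after the two iterations.
```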

Monte Carlo
Because of their reliance on repeated computation of random or pseudo-random numbers, Monte Carlo methods are best suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm. Or when you are too stupid to do the math yourself?

Monte Carlo (minimization) [Figure: moves with dE < 0 and dE > 0]
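A minimal Metropolis-style sketch of Monte Carlo minimization: downhill moves (dE < 0) are always accepted, uphill moves (dE > 0) with probability exp(-dE/T). The function F, the proposal step and the temperature are illustrative choices:

```python
import math
import random

# Metropolis-style Monte Carlo minimization sketch. F, the proposal step
# and the temperature T are illustrative choices.
def mc_minimize(F, x, T=0.01, steps=1000, rng=random.Random(1)):
    for _ in range(steps):
        x_new = x + rng.uniform(-0.5, 0.5)    # propose a random move
        dE = F(x_new) - F(x)
        if dE < 0 or rng.random() < math.exp(-dE / T):
            x = x_new                         # accept the move
    return x

x_min = mc_minimize(lambda x: x * x, 5.0)
print(x_min)  # close to the minimum at 0
```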

Gibbs sampler. Monte Carlo simulations
Peptide alignment example:
RFFGGDRGAPKRG YLDPLIRGLLARPAKLQV KPGQPPRLLIYDASNRATGIPA GSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEP DVFKELKVHHANENI SRYWAIRTRSGGI TYSTNEIDLQLSQEDGQTIE
E1 = 5.4, E2 = 5.7, E2 = 5.2
dE > 0: P_accept = 1; dE < 0: 0 < P_accept < 1
Note the sign: maximization

Monte Carlo temperature
What is the Monte Carlo temperature? Say dE = -0.2; compare T = 1 and T = 0.001.
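The question above can be answered numerically. Using the maximization convention from the Gibbs-sampler slide, an unfavourable move (dE < 0) is accepted with probability exp(dE/T):

```python
import math

# Acceptance probability of an unfavourable move (dE < 0) at two
# temperatures, using the maximization convention: P_accept = exp(dE/T).
dE = -0.2
for T in (1.0, 0.001):
    print(T, math.exp(dE / T))
# At T = 1 the move is accepted ~82% of the time; at T = 0.001 essentially
# never, so a low temperature only allows moves that improve the score.
```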

MC minimization

Monte Carlo - Examples Why a temperature?

Local minima

Data driven method training
A prediction method contains a very large set of parameters
– A matrix for predicting binding for 9-meric peptides has 9x20 = 180 weights
Overfitting is a problem
[Figure: temperature over the years]

Evaluation of predictive performance
Binders:
ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV
Non-binders:
MRSGRVHAV VRFNIDETP ANYIGQDGL AELCGDPGD QTRAVADGK GRPVPAAHP MTAQWWLDA FARGVVHVI LQRELTRLQ AVAEEMTKS
Train PSSM on raw data
– No pseudo counts, no sequence weighting
– Fit 9*20 parameters to 9*10 data points
Evaluate on training data
– PCC = 0.97
– AUC = 1.0
Close to a perfect prediction method

Evaluation of predictive performance
Binders:
AAAMAAKLA AAKNLAAAA AKALAAAAR AAAAKLATA ALAKAVAAA IPELMRTNG FIMGVFTGL NVTKVVAWL LEPLNLVLK VAVIVSVPF
Non-binders:
MRSGRVHAV VRFNIDETP ANYIGQDGL AELCGDPGD QTRAVADGK GRPVPAAHP MTAQWWLDA FARGVVHVI LQRELTRLQ AVAEEMTKS
Train PSSM on permuted (random) data
– No pseudo counts, no sequence weighting
– Fit 9*20 parameters to 9*10 data points
Evaluate on training data
– PCC = 0.97
– AUC = 1.0
Close to a perfect prediction method, AND the same performance as on the original data

Repeat on large training data (229 ligands)

Cross validation Train on 4/5 of data Test/evaluate on 1/5 => Produce 5 different methods each with a different prediction focus
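The 4/5–1/5 split above can be sketched as a simple 5-fold partition. The data are illustrative:

```python
# 5-fold cross-validation sketch: each of the 5 methods is trained on 4/5
# of the data and evaluated on the remaining 1/5. The data are illustrative.
data = list(range(25))
folds = [data[i::5] for i in range(5)]   # 5 disjoint partitions

for k, test in enumerate(folds):
    train = [x for i, fold in enumerate(folds) if i != k for x in fold]
    # ... train a method on `train`, evaluate it on `test` ...
    print(k, len(train), len(test))      # 20 training, 5 test points per fold
```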

Model over-fitting
Trained on 2000 MHC:peptide binding data: PCC = 0.99
Evaluated on 600 MHC:peptide binding data: PCC = 0.80

Model over-fitting (early stopping)
Stop training when the test-set performance starts to decrease.
Evaluated on 600 MHC:peptide binding data: PCC = 0.89

What is going on? [Figure: temperature over the years]

5 fold training Which method to choose?

5 fold training

Method evaluation
Use cross-validation.
Evaluate on the concatenated data, not as an average over the individual cross-validation partitions.
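A sketch of the concatenated evaluation: the test predictions from all cross-validation partitions are pooled and a single Pearson correlation (PCC) is computed on the pool. The per-fold (target, prediction) pairs are illustrative:

```python
import math

# Pool (concatenate) the test predictions from all cross-validation folds
# and compute one Pearson correlation on the pooled set.
def pcc(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative per-fold (targets, predictions) from 2 folds
folds = [([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]),
         ([0.5, 1.5, 2.5], [0.4, 1.6, 2.4])]
targets = [t for ts, _ in folds for t in ts]
preds = [p for _, ps in folds for p in ps]
print(pcc(targets, preds))   # one PCC for the concatenated evaluation data
```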

Method evaluation Which prediction to use?

Method evaluation

SMM - Stabilization matrix method
A linear function with inputs I1, I2, weights w1, w2 and output o.
Per target: E = 1/2 (O - t)^2 + λ Σ_i w_i^2 (sum over weights)
Global: E = 1/2 Σ_data (O - t)^2 + λ Σ_i w_i^2 (sum over data points, plus sum over weights)

SMM - Stabilization matrix method
A linear function with inputs I1, I2, weights w1, w2 and output o.
Per target, the gradient-descent update becomes Δw_i = ε((t - O) I_i - 2λ w_i)

SMM - Stabilization matrix method
A linear function with inputs I1, I2, weights w1, w2 and output o.

SMM training
Evaluate on 600 MHC:peptide binding data
λ = 0: PCC = 0.70
λ = 0.1: PCC = 0.78
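The effect of the stabilization term can be sketched with per-target gradient descent, assuming the error E = ½(O - t)² + λ Σ wᵢ². Data, step size and λ values here are illustrative, not the MHC data from the slides:

```python
# Ridge-style (SMM) gradient-descent sketch: the per-target error
# E = 1/2 (O - t)^2 + lam * sum_i w_i^2 adds a penalty that keeps the
# weights small. Data, step size and lambda values are illustrative.
def train(data, lam, eps=0.05, epochs=200):
    w = [0.0, 0.0]
    for _ in range(epochs):
        for I, t in data:
            O = sum(wi * xi for wi, xi in zip(w, I))
            for i in range(2):
                w[i] += eps * ((t - O) * I[i] - 2 * lam * w[i])
    return w

data = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.5)]
w_plain = train(data, lam=0.0)   # fits the training data closely
w_ridge = train(data, lam=0.1)   # smaller weights, less prone to over-fit
print(w_plain, w_ridge)
```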

SMM - Stabilization matrix method, Monte Carlo
A linear function with inputs I1, I2, weights w1, w2 and output o.
Using the global error:
– Make a random change to the weights
– Calculate the change in the "global" error
– Update the weights if the MC move is accepted
Note the difference between MC and GD in the use of the "global" versus the "per target" error.
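The three steps above can be sketched as follows, using a global error summed over all data points plus the weight penalty. Data and parameters are illustrative, and the acceptance rule is the zero-temperature limit (only moves that lower the global error are accepted):

```python
import random

# Monte Carlo training sketch with a *global* error (summed over all data
# points, plus the weight penalty), in contrast to the per-target error
# used by gradient descent. Data and parameters are illustrative.
def global_error(w, data, lam=0.05):
    err = sum((sum(wi * xi for wi, xi in zip(w, I)) - t) ** 2
              for I, t in data)
    return err + lam * sum(wi * wi for wi in w)

def mc_train(data, steps=2000, rng=random.Random(0)):
    w = [0.0, 0.0]
    E = global_error(w, data)
    for _ in range(steps):
        w_new = list(w)
        w_new[rng.randrange(2)] += rng.uniform(-0.1, 0.1)  # random change
        E_new = global_error(w_new, data)                  # change in error
        if E_new < E:                                      # accept the move?
            w, E = w_new, E_new
    return w

data = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.5)]
w = mc_train(data)
print(w, global_error(w, data))
```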

Training/evaluation procedure
Define method
Select data
Deal with data redundancy
– In method (sequence weighting)
– In data (Hobohm)
Deal with over-fitting either
– In method (SMM regularization term) or
– In training (stop fitting on test-set performance)
Evaluate method using cross-validation

A small doit script
/usr/opt/www/pub/CBS/courses/27623.algo/exercises/code/SMM/doit_ex

#! /bin/tcsh
foreach a ( `cat allelefile` )
  mkdir -p $a
  cd $a
  foreach l ( )
    mkdir -p l.$l
    cd l.$l
    foreach n ( )
      smm -nc 500 -l $l train.$n > mat.$n
      pep2score -mat mat.$n eval.$n > eval.$n.pred
    end
    echo $a $l `cat eval.?.pred | grep -v "#" | gawk '{print $2,$3}' | xycorr`
    cd ..
  end
  cd ..
end