Optimization methods Morten Nielsen Department of Systems Biology, DTU IIB-INTECH, UNSAM, Argentina.

Presentation transcript:

Optimization methods Morten Nielsen Department of Systems Biology, DTU IIB-INTECH, UNSAM, Argentina

Minimization: the path to the closest local minimum = local minimization. *Adapted from slides by Chen Keasar, Ben-Gurion University

Minimization: the path to the global minimum. *Adapted from slides by Chen Keasar, Ben-Gurion University

Outline
- Optimization procedures
  - Gradient descent
  - Monte Carlo
- Overfitting
  - Cross-validation
- Method evaluation

Linear methods. Error estimate. [Figure: linear unit with inputs I1 and I2, weights w1 and w2, and output o = w1·I1 + w2·I2.]

Gradient descent (from Wikipedia). Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if b = a − γ∇F(a) for γ > 0 a small enough number, then F(b) < F(a).
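As a minimal illustration of that statement (a toy example, not from the slides: F(x) = x² and γ = 0.1), repeatedly stepping along the negative gradient decreases F:

def F(x):
    return x ** 2          # a simple differentiable function

def dF(x):
    return 2 * x           # its gradient

a, gamma = 5.0, 0.1
for _ in range(20):
    b = a - gamma * dF(a)  # step in the direction of the negative gradient
    assert F(b) < F(a)     # for small enough gamma, F decreases
    a = b
print(a, F(a))             # a approaches the minimum at x = 0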

Gradient descent (example)

Gradient descent

Weights are changed in the opposite direction of the gradient of the error: Δwᵢ = −ε·∂E/∂wᵢ.

Gradient descent (linear function). Weights are changed in the opposite direction of the gradient of the error. [Figure: linear unit with inputs I1, I2, weights w1, w2, and output o.]

Gradient descent. Weights are changed in the opposite direction of the gradient of the error. [Figure: linear unit with inputs I1, I2, weights w1, w2, and output o.]

Gradient descent. Example. Weights are changed in the opposite direction of the gradient of the error. [Figure: linear unit with inputs I1, I2, weights w1, w2, and output o.]

Gradient descent. Example. Weights are changed in the opposite direction of the gradient of the error. [Figure: linear unit with inputs I1, I2, weights w1, w2, and output o.]

Gradient descent. Doing it yourself. Weights are changed in the opposite direction of the gradient of the error. Inputs 1 and 0, weights W1 = 0.1 and W2 = 0.1, linear function output o. What are the weights after 2 forward (calculate predictions) and backward (update weights) iterations with the given input, and has the error decreased (use ε = 0.1 and t = 1)?

Fill out the table (columns: itr, W1, W2, O). What are the weights after 2 forward/backward iterations with the given input, and has the error decreased (use ε = 0.1, t = 1)? Inputs 1 and 0, W1 = 0.1, W2 = 0.1, linear function output o.
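A minimal sketch of the exercise, assuming the squared error E = ½(o − t)² and the resulting update wᵢ ← wᵢ − ε·(o − t)·Iᵢ for the linear unit (the slides may use slightly different constants):

# Gradient descent on a two-input linear unit, o = w1*I1 + w2*I2.
# Assumes squared error E = 0.5*(o - t)**2, so dE/dw_i = (o - t)*I_i.
I = [1.0, 0.0]        # inputs from the exercise
w = [0.1, 0.1]        # initial weights
t, eps = 1.0, 0.1     # target and learning rate

for itr in range(1, 3):
    o = w[0] * I[0] + w[1] * I[1]                          # forward: prediction
    E = 0.5 * (o - t) ** 2                                 # error before the update
    w = [wi - eps * (o - t) * Ii for wi, Ii in zip(w, I)]  # backward: update
    print(itr, [round(x, 3) for x in w], round(o, 3), round(E, 3))

# Approximate output under these assumptions:
# iteration 1: W1 = 0.19,  W2 = 0.1, O = 0.1,  E = 0.405
# iteration 2: W1 = 0.271, W2 = 0.1, O = 0.19, E = 0.328  (the error decreases)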

Monte Carlo. Because of their reliance on repeated computation of random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm. Or when you are too stupid to do the math yourself?

Example: Estimating π by independent Monte Carlo samples. Suppose we throw darts randomly (and uniformly) at the square:
Algorithm:
For i = [1..ntrials]
  x = (random # in [0..r])
  y = (random # in [0..r])
  distance = sqrt(x^2 + y^2)
  if distance ≤ r then hits++
End
Output: π ≈ 4 · hits / ntrials
Adapted from course slides by Craig Douglas clab/mcintro.html
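A runnable version of the dart-throwing pseudocode above, written in Python for illustration; the factor 4·hits/ntrials comes from the area ratio between the quarter circle and the square:

import random

def estimate_pi(ntrials, r=1.0):
    # Throw darts uniformly at the [0, r] x [0, r] square and count
    # how many land inside the quarter circle of radius r.
    hits = 0
    for _ in range(ntrials):
        x = random.uniform(0.0, r)
        y = random.uniform(0.0, r)
        if (x * x + y * y) ** 0.5 <= r:
            hits += 1
    return 4.0 * hits / ntrials   # area ratio: (pi*r^2/4) / r^2 = pi/4

print(estimate_pi(1000000))       # typically prints something close to 3.14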

Estimating π

Monte Carlo (minimization). Moves with dE < 0 are always accepted; moves with dE > 0 are only accepted with a certain probability (the Metropolis criterion).

The Traveling Salesman Adapted from

Gibbs sampler. Monte Carlo simulations. [Figure: two alignments of the peptides RFFGGDRGAPKRG, YLDPLIRGLLARPAKLQV, KPGQPPRLLIYDASNRATGIPA, GSLFVYNITTNKYKAFLDKQ, SALLSSDITASVNCAK, GFKGEQGPKGEP, DVFKELKVHHANENI, SRYWAIRTRSGGI, TYSTNEIDLQLSQEDGQTIE, with scores E1 = 5.4, E2 = 5.7, E2 = 5.2.] dE > 0: P_accept = 1; dE < 0: 0 < P_accept < 1. Note the sign: maximization.
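For illustration, a sketch of a Gibbs-sampler move loop over the peptides shown above, assuming one 9-mer core per peptide and the maximization acceptance rule from this slide; the scoring function here is a crude stand-in (column-wise residue counts), not the log-odds scoring used by the real method:

import math, random
from collections import Counter

peptides = ["RFFGGDRGAPKRG", "YLDPLIRGLLARPAKLQV", "KPGQPPRLLIYDASNRATGIPA",
            "GSLFVYNITTNKYKAFLDKQ", "SALLSSDITASVNCAK", "GFKGEQGPKGEP",
            "DVFKELKVHHANENI", "SRYWAIRTRSGGI", "TYSTNEIDLQLSQEDGQTIE"]
CORE = 9

def score(starts):
    # Stand-in alignment score: per column, count the most common residue.
    cores = [p[s:s + CORE] for p, s in zip(peptides, starts)]
    return sum(max(Counter(col).values()) for col in zip(*cores))

def gibbs_step(starts, T):
    # Move one randomly chosen core and accept with the Metropolis criterion
    # for maximization: dE > 0 always accepted, dE < 0 with probability exp(dE/T).
    i = random.randrange(len(peptides))
    trial = list(starts)
    trial[i] = random.randrange(len(peptides[i]) - CORE + 1)
    dE = score(trial) - score(starts)
    if dE > 0 or random.random() < math.exp(dE / T):
        return trial
    return starts

starts = [random.randrange(len(p) - CORE + 1) for p in peptides]
for _ in range(500):
    starts = gibbs_step(starts, T=0.5)
print(starts, score(starts))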

Monte Carlo temperature. What is the Monte Carlo temperature? With the acceptance probability P_accept = exp(dE/T) (maximization convention), say dE = −0.2: at T = 1, P_accept = exp(−0.2) ≈ 0.82; at T = 0.001, P_accept = exp(−200) ≈ 0.
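A small sketch of the acceptance probabilities just quoted, assuming the Metropolis rule P_accept = exp(dE/T) for a worsening move (dE < 0, maximization convention):

import math

def p_accept(dE, T):
    # Metropolis criterion for maximization: always accept improving moves,
    # accept worsening moves (dE < 0) with probability exp(dE/T).
    return 1.0 if dE >= 0 else math.exp(dE / T)

print(p_accept(-0.2, 1.0))      # ~0.82: bad moves are often accepted at high T
print(p_accept(-0.2, 0.001))    # ~0: bad moves are essentially never accepted at low T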

MC minimization

Monte Carlo - Examples Why a temperature?

Local minima

Stabilization matrix method

A prediction method contains a very large set of parameters – a matrix for predicting binding of 9-meric peptides has 9x20 = 180 weights. Overfitting is a problem. [Figure: data-driven method training (labels: years, temperature).]

Regression methods. The mathematics.
y = ax + b: 2-parameter model. Good description, poor fit.
y = ax^6 + bx^5 + cx^4 + dx^3 + ex^2 + fx + g: 7-parameter model. Poor description, good fit.

Model over-fitting

Stabilization matrix method (Ridge regression). The mathematics y = ax + b 2 parameter model Good description, poor fit y = ax 6 +bx 5 +cx 4 +dx 3 +ex 2 +fx+g 7 parameter model Poor description, good fit

SMM training. Evaluated on 600 MHC:peptide binding data points. λ = 0: PCC = 0.70; λ = 0.1: PCC = 0.78.

Stabilization matrix method. The analytic solution. Each peptide is represented as 9*20 numbers (180); H is a stack of such vectors of 180 values; t is the target value (the measured binding); λ is a parameter introduced to suppress the effect of noise in the experimental data and lower the effect of overfitting. The weights solve (HᵀH + λI)·w = Hᵀt.
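A sketch of the analytic solution with numpy, assuming the standard ridge-regression form w = (HᵀH + λI)⁻¹Hᵀt stated above; the encoding of peptides into 180-dimensional vectors is left out, and the data below are random stand-ins:

import numpy as np

def smm_analytic(H, t, lam):
    # H: (n_peptides, 180) matrix of encoded peptides; t: measured binding values.
    # Ridge / SMM analytic solution: w = (H^T H + lam*I)^-1 H^T t.
    n_features = H.shape[1]
    A = H.T @ H + lam * np.eye(n_features)
    return np.linalg.solve(A, H.T @ t)

# Illustrative call with random data standing in for encoded peptides:
H = np.random.rand(600, 180)
t = np.random.rand(600)
w = smm_analytic(H, t, lam=0.1)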

SMM - Stabilization matrix method. [Figure: linear unit with inputs I1, I2, weights w1, w2, and output o; the error combines a sum over data points with a sum over weights.]

SMM - Stabilization matrix method. Per target error: the error for a single data point plus a regularization term (sum over weights). Global error: the sum over data points plus the regularization term (sum over weights). [Figure: linear unit with inputs I1, I2, weights w1, w2, and output o.]

SMM - Stabilization matrix method. Do it yourself (per target). [Figure: linear unit with inputs I1, I2, weights w1, w2, and output o.]

SMM - Stabilization matrix method (per target). [Figure: linear unit with inputs I1, I2, weights w1, w2, and output o.]
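A sketch of one per-target SMM gradient-descent step for the two-weight unit above, assuming the error E = ½(o − t)² + λ·Σ wᵢ², which gives ∂E/∂wᵢ = (o − t)·Iᵢ + 2λ·wᵢ (the constants on the original slides may differ):

def smm_update(w, I, t, lam, eps):
    # One per-target gradient-descent step with the regularized error
    # E = 0.5*(o - t)**2 + lam * sum(wi**2).
    o = sum(wi * Ii for wi, Ii in zip(w, I))
    return [wi - eps * ((o - t) * Ii + 2.0 * lam * wi) for wi, Ii in zip(w, I)]

w = [0.1, 0.1]
for _ in range(2):
    w = smm_update(w, I=[1.0, 0.0], t=1.0, lam=0.1, eps=0.1)
print(w)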

SMM - Stabilization matrix method. [Figure: linear unit with inputs I1, I2, weights w1, w2, and output o.]

SMM - Stabilization matrix method, Monte Carlo. [Figure: linear unit with inputs I1, I2, weights w1, w2, and output o.] Using the global error: make a random change to the weights, calculate the change in the “global” error, and update the weights if the MC move is accepted. Note the difference between MC and GD in the use of the “global” versus the “per target” error.
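A sketch of the Monte Carlo variant just described, assuming the regularized global error and the Metropolis acceptance rule used earlier in the deck; the data, step size, and temperature below are illustrative:

import math, random

def global_error(w, data, lam):
    # Sum of squared errors over all data points plus the weight penalty.
    sse = sum((sum(wi * Ii for wi, Ii in zip(w, I)) - t) ** 2 for I, t in data)
    return 0.5 * sse + lam * sum(wi ** 2 for wi in w)

def mc_step(w, data, lam, T, step=0.05):
    # Randomly perturb one weight and accept/reject on the change in global error.
    trial = list(w)
    i = random.randrange(len(w))
    trial[i] += random.uniform(-step, step)
    dE = global_error(trial, data, lam) - global_error(w, data, lam)
    if dE < 0 or random.random() < math.exp(-dE / T):
        return trial
    return w

data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.5)]   # toy (inputs, target) pairs
w = [0.1, 0.1]
for _ in range(1000):
    w = mc_step(w, data, lam=0.1, T=0.01)
print(w)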

Training/evaluation procedure
- Define method
- Select data
- Deal with data redundancy
  - In method (sequence weighting)
  - In data (Hobohm)
- Deal with over-fitting, either
  - in method (SMM regularization term), or
  - in training (stop fitting on test-set performance)
- Evaluate method using cross-validation

A small doit script //home/user1/bin/doit_ex

#! /bin/tcsh
foreach a ( `cat allelefile` )
  mkdir -p $a
  cd $a
  foreach l ( )                # loop over lambda values
    mkdir -p l.$l
    cd l.$l
    foreach n ( )              # loop over cross-validation partitions
      smm -nc 500 -l $l train.$n > mat.$n
      pep2score -mat mat.$n eval.$n > eval.$n.pred
    end
    echo $a $l `cat eval.?.pred | grep -v "#" | gawk '{print $2,$3}' | xycorr`
    cd ..
  end
  cd ..
end