CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING. Gradient descent.

KEY CONCEPTS
Gradient descent
Line search
Convergence rates depend on scaling
Variants: discrete analogues, coordinate descent
Random restarts
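To illustrate the point that convergence rates depend on scaling, here is a minimal sketch that counts gradient descent iterations on the quadratic f(x) = 0.5 (x0^2 + kappa x1^2). The function name `exact_line_search_gd`, the starting point, the tolerance, and the use of the closed-form exact step for a quadratic are all assumptions made for illustration; they do not come from the slides.

```python
import math

def exact_line_search_gd(kappa, tol=1e-8, max_iters=100000):
    """Count gradient descent iterations on f(x) = 0.5 * (x0^2 + kappa * x1^2)."""
    h = [1.0, kappa]   # diagonal of the Hessian; kappa is the condition number
    x = [kappa, 1.0]   # illustrative starting point (an assumption)
    for t in range(max_iters):
        g = [hi * xi for hi, xi in zip(h, x)]  # gradient of the quadratic
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            return t
        # Exact minimizer of f along -g for a quadratic: (g.g) / (g.H.g)
        alpha = sum(gi * gi for gi in g) / sum(hi * gi * gi for hi, gi in zip(h, g))
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return max_iters

iters_well = exact_line_search_gd(1.0)     # perfectly scaled
iters_poor = exact_line_search_gd(100.0)   # badly scaled
print(iters_well, iters_poor)
```

With kappa = 1 the first step lands exactly on the minimum, while with kappa = 100 the iterates zigzag and need far more steps to reach the same gradient tolerance.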

Line search: pick a step size $\alpha$ that decreases the function value, i.e. approximately minimize $f(x - \alpha \nabla f(x))$ over $\alpha$.

(Use your favorite univariate optimization method.)
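One possible choice of univariate method is backtracking: start from a trial step and halve it until the function value drops enough. The sketch below assumes an Armijo-style sufficient-decrease test; the name `backtracking_line_search` and the parameters `alpha0`, `shrink`, and `c` are illustrative and not taken from the slides.

```python
def backtracking_line_search(f, x, grad, alpha0=1.0, shrink=0.5, c=1e-4, max_halvings=50):
    """Return a step size alpha such that f(x - alpha * grad) is sufficiently below f(x)."""
    fx = f(x)
    grad_sq = sum(g * g for g in grad)
    alpha = alpha0
    for _ in range(max_halvings):
        x_new = [xi - alpha * gi for xi, gi in zip(x, grad)]
        # Sufficient-decrease test (an assumed acceptance rule)
        if f(x_new) <= fx - c * alpha * grad_sq:
            return alpha
        alpha *= shrink
    return 0.0  # no acceptable step found

def f(x):
    return x[0] ** 2 + 10 * x[1] ** 2

def grad_f(x):
    return [2 * x[0], 20 * x[1]]

x = [1.0, 1.0]
alpha = backtracking_line_search(f, x, grad_f(x))
print(alpha)
```

Any other one-dimensional minimizer (golden-section search, bisection on the derivative, and so on) could be swapped in, as long as the returned step decreases f.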

GRADIENT DESCENT PSEUDOCODE
Input: f, starting value $x_1$, termination tolerances $\varepsilon_g$, $\varepsilon_x$
For t = 1, 2, …, maxIters:
    Compute the search direction $d_t = -\nabla f(x_t)$
    If $\|d_t\| < \varepsilon_g$ then: return "Converged to critical point", output $x_t$
    Find $\alpha_t$ so that $f(x_t + \alpha_t d_t) < f(x_t)$ using line search
    If $\|\alpha_t d_t\| < \varepsilon_x$ then: return "Converged in x", output $x_t$
    Let $x_{t+1} = x_t + \alpha_t d_t$
Return "Max number of iterations reached", output $x_{maxIters}$
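The pseudocode above can be sketched in Python roughly as follows. The helper names `norm` and `line_search`, the default tolerances, and the backtracking rule inside `line_search` are assumptions for illustration; the slides only require that the line search return a step that decreases f.

```python
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def line_search(f, x, d, alpha0=1.0, shrink=0.5, c=1e-4, max_halvings=60):
    """Backtracking search (assumed rule): halve alpha until f decreases enough along d."""
    fx = f(x)
    d_sq = sum(di * di for di in d)
    alpha = alpha0
    for _ in range(max_halvings):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx - c * alpha * d_sq:
            return alpha
        alpha *= shrink
    return 0.0  # no decreasing step found; the caller then stops with "Converged in x"

def gradient_descent(f, grad_f, x1, eps_g=1e-6, eps_x=1e-10, max_iters=1000):
    x = list(x1)
    for t in range(1, max_iters + 1):
        d = [-g for g in grad_f(x)]              # search direction d_t
        if norm(d) < eps_g:
            return "Converged to critical point", x
        alpha = line_search(f, x, d)             # step size alpha_t
        step = [alpha * di for di in d]
        if norm(step) < eps_x:
            return "Converged in x", x
        x = [xi + si for xi, si in zip(x, step)]  # x_{t+1}
    return "Max number of iterations reached", x

def f(x):
    return (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2

def grad_f(x):
    return [2 * (x[0] - 1), 20 * (x[1] + 2)]

status, x_min = gradient_descent(f, grad_f, [0.0, 0.0])
print(status, x_min)
```

The test function is a simple quadratic bowl with its minimum at (1, -2), chosen only so the result is easy to check by hand.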

RELATED METHODS
Steepest descent (discrete)
Coordinate descent
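As a rough illustration of coordinate descent, the sketch below cycles through the coordinates and minimizes f along one axis at a time. The slides do not specify the one-dimensional solver; golden-section search on a fixed bracket is assumed here, and the names `golden_section`, `coordinate_descent`, `radius`, and `sweeps` are hypothetical.

```python
import math

def golden_section(g, lo, hi, tol=1e-8):
    """Minimize a unimodal 1-D function g on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if g(c) < g(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def coordinate_descent(f, x0, radius=10.0, sweeps=50):
    x = list(x0)
    for _ in range(sweeps):
        for i in range(len(x)):
            # Restrict f to the i-th axis, holding the other coordinates fixed
            def along_axis(v, i=i):
                return f(x[:i] + [v] + x[i + 1:])
            x[i] = golden_section(along_axis, x[i] - radius, x[i] + radius)
    return x

def f(x):
    # Convex quadratic with coupled coordinates; minimum at (8/3, -10/3)
    return (x[0] - 1) ** 2 + (x[1] + 2) ** 2 + x[0] * x[1]

x_cd = coordinate_descent(f, [0.0, 0.0])
print(x_cd)
```

No gradient is needed, only function evaluations, which is one reason the method is attractive when derivatives are unavailable or expensive.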

Many local minima: use a good initialization, or random restarts
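A minimal sketch of random restarts on a one-dimensional function with two local minima follows. The test function, the fixed-step local optimizer `local_descent`, the sampling interval, and the seed are all illustrative assumptions: run a local descent from several random starting points and keep the best result.

```python
import random

def f(x):
    # Two local minima: a global one near x = -1.30 and a worse one near x = 1.13
    return x ** 4 - 3 * x ** 2 + x

def df(x):
    return 4 * x ** 3 - 6 * x + 1

def local_descent(x, step=0.01, iters=2000):
    """Fixed-step gradient descent; converges to whichever local minimum is downhill."""
    for _ in range(iters):
        x -= step * df(x)
    return x

def random_restarts(n_restarts=20, lo=-2.0, hi=2.0, seed=0):
    rng = random.Random(seed)  # seeded so the run is reproducible
    best_x = None
    for _ in range(n_restarts):
        x = local_descent(rng.uniform(lo, hi))
        if best_x is None or f(x) < f(best_x):
            best_x = x
    return best_x

best_x = random_restarts()
print(best_x, f(best_x))
```

A single run started to the right of the local maximum would settle in the worse minimum; keeping the best of several starts makes it likely, though not guaranteed, that the global one is found.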
