CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING
Gradient descent
KEY CONCEPTS
- Gradient descent
- Line search
- Convergence rates depend on scaling
- Variants: discrete analogues, coordinate descent
- Random restarts
Line search: pick the step size α so that the function value decreases, i.e., choose α to reduce f(x − α∇f(x)). (Use your favorite univariate optimization method.)
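One common univariate method for this is backtracking with an Armijo sufficient-decrease test; the sketch below assumes that choice (the constants `alpha0`, `shrink`, and `c` are conventional defaults, not specified in the slides).

```python
import numpy as np

def backtracking_line_search(f, x, grad, alpha0=1.0, shrink=0.5, c=1e-4):
    """Backtracking line search: shrink alpha until the step
    x - alpha*grad gives a sufficient decrease in f (Armijo condition)."""
    alpha = alpha0
    fx = f(x)
    # Sufficient decrease test: f(x - a*g) <= f(x) - c*a*||g||^2
    while f(x - alpha * grad) > fx - c * alpha * np.dot(grad, grad):
        alpha *= shrink
        if alpha < 1e-12:  # safeguard against step underflow
            break
    return alpha
```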
GRADIENT DESCENT PSEUDOCODE
Input: f, starting value x_1, termination tolerances ε_g, ε_x, maxIters
For t = 1, 2, …, maxIters:
  Compute the search direction d_t = −∇f(x_t)
  If ||d_t|| < ε_g then: return "Converged to critical point", output x_t
  Find α_t so that f(x_t + α_t d_t) < f(x_t) using line search
  If ||α_t d_t|| < ε_x then: return "Converged in x", output x_t
  Let x_{t+1} = x_t + α_t d_t
Return "Max number of iterations reached", output x_maxIters
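The pseudocode above can be sketched as follows; the line search here is simple backtracking (an assumption — any method that produces a decrease works), and the extra "Line search failed" return is a numerical safeguard not in the slides.

```python
import numpy as np

def gradient_descent(f, grad_f, x1, eps_g=1e-6, eps_x=1e-10, max_iters=1000):
    x = np.asarray(x1, dtype=float)
    for t in range(max_iters):
        d = -grad_f(x)                     # search direction d_t = -grad f(x_t)
        if np.linalg.norm(d) < eps_g:
            return x, "Converged to critical point"
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) >= fx:      # find alpha_t with f(x_t + alpha_t d_t) < f(x_t)
            alpha *= 0.5
            if alpha < 1e-16:              # safeguard: no descent step found
                return x, "Line search failed"
        if np.linalg.norm(alpha * d) < eps_x:
            return x, "Converged in x"
        x = x + alpha * d                  # x_{t+1} = x_t + alpha_t d_t
    return x, "Max number of iterations reached"
```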
RELATED METHODS
- Steepest descent (discrete)
- Coordinate descent
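Coordinate descent optimizes one coordinate at a time while holding the others fixed. A minimal discrete sketch, assuming a fixed trial step in place of an exact 1-D minimization (the step size and sweep count are illustrative choices):

```python
import numpy as np

def coordinate_descent(f, x0, step=0.1, sweeps=100):
    """Cycle through coordinates; for each, take a fixed-size step in
    whichever direction decreases f (a crude stand-in for exact 1-D
    minimization along that coordinate)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                if f(trial) < f(x):
                    x = trial
                    break
    return x
```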
Many local minima: use a good initialization, or random restarts
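Random restarts can be sketched as: run a local minimizer from several random starting points and keep the best result. The function names and the uniform sampling box below are assumptions for illustration.

```python
import numpy as np

def random_restarts(f, minimize, n_restarts=20, dim=2, scale=5.0, seed=0):
    """Run a local minimizer from n_restarts random starting points
    sampled uniformly from [-scale, scale]^dim; return the best result."""
    rng = np.random.default_rng(seed)
    best_x, best_f = None, np.inf
    for _ in range(n_restarts):
        x0 = rng.uniform(-scale, scale, size=dim)  # random initialization
        x = minimize(f, x0)                        # any local method, e.g. gradient descent
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x
```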