Question about Gradient Descent
Hung-yi Lee
Larger gradient, larger steps?
Vanilla gradient descent takes a step proportional to the gradient, so a larger gradient means a larger step. Is that always the right thing to do? For a one-dimensional quadratic y = ax^2 + bx + c starting from x0, the best step jumps straight to the minimum at x = -b/(2a), so the best step size is |x0 + b/(2a)|.
Contradiction
Original gradient descent: larger gradient, larger step. Adagrad and RMSprop: the step is divided by the root of the accumulated squared first derivatives, so a history of large gradients actually shrinks the step. At first sight these two rules contradict each other.
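A minimal sketch of the two update rules side by side, assuming the simple test function f(x) = x^2 (the learning rates and step counts here are illustrative choices, not values from the lecture):

```python
import math

def gd_step(x, grad, lr=0.1):
    # Vanilla gradient descent: step size is proportional to the gradient.
    return x - lr * grad

def adagrad_step(x, grad, accum, lr=1.0, eps=1e-8):
    # Adagrad: divide by the root of the accumulated squared gradients,
    # so a history of large gradients shrinks the effective step.
    accum += grad ** 2
    return x - lr * grad / (math.sqrt(accum) + eps), accum

# Minimize f(x) = x^2 (gradient 2x) from x = 3 with both rules.
x_gd, x_ada, accum = 3.0, 3.0, 0.0
for _ in range(200):
    x_gd = gd_step(x_gd, 2 * x_gd)
    x_ada, accum = adagrad_step(x_ada, 2 * x_ada, accum)
```

Both rules converge here; the point of the slide is that they scale the step by the gradient in opposite directions, which is the apparent contradiction resolved below.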
Second Derivative
Rewriting the best step: |x0 + b/(2a)| = |2ax0 + b| / (2a). The numerator is the first derivative at x0 and the denominator is the second derivative. The best step is |first derivative| / second derivative, so a larger gradient should mean a larger step only after the second derivative is taken into account.
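The derivation behind the slide's formula, written out for the quadratic introduced above:

```latex
% For f(x) = ax^2 + bx + c starting from x_0, the minimum is at
% x^* = -b/(2a), so the best step is the distance to the minimum:
\[
  \text{best step}
    = \left| x_0 - x^* \right|
    = \left| x_0 + \frac{b}{2a} \right|
    = \frac{\left| 2ax_0 + b \right|}{2a}
    = \frac{\left| f'(x_0) \right|}{f''(x_0)}.
\]
```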
More than one parameter
[Figure: two quadratic error curves, one per parameter, with points a, b on the sharper curve and c, d on the flatter one.] Within a single parameter, a larger gradient does mean the point is farther from the minimum: point a has a larger gradient than b and is farther away (a > b), and likewise c > d. Across parameters the comparison breaks down: the gradient at c is smaller than at a (c < a), yet c is farther from its minimum, because its curve has a smaller second derivative. The quantity that is comparable across parameters is |first derivative| / second derivative, i.e. the best step.
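A hypothetical numeric version of the cross-parameter comparison, assuming two made-up quadratic losses (the coefficients and points are illustrative, not from the slide):

```python
# Each parameter has its own quadratic loss curve:
# f1(w) = w^2      -> second derivative f1'' = 2   (sharp curve)
# f2(w) = 0.1 w^2  -> second derivative f2'' = 0.2 (flat curve)
g_a = 2.0 * 1.0   # gradient of f1 at w = 1 (point a, distance 1 from minimum)
g_c = 0.2 * 4.0   # gradient of f2 at w = 4 (point c, distance 4 from minimum)

# Gradient magnitude alone misleads: g_c < g_a even though point c
# is four times farther from its minimum than point a.
step_a = abs(g_a) / 2.0   # |f'| / f'' recovers the true distance: 1.0
step_c = abs(g_c) / 0.2   # |f'| / f'' recovers the true distance: 4.0
```

Dividing by the second derivative makes the two parameters comparable again, which is exactly what the best-step formula says.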
What to do with Adagrad and RMSprop?
Computing the second derivative directly can be expensive, so Adagrad and RMSprop use the first derivatives to estimate it: over a region with a larger second derivative, the sampled gradients tend to be larger in magnitude, so the root of the accumulated (or averaged) squared past gradients acts as a proxy for the second derivative. Dividing the current gradient by that quantity approximates the best step |first derivative| / second derivative, which resolves the apparent contradiction.
Acknowledgement
This question was raised by 李廣和.
Thanks for your attention!