Steepest Descent Algorithm: Step 1. $w^{(t+1)} = w^{(t)} - \eta \cdot \frac{\partial L}{\partial w}\big|_{w^{(t)}}$
Very simple: just one step. Convergence is poor if the function surface is much steeper in one dimension than in another. Let's see an example!
Steepest Descent Objective: $L(w) = w_1^2 + 25 w_2^2$ Gradient:
$\nabla L = \begin{pmatrix} 2 w_1 \\ 50 w_2 \end{pmatrix}$ Small in the $w_1$ direction, large in the $w_2$ direction!
Steepest Descent Zigzag behavior!
It moves very slowly along the flat direction and oscillates along the steep direction.
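A minimal NumPy sketch of steepest descent on the quadratic from the slides, $L(w) = w_1^2 + 25 w_2^2$; the learning rate and step count here are illustrative choices (any $\eta < 2/50$ keeps the steep coordinate stable):

```python
import numpy as np

def grad(w):
    # Gradient of L(w) = w1^2 + 25*w2^2 from the slide: (2*w1, 50*w2)
    return np.array([2.0 * w[0], 50.0 * w[1]])

eta = 0.035                      # assumed learning rate; must be < 2/50 = 0.04 for stability
w = np.array([-1.0, -1.0])
path = [w.copy()]
for _ in range(50):
    w = w - eta * grad(w)        # Step 1: w <- w - eta * dL/dw
    path.append(w.copy())
path = np.array(path)

# w2 flips sign every step (zigzag on the steep direction),
# while w1 shrinks by only a factor 0.93 per step (slow on the flat direction).
```

Printing `path` shows the zigzag directly: the second coordinate alternates sign every iteration while the first creeps toward zero.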
Momentum Accelerates convergence in directions where the gradient keeps the same sign over the past few iterations. If $\mu$ is large, momentum will overshoot the target, but the overall convergence is still faster than steepest descent in practice. As $\mu$ gets smaller, momentum behaves more and more like steepest descent.
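The slide describes $\mu$ but does not spell out the update; the sketch below assumes the standard heavy-ball form $v \leftarrow \mu v - \eta \nabla L$, $w \leftarrow w + v$, with hyperparameter values chosen only for illustration:

```python
import numpy as np

def grad(w):
    # Same quadratic as before: L(w) = w1^2 + 25*w2^2
    return np.array([2.0 * w[0], 50.0 * w[1]])

eta, mu = 0.01, 0.9              # assumed learning rate and momentum coefficient
w = np.array([-1.0, -1.0])
v = np.zeros(2)
for _ in range(200):
    v = mu * v - eta * grad(w)   # accumulate velocity (heavy-ball form)
    w = w + v                    # take the momentum step

# With mu = 0.9 the iterate overshoots (0, 0) but still spirals into it,
# faster overall than plain steepest descent on this surface.
```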
Nesterov Accelerated Gradient
Algorithm: Step 1. $v^{(t+1)} = \mu v^{(t)} - \eta \cdot \frac{\partial L}{\partial w}\big|_{w^{(t)} + \mu v^{(t)}}$ Step 2. $w^{(t+1)} = w^{(t)} + v^{(t+1)}$
Nesterov Accelerated Gradient performs better than momentum in practice [Sutskever et al., 2013].
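The two steps above can be sketched directly; the only change from momentum is that the gradient is evaluated at the look-ahead point $w^{(t)} + \mu v^{(t)}$ (hyperparameter values are again illustrative assumptions):

```python
import numpy as np

def grad(w):
    # Gradient of L(w) = w1^2 + 25*w2^2
    return np.array([2.0 * w[0], 50.0 * w[1]])

eta, mu = 0.01, 0.9                     # assumed hyperparameters
w = np.array([-1.0, -1.0])
v = np.zeros(2)
for _ in range(200):
    v = mu * v - eta * grad(w + mu * v) # Step 1: gradient at the look-ahead point
    w = w + v                           # Step 2: w <- w + v
```

Evaluating the gradient after the momentum jump lets the method correct course before overshooting, which is why it typically damps oscillation better than plain momentum.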
AdaGrad (Adaptive Gradient)
Consider the previous example: $L(w) = w_1^2 + 25 w_2^2$ Gradient: $\nabla L = \begin{pmatrix} 2 w_1 \\ 50 w_2 \end{pmatrix}$ The initial $w^{(0)}$ is $(-1, -1)$. $w^{(1)} = \begin{pmatrix} -1 \\ -1 \end{pmatrix} - \eta \begin{pmatrix} \frac{-2}{\sqrt{(-2)^2}} \\ \frac{-50}{\sqrt{(-50)^2}} \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \end{pmatrix} + \begin{pmatrix} \eta \\ \eta \end{pmatrix}$ It will follow the diagonal direction!
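The slide shows only the worked first step; the sketch below assumes the standard AdaGrad rule (accumulate the sum of squared gradients $G$, then scale each coordinate's step by $1/\sqrt{G}$), with an assumed learning rate and the usual small $\epsilon$ to avoid division by zero:

```python
import numpy as np

def grad(w):
    # Gradient of L(w) = w1^2 + 25*w2^2
    return np.array([2.0 * w[0], 50.0 * w[1]])

eta, eps = 0.1, 1e-8             # assumed learning rate; eps for numerical safety
w = np.array([-1.0, -1.0])
G = np.zeros(2)                  # running sum of squared gradients

g = grad(w)                      # g = (-2, -50)
G = G + g ** 2                   # G = (4, 2500)
w = w - eta * g / (np.sqrt(G) + eps)

# On the first step each coordinate moves by eta * sign of its gradient,
# so w becomes (-1, -1) + (eta, eta): the diagonal direction from the slide.
```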
RMSProp (Root Mean Square Propagation)
Algorithm: Step 1. $G^{(t+1)} = \gamma G^{(t)} + (1-\gamma) \cdot \left(\frac{\partial L}{\partial w}\big|_{w^{(t)}}\right)^2$ $G$ is an exponential moving average of the squared gradient. Step 2. $w^{(t+1)} = w^{(t)} - \frac{\eta}{\sqrt{G^{(t+1)}}} \odot \frac{\partial L}{\partial w}\big|_{w^{(t)}}$ Scale the learning rate by $1/\sqrt{G}$ and follow the negative gradient direction. Default settings: $\gamma = 0.9$, $\eta = 0.001$.
Another example (saddle point)
Now consider minimizing $L(w) = w_1^2 - w_2^2$. The problem is unbounded below and $(0, 0)$ is a saddle point. At $(-0.01, -1)$, we have $\nabla L = \begin{pmatrix} -0.02 \\ 2 \end{pmatrix}$. How will these methods behave if we start at this point?
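A quick numerical check of this setup, assuming the saddle objective $L(w) = w_1^2 - w_2^2$ and plain steepest descent with an illustrative learning rate:

```python
import numpy as np

def grad(w):
    # Gradient of the assumed saddle objective L(w) = w1^2 - w2^2
    return np.array([2.0 * w[0], -2.0 * w[1]])

eta = 0.1                        # assumed learning rate
w = np.array([-0.01, -1.0])      # start point from the slide; grad here is (-0.02, 2)
for _ in range(20):
    w = w - eta * grad(w)

# w1 decays toward 0 (factor 0.8 per step) while |w2| grows (factor 1.2 per step):
# steepest descent slides off the saddle along the unbounded w2 direction.
```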
Adam (Adaptive Moment Estimation)
Algorithm [Complete Version]: Step 1. $m^{(t+1)} = \beta_1 m^{(t)} + (1-\beta_1) \cdot \frac{\partial L}{\partial w}\big|_{w^{(t)}}$ Step 2. $v^{(t+1)} = \beta_2 v^{(t)} + (1-\beta_2) \cdot \left(\frac{\partial L}{\partial w}\big|_{w^{(t)}}\right)^2$ Step 3. $\hat{m}^{(t+1)} = \frac{1}{1-\beta_1^{t+1}} \cdot m^{(t+1)}$ Step 4. $\hat{v}^{(t+1)} = \frac{1}{1-\beta_2^{t+1}} \cdot v^{(t+1)}$ Step 5. $w^{(t+1)} = w^{(t)} - \frac{\eta}{\sqrt{\hat{v}^{(t+1)}}} \odot \hat{m}^{(t+1)}$ Bias correction (since we set $m^{(0)}$, $v^{(0)}$ to 0)
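The five steps can be sketched as follows; the hyperparameter values are the commonly used defaults (an assumption, not stated on the slide), and a small $\epsilon$ is added in Step 5 as is standard in implementations:

```python
import numpy as np

def grad(w):
    # Gradient of L(w) = w1^2 + 25*w2^2
    return np.array([2.0 * w[0], 50.0 * w[1]])

eta, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8  # assumed (common defaults)
w = np.array([-1.0, -1.0])
m = np.zeros(2)                  # first moment (mean of gradients)
v = np.zeros(2)                  # second moment (mean of squared gradients)
for t in range(1, 2001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g                # Step 1
    v = beta2 * v + (1 - beta2) * g ** 2           # Step 2
    m_hat = m / (1 - beta1 ** t)                   # Step 3: bias correction
    v_hat = v / (1 - beta2 ** t)                   # Step 4: bias correction
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)   # Step 5

# Without Steps 3-4 the zero-initialized moments would bias the
# early updates toward zero; the 1/(1 - beta^t) factors undo that.
```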