Using momentum with gradient descent
Gradient descent with momentum speeds up training by accelerating learning along directions in which the gradient has been consistent, while damping updates along directions in which the gradient fluctuates. This allows the velocity of gradient descent to build up where progress is steady.
Momentum works by introducing a velocity term and using an exponentially weighted moving average of the gradient in the update rule, as follows:

$$ v_t = \beta v_{t-1} + (1 - \beta)\,\nabla_\theta J(\theta) $$
$$ \theta \leftarrow \theta - \alpha v_t $$

Here v is the velocity, β controls the weighting of the moving average, and α is the learning rate.
Most typically, β is set to 0.9 in the case of momentum, and it is usually not a hyperparameter that needs to be changed.
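To make the update rule concrete, the following is a minimal NumPy sketch of momentum gradient descent. The function and parameter names (grad_fn, alpha, beta, steps) are illustrative, not from any particular library; the toy objective at the bottom is chosen only to show the damping effect along a direction where the gradient fluctuates:

```python
import numpy as np

def momentum_gradient_descent(grad_fn, theta, alpha=0.1, beta=0.9, steps=200):
    """Gradient descent with momentum: v is an exponentially weighted
    moving average of the gradient; beta=0.9 is the usual default."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)               # gradient at the current parameters
        v = beta * v + (1 - beta) * g    # update the velocity (moving average)
        theta = theta - alpha * v        # step in the averaged direction
    return theta

# Toy example: minimize f(theta) = theta[0]**2 + 10 * theta[1]**2,
# whose gradient is much steeper (and more oscillation-prone) along axis 1.
grad = lambda t: np.array([2 * t[0], 20 * t[1]])
print(momentum_gradient_descent(grad, np.array([5.0, 5.0])))  # -> near [0, 0]
```

Because the velocity averages recent gradients, sign flips along the steep axis largely cancel out, while the consistent gradient along the shallow axis accumulates, which is exactly the behavior described above.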