AdaGrad
Author: Daniel Villarraga (SYSEN 6800 Fall 2021)
Introduction
AdaGrad is a family of sub-gradient algorithms for stochastic optimization. The algorithms in this family are similar to second-order stochastic gradient descent with an approximation of the Hessian of the optimized function. AdaGrad's name comes from Adaptive Gradient. Intuitively, it adapts the learning rate for each feature depending on the estimated geometry of the function; in particular, it tends to assign higher learning rates to infrequent features, which ensures that the parameter updates rely less on frequency and more on relevance.
AdaGrad was introduced by Duchi et al.[1] in a highly cited paper published in the Journal of Machine Learning Research in 2011. It is arguably one of the most popular algorithms for machine learning (particularly for training deep neural networks), and it influenced the development of the Adam algorithm[2].
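As a rough sketch of that intuition, the per-coordinate update can be written in a few lines of NumPy; the learning rate, stability constant, and toy quadratic objective below are illustrative assumptions rather than values prescribed by the original paper.

<pre>
import numpy as np

def adagrad_step(x, grad, accum, lr=0.5, eps=1e-8):
    """One AdaGrad step: each coordinate's step size shrinks with the
    accumulated sum of its squared past gradients."""
    accum = accum + grad ** 2                   # per-coordinate gradient history
    x = x - lr * grad / (np.sqrt(accum) + eps)  # coordinates with small accumulated gradients keep larger steps
    return x, accum

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is simply x.
x = np.array([5.0, -3.0])
accum = np.zeros_like(x)
for _ in range(200):
    x, accum = adagrad_step(x, x, accum)
print(x)  # x moves toward the minimizer at the origin
</pre>

Because each coordinate divides by its own accumulated gradient history, the two components of <math>x</math> above end up with different effective step sizes, which is the per-feature adaptation described in the paragraph before the sketch.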
Theory
Definitions
<math>f(x)</math>: Stochastic objective function with parameters <math>x</math>.
<math>f_t(x)</math>: Realization of the stochastic objective at time step <math>t</math>. For simplicity, written as <math>f_t</math>.
<math>g_t(x)</math>: The gradient of <math>f_t(x)</math> with respect to <math>x</math>, formally <math>\nabla_x f_t(x)</math>. For simplicity, written as <math>g_t</math>.
<math>x_t</math>: Parameters at time step <math>t</math>.
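To connect these symbols to a concrete stochastic setting, here is a minimal NumPy illustration in which each <math>f_t</math> is the squared error on one randomly drawn data point; the data and model are made up purely to show what the notation refers to, and the parameters are held fixed since no update rule has been introduced yet.

<pre>
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))           # feature matrix (illustrative data)
b = A @ np.array([1.0, -2.0, 0.5])      # targets generated by a "true" parameter vector

x_t = np.zeros(3)                       # x_t: parameters at time step t (not updated in this sketch)
for t in range(1, 6):
    i = rng.integers(len(b))            # draw one sample, defining this step's objective
    residual = A[i] @ x_t - b[i]
    f_t = 0.5 * residual ** 2           # f_t(x_t): realization of the stochastic objective at step t
    g_t = residual * A[i]               # g_t: gradient of f_t at x_t, i.e. \nabla_x f_t(x_t)
</pre>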