Adafactor: Difference between revisions

From Cornell University Computational Optimization Open Textbook - Optimization Wiki
Jump to navigation Jump to search
Line 5: Line 5:
== Introduction ==
== Introduction ==
== Problem formulation ==
== Problem formulation ==
=== 1. Objective ===
Minimize the loss function '''f(x)''', where '''x ∈ ℝⁿ''' and '''x''' is the weight vector to be optimized.
=== 2. Parameters ===
* '''Gradient:'''
  <math>G_t = \nabla f(x_{t-1})</math>
* '''Second moment estimate:'''
  <math>\hat{V}_t = \hat{\beta}_{2t} \hat{V}_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n)</math>
  ** Where:
    * <math>\hat{V}_t</math> is the running average of the squared gradient.
    * <math>\hat{\beta}_{2t}</math> is the corrected decay parameter.
    * <math>\epsilon_1</math> is a regularization constant.
* '''Step size:'''
  <math>\alpha_t = \max(\epsilon_2, \text{RMS}(x_{t-1})) \rho_t</math>
  ** Where:
    * <math>\rho_t</math> is the relative step size.
    * <math>\epsilon_2</math> is a regularization constant.
    * <math>\text{RMS}</math> is the root mean square, defined as:
      <math>u_{xt} = \frac{-g_{xt}}{\sqrt{\hat{v}_{xt}}}</math>
      <math>\text{RMS}(U_t) = \text{RMS}_{x \in X}(u_{xt}) = \sqrt{\text{Mean}_{x \in X}\left(\frac{(g_{xt})^2}{\hat{v}_{xt}}\right)}</math>
=== 3. Problem Formulation ===
==== Adafactor for Weighted Vectors ====
'''Inputs:'''
* Initial point: <math>X_0 \in \mathbb{R}^n</math>
* Relative step sizes: <math>\rho_t</math> for <math>t = 1</math> to <math>T</math>
* Second moment decay: <math>\hat{\beta}_{2t}</math> for <math>t = 1</math> to <math>T</math>, with <math>\hat{\beta}_{21} = 0</math>
* Regularization constants: <math>\epsilon_1, \epsilon_2</math>
* Clipping threshold: <math>d</math>
'''Algorithm:'''
# For <math>t = 1</math> to <math>T</math>:
## Compute adaptive step size:
  <math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t</math>
## Compute gradient:
  <math>G_t = \nabla f_t(X_{t-1})</math>
## Update second moment estimate:
  <math>\hat{V}_t = \hat{\beta}_{2t} \hat{V}_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n)</math>
## Compute normalized gradient:
  <math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>
## Apply clipping:
  <math>\hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t) / d)}</math>
## Update parameter:
  <math>X_t = X_{t-1} - \alpha_t \hat{U}_t</math>
# End for
==== Adafactor for Weighted Matrices ====
'''Inputs:'''
* Initial point: <math>X_0 \in \mathbb{R}^{n \times m}</math>
* Relative step sizes: <math>\rho_t</math> for <math>t = 1</math> to <math>T</math>
* Second moment decay: <math>\hat{\beta}_{2t}</math> for <math>t = 1</math> to <math>T</math>, with <math>\hat{\beta}_{21} = 0</math>
* Regularization constants: <math>\epsilon_1, \epsilon_2</math>
* Clipping threshold: <math>d</math>
'''Algorithm:'''
# For <math>t = 1</math> to <math>T</math>:
## Compute adaptive step size:
  <math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t</math>
## Compute gradient:
  <math>G_t = \nabla f_t(X_{t-1})</math>
## Update row-wise second moment:
  <math>R_t = \hat{\beta}_{2t} R_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n 1_m^T) 1_m</math>
## Update column-wise second moment:
  <math>C_t = \hat{\beta}_{2t} C_{t-1} + (1 - \hat{\beta}_{2t}) 1_n^T (G_t^2 + \epsilon_1 1_n 1_m^T)</math>
## Update overall second moment estimate:
  <math>\hat{V}_t = \frac{R_t C_t}{1_n^T R_t}</math>
## Compute normalized gradient:
  <math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>
## Apply clipping:
  <math>\hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t) / d)}</math>
## Update parameter:
  <math>X_t = X_{t-1} - \alpha_t \hat{U}_t</math>
# End for
=== 4. Proposed Hyperparameters for Adafactor ===
* Regularization constant 1: <math>\epsilon_1 = 10^{-30}</math>
* Regularization constant 2: <math>\epsilon_2 = 10^{-3}</math>
* Clipping threshold: <math>d = 1</math>
* Relative step size: <math>\rho_t = \min(10^{-2}, 1/\sqrt{t})</math>
* Second moment decay: <math>\hat{\beta}_{2t} = 1 - t^{-0.8}</math>


== Numerical Examples ==
== Numerical Examples ==

Revision as of 16:30, 10 December 2024

Author: Aolei Cao (ac3237), Ziyang Li (zl986), Junjia Liang (jl4439) (ChemE 6800 Fall 2024)

Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu

Introduction

Problem formulation

1. Objective

Minimize the loss function f(x), where x ∈ ℝⁿ and x is the weight vector to be optimized.

2. Parameters

  • Gradient:
 
  • Second moment estimate:
 
 ** Where:
   * Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{V}_t}
 is the running average of the squared gradient.
   * Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{\beta}_{2t}}
 is the corrected decay parameter.
   * Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \epsilon_1}
 is a regularization constant.
  • Step size:
 Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha_t = \max(\epsilon_2, \text{RMS}(x_{t-1})) \rho_t}

 ** Where:
   * Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \rho_t}
 is the relative step size.
   * Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \epsilon_2}
 is a regularization constant.
   * Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \text{RMS}}
 is the root mean square, defined as:
     Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle u_{xt} = \frac{-g_{xt}}{\sqrt{\hat{v}_{xt}}}}

     Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \text{RMS}(U_t) = \text{RMS}_{x \in X}(u_{xt}) = \sqrt{\text{Mean}_{x \in X}\left(\frac{(g_{xt})^2}{\hat{v}_{xt}}\right)}}

3. Problem Formulation

Adafactor for Weighted Vectors

Inputs:

  • Initial point: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X_0 \in \mathbb{R}^n}
  • Relative step sizes: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \rho_t} for Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle t = 1} to Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle T}
  • Second moment decay: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{\beta}_{2t}} for Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle t = 1} to Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle T} , with Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{\beta}_{21} = 0}
  • Regularization constants: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \epsilon_1, \epsilon_2}
  • Clipping threshold: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle d}

Algorithm:

  1. For Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle t = 1} to Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle T} :
    1. Compute adaptive step size:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t}

    1. Compute gradient:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle G_t = \nabla f_t(X_{t-1})}

    1. Update second moment estimate:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{V}_t = \hat{\beta}_{2t} \hat{V}_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n)}

    1. Compute normalized gradient:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle U_t = \frac{G_t}{\sqrt{\hat{V}_t}}}

    1. Apply clipping:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t) / d)}}

    1. Update parameter:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X_t = X_{t-1} - \alpha_t \hat{U}_t}

  1. End for

Adafactor for Weighted Matrices

Inputs:

  • Initial point: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X_0 \in \mathbb{R}^{n \times m}}
  • Relative step sizes: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \rho_t} for Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle t = 1} to Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle T}
  • Second moment decay: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{\beta}_{2t}} for Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle t = 1} to Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle T} , with Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{\beta}_{21} = 0}
  • Regularization constants: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \epsilon_1, \epsilon_2}
  • Clipping threshold: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle d}

Algorithm:

  1. For Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle t = 1} to Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle T} :
    1. Compute adaptive step size:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t}

    1. Compute gradient:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle G_t = \nabla f_t(X_{t-1})}

    1. Update row-wise second moment:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle R_t = \hat{\beta}_{2t} R_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n 1_m^T) 1_m}

    1. Update column-wise second moment:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle C_t = \hat{\beta}_{2t} C_{t-1} + (1 - \hat{\beta}_{2t}) 1_n^T (G_t^2 + \epsilon_1 1_n 1_m^T)}

    1. Update overall second moment estimate:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{V}_t = \frac{R_t C_t}{1_n^T R_t}}

    1. Compute normalized gradient:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle U_t = \frac{G_t}{\sqrt{\hat{V}_t}}}

    1. Apply clipping:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t) / d)}}

    1. Update parameter:
  Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X_t = X_{t-1} - \alpha_t \hat{U}_t}

  1. End for

4. Proposed Hyperparameters for Adafactor

  • Regularization constant 1: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \epsilon_1 = 10^{-30}}
  • Regularization constant 2: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \epsilon_2 = 10^{-3}}
  • Clipping threshold: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle d = 1}
  • Relative step size: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \rho_t = \min(10^{-2}, 1/\sqrt{t})}
  • Second moment decay: Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{\beta}_{2t} = 1 - t^{-0.8}}

Numerical Examples

Applications

Conclusion

Reference