From Cornell University Computational Optimization Open Textbook - Optimization Wiki
Revision as of 17:55, 10 December 2024
Author: Aolei Cao (ac3237), Ziyang Li (zl986), Junjia Liang (jl4439) (ChemE 6800 Fall 2024)
Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu
Introduction
Problem formulation
1. Objective
Minimize the loss function <math>f(X)</math>, where <math>X \in \mathbb{R}^n</math> and <math>X</math> is the weight vector to be optimized.
2. Parameters
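- Gradient:
<math>G_t = \nabla f_t(X_{t-1})</math>
- Second moment estimate:
<math>\hat{V}_t = \hat{\beta}_{2t} \hat{V}_{t-1} + (1 - \hat{\beta}_{2t}) (G_t^2 + \epsilon_1 1_n)</math>
- Where:
<math>\hat{V}_t</math> is the running average of the squared gradient.
<math>\hat{\beta}_{2t}</math> is the corrected decay parameter.
<math>\epsilon_1</math> is a regularization constant.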
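- Step size:
<math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t</math>
- Where:
<math>\rho_t</math> is the relative step size.
<math>\epsilon_2</math> is a regularization constant.
<math>\text{RMS}</math> is the root mean square, defined as:
<math>\text{RMS}(X) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}</math>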
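The step size rule can be sketched in a few lines of plain Python. This is a minimal sketch, assuming the rule <math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t</math> and a parameter vector stored as a flat list; the function names `rms` and `adaptive_step_size` are hypothetical and chosen only for illustration.

```python
import math


def rms(values):
    """Root mean square of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def adaptive_step_size(params, rho_t, eps2=1e-3):
    """Scale the relative step size rho_t by the parameter RMS.

    eps2 is a lower bound on the scale, so parameters that are
    initialized at (or near) zero still receive a nonzero step.
    """
    return max(eps2, rms(params)) * rho_t
```

With all-zero parameters the scale falls back to `eps2`, which is the reason the lower bound exists; for larger parameters the step grows in proportion to their RMS.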
3. Algorithms
Adafactor for Weight Vectors
Inputs:
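- Initial point: <math>X_0 \in \mathbb{R}^n</math>
- Relative step sizes: <math>\rho_t</math> for <math>t = 1</math> to <math>T</math>
- Second moment decay: <math>\hat{\beta}_{2t}</math> for <math>t = 1</math> to <math>T</math>, with <math>\hat{\beta}_{21} = 0</math>
- Regularization constants: <math>\epsilon_1, \epsilon_2</math>
- Clipping threshold: <math>d</math>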
Algorithm:
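- For <math>t = 1</math> to <math>T</math>:
- Compute adaptive step size:
<math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t</math>
- Compute gradient:
<math>G_t = \nabla f_t(X_{t-1})</math>
- Update second moment estimate:
<math>\hat{V}_t = \hat{\beta}_{2t} \hat{V}_{t-1} + (1 - \hat{\beta}_{2t}) (G_t^2 + \epsilon_1 1_n)</math>
- Compute normalized gradient:
<math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>
- Apply clipping:
<math>\hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t) / d)}</math>
- Update parameter:
<math>X_t = X_{t-1} - \alpha_t \hat{U}_t</math>
- End for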
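One iteration of the vector loop can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes the update follows the usual Adafactor form (an exponential moving average of the elementwise squared gradient plus <math>\epsilon_1</math>, a normalized gradient <math>G_t / \sqrt{\hat{V}_t}</math>, clipping by <math>\max(1, \text{RMS}(U_t)/d)</math>, then a step of size <math>\alpha_t</math>). The function name `adafactor_vector_step` and its argument names are hypothetical, and the caller is assumed to supply the gradient together with the schedule values `rho_t` and `beta2_t`.

```python
import math


def rms(values):
    """Root mean square of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def adafactor_vector_step(x, grad, v_prev, rho_t, beta2_t,
                          eps1=1e-30, eps2=1e-3, d=1.0):
    """Apply one Adafactor-style update to a weight vector.

    x, grad and v_prev are equal-length lists; returns (new_x, new_v).
    """
    # Adaptive step size: relative step scaled by the parameter RMS.
    alpha_t = max(eps2, rms(x)) * rho_t
    # Moving average of the elementwise squared gradient.
    v = [beta2_t * vp + (1.0 - beta2_t) * (g * g + eps1)
         for vp, g in zip(v_prev, grad)]
    # Normalized gradient.
    u = [g / math.sqrt(vi) for g, vi in zip(grad, v)]
    # Clip so that the RMS of the applied update does not exceed d.
    scale = max(1.0, rms(u) / d)
    new_x = [xi - alpha_t * ui / scale for xi, ui in zip(x, u)]
    return new_x, v
```

For example, with the quadratic loss <math>f(X) = \tfrac{1}{2} \lVert X \rVert^2</math> the gradient equals the parameters, so calling `adafactor_vector_step(x, x, v, rho_t, beta2_t)` repeatedly moves each coordinate toward zero.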
Adafactor for Weight Matrices
Inputs:
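- Initial point: <math>X_0 \in \mathbb{R}^{n \times m}</math>
- Relative step sizes: <math>\rho_t</math> for <math>t = 1</math> to <math>T</math>
- Second moment decay: <math>\hat{\beta}_{2t}</math> for <math>t = 1</math> to <math>T</math>, with <math>\hat{\beta}_{21} = 0</math>
- Regularization constants: <math>\epsilon_1, \epsilon_2</math>
- Clipping threshold: <math>d</math>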
Algorithm:
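- For <math>t = 1</math> to <math>T</math>:
- Compute adaptive step size:
<math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t</math>
- Compute gradient:
<math>G_t = \nabla f_t(X_{t-1})</math>
- Update row-wise second moment:
<math>R_t = \hat{\beta}_{2t} R_{t-1} + (1 - \hat{\beta}_{2t}) (G_t^2 + \epsilon_1 1_n 1_m^\top) 1_m</math>
- Update column-wise second moment:
<math>C_t = \hat{\beta}_{2t} C_{t-1} + (1 - \hat{\beta}_{2t}) 1_n^\top (G_t^2 + \epsilon_1 1_n 1_m^\top)</math>
- Update overall second moment estimate:
<math>\hat{V}_t = \frac{R_t C_t}{1_n^\top R_t}</math>
- Compute normalized gradient:
<math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>
- Apply clipping:
<math>\hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t) / d)}</math>
- Update parameter:
<math>X_t = X_{t-1} - \alpha_t \hat{U}_t</math>
- End for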
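The matrix loop differs from the vector case mainly in how the second moment is stored. The sketch below illustrates one iteration in plain Python, assuming the usual factored form: row sums and column sums of the squared gradient are averaged separately, and the full estimate is rebuilt as (row factor × column factor) / (sum of row factors). The function name `adafactor_matrix_step`, its arguments, and the list-of-lists matrix layout are assumptions made for illustration only.

```python
import math


def rms(matrix):
    """Root mean square over all entries of a list-of-lists matrix."""
    flat = [v for row in matrix for v in row]
    return math.sqrt(sum(v * v for v in flat) / len(flat))


def adafactor_matrix_step(x, grad, r_prev, c_prev, rho_t, beta2_t,
                          eps1=1e-30, eps2=1e-3, d=1.0):
    """Apply one factored Adafactor-style update to an n x m matrix.

    r_prev has length n (row factors), c_prev has length m (column
    factors). Returns (new_x, new_r, new_c).
    """
    n, m = len(x), len(x[0])
    alpha_t = max(eps2, rms(x)) * rho_t
    # Elementwise squared gradient with regularization.
    sq = [[g * g + eps1 for g in row] for row in grad]
    row_sums = [sum(row) for row in sq]
    col_sums = [sum(sq[i][j] for i in range(n)) for j in range(m)]
    # Moving averages of the row and column sums.
    r = [beta2_t * rp + (1.0 - beta2_t) * s for rp, s in zip(r_prev, row_sums)]
    c = [beta2_t * cp + (1.0 - beta2_t) * s for cp, s in zip(c_prev, col_sums)]
    total = sum(r)
    # Normalized gradient using the rank-1 reconstruction r[i] * c[j] / total.
    u = [[grad[i][j] / math.sqrt(r[i] * c[j] / total) for j in range(m)]
         for i in range(n)]
    # Clip so that the RMS of the applied update does not exceed d.
    scale = max(1.0, rms(u) / d)
    new_x = [[x[i][j] - alpha_t * u[i][j] / scale for j in range(m)]
             for i in range(n)]
    return new_x, r, c
```

The optimizer state carried between iterations is only `r` and `c`, that is n + m numbers rather than the n × m entries of a full second moment matrix.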
4. Proposed Hyperparameters for Adafactor
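- Regularization constant 1: <math>\epsilon_1 = 10^{-30}</math>
- Regularization constant 2: <math>\epsilon_2 = 10^{-3}</math>
- Clipping threshold: <math>d = 1</math>
- Relative step size: <math>\rho_t = \min(10^{-2}, 1/\sqrt{t})</math>
- Second moment decay: <math>\hat{\beta}_{2t} = 1 - t^{-0.8}</math>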
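The two schedules can be written as small helper functions. This is a sketch assuming the commonly cited Adafactor defaults, <math>\rho_t = \min(10^{-2}, 1/\sqrt{t})</math> and <math>\hat{\beta}_{2t} = 1 - t^{-0.8}</math>, with the step counter starting at <math>t = 1</math>; the function names are hypothetical.

```python
import math


def relative_step_size(t):
    """rho_t: constant at 1e-2 early on, then decaying as 1/sqrt(t)."""
    return min(1e-2, 1.0 / math.sqrt(t))


def second_moment_decay(t):
    """beta2_t: equals 0 at t = 1 and approaches 1 as t grows."""
    return 1.0 - t ** -0.8
```

Because the decay is exactly zero at the first step, the initial second moment estimate is built from the first gradient alone, so no separate bias correction is needed.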
Numerical Examples
Applications
Conclusion
Reference