From Cornell University Computational Optimization Open Textbook - Optimization Wiki
Revision as of 16:35, 10 December 2024
Author: Aolei Cao (ac3237), Ziyang Li (zl986), Junjia Liang (jl4439) (ChemE 6800 Fall 2024)
Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu
Introduction
Problem formulation
1. Objective
Minimize the loss function <math>f(x)</math>, where <math>x \in \mathbb{R}^n</math> and <math>x</math> is the weight vector to be optimized.
2. Parameters
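* Gradient: <math>G_t = \nabla f(x_{t-1})</math>
* Second moment estimate: <math>\hat{V}_t = \hat{\beta}_{2t} \hat{V}_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n)</math>
** Where:
* <math>\hat{V}_t</math> is the running average of the squared gradient (<math>G_t^2</math> denotes the element-wise square).
* <math>\hat{\beta}_{2t}</math> is the corrected decay parameter.
* <math>\epsilon_1</math> is a regularization constant.
* Step size: <math>\alpha_t = \max(\epsilon_2, \text{RMS}(x_{t-1})) \rho_t</math>
** Where:
* <math>\rho_t</math> is the relative step size.
* <math>\epsilon_2</math> is a regularization constant.
* <math>\text{RMS}</math> is the root mean square, defined as: <math>\text{RMS}(x) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}</math>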
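The step-size rule can be illustrated with a few lines of Python. This is a minimal sketch, assuming the rule from Shazeer and Stern (2018), in which the relative step size is scaled by the root mean square of the current parameters and floored at the second regularization constant. The function names `rms` and `step_size` and the default `eps2=1e-3` are illustrative choices, not part of the original text.

```python
import math

def rms(values):
    """Root mean square of a flat sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def step_size(x_prev, rho_t, eps2=1e-3):
    """Absolute step size: rho_t scaled by RMS(x_prev), with the scale floored at eps2."""
    return max(eps2, rms(x_prev)) * rho_t
```

The floor matters when the parameters start at or near zero: without it the step size would be zero and the parameters could never move away from the origin.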
3. Problem Formulation
Adafactor for Weight Vectors
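Inputs:
- Initial point: <math>x_0 \in \mathbb{R}^n</math>
- Relative step sizes: <math>\rho_t</math> for <math>t = 1</math> to <math>T</math>
- Second moment decay: <math>\hat{\beta}_{2t}</math> for <math>t = 1</math> to <math>T</math>, with <math>\hat{\beta}_{21} = 0</math>
- Regularization constants: <math>\epsilon_1, \epsilon_2</math>
- Clipping threshold: <math>d</math>
Algorithm:
- For <math>t = 1</math> to <math>T</math>:
- Compute adaptive step size: <math>\alpha_t = \max(\epsilon_2, \text{RMS}(x_{t-1})) \rho_t</math>
- Compute gradient: <math>G_t = \nabla f(x_{t-1})</math>
- Update second moment estimate: <math>\hat{V}_t = \hat{\beta}_{2t} \hat{V}_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n)</math>
- Compute normalized gradient: <math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>
- Apply clipping: <math>\hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t)/d)}</math>
- Update parameter: <math>x_t = x_{t-1} - \alpha_t \hat{U}_t</math>
- End for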
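The vector algorithm above can be sketched in plain Python. This is an illustrative sketch rather than a reference implementation: it assumes the update rules of Shazeer and Stern (2018), uses their proposed schedules for the relative step size and the second moment decay, and takes a user-supplied `grad_fn` that returns the gradient as a list. The names `adafactor_vector`, `grad_fn`, and `num_steps` are hypothetical.

```python
import math

def rms(values):
    """Root mean square of a flat sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def adafactor_vector(grad_fn, x0, num_steps, eps1=1e-30, eps2=1e-3, d=1.0):
    """Run num_steps Adafactor updates on a weight vector (assumed defaults)."""
    x = list(x0)
    v = [0.0] * len(x)  # second moment estimate, one entry per weight
    for t in range(1, num_steps + 1):
        rho = min(1e-2, 1.0 / math.sqrt(t))  # relative step size (assumed schedule)
        beta2 = 1.0 - t ** -0.8              # decay; equals 0 at t = 1
        alpha = max(eps2, rms(x)) * rho      # adaptive step size
        g = grad_fn(x)                       # gradient at the current point
        # Exponential moving average of the squared gradient plus eps1.
        v = [beta2 * vi + (1.0 - beta2) * (gi * gi + eps1)
             for vi, gi in zip(v, g)]
        # Normalized gradient.
        u = [gi / math.sqrt(vi) for gi, vi in zip(g, v)]
        # Clipping: shrink the update only when RMS(u) exceeds d.
        scale = max(1.0, rms(u) / d)
        x = [xi - alpha * ui / scale for xi, ui in zip(x, u)]
    return x
```

For example, `adafactor_vector(lambda x: list(x), [1.0, -2.0], 100)` applies the update to the quadratic whose gradient is the point itself. Because the decay is zero at the first step, the first normalized gradient is simply the sign of each gradient component.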
Adafactor for Weight Matrices
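Inputs:
- Initial point: <math>X_0 \in \mathbb{R}^{n \times m}</math>
- Relative step sizes: <math>\rho_t</math> for <math>t = 1</math> to <math>T</math>
- Second moment decay: <math>\hat{\beta}_{2t}</math> for <math>t = 1</math> to <math>T</math>, with <math>\hat{\beta}_{21} = 0</math>
- Regularization constants: <math>\epsilon_1, \epsilon_2</math>
- Clipping threshold: <math>d</math>
Algorithm:
- For <math>t = 1</math> to <math>T</math>:
- Compute adaptive step size: <math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1})) \rho_t</math>
- Compute gradient: <math>G_t = \nabla f(X_{t-1})</math>
- Update row-wise second moment: <math>R_t = \hat{\beta}_{2t} R_{t-1} + (1 - \hat{\beta}_{2t})(G_t^2 + \epsilon_1 1_n 1_m^\top) 1_m</math>
- Update column-wise second moment: <math>C_t = \hat{\beta}_{2t} C_{t-1} + (1 - \hat{\beta}_{2t}) 1_n^\top (G_t^2 + \epsilon_1 1_n 1_m^\top)</math>
- Update overall second moment estimate: <math>\hat{V}_t = \frac{R_t C_t}{1_n^\top R_t}</math>
- Compute normalized gradient: <math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>
- Apply clipping: <math>\hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t)/d)}</math>
- Update parameter: <math>X_t = X_{t-1} - \alpha_t \hat{U}_t</math>
- End for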
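The factored matrix variant can be sketched the same way. The sketch below assumes the update rules of Shazeer and Stern (2018): only a row accumulator and a column accumulator are stored, and the full second moment estimate is rebuilt as their outer product divided by the sum of the row accumulator. The names `adafactor_matrix`, `rms_matrix`, and `grad_fn` are hypothetical, and matrices are represented as lists of rows to keep the example dependency-free.

```python
import math

def rms_matrix(mat):
    """Root mean square over all entries of a matrix given as a list of rows."""
    flat = [v for row in mat for v in row]
    return math.sqrt(sum(v * v for v in flat) / len(flat))

def adafactor_matrix(grad_fn, x0, num_steps, eps1=1e-30, eps2=1e-3, d=1.0):
    """Run num_steps factored Adafactor updates on an n-by-m weight matrix."""
    x = [list(row) for row in x0]
    n, m = len(x), len(x[0])
    r = [0.0] * n  # row-wise second moment (length n)
    c = [0.0] * m  # column-wise second moment (length m)
    for t in range(1, num_steps + 1):
        rho = min(1e-2, 1.0 / math.sqrt(t))  # relative step size (assumed schedule)
        beta2 = 1.0 - t ** -0.8              # decay; equals 0 at t = 1
        alpha = max(eps2, rms_matrix(x)) * rho
        g = grad_fn(x)
        sq = [[gij * gij + eps1 for gij in row] for row in g]
        # Row sums and column sums of the regularized squared gradient.
        r = [beta2 * r[i] + (1.0 - beta2) * sum(sq[i]) for i in range(n)]
        c = [beta2 * c[j] + (1.0 - beta2) * sum(sq[i][j] for i in range(n))
             for j in range(m)]
        total = sum(r)
        # Rank-one reconstruction v_hat[i][j] = r[i] * c[j] / total,
        # used directly inside the normalized gradient.
        u = [[g[i][j] / math.sqrt(r[i] * c[j] / total) for j in range(m)]
             for i in range(n)]
        scale = max(1.0, rms_matrix(u) / d)  # update clipping
        x = [[x[i][j] - alpha * u[i][j] / scale for j in range(m)]
             for i in range(n)]
    return x
```

The design choice is the memory trade-off: the accumulators take n + m numbers instead of n × m, and the reconstruction is exact whenever the squared gradient is itself a rank-one matrix.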
4. Proposed Hyperparameters for Adafactor
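- Regularization constant 1: <math>\epsilon_1 = 10^{-30}</math>
- Regularization constant 2: <math>\epsilon_2 = 10^{-3}</math>
- Clipping threshold: <math>d = 1</math>
- Relative step size: <math>\rho_t = \min(10^{-2}, 1/\sqrt{t})</math>
- Second moment decay: <math>\hat{\beta}_{2t} = 1 - t^{-0.8}</math>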
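The two time-dependent hyperparameters can be written as small schedule functions. This sketch assumes the schedules proposed by Shazeer and Stern (2018), namely a relative step size of min(10^-2, 1/sqrt(t)) and a second moment decay of 1 - t^-0.8. The function names are illustrative.

```python
import math

def relative_step_size(t):
    """Relative step size at step t >= 1: constant 0.01 early, then 1/sqrt(t)."""
    return min(1e-2, 1.0 / math.sqrt(t))

def second_moment_decay(t):
    """Second moment decay at step t >= 1; it is 0 at t = 1 and approaches 1."""
    return 1.0 - t ** -0.8
```

Since the decay is zero at the first step, the second moment estimate is initialized from the first squared gradient alone, so no separate bias correction is needed. The relative step size stays at 0.01 until step 10,000, after which the inverse square root term takes over.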
Numerical Examples
Applications
Conclusion
Reference