Adafactor

From Cornell University Computational Optimization Open Textbook - Optimization Wiki

Author: Aolei Cao (ac3237), Ziyang Li (zl986), Junjia Liang (jl4439) (ChemE 6800 Fall 2024)

Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu

Introduction

Problem formulation

1. Objective

Minimize the loss function <math>f(X)</math>, where <math>X \in \mathbb{R}^{n}</math> and <math>X</math> is the weight vector to be optimized.

2. Parameters

  • Gradient: <math>G_t = \nabla f_t(X_{t-1})</math>

  • Second moment estimate: <math>\hat{V}_t = \hat{\beta}_{2t}\hat{V}_{t-1} + (1-\hat{\beta}_{2t})(G_t^{2} + \epsilon_1)</math>

  • Where:
    • <math>\hat{V}_t</math> is the running average of the squared gradient.
    • <math>\hat{\beta}_{2t}</math> is the corrected decay parameter.
    • <math>\epsilon_1</math> is a regularization constant.
  • Step size: <math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1}))\cdot\rho_t</math>

  • Where:
    • <math>\rho_t</math> is the relative step size.
    • <math>\epsilon_2</math> is a regularization constant.
    • <math>\text{RMS}(\cdot)</math> is the root mean square, defined as: <math>\text{RMS}(X) = \sqrt{\tfrac{1}{n}\textstyle\sum_{i=1}^{n} x_i^{2}}</math>
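
To make the step-size rule concrete, here is a minimal Python/NumPy sketch of the RMS and adaptive step-size computations (an illustration written for this article; the helper names rms and step_size are not from the original paper):

<syntaxhighlight lang="python">
import numpy as np

def rms(x):
    # Root mean square over all entries of a vector or matrix.
    return np.sqrt(np.mean(np.square(x)))

def step_size(x_prev, rho_t, eps2=1e-3):
    # alpha_t = max(eps_2, RMS(X_{t-1})) * rho_t
    return max(eps2, rms(x_prev)) * rho_t

# With the 3x3 weight matrix used in the numerical example further below:
X0 = np.array([[0.7, -0.5, 0.9],
               [-1.1, 0.8, -0.6],
               [1.2, -0.7, 0.4]])
print(round(float(rms(X0)), 3))              # 0.806
print(round(float(step_size(X0, 1e-2)), 5))  # 0.00806
</syntaxhighlight>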

3. Algorithms

Adafactor for Weighted Vectors

Inputs:

  • Initial point: <math>X_0 \in \mathbb{R}^{n}</math>
  • Relative step sizes: <math>\rho_t</math> for <math>t = 1</math> to <math>T</math>
  • Second moment decay: <math>\hat{\beta}_{2t}</math> for <math>t = 1</math> to <math>T</math>, with <math>\hat{\beta}_{21} = 0</math>
  • Regularization constants: <math>\epsilon_1, \epsilon_2</math>
  • Clipping threshold: <math>d</math>

Algorithm:

  • For <math>t = 1</math> to <math>T</math>:
    • Compute adaptive step size: <math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1}))\cdot\rho_t</math>
    • Compute gradient: <math>G_t = \nabla f_t(X_{t-1})</math>
    • Update second moment estimate: <math>\hat{V}_t = \hat{\beta}_{2t}\hat{V}_{t-1} + (1-\hat{\beta}_{2t})(G_t^{2} + \epsilon_1)</math>
    • Compute normalized gradient: <math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>
    • Apply clipping: <math>\hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t)/d)}</math>
    • Update parameter: <math>X_t = X_{t-1} - \alpha_t\hat{U}_t</math>
  • End for
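
The loop above can be sketched in a few lines of NumPy. This is an illustrative reimplementation written for this article (the function name adafactor_vector_step and its signature are invented here), assuming the caller supplies the gradient at each step:

<syntaxhighlight lang="python">
import numpy as np

def rms(x):
    # Root mean square over all entries.
    return np.sqrt(np.mean(np.square(x)))

def adafactor_vector_step(x, grad, v_hat, t, eps1=1e-30, eps2=1e-3, d=1.0):
    rho_t = min(1e-2, 1.0 / np.sqrt(t))       # relative step size
    beta2_t = 1.0 - t ** (-0.8)               # second moment decay (0 at t = 1)
    alpha_t = max(eps2, rms(x)) * rho_t       # adaptive step size

    # Running average of the squared gradient (full vector, unfactored).
    v_hat = beta2_t * v_hat + (1.0 - beta2_t) * (grad ** 2 + eps1)

    u = grad / np.sqrt(v_hat)                 # normalized gradient
    u_hat = u / max(1.0, rms(u) / d)          # update clipping
    return x - alpha_t * u_hat, v_hat

# One step from an arbitrary starting point (gradient supplied by the caller).
x, v = np.array([0.7, -0.5, 0.9]), np.zeros(3)
x, v = adafactor_vector_step(x, np.array([0.3, -0.2, 0.4]), v, t=1)
</syntaxhighlight>

Because <math>\hat{\beta}_{21} = 0</math>, the initial value of the second-moment accumulator is irrelevant on the first step, so it can simply be started at zero.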

Adafactor for Weighted Matrices

Inputs:

  • Initial point: <math>X_0 \in \mathbb{R}^{n \times m}</math>
  • Relative step sizes: <math>\rho_t</math> for <math>t = 1</math> to <math>T</math>
  • Second moment decay: <math>\hat{\beta}_{2t}</math> for <math>t = 1</math> to <math>T</math>, with <math>\hat{\beta}_{21} = 0</math>
  • Regularization constants: <math>\epsilon_1, \epsilon_2</math>
  • Clipping threshold: <math>d</math>

Algorithm:

  • For <math>t = 1</math> to <math>T</math>:
    • Compute adaptive step size: <math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1}))\cdot\rho_t</math>
    • Compute gradient: <math>G_t = \nabla f_t(X_{t-1})</math>
    • Update row-wise second moment: <math>R_t = \hat{\beta}_{2t}R_{t-1} + (1-\hat{\beta}_{2t})\left(\tfrac{1}{m}\textstyle\sum_{j=1}^{m} G_t^{2}[i,j] + \epsilon_1\right)</math>
    • Update column-wise second moment: <math>C_t = \hat{\beta}_{2t}C_{t-1} + (1-\hat{\beta}_{2t})\left(\tfrac{1}{n}\textstyle\sum_{i=1}^{n} G_t^{2}[i,j] + \epsilon_1\right)</math>
    • Update overall second moment estimate: <math>\hat{V}_t = R_t \otimes C_t</math>
    • Compute normalized gradient: <math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>
    • Apply clipping: <math>\hat{U}_t = \frac{U_t}{\max(1, \text{RMS}(U_t)/d)}</math>
    • Update parameter: <math>X_t = X_{t-1} - \alpha_t\hat{U}_t</math>
  • End for
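
A matching sketch for the factored (matrix) case is below. It follows the convention used in this article, with row/column means and <math>\hat{V}_t = R_t \otimes C_t</math>; only the two moment vectors <math>R_t</math> and <math>C_t</math> are stored, which is where the memory saving over a full second-moment matrix comes from. The function name and signature are again illustrative, not reference code from the original paper.

<syntaxhighlight lang="python">
import numpy as np

def rms(x):
    return np.sqrt(np.mean(np.square(x)))

def adafactor_matrix_step(X, G, R, C, t, eps1=1e-30, eps2=1e-3, d=1.0):
    rho_t = min(1e-2, 1.0 / np.sqrt(t))
    beta2_t = 1.0 - t ** (-0.8)
    alpha_t = max(eps2, rms(X)) * rho_t

    G2 = G ** 2
    # Row- and column-wise running averages of the squared gradient, stored as
    # one length-n and one length-m vector instead of a full n-by-m matrix.
    R = beta2_t * R + (1.0 - beta2_t) * (G2.mean(axis=1) + eps1)
    C = beta2_t * C + (1.0 - beta2_t) * (G2.mean(axis=0) + eps1)

    V_hat = np.outer(R, C)                    # factored second-moment estimate
    U = G / np.sqrt(V_hat)                    # normalized gradient
    U_hat = U / max(1.0, rms(U) / d)          # update clipping
    return X - alpha_t * U_hat, R, C

# First iteration of the numerical example below (R and C start at zero):
# X1, R1, C1 = adafactor_matrix_step(X0, G1, np.zeros(3), np.zeros(3), t=1)
</syntaxhighlight>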

4. Proposed Hyperparameters for Adafactor

  • Regularization constant 1: <math>\epsilon_1 = 10^{-30}</math>
  • Regularization constant 2: <math>\epsilon_2 = 10^{-3}</math>
  • Clipping threshold: <math>d = 1</math>
  • Relative step size: <math>\rho_t = \min(10^{-2}, 1/\sqrt{t})</math>
  • Second moment decay: <math>\hat{\beta}_{2t} = 1 - t^{-0.8}</math>
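
These defaults can be summarized, for quick reference, as two schedule functions and three constants (an illustrative sketch; the names are chosen here):

<syntaxhighlight lang="python">
import numpy as np

EPS_1, EPS_2, CLIP_D = 1e-30, 1e-3, 1.0  # regularization constants and clipping threshold

def rho(t):
    # Relative step size: min(10^-2, 1/sqrt(t)).
    return min(1e-2, 1.0 / np.sqrt(t))

def beta2_hat(t):
    # Second moment decay: 1 - t^(-0.8); equals 0 at the first step t = 1.
    return 1.0 - t ** (-0.8)
</syntaxhighlight>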

Numerical Examples

Step-by-step instructions for determining the result of the first iteration.

Problem setup

Initial weights (<math>X_0</math>):

<math>X_0 = \begin{bmatrix} 0.7&-0.5&0.9\\ -1.1&0.8&-0.6\\ 1.2&-0.7&0.4 \end{bmatrix}</math>

Initial gradient (<math>G_t</math>), i.e. the gradient of the loss function with respect to <math>X</math>:

<math>G_t = \begin{bmatrix} 0.3&-0.2&0.4\\ -0.5&0.6&-0.1\\ 0.2&-0.4&0.3 \end{bmatrix}</math>

Hyperparameters setup

<math>\epsilon_2 = 10^{-3}</math> (Minimum learning rate scaling factor)

<math>\epsilon_1 = 10^{-30}</math> (Regularization constant)

<math>d = 1</math> (Clipping threshold)

<math>\rho_t = \min(10^{-2}, 1/\sqrt{t})</math> (Relative step size)

<math>\hat{\beta}_{2t} = 1 - t^{-0.8}</math> (Second moment decay)

Step 1: Learning Rate Scaling

Define the relative step size:

<math>\rho_1 = \min(10^{-2}, 1/\sqrt{1}) = 0.01</math>

Step 1.1: Root Mean Square (RMS) calculation for <math>X_0</math>

RMS formula:

<math>\text{RMS}(X_0) = \sqrt{\tfrac{1}{n}\textstyle\sum_{i=1}^{n} X_0[i]^{2}}</math>

Substitute the initial weights:

<math>\text{RMS}(X_0) = \sqrt{\tfrac{0.7^2+(-0.5)^2+0.9^2+(-1.1)^2+0.8^2+(-0.6)^2+1.2^2+(-0.7)^2+0.4^2}{9}} \approx 0.806</math>

Step 1.2: Find the Learning Rate Scaling (<math>\alpha_1</math>):

Learning rate formula:

<math>\alpha_t = \max(\epsilon_2, \text{RMS}(X_{t-1}))\cdot\rho_t</math>

Substitute the RMS:

<math>\alpha_1 = \max(0.001, 0.806)\cdot 0.01 = 0.00806</math>

Step 2: Compute <math>G^{2}_t</math> (Element-wise Square of Gradient)

Compute the squared value of each element in the gradient matrix <math>G_t</math>:

<math>G^{2}_t = \begin{bmatrix} 0.3^2&(-0.2)^2&0.4^2\\ (-0.5)^2&0.6^2&(-0.1)^2\\ 0.2^2&(-0.4)^2&0.3^2 \end{bmatrix} = \begin{bmatrix} 0.09&0.04&0.16\\ 0.25&0.36&0.01\\ 0.04&0.16&0.09 \end{bmatrix}</math>



Step 3: Find the moment estimate

Compute the exponential moving average of squared gradients to capture the variance or scale of gradients.

Step 3.1: Compute row moments (<math>R_t</math>)

This equation computes the row-wise second moments (<math>R_t</math>) as an exponential moving average of past moments (<math>R_{t-1}</math>) and the current row-wise mean of squared gradients (<math>\tfrac{1}{m}\textstyle\sum_{j=1}^m G^{2}_t[i,j]</math>), with a balance controlled by the decay parameter (<math>\hat{\beta}_{2t}</math>):

<math>R_t = \hat{\beta}_{2t}\cdot R_{t-1} + (1-\hat{\beta}_{2t})\cdot\left(\tfrac{1}{m}\textstyle\sum_{j=1}^m G^{2}_t[i,j] + \epsilon_1\right)</math>

For <math>t = 1</math>:

Since <math>\hat{\beta}_{2t} = 1 - t^{-0.8}</math>, for the first iteration <math>\hat{\beta}_{21} = 0</math>. And because <math>\epsilon_1</math> is negligibly small, we can ignore it. The update of <math>R_1</math> is:

<math>R_{1} = \tfrac{1}{m}\textstyle\sum_{j=1}^m G^{2}_1[i,j]</math>

Row-wise mean (<math>R_1</math>):

<math>R_1 = \begin{bmatrix} 0.0967\\ 0.2067\\ 0.0967 \end{bmatrix}</math>


Step 3.2: Compute column moments (<math>C_t</math>)

The process is the same as for the row moments, using the column-wise mean instead:

<math>C_t = \hat{\beta}_{2t}\cdot C_{t-1} + (1-\hat{\beta}_{2t})\cdot\left(\tfrac{1}{n}\textstyle\sum_{i=1}^n G^{2}_t[i,j] + \epsilon_1\right)</math>

Column-wise mean (<math>C_1</math>):

<math>C_1 = \begin{bmatrix} 0.1267&0.1867&0.0867 \end{bmatrix}</math>


Step 3.3: Second Moment Estimate (<math>\hat{V}_t</math>)

The Second Moment Estimate is calculated as the outer product of the row moments (<math>R_t</math>) and column moments (<math>C_t</math>):

<math>\hat{V}_t = R_t \otimes C_t</math>

<math>\hat{V}_1 = \begin{bmatrix} 0.0967\\ 0.2067\\ 0.0967 \end{bmatrix} \otimes \begin{bmatrix} 0.1267&0.1867&0.0867 \end{bmatrix} = \begin{bmatrix} 0.0122&0.0180&0.0084\\ 0.0262&0.0386&0.0179\\ 0.0122&0.0180&0.0084 \end{bmatrix}</math>
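
The row moments, column moments, and the factored estimate above can be checked with a short NumPy snippet (verification code written for this article, not part of the original example):

<syntaxhighlight lang="python">
import numpy as np

G1 = np.array([[0.3, -0.2, 0.4],
               [-0.5, 0.6, -0.1],
               [0.2, -0.4, 0.3]])

G1_sq = G1 ** 2               # Step 2: element-wise square
R1 = G1_sq.mean(axis=1)       # Step 3.1: [0.0967, 0.2067, 0.0967]
C1 = G1_sq.mean(axis=0)       # Step 3.2: [0.1267, 0.1867, 0.0867]
V1 = np.outer(R1, C1)         # Step 3.3: factored second-moment estimate
print(np.round(V1, 4))
</syntaxhighlight>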



Step 4: Update the vector (<math>U_t</math>)

Step 4.1: Find the vector value of <math>U_t</math>

Formula of <math>U_t</math>:

<math>U_t = \frac{G_t}{\sqrt{\hat{V}_t}}</math>

Substitute <math>G_t</math> and <math>\hat{V}_1</math>:

<math>U_1 \approx \begin{bmatrix} 2.711&-1.489&4.370\\ -3.090&3.055&-0.747\\ 1.807&-2.978&3.278 \end{bmatrix}</math>



Step 4.2: Clipped Update Vector (<math>\hat{U}_t</math>)

Formula of <math>\hat{U}_t</math>:

<math>\hat{U}_t = \frac{U_t}{\max\left(1, \frac{\text{RMS}(U_t)}{d}\right)}</math>

Compute RMS of <math>U_t</math>:

<math>\text{RMS}(U_1) = \sqrt{\tfrac{1}{9}\textstyle\sum_{i=1}^{9} U_1[i]^{2}} \approx 2.81</math>

Since <math>\text{RMS}(U_1) > d</math>, scale <math>U_1</math> by <math>\tfrac{d}{\text{RMS}(U_1)}</math>:

<math>\hat{U}_1 \approx \begin{bmatrix} 0.965&-0.530&1.556\\ -1.100&1.088&-0.266\\ 0.644&-1.060&1.167 \end{bmatrix}</math>


Step 5: Weight Update (<math>X_1</math>)

<math>X_1 = X_0 - \alpha_1\cdot\hat{U}_1</math>

The result for the first iteration:

<math>X_1 = \begin{bmatrix} 0.7&-0.5&0.9\\ -1.1&0.8&-0.6\\ 1.2&-0.7&0.4 \end{bmatrix} - 0.00806\cdot\begin{bmatrix} 0.965&-0.530&1.556\\ -1.100&1.088&-0.266\\ 0.644&-1.060&1.167 \end{bmatrix}</math>

<math>X_1 \approx \begin{bmatrix} 0.692&-0.496&0.887\\ -1.091&0.791&-0.598\\ 1.195&-0.691&0.391 \end{bmatrix}</math>
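
Continuing the earlier snippet, the normalized gradient, the clipped update, and the new weights can be reproduced as follows (again a verification sketch written for this article; differences in the last decimal place are rounding effects):

<syntaxhighlight lang="python">
import numpy as np

X0 = np.array([[0.7, -0.5, 0.9],
               [-1.1, 0.8, -0.6],
               [1.2, -0.7, 0.4]])
G1 = np.array([[0.3, -0.2, 0.4],
               [-0.5, 0.6, -0.1],
               [0.2, -0.4, 0.3]])

rms = lambda a: np.sqrt(np.mean(a ** 2))
R1 = (G1 ** 2).mean(axis=1)
C1 = (G1 ** 2).mean(axis=0)
V1 = np.outer(R1, C1)                   # Step 3.3

alpha1 = max(1e-3, rms(X0)) * 1e-2      # Step 1: ~0.00806
U1 = G1 / np.sqrt(V1)                   # Step 4.1: normalized gradient
U1_hat = U1 / max(1.0, rms(U1) / 1.0)   # Step 4.2: RMS(U1) ~ 2.81 > d = 1
X1 = X0 - alpha1 * U1_hat               # Step 5: weight update
print(np.round(X1, 3))
</syntaxhighlight>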





Applications

Conclusion

Reference