Adamax

From Cornell University Computational Optimization Open Textbook - Optimization Wiki
Author: Chengcong Xu (cx253), Jessica Liu (hl2482), Xiaolin Bu (xb58), Qiaoyue Ye (qy252), Haoru Feng (hf352) (ChemE 6800 Fall 2024)

Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu

== Introduction ==
Adamax is an optimization algorithm derived from the Adam optimizer, a popular method in machine learning. While Adam combines momentum with adaptive learning rates for efficient training, Adamax replaces the exponentially decaying average of squared gradients used in Adam's second-moment estimate (an ℓ2-norm quantity) with an exponentially weighted infinity norm, as sketched below. This modification simplifies the update rule and keeps the magnitude of each step bounded by the learning rate, which improves stability when gradients are sparse or vary widely in magnitude.

Historically, Adamax was introduced as a variant in the original Adam optimizer paper by Kingma and Ba (2014), tailored to scenarios where the ℓ∞ norm offers computational or numerical advantages over the ℓ2 norm.

The motivation for studying Adamax lies in its ability to handle complex optimization problems, particularly in high-dimensional parameter spaces. Its stable convergence behavior and low per-step cost make it a useful tool for training deep learning models, where challenges such as sparse gradients, large datasets, and complex loss surfaces are prevalent.
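
To make the role of the infinity norm concrete, the following minimal NumPy sketch performs one Adamax parameter update in the spirit of Algorithm 2 of Kingma and Ba (2014). The function name, the default hyperparameters shown, and the small epsilon added to guard against division by zero are illustrative choices, not part of any particular library.

<syntaxhighlight lang="python">
import numpy as np

def adamax_update(theta, grad, m, u, t, alpha=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative Adamax step (sketch of Kingma & Ba, 2014, Algorithm 2)."""
    m = beta1 * m + (1.0 - beta1) * grad        # decaying first-moment (momentum) estimate
    u = np.maximum(beta2 * u, np.abs(grad))     # exponentially weighted infinity norm
    step = (alpha / (1.0 - beta1 ** t)) * m / (u + eps)  # bias-correct m; eps guards u == 0
    return theta - step, m, u

# Example: a few steps on the quadratic loss f(theta) = 0.5 * ||theta||^2,
# whose gradient is simply theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
u = np.zeros_like(theta)
for t in range(1, 101):
    grad = theta
    theta, m, u = adamax_update(theta, grad, m, u, t)
</syntaxhighlight>

Because the denominator is the running infinity norm rather than a root mean square, each coordinate of the step is bounded in magnitude by the learning rate alpha, which is the stability property noted above.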
