FTRL algorithm

From Cornell University Computational Optimization Open Textbook - Optimization Wiki

Revision as of 12:58, 14 December 2024

Author: Tsz Ki Peter Wei (tmw86), Nhi Nguyen (npn25), Colin Erb (cte24) (ChemE 6800 Fall 2024)

Introduction

Algorithm Discussion

Example 1: Linear Loss with Quadratic Regularization

Problem Setup:

  • Decision variable: <math>w \in \mathbb{R}</math>,
  • Loss functions: <math>f_t(w) = a_t w</math>, where <math>a_t</math> are coefficients provided by the environment. In the case of a linear loss function, <math>a_t</math> determines the slope or gradient of the loss function <math>f_t</math>,
  • Regularization: <math>R(w) = \frac{\lambda}{2} w^2</math>, where <math>\lambda</math> is a hyperparameter that determines the update rate.

FTRL Update Rule: The update rule is:

<math>w_{t+1} = \arg\min_{w} \left( \sum_{i=1}^{t} a_i w + \frac{\lambda}{2} w^2 \right)</math>

To minimize, take the derivative with respect to <math>w</math> and set it to zero:

<math>\sum_{i=1}^{t} a_i + \lambda w = 0</math>

Rearranging gives:

<math>w_{t+1} = -\frac{1}{\lambda} \sum_{i=1}^{t} a_i</math>

Step-by-Step Calculation: Let the gradients of the loss functions be <math>a_1</math>, <math>a_2</math>, <math>a_3</math>, initialize <math>w_1 = 0</math>, and choose a regularization strength <math>\lambda > 0</math>.

  • Iteration 1: <math>w_2 = -\frac{a_1}{\lambda}</math>

  • Iteration 2: <math>w_3 = -\frac{a_1 + a_2}{\lambda}</math>

  • Iteration 3: <math>w_4 = -\frac{a_1 + a_2 + a_3}{\lambda}</math>
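The iterations above can be checked numerically with a short script. The gradient values and <math>\lambda</math> below are illustrative assumptions, since they are not fixed by the derivation:

```python
lam = 2.0                 # assumed regularization strength (hypothetical)
grads = [1.0, -0.5, 2.0]  # assumed gradients a_1, a_2, a_3 (hypothetical)

history = []
cumulative = 0.0
for t, a in enumerate(grads, start=1):
    cumulative += a           # running sum of gradients
    w = -cumulative / lam     # closed-form FTRL minimizer
    history.append(w)
    print(f"Iteration {t}: w = {w}")
```

Each iterate depends only on the running gradient sum and <math>\lambda</math>, matching the closed-form update derived above.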

Observation: The updates for <math>w_t</math> gradually converge rather than overshooting or oscillating, because the regularization strength <math>\lambda</math> in the denominator of the update rule damps large shifts. Decision variables calculated with and without the regularization term are compared for 10 iterations to demonstrate this stabilizing effect.

Figure 1: Stabilizing Effects of FTRL for Linear Loss Function

Stabilizing effects of FTRL for the linear loss function. Without regularization (blue), updates fluctuate significantly; with regularization (orange), updates stabilize over time.
  • Without Regularization: Updates (<math>w_t</math>) exhibit significant fluctuations as they respond solely to the cumulative gradients <math>\sum_{i=1}^t a_i</math>.
  • With Regularization: Updates are smoother, as the regularization term <math>\lambda w</math> reduces the influence of cumulative gradients, stabilizing the updates over time.
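The behavior in Figure 1 can be sketched with a short simulation. Following the bullets above, the unregularized iterate responds solely to the cumulative gradients, while the regularized iterate divides by <math>\lambda</math>; the alternating gradients and <math>\lambda</math> value are illustrative assumptions:

```python
lam = 10.0                                # assumed regularization strength (hypothetical)
grads = [(-1.0) ** t for t in range(10)]  # assumed alternating gradients (hypothetical)

unreg, reg = [], []
cumulative = 0.0
for a in grads:
    cumulative += a
    unreg.append(-cumulative)      # responds solely to the cumulative gradients
    reg.append(-cumulative / lam)  # lambda shrinks the response, damping the swings

print("max |w| without regularization:", max(abs(w) for w in unreg))
print("max |w| with regularization:", max(abs(w) for w in reg))
```

With alternating gradients, both iterates oscillate, but the regularized one swings with only <math>1/\lambda</math> of the amplitude.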

Example 2: Quadratic Loss with Quadratic Regularization

Problem Setup:

  • Quadratic loss function: <math>f_t(w) = \frac{1}{2}(w - z_t)^2</math>, where <math>z_t</math> is the target set by the adversary,
  • Regularization: <math>R(w) = \frac{\lambda}{2} w^2</math>.

FTRL Update Rule: The update rule becomes:

<math>w_{t+1} = \arg\min_{w} \left( \sum_{i=1}^{t} \frac{1}{2}(w - z_i)^2 + \frac{\lambda}{2} w^2 \right)</math>

Setting the derivative with respect to <math>w</math> to zero and simplifying:

<math>w_{t+1} = \frac{\sum_{i=1}^{t} z_i}{t + \lambda}</math>

Step-by-Step Calculation: Let the targets be <math>z_1</math>, <math>z_2</math>, <math>z_3</math>, initialize <math>w_1 = 0</math>, and choose a regularization strength <math>\lambda > 0</math>.

  • Iteration 1: <math>w_2 = \frac{z_1}{1 + \lambda}</math>

  • Iteration 2: <math>w_3 = \frac{z_1 + z_2}{2 + \lambda}</math>

  • Iteration 3: <math>w_4 = \frac{z_1 + z_2 + z_3}{3 + \lambda}</math>
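These iterations follow from the closed-form update <math>w_{t+1} = \sum_{i=1}^{t} z_i / (t + \lambda)</math> and can be reproduced numerically; the target values and <math>\lambda</math> below are illustrative assumptions:

```python
lam = 2.0                  # assumed regularization strength (hypothetical)
targets = [3.0, 1.0, 2.0]  # assumed adversary targets z_1, z_2, z_3 (hypothetical)

history = []
total = 0.0
for t, z in enumerate(targets, start=1):
    total += z             # running sum of targets
    w = total / (t + lam)  # closed-form update: the denominator grows with t
    history.append(w)
    print(f"Iteration {t}: w = {w}")
```

Because the denominator grows with each round, later targets shift the iterate less than earlier ones.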

Observation: The growing denominator <math>t + \lambda</math> ensures that successive updates change less and less, stabilizing the solution as more data is observed. Decision variables calculated with and without the regularization term are compared for 10 iterations to demonstrate this stabilizing effect in Figure 2.

Figure 2: Stabilizing Effects of FTRL for Quadratic Loss Function

  • Without Regularization: Linear and unbounded growth. Updates (<math>w_t</math>) grow rapidly as they are driven entirely by the cumulative target values <math>\sum_{i=1}^t z_i</math>, leading to unbounded growth over time.
  • With Regularization: Gradual convergence with stabilized updates. The denominator <math>t + \lambda</math> dampens the influence of cumulative targets, stabilizing the updates.
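The contrast in Figure 2 can be sketched in a few lines. Following the bullets above, the unregularized iterate tracks the cumulative targets directly, while the regularized iterate divides by the growing denominator; the constant targets and <math>\lambda</math> value are illustrative assumptions:

```python
lam = 5.0             # assumed regularization strength (hypothetical)
targets = [1.0] * 10  # assumed constant adversary targets (hypothetical)

unreg, reg = [], []
total = 0.0
for t, z in enumerate(targets, start=1):
    total += z
    unreg.append(total)            # driven entirely by the cumulative targets
    reg.append(total / (t + lam))  # denominator damps the cumulative influence

print("unregularized:", unreg)  # grows without bound
print("regularized:", reg)      # rises gradually and stays bounded
```

The unregularized sequence grows linearly, while the regularized one rises gradually and stays bounded, mirroring the two curves in the figure.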

Application

The FTRL algorithm is a powerful framework for online learning problems due to its ability to handle large datasets and adapt to new data in real time. It has been applied in finance, healthcare, and electrical engineering. In finance, the FTRL algorithm has been used in online advertising, e-commerce recommendations, and fraud detection [1][2]. In healthcare, it has been used for the analysis of patient data [3]. In electrical engineering, it has been used to optimize the performance of electricity-generating systems [4][5]. Specific case studies of the FTRL algorithm in each field are shown below.

Finance

The FTRL algorithm's first application was online advertising at Google [1]. In this case study, the algorithm was used to predict ad click-through rates (CTR) for sponsored search advertising [1]. The authors demonstrated that FTRL both predicts accurately and produces sparser models than other online learning algorithms [1]. Additional case studies in the finance sector have been explored, such as online portfolio optimization [6]. In that case study, the authors built on the original FTRL algorithm to develop a new algorithm, VB-FTRL, that maximizes a trader's portfolio return with reduced runtime compared to the best-performing algorithm (Universal Portfolios) [6].

Healthcare

In healthcare, the FTRL algorithm was used in a case study to classify thyroid nodules for thyroid cancer diagnosis [7]. The authors proposed a novel FTRL-Deep Neural Network technique to precisely classify thyroid nodules as benign or malignant: the algorithm analyzes ultrasound images of the nodules and then determines whether the patient has thyroid cancer [7]. They showed that their FTRL-Deep Neural Network algorithm had superior accuracy compared to other algorithms such as the Hybrid Feature Cropping Network and the Multi-Channel Convolutional Neural Network [7].

Electrical Engineering

Recently, FTRL has been applied to electricity-generating systems. One case study used FTRL to optimize the output of a solar panel system, offsetting the effect of non-uniform irradiance [4]. The algorithm controls a system of switches that reconfigures the solar panel array when non-uniform irradiance is detected, maximizing the array's power output [4]. In an earlier case study, researchers used an FTRL algorithm to predict when low voltages would occur in a power distribution system, allowing for better system management [5].

Software and Platforms that Utilize the FTRL Algorithm

Software tools such as H2O Driverless AI and Keras can utilize the FTRL algorithm for various machine learning applications [8][9]. In addition, platforms such as Google Ads utilize the FTRL algorithm [2]. These are just a few examples of the software and platforms where the FTRL algorithm is used. As machine learning becomes more widespread across applications, the usage of the FTRL algorithm is expected to increase.

Conclusion

The FTRL algorithm is most useful in online learning, where large amounts of data need to be processed in real time to enable accurate predictions. Currently, the FTRL algorithm is used on platforms such as Google Ads and included in software tools such as Keras and H2O Driverless AI. In recent years, other areas of application that involve online learning, such as healthcare and power systems, have been explored for the FTRL algorithm.

The algorithm works by minimizing a linearized loss function over all historical data, plus at least one regularization term, each time a new point is added. Thanks to linearization, this minimization is reasonably space- and time-efficient, and the flexibility in how the weight vector is regularized makes it very versatile. The simplest implementation of the algorithm is equivalent to gradient descent, while more complex versions effectively stabilize the weight vector as the dataset grows and maintain sparsity in its updates through the inclusion of an L1 norm.
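As a concrete illustration of the sparsity-inducing variant, a per-coordinate FTRL-Proximal update in the style popularized by McMahan et al. [1] can be sketched as follows; the hyperparameter values and the two-coordinate example are illustrative assumptions, not values from the paper:

```python
import math

def ftrl_proximal_step(z, n, w, g, alpha=0.1, beta=1.0, l1=1.0, l2=0.1):
    """One per-coordinate FTRL-Proximal update (z, n, w mutated in place).

    z accumulates adjusted gradients, n accumulates squared gradients,
    w is the weight vector, and g is the current loss gradient at w.
    """
    for i, gi in enumerate(g):
        sigma = (math.sqrt(n[i] + gi * gi) - math.sqrt(n[i])) / alpha
        z[i] += gi - sigma * w[i]  # accumulate the adjusted gradient
        n[i] += gi * gi            # accumulate the squared gradient
        if abs(z[i]) <= l1:
            w[i] = 0.0             # L1 threshold keeps weak coordinates exactly zero
        else:
            w[i] = -(z[i] - math.copysign(l1, z[i])) / (
                (beta + math.sqrt(n[i])) / alpha + l2
            )
    return w

# Coordinate 0 sees strong gradients, coordinate 1 only weak ones:
z, n, w = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
for _ in range(5):
    ftrl_proximal_step(z, n, w, g=[2.0, 0.01])
print(w)  # coordinate 1 remains exactly 0.0 -> a sparse model
```

The L1 threshold is what produces sparsity: a coordinate's weight stays exactly zero until its accumulated gradient evidence exceeds the threshold, which is why FTRL-trained models can be much sparser than those from plain gradient descent.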

From the numerical examples, it is evident that regularization plays a pivotal role in ensuring convergence. For linear loss functions, FTRL mitigates oscillatory behavior by reducing the influence of cumulative gradients. For quadratic loss functions, regularization stabilizes updates over time, resulting in gradual convergence and bounded growth. These properties underline FTRL's adaptability in noisy environments and its capacity to produce reliable predictions.

Looking ahead, potential areas for improvement include enhancing computational efficiency for more complex loss functions and exploring hybrid regularization schemes that combine the strengths of L1, L2, and other techniques. Additionally, further advancements in applying FTRL to evolving fields such as autonomous systems, environmental modeling, and personalized medicine could broaden its impact.

References

  1. H. B. McMahan et al., “Ad click prediction: a view from the trenches,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, USA: ACM, Aug. 2013, pp. 1222–1230. doi: 10.1145/2487575.2488200.
  2. J. O. Schneppat, “Follow The Regularized Leader (FTRL),” Schneppat AI. Accessed: Nov. 30, 2024. [Online]. Available: https://schneppat.com/follow-the-regularized-leader_ftrl.html
  3. Z. Ye, F. Chen, and Y. Jiang, “Analysis and Privacy Protection of Healthcare Data Using Digital Signature,” in Proceedings of the 2024 3rd International Conference on Cryptography, Network Security and Communication Technology, Harbin, China: ACM, Jan. 2024, pp. 171–176. doi: 10.1145/3673277.3673307.
  4. X. Gao et al., “Followed The Regularized Leader (FTRL) prediction model based photovoltaic array reconfiguration for mitigation of mismatch losses in partial shading condition,” IET Renew. Power Gener., vol. 16, no. 1, pp. 159–176, 2022, doi: 10.1049/rpg2.12275.
  5. C. Gao, Z. Ding, S. Yan, and H. Mai, “Low Voltage Prediction Based on Spark and Ftrl,” in Proceedings of the Advances in Materials, Machinery, Electrical Engineering (AMMEE 2017), Tianjin City, China: Atlantis Press, 2017. doi: 10.2991/ammee-17.2017.34.
  6. R. Jézéquel, D. M. Ostrovskii, and P. Gaillard, “Efficient and Near-Optimal Online Portfolio Selection,” Sep. 28, 2022, arXiv: arXiv:2209.13932. doi: 10.48550/arXiv.2209.13932.
  7. A. Beyyala, R. Priya, S. R. Choudari, and R. Bhavani, “Classification of Thyroid Nodules Using Follow the Regularized Leader Optimization Based Deep Neural Networks.” Accessed: Nov. 30, 2024. [Online]. Available: https://openurl.ebsco.com/contentitem/doi:10.18280%2Fria.370315?sid=ebsco:plink:crawler&id=ebsco:doi:10.18280%2Fria.370315
  8. “Supported Algorithms — Using Driverless AI 1.11.0 documentation.” Accessed: Nov. 30, 2024. [Online]. Available: https://docs.h2o.ai/driverless-ai/1-10-lts/docs/userguide/supported-algorithms.html
  9. K. Team, “Keras documentation: Ftrl.” Accessed: Nov. 30, 2024. [Online]. Available: https://keras.io/api/optimizers/ftrl/

Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu