Trust-region methods
Author: Tung-Yen Wang (tw565) (CHEME 6800, Fall 2024)
Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu
Introduction
Trust region methods are iterative optimization techniques designed to find local minima or maxima of objective functions, particularly in nonlinear programming (NLP) problems, by refining approximations within dynamically adjusted trust regions [1]. Unlike line search methods, which determine the step size along a predefined direction, trust region methods concurrently optimize both the direction and the magnitude of the step within a specified neighborhood around the current iterate.
The origins of these methods date back to Levenberg, who introduced a modified version of the Gauss-Newton method for solving nonlinear least squares problems by adding a damping term to control the step size. This approach was later rediscovered and refined by Morrison and Marquardt, ultimately gaining prominence as the Levenberg-Morrison-Marquardt method [2]. Subsequent advancements led to the development and standardization of trust region methods, further improving their robustness and application.
The core idea of trust region methods is to construct a simplified model, often a quadratic approximation, that represents the objective function near the current point. This model serves as a surrogate, guiding the search for the optimum within a bounded region around the current estimate. The size of the trust region is adjusted dynamically based on how well the model predicts the actual behavior of the objective function: if the model's predictions are accurate, the trust region is expanded, allowing the algorithm to explore a larger area around the current solution; otherwise, it is contracted [3].
Algorithm Discussion
The mathematical framework of trust region methods starts with constructing a local approximation of the objective function and defining a constrained region within which this model is trusted to be accurate. Consider the following local quadratic approximation $m_k$ of $f$ around the current iterate $x_k$:
$$m_k(p) = f_k + g_k^T p + \tfrac{1}{2}\, p^T B_k p \qquad (1)$$
where $p$ represents the step or direction vector from the current point, $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, and $B_k$ is a symmetric matrix that approximates the Hessian $\nabla^2 f(x_k)$.
The first two terms are designed to match the first two terms of the Taylor series expansion of $f(x_k + p)$, which can also be written as:
$$f(x_k + p) = f_k + g_k^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x_k + tp)\, p, \qquad t \in (0, 1),$$
where the higher-order term is approximated using $B_k$. It can also be noted that the difference between $m_k(p)$ and $f(x_k + p)$ is of order $O(\|p\|^2)$, suggesting that when $\|p\|$ is small, the approximation error is minimal.
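As an illustration, the quadratic model (1) can be evaluated with a few lines of Python. The sketch below is only an illustrative example: the function name `quadratic_model` and the choice of the Rosenbrock function with its exact Hessian as $B_k$ are assumptions made here, not part of the original text.

```python
import numpy as np

def quadratic_model(f_k, g_k, B_k, p):
    """Evaluate the local quadratic model m_k(p) = f_k + g_k^T p + 0.5 p^T B_k p (eq. 1)."""
    return f_k + g_k @ p + 0.5 * p @ B_k @ p

# Hypothetical example: Rosenbrock function at x_k = (0, 0), using the exact Hessian as B_k
x, y = 0.0, 0.0
f_k = (1 - x) ** 2 + 100 * (y - x ** 2) ** 2
g_k = np.array([-2 * (1 - x) - 400 * x * (y - x ** 2),
                200 * (y - x ** 2)])
B_k = np.array([[2 - 400 * y + 1200 * x ** 2, -400 * x],
                [-400 * x, 200.0]])

p = np.array([0.1, 0.0])
print(quadratic_model(f_k, g_k, B_k, p))  # model's prediction of f near x_k for the step p
```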
At each iteration $k$, we seek a solution of the subproblem:
$$\min_{p \in \mathbb{R}^n} \; m_k(p) = f_k + g_k^T p + \tfrac{1}{2}\, p^T B_k p \quad \text{subject to } \|p\| \le \Delta_k \qquad (2)$$
where $\Delta_k > 0$ is the trust region radius.
Here, $p_k$ represents the step taken at iteration $k$, so the next point can be written as $x_{k+1} = x_k + p_k$. For the moment, $\|\cdot\|$ is the Euclidean norm.
To determine the trust region radius at each iteration, the algorithm compares the actual reduction in the objective function with the reduction predicted by the model:
$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)} \qquad (3)$$
If $\rho_k$ is negative, the new objective value $f(x_k + p_k)$ is greater than the current value $f(x_k)$, and the step is rejected. If $\rho_k$ is positive but close to 0, the model overestimates the improvement and the trust region radius is reduced. If $\rho_k$ is approximately equal to 1, the model agrees well with the objective function and the trust region radius is expanded for the next iteration. Otherwise, when $\rho_k$ is positive but neither close to 0 nor close to 1, the trust region remains unchanged [1].
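These update rules can be assembled into a basic trust region loop. The sketch below is a minimal, illustrative implementation assuming the commonly used thresholds 1/4 and 3/4 and an acceptance parameter $\eta$; the function names and constants are choices made here rather than quantities specified in the text, and `solve_subproblem` stands for any approximate solver of (2), such as the Cauchy point or dogleg step discussed below.

```python
import numpy as np

def trust_region(f, grad, hess, x0, solve_subproblem,
                 delta0=1.0, delta_max=10.0, eta=0.15, max_iter=100, tol=1e-8):
    """Basic trust region loop: solve the subproblem, compute rho_k (eq. 3),
    accept or reject the step, and adjust the radius Delta_k."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = solve_subproblem(g, B, delta)        # approximate minimizer of m_k within the region
        predicted = -(g @ p + 0.5 * p @ B @ p)   # m_k(0) - m_k(p)
        actual = f(x) - f(x + p)                 # f(x_k) - f(x_k + p_k)
        rho = actual / predicted if predicted > 0 else -1.0
        if rho < 0.25:                           # poor agreement: shrink the region
            delta *= 0.25
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)  # good agreement on the boundary: expand
        if rho > eta:                            # accept the step only if it gives a real reduction
            x = x + p
    return x
```

A rejected step simply keeps the current iterate and retries with a smaller radius on the next pass; the Cauchy point or dogleg routines sketched later on this page can be supplied as `solve_subproblem`.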
Cauchy Point
As with line search methods, it is not necessary to solve subproblem (2) exactly, since the model is rebuilt at each iteration; as long as each step lies within the trust region and provides a sufficient reduction, the algorithm can converge to a solution. This sufficient reduction can be expressed through the Cauchy point, the minimizer of the model along the steepest descent direction within the trust region:
$$p_k^C = -\tau_k\, \frac{\Delta_k}{\|g_k\|}\, g_k \qquad (4)$$
where
$$\tau_k = \begin{cases} 1 & \text{if } g_k^T B_k g_k \le 0, \\ \min\!\left( \dfrac{\|g_k\|^3}{\Delta_k\, g_k^T B_k g_k},\, 1 \right) & \text{otherwise.} \end{cases}$$
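A small Python sketch of the Cauchy point computation (4) follows; the function name `cauchy_point` and the example gradient and Hessian values are illustrative assumptions.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Compute the Cauchy point p_k^C = -tau_k * (delta / ||g||) * g (eq. 4)."""
    g_norm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                                    # non-positive curvature: step to the boundary
    else:
        tau = min(g_norm ** 3 / (delta * gBg), 1.0)  # minimizer along -g, capped at the boundary
    return -tau * (delta / g_norm) * g

# Hypothetical example using the gradient and Hessian from the Rosenbrock sketch above
g = np.array([-2.0, 0.0])
B = np.array([[2.0, 0.0], [0.0, 200.0]])
print(cauchy_point(g, B, delta=0.5))  # lies on the boundary of the trust region here
```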
Computations often start with the Cauchy point because it is simple to calculate. To achieve better performance, however, it is necessary to incorporate curvature information from $B_k$ when refining the step. Consider the following:
$$p_k^B = -B_k^{-1} g_k$$
as the full Newton step, where $B_k$ is positive definite.
The Dogleg Method
The full Newton step, however, is not feasible when the trust region constraint is active, that is, when $\|p_k^B\| > \Delta_k$. To handle such cases, the dogleg method is used to approximate the solution of subproblem (2). This method finds a solution along a path made of two line segments. The first segment follows the steepest descent direction to the unconstrained minimizer of the model along $-g_k$:
$$p_k^U = -\frac{g_k^T g_k}{g_k^T B_k g_k}\, g_k,$$
and the second segment runs from $p_k^U$ to the full Newton step $p_k^B$.
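The following sketch shows one way the dogleg step can be implemented, assuming $B_k$ is positive definite so the full Newton step is well defined; the structure follows the two-segment path described above, and the names are illustrative.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg approximation to the trust region subproblem (2), assuming B is positive definite.

    The path runs from the origin to the steepest descent minimizer p_U, then
    from p_U to the full Newton step p_B; the returned point is where this
    path leaves the trust region (or p_B itself if it lies inside).
    """
    p_B = -np.linalg.solve(B, g)                    # full Newton step
    if np.linalg.norm(p_B) <= delta:
        return p_B                                  # Newton step already inside the region
    p_U = -(g @ g) / (g @ B @ g) * g                # minimizer of the model along -g
    if np.linalg.norm(p_U) >= delta:
        return -(delta / np.linalg.norm(g)) * g     # first segment already leaves the region
    # Solve ||p_U + tau * (p_B - p_U)||^2 = delta^2 for tau in [0, 1]
    d = p_B - p_U
    a, b, c = d @ d, 2 * (p_U @ d), p_U @ p_U - delta ** 2
    tau = (-b + np.sqrt(b ** 2 - 4 * a * c)) / (2 * a)
    return p_U + tau * d

# Hypothetical example with the same model as above and an active trust region constraint
g = np.array([-2.0, 0.0])
B = np.array([[2.0, 0.0], [0.0, 200.0]])
print(dogleg_step(g, B, delta=0.5))
```

This routine can be passed as `solve_subproblem` to the trust region loop sketched earlier.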
Numerical example
Applications
Newton methods on Riemannian manifolds
Policy optimization