Trust-region methods
Author: Tung-Yen Wang (tw565) (CHEME 6800, Fall 2024)
Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu
Introduction
Trust region methods are iterative optimization techniques designed to find local minima or maxima of objective functions, particularly in nonlinear programming (NLP) problems, by refining approximations within dynamically adjusted trust regions [1]. Unlike line search methods, which determine the step size along a predefined direction, trust region methods concurrently optimize both the direction and the magnitude of the step within a specified neighborhood around the current iterate.
The origins of these methods date back to Levenberg, who introduced a modified version of the Gauss-Newton method for solving nonlinear least squares problems by adding a damping term to control the step size. This approach was later rediscovered and refined by Morrison and Marquardt, ultimately gaining prominence as the Levenberg-Morrison-Marquardt method [2]. Subsequent advancements led to the development and standardization of trust region methods, further improving their robustness and application.
The core idea of trust region methods is to construct a simplified model, often a quadratic approximation, that represents the objective function near the current point. This model serves as a surrogate, guiding the search for the optimum within a bounded region around the current estimate. The size of the trust region is adjusted dynamically based on how well the model predicts the actual behavior of the objective function: if the model's predictions are accurate, the trust region is expanded, allowing the algorithm to explore a larger area around the current solution; otherwise, it is contracted [3].
Algorithm Discussion
The mathematical framework of trust region methods starts with constructing a local approximation of the objective function and defining a constrained region within which this model is trusted to be accurate. Consider the following local quadratic approximation $m_k$ of $f$ around the current iterate $x_k$:
$$m_k(p) = f_k + g_k^T p + \tfrac{1}{2}\, p^T B_k p \qquad (1)$$
where $p$ represents the step or direction vector from the current point, $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, and $B_k$ is a symmetric matrix that approximates the Hessian $\nabla^2 f(x_k)$.
The first two terms are designed to match the first two terms of the Taylor series expansion of $f(x_k + p)$, which can also be written as:
$$f(x_k + p) = f_k + g_k^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x_k + tp)\, p, \qquad t \in (0, 1),$$
where the higher-order term is approximated using $B_k$. It can also be noted that the difference between $m_k(p)$ and $f(x_k + p)$ is of order $O(\|p\|^2)$, suggesting that when $\|p\|$ is small, the approximation error is minimal.
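As an illustration, the quadratic model (1) can be evaluated with a few lines of Python. The sketch below is only an illustrative example: the function name `quadratic_model` and the choice of the Rosenbrock function with its exact Hessian as $B_k$ are assumptions made here, not part of the original text.

```python
import numpy as np

def quadratic_model(f_k, g_k, B_k, p):
    """Evaluate the local quadratic model m_k(p) = f_k + g_k^T p + 0.5 p^T B_k p (eq. 1)."""
    return f_k + g_k @ p + 0.5 * p @ B_k @ p

# Hypothetical example: Rosenbrock function at x_k = (0, 0), using the exact Hessian as B_k
x, y = 0.0, 0.0
f_k = (1 - x) ** 2 + 100 * (y - x ** 2) ** 2
g_k = np.array([-2 * (1 - x) - 400 * x * (y - x ** 2),
                200 * (y - x ** 2)])
B_k = np.array([[2 - 400 * y + 1200 * x ** 2, -400 * x],
                [-400 * x, 200.0]])

p = np.array([0.1, 0.0])
print(quadratic_model(f_k, g_k, B_k, p))  # model's prediction of f near x_k for the step p
```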
At each iteration $k$, we seek a solution of the subproblem:
$$\min_{p \in \mathbb{R}^n} \; m_k(p) = f_k + g_k^T p + \tfrac{1}{2}\, p^T B_k p \quad \text{subject to } \|p\| \le \Delta_k \qquad (2)$$
where $\Delta_k > 0$ is the trust region radius.
Here, $p_k$ represents the step taken at iteration $k$, so the next point can be written as $x_{k+1} = x_k + p_k$. For the moment, $\|\cdot\|$ is the Euclidean norm.
To determine the trust region radius at each iteration, the algorithm compares the actual reduction in the objective function with the reduction predicted by the model:
$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)} \qquad (3)$$
If $\rho_k$ is negative, the new objective value $f(x_k + p_k)$ is greater than the current value $f(x_k)$, and the step is rejected. If $\rho_k$ is positive but close to 0, the model overestimates the improvement and the trust region radius is reduced. If $\rho_k$ is approximately equal to 1, the model agrees well with the objective function and the trust region radius is expanded for the next iteration. Otherwise, when $\rho_k$ is positive but neither close to 0 nor close to 1, the trust region remains unchanged [1].
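These update rules can be assembled into a basic trust region loop. The sketch below is a minimal, illustrative implementation assuming the commonly used thresholds 1/4 and 3/4 and an acceptance parameter $\eta$; the function names and constants are choices made here rather than quantities specified in the text, and `solve_subproblem` stands for any approximate solver of (2), such as the Cauchy point or dogleg step discussed below.

```python
import numpy as np

def trust_region(f, grad, hess, x0, solve_subproblem,
                 delta0=1.0, delta_max=10.0, eta=0.15, max_iter=100, tol=1e-8):
    """Basic trust region loop: solve the subproblem, compute rho_k (eq. 3),
    accept or reject the step, and adjust the radius Delta_k."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = solve_subproblem(g, B, delta)        # approximate minimizer of m_k within the region
        predicted = -(g @ p + 0.5 * p @ B @ p)   # m_k(0) - m_k(p)
        actual = f(x) - f(x + p)                 # f(x_k) - f(x_k + p_k)
        rho = actual / predicted if predicted > 0 else -1.0
        if rho < 0.25:                           # poor agreement: shrink the region
            delta *= 0.25
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)  # good agreement on the boundary: expand
        if rho > eta:                            # accept the step only if it gives a real reduction
            x = x + p
    return x
```

A rejected step simply keeps the current iterate and retries with a smaller radius on the next pass; the Cauchy point or dogleg routines sketched later on this page can be supplied as `solve_subproblem`.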
Cauchy Point
As with line search methods, it is not necessary to solve subproblem (2) exactly, since the model is rebuilt at each iteration; as long as each step lies within the trust region and provides a sufficient reduction, the algorithm can converge to a solution. This sufficient reduction can be expressed through the Cauchy point, the minimizer of the model along the steepest descent direction within the trust region:
$$p_k^C = -\tau_k\, \frac{\Delta_k}{\|g_k\|}\, g_k \qquad (4)$$
where
$$\tau_k = \begin{cases} 1 & \text{if } g_k^T B_k g_k \le 0, \\ \min\!\left( \dfrac{\|g_k\|^3}{\Delta_k\, g_k^T B_k g_k},\, 1 \right) & \text{otherwise.} \end{cases}$$
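A small Python sketch of the Cauchy point computation (4) follows; the function name `cauchy_point` and the example gradient and Hessian values are illustrative assumptions.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Compute the Cauchy point p_k^C = -tau_k * (delta / ||g||) * g (eq. 4)."""
    g_norm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                                    # non-positive curvature: step to the boundary
    else:
        tau = min(g_norm ** 3 / (delta * gBg), 1.0)  # minimizer along -g, capped at the boundary
    return -tau * (delta / g_norm) * g

# Hypothetical example using the gradient and Hessian from the Rosenbrock sketch above
g = np.array([-2.0, 0.0])
B = np.array([[2.0, 0.0], [0.0, 200.0]])
print(cauchy_point(g, B, delta=0.5))  # lies on the boundary of the trust region here
```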
Computations often start with the Cauchy point because it is simple to calculate. To achieve better performance, however, it is necessary to incorporate curvature information from $B_k$ when refining the step. Consider the following:
$$p_k^B = -B_k^{-1} g_k$$
as the full Newton step, where $B_k$ is positive definite.
The Dogleg Method
The full Newton step, however, is not feasible when the trust region constraint is active, that is, when $\|p_k^B\| > \Delta_k$. To handle such cases, the dogleg method is used to approximate the solution of subproblem (2). This method finds a solution along a path made of two line segments. The first segment follows the steepest descent direction to the unconstrained minimizer of the model along $-g_k$:
$$p_k^U = -\frac{g_k^T g_k}{g_k^T B_k g_k}\, g_k,$$
and the second segment runs from $p_k^U$ to the full Newton step $p_k^B$.
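The following sketch shows one way the dogleg step can be implemented, assuming $B_k$ is positive definite so the full Newton step is well defined; the structure follows the two-segment path described above, and the names are illustrative.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg approximation to the trust region subproblem (2), assuming B is positive definite.

    The path runs from the origin to the steepest descent minimizer p_U, then
    from p_U to the full Newton step p_B; the returned point is where this
    path leaves the trust region (or p_B itself if it lies inside).
    """
    p_B = -np.linalg.solve(B, g)                    # full Newton step
    if np.linalg.norm(p_B) <= delta:
        return p_B                                  # Newton step already inside the region
    p_U = -(g @ g) / (g @ B @ g) * g                # minimizer of the model along -g
    if np.linalg.norm(p_U) >= delta:
        return -(delta / np.linalg.norm(g)) * g     # first segment already leaves the region
    # Solve ||p_U + tau * (p_B - p_U)||^2 = delta^2 for tau in [0, 1]
    d = p_B - p_U
    a, b, c = d @ d, 2 * (p_U @ d), p_U @ p_U - delta ** 2
    tau = (-b + np.sqrt(b ** 2 - 4 * a * c)) / (2 * a)
    return p_U + tau * d

# Hypothetical example with the same model as above and an active trust region constraint
g = np.array([-2.0, 0.0])
B = np.array([[2.0, 0.0], [0.0, 200.0]])
print(dogleg_step(g, B, delta=0.5))
```

This routine can be passed as `solve_subproblem` to the trust region loop sketched earlier.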
Numerical example
Applications
Newton methods on Riemannian manifolds
Policy optimization