Trust-region methods

From Cornell University Computational Optimization Open Textbook - Optimization Wiki
Author: Tung-Yen Wang (tw565) (CHEME 6800, Fall 2024)
 
Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu
 
=Introduction=


Trust region methods are iterative optimization techniques designed to find local minima or maxima of objective functions, particularly in nonlinear programming (NLP) problems, by refining approximations within dynamically adjusted trust regions<ref>Nocedal, J., & Wright, S. J. (2006). <em>Numerical optimization</em>. Springer.</ref>. Unlike line search methods, which determine the step size along a predefined direction, trust region methods concurrently optimize both the direction and the magnitude of the step within a specified neighborhood around the current iterate.
The origins of these methods date back to Levenberg, who introduced a modified version of the Gauss-Newton method for solving nonlinear least squares problems by adding a damping term to control the step size. This approach was later rediscovered and refined by Morrison and Marquardt, ultimately gaining prominence as the Levenberg-Morrison-Marquardt method <ref>Yuan, Y. (2015b). Recent advances in trust region algorithms. <em>Mathematical Programming</em>, 151(1), 249–281. https://doi.org/10.1007/s10107-015-0893-2</ref>. Subsequent advancements led to the development and standardization of trust region methods, further improving their robustness and application.  
 
The core idea of trust region methods lies in constructing a simplified model, often a quadratic approximation, that represents the objective function near the current point. This model serves as a surrogate, guiding the search for the optimum within a bounded region around the current estimate. The size of the trust region is dynamically adjusted based on how well the model predicts the actual behavior of the objective function. If the model predicts the behavior of the objective function accurately, the size of the trust region is increased, allowing the algorithm to explore a larger area around the current solution; otherwise, the trust region is reduced <ref>Conn, A. R., Gould, N. I. M., & Toint, P. L. (2000). <em>Trust-region methods</em>. Siam.</ref>.
 
=Algorithm Discussion=
The mathematical framework involved in trust region methods starts with the construction of a local approximation of the objective function and defining a constrained region within which this model is trusted to be accurate. Consider the following local quadratic approximation of <math>f</math> around <math>x_k</math>:
 
(1) <math>m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2} p^T B_k p
</math>
 
where <math>p
</math> represents the step or direction vector from the current point, <math>f_k=f(x_k)
</math>, <math>\nabla f_k = \nabla f(x_k)
</math>, and <math>B_k
</math> is a symmetric matrix.
 
The first two terms of <math>m_k(p)
</math> are designed to match the first two terms of the Taylor series expansion of <math>f(x_k+p)
</math>. The expression can also be written as:
 
<math>m_k(p) = f_k + \nabla f_k^T p + O(||p||^2)
</math>
 
where higher-order terms <math>O(||p||^2)
</math> are approximated using <math>B_k
</math>. It can also be noted that the difference between <math>m_k(p)
</math> and <math>f(x_k+p)
</math> is of order <math>O(||p||^2)
</math>, suggesting that when p is small, the approximation error is minimal.
 
 
For each iteration, seek a solution of the subproblem:
 
(2) <math>\min_{p}\; m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2} p^T B_k p
</math>&nbsp;&nbsp;&nbsp;<math>s.t. ||p|| \leq \Delta_k
</math>
 
where <math>\Delta_k
</math> represents the trust region radius and is greater than 0.
 
Here, <math>p_k
</math> represents the step taken at iteration <math>k
</math>, hence the next point can be denoted as <math>x_{k+1}=x_k+p_k
</math>. For the moment, <math>|| \cdot ||
</math> is the Euclidean norm.
 
 
To decide whether to accept a step and how to update the trust region radius <math>\Delta_k
</math>, the ratio of actual to predicted reduction is computed at each iteration:

(3) <math>\rho_k=\frac{f(x_k) - f(x_k+p_k)}{m_k(0) - m_k(p_k)} = \frac{actual\; reduction}{predicted\; reduction}
</math>
 
If <math>\rho_k
</math> is negative, the objective value <math>f(x_k+p_k)
</math> would be greater than the current value <math>f(x_k)
</math>, leading to the rejection of the step. If <math>\rho_k
</math> is positive but not close to 1, the trust region remains unchanged. If <math>\rho_k
</math> is positive but close to 0, the trust region radius <math>\Delta_k
</math> is reduced. On the other hand, if <math>\rho_k
</math> is approximately equal to 1, the trust region radius <math>\Delta_k
</math> is expanded for the next iteration [1].
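These update rules translate directly into the basic trust-region loop. The following Python code is a minimal, illustrative sketch of that loop; the subproblem solver is passed in as an argument, and the acceptance threshold <code>eta</code>, the shrink/expand factors, and the radius cap <code>delta_max</code> are assumed values rather than settings from any particular reference.

<syntaxhighlight lang="python">
import numpy as np

def trust_region_minimize(f, grad, hess, x0, solve_subproblem,
                          delta0=1.0, delta_max=10.0, eta=0.15,
                          tol=1e-8, max_iter=200):
    """Basic trust-region loop; solve_subproblem(g, B, delta) can be any
    approximate solver of subproblem (2), e.g. the Cauchy point, dogleg,
    or Steihaug's conjugate gradient method."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = solve_subproblem(g, B, delta)        # approximate step with ||p|| <= delta
        actual = f(x) - f(x + p)                 # actual reduction
        predicted = -(g @ p + 0.5 * p @ B @ p)   # m_k(0) - m_k(p)
        rho = actual / predicted                 # the ratio in equation (3)
        if rho < 0.25:                           # poor prediction: shrink the region
            delta *= 0.25
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)  # good prediction at the boundary: expand
        if rho > eta:                            # accept the step only if the reduction is sufficient
            x = x + p
    return x
</syntaxhighlight>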
 
 
===Cauchy Point===
 
As with line search methods, it is not necessary to solve the subproblem (2) exactly at each iteration; as long as the step lies within the trust region and produces a sufficient reduction in the model, the algorithm can converge to a solution. This sufficient reduction can be characterized by the Cauchy point, that is:
 
(4) <math>p_k^C =-\tau_k\frac{\Delta_k}{||\nabla f_k||}\nabla f_k
</math>
 
where
 
<math>\tau_k= \begin{cases} 1, & \text{if }\nabla f_k^TB_k\nabla f_k \leq 0\\ min(\frac{||\nabla f_k||^3}{\Delta _k\nabla f_k^TB_k\nabla f_k}, 1), & \text{if otherwise} \end{cases}
</math>
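As a concrete illustration, the Cauchy point in (4) can be computed in a few lines of Python. The sketch below assumes NumPy arrays for the gradient <code>g</code> and the symmetric matrix <code>B</code>; the function name is illustrative. A routine like this could also be passed as the subproblem solver in the loop sketched above.

<syntaxhighlight lang="python">
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point p^C = -tau * (delta / ||g||) * g for subproblem (2)."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                                 # non-positive curvature along -g: step to the boundary
    else:
        tau = min(gnorm**3 / (delta * gBg), 1.0)  # minimizer along -g, capped at the boundary
    return -tau * (delta / gnorm) * g
</syntaxhighlight>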


Computations often start with this as it is simple to calculate. To achieve better performance, however, it is necessary to integrate information about <math>B_k
</math> to refine <math>p_k
</math>. Consider the following:


<math>p^B= -B_k^{-1} \nabla f_k
</math>

as the full Newton step, which can be taken whenever it lies inside the trust region, i.e. <math>||p^B||\leq \Delta_k
</math>.




===The Dogleg Method===


The full Newton step, however, is not feasible when the trust region constraint is active. To handle such cases, the Dogleg method is used to approximate the subproblem (2). The method constructs a path consisting of two line segments. The first segment runs along the steepest descent direction to the unconstrained minimizer of the model:


<math>p^U =-\frac{g^Tg}{g^TBg}g
</math>


where <math>g= \nabla f(x_k)
</math>.  


The second line runs from <math>p^U
</math> to <math>p^B
</math> for <math>\tau\in [0,2] 
</math>:


<math>p(\tau)= \begin{cases} \tau p^U & 0\leq \tau\leq 1\\ p^U+(\tau -1)(p^B-p^U) & 1\leq \tau\leq 2\end{cases}
</math>


where <math>p^B= -B^{-1}\nabla f(x_k)
</math>.


The dogleg method chooses <math>p
</math> to minimize the model along this path. It works well when <math>B_k
</math> is positive definite. However, when <math>B_k
</math> is indefinite, <math>p^B
</math> is not guaranteed to be a meaningful solution, since the method is not well-suited to handling negative curvature.
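Assuming <math>B</math> is positive definite, the dogleg path intersects the trust-region boundary at most once, and the intersection can be computed analytically, so no one-dimensional search is needed. The following Python sketch (illustrative, not a library routine) returns the dogleg step:

<syntaxhighlight lang="python">
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg approximation to subproblem (2); assumes B is positive definite."""
    pB = -np.linalg.solve(B, g)                  # full Newton step p^B
    if np.linalg.norm(pB) <= delta:
        return pB                                # Newton step already lies inside the region
    pU = -(g @ g) / (g @ B @ g) * g              # unconstrained minimizer along -g (p^U)
    if np.linalg.norm(pU) >= delta:
        return delta * pU / np.linalg.norm(pU)   # even the first leg leaves the region
    # Second leg: find tau in [0, 1] with ||pU + tau*(pB - pU)|| = delta
    d = pB - pU
    a, b, c = d @ d, 2 * (pU @ d), pU @ pU - delta**2
    tau = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
    return pU + tau * d
</syntaxhighlight>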




===The Conjugate Gradient Steihaug Method===


In contrast to the methods above, Steihaug’s method is designed to better handle large-scale trust region subproblems. It mitigates the limitations of slower convergence and the need for a positive definite Hessian. In addition, it handles negative curvature by enabling early termination when such directions are detected or when the trust region boundary is reached. These features make the method more scalable and robust, suited for large-scale or poorly conditioned optimization problems <ref>Grippo, L., & Sciandrone, M. (2023). Introduction to Methods for Nonlinear Optimization. In <em>Unitext</em>. Springer International Publishing. https://doi.org/10.1007/978-3-031-26790-1</ref>. The method is described as follows:


given <math>\varepsilon > 0  
</math>  


set <math>p_0 = 0, r_0= g, d_0=-g,j=0
</math>;


if <math>||r_0||< \varepsilon
</math>  


:return <math>p = p_0
</math>;


for <math>j = 0, 1, 2, \dots
</math>:


:if <math>d_j^TBd_j\leq0
</math>


::determine <math>\tau
</math> such that <math>p=p_j+\tau d_j
</math> minimizes (2) and satisfies <math>||p|| = \Delta
</math>


::return <math>p
</math>;


:set <math>\alpha _j =\frac{r_j^Tr_j}{d_j^TBd_j}
</math>;


:set <math>p_{j+1} = p_j + \alpha _jd_j
</math>;


:if <math>||p_{j+1}|| \geq \Delta
</math>;


::determine the positive value of <math>\tau
</math> such that <math>p=p_j+\tau d_j
</math> and satisfies <math>||p|| = \Delta
</math>


::return <math>p
</math>;


:set <math>r_{j+1}=r_j+\alpha_jBd_j
</math>;  


:if <math>||r_{j+1}|| \leq \varepsilon
</math>  


::return <math>p=p_{j+1}
</math>;


:set <math>\beta_{j+1}=\frac{r_{j+1}^Tr_{j+1}}{r_j^Tr_j}
</math>;


:set <math>d_{j+1}= -r_{j+1}+\beta_{j+1}d_j
</math>;


end (for)


The initialization <math>p_0 = 0
</math> is fundamental, as the first iterate <math>p_1
</math> coincides with the Cauchy point; this condition guarantees global convergence.
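The listing above corresponds to the following Python sketch of Steihaug's method; the names and the boundary-intersection helper are illustrative, and <code>g</code> and <code>B</code> are assumed to be NumPy arrays.

<syntaxhighlight lang="python">
import numpy as np

def steihaug_cg(g, B, delta, eps=1e-8, max_iter=None):
    """Approximately solve subproblem (2) by truncated conjugate gradients."""
    n = g.size
    max_iter = 2 * n if max_iter is None else max_iter

    def to_boundary(p, d):
        # positive tau with ||p + tau*d|| = delta
        a, b, c = d @ d, 2 * (p @ d), p @ p - delta**2
        return (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)

    p, r, d = np.zeros(n), g.copy(), -g.copy()
    if np.linalg.norm(r) < eps:
        return p
    for _ in range(max_iter):
        Bd = B @ d
        dBd = d @ Bd
        if dBd <= 0:                            # negative curvature: follow d to the boundary
            return p + to_boundary(p, d) * d
        alpha = (r @ r) / dBd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:     # step would leave the trust region
            return p + to_boundary(p, d) * d
        r_next = r + alpha * Bd
        if np.linalg.norm(r_next) < eps:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p
</syntaxhighlight>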




===Termination Criteria===


The convergence of a trust region algorithm depends on the accuracy of the approximate solution to the subproblem (2). It is desirable that the subproblem is approximated rather than solved exactly, provided the approximation guarantees the algorithm’s convergence <ref>Dutta, S. (2016). <em>Optimization in Chemical Engineering</em>. Cambridge University Press.</ref>. Fletcher proposes that the algorithm terminates when any of the following quantities is small <ref>Fletcher, R. (1987). <em>Practical Methods of Optimization</em> (2e), John Wiley and Sons.</ref>:


# <math>f(x_k)-f(x_{k+1})
</math>
# <math>x_k-x_{k+1}
</math>
# <math>\nabla f(x_k)
</math>


These criteria ensure that the algorithm terminates efficiently while maintaining convergence.
 
In addition, termination can be considered when <math>\Delta_k
</math> is small as it is possible for <math>\Delta_k
</math> to converge to 0 when <math>k
</math> approaches infinity.
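A minimal sketch of how these tests might be combined in code is shown below; the tolerance names and values are illustrative choices rather than standard settings.

<syntaxhighlight lang="python">
import numpy as np

def should_terminate(f_old, f_new, x_old, x_new, grad_new, delta,
                     tol_f=1e-10, tol_x=1e-10, tol_g=1e-8, tol_delta=1e-12):
    """Stop when the reduction in f, the step, or the gradient is small,
    or when the trust region radius has collapsed."""
    return (abs(f_old - f_new) < tol_f
            or np.linalg.norm(np.asarray(x_old) - np.asarray(x_new)) < tol_x
            or np.linalg.norm(grad_new) < tol_g
            or delta < tol_delta)
</syntaxhighlight>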


=Numerical example=


===Example One===


This section demonstrates the application of trust region methods using a simple function as a numerical example. Consider the following:


<math>f(x)=(x-2)^2
</math>

Our goal is to minimize this function, and clearly, the solution is <math>x=2
</math> where <math>f(2)=0
</math>.


The gradient is <math>f'(x)=2(x-2)
</math> and the Hessian is <math>f''(x)=2
</math>.




'''Iteration 1:'''

To calculate, begin at the initial point <math>x_0=0
</math> with a trust region radius of <math>1.0
</math>. From this initial point, the following is obtained:


Step 1: evaluate the current point:

<math>f(x_0)=(0-2)^2=4
</math>


Gradient: <math>f'(x_0)=2(0-2)=-4
</math>

Hessian: <math>2
</math>
 
Step 2: form the quadratic model
 
<math>m_0(p) = f(x_0)  + \nabla f_0^T p + \frac{1}{2} p^T H_0 p = 4 + (-4) p + \frac{1}{2}(2)p^2=4-4p+p^2
</math>
 
Step 3: solve the subproblem
 
<math>m_0'(p)=-4+2p=0
</math>        <math>p=2
</math>
 
However, <math>p=2
</math> is outside the trust region radius <math>1.0
</math>. Hence, <math>|p|\leq 1
</math> will be utilized to find the best step.
 
<math>m_0(1)=4-4(1)+1^2=1
</math>
 
<math>m_0(-1)=4-4(-1)+(-1)^2=9
</math>
 
The lower value is <math>p=1
</math>
 
Step 4: check function reduction
 
Proposed new point: <math>x_1=x_0+p=0+1=1
</math>
 
Actual function at new point: <math>f(1)=(1-2)^2=1
</math>
 
Step 5: compute ratio
 
Actual reduction: <math>f(x_0)-f(x_1)=4-1=3
</math>
 
Predicted reduction: <math>f(x_0)-m_0(1)=4-1=3
</math>

Hence <math>\rho=\frac{3}{3}=1
</math>
 
Step 6: update trust region and accept/reject step
 
Acceptance criteria for the ratio are typically as follows (they may vary):

If <math>\rho<0.25
</math>, reduce the trust region radius; if <math>\rho>0.75
</math>, expand the trust region radius.

In the example, <math>\rho
</math> is greater than <math>0.75
</math>, thus the step is accepted and the trust region is expanded. Suppose the trust region radius is doubled (i.e. to <math>2.0
</math>).
 
 
'''Iteration 2: '''
 
The point now is at <math>x_1=1
</math> with a trust region radius of <math>2.0
</math>. From this point, the following is obtained:
 
Step 1: evaluate the current point:
 
<math>f(1)=1
</math>
 
Gradient: <math>f'(1)=2(1-2)=-2
</math>
 
Hessian: <math>2
</math>
 
Step 2: form the quadratic model
 
<math>m_1(p) = f(1) + \nabla f_1^T p + \frac{1}{2} p^T H_1 p= 1-2p+p^2
</math>
 
Step 3: solve the subproblem
 
<math>m_1'(p)= -2+2p =0
</math>&nbsp;&nbsp;&nbsp;<math>p=1
</math>
 
Since <math>|p|\leq 2
</math>, the step <math>p=1
</math> is valid.
 
Step 4: check function reduction
 
Proposed new point: <math>x_2=x_1+1=2
</math>
 
Actual function at new point: <math>f(2)=(2-2)^2=0
</math>
 
Step 5: compute ratio
 
Actual reduction: <math>f(1)-f(2)=1-0=1
</math>
 
Predicted reduction: <math>f(1)-m_1(1)=1-0=1
</math>

Hence <math>\rho=\frac{1}{1}=1
</math>


Step 6: update trust region and accept/reject step

In the example, since <math>\rho
</math> is greater than <math>0.75
</math>, the proposed step is accepted. By taking this step, the exact minimum is reached at <math>x=2
</math>, and the algorithm therefore terminates.
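The two hand-computed iterations above can be reproduced with a few lines of Python; the snippet below is only a check of the arithmetic for this one-dimensional example, not a general-purpose solver.

<syntaxhighlight lang="python">
def f(x):                                  # objective f(x) = (x - 2)^2
    return (x - 2) ** 2

def model(fk, gk, Hk, p):                  # quadratic model m_k(p)
    return fk + gk * p + 0.5 * Hk * p ** 2

x, delta = 0.0, 1.0                        # initial point and trust region radius
for k in range(2):
    fk, gk, Hk = f(x), 2 * (x - 2), 2.0    # function value, gradient, Hessian
    p = -gk / Hk                           # unconstrained minimizer of the model
    p = max(min(p, delta), -delta)         # restrict to the trust region |p| <= delta
    rho = (fk - f(x + p)) / (fk - model(fk, gk, Hk, p))
    if rho > 0.75:
        delta *= 2                         # expand the region, as in the example
    if rho > 0:
        x += p                             # accept the step
    print(k + 1, x, f(x), rho)             # iteration 1 ends at x = 1.0, iteration 2 at x = 2.0
</syntaxhighlight>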


===Example Two - Himmelblau’s Function===

The following demonstrates a more complex example of trust region methods. Himmelblau’s function is a well-known mathematical function used extensively in optimization testing and research <ref>Himmelblau, D. M. (2018). <em>Applied nonlinear programming</em>. McGraw-Hill.</ref>. It is defined in two real variables and is characterized by having multiple global minima. The function is non-convex and multimodal, making it an excellent candidate for demonstrating the effectiveness of optimization algorithms. The following uses Steihaug’s method.


The function is defined as the following:


<math>f(x,y)=(x^2+y-11)^2+(x+y^2-7)^2
</math>


where <math>x
</math> and <math>y
</math> are real numbers.


The four global minima of the function are at:

# <math>(3.0,2.0)</math>
# <math>(-2.805, 3.131)</math>
# <math>(-3.779,-3.283)</math>
# <math>(3.584,-1.848)</math>

[[File:Trust-Region-Radius.png|300px|right|thumb|The figure illustrates how the trust region radius adjusts throughout the optimization process, reflecting the optimizer’s confidence in the model’s predictions.]]


Due to the complexities in the calculations involved in the trust region method algorithm, the '''minimize''' function with the '''<nowiki/>'trust-constr'''' method from the SciPy library in Python can be utilized to solve this problem. Consider the following implementation:

'''minimize(method=’trust-constr’)'''
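A minimal sketch of such a call for Himmelblau’s function is shown below. The initial guess matches the one used in the iterations that follow; the analytic gradient and the option values are illustrative additions rather than settings taken from the original computation.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def himmelblau(v):
    x, y = v
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def himmelblau_grad(v):
    x, y = v
    return np.array([4*x*(x**2 + y - 11) + 2*(x + y**2 - 7),
                     2*(x**2 + y - 11) + 4*y*(x + y**2 - 7)])

res = minimize(himmelblau, x0=np.array([-4.0, 0.0]), method='trust-constr',
               jac=himmelblau_grad, options={'gtol': 1e-8, 'xtol': 1e-10})
print(res.x, res.fun)   # expected to approach the nearby minimum near (-3.779, -3.283)
</syntaxhighlight>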
'''Iteration 1:'''


To calculate, begin the optimization with the initial guess as <math>(-4.0, 0)
</math> and a trust region radius of <math>1.0
</math>. The optimal solution achieved from this iteration is <math>(-3.382, -0.786)
</math>, yielding a function value of <math>95.453
</math>. The initial point gives a sufficient prediction; thus, the trust region radius is increased to <math>7.0
</math>.


[[File:Trust-Region-Heatmap.png|300px|right|thumb|The figure shows the trajectory of the optimizer from the initial guess to the found minimum over the contour plot of Hummuelblau’s function.]]
'''Iteration 2:'''


In this iteration, the updated point <math>(-4.0, 0)
</math> and radius <math>7.0
</math> are used to find a new step. However, an insufficient prediction was obtained, leading to a decrease in the trust region radius to <math>0.7
</math>. The optimal solution obtained from the previous iteration is retained for the next step.


'''Iteration 3:'''


With the updated point <math>(-3.382, -0.786)
</math> and trust region radius <math>0.7
</math>, a new point at <math>(-3.072, -1.413)
</math> was obtained.


The following shows the remaining iterations: <br>
[[File:Iterations.png|500px|Caption]] <br>
Convergence is achieved at iteration 11 with <math>x=-3.779,y= -3.283 </math> and <math>f(x) = 0</math>.




=Applications=
Trust region methods are applied in optimization problems across various domains due to their robustness, adaptability, and efficiency in handling complex constraints. Their ability to balance local and global optimization strategies makes them invaluable in solving high-dimensional, nonlinear, and constrained problems. These application areas include:


===Engineering===
Trust region methods are widely used in engineering, particularly in structural optimization, to improve the stability and efficiency of design processes. These methods address challenges in high-dimensional design problems by iteratively refining solutions within defined trust regions, making them effective for complex objectives such as optimizing material distribution, minimizing stress, or improving flow geometries. For instance, Yano et al. <ref>Yano, M., Huang, T., & Zahr, M. J. (2021). A globally convergent method to accelerate topology optimization using on-the-fly model reduction. <em>Computer Methods in Applied Mechanics and Engineering</em>, 375, 113635–113635. https://doi.org/10.1016/j.cma.2020.113635</ref> introduce a globally convergent method to accelerate topology optimization with trust region frameworks. This approach significantly reduces computational costs while maintaining accuracy in optimal design solutions.


===Finance===
Trust region methods are also increasingly applied in the financial sector, as these methods ensure stable and reliable convergence even when dealing with complex, non-linear, and volatile financial data. For example, in the study by Phua et al. <ref>Phua, P. K. H., Xiaotian Zhu, & Chung Haur Koh. (n.d.). Forecasting stock index increments using neural networks with trust region methods. <em>Proceedings of the International Joint Conference on Neural Networks</em>, 2003. https://doi.org/10.1109/ijcnn.2003.1223354</ref>, trust region methods were employed during the optimization of neural network training. This approach significantly improved the forecasting accuracy of major stock indices like DAX and NASDAQ by effectively managing high-dimensional and noisy financial datasets.


===Machine Learning===
Algorithms like ACKTR (Actor-Critic using Kronecker-Factored Trust Region) use trust regions to improve training stability and sample efficiency in deep reinforcement learning. Specifically, ACKTR uses an approximation of the natural gradient to maintain a trust region during updates, preventing large and unstable parameter changes. These properties facilitate applications in robotics and autonomous systems <ref>Wu, Y., Elman Mansimov, Grosse, R. B., Liao, S., & Ba, J. (2017). Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. <em>Neural Information Processing Systems</em>, 30, 5279–5288.</ref>.


=Conclusion=
Trust region methods provide a robust and efficient framework for solving optimization problems through the construction of a region around the current solution in which a simplified model accurately approximates the objective function. This approach ensures convergence, even for nonlinear and complex problems encountered in fields such as engineering. The algorithm’s advantage lies in its adaptability, dynamically adjusting the trust region based on the accuracy of the model’s predictions, thereby ensuring global and, in some cases, superlinear convergence <ref>Zhang, J., Wu, L., & Zhang, X. (2007). A Trust Region Method for Optimization Problem with Singular Solutions. <em>Applied Mathematics and Optimization</em>, 56(3), 379–394. https://doi.org/10.1007/s00245-007-9009-6</ref>. Future research directions may involve improving the scalability of trust region methods for high-dimensional problems, integrating them with machine learning frameworks for better predictive modeling <ref>Lin, C.-J., Weng, R. C., & S. Sathiya Keerthi. (2008). Trust Region Newton Method for Logistic Regression. <em>Journal of Machine Learning Research</em>, 9(22), 627–650. https://doi.org/10.1145/1390681.1390703</ref>, and developing more efficient techniques to handle constraints and uncertainties in real-world applications.
=References=
<references />
