Trust-region methods

From Cornell University Computational Optimization Open Textbook - Optimization Wiki

Author: Tung-Yen Wang (tw565) (CHEME 6800, Fall 2024)


Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu

=Introduction=


Trust region methods are iterative optimization techniques designed to find local minima or maxima of objective functions, particularly in nonlinear programming (NLP) problems, by iteratively refining approximations within dynamically adjusted trust regions<ref>Nocedal, J., & Wright, S. J. (2006). <em>Numerical optimization</em>. Springer.</ref>. Unlike line search methods, which determine the step size along a predefined direction, trust region methods concurrently optimize both the direction and the magnitude of the step within a specified neighborhood around the current iterate.


The origins of these methods date back to Levenberg, who introduced a modified version of the Gauss-Newton method for solving nonlinear least squares problems by adding a damping term to control the step size. This approach was later rediscovered and refined by Morrison and Marquardt, ultimately gaining prominence as the Levenberg-Morrison-Marquardt method <ref>Yuan, Y. (2015b). Recent advances in trust region algorithms. <em>Mathematical Programming</em>, 151(1), 249–281. https://doi.org/10.1007/s10107-015-0893-2</ref>. Subsequent advancements led to the development and standardization of trust region methods, further improving their robustness and application.  


The core idea of trust region methods lies in constructing a simplified model — often a quadratic approximation — that represents the objective function near the current point. This model serves as a surrogate, guiding the search for the optimum within a bounded region around the current estimate. The size of the trust region is dynamically adjusted based on how well the model predicts the actual behavior of the objective function: if the model predicts the behavior of the objective function accurately, the trust region is enlarged, allowing the algorithm to explore a larger area around the current solution; otherwise, it is reduced <ref>Conn, A. R., Gould, N. I. M., & Toint, P. L. (2000). <em>Trust-region methods</em>. Siam.</ref>.


=Algorithm Discussion=
The mathematical framework of trust region methods starts with constructing a local approximation of the objective function and defining a constrained region within which this model is trusted to be accurate. Consider the following local quadratic approximation of <math>f</math> around <math>x_k</math>:


(1) <math>m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2} p^T B_k p</math>

where <math>p</math> represents the step or direction vector from the current point, <math>f_k = f(x_k)</math>, <math>\nabla f_k = \nabla f(x_k)</math>, and <math>B_k</math> is a symmetric matrix.


The first two terms of <math>m_k(p)</math> are designed to match the first two terms of the Taylor series expansion of <math>f(x_k+p)</math>. The expression can also be written as:

<math>m_k(p) = f_k + \nabla f_k^T p + O(\|p\|^2)</math>


where the higher-order terms <math>O(\|p\|^2)</math> are approximated using <math>B_k</math>. It can also be noted that the difference between <math>m_k(p)</math> and <math>f(x_k+p)</math> is of order <math>O(\|p\|^2)</math>, so when <math>p</math> is small the approximation error is small.


For each iteration, seek a solution of the subproblem:

(2) <math>\min_{p}\; m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2} p^T B_k p \quad \text{s.t.}\; \|p\| \leq \Delta_k</math>


where <math>\Delta_k</math> represents the trust region radius and is greater than 0.

Here, <math>p_k</math> represents the step taken at iteration <math>k</math>, so the next iterate can be denoted as <math>x_{k+1} = x_k + p_k</math>. For the moment, <math>\| \cdot \|</math> is the Euclidean norm.


To decide whether to accept the step and how to update the trust region radius <math>\Delta_k</math> at each iteration, the agreement between the model and the objective function is measured by the ratio:

(3) <math>\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)} = \frac{\text{actual reduction}}{\text{predicted reduction}}</math>


If <math>\rho_k</math> is negative, the objective value <math>f(x_k+p_k)</math> is greater than the current value <math>f(x_k)</math>, leading to the rejection of the step. If <math>\rho_k</math> is positive but not close to 1, the trust region remains unchanged. If <math>\rho_k</math> is positive but close to 0, the trust region radius <math>\Delta_k</math> is reduced. On the other hand, if <math>\rho_k</math> is approximately equal to 1, the trust region radius <math>\Delta_k</math> is expanded for the next iteration [1].
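To make this update rule concrete, the following Python sketch outlines one way the basic trust-region loop could be organized. It is only an illustration: the helper <code>solve_subproblem</code> (any inexact solver such as the Cauchy point, dogleg, or CG-Steihaug step described below) and the thresholds 0.25 and 0.75 are assumptions, not fixed choices.

<syntaxhighlight lang="python">
import numpy as np

def trust_region(f, grad, hess, solve_subproblem, x0,
                 delta=1.0, delta_max=10.0, eta=0.15, tol=1e-6, max_iter=200):
    """Generic trust-region loop. `solve_subproblem(g, B, delta)` must return
    an approximate minimizer p of the quadratic model with ||p|| <= delta."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = solve_subproblem(g, B, delta)
        predicted = -(g @ p + 0.5 * p @ B @ p)        # m_k(0) - m_k(p_k)
        actual = f(x) - f(x + p)                      # f(x_k) - f(x_k + p_k)
        rho = actual / predicted
        if rho < 0.25:                                # poor agreement: shrink the region
            delta *= 0.25
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)       # good agreement on the boundary: expand
        if rho > eta:                                 # accept the step only if rho is sufficiently positive
            x = x + p
    return x
</syntaxhighlight>

The approximate subproblem solvers discussed next can be plugged directly into such a loop.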


===Cauchy Point===

As with line search, it is not necessary to solve subproblem (2) exactly, because the model is rebuilt at every iteration; as long as the step lies within the trust region and provides sufficient reduction, the algorithm can still converge. This sufficient reduction can be expressed through the Cauchy point:

(4) <math>p_k^C = -\tau_k \frac{\Delta_k}{\|\nabla f_k\|} \nabla f_k</math>


where

<math>\tau_k = \begin{cases} 1, & \text{if } \nabla f_k^T B_k \nabla f_k \leq 0 \\ \min\!\left( \dfrac{\|\nabla f_k\|^3}{\Delta_k\, \nabla f_k^T B_k \nabla f_k},\, 1 \right), & \text{otherwise} \end{cases}</math>
 
 
Computations often start with the Cauchy point because it is simple to calculate. To achieve better performance, however, it is necessary to integrate curvature information from <math>B_k</math> to refine the step <math>p_k</math>. Consider the following:

<math>p^B = -B_k^{-1} \nabla f_k</math>

the full Newton step, which can be taken whenever <math>\|p^B\| \leq \Delta_k</math>.
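As a small illustration, equation (4) translates almost directly into code. The sketch below assumes <code>g</code>, <code>B</code>, and <code>delta</code> stand for <math>\nabla f_k</math>, <math>B_k</math>, and <math>\Delta_k</math>.

<syntaxhighlight lang="python">
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point (equation (4)): minimizer of the model along -g within ||p|| <= delta."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                                  # non-positive curvature along -g
    else:
        tau = min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g
</syntaxhighlight>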
 
 
===The Dogleg Method===
 
The full Newton step, however, is not feasible when the trust region constraint is active. To handle such cases, the Dogleg method is used to approximate the subproblem (2). This method finds a solution that considers two line segments. The first line is the steepest descent direction:
 
<math>p^U = -\frac{g^T g}{g^T B g}\, g</math>

where <math>g = \nabla f(x_k)</math> and <math>B = B_k</math>.
 
The second segment runs from <math>p^U</math> to <math>p^B</math> for <math>\tau \in [0,2]</math>:

<math>p(\tau) = \begin{cases} \tau p^U, & 0 \leq \tau \leq 1 \\ p^U + (\tau - 1)(p^B - p^U), & 1 \leq \tau \leq 2 \end{cases}</math>

where <math>p^B = -B^{-1} \nabla f(x_k)</math>.
 
The method works well when <math>B_k</math> is positive definite. However, when <math>B_k</math> is indefinite, <math>p^B</math> is not guaranteed to be a meaningful solution, since the method is not well suited to handle negative curvature in <math>B_k</math>.
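A sketch of the dogleg step, written under the assumption that <math>B</math> is symmetric positive definite so the full Newton step exists, might look as follows; the quadratic equation in <math>\tau</math> on the second segment is solved in closed form.

<syntaxhighlight lang="python">
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg approximation to the trust-region subproblem (assumes B is SPD)."""
    p_b = -np.linalg.solve(B, g)                       # full Newton step
    if np.linalg.norm(p_b) <= delta:
        return p_b                                     # Newton step lies inside the region
    p_u = -(g @ g) / (g @ B @ g) * g                   # steepest-descent minimizer of the model
    if np.linalg.norm(p_u) >= delta:
        return delta * p_u / np.linalg.norm(p_u)       # truncate the first segment at the boundary
    d = p_b - p_u                                      # second segment direction
    a, b, c = d @ d, 2.0 * (p_u @ d), p_u @ p_u - delta**2
    s = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)  # s = tau - 1 in [0, 1]
    return p_u + s * d                                 # point on the boundary ||p|| = delta
</syntaxhighlight>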
 


===The Conjugate Gradient Steihaug Method (CG-Steihaug)===


In contrast to the methods above, Steihaug’s method is designed to better handle large-scale trust region subproblems. It mitigates the limitations of slower convergence and the need for a positive definite Hessian. In addition, it handles negative curvature by enabling early termination when such directions are detected or when the trust region boundary is reached. These features make the method more scalable and robust, and well suited for large-scale or poorly conditioned optimization problems <ref>Grippo, L., & Sciandrone, M. (2023). Introduction to Methods for Nonlinear Optimization. In <em>Unitext</em>. Springer International Publishing. https://doi.org/10.1007/978-3-031-26790-1</ref>. The method is described as follows:

given <math>\varepsilon > 0</math>

set <math>p_0 = 0,\; r_0 = g,\; d_0 = -g,\; j = 0</math>;

if <math>\|r_0\| < \varepsilon</math>

:return <math>p = p_0</math>;

for <math>j = 0, 1, 2, \ldots</math>

:if <math>d_j^T B d_j \leq 0</math>

::determine <math>\tau</math> such that <math>p = p_j + \tau d_j</math> minimizes (2) and satisfies <math>\|p\| = \Delta</math>

::return <math>p</math>;

:set <math>\alpha_j = \frac{r_j^T r_j}{d_j^T B d_j}</math>;

:set <math>p_{j+1} = p_j + \alpha_j d_j</math>;

:if <math>\|p_{j+1}\| \geq \Delta</math>

::determine the positive value of <math>\tau</math> such that <math>p = p_j + \tau d_j</math> satisfies <math>\|p\| = \Delta</math>

::return <math>p</math>;

:set <math>r_{j+1} = r_j + \alpha_j B d_j</math>;

:if <math>\|r_{j+1}\| \leq \varepsilon</math>

::return <math>p = p_{j+1}</math>;

:set <math>\beta_{j+1} = \frac{r_{j+1}^T r_{j+1}}{r_j^T r_j}</math>;

:set <math>d_{j+1} = -r_{j+1} + \beta_{j+1} d_j</math>;

end (for)

The initialization of <math>p_0 = 0</math> is fundamental, as the Cauchy point is obtained immediately at <math>p_1</math>; this condition guarantees global convergence.
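The pseudo-code above can be rendered as the following Python sketch; <code>boundary_tau</code> solves for the positive <math>\tau</math> with <math>\|p_j + \tau d_j\| = \Delta</math>, and the symmetric matrix <code>B</code> need not be positive definite.

<syntaxhighlight lang="python">
import numpy as np

def boundary_tau(p, d, delta):
    """Positive root tau of ||p + tau*d|| = delta."""
    a, b, c = d @ d, 2.0 * (p @ d), p @ p - delta**2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def steihaug_cg(g, B, delta, eps=1e-8, max_iter=None):
    """CG-Steihaug: approximate minimizer of g^T p + 0.5 p^T B p subject to ||p|| <= delta."""
    n = g.size
    max_iter = max_iter or 2 * n
    p, r, d = np.zeros(n), g.astype(float), -g.astype(float)
    if np.linalg.norm(r) < eps:
        return p
    for _ in range(max_iter):
        Bd = B @ d
        dBd = d @ Bd
        if dBd <= 0:                                   # negative curvature: stop on the boundary
            return p + boundary_tau(p, d, delta) * d
        alpha = (r @ r) / dBd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:            # step leaves the region: stop on the boundary
            return p + boundary_tau(p, d, delta) * d
        r_next = r + alpha * Bd
        if np.linalg.norm(r_next) <= eps:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p
</syntaxhighlight>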


===Termination Criteria===

The convergence of a trust region algorithm depends on the accuracy of the approximate solution to the subproblem (2). It is desirable that the subproblem be approximated rather than solved exactly, provided the approximation guarantees the algorithm’s convergence <ref>Dutta, S. (2016). <em>Optimization in Chemical Engineering</em>. Cambridge University Press.</ref>. Fletcher proposes that the algorithm terminate when any of the following is small <ref>Fletcher, R. (1987). <em>Practical Methods of Optimization</em> (2e), John Wiley and Sons.</ref>:


# <math>f(x_k) - f(x_{k+1})</math>
# <math>x_k - x_{k+1}</math>
# <math>\nabla f(x_k)</math>


These criteria ensure that the algorithm terminates efficiently while maintaining convergence.

In addition, termination can be considered when <math>\Delta_k</math> is small, as it is possible for <math>\Delta_k</math> to converge to 0 as <math>k</math> approaches infinity.
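For illustration only, these tests might be combined into a single stopping check like the sketch below; the tolerance values are arbitrary placeholders rather than recommended settings.

<syntaxhighlight lang="python">
import numpy as np

def should_terminate(f_prev, f_curr, x_prev, x_curr, grad_curr, delta,
                     tol_f=1e-10, tol_x=1e-10, tol_g=1e-6, tol_delta=1e-12):
    """Stop when the function decrease, the step, the gradient norm,
    or the trust region radius becomes small."""
    return (abs(f_prev - f_curr) < tol_f
            or np.linalg.norm(np.asarray(x_prev) - np.asarray(x_curr)) < tol_x
            or np.linalg.norm(grad_curr) < tol_g
            or delta < tol_delta)
</syntaxhighlight>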


=Numerical example=

===Example One===


This section demonstrates the application of trust region methods using a simple function as a numerical example. Consider the following:

<math>f(x) = (x-2)^2</math>


Our goal is to minimize this function, and clearly the solution is <math>x = 2</math>, where <math>f(2) = 0</math>.

The gradient is <math>f'(x) = 2(x-2)</math> and the Hessian is <math>f''(x) = 2</math>.


'''Iteration 1:'''


To begin, take the initial point <math>x_0 = 0</math> and a trust region radius of <math>1.0</math>. From this initial point, the following is obtained:

Step 1: evaluate the current point:

<math>f(x_0) = (0-2)^2 = 4</math>

Gradient: <math>f'(x_0) = 2(0-2) = -4</math>

Hessian: <math>2</math>

Step 2: form the quadratic model

<math>m_0(p) = f(x_0) + f'(x_0)\,p + \tfrac{1}{2} f''(x_0)\, p^2 = 4 - 4p + p^2</math>

Step 3: solve the subproblem

<math>m_0'(p) = -4 + 2p = 0 \;\Rightarrow\; p = 2</math>

However, <math>p = 2</math> is outside the trust region radius <math>1.0</math>. Hence, the constraint <math>|p| \leq 1</math> is used to find the best step:

<math>m_0(1) = 4 - 4(1) + 1^2 = 1</math>

<math>m_0(-1) = 4 - 4(-1) + (-1)^2 = 9</math>

The lower model value is attained at <math>p = 1</math>.

Step 4: check function reduction

Proposed new point: <math>x_1 = x_0 + p = 0 + 1 = 1</math>

Actual function value at the new point: <math>f(1) = (1-2)^2 = 1</math>

Step 5: compute ratio

Actual reduction: <math>f(x_0) - f(x_1) = 4 - 1 = 3</math>

Predicted reduction: <math>m_0(0) - m_0(1) = 4 - 1 = 3</math>

Hence <math>\rho = \frac{3}{3} = 1</math>

Step 6: update trust region and accept/reject step

Acceptance criteria for the ratio are typically as follows (they may vary): if <math>\rho < 0.25</math>, reduce the trust region radius; if <math>\rho > 0.75</math>, expand the trust region radius.

In the example, <math>\rho = 1 > 0.75</math>, so the step is accepted and the trust region should be expanded. Suppose the trust region radius is doubled (i.e., to <math>2.0</math>).


'''Iteration 2:'''

The point is now <math>x_1 = 1</math> and the radius is <math>2.0</math>. From this point, the following is obtained:

Step 1: evaluate the current point:

<math>f(1) = 1</math>

Gradient: <math>f'(1) = 2(1-2) = -2</math>

Hessian: <math>2</math>

Step 2: form the quadratic model

<math>m_1(p) = f(x_1) + f'(x_1)\,p + \tfrac{1}{2} f''(x_1)\, p^2 = 1 - 2p + p^2</math>

Step 3: solve the subproblem

<math>m_1'(p) = -2 + 2p = 0 \;\Rightarrow\; p = 1</math>

Since <math>|p| = 1 \leq 2</math>, the step <math>p = 1</math> is valid.

Step 4: check function reduction

Proposed new point: <math>x_2 = x_1 + 1 = 2</math>

Actual function value at the new point: <math>f(2) = (2-2)^2 = 0</math>

Step 5: compute ratio

Actual reduction: <math>f(1) - f(2) = 1 - 0 = 1</math>

Predicted reduction: <math>m_1(0) - m_1(1) = 1 - 0 = 1</math>

Hence <math>\rho = \frac{1}{1} = 1</math>

Step 6: update trust region and accept/reject step

In the example, since <math>\rho = 1 > 0.75</math>, the proposed step is accepted. By taking this step, the exact minimum at <math>x = 2</math> is reached, and the algorithm therefore terminates.
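The two iterations above can be reproduced with a few lines of Python. This is only a sketch that hard-codes the one-dimensional quadratic model of <math>f(x) = (x-2)^2</math> and the radius-doubling rule used in the example.

<syntaxhighlight lang="python">
def f(x):
    return (x - 2.0) ** 2

x, delta = 0.0, 1.0                           # initial point and trust region radius
for k in (1, 2):                              # reproduce iterations 1 and 2
    g, h = 2.0 * (x - 2.0), 2.0               # gradient and Hessian of f at x
    model = lambda p: f(x) + g * p + 0.5 * h * p * p
    p = max(-delta, min(delta, -g / h))       # model minimizer clipped to |p| <= delta
    rho = (f(x) - f(x + p)) / (model(0.0) - model(p))
    print(f"iteration {k}: step {p}, rho {rho}")
    if rho > 0.75:                            # good agreement: accept the step and expand
        x, delta = x + p, 2.0 * delta
    elif rho < 0.25:                          # poor agreement: shrink the region
        delta *= 0.5
    else:                                     # moderate agreement: accept, keep the radius
        x = x + p
print(f"final point: {x}")                    # reaches x = 2.0, the exact minimizer
</syntaxhighlight>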


===Example Two - Himmelblau’s Function===

The following demonstrates a more complex example of trust region methods. Himmelblau’s function is a well-known mathematical function used extensively in optimization testing and research <ref>Himmelblau, D. M. (2018). <em>Applied nonlinear programming</em>. McGraw-Hill.</ref>. It is defined in two real variables and is characterized by having multiple global minima. The function is non-convex and multimodal, making it an excellent candidate for demonstrating the effectiveness of optimization algorithms. The following uses Steihaug’s method.

The function is defined as the following:


<math>f(x,y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2</math>

where <math>x</math> and <math>y</math> are real numbers.

The four global minima of the function are at:


# <math>(3.0, 2.0)</math>
# <math>(-2.805, 3.131)</math>
# <math>(-3.779, -3.283)</math>
# <math>(3.584, -1.848)</math>

[[File:Trust-Region-Radius.png|300px|right|thumb|The figure illustrates how the trust region radius adjusts throughout the optimization process, reflecting the optimizer’s confidence in the model’s predictions.]]




Due to the complexities in the calculations involved in the trust region method algorithm, the '''minimize''' function with the '''<nowiki/>'trust-constr'''' method from the SciPy library in Python can be utilized to solve this problem. Consider the following implementation:

'''minimize(method='trust-constr')'''
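A minimal sketch of such a call is given below; the exact iterates, radii, and iteration counts depend on the SciPy version and its default settings, so they may not match the trace quoted in the iterations that follow exactly.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def himmelblau(v):
    x, y = v
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def himmelblau_grad(v):
    x, y = v
    return np.array([4*x*(x**2 + y - 11) + 2*(x + y**2 - 7),
                     2*(x**2 + y - 11) + 4*y*(x + y**2 - 7)])

x0 = np.array([-4.0, 0.0])                    # initial guess used in this example
res = minimize(himmelblau, x0, jac=himmelblau_grad, method='trust-constr',
               options={'initial_tr_radius': 1.0, 'verbose': 2})
print(res.x, res.fun)                         # expected to approach (-3.779, -3.283) with f = 0
</syntaxhighlight>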


'''Iteration 1:'''


To calculate, begin the optimization with the initial guess <math>(-4.0, 0)</math> and trust region radius <math>1.0</math>. The optimal solution achieved from this iteration is <math>(-3.382, -0.786)</math>, yielding a function value of <math>95.453</math>. The initial point gives a sufficient prediction; thus, the trust region radius is increased to <math>7.0</math>.


[[File:Trust-Region-Heatmap.png|300px|right|thumb|The figure shows the trajectory of the optimizer from the initial guess to the found minimum over the contour plot of Himmelblau’s function.]]




'''Iteration 2:'''

In this iteration, the updated point <math>(-3.382, -0.786)</math> and radius <math>7.0</math> are used to find a new step. However, an insufficient prediction was obtained, leading to a decrease in the trust region radius to <math>0.7</math>. The optimal solution obtained from the previous iteration is retained for the next step.


'''Iteration 3:'''

With the updated point <math>(-3.382, -0.786)</math> and trust region radius <math>0.7</math>, a new point at <math>(-3.072, -1.413)</math> was obtained.

The following shows the remaining iterations: <br>
[[File:Iterations.png|500px|Caption]] <br>
Convergence is achieved at iteration 11 with <math>x = -3.779,\, y = -3.283</math> and <math>f(x,y) = 0</math>.




=Applications=
Trust region methods are applied in optimization problems across various domains due to their robustness, adaptability, and efficiency in handling complex constraints. Their ability to balance local and global optimization strategies makes them invaluable in solving high-dimensional, nonlinear, and constrained problems. These application areas include:


===Engineering===
Trust region methods are widely used in engineering, particularly in structural optimization, to improve the stability and efficiency of design processes. These methods address challenges in high-dimensional design problems by iteratively refining solutions within defined trust regions, making them effective for complex objectives such as optimizing material distribution, minimizing stress, or improving flow geometries. For instance, Yano et al. <ref>Yano, M., Huang, T., & Zahr, M. J. (2021). A globally convergent method to accelerate topology optimization using on-the-fly model reduction. <em>Computer Methods in Applied Mechanics and Engineering</em>, 375, 113635–113635. https://doi.org/10.1016/j.cma.2020.113635</ref> introduce a globally convergent method to accelerate topology optimization with trust region frameworks. This approach significantly reduces computational costs while maintaining accuracy in optimal design solutions.


===Finance===
Trust region methods are also increasingly applied in the financial sector, as they ensure stable and reliable convergence even when dealing with complex, nonlinear, and volatile financial data. For example, in the study by Phua et al. <ref>Phua, P. K. H., Xiaotian Zhu, & Chung Haur Koh. (n.d.). Forecasting stock index increments using neural networks with trust region methods. <em>Proceedings of the International Joint Conference on Neural Networks</em>, 2003. https://doi.org/10.1109/ijcnn.2003.1223354</ref>, trust region methods were employed in the optimization of neural network training. This approach significantly improved the forecasting accuracy of major stock indices like the DAX and NASDAQ by effectively managing high-dimensional and noisy financial datasets.


===Machine Learning===
Algorithms like ACKTR (Actor-Critic using Kronecker-Factored Trust Region) use trust regions to improve training stability and sample efficiency in deep reinforcement learning. Specifically, ACKTR uses an approximation of the natural gradient to maintain a trust region during updates, preventing large and unstable parameter changes. This facilitates applications in robotics and autonomous systems <ref>Wu, Y., Elman Mansimov, Grosse, R. B., Liao, S., & Ba, J. (2017). Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. <em>Neural Information Processing Systems</em>, 30, 5279–5288.</ref>.


=Conclusion=
Trust region methods provide a robust and efficient framework for solving optimization problems through the construction of a region around the current solution within which a simplified model accurately approximates the objective function. This approach ensures convergence, even for nonlinear and complex problems encountered in fields such as engineering. The algorithm’s advantage lies in its adaptability, dynamically adjusting the trust region based on the accuracy of the model’s predictions, thereby ensuring global and, in some cases, superlinear convergence <ref>Zhang, J., Wu, L., & Zhang, X. (2007). A Trust Region Method for Optimization Problem with Singular Solutions. <em>Applied Mathematics and Optimization</em>, 56(3), 379–394. https://doi.org/10.1007/s00245-007-9009-6</ref>. Future research directions may involve improving the scalability of trust region methods for high-dimensional problems, integrating them with machine learning frameworks for better predictive modeling <ref>Lin, C.-J., Weng, R. C., & S. Sathiya Keerthi. (2008). Trust Region Newton Method for Logistic Regression. <em>Journal of Machine Learning Research</em>, 9(22), 627–650. https://doi.org/10.1145/1390681.1390703</ref>, and developing more efficient techniques to handle constraints and uncertainties in real-world applications.
 
=References=
<references />
