Trust-region methods

Author: Tung-Yen Wang (tw565) (CHEME 6800, Fall 2024)
 
Stewards: Nathan Preuss, Wei-Han Chen, Tianqi Xiao, Guoqing Hu
 
=Introduction=


Trust region methods are iterative optimization techniques designed to find local minima or maxima of objective functions, particularly in nonlinear programming (NLP) problems, by repeatedly refining approximations within dynamically adjusted trust regions<ref>Nocedal, J., & Wright, S. J. (2006). <em>Numerical optimization</em>. Springer.</ref>. Unlike line search methods, which determine the step size along a predefined direction, trust region methods concurrently optimize both the direction and the magnitude of the step within a specified neighborhood around the current iterate.
 
The origins of these methods date back to Levenberg, who introduced a modified version of the Gauss-Newton method for solving nonlinear least squares problems by adding a damping term to control the step size. This approach was later rediscovered and refined by Morrison and Marquardt, ultimately gaining prominence as the Levenberg-Morrison-Marquardt method <ref>Yuan, Y. (2015b). Recent advances in trust region algorithms. <em>Mathematical Programming</em>, 151(1), 249–281. https://doi.org/10.1007/s10107-015-0893-2</ref>. Subsequent advancements led to the development and standardization of trust region methods, further improving their robustness and application.
 
The core idea of trust region methods lies in constructing a simplified model, often a quadratic approximation, that represents the objective function near the current point. This model serves as a surrogate, guiding the search for the optimum within a bounded region around the current estimate. The size of the trust region is dynamically adjusted based on how well the model predicts the actual behavior of the objective function. If the model predicts the behavior of the objective function accurately, the size of the trust region is increased, allowing the algorithm to explore a larger area around the current solution; otherwise, the trust region is reduced <ref>Conn, A. R., Gould, N. I. M., & Toint, P. L. (2000). <em>Trust-region methods</em>. Siam.</ref>.
 
=Algorithm Discussion=
The mathematical framework involved in trust region methods starts with the construction of a local approximation of the objective function and defining a constrained region within which this model is trusted to be accurate. Consider the following local quadratic approximation of <math>f</math> around <math>x_k</math>:
 
(1) <math>m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2} p^T B_k p
</math>
 
where <math>p
</math> represents the step or direction vector from the current point, <math>f_k=f(x_k)
</math>, <math>\nabla f_k = \nabla f(x_k)
</math>, and <math>B_k
</math> is a symmetric matrix.
 
The first two terms of <math>m_k(p)</math> are designed to match the first two terms of the Taylor series expansion of <math>f(x_k+p)</math>, which can be written as:
 
<math>f(x_k+p) = f_k + \nabla f_k^T p + O(||p||^2)</math>
 
where the higher-order terms <math>O(||p||^2)</math> are approximated in the model by the quadratic term involving <math>B_k</math>. It can also be noted that the difference between <math>m_k(p)</math> and <math>f(x_k+p)</math> is of order <math>O(||p||^2)</math>, so when <math>p</math> is small, the approximation error is small.
 
 
For each iteration, seek a solution of the subproblem:
 
(2) <math>\min_{p} \; m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2} p^T B_k p</math>&nbsp;&nbsp;&nbsp;<math>s.t.\; ||p|| \leq \Delta_k</math>
 
where <math>\Delta_k
</math> represents the trust region radius and is greater than 0.
 
Here, <math>p_k</math> represents the step taken at iteration <math>k</math>, hence the next point can be denoted as <math>x_{k+1}=x_k+p_k</math>. For the moment, <math>|| \cdot ||</math> is the Euclidean norm.
 
 
To decide whether to accept the step and how to update the trust region radius <math>\Delta_k</math> at each iteration, the following ratio of actual to predicted reduction is evaluated:
 
(3) <math>\rho_k=\frac{f(x_k) - f(x_k+p_k)}{m_k(0) - m_k(p_k)} = \frac{actual\; reduction}{predicted\; reduction}</math>
 
If <math>\rho_k</math> is negative, the objective value <math>f(x_k+p_k)</math> would be greater than the current value <math>f(x_k)</math>, leading to the rejection of the step. If <math>\rho_k</math> is positive but close to 0, the trust region radius <math>\Delta_k</math> is reduced; if <math>\rho_k</math> is positive but not close to 1, the trust region remains unchanged. On the other hand, if <math>\rho_k</math> is approximately equal to 1, the trust region radius <math>\Delta_k</math> is expanded for the next iteration [1].
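
To make the update rule concrete, the following is a minimal Python sketch of a generic trust-region iteration under the rules above; the helper <code>solve_subproblem</code> is a placeholder for any approximate subproblem solver (Cauchy point, dogleg, or CG-Steihaug, described below), and the thresholds 0.25 and 0.75 and the acceptance parameter are common but adjustable choices rather than values prescribed here.

<syntaxhighlight lang="python">
import numpy as np

def trust_region_minimize(f, grad, hess, x0, solve_subproblem,
                          delta0=1.0, delta_max=10.0, eta=0.15,
                          tol=1e-6, max_iter=100):
    """Generic trust-region loop; solve_subproblem(g, B, delta) must return
    an approximate minimizer of the quadratic model with ||p|| <= delta."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:              # stop when the gradient is small
            break
        p = solve_subproblem(g, B, delta)        # approximate step within the region
        predicted = -(g @ p + 0.5 * p @ B @ p)   # m(0) - m(p)
        actual = f(x) - f(x + p)                 # f(x_k) - f(x_k + p_k)
        rho = actual / predicted if predicted > 0 else -1.0
        if rho < 0.25:                           # poor agreement: shrink the region
            delta *= 0.25
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)  # good agreement at the boundary: expand
        if rho > eta:                            # accept the step only if reduction is sufficient
            x = x + p
    return x
</syntaxhighlight>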
 
 
===Cauchy Point===
 
As with line search, the subproblem (2) does not need to be solved exactly for the overall algorithm to make progress, since the model is rebuilt at each iteration; as long as the step lies in the trust region and provides a sufficient reduction of the model, the algorithm can converge to a solution. This sufficient reduction can be achieved with the Cauchy point, that is:
 
(4) <math>p_k^C =-\tau_k\frac{\Delta_k}{||\nabla f_k||}\nabla f_k</math>
 
where
 
<math>\tau_k= \begin{cases} 1, & \text{if }\nabla f_k^TB_k\nabla f_k \leq 0\\ min(\frac{||\nabla f_k||^3}{\Delta _k\nabla f_k^TB_k\nabla f_k}, 1), & \text{if otherwise} \end{cases}
</math>
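
In code, equation (4) can be translated into a short routine such as the following Python sketch (the function name and arguments are illustrative); it can also serve directly as the subproblem solver in the generic loop sketched above.

<syntaxhighlight lang="python">
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point p_k^C = -tau_k * (delta_k / ||g||) * g, following equation (4)."""
    g_norm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                                    # non-positive curvature along -g: step to the boundary
    else:
        tau = min(g_norm ** 3 / (delta * gBg), 1.0)  # otherwise stop at the model minimizer along -g
    return -tau * (delta / g_norm) * g
</syntaxhighlight>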
 
 
Computations often start with the Cauchy point as it is simple to calculate. To achieve better performance, however, it is necessary to incorporate curvature information from <math>B_k</math> to refine <math>p_k</math>. Consider the following:
 
<math>p^B= -B_k^{-1} \nabla f_k</math>
 
as the full Newton step, which can be taken directly whenever it lies within the trust region, i.e. <math>||p^B||\leq \Delta_k</math>.
 
 
===The Dogleg Method===
 
The full Newton step, however, is not feasible when the trust region constraint is active. To handle such cases, the Dogleg method is used to approximate the subproblem (2). This method finds a solution that considers two line segments. The first line is the steepest descent direction:
 
<math>p^U =-\frac{g^Tg}{g^TBg}g</math>
 
where <math>g= \nabla f(x_k)
</math>.


The second line runs from <math>p^U
</math> to <math>p^B
</math> for <math>\tau\in [0,2] 
</math>:


<math>p(\tau)= \begin{cases} \tau p^U & 0\leq \tau\leq 1\\ p^U+(\tau -1)(p^B-p^U) & 1\leq \tau\leq 2\end{cases}  
</math>  


where <math>p^B= -B^{-1}\nabla f(x_k)</math>


The method works well when <math>B_k</math> is positive definite. When <math>B_k</math> is indefinite, however, <math>p^B</math> is not guaranteed to be a meaningful solution, since the dogleg construction is not well suited to handling directions of negative curvature.
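
A possible dogleg implementation is sketched below, assuming <math>B</math> is positive definite; it follows the two-segment path <math>p(\tau)</math> above and, when the full Newton step does not fit inside the region, returns the point where the path crosses the trust region boundary.

<syntaxhighlight lang="python">
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg approximation of min m(p) s.t. ||p|| <= delta, assuming B is positive definite."""
    pB = -np.linalg.solve(B, g)                 # full Newton step
    if np.linalg.norm(pB) <= delta:
        return pB                               # Newton step already lies inside the region
    pU = -(g @ g) / (g @ B @ g) * g             # model minimizer along the steepest descent direction
    if np.linalg.norm(pU) >= delta:
        return delta * pU / np.linalg.norm(pU)  # even the first segment leaves the region: scale it back
    # otherwise find tau - 1 in (0, 1) with ||pU + (tau - 1)(pB - pU)|| = delta
    d = pB - pU
    a, b, c = d @ d, 2.0 * (pU @ d), pU @ pU - delta ** 2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return pU + t * d
</syntaxhighlight>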




===The Conjugate Gradient Steihaug’s Method (CG-Steihaug)===


In contrast to the methods above, Steihaug’s method is designed to better handle large-scale trust region subproblems. It mitigates the limitations of slower convergence and the need for a positive definite Hessian. In addition, it handles negative curvature by enabling early termination when such directions are detected or when the trust region boundary is reached. These features make the method more scalable and robust, suited for large-scale or poorly conditioned optimization problems <ref>Grippo, L., & Sciandrone, M. (2023). Introduction to Methods for Nonlinear Optimization. In <em>Unitext</em>. Springer International Publishing. https://doi.org/10.1007/978-3-031-26790-1</ref>. The method is described as follows:


given <math>\varepsilon > 0
</math>  


set <math>p_0 = 0, r_0= g, d_0=-g,j=0
</math>;


if <math>||r_0||< \varepsilon
</math>  


:return <math>p = p_0</math>;


for <math>j = 0, 1, 2, \ldots</math>,


:if <math>d_j^TBd_j\leq0
</math>  


::determine <math>\tau
</math> such that <math>p=p_j+\tau d_j
</math> minimizes (2) and satisfies <math>||p|| = \Delta
</math>


::return <math>p
</math>;


:set <math>\alpha _j =\frac{r_j^Tr_j}{d_j^TBd_j}
</math>;


:set <math>p_{j+1} = p_j + \alpha _jd_j</math>;


:if <math>||p_{j+1}|| \geq \Delta
</math>;


::determine the positive value of <math>\tau
</math> such that <math>p=p_j+\tau d_j
</math> and satisfies <math>||p|| = \Delta
</math>


::return <math>p
</math>;


:set <math>r_{j+1}=r_j+\alpha_jBd_j
</math>;


:if <math>||r_{j+1}|| \leq \varepsilon
</math>  


::return <math>p=p_{j+1}
</math>;


:set <math>\beta_{j+1}=\frac{r_{j+1}^Tr_{j+1}}{r_j^Tr_j}</math>;


:set <math>d_{j+1}= -r_{j+1}+\beta_{j+1}d_j
</math>;


end (for)


The initialization of <math>p_0 = 0
</math> is fundamental, as the Cauchy Point is obtained right after in <math>p_1
</math>; this condition guarantees global convergence. 
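
The pseudocode above can be translated into Python roughly as follows; this is a sketch in which the helper <code>boundary_tau</code> solves the scalar quadratic for the positive <math>\tau</math> that places the iterate on the trust region boundary.

<syntaxhighlight lang="python">
import numpy as np

def boundary_tau(p, d, delta):
    """Positive tau such that ||p + tau d|| = delta (root of a scalar quadratic)."""
    a, b, c = d @ d, 2.0 * (p @ d), p @ p - delta ** 2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def cg_steihaug(g, B, delta, eps=1e-8, max_iter=250):
    """Approximately solve min m(p) s.t. ||p|| <= delta by truncated conjugate gradients."""
    g = np.asarray(g, dtype=float)
    p, r, d = np.zeros_like(g), g.copy(), -g.copy()
    if np.linalg.norm(r) < eps:
        return p
    for _ in range(max_iter):
        dBd = d @ B @ d
        if dBd <= 0:                                   # negative curvature: follow d to the boundary
            return p + boundary_tau(p, d, delta) * d
        alpha = (r @ r) / dBd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:            # step would leave the region: stop on the boundary
            return p + boundary_tau(p, d, delta) * d
        r_next = r + alpha * (B @ d)
        if np.linalg.norm(r_next) < eps:               # residual small enough: interior solution found
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p
</syntaxhighlight>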




===Termination Criteria===


The convergence of a trust region algorithm (1) depends on the accuracy of the approximate solution to the subproblem (2). It is desirable that the subproblem is approximated rather than solved exactly, provided the approximation guarantees the algorithm’s convergence <ref>Dutta, S. (2016). <em>Optimization in Chemical Engineering</em>. Cambridge University Press.</ref>. Fletcher proposes that the algorithm terminates when either of the following are small <ref>Fletcher, R. (1987). <em>Practical Methods of Optimization</em> (2e), John Wiley and Sons.</ref>:


# <math>f(x_k)-f(x_{k+1})
</math>
# <math>x_k-x_{k+1}
</math>
# <math>\nabla f(x_k)
</math>


These criteria ensure that the algorithm terminates efficiently while maintaining convergence.


In addition, termination can be considered when <math>\Delta_k
</math> is small as it is possible for <math>\Delta_k
</math> to converge to 0 when <math>k
</math> approaches infinity.
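
As an illustration, these checks might be combined into a single stopping test such as the following sketch, where the tolerance values are illustrative rather than taken from the sources above.

<syntaxhighlight lang="python">
import numpy as np

def should_terminate(f_prev, f_curr, x_prev, x_curr, grad_curr, delta,
                     f_tol=1e-10, x_tol=1e-10, g_tol=1e-8, delta_tol=1e-12):
    """Stop when the decrease in f, the step, the gradient, or the radius becomes small."""
    return (abs(f_prev - f_curr) < f_tol
            or np.linalg.norm(np.asarray(x_prev) - np.asarray(x_curr)) < x_tol
            or np.linalg.norm(grad_curr) < g_tol
            or delta < delta_tol)
</syntaxhighlight>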


=Numerical example=


===Example One===


This section introduces trust region methods and demonstrates their application using a simple function as a numerical example. Consider the following:


<math>f(x)=(x-2)^2
</math>


Our goal is to minimize this function, and clearly, the solution is <math>x=2
</math> where <math>f(2)=0
</math>.


The gradient is <math>f'(x)=2(x-2)
</math> and the Hessian is <math>f''(x)=2
</math>.




'''Iteration 1:'''


To calculate, begin at the initial point <math>x_0=0</math> with a trust region radius of <math>1.0</math>. From this initial point, the following is obtained:


Step 1: evaluate the current point:


<math>f(x_0)=(0-2)^2=4
</math>


Gradient: <math>f'(x_0)=2(0-2)=-4
</math>


Hessian: <math>2
</math>


Step 2: form the quadratic model


<math>m_0(p) = f(x_0) + f'(x_0) p + \frac{1}{2} f''(x_0) p^2 = 4 + (-4)p + \frac{1}{2}(2)p^2=4-4p+p^2</math>


Step 3: solve the subproblem


<math>m_0'(p)=-4+2p=0  
</math>        <math>p=2
</math>


However, <math>p=2
</math> is outside the trust region radius <math>1.0
</math>. Hence, <math>|p|\leq 1
</math> will be utilized to find the best step.


<math>m_0(1)=4-4(1)+1^2=1
</math>


<math>m_0(-1)=4-4(-1)+(-1)^2=9
</math>


The lower model value is attained at <math>p=1</math>


Step 4: check function reduction


Proposed new point: <math>x_1=x_0+p=0+1=1
</math>


Actual function at new point: <math>f(1)=(1-2)^2=1
</math>


Step 5: compute ratio


Actual reduction: <math>f(x_0)-f(x_1)=4-1=3
</math>


Predicted reduction: <math>m_0(0)-m_0(1)=4-1=3</math>


Hence <math>\rho=\frac{3}{3}=1</math>


Step 6: update trust region and accept/reject step


Acceptance criteria for the ratio are typically as follows (exact thresholds may vary):


If <math>\rho<0.25</math>, reduce the trust region radius; if <math>\rho>0.75</math>, expand the trust region radius.


In this example, <math>\rho=1</math> is greater than <math>0.75</math>, so the trust region should be expanded and the step is accepted. Suppose the trust region radius is doubled (i.e., to <math>2.0</math>).




'''Iteration 2: '''


The current point is <math>x_1=1</math> and the trust region radius is <math>2.0</math>. From this point, the following is obtained:


Step 1: evaluate the current point:


<math>f(1)=1
</math>
 
Gradient: <math>f'(1)=2(1-2)=-2
</math>
 
Hessian: <math>2
</math>
 
Step 2: form the quadratic model
 
<math>m_1(p) = f(x_1) + f'(x_1) p + \frac{1}{2} f''(x_1) p^2= 1-2p+p^2</math>
 
Step 3: solve the subproblem


<math>m_1'(p)= -2+2p =0
</math>&nbsp;&nbsp;&nbsp;<math>p=1
</math>


Since <math>|p|= 1 \leq 2</math>, the step <math>p=1</math> lies within the trust region and is valid.


Step 4: check function reduction


Proposed new point: <math>x_2=x_1+1=2
</math>


Actual function at new point: <math>f(2)=(2-2)^2=0</math>


Step 5: compute ratio


Actual reduction: <math>f(1)-f(2)=1-0=1
</math>


Predicted reduction: <math>m_1(0)-m_1(1)=1-0=1</math>


Hence <math>\rho=\frac{1}{1}=1</math>


Step 6: update trust region and accept/reject step


In this example, since <math>\rho=1</math> is greater than <math>0.75</math>, the proposed step is accepted. By taking this step, the exact minimum at <math>x=2</math> is reached, and the algorithm therefore terminates.
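
The two iterations above can be reproduced with a few lines of Python; the script below is a sketch specialized to <math>f(x)=(x-2)^2</math> with the same starting point, initial radius, and update thresholds, and in one dimension the subproblem is solved exactly by clipping the unconstrained minimizer of the model to the trust region.

<syntaxhighlight lang="python">
def f(x):
    return (x - 2.0) ** 2

def grad(x):
    return 2.0 * (x - 2.0)

hess = 2.0                                     # constant second derivative of f

x, delta = 0.0, 1.0                            # initial point and trust region radius
for k in range(1, 3):
    g = grad(x)
    p = max(-delta, min(delta, -g / hess))     # model minimizer clipped to [-delta, delta]
    m0 = f(x)                                  # m(0) equals f(x_k)
    mp = f(x) + g * p + 0.5 * hess * p ** 2    # model value at the step
    rho = (f(x) - f(x + p)) / (m0 - mp)        # actual over predicted reduction
    if rho > 0.75:
        delta *= 2.0                           # good agreement: expand the region
    elif rho < 0.25:
        delta *= 0.5                           # poor agreement: shrink the region
    x = x + p                                  # rho equals 1 here, so the step is accepted
    print(f"iteration {k}: x = {x}, f(x) = {f(x)}, rho = {rho}, new radius = {delta}")
</syntaxhighlight>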


===Example Two - Himmelblau’s Function===
The following demonstrates a more complex example of trust region methods. Himmelblau’s function is a well-known mathematical function used extensively in optimization testing and research <ref>Himmelblau, D. M. (2018). <em>Applied nonlinear programming</em>. McGraw-Hill.</ref>. It is defined in two real variables and is characterized by having multiple global minima. The function is non-convex and multimodal, making it an excellent candidate for demonstrating the effectiveness of optimization algorithms. The following uses Steihaug’s method.


The function is defined as the following:


<math>f(x,y)=(x^2+y-11)^2+(x+y^2-7)^2</math>


where <math>x
</math> and <math>y
</math> are real numbers.


The four global minima of the function are at:


# <math>(3.0, 2.0)</math>
# <math>(-2.805, 3.131)</math>
# <math>(-3.779, -3.283)</math>
# <math>(3.584, -1.848)</math>
[[File:Trust-Region-Radius.png|300px|right|thumb|The figure illustrates how the trust region radius adjusts throughout the optimization process, reflecting the optimizer’s confidence in the model’s predictions.]]




Due to the complexities in the calculations involved in the trust region method algorithm, the '''minimize''' function with the '''<nowiki/>'trust-constr'''' method from the SciPy library in Python can be utilized to solve this problem. Consider the following implementation:


<code>minimize(method='trust-constr')</code>
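
A minimal script along these lines is sketched below; the starting point matches the text, the option <code>initial_tr_radius</code> sets the initial trust region radius, and the exact sequence of iterates will depend on SciPy's version and internal settings.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def himmelblau(v):
    x, y = v
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def himmelblau_grad(v):
    x, y = v
    return np.array([
        4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7),   # df/dx
        2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7),   # df/dy
    ])

x0 = np.array([-4.0, 0.0])                              # initial guess used in the text
result = minimize(himmelblau, x0, method='trust-constr', jac=himmelblau_grad,
                  options={'initial_tr_radius': 1.0, 'verbose': 2})
print(result.x, result.fun)                             # expected to land near (-3.779, -3.283)
</syntaxhighlight>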


'''Iteration 1:'''


To calculate, begin the optimization with the initial guess <math>(-4.0, 0)</math> and a trust region radius of <math>1.0</math>. The optimal solution achieved from this iteration is <math>(-3.382, -0.786)</math>, yielding a function value of <math>95.453</math>. The initial point gives a sufficient prediction, so the trust region radius is increased to <math>7.0</math>.


[[File:Trust-Region-Heatmap.png|300px|right|thumb|The figure shows the trajectory of the optimizer from the initial guess to the found minimum over the contour plot of Himmelblau’s function.]]




'''Iteration 2:'''
In this iteration, the updated point <math>(-4.0, 0)
</math> and radius <math>7.0
</math> are used to find a new step. However, an insufficient prediction was obtained, leading to a decrease in the trust region radius to <math>0.7
</math>. The optimal solution obtained from the previous iteration is retained for the next step.




'''Iteration 3:'''


With the updated point <math>(-3.382, -0.786)</math> and trust region radius <math>0.7</math>, a new point at <math>(-3.072, -1.413)</math> was obtained.


The following shows the remaining iterations: <br>
[[File:Iterations.png|500px|Caption]] <br>
Convergence is achieved at iteration 11, with <math>x=-3.779, y= -3.283</math> and <math>f(x) = 0</math>.




=Applications=
Trust region methods are applied in optimization problems across various domains due to their robustness, adaptability, and efficiency in handling complex constraints. Their ability to balance local and global optimization strategies makes them invaluable in solving high-dimensional, nonlinear, and constrained problems. These application areas include:


===Engineering===
Trust region methods are widely used in engineering, particularly in structural optimization, to improve the stability and efficiency of design processes. These methods address challenges in high-dimensional design problems by iteratively refining solutions within defined trust regions, making them effective for complex objectives such as optimizing material distribution, minimizing stress, or improving flow geometries. For instance, Yano et al. <ref>Yano, M., Huang, T., & Zahr, M. J. (2021). A globally convergent method to accelerate topology optimization using on-the-fly model reduction. <em>Computer Methods in Applied Mechanics and Engineering</em>, 375, 113635–113635. https://doi.org/10.1016/j.cma.2020.113635</ref> introduce a globally convergent method to accelerate topology optimization with trust region frameworks. This approach significantly reduces computational costs while maintaining accuracy in optimal design solutions.


===Finance===
Trust region methods are also increasingly applied in the financial sector, as they ensure stable and reliable convergence even when dealing with complex, non-linear, and volatile financial data. For example, in the study by Phua et al. <ref>Phua, P. K. H., Xiaotian Zhu, & Chung Haur Koh. (n.d.). Forecasting stock index increments using neural networks with trust region methods. <em>Proceedings of the International Joint Conference on Neural Networks</em>, 2003. https://doi.org/10.1109/ijcnn.2003.1223354</ref>, trust region methods were employed during the optimization of neural network training. This approach significantly improved the forecasting accuracy of major stock indices such as the DAX and NASDAQ by effectively managing high-dimensional and noisy financial datasets.


===Machine Learning===
Algorithms like ACKTR (Actor-Critic using Kronecker-Factored Trust Region) use trust regions to improve training stability and sample efficiency in deep reinforcement learning. Specifically, ACKTR uses an approximation of the natural gradient to maintain a trust region during updates, preventing large and unstable parameter changes. These properties facilitate applications in robotics and autonomous systems <ref>Wu, Y., Elman Mansimov, Grosse, R. B., Liao, S., & Ba, J. (2017). Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. <em>Neural Information Processing Systems</em>, 30, 5279–5288.</ref>.


=Conclusion=
Trust region methods provide a robust and efficient framework for solving optimization problems through the construction of a region around the current solution where an accurate simplified model approximates the objective function. This approach ensures convergence, even for nonlinear and complex problems encountered in fields such as engineering. The algorithm’s advantage lies in its adaptability, dynamically adjusting the trust region based on the accuracy of the model’s predictions, thereby ensuring global and, in some cases, superlinear convergence <ref>Zhang, J., Wu, L., & Zhang, X. (2007). A Trust Region Method for Optimization Problem with Singular Solutions. <em>Applied Mathematics and Optimization</em>, 56(3), 379–394. https://doi.org/10.1007/s00245-007-9009-6</ref>. Future research directions may involve improving the scalability of trust region methods for high-dimensional problems, integrating them with machine learning frameworks for better predictive modeling <ref>Lin, C.-J., Weng, R. C., & S. Sathiya Keerthi. (2008). Trust Region Newton Method for Logistic Regression. <em>Journal of Machine Learning Research</em>, 9(22), 627–650. https://doi.org/10.1145/1390681.1390703</ref>, and developing more efficient techniques to handle constraints and uncertainties in real-world applications.
=References=

# Nocedal, J., & Wright, S. J. (2006). ''Numerical Optimization''. Springer.
# Yuan, Y. (2015). Recent advances in trust region algorithms. ''Mathematical Programming'', 151(1), 249–281. https://doi.org/10.1007/s10107-015-0893-2
# Conn, A. R., Gould, N. I. M., & Toint, P. L. (2000). ''Trust-Region Methods''. SIAM.
# Grippo, L., & Sciandrone, M. (2023). ''Introduction to Methods for Nonlinear Optimization''. Springer International Publishing. https://doi.org/10.1007/978-3-031-26790-1
# Dutta, S. (2016). ''Optimization in Chemical Engineering''. Cambridge University Press.
# Fletcher, R. (1987). ''Practical Methods of Optimization'' (2nd ed.). John Wiley and Sons.
# Himmelblau, D. M. (2018). ''Applied Nonlinear Programming''. McGraw-Hill.
# Yano, M., Huang, T., & Zahr, M. J. (2021). A globally convergent method to accelerate topology optimization using on-the-fly model reduction. ''Computer Methods in Applied Mechanics and Engineering'', 375, 113635. https://doi.org/10.1016/j.cma.2020.113635
# Phua, P. K. H., Zhu, X., & Koh, C. H. (2003). Forecasting stock index increments using neural networks with trust region methods. ''Proceedings of the International Joint Conference on Neural Networks''. https://doi.org/10.1109/ijcnn.2003.1223354
# Wu, Y., Mansimov, E., Grosse, R. B., Liao, S., & Ba, J. (2017). Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. ''Neural Information Processing Systems'', 30, 5279–5288.
# Zhang, J., Wu, L., & Zhang, X. (2007). A trust region method for optimization problem with singular solutions. ''Applied Mathematics and Optimization'', 56(3), 379–394. https://doi.org/10.1007/s00245-007-9009-6
# Lin, C.-J., Weng, R. C., & Keerthi, S. S. (2008). Trust region Newton method for logistic regression. ''Journal of Machine Learning Research'', 9(22), 627–650. https://doi.org/10.1145/1390681.1390703