Quasi-Newton methods
Author: Jianmin Su (ChemE 6800 Fall 2020)
Steward: Allen Yang, Fengqi You
Quasi-Newton Methods are a kind of methods used to solve nonlinear optimization problems. They are based on Newton's method yet can be an alternative of Newton's method when the objective function is not twice-differentiable, which means the Hessian matrix is unavailable, or it is too expensive to calculate the Hessian matrix and its inverse.
Introduction
The first quasi-Newton algorithm was developed by W.C. Davidon in the mid1950s and it turned out to be a milestone in nonlinear optimization problems. He was trying to solve a long optimization calculation but he failed to get the result with the original method due to the low performances of computers at that time, thus he managed to build the quasi-Newton method to solve it. Later then, Fletcher and Powell proved that the new algorithm was more efficient and more reliable than the other existing methods.
During the following years, numerous variants were proposed, include Broyden's method (1965), the SR1 formula (Davidon 1959, Broyden 1967), the DFP method (Davidon, 1959; Fletcher and Powell, 1963), and the BFGS method (Broyden, 1969; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970).
In optimization problems, Newton's method uses first and second derivatives, gradient and the Hessian in multivariate scenarios, to find the optimal point, it is applied to a twice-differentiable function to find the roots of the first derivative (solutions to ), also known as the stationary points of .
The iteration of Newton's method is usually written as: , where is the iteration number, is the Hessian matrix and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle H=[\bigtriangledown ^2 f(x_k)]}
Iteraton would stop when it satisfies the convergence criteria like Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle {df \over dx}=0, ||\bigtriangledown f(x)||<\epsilon \text{ or } |f(x_{k+1})-f(x_k)|<\epsilon }
Though we can solve an optimization problem quickly with Newton's method, it has two obvious disadvantages:
- The objective function must be twice-differentiable and the Hessian matrix must be positive definite.
- The calculation is costly because it requires to compute the Jacobian matrix, Hessian matrix and its inverse, which is time-consuming when dealing with a large-scale optimization problem.
However, we can use Quasi-Newton methods to avoid these two disadvantages.·
Quasi-Newton methods are similar to Newton's method but with one key idea that is different, they don't calculate the Hessian matrix, they introduce a matrix Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle B}
to estimate the Hessian matrix instead so that they can avoid the time-consuming calculations of Hessian matrix and its inverse. And there are many variants of quasi-Newton methods that simply depend on the exact methods they use in the estimation of the Hessian matrix.
Theory and Algorithm
To illustrate the basic idea behind quasi-Newton methods, we start with building a quadratic model of the objective function at the current iterate Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_k} :
(1.1), where is an symmetric positive definite matrix that will be updated at every iteration.
The minimizer of this convex quadratic model is:
(1.2), which is also used as the search direction.
Then the new iterate could be written as: (1.3),
where is the step length that should satisfy the Wolfe conditions. The iteration is similar to Newton's method, but we use the approximate Hessian instead of the true Hessian.
To maintain the curve information we got from the previous iteration in , we generate a new iterate and new quadratic modelto in the form of:
(1.4).
To construct the relationship between 1.1 and 1.4, we require that in 1.1 at the function value and gradient match and , and the gradient of should match the gradient of the objective function at the latest two iterates and , then we can get:
(1.5)
and with some arrangements:
(1.6)
Define , (1.7)
So that (1.6) becomes: (1.8), which is the secant equation.
To make sure is still a symmetric positive definite matrix, we need (1.9).
To further preserve properties of and determine uniquely, we assume that among all symmetric matrices satisfying secant equation, is closest to the current matrix , which leads to a minimization problem:
(1.10) s.t. , ,
where and satisfy (1.9) and is symmetric and positive definite.
Different matrix norms applied in (1.10) results in different quasi-Newton methods. The weighted Frobenius norm can help us get an easy solution to the minimization problem: (1.11).
The weighted matrix can be any matrix that satisfies the relation ., where can be assumed as Failed to parse (Conversion error. Server ("https://wikimedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle W=G_{k}^{-1}} , is the mean Hessian defined by: Failed to parse (Conversion error. Server ("https://wikimedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle G_{k}=[calculus]}
We skip the procedure of solving the minimization problem (1.10) and the unique solution of (1.10) is:
(1.12)
(1.13)
Set Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle H_k=B_k^{-1} }
, with Sherman-Morrison formula we can get:
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle H_{k+1}=H_k+\frac{s_k s_k^T}{s_k^T y_k}-\frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} }
In the DFP method, we use Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle B_k} to estimate the inverse of Hessian matrix
In the BFGS method, we use to estimate the Hessian matrix
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle B_k H_k }
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle }
DFP Algorithm
- Given the starting point ; convergence tolerance Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \epsilon, \epsilon>0} ; the initial estimation of inverse Hessian matrix Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle D_0=I} ; Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle k=0} .
- Compute the search direction .
- Compute the step length Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda_k} with Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda=\arg \underset{\lambda \in \mathbb{R}}{min} f(x_k+\lambda d_k), } , and then setFailed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle s_k={\lambda}_k d_k} , then Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_{k+1}=x_k+s_k}
- If Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle ||g_{k+1}||<\epsilon} , then end of the iteration, otherwise continue step5.
- Computing Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle y_k=g_{k+1}-g_k} .
- Update the Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle D_{k+1}} withFailed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle D_{k+1}=D_k+\frac{s_k s_k^T}{s_k^T y_k}-\frac{D_k y_k y_k^T D_k}{y_k^T D_k y_k} }
- Update Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle k} with Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle k=k+1} and go back to step2.
BFGS Algorithm
- Given the starting point Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_0} ; convergence tolerance Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \epsilon, \epsilon>0} ; the initial estimation of Hessian matrix Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle B_0=I} ; Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle k=0} .
- Compute the search direction Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle d_k=-B_k^{-1}\cdot g_k} .
- Compute the step length Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda_k} with Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda=\arg \underset{\lambda \in \mathbb{R}}{min} f(x_k+\lambda d_k), } , and then setFailed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle s_k={\lambda}_k d_k} , then Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_{k+1}=x_k+s_k}
- If Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle ||g_{k+1}||<\epsilon} , then end of the iteration, otherwise continue step5.
- Computing Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle y_k=g_{k+1}-g_k} .
- Update the Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle B_{k+1} } withFailed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle B_{k+1}=B_k+\frac{y_k y_k^T}{y_k^T s_k}-\frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} }
- Update Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle k}
with Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle k=k+1}
and go back to step2.
Numerical Example
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \begin{align} f(x_1, x_2) & = x_1^2 +\frac{1}{2}x_2^2+3\end{align} }