# Difference between revisions of "Quasi-Newton methods"

Author: Jianmin Su (ChemE 6800 Fall 2020)

Steward: Allen Yang, Fengqi You

Quasi-Newton Methods are a kind of methods used to solve nonlinear optimization problems. They are based on Newton’s method yet can be an alternative of Newton’s method when the objective function is not twice-differentiable, which means the Hessian matrix is unavailable, or it is too expensive to calculate the Hessian matrix and its inverse.

## Introduction

The first quasi-Newton algorithm was developed by W.C. Davidon in the mid1950s and it turned out to be a milestone in nonlinear optimization problems. He was trying to solve a long optimization calculation but he failed to get the result with the original method due to the low performances of computers at that time, thus he managed to build the quasi-Newton method to solve it. Later then, Fletcher and Powell proved that the new algorithm was more efficient and more reliable than the other existing methods.

During the following years, numerous variants were proposed, include Broyden's method (1965), the SR1 formula (Davidon 1959, Broyden 1967), the DFP method (Davidon, 1959; Fletcher and Powell, 1963), and the BFGS method (Broyden, 1969; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970).

In optimization problems, Newton's method uses first and second derivatives, gradient and the Hessian in multivariate scenarios, to find the optimal point, it is applied to a twice-differentiable function ${\displaystyle f}$ to find the roots of the first derivative (solutions to ${\displaystyle f'(x)=0}$), also known as the stationary points of ${\displaystyle f}$.

The iteration of Newton's method is usually written as: ${\displaystyle x_{k+1}=x_{k}-H^{-1}\cdot \bigtriangledown f(x_{k})}$, where${\displaystyle k}$ is the iteration number, ${\displaystyle H}$is the Hessian matrix and ${\displaystyle H=[\bigtriangledown ^{2}f(x_{k})]}$

Iteraton would stop when it satisfies the convergence criteria like ${\displaystyle {df \over dx}=0,||\bigtriangledown f(x)||<\epsilon {\text{ or }}|f(x_{k+1})-f(x_{k})|<\epsilon }$

Quasi-Newton methods are similar to Newton's method but with one key idea that is different, they don't calculate the Hessian matrix, they introduce a matrix ${\displaystyle B}$ to estimate the Hessian matrix instead so that they can avoid the time-consuming calculations of Hessian matrix and its inverse. And there are many variants of quasi-Newton methods that simply depend on the exact methods they use in the estimation of the Hessian matrix.

## Theory and Algorithm

formulas that will use

the objective function at current iterate ${\displaystyle x_{k}}$ B_k

${\displaystyle m_{k}(p)=f_{k}+\bigtriangledown f_{k}^{T}p+{\frac {1}{2}}p^{T}B_{k}p}$

the objective function at iterate ${\displaystyle x_{k}+1}$ to update B_k+1

${\displaystyle m_{k+1}(p)=f_{k+1}+\bigtriangledown f_{k+1}^{T}p+{\frac {1}{2}}p^{T}B_{k+1}p}$

${\displaystyle B_{k+1}\alpha _{k}p_{k}=\bigtriangledown f_{k+1}-\bigtriangledown f_{k}}$

Define ${\displaystyle s_{k}=x_{k+1}-x_{k}}$, ${\displaystyle y_{k}=\bigtriangledown f_{k+1}-\bigtriangledown f_{k}}$

So that ${\displaystyle B_{k+1}s_{k}=y_{k}}$, which is secant equation, can maintain ${\displaystyle s_{k}^{T}B_{k+1}s_{k}>0}$, along with Wolfe conditions to maintain is minimizing the objective function.

To further prove ${\displaystyle B_{k+1}}$ is positive definite , converge

${\displaystyle B_{k+1}={\underset {B}{min}}||B-B_{k}||}$

Using different norms will lead to different methods that used to update ${\displaystyle B_{k}}$

${\displaystyle ||A||_{W}=||W^{\frac {1}{2}}AW^{\frac {1}{2}}||_{F}}$, the right is Frobenius norm, W is the mean of Hessian matrix

${\displaystyle B_{k+1}=(I-\rho y_{k}s_{k}^{T})B_{k}(I-\rho s_{k}y_{k}^{T})+\rho y_{k}y_{k}^{T}}$

${\displaystyle \rho ={\frac {1}{y_{k}^{T}s_{k}}}}$

Set ${\displaystyle H_{k}=B_{k}^{-1}}$ with Sherman-Morrison formaula, we can get ${\displaystyle H_{k+1}=H_{k}+{\frac {s_{k}s_{k}^{T}}{s_{k}^{T}y_{k}}}-{\frac {H_{k}y_{k}y_{k}^{T}H_{k}}{y_{k}^{T}H_{k}y_{k}}}}$

In the DFP method, we use ${\displaystyle B_{k}}$ to estimate the inverse of Hessian matrix

In the BFGS method, we use ${\displaystyle B_{k}}$ to estimate the Hessian matrix

${\displaystyle B_{k}H_{k}}$

### DFP Algorithm

1. Given the starting point ${\displaystyle x_{0}}$; convergence tolerance ${\displaystyle \epsilon ,\epsilon >0}$; the initial estimation of inverse Hessian matrix ${\displaystyle D_{0}=I}$; ${\displaystyle k=0}$.
2. Compute the search direction ${\displaystyle d_{k}=-D_{k}\cdot g_{k}}$.
3. Compute the step length ${\displaystyle \lambda _{k}}$ with ${\displaystyle \lambda =\arg {\underset {\lambda \in \mathbb {R} }{min}}f(x_{k}+\lambda d_{k}),}$, and then set${\displaystyle s_{k}={\lambda }_{k}d_{k}}$, then ${\displaystyle x_{k+1}=x_{k}+s_{k}}$
4. If ${\displaystyle ||g_{k+1}||<\epsilon }$, then end of the iteration, otherwise continue step5.
5. Computing ${\displaystyle y_{k}=g_{k+1}-g_{k}}$.
6. Update the ${\displaystyle D_{k+1}}$ with${\displaystyle D_{k+1}=D_{k}+{\frac {s_{k}s_{k}^{T}}{s_{k}^{T}y_{k}}}-{\frac {D_{k}y_{k}y_{k}^{T}D_{k}}{y_{k}^{T}D_{k}y_{k}}}}$
7. Update ${\displaystyle k}$ with ${\displaystyle k=k+1}$ and go back to step2.

### BFGS Algorithm

1. Given the starting point ${\displaystyle x_{0}}$; convergence tolerance ${\displaystyle \epsilon ,\epsilon >0}$; the initial estimation of Hessian matrix ${\displaystyle B_{0}=I}$; ${\displaystyle k=0}$.
2. Compute the search direction ${\displaystyle d_{k}=-B_{k}^{-1}\cdot g_{k}}$.
3. Compute the step length ${\displaystyle \lambda _{k}}$ with ${\displaystyle \lambda =\arg {\underset {\lambda \in \mathbb {R} }{min}}f(x_{k}+\lambda d_{k}),}$, and then set${\displaystyle s_{k}={\lambda }_{k}d_{k}}$, then ${\displaystyle x_{k+1}=x_{k}+s_{k}}$
4. If ${\displaystyle ||g_{k+1}||<\epsilon }$, then end of the iteration, otherwise continue step5.
5. Computing ${\displaystyle y_{k}=g_{k+1}-g_{k}}$.
6. Update the ${\displaystyle B_{k+1}}$ with${\displaystyle B_{k+1}=B_{k}+{\frac {y_{k}y_{k}^{T}}{y_{k}^{T}s_{k}}}-{\frac {B_{k}s_{k}s_{k}^{T}B_{k}}{s_{k}^{T}B_{k}s_{k}}}}$
7. Update ${\displaystyle k}$ with ${\displaystyle k=k+1}$ and go back to step2.

## Numerical Example

{\displaystyle {\begin{aligned}f(x_{1},x_{2})&=x_{1}^{2}+{\frac {1}{2}}x_{2}^{2}+3\end{aligned}}}