Line search methods

Authors: Lihe Cao, Zhengyi Sui, Jiaqi Zhang, Yuqing Yan, and Yuhui Gu (6800 Fall 2021).

Introduction

When solving unconstrained optimization problems, the user needs to supply a starting point for the algorithm. Starting from the initial point $x_{0}$ , optimization algorithms generate a sequence of iterates $\{x_{k}\}_{k=0}^{\infty }$ that terminates when an approximate solution has been found or no more progress can be made. Line search is one of the two fundamental strategies (the other being trust region) for locating the new iterate $x_{k+1}$ given the current point.

Generic Line Search Method

Basic Algorithm

Pick an initial iterate point $x_{0}$
Repeat the following steps until $x_{k}$ has converged:
- Choose a descent direction $p_{k}$ from $x_{k}$ , that is, a direction satisfying $g_{k}^{\top }p_{k}<0$ whenever $g_{k}\neq 0$ , where $g_{k}=\nabla f(x_{k})$
- Calculate a suitable step length $\alpha _{k}>0$ so that $f(x_{k}+\alpha _{k}p_{k})<f_{k}$
- Set $x_{k+1}=x_{k}+\alpha _{k}p_{k}$
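The loop above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `generic_line_search`, the choice $p_{k}=-g_{k}$ , the step-halving rule, and the tolerance values are not part of the algorithm as stated and were chosen for simplicity. The simple decrease test used here is the condition $f(x_{k}+\alpha _{k}p_{k})<f_{k}$ from the list and is weaker than the conditions usually required for convergence guarantees.

```python
import numpy as np

def generic_line_search(f, grad, x0, tol=1e-8, max_iter=1000):
    """Generic descent loop: pick a descent direction, then shrink alpha until f decreases."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:    # treat a tiny gradient as "converged"
            break
        p = -g                           # assumed choice; any p with g @ p < 0 qualifies
        alpha = 1.0
        # Halve alpha until the simple decrease condition f(x + alpha p) < f(x) holds.
        while f(x + alpha * p) >= f(x) and alpha > 1e-16:
            alpha *= 0.5
        x = x + alpha * p                # x_{k+1} = x_k + alpha_k p_k
    return x

# Hypothetical test problem: a convex quadratic with minimizer (1, -3).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 3.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)])
x_min = generic_line_search(f, grad, [0.0, 0.0])
print(x_min)
```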

Search Direction for Line Search

The search direction should be chosen so that $f$ decreases when moving from $x_{k}$ to $x_{k+1}$ . The most obvious choice is $-\nabla f_{k}$ , because it is the direction along which $f$ decreases most rapidly. We can verify this claim using Taylor's theorem:

$f(x_{k}+\alpha p)=f(x_{k})+\alpha p^{\top }\nabla f_{k}+{\frac {1}{2}}\alpha ^{2}p^{\top }\nabla ^{2}f(x_{k}+tp)p$ for some $t\in (0,\alpha )$

The rate of change in $f$ along the direction $p$ at $x_{k}$ is the coefficient of $\alpha$ . Therefore, the unit direction $p$ of most rapid decrease is the solution to

$\min p^{\top }\nabla f_{k}$ subject to $||p||=1$ .

$p=-\nabla f_{k}/||\nabla f_{k}||$ is the solution and this direction is orthogonal to the contours of the function. In the following sections, we will use this as the default direction of the line search.
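As a quick numerical sanity check of this claim, the sketch below compares the slope $p^{\top }\nabla f_{k}$ for the normalized negative gradient against many random unit directions. The gradient vector, the sample count, and the helper name `steepest_descent_direction` are hypothetical values chosen for illustration.

```python
import numpy as np

def steepest_descent_direction(g):
    """Unit vector minimizing p @ g subject to ||p|| = 1 (requires g != 0)."""
    g = np.asarray(g, dtype=float)
    return -g / np.linalg.norm(g)

rng = np.random.default_rng(0)           # fixed seed for reproducibility
g = np.array([3.0, -4.0])                # hypothetical gradient at x_k, with ||g|| = 5
p_star = steepest_descent_direction(g)

# Sample random unit directions and compute their slopes p @ g.
samples = rng.normal(size=(1000, 2))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
slopes = samples @ g

print(p_star @ g)      # slope along the steepest descent direction, -||g|| up to rounding
print(slopes.min())    # no sampled direction has a smaller slope (Cauchy-Schwarz)
```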

Step Length

The step length is a positive value $\alpha _{k}$ such that $f(x_{k}+\alpha _{k}p_{k})<f_{k}$ . When choosing $\alpha _{k}$ , we need to trade off a substantial reduction of $f$ against the time spent making the choice. If $\alpha _{k}$ is too large, the step will overshoot; if it is too small, many iterations are needed to reach the convergent point. The value of $\alpha _{k}$ can be found by exact line search or inexact line search, and more detail about these approaches is given in the following sections.
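The trade-off can be seen on a toy problem. The sketch below minimizes $f(x)=x^{2}$ with a fixed step length along $-f'(x)$ and counts iterations. The function name `iterations_to_converge`, the three step values, the tolerance, and the divergence threshold are assumptions made for illustration. A fixed step is used only to isolate the effect of the step size; it is not one of the line search rules discussed here.

```python
def iterations_to_converge(alpha, x0=1.0, tol=1e-6, max_iter=10_000):
    """Minimize f(x) = x**2 with a fixed step alpha; return iterations used, or None on failure."""
    x = x0
    for k in range(max_iter):
        if abs(2.0 * x) <= tol:        # |f'(x)| is small enough: converged
            return k
        x = x - alpha * 2.0 * x        # x_{k+1} = (1 - 2 alpha) x_k
        if abs(x) > 1e12:              # iterates are blowing up: the step overshoots
            return None
    return None                        # budget exhausted without converging

for alpha in (0.001, 0.4, 1.1):        # too small, reasonable, too large
    print(alpha, iterations_to_converge(alpha))
```

With these values the tiny step needs thousands of iterations, the moderate step needs only a handful, and the oversized step makes the iterates grow without bound.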

Convergence

For a line search algorithm to be reliable, it should be globally convergent, that is, the gradient norms $||\nabla f(x_{k})||$ should converge to zero as the iterations proceed, i.e., $\lim _{k\to \infty }||\nabla f(x_{k})||=0$ .

It can be shown from Zoutendijk's theorem that if the line search algorithm satisfies the (weak) Wolfe conditions (similar results also hold for the strong Wolfe and Goldstein conditions) and uses search directions whose angle with the steepest descent direction is bounded away from 90°, the algorithm is globally convergent.

Zoutendijk's theorem states that, given an iteration $x_{k+1}=x_{k}+\alpha _{k}p_{k}$ where $p_{k}$ is a descent direction and $\alpha _{k}$ is a step length that satisfies the (weak) Wolfe conditions, if the objective $f$ is bounded below in $\mathbb {R} ^{n}$ and is continuously differentiable in an open set ${\mathcal {N}}$ containing the level set ${\mathcal {L}}:=\{x\ |\ f(x)\leq f(x_{0})\}$ where $x_{0}$ is the starting point of the iteration, and the gradient $\nabla f$ is Lipschitz continuous on ${\mathcal {N}}$ , then

$\sum _{k=0}^{\infty }\cos ^{2}\theta _{k}||\nabla f_{k}||^{2}<\infty$ ,

where $\theta _{k}$ is the angle between $p_{k}$ and the steepest descent direction $-\nabla f(x_{k})$ .

The Zoutendijk condition above implies that

$\lim _{k\to \infty }\cos ^{2}\theta _{k}||\nabla f_{k}||^{2}=0$ ,

by the n-th term test for divergence (the terms of a convergent series must tend to zero). Hence, if the algorithm chooses search directions that are bounded away from $90^\circ$ relative to the steepest descent direction, i.e., there exists $\epsilon >0$ such that

$\cos \theta _{k}\geq \epsilon >0,\ \forall k$ ,

it follows that

$\lim _{k\to \infty }||\nabla f_{k}||=0$ .

However, the Zoutendijk condition does not guarantee convergence to a local minimum, only to a stationary point. Hence, additional conditions on the search direction are necessary, such as finding a direction of negative curvature, to prevent the iteration from converging to a nonminimizing stationary point.
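To make the angle condition concrete, the sketch below computes $\cos \theta _{k}$ for a few candidate directions. The gradient, the matrix `B`, and the function name `cos_theta` are hypothetical and chosen for illustration. The example also uses the standard bound that a direction $p=-B^{-1}\nabla f$ with $B$ symmetric positive definite satisfies $\cos \theta \geq 1/\kappa (B)$ , where $\kappa (B)$ is the condition number, so such directions stay bounded away from 90° whenever the condition numbers are uniformly bounded.

```python
import numpy as np

def cos_theta(p, g):
    """Cosine of the angle between direction p and the steepest descent direction -g."""
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    return float(-(g @ p) / (np.linalg.norm(g) * np.linalg.norm(p)))

g = np.array([1.0, 2.0])                    # hypothetical gradient at x_k
B = np.array([[4.0, 1.0], [1.0, 3.0]])      # hypothetical symmetric positive definite matrix

p_sd = -g                                   # steepest descent: cos(theta) = 1
p_scaled = -np.linalg.solve(B, g)           # scaled direction: 1/cond(B) <= cos(theta) <= 1
p_orth = np.array([2.0, -1.0])              # orthogonal to g: cos(theta) = 0, no descent

for name, p in [("steepest", p_sd), ("scaled", p_scaled), ("orthogonal", p_orth)]:
    print(name, cos_theta(p, g))
print("lower bound 1/cond(B):", 1.0 / np.linalg.cond(B))
```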

Exact Search

Steepest Descent Method

Given the intuition that the negative gradient $-\nabla f_{k}$ can be an effective search direction, steepest descent follows this idea and establishes a systematic method for minimizing the objective function. Setting $-\nabla f_{k}$ as the direction, steepest descent computes the step length $\alpha _{k}$ by minimizing a single-variable objective function. More specifically, the steps of the steepest descent method are as follows.

Steepest Descent Algorithm

Set a starting point $x_{0}$
Set a convergence tolerance $\epsilon$
Set $k=0$
Set the maximum number of iterations $N$
While $k\leq N$ :

$\nabla f(x_{k})={\frac {\partial f(x)}{\partial x}}|_{x=x_{k}}$

If $||\nabla f(x_{k})||\leq \epsilon$ :

Break

EndIf

$\alpha _{k}={\underset {\alpha }{\arg \min }}f(x_{k}-\alpha \nabla f(x_{k}))$
$x_{k+1}=x_{k}-\alpha _{k}\nabla f(x_{k})$
$k=k+1$

EndWhile
Return $x_{k}$ , $f(x_{k})$
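The algorithm can be sketched in Python for one case where the exact step has a closed form. The sketch assumes a quadratic objective $f(x)={\frac {1}{2}}x^{\top }Ax-b^{\top }x$ with $A$ symmetric positive definite, for which $\nabla f(x)=Ax-b$ and the minimizer of $f(x_{k}-\alpha \nabla f(x_{k}))$ over $\alpha$ is $\alpha _{k}=g_{k}^{\top }g_{k}/(g_{k}^{\top }Ag_{k})$ with $g_{k}=\nabla f(x_{k})$ . The function name, the matrix, and the vector below are hypothetical; for a general $f$ the one-dimensional minimization would have to be solved numerically.

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, eps=1e-8, max_iter=10_000):
    """Steepest descent with exact line search for f(x) = 0.5 x^T A x - b^T x, A SPD."""
    x = np.asarray(x0, dtype=float)
    iterations = 0
    while iterations < max_iter:
        g = A @ x - b                        # gradient of the quadratic at x_k
        if np.linalg.norm(g) <= eps:         # stop once the gradient norm is small
            break
        alpha = (g @ g) / (g @ (A @ g))      # exact minimizer of f(x_k - alpha g)
        x = x - alpha * g                    # x_{k+1} = x_k - alpha_k grad f(x_k)
        iterations += 1
    return x, iterations

# Hypothetical problem data; the minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star, n_iter = steepest_descent_quadratic(A, b, x0=[0.0, 0.0])
print(x_star, n_iter)
```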

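One advantage of the steepest descent method is that it has a nice convergence theory. Under mild assumptions it is globally convergent: from any starting point, the gradient at the iterates tends to zero unless $f$ is unbounded below, as the following theorem states. Note that this guarantees convergence to a stationary point, which is not necessarily a local minimum.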

Theorem: global convergence of steepest descent [1]

Let the gradient of $f\in C^{1}$ be uniformly Lipschitz continuous on $\mathbb {R} ^{n}$ . Then, for the iterates with steepest-descent search directions, one of the following situations occurs:

- $\nabla f(x_{k})=0$ for some finite $k$
- $\lim _{k\to \infty }f(x_{k})=-\infty$
- $\lim _{k\to \infty }\nabla f(x_{k})=0$

The steepest descent method is a special case of gradient descent in which the step length is defined precisely by an exact one-dimensional minimization. The method can be generalized by choosing $\alpha$ differently, as in inexact line search.

Inexact Search

Wolfe Conditions

Goldstein Conditions

Backtracking

Numeric Example

Applications

Reference

[1] Dr Raphael Hauser, Oxford University Computing Laboratory, Line Search Methods for Unconstrained Optimization.