Line search methods
Authors: Lihe Cao, Zhengyi Sui, Jiaqi Zhang, Yuqing Yan, and Yuhui Gu (6800 Fall 2021).
Introduction
When solving unconstrained optimization problems, the user needs to supply a starting point for all algorithms. Given the starting point $x_0$, optimization algorithms generate a sequence of iterates $\{x_k\}_{k=0}^{\infty}$ that terminates when an approximate solution has been reached or no more progress can be made. Line search is one of the two fundamental strategies for locating the new iterate $x_{k+1}$ given the current point $x_k$.
Generic Line Search Method
Basic Algorithm
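- Pick an initial iterate $x_0$.
- Repeat the following steps until $x_k$ has converged:
  - Choose a descent direction $p_k$ from $x_k$, that is, a direction satisfying $\nabla f(x_k)^T p_k < 0$ whenever $\nabla f(x_k) \neq 0$.
  - Calculate a suitable step length $\alpha_k > 0$ so that $f(x_k + \alpha_k p_k) < f(x_k)$.
  - Set $x_{k+1} = x_k + \alpha_k p_k$.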
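The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `line_search`, `halving_step`, `f_example`, and `grad_example` are hypothetical, the direction is taken to be the negative gradient, and the step rule simply halves a trial step until the objective strictly decreases. That simple decrease test mirrors the condition in the list and is not a substitute for the exact or inexact step rules discussed later.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def halving_step(f: Callable[[Vector], float], x: Vector, p: Vector,
                 max_halvings: int = 60) -> float:
    """Hypothetical step rule: halve alpha until f(x + alpha p) < f(x)."""
    fx = f(x)
    alpha = 1.0
    for _ in range(max_halvings):
        trial = [xi + alpha * pi for xi, pi in zip(x, p)]
        if f(trial) < fx:
            return alpha
        alpha *= 0.5
    raise RuntimeError("no decreasing step found")


def line_search(f: Callable[[Vector], float],
                grad: Callable[[Vector], Vector],
                x0: Sequence[float],
                tol: float = 1e-8,
                max_iter: int = 1000) -> Vector:
    """Generic line search loop using the negative gradient as direction."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if dot(g, g) ** 0.5 <= tol:      # gradient small enough: stop
            break
        p = [-gi for gi in g]            # assumed direction choice
        if dot(g, p) >= 0.0:             # descent test: grad^T p < 0
            raise ValueError("p is not a descent direction")
        alpha = halving_step(f, x, p)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
    return x


# Illustrative objective (an assumption): minimizer at (1, -3).
def f_example(x: Vector) -> float:
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 3.0) ** 2


def grad_example(x: Vector) -> Vector:
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)]


x_min = line_search(f_example, grad_example, [0.0, 0.0])
```

Swapping in a different direction or step rule only changes the two lines that compute `p` and `alpha`; the rest of the loop stays the same.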
Search Direction for Line Search
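The direction of the line search should be chosen to make $f$ decrease when moving from point $x_k$ to $x_{k+1}$. The most obvious direction is the negative gradient $-\nabla f(x_k)$, because it is the direction along which $f$ decreases most rapidly. We can verify the claim by Taylor's theorem:
$f(x_k + \alpha p) = f(x_k) + \alpha p^T \nabla f(x_k) + \frac{1}{2} \alpha^2 p^T \nabla^2 f(x_k + t p) p$,
where $t \in (0, \alpha)$.
The rate of change in $f$ along the direction $p$ at $x_k$ is the coefficient of $\alpha$, namely $p^T \nabla f(x_k)$. Therefore, the unit direction $p$ of most rapid decrease is the solution to
$\min_p \; p^T \nabla f(x_k)$ subject to $\|p\| = 1$.
Since $p^T \nabla f(x_k) = \|p\| \, \|\nabla f(x_k)\| \cos\theta$, where $\theta$ is the angle between $p$ and $\nabla f(x_k)$, the minimum is attained at $\cos\theta = -1$. Hence $p = -\nabla f(x_k) / \|\nabla f(x_k)\|$ is the solution, and this direction is orthogonal to the contours of the function. In the following sections, we will use this as the default direction of the line search.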
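The claim about the steepest descent direction can be checked numerically in two dimensions. The sketch below is only an illustration: the function names and the sample gradient `(3, 4)` are assumptions. It samples unit vectors around the circle, evaluates the directional slope (the inner product of each unit vector with the gradient), and compares the best sampled direction with the normalized negative gradient.

```python
import math
from typing import List, Sequence, Tuple


def steepest_unit_direction(g: Sequence[float]) -> List[float]:
    """Normalized negative gradient, -g / ||g||."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        raise ValueError("gradient is zero; no descent direction exists")
    return [-gi / norm for gi in g]


def best_sampled_direction(g: Sequence[float],
                           samples: int = 3600) -> Tuple[List[float], float]:
    """Brute-force search over unit vectors (cos t, sin t) in 2D."""
    best_p: List[float] = [1.0, 0.0]
    best_slope = math.inf
    for i in range(samples):
        theta = 2.0 * math.pi * i / samples
        p = [math.cos(theta), math.sin(theta)]
        slope = p[0] * g[0] + p[1] * g[1]   # directional slope p^T g
        if slope < best_slope:
            best_p, best_slope = p, slope
    return best_p, best_slope


g_example = [3.0, 4.0]                      # illustrative gradient, norm 5
p_exact = steepest_unit_direction(g_example)
p_sampled, slope_sampled = best_sampled_direction(g_example)
```

Up to the angular resolution of the sampling, `p_sampled` agrees with `p_exact`, and the smallest sampled slope is close to the negative of the gradient norm.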
Step Length
The step length $\alpha_k$ is a positive value such that $f(x_k + \alpha_k p_k) < f(x_k)$. When choosing the step length $\alpha_k$, we need to trade off a substantial reduction of $f$ against the time spent finding the step. If $\alpha_k$ is too large, the step will overshoot the minimizer, while if it is too small, many iterations are needed to converge. The value of $\alpha_k$ can be found by exact line search or by inexact line search; more detail about these approaches is given in the next section.
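A small experiment makes the trade-off concrete. The sketch below is a hypothetical setup chosen for illustration: it minimizes the one-dimensional function $f(x) = x^2$ with a fixed step length along the negative gradient and counts the iterations needed to get within a tolerance of the minimizer at zero. The function name, the starting point, and the three step lengths are assumptions.

```python
from typing import Optional


def iterations_to_converge(alpha: float, x0: float = 1.0,
                           tol: float = 1e-6,
                           max_iter: int = 2000) -> Optional[int]:
    """Count fixed-step gradient iterations on f(x) = x**2.

    Returns the number of iterations until |x| < tol, or None if the
    iteration budget runs out first.
    """
    x = x0
    for k in range(max_iter):
        if abs(x) < tol:
            return k
        x = x - alpha * 2.0 * x      # gradient of x**2 is 2x
    return None


small = iterations_to_converge(0.01)   # tiny steps: slow progress
good = iterations_to_converge(0.5)     # lands on the minimizer directly
large = iterations_to_converge(1.1)    # overshoots; |x| grows each step
```

With the small step every iterate shrinks by only a factor of 0.98, so hundreds of iterations are needed. With the large step each iterate is multiplied by -1.2, so $f$ increases at every step and the decrease condition $f(x_k + \alpha_k p_k) < f(x_k)$ is violated.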
Convergence
Exact Search
Steepest Descent Method
Given the intuition that the negative gradient can be an effective search direction, steepest descent follows this idea and establishes a systematic method for minimizing the objective function. Setting $p_k = -\nabla f(x_k)$ as the direction, steepest descent computes the step length $\alpha_k$ by minimizing the single-variable function $f(x_k + \alpha p_k)$ over $\alpha > 0$. More specifically, the steps of the steepest descent method are as follows.
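- Pick an initial iterate $x_0$ and set $k = 0$.
- While $\nabla f(x_k) \neq 0$ (in practice, while $\|\nabla f(x_k)\|$ exceeds a tolerance):
  - Set $p_k = -\nabla f(x_k)$.
  - Compute $\alpha_k = \arg\min_{\alpha > 0} f(x_k + \alpha p_k)$.
  - Set $x_{k+1} = x_k + \alpha_k p_k$ and $k = k + 1$.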
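The steepest descent iteration described above can be sketched for one case in which the exact step has a closed form. The following is an illustration under a strong assumption, not a general implementation: the objective is taken to be a strictly convex quadratic $f(x) = \frac{1}{2} x^T A x - b^T x$ with $A$ symmetric positive definite, so that $\nabla f(x) = A x - b$ and the exact minimizer along $p_k = -\nabla f(x_k)$ is $\alpha_k = \frac{g_k^T g_k}{g_k^T A g_k}$ with $g_k = \nabla f(x_k)$. For a general $f$, the step would require a one-dimensional minimization instead. The function name and the example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A: Matrix, x: Sequence[float]) -> Vector:
    """Matrix-vector product A x."""
    return [dot(row, x) for row in A]


def steepest_descent_quadratic(A: Matrix, b: Sequence[float],
                               x0: Sequence[float],
                               tol: float = 1e-10,
                               max_iter: int = 1000) -> Tuple[Vector, int]:
    """Steepest descent with exact line search on 0.5 x^T A x - b^T x.

    Assumes A is symmetric positive definite. Returns the final iterate
    and the number of iterations performed.
    """
    x = list(x0)
    for k in range(max_iter):
        g = [ax - bi for ax, bi in zip(mat_vec(A, x), b)]   # gradient A x - b
        if math.sqrt(dot(g, g)) <= tol:
            return x, k
        p = [-gi for gi in g]                               # steepest descent direction
        alpha = dot(g, g) / dot(g, mat_vec(A, g))           # exact step for a quadratic
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
    return x, max_iter


# Illustrative data (an assumption): the minimizer solves A x = b,
# which here is x = (1/11, 7/11).
A_example = [[4.0, 1.0], [1.0, 3.0]]
b_example = [1.0, 2.0]
x_star, n_iter = steepest_descent_quadratic(A_example, b_example, [2.0, 1.0])
```

Because the step is exact, each new gradient is orthogonal to the previous search direction, which produces the characteristic zigzag path of steepest descent on elongated contours.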
One advantage of the steepest descent method is that it has a nice convergence theory. The method is globally convergent: under mild assumptions, the iterates approach a stationary point, in practice usually a local minimizer, from any starting point.
Theorem (global convergence of steepest descent)[1]: Let $f$ be continuously differentiable and let its gradient $\nabla f$ be uniformly Lipschitz continuous on $\mathbb{R}^n$. Then, for the iterates $\{x_k\}$ generated with steepest-descent search directions, one of the following situations occurs:
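- $\nabla f(x_k) = 0$ for some finite $k$
- $\lim_{k \to \infty} f(x_k) = -\infty$
- $\lim_{k \to \infty} \nabla f(x_k) = 0$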
The steepest descent method is a special case of gradient descent in which the step length $\alpha_k$ is defined precisely, by exact minimization along the search direction. Generalizations can be made regarding the choice of $\alpha_k$.