Conjugate gradient methods

Revision as of 16:35, 6 December 2021

Author: Alexandra Roberts, Anye Shi, Yue Sun (SYSEN 6800 Fall 2021)

Introduction

Figure 1. A comparison of the convergence of gradient descent (in green) and the conjugate gradient method (in red) for minimizing a quadratic function. In theory, the conjugate gradient method converges in at most $n$ steps, where $n$ is the size of the matrix of the system (here $n=2$).^[1]

The conjugate gradient method (CG) was originally invented to minimize a quadratic function:
$F(\textbf{x})=\frac{1}{2}\textbf{x}^{T}\textbf{A}\textbf{x}-\textbf{b}^{T}\textbf{x}$
where $\textbf{A}$ is an $n \times n$ symmetric positive definite matrix, and $\textbf{x}$ and $\textbf{b}$ are $n \times 1$ vectors.
Minimizing $F$ is equivalent to solving a linear system: the minimizer is the $\textbf{x}$ at which $\nabla F(\textbf{x}) = \textbf{0}$ , i.e. $\textbf{A}\textbf{x}-\textbf{b} = \textbf{0}$ .
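The equivalence between the minimizer and the linear system can be checked numerically. The following is a minimal sketch in plain Python, using the $2 \times 2$ system from the numerical example later in this article; the helper names `matvec`, `dot`, `F` and `grad_F` are illustrative choices, not part of any library.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def F(A, b, x):
    """Quadratic objective F(x) = 1/2 x^T A x - b^T x."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)

def grad_F(A, b, x):
    """Gradient of F for symmetric A: A x - b."""
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]

A = [[5.0, 1.0], [1.0, 8.0]]
b = [3.0, 2.0]
x_star = [22 / 39, 7 / 39]          # solves A x = b for this A and b

# The gradient vanishes (up to rounding) at the solution of the linear system.
grad_at_solution = grad_F(A, b, x_star)
```

Evaluating `F` at any other point, for example `[2.0, 1.0]`, gives a larger value than `F(A, b, x_star)`.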

The conjugate gradient method is often implemented as an iterative algorithm and can be considered as being between Newton’s method, a second-order method that incorporates Hessian and gradient, and the method of steepest descent, a first-order method that uses gradient ^[2]. Newton’s Method usually reduces the number of iterations needed, but the calculation of the Hessian matrix and its inverse increases the computation required for each iteration. Steepest descent takes repeated steps in the opposite direction of the gradient of the function at the current point. It often takes steps in the same direction as earlier ones, resulting in slow convergence (Figure 1). To avoid the high computational cost of Newton’s method and to accelerate the convergence rate of steepest descent, the conjugate gradient method was developed.
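The repeated directions of steepest descent can be seen on a small example. This is a sketch under stated assumptions: it uses the $2 \times 2$ system from the numerical example later in this article and the standard exact line search step $\alpha = \textbf{r}^{T}\textbf{r} / \textbf{r}^{T}\textbf{A}\textbf{r}$ for a quadratic; the function name `steepest_descent` and the tolerance are illustrative.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimize 1/2 x^T A x - b^T x with exact line search along -gradient."""
    x = list(x0)
    directions = []
    for _ in range(max_iter):
        r = [bi - ax for bi, ax in zip(b, matvec(A, x))]   # negative gradient
        if dot(r, r) ** 0.5 <= tol:
            break
        alpha = dot(r, r) / dot(r, matvec(A, r))           # exact line search
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        directions.append(r)
    return x, directions

A = [[5.0, 1.0], [1.0, 8.0]]
b = [3.0, 2.0]
x, directions = steepest_descent(A, b, [2.0, 1.0])

# In two dimensions every second direction is parallel to an earlier one.
cross = directions[0][0] * directions[2][1] - directions[0][1] * directions[2][0]
```

Here `cross` is zero up to rounding, which means the third step reuses the direction of the first, and the method needs more than $n = 2$ steps to reach the tolerance.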

The idea of the CG method is to pick $n$ search directions that are mutually orthogonal with respect to $\textbf{A}$ and to take exactly one step along each of them, with the step size chosen so that the iterate agrees with the solution $\textbf{x}$ along that direction. The solution is therefore reached after at most $n$ steps ^[3]; in exact arithmetic, the number of iterations needed by the CG method is at most the number of distinct eigenvalues of $\textbf{A}$ , which is itself at most $n$ . Because each iteration needs only matrix-vector products, the method is attractive for large and sparse problems. It can also be used to solve least-squares problems and can be generalized to a minimization method for general smooth functions ^[3].
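The link between the iteration count and the eigenvalues can be illustrated with a small experiment. The sketch below is an assumption-laden toy: the diagonal $4 \times 4$ matrix, the right-hand side and the function name `cg_count_steps` are made up for illustration, and the update formulas are those of Alg 3 later in this article.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg_count_steps(A, b, tol=1e-10):
    """Run CG from x = 0 and report how many steps were needed."""
    x = [0.0] * len(b)
    g = list(b)                      # residual b - A x at x = 0
    d = list(g)
    steps = 0
    while dot(g, g) ** 0.5 > tol and steps < len(b):
        Ad = matvec(A, d)
        alpha = dot(g, g) / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi - alpha * adi for gi, adi in zip(g, Ad)]
        beta = -dot(g_new, g_new) / dot(g, g)   # sign convention of Alg 3
        d = [gi - beta * di for gi, di in zip(g_new, d)]
        g = g_new
        steps += 1
    return x, steps

# 4 x 4 SPD matrix with only two distinct eigenvalues (1 and 4).
A = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 0.0],
     [0.0, 0.0, 0.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
x, steps = cg_count_steps(A, b)
print(steps)  # 2 in this example, although n = 4
```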

Theory

The definition of A-conjugate direction

Let $\textbf{A}$ be a symmetric positive definite matrix. The nonzero vectors $\left\{ \textbf{d}_{0},\textbf{d}_{1},..., \textbf{d}_{n-1}\right\}$ , $\textbf{d}_{i} \in\textbf{R}^{n}$ , are said to be orthogonal (conjugate) to each other with respect to $\textbf{A}$ if
$\textbf{d}_{i}^{T}\textbf{A}\textbf{d}_j = 0,\forall i\neq j$ .

Note that if $\textbf{A}=\textbf{0}$ (a degenerate case that is not positive definite), any two vectors are conjugate to each other. If $\textbf{A}=\textbf{I}$ , conjugacy is equivalent to the conventional notion of orthogonality. If the nonzero vectors $\left\{ \textbf{d}_{0},\textbf{d}_{1},..., \textbf{d}_{n-1}\right\}$ are mutually $\textbf{A}$ -conjugate for a positive definite $\textbf{A}$ , then they are linearly independent.
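One way to obtain a set of $\textbf{A}$ -conjugate vectors is to apply Gram-Schmidt orthogonalization in the inner product $\textbf{u}^{T}\textbf{A}\textbf{v}$ . This technique is not described in the text above; the sketch below, with the illustrative helper names `a_inner` and `conjugate_basis`, only demonstrates the definition on the $2 \times 2$ matrix from the numerical example.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def a_inner(A, u, v):
    """A-weighted inner product u^T A v."""
    return dot(u, matvec(A, v))

def conjugate_basis(A, vectors):
    """Turn linearly independent vectors into mutually A-conjugate ones."""
    basis = []
    for v in vectors:
        d = list(v)
        for p in basis:
            # Remove the component of v along p, measured in the A-inner product.
            coeff = a_inner(A, v, p) / a_inner(A, p, p)
            d = [di - coeff * pi for di, pi in zip(d, p)]
        basis.append(d)
    return basis

A = [[5.0, 1.0], [1.0, 8.0]]
d0, d1 = conjugate_basis(A, [[1.0, 0.0], [0.0, 1.0]])
```

For this matrix the result is `d0 = [1, 0]` and `d1 = [-0.2, 1]`: the pair satisfies $\textbf{d}_0^{T}\textbf{A}\textbf{d}_1 = 0$ even though the two vectors are not orthogonal in the usual sense.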

The motivation of A-conjugacy^[4]

Because $D=\left\{ \textbf{d}_{0},\textbf{d}_{1},..., \textbf{d}_{n-1}\right\}$ is a set of $n$ linearly independent $\textbf{A}$ -conjugate vectors, it can be used as a basis, and the solution $\textbf{x}^{\ast}$ of $\nabla F(\textbf{x}) = \textbf{Ax} - \textbf{b} = \textbf{0}$ can be expressed as
$\textbf{x}^{\ast} = \sum_{i=0}^{n-1}\alpha _{i} \textbf{d}_{i}$
$\textbf{A}\textbf{x}^{\ast} = \sum_{i=0}^{n-1}\alpha _{i} \textbf{A}\textbf{d}_{i}$
Left-multiplying both sides by $\textbf{d}_{k}^{T}$ gives
$\textbf{d}_{k}^{T}\textbf{A}\textbf{x}^{*} = \sum_{i=0}^{n-1}\alpha _{i} \textbf{d}_{k}^{T}\textbf{A}\textbf{d}_{i}$
Because $\textbf{A}\textbf{x}^{*} = \textbf{b}$ and $D$ is A-conjugate, i.e. $\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_i = 0, \forall k \neq i$ , all terms on the right-hand side cancel except the $k$ -th:
$\textbf{d}_{k}^{T}\textbf{b} = \alpha _{k} \textbf{d}_{k}^{T}\textbf{A}\textbf{d}_{k}$

$\alpha_{k}=\frac{\textbf{d}_{k}^{T}\textbf{b}}{\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_{k}}$

Then the solution $\textbf{x}^{*}$ is
$\textbf{x}^{*}= \sum_{i=0}^{n-1}\alpha _{i}\textbf{d}_{i}=\sum_{i=0}^{n-1}\frac{\textbf{d}_{i}^{T}\textbf{b}}{\textbf{d}_{i}^{T}\textbf{A}\textbf{d}_{i}}\textbf{d}_{i}$

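Because $\textbf{A}$ is symmetric positive definite, $\textbf{u}^{T}\textbf{A}\textbf{v}$ defines an inner product, so $\textbf{d}_{i}^{T}\textbf{A}\textbf{d}_{i} > 0$ for every nonzero $\textbf{d}_{i}$ and each coefficient is well defined. The solution is therefore obtained without computing the inverse of $\textbf{A}$ .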
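The expansion can be evaluated directly once a conjugate set is known. The sketch below assumes the pair $\textbf{d}_0 = (1, 0)^{T}$ , $\textbf{d}_1 = (-0.2, 1)^{T}$ , which is A-conjugate for the $2 \times 2$ matrix of the numerical example (this pair was worked out by hand for illustration and does not appear in the text).

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[5.0, 1.0], [1.0, 8.0]]
b = [3.0, 2.0]
D = [[1.0, 0.0], [-0.2, 1.0]]        # assumed A-conjugate pair for this A

x_star = [0.0, 0.0]
for d in D:
    # alpha_i = d_i^T b / d_i^T A d_i; no inverse of A is formed.
    alpha = dot(d, b) / dot(d, matvec(A, d))
    x_star = [xi + alpha * di for xi, di in zip(x_star, d)]
```

The loop reproduces the exact solution $(22/39, 7/39)^{T}$ of the example up to rounding, using only matrix-vector and inner products.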

Conjugate Direction Theorem

Let $D=\left\{ \textbf{d}_{0},\textbf{d}_{1},..., \textbf{d}_{n-1}\right\}$ be a set of $n$ A-conjugate vectors, and let $\textbf{x}_0 \in \textbf{R}^n$ be an arbitrary starting point. Then, with
$\textbf{x}_{k+1} = \textbf{x}_{k} + \alpha_{k} \textbf{d}_{k}$
$\textbf{g}_{k} = \textbf{b}- \textbf{A}\textbf{x}_{k}$
$\alpha_{k} = \frac{\textbf{g}_{k}^{T}\textbf{d}_{k}}{\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_{k}} =\frac{(\textbf{b}-\textbf{A}\textbf{x}_{k})^{T}\textbf{d}_{k}}{\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_{k}}$
after $n$ steps, $\textbf{x}_n = \textbf{x}^{*}$ .

Proof:
Since $D$ is a basis and the iterates are built from the update rule, we can write
$\textbf{x}^* - \textbf{x}_0 = \alpha_0\textbf{d}_0 + \alpha_1\textbf{d}_1+...+\alpha_{n-1}\textbf{d}_{n-1}$
$\textbf{x}_k - \textbf{x}_0 = \alpha_0\textbf{d}_0 + \alpha_1\textbf{d}_1+...+\alpha_{k-1}\textbf{d}_{k-1}$
$\textbf{g}_k = \textbf{b}-\textbf{A}\textbf{x}_k = \textbf{A}\textbf{x}^{*}-\textbf{A}\textbf{x}_k = \textbf{A}(\textbf{x}^{*}-\textbf{x}_k)$
Multiplying the first expansion by $\textbf{d}_{k}^{T}\textbf{A}$ and using A-conjugacy leaves only the $k$ -th term:

$\textbf{d}_{k}^{T}\textbf{A}(\textbf{x}^*-\textbf{x}_0) = \textbf{d}_{k}^{T}\textbf{A}(\alpha_0\textbf{d}_0+\alpha_1\textbf{d}_1 +...+\alpha_{n-1}\textbf{d}_{n-1}) = \alpha_k\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_k$
$\alpha_{k} = \frac{\textbf{d}_{k}^{T}\textbf{A}(\textbf{x}^*-\textbf{x}_0)}{\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_k}$
The same multiplication applied to the second expansion gives zero, because that expansion contains only $\textbf{d}_0, ..., \textbf{d}_{k-1}$ :
$\textbf{d}_{k}^{T}\textbf{A}(\textbf{x}_k-\textbf{x}_0) = \textbf{d}_{k}^{T}\textbf{A}(\alpha_0\textbf{d}_0+\alpha_1\textbf{d}_1 +...+\alpha_{k-1}\textbf{d}_{k-1} ) = 0$
Therefore
$\textbf{d}_{k}^{T}\textbf{A}(\textbf{x}^*-\textbf{x}_0) = \textbf{d}_{k}^{T}\textbf{A}(\textbf{x}^*-\textbf{x}_k + \textbf{x}_k - \textbf{x}_0) =\textbf{d}_{k}^{T}\textbf{A}(\textbf{x}^*-\textbf{x}_k)$
$\alpha_k = \frac{\textbf{d}_{k}^{T}\textbf{A}(\textbf{x}^*-\textbf{x}_0)}{\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_k} = \frac{\textbf{d}_{k}^{T}\textbf{A}(\textbf{x}^*-\textbf{x}_k)}{\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_k}=\frac{\textbf{d}_{k}^{T}\textbf{g}_k}{\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_k}$
which is the step size stated in the theorem. Every coefficient of the expansion of $\textbf{x}^* - \textbf{x}_0$ is thus recovered by the iteration, so $\textbf{x}_n = \textbf{x}^{*}$ .
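The practical content of the proof is that the step size can be computed from the residual $\textbf{g}_k$ alone, without knowing $\textbf{x}^{*}$ . The following sketch compares the two expressions for $\alpha_k$ numerically; it assumes the $2 \times 2$ system of the numerical example and a hand-picked A-conjugate pair, both used purely for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[5.0, 1.0], [1.0, 8.0]]
b = [3.0, 2.0]
D = [[1.0, 0.0], [-0.2, 1.0]]        # assumed A-conjugate pair for this A
x_star = [22 / 39, 7 / 39]           # known here only to check the proof
x = [2.0, 1.0]                       # arbitrary starting point x_0

pairs = []
for d in D:
    dAd = dot(d, matvec(A, d))
    # Expression from the proof: requires the (normally unknown) solution.
    diff = [s - xi for s, xi in zip(x_star, x)]
    alpha_proof = dot(d, matvec(A, diff)) / dAd
    # Expression from the theorem: requires only g_k = b - A x_k.
    g = [bi - ax for bi, ax in zip(b, matvec(A, x))]
    alpha_residual = dot(d, g) / dAd
    pairs.append((alpha_proof, alpha_residual))
    x = [xi + alpha_residual * di for xi, di in zip(x, d)]
```

Both entries of each tuple in `pairs` agree up to rounding, and after the two steps `x` coincides with `x_star`.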

The conjugate gradient method

The conjugate gradient method is a conjugate direction method in which selected successive direction vectors are treated as a conjugate version of the successive gradients obtained while the method progresses. The conjugate directions are not specified beforehand but rather are determined sequentially at each step of the iteration ^[4]. If the conjugate vectors are carefully chosen, then not all the conjugate vectors may be needed to obtain the solution. Therefore, the conjugate gradient method is regarded as an iterative method. This also allows approximate solutions to systems where n is so large that the direct method requires too much time ^[3].

Algorithm^[3]

Let $D=\left\{ \textbf{d}_{1},\textbf{d}_{2},..., \textbf{d}_{n}\right\}$ be a set of $n$ A-conjugate vectors (indexed from 1 in this section to match the iteration counter). Then $F(\textbf{x}_0 +\alpha_1\textbf{d}_1 + \alpha_2\textbf{d}_2+...+\alpha_{n}\textbf{d}_{n})$ can be minimized by stepping from $\textbf{x}_0$ along $\textbf{d}_1$ to the minimum $\textbf{x}_1$ , stepping from $\textbf{x}_1$ along $\textbf{d}_2$ to the minimum $\textbf{x}_2$ , etc. With $\textbf{x}_0 \in \textbf{R}^n$ chosen arbitrarily, the algorithm is the following:

Alg 1: Pick $\left\{ \textbf{d}_{1},\textbf{d}_{2},..., \textbf{d}_{n}\right\}$ mutually A-conjugate, and from an arbitrary $\textbf{x}_0$ ,
For k = 1 to n
{

$\alpha_k = \textbf{d}_{k}^{T}(\textbf{b}-\textbf{A}\textbf{x}_{k-1})/\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_k$ ;
$\textbf{x}_{k} = \textbf{x}_{k-1} + \alpha_{k}\textbf{d}_{k}$ ;

}
Return $\textbf{x}_n$
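Alg 1 can be sketched in a few lines of Python. The version below is illustrative only: the $3 \times 3$ matrix, the right-hand side and the starting point are made up, and the conjugate directions are produced by Gram-Schmidt in the A-inner product (the helper `conjugate_basis` is an assumption, since the text leaves the choice of directions open at this point).

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_basis(A, vectors):
    """Hypothetical helper: A-orthogonalize linearly independent vectors."""
    basis = []
    for v in vectors:
        d = list(v)
        for p in basis:
            Ap = matvec(A, p)
            coeff = dot(v, Ap) / dot(p, Ap)
            d = [di - coeff * pi for di, pi in zip(d, p)]
        basis.append(d)
    return basis

def conjugate_directions(A, b, x0, directions):
    """Alg 1: one exact step along each mutually A-conjugate direction."""
    x = list(x0)
    for d in directions:
        residual = [bi - ax for bi, ax in zip(b, matvec(A, x))]
        alpha = dot(d, residual) / dot(d, matvec(A, d))
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # SPD test matrix
b = [1.0, 2.0, 3.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
D = conjugate_basis(A, identity)
x = conjugate_directions(A, b, [10.0, -3.0, 7.0], D)
final_residual = [bi - ax for bi, ax in zip(b, matvec(A, x))]
```

After the three steps every entry of `final_residual` is zero up to rounding, regardless of the starting point.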

The CG method is Alg 1 with a particular choice of $\left\{ \textbf{d}_{1},\textbf{d}_{2},..., \textbf{d}_{n}\right\}$ . Let $\textbf{g}_{k} = \textbf{b} - \textbf{A}\textbf{x}_{k}$ be the negative gradient (residual) at $\textbf{x}_k$ . A practical way to enforce conjugacy is to require that the next search direction be built out of the current gradient and all previous search directions. The CG method picks $\textbf{d}_{k+1}$ as the component of $\textbf{g}_k$ that is A-conjugate to $\left\{ \textbf{d}_{1},\textbf{d}_{2},..., \textbf{d}_{k}\right\}$ :

$\textbf{d}_{k+1} = \textbf{g}_k-\sum_{i=1}^{k}\frac{\textbf{g}_{k}^{T}\textbf{A}\textbf{d}_{i}}{\textbf{d}_{i}^{T}\textbf{A}\textbf{d}_{i}}\textbf{d}_i$

Since $\textbf{g}_{k}^{T}\textbf{A}\textbf{d}_{i} = 0$ for $i = 1,...,k-1$ , only the last term of the sum remains, giving the following CG algorithm:
Alg 2: From an arbitrary $\textbf{x}_0$ ,
For k = 1 to n
{

$\textbf{g}_{k-1} = \textbf{b}-\textbf{A}\textbf{x}_{k-1}$ ;
if $\textbf{g}_{k-1} = 0$ return $\textbf{x}_{k-1}$ ;
if (k > 1) $\beta_k = \textbf{d}_{k-1}^T\textbf{A}\textbf{g}_{k-1}/{\textbf{d}_{k-1}^T\textbf{A}\textbf{d}_{k-1}}$ ;
if (k = 1) $\textbf{d}_k = \textbf{g}_0$ ;
else $\textbf{d}_{k} = \textbf{g}_{k-1}-\beta_k\textbf{d}_{k-1}$ ;
$\alpha_{k} = \textbf{d}_{k}^{T}\textbf{g}_{k-1}/\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_{k}$ ;
$\textbf{x}_{k} = \textbf{x}_{k-1} + \alpha_{k}\textbf{d}_{k}$ ;

}
Return $\textbf{x}_n$
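A direct transcription of Alg 2 into Python might look as follows. It is a sketch that keeps the sign convention and the exact-zero test of the pseudocode above; the function name `cg_alg2` is illustrative, and a practical implementation would normally compare the gradient norm with a small tolerance instead of testing for exact zero.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg_alg2(A, b, x0):
    """Alg 2: conjugate gradients with beta = d^T A g / d^T A d."""
    n = len(b)
    x = list(x0)
    d = None
    for k in range(1, n + 1):
        g = [bi - ax for bi, ax in zip(b, matvec(A, x))]      # g_{k-1}
        if all(gi == 0.0 for gi in g):
            return x
        if k == 1:
            d = list(g)
        else:
            Ad_prev = matvec(A, d)                             # A d_{k-1}
            beta = dot(Ad_prev, g) / dot(d, Ad_prev)           # A is symmetric
            d = [gi - beta * di for gi, di in zip(g, d)]
        alpha = dot(d, g) / dot(d, matvec(A, d))
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

A = [[5.0, 1.0], [1.0, 8.0]]
b = [3.0, 2.0]
x = cg_alg2(A, b, [2.0, 1.0])
```

Each pass through the loop performs three matrix-vector products (for the gradient, for $\beta_k$ and for $\alpha_k$ ), which is what the simplification that follows reduces.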

The formulas in Alg 2 can be simplified as follows:
$\textbf{x}_i = \textbf{x}_{i-1}+\alpha_i\textbf{d}_i$
$\textbf{b}-\textbf{A}\textbf{x}_i = \textbf{b}-\textbf{A}\textbf{x}_{i-1}-\alpha_i\textbf{A}\textbf{d}_i$
$\textbf{g}_i = \textbf{g}_{i-1}-\alpha_i\textbf{A}\textbf{d}_i$
Then $\beta_i$ and $\alpha_i$ can be simplified by multiplying the above gradient formula by $\textbf{g}_i^{T}$ and $\textbf{g}_{i-1}^{T}$ and using the orthogonality of successive gradients, $\textbf{g}_{i}^{T}\textbf{g}_{i-1} = 0$ :
$\textbf{g}_{i}^{T}\textbf{g}_i = -\alpha_i\textbf{g}_{i}^{T}\textbf{A}\textbf{d}_i$
$\textbf{g}_{i-1}^{T}\textbf{g}_{i-1} = \alpha_i\textbf{g}_{i-1}^{T}\textbf{A}\textbf{d}_i$
As $\textbf{g}_{i-1} = \textbf{d}_i+\beta_i\textbf{d}_{i-1}$ and $\textbf{d}_{i-1}^{T}\textbf{A}\textbf{d}_i = 0$ ,
we have
$\textbf{g}_{i-1}^{T}\textbf{g}_{i-1} = \alpha_i\textbf{g}_{i-1}^{T}\textbf{A}\textbf{d}_i=\alpha_i\textbf{d}_{i}^{T}\textbf{A}\textbf{d}_i$
Substituting both results into the definition of $\beta_{i+1}$ in Alg 2 gives
$\beta_{i+1} = \frac{\textbf{d}_{i}^{T}\textbf{A}\textbf{g}_{i}}{\textbf{d}_{i}^{T}\textbf{A}\textbf{d}_{i}} = -\frac{\textbf{g}_{i}^{T}\textbf{g}_{i}}{\textbf{g}_{i-1}^{T}\textbf{g}_{i-1}}$
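The agreement between the two expressions for $\beta$ can be checked on a concrete step. The sketch below takes the first CG step for the $2 \times 2$ system of the numerical example and evaluates $\beta_2$ both ways; the variable names are illustrative.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[5.0, 1.0], [1.0, 8.0]]
b = [3.0, 2.0]
x0 = [2.0, 1.0]

g0 = [bi - ax for bi, ax in zip(b, matvec(A, x0))]
d1 = list(g0)                                    # first direction is g_0
Ad1 = matvec(A, d1)
alpha1 = dot(d1, g0) / dot(d1, Ad1)
g1 = [gi - alpha1 * adi for gi, adi in zip(g0, Ad1)]

beta_alg2 = dot(d1, matvec(A, g1)) / dot(d1, Ad1)   # A-weighted form (Alg 2)
beta_simplified = -dot(g1, g1) / dot(g0, g0)         # gradient-only form
successive_gradients = dot(g1, g0)                   # should vanish
```

Both forms give $\beta_2 = -0.04$ up to rounding, and `successive_gradients` is numerically zero, which is the orthogonality used in the derivation. The simplified form needs no extra product with $\textbf{A}$ .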
This gives the following simplified version of Alg 2:
Alg 3: From a random $\textbf{x}_0$ , and set $\textbf{g}_0 = \textbf{b} - \textbf{A}\textbf{x}_0$ ,
For k = 1 to n
{

if $\textbf{g}_{k-1} = 0$ return $\textbf{x}_{k-1}$ ;
if (k > 1) $\beta_k = -(\textbf{g}_{k-1}^{T}\textbf{g}_{k-1})/(\textbf{g}_{k-2}^{T}\textbf{g}_{k-2})$ ;
if (k = 1) $\textbf{d}_k = \textbf{g}_0$ ;
else $\textbf{d}_{k} = \textbf{g}_{k-1}-\beta_k\textbf{d}_{k-1}$ ;
$\alpha_k=(\textbf{g}_{k-1}^{T}\textbf{g}_{k-1})/(\textbf{d}_{k}^{T}\textbf{A}\textbf{d}_k)$ ;
$\textbf{x}_k = \textbf{x}_{k-1} + \alpha_k\textbf{d}_k$ ;
$\textbf{g}_{k}=\textbf{g}_{k-1}-\alpha_k\textbf{A}\textbf{d}_k$ ;

}
Return $\textbf{x}_n$
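Alg 3 translates into Python almost line by line. The sketch below keeps the sign convention of the pseudocode; the function name `cg_alg3` and the optional tolerance `tol` are additions for illustration (with `tol=0.0` the stopping test reduces to the exact-zero test of the pseudocode).

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg_alg3(A, b, x0, tol=0.0):
    """Alg 3: one matrix-vector product per iteration."""
    n = len(b)
    x = list(x0)
    g = [bi - ax for bi, ax in zip(b, matvec(A, x))]   # g_0
    d = None
    gg_prev = None
    for k in range(1, n + 1):
        gg = dot(g, g)                                  # g_{k-1}^T g_{k-1}
        if gg ** 0.5 <= tol:
            return x
        if k == 1:
            d = list(g)
        else:
            beta = -gg / gg_prev
            d = [gi - beta * di for gi, di in zip(g, d)]
        Ad = matvec(A, d)                               # the only product with A
        alpha = gg / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g = [gi - alpha * adi for gi, adi in zip(g, Ad)]
        gg_prev = gg
    return x

A = [[5.0, 1.0], [1.0, 8.0]]
b = [3.0, 2.0]
x = cg_alg3(A, b, [2.0, 1.0])
```

Updating the gradient recursively instead of recomputing $\textbf{b}-\textbf{A}\textbf{x}_k$ is what reduces the cost to a single product with $\textbf{A}$ per iteration.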

Numerical example

Consider the linear system $\textbf{A}\textbf{x} = \textbf{b}$
$\textbf{A}\textbf{x} = \begin{bmatrix}5 & 1 \\1 & 8 \\\end{bmatrix}\begin{bmatrix}x_{1} \\x_{2}\end{bmatrix} = \begin{bmatrix}3 \\2\end{bmatrix}$ .
The initial starting point is set to be
$\textbf{x}_{0} = \begin{bmatrix}2 \\1\end{bmatrix}$ .
Implement the conjugate gradient method to approximate the solution to the system.

Solution:
The exact solution is given below for later reference:
$\textbf{x}^{*} = \begin{bmatrix}22/39 \\7/39\end{bmatrix}\approx \begin{bmatrix}0.5641\\0.1795\end{bmatrix}$ .

Step 1:
$\textbf{g}_{0}=\textbf{b}-\textbf{A}\textbf{x}_0 = \begin{bmatrix}3 \\2\end{bmatrix}-\begin{bmatrix}5 &1 \\1& 8\\\end{bmatrix}\begin{bmatrix}2\\1\end{bmatrix} =\begin{bmatrix}-8 \\-8\end{bmatrix}= \textbf{d}_1$

Step 2:
$\alpha_1 = \frac{\textbf{g}_{0}^{T}\textbf{g}_{0}}{\textbf{d}_{1}^{T}\textbf{A}\textbf{d}_1}=\frac{\begin{bmatrix}-8 &-8 \\\end{bmatrix}\begin{bmatrix}-8 \\-8\end{bmatrix}}{\begin{bmatrix}-8 &-8 \\\end{bmatrix}\begin{bmatrix}5 &1 \\1& 8 \\\end{bmatrix}\begin{bmatrix}-8 \\-8\end{bmatrix}}=\frac{2}{15}$

Step 3:
$\textbf{x}_1 = \textbf{x}_{0} + \alpha_{1}\textbf{d}_{1}=\begin{bmatrix}2\\1\end{bmatrix}+\frac{2}{15}\begin{bmatrix}-8\\-8\end{bmatrix}=\begin{bmatrix}0.9333 \\-0.0667\end{bmatrix}$

Step 4:
$\textbf{g}_{1}=\textbf{g}_{0}-\alpha_1\textbf{A}\textbf{d}_1 = \begin{bmatrix}-8\\-8 \end{bmatrix}-\frac{2}{15}\begin{bmatrix}5&1\\1&8\end{bmatrix}\begin{bmatrix}-8\\-8\end{bmatrix} = \begin{bmatrix}-1.6\\1.6\end{bmatrix}$

Step 5:
$\beta_2 =-\frac{\textbf{g}_{1}^T\textbf{g}_{1}}{{\textbf{g}_{0}^T\textbf{g}_{0}}}=- \frac{\begin{bmatrix}-1.6&1.6\\\end{bmatrix}\begin{bmatrix}-1.6 \\1.6\end{bmatrix}}{\begin{bmatrix}-8 &-8\\\end{bmatrix}\begin{bmatrix}-8 \\-8\end{bmatrix}}= -0.04$

Step 6:
$\textbf{d}_2 = \textbf{g}_1 - \beta_2\textbf{d}_1=\begin{bmatrix}-1.6\\1.6\end{bmatrix}+0.04\begin{bmatrix}-8\\-8\end{bmatrix}=\begin{bmatrix}-1.92 \\1.28\end{bmatrix}$

Step 7:
$\alpha_2 =\frac{\textbf{g}_{1}^T\textbf{g}_{1}}{{\textbf{d}_{2}^T\textbf{A}\textbf{d}_{2}}}=\frac{\begin{bmatrix}-1.6 &1.6\end{bmatrix}\begin{bmatrix}-1.6\\1.6\end{bmatrix}}{\begin{bmatrix}-1.92&1.28\\\end{bmatrix}\begin{bmatrix}5 &1 \\1& 8 \\\end{bmatrix}\begin{bmatrix}-1.92\\1.28\end{bmatrix}}=0.1923$

Step 8:
$\textbf{x}_2 = \textbf{x}_{1} +\alpha_{2}\textbf{d}_{2}=\begin{bmatrix}0.9333\\-0.0667\end{bmatrix}+0.1923\begin{bmatrix}-1.92\\1.28\end{bmatrix}=\begin{bmatrix}0.5641 \\0.1795\end{bmatrix}$

Therefore, $\textbf{x}_2$ agrees with the exact solution to four decimal places: the method has converged in $n = 2$ steps, as the theory predicts.
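The eight steps can be replayed in exact rational arithmetic to confirm that the rounded values above are not hiding any error. The sketch uses Python's standard `fractions` module; the variable names simply mirror the symbols of the worked example.

```python
from fractions import Fraction as Fr

def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[Fr(5), Fr(1)], [Fr(1), Fr(8)]]
b = [Fr(3), Fr(2)]
x0 = [Fr(2), Fr(1)]

g0 = [bi - ax for bi, ax in zip(b, matvec(A, x0))]               # Step 1
d1 = list(g0)
alpha1 = dot(g0, g0) / dot(d1, matvec(A, d1))                    # Step 2
x1 = [xi + alpha1 * di for xi, di in zip(x0, d1)]                # Step 3
g1 = [gi - alpha1 * adi for gi, adi in zip(g0, matvec(A, d1))]   # Step 4
beta2 = -dot(g1, g1) / dot(g0, g0)                               # Step 5
d2 = [gi - beta2 * di for gi, di in zip(g1, d1)]                 # Step 6
alpha2 = dot(g1, g1) / dot(d2, matvec(A, d2))                    # Step 7
x2 = [xi + alpha2 * di for xi, di in zip(x1, d2)]                # Step 8

print(alpha1, beta2, alpha2)   # 2/15 -1/25 5/26
print(x2[0], x2[1])            # 22/39 7/39
```

In exact arithmetic $\textbf{x}_2$ equals $\textbf{x}^{*}$ , so the small discrepancies in the decimal values come only from rounding the intermediate results.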

Application

Conjugate gradient methods have been used to solve a wide variety of numerical problems, including linear and nonlinear algebraic equations, eigenvalue problems and minimization problems. These applications are similar in that they involve large numbers of variables or dimensions. In these circumstances, any solution method that requires storing a full matrix of this order becomes impractical, so the conjugate gradient method may be the only alternative ^[5]. Here we show an example of image reconstruction.

Iterative image reconstruction

The conjugate gradient method is used to solve for the update in iterative image reconstruction problems. For example, in the magnetic resonance imaging (MRI) contrast known as quantitative susceptibility mapping (QSM), the reconstructed image $\chi$ is solved for iteratively from the magnetic field data $\textbf{b}$ through the relation^[6]
$\textbf{b}=\textbf{D}\chi$

where $\textbf{D}$ is the matrix expressing convolution with the dipole kernel in the Fourier domain. Given that the problem is ill-posed, a physical prior is used in the reconstruction, which is framed as a constrained L1 norm minimization

$min_{\chi}\left\| f(\chi)\right\|_1$
$s.t. \left\| g(\chi)-c\right\|_{2}^{2} \leq \varepsilon$

A detailed treatment of the functions $f(\chi)$ and $g(\chi)$ can be found in ^[6]. This problem can be expressed as an unconstrained minimization problem via the method of Lagrange multipliers

$min_{\chi,\lambda}E(\chi,\lambda)$

where

$E(\chi,\lambda)\equiv \left\| f(\chi)\right\|_1+\lambda (\left\| g(\chi)-c\right\|_{2}^{2}-\varepsilon)$

The first-order conditions require $\nabla_{\chi}E(\chi,\lambda)=0$ and $\nabla_{\lambda}E(\chi,\lambda)=0$ . These conditions result in $\nabla_{\chi}f(\chi)+\nabla_{\chi}g(\chi) - \tilde{c}=0$ and $\left\| g(\chi)-c\right\|_{2}^{2}\approx \varepsilon$ , respectively. Writing the first condition as $\nabla_{\chi}f(\chi)+\nabla_{\chi}g(\chi) - \tilde{c}=L(\chi)\chi-\tilde{c}=0$ , the update can be found by fixed point iteration ^[6]:

$\chi_{n+1}=L^{-1}(\chi_n)\tilde{c}$

It can also be expressed as a quasi-Newton step, which is more robust to round-off error ^[6]:

$\chi_{n+1}=\chi_n - L^{-1}(\chi_n)\nabla E(\chi_n,\lambda)$

This update is solved with the CG method, and the iteration is repeated until the relative change satisfies $\left\|\chi_{n+1}-\chi_n\right\|_2/\left\|\chi_n\right\|_2\leq \theta$ , where $\theta$ is a specified tolerance, such as $10^{-2}$ .
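The stopping rule based on the relative change between successive iterates can be sketched independently of the imaging problem. In the toy example below, `toy_update` is a made-up contraction standing in for the reconstruction update (it is not the QSM operator), and the names `relative_change` and `iterate_until_stable` are illustrative.

```python
def relative_change(new, old):
    """||new - old||_2 / ||old||_2, the quantity compared with theta."""
    num = sum((a - b) ** 2 for a, b in zip(new, old)) ** 0.5
    den = sum(b * b for b in old) ** 0.5
    return num / den

def iterate_until_stable(update, chi0, theta=1e-2, max_iter=100):
    """Repeat chi <- update(chi) until the relative change drops below theta."""
    chi = list(chi0)
    for n in range(1, max_iter + 1):
        chi_next = update(chi)
        if relative_change(chi_next, chi) <= theta:
            return chi_next, n
        chi = chi_next
    return chi, max_iter

def toy_update(chi):
    """Hypothetical stand-in update: a contraction with fixed point [2, 4]."""
    return [0.5 * c + t for c, t in zip(chi, [1.0, 2.0])]

chi, n_iter = iterate_until_stable(toy_update, [1.0, 1.0])
```

With $\theta = 10^{-2}$ the loop stops after a handful of updates, close to but not exactly at the fixed point, which is the usual trade-off of a relative-change criterion. The helper assumes the previous iterate is nonzero, since its norm appears in the denominator.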

Conclusion

The conjugate gradient method was invented to avoid the high computational cost of Newton’s method and to accelerate the convergence rate of steepest descent. As an iterative method, each step requires only the matrix-vector product $\textbf{A}\textbf{d}_i$ , so the matrix $\textbf{A}$ itself never has to be stored explicitly. The search directions are constructed as conjugate versions of the successive gradients obtained while the method progresses. As a result, the method monotonically improves the approximations $\textbf{x}_k$ to the exact solution and, in the absence of round-off error, terminates in at most $n$ steps. In practice it may reach the required tolerance after a number of iterations that is small compared to the problem size, which makes it widely used for solving large and sparse problems.
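The point about storage can be made concrete with a matrix-free variant in which $\textbf{A}$ is available only as a function that applies it to a vector. The sketch below is an illustration under assumptions: the operator is the tridiagonal matrix with 2 on the diagonal and -1 next to it (a standard sparse SPD test case not taken from this article), and the names `apply_tridiagonal` and `cg_matrix_free` are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def apply_tridiagonal(v):
    """Return T v for T = tridiag(-1, 2, -1) without ever forming T."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * v[i] - left - right)
    return out

def cg_matrix_free(apply_A, b, tol=1e-10, max_iter=1000):
    """CG from x = 0 using only products with A (sign convention of Alg 3)."""
    x = [0.0] * len(b)
    g = list(b)                       # residual b - A x at x = 0
    d = list(g)
    gg = dot(g, g)
    for _ in range(max_iter):
        if gg ** 0.5 <= tol:
            break
        Ad = apply_A(d)
        alpha = gg / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g = [gi - alpha * adi for gi, adi in zip(g, Ad)]
        gg_new = dot(g, g)
        beta = -gg_new / gg
        d = [gi - beta * di for gi, di in zip(g, d)]
        gg = gg_new
    return x

n = 20
x = cg_matrix_free(apply_tridiagonal, [1.0] * n)
```

Only a few vectors of length $n$ are kept in memory. For this particular operator and right-hand side the exact solution is $x_i = i(n+1-i)/2$ for $i = 1, ..., n$ , which the computed `x` matches to within the tolerance.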

Reference

[1] Jonathan Shewchuk, “An Introduction to the Conjugate Gradient Method Without the Agonizing Pain,” 1994.
[2] “Conjugate gradient method,” Wikipedia. Nov. 25, 2021. Accessed: Nov. 26, 2021. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Conjugate_gradient_method&oldid=1057033318
[3] W. Stuetzle, “The Conjugate Gradient Method.” 2001. [Online]. Available: https://sites.stat.washington.edu/wxs/Stat538-w03/conjugate-gradients.pdf
[4] A. Singh and P. Ravikumar, “Conjugate Gradient Descent.” 2012. [Online]. Available: http://www.cs.cmu.edu/~pradeepr/convexopt/Lecture_Slides/conjugate_direction_methods.pdf
[5] R. Fletcher, “Conjugate gradient methods for indefinite systems,” in Numerical Analysis, Berlin, Heidelberg, 1976, pp. 73–89. doi: 10.1007/BFb0080116.
[6] J. Liu et al., “Morphology enabled dipole inversion for quantitative susceptibility mapping using structural consistency between the magnitude image and the susceptibility map,” NeuroImage, vol. 59, no. 3, pp. 2560–2568, Feb. 2012, doi: 10.1016/j.neuroimage.2011.08.082.
