Conjugate gradient methods

From Cornell University Computational Optimization Open Textbook - Optimization Wiki

Author: Alexandra Roberts, Anye Shi, Yue Sun (SYSEN6800 Fall 2021)

Introduction

The conjugate gradient method (CG) was originally invented to minimize a quadratic function:

<math>f(\textbf{x}) = \frac{1}{2}\textbf{x}^{T}\textbf{A}\textbf{x} - \textbf{b}^{T}\textbf{x}</math>

where A is an n × n symmetric positive definite matrix, and x and b are n × 1 vectors.
The solution to the minimization problem is equivalent to solving the linear system, i.e. determining x when the gradient vanishes, <math>\nabla f(\textbf{x}) = \textbf{A}\textbf{x} - \textbf{b} = \textbf{0}</math>, i.e.

<math>\textbf{A}\textbf{x} = \textbf{b}.</math>
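As a quick numerical illustration of this equivalence, the following sketch (with made-up values for A and b) checks that the solution of the linear system is indeed the minimizer of f:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative symmetric positive definite system (values are arbitrary).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

def f(x):
    # Quadratic objective f(x) = 1/2 x^T A x - b^T x
    return 0.5 * x @ A @ x - b @ x

x_star = np.linalg.solve(A, b)   # solution of A x = b
print(A @ x_star - b)            # gradient of f at x_star: ~ [0, 0]

# Perturbing x_star in any direction increases f, since A is positive definite.
for d in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    assert f(x_star + 1e-3 * d) > f(x_star)
</syntaxhighlight>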

The conjugate gradient method is often implemented as an iterative algorithm and can be considered as being between Newton's method, a second-order method that uses both the Hessian and the gradient, and the method of steepest descent, a first-order method that uses only the gradient [1]. Newton's method usually reduces the number of iterations needed, but computing the Hessian matrix and its inverse increases the work required per iteration. Steepest descent takes repeated steps in the opposite direction of the gradient of the function at the current point. It often takes steps in the same direction as earlier ones, resulting in slow convergence (Figure 1). The conjugate gradient method was developed to avoid the high computational cost of Newton's method while accelerating the convergence rate of steepest descent.

The idea of the CG method is to pick n mutually conjugate search directions and, in each search direction, take exactly one step, with the step length chosen so that the iterate matches the proposed solution x in that direction. The solution is reached after at most n steps since, theoretically, the number of iterations needed by the CG method is equal to the number of distinct eigenvalues of A, i.e. at most n. This makes it attractive for large and sparse problems. The method can be used to solve least-squares problems and can also be generalized to a minimization method for general smooth functions [2]. A minimal implementation is sketched below.
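The following is a minimal Python/NumPy sketch of the standard linear CG iteration; the example system, starting point, and tolerance are illustrative choices, not part of the original text:

<syntaxhighlight lang="python">
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A by the CG iteration."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x            # residual = negative gradient of f at x
    d = r.copy()             # first search direction: steepest descent
    rs_old = r @ r
    for _ in range(max_iter or n):
        Ad = A @ d
        alpha = rs_old / (d @ Ad)    # exact step length along d
        x = x + alpha * d
        r = r - alpha * Ad           # update the residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old       # makes the new direction A-conjugate
        d = r + beta * d             # to the previous ones
        rs_old = rs_new
    return x

# Example: a 2x2 SPD system (illustrative values); converges in 2 steps.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))   # ~ [0.0909, 0.6364]
</syntaxhighlight>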

Theory

Let A be a symmetric positive definite matrix. The nonzero vectors <math>\left \{\textbf{d}_{0}, \textbf{d}_{1}, \ldots, \textbf{d}_{n-1}\right \}, \textbf{d}_{i} \in \textbf{R}^{n}</math> are orthogonal (conjugate) to each other with respect to A if

<math>\textbf{d}_{i}^{T}\textbf{A}\textbf{d}_{j} = 0, \quad \forall i \neq j.</math>

Note that if A = 0, any two vectors are conjugate to each other, and if A = I, conjugacy is equivalent to the conventional notion of orthogonality. If <math>\left \{\textbf{d}_{0}, \textbf{d}_{1}, \ldots, \textbf{d}_{n-1}\right \}</math> are A-conjugate to each other, then the set of vectors is linearly independent.
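As an illustration, the following sketch (with an arbitrarily chosen symmetric positive definite A) builds a set of A-conjugate directions from the standard basis by Gram–Schmidt conjugation and checks the conjugacy condition numerically:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative symmetric positive definite matrix (values are arbitrary).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
U = np.eye(3)   # linearly independent starting vectors (standard basis)

# Gram-Schmidt conjugation: subtract from each u its components along the
# previously built directions, measured in the A-inner product <x, y> = x^T A y.
dirs = []
for u in U:
    d = u.copy()
    for dj in dirs:
        d -= (u @ A @ dj) / (dj @ A @ dj) * dj
    dirs.append(d)

# Verify pairwise A-conjugacy: d_i^T A d_j ~ 0 for all i != j.
for i in range(3):
    for j in range(i + 1, 3):
        print(i, j, dirs[i] @ A @ dirs[j])   # each ~ 0 up to roundoff
</syntaxhighlight>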

The conjugate gradient method

Numerical example

Application

Conclusion

References

J. Shewchuk, "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain," 1994.

  1. “Conjugate gradient method,” Wikipedia. Nov. 25, 2021. Accessed: Nov. 26, 2021. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Conjugate_gradient_method&oldid=1057033318
  2. W. Stuetzle, “The Conjugate Gradient Method.” 2001. [Online]. Available: https://sites.stat.washington.edu/wxs/Stat538-w03/conjugate-gradients.pdf