Stochastic gradient descent - Revision history

Wc593 at 11:41, 21 December 2020

2020-12-21T11:41:40Z

← Older revision		Revision as of 07:41, 21 December 2020
Line 1:		Line 1:
	Authors: Jonathon Price, Alfred Wong, Tiancheng Yuan, Joshua Mathews, Taiwo Olorunniwo (~~SYSEN~~ 5800 Fall 2020)~~<br>~~		Authors: Jonathon Price, Alfred Wong, Tiancheng Yuan, Joshua Mathews, Taiwo Olorunniwo (SysEn 5800 Fall 2020)
	~~Steward: Fengqi You~~

	== Introduction ==		== Introduction ==

Jrp369: /* Gradient Computation and Parameter Update */

2020-12-09T02:12:14Z

Gradient Computation and Parameter Update

← Older revision		Revision as of 22:12, 8 December 2020
Line 81:		Line 81:
	Keep on updating the model through additional iterations to output [<math>w_1, w_2, b</math>] = [-19.021, -35.812, -1.232].		Keep on updating the model through additional iterations to output [<math>w_1, w_2, b</math>] = [-19.021, -35.812, -1.232].

	This is just a simple demonstration of the SGD process. In practice, more epochs can be used to run through the entire dataset enough times to ensure the best learning results based on the training dataset<ref name=":1">S. ~~Lawrence and~~ C. L. ~~Giles~~. Overfitting and neural networks: conjugate gradient and backpropagation. ''Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium'', ~~Como, Italy~~, ~~2000, pp~~. ~~114-119 vol~~.~~1, doi:~~ 10.1109/~~IJCNN~~.2000.857823.</ref>. However, fitting the training dataset too closely can also expose the model to the risk of overfitting<ref name=":1" />. Tuning such parameters is therefore tricky and can take days or even weeks before the best results are found.		This is just a simple demonstration of the SGD process. In practice, more epochs can be used to run through the entire dataset enough times to ensure the best learning results based on the training dataset<ref name=":1">Lawrence, S., & Giles, C. L. (2000). Overfitting and neural networks: conjugate gradient and backpropagation. Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, 1, 114–119. https://doi.org/10.1109/ijcnn.2000.857823</ref>. However, fitting the training dataset too closely can also expose the model to the risk of overfitting<ref name=":1" />. Tuning such parameters is therefore tricky and can take days or even weeks before the best results are found.

	==Application==		==Application==

Too5: /* Logistic regression */

2020-12-08T04:22:05Z

Logistic regression

← Older revision		Revision as of 00:22, 8 December 2020
Line 90:		Line 90:

	===Logistic regression===		===Logistic regression===
	Logistic regression models the [https://en.wikipedia.org/wiki/Probability probabilities] for classification problems with two possible outcomes. It's an extension of the linear regression model for classification problems. It is a statistical technique with the input variables as continuous variables and the output variable as a binary variable. It is a class of regression where the independent variable is used to predict the dependent variable. Logistic regression has two phases: training, and testing: The system, specifically the weights w and b, is trained using ~~SGD~~ and the cross-entropy loss.		Logistic regression models the [https://en.wikipedia.org/wiki/Probability probabilities] for classification problems with two possible outcomes. It's an extension of the linear regression model for classification problems. It is a statistical technique with the input variables as continuous variables and the output variable as a binary variable. It is a class of regression where the independent variable is used to predict the dependent variable. The objective of training a machine learning model is to minimize the loss or error between ground truths and predictions by changing the trainable parameters. Logistic regression has two phases: training, and testing. The system, specifically the weights w and b, is trained using stochastic gradient descent and the cross-entropy loss.
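As a rough illustration of the training phase described in this revision, the following is a minimal sketch of per-sample SGD on the cross-entropy loss. The function names, the learning rate, the epoch count, and the toy data are all assumptions made for illustration and do not come from the article.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic_sgd(data, lr=0.1, epochs=200, seed=0):
    """Fit weights w and bias b with per-sample SGD on the cross-entropy loss."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the samples in a new random order each epoch
        for x, y in samples:
            # For the cross-entropy loss, dL/dz = p - y, where z = w.x + b.
            error = predict_proba(w, b, x) - y
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b


# Hypothetical, linearly separable toy data: (features, label).
TOY_DATA = [
    ((-2.0, -1.0), 0), ((-1.0, -1.5), 0), ((-0.5, -0.5), 0),
    ((0.5, 0.5), 1), ((1.0, 1.5), 1), ((2.0, 1.0), 1),
]

if __name__ == "__main__":
    w, b = train_logistic_sgd(TOY_DATA)
    for x, y in TOY_DATA:
        print(x, y, round(predict_proba(w, b, x), 3))
```

The testing phase would then apply `predict_proba` to held-out samples and threshold the result at 0.5. The threshold is also an assumption of this sketch.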

	===Full Waveform Inversion (FWI)===		===Full Waveform Inversion (FWI)===

Too5: /* Application */

2020-12-08T04:16:52Z

Application

← Older revision		Revision as of 00:16, 8 December 2020
Line 84:		Line 84:

	==Application==		==Application==
	SGD, often referred to as the cornerstone for deep learning, is an algorithm for training a wide range of models in machine learning. Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans. In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Models are trained by using a large set of labeled data and neural network architectures that contain many layers. Neural networks make up the backbone of deep learning algorithms. A neural network that consists of more than three layers which would be inclusive of the inputs and the output can be considered a deep learning algorithm. Due to SGD’s efficiency in dealing with large scale datasets, it is the most common method for training [https://en.wikipedia.org/wiki/Deep_learning#Deep_neural_networks deep neural networks]. Furthermore, SGD has received considerable attention and is applied to text classification and [https://en.wikipedia.org/wiki/Natural_language_processing natural language processing]. It is best suited for unconstrained optimization problems and is the main way to train large linear models on very large data sets. Implementation of stochastic gradient descent include areas in [https://en.wikipedia.org/wiki/Tikhonov_regularization ridge regression] and regularized [https://en.wikipedia.org/wiki/Logistic_regression logistic regression]. Other problems, such as Lasso<ref name="Shwartz">Shalev-Shwartz, S. and Tewari, A. (2011) Stochastic methods for ℓ<math>_1</math>-regularized loss minimization. The Journal of Machine Learning Research, 12, 1865–1892. https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf</ref> and support vector machines<ref name=Menon>Menon, A. (2009, February). Large-Scale Support Vector Machines: Algorithms and Theory. http://cseweb.ucsd.edu/~akmenon/ResearchExamTalk.pdf</ref> can be solved by stochastic gradient descent.		
SGD, often referred to as the cornerstone for deep learning, is an algorithm for training a wide range of models in machine learning. [[wikipedia:Deep_learning\|Deep learning]] is a machine learning technique that teaches computers to do what comes naturally to humans. In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Models are trained by using a large set of labeled data and neural network architectures that contain many layers. Neural networks make up the backbone of deep learning algorithms. A neural network that consists of more than three layers which would be inclusive of the inputs and the output can be considered a deep learning algorithm. Due to SGD’s efficiency in dealing with large scale datasets, it is the most common method for training [https://en.wikipedia.org/wiki/Deep_learning#Deep_neural_networks deep neural networks]. Furthermore, SGD has received considerable attention and is applied to text classification and [https://en.wikipedia.org/wiki/Natural_language_processing natural language processing]. It is best suited for unconstrained optimization problems and is the main way to train large linear models on very large data sets. Implementation of stochastic gradient descent include areas in [https://en.wikipedia.org/wiki/Tikhonov_regularization ridge regression] and regularized [https://en.wikipedia.org/wiki/Logistic_regression logistic regression]. Other problems, such as Lasso<ref name="Shwartz">Shalev-Shwartz, S. and Tewari, A. (2011) Stochastic methods for ℓ<math>_1</math>-regularized loss minimization. The Journal of Machine Learning Research, 12, 1865–1892. https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf</ref> and support vector machines<ref name=Menon>Menon, A. (2009, February). Large-Scale Support Vector Machines: Algorithms and Theory. http://cseweb.ucsd.edu/~akmenon/ResearchExamTalk.pdf</ref> can be solved by stochastic gradient descent.

	===Support Vector Machine===		===Support Vector Machine===
Line 99:		Line 99:

	==References==		==References==

			<references />

Too5: /* Application */

2020-12-08T04:10:21Z

Application

← Older revision		Revision as of 00:10, 8 December 2020
Line 84:		Line 84:

	==Application==		==Application==
	SGD, often referred to as the cornerstone for deep learning, is an algorithm for training a wide range of models in machine learning. Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans. In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Models are trained by using a large set of labeled data and neural network architectures that contain many layers. Due to SGD’s efficiency in dealing with large scale datasets, it is the most common method for training [https://en.wikipedia.org/wiki/Deep_learning#Deep_neural_networks deep neural networks]. Furthermore, SGD has received considerable attention and is applied to text classification and [https://en.wikipedia.org/wiki/Natural_language_processing natural language processing]. It is best suited for unconstrained optimization problems and is the main way to train large linear models on very large data sets. Implementation of stochastic gradient descent include areas in [https://en.wikipedia.org/wiki/Tikhonov_regularization ridge regression] and regularized [https://en.wikipedia.org/wiki/Logistic_regression logistic regression]. Other problems, such as Lasso<ref name="Shwartz">Shalev-Shwartz, S. and Tewari, A. (2011) Stochastic methods for ℓ<math>_1</math>-regularized loss minimization. The Journal of Machine Learning Research, 12, 1865–1892. https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf</ref> and support vector machines<ref name=Menon>Menon, A. (2009, February). Large-Scale Support Vector Machines: Algorithms and Theory. http://cseweb.ucsd.edu/~akmenon/ResearchExamTalk.pdf</ref> can be solved by stochastic gradient descent.		SGD, often referred to as the cornerstone for deep learning, is an algorithm for training a wide range of models in machine learning. Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans. 
In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Models are trained by using a large set of labeled data and neural network architectures that contain many layers. Neural networks make up the backbone of deep learning algorithms. A neural network that consists of more than three layers which would be inclusive of the inputs and the output can be considered a deep learning algorithm. Due to SGD’s efficiency in dealing with large scale datasets, it is the most common method for training [https://en.wikipedia.org/wiki/Deep_learning#Deep_neural_networks deep neural networks]. Furthermore, SGD has received considerable attention and is applied to text classification and [https://en.wikipedia.org/wiki/Natural_language_processing natural language processing]. It is best suited for unconstrained optimization problems and is the main way to train large linear models on very large data sets. Implementation of stochastic gradient descent include areas in [https://en.wikipedia.org/wiki/Tikhonov_regularization ridge regression] and regularized [https://en.wikipedia.org/wiki/Logistic_regression logistic regression]. Other problems, such as Lasso<ref name="Shwartz">Shalev-Shwartz, S. and Tewari, A. (2011) Stochastic methods for ℓ<math>_1</math>-regularized loss minimization. The Journal of Machine Learning Research, 12, 1865–1892. https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf</ref> and support vector machines<ref name=Menon>Menon, A. (2009, February). Large-Scale Support Vector Machines: Algorithms and Theory. http://cseweb.ucsd.edu/~akmenon/ResearchExamTalk.pdf</ref> can be solved by stochastic gradient descent.

	===Support Vector Machine===		===Support Vector Machine===

Too5: /* Logistic regression */

2020-12-08T03:46:45Z

Logistic regression

← Older revision		Revision as of 23:46, 7 December 2020
Line 84:		Line 84:

	==Application==		==Application==
	SGD, often referred to as the cornerstone for deep learning, is an algorithm for training a wide range of models in machine learning. Due to SGD’s efficiency in dealing with large scale datasets, it is the most common method for training [https://en.wikipedia.org/wiki/Deep_learning#Deep_neural_networks deep neural networks]. Furthermore, SGD has received considerable attention and is applied to text classification and [https://en.wikipedia.org/wiki/Natural_language_processing natural language processing]. It is best suited for unconstrained optimization problems and is the main way to train large linear models on very large data sets. Implementation of stochastic gradient descent include areas in [https://en.wikipedia.org/wiki/Tikhonov_regularization ridge regression] and regularized [https://en.wikipedia.org/wiki/Logistic_regression logistic regression]. Other problems, such as Lasso<ref name="Shwartz">Shalev-Shwartz, S. and Tewari, A. (2011) Stochastic methods for ℓ<math>_1</math>-regularized loss minimization. The Journal of Machine Learning Research, 12, 1865–1892. https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf</ref> and support vector machines<ref name=Menon>Menon, A. (2009, February). Large-Scale Support Vector Machines: Algorithms and Theory. http://cseweb.ucsd.edu/~akmenon/ResearchExamTalk.pdf</ref> can be solved by stochastic gradient descent.		SGD, often referred to as the cornerstone for deep learning, is an algorithm for training a wide range of models in machine learning. Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans. In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Models are trained by using a large set of labeled data and neural network architectures that contain many layers. 
Due to SGD’s efficiency in dealing with large scale datasets, it is the most common method for training [https://en.wikipedia.org/wiki/Deep_learning#Deep_neural_networks deep neural networks]. Furthermore, SGD has received considerable attention and is applied to text classification and [https://en.wikipedia.org/wiki/Natural_language_processing natural language processing]. It is best suited for unconstrained optimization problems and is the main way to train large linear models on very large data sets. Implementation of stochastic gradient descent include areas in [https://en.wikipedia.org/wiki/Tikhonov_regularization ridge regression] and regularized [https://en.wikipedia.org/wiki/Logistic_regression logistic regression]. Other problems, such as Lasso<ref name="Shwartz">Shalev-Shwartz, S. and Tewari, A. (2011) Stochastic methods for ℓ<math>_1</math>-regularized loss minimization. The Journal of Machine Learning Research, 12, 1865–1892. https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf</ref> and support vector machines<ref name=Menon>Menon, A. (2009, February). Large-Scale Support Vector Machines: Algorithms and Theory. http://cseweb.ucsd.edu/~akmenon/ResearchExamTalk.pdf</ref> can be solved by stochastic gradient descent.

	===Support Vector Machine===		===Support Vector Machine===
Line 90:		Line 90:

	===Logistic regression===		===Logistic regression===
	Logistic regression models the [https://en.wikipedia.org/wiki/Probability probabilities] for classification problems with two possible outcomes. It's an extension of the linear regression model for classification problems. It is a statistical technique with the input variables as continuous variables and the output variable as a binary variable. It is a class of regression where the independent variable is used to predict the dependent variable. Logistic regression has two phases: training, and testing: The system (specifically the weights w and b) is trained using ~~stochastic gradient descent~~ and the cross-entropy loss		Logistic regression models the [https://en.wikipedia.org/wiki/Probability probabilities] for classification problems with two possible outcomes. It's an extension of the linear regression model for classification problems. It is a statistical technique with the input variables as continuous variables and the output variable as a binary variable. It is a class of regression where the independent variable is used to predict the dependent variable. Logistic regression has two phases: training, and testing: The system, specifically the weights w and b, is trained using SGD and the cross-entropy loss.

	===Full Waveform Inversion (FWI)===		===Full Waveform Inversion (FWI)===

Too5: /* Logistic regression */

2020-12-08T03:02:47Z

Logistic regression

← Older revision		Revision as of 23:02, 7 December 2020
Line 90:		Line 90:

	===Logistic regression===		===Logistic regression===
	Logistic regression models the [https://en.wikipedia.org/wiki/Probability probabilities] for classification problems with two possible outcomes. It's an extension of the linear regression model for classification problems. It is a statistical technique with the input variables as continuous variables and the output variable as a binary variable. It is a class of regression where the independent variable is used to predict the dependent variable.		Logistic regression models the [https://en.wikipedia.org/wiki/Probability probabilities] for classification problems with two possible outcomes. It's an extension of the linear regression model for classification problems. It is a statistical technique with the input variables as continuous variables and the output variable as a binary variable. It is a class of regression where the independent variable is used to predict the dependent variable. Logistic regression has two phases: training, and testing: The system (specifically the weights w and b) is trained using stochastic gradient descent and the cross-entropy loss

	===Full Waveform Inversion (FWI)===		===Full Waveform Inversion (FWI)===

Jrp369: /* Forward */

2020-12-07T17:36:09Z

Forward

← Older revision		Revision as of 13:36, 7 December 2020
Line 47:		Line 47:
	'''Dataset:'''		'''Dataset:'''

	In this problem, ~~we set 1 as~~ the batch size~~. And~~ the entire dataset of [ <math>x_1</math>, <math>x_2</math>, <math>y</math>] is given by:		For this problem, the batch size is set to 1 and the entire dataset of [ <math>x_1</math>, <math>x_2</math>, <math>y</math>] is given by:
	{\| class="wikitable"		{\| class="wikitable"
	~~\|+ Caption text~~
	\|-
	! <math>x_1</math> !! <math>x_2</math> !! <math>y</math>		! <math>x_1</math> !! <math>x_2</math> !! <math>y</math>
	\|-		\|-
Line 65:		Line 63:
	\| 6 \|\| 7 \|\| -8		\| 6 \|\| 7 \|\| -8
	\|}		\|}
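To make the batch-size-1 setting concrete, here is a minimal sketch of SGD over the six dataset rows. It assumes a plain linear model, y_hat = w1*x1 + w2*x2 + b, and a squared-error loss. Both are assumptions of this sketch rather than the article's stated model, so the result is not meant to reproduce values reported elsewhere in the article. The learning rate, epoch count, and function name are likewise hypothetical.

```python
import random

# Rows (x1, x2, y) of the dataset table.
DATASET = [
    (4, 1, 2), (2, 8, -14), (1, 0, 1),
    (3, 2, -1), (1, 4, -7), (6, 7, -8),
]


def sgd_linear(data, lr=0.01, epochs=5000, seed=0):
    """Per-sample SGD (batch size 1) for the assumed model y_hat = w1*x1 + w2*x2 + b."""
    rng = random.Random(seed)
    w1 = w2 = b = 0.0
    rows = list(data)
    for _ in range(epochs):
        rng.shuffle(rows)  # new random sample order each epoch
        for x1, x2, y in rows:
            # Forward pass and residual for a single sample.
            error = (w1 * x1 + w2 * x2 + b) - y
            # Gradients of the assumed loss 0.5 * error**2, applied immediately.
            w1 -= lr * error * x1
            w2 -= lr * error * x2
            b -= lr * error
    return w1, w2, b


if __name__ == "__main__":
    print(sgd_linear(DATASET))
```

Under this assumed model the six rows satisfy y = x1 - 2*x2 exactly, so the parameters should approach w1 = 1, w2 = -2, b = 0.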

	===== Gradient Computation and Parameter Update =====		===== Gradient Computation and Parameter Update =====
	The purpose of backpropagation (BP) is to obtain the impact of the weights and bias terms on the entire model. The update of the model depends entirely on the gradient values. To minimize the loss during the process, the model needs to follow the descending gradient so that it can finally converge to a global optimal point. The three partial derivatives are:		The purpose of backpropagation (BP) is to obtain the impact of the weights and bias terms on the entire model. The update of the model depends entirely on the gradient values. To minimize the loss during the process, the model needs to follow the descending gradient so that it can finally converge to a global optimal point. The three partial derivatives are:

Jrp369 at 17:34, 7 December 2020

2020-12-07T17:34:07Z

← Older revision		Revision as of 13:34, 7 December 2020
Line 48:		Line 48:

	In this problem, we set 1 as the batch size. And the entire dataset of [ <math>x_1</math>, <math>x_2</math>, <math>y</math>] is given by:		In this problem, we set 1 as the batch size. And the entire dataset of [ <math>x_1</math>, <math>x_2</math>, <math>y</math>] is given by:
	{\|		{\| class="wikitable"
	\|+		\|+ Caption text
	\|-		\|-
	~~\|1)~~		! <math>x_1</math> !! <math>x_2</math> !! <math>y</math>
	~~\| 4~~
	~~\| 1~~
	~~\| 2~~
	\|-		\|-
	\|2)		\| 4 \|\| 1 \|\| 2
	\| 2
	\| 8
	\| ~~-14~~
	\|-		\|-
	\|3)		\| 2 \|\| 8 \|\| -14
	\| 1
	\| 0
	\| 1
	\|-		\|-
	\|4)		\| 1 \|\| 0 \|\| 1
	\| 3
	\| 2
	\| -1
	\|-		\|-
	\|5)		\| 3 \|\| 2 \|\| -1
	\| 1
	\| 4
	\| -7
	\|-		\|-
	\|6)		\| 1 \|\| 4 \|\| -7
	\| 6		\|-
	\| 7		\| 6 \|\| 7 \|\| -8
	\| -8
	\|}		\|}
	~~</blockquote>~~

	===== Gradient Computation and Parameter Update =====		===== Gradient Computation and Parameter Update =====
	The purpose of backpropagation (BP) is to obtain the impact of the weights and bias terms on the entire model. The update of the model depends entirely on the gradient values. To minimize the loss during the process, the model needs to follow the descending gradient so that it can finally converge to a global optimal point. The three partial derivatives are:		The purpose of backpropagation (BP) is to obtain the impact of the weights and bias terms on the entire model. The update of the model depends entirely on the gradient values. To minimize the loss during the process, the model needs to follow the descending gradient so that it can finally converge to a global optimal point. The three partial derivatives are:

Joshua.mathews76: /* Conclusion */

2020-12-01T00:10:09Z

Conclusion

← Older revision		Revision as of 20:10, 30 November 2020
Line 3:		Line 3:

	== Introduction ==		== Introduction ==
	'''Stochastic gradient descent''' (abbreviated as '''SGD''') is an iterative method often used for [https://en.wikipedia.org/wiki/Machine_learning machine learning], optimizing the [https://en.wikipedia.org/wiki/Gradient_descent gradient descent] during each search once a random weight vector is picked. The gradient descent is a strategy that searches through a large or infinite hypothesis space whenever 1) there are hypotheses continuously being parameterized and 2) the errors are differentiable based on the parameters. The problem with gradient descent is that [https://en.wikipedia.org/wiki/Convergence_(logic) converging] to a [https://en.wikipedia.org/wiki/Maxima_and_minima local minimum] takes extensive time and determining a global minimum is not guaranteed.<ref name=McGrawHill2003>Mitchell, T. M. (1997). Machine Learning (1st ed.). McGraw-Hill Education. Page 92. ISBN 0070428077.</ref> The gradient descent ~~picks any random weight vector and~~ continuously updates it incrementally when an error calculation is completed to improve convergence.<ref name="Needell=">Needell, D., Srebro, N., & Ward, R. (2015, January). Stochastic gradient descent weighted sampling, and the randomized Kaczmarz algorithm. https://arxiv.org/pdf/1310.5715.pdf</ref> The method seeks to determine the steepest descent and it reduces the number of [https://en.wikipedia.org/wiki/Iteration iterations] and the time taken to search large quantities of data points. Over the recent years, the data sizes have increased immensely such that current processing capabilities are not enough.<ref name=Bottou1991>Bottou, L. (1991) Stochastic gradient learning in neural networks. Proceedings of Neuro-Nımes, 91. 
https://leon.bottou.org/publications/pdf/nimes-1991.pdf</ref> Stochastic gradient descent is being used in [https://en.wikipedia.org/wiki/Neural_network neural networks] and decreases machine computation time while increasing complexity and performance for large-scale problems.<ref name=bottou2012>Bottou, L. (2012) Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade, 421– 436. Springer.</ref>		'''Stochastic gradient descent''' (abbreviated as '''SGD''') is an iterative method often used for [https://en.wikipedia.org/wiki/Machine_learning machine learning], optimizing the [https://en.wikipedia.org/wiki/Gradient_descent gradient descent] during each search once a random weight vector is picked. The gradient descent is a strategy that searches through a large or infinite hypothesis space whenever 1) there are hypotheses continuously being parameterized and 2) the errors are differentiable based on the parameters. The problem with gradient descent is that [https://en.wikipedia.org/wiki/Convergence_(logic) converging] to a [https://en.wikipedia.org/wiki/Maxima_and_minima local minimum] takes extensive time and determining a global minimum is not guaranteed.<ref name=McGrawHill2003>Mitchell, T. M. (1997). Machine Learning (1st ed.). McGraw-Hill Education. Page 92. ISBN 0070428077.</ref> In SGD, the user initializes the weights and the process updates the weight vector using one data point<ref name="bishop" />. The gradient descent continuously updates it incrementally when an error calculation is completed to improve convergence.<ref name="Needell=">Needell, D., Srebro, N., & Ward, R. (2015, January). Stochastic gradient descent weighted sampling, and the randomized Kaczmarz algorithm. https://arxiv.org/pdf/1310.5715.pdf</ref> The method seeks to determine the steepest descent and it reduces the number of [https://en.wikipedia.org/wiki/Iteration iterations] and the time taken to search large quantities of data points. 
Over the recent years, the data sizes have increased immensely such that current processing capabilities are not enough.<ref name=Bottou1991>Bottou, L. (1991) Stochastic gradient learning in neural networks. Proceedings of Neuro-Nımes, 91. https://leon.bottou.org/publications/pdf/nimes-1991.pdf</ref> Stochastic gradient descent is being used in [https://en.wikipedia.org/wiki/Neural_network neural networks] and decreases machine computation time while increasing complexity and performance for large-scale problems.<ref name=bottou2012>Bottou, L. (2012) Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade, 421– 436. Springer.</ref>

	== Theory ==		== Theory ==
Line 112:		Line 112:

	===Full Waveform Inversion (FWI)===		===Full Waveform Inversion (FWI)===
	The Full Waveform Inversion (FWI) is a [https://en.wikipedia.org/wiki/Geophysical_imaging seismic imaging] process by drawing information from the physical parameters of samples. Companies use the process to produce high-resolution high velocity depictions of subsurface activities. ~~SDG~~ supports the process because it can identify the minima and the overall global minimum in less time as there are many local minimums.<ref name=witte>Witte, P., Louboutin, M., Lensink, K., Lange, M., Kukreja, N., Luporini, F., Gorman, G., Herrmann, F.J.; Full-waveform inversion, Part 3: Optimization. The Leading Edge ; 37 (2): 142–145. doi: https://doi.org/10.1190/tle37020142.1</ref>		The Full Waveform Inversion (FWI) is a [https://en.wikipedia.org/wiki/Geophysical_imaging seismic imaging] process by drawing information from the physical parameters of samples. Companies use the process to produce high-resolution high velocity depictions of subsurface activities. SGD supports the process because it can identify the minima and the overall global minimum in less time as there are many local minimums.<ref name=witte>Witte, P., Louboutin, M., Lensink, K., Lange, M., Kukreja, N., Luporini, F., Gorman, G., Herrmann, F.J.; Full-waveform inversion, Part 3: Optimization. The Leading Edge ; 37 (2): 142–145. doi: https://doi.org/10.1190/tle37020142.1</ref>

	==Conclusion==		==Conclusion==
	~~Stochastic Gradient Descent~~ is an algorithm that seeks to find the steepest descent during each iteration. The process decreases the time it takes to search large data sets and determine local minima immensely. The ~~process helps determine the global minimum. The SDG~~ provides many applications in machine learning, geophysics, least mean squares (LMS), and other areas.		SGD is an algorithm that seeks to find the steepest descent during each iteration. The process decreases the time it takes to search large data sets and determine local minima immensely. The SGD provides many applications in machine learning, geophysics, least mean squares (LMS), and other areas.

	==References==		==References==