Stochastic gradient descent
'''Stochastic gradient descent''' (abbreviated as '''SGD''') is an iterative method often used in machine learning; it refines gradient descent by starting from a randomly chosen weight vector and updating it one sample at a time rather than over the full data set. Gradient descent is a strategy for searching a large or infinite hypothesis space whenever 1) the hypotheses are continuously parameterized and 2) the error is differentiable with respect to those parameters. Its drawbacks are that converging to a local minimum can take considerable time and that finding a global minimum is not guaranteed (McGraw Hill, 92). Gradient descent picks an arbitrary initial weight vector and updates it incrementally each time an error calculation is completed, improving convergence as it proceeds (Needell, Ward, Srebro, 14). By following the direction of steepest descent one sample at a time, the stochastic variant reduces the number of iterations and the time needed to search large quantities of data points. In recent years, data sets have grown so large that current processing capabilities are often insufficient (Bottou, 177). Stochastic gradient descent is used in neural networks, where it decreases machine computation time while allowing greater model complexity and performance on large-scale problems.
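The update described above can be made concrete with a short sketch. The following is a minimal, self-contained example of SGD for least-squares regression, not code from the cited sources; the step size, epoch count, and toy data are assumptions chosen purely for illustration.
<syntaxhighlight lang="python">
import numpy as np

def sgd_least_squares(X, y, lr=0.01, epochs=50, seed=0):
    """Minimal SGD sketch: one squared-error sample gradient per update."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])        # random initial weight vector
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # visit samples in random order
            error = X[i] @ w - y[i]        # error on a single data point
            w -= lr * error * X[i]         # step opposite the sample gradient
    return w

# Toy usage (assumed data): recover w_true = [2.0, -3.0] from noisy samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 0.01 * rng.normal(size=200)
print(sgd_least_squares(X, y))             # approximately [2.0, -3.0]
</syntaxhighlight>
Because each update touches only a single data point, the cost per step does not grow with the size of the data set, which is what makes the method attractive for the large data regimes described above.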
Authors: Jonathon Price, Alfred Wong, Tiancheng Yuan, Joshua Matthews, Taiwo Olorunniwo (SYSEN 5800 Fall 2020)<br>
Steward: Fengqi You
== Introduction ==
[Insert text]


== Problem Formulation ==


== Theory ==

[Insert text]

== Methodology ==

[Insert text]

== Algorithmic Discussion ==

[Insert text]

== Numerical Example ==

[Insert text]

== Application ==

[Insert text]

== Conclusion ==

[Insert text]

== References ==

# [Insert reference]