Difference between revisions of "RMSProp"

From Cornell University Computational Optimization Open Textbook - Optimization Wiki

Revision as of 04:02, 19 November 2020

Author: Jason Huang (SysEn 6800 Fall 2020)

Steward: Allen Yang, Fengqi You

Introduction

RMSProp, short for root mean square propagation, is an optimization algorithm for training Artificial Neural Networks (ANNs) in machine learning. It is a more recent development than Stochastic Gradient Descent (SGD) and the momentum method, and it is one of the foundations on which the Adam algorithm was developed. RMSProp uses an adaptive learning rate and has never been formally published: it was first proposed by Geoff Hinton in lecture six of the Coursera course “Neural Networks for Machine Learning”. Remarkably, this informally introduced, unpublished algorithm is now very widely used.

Theory and Methodology

Artificial Neural Network

An Artificial Neural Network (ANN) can be regarded as the brain and conscious center of Artificial Intelligence (AI): it imitates the way the human mind works when thinking. Researchers try to model ANNs closely on real neurons, their biological ‘parent’.

Figure: A single neuron represented as a mathematical function

The function of a single neuron can be written as:

<math>f(x_{1},x_{2}) = \max(0, w_{1} x_{1} + w_{2} x_{2})</math>

where <math>x_{1}, x_{2}</math> are the two input numbers and <math>w_{1}, w_{2}</math> are the corresponding weights. The function <math>f(x_{1},x_{2})</math> takes these inputs and produces a single number as output. If <math>w_{1} x_{1} + w_{2} x_{2}</math> is greater than 0, the function returns this positive value; otherwise it returns 0. A neural network can therefore be represented as a composition of such functions, in which the output of one function is used as the input of the next.
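As a minimal sketch of the idea above, the neuron function can be written in a few lines of Python. The function name `neuron`, the two-layer wiring, and all numeric values are illustrative assumptions, not part of the original article.

```python
def neuron(x1, x2, w1, w2):
    """Two-input neuron: returns max(0, w1*x1 + w2*x2)."""
    return max(0.0, w1 * x1 + w2 * x2)


# Illustrative inputs and weights (assumed values).
active = neuron(2.0, 3.0, 1.0, -0.5)     # weighted sum is positive, so it is passed through
inactive = neuron(2.0, 3.0, -1.0, 0.5)   # weighted sum is negative, so the output is 0

# Coupling: the outputs of the two neurons above become the inputs of the next one.
output = neuron(active, inactive, 2.0, 1.0)
```

The last line shows the coupling described in the text: a network is built by feeding the output of one such function into the next.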

RProp

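RProp, or Resilient Backpropagation, was once a widely used algorithm for supervised learning with multi-layered feed-forward networks, and its concepts are the foundation on which RMSProp was developed. The derivative of the error function <math>E</math> with respect to each weight can be written using the chain rule as:

<math>\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial s_{i}} \frac{\partial s_{i}}{\partial net_{i}} \frac{\partial net_{i}}{\partial w_{ij}}</math>

where <math>w_{ij}</math> is the weight from neuron <math>j</math> to neuron <math>i</math>, <math>s_{i}</math> is the output of neuron <math>i</math>, and <math>net_{i}</math> is the weighted sum of the inputs of neuron <math>i</math>. Once the partial derivative for each weight is known, the error function can be minimized by performing simple gradient descent:

<math>w_{ij}(t+1) = w_{ij}(t) - \epsilon \frac{\partial E}{\partial w_{ij}}(t)</math>
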
(reference required)
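The chain-rule factorization can be illustrated with a single neuron. The sketch below assumes a sigmoid activation and a squared-error function, neither of which is specified in the article, and the helper names are hypothetical. Under those assumptions the three factors are the error derivative with respect to the output, the activation derivative, and the input that the weight multiplies.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def error(weights, inputs, target):
    """Assumed error function: E = 0.5 * (s_i - target)^2 for one neuron i."""
    net = sum(w * s for w, s in zip(weights, inputs))
    return 0.5 * (sigmoid(net) - target) ** 2


def chain_rule_gradient(weights, inputs, target):
    """dE/dw_ij as the product of the three chain-rule factors."""
    net = sum(w * s for w, s in zip(weights, inputs))   # net_i
    out = sigmoid(net)                                  # s_i
    dE_ds = out - target                                # dE/ds_i
    ds_dnet = out * (1.0 - out)                         # ds_i/dnet_i for a sigmoid
    # dnet_i/dw_ij is simply the input arriving from neuron j
    return [dE_ds * ds_dnet * s_j for s_j in inputs]


weights = [0.3, -0.2]   # illustrative values
inputs = [1.0, 2.0]
grad = chain_rule_gradient(weights, inputs, target=1.0)
```

A quick sanity check is to compare each entry of `grad` with a finite-difference estimate of `error`; the two should agree closely.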

The choice of the learning rate <math>\epsilon</math>, which scales the derivative, has an important effect on the time needed to reach convergence. If it is set too small, too many steps are needed to reach an acceptable solution; a learning rate that is too large, on the other hand, may lead to oscillation, preventing the error from falling below a certain value.
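The effect of the learning rate can be seen on a toy problem. The sketch below assumes a one-dimensional error function E(w) = w², chosen only for illustration, and applies the gradient descent update with three assumed learning rates.

```python
def gradient_descent(grad, w0, epsilon, steps):
    """Repeatedly apply w(t+1) = w(t) - epsilon * dE/dw(t); returns all iterates."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - epsilon * grad(w)
        history.append(w)
    return history


def grad_E(w):
    """Gradient of the assumed toy error function E(w) = w**2."""
    return 2.0 * w


slow = gradient_descent(grad_E, w0=1.0, epsilon=0.01, steps=20)         # too small: barely moves
good = gradient_descent(grad_E, w0=1.0, epsilon=0.4, steps=20)          # reasonable: converges quickly
oscillating = gradient_descent(grad_E, w0=1.0, epsilon=1.0, steps=20)   # too large: jumps between +1 and -1
```

With the smallest rate the weight is still far from the minimum at w = 0 after 20 steps, while the largest rate flips the sign of the weight at every step without ever reducing the error.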

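In addition, the gradient descent update can be combined with the momentum method to mitigate this problem and accelerate convergence. With <math>w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)</math>, the weight step becomes:

<math>\Delta w_{ij}(t) = -\epsilon \frac{\partial E}{\partial w_{ij}}(t) + \mu \Delta w_{ij}(t-1)</math>

where the momentum parameter <math>\mu</math> scales the influence of the previous step on the current one.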
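A minimal sketch of a momentum update follows. It assumes the common form in which the weight step is the negative scaled gradient plus a fraction `mu` of the previous step, with the weight then moved by that step. The toy error function E(w) = w², the parameter name `mu`, and all numeric values are assumptions made for illustration.

```python
def momentum_descent(grad, w0, epsilon, mu, steps):
    """Gradient descent with momentum; mu = 0 reduces to plain gradient descent."""
    w = w0
    delta = 0.0          # previous weight step, Delta w(t-1)
    history = [w]
    for _ in range(steps):
        delta = -epsilon * grad(w) + mu * delta   # Delta w(t)
        w = w + delta                             # w(t+1) = w(t) + Delta w(t)
        history.append(w)
    return history


def grad_E(w):
    """Gradient of the assumed toy error function E(w) = w**2."""
    return 2.0 * w


plain = momentum_descent(grad_E, w0=1.0, epsilon=0.01, mu=0.0, steps=50)
with_momentum = momentum_descent(grad_E, w0=1.0, epsilon=0.01, mu=0.9, steps=50)
```

In this toy setting, with the same small learning rate, the run with momentum ends noticeably closer to the minimum at w = 0 than the plain run, because consecutive steps in the same direction accumulate.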