Gradient-based algorithms are the default training algorithms for ANNs. Hence, providing support for such algorithms (SGD, Adam, RMSProp, etc.) is critical in order to provide out-of-the-box benchmarking capabilities. Our suggestion is to proceed as follows:
Create a new class in the base package (e.g. class derivable : public solution {}) that inherits from the solution class. derivable should have an array of floats (e.g. derivable::m_df) that represents the derivative of the solution::fitness() function with respect to each parameter in solution::get_params(). Hence, the size of derivable::get_df() will be solution::size().
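A minimal sketch of how such a class could be declared follows; the float element type and the exact solution interface (size(), get_params()) are assumptions for illustration, not the library's confirmed API:

```cpp
// Sketch only: assumes solution already exposes size() and get_params().
class derivable : public solution
{
protected:
  // Derivative of fitness() with respect to each parameter in get_params(),
  // hence it holds exactly solution::size() entries.
  float* m_df = nullptr;
};
```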
Define a getter in the derivable class (e.g. derivable::df()) that, in case the solution was modified, calculates the derivative of the fitness function and stores the result in the derivable::m_df array. In case the solution is not modified, it simply returns derivable::m_df. The implementation of this method should follow the same pattern as the current solution::fitness() method.
As in the case of solution::fitness() and solution::calculate_fitness(), consider an implementation of derivable::df() together with a protected virtual derivable::calculate_df() = 0 method in derivable. It is probably a good idea not to create derivable::m_df before the first derivable::df() call, in case the derivative is never used.
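A possible shape for this getter, mirroring the fitness()/calculate_fitness() caching pattern, is sketched below. The m_df_modified dirty flag is a hypothetical name, since the issue does not specify how the "solution was modified" state is tracked; adapt it to whatever mechanism solution::fitness() already uses.

```cpp
// Assumes the derivable class declares:
//   virtual void calculate_df() = 0;  (protected, implemented by each solution)
//   float* m_df = nullptr;            (the derivative array)
//   bool   m_df_modified = true;      (hypothetical dirty flag, see above)
float* derivable::df()
{
  // Allocate the derivative array lazily, on the first call only,
  // so solutions that never ask for the gradient pay no memory cost.
  if (m_df == nullptr)
  {
    m_df = new float[size()];
  }

  // Recompute only when the solution changed since the last call,
  // mirroring the caching behaviour of solution::fitness().
  if (m_df_modified)
  {
    calculate_df();
    m_df_modified = false;
  }

  return m_df;
}
```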
Each child of derivable in the solutions package should re-implement its own version of virtual derivable::calculate_df() = 0 according to its fitness function (only if the fitness function is differentiable, of course). This means that the network class should inherit from derivable instead of solution and implement virtual derivable::calculate_df() = 0.
The network::calculate_df() implementation will call a layer::backprop() method defined in the layer class, passing the position in the derivable::m_df array where the layer will store the derivative of its corresponding parameters. The layer::backprop() method should be similar to the current layer::prop() method.
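As an illustration of the offset bookkeeping only, something along these lines could work; m_layers and layer::size() are assumed names, and the error propagation between layers (which layer::backprop() would need, analogously to layer::prop()) is omitted here:

```cpp
void network::calculate_df()
{
  int offset = 0;

  // Let each layer write the derivative of its own parameters into the
  // slice of m_df reserved for it; offset advances by the number of
  // parameters the layer owns (assumed to be layer::size()).
  for (auto* l : m_layers)  // m_layers is an assumed container of layer*
  {
    l->backprop(m_df + offset);
    offset += l->size();
  }
}
```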
Each child of layer in the layers package should re-implement its own version of virtual layer::backprop() = 0. Currently there should be a single layer, fc (fully connected layer), implemented in the library.
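For fc, the override could look roughly as follows; the cached input (m_in), the per-output error term (m_delta), in_size()/out_size(), and the weight-then-bias layout inside the derivative slice are all assumptions made for the sake of the example:

```cpp
// Writes dL/dW and dL/db for this layer into the slice of m_df passed in.
// Assumed layout: first all weight derivatives, then the bias derivatives.
void fc::backprop(float* df)
{
  for (int j = 0; j < out_size(); ++j)
  {
    for (int i = 0; i < in_size(); ++i)
    {
      // Derivative of the loss w.r.t. weight (i, j): delta_j * input_i.
      df[j * in_size() + i] = m_delta[j] * m_in[i];
    }

    // Derivative of the loss w.r.t. bias j: delta_j.
    df[out_size() * in_size() + j] = m_delta[j];
  }
}
```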
Create a new class in the algorithms package (e.g. class sgd : public algorithm) that, using the derivative and the fitness function of a derivable solution, implements the Stochastic Gradient Descent algorithm.
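A minimal sketch of such an optimizer, assuming the fitness is to be minimized, that get_params() returns a writable pointer, and that the algorithm base class allows an iteration hook named here optimize() (an assumed name, not a confirmed part of the interface):

```cpp
class sgd : public algorithm
{
public:
  explicit sgd(derivable* solution, float learning_rate = 0.01f)
  : m_solution(solution), m_learning_rate(learning_rate)
  {
  }

  // One SGD step: move every parameter against its derivative.
  void optimize()
  {
    float* params = m_solution->get_params();
    float* df = m_solution->df();

    for (int i = 0; i < m_solution->size(); ++i)
    {
      params[i] -= m_learning_rate * df[i];
    }

    // The solution should be flagged as modified at this point so that
    // fitness() and df() recompute on the next call (mechanism omitted).
  }

protected:
  derivable* m_solution;   // the solution being trained
  float m_learning_rate;   // step size of each update
};
```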