Long Short-Term Memory (LSTM) architecture

Implementation of an LSTM (Long Short-Term Memory) Recurrent Neural Network (RNN) architecture


 

An artificial neuron computes a weighted sum of its inputs and passes the result through a nonlinear activation function.

A Recurrent Neural Network (RNN) is a network that processes sequential data by feeding its hidden state from one time step back into the next.
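As a minimal sketch of that recurrence (a plain tanh RNN step; the function name, weight names, and sizes here are illustrative, not from this article):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: the new hidden state depends on the
    current input x_t and the previous hidden state h_prev."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

# Illustrative sizes: input size 3, hidden size 4
rng = np.random.default_rng(0)
W_x = rng.normal(size=(3, 4))
W_h = rng.normal(size=(4, 4))
b = np.zeros(4)

h = np.zeros(4)                      # initial hidden state
for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h.shape)  # (4,)
```

Because the same weights are reused at every step, the hidden state acts as a memory of the sequence seen so far.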

More info will come soon ...

The figure below shows an LSTM cell. It processes data sequentially and carries its state across time steps. The LSTM state is split into two vectors: h_t, the short-term state ("h" stands for "hidden"), and c_t, the long-term state ("c" stands for "cell").

 

In the figure above, arrows point to the three LSTM gates. The forget_gate ⓧ controls whether to forget the cell state, i.e. which parts of the long-term state should be erased. The input_gate ⓧ controls whether to write to the cell, i.e. which parts of the candidate values should be added to the long-term state. The output_gate ⓧ controls how much is revealed, i.e. which parts of the long-term state should be read and output as h_t at this time step.
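The gate logic above can be sketched in NumPy (a simplified single-example LSTM step; the parameter names and sizes are illustrative, not from the figure):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the parameters of the four internal
    layers, keyed 'f' (forget), 'i' (input), 'o' (output), 'g' (candidate)."""
    f = sigmoid(x_t @ W['f'] + h_prev @ U['f'] + b['f'])  # forget gate
    i = sigmoid(x_t @ W['i'] + h_prev @ U['i'] + b['i'])  # input gate
    o = sigmoid(x_t @ W['o'] + h_prev @ U['o'] + b['o'])  # output gate
    g = np.tanh(x_t @ W['g'] + h_prev @ U['g'] + b['g'])  # candidate values
    c_t = f * c_prev + i * g   # erase parts of the long-term state, then write
    h_t = o * np.tanh(c_t)     # reveal part of it as the short-term state
    return h_t, c_t

# Illustrative sizes: input size 3, hidden size 2
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(3, 2)) for k in 'fiog'}
U = {k: rng.normal(size=(2, 2)) for k in 'fiog'}
b = {k: np.zeros(2) for k in 'fiog'}

h, c = np.zeros(2), np.zeros(2)
for x_t in rng.normal(size=(4, 3)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (2,) (2,)
```

Note how each gate is an element-wise multiplication (the ⓧ in the figure): a gate value near 0 blocks the corresponding component, a value near 1 lets it through.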

The three gates themselves use a sigmoid activation, since their outputs must lie between 0 and 1 to act as soft on/off switches. tanh is used for the candidate values and for transforming the cell state before output. tanh is chosen over sigmoid for these transformations in order to mitigate the vanishing gradient problem: it is zero-centered and its derivative is larger, so the gradient can sustain over a longer range before going to zero.
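A quick numerical check of that claim (a sketch, not from the article): the derivative of tanh, 1 − tanh²(z), peaks at 1, while the sigmoid derivative σ(z)(1 − σ(z)) peaks at 0.25, so repeated sigmoid transformations shrink gradients much faster:

```python
import numpy as np

z = np.linspace(-4, 4, 1001)           # includes z = 0 at the midpoint
sig = 1.0 / (1.0 + np.exp(-z))

d_sigmoid = sig * (1.0 - sig)          # sigmoid derivative, max 0.25 at z = 0
d_tanh = 1.0 - np.tanh(z) ** 2         # tanh derivative, max 1.0 at z = 0

print(d_sigmoid.max())  # 0.25
print(d_tanh.max())     # 1.0
```

Chained through many time steps, a per-step factor of at most 0.25 vanishes far sooner than one of at most 1.0.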

 

More info will come soon ...