Long Short-Term Memory (LSTM)
Before we learn about LSTM, let's first understand why we use it instead of a
feed-forward network or a plain RNN.
Problem with feed-forward networks:
A feed-forward network does not consider previous outputs when computing a new output. But in time-series data, such as Bitcoin price prediction, one needs the previous prices to estimate the future price. So feed-forward networks do not work
well on time series and other sequential data, whose values depend on the
past.
Recurrent neural network:
An RNN has a chain of states, each acting as a temporary memory that stores its own output. This output, together with the new input at time t, is
used to compute the output of the next state. The states occur at sequential time stamps, so previous information is retained, but only for a short period of time.
The RNN works with the recursive formulas:
`h_t = tanh(w_s \cdot h_{t-1} + w_x \cdot x_t)`
`o_t = w_y \cdot h_t`
where, `h_t` is the new state.
`h_{t-1}` is the previous state.
`x_t` is the input at time t.
tanh is the activation function.
`w_s`, `w_x` and `w_y` are corresponding weights.
`o_t` is the output.
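The recursive formulas above can be sketched in a few lines of numpy. This is an illustrative single-step cell, not a trained model; the dimensions and random weights are our own assumptions.

```python
import numpy as np

# Minimal RNN cell: h_t = tanh(w_s . h_{t-1} + w_x . x_t),  o_t = w_y . h_t
rng = np.random.default_rng(0)
hidden, inp = 4, 3
w_s = rng.normal(size=(hidden, hidden))  # state-to-state weights
w_x = rng.normal(size=(hidden, inp))     # input-to-state weights
w_y = rng.normal(size=(1, hidden))       # state-to-output weights

def rnn_step(h_prev, x_t):
    h_t = np.tanh(w_s @ h_prev + w_x @ x_t)  # new state
    o_t = w_y @ h_t                          # output at time t
    return h_t, o_t

# Run over a short sequence, carrying the state forward each step.
h = np.zeros(hidden)
for x_t in rng.normal(size=(5, inp)):
    h, o = rnn_step(h, x_t)
```

Note how the same weights `w_s` and `w_x` are reused at every time step; only the state `h` carries information forward.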
A recurrent neural network learns through backpropagation through time,
i.e. it applies backpropagation at every time stamp: we calculate the loss, go
back through each state, and update the weights using the gradient, i.e. we
find the change in each weight and add it to the old weight. If the gradient is
too small, the weight update is negligible, so the RNN will not learn
at all. This problem is called the vanishing gradient, and to solve it we use the
Long Short-Term Memory (LSTM) model.
Long Short-Term Memory:
An LSTM has three gates, namely the forget gate, the input gate and the output gate,
plus an intermediate (candidate) cell state. When a new input arrives, the forget gate
identifies which information from the previous output is no longer required and excludes it: the previous output and the new input are each
multiplied by their corresponding weights, summed, and passed through a sigmoid activation
function. The input gate decides which of the new information to store. The
input gate and output gate are calculated in the same way as the forget gate,
but each gate has its own weights. The candidate cell state is computed the
same way as the gates, except that instead of the sigmoid activation
function we use the tanh activation function. The cell state is then updated by adding
the previous cell state multiplied by the forget gate to the candidate cell
state multiplied by the input gate. By doing so, the required information can be stored for a long period of time.
`f_t = \sigma(w_{h1} \cdot h_{t-1} + w_{x1} \cdot x_t)`
`i_t = \sigma(w_{h2} \cdot h_{t-1} + w_{x2} \cdot x_t)`
`\tilde{c}_t = \tanh(w_{h3} \cdot h_{t-1} + w_{x3} \cdot x_t)`
`c_t = f_t * c_{t-1} + i_t * \tilde{c}_t`
`o_t = \sigma(w_{h4} \cdot h_{t-1} + w_{x4} \cdot x_t)`
`h_t = o_t * \tanh(c_t)`
where `f_t` is the forget gate.
`i_t` is the input gate.
`\tilde{c}_t` is the intermediate (candidate) cell state.
`c_t` is the cell state.
`o_t` is the output gate.
`h_t` is the output.
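The gate equations can be sketched as a single-step LSTM cell in numpy. This is an illustrative sketch with our own weight names and random initial values; biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inp = 4, 3

def weight_pair():  # one (state, input) weight pair per gate
    return rng.normal(size=(hidden, hidden)), rng.normal(size=(hidden, inp))

(w_h1, w_x1), (w_h2, w_x2), (w_h3, w_x3), (w_h4, w_x4) = (
    weight_pair(), weight_pair(), weight_pair(), weight_pair()
)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t):
    f_t = sigmoid(w_h1 @ h_prev + w_x1 @ x_t)    # forget gate
    i_t = sigmoid(w_h2 @ h_prev + w_x2 @ x_t)    # input gate
    c_hat = np.tanh(w_h3 @ h_prev + w_x3 @ x_t)  # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat             # updated cell state
    o_t = sigmoid(w_h4 @ h_prev + w_x4 @ x_t)    # output gate
    h_t = o_t * np.tanh(c_t)                     # new output
    return h_t, c_t

# Run over a short sequence, carrying both state and cell forward.
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inp)):
    h, c = lstm_step(h, c, x_t)
```

Unlike the plain RNN, the cell state `c` is updated additively (`f_t * c_prev + i_t * c_hat`), which is what lets gradients flow across many time steps without vanishing.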
The LSTM can be used in NLP tasks such as predicting missing words in a sentence, and on other sequential data.