Gated Recurrent Units (GRUs) for Natural Language Processing

In the previous articles on Recurrent Neural Networks and Long Short-Term Memory networks, we saw how these networks can be used to solve NLP problems. In this article, we focus on yet another variation of RNNs: Gated Recurrent Units.

Introduction to GRUs

Gated Recurrent Units (GRUs) are another popular variant of Recurrent Neural Networks. Like LSTMs, GRUs have gating units (gates) that help the network store information from previous states and use it to make accurate predictions.

While discussing LSTMs in a previous post, when we printed out the summary of model parameters, we saw that there were hundreds of thousands of them. With so many parameters, the computational cost of running the model is considerable.

GRUs address this problem. How? Recall that an LSTM cell has three gates. A GRU cell modifies this architecture a bit to use only two gates instead of three.
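To see roughly how much this saves, here is a small sketch that counts the trainable parameters of a single recurrent cell, assuming the usual layout of one input kernel, one recurrent kernel, and one bias vector per gate group (bias conventions vary slightly between libraries, so treat the exact totals as illustrative):

```python
def rnn_cell_params(input_dim, units, n_gate_groups):
    # Each gate group has an input kernel (input_dim x units), a recurrent
    # kernel (units x units), and a bias vector (units).
    return n_gate_groups * (input_dim * units + units * units + units)

# An LSTM cell has 4 weight groups (input, forget, and output gates plus
# the candidate); a GRU cell has only 3 (update gate, reset gate, candidate).
lstm = rnn_cell_params(input_dim=100, units=128, n_gate_groups=4)
gru = rnn_cell_params(input_dim=100, units=128, n_gate_groups=3)
print(lstm, gru)  # the GRU cell needs 25% fewer parameters
```

Whatever the input and hidden sizes, the GRU cell ends up with three quarters of the LSTM cell's parameters under this counting scheme.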

A GRU cell

To understand how GRUs work, we have to focus on the architecture of a single GRU cell, shown in the image below.

[Figure: A GRU cell]

GRUs make use of only two gates instead of three as we had seen in the LSTM. The two gates used in GRUs are the update gate and the reset gate.

If we compare this to the LSTM cell architecture, GRUs combine the forget and input gates into a single gate called the update gate. The other gate is the reset gate. Thus, by using only two gates, GRUs substantially reduce the number of parameters and, in turn, the computation required.

Based on the outputs of these two gates, the cell decides what to send across as output and how to update the hidden state. A candidate state (the content state) holds the proposed new information.

The update gate

The update gate is shown in the GRU cell image above. Just like an LSTM cell, a GRU cell takes two inputs: the input at the current time step, xt, and the hidden state from the previous time step, ht-1.

These inputs are first multiplied by their respective weights, then summed (shown by the plus sign in the image), and the result is passed through the sigmoid function (shown by sigma).

The main job of the update gate is to decide how much to update the memory: an output close to 1 means replace the old state with new information, while an output close to 0 means keep the previous state unchanged. The sigmoid function outputs values between 0 and 1. Essentially, the gate helps the model decide how much information should be passed on to the next step.
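The computation above can be sketched in a few lines of Python. The scalar weights here are hypothetical (real cells use weight matrices over vectors), but the shape of the formula zt = sigmoid(Wz·xt + Uz·ht-1 + bz) is the standard one:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def update_gate(x_t, h_prev, w_z, u_z, b_z):
    # z_t = sigmoid(W_z * x_t + U_z * h_{t-1} + b_z)
    return sigmoid(w_z * x_t + u_z * h_prev + b_z)

# A large positive activation drives z_t toward 1 (let new information in);
# a large negative activation drives it toward 0 (keep the old state).
z_open = update_gate(x_t=1.0, h_prev=0.5, w_z=5.0, u_z=5.0, b_z=0.0)
z_closed = update_gate(x_t=1.0, h_prev=0.5, w_z=-5.0, u_z=-5.0, b_z=0.0)
print(round(z_open, 3), round(z_closed, 3))
```

Note that the gate's output is continuous, so in practice it scales information rather than switching it fully on or off.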

The reset gate

The reset gate in a GRU helps the model decide how much of the past information to forget.

The reset gate is also shown in the diagram. To compute its output rt, the two inputs are again weighted, summed, and passed through the sigmoid function.
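Putting the two gates together, a single GRU time step can be sketched in plain Python. This is a toy cell with scalar states and hypothetical weights (real implementations use vectors and learned weight matrices), but it follows the standard GRU recurrence: the reset gate scales the history used to build the candidate state, and the update gate interpolates between the old state and that candidate:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x_t, h_prev, w):
    # Update gate: how much of the new candidate to let in.
    z = sigmoid(w["wz"] * x_t + w["uz"] * h_prev)
    # Reset gate: how much of the past to keep when forming the candidate.
    r = sigmoid(w["wr"] * x_t + w["ur"] * h_prev)
    # Candidate (content) state, built from the input and reset-scaled history.
    h_tilde = math.tanh(w["wh"] * x_t + w["uh"] * (r * h_prev))
    # New hidden state: interpolation between old state and candidate.
    return (1 - z) * h_prev + z * h_tilde

weights = {"wz": 0.5, "uz": 0.5, "wr": 0.5, "ur": 0.5, "wh": 1.0, "uh": 1.0}
h = 0.0
for x in [1.0, -1.0, 0.5]:  # a toy input sequence
    h = gru_step(x, h, weights)
print(h)
```

Because tanh keeps the candidate in (-1, 1) and the update gate only interpolates, the hidden state stays bounded no matter how long the sequence is, which is part of why GRUs train more stably than plain RNNs.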

Applications of Gated Recurrent Units

Gated Recurrent Units mitigate the vanishing gradient problem faced by traditional RNNs. They can also outperform LSTMs on smaller datasets. GRUs have many applications, some of which are:

  • Polyphonic music modelling
  • Speech signal modelling
  • Handwriting recognition

Final Thoughts

In this article, we discussed how Gated Recurrent Units (GRUs), a variation of RNNs, help us overcome the vanishing gradient problem. We then looked at the architecture of the GRU cell, discussed its gates and how they work, and finally saw some applications of GRUs.