A Recurrent Neural Network (RNN) is a type of artificial neural network designed to recognize patterns in sequences of data, such as text, genomes, handwriting, or spoken words. A distinguishing feature of RNNs is their ability to use an internal memory to process input sequences of arbitrary length, which makes them applicable to tasks such as unsegmented, connected handwriting recognition and speech recognition.
Here are some keywords and definitions associated with RNNs:
- Sequences: RNNs are specifically designed to work with sequence data. In this context, a sequence is a list of numbers or items in a particular order. For instance, a sentence can be considered a sequence of words.
- Hidden State: This is the “memory” of the RNN. It holds information about previous inputs. The hidden state can influence the network’s output and the next hidden state. It is called “hidden” because it’s not visible in the network’s inputs or outputs.
- Backpropagation Through Time (BPTT): This is the standard training algorithm for RNNs. It unrolls the network across the time steps of the sequence and then applies regular backpropagation to the unrolled computation graph (see the sketch after this list).
- Vanishing/Exploding Gradient Problem: This refers to a central challenge in training RNNs: during BPTT, gradients are multiplied through the recurrent weights at every time step, so over long sequences they can become very small (vanish) or very large (explode).
- Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU): These are special types of RNNs that address the vanishing gradient problem. They have gates: parts of the network that decide how much of the new input and previous hidden state should be stored in the current hidden state.
- Unidirectional and Bidirectional RNNs: A unidirectional RNN processes the sequence in one direction, so each output can use only past information. A bidirectional RNN also runs a reverse pass over the sequence, so each output can use both past and future information.
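To make the hidden state and BPTT concrete, here is a minimal sketch (the sizes and the weight names W_xh, W_hh, and b_h are illustrative assumptions, not taken from any particular library): it unrolls a toy recurrence over a short sequence and then calls backward(), which propagates gradients through every unrolled step.

```python
import torch

# Illustrative sizes (assumptions for this sketch)
input_size, hidden_size, seq_len = 4, 8, 5

# Recurrent parameters for h_t = tanh(x_t @ W_xh + h_{t-1} @ W_hh + b_h)
W_xh = torch.randn(input_size, hidden_size, requires_grad=True)
W_hh = torch.randn(hidden_size, hidden_size, requires_grad=True)
b_h = torch.zeros(hidden_size, requires_grad=True)

x = torch.randn(seq_len, input_size)  # a toy input sequence
h = torch.zeros(hidden_size)          # initial hidden state (the "memory")

# Unroll the recurrence: each step reads one input and the previous
# hidden state, and produces the next hidden state.
for t in range(seq_len):
    h = torch.tanh(x[t] @ W_xh + h @ W_hh + b_h)

# A toy loss on the final hidden state; backward() flows gradients
# through all unrolled time steps -- BPTT in miniature.
loss = h.sum()
loss.backward()
print(W_hh.grad.shape)  # torch.Size([8, 8])
```

The repeated multiplication by W_hh in this backward pass is also why gradients can vanish or explode over long sequences.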
Here’s an example of how you might implement a simple RNN in Python using the PyTorch library:
```python
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        # Both layers read the concatenation of the current input and
        # the previous hidden state.
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)  # input-to-hidden
        self.i2o = nn.Linear(input_size + hidden_size, output_size)  # input-to-output
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)   # next hidden state
        output = self.i2o(combined)   # output for this time step
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

n_input = 50
n_hidden = 100
n_output = 10

rnn = SimpleRNN(n_input, n_hidden, n_output)

# create a random tensor for the input (batch size 1)
input = torch.randn(1, n_input)
# initialize the hidden state
hidden = rnn.initHidden()

output, next_hidden = rnn(input, hidden)
```
This is a very simple example. Real-world applications typically use LSTMs or GRUs, often stacked in several layers, and may also incorporate dropout layers or batch normalization.
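As a sketch of that more realistic setup (the layer sizes, dropout rate, and sequence length below are arbitrary assumptions), PyTorch's built-in nn.LSTM can stack several recurrent layers, apply dropout between them, and run bidirectionally:

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (assumptions for this sketch)
input_size, hidden_size, num_layers, num_classes = 50, 100, 2, 10

class LSTMClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Two stacked, bidirectional LSTM layers with dropout between layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, dropout=0.2, bidirectional=True)
        # Bidirectional outputs concatenate forward and backward states
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.lstm(x)        # out: (batch, seq_len, 2 * hidden_size)
        return self.fc(out[:, -1])   # classify from the last time step

model = LSTMClassifier()
batch = torch.randn(32, 20, input_size)  # (batch, seq_len, input_size)
logits = model(batch)                    # (32, num_classes)
```

Because the LSTM is bidirectional, each time step's output concatenates the forward and backward hidden states, which is why the final linear layer takes 2 * hidden_size input features.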
Reference: https://www.ibm.com/topics/recurrent-neural-networks