# Recurrent Neural Network¶

Note

This part of the documentation is largely referring to Aalto CS-E4890.

An RNN is a neural network specialized for processing sequential data \(x^{(1)}, \cdots, x^{(\mathcal{T})}\). Like a CNN, an RNN employs parameter sharing: a CNN applies the same kernel across the spatial grid of an image, while an RNN applies the same weights at every time step of a sequence. This sharing allows RNNs to preserve information about the sequence order.

## Vanishing & exploding gradient problem¶

If a computational graph is deep and shared parameters are repeatedly multiplied, as in an RNN, the gradients may either vanish or explode. Consider a simple recurrence without inputs or an activation function:

\[h^{(t)} = W^\top h^{(t-1)}\]

This can also be written as repeated multiplication by the same weight matrix:

\[h^{(t)} = \left(W^t\right)^\top h^{(0)}\]

If **W** admits an eigendecomposition with an orthogonal matrix \(Q\), it can be factorized as

\[W = Q \Lambda Q^\top\]

We can thus conclude that:

\[h^{(t)} = Q^\top \Lambda^t Q \, h^{(0)}\]

Since the eigenvalues in \(\Lambda\) are raised to the power of \(t\), the gradient will explode if the largest eigenvalue is \(>1\), and vanish if the largest eigenvalue is \(<1\).
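This behaviour can be verified numerically. The sketch below (an illustration, not part of the course material) uses a diagonal \(W\), so the eigenvalues appear directly on the diagonal and \(h^{(t)} = \Lambda^t h^{(0)}\):

```python
import numpy as np

def repeated_multiply(largest_eigval, t=50):
    """Apply h <- W^T h for t steps and return the norm of the result.

    W is diagonal, so its eigenvalues are exactly the diagonal entries
    and h^(t) = Lambda^t h^(0).
    """
    lam = np.array([largest_eigval, 0.5, 0.1])
    W = np.diag(lam)
    h = np.ones(3)
    for _ in range(t):
        h = W.T @ h
    return np.linalg.norm(h)

print(repeated_multiply(1.2))  # ~9100: explodes when the largest eigenvalue is > 1
print(repeated_multiply(0.9))  # ~0.005: vanishes when the largest eigenvalue is < 1
```

After only 50 steps the norm differs by six orders of magnitude, which is why deep recurrences are so sensitive to the spectrum of the shared weight matrix.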

One remedy for exploding gradients is gradient clipping: clip the gradient whenever it exceeds a threshold. Clipping can be done element-wise or by rescaling the whole gradient vector to bound its norm.
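A minimal sketch of both variants (the function names are illustrative, not from the course):

```python
import numpy as np

def clip_by_value(grad, threshold):
    """Element-wise clipping: limit each component to [-threshold, threshold]."""
    return np.clip(grad, -threshold, threshold)

def clip_by_norm(grad, threshold):
    """Norm clipping: rescale the whole gradient so its L2 norm is at most
    threshold. Unlike element-wise clipping, this preserves the gradient's
    direction."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        return grad * (threshold / norm)
    return grad

g = np.array([3.0, -4.0])     # L2 norm is 5
print(clip_by_value(g, 1.0))  # [ 1. -1.]
print(clip_by_norm(g, 1.0))   # [ 0.6 -0.8] -- same direction, norm 1
```

Norm clipping is usually preferred in practice because it keeps the update direction intact; element-wise clipping can rotate the gradient.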