d388: Recurrent neural network using attention

Paper: “Machine Learning on Sequential Data Using a Recurrent Weighted Average” https://arxiv.org/abs/1703.01253 (PDF)

Source: https://github.com/jostmey/rwa

Discussion: https://news.ycombinator.com/item?id=13817017

Abstract:

Recurrent Neural Networks (RNNs) are a type of statistical model designed to handle sequential data. The model reads a sequence one symbol at a time. Each symbol is processed based on information collected from the previous symbols. With existing RNN architectures, each symbol is processed using only information from the previous processing step. To overcome this limitation, we propose a new kind of RNN model that computes a recurrent weighted average (RWA) over every past processing step. Because the RWA can be computed as a running average, the computational overhead scales like that of any other RNN. The approach essentially reformulates the attention mechanism into a stand-alone model. When assessed, an RWA model is found to train faster and generalize better than a standard LSTM model on the variable copy problem, the adding problem, classification of artificial grammar, classification of sequences by length, and classification of MNIST handwritten digits (where the pixels are read sequentially one at a time).
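
The key efficiency claim is that a softmax-weighted average over all past steps can be maintained with O(1) work per step, by carrying a running numerator and denominator instead of re-summing the history. Below is a minimal NumPy sketch of that recurrence, following the equations in the paper (features u_t, gate g_t, attention score a_t, state h_t as the normalized running average). The function name, the `params` dict layout, and the max-shift stabilization are illustrative choices of this sketch, not the authors' TensorFlow API:

```python
import numpy as np

def rwa_forward(x_seq, params):
    """Run a recurrent weighted average (RWA) cell over a sequence.

    Sketch of the formulation in arXiv:1703.01253: the hidden state is a
    softmax-weighted average over *all* past steps, kept as running
    numerator/denominator sums so per-step cost stays constant, like any RNN.
    """
    W_u, b_u, W_g, b_g, W_a, s0 = (params[k] for k in
                                   ("W_u", "b_u", "W_g", "b_g", "W_a", "s0"))
    n_cells = W_u.shape[1]
    h = np.tanh(s0)                     # initial hidden state
    num = np.zeros(n_cells)             # running numerator
    den = np.zeros(n_cells)             # running denominator
    a_max = np.full(n_cells, -np.inf)   # running max score, for stability
    for x in x_seq:
        xh = np.concatenate([x, h])
        u = x @ W_u + b_u               # unbounded features of the input
        g = xh @ W_g + b_g              # gate conditioned on input and state
        a = xh @ W_a                    # attention score (no bias, per paper)
        z = u * np.tanh(g)
        # Shift by the running max so exp() never overflows (assumed trick,
        # standard for running softmax sums).
        a_new = np.maximum(a_max, a)
        scale = np.exp(a_max - a_new)
        num = num * scale + z * np.exp(a - a_new)
        den = den * scale + np.exp(a - a_new)
        a_max = a_new
        h = np.tanh(num / den)          # weighted average of all past z's
    return h

# Illustrative usage with random parameters (4 inputs, 8 cells):
rng = np.random.default_rng(0)
d_in, n_cells = 4, 8
params = {
    "W_u": rng.normal(size=(d_in, n_cells)), "b_u": np.zeros(n_cells),
    "W_g": rng.normal(size=(d_in + n_cells, n_cells)), "b_g": np.zeros(n_cells),
    "W_a": rng.normal(size=(d_in + n_cells, n_cells)),
    "s0": np.zeros(n_cells),
}
h_final = rwa_forward(rng.normal(size=(20, d_in)), params)  # 20-step sequence
```

Note the contrast with standard attention: because the softmax weights exp(a_t) are accumulated incrementally rather than recomputed over the whole history, the memory of the model is unbounded in length while the update stays a fixed-size recurrence.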