Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush (Harvard NLP)
In progress | arXiv | code, PyTorch
Current SOTA on IWSLT'14 De-En
The standard approach of learning a soft alignment (attention) does not marginalize over latent alignments in a probabilistic sense. This makes soft attention difficult to compare to other alignment models, unlike hard attention, yet it is much easier to train. As an alternative to both soft and hard attention, this paper introduces a 'variational attention' model based on Amortized Variational Inference (AVI).
The distribution \(\mathcal D\) over alignments can be a Categorical over \(\{1, \dots, T\}\) (hard alignment) or a Dirichlet (relaxed alignment).
The prediction function \(f\) is a softmax conditioned on \(X, z\): \[ f(x, z) = \operatorname{softmax}(W X z) \]
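A minimal PyTorch sketch of these two pieces, with hypothetical shapes and names (not the paper's actual code): draw \(z\) from either choice of \(\mathcal D\), then predict with \(f(x, z) = \operatorname{softmax}(W X z)\).

```python
import torch
from torch.distributions import Categorical, Dirichlet

# Hypothetical sizes: T source positions, hidden size d, vocab size V.
T, d, V = 6, 4, 10
X = torch.randn(T, d)        # encoder states (one row per source position)
W = torch.randn(V, d)        # output projection
scores = torch.randn(T)      # alignment scores for the current target step

# Hard alignment: z is a one-hot vector over {1, ..., T}.
z_hard = torch.nn.functional.one_hot(
    Categorical(logits=scores).sample(), T).float()

# Relaxed alignment: z is a point on the simplex, drawn from a Dirichlet.
# (Adding a small constant is just one illustrative way to get a positive
# concentration; the paper's parameterization may differ.)
z_soft = Dirichlet(scores.softmax(-1) + 1e-3).sample()

def f(X, z, W):
    # f(x, z) = softmax(W X z): X^T z is the selected (hard) or mixed
    # (relaxed) context vector.
    return torch.softmax(W @ (X.t() @ z), dim=-1)

p_hard, p_soft = f(X, z_hard, W), f(X, z_soft, W)
```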
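Training maximizes an evidence lower bound (ELBO), as in standard AVI. Below is a sketch for the categorical case: since \(z\) ranges over only \(T\) positions, the expected log-likelihood and the KL term can be enumerated exactly here (the function and argument names are my own; the paper's gradient estimators for larger problems differ).

```python
import torch
import torch.nn.functional as F

def elbo(q_logits, p_logits, X, W, y):
    """ELBO = E_q[log p(y | x, z)] - KL(q(z) || p(z | x)).

    q_logits: (T,) scores of the variational posterior q(z | x, y)
    p_logits: (T,) scores of the prior p(z | x), e.g. an attention scorer
    y:        target token index; all names here are illustrative.
    """
    q = F.softmax(q_logits, -1)
    # log p(y | x, z=t) for every source position t: with a one-hot z,
    # f(x, z) = softmax(W X^T e_t), i.e. row t of softmax(X W^T).
    log_lik = F.log_softmax(X @ W.t(), -1)[:, y]          # (T,)
    expected_ll = (q * log_lik).sum()                     # exact: T is small
    kl = (q * (F.log_softmax(q_logits, -1)
               - F.log_softmax(p_logits, -1))).sum()
    return expected_ll - kl
```

In AVI, `q_logits` would come from an amortized inference network that can condition on the target \(y\) as well as the source, while `p_logits` come from the generative model's own attention scorer.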
NMT: IWSLT'14 De-En (Edunov et al.'s setup; 14k BPE types)