Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush (Harvard NLP)
In progress | arXiv | code, PyTorch
Current SOTA on IWSLT'14 De-En
The standard approach of learning a soft alignment (attention) does not marginalize over latent alignments in a probabilistic sense. This makes soft attention difficult to compare to other alignment models, unlike hard attention, yet it is much easier to train. As an alternative to both soft and hard attention, this paper introduces a 'variational attention' model based on Amortized Variational Inference (AVI).
The distribution \(\mathcal D\) over alignments can be a Categorical over \(\{1, \dots, T\}\) (hard alignment) or a Dirichlet (relaxed alignment).
The prediction function \(f\) is a softmax conditioned on \(X, z\): \[ f(x, z) = \operatorname{softmax}(W X z) \]
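A minimal PyTorch sketch of these two pieces, with hypothetical shapes and names (not the paper's actual code): draw \(z\) from either choice of \(\mathcal D\), then predict with \(f(x, z) = \operatorname{softmax}(W X z)\).

```python
import torch
from torch.distributions import Categorical, Dirichlet

# Hypothetical sizes: T source positions, hidden size d, vocab size V.
T, d, V = 6, 4, 10
X = torch.randn(T, d)        # encoder states (one row per source position)
W = torch.randn(V, d)        # output projection
scores = torch.randn(T)      # alignment scores for the current target step

# Hard alignment: z is a one-hot vector over {1, ..., T}.
z_hard = torch.nn.functional.one_hot(
    Categorical(logits=scores).sample(), T).float()

# Relaxed alignment: z is a point on the simplex, drawn from a Dirichlet.
# (Adding a small constant is just one illustrative way to get a positive
# concentration; the paper's parameterization may differ.)
z_soft = Dirichlet(scores.softmax(-1) + 1e-3).sample()

def f(X, z, W):
    # f(x, z) = softmax(W X z): X^T z is the selected (hard) or mixed
    # (relaxed) context vector.
    return torch.softmax(W @ (X.t() @ z), dim=-1)

p_hard, p_soft = f(X, z_hard, W), f(X, z_soft, W)
```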
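Training maximizes an evidence lower bound (ELBO), as in standard AVI. Below is a sketch for the categorical case: since \(z\) ranges over only \(T\) positions, the expected log-likelihood and the KL term can be enumerated exactly here (the function and argument names are my own; the paper's gradient estimators for larger problems differ).

```python
import torch
import torch.nn.functional as F

def elbo(q_logits, p_logits, X, W, y):
    """ELBO = E_q[log p(y | x, z)] - KL(q(z) || p(z | x)).

    q_logits: (T,) scores of the variational posterior q(z | x, y)
    p_logits: (T,) scores of the prior p(z | x), e.g. an attention scorer
    y:        target token index; all names here are illustrative.
    """
    q = F.softmax(q_logits, -1)
    # log p(y | x, z=t) for every source position t: with a one-hot z,
    # f(x, z) = softmax(W X^T e_t), i.e. row t of softmax(X W^T).
    log_lik = F.log_softmax(X @ W.t(), -1)[:, y]          # (T,)
    expected_ll = (q * log_lik).sum()                     # exact: T is small
    kl = (q * (F.log_softmax(q_logits, -1)
               - F.log_softmax(p_logits, -1))).sum()
    return expected_ll - kl
```

In AVI, `q_logits` would come from an amortized inference network that can condition on the target \(y\) as well as the source, while `p_logits` come from the generative model's own attention scorer.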
NMT: IWSLT'14 De-En (Edunov et al.'s setup; 14k BPE types)