Xing Wang, Zhaopeng Tu, Deyi Xiong, Min Zhang
NMT mostly generates word-by-word (or character-by-character); this makes it hard to translate multi-word expressions/phrases/idioms, where the meaning of the phrase is more than the sum of its words' meanings.
The idea: integrate a phrase-based SMT model into the NMT model. The SMT model, guided by the NMT, proposes a set of relevant phrases; the NMT scores the proposed phrases and selects the most probable one.
A sequence \(y\) can be decomposed into words \((w_1, \dots, w_K)\) generated by the NMT and phrases \((p_1, \dots, p_L)\) generated by the SMT.
The probability of generating the sequence is defined as: \[ p(y) = \prod_{w} \left(1-\lambda_{t(w)}\right) P_{word}(w) \times \prod_{p} \lambda_{t(p)} P_{phrase}(p) \] where \(t(\cdot)\) is the decoding step corresponding to the word (resp. the phrase), and \(\lambda \in [0,1]\) is estimated by the *balancer*, an MLP taking as input the NMT's context vector, the previous decoding state, and the previously generated word. Intuitively, it is the importance weight of the phrase mode over the word mode.
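To make the word/phrase mixture concrete, here is a minimal PyTorch sketch (my own illustration, not the paper's code; the balancer architecture, the dimensions, and the per-step boolean mask are assumptions):

```python
import torch
import torch.nn as nn

class Balancer(nn.Module):
    """MLP estimating lambda_t in [0, 1], the weight of the phrase mode
    over the word mode at decoding step t. Inputs mirror the description
    above: NMT context vector, previous decoder state, and the embedding
    of the previously generated word. Sizes are illustrative."""
    def __init__(self, ctx_dim, state_dim, emb_dim, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ctx_dim + state_dim + emb_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # constrain lambda to [0, 1]
        )

    def forward(self, context, prev_state, prev_word_emb):
        return self.mlp(
            torch.cat([context, prev_state, prev_word_emb], dim=-1)
        ).squeeze(-1)

def sequence_log_prob(word_logp, phrase_logp, lam, is_phrase_step):
    """log p(y) = sum over word steps of log(1 - lambda_{t(w)}) + log P_word(w)
               + sum over phrase steps of log(lambda_{t(p)}) + log P_phrase(p),
    with one entry per decoding step and a boolean mask marking the steps
    where a phrase (rather than a word) was generated."""
    word_terms = torch.log1p(-lam) + word_logp
    phrase_terms = torch.log(lam) + phrase_logp
    return torch.where(is_phrase_step, phrase_terms, word_terms).sum()

# Toy example: 5 decoding steps, steps 2 and 3 covered by SMT phrases.
T, ctx_dim, state_dim, emb_dim = 5, 16, 32, 8
balancer = Balancer(ctx_dim, state_dim, emb_dim)
lam = balancer(torch.randn(T, ctx_dim), torch.randn(T, state_dim),
               torch.randn(T, emb_dim))
mask = torch.tensor([False, False, True, True, False])
logp = sequence_log_prob(-torch.rand(T), -torch.rand(T), lam, mask)
print(logp)  # scalar log-probability of the mixed word/phrase sequence
```

Working in log space avoids underflow when multiplying many per-step probabilities; the mask is just a compact way to route each step to the word or phrase term of the product.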
## To be continued...
## Check the following:
- Encoder-Decoder models with attached external structures (Gulcehre et al. 2016, Gu et al. 2016, Tang et al. 2016, and Wang et al. 2017)