Multimodal Pivots for Image Caption Translation

Julian Hitschler, Shigehiko Schamoni, Stefan Riezler

ACL 2016 | arXiv

A series of models for translation that use an intermediate representation (pivot). Setup: from parallel data for the pairs (X, Z) and (Z, Y), jointly learn a model that translates from X to Y.

==So far, the joint model is outperformed by the use of two separate models (X to Z, then Z to Y) in the case of a single pivot.==
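To make the two-separate-models baseline concrete, here is a minimal sketch of a cascade through the pivot Z. It assumes nothing from the papers: `make_word_translator` and `cascade` are hypothetical names, and the toy word lexicons merely stand in for independently trained X-to-Z and Z-to-Y models.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def make_word_translator(lexicon: Dict[str, str]) -> Translator:
    """Toy word-by-word translator (hypothetical stand-in for a trained model)."""
    def translate(sentence: str) -> str:
        # Words missing from the lexicon are copied through unchanged.
        return " ".join(lexicon.get(word, word) for word in sentence.split())
    return translate


def cascade(x_to_z: Translator, z_to_y: Translator) -> Translator:
    """Chain two separately built models through the pivot Z."""
    def translate(sentence: str) -> str:
        pivot = x_to_z(sentence)   # X -> Z
        return z_to_y(pivot)       # Z -> Y
    return translate


# Toy setup (assumption): X = German, Z = English (pivot), Y = French.
de_en = make_word_translator({"hund": "dog", "schwarz": "black"})
en_fr = make_word_translator({"dog": "chien", "black": "noir"})
de_fr = cascade(de_en, en_fr)

print(de_fr("schwarz hund"))  # prints "noir chien"
```

The point of the sketch is only the composition: errors made in the X-to-Z step are passed on to the Z-to-Y step, which is the weakness a jointly learned model hopes to avoid.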

Reference: A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation, Saha et al. 2016.

Multimodal Pivots for Image Caption Translation, Hitschler et al. 2016

Check: reranking to leverage text data in the target language: see Improving Image Captioning by Concept-based Sentence Reranking
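As a reminder of what reranking with target-language text could look like, here is a generic sketch, not the concept-based method of the paper named above. It assumes an n-best list of (candidate, model score) pairs and interpolates each score with a length-normalised, add-one-smoothed unigram language model trained on target-language sentences; `train_unigram_lm`, `rerank` and the `weight` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def train_unigram_lm(corpus: List[str]) -> Callable[[str], float]:
    """Return a function giving the average per-word log-probability of a sentence."""
    counts = Counter(word for sentence in corpus for word in sentence.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    def avg_logprob(sentence: str) -> float:
        words = sentence.split()
        # Add-one smoothing, so unseen words get a small non-zero probability.
        logp = sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)
        return logp / max(len(words), 1)

    return avg_logprob


def rerank(
    candidates: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Sort candidates by an interpolation of model score and LM score, best first."""
    rescored = [
        (text, (1.0 - weight) * score + weight * lm_score(text))
        for text, score in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy target-language text and n-best list (both invented for illustration).
target_text = ["a dog runs on the grass", "a black dog plays", "the dog runs"]
n_best = [("a dog run grass on", -1.0), ("a dog runs on the grass", -1.1)]

lm = train_unigram_lm(target_text)
reranked = rerank(n_best, lm)
print(reranked[0][0])
```

In this toy case the second candidate starts with a slightly worse model score but is promoted because its words are more frequent in the target-language text.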