Multilingual Denoising Pre-training for Neural Machine Translation

Yinhan Liu; Jiatao Gu; Naman Goyal; Xian Li; Sergey Edunov; Marjan Ghazvininejad; Mike Lewis; Luke Zettlemoyer

doi:10.1162/tacl_a_00343

Yinhan Liu(Bircham International University), Jiatao Gu(Meta (Israel)), Naman Goyal(Meta (Israel)), Xian Li(Meta (Israel)), Sergey Edunov(Meta (Israel)), Marjan Ghazvininejad(Meta (Israel)), Mike Lewis(Meta (Israel)), Luke Zettlemoyer(Meta (Israel))

Transactions of the Association for Computational Linguistics

November 25, 2020

Cited by 1,015. Open Access.

Abstract

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective (Lewis et al., 2019). mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches have focused only on the encoder, the decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.
