UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Huaishao Luo (Southwest Jiaotong University), Ming Zhou, Botian Shi, Taroon Bharti, Tianrui Li (Southwest Jiaotong University), Lei Ji, Haoyang Huang, Jason Li (Queen's University), Nan Duan (Microsoft Research Asia, China)

arXiv (Cornell University)

February 15, 2020

Cited by 169
