UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Huaishao Luo (Southwest Jiaotong University), Ming Zhou, Botian Shi, Taroon Bharti, Tianrui Li (Shanghai Normal University), Lei Ji, Haoyang Huang, Jason Li, Nan Duan (Microsoft Research Asia, China)
Cited by 169