CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval and captioning. Huaishao Luo, Tianrui Li, Wen Lei et al. | Neurocomputing | 2022 | Cited by 676
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation. Huaishao Luo, Ming Zhou, Haoyang Huang et al. | arXiv (Cornell University) | 2020 | Cited by 169