MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Deyao Zhu, Mohamed Elhoseiny, Xiaoqian Shen et al. | arXiv (Cornell University) | 2023 | Cited by 477
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning | Jun Chen, Mohamed Elhoseiny, Han Guo et al. | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2022 | Cited by 172
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Jun Chen, Mohamed Elhoseiny, Vikas Chandra et al. | arXiv (Cornell University) | 2023 | Cited by 66
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions | Deyao Zhu, Mohamed Elhoseiny, Xiaoqian Shen et al. | arXiv (Cornell University) | 2023 | Cited by 40
MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding | Jun Chen, Mohamed Elhoseiny, Michael L. Berumen et al. | Unknown | 2023 | Cited by 36