MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Deyao Zhu, Mohamed Elhoseiny, Xiaoqian Shen et al. | arXiv (Cornell University) | 2023 | Cited by 477
MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-Task Learning | Jun Chen, Mohamed Elhoseiny, Xiaoqian Shen et al. | arXiv (Cornell University) | 2023 | Cited by 66
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions | Deyao Zhu, Mohamed Elhoseiny, Xiaoqian Shen et al. | arXiv (Cornell University) | 2023 | Cited by 40
Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only | Jun Chen, Mohamed Elhoseiny, Zhicheng Yan et al. | Unknown | 2023 | Cited by 26