MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Deyao Zhu, Mohamed Elhoseiny, Jun Chen et al. | arXiv (Cornell University) | 2023 | Cited by 477
MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning | Jun Chen, Mohamed Elhoseiny, Vikas Chandra et al. | arXiv (Cornell University) | 2023 | Cited by 66
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions | Deyao Zhu, Mohamed Elhoseiny, Xiaoqian Shen et al. | arXiv (Cornell University) | 2023 | Cited by 40