BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Junnan Li, Steven C. H. Hoi | arXiv (Cornell University) | 2023 | Cited by 913