Towards Generalist Biomedical AI

Tao Tu(Google (United States)), Shekoofeh Azizi(Google (United States)), Danny Driess(Google (United States)), Mike Schaekermann(Google (United States)), Mohamed Amin(Google (United States)), Pi-Chuan Chang(Google (United States)), Andrew Carroll(Google (United States)), Charles T. Lau(Google (United States)), Ryutaro Tanno(Google (United States)), Sofia Ira Ktena(Google (United States)), Anil Palepu(Google (United States)), Basil Mustafa(Google (United States)), Aakanksha Chowdhery(Google (United States)), Yun Liu(Google (United States)), Simon Kornblith(Google (United States)), David J. Fleet(Google (United States)), P. Mansfield(Google (United States)), Sushant Prakash(Google (United States)), Renee Wong(Google (United States)), Sunny Virmani(Google (United States)), Christopher Semturs(Google (United States)), S. Sara Mahdavi(Google (United States)), Bradley Green(Google (United States)), Ewa Dominowska(Google (United States)), Blaise Agüera y Arcas(Google (United States)), Joëlle Barral(Google (United States)), Dale R. Webster(Google (United States)), Greg S. Corrado(Google (United States)), Yossi Matias(Google (United States)), K. K. Singhal(Google (United States)), Pete Florence(Google (United States)), Alan Karthikesalingam(Google (United States)), Vivek Natarajan(Google (United States))
NEJM AI
February 22, 2024
Cited by 344

Abstract

Background: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery.

Methods: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest x-ray reports.

Results: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility.

Conclusions: Although considerable work is needed to validate these models in real-world cases and understand if cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical artificial intelligence systems. (Funded by Alphabet Inc. and/or a subsidiary thereof.)

