A multimodal generative AI copilot for human pathology

Ming Y. Lu; Bowen Chen; Drew F. K. Williamson; Richard J. Chen; Melissa Zhao; Aaron K. Chow; Kenji Ikemura; Ahrong Kim; Dimitra Pouli; Ankush Patel; Amr Soliman; Chengkuan Chen; Tong Ding; Judy J. Wang; Georg K. Gerber; Ivy Liang; Long P. Le; Anil V. Parwani; Luca L. Weishaupt; Faisal Mahmood

doi:10.1038/s41586-024-07618-3

Ming Y. Lu (Broad Institute), Bowen Chen (Brigham and Women's Hospital), Drew F. K. Williamson (Broad Institute), Richard J. Chen (Broad Institute), Melissa Zhao (Brigham and Women's Hospital), Aaron K. Chow (The Ohio State University Wexner Medical Center), Kenji Ikemura (Brigham and Women's Hospital), Ahrong Kim (Brigham and Women's Hospital), Dimitra Pouli (Brigham and Women's Hospital), Ankush Patel (Mayo Clinic in Arizona), Amr Soliman (The Ohio State University Wexner Medical Center), Chengkuan Chen (Brigham and Women's Hospital), Tong Ding (Brigham and Women's Hospital), Judy J. Wang (Brigham and Women's Hospital), Georg K. Gerber (Brigham and Women's Hospital), Ivy Liang (Brigham and Women's Hospital), Long P. Le (Harvard University), Anil V. Parwani (The Ohio State University Wexner Medical Center), Luca L. Weishaupt (Brigham and Women's Hospital), Faisal Mahmood (Broad Institute)

Nature

June 12, 2024

Cited by 345

Open Access

Abstract

Computational pathology (refs. 1,2) has witnessed considerable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders (refs. 3,4). However, despite the explosive growth of generative artificial intelligence (AI), there have been few studies on building general-purpose multimodal AI assistants and copilots (ref. 5) tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology. We built PathChat by adapting a foundational vision encoder for pathology, combining it with a pretrained large language model and fine-tuning the whole system on over 456,000 diverse visual-language instructions consisting of 999,202 question and answer turns. We compare PathChat with several multimodal vision-language AI assistants and GPT-4V, which powers the commercially available multimodal general-purpose AI assistant ChatGPT-4 (ref. 6). PathChat achieved state-of-the-art performance on multiple-choice diagnostic questions from cases with diverse tissue origins and disease models. Furthermore, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive vision-language AI copilot that can flexibly handle both visual and natural language inputs, PathChat may find impactful applications in pathology education, research and human-in-the-loop clinical decision-making.
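
To make the dataset figures above concrete, the following is a minimal sketch of how a multi-turn visual-language instruction could be represented and how turns could be counted. The record layout, the field names (`image_path`, `role`, `text`), the file name and the placeholder texts are all hypothetical, since the abstract does not describe a schema. The sketch also assumes that each question and each answer counts as one turn, which the abstract does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    # Hypothetical role labels: "user" for a question, "assistant" for an answer.
    role: str
    text: str


@dataclass
class Instruction:
    # Hypothetical record layout; the abstract does not specify a schema.
    image_path: str
    turns: List[Turn] = field(default_factory=list)


def total_turns(instructions: List[Instruction]) -> int:
    """Count question and answer turns across all instructions."""
    return sum(len(inst.turns) for inst in instructions)


# Two illustrative instructions with placeholder content (not real data).
example = [
    Instruction("region_a.png", [
        Turn("user", "Placeholder question."),
        Turn("assistant", "Placeholder answer."),
    ]),
    Instruction("region_b.png", [
        Turn("user", "Placeholder question."),
        Turn("assistant", "Placeholder answer."),
        Turn("user", "Placeholder follow-up question."),
        Turn("assistant", "Placeholder answer."),
    ]),
]

# Figures reported in the abstract. "Over 456,000" is a lower bound on the
# instruction count, so this ratio is an approximate upper bound on the mean.
REPORTED_INSTRUCTIONS = 456_000
REPORTED_TURNS = 999_202
mean_turns = REPORTED_TURNS / REPORTED_INSTRUCTIONS

print(total_turns(example))
print(round(mean_turns, 2))
```

Under these assumptions, the reported totals work out to roughly two turns per instruction on average, which would be consistent with many instructions being a single question and answer exchange.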
