A multimodal whole-slide foundation model for pathology

Tong Ding (Broad Institute), Sophia J. Wagner (Harvard University), Andrew H. Song (Broad Institute), Richard J. Chen (Broad Institute), Ming Y. Lu (Broad Institute), Andrew Zhang (Broad Institute), Anurag Vaidya (Broad Institute), Guillaume Jaume (Broad Institute), Muhammad Shaban (Broad Institute), Ahrong Kim (Harvard University), Drew F. K. Williamson (Harvard University), Harry Robertson (Broad Institute), Bowen Chen (Broad Institute), Cristina Almagro-Pérez (Broad Institute), Paul Doucet (Broad Institute), Sharifa Sahai (Broad Institute), Chengkuan Chen (Broad Institute), Christina S. Chen (Broad Institute), Daisuke Komura (The University of Tokyo), Akihiro Kawabe (The University of Tokyo), Mieko Ochi (Kanagawa Prefectural Hospital Organization), Shinya Sato (Kanagawa Prefectural Hospital Organization), Tomoyuki Yokose (Kanagawa Prefectural Hospital Organization), Yohei Miyagi (Kanagawa Prefectural Hospital Organization), Shumpei Ishikawa (Harvard University), Georg K. Gerber (Harvard University), Tingying Peng (Center for Environmental Health), Long P. Le (Broad Institute), Faisal Mahmood (Broad Institute)
Nature Medicine
November 1, 2025

Abstract

The field of computational pathology has been transformed by recent advances in foundation models that encode histopathology regions of interest (ROIs) into versatile and transferable feature representations via self-supervised learning. However, translating these advances to complex clinical challenges at the patient and slide level remains constrained by limited clinical data in disease-specific cohorts, especially for rare clinical conditions. We propose Transformer-based pathology Image and Text Alignment Network (TITAN), a multimodal whole-slide foundation model pretrained on 335,645 whole-slide images via visual self-supervised learning and vision-language alignment with the corresponding pathology reports and 423,122 synthetic captions generated by a multimodal generative AI copilot for pathology. Without any fine-tuning or clinical labels, TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis. We evaluate TITAN on diverse clinical tasks and find that it outperforms both ROI and slide foundation models across machine learning settings, including linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval and pathology report generation.

Pretrained on 335,645 whole-slide images, this foundation model provides representations for slide- and patient-level tasks. It can perform clinical tasks and generate reports even in data-scarce scenarios, such as rare cancer diagnosis and survival prediction, without further fine-tuning.
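In vision-language aligned models, zero-shot classification and retrieval are commonly performed by comparing embeddings with cosine similarity: a slide embedding is matched against text-prompt embeddings (one per class) or against a database of other slide embeddings. The sketch below illustrates only that general idea. The function names `zero_shot_classify` and `retrieve` and the toy 3-dimensional vectors are hypothetical; the sketch does not reproduce TITAN's encoders, prompts, embedding size or API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def zero_shot_classify(slide_emb, class_prompt_embs):
    """Hypothetical helper: pick the class whose text-prompt embedding is closest to the slide."""
    scores = {label: cosine(slide_emb, emb) for label, emb in class_prompt_embs.items()}
    return max(scores, key=scores.get), scores


def retrieve(query_emb, database, k=2):
    """Hypothetical helper: rank database entries (id -> embedding) by similarity to the query."""
    ranked = sorted(database, key=lambda key: cosine(query_emb, database[key]), reverse=True)
    return ranked[:k]


# Toy embeddings standing in for outputs of aligned slide and text encoders.
slide = [0.9, 0.1, 0.0]
prompts = {"tumor": [1.0, 0.0, 0.0], "normal": [0.0, 1.0, 0.0]}
label, scores = zero_shot_classify(slide, prompts)

slides_db = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.7, 0.7, 0.0]}
top = retrieve(slide, slides_db, k=2)
print(label, top)
```

Because no labels or trained classifier head are involved, the same embeddings can serve classification and retrieval; only the set of candidate embeddings changes.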

