A clinical benchmark of public self-supervised pathology foundation models

Gabriele Campanella (Icahn School of Medicine at Mount Sinai), Shengjia Chen (Icahn School of Medicine at Mount Sinai), Manbir Singh (Icahn School of Medicine at Mount Sinai), Ruchika Verma (Icahn School of Medicine at Mount Sinai), Silke Muehlstedt (Icahn School of Medicine at Mount Sinai), Jennifer Zeng (Icahn School of Medicine at Mount Sinai), Aryeh Stock (Icahn School of Medicine at Mount Sinai), Matt Croken (Icahn School of Medicine at Mount Sinai), Brandon Veremis (Icahn School of Medicine at Mount Sinai), Abdülkadir Elmas (Icahn School of Medicine at Mount Sinai), Ivan Shujski (Sahlgrenska University Hospital), Noora Neittaanmäki (Sahlgrenska University Hospital), Kuan‐lin Huang (Icahn School of Medicine at Mount Sinai), Ricky Kwan (Icahn School of Medicine at Mount Sinai), Jane Houldsworth (Icahn School of Medicine at Mount Sinai), Adam J. Schoenfeld (Memorial Sloan Kettering Cancer Center), Chad Vanderbilt (Memorial Sloan Kettering Cancer Center)
Nature Communications
April 16, 2025
Cited by 69
Open Access

Abstract

The use of self-supervised learning to train pathology foundation models has increased substantially in the past few years. Notably, several models trained on large quantities of clinical data have been made publicly available in recent months. This will significantly enhance scientific research in computational pathology and help bridge the gap between research and clinical deployment. With the increase in availability of public foundation models of different sizes, trained using different algorithms on different datasets, it becomes important to establish a benchmark to compare the performance of such models on a variety of clinically relevant tasks spanning multiple organs and diseases. In this work, we present a collection of pathology datasets comprising clinical slides associated with clinically relevant endpoints, including cancer diagnoses and a variety of biomarkers, generated during standard hospital operation at three medical centers. We leverage these datasets to systematically assess the performance of public pathology foundation models and provide insights into best practices for training foundation models and selecting appropriate pretrained models. To enable the community to evaluate their models on our clinical datasets, we make available an automated benchmarking pipeline for external use.

Self-supervised learning (SSL) is increasingly used to train pathology foundation models. Here, the authors introduce a pathology benchmark set generated during standard clinical workflows that includes multiple cancer and disease types, then leverage it to assess the performance of multiple public SSL pathology foundation models and to provide best practices for model training and selection.

