Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer

Babak Ehteshami Bejnordi (Radboud University Nijmegen), Mitko Veta (Eindhoven University of Technology), Paul Johannes van Diest (University Medical Center Utrecht), Bram van Ginneken (Radboud University Nijmegen), Nico Karssemeijer (Radboud University Nijmegen), Geert Litjens (Radboud University Nijmegen), Jeroen van der Laak (Radboud University Nijmegen), Meyke Hermsen (Radboud University Nijmegen), Quirine F. Manson (University Medical Center Utrecht), Maschenka Balkenhol (Radboud University Nijmegen), Oscar Geessink (Radboud University Nijmegen), Nikolas Stathonikos (University Medical Center Utrecht), Marcory CRF van Dijk (Rijnstate Hospital), Peter Bult (Radboud University Nijmegen), Francisco Beça (Beth Israel Deaconess Medical Center), Andrew H. Beck (Beth Israel Deaconess Medical Center), Dayong Wang (Beth Israel Deaconess Medical Center), Aditya Khosla (Critical Path Institute), Rishab Gargeya, Humayun Irshad (Beth Israel Deaconess Medical Center), Aoxiao Zhong (Harvard University), Qi Dou (Harvard University), Quanzheng Li (Harvard University), Hao Chen (Chinese University of Hong Kong), Huangjing Lin (Chinese University of Hong Kong), Pheng‐Ann Heng (Chinese University of Hong Kong), Christian Haß, Elia Bruni, Q. K. Wong (Munich Business School), Uğur Halıcı (Middle East Technical University), Mustafa Ümit Öner (Middle East Technical University), Rengül Çetin-Atalay (Middle East Technical University), Matt Berseth, Vitali Khvatkov (Heart Imaging Technologies (United States)), Alexei Vylegzhanin (Heart Imaging Technologies (United States)), Oren Kraus (University of Toronto), Muhammad Shaban (University of Warwick), Nasir Rajpoot (National Health Service), Ruqayya Awan (Qatar University), Korsuk Sirinukunwattana (University of Warwick), Talha Qaiser (University of Warwick), Yee‐Wah Tsang (National Health Service), David Tellez (Radboud University Nijmegen), Jonas Annuscheit (HTW Berlin - University of Applied Sciences), Peter Hufnagl (HTW Berlin - University of Applied Sciences), Mira Valkonen (Tampere University), Kimmo Kartasalo (HTW Berlin - University of Applied Sciences), Leena Latonen (Tampere University), Pekka Ruusuvuori (HTW Berlin - University of Applied Sciences), Kaisa Liimatainen (HTW Berlin - University of Applied Sciences), Shadi Albarqouni (Technical University of Munich), Bharti Mungal (Technical University of Munich), Ami George (Technical University of Munich), Stefanie Demirci (Technical University of Munich), Nassir Navab (Technical University of Munich), Seiryo Watanabe (The University of Osaka), Shigeto Seno (The University of Osaka), Yoichi Takenaka (The University of Osaka), Hideo Matsuda (The University of Osaka), Hady Ahmady Phoulady (University of South Florida), Vassili Kovalev (United Institute of Informatics Problems), Alexander Kalinovsky (United Institute of Informatics Problems), Vitali Liauchuk (United Institute of Informatics Problems), Gloria Bueno (University of Castilla-La Mancha), M. Milagro Fernández-Carrobles (University of Castilla-La Mancha), Ismael Serrano (University of Castilla-La Mancha), Óscar Déniz (University of Castilla-La Mancha), Daniel Racoceanu (Inserm), Rui Venâncio (Sorbonne Université)
JAMA
December 12, 2017
Cited by 3,275
Open Access

Abstract

Importance: Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency.

Objective: Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists' diagnoses in a diagnostic setting.

Design, Setting, and Participants: Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining was provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The same test set of corresponding glass slides was also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands to ascertain likelihood of nodal metastases for each slide in a flexible 2-hour session, simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC).

Exposures: Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation.

Main Outcomes and Measures: The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor.

Results: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false-positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P < .001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC).

Conclusions and Relevance: In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.
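
As an illustration of how a slide-level AUC can be derived from five-level confidence ratings like those the pathologists gave, the following is a minimal sketch and not the study's actual analysis code. It uses the empirical (Mann-Whitney) estimate of the AUC, which equals the area under the trapezoidal ROC curve. The numeric score mapping, the function name `auc_from_ratings`, and the toy data are all assumptions made for this example; the abstract does not specify how the ratings were encoded or which estimator was used.

```python
from itertools import product

# Assumed ordinal encoding of the five confidence levels (hypothetical;
# only the ordering matters for the AUC, not the exact numbers).
RATING_SCORE = {
    "definitely normal": 1,
    "probably normal": 2,
    "equivocal": 3,
    "probably tumor": 4,
    "definitely tumor": 5,
}


def auc_from_ratings(ratings, has_metastasis):
    """Empirical AUC: P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    scores = [RATING_SCORE[r] for r in ratings]
    pos = [s for s, y in zip(scores, has_metastasis) if y]
    neg = [s for s, y in zip(scores, has_metastasis) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative slide")
    # Compare every metastatic slide with every normal slide;
    # a tie in rating counts as half a correct ordering.
    credit = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return credit / (len(pos) * len(neg))


# Made-up toy data for one reader (not taken from the study).
toy_ratings = [
    "definitely tumor", "probably tumor", "equivocal",
    "probably normal", "definitely normal", "equivocal",
]
toy_labels = [True, True, True, False, False, False]
toy_auc = auc_from_ratings(toy_ratings, toy_labels)
print(f"toy AUC = {toy_auc:.3f}")
```

With ordinal ratings, ties between a metastatic and a normal slide are common, which is why the tie term matters here; a continuous algorithm score would rarely produce ties.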

