Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms

Thomas Schaffter(Sage Bionetworks), Diana S.M. Buist(Kaiser Permanente Washington Health Research Institute), Christoph I. Lee(University of Washington), Yaroslav Nikulin, Dezső Ribli(Eötvös Loránd University), Yuanfang Guan(Michigan Medicine), William Lotter, Zequn Jie(Tencent (China)), Hao Du(National University of Singapore), Sijia Wang(Agency for Integrated Care), Jiashi Feng(National University of Singapore), Mengling Feng(National University Health System), Hyoeun Kim, F. Albiol(Instituto de Física Corpuscular), Alberto Albiol(Universitat Politècnica de València), Stephen Morrell(University College London), Zbigniew Wojna, Mehmet Eren Ahsen(University of Illinois Urbana-Champaign), Umar Asif(IBM Research - Australia), Antonio Jimeno Yepes(IBM Research - Australia), Shivanthan A.C. Yohanandan(IBM Research - Australia), Simona Rabinovici‐Cohen(University of Haifa), Darvin Yi(Stanford University), Bruce Hoff(Sage Bionetworks), Thomas Yu(Sage Bionetworks), Elias Chaibub Neto(Sage Bionetworks), Daniel L. Rubin(Stanford University), Peter Lindholm(Karolinska Institutet), Laurie R. Margolies(Icahn School of Medicine at Mount Sinai), Russell B. McBride(Icahn School of Medicine at Mount Sinai), Joseph H. Rothstein(Icahn School of Medicine at Mount Sinai), Weiva Sieh(Icahn School of Medicine at Mount Sinai), Rami Ben‐Ari(IBM Research - Haifa), Stefan Harrer(IBM Research - Australia), Andrew D. Trister(Fred Hutch Cancer Center), Stephen Friend(Sage Bionetworks), Thea Norman(Bill & Melinda Gates Foundation), Berkman Sahiner(Center for Devices and Radiological Health), Fredrik Strand(Karolinska Institutet), Justin Guinney(Sage Bionetworks), Gustavo Stolovitzky(IBM Research - Thomas J. Watson Research Center), Lester Mackey(Microsoft (United States)), Joyce Cahoon(North Carolina State University), Li Shen(Icahn School of Medicine at Mount Sinai), Jae Ho Sohn(University of California, San Francisco), Hari Trivedi(Emory University), Yiqiu Shen(New York University), Ljubomir Buturović(Palo Alto Institute), José Costa Pereira(INESC TEC), Jaime S. Cardoso(INESC TEC), Eduardo Castro(INESC TEC), Karl Trygve Kalleberg, Obioma Pelka(Essen University Hospital), Imane Nedjar(University of Abou Bekr Belkaïd), Krzysztof J. Geras(New York University), Felix Nensa(Essen University Hospital), Ethan Goan(Queensland University of Technology), Sven Koitka(Dortmund University of Applied Sciences and Arts), L. Caballero(Instituto de Física Corpuscular), David Cox(IBM (United States)), Pavitra Krishnaswamy(Agency for Science, Technology and Research), Gaurav Pandey(Icahn School of Medicine at Mount Sinai), Christoph M. Friedrich(Dortmund University of Applied Sciences and Arts), Dimitri Perrin(Queensland University of Technology), Clinton Fookes(Queensland University of Technology), Bibo Shi(Duke University), Gerard Cardoso Negrie, Michael Kawczynski(University of California, San Francisco), Kyunghyun Cho(New York University), Can Son Khoo(University College London), Joseph Y. Lo(Duke University), A. Gregory Sorensen, Hwejin Jung(Korea University)
JAMA Network Open
March 2, 2020
Cited by 411
Open Access

Abstract

Importance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives.

Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms.

Design, Setting, and Participants: In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries participated. Analysis began November 18, 2016.

Main Outcomes and Measurements: Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that translated to cancer yes/no within 12 months. Algorithm accuracy for breast cancer detection was evaluated using area under the curve and algorithm specificity compared with radiologists' specificity with radiologists' sensitivity set at 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessment was developed and evaluated.

Results: Overall, 144 231 screening mammograms from 85 580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166 578 examinations from 68 008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden) and 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden). Combining top-performing algorithms and US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity.

Conclusions and Relevance: While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods for enhancing mammography screening interpretation.
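The two accuracy measures named in the abstract, area under the curve and specificity at a fixed sensitivity, can be illustrated with a minimal sketch. This is not the study's evaluation code: the function names, the thresholding rule (call an examination positive when its score is at or above the threshold), and the toy scores and labels are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
from typing import Sequence, Tuple, List


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of cancer-positive (label 1) and cancer-negative (label 0) cases."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    return pos, neg


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the fraction of positive/negative pairs
    in which the positive case scores higher (ties count as half)."""
    pos, neg = _split(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target_sensitivity: float
) -> float:
    """Specificity at the highest observed threshold whose sensitivity
    reaches the target (for example, the radiologists' sensitivity)."""
    pos, neg = _split(scores, labels)
    # Walk thresholds from strictest to most lenient; stop at the first
    # one that detects at least the target share of positive cases.
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in pos) / len(pos)
        if sensitivity >= target_sensitivity:
            return sum(s < threshold for s in neg) / len(neg)
    raise ValueError("target sensitivity is not reachable")


# Hypothetical toy data (not from the study): 4 positive and 4 negative cases.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(auc(toy_scores, toy_labels))
print(specificity_at_sensitivity(toy_scores, toy_labels, 0.75))
```

On this toy data the functions return 0.8125 and 0.75. Fixing sensitivity at the radiologists' operating point, as the abstract describes, lets algorithm and reader be compared on specificity alone; with real data the target would be a value such as 0.859 rather than the 0.75 assumed here.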

