NuCLS: A scalable crowdsourcing approach and dataset for nucleus classification and segmentation in breast cancer

Mohamed Amgad; Lamees A Atteya; Hagar Hussein; Kareem Hosny Mohammed; Ehab Hafiz; Maha AT Elsebaie; Ahmed M. Alhusseiny; Mohamed Atef AlMoslemany; Abdelmagid M. Elmatboly; Philip A. Pappalardo; Rokia Sakr; Pooya Mobadersany; Ahmad Rachid; Anas M. Saad; Ahmad Mahmoud Alkashash; Inas A. Ruhban; Anas Alrefai; Nada M. Elgazar; Ali Abdulkarim; Abo-Alela Farag; Amira Etman; Ahmed G. Elsaeed; Yahya Alagha; Yomna A. Amer; Ahmed M. Raslan; Menatalla K. Nadim; Mai Alaaeldin Temraz Elsebaie; Ahmed Ayad; Liza E. Hanna; Ahmed Gadallah; Mohamed Elkady; Bradley Drumheller; David L. Jaye; David Manthey; David A. Gutman; Habiba Elfandy; Lee Cooper

doi:10.1093/gigascience/giac037

Mohamed Amgad (Northwestern University), Lamees A Atteya (Ministry of Health and Population), Hagar Hussein (Nasser Institute hospital), Kareem Hosny Mohammed (University of Pennsylvania), Ehab Hafiz (Theodor Bilharz Research Institute), Maha AT Elsebaie (Ain Shams University), Ahmed M. Alhusseiny (Baystate Medical Center), Mohamed Atef AlMoslemany (Menoufia University), Abdelmagid M. Elmatboly (Al-Azhar University), Philip A. Pappalardo (George Mason University), Rokia Sakr (Menoufia University), Pooya Mobadersany (Northwestern University), Ahmad Rachid (Ain Shams University), Anas M. Saad (Cleveland Clinic), Ahmad Mahmoud Alkashash (Indiana University – Purdue University Indianapolis), Inas A. Ruhban (Damascus University), Anas Alrefai (Ain Shams University), Nada M. Elgazar (Mansoura University), Ali Abdulkarim (Cairo University), Abo-Alela Farag (Ain Shams University), Amira Etman (Menoufia University), Ahmed G. Elsaeed (Mansoura University), Yahya Alagha (Cairo University), Yomna A. Amer (Menoufia University), Ahmed M. Raslan (Menoufia University), Menatalla K. Nadim (Ain Shams University), Mai Alaaeldin Temraz Elsebaie (Ain Shams University), Ahmed Ayad (Hematology Oncology Consultants), Liza E. Hanna (Nasser Institute hospital), Ahmed Gadallah (Ain Shams University), Mohamed Elkady (Riverside Technology (United States)), Bradley Drumheller (Emory University), David L. Jaye (Emory University), David Manthey (Kitware (United States)), David A. Gutman (Emory University), Habiba Elfandy (National Cancer Institute), Lee Cooper (Northwestern University)

GigaScience

January 1, 2022

Cited by 104; Open Access

Abstract

BACKGROUND: Deep learning enables accurate high-resolution mapping of cells and tissue structures that can serve as the foundation of interpretable machine-learning models for computational pathology. However, generating adequate labels for these structures is a critical barrier, given the time and effort required from pathologists.

RESULTS: This article describes a novel collaborative framework for engaging crowds of medical students and pathologists to produce quality labels for cell nuclei. We used this approach to produce the NuCLS dataset, containing >220,000 annotations of cell nuclei in breast cancers. This builds on prior work labeling tissue regions to produce an integrated tissue region- and cell-level annotation dataset for training that is the largest such resource for multi-scale analysis of breast cancer histology. This article presents data and analysis results for single and multi-rater annotations from both non-experts and pathologists. We present a novel workflow that uses algorithmic suggestions to collect accurate segmentation data without the need for laborious manual tracing of nuclei. Our results indicate that even noisy algorithmic suggestions do not adversely affect pathologist accuracy and can help non-experts improve annotation quality. We also present a new approach for inferring truth from multiple raters and show that non-experts can produce accurate annotations for visually distinctive classes.

CONCLUSIONS: This study is the most extensive systematic exploration of the large-scale use of wisdom-of-the-crowd approaches to generate data for computational pathology applications.
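
The abstract mentions inferring truth from multiple raters but does not describe how that is done. As a point of reference only, the sketch below shows a plain majority vote, which is a common baseline for combining multi-rater labels and is not the approach introduced in the article. The function name, the rater identifiers, and the class names ("tumor", "lymphocyte") are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def majority_vote(labels_by_rater):
    """Combine one nucleus's labels from several raters by majority vote.

    Baseline illustration only; NOT the truth-inference method of the article.
    `labels_by_rater` maps a (hypothetical) rater id to a class label,
    or to None when that rater did not label the nucleus.
    Returns (consensus_label, agreement), where agreement is the fraction
    of submitted votes that went to the most common label.
    Ties are left unresolved and reported as (None, agreement).
    """
    votes = [label for label in labels_by_rater.values() if label is not None]
    if not votes:
        return None, 0.0

    counts = Counter(votes)
    top_count = max(counts.values())
    winners = [label for label, n in counts.items() if n == top_count]
    agreement = top_count / len(votes)

    if len(winners) > 1:
        return None, agreement  # tie between classes: no consensus
    return winners[0], agreement


if __name__ == "__main__":
    # Hypothetical raters and class names.
    example = {"rater_1": "tumor", "rater_2": "tumor", "rater_3": "lymphocyte"}
    print(majority_vote(example))
```

A majority vote weights every rater equally, so it cannot account for differences in reliability between non-experts and pathologists; that limitation is one reason dedicated truth-inference methods exist.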
