Unleashing the strengths of unlabelled data in deep learning-assisted pan-cancer abdominal organ quantification: the FLARE22 challenge

Jun Ma(University Health Network), Yao Zhang(Lenovo (China)), Song Gu(Nanjing University of Science and Technology), Ge Cheng(Ocean University of China), Shihao Ma(University Health Network), Adamo Young(Vector Institute), Cheng Zhu(Medical Technologies (Czechia)), Xin Yang(Shenzhen University Health Science Center), Kangkang Meng(University of Science and Technology Beijing), Ziyan Huang(Shanghai Institute of Hematology), Fan Zhang(Fosun Pharma (China)), YuanKe Pan(Shenzhen Technology University), Shoujin Huang(Shenzhen Technology University), Jiacheng Wang(Xiamen University), Mingze Sun(Tsinghua–Berkeley Shenzhen Institute), Rongguo Zhang(Capital Normal University), Dengqiang Jia(Department of Health), Jae Won Choi(Seoul National University Hospital), Natália Alves(Radboud University Nijmegen), Bram de Wilde(Radboud University Nijmegen), Gregor Koehler(German Cancer Research Center), Haoran Lai(Southern Medical University), Ershuai Wang(Shenzhen Metro (China)), Manuel Wiesenfarth(German Cancer Research Center), Qiongjie Zhu(University of Shanghai for Science and Technology), Guoqiang Dong(The Second Affiliated Hospital of Bengbu Medical College), Jian He(Nanjing Drum Tower Hospital), Junjun He(University Health Network), Hua Yang, Bingding Huang, Mengye Lyu, Yongkang Ma, Heng Guo, Weixin Xu, Klaus Maier-Hein, Yajun Wu, Bo Wang(University Health Network)
The Lancet Digital Health
October 23, 2024
Cited by 100
Open Access

Abstract

Deep learning has shown great potential to automate abdominal organ segmentation and quantification. However, most existing algorithms rely on expert annotations and have not been comprehensively evaluated in real-world multinational settings. To address these limitations, we organised the FLARE 2022 challenge to benchmark fast, low-resource, and accurate abdominal organ segmentation algorithms. We first constructed an intercontinental abdomen CT dataset from more than 50 clinical research groups. We then independently validated that deep learning algorithms achieved a median Dice similarity coefficient (DSC) of 90·0% (IQR 87·4-91·3%) using 50 labelled images and 2000 unlabelled images, an approach that can substantially reduce manual annotation costs. The best-performing algorithms successfully generalised to holdout external validation sets, achieving a median DSC of 89·4% (IQR 85·2-91·3%), 90·0% (84·3-93·0%), and 88·5% (80·9-91·9%) on North American, European, and Asian cohorts, respectively. These algorithms show the potential of unlabelled data to boost performance and alleviate annotation shortages for modern artificial intelligence models.
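
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a per-organ DSC and its median and IQR summary could be computed. It is not the challenge's evaluation code: the function names `dice_coefficient` and `summarise` are hypothetical, masks are assumed to be flattened sequences of integer organ labels, and treating two empty masks as perfect agreement is a convention chosen here for illustration.

```python
from statistics import median, quantiles


def dice_coefficient(pred, truth, label):
    """Dice similarity coefficient for one organ label.

    `pred` and `truth` are equally long flat sequences of integer labels
    (an assumed representation of a voxel-wise segmentation).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_count + truth_count == 0:
        # Assumed convention: organ absent in both masks counts as agreement.
        return 1.0
    return 2.0 * overlap / (pred_count + truth_count)


def summarise(scores):
    """Return (median, first quartile, third quartile) of a list of scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


# Toy example with six voxels and two organ labels (1 and 2).
prediction = [0, 1, 1, 2, 2, 0]
reference = [0, 1, 1, 1, 2, 2]
per_organ = [dice_coefficient(prediction, reference, k) for k in (1, 2)]
med, q1, q3 = summarise(per_organ)
```

In practice the scores passed to `summarise` would be one DSC per case (or per case and organ), so that the reported median and IQR describe the spread across patients rather than across voxels.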

