AbdomenCT-1K: Is Abdominal Organ Segmentation a Solved Problem?
Jun Ma, Yao Zhang, Song Gu et al. | IEEE Transactions on Pattern Analysis and Machine Intelligence | 2021
With the unprecedented developments in deep learning, automatic segmentation of main abdominal organs seems to be a solved problem, as state-of-the-art (SOTA) methods have achieved results comparable with inter-rater variability on many benchmark datasets. However, most of the existing abdominal datasets only contain single-center, single-phase, single-vendor, or single-disease cases, and it is unclear whether the excellent performance can generalize to diverse datasets. This paper presents a large and diverse abdominal CT organ segmentation dataset, termed AbdomenCT-1K, with more than 1000 (1K) CT scans from 12 medical centers, including multi-phase, multi-vendor, and multi-disease cases. Furthermore, we conduct a large-scale study for liver, kidney, spleen, and pancreas segmentation and reveal the unsolved segmentation problems of the SOTA methods, such as the limited generalization ability on distinct medical centers, phases, and unseen diseases. To advance the unsolved problems, we further build four organ segmentation benchmarks for fully supervised, semi-supervised, weakly supervised, and continual learning, which are currently challenging and active research topics. Accordingly, we develop a simple and effective method for each benchmark, which can be used as out-of-the-box methods and strong baselines. We believe the AbdomenCT-1K dataset will promote future in-depth research towards clinically applicable abdominal organ segmentation methods.
Fast and Low-GPU-memory abdomen CT organ segmentation: The FLARE challenge
Jun Ma, Yao Zhang, Song Gu et al. | Medical Image Analysis | 2022
Automatic segmentation of abdominal organs in CT scans plays an important role in clinical practice. However, most existing benchmarks and datasets only focus on segmentation accuracy, while the model efficiency and its accuracy on the testing cases from different medical centers have not been evaluated. To comprehensively benchmark abdominal organ segmentation methods, we organized the first Fast and Low GPU memory Abdominal oRgan sEgmentation (FLARE) challenge, where the segmentation methods were encouraged to achieve high accuracy on the testing cases from different medical centers, fast inference speed, and low GPU memory consumption, simultaneously. The winning method surpassed the existing state-of-the-art method, achieving a 19× faster inference speed and reducing the GPU memory consumption by 60% with comparable accuracy. We provide a summary of the top methods, make their code and Docker containers publicly available, and give practical suggestions on building accurate and efficient abdominal organ segmentation models. The FLARE challenge remains open for future submissions through a live platform for benchmarking further methodology developments at https://flare.grand-challenge.org/.
Unleashing the strengths of unlabelled data in deep learning-assisted pan-cancer abdominal organ quantification: the FLARE22 challenge
Jun Ma, Yao Zhang, Song Gu et al. | The Lancet Digital Health | 2024
Deep learning has shown great potential to automate abdominal organ segmentation and quantification. However, most existing algorithms rely on expert annotations and do not have comprehensive evaluations in real-world multinational settings. To address these limitations, we organised the FLARE 2022 challenge to benchmark fast, low-resource, and accurate abdominal organ segmentation algorithms. We first constructed an intercontinental abdomen CT dataset from more than 50 clinical research groups. We then independently validated that deep learning algorithms achieved a median Dice similarity coefficient (DSC) of 90·0% (IQR 87·4-91·3%) by use of 50 labelled images and 2000 unlabelled images, which can substantially reduce manual annotation costs. The best-performing algorithms successfully generalised to holdout external validation sets, achieving a median DSC of 89·4% (85·2-91·3%), 90·0% (84·3-93·0%), and 88·5% (80·9-91·9%) on North American, European, and Asian cohorts, respectively. These algorithms show the potential to use unlabelled data to boost performance and alleviate annotation shortages for modern artificial intelligence models.
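
The abstract above reports accuracy as a median Dice similarity coefficient (DSC) with an interquartile range. As a rough illustration of what those numbers summarise, the sketch below computes the DSC between two binary masks, DSC = 2|A ∩ B| / (|A| + |B|), and then the median and quartiles over a set of per-case scores. It is not the challenge's evaluation code: the function names, the flat-list mask representation, the empty-mask convention, and the quartile method are all assumptions made for this example, and the toy inputs are invented.

```python
from statistics import median, quantiles


def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two flat binary masks of equal length.

    Hypothetical helper for illustration; masks are sequences of 0/1 values.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def median_and_iqr(scores):
    """Return (median, (Q1, Q3)) for a list of per-case scores.

    Uses the 'inclusive' quartile method; other conventions give
    slightly different quartiles on small samples.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), (q1, q3)


if __name__ == "__main__":
    # Invented toy masks: one overlapping voxel out of 2 + 1 foreground voxels.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
    # Invented per-case scores, summarised as median and quartiles.
    print(median_and_iqr([0.80, 0.85, 0.90, 0.95, 1.00]))
```

In practice the masks would be 3D label volumes handled per organ, and the DSC would be computed for each organ and case before being summarised across a cohort.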