Why rankings of biomedical image analysis competitions should be interpreted with care

Lena Maier‐Hein (German Cancer Research Center), Matthias Eisenmann (German Cancer Research Center), Annika Reinke (German Cancer Research Center), Sinan Onogur (German Cancer Research Center), Marko Stankovic (German Cancer Research Center), Patrick Godau (German Cancer Research Center), Tal Arbel (McGill University), Hrvoje Bogunović (Christian Doppler Laboratory for Thermoelectricity), Andrew P. Bradley (Queensland University of Technology), Aaron Carass (Johns Hopkins University), Carolin Feldmann (German Cancer Research Center), Alejandro F. Frangi (University of Leeds), Peter M. Full (German Cancer Research Center), Bram van Ginneken (Radboud University Nijmegen), Allan Hanbury (TU Wien), Katrin Honauer (Heidelberg University), Michal Kozubek (Masaryk University), Bennett A. Landman (Vanderbilt University), Keno März (German Cancer Research Center), Oskar Maier (University of Lübeck), Klaus Maier‐Hein (German Cancer Research Center), Bjoern Menze (Technical University of Munich), Henning Müller (HES-SO University of Applied Sciences and Arts Western Switzerland), Peter Neher (German Cancer Research Center), Wiro J. Niessen (Erasmus MC), Nasir Rajpoot (University of Warwick), G Sharp (Massachusetts General Hospital), Korsuk Sirinukunwattana (University of Oxford), Stefanie Speidel (National Center for Tumor Diseases), Christian Stock (German Cancer Research Center), Danail Stoyanov (University College London), Abdel Aziz Taha (Research Studios Austria), Fons van der Sommen (Eindhoven University of Technology), Ching‐Wei Wang (National Taiwan University of Science and Technology), Marc-André Weber (University of Rostock), Guoyan Zheng (University of Bern), Pierre Jannin (Inserm), Annette Kopp‐Schneider (German Cancer Research Center)
Nature Communications
November 30, 2018

Abstract

International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted up to now. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results are often hampered, as only a fraction of the relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables, such as the test data used for validation, the ranking scheme applied, and the observers that make the reference annotations. To overcome these problems, we recommend best practice guidelines and define open research questions to be addressed in the future.
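
The claim that an algorithm's rank depends on the ranking scheme can be illustrated with a toy example. The sketch below is not taken from the paper: the three algorithms, their per-case scores, and the function names `aggregate_then_rank` and `rank_then_aggregate` are all hypothetical. It assumes a metric where higher is better and compares two plausible schemes: averaging scores over all test cases before ranking, versus ranking within each test case and then averaging the ranks.

```python
from statistics import mean

# Hypothetical per-case scores (higher is better) for three made-up
# algorithms on four test cases. Illustrative values only, not paper data.
scores = {
    "A": [0.90, 0.90, 0.90, 0.10],  # best on most cases, one severe failure
    "B": [0.80, 0.80, 0.80, 0.80],  # consistently moderate
    "C": [0.85, 0.85, 0.85, 0.70],  # in between
}


def aggregate_then_rank(scores):
    """Order algorithms by their mean score over all cases (best first)."""
    return sorted(scores, key=lambda name: mean(scores[name]), reverse=True)


def rank_then_aggregate(scores):
    """Rank algorithms within each case, then order by mean rank (best first)."""
    names = list(scores)
    n_cases = len(next(iter(scores.values())))
    ranks = {name: [] for name in names}
    for i in range(n_cases):
        # Rank 1 goes to the highest score on this case (no ties in this toy data).
        ordered = sorted(names, key=lambda name: scores[name][i], reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    return sorted(names, key=lambda name: mean(ranks[name]))


print(aggregate_then_rank(scores))  # ['C', 'B', 'A']
print(rank_then_aggregate(scores))  # ['A', 'C', 'B']
```

With these made-up numbers, algorithm A is last under the first scheme and first under the second, because a single failed case dominates its mean score but costs it only one rank position.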

