A Dempster-Shafer Approach to Trustworthy AI With Application to Fetal Brain MRI Segmentation

Lucas Fidon; Michaël Aertsen; Florian Kofler; Andrea Bink; Anna L. David; Thomas Deprest; Doaa Emam; Frédéric Guffens; András Jakab; Gregor Kasprian; Patric Kienast; Andrew Melbourne; Bjoern Menze; Nada Mufti; Ivana Pogledić; Daniela Prayer; Marlene Stuempflen; Esther Van Elslander; Sébastien Ourselin; Jan Deprest; Tom Vercauteren

doi:10.1109/tpami.2023.3346330

Lucas Fidon (King's College London), Michaël Aertsen (KU Leuven), Florian Kofler (Technical University of Munich), Andrea Bink (University of Zurich), Anna L. David (Institute for Reproductive Health), Thomas Deprest (KU Leuven), Doaa Emam (Tanta University Hospital), Frédéric Guffens (King's College London), András Jakab (University of Zurich), Gregor Kasprian (Medical University of Vienna), Patric Kienast (Medical University of Vienna), Andrew Melbourne (King's College London), Bjoern Menze (Technical University of Munich), Nada Mufti (King's College London), Ivana Pogledić (Medical University of Vienna), Daniela Prayer (Medical University of Vienna), Marlene Stuempflen (Medical University of Vienna), Esther Van Elslander (KU Leuven), Sébastien Ourselin (King's College London), Jan Deprest (KU Leuven), Tom Vercauteren (King's College London)

IEEE Transactions on Pattern Analysis and Machine Intelligence

January 10, 2024

Cited by 35. Open Access.

Abstract

Deep learning models for medical image segmentation can fail unexpectedly and spectacularly on pathological cases and on images acquired at centers other than those that provided the training images, producing labeling errors that violate expert knowledge. Such errors undermine the trustworthiness of these models. Mechanisms for detecting and correcting such failures are essential for safely translating this technology into clinics and are likely to be a requirement of future regulations on artificial intelligence (AI). In this work, we propose a theoretical framework for trustworthy AI and a practical system that can augment any backbone AI system with a fallback method and a fail-safe mechanism based on Dempster-Shafer theory. Our approach relies on an actionable definition of trustworthy AI. Our method automatically discards the voxel-level labels predicted by the backbone AI that violate expert knowledge and relies on the fallback for those voxels. We demonstrate the effectiveness of the proposed trustworthy AI approach on the largest reported annotated dataset of fetal MRI, consisting of 540 manually annotated fetal brain 3D T2w MRIs from 13 centers. Our trustworthy AI method improves the robustness of four backbone AI models for fetal brain MRIs acquired across various centers and for fetuses with various brain abnormalities.
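
The fail-safe idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that expert knowledge is available as a per-voxel boolean mask of admissible classes, and that the backbone and fallback each output per-voxel class probabilities. Under Dempster's rule of combination, merging a probability vector with a mass function that puts all its mass on the admissible set amounts to zeroing the inadmissible classes and renormalizing. Where the backbone puts no mass on any admissible class, the rule is undefined, and the sketch uses the fallback instead. The function name `apply_fail_safe`, the parameter `tol`, and the toy arrays are all hypothetical.

```python
import numpy as np


def apply_fail_safe(p_backbone, p_fallback, allowed, tol=1e-8):
    """Combine backbone and fallback probabilities under an expert-knowledge mask.

    p_backbone, p_fallback: arrays of shape (..., n_classes), per-voxel probabilities.
    allowed: boolean array of the same shape, True where a class is admissible.
    Returns (combined probabilities, boolean map of voxels that used the fallback).
    """
    p_backbone = np.asarray(p_backbone, dtype=float)
    p_fallback = np.asarray(p_fallback, dtype=float)
    allowed = np.asarray(allowed, dtype=bool)

    def condition(p):
        # Dempster's rule with a mass function concentrated on the allowed set:
        # drop inadmissible classes, then renormalize by the remaining mass.
        masked = p * allowed
        support = masked.sum(axis=-1, keepdims=True)
        safe = np.where(support > tol, support, 1.0)  # avoid division by zero
        return masked / safe, support[..., 0] > tol

    backbone_cond, backbone_ok = condition(p_backbone)
    fallback_cond, fallback_ok = condition(p_fallback)

    # Assumption: if the fallback is also fully contradicted, keep it unchanged.
    fallback_final = np.where(fallback_ok[..., None], fallback_cond, p_fallback)

    # Use the backbone where it is compatible with the mask, else the fallback.
    combined = np.where(backbone_ok[..., None], backbone_cond, fallback_final)
    return combined, ~backbone_ok


# Toy example: two voxels, three classes, class 0 is inadmissible at both voxels.
p_backbone = np.array([[0.7, 0.2, 0.1], [1.0, 0.0, 0.0]])
p_fallback = np.array([[0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
allowed = np.array([[False, True, True], [False, True, True]])

probs, used_fallback = apply_fail_safe(p_backbone, p_fallback, allowed)
labels = probs.argmax(axis=-1)
```

In this toy example the first voxel keeps the backbone's ranking among the admissible classes, while the second voxel, where the backbone is fully contradicted by the mask, takes its label from the fallback.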
