AutoMorph: Automated Retinal Vascular Morphology Quantification Via a Deep Learning PipelineYukun Zhou, Siegfried K. Wagner, Mark A. Chia et al.|Translational Vision Science & Technology|2022 Purpose: To externally validate a deep learning pipeline (AutoMorph) for automated analysis of retinal vascular morphology on fundus photographs. AutoMorph has been made publicly available, facilitating widespread research in ophthalmic and systemic diseases. Methods: AutoMorph consists of four functional modules: image preprocessing, image quality grading, anatomical segmentation (including binary vessel, artery/vein, and optic disc/cup segmentation), and vascular morphology feature measurement. Image quality grading and anatomical segmentation use the most recent deep learning techniques. We employ a model ensemble strategy to achieve robust results and analyze the prediction confidence to rectify false gradable cases in image quality grading. We externally validate the performance of each module on several independent publicly available datasets. Results: The EfficientNet-b4 architecture used in the image grading module achieves performance comparable to that of the state of the art for EyePACS-Q, with an F1-score of 0.86. The confidence analysis reduces the number of images incorrectly assessed as gradable by 76%. Binary vessel segmentation achieves an F1-score of 0.73 on AV-WIDE and 0.78 on DR HAGIS. Artery/vein scores are 0.66 on IOSTAR-AV, and disc segmentation achieves 0.94 in IDRID. Vascular morphology features measured from the AutoMorph segmentation map and expert annotation show good to excellent agreement. Conclusions: AutoMorph modules perform well even when external validation data show domain differences from training data (e.g., with different imaging devices). This fully automated pipeline can thus allow detailed, efficient, and comprehensive analysis of retinal vascular morphology on color fundus photographs. Translational Relevance: By making AutoMorph publicly available and open source, we hope to facilitate ophthalmic and systemic disease research, particularly in the emerging field of oculomics.
Domain-Adaptive Few-Shot LearningExisting few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples. However, in practice, this assumption is often invalid -the target classes could come from a different domain. This poses an additional challenge of domain adaptation (DA) with few training samples. In this paper, the problem of domain-adaptive few-shot learning (DA-FSL) is tackled, which is expected to have wide use in real-world scenarios and requires solving FSL and DA in a unified framework. To this end, we propose a novel domain-adversarial prototypical network (DAPN) model. It is designed to address a specific challenge in DA-FSL: the DA objective means that the source and target data distributions need to be aligned, typically through a shared domain-adaptive feature embedding space; but the FSL objective dictates that the target domain per class distribution must be different from that of any source domain class, meaning aligning the distributions across domains may harm the FSL performance. How to achieve global domain distribution alignment whilst maintaining source/target per-class discriminativeness thus becomes the key. Our solution is to explicitly enhance the source/target per-class separation before domain-adaptive feature embedding learning, to alleviate the negative effect of domain alignment on FSL. Extensive experiments show that our DAPN outperforms the state-of-the-arts. The code is available at https://github.com/dingmyu/DAPN.
Zero and Few Shot Learning With Semantic Feature Synthesis and Competitive LearningJiechao Guan, Zhiwu Lu, Tao Xiang et al.|IEEE Transactions on Pattern Analysis and Machine Intelligence|2020 Zero-shot learning (ZSL) is made possible by learning a projection function between a feature space and a semantic space (e.g., an attribute space). Key to ZSL is thus to learn a projection that is robust against the often large domain gap between the seen and unseen class domains. In this work, this is achieved by unseen class data synthesis and robust projection function learning. Specifically, a novel semantic data synthesis strategy is proposed, by which semantic class prototypes (e.g., attribute vectors) are used to simply perturb seen class data for generating unseen class ones. As in any data synthesis/hallucination approach, there are ambiguities and uncertainties on how well the synthesised data can capture the targeted unseen class data distribution. To cope with this, the second contribution of this work is a novel projection learning model termed competitive bidirectional projection learning (BPL) designed to best utilise the ambiguous synthesised data. Specifically, we assume that each synthesised data point can belong to any unseen class; and the most likely two class candidates are exploited to learn a robust projection function in a competitive fashion. As a third contribution, we show that the proposed ZSL model can be easily extended to few-shot learning (FSL) by again exploiting semantic (class prototype guided) feature synthesis and competitive BPL. Extensive experiments show that our model achieves the state-of-the-art results on both problems.
Face-Focused Cross-Stream Network for Deception Detection in VideosAutomated deception detection (ADD) from real-life videos is a challenging task. It specifically needs to address two problems: (1) Both face and body contain useful cues regarding whether a subject is deceptive. How to effectively fuse the two is thus key to the effectiveness of an ADD model. (2) Real-life deceptive samples are hard to collect; learning with limited training data thus challenges most deep learning based ADD models. In this work, both problems are addressed. Specifically, for face-body multimodal learning, a novel face-focused cross-stream network (FFCSN) is proposed. It differs significantly from the popular two-stream networks in that: (a) face detection is added into the spatial stream to capture the facial expressions explicitly, and (b) correlation learning is performed across the spatial and temporal streams for joint deep feature learning across both face and body. To address the training data scarcity problem, our FFCSN model is trained with both meta learning and adversarial learning. Extensive experiments show that our FFCSN model achieves state-of-the-art results. Further, the proposed FFCSN model as well as its robust training strategy are shown to be generally applicable to other human-centric video analysis tasks such as emotion recognition from user-generated videos.
Aircraft Recognition Based on Landmark Detection in Remote Sensing ImagesAn Zhao, Kun Fu, Siyue Wang et al.|IEEE Geoscience and Remote Sensing Letters|2017 Aircraft type recognition of remote sensing images is critical both in civil and military applications. In this letter, we propose a novel landmark-based aircraft recognition method which is highly accurate and efficient. First, we propose a new idea to address the aircraft type recognition problem by aircraft's landmark detection. Its advantages are two folds. On the one hand, it needs fewer labeled data and alleviates the work of human annotation. On the other hand, a trained model has strong expansibility because it can be used for any type of aircraft that not contained in training data set without retraining. Then, we use a variant of a convolutional neural network called vanilla network for all landmarks regression at the same time. Therefore, it can avoid bad local minimum effectively by encoding the geometric constraints among landmarks implicitly. To handle aircrafts in myriads of poses, rotation jittering is used for data augmentation in preprocessing and multicrop fusion is used in postprocessing. Thus, an 80% reduction in error rate could be reached. Finally, we use the landmark template matching to recognize the aircraft. Our method shows a competitive performance both in accuracy and efficiency.