Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs
Predicting the depth (or surface normal) of a scene from a single monocular color image is a challenging task. This paper tackles this challenging and essentially underdetermined problem by regression on deep convolutional neural network (DCNN) features, combined with a post-processing refinement step using conditional random fields (CRF). Our framework works at two levels: the super-pixel level and the pixel level. First, we design a DCNN model to learn the mapping from multi-scale image patches to depth or surface normal values at the super-pixel level. Second, the estimated super-pixel depth or surface normal is refined to the pixel level by exploiting various potentials on the depth or surface normal map, which include a data term, a smoothness term among super-pixels, and an auto-regression term characterizing the local structure of the estimation map. The inference problem can be solved efficiently because it admits a closed-form solution. Experiments on the Make3D and NYU Depth V2 datasets show competitive results compared with recent state-of-the-art methods.
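The closed-form inference mentioned above follows from the potentials being quadratic. The sketch below is a minimal illustration under simplifying assumptions. It keeps only a data term sum_i (d_i - z_i)^2 and a weighted smoothness term lam * sum_(i,j) w_ij (d_i - d_j)^2. Setting the gradient to zero gives the linear system (I + lam * L) d = z, where L is the weighted graph Laplacian. The auto-regression term is omitted; being quadratic as well, it would add another matrix to the system. The names `refine_depth`, `edges` and `lam` are hypothetical, and this is not the authors' exact formulation.

```python
from typing import List, Tuple


def refine_depth(z: List[float], edges: List[Tuple[int, int, float]], lam: float) -> List[float]:
    """Minimise sum_i (d_i - z_i)^2 + lam * sum_(i,j) w_ij (d_i - d_j)^2 in closed form.

    z     -- initial per-node estimates (e.g. super-pixel depths from a regressor)
    edges -- undirected (i, j, w_ij) neighbour pairs, each listed once
    lam   -- smoothness weight (hypothetical parameter name)
    """
    n = len(z)
    # Build A = I + lam * L, where L = D - W is the weighted graph Laplacian.
    a = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for i, j, w in edges:
        a[i][i] += lam * w
        a[j][j] += lam * w
        a[i][j] -= lam * w
        a[j][i] -= lam * w
    b = [float(v) for v in z]
    # Gaussian elimination with partial pivoting (A is symmetric positive definite).
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution yields the minimiser d.
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * d[c] for c in range(r + 1, n))
        d[r] = (b[r] - s) / a[r][r]
    return d
```

For example, `refine_depth([1.0, 3.0], [(0, 1, 1.0)], lam=1.0)` pulls the two neighbouring estimates toward each other, to 5/3 and 7/3. Their mean of 2.0 is preserved because the Laplacian term only penalizes differences. A real implementation would use a sparse solver, since the system has one row per pixel.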
Multi-scale 3D deep convolutional neural network for hyperspectral image classification
Research in deep neural networks (DNNs) and deep learning has made great progress on 1D (speech), 2D (image) and 3D (3D object) recognition and classification problems. Because a hyperspectral image (HSI), with its 2D spatial and 1D spectral information, is quite different from a 3D object image, existing DNNs cannot be directly extended to HSI classification. A multi-scale 3D deep convolutional neural network (M3D-DCNN) is proposed for HSI classification. It jointly learns 2D multi-scale spatial features and 1D spectral features from HSI data in an end-to-end manner and promises better results on large-scale datasets. Without any hand-crafted features or pre/post-processing such as PCA or sparse coding, we achieve state-of-the-art results on standard datasets, which demonstrates the technical validity and advancement of our method.
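Joint spatial-spectral learning rests on 3D convolution, in which one kernel slides over the band, row and column axes of the HSI cube at once. The multi-scale idea amounts to running kernels of different spatial sizes in parallel. The sketch below illustrates only these two operations, under stated assumptions. Fixed averaging kernels stand in for learned weights. The spatial sizes 1, 3 and 5 and the function names are hypothetical. Nonlinearities, pooling and training are omitted, so this is not the M3D-DCNN architecture itself.

```python
from typing import List

Cube = List[List[List[float]]]  # indexed [band][row][col]


def conv3d_valid(cube: Cube, kernel: Cube) -> Cube:
    """3D cross-correlation with 'valid' padding over bands, rows and columns."""
    nb, nh, nw = len(cube), len(cube[0]), len(cube[0][0])
    kb, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Cube = []
    for b in range(nb - kb + 1):
        plane = []
        for r in range(nh - kh + 1):
            row = []
            for c in range(nw - kw + 1):
                acc = 0.0
                # One output value mixes kb spectral bands and a kh x kw spatial window.
                for i in range(kb):
                    for j in range(kh):
                        for k in range(kw):
                            acc += cube[b + i][r + j][c + k] * kernel[i][j][k]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def multi_scale_conv(cube: Cube, spectral_depth: int, spatial_sizes: List[int]) -> List[Cube]:
    """Apply averaging 3D kernels that share a spectral depth but differ in spatial size."""
    outputs = []
    for s in spatial_sizes:
        w = 1.0 / (spectral_depth * s * s)  # stand-in for learned weights
        kernel = [[[w] * s for _ in range(s)] for _ in range(spectral_depth)]
        outputs.append(conv3d_valid(cube, kernel))
    return outputs
```

On a 4-band, 5 x 5 cube, `multi_scale_conv(cube, 2, [1, 3, 5])` yields outputs of shape 3 x 5 x 5, 3 x 3 x 3 and 3 x 1 x 1. Each output sees a progressively larger spatial context around the same spectral neighbourhood.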
Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN
We present an image-classification-based approach to large-scale action recognition from 3D skeleton videos. First, we map the 3D skeleton videos to color images, where the transformed action images are translation-scale invariant and dataset independent. Second, we propose a multi-scale deep convolutional neural network (CNN) for the image classification task, which could improve the model's ability to adapt to different temporal frequencies. Even though the action images are very different from natural images, the fine-tuning strategy still works well. Finally, we exploit various kinds of data augmentation to improve the generalization ability of the network. Experimental results on the largest and most challenging benchmark, the NTU RGB-D dataset, show that our method achieves state-of-the-art performance and outperforms other methods by a large margin.
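One way to obtain a translation-scale invariant mapping is sketched below; the paper's exact normalization may differ. It assumes per-axis min-max normalization of the joint coordinates over the whole sequence. Each joint becomes an image row, each frame a column, and the normalized (x, y, z) coordinates become (R, G, B) bytes. Subtracting the per-axis minimum removes translation, and dividing by the per-axis span removes scale. The function name `skeleton_to_image` is hypothetical.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]
Pixel = Tuple[int, ...]  # (R, G, B)


def skeleton_to_image(frames: List[List[Joint]]) -> List[List[Pixel]]:
    """Map a skeleton sequence to an RGB image: row = joint, column = frame, (x, y, z) -> (R, G, B)."""
    # Per-axis extrema over every joint in every frame of the sequence.
    lo = [min(j[a] for f in frames for j in f) for a in range(3)]
    hi = [max(j[a] for f in frames for j in f) for a in range(3)]

    def to_byte(v: float, a: int) -> int:
        span = hi[a] - lo[a]
        # Subtracting the minimum removes translation; dividing by the span removes scale.
        return 0 if span == 0 else int(round(255 * (v - lo[a]) / span))

    n_joints = len(frames[0])
    return [
        [tuple(to_byte(frames[t][j][a], a) for a in range(3)) for t in range(len(frames))]
        for j in range(n_joints)
    ]
```

Shifting every coordinate by a constant offset, or scaling it by a positive factor, leaves the resulting image unchanged. This is what makes images from different subjects, camera setups and datasets comparable before they are resized for the CNN. Resizing is not shown here.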
Ship Detection From Optical Satellite Images Based on Sea Surface Analysis
Guang Yang, Bo Li, Shufan Ji et al.|IEEE Geoscience and Remote Sensing Letters|2013
Automatic ship detection in high-resolution optical satellite images with various sea surfaces is a challenging task. In this letter, we propose a novel detection method based on sea surface analysis to solve this problem. The proposed method first analyzes whether the sea surface is homogeneous by using two new features. Then, a novel linear function combining pixel and region characteristics is employed to select ship candidates. Finally, compactness and length-width ratio are adopted to remove false alarms. Based on the sea surface analysis, the proposed method can not only efficiently block out non-candidate regions to reduce computation time but also automatically assign weights to the candidate selection function to optimize detection performance. Experimental results on real panchromatic satellite images demonstrate the detection accuracy and computational efficiency of the proposed method.
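The false-alarm removal step can be sketched with two shape descriptors, as illustrated below under several assumptions. Compactness is taken as P^2 / (4 * pi * A), which is 1.0 for an ideal disc and grows for ragged shapes. The length-width ratio comes from the axis-aligned bounding box; a rotated minimum bounding rectangle would be more robust for ships at arbitrary headings. The threshold values and the names `shape_features` and `is_ship` are hypothetical and are not taken from the letter.

```python
import math
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def shape_features(region: Set[Pixel]) -> Tuple[float, float]:
    """Return (compactness, length-width ratio) for a pixel region."""
    area = len(region)
    # Perimeter: number of pixel sides (4-neighbourhood) that border a non-region pixel.
    perimeter = sum(
        (r + dr, c + dc) not in region
        for r, c in region
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )
    compactness = perimeter ** 2 / (4 * math.pi * area)  # 1.0 for an ideal disc
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    h = max(rows) - min(rows) + 1
    w = max(cols) - min(cols) + 1
    return compactness, max(h, w) / min(h, w)


def is_ship(region: Set[Pixel], max_compactness: float = 3.0,
            min_ratio: float = 2.0, max_ratio: float = 12.0) -> bool:
    """Keep a candidate only if it is reasonably smooth and elongated (hypothetical thresholds)."""
    compactness, ratio = shape_features(region)
    return compactness <= max_compactness and min_ratio <= ratio <= max_ratio
```

With these placeholder thresholds, a solid 3 x 12 blob passes (compactness about 1.99, ratio 4.0). A 6 x 6 square is rejected as too stubby, and a 1 x 30 line is rejected as too thin and irregular. In practice the thresholds would be tuned to the image resolution and the expected ship sizes.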
A Multiscale Framework With Unsupervised Learning for Remote Sensing Image Registration
Yuanxin Ye, Tengfeng Tang, Bai Zhu et al.|IEEE Transactions on Geoscience and Remote Sensing|2022
Registration of multisensor or multimodal image pairs with large distortions is a fundamental task for many remote sensing applications. To achieve accurate and low-cost remote sensing image registration, we propose a multiscale framework with unsupervised learning, named MU-Net. Without costly ground truth labels, MU-Net directly learns the end-to-end mapping from image pairs to their transformation parameters. MU-Net stacks several deep neural network (DNN) models at multiple scales to form a coarse-to-fine registration pipeline, which prevents backpropagation from falling into a local extremum and resists significant image distortions. We design a novel loss function paradigm based on structural similarity, which makes MU-Net suitable for various types of multimodal images. MU-Net is compared with traditional feature-based and area-based methods, as well as with supervised and other unsupervised learning methods, on optical-optical, optical-infrared, optical-synthetic aperture radar (SAR), and optical-map datasets. Experimental results show that MU-Net achieves more comprehensive and accurate registration performance on these image pairs with geometric and radiometric distortions. The PyTorch implementation is shared at https://github.com/yeyuanxin110/MU-Net.
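A loss based on structural similarity needs no ground-truth transformation. It only compares the warped image with the target, which is why it suits unsupervised registration. The sketch below shows the basic form, 1 - SSIM. It uses the standard SSIM constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2, where L is the data range. As a simplification, the statistics are computed over the whole image as a single window rather than with local Gaussian windows. It is not MU-Net's actual loss, which is available in the linked repository. The function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def global_ssim(x: Image, y: Image, data_range: float = 1.0) -> float:
    """SSIM computed over the whole image as a single window (no Gaussian weighting)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    # Standard stabilising constants from the SSIM definition (K1 = 0.01, K2 = 0.03).
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def ssim_loss(warped: Image, target: Image) -> float:
    """Unsupervised registration loss: 0 when the warped image matches the target exactly."""
    return 1.0 - global_ssim(warped, target)
```

A uniform brightness offset between the two images lowers SSIM only slightly. An inverted image gives negative covariance and a loss above 1. This partial insensitivity to intensity differences is what makes structure-based similarity more forgiving than a plain pixel-wise difference across modalities.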