Yicheng Qiao

Tsinghua University

Publishes on Advanced Neural Network Applications, Advanced Image and Video Retrieval Techniques, Autonomous Vehicle Technology and Safety. 10 papers and 257 citations.

Top publications by citations

FMDNet: Feature-Attention-Embedding-Based Multimodal-Fusion Driving-Behavior-Classification Network
Wenzhuo Liu, Jianli Lu, Junbin Liao et al. | IEEE Transactions on Computational Social Systems | 2024
Cited by 22

Driving behavior classification is a critical component of social transportation systems and advanced driver assistance systems, and it has gained increasing attention in recent years. Accurate classification algorithms for driving behavior play a significant role in improving traffic safety, energy conservation, and related areas. In this article, we propose a novel driving behavior classification network named feature-attention-embedding-based multimodal-fusion driving-behavior-classification network (FMDNet). FMDNet uses eight types of data to classify driving behavior: acceleration along the x-, y-, and z-axes, roll angle, pitch angle, yaw angle, roadside image, and vehicle speed. To fuse features extracted from different modalities effectively while accounting for their varying importance, we introduce the feature attention embedding-based fusion module (FAEF) as our fusion strategy. This strategy enhances the network's ability to capture meaningful features by incorporating two feature attention embedding units that explore the interplay between different modalities. We further validate our approach through extensive ablation experiments that analyze the impact of each modality on driving behavior classification. FMDNet achieves state-of-the-art performance on the public UAH-DriveSet dataset with an F1-score of 99.0%. Its robustness is confirmed on the distracted dataset, where it achieves an F1-score of 99.7%. The model's performance on both the UAH-DriveSet dataset and the distracted dataset highlights its potential for real-world applications. https://github.com/Wenzhuo-Liu/FMDNet
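To make the idea of importance-weighted multimodal fusion concrete, here is a minimal sketch in plain Python. It is not the FAEF module from the paper: the function names, the per-modality scalar scores, and the assumption that all modality features share one dimension are hypothetical choices for illustration. It only shows the general pattern of turning importance scores into softmax weights and taking a weighted sum of modality features.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, scores):
    """Fuse per-modality feature vectors with softmax attention weights.

    features: dict mapping a modality name to a list of floats
              (hypothetical: all vectors have the same length).
    scores:   dict mapping a modality name to a raw importance score
              (hypothetical: in a real network these would be learned).
    Returns the fused vector and the weight assigned to each modality.
    """
    names = list(features)
    weights = softmax([scores[name] for name in names])
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for weight, name in zip(weights, names):
        for i, value in enumerate(features[name]):
            fused[i] += weight * value
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    feats = {"imu": [1.0, 0.0], "image": [0.0, 1.0], "speed": [0.5, 0.5]}
    raw = {"imu": 0.2, "image": 1.5, "speed": -0.3}
    fused, weights = attention_fuse(feats, raw)
    print(weights)
    print(fused)
```

With equal scores the fusion reduces to a plain average of the modality features; raising one modality's score shifts the fused vector toward that modality.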

SeMask-Mask2Former: A Semantic Segmentation Model for High Resolution Remote Sensing Images
Yicheng Qiao, Wei Liu, Bin Liang et al. | Unknown | 2023
Cited by 7

With the development of remote sensing, semantic segmentation of high-resolution remote sensing images (RSIs) is increasingly essential. At the same time, the characteristics of objects in RSIs, such as large size, variation in object scales, and complex details, make it necessary to capture both long-range context and local information. Methods such as Fully Convolutional Networks (FCN) and Pyramid Scene Parsing Network (PSPNet) lack the ability to capture long-range dependencies because of the limited receptive field of Convolutional Neural Networks (CNNs). In contrast, the self-attention mechanism in Transformer models, which captures the correlation between pixels, is remarkably effective at capturing long-range context. One of the most outstanding Transformer models is the Masked-attention Mask Transformer (Mask2Former), which adopts the mask classification method. We propose SeMask-Mask2Former, a model with a boundary loss that uses Semantically Masked (SeMask) as the backbone and Mask2Former as the decoder. Concretely, mask classification, which generates one or more masks for each category to perform fine-grained segmentation, is especially suitable for handling the large within-class and small inter-class variance of RSIs. Extensive experimental results show that SeMask-Mask2Former obtains better results in semantic segmentation of high-resolution RSIs on the ISPRS Potsdam dataset than CNN-based methods and other state-of-the-art Transformer-based methods. Extensive ablation studies conducted on the Potsdam dataset verify the contribution of each component and optimization strategy in SeMask-Mask2Former.
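As a rough illustration of how mask classification can yield a semantic label map, the sketch below combines per-query class probabilities with per-query mask probabilities, which is the way MaskFormer-style models are commonly described as performing semantic inference. It is a simplified assumption, not code from the paper: the function name and the nested-list inputs are hypothetical, and the "no object" class is assumed to have been dropped already.

```python
def semantic_inference(class_probs, mask_probs):
    """Turn mask-classification outputs into a per-pixel label map.

    class_probs: Q x C nested list; class_probs[q][c] is the probability
                 that query q belongs to class c.
    mask_probs:  Q x H x W nested list; mask_probs[q][y][x] is the
                 probability that pixel (y, x) belongs to query q's mask.
    Returns an H x W nested list of class indices.
    """
    num_queries = len(class_probs)
    num_classes = len(class_probs[0])
    height = len(mask_probs[0])
    width = len(mask_probs[0][0])

    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over queries of
            # (class probability) * (mask probability).
            scores = [
                sum(class_probs[q][c] * mask_probs[q][y][x]
                    for q in range(num_queries))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


if __name__ == "__main__":
    class_probs = [[0.9, 0.1], [0.2, 0.8]]
    mask_probs = [[[1.0, 0.0]], [[0.0, 1.0]]]
    print(semantic_inference(class_probs, mask_probs))
```

Because several queries can vote for the same class, one category may be covered by more than one mask, which matches the abstract's point about handling large within-class variance.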

DLAFNet: Direct LiDAR-Aerial Fusion Network for Semantic Segmentation of 2-D Aerial Image and 3-D LiDAR Point Cloud
Wei Liu, He Wang, Yicheng Qiao et al. | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2024
Cited by 4 | Open Access

High-resolution remote sensing image segmentation has advanced significantly with 2-D convolutional neural networks and transformer-based models like SegFormer and Swin Transformer. Concurrently, the rapid development of 3-D convolution techniques has driven advancements in methods like PointNet and Kernel Point Convolution for 3-D LiDAR point cloud segmentation. Traditional fusion of aerial imagery and LiDAR data often relies on digital surface models or other features extracted from LiDAR point clouds, incorporating them as depth channels into image data. In this article, we propose a novel approach called Direct LiDAR-Aerial Fusion Network, which directly integrates multispectral images (RGB) and LiDAR point cloud data for semantic segmentation. Experiments on the modified GRSS18 dataset demonstrate that our method achieves an overall accuracy (OA) of 79.88%, outperforming conventional approaches. By fusing RGB and LiDAR features, our technique improves OA by 1.77% and mean Intersection over Union by 0.83%.
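For readers unfamiliar with the two metrics reported above, the following sketch shows the standard way overall accuracy (OA) and mean Intersection over Union (mIoU) are computed from a confusion matrix. The function names and the toy labels are illustrative assumptions, not the paper's evaluation code, and skipping classes that never appear is one common convention among several.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count how often true class t (row) was predicted as class p (column)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of all samples (pixels or points) that were labeled correctly."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN).

    Classes absent from both the labels and the predictions are skipped.
    """
    n = len(cm)
    ious = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    pred = [0, 1, 1, 1]
    cm = confusion_matrix(truth, pred, num_classes=2)
    print(cm, overall_accuracy(cm), mean_iou(cm))
```

OA rewards getting frequent classes right, while mIoU weights every class equally, which is why the two metrics can move by different amounts when a fusion step is added.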