Central South University
ORCID: 0009-0005-8404-9489
Publishes on Advanced Neural Network Applications, Recommender Systems and Techniques, Smart Agriculture and AI. 49 papers and 258 citations.
Weakly supervised semantic segmentation (WSSS), aiming to train segmentation models solely using image-level labels, has received significant attention. Existing approaches mainly concentrate on creating high-quality pseudo labels by utilizing existing images and their corresponding image-level labels. However, a major challenge arises when the available dataset is limited, as the quality of pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing a novel approach called Generative Prompt Controlled Diffusion (GPCD) for data augmentation. This approach enhances the current labeled datasets by augmenting them with a variety of images, achieved through controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts. In this process, the existing images and image-level labels provide the necessary control information, while GPT enriches the prompts to generate diverse backgrounds. Moreover, we make an original contribution by integrating data source information as tokens into the Vision Transformer (ViT) framework, which improves the ability of downstream WSSS models to recognize the origins of augmented images. Our proposed GPCD approach clearly surpasses existing state-of-the-art methods, with its advantages being more pronounced when the available data is scarce, thereby demonstrating the effectiveness of our method. Our source code will be released.
• We first utilize conditional diffusion with GPT prompts to generate diverse images for augmenting WSSS with image-level labels.
• We introduce an image selection strategy to retain high-quality data and filter out noisy images, preventing negative training impact.
• We propose data source tokens to distinguish original and generated images, enriching ViT tokens for better adaptation.
• Our framework outperforms SOTA, achieving a 5.3% segmentation improvement on PASCAL VOC 2012 with 5% training data.
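The data source token idea can be illustrated with a minimal sketch. The abstract does not say where the token sits in the sequence or how it is parameterized, so everything here is an assumption: the function name `build_token_sequence`, the two source ids, the placement right after the class token, and the toy 4-dimensional vectors standing in for learned embeddings.

```python
from typing import List, Sequence

Vector = List[float]

# Hypothetical source ids: the abstract only states that original and
# generated images are distinguished, not how they are encoded.
SOURCE_ORIGINAL = 0
SOURCE_GENERATED = 1


def build_token_sequence(
    patch_tokens: Sequence[Vector],
    cls_token: Vector,
    source_table: Sequence[Vector],
    source_id: int,
) -> List[Vector]:
    """Return [class token, source token, patch tokens...] for one image.

    Assumption: the source token is looked up from a small table (one row
    per data source) and placed directly after the class token.
    """
    dim = len(cls_token)
    source_token = source_table[source_id]
    if len(source_token) != dim or any(len(p) != dim for p in patch_tokens):
        raise ValueError("all tokens must share the same embedding dimension")
    # Copy every vector so the caller's lists are never mutated downstream.
    return [list(cls_token), list(source_token)] + [list(p) for p in patch_tokens]


# Toy values standing in for learned embeddings.
patches = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
cls_token = [0.0, 0.0, 0.0, 0.0]
source_table = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

tokens_real = build_token_sequence(patches, cls_token, source_table, SOURCE_ORIGINAL)
tokens_gen = build_token_sequence(patches, cls_token, source_table, SOURCE_GENERATED)
```

In this sketch the two sequences differ only in the source token, so a transformer consuming them could, in principle, condition on whether an image is original or diffusion-generated while sharing all other parameters.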
Cross-Domain Sequential Recommendation (CDSR) aims to predict future user interactions based on historical interactions across multiple domains. The key challenge in CDSR is effectively capturing cross-domain user preferences by fully leveraging both intra-sequence and inter-sequence item interactions. In this paper, we propose a novel method, Image Fusion for Cross-Domain Sequential Recommendation (IFCDSR), which incorporates item image information to better capture visual preferences. Our approach integrates a frozen CLIP model to generate image embeddings, enriching original item embeddings with visual data from both intra-sequence and inter-sequence interactions. Additionally, we employ a multiple attention layer to capture cross-domain interests, enabling joint learning of single-domain and cross-domain user preferences. To validate the effectiveness of IFCDSR, we re-partitioned four e-commerce datasets and conducted extensive experiments. Results demonstrate that IFCDSR significantly outperforms existing methods.
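As a rough illustration of enriching item embeddings with image embeddings and then attending over the sequence, the sketch below uses plain Python lists. It is not the IFCDSR architecture: the fixed vectors stand in for frozen CLIP outputs, the additive fusion through a linear projection is an assumption, and a single-query scaled dot-product attention pooling replaces the paper's multiple attention layer. The names `project`, `fuse`, and `attention_pool` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def project(matrix: Sequence[Vector], vec: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def fuse(item_emb: Vector, image_emb: Vector, proj: Sequence[Vector]) -> Vector:
    """Assumed fusion: item embedding plus the projected image embedding."""
    projected = project(proj, image_emb)
    return [a + b for a, b in zip(item_emb, projected)]


def attention_pool(sequence: Sequence[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention of one query over a sequence of tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, tok)) / scale for tok in sequence]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * tok[d] for w, tok in zip(weights, sequence))
        for d in range(len(query))
    ]
    return pooled, weights


# Toy data: 2-dim item embeddings, 3-dim stand-ins for frozen image embeddings.
item_embs = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]
image_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
proj = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]  # maps 3 dims down to 2

fused_seq = [fuse(i, v, proj) for i, v in zip(item_embs, image_embs)]
user_repr, attn_weights = attention_pool(fused_seq, query=fused_seq[-1])
```

Here the last fused item serves as the query, a common choice in sequential recommendation sketches; a cross-domain variant would run the same pooling over the merged sequence from several domains.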
As buildings age, the deterioration of their walls becomes unavoidable. Because manual crack detection is inefficient, intelligent detection techniques are needed. Deep learning has attracted growing attention in crack detection, leading to the development of numerous feature learning methods. Although the technology in this area has been progressing, it still faces problems such as insufficient feature extraction and unstable prediction results. To address these shortcomings, this paper proposes a new Adaptive Attention-Enhanced YOLO. The method employs a Swin Transformer-based Cross-Stage Partial Bottleneck with a three-convolution structure, introduces an adaptive receptive field module in the neck network, and processes the features through a multi-head attention structure during prediction. The introduction of these modules greatly improves the performance of the model, thus effectively improving the precision of crack detection.